Database Systems, CSCI 4380-01 Homework # 4 Due Monday March 15, 2021 at 11:59:59 PM EDT Homework Statement. This homework is worth 5% of your total grade. This homework will concentrate on SQL skills. You will be able to test your queries against real data and expected output. If you want to create the database on your local database server, you can find the database dump file at: https://www.cs.rpi.edu/academics/courses/spring21/csci4380/_downloads/ministreaming.dmp You can create a database for this data, let’s call it ministreaming on psql shell: psql> create database ministreaming; and then load the data using the following Unix command (on Linux and Macs): cat ministreaming.dmp | psql ministreaming If you get a message similar to this: CREATE EXTENSION psql:ministreaming.dmp:28: ERROR: must be owner of extension plpgsql it is safe to ignore it. The data model for this homework is given in the end of this homework. We have data regarding Movies and TV Shows and which streaming platforms they are available in. This data is sampled down from a much larger database for simplicity (less than 2% of the full data) and has potentially some noise. We will use the database “as is” and not fix issues unless they are major. 1 Database Server Use Rules If you want to install and create the database on your own computer, you can use the data scripts I used. You do not have to have a database server installed to do this homework. This database will be created as ministreaming on the shared database server at: http://rpidbclass.info Feel free to use it for testing your queries, but please be considerate of others when using the server. Here are a few ground rules: Server response to load can be unpredictable. So, be patient if you see some slow down. This is a medium sized database for a 100+ student class, so you can expect a serious slow down near the homework deadline. Please do not wait until the last minute to submit your homework. Test your queries one at a time. Best setup is using a browser and a text editor. Write queries elsewhere and test in the server with cut and paste. Make every effort to read your queries before submitting. A forgotten join condition may cause a disaster. Check join conditions first, then run your queries. Remember, if you have an unresponsive query, it will continue to use system resources even if you quit your browser. So, opening a new browser window will solve your problem but may slow things down for everyone. Queries should terminate after 2 minutes, so if you increased the load with a bad query, then wait for your query to end before submitting another. If you are experiencing problems with a query, read it carefully before running it again in a separate window. Remember: missing join conditions can be the difference between: 2,094,266,610,000 tuples and 3335 tuples. If the server is not responsive, let us know on Submitty/Webex Teams. I will see if some jobs need to be killed or whether server needs to be made more powerful. Please be patient. Please do not include a query that does not run in your homework submission. I will run all your queries in a batch job and an incomplete query will cause me a great deal of problems. 1 Problem Description Write the following queries in SQL. In all your queries, use the simplest possible expression possible. Do not forget your JOIN conditions and pay attention to returned attributes and ordering of tuples. Returned attributes should be listed in exactly the same order in which they are listed in the description. Attribute names should be the same as in the corresponding tables unless a different name is requested in parentheses. Query 1 Return the title, platform and number of seasons of series with at least 5 but no more than 10 seasons while having IMDB rating of 4 or lower. Order results by title ascending and platform descending. Query 2 Find the average Rotten Tomatoes rating (avgrottentomatoes) of series that have no date when the series was added and have either no age group the series is intended for or the age group is ’18+’. When computing the average, only consider non-negative Rotten Tomatoes ratings. The value of avgrottentomatoes should be output rounded to two decimal places (you may use ROUND() function). 2 Query 3 Return the title, year, first director (director1), second director (director2), and lan- guage of all movies made after 2000, available in at least one of the following languages: English, Spanish, or French, and having at least two directors who have the same last name ’Wachowski’ but who are different people (i.e., the values of director attribute needs to be different but contain ’Wachowski’). director1 should always be lexicographically smaller than director2. Both director1 and director2 have to be Wachowski, even if there are other direc- tors who are not Wachowski. Also, there should be no duplicate rows in the output. Order results by language ascending, year descending, and finally title ascending. Query 4 Return the id (id) and title of all movies and series that are animation but not horror. Return an additional column called streamingtype that has the value 'movie' or 'series' depending on the source. Order results by the title in descending order and streamingtype in ascending order. Query 5 Return the total number of tuples in the series relation (numtuples), total number of tuples with an imdbrating value (numimdbrating), earliest year and last year for a series (minyear, maxyear), average minimum age series are intended for (avgcontentrating), and average number of seasons in series (avgseasons). Ignore any tuples with contentrating values that are not in the d+ format where d is a whole number. Format your averages with only two decimal places! Remember to carry out all arithmetic operations with full precision and only show two decimal places when producing the final output. For example, do not round values before summing or averaging them. Note: This query requires you to do type casting. Often there are minor differ- ences in floating point operations on different Postgresql servers. So, for simplic- ity I have decided to ask everyone to use the same casting: cast contentrating as an integer and averages as numeric(5,2). Format for doing this is as follows: select '18'::int, '4.1234123412'::numeric(5,2) ; Query 6 Find directors who have directed both a movie and a series available in the same country of either United States or India. Return the name of directors (director) and the total number of movies (nummovies) and series (numseries) they have directed that are available in one of these countries. Order by the nummovies, numseries, and director ascending. Query 7 Find platforms that have more than 40 movies. Return the name of these platforms (platform), and the total number of movies (nummovies) and languages (numlanguages) that are available on each of these platforms. Order results by nummovies descending, platform ascending, and numlanguages descending. Query 8 For each language that does not have movies with the Rotten Tomatoes rating of at least 60, return the language, the total number of genres (numgenres) and movies (nummovies) in that language, and the spread of Rotten Tomatoes ratings (spreadrotten) for these movies. Spread is defined as the absolute value of the difference between the maximum and the min- imum values. Order results by spreadrotten descending, nummovies descending, numgenres descending, and language ascending. Query 9 For all cast members who appeared in the same movie with Adam Sandler in more than one movie, return their name and the number of times they appeared together with Adam Sandler (nummovies). Order results by nummovies descending and castname ascending. Query 10 Return the name of all cast members who were cast in a series released after 2000 and have also directed a series but were not cast in any movies. Order the results by castname ascending. 3 SUBMISSION INSTRUCTIONS. You will use Submitty for this homework. Please submit each query to the appropriate box in Submitty. You must add a semi-colon at the end of each query to make sure it runs properly. If you want to provide any comments, all SQL comments must be preceded by a double hyphen: -- Example comment. 4 Database Schema This database is merged from multiple datasets, containing data for movies and TV Series, both appearing in various streaming platforms. Note that this is real data, scraped from websites and then parsed, so it may be noisy. Make a habit of using simple queries to first explore the data to understand various issues. create table movies( movieid int primary key , title varchar(1000) , year int -- year the movie was made , contentrating varchar(10) -- age group the movie is intended for , imdbrating float -- imdb rating , rottentomatoes float -- rotten tomatoes rating , runtime int , date_added date -- date movie is added to Netflix , duration varchar(40) , description text ) ; -- Which countries the movie is available in create table moviescountry ( movieid int , country varchar(100) , primary key (movieid, country) , foreign key (movieid) references movies(movieid) ) ; -- Directors of the movie create table moviesdirectors ( movieid int , director varchar(100) , primary key (movieid, director) , foreign key (movieid) references movies(movieid) ) ; create table moviesgenres ( movieid int , genre varchar(100) , primary key (movieid, genre) , foreign key (movieid) references movies(movieid) ) ; -- Languages the movie is available in create table movieslanguages ( movieid int , language varchar(100) , primary key (movieid, language) , foreign key (movieid) references movies(movieid) ) ; -- People who starred in the movie create table moviescast ( movieid int , castname varchar(100) , primary key (movieid, castname) , foreign key (movieid) references movies(movieid) ) ; 5 -- Which platform the movie is in (if ispresent is True), -- or not in (if ispresent is False) create table moviesonplatform ( movieid int , platform varchar(100) , ispresent boolean , primary key (movieid, platform) , foreign key (movieid) references movies(movieid) ) ; -- TV series on various platforms create table series ( seriesid int primary key , title varchar(400) , yearreleased int , contentrating varchar(40) -- age group the series is intended for , imdbrating float -- imdb rating , rottentomatoes int -- rotten tomatoes rating , description text , seasons int -- how many seasons are available , date_added date -- date series is added to Netflix ) ; -- Which platform the series is available in create table seriesonplatform ( seriesid int , platform varchar(100) , primary key (seriesid, platform) , foreign key (seriesid) references series(seriesid) ) ; -- People who starred in the series create table seriescast ( seriesid int , castname varchar(100) , primary key (seriesid, castname) , foreign key (seriesid) references series(seriesid) ) ; -- Categories for series create table seriescategory ( seriesid int , category varchar(100) , primary key (seriesid, category) , foreign key (seriesid) references series(seriesid) ) ; -- Countries the series is available in create table seriescountry ( seriesid int , country varchar(100) , primary key (seriesid, country) , foreign key (seriesid) references series(seriesid) ) ; -- Directors for the series create table seriesdirectors ( seriesid int 6 , director varchar(100) , primary key (seriesid, director) , foreign key (seriesid) references series(seriesid) ) ; -- Genre from series create table seriesgenres ( seriesid int , genre varchar(100) , primary key (seriesid, genre) , foreign key (seriesid) references series(seriesid) ) ; 7
欢迎咨询51作业君