2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 1/6 COMP3311 20T3 Assignment 3 Python, Psycopg2 and IMDB Database Systems Last updated: Tuesday 10th November 12:43pm Most recent changes are shown in red ... older changes are shown in brown. [Assignment Spec] [Database Design] [SQL Schema] [Sample Outputs] Aims This assignment aims to give you practice in manipulating a moderately large database (IMDB) implementing SQL views to satisfy requests for information implementing PLpgSQL functions to satisfy requests for information implementing Python scripts to extract and display data The goal is to build some useful data access operations on the Internet Movie Database (IMDB), which contains a wealth of information about movies, actors, etc. You need to write Python scripts, using the Psycopg2 database connectivity module, to extract and display information from this database. Summary Submission: Login to Course Web Site > Assignments > Assignment 3 > [Submit] > upload required files, or on a CSE server, give cs3311 ass3 RequiredFiles Required Files: xtras.sql, helpers.py, best, rels, minfo, bio Deadline: 15:00 Monday 23 November Marks: 14 marks toward your total mark for this course Late Penalty: 0.1 marks off the ceiling mark for each hour late How to do this assignment: read this specification carefully and completely create a directory for this assignment unpack the supplied files into this directory login to grieg and run your PostgreSQL server remove your Assignment 2 database set up a copy of the supplied database (must be called ass3) complete the tasks below by editing the files xtras.sql ... put any views or functions here helpers.py ... put Python helper functions here best ... Python script to list best movies rels ... Python script to show world releases for a Movie minfo ... Python script to show cast+crew for a Movie bio ... Python script to show biography/filmography of a Name submit these files via WebCMS (you can submit multiple times) Details of the above steps are given below. Note that you can put the files wherever you like; they do not have to be under your /srvr directory. You also edit your SQL files on hosts other than grieg. The only time that you need to use grieg is to manipulate your database. Introduction 2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 2/6 The Internet Movie Database (IMDB) is a huge collection of information about all kinds of video media. It has data about most movies since the dawn of cinema, but also a vast amount of information about TV series, documentaries, short films, etc. Similarly, it holds information about the people who worked on and starred in these video artefacts. It also hold viewer ratings and crticis reviews for video artefacts as well as a host of other trivia (e.g. bloopers). The full IMDB database is way too large to let you all build copies of it, so we have have created a cut-down version of the database that deals with well-rated movies from the last 60 years. You can find more details on the database schema in the [Database Design] page. Some comments about the data in our copy of IMDB: there seems to be preponderance of recent Bollywood movies; some of the aliases look incorrect (e.g. for "Shawshank Redemption"); the data only goes to mid 2019, so you won't find recent blockbusters. Doing this Assignment This section describes how to carry out this assignment. Some of the instructions must be followed exactly; others require you to exercise some discretion. The instructions are targetted at people doing the assignment on Grieg. If you plan to work on this assignment at home on your own computer, you'll need to adapt the instructions to "local conditions". If you're doing your assignment on the CSE machines, some commands must be carried out on grieg, while others can (and probably should) be done on a CSE machine other than grieg. In the examples below, we'll use grieg$ to indicate that the comand must be done on grieg and cse$ to indicate that it can be done elsewhere. The first step in setting up this assignment is to set up a directory to hold your files for this assignment. cse$ mkdir /my/dir/for/ass3 cse$ cd /my/dir/for/ass3 cse$ unzip /home/cs3311/web/20T3/assignments/ass3/ass3.zip Archive: /home/cs3311/web/20T3/assignments/ass3/ass3.zip inflating: best inflating: bio inflating: helpers.py inflating: minfo inflating: rels inflating: xtras.sql Note that the database dump is quite large. Do not copy into your assignment directory on the CSE servers, because you only need to read it once to build your database (see below). If you're working at home, you will need to copy it onto your home machine to load the database. The next step is to set up your database: ... login to Grieg and source env as usual ... grieg$ dropdb imdb ... if you already had such a database grieg$ createdb imdb grieg$ bzcat /home/cs3311/web/20T3/assignments/ass3/database/imdb.dump.bz2 | psql imdb grieg$ psql imdb ... examine the database contents ... Note the database contains non-ascii characters, and so you will need to make sure that your PostgreSQL server uses UTF8 encoding (and corresponding collation) before it will load. Loading the database should take less than 10 seconds on Grieg, assuming that Grieg is not under heavy load. (If you leave your assignment until the last minute, loading the database on Grieg will be considerably slower, thus delaying your work even more. The solution: at least load the database Right Now, even if you don't start using it for a while.) (Note that the imdb.dump file is 20MB in size; copying the compressed version under your home directory or your srvr/ directory, and then decompressing it, is not a good idea). If you have other large databases under your PostgreSQL server on Grieg or you have large files under your /srvr/YOU/ directory, it is possible that you will exhaust your Grieg disk quota. In particular, you will not be able to store a copy of the MyMyUNSW database as well as the IMDB database under your Grieg server. The solution: remove any existing databases before loading your IMDB database. 2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 3/6 If you're running PostgreSQL at home, the file ass3.zip contains copies of the required files, but not the database, to get you started. You will need to download the database dump file separately. If you copy ass3.zip and imdb.dump.bz2 to your home computer, unzip it, and perform commands analogous to the above, you should have a copy of the IMDB database that you can use at home to do this assignment. Think of some questions you could ask on the database (e.g. like the ones in the Online Problem-solving Sessions) and work out SQL queries to answer them. One useful query is imdb=# select * from dbpop(); This will give you a list of tables and the number of tuples in each. Your Tasks Answer each of the following questions by writing a Python/Psycopg2 script. You can add any SQL views or PLpgSQL functions that are used in your Python scripts into the file xtras.sql You can add any functions that you want to share among the Python scripts into the file helpers.py. If you want to use any other Python modules, make sure that they are available on Grieg; your submitted code has to run on Grieg using the Python installation there. You can change the indentation from the two-space indent in the templates. Hint: use PostgreSQL's case-insensitive regular expression pattern matching operator (~*) for matching partial names and titles. If you do this, it has the added advantage that your command-line arguments can be in any case you like, and you can even put regular expression chars into them. Q1 (3 marks) Complete the script called "best" so that it prints a list of the top N highest-rating movies (default N = 10). The script takes a single command-line argument which specifies how many movies should appear in the list. Movies should be ordered from highest to lowest rating, and should be displayed as, e.g. $ ./best 5 9.8 Randhawa (2019) 9.6 Fan (2019) 9.6 Mama's Heart. Gongadze (2017) 9.5 Ananthu V/S Nusrath (2018) 9.5 Family of Thakurganj (2019) Within groups of movies with the same rating, movies should be ordered by title. For more examples of how the script behaves, see [Sample Outputs]. Q2 (4 marks) Complete the script called "rels" so that it prints a list of the different releases (different regions, different languages) for a movie. The script takes a single command-line argument which gives a part of a movie name (could be the entire name). If there are no movies matching the supplied partial-name, then you should print a message to this effect and quit the program, e.g. grieg$ ./rels xyzzy No movie matching 'xyzzy' If the partial-name matches multiple movies, simply print a list of matching movies, e.g. grieg$ ./rels mothra Movies matching 'mothra' =============== Mothra (1961) Mothra vs. Godzilla (1964) Godzilla and Mothra: The Battle for Earth (1992) Godzilla, Mothra and King Ghidorah: Giant Monsters All-Out Attack (2001) 2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 4/6 If the partial name matches exactly one movie, then print that movie's title and year, and then print a list of all of the other releases (aliases) of the movie. For each alias, show at least the title. If a region exists, add this, and if a language is spcified, add it as well, e.g. grieg$ ./rels 2001 2001: A Space Odyssey (1968) was also released as '2001' (region: XWW, language: en) 'Two Thousand and One: A Space Odyssey' (region: US) '2001: Odisea del espacio' (region: UY) '2001: Een zwerftocht in de ruimte' (region: NL) Movie releases should be ordered accoring to the ordering attribute in the Aliasestable. Note that if there are two movies with exactly the same original title, you will not be able to view their releases, since the title alone does not distinguish them. We consider this problem in the next question. For more examples of how the script behaves, see [Sample Outputs]. Q3 (5 marks) Complete the script called "minfo" so that it prints a list of cast and crew for a movie. The script takes a command-line argument which gives a part of a movie name (could be the entire name). It also takes an optional command-line argument, which is a year and can be used to distinguish movies with the same title (or, at least, titles which match the partial movie name). grieg $ ./minfo Usage: minfo 'MovieTitlePattern' [Year] If there are no movies matching the supplied partial-name, then you should print a message to this effect and quit the program, e.g. grieg$ ./minfo xyzzy No movie matching 'xyzzy' If the partial-name matches multiple movies, simply print a list of matching movies, the same as in Q2. If you need to disambiguate, either use a larger partial-name or add a year to the command line. If a longer partial-name and year still doesn't disambiguate, print a list of matching movies as above. If the command-line arguments identify a single movie, then print the movie details (title and year), followed by a list of the principal actors and their roles, followed by a list of the principal crew members and their roles. The list of actors should be sorted according to the ordering attribute in the Principals table (i.e the biggest star comes first). The list of crew members should also be sorted according to the ordering attribute in the Principals table. For more examples of how the script behaves, see [Sample Outputs]. Q4 (6 marks) Complete the script called "bio" so that it prints a filmography for a given person (Name), showing their roles in each of the movies they are associated with. The script takes a command-line argument which gives a part of a person's name (could be the entire name). It also takes an optional command-line argument, which is a year (their birth year) and which can be used to distinguish people with similar names. grieg$ ./bio Usage: bio 'NamePattern' [Year] If the name pattern doesn't match anyone in the Names table, then you should print a message to this effect and quit the program, e.g. grieg$ ./bio Smmith No name matching 'Smmith' 2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 5/6 If the name pattern matches more than one person, then print a list of the matching people, with the years they were born and died in parentheses afer the name. grieg: ./bio rooney Names matching 'rooney' =============== Darrell Rooney (???) Frank Rooney (1913-) Jennie Rooney (???) Mickey Rooney (1920-2014) Nancy Rooney (???) Rooney Mara (1985-) Sharon Rooney (1988-) Note that some people do not have death years or birth years. If the name pattern, optionally combined with a year, matches a single person, then you should print their name and birth and death years, followed by a list of all the films they have been a principal in. Films should be in chronological order; if there are multiple films in a given year, order them by title within the year. For each film, show first any acting roles they had, including the role they played, and then any production crew roles they had. Note that one person could be both and actor and director in the same movie. grieg$ ./bio 'spike lee' Filmography for Spike Lee (1957-) =============== She's Gotta Have It (1986) playing Mars Blackmon -- acting role as Director -- crew role as Writer School Daze (1988) as Director as Writer Do the Right Thing (1989) as Director as Writer Mo' Better Blues (1990) playing Giant as Director as Writer ... etc. etc. etc. ... For more examples of how the script behaves, see [Sample Outputs]. Style (2 marks) Your programming style will be marked according to the following criteria consistent indentation (somewhat forced by Python) meaningful variable/function names effective abstraction (use of functions) efficient use of the database There is a performance requirement in each of the tasks. Any task that takes longer than 2 seconds on any our test cases will be penalised for that case, even if it eventually produces the correct result. Submission and Testing We will test your submission as follows: create a testing subdirectory containing your xtras.sql and Python scripts 2020/11/10 COMP3311 20T3 - Assignment 3 https://cgi.cse.unsw.edu.au/~cs3311/20T3/assignments/ass3/index.php 6/6 create a new database TestingDB and initialise it with imdb.dump.bz2 run the command: psql TestingDB -f xtras.sql (using your xtras.sql) load and run the tests in the check script Your submitted code must be complete so that when we do the above, your xtras.sql will load without errors. You should throgouhly test your scripts before you submit them. If you Python or SQL scripts generate load-time errors and/or have missing definitions, you will be penalised by a 2 mark administrative penalty. Before you submit, it would be useful to test out whether the files you submit will work by following a similar sequence of steps to those noted above. Have fun, jas
欢迎咨询51作业君