Assignment 4

Overview

This is a practical and real-world project that puts the knowledge you gained into practice. You are required to

investigate and understand a publicly available dataset, design a conceptual model for storing the dataset in a

relational database, apply normalisation techniques to improve the model, build the database according to your

design and import the data into your database, and develop SQL queries in response to a set of requirements.

The objective of this assignment is to reinforce what you have learned. Specifically, it involves how to build a

simple application that connects to a database backend, running a simple relational schema.

Part A: Understanding the Data (Preliminary Work)

Part B: Designing the Database

• Task B.1 Produce an ER diagram for a relational database that will be able to store the given dataset.

Part C: Creating the Database and Importing Data

• Task C.1 Produce one SQL script file.

• Task C.2 Create a database file and import the given dataset into your database.

Part D: Data Retrieval and Visualisation

• Task D.1-D.5 Produce one SQL query file that includes five SQL queries, and one PDF file that includes a screenshot of the result of running each query.

• Task D.1-D.5 Represent each query result as a graph, and include the graph in the PDF file alongside the query result.

Assessment criteria

This assessment will measure your ability to:

• Analyse the requirements outlined in the problem description

• Develop a conceptual model for the design of a database backend required for the system

• Use an industry-standard ER modelling tool to draw the ER model

• Use the 7-step mapping process to create a relational database schema

• Use the normalisation process to evaluate the schema and make sure that all the relations are at least in 3NF

• Create tables in SQLite Studio and populate them with data available from the specified sources

• Write the SQL statements required for CRUD (create, read, update and delete) operations on the database you built

• Develop your knowledge further to represent data in a meaningful way using data visualisation

Expectation

Note that receiving full marks for the query tasks D.1-D.5 requires using a data visualisation technique to represent the result set. Any tool (e.g. Excel) can be used to produce the visuals. They must be meaningful and easy to understand.

For the draft, we expect a fairly complete attempt at Parts A and B and a partially completed Part C. We do not expect your answers to be 100% perfect.

The draft will attract up to marks:

ER provided

Schema provided

Database SQL provided

Data Populated for 1 large or 2 small tables

Assessment details

Part A: Understanding the Data

In this assignment, we are working with the publicly available dataset A Global Database of COVID-19 Vaccinations. Further details about this dataset are given in the article at https://www.nature.com/articles/s41562-021-01122-8. The abstract of the article is as follows.

An effective rollout of vaccinations against COVID-19 offers the most promising prospect of bringing

the pandemic to an end. We present the Our World in Data COVID-19 vaccination dataset, a global

public dataset that tracks the scale and rate of the vaccine rollout across the world. This dataset is

updated regularly and includes data on the total number of vaccinations administered, first and

second doses administered, daily vaccination rates and population-adjusted coverage for all countries

for which data are available (169 countries as of 7 April 2021). It will be maintained as the global

vaccination campaign continues to progress. This resource aids policymakers and researchers in

understanding the rate of current and potential vaccine rollout; the interactions with non-vaccination

policy responses; the potential impact of vaccinations on pandemic outcomes such as transmission,

morbidity and mortality; and global inequalities in vaccine access.

A live version of the vaccination dataset and documentation are available in a public GitHub repository at

https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations. These data can be downloaded in

CSV and JSON formats.

For the purposes of completing this assignment, we are only using the following files. You are required to review

and analyse the dataset available in these files. You will find that reviewing the rest of the files, even if not listed

below, will help you to form a better understanding about the big picture.

No. | FILE NAME | DESCRIPTION
1 | locations.csv | Country names and the type of vaccines administered. Each line represents the last observation in a specific country. Refer to README.md for the details.
2 | us_state_vaccinations.csv | History of observations for various locations in the US.
3 | vaccinations-by-age-group.csv | History of observations for vaccinations of various age groups in each country.
4 | vaccinations-by-manufacturer.csv | History of observations for various types of vaccines used in each country.
5 | vaccinations.csv | Country-by-country data on global COVID-19 vaccinations. Each line represents an observation date. Refer to README.md for the details.
6 | country_data/China.csv | Daily observations of vaccination in China.
7 | country_data/India.csv | Daily observations of vaccination in India.
8 | country_data/United States.csv | Daily observations of vaccination in US.
9 | country_data/Ireland.csv | Daily observations of vaccination in Ireland.

Table 1: List of data files

To complete the tasks in the following sections, you are required to review and analyse the dataset that is

available in the named files.

Part B: Designing the Database

Task B.1 Produce an ER diagram for a relational database that will be able to store the given dataset.

It is important to note that the given CSV files do not necessarily represent a good design for a relational database. It is your task to design a database that adheres to the good design principles that were taught. This means your database schema will not match the structure of the CSV files and, therefore, you will need to manipulate the structure of the dataset (and not the data itself) to import it into your database. Importing the data is required to complete Task C.2.

The ER diagram must be produced in Lucidchart, similar to the exercises that were completed. UML notation is expected; other notations will not be accepted. It is important to include a high-quality image of your model, which can be obtained using the Export function of Lucidchart.

You are also required to transform the ER diagram into a database schema that will be used in the next part of the

assignment.

Creating a good database design typically involves some database normalisation activities. You should document

your normalisation activities and support them with good reasoning. This typically involves explaining what the

initial design was, what the problem was, and what changes have been made to rectify the issue.
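
As one illustration of such a normalisation step, the sketch below assumes a hypothetical flat row in which a single column holds a comma-separated list of vaccine names, which violates first normal form. The table names (`Location`, `Vaccine`, `LocationVaccine`) and the sample values are invented for this example and are not a prescribed design.

```python
import sqlite3

# Hypothetical flat rows: one multi-valued "vaccines" field per location.
flat_rows = [
    ("Ireland", "Moderna, Pfizer/BioNTech"),
    ("India", "Covaxin, Covishield"),
    ("Norway", "Moderna, Pfizer/BioNTech"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Location (name TEXT PRIMARY KEY);
    CREATE TABLE Vaccine  (name TEXT PRIMARY KEY);
    -- Junction table: one row per (location, vaccine) pair.
    CREATE TABLE LocationVaccine (
        location TEXT NOT NULL REFERENCES Location(name),
        vaccine  TEXT NOT NULL REFERENCES Vaccine(name),
        PRIMARY KEY (location, vaccine)
    );
""")

for location, vaccines in flat_rows:
    conn.execute("INSERT INTO Location VALUES (?)", (location,))
    # Split the multi-valued field so each vaccine becomes its own row.
    for vaccine in (v.strip() for v in vaccines.split(",")):
        conn.execute("INSERT OR IGNORE INTO Vaccine VALUES (?)", (vaccine,))
        conn.execute(
            "INSERT INTO LocationVaccine VALUES (?, ?)", (location, vaccine)
        )

vaccine_count = conn.execute("SELECT COUNT(*) FROM Vaccine").fetchone()[0]
pair_count = conn.execute("SELECT COUNT(*) FROM LocationVaccine").fetchone()[0]
```

Each vaccine name is now stored once, and the many-to-many relationship lives in its own relation. This is the kind of before-and-after change the normalisation section of Model.pdf is expected to explain.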

The expected outcome of completing this task is one PDF file named Model.pdf containing the following sections.

1. Database ER diagram and, if needed, a reasonable set of assumptions.

2. Explanation of normalisation challenges and the resulting changes.

3. Database schema.

Part C: Creating the Database and Importing Data

Task C.1 Produce one SQL script file named Database.sql. This script file must contain all the SQL statements necessary to create all the database relations and their corresponding integrity constraints as per your proposed design in Part B. The script file must run without any errors in SQLite Studio and contain comments that separate the various relations. Note that this script is not supposed to store any data in the relations.

The expected outcome of completing this task is one script file with the specific name of Database.sql.
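
To make the idea of relations with integrity constraints concrete, here is a minimal sketch that runs a small DDL script through Python's `sqlite3` module. The two relations (`Country`, `Observation`) and their columns are hypothetical and only show how comments, keys and checks might be laid out. Your own Database.sql should follow your Part B design and be run in SQLite Studio.

```python
import sqlite3

# Hypothetical two-relation schema; all names are illustrative only.
DDL = """
-- Relation: Country
CREATE TABLE Country (
    iso_code TEXT PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);

-- Relation: Observation
CREATE TABLE Observation (
    iso_code           TEXT NOT NULL,
    observation_date   TEXT NOT NULL,
    total_vaccinations INTEGER CHECK (total_vaccinations >= 0),
    PRIMARY KEY (iso_code, observation_date),
    FOREIGN KEY (iso_code) REFERENCES Country(iso_code)
);
"""

conn = sqlite3.connect(":memory:")
# In a default SQLite build, foreign keys are not enforced until enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)

# Sample rows are inserted here only to exercise the constraints.
conn.execute("INSERT INTO Country VALUES ('IRL', 'Ireland')")
conn.execute("INSERT INTO Observation VALUES ('IRL', '2021-06-01', 100)")

# A row that references a missing country should be rejected.
try:
    conn.execute("INSERT INTO Observation VALUES ('XXX', '2021-06-01', 5)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The inserts exist only to test the constraints in this sketch. The script file itself should contain the `CREATE TABLE` statements and comments, with no data.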

Task C.2 Create a database file named Vaccinations.db and import the given dataset into your database.

To complete this task, you may need to change the format of the CSV files to match the attributes of your

designed database. You can use a spreadsheet editor such as Microsoft Excel.

The next step is to import the spreadsheets into the database you create in SQLite Studio. To complete this task, use the menu option Tools – Import in SQLite Studio.

The expected outcome of completing this task is one database file named Vaccinations.db, which must contain

all the data that is stored in the CSV files named in Table 1.
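
Changing the format of a CSV file to match your own attributes can also be scripted rather than done by hand in a spreadsheet. The sketch below is one possible approach, assuming made-up source columns and a hypothetical target table layout. It only reshapes the column structure and leaves the values untouched, and the result would still be imported through SQLite Studio.

```python
import csv
import io

# Hypothetical source file content; the column names are assumptions.
source_csv = io.StringIO(
    "location,date,vaccine,total_vaccinations\n"
    "Ireland,2021-06-01,Moderna,1000\n"
    "Ireland,2021-06-01,Pfizer/BioNTech,5000\n"
)

# Hypothetical target table columns, in the order the table declares them.
target_columns = ["country_name", "observation_date", "vaccine_name", "doses"]
rename = {
    "location": "country_name",
    "date": "observation_date",
    "vaccine": "vaccine_name",
    "total_vaccinations": "doses",
}

reshaped = io.StringIO()
writer = csv.DictWriter(reshaped, fieldnames=target_columns, lineterminator="\n")
writer.writeheader()
for row in csv.DictReader(source_csv):
    # Rename each column; the values themselves are copied unchanged.
    writer.writerow({rename[key]: value for key, value in row.items()})

reshaped_lines = reshaped.getvalue().splitlines()
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`.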

Part D: Data Retrieval and Visualisation

Now that you have created and populated a database, it is time to create some queries to investigate the data in

various ways. In addition to writing the required queries, you are also asked to produce data visualisation for the

results of your queries.

Each query must consist of one SQL statement. It is acceptable to use several nested queries, to combine several SELECT statements with various operators, and so on. However, it is not acceptable to use multiple separate queries for a task (or to use views).
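
The difference between one statement with a nested query and several separate queries can be sketched as follows. The `Observation` table and its rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Observation (country TEXT, obs_date TEXT, total INTEGER);
    INSERT INTO Observation VALUES
        ('Ireland', '2021-01-01', 10),
        ('Ireland', '2021-06-01', 40),
        ('India',   '2021-01-01', 20),
        ('India',   '2021-06-01', 90);
""")

# One statement: a nested subquery computes the overall average, and the
# outer query keeps only the rows above it. No views, no second statement.
single_statement = """
    SELECT country, obs_date, total
    FROM Observation
    WHERE total > (SELECT AVG(total) FROM Observation)
    ORDER BY total DESC
"""
rows = conn.execute(single_statement).fetchall()
```

Computing the average in a first query and pasting the number into a second one would give the same rows, but it would count as two separate queries.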

After you have written each query, you are expected to paste it into the web page, run it on SQLite_Web, and produce a data visualisation for each result set. You have the freedom to choose the tool for creating your visuals (e.g., Excel, Google Charts, Tableau) as well as the visualisation techniques (e.g., charts, plots, diagrams, maps). Completing this portion of the work requires that you understand the nature of the results of each query, research and choose a visualisation tool you are comfortable with, decide on the best technique to visually represent each result set, and produce the visualisation. Answers to tasks in Part D that are not supported by a visualisation can achieve up to 80% of the grade associated with each task.

The expected outcome of completing this task is as follows.

1. One SQL script file named Queries.sql containing all the queries developed for the tasks in this section. It

is important that you add comment lines to separate the queries and indicate which task they belong to.

Note that valid SQL comments must not generate errors in SQLite Studio. The marker of your work will

use this file to execute and test your queries.

2. A PDF file named Queries.pdf containing the following elements for each task.

a. The SQL query.

b. A snapshot of your query conducted on SQLite_Web. The snapshot must also show the total number

of results retrieved by the query. A sample snapshot is provided below for your reference.

Figure 1: Sample results snapshot with total rows

c. Data visualisation. This must be a graph or chart that presents the results in an easy-to-understand manner. Consider how to order or group the data to make it more meaningful visually.

List of Tasks

Task D.1 For any three given dates (you may choose any three dates, e.g., 1 Jan. 2021, 1 June 2021 and 1 Jan. 2022), list the dates, the total number of vaccines administered in each country on each observation date, and the percentage change between the administered totals. The countries in the resulting list should be ranked by the total change in vaccine administration. Each row in the result set must have the following structure. (Note: OD3 is after OD2, and OD2 is after OD1.) (3 marks)

Date 1 (OD1) | Country Name (CN) | Vaccine on OD1 (VOD1) | Date 2 (OD2) | Vaccine on OD2 (VOD2) | Date 3 (OD3) | Vaccine on OD3 (VOD3) | Percentage change of totals [(VOD2-VOD1)/VOD1 - (VOD3-VOD2)/VOD2]

Figure 2: Column Headers in the Result Set for Task D.1
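
As a quick worked check of the last column in Figure 2, the sketch below evaluates the bracketed expression for made-up totals. The function name and the numbers are hypothetical.

```python
def percentage_change_of_totals(vod1: float, vod2: float, vod3: float) -> float:
    """Evaluate (VOD2-VOD1)/VOD1 - (VOD3-VOD2)/VOD2 as written in Figure 2."""
    return (vod2 - vod1) / vod1 - (vod3 - vod2) / vod2


# Made-up totals on three observation dates: 100, then 150, then 300.
# (150 - 100) / 100 = 0.5 and (300 - 150) / 150 = 1.0, so the result is -0.5.
example = percentage_change_of_totals(100, 150, 300)
```

In SQLite, dividing one INTEGER by another truncates the result, so a query would typically multiply by 1.0 or CAST one operand to REAL before dividing.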

Task D.2 Find the monthly growth rate of vaccines administered by each country, and compare each country's performance to the global average. Output the growth rate and the comparison for the countries that are above the global average. Each row in the result set must have the following structure. (3 marks)

Country Name | Month | Year | Growth rate of vaccine (GR) (Cumulative Doses in this Month / Cumulative Doses in previous Month) | Difference of growth rate to global average (GR - average GR for all countries)

Figure 3: Column Headers in the Result Set for Task D.2
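
The two derived columns in Figure 3 can be illustrated with plain arithmetic. This sketch uses made-up cumulative doses for three hypothetical countries in one month, and it assumes that the global average is the unweighted mean of the per-country growth rates for that month. Other readings of "global average" are possible, so state your interpretation in your assumptions.

```python
# Made-up cumulative doses for one month and the month before it.
cumulative = {
    "A": {"previous": 100, "current": 150},
    "B": {"previous": 200, "current": 220},
    "C": {"previous": 50, "current": 100},
}

# GR = cumulative doses in this month / cumulative doses in previous month.
growth_rate = {c: v["current"] / v["previous"] for c, v in cumulative.items()}

# Assumption: the global average is the unweighted mean of all GR values.
global_average = sum(growth_rate.values()) / len(growth_rate)

# Keep only countries above the global average, with their difference to it.
above_average = {
    c: (gr, gr - global_average)
    for c, gr in growth_rate.items()
    if gr > global_average
}
```

Here the growth rates are 1.5, 1.1 and 2.0, the average is about 1.53, and only country C is above it.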

Task D.3 Produce a list of the top 5 market share percentages of each vaccine type within each country, and the countries taking each of these vaccine types. For a vaccine type that has been taken in multiple countries, the result set must report each country in a separate tuple. Each row in the result set must have the following structure. (3 marks)

Vaccine Type | Country | Percentage of vaccine type

Figure 4: Column Headers in the Result Set for Task D.3

Task D.4 There are different data sources used to produce the dataset. Output the number of vaccines

administered in each country for each month according to each data source (i.e., unique URL). Order the result set

by the monthly administered vaccines. Each row in the result set must have the structure below. (3 marks)

Country Name | Month | Source Name (URL) | Total Administered Vaccines

Figure 5: Column Headers in the Result Set for Task D.4

Task D.5 How do various countries compare in the speed of their vaccine administration?

Produce a report that lists all the observation days in 2022 and 2023 and, for each date, the increment in the total number of people fully vaccinated in each of the four countries used in this assignment compared with the previous date. (3 marks)

Dates | United States | China | Ireland | India

Figure 6: Column Headers in the Result Set for Task D.5
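
One way to express "increment compared with the previous date" inside a single statement is a correlated subquery that looks up the most recent earlier observation. The sketch below shows this for one made-up country in a hypothetical `Daily` table. It assumes dates are stored as ISO `YYYY-MM-DD` text, which sorts chronologically, and it does not build the full four-column report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Daily (country TEXT, obs_date TEXT, fully_vaccinated INTEGER);
    INSERT INTO Daily VALUES
        ('Ireland', '2022-01-01', 100),
        ('Ireland', '2022-01-02', 130),
        ('Ireland', '2022-01-04', 150);
""")

rows = conn.execute("""
    SELECT d.obs_date,
           d.fully_vaccinated - (
               -- Most recent earlier observation for the same country.
               SELECT p.fully_vaccinated
               FROM Daily AS p
               WHERE p.country = d.country AND p.obs_date < d.obs_date
               ORDER BY p.obs_date DESC
               LIMIT 1
           ) AS increment
    FROM Daily AS d
    WHERE d.country = 'Ireland'
    ORDER BY d.obs_date
""").fetchall()
```

The first observation has no predecessor, so its increment is NULL (`None` in Python). Gaps between dates are handled because the subquery picks the latest earlier row rather than the calendar day before.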

Assessment Criteria

Your report will be assessed on the following criteria:

This assessment will measure your ability to:

• Analyse the requirements outlined in the problem description

• Develop a conceptual model for the design of a database backend required for the system

• Use an industry-standard ER modelling tool to draw the ER model

• Use the 7-step mapping process to create a relational database schema

• Use the normalisation process to evaluate the schema and make sure that all the relations are at least in 3NF

• Create tables in SQLite Studio and populate them with data available from the specified sources

• Write SQL statements for CRUD (create, read, update, delete) operations on the database you built

• Develop your knowledge further to represent data in a meaningful way using data visualisation

Submission Format

You are required to submit the files with the exact names as below.

1. Model.pdf

2. Database.sql

3. Vaccinations.db

4. Queries.sql

5. Queries.pdf

In the previous sections of the assignment, the expected content of each of the files is explained in detail.
