辅导案例-CS 6400

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
Version 1.0 1 CS 6400 Fall 2020
CS 6400 Database Project:
PricePalace Discount Wholesale Data Warehouse
Fall 2020
Project Overview
The purpose of this project is to analyze, specify, design, and implement a data warehouse for
an up-and-coming club store called PricePalace Discount Wholesale. The project will proceed
in three phases as outlined in the methodology for database development: Analysis &
Specification; Design; and Implementation & Testing. The system will be implemented using a
Database Management System (DBMS) that supports standard SQL queries.
The PricePalace Data Warehouse
PricePalace Discount Wholesale is an up-and-coming wholesale club business with stores
throughout the United States. PricePalace sells all kinds of products. Your team has been
tasked with designing and building a data warehouse used by PricePalace executive team to
determine how PricePalace stores are doing and make major decisions about the future of
the company. This section describes in detail the requirements for PricePalace data
warehouse (PPDW).
A data warehouse is a database system used for reporting, analysis, and other tasks required
for decision support. Unlike transactional databases which are generally designed to record
repetitive day-to-day business transactions (e.g., point of sale, buy and sell stock orders,
online shopping carts, etc.), data warehouses are specially suited for reporting and analysis
over millions of records to support enterprise-wide decision making. As an example, a large
online merchant like amazon.com or bestbuy.com relies on a transactional (also called
operational) database system for recording customer orders and payments in real time. A
data analyst tasked with generating a report that compares sales of a certain product among
the different regions of the United States will typically query a specially-designed data
warehouse for the report instead of accessing the transactional databases directly. There are
several reasons for this: the data warehouse can store data from multiple transactional
databases in a consolidated form, the data warehouse schema is designed to support complex
queries aggregating millions of rows, and queries against the data warehouse do not impact
the performance of the transactional database which must support high transaction
throughput.
For this project, you will design the database schema for PricePalace Electronics Depot data
warehouse and attach it to a rudimentary user interface. You need not be concerned with the
transactional databases that we assume exist to support the point-of-sale system at each of
PricePalace stores. Instead, you will design the schema to support a consolidated view of the
Version 1.0 2 CS 6400 Fall 2020
products offered and sold in all PricePalace stores across the country. What follows is a
description of the requirements for the data warehouse in terms of what information must be
stored to support a set of reports defined by the PricePalace executive team.
Even though some amount of redundancy is typically acceptable in a data warehouse schema,
for this project you should create a normalized schema with as little redundancy as possible.
Version 1.0 3 CS 6400 Fall 2020
Data Requirements
PricePalace Data Warehouse (PPDW) maintains information about each store, including a
unique store number, the store’s phone number, and the store’s street address. (You do not
need to store individual components of the address, such as street number, prefix, etc., as the
addresses may be given to you unparsed.)
PPDW should also maintain information about each store’s city, including the city name, the
state in which the city is located, and the population of the city. It is possible that multiple
stores are located in the same city.
PPDW contains information about every product for sale at PricePalace stores. Products have
a numeric unique identifier (PID), similar to a UPC barcode, as well as the name of the
product. Assume that all products are available and sold at all stores—that is, there is no need
to specify that a certain product is only available at a certain store.
Each product is related to a single manufacturer. Each manufacturer has a name, which can
be assumed to be unique. It is possible that multiple products are made by the same
manufacturer.
To help identify the kinds of products that are popular, each product is assigned one or more
categories. Each category has a name, which we assume to be unique. Every product must be
in at least one category.
Every product has a retail price. The retail price is in effect unless there is a sale. PPDW
maintains the sale date and sale price of any product that goes on sale. If a product is on sale
for multiple days in a row, then a record is stored in the data warehouse for each day of the
sale. It is possible that the same product goes on sale multiple times (i.e., different days) with
different sale prices. If a product goes on sale, it is on sale at the same price in all stores—
i.e., stores are not allowed to hold sales independently or have store-specific sale prices.
The data warehouse should disallow sale prices that are higher than retail prices. Some
manufacturers put a cap on the maximum discount that any retailer can apply to any of the
manufacturer’s products in terms of a percentage. For example, if a manufacturer has a
maximum discount of 20%, then no product can be placed on sale for less than 80% of the
retail price. A maximum discount of 0% means the product cannot be placed on sale. Even if a
maximum discount is not specified by the manufacturer, as a general rule of PricePalace, no
product can be discounted more than 90% of retail. Be aware, however, that as with any
retail store, some pricing errors may and will occur for a variety of reasons, and any sales data
with such errors should not be corrected.
The PricePalace executive team would like the ability to compare sales data on holidays
versus non-holidays, so PPDW should maintain information about which specific dates are
holidays. The specific name of the holiday is also required.
Version 1.0 4 CS 6400 Fall 2020
PPDW stores information about which products are sold, including the store where it is sold,
the date of the sale, and the quantity of the product purchased. The price of the sale is not
stored explicitly, but can be derived based on the date purchased and the quantity. Assume
there is no sales tax. Also, the data warehouse is not required to store which products were
purchased together during a single sales transaction.
Since PricePalace is a club warehouse, they sell memberships. For each new membership, the
member ID, which will be unique, the signup date, the signup store, and the membership type
are tracked. Signups only occur in stores, there is no other way for memberships to be
bought such as online or by mail. PricePalace currently offers two types of membership:
“Yellow Jacket” or “Giant Hornet”. You do not need to worry about storing information about
existing memberships or the renewal of memberships, just new ones.
PricePalace’s DBAs are working on an extract of sample data from their point-of-sale system
for you to test in your data warehouse, however, to avoid revealing confidential information,
PricePalace’s data security team has directed them to use data from almost twenty years ago
with only certain categories of data, and refuse to allow newer data to be used. Retrieving the
data from tape backup and sanitizing it will take at least two to three months before it can be
made available to you. You will need to ensure that your schema design matches the data as
described here so that any transformation prior to loading is kept to a minimum.
Version 1.0 5 CS 6400 Fall 2020
PricePalace Data Warehouse User Interface
All of your reports will be accessible from a “dashboard” UI that must be developed. Since
this is the first version of the system, you do not need to concern yourself with configuring
usernames or passwords to control access to the system, as PricePalace’s data security team
will handle that for you in the future.
There should be a main menu screen which can be used to access all functionality of the
system that has been described in this specification. On this main menu, the following
statistics should be displayed along with any buttons/links to reports or functionalities: the
count of stores, manufacturers, products, and memberships sold in the data warehouse.
In addition to the reports, there are some relatively simple interfaces you should design and
provide as part of maintaining the data warehouse. First, you must provide an interface for
holidays to be maintained by the user. This interface must allow for viewing and adding
holiday information directly within the user interface. Second, your UI must allow for
updating the population of any cities in the data warehouse, should a city’s population
change.
PricePalace Data Warehouse Reports
PricePalace management has put your team in charge of developing the queries necessary to
produce the following reports. Many of the reports have derived and/or aggregate data. As
mentioned previously, these reports will be accessed with the user interface that you will
create.
Some of the report queries are expensive to run given the large number of rows in the
PricePalace Data Warehouse. Therefore, whenever possible you should include the filter
conditions specified. For example, some reports ask for data from only a certain time period.
If you leave off this filtering condition, the query will likely take a long time to return any
results.
Report 1 – Manufacturer’s Product Report
For each manufacturer, return the manufacturer’s name, total number of products offered by
the manufacturer, average retail price of all the manufacturer’s products, minimum retail
price, and maximum retail price. Ignore all sale days (do not take into account the days the
product is discounted). Sort the results by average price with the highest average price
appearing first, for only the top 100 manufacturers based on average price.
This report should also have “drill-down” detail (in other words, each line in the parent report
should have a method for loading its detail, such as a hyperlink on the manufacturer name or
a button) for the manufacturer, which shows in the report header the manufacturer’s details
(name and maximum discount), the summary information from the parent report, and lists
for each of the manufacturer’s products’ its product ID, name, category (or categories), and
Version 1.0 6 CS 6400 Fall 2020
price, ordered by price descending (high to low). If a product has multiple categories it must
not show up as multiple rows on the report, but as a single row with multiple categories
concatenated together.
Report 2 – Category Report
For each category, return the category name, total number of products in that category, total
number of unique manufacturers offering products in that category, and the average retail
price (not including sale days) of all the products in that category, sorted by category name
ascending.
Report 3 – Actual versus Predicted Revenue for GPS units
PricePalace executives want to predict whether offering items at a discount actually helps to
increase revenue by encouraging a higher volume of sales. This report compares how much
revenue was actually generated from a product’s sales to a predicted revenue if the product
were never offered on sale. After speaking with some marketing consultants, PricePalace
executives have learned that product discounts introduce on average a 25% increase in
volume (quantity sold). Therefore we assume that if an item that was offered at a discount
were instead offered at the retail price, the quantity of items sold would be reduced by 25%.
However, it is still possible that the predicted revenue would be higher since the reduced
volume of products would be sold at a higher price per product. Initially, the executives are
only interested in seeing the report for products in the GPS category.
Here is a simple example:
Assume that Product Z has a retail price of $10. Assume that it was offered at a discount for
on 6/1/2012 and 6/2/2012. Also assume the following transaction data for Product Z:
Date Price Quantity Actual Revenue
5/1/2012 10.00 5 50.00
6/1/2012 8.00 10 80.00
6/2/2012 7.00 5 35.00
TOTALS 20 $165.00
Table 1 - Actual Revenue
The predicted revenue is calculated by assuming that the product is never offered at a
discount and only 75% of the original quantity was actually sold on discounted days. Note that
because this is just a predicted average, we assume that it is possible to sell a fraction of a
product (e.g., 7.5 DVD players).
Version 1.0 7 CS 6400 Fall 2020
Date Price Quantity Predicted
Revenue
5/1/2012 10.00 5 50.00
6/1/2012 8.00 10.00 10 * .75 = 7.5 75.00
6/2/2012 7.00 10.00 5 * .75 = 3.75 37.50
TOTALS 16.25 $162.50
Table 2 - Predicted Revenue
In this example, the discounted prices resulted in slightly more revenue due to the higher
volume of sales ($2.50 more).
Generate the following report: For each product in the GPS category, return the product ID,
the name of the product, the product’s retail price, the total number of units ever sold, the
total number of units sold at a discount (i.e., during a sale day, the total number of units sold
at retail price, the actual revenue collected from all the sales of the product, the predicted
revenue had the product never been put on sale (based on 75% volume selling at retail price),
and the difference between the actual revenue and the predicted revenue. If the difference is
a positive number, it means that the discounts worked in favor of PricePalace because the
predicted revenue is less than the actual revenue collected. If it is a negative number, it
indicates that PricePalace would have been better off not offering the product discounts.
Only predicted revenue differences greater than $5000 (positive or negative) should be
displayed and sorted in descending order.
Report 4 –Store Revenue by Year by State
This report shows the revenue collected by stores per state grouped by year. The states
available for querying should be presented in a drop down box. For example, the user would
select “New York” and the system would show each store in New York state, show the store
ID, store address, city name, sales year, and total revenue. Be sure the revenue calculation
takes into account items that were sold at a discount. Sort the report first by year in
ascending order and then by revenue in descending order.
Report 5 – Air Conditioners on Groundhog Day?
Some of the sales staff have noticed that air conditioner sales seem to spike on Groundhog
Day (which falls on February 2 each year). They surmise that this is because customers begin
thinking about the warm spring weather ahead. The PricePalace marketing team would like
to know for sure if this is the case, so they have requested the following report.
For each year, return the year, the total number of items sold that year in the air conditioning
category, the average number of units sold per day (assume a year is exactly 365 days), and
the total number of units sold on Groundhog Day (February 2) of that year. Sort the report on
Version 1.0 8 CS 6400 Fall 2020
the year in ascending order. The marketing team will use the report to determine if the total
number of units sold on Groundhog Day each year is significantly higher than the average
number of units sold per day.
Report 6 – State with Highest Volume for each Category
PricePalace management is planning to recognize all stores in the states that sell the greatest
number of units for each category. They want to view this monthly, so the user interface
must allow choosing a year and month from the available dates in the database before
running the report. The report will return for each category: the category name, the state
that sold the highest number of units in that category (i.e., include items sold by all stores in
the state), and the number of units that were sold by stores in that state. This output shall be
sorted by category name ascending. Note that each category will only be listed once unless
two or more states tied for selling the highest number of units in that category. The report
can take a significant time to run, which may require tuned indices for the final
implementation, but do not focus on their creation until the final phase.
Report 7 – Revenue by Population
To help forecast expansions into other cities, PricePalace management would like to see what
the average revenue is for specific population categories, and to see if there is a trend for
growth, the revenue should be broken down on an annual basis. The categories for city size
are: Small (population <3,700,000), Medium (population >=3,700,000 and <6,700,000), Large
(population >=6,700,000 and <9,000,000) and Extra Large (population >=9,000,000). There is
some flexibility in formatting this report, in that it could be “pivoted” to present it with either
years or city category as columns or as rows, so ensure that both elements are arranged in
ascending order (oldest to newest for years, smallest to largest for city size) so that no matter
how it is formatted it is properly organized and understandable.
Report 8 – Membership Trends
PricePalace’s management team wishes to track the number of new signups per city, as this
can help them determine if more stores are needed in an area or if stores should be closed
and consolidated.
This report will have three parts. First, the initial report will show each year, in descending
order (most recent to least recent) and the total number of memberships sold for that year.
The second part is a drill-down (in other words, each line in the parent report should have a
method for loading its detail, such as a hyperlink on the total or a button) for that year
showing the top 25 cities that sold the most memberships, in descending order, and another
section showing the bottom 25 cities that sold the least memberships for that year, in
ascending order. Cities which have sold 250 or more memberships should have their total
highlighted with a red background. Cities which have sold 30 or fewer memberships should
have their total highlighted with a yellow background. Since city names can be used in other
Version 1.0 9 CS 6400 Fall 2020
states, be sure to include the state the city is in. The header of this report should include the
selected year.
The last part is a drill-down (again, each line in the parent report should have a method for
loading its detail, such as a hyperlink on the total or a button) by city to the store level, only if
that city has more than one store. The store number, street address, and city should be
displayed along with the total number of memberships sold during that year. The header of
the report should include the selected year.

Version 1.0 10 CS 6400 Fall 2020
Revision History
Version Notes Date
1.0 New version for Fall 2020 8/31/20



欢迎咨询51作业君
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468