
Homework 2


Instructions

CS-GY 6513 Big Data Fall 2024



Due Date: Friday, October 25, 11:59PM

SUBMIT YOUR SOLUTION AS A JUPYTER NOTEBOOK.

Name the file with your netid, e.g. jcr365-hw2.ipynb

If I cannot run your notebook, you will not get full credit.

**** Give attribution to any code you use that is not your original code ****

Instructions

Refer to the notebook HW2.ipynb and the data folder in the course website.

*** ALL DATASETS ARE AVAILABLE IN THE JUPYTERHUB SHARED FOLDER

 

1. 25 points    Data: shared/data/Bakery.csv

Show the highest selling item for Mondays, per hour, for the 7AM to 11AM hours.

Note that “weekday”, “period” have to be computed.

For example (these are made up numbers....)

  Item, qty, weekday, Date, Hour-period

   Bread, 102, Monday, 2016-10-31, 7AM

   Coffee, 132, Monday, 2016-10-31, 8AM

   :
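
As a rough illustration of the computed columns, the sketch below derives the weekday and the hour period from a date and a time string, then keeps the best-selling item per Monday hour. It is plain Python on made-up rows, not a PySpark solution. The (date, time, item) layout, treating 7AM to 11AM as hours 7 through 11 inclusive, and grouping per date are all assumptions.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Made-up rows; the (date, time, item) layout is an assumption about Bakery.csv.
rows = [
    ("2016-10-31", "07:15:00", "Bread"),
    ("2016-10-31", "07:40:00", "Bread"),
    ("2016-10-31", "07:55:00", "Coffee"),
    ("2016-10-31", "08:05:00", "Coffee"),
    ("2016-11-01", "08:10:00", "Tea"),  # a Tuesday, so it is filtered out
]

def weekday_and_period(date_str, time_str):
    """Return the weekday name, the 24-hour value, and a label such as '7AM'."""
    ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    hour12 = ts.hour % 12 or 12
    suffix = "AM" if ts.hour < 12 else "PM"
    return DAYS[ts.weekday()], ts.hour, f"{hour12}{suffix}"

# Count sales per (date, period, item) for Monday mornings only.
counts = Counter()
for date_str, time_str, item in rows:
    weekday, hour, period = weekday_and_period(date_str, time_str)
    if weekday == "Monday" and 7 <= hour <= 11:  # assumed inclusive range
        counts[(date_str, period, item)] += 1

# Keep the item with the highest count for each (date, period) pair.
best = {}
for (date_str, period, item), qty in counts.items():
    key = (date_str, period)
    if key not in best or qty > best[key][1]:
        best[key] = (item, qty)
```

The same derivation can be expressed with DataFrame column functions once the logic is clear.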

 

2. 25 points   Data: shared/data/Bakery.csv

Show the top 2 (by qty) items bought by Daypart, by DayType.

Note:

Daypart = Breakfast if 6AM – 10:59AM, Lunch if 11:00AM – 3:59PM, Dinner otherwise

DayType = Weekend if Sat, Sun, Weekday otherwise

For example (not necessarily the right numbers....)

   Weekend, Breakfast, (coffee, Muffin)

   Weekend, Lunch, (cookies, pastry)

   :

** The Answer MUST include the 2 items in a single column
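
The two derived labels and the single-column answer can be sketched as follows. This is plain Python on made-up sales, meant only to show the bucketing and the top-2 step. It assumes each row carries one timestamp and one item, counts one unit per row, and treats 11:00AM as the start of Lunch.

```python
from collections import Counter, defaultdict
from datetime import datetime

def daypart(hour):
    """Bucket a 24-hour value; 11:00 is assumed to belong to Lunch."""
    if 6 <= hour < 11:
        return "Breakfast"
    if 11 <= hour < 16:
        return "Lunch"
    return "Dinner"

def daytype(ts):
    """Saturday (5) and Sunday (6) count as Weekend."""
    return "Weekend" if ts.weekday() >= 5 else "Weekday"

# Made-up rows: (timestamp, item). The layout is an assumption.
sales = [
    ("2016-11-05 09:10:00", "Coffee"),   # Saturday
    ("2016-11-05 09:20:00", "Coffee"),
    ("2016-11-05 09:30:00", "Coffee"),
    ("2016-11-05 09:40:00", "Muffin"),
    ("2016-11-05 09:45:00", "Muffin"),
    ("2016-11-05 09:50:00", "Tea"),
    ("2016-11-07 12:00:00", "Sandwich"),  # Monday
    ("2016-11-07 12:30:00", "Sandwich"),
    ("2016-11-07 13:00:00", "Soup"),
]

# One Counter of items per (DayType, Daypart) group.
counts = defaultdict(Counter)
for stamp, item in sales:
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    counts[(daytype(ts), daypart(ts.hour))][item] += 1

# Collapse the two best sellers of each group into a single tuple value.
top2 = {
    group: tuple(item for item, _ in items.most_common(2))
    for group, items in counts.items()
}
```

Packing the pair into one tuple mirrors the requirement that both items appear in a single column.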

 

3. 20 Points    Data: shared/data/Restaurants_in_Durham_County_NC.json

Show the number of entities by “fields.rpt_area_desc”

Example (not true numbers):

  “Food Service”,  13

  “Tatoo Establishment”,  2

    :
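
The dotted name suggests that rpt_area_desc sits inside a nested "fields" object. Assuming the file is a JSON array of such records (an assumption about its layout), counting by that nested value could look like the plain-Python sketch below, which uses made-up records.

```python
import json
from collections import Counter

# Made-up records; the {"fields": {...}} nesting is an assumption.
raw = """
[
  {"fields": {"rpt_area_desc": "Food Service", "status": "ACTIVE"}},
  {"fields": {"rpt_area_desc": "Food Service", "status": "INACTIVE"}},
  {"fields": {"rpt_area_desc": "Other"}},
  {"fields": {}}
]
"""

records = json.loads(raw)

# Records without the field are counted under None rather than dropped.
counts = Counter(
    record.get("fields", {}).get("rpt_area_desc") for record in records
)
```

Whether records lacking the field should be reported or dropped is a choice the assignment leaves open.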

4. 20 Points.   Data:  shared/data/populationbycountry19802010millions.csv

10/19/24, 10:51 PMHomework 2 - CS-GY 6513 Big Data Fall 2024 - NYU

https://brightspace.nyu.edu/d2l/lms/dropbox/user/folder_submit_files.d2l?ou=404094&db=9194071/3


Show the country or region with the biggest percentage increase in population AND

the country with the biggest percentage decrease in population, between the years

1990 and 2000. Use only the countries, not ‘World’.

Example (Not the real answer):

North America,  2.30%     <- assuming North America was max

Aruba, -22.2%...                  <- assuming Aruba was min
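
The percentage change here is (pop2000 - pop1990) / pop1990 * 100. A minimal plain-Python sketch on made-up rows follows. The row layout and the names Alpha, Beta, and Gamma are hypothetical, and skipping rows with a missing or zero 1990 value is an assumption about how to handle gaps in the data.

```python
# Made-up rows; the key names are an assumption about the CSV columns.
rows = [
    {"country": "World", "1990": 5000.0, "2000": 6000.0},
    {"country": "Alpha", "1990": 10.0, "2000": 12.5},
    {"country": "Beta", "1990": 8.0, "2000": 6.0},
    {"country": "Gamma", "1990": None, "2000": 3.0},  # missing base year
]

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

changes = {}
for row in rows:
    if row["country"] == "World":
        continue  # the assignment excludes the World aggregate
    old, new = row["1990"], row["2000"]
    if old is None or new is None or old == 0:
        continue  # assumed handling: skip rows that cannot be compared
    changes[row["country"]] = pct_change(old, new)

biggest_increase = max(changes.items(), key=lambda kv: kv[1])
biggest_decrease = min(changes.items(), key=lambda kv: kv[1])
```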

5. 20 Points   Data: hw1text (from HW1).

Do the WordCount exercise using pyspark.

Ignore punctuation and normalize to lower case.

i.e. replace characters NOT in this set: [0-9a-z] with a space.

HINT: You can use the sparkml package.
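
The normalization rule can be checked on a small string before moving it to pyspark. The sketch below is plain Python, not a pyspark solution. It lower-cases first, so that uppercase letters are kept rather than replaced, and the helper name word_count is made up for illustration.

```python
import re
from collections import Counter

def word_count(text):
    """Lower-case, replace anything outside [0-9a-z] with a space, then count."""
    cleaned = re.sub(r"[^0-9a-z]", " ", text.lower())
    return Counter(cleaned.split())

sample = "Hello, world! Hello... it's 2024."
counts = word_count(sample)
```

Note that an apostrophe becomes a space under this rule, so "it's" splits into "it" and "s".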

6. 20 Points   Data: hw1text (from HW1)

Find the 10 most common bigrams

HINT: You can use the sparkml package.
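
A bigram is a pair of adjacent words. The plain-Python sketch below pairs each word with its successor and counts the pairs. It reuses the [0-9a-z] normalization from the word count question and lets bigrams span line and sentence boundaries, which is an assumption. The helper name top_bigrams is made up for illustration.

```python
import re
from collections import Counter

def top_bigrams(text, n=10):
    """Return the n most common adjacent word pairs with their counts."""
    words = re.sub(r"[^0-9a-z]", " ", text.lower()).split()
    # zip pairs word i with word i+1, producing len(words) - 1 bigrams.
    return Counter(zip(words, words[1:])).most_common(n)

sample = "The cat sat on the cat mat. The cat!"
most_common = top_bigrams(sample, n=1)
```

In pyspark, the NGram transformer in pyspark.ml.feature with n=2 produces the same kind of pairs from a tokenized column.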

 

 

7. Extra credit – 40 points

Data:

durham-nc-foreclosure-2006-2016.json

Restaurants_in_Durham_County_NC.json

  

a. Find the active food service restaurant (“status” = “ACTIVE” and

“rpt_area_desc” = “Food Service”) closest to the coordinate

35.994914, -78.897133, and show it.

b. With the restaurant from (a) as your center point, find the number of foreclosures

within a 1-mile radius.

You can use an external library for calculating coordinate distances.

The haversine library is available in Jupyterhub’s bigdata environment.
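
To show what the distance step involves, here is a minimal sketch that writes out the haversine formula with the standard math module instead of calling the haversine library. The restaurant names and every coordinate other than the target point are made up, and the Earth radius of 3958.8 miles is an assumed mean value.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

target = (35.994914, -78.897133)  # coordinate given in the assignment

# Made-up stand-ins for the filtered restaurants and the foreclosure points.
restaurants = {
    "Cafe A": (35.9950, -78.8970),
    "Cafe B": (36.0100, -78.9000),
}
foreclosures = [
    (35.9960, -78.8980),
    (35.9990, -78.9000),
    (36.0500, -78.9500),
]

# Part (a): the restaurant with the smallest distance to the target.
nearest = min(restaurants, key=lambda name: haversine_miles(target, restaurants[name]))

# Part (b): foreclosures within 1 mile of that restaurant.
center = restaurants[nearest]
within_one_mile = sum(
    1 for point in foreclosures if haversine_miles(center, point) <= 1.0
)
```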
