Assignment 1: Getting to know the data
Due Sep 2 by 3am
Points 100
Submitting a text entry box
PREREQUISITES: Review the Requirements and Data Engineering lectures, and the pandas and
matplotlib package tutorials. SecureBank has provided you with three data sources from various gatekeepers:
1. customer_release.csv
2. transactions_release.parquet
3. fraud_release.json (the dictionary structure will have `trans_num` as the key/attribute and the
corresponding value as `is_fraud`)
Place these files in a directory called securebank/data_sources/ (e.g.,
securebank/data_sources/customer_release.csv). NOTE: Do NOT check these files into GitHub! You can
add these files to a .gitignore file to keep them out of the repository.
OBJECTIVES: Familiarize yourself with the SecureBank dataset provided. In a Jupyter (IPython)
notebook titled `securebank/analysis/data_analysis.ipynb`, complete the following tasks. You are
permitted to use any Python packages you see fit to accomplish the homework:
Task 1: Write a function called __merge() which takes the arguments customer_filename: str,
transaction_filename: str, fraud_filename: str. This function should return a pandas DataFrame that
merges the three data sources, indexed by `trans_num` and sorted by `trans_date_trans_time`.
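As a starting point, here is a minimal sketch of how such a merge could be structured. It is not a reference solution, and it rests on several assumptions that are not stated in this assignment: that the fraud JSON is a flat `{trans_num: is_fraud}` mapping, that `trans_num` is an ordinary column in the transactions file, and that customers and transactions share a key column (called `cc_num` here purely as a placeholder). The helper `merge_frames` is a hypothetical name, introduced so the join logic can be checked on small in-memory frames. Reading parquet with pandas also requires an engine such as pyarrow to be installed.

```python
import json

import pandas as pd


def merge_frames(customers: pd.DataFrame, transactions: pd.DataFrame,
                 fraud: dict, customer_key: str = "cc_num") -> pd.DataFrame:
    """Join already-loaded sources. `customer_key` is an ASSUMED shared column."""
    # Assumption: `fraud` is a flat {trans_num: is_fraud} mapping.
    fraud_df = pd.DataFrame({"trans_num": list(fraud.keys()),
                             "is_fraud": list(fraud.values())})
    # Left joins keep every transaction, even without a label or customer row.
    merged = transactions.merge(fraud_df, on="trans_num", how="left")
    merged = merged.merge(customers, on=customer_key, how="left")
    # Parse timestamps so the sort is chronological, not lexicographic.
    merged["trans_date_trans_time"] = pd.to_datetime(merged["trans_date_trans_time"])
    return merged.set_index("trans_num").sort_values("trans_date_trans_time")


def __merge(customer_filename: str, transaction_filename: str,
            fraud_filename: str) -> pd.DataFrame:
    """Load the three files and delegate to merge_frames."""
    customers = pd.read_csv(customer_filename)
    transactions = pd.read_parquet(transaction_filename)
    with open(fraud_filename) as f:
        fraud = json.load(f)
    return merge_frames(customers, transactions, fraud)
```

Before relying on a sketch like this, inspect each file's columns (for example with `df.columns`) to confirm the actual join key, and compare row counts before and after each join to catch accidental duplication.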
Task 2: Analyze the data provided, and provide evidence using tables and graphs to corroborate
(or disprove) the analysts' insights (FOUR total). Furthermore, provide TWO other insights you
uncover through your initial analysis. Provide an argument for the results and use graphs, tables,
etc., to support them.
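One way to produce the kind of table that supports or refutes an insight is to compare fraud rates across groups. The sketch below assumes the merged DataFrame has an `is_fraud` column with 0/1 values. The function name `fraud_rate_by` is hypothetical, and the analysts' insights are not listed here, so the grouping you pass in is up to you.

```python
import pandas as pd


def fraud_rate_by(df: pd.DataFrame, key: pd.Series) -> pd.DataFrame:
    """Count transactions and compute the share flagged as fraud per group."""
    grouped = df.groupby(key)["is_fraud"]
    return pd.DataFrame({
        "n_transactions": grouped.size(),  # group size, for context
        "fraud_rate": grouped.mean(),      # mean of a 0/1 column is the rate
    })
```

For example, `fraud_rate_by(df, df["trans_date_trans_time"].dt.hour.rename("hour"))` gives one row per hour of day, assuming the timestamp column has a datetime dtype. Calling `.plot.bar()` on the `fraud_rate` column of the result then gives a matching graph. Reporting the group sizes next to the rates helps show whether a difference rests on enough transactions to be meaningful.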
SUBMISSION: You will need to check your `securebank/analysis/data_analysis.ipynb` notebook into
your provisioned GitHub repository, and provide the GitHub URL of this notebook via Canvas to get
credit for this submission. Use proper markdown headings to delineate these tasks and subtasks.
2024/9/19 20:43Assignment 1: Getting to know the data
https://jhu.instructure.com/courses/82966/assignments/872354?return_to=https%3A%2F%2Fjhu.instructure.com%2Fcalendar%23view_name%3Dmo...1/2