Diabetes 130-US hospitals for years 1999-2008

Data

The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria.
● It is an inpatient encounter (a hospital admission).
● It is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis.
● The length of stay was at least 1 day and at most 14 days.
● Laboratory tests were performed during the encounter.
● Medications were administered during the encounter.

This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes.

The dataset is in .csv format and has 100000 instances. An instance corresponds to one admission of a patient. You have to predict whether a patient is readmitted in ‘less than 30 days’, readmitted in ‘more than 30 days’, or not readmitted at all. The dataset has 50 features.

Feature name | Type | Description and values
--- | --- | ---
Encounter ID | Numeric | Unique identifier of an encounter
Patient number | Numeric | Unique identifier of a patient
Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other
Gender | Nominal | Values: male, female, and unknown/invalid
Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100)
Weight | Numeric | Weight in pounds
Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available
Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available
Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital
Time in hospital | Numeric | Integer number of days between admission and discharge
Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay
Medical specialty | Nominal | Integer identifier of the specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon
Number of lab procedures | Numeric | Number of lab tests performed during the encounter
Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter
Number of medications | Numeric | Number of distinct generic names administered during the encounter
Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter
Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter
Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter
Diagnosis 1 | Nominal | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values
Diagnosis 2 | Nominal | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values
Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values
Number of diagnoses | Numeric | Number of diagnoses entered into the system
Glucose serum test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>200”, “>300”, “normal”, and “none” if not measured
A1c test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured
Change of medications | Nominal | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change”
Diabetes medications | Nominal | Indicates if there was any diabetic medication prescribed. Values: “yes” and “no”
24 features for medications | Nominal | For the generic names: metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, sitagliptin, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: “up” if the dosage was increased during the encounter, “down” if the dosage was decreased, “steady” if the dosage did not change, and “no” if the drug was not prescribed
Readmitted | Nominal | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission

Instructions:
● You should create a model to predict the target class using the features given.
● You are free to use any methods or techniques to analyze the data, train models, and evaluate the results, but you may only use the Python standard library, SciPy, and pandas. No other third-party libraries are allowed.
● You must deliver your project in the form of a Jupyter notebook.
● Jupyter notebooks are all about telling a story using the data.
So build your notebook that way: present everything clearly and give it a good flow.
● We will not mark you on the final accuracy you get. We will mark you on how well you explain the decisions you make at every stage of your project, the reasoning behind those decisions, and your choice of data representation, evaluation, and analysis methods. All of this must be shown within the notebook itself.
● Make sure you attend labs and complete assignments; that will help you a lot.
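
As one possible starting point, the sketch below shows how the three-class target described above could be encoded and how class balance and missing values could be summarized using only the standard library. The column names (`encounter_id`, `patient_nbr`, `age`, `time_in_hospital`, `readmitted`), the `?` missing-value marker, the integer class codes, and the four sample rows are all assumptions made for illustration; check them against the actual .csv file. In the notebook itself, pandas would likely be the more convenient tool for the same steps.

```python
import csv
import io
from collections import Counter

# Hypothetical four-row sample standing in for the real .csv file.
# Column names and the "?" marker are assumptions, not taken from the brief.
SAMPLE_CSV = '''encounter_id,patient_nbr,age,time_in_hospital,readmitted
1,100,"[50, 60)",3,<30
2,101,"[70, 80)",5,No
3,100,"[50, 60)",2,>30
4,102,?,7,No
'''

# Arbitrary integer codes for the three target labels listed in the brief.
TARGET_CODES = {"No": 0, ">30": 1, "<30": 2}
MISSING = "?"  # assumed missing-value marker


def load_rows(handle):
    """Read rows, replace the missing marker with None, add a 'target' code."""
    rows = []
    for row in csv.DictReader(handle):
        cleaned = {k: (None if v == MISSING else v) for k, v in row.items()}
        # Raises KeyError if an unexpected label appears in the target column.
        cleaned["target"] = TARGET_CODES[row["readmitted"]]
        rows.append(cleaned)
    return rows


def class_shares(rows):
    """Fraction of rows in each target class."""
    counts = Counter(r["target"] for r in rows)
    return {label: counts[label] / len(rows) for label in sorted(counts)}


def missing_shares(rows):
    """Fraction of missing values per column."""
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in rows[0]}


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(class_shares(rows))
print(missing_shares(rows))
```

Looking at class shares and missing-value shares early gives the notebook concrete evidence to cite when justifying later decisions, such as which features to drop or which evaluation measure suits an imbalanced three-class target.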