BCPM0090 - Assignment 1

 2022/9/1 15:27 Pass - Jupyter Notebook BCPM0090 - Assignment 

In this assessment, I have chosen Dataset 1 - Airlines. This dataset is a flat-file database, with all the relevant information in a single table. The table consists of 24 columns and 129881 rows (including the title row and the index column). The purpose of this dataset is to record customer satisfaction with a particular airline. 129880 customers were invited to take a survey. The dataset records their gender, age, loyalty type, flight class, travel type, flight distance, and departure and arrival delays in minutes. The customers were then asked to rate various on-board services from 1 to 5, including food, leg room and inflight entertainment.

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
from sklearn import preprocessing

localhost:8888/notebooks/Desktop/Pass.ipynb

In [ ]:
le = preprocessing.LabelEncoder()
warnings.filterwarnings("ignore")

# read the dataset
train_data = pd.read_csv('/content/Exam1_Airline.csv')
train_data.info()  # check if there are null data

# the three continuous features are each divided into 10 buckets
train_data['flight_distance'] = pd.cut(train_data['flight_distance'], bins=10, right=True)
train_data['departure_delay_in_minutes'] = pd.cut(train_data['departure_delay_in_minutes'], bins=10, right=True)
train_data['arrival_delay_in_minutes'] = pd.cut(train_data['arrival_delay_in_minutes'], bins=10, right=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                             Non-Null Count   Dtype
---  ------                             --------------   -----
 0   Unnamed: 0                         129880 non-null  int64
 1   Gender                             129880 non-null  object
 2   customer_type                      129880 non-null  object
 3   age                                129880 non-null  int64
 4   type_of_travel                     129880 non-null  object
 5   customer_class                     129880 non-null  object
 6   flight_distance                    129880 non-null  int64
 7   inflight_wifi_service              129880 non-null  int64
 8   departure_arrival_time_convenient  129880 non-null  int64
 9   ease_of_online_booking             129880 non-null  int64
 10  gate_location                      129880 non-null  int64
 11  food_and_drink                     129880 non-null  int64
 12  online_boarding                    129880 non-null  int64
 13  seat_comfort                       129880 non-null  int64
 14  inflight_entertainment             129880 non-null  int64
 15  onboard_service                    129880 non-null  int64
 16  leg_room_service                   129880 non-null  int64
 17  baggage_handling                   129880 non-null  int64
 18  checkin_service                    129880 non-null  int64
 19  inflight_service                   129880 non-null  int64
 20  cleanliness                        129880 non-null  int64
 21  departure_delay_in_minutes         129880 non-null  int64
 22  arrival_delay_in_minutes           129487 non-null  float64
 23  satisfaction                       129880 non-null  object
dtypes: float64(1), int64(18), object(5)
memory usage: 23.8+ MB

According to the output of train_data.info(), this dataset contains three data types: 64-bit integer (int64), object and 64-bit float (float64). Only arrival_delay_in_minutes has missing values (129487 non-null entries out of 129880).

In [ ]:
# First, count the customers who feel satisfied and those who feel neutral or dissatisfied
label_gp = train_data.groupby('satisfaction').count()
print('number of samples:\n', label_gp)
_, axe = plt.subplots(1, 2, figsize=(12, 6))
train_data.satisfaction.value_counts().plot(
    kind='pie', autopct='%1.1f%%', shadow=True, explode=[0, 0.1], ax=axe[0])
sns.countplot(x='satisfaction', data=train_data, ax=axe[1])

number of samples:
                          Unnamed: 0  ...  arrival_delay_in_minutes
satisfaction                          ...
neutral or dissatisfied        73452  ...                     73225
satisfied                      56428  ...                     56262

[2 rows x 23 columns]
Out[14]: <matplotlib.axes._subplots.AxesSubplot at 0x7f342be43890>

From the pie chart, we can see that only 43.4% of passengers feel satisfied with this airline. This is not a pleasant result, because over half of the customers involved in this survey were not fully satisfied. In a service-oriented industry, low customer satisfaction can lead to future business failure, so the company has to improve its quality of service in order to raise customer satisfaction.

In [ ]:
# Count the number of loyal customers and disloyal customers
label_gp = train_data.groupby('customer_type').count()
print('number of samples:\n', label_gp)
_, axe = plt.subplots(1, 2, figsize=(12, 6))
train_data.customer_type.value_counts().plot(
    kind='pie', autopct='%1.1f%%', shadow=True, explode=[0, 0.1], ax=axe[0])
sns.countplot(x='customer_type', data=train_data, ax=axe[1])

number of samples:
                    Unnamed: 0  Gender  ...  arrival_delay_in_minutes  satisfaction
customer_type                           ...
Loyal Customer          106100  106100  ...                    105773        106100
disloyal Customer        23780   23780  ...                     23714         23780

[2 rows x 23 columns]
Out[15]: <matplotlib.axes._subplots.AxesSubplot at 0x7f342bd47b90>

From the pie chart, we can observe that over 80% of respondents are loyal customers, and the number of disloyal customers is significantly lower than the number of loyal ones. This is good news for the company because its user stickiness is high.

In [ ]:
label_gp = train_data.groupby('customer_class').count()
print('number of samples:\n', label_gp)
_, axe = plt.subplots(1, 2, figsize=(12, 6))
train_data.customer_class.value_counts().plot(
    kind='pie', autopct='%1.1f%%', shadow=True, explode=[0, 0.1, 0.2], ax=axe[0])
sns.countplot(x='customer_class', data=train_data, ax=axe[1])

number of samples:
                 Unnamed: 0  Gender  ...  arrival_delay_in_minutes  satisfaction
customer_class                       ...
Business              62160   62160  ...                     61990         62160
Eco                   58309   58309  ...                     58117         58309
Eco Plus               9411    9411  ...                      9380          9411

[3 rows x 23 columns]
Out[22]: <matplotlib.axes._subplots.AxesSubplot at 0x7f342b3efe90>

From the pie chart, most people travel in business or economy class.
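The percentages read off the pie charts can be cross-checked directly from the printed counts. The following is a minimal sketch, assuming only the counts shown in the groupby outputs above; the helper name percentage_shares is hypothetical and not part of the original notebook.

```python
import pandas as pd


def percentage_shares(counts: dict) -> pd.Series:
    """Convert raw category counts into percentage shares (one decimal place)."""
    s = pd.Series(counts, dtype="int64")
    return (s / s.sum() * 100).round(1)


# Counts copied from the groupby outputs printed in the notebook
satisfaction = percentage_shares(
    {"neutral or dissatisfied": 73452, "satisfied": 56428})
customer_type = percentage_shares(
    {"Loyal Customer": 106100, "disloyal Customer": 23780})
customer_class = percentage_shares(
    {"Business": 62160, "Eco": 58309, "Eco Plus": 9411})

print(satisfaction)
print(customer_type)
print(customer_class)
```

On the full DataFrame, the same shares should be obtainable without retyping the counts, for example with train_data['satisfaction'].value_counts(normalize=True).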
In [ ]:
label_gp = train_data.groupby('inflight_wifi_service').count()
print('number of samples:\n', label_gp)
_, axe = plt.subplots(1, 2, figsize=(12, 6))
train_data.inflight_wifi_service.value_counts().plot(
    kind='pie', autopct='%1.1f%%', shadow=True,
    explode=[i * 0.1 for i in range(label_gp.shape[0])], ax=axe[0])
sns.countplot(x='inflight_wifi_service', data=train_data, ax=axe[1])

number of samples:
(wide output wrapped across column blocks; every column shown reports the same count per rating)
inflight_wifi_service
0                         3916
1                        22328
2                        32320
3                        32185
4                        24775
5                        14356

In [ ]:
# Count the number of ratings for food and drink
label_gp = train_data.groupby('food_and_drink').count()
print('number of sample:\n', label_gp)
_, axe = plt.subplots(1, 2, figsize=(12, 6))
train_data.food_and_drink.value_counts().plot(
    kind='pie', autopct='%1.1f%%', shadow=True,
    explode=[i * 0.1 for i in range(label_gp.shape[0])], ax=axe[0])
sns.countplot(x='food_and_drink', data=train_data, ax=axe[1])

number of sample:
                 Unnamed: 0  Gender  ...  arrival_delay_in_minutes  satisfaction
food_and_drink                       ...
0                       132     132  ...                       130           132
1                     16051   16051  ...                     16010         16051
2                     27383   27383  ...                     27293         27383
3                     27794   27794  ...                     27712         27794
4                     30563   30563  ...                     30477         30563
5                     27957   27957  ...                     27865         27957

[6 rows x 23 columns]
Out[23]: <matplotlib.axes._subplots.AxesSubplot at 0x7f342b331210>

The bar chart shows that ratings 2, 3, 4 and 5 occur with roughly equal frequency. This may be because different people have different eating habits, so they feel differently about the on-board meals.

In [ ]:
# Check if gender affects satisfaction
plt.figure(figsize=(8, 8))
plt.title('Gender VS satisfaction')
ax = sns.countplot(x='Gender', hue='satisfaction', data=train_data)
for p in ax.patches:
    height = p.get_height()

From the bar chart, we can see that male and female passengers show a similar pattern, which indicates that gender does not have a significant effect on satisfaction.

In [ ]:
# Check if loyalty affects satisfaction
plt.figure(figsize=(8, 8))
plt.title('customer_type VS satisfaction')
ax = sns.countplot(x='customer_type', hue='satisfaction', data=train_data)
for p in ax.patches:
    height = p.get_height()

From the bar chart, we can see that the difference between the dissatisfied and satisfied counts is much larger among disloyal customers than among loyal customers. We can conclude that disloyal customers are more likely to feel dissatisfied.

In [ ]:
# Check if customer class affects satisfaction
plt.figure(figsize=(8, 8))
plt.title('customer_class VS satisfaction')
ax = sns.countplot(x='customer_class', hue='satisfaction', data=train_data)
for p in ax.patches:
    height = p.get_height()

From the bar chart, we can see that economy-class customers are the least satisfied and business-class customers are the most satisfied. This can be explained by business class enjoying the best inflight service. Therefore, to improve customer satisfaction, the company should focus more on economy-class customers.
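Comparing bar heights across groups of very different sizes (for example loyal versus disloyal customers) can be misleading, so it may help to look at the satisfaction rate within each group instead. The following is a minimal sketch using pd.crosstab with normalize="index"; the toy DataFrame is hypothetical stand-in data, not the survey itself, and only the column names are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for train_data (hypothetical rows, NOT the survey data)
toy = pd.DataFrame({
    "customer_class": ["Business", "Business", "Business",
                       "Eco", "Eco", "Eco", "Eco", "Eco Plus"],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied",
                     "neutral or dissatisfied", "neutral or dissatisfied",
                     "neutral or dissatisfied", "satisfied",
                     "neutral or dissatisfied"],
})

# Row-normalised cross-tabulation: each row sums to 1, so the 'satisfied'
# column is the satisfaction rate within that customer class.
rates = pd.crosstab(toy["customer_class"], toy["satisfaction"],
                    normalize="index")
print(rates.round(2))
```

Applied to the real data, the same call with train_data in place of toy (and 'Gender' or 'customer_type' as the row variable) would give the within-group proportions that the count plots only show indirectly.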
In [ ]:
# Check if age affects satisfaction
plt.figure(figsize=(24, 24))
plt.title('Age VS satisfaction')
ax = sns.countplot(x='age', hue='satisfaction', data=train_data)

The bar chart shows that for people below 40 years old the rate of satisfaction is below 50%, while for people above 40 it is above 50%. The peak of neutral/dissatisfied responses appears at 25 years old.

In [ ]:
plt.figure(figsize=(24, 8))
plt.title('flight_distance VS satisfaction')
ax = sns.countplot(x='flight_distance', hue='satisfaction', data=train_data)

Because flight distance is a continuous variable, I divided it into 10 bins. As flight distance increases, the number of samples in each bin decreases and the proportion of satisfied customers increases.

In [ ]:
# Check the relation between the rating of inflight service and satisfaction
plt.figure(figsize=(8, 8))
plt.title('inflight_service VS satisfaction')
ax = sns.countplot(x='inflight_service', hue='satisfaction', data=train_data)

It is interesting to see that even among customers who rate inflight service 4, the proportion of dissatisfaction is still higher than that of satisfaction. This may indicate that the main cause of dissatisfaction is not inflight service but other factors.

Conclusion

In conclusion, the quality of this dataset is high: only arrival_delay_in_minutes has missing values (393 of 129880 rows, according to train_data.info()), and no outliers were identified. The purpose of this data analysis is to investigate the relationship between customer satisfaction and various factors. The analysis could help the company optimise its services and flights and gain a competitive advantage within the industry, so as to maximise profit. This analysis mainly uses numpy, pandas, matplotlib and seaborn functions. Bar charts and pie charts are the main visualisations used, and the automatic recognition threshold is set according to the number of categories in the sample.

Limitation and Recommendation

Overall, the sample size is only 129880, which is still quite small. For aviation big data, it is not sufficient to produce a fully reliable result. As an improvement, more customers should be invited to take the survey to reduce bias.

From the investigation, we can see that the most dissatisfied customers are in economy class, so the company should put more effort into economy class. It can improve the food and drink on board, because a large proportion of people give a rating of 2 or 3 in that category. According to experience, economy-class customers often do not have as many meal choices as customers in other classes. To improve that, the company could prepare three different dishes, for example an Italian pasta, a Chinese rice dish and British mashed potato. Customers could choose whichever they prefer, which may increase their satisfaction.
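The data-quality statement in the conclusion can be checked column by column. The train_data.info() output reports 129487 non-null entries for arrival_delay_in_minutes against 129880 rows. The following is a minimal sketch of how such gaps could be counted and how they behave under pd.cut; the toy DataFrame holds hypothetical values, not the survey data, and only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_data (hypothetical values, NOT the survey data)
toy = pd.DataFrame({
    "departure_delay_in_minutes": [0, 5, 12, 0, 240],
    "arrival_delay_in_minutes": [0.0, 7.0, np.nan, 3.0, 250.0],
})

# Number of missing values in each column
missing = toy.isna().sum()
print(missing)

# pd.cut keeps missing inputs as NaN, so those rows fall into no bucket
# and will not appear in any bar of a count plot over the binned column.
binned = pd.cut(toy["arrival_delay_in_minutes"], bins=10, right=True)
unbinned = int(binned.isna().sum())
print("rows without a bucket:", unbinned)
```

On the real data, train_data.isna().sum() run before the binning step would list the missing count for every column in one call.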
