28/12/22, 5:16 PM
Uber Trips Analysis using Python - Jupyter Notebook
Page 1 of 7
http://localhost:8888/notebooks/Python%20Case%20Studies/Uber%…sing%20Python/Uber%20Trips%20Analysis%20using%20Python.ipynb
Uber Trips Analysis using Python
In [1]:
#importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
uber = pd.read_csv('uber-raw-data-sep14.csv')
uber.head()

Out[2]:
   Date/Time          Lat      Lon       Base
0  9/1/2014 0:01:00   40.2201  -74.0021  B02512
1  9/1/2014 0:01:00   40.7500  -74.0027  B02512
2  9/1/2014 0:03:00   40.7559  -73.9864  B02512
3  9/1/2014 0:06:00   40.7450  -73.9889  B02512
4  9/1/2014 0:11:00   40.8145  -73.9444  B02512

In [3]:
#checking information about the columns of the dataset:
uber.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1028136 entries, 0 to 1028135
Data columns (total 4 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   Date/Time  1028136 non-null  object
 1   Lat        1028136 non-null  float64
 2   Lon        1028136 non-null  float64
 3   Base       1028136 non-null  object
dtypes: float64(2), object(2)
memory usage: 31.4+ MB

In [4]:
#converting the object-type 'Date/Time' column into a datetime column
#(one vectorized call with an explicit format instead of parsing row by row):
uber['Date/Time'] = pd.to_datetime(uber['Date/Time'], format='%m/%d/%Y %H:%M:%S')
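As a quick check of the conversion in In [4], here is a minimal sketch that parses the 'Date/Time' column in one vectorized call with an explicit format. It runs on the five rows shown in Out[2], embedded as a string so it works without the CSV file; the helper name `load_trips` and the variable names are hypothetical.

```python
import io

import pandas as pd

# The five rows shown in Out[2], embedded so the sketch runs without
# uber-raw-data-sep14.csv.
SAMPLE_CSV = """Date/Time,Lat,Lon,Base
9/1/2014 0:01:00,40.2201,-74.0021,B02512
9/1/2014 0:01:00,40.7500,-74.0027,B02512
9/1/2014 0:03:00,40.7559,-73.9864,B02512
9/1/2014 0:06:00,40.7450,-73.9889,B02512
9/1/2014 0:11:00,40.8145,-73.9444,B02512
"""


def load_trips(source):
    """Read raw trips and parse 'Date/Time' with one vectorized call."""
    trips = pd.read_csv(source)
    # month/day/year hour:minute:second; non-zero-padded values such as
    # '9/1/2014 0:01:00' match this format.
    trips["Date/Time"] = pd.to_datetime(
        trips["Date/Time"], format="%m/%d/%Y %H:%M:%S"
    )
    return trips


uber_sample = load_trips(io.StringIO(SAMPLE_CSV))
print(uber_sample.dtypes)
```

With the real file, `load_trips('uber-raw-data-sep14.csv')` should give the same dtypes as the In [5] output. Passing an explicit format also means a malformed timestamp raises an error instead of being guessed at.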
In [5]:
uber.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1028136 entries, 0 to 1028135
Data columns (total 4 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   Date/Time  1028136 non-null  datetime64[ns]
 1   Lat        1028136 non-null  float64
 2   Lon        1028136 non-null  float64
 3   Base       1028136 non-null  object
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 31.4+ MB

In [6]:
#splitting the 'Date/Time' column into the 'Day', 'Weekday' and 'Hour' columns
#(Weekday uses the pandas convention Monday=0 ... Sunday=6):
uber['Day'] = uber['Date/Time'].dt.day
uber['Weekday'] = uber['Date/Time'].dt.weekday
uber['Hour'] = uber['Date/Time'].dt.hour
uber.head()

Out[6]:
   Date/Time            Lat      Lon       Base    Day  Weekday  Hour
0  2014-09-01 00:01:00  40.2201  -74.0021  B02512  1    0        0
1  2014-09-01 00:01:00  40.7500  -74.0027  B02512  1    0        0
2  2014-09-01 00:03:00  40.7559  -73.9864  B02512  1    0        0
3  2014-09-01 00:06:00  40.7450  -73.9889  B02512  1    0        0
4  2014-09-01 00:11:00  40.8145  -73.9444  B02512  1    0        0

-- So, now I have prepared the data by day, weekday and hour. As I am using the Uber
trips for September 2014, let's first look at which days of the month had the most
Uber trips.
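To make the weekday numbering concrete, here is a small sketch of the same Day/Weekday/Hour split on three hypothetical timestamps. pandas numbers weekdays Monday=0 to Sunday=6, and 2014-09-01 (the first row of the data) was a Monday, which is why its Weekday is 0 in Out[6]. The extra `DayName` column is a hypothetical addition, not part of the notebook.

```python
import pandas as pd

# Hypothetical timestamps from the first week of September 2014.
trips = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2014-09-01 00:01:00",  # Monday
        "2014-09-06 18:30:00",  # Saturday
        "2014-09-07 05:15:00",  # Sunday
    ])
})

# Vectorized .dt accessors: no Python function call per row.
trips["Day"] = trips["Date/Time"].dt.day
trips["Weekday"] = trips["Date/Time"].dt.weekday      # Monday=0 ... Sunday=6
trips["Hour"] = trips["Date/Time"].dt.hour
trips["DayName"] = trips["Date/Time"].dt.day_name()   # hypothetical extra column
print(trips)
```

Keeping a readable `DayName` next to the numeric `Weekday` can make the weekday plots and the heatmap easier to label.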
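In [7]:
#distribution of trips over the days of the month
#(sns.distplot is deprecated since seaborn 0.11; histplot replaces it):
plt.figure(figsize=(12, 7))
sns.histplot(uber['Day'], discrete=True)
plt.show()

-- Looking at the daily trips, we can say that Uber trips rose on working days and
declined on weekends.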
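The working-day versus weekend reading can also be checked numerically instead of by eye: count trips per calendar date and flag Saturdays and Sundays. This is a minimal sketch on hypothetical timestamps. The function name `trips_per_day` is illustrative. In the notebook it would be called as `trips_per_day(uber['Date/Time'])`.

```python
import pandas as pd


def trips_per_day(timestamps: pd.Series) -> pd.DataFrame:
    """Count trips per calendar date and flag Saturdays/Sundays."""
    dates = timestamps.dt.normalize()               # keep the date, drop the time
    counts = dates.value_counts().sort_index()      # trips per date, in date order
    out = counts.rename("Trips").to_frame()
    out["Weekend"] = pd.DatetimeIndex(out.index).dayofweek >= 5  # Sat=5, Sun=6
    return out


# Hypothetical trips: 3 on Fri 2014-09-05, 1 on Sat 09-06, 2 on Sun 09-07.
ts = pd.Series(pd.to_datetime([
    "2014-09-05 08:00", "2014-09-05 09:00", "2014-09-05 18:00",
    "2014-09-06 12:00",
    "2014-09-07 10:00", "2014-09-07 20:00",
]))
daily = trips_per_day(ts)
print(daily)
print(daily.groupby("Weekend")["Trips"].mean())  # average trips per weekday vs weekend date
```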
In [8]:
#Now let's look at the Uber trips according to the hours:
plt.figure(figsize=(12, 7))
sns.histplot(uber['Hour'], discrete=True)
plt.show()

-- Looking at the hourly data, we can say that Uber trips decrease after midnight, start
increasing around 5:00 AM and keep rising until 6:00 PM, the busiest hour; after that
they start decreasing again.
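The busiest hour can be read off exactly rather than from the plot. A sketch, assuming the input is the integer `Hour` column (0 to 23). The sample values below are hypothetical; in the notebook the call would be `hourly_profile(uber['Hour'])`.

```python
import pandas as pd


def hourly_profile(hours: pd.Series) -> pd.Series:
    """Trips per hour of day (0-23), including hours with zero trips."""
    return hours.value_counts().reindex(range(24), fill_value=0)


# Hypothetical 'Hour' values.
hours = pd.Series([0, 0, 5, 6, 6, 7, 17, 18, 18, 18, 22])
profile = hourly_profile(hours)
peak = profile.idxmax()
print(f"Busiest hour: {peak}:00-{peak}:59")  # prints "Busiest hour: 18:00-18:59"
```

Note that an `Hour` value of 18 covers the whole 6:00 PM to 6:59 PM window, so "6:00 PM is the busiest hour" means that hour-long bucket.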
In [9]:
#Now let's analyze Uber trips according to the weekdays (Monday=0 ... Sunday=6):
plt.figure(figsize=(12, 7))
sns.histplot(uber['Weekday'], discrete=True)
plt.xticks(range(7), ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
plt.show()

-- From this plot, Uber trips on Sundays are higher than on Saturdays, which suggests that
people use Uber on Sundays for outings rather than only for commuting to work. Trips are
lowest on Saturdays and highest on Mondays.
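One caveat on the weekday comparison: September 2014 started on a Monday and has 30 days, so it contains five Mondays and five Tuesdays but only four of every other weekday. Raw weekday totals therefore favor Mondays and Tuesdays. A fairer comparison divides each weekday's trips by how many times that weekday occurs. The sketch below assumes every date between the first and last trip counts as one occurrence. The function name is hypothetical, and the test data (one trip per day) is synthetic.

```python
import pandas as pd

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def mean_trips_per_weekday(timestamps: pd.Series) -> pd.Series:
    """Average trips per calendar day for each weekday (Mon=0 ... Sun=6).

    Every date between the first and last trip counts as one occurrence
    of its weekday, even if it had no trips.
    """
    dates = timestamps.dt.normalize()
    calendar = pd.date_range(dates.min(), dates.max(), freq="D")
    days_per_weekday = (
        pd.Series(calendar.dayofweek).astype("int64").value_counts()
        .reindex(range(7), fill_value=0)
    )
    trips_per_weekday = (
        timestamps.dt.weekday.astype("int64").value_counts()
        .reindex(range(7), fill_value=0)
    )
    avg = trips_per_weekday / days_per_weekday  # 0/0 -> NaN for unseen weekdays
    avg.index = WEEKDAYS
    return avg


# Hypothetical data: exactly one trip at noon on every day of September 2014.
one_per_day = pd.Series(pd.date_range("2014-09-01 12:00", "2014-09-30 12:00", freq="D"))
raw = one_per_day.dt.weekday.value_counts().sort_index()
print(raw.tolist())                                  # [5, 5, 4, 4, 4, 4, 4]
print(mean_trips_per_weekday(one_per_day).tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Even with identical daily demand, the raw counts rank Monday and Tuesday on top. Running `mean_trips_per_weekday(uber['Date/Time'])` on the real data shows whether the Monday peak survives this correction.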
In [10]:
#Now let's look at how the number of trips varies with weekday and hour:
#trip counts per (Weekday, Hour) pair, with weekdays as rows and hours as columns
df = uber.groupby(['Weekday', 'Hour']).size()
df = df.unstack()
sns.heatmap(df, annot=False)
plt.show()

-- As we have longitude and latitude data, we can also plot the density of Uber trips
across the regions of New York City.

In [11]:
uber.head()

Out[11]:
   Date/Time            Lat      Lon       Base    Day  Weekday  Hour
0  2014-09-01 00:01:00  40.2201  -74.0021  B02512  1    0        0
1  2014-09-01 00:01:00  40.7500  -74.0027  B02512  1    0        0
2  2014-09-01 00:03:00  40.7559  -73.9864  B02512  1    0        0
3  2014-09-01 00:06:00  40.7450  -73.9889  B02512  1    0        0
4  2014-09-01 00:11:00  40.8145  -73.9444  B02512  1    0        0
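The weekday-by-hour count matrix behind the heatmap in In [10] can be built in two equivalent ways, and reindexing it to the full 7 x 24 grid keeps empty (weekday, hour) cells visible as zeros. A sketch on hypothetical data; in the notebook the two columns would be `uber['Weekday']` and `uber['Hour']`.

```python
import pandas as pd

# Hypothetical (Weekday, Hour) pairs.
trips = pd.DataFrame({
    "Weekday": [0, 0, 0, 5, 6, 6],
    "Hour":    [8, 8, 18, 18, 5, 18],
})

# Two equivalent ways to count trips per (weekday, hour) cell.
by_groupby = trips.groupby(["Weekday", "Hour"]).size().unstack(fill_value=0)
by_crosstab = pd.crosstab(trips["Weekday"], trips["Hour"])

# Full grid: 7 weekdays (rows) x 24 hours (columns), missing cells set to 0.
grid = by_crosstab.reindex(index=range(7), columns=range(24), fill_value=0)
print(grid.shape)  # (7, 24)
```

Passing `grid` to `sns.heatmap` instead of the raw unstacked result avoids blank (NaN) cells when some weekday-hour combination has no trips.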
In [12]:
#each trip plotted at its pickup coordinates; small, semi-transparent markers keep
#dense areas visible with ~1M points (no colormap or legend is needed here):
uber.plot(kind='scatter', x='Lon', y='Lat', alpha=0.4, s=1, figsize=(12, 7))
plt.title("Uber Trip Analysis")
plt.show()

So this is how I analyzed the Uber trips for New York City. Some of the conclusions I
derived are as follows:
1. Mondays have the most Uber trips.
2. Saturdays have the fewest Uber trips.
3. 6:00 PM is the busiest hour for Uber.
4. On average, Uber trips start rising around 5:00 AM.
5. Most Uber trips originate in and around Manhattan.
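Conclusion 5 is read off the scatter plot. A rough numeric check is to bin pickups into square latitude/longitude cells and find the busiest cell. This is a sketch, assuming a cell size of 0.05 degrees (a few kilometres at New York's latitude; the value is an arbitrary choice). The helper name `densest_cell` is hypothetical. The three clustered test points are made up, and the two scattered ones come from Out[2].

```python
import numpy as np
import pandas as pd


def densest_cell(lat, lon, cell_deg=0.05):
    """Return (center_lat, center_lon, count) for the square grid cell
    of side cell_deg degrees that contains the most points."""
    lat_idx = np.floor(np.asarray(lat, dtype=float) / cell_deg).astype(int)
    lon_idx = np.floor(np.asarray(lon, dtype=float) / cell_deg).astype(int)
    cells = pd.DataFrame({"lat_idx": lat_idx, "lon_idx": lon_idx})
    counts = cells.groupby(["lat_idx", "lon_idx"]).size()
    i, j = counts.idxmax()  # grid indices of the busiest cell
    return (i + 0.5) * cell_deg, (j + 0.5) * cell_deg, int(counts.max())


# Hypothetical points: three close together, two scattered.
lat = [40.76, 40.77, 40.78, 40.2201, 40.8145]
lon = [-73.98, -73.97, -73.99, -74.0021, -73.9444]
print(densest_cell(lat, lon))  # 3 points in the cell centred at about (40.775, -73.975)
```

On the full data this would be `densest_cell(uber['Lat'], uber['Lon'])`, and the returned centre can be looked up on a map. For a visual version, matplotlib's `plt.hexbin(uber['Lon'], uber['Lat'], gridsize=100, bins='log')` draws the same kind of binned density with a log colour scale.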
