● Served as project manager for an analysis of 2017 DC taxi trips covering 1.1M records, 150 taxi companies, and 39 variables.
● Processed and cleaned the dataset in Python; used Tableau for analysis and visualization.
● Built a model to detect demand and supply patterns in each neighborhood. Stood out among 69 teams and was invited to present the results to the DC Department of For-Hire Vehicles data team.
Data Description
• 11M: 2017 trip data from DFHV (time, fare amount, origin, destination, mileage, duration, etc.)
• 42 variables
• 53 zip codes
• 150 taxi companies
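The deck does not show how the trip records were cleaned. The following is a minimal sketch, assuming each raw row is a dict of strings keyed by hypothetical column names (`time`, `fare_amount`, `origin`, `destination`, `mileage`, `duration`) that mirror the variables listed in the data description, and assuming a `YYYY-MM-DD HH:MM` timestamp format. It drops rows that fail to parse or have non-positive fare, mileage, or duration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Trip:
    # Hypothetical record layout based on the listed trip variables
    pickup_time: datetime
    fare_amount: float
    origin: str
    destination: str
    mileage: float
    duration_min: float


def parse_trip(row: dict) -> Optional[Trip]:
    """Convert one raw row (all strings) into a Trip, or return None if unusable."""
    try:
        trip = Trip(
            pickup_time=datetime.strptime(row["time"], "%Y-%m-%d %H:%M"),
            fare_amount=float(row["fare_amount"]),
            origin=row["origin"].strip(),
            destination=row["destination"].strip(),
            mileage=float(row["mileage"]),
            duration_min=float(row["duration"]),
        )
    except (KeyError, ValueError, AttributeError):
        return None  # missing column, malformed number, or bad timestamp
    # Drop implausible trips: non-positive fare, distance, or duration
    if trip.fare_amount <= 0 or trip.mileage <= 0 or trip.duration_min <= 0:
        return None
    # Drop trips with no usable origin or destination
    if not trip.origin or not trip.destination:
        return None
    return trip


def clean(rows: Iterable[dict]) -> List[Trip]:
    """Keep only the rows that parse into valid trips."""
    return [t for t in (parse_trip(r) for r in rows) if t is not None]
```

At 11M rows a vectorized tool such as pandas would be faster; the plain-Python version just makes each filtering rule explicit.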
Neighborhoods
• Divided DC into 217 neighborhoods
• Each is a small region considered to have uniform characteristics
• Added neighborhood information to each trip
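The slides do not say how each trip was matched to a neighborhood. One common approach, sketched here under the assumption that trip origins are available as (longitude, latitude) points and each neighborhood as a polygon of vertices, is a ray-casting point-in-polygon test. The function names and inputs are hypothetical; a GIS library's spatial join would normally do this at scale.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), assumed input format


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(pt: Point, neighborhoods: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first neighborhood polygon containing pt, else None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(pt, polygon):
            return name
    return None
```

Because neighborhoods are assumed not to overlap, returning the first match is enough; points outside every polygon (for example, trips ending outside DC) come back as `None`.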
How do we decide price?
• Match supply with ridership/demand
• Study patterns in ridership and identify anomalies
○ Hourly and weekly demand patterns in each neighborhood
○ Identify anomalies in each neighborhood's demand pattern by comparing it with the overall pattern
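The deck does not specify how a neighborhood's pattern was compared with the overall one. A minimal sketch, assuming the input is (neighborhood, pickup timestamp) pairs: it builds each neighborhood's share of trips per (weekday, hour) bin, divides by the city-wide share for that bin, and flags bins whose ratio crosses a threshold. The 1.5 and 0.5 thresholds are illustrative assumptions, not values from the project.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[int, int]  # (weekday 0=Mon..6=Sun, hour 0..23)


def demand_profiles(trips: Iterable[Tuple[str, datetime]]):
    """Return per-neighborhood and overall share of trips in each (weekday, hour) bin."""
    per_hood: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for hood, ts in trips:
        b = (ts.weekday(), ts.hour)
        per_hood[hood][b] += 1
        overall[b] += 1

    def to_share(counts: Counter) -> Dict[Bin, float]:
        total = sum(counts.values())
        return {b: n / total for b, n in counts.items()}

    return {h: to_share(c) for h, c in per_hood.items()}, to_share(overall)


def find_anomalies(hood_share: Dict[Bin, float], overall_share: Dict[Bin, float],
                   high: float = 1.5, low: float = 0.5) -> List[Tuple[Bin, str, float]]:
    """Flag bins where a neighborhood's share deviates from the overall share by a factor."""
    flagged = []
    for b, expected in overall_share.items():
        ratio = hood_share.get(b, 0.0) / expected
        if ratio >= high:
            flagged.append((b, "high", ratio))
        elif ratio <= low:
            flagged.append((b, "low", ratio))
    return sorted(flagged)
```

Comparing shares rather than raw counts keeps busy neighborhoods from being flagged simply for having more trips; only the timing of demand relative to the city as a whole matters.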
Recommendations
Anomalies in the Union Station neighborhood
● High ridership: experiment with increasing fares and assess the impact on ridership
● Low ridership: experiment with offering discounts and assess the impact on ridership
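The slides leave open how the impact on ridership would be measured. One standard measure, swapped in here as an assumption, is the midpoint (arc) price elasticity of demand: the percentage change in trips divided by the percentage change in fare, each taken relative to the average of the before and after values. The function name is hypothetical.

```python
def arc_elasticity(q_before: float, q_after: float,
                   p_before: float, p_after: float) -> float:
    """Midpoint (arc) price elasticity: % change in ridership / % change in fare."""
    # Percentage changes relative to the midpoint, so the result is symmetric
    pct_q = (q_after - q_before) / ((q_after + q_before) / 2)
    pct_p = (p_after - p_before) / ((p_after + p_before) / 2)
    if pct_p == 0:
        raise ValueError("fare did not change; elasticity is undefined")
    return pct_q / pct_p
```

For example, if a fare rising from 10 to 12 coincided with trips falling from 1,100 to 900, `arc_elasticity(1100, 900, 10, 12)` is about -1.1. A magnitude above 1 would suggest that the increase lowers total fare revenue. A simple before/after comparison ignores seasonality and other confounders, so a comparable control neighborhood would make such an experiment more convincing.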
What to do in the future?
➔ Conduct our analysis on the complete dataset
➔ Apply machine learning to cluster neighborhoods based on their characteristics (hospitals, train stations, etc.)
◆ Match this information with areas showing anomalies to identify the reasons for those anomalies efficiently
➔ Collect drivers' real-time geolocation information for supply analysis
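The proposed clustering step is not specified beyond "machine learning". A minimal sketch using k-means (Lloyd's algorithm), chosen here as an assumption: each neighborhood is described by a hypothetical feature vector, such as counts of hospitals and train stations. The initialization simply takes the first k neighborhoods as starting centroids so that results are reproducible.

```python
import math
from typing import Dict, List, Sequence


def kmeans(points: Dict[str, Sequence[float]], k: int, iters: int = 100) -> Dict[str, int]:
    """Plain Lloyd's k-means; returns a cluster index for each neighborhood name."""
    names = list(points)
    # Deterministic initialization: the first k neighborhoods become the centroids
    centroids: List[List[float]] = [list(points[n]) for n in names[:k]]
    labels: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each neighborhood joins its nearest centroid (Euclidean)
        new_labels = {
            n: min(range(k), key=lambda c: math.dist(points[n], centroids[c]))
            for n in names
        }
        if new_labels == labels:
            break  # no assignment changed, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned neighborhoods
        for c in range(k):
            members = [points[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In practice the features should be standardized first so that one large-valued attribute does not dominate the distance. A library implementation such as scikit-learn's `KMeans` with several random restarts would be more robust than a single deterministic initialization.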