This presentation analyzes NYC taxi data and reports the following findings:
- Manhattan has the most taxi pickups in most hours of the day, especially after 5 pm, though a large number of pickups have an unknown or blank borough.
- Hot weather only slightly increases taxi trips compared to normal weather.
- After green taxis were introduced, yearly yellow taxi trips decreased, while green taxis' share of combined yellow and green trips rose from 1% in 2013 to 12% in 2015.
- Rush hours see much higher pickup volumes and higher total fares, but a slightly lower average fare than other hours.
2. This presentation covers the following questions:
• Which pickup locations are most common in each hour of the day?
• How does hot weather affect the number of taxi trips?
• How did demand change after green taxis appeared?
• How do fares differ between “rush hours” and “other hours”?
3. Most common pickup locations (borough) by hour of day
[Chart: number of pickups (0 to 2,500,000) by hour of day (0 to 23), one series per borough: Bronx, Brooklyn, EWR, Manhattan, Queens, Staten Island, Unknown, (blank)]
In most hours of the day, especially after 17:00, Manhattan has the largest share of pickups, although a large number of pickups have an unknown or blank borough.
Script:
SELECT HOUR(pickup_datetime) AS pickup_hour, borough, COUNT(borough) AS qty
FROM [bigquery-public-data:new_york.tlc_fhv_trips_2015]
GROUP BY pickup_hour, borough
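The script returns one row per hour and borough, so picking the leading borough for each hour is a separate step. A minimal sketch of that step follows; the function name and the sample rows are hypothetical and are not actual query output.

```python
def top_borough_per_hour(rows):
    """Return {hour: (borough, qty)} for the borough with the most pickups in each hour.

    rows: iterable of (pickup_hour, borough, qty) tuples, the shape produced
    by the hourly borough query.
    """
    best = {}
    for hour, borough, qty in rows:
        # Keep the row with the largest quantity seen so far for this hour.
        if hour not in best or qty > best[hour][1]:
            best[hour] = (borough, qty)
    return best


# Hypothetical sample rows (illustrative numbers only).
sample_rows = [
    (17, "Manhattan", 900),
    (17, "Brooklyn", 400),
    (17, "Unknown", 650),
    (3, "Queens", 120),
    (3, "Manhattan", 100),
]

top_by_hour = top_borough_per_hour(sample_rows)
for hour in sorted(top_by_hour):
    print(hour, top_by_hour[hour])
```

Rows labelled "Unknown" or blank compete like any other borough here; filtering them out first would be a reasonable variation.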
4. Hot weather vs. number of taxi trips
There is only a slight difference in the number of taxi trips between hot and normal weather conditions (the graph shows the number of Yellow taxi trips in 2016).
Script for how hot weather affects the number of taxi trips:
SELECT DATE(pickup_datetime) AS pickup_date, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_yellow_trips_2016]
GROUP BY pickup_date

SELECT CONCAT(year, '-', mo, '-', da) AS temp_date,
  CASE WHEN AVG(temp) >= 40 THEN "hot" ELSE "normal" END AS temp_hot
FROM [bigquery-public-data:noaa_gsod.gsod2016]
GROUP BY temp_date
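The two queries produce separate daily results, so they still have to be matched by date before the hot and normal groups can be compared. The sketch below shows one plausible way to do that, assuming the comparison is an average of daily trip counts per label (the slides do not say how the chart values were derived). The function name and sample values are hypothetical.

```python
from statistics import mean


def average_trips_by_label(daily_trips, daily_label):
    """Average daily trip count per weather label.

    daily_trips: {"YYYY-MM-DD": qty} from the trips query.
    daily_label: {"YYYY-MM-DD": "hot" or "normal"} from the weather query.
    Dates without a weather label are skipped.
    """
    groups = {}
    for day, qty in daily_trips.items():
        label = daily_label.get(day)
        if label is None:
            continue  # no weather record for this date
        groups.setdefault(label, []).append(qty)
    return {label: mean(qtys) for label, qtys in groups.items()}


# Hypothetical sample data (illustrative numbers only).
trips = {
    "2016-01-01": 300,
    "2016-01-02": 340,
    "2016-07-01": 360,
    "2016-07-02": 380,
    "2016-07-03": 100,  # no matching weather row, so it is ignored
}
labels = {
    "2016-01-01": "normal",
    "2016-01-02": "normal",
    "2016-07-01": "hot",
    "2016-07-02": "hot",
}

averages = average_trips_by_label(trips, labels)
print(averages)
```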
[Chart: Yellow taxi trips in 2016 by weather condition: hot 358,803; normal 353,901]
5. How demand changed after green taxis appeared, on a yearly basis
[Chart: yearly trip quantity (0 to 200,000,000) for 2009 to 2016; series: yellow_yearly_qty, green_yearly_qty; data labels for the green share: 1%, 9%, 12%, 11%]
After green taxis appeared on the market, the yearly quantity of yellow taxi trips decreased. Green taxis' share of combined yellow and green pickups was 1% in 2013 and reached 12% in 2015.
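The share quoted here is the green count divided by the combined yellow and green count for the same year. A minimal sketch of that arithmetic follows; the function name and the yearly counts are hypothetical placeholders, not the real query results.

```python
def green_share(yellow_qty, green_qty):
    """Green taxis' share of combined yellow and green trips, as a fraction."""
    total = yellow_qty + green_qty
    if total == 0:
        return 0.0  # avoid dividing by zero for years with no trips
    return green_qty / total


# Hypothetical yearly counts as (yellow, green), chosen only to illustrate the formula.
yearly_counts = {
    2013: (99, 1),
    2015: (88, 12),
}

shares = {year: green_share(y, g) for year, (y, g) in yearly_counts.items()}
for year in sorted(shares):
    print(year, f"{shares[year]:.0%}")
```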
6. Script for how demand changed after green taxis appeared, on a yearly basis:
SELECT YEAR(pickup_datetime) AS green_year, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_green_trips_2013] GROUP BY green_year

SELECT YEAR(pickup_datetime) AS green_year, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_green_trips_2014] GROUP BY green_year

SELECT YEAR(pickup_datetime) AS green_year, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_green_trips_2015] GROUP BY green_year

SELECT YEAR(pickup_datetime) AS green_year, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_green_trips_2016] GROUP BY green_year

SELECT YEAR(pickup_datetime) AS yellow_year, COUNT(pickup_datetime) AS qty
FROM [nyc-tlc:yellow.trips] GROUP BY yellow_year

SELECT YEAR(pickup_datetime) AS yellow_year, COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_yellow_trips_2016] GROUP BY yellow_year
7. Fare difference between “rush hours” and “other hours”
[Chart: Hourly taxi pick ups quantity (0 to 10,000,000) by hour of day (0 to 23)]
[Chart: Hourly average fare (0.00 to 25.00) by hour of day (0 to 23)]
Script:
SELECT
  HOUR(pickup_datetime) AS pickup_hour,
  AVG(total_amount) AS avg_fare,
  SUM(total_amount) AS sum_fare,
  COUNT(pickup_datetime) AS qty
FROM [bigquery-public-data:new_york.tlc_yellow_trips_2016]
GROUP BY pickup_hour
8. The previous graph shows that the hourly average fare did not change dramatically, so the changes in the sum of fares are due to the hourly changes in pickup quantity.
Surprisingly, the average fare for rush hours is lower than for other hours. (An hour is considered a rush hour if its pickup quantity is greater than the average hourly quantity.)
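The rush-hour rule above can be applied to the hourly query results as sketched below. This is an illustration under assumptions: the function name and sample figures are hypothetical, and the group average here is weighted by trips (total fare divided by total pickups), which may differ from how the slide's averages were computed.

```python
def split_rush_hours(hourly):
    """Group hours into "rush hour" and "other hours" and summarize each group.

    hourly: {hour: (qty, sum_fare)} as returned by the hourly fare query.
    An hour is a rush hour when its qty exceeds the mean hourly qty.
    """
    avg_qty = sum(qty for qty, _ in hourly.values()) / len(hourly)
    totals = {"rush hour": [0, 0.0], "other hours": [0, 0.0]}
    for qty, sum_fare in hourly.values():
        key = "rush hour" if qty > avg_qty else "other hours"
        totals[key][0] += qty
        totals[key][1] += sum_fare
    return {
        key: {
            "qty": qty,
            "sum_fare": sum_fare,
            # Trip-weighted average fare for the group (None if the group is empty).
            "avg_fare": sum_fare / qty if qty else None,
        }
        for key, (qty, sum_fare) in totals.items()
    }


# Hypothetical hourly figures as (qty, sum_fare), for illustration only.
hourly_sample = {
    8: (100, 1600.0),
    18: (120, 1920.0),
    3: (20, 360.0),
    14: (40, 720.0),
}

summary = split_rush_hours(hourly_sample)
for key, stats in summary.items():
    print(key, stats)
```

In this sample the rush-hour group has the larger fare total but the lower average fare, the same pattern the slide describes.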
[Chart: average of fare: rush hour 16.3; other hours 17.1]
[Chart: sum of fare: rush hour 109,907,757; other hours 48,720,367]