10. “…you will need to send or bring an external hard drive with a minimum capacity of 200 GB to the TLC offices. The address is listed below. The hard drive must be brand new, still in the box and unopened.”
13. Fare Data – 12 CSVs (~2 GB each)
• Payment Type
• Fare
• Tax
• Tip (Credit Card Only)
• Tolls
Trip Data – 12 CSVs (~2 GB each)
• Medallion
• Driver
• Pickup Time and Location
• Dropoff Time and Location
173 Million Trips in 2013!
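Fares and trips arrive in separate files, so each fare has to be matched back to its trip. Below is a minimal sketch. It assumes (the slide doesn't say so) that every fare row also carries the medallion, driver and pickup time, so that triple can serve as a join key. The field names and `joinTripsAndFares` are hypothetical. The arrays are held in memory for clarity; ~2 GB files would need to be streamed line by line instead.

```javascript
// Hypothetical record shapes (field names are illustrative, not the TLC's column headers):
//   trip: { medallion, hackLicense, pickupTime, dropoffTime, ... }
//   fare: { medallion, hackLicense, pickupTime, paymentType, fare, tax, tip, tolls }

// Composite key built from the fields assumed to appear in both files.
function tripKey(record) {
  return [record.medallion, record.hackLicense, record.pickupTime].join("|");
}

// Return each trip with its matching fare attached (null when no fare matches).
function joinTripsAndFares(trips, fares) {
  const faresByKey = new Map();
  for (const fare of fares) {
    faresByKey.set(tripKey(fare), fare);
  }
  return trips.map((trip) => ({
    ...trip,
    fareRecord: faresByKey.get(tripKey(trip)) ?? null,
  }));
}
```

A hash map keeps the join linear in the number of rows, which matters at 173 million trips.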
16. • Shared via Torrent and Direct Download
• Urban Data Nerds Rejoice!
• Lots of Interesting Projects and Analysis
19. Driver and Vehicle De-Anonymization
hash = md5(medallionNumber)
Medallion: 6B111958A39B24140C973B262EA9FEA5
Hack License: D3B035A03C8A34DA17488129DA581EE7
There are only 19 million possible values!
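Why does a small value space matter? An unsalted MD5 hash is deterministic. Hashing every possible ID once therefore yields a lookup table that reverses any hash in the data. Below is a minimal sketch using Node's built-in `crypto` module. The candidate pattern (digit, letter, digit, digit, e.g. `5X55`) is a hypothetical format for illustration only. Covering all the possible values the slide mentions would mean enumerating every real medallion and hack license format.

```javascript
const crypto = require("crypto");

// MD5 of a string as lowercase hex (the dataset shows uppercase hex, so lookups normalize case).
function md5Hex(value) {
  return crypto.createHash("md5").update(value).digest("hex");
}

// Hypothetical candidate generator: digit + letter + digit + digit (e.g. "5X55").
// One illustrative pattern only, not a complete list of real medallion formats.
function* candidateMedallions() {
  const digits = "0123456789";
  const letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  for (const a of digits)
    for (const b of letters)
      for (const c of digits)
        for (const d of digits)
          yield a + b + c + d;
}

// Precompute hash -> plaintext for every candidate.
function buildLookupTable(candidates) {
  const table = new Map();
  for (const id of candidates) {
    table.set(md5Hex(id), id);
  }
  return table;
}

// Reverse an anonymized value; undefined if it falls outside the candidate space.
function deanonymize(table, hash) {
  return table.get(hash.toLowerCase());
}
```

Every hash inside the candidate space comes back as its plaintext ID. A keyed hash (e.g. HMAC with a secret key) or random per-ID tokens could not be reversed this way.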
21. But what will I do with the data?
I love:
• Data Visualization
• Mapping
• Transportation
• Civic Hacking
• Node.js
• D3.js
• Learning how cities work
Spatiotemporal data is PERFECT for map animations!
22. Burning Questions:
Where do cabs go?
How much $ do they make?
How much time do they spend empty/full?
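The last two questions come down to arithmetic over one cab's trips. Occupied time is the sum of trip durations. Empty time is the sum of the gaps between each dropoff and the next pickup. Below is a minimal sketch. The trip shape and `summarizeShift` are hypothetical. Earnings are taken as fare plus tip, on the assumption that tax and tolls are passed through. A gap that is really a meal break or shift change would still be counted as empty.

```javascript
// Summarize one cab's shift.
// Hypothetical trip shape: { pickup: Date, dropoff: Date, fare: number, tip: number }
function summarizeShift(trips) {
  // Sort a copy by pickup time so the caller's array is left untouched.
  const sorted = [...trips].sort((a, b) => a.pickup - b.pickup);
  let occupiedMs = 0;
  let emptyMs = 0;
  let earnings = 0;
  sorted.forEach((trip, i) => {
    occupiedMs += trip.dropoff - trip.pickup;
    earnings += trip.fare + trip.tip; // assumption: tax and tolls excluded
    if (i > 0) {
      // Time between the previous dropoff and this pickup counts as "empty".
      emptyMs += Math.max(0, trip.pickup - sorted[i - 1].dropoff);
    }
  });
  return { occupiedMs, emptyMs, earnings };
}
```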
23. The Vision:
Pick a single taxi from the data,
follow it for 24 hours on a map.
Repeat.
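Following a cab on a map means knowing where it is at every animation frame, not just at pickup and dropoff. Below is a minimal sketch of that interpolation. It assumes the cab moves at constant speed along the trip's path and treats latitude/longitude as flat x/y coordinates, a reasonable approximation over a city-scale trip. `positionAt` and its parameters are hypothetical.

```javascript
// Planar distance between two [lat, lng] points (approximation for short urban hops).
function segmentLength([lat1, lng1], [lat2, lng2]) {
  return Math.hypot(lat2 - lat1, lng2 - lng1);
}

// Position along a non-empty `path` of [lat, lng] points at time t (ms),
// assuming constant speed between pickupMs and dropoffMs.
function positionAt(path, pickupMs, dropoffMs, t) {
  if (path.length === 1) return path[0];
  const duration = dropoffMs - pickupMs;
  // Fraction of the trip completed, clamped to [0, 1].
  const frac = duration > 0 ? Math.min(1, Math.max(0, (t - pickupMs) / duration)) : 1;
  const lengths = [];
  let total = 0;
  for (let i = 1; i < path.length; i++) {
    const len = segmentLength(path[i - 1], path[i]);
    lengths.push(len);
    total += len;
  }
  if (total === 0) return path[0];
  // Walk the segments until the target distance is reached.
  let remaining = frac * total;
  for (let i = 0; i < lengths.length; i++) {
    if (remaining <= lengths[i]) {
      const r = lengths[i] === 0 ? 0 : remaining / lengths[i];
      const [lat1, lng1] = path[i];
      const [lat2, lng2] = path[i + 1];
      return [lat1 + r * (lat2 - lat1), lng1 + r * (lng2 - lng1)];
    }
    remaining -= lengths[i];
  }
  return path[path.length - 1];
}
```

Between trips, the cab can simply be held at its last dropoff point until the next pickup.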
24. 1) Get Raw Data
(Isolate a collection of full “cab/days”)
Figure it out myself: HARD!
Ask the internets for help: EASY!
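The slide doesn't define a “full” cab/day. One reading is all trips for one medallion on one calendar date, kept only if the day has enough trips to be worth animating. Below is a minimal sketch under that reading. The pickup-time string format, the function names and the `minTrips` threshold are hypothetical.

```javascript
// Group trips into cab/days: all trips for one medallion on one calendar date.
// Assumes pickupTime strings like "2013-01-01 08:15:00" (hypothetical format).
function groupByCabDay(trips) {
  const groups = new Map();
  for (const trip of trips) {
    const key = `${trip.medallion}|${trip.pickupTime.slice(0, 10)}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(trip);
  }
  // Fixed-width timestamps sort chronologically as plain strings.
  for (const list of groups.values()) {
    list.sort((a, b) => (a.pickupTime < b.pickupTime ? -1 : a.pickupTime > b.pickupTime ? 1 : 0));
  }
  return groups;
}

// Keep cab/days with at least `minTrips` trips (a hypothetical stand-in for "full").
function fullCabDays(groups, minTrips) {
  return [...groups].filter(([, list]) => list.length >= minTrips);
}

// Pick one [key, trips] cab/day at random to animate.
function pickRandomCabDay(cabDays) {
  return cabDays[Math.floor(Math.random() * cabDays.length)];
}
```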
26. 2) Turn each trip into a line:
Directions API
• Start and end points IN
• Directions with encoded polylines OUT
Raw Data (Trips & Fares) → Node.js Script → Trips + Polylines
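The slide doesn't name the directions service. Assuming the Google Directions API, a request takes `origin` and `destination` as "lat,lng" strings. The JSON response carries each route as an encoded polyline in `routes[0].overview_polyline.points`. Below is a minimal sketch of both ends of the Node.js script. It builds the request URL (the API key is a placeholder and the HTTP call is left out) and decodes Google's Encoded Polyline Algorithm Format back into [lat, lng] pairs. The function names are hypothetical.

```javascript
// Build a Directions API request URL (assumes Google's JSON endpoint; apiKey is a placeholder).
function directionsUrl([startLat, startLng], [endLat, endLng], apiKey) {
  const params = new URLSearchParams({
    origin: `${startLat},${startLng}`,
    destination: `${endLat},${endLng}`,
    key: apiKey,
  });
  return `https://maps.googleapis.com/maps/api/directions/json?${params}`;
}

// Decode Google's Encoded Polyline Algorithm Format into [lat, lng] pairs.
function decodePolyline(encoded) {
  const points = [];
  let index = 0;
  let lat = 0;
  let lng = 0;
  // Each value is a zigzag-encoded delta, split into 5-bit chunks offset by 63.
  const readDelta = () => {
    let result = 0;
    let shift = 0;
    let byte;
    do {
      byte = encoded.charCodeAt(index++) - 63;
      result |= (byte & 0x1f) << shift;
      shift += 5;
    } while (byte >= 0x20); // 0x20 bit set means more chunks follow
    // Undo the zigzag encoding: odd values are negative.
    return result & 1 ? ~(result >> 1) : result >> 1;
  };
  while (index < encoded.length) {
    lat += readDelta();
    lng += readDelta();
    // Coordinates were stored as integers scaled by 1e5.
    points.push([lat / 1e5, lng / 1e5]);
  }
  return points;
}
```

Google's format documentation includes a worked example string that decodes to (38.5, -120.2), (40.7, -120.95), (43.252, -126.453). It makes a handy check that the decoder is wired up correctly.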