SlideShare a Scribd company logo
1 of 13
Download to read offline
MapMyCab
Preetika Kulshrestha
Motivation
• Tool for Data Scientists and Cab dispatchers to analyze
(by time of day or day of week):
• cab occupancy
• miles travelled
• pickups and drop-offs
• An app for city dwellers to view real-time cab status for
unoccupied cabs in a given area
Demo
Pipeline
Script Message
Broker
Real-Time
Streaming
HDFS
HBase UI
MrJob
Data Flow
CabID Lat Long Occ Timestamp
yyyy_m_day AvgOcc Pickups Drops Miles
MrJob
Tables
• Hourly data organized by Day of Week
• Aggregate metrics stored in the same table for fast retrieval
y_m_dow c:0 c:1 c:2 c:3 c:4 … c23 c:Totals
Day of Week Hour 0 Attributes hr 1 hr 2 hr 3 hr 4 … hr 23 ..
2008_01_Mon
pickups, dropoffs,
avg_occ, avg_dist
.. .. .. .. .. ..
sum(pickups),
sum(dropoffs), avg(occ),
avg(dist)
2008_01_Tue
<pickups, dropoffs,
avg_occ, avg_dist>
.. .. .. .. .. ..
<sum(pickups),
sum(dropoffs), avg(occ),
avg(dist)>
2008_01_Wed
<pickups, dropoffs,
avg_occ, avg_dist>
.. .. .. .. .. ..
<sum(pickups),
sum(dropoffs), avg(occ),
avg(dist)>
Hourly Aggregates by Day of Week
API and Lessons Learned
• Need to safeguard against corrupt data
• Workflow is very important when connecting
different tools
About Me
• Previous Life - Senior Energy
Analyst (EnerNOC Inc.).
• M.S. Electrical Engineering -
North Carolina State University
(focus on robotics, control
systems and smart grid).
• https://github.com/PreetikaKuls
• preetika.kulshrestha@gmail.com
Pipeline
Script Message
Broker
Real-Time
Streaming
HDFS
HBase UI
MrJob
Python Script
uid, lat, long,
timestamp, occ
y_m_dow_h, pickups,
drops, dist, occ
y_m_dow, hour(pickups,
drops, dist, occ)
Hive
Data
Item SF Cabs
Description
GPS coordinates of approx. 500 SF cabs collected
over 30 days
Format
[latitude (float), longitude (float), occupancy
(boolean), time (timestamp)]
Size ~ 500 MB
Throughput
50-100 messages/sec (500 cabs, 5-10 min
granularity)
Master Data Set
Time
CabID!
Lat | Long | Occupancy
CabID!
Lat | Long | Occupancy
-—>
CabID
Timestamp!
Lat | Long | Occupancy
Timestamp!
Lat | Long | Occupancy
-—>
Retrieve all data for a given time frame
where latitude and longitude fall with in a specific range
Analyze data based on timestamp
Batch Processing Result
Features and Example
Queries
Features!
• A system that uses crowdsourcing to automatically generate
parking spot information for streets
• Parking information overlaid on Google Maps
Queries!
• Does West Middlefield Road allow for street parking?
• Can I park on this street for more than 2 hours?
• Which nearby streets might have better parking availability?

More Related Content

What's hot

DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleHakka Labs
 
Koober Machine Learning
Koober Machine LearningKoober Machine Learning
Koober Machine LearningJames Ward
 
Trb 2017 annual_conference_visualization_lightning_talk_rst
Trb 2017 annual_conference_visualization_lightning_talk_rstTrb 2017 annual_conference_visualization_lightning_talk_rst
Trb 2017 annual_conference_visualization_lightning_talk_rstRobert Tung
 
Logistics (University Paper)
Logistics (University Paper)Logistics (University Paper)
Logistics (University Paper)Shokhzod Yakubov
 
Geo-Processing in the Clouds
Geo-Processing in the CloudsGeo-Processing in the Clouds
Geo-Processing in the CloudsRobert Coup
 
NYC Ferry Weekday Service Optimization in the Midst of COVID-19
NYC Ferry Weekday Service Optimization in the Midst of COVID-19NYC Ferry Weekday Service Optimization in the Midst of COVID-19
NYC Ferry Weekday Service Optimization in the Midst of COVID-19Joseph Chow
 
Graph Computing with Apache TinkerPop
Graph Computing with Apache TinkerPopGraph Computing with Apache TinkerPop
Graph Computing with Apache TinkerPopJason Plurad
 
JanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching ForwardJanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching ForwardJason Plurad
 
Heavy Hitters Change Detection
Heavy Hitters Change DetectionHeavy Hitters Change Detection
Heavy Hitters Change Detectionmadhucharis
 
AWS for mega(geo)data
AWS for mega(geo)dataAWS for mega(geo)data
AWS for mega(geo)dataAndrew Zolnai
 
Relay Local State Management: Replacing Redux
Relay Local State Management: Replacing ReduxRelay Local State Management: Replacing Redux
Relay Local State Management: Replacing ReduxJoao Marins
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphJason Plurad
 
Insight_Project_Presentation
Insight_Project_PresentationInsight_Project_Presentation
Insight_Project_Presentationdforthomme
 
Scalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesScalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesGlobus
 
Join semantics in kafka streams
Join semantics in kafka streamsJoin semantics in kafka streams
Join semantics in kafka streamsKnoldus Inc.
 
AWS and Terraform for Disaster Recovery
AWS and Terraform for Disaster RecoveryAWS and Terraform for Disaster Recovery
AWS and Terraform for Disaster RecoveryRandall Thomson
 
Map Analytics in Starcraft II
Map Analytics in Starcraft IIMap Analytics in Starcraft II
Map Analytics in Starcraft IIgy8
 

What's hot (20)

DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
 
5200 Analysis-Airbnb data
5200 Analysis-Airbnb data5200 Analysis-Airbnb data
5200 Analysis-Airbnb data
 
velocityatlas
velocityatlasvelocityatlas
velocityatlas
 
Koober Machine Learning
Koober Machine LearningKoober Machine Learning
Koober Machine Learning
 
Trb 2017 annual_conference_visualization_lightning_talk_rst
Trb 2017 annual_conference_visualization_lightning_talk_rstTrb 2017 annual_conference_visualization_lightning_talk_rst
Trb 2017 annual_conference_visualization_lightning_talk_rst
 
Logistics (University Paper)
Logistics (University Paper)Logistics (University Paper)
Logistics (University Paper)
 
Geo-Processing in the Clouds
Geo-Processing in the CloudsGeo-Processing in the Clouds
Geo-Processing in the Clouds
 
NYC Ferry Weekday Service Optimization in the Midst of COVID-19
NYC Ferry Weekday Service Optimization in the Midst of COVID-19NYC Ferry Weekday Service Optimization in the Midst of COVID-19
NYC Ferry Weekday Service Optimization in the Midst of COVID-19
 
Graph Computing with Apache TinkerPop
Graph Computing with Apache TinkerPopGraph Computing with Apache TinkerPop
Graph Computing with Apache TinkerPop
 
JanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching ForwardJanusGraph: Looking Backward, Reaching Forward
JanusGraph: Looking Backward, Reaching Forward
 
Heavy Hitters Change Detection
Heavy Hitters Change DetectionHeavy Hitters Change Detection
Heavy Hitters Change Detection
 
AWS for mega(geo)data
AWS for mega(geo)dataAWS for mega(geo)data
AWS for mega(geo)data
 
Relay Local State Management: Replacing Redux
Relay Local State Management: Replacing ReduxRelay Local State Management: Replacing Redux
Relay Local State Management: Replacing Redux
 
Community-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraphCommunity-Driven Graphs with JanusGraph
Community-Driven Graphs with JanusGraph
 
Jennifer marche
Jennifer marcheJennifer marche
Jennifer marche
 
Insight_Project_Presentation
Insight_Project_PresentationInsight_Project_Presentation
Insight_Project_Presentation
 
Scalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesScalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized Services
 
Join semantics in kafka streams
Join semantics in kafka streamsJoin semantics in kafka streams
Join semantics in kafka streams
 
AWS and Terraform for Disaster Recovery
AWS and Terraform for Disaster RecoveryAWS and Terraform for Disaster Recovery
AWS and Terraform for Disaster Recovery
 
Map Analytics in Starcraft II
Map Analytics in Starcraft IIMap Analytics in Starcraft II
Map Analytics in Starcraft II
 

Similar to MapMyCab

Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example
Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series ExampleCassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example
Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series ExampleDataStax Academy
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To ProductionJen Aman
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberDanny Yuan
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at UberSudhir Tonse
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015Nathan Halko
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsSamantha Quiñones
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataDatabricks
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...Carol McDonald
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarbalmanme
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit DataWork-Bench
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsAhmad Jawwad
 
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUNye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUInfinIT - Innovationsnetværket for it
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseFlorian Lautenschlager
 
API Testing. Streamline your testing process.
API Testing. Streamline your testing process.API Testing. Streamline your testing process.
API Testing. Streamline your testing process.Andrey Oleynik
 

Similar to MapMyCab (20)

Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example
Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series ExampleCassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example
Cassandra Day Atlanta 2015: Data Modeling In-Depth: A Time Series Example
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To Production
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uber
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at Uber
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time Metrics
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminar
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
 
Presentation
PresentationPresentation
Presentation
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
 
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUNye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
 
API Testing. Streamline your testing process.
API Testing. Streamline your testing process.API Testing. Streamline your testing process.
API Testing. Streamline your testing process.
 

Recently uploaded

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

MapMyCab

  • 2. Motivation • Tool for Data Scientists and Cab dispatchers to analyze (by time of day or day of week): • cab occupancy • miles travelled • pickups and drop-offs • An app for city dwellers to view real-time cab status for unoccupied cabs in a given area
  • 5. Data Flow CabID Lat Long Occ Timestamp yyyy_m_day AvgOcc Pickups Drops Miles MrJob
  • 6. Tables • Hourly data organized by Day of Week • Aggregate metrics stored in the same table for fast retrieval y_m_dow c:0 c:1 c:2 c:3 c:4 … c23 c:Totals Day of Week Hour 0 Attributes hr 1 hr 2 hr 3 hr 4 … hr 23 .. 2008_01_Mon pickups, dropoffs, avg_occ, avg_dist .. .. .. .. .. .. sum(pickups), sum(dropoffs), avg(occ), avg(dist) 2008_01_Tue <pickups, dropoffs, avg_occ, avg_dist> .. .. .. .. .. .. <sum(pickups), sum(dropoffs), avg(occ), avg(dist)> 2008_01_Wed <pickups, dropoffs, avg_occ, avg_dist> .. .. .. .. .. .. <sum(pickups), sum(dropoffs), avg(occ), avg(dist)> Hourly Aggregates by Day of Week
  • 7. API and Lessons Learned • Need to safeguard against corrupt data • Workflow is very important when connecting different tools
  • 8. About Me • Previous Life - Senior Energy Analyst (EnerNOC Inc.). • M.S. Electrical Engineering - North Carolina State University (focus on robotics, control systems and smart grid). • https://github.com/PreetikaKuls • preetika.kulshrestha@gmail.com
  • 9. Pipeline Script Message Broker Real-Time Streaming HDFS HBase UI MrJob Python Script uid, lat, long, timestamp, occ y_m_dow_h, pickups, drops, dist, occ y_m_dow, hour(pickups, drops, dist, occ) Hive
  • 10. Data Item SF Cabs Description GPS coordinates of approx. 500 SF cabs collected over 30 days Format [latitude (float), longitude (float), occupancy (boolean), time (timestamp)] Size ~ 500 MB Throughput 50-100 messages/sec (500 cabs, 5-10 min granularity)
  • 11. Master Data Set Time CabID! Lat | Long | Occupancy CabID! Lat | Long | Occupancy -—> CabID Timestamp! Lat | Long | Occupancy Timestamp! Lat | Long | Occupancy -—> Retrieve all data for a given time frame where latitude and longitude fall with in a specific range Analyze data based on timestamp
  • 13. Features and Example Queries Features! • A system that uses crowdsourcing to automatically generate parking spot information for streets • Parking information overlaid on Google Maps Queries! • Does West Middlefield Road allow for street parking? • Can I park on this street for more than 2 hours? • Which nearby streets might have better parking availability?