SlideShare a Scribd company logo
ETA
Prediction
Challenge
@DataHack
A taxi goes from
Chinatown to Times
Square. How long will
it take to arrive?
Taxi challenge by @Final
In this challenge, you are given data
on taxi rides in New York, containing
information on each ride such as the
start and end points, date, time of
day, distance, etc...
Our purpose is to predict the travel
time (in logarithmic scale) of a ride.
The data is split to train and test sets,
and we can use both general data of
the ride with local data on similar
rides from the train set.
Data : Goal :
Ride Information - Given Dataset
● From / To coordinates (lon, lat)
● Departure timestamp
● Trip distance (road distance)
● Vendor - Taxi company (Found to be not important)
● Passenger count (Found to be not important)
Data Wizard, expert in big
data processing and
production ready ML.
Googler, Wazer and traffic
analytics expert.
Data Ninja, a pure
professional in every data
spect from gathering and
exploring to modeling.
Kaggle Master
An innovative ML expert
and programmer, expert in
feature engineering and
selection techniques.
Kaggle Master
The Team
A group of talented and creative world class professionals in ML and Traffic Analytics
Nir Malbin Gad Benram Seffi Cohen
CDS (Chief Data Scientist)
for the Israeli Defense
Forces, and pioneer in ML
ensemble techniques.
Kaggle Master
Daniel Marcous
How it works
Dataset
Train
(Train model on)
Test
(Make prediction on)
Public score
(30%)
Private score
(70%)
Reminders
The Metric
Mean Square Error (log values) / Variance (constant)
sum((y-y')^2) / sum((y-avg(y))^2)
Notes :
● Interesting part :
𝚺((real-pred)^2) ~ Least Squares
● Log values
○ Mistake weight goes down for longer rides
○ Mistake is determined by error percentile -
10 minutes mistake on a 20 minutes ride
matters the same as
20 minutes mistake on a 40 minutes ride.
{log(a)-log(b)=log(a/b)}
Predicting ETA Using :
1. Ride Information
2. Environment
3. Geography
4. Inferred States
Machine
Learning
Data Shortage
● We Don’t Have
○ Historical speeds
○ Real Time speeds
● Box coordinates to NYC (remove 0.0 etc.)
● Remove very long / far rides (>2h/65km)
● Remove unreasonable speed / time-distance ratio
○ Remove 5% anomalies (Top & Bottom)
Data Cleaning
New feature - Abnormal Ride
Feature Engineering
Datetime based features
● Month start / end
● Day / Day of week / Hour / 15 Minute interval
● Is weekend / business day
● Is work hour (09:00-17:00)
● Is rush hour (morning / afternoon)
● Is holiday
● Coordinate Transformations (Coding directionality)
○ PCA 2 2
○ RBF
● Spheric (geo) distance
● Distance percentile
● Spheric-Trip distance ratio
Location based features
City based features
● NYC Neighbourhood (pair crossing)
● Distance to points of interest (100X2)
○ Schools / Hospitals / Parks etc.
PCA 2
Weather based features
● Temperature
● Events - Rain / Snow etc.
● Humidity
● Wind
● Visibility
● Min / Max / Avg / std etc.
PCA 2
Inferred Traffic based
features
PCA 1
● Assumption :
our data is a representative sample
of the NYC’s - “driving population”
● Crowdedness
○ #rides in X radius
■ 100 / 500 / 1500 / 5000
■ Euclidean / Manhattan
News based
features
1. Crawling NYTimes
2. Topic Modeling
3. Finding topics correlated with ETA
4. Using top10 correlated topics as
features
a. Number of articles on a day for
every topic
Results
Caveats
● Timeseries future mixing
● Not exactly a metric that might be the most important
○ Same weight for positive / negative error
● Crowdedness - assumes that data is a representative sample of the total
car population
○ E.g. 2 times the samples equals 2 times the traffic
● Variance - taken from original validation dataset (constant)
Public Leader Board
Team Score
Team DCountdown 0.150041
Squanchers 0.160468
Noa's stars 0.165869
Aperture Science 0.167958
R-North 0.175308
TAU Deep Learning Lab 0.182602
Mr Terminal 0.193593
MTG 0.282009
SuperFish 0.302637
Summary
Machine Learning Approach
● Black Box (~ish)
● Harder to deploy
● Retrain when system changes
Advantages : Disadvantages :
● No manual tuning
● No complex heuristics
● Optimise different metrics
● Personalisation
● Taking different ideas and their
interactions into account as one

More Related Content

What's hot

PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
Jinwon Lee
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
Sri Ambati
 
Style space analysis paper review !
Style space analysis paper review !Style space analysis paper review !
Style space analysis paper review !
taeseon ryu
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
Sri Ambati
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
Mark Chang
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Knoldus Inc.
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
Jinwon Lee
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
Shunta Saito
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
Sebastian Raschka
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
HARISH R
 
Pr083 Non-local Neural Networks
Pr083 Non-local Neural NetworksPr083 Non-local Neural Networks
Pr083 Non-local Neural Networks
Taeoh Kim
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
amalalhait
 
Decision tree
Decision treeDecision tree
Decision tree
SEMINARGROOT
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
Sunghyon Kyeong
 
Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...
Universitat Politècnica de Catalunya
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Sungchul Kim
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
Rohit Kumar
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Universitat Politècnica de Catalunya
 

What's hot (20)

PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
 
Style space analysis paper review !
Style space analysis paper review !Style space analysis paper review !
Style space analysis paper review !
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
Pr083 Non-local Neural Networks
Pr083 Non-local Neural NetworksPr083 Non-local Neural Networks
Pr083 Non-local Neural Networks
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Decision tree
Decision treeDecision tree
Decision tree
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
 
Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
 

Similar to Prediction of taxi rides ETA

Kharita: Robust Road Map Inference Through Network Alignment of Trajectories
Kharita: Robust Road Map Inference Through Network Alignment of TrajectoriesKharita: Robust Road Map Inference Through Network Alignment of Trajectories
Kharita: Robust Road Map Inference Through Network Alignment of Trajectories
vipyoung
 
Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...
BRTCoE
 
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
Biplav Srivastava
 
Analysing road traffic
Analysing road trafficAnalysing road traffic
Analysing road traffic
SailendraRajuMeruga
 
Theme 3b Users perspective of integrated transit systems
Theme 3b Users perspective of integrated transit systemsTheme 3b Users perspective of integrated transit systems
Theme 3b Users perspective of integrated transit systems
BRTCoE
 
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
csandit
 
Crunching Gigabytes Locally
Crunching Gigabytes LocallyCrunching Gigabytes Locally
Crunching Gigabytes Locally
Dima Korolev
 
Machine Learning Approach to Report Prioritization with an ...
Machine Learning Approach to Report Prioritization with an ...Machine Learning Approach to Report Prioritization with an ...
Machine Learning Approach to Report Prioritization with an ...
butest
 
Cab travel time prediction using ensemble models
Cab travel time prediction using ensemble modelsCab travel time prediction using ensemble models
Cab travel time prediction using ensemble models
Ayan Sengupta
 
Insight_Project_Presentation
Insight_Project_PresentationInsight_Project_Presentation
Insight_Project_Presentation
dforthomme
 
3rd Conference on Sustainable Urban Mobility
3rd Conference on Sustainable Urban Mobility3rd Conference on Sustainable Urban Mobility
3rd Conference on Sustainable Urban Mobility
LIFE GreenYourMove
 
Chapter 3&4
Chapter 3&4Chapter 3&4
Chapter 3&4
EWIT
 
Supply chain logistics : vehicle routing and scheduling
Supply chain logistics : vehicle  routing and  schedulingSupply chain logistics : vehicle  routing and  scheduling
Supply chain logistics : vehicle routing and scheduling
Retigence Technologies
 
Spark Summit EU talk by Javier Aguedes
Spark Summit EU talk by Javier AguedesSpark Summit EU talk by Javier Aguedes
Spark Summit EU talk by Javier Aguedes
Spark Summit
 
IV2021-431-slides.pdf
IV2021-431-slides.pdfIV2021-431-slides.pdf
IV2021-431-slides.pdf
M. Ilhan Akbas
 
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
Yamato OKAMOTO
 
Replacing Manhattan Subway Service with On-demand transportation
Replacing Manhattan Subway Service with On-demand transportationReplacing Manhattan Subway Service with On-demand transportation
Replacing Manhattan Subway Service with On-demand transportation
Christian Moscardi
 
Christian Moscardi Presentation
Christian Moscardi PresentationChristian Moscardi Presentation
Christian Moscardi Presentation
Joseph Chow
 
KTH-Texxi Project 2010
KTH-Texxi Project 2010KTH-Texxi Project 2010
KTH-Texxi Project 2010
Texxi Global
 
Participatory Project
Participatory ProjectParticipatory Project
Participatory Project
#Xiao Zhe#
 

Similar to Prediction of taxi rides ETA (20)

Kharita: Robust Road Map Inference Through Network Alignment of Trajectories
Kharita: Robust Road Map Inference Through Network Alignment of TrajectoriesKharita: Robust Road Map Inference Through Network Alignment of Trajectories
Kharita: Robust Road Map Inference Through Network Alignment of Trajectories
 
Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...
 
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv...
 
Analysing road traffic
Analysing road trafficAnalysing road traffic
Analysing road traffic
 
Theme 3b Users perspective of integrated transit systems
Theme 3b Users perspective of integrated transit systemsTheme 3b Users perspective of integrated transit systems
Theme 3b Users perspective of integrated transit systems
 
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
 
Crunching Gigabytes Locally
Crunching Gigabytes LocallyCrunching Gigabytes Locally
Crunching Gigabytes Locally
 
Machine Learning Approach to Report Prioritization with an ...
Machine Learning Approach to Report Prioritization with an ...Machine Learning Approach to Report Prioritization with an ...
Machine Learning Approach to Report Prioritization with an ...
 
Cab travel time prediction using ensemble models
Cab travel time prediction using ensemble modelsCab travel time prediction using ensemble models
Cab travel time prediction using ensemble models
 
Insight_Project_Presentation
Insight_Project_PresentationInsight_Project_Presentation
Insight_Project_Presentation
 
3rd Conference on Sustainable Urban Mobility
3rd Conference on Sustainable Urban Mobility3rd Conference on Sustainable Urban Mobility
3rd Conference on Sustainable Urban Mobility
 
Chapter 3&4
Chapter 3&4Chapter 3&4
Chapter 3&4
 
Supply chain logistics : vehicle routing and scheduling
Supply chain logistics : vehicle  routing and  schedulingSupply chain logistics : vehicle  routing and  scheduling
Supply chain logistics : vehicle routing and scheduling
 
Spark Summit EU talk by Javier Aguedes
Spark Summit EU talk by Javier AguedesSpark Summit EU talk by Javier Aguedes
Spark Summit EU talk by Javier Aguedes
 
IV2021-431-slides.pdf
IV2021-431-slides.pdfIV2021-431-slides.pdf
IV2021-431-slides.pdf
 
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...
 
Replacing Manhattan Subway Service with On-demand transportation
Replacing Manhattan Subway Service with On-demand transportationReplacing Manhattan Subway Service with On-demand transportation
Replacing Manhattan Subway Service with On-demand transportation
 
Christian Moscardi Presentation
Christian Moscardi PresentationChristian Moscardi Presentation
Christian Moscardi Presentation
 
KTH-Texxi Project 2010
KTH-Texxi Project 2010KTH-Texxi Project 2010
KTH-Texxi Project 2010
 
Participatory Project
Participatory ProjectParticipatory Project
Participatory Project
 

More from Daniel Marcous

Cloud AI Platform Notebooks - Kaggle IL
Cloud AI Platform Notebooks - Kaggle ILCloud AI Platform Notebooks - Kaggle IL
Cloud AI Platform Notebooks - Kaggle IL
Daniel Marcous
 
S2
S2S2
Towards Smart Transportation DSS 2018
Towards Smart Transportation DSS 2018Towards Smart Transportation DSS 2018
Towards Smart Transportation DSS 2018
Daniel Marcous
 
Distributed Databases - Concepts & Architectures
Distributed Databases - Concepts & ArchitecturesDistributed Databases - Concepts & Architectures
Distributed Databases - Concepts & Architectures
Daniel Marcous
 
Distributed K-Betweenness (Spark)
Distributed K-Betweenness (Spark)Distributed K-Betweenness (Spark)
Distributed K-Betweenness (Spark)
Daniel Marcous
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
Daniel Marcous
 
Big Data - Big Insights - Waze @Google
Big Data - Big Insights - Waze @GoogleBig Data - Big Insights - Waze @Google
Big Data - Big Insights - Waze @Google
Daniel Marcous
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
Daniel Marcous
 
Data Visualisation
Data VisualisationData Visualisation
Data Visualisation
Daniel Marcous
 
Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
Daniel Marcous
 

More from Daniel Marcous (10)

Cloud AI Platform Notebooks - Kaggle IL
Cloud AI Platform Notebooks - Kaggle ILCloud AI Platform Notebooks - Kaggle IL
Cloud AI Platform Notebooks - Kaggle IL
 
S2
S2S2
S2
 
Towards Smart Transportation DSS 2018
Towards Smart Transportation DSS 2018Towards Smart Transportation DSS 2018
Towards Smart Transportation DSS 2018
 
Distributed Databases - Concepts & Architectures
Distributed Databases - Concepts & ArchitecturesDistributed Databases - Concepts & Architectures
Distributed Databases - Concepts & Architectures
 
Distributed K-Betweenness (Spark)
Distributed K-Betweenness (Spark)Distributed K-Betweenness (Spark)
Distributed K-Betweenness (Spark)
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Big Data - Big Insights - Waze @Google
Big Data - Big Insights - Waze @GoogleBig Data - Big Insights - Waze @Google
Big Data - Big Insights - Waze @Google
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Data Visualisation
Data VisualisationData Visualisation
Data Visualisation
 
Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
 

Recently uploaded

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 

Recently uploaded (20)

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 

Prediction of taxi rides ETA

  • 2. A taxi goes from Chinatown to Times Square. How long will it take to arrive?
  • 3. Taxi challenge by @Final In this challenge, you are given data on taxi rides in New York, containing information on each ride such as the start and end points, date, time of day, distance, etc... Our purpose is to predict the travel time (in logarithmic scale) of a ride. The data is split to train and test sets, and we can use both general data of the ride with local data on similar rides from the train set. Data : Goal :
  • 4. Ride Information - Given Dataset ● From / To coordinates (lon, lat) ● Departure timestamp ● Trip distance (road distance) ● Vendor - Taxi company (Found to be not important) ● Passenger count (Found to be not important)
  • 5. Data Wizard, expert in big data processing and production ready ML. Googler, Wazer and traffic analytics expert. Data Ninja, a pure professional in every data spect from gathering and exploring to modeling. Kaggle Master An innovative ML expert and programmer, expert in feature engineering and selection techniques. Kaggle Master The Team A group of talented and creative world class professionals in ML and Traffic Analytics Nir Malbin Gad Benram Seffi Cohen CDS (Chief Data Scientist) for the Israeli Defense Forces, and pioneer in ML ensemble techniques. Kaggle Master Daniel Marcous
  • 6. How it works Dataset Train (Train model on) Test (Make prediction on) Public score (30%) Private score (70%)
  • 7. Reminders The Metric Mean Square Error (log values) / Variance (constant) sum((y-y')^2) / sum((y-avg(y))^2) Notes : ● Interesting part : 𝚺((real-pred)^2) ~ Least Squares ● Log values ○ Mistake weight goes down for longer rides ○ Mistake is determined by error percentile - 10 minutes mistake on a 20 minutes ride matters the same as 20 minutes mistake on a 40 minutes ride. {log(a)-log(b)=log(a/b)}
  • 8. Predicting ETA Using : 1. Ride Information 2. Environment 3. Geography 4. Inferred States Machine Learning
  • 9. Data Shortage ● We Don’t Have ○ Historical speeds ○ Real Time speeds
  • 10. ● Box coordinates to NYC (remove 0.0 etc.) ● Remove very long / far rides (>2h/65km) ● Remove unreasonable speed / time-distance ratio ○ Remove 5% anomalies (Top & Bottom) Data Cleaning New feature - Abnormal Ride
  • 12. Datetime based features ● Month start / end ● Day / Day of week / Hour / 15 Minute interval ● Is weekend / business day ● Is work hour (09:00-17:00) ● Is rush hour (morning / afternoon) ● Is holiday
  • 13. ● Coordinate Transformations (Coding directionality) ○ PCA 2 2 ○ RBF ● Spheric (geo) distance ● Distance percentile ● Spheric-Trip distance ratio Location based features
  • 14. City based features ● NYC Neighbourhood (pair crossing) ● Distance to points of interest (100X2) ○ Schools / Hospitals / Parks etc. PCA 2
  • 15. Weather based features ● Temperature ● Events - Rain / Snow etc. ● Humidity ● Wind ● Visibility ● Min / Max / Avg / std etc. PCA 2
  • 16. Inferred Traffic based features PCA 1 ● Assumption : our data is a representative sample of the NYC’s - “driving population” ● Crowdedness ○ #rides in X radius ■ 100 / 500 / 1500 / 5000 ■ Euclidean / Manhattan
  • 17. News based features 1. Crawling NYTimes 2. Topic Modeling 3. Finding topics correlated with ETA 4. Using top10 correlated topics as features a. Number of articles on a day for every topic
  • 19. Caveats ● Timeseries future mixing ● Not exactly a metric that might be the most important ○ Same weight for positive / negative error ● Crowdedness - assumes that data is a representative sample of the total car population ○ E.g. 2 times the samples equals 2 times the traffic ● Variance - taken from original validation dataset (constant)
  • 20. Public Leader Board Team Score Team DCountdown 0.150041 Squanchers 0.160468 Noa's stars 0.165869 Aperture Science 0.167958 R-North 0.175308 TAU Deep Learning Lab 0.182602 Mr Terminal 0.193593 MTG 0.282009 SuperFish 0.302637
  • 22. Machine Learning Approach ● Black Box (~ish) ● Harder to deploy ● Retrain when system changes Advantages : Disadvantages : ● No manual tuning ● No complex heuristics ● Optimise different metrics ● Personalisation ● Taking different ideas and their interactions into account as one