GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
github.com/jsamoocha
jonatansamoocha@godatadriven.com
Exercise Type Detection
Jonatan Samoocha
Number Cruncher
- Formal education in AI
- Doing quantitative and computational work (nowadays called “data science”)
- At GoDataDriven: Big Data / Data Science consultancy
- A team of nearly 20 people with various backgrounds (CS, econometrics, theoretical physics, etc.)
- This project emerged from the GDD Friday
- Why this topic?
Google is our friend?
- (Very) casual runner until three years ago
- Decided, for no apparent reason, to run a 10k race in 45 minutes or faster…
- Lots of “expert opinions”
- Some seem more “reasonable” or “sophisticated” than others
- Chose one aimed at “beginning competitive” runners
GPS Devices Becoming a Commodity
- Bought myself one of those (and a pair of new shoes :-))
- It uploads data to the vendor's web app
“Fancy” analytics by GPS vendors
- They just show history and fail to answer simple questions:
  - What's my level?
  - What's my progress?
  - Is goal X achievable?
  - Am I on track for goal X?
Strava
- Consumes data from most GPS device vendors
- Focus on social features (friends, clubs, leaderboards)
- Same analytics shortcomings: just representations of history
- Huge number of users, an estimated 1.2M active
- Potential for retrospective experiments with n > 1k, maybe even > 100k
- E.g. what is the most efficient way (in time, injury risk, etc.) to train for a sub-45 10k, or for a famous cyclosportive?
Explaining / Predicting Race Performance
- Thought experiment: what could we do if we had all that data?
Exercise data (overview)
Potential aggregates (by period):
- total volume
- ramp-up (rate of increase in volume)
- variance of speed/heart rate/power/etc.
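A minimal sketch of these period aggregates in Python, assuming each activity has already been reduced to a hypothetical `(date, distance_km, avg_speed_kmh)` tuple and taking an ISO calendar week as the period. Ramp-up is computed here as the relative change in volume from one week with activities to the next; that definition, like the record layout, is an assumption rather than something the slide specifies.

```python
from collections import defaultdict
from datetime import date
from statistics import pvariance


def weekly_aggregates(activities):
    """Per ISO week: total distance (volume) and variance of average speed.

    `activities` is an iterable of hypothetical (date, distance_km, avg_speed_kmh) tuples.
    """
    weeks = defaultdict(list)
    for day, distance_km, avg_speed in activities:
        year, week, _ = day.isocalendar()
        weeks[(year, week)].append((distance_km, avg_speed))
    return {
        key: {
            "volume_km": sum(d for d, _ in rows),
            "speed_variance": pvariance([s for _, s in rows]),
        }
        for key, rows in sorted(weeks.items())
    }


def ramp_up(weekly):
    """Relative volume change between consecutive weeks that have activities.

    Weeks without any activity are absent from `weekly`, so they are skipped.
    """
    keys = sorted(weekly)
    return {
        cur: (weekly[cur]["volume_km"] - weekly[prev]["volume_km"]) / weekly[prev]["volume_km"]
        for prev, cur in zip(keys, keys[1:])
    }


# Made-up example data, not from the talk
rides = [
    (date(2015, 8, 3), 10.0, 11.0),
    (date(2015, 8, 5), 8.0, 12.0),
    (date(2015, 8, 11), 21.0, 11.5),
]
weekly = weekly_aggregates(rides)
print(weekly)
print(ramp_up(weekly))
```

Weeks without any activity are skipped rather than counted as zero volume, which keeps the division safe but hides a comeback after a break; a real pipeline would have to decide how to treat such gaps.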
Exercise data (streams)
Potential aggregates (by exercise):
- within-activity variance
- distribution of speed/power/heart rate/etc.
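A sketch of the per-exercise aggregates for a single stream, such as power in watts: the spread as a population standard deviation plus the coefficient of variation (standard deviation divided by mean), and the distribution as a fixed-width histogram. The function names, the bin width and the use of the coefficient of variation are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def stream_spread(samples):
    """Within-activity mean, population standard deviation and coefficient of variation."""
    mu = mean(samples)
    sd = pstdev(samples)
    return {"mean": mu, "std": sd, "cv": sd / mu if mu else float("nan")}


def stream_histogram(samples, bin_width):
    """Distribution of one stream: sample counts per bin [k*w, (k+1)*w), keyed by bin start."""
    counts = Counter(int(s // bin_width) for s in samples)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Made-up power samples (watts) from one ride
power = [100, 150, 200, 250, 300]
print(stream_spread(power))
print(stream_histogram(power, bin_width=100))
```

The coefficient of variation makes the spread comparable between an easy ride and a hard one, whose absolute power levels differ.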
Time series analysis on streams
Allows detection of:
- blocks
- repetitions
- structure
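One simple way to find blocks and repetitions, sketched under the assumption of a speed stream sampled at a fixed rate: threshold each sample into "hard" or "easy", collapse consecutive samples into blocks, and treat runs shorter than a minimum length as noise. This run-length segmentation is a deliberately crude stand-in, since the talk does not say which time series method it uses; the threshold and `min_samples` values are hypothetical.

```python
from itertools import groupby


def detect_blocks(speeds, threshold, min_samples=1):
    """Segment a speed stream into 'hard' / 'easy' blocks.

    Returns (label, start_index, length) tuples. Runs shorter than
    `min_samples` are treated as noise and absorbed by the surrounding block.
    """
    runs, index = [], 0
    for is_hard, run in groupby(speeds, key=lambda v: v > threshold):
        length = len(list(run))
        runs.append(("hard" if is_hard else "easy", index, length))
        index += length

    blocks = []
    for label, start, length in runs:
        if length < min_samples:
            continue  # too short: ignore this run
        if blocks and blocks[-1][0] == label:
            # same label as the previous kept block: extend it over the ignored gap
            _, prev_start, _ = blocks[-1]
            blocks[-1] = (label, prev_start, start + length - prev_start)
        else:
            blocks.append((label, start, length))
    return blocks


def count_repetitions(blocks):
    """Number of 'hard' blocks, e.g. the repetitions of an interval session."""
    return sum(1 for label, _, _ in blocks if label == "hard")


# Made-up speed stream (km/h): two hard efforts and a single-sample GPS spike (25)
stream = [10, 10, 10, 16, 16, 16, 10, 10, 25, 10, 10, 16, 16, 16, 10, 10]
blocks = detect_blocks(stream, threshold=14, min_samples=2)
print(blocks, count_repetitions(blocks))
```

With `min_samples=2` the one-sample spike is absorbed into the recovery block, so the session is read as two repetitions rather than three.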
Time series analysis, GPS streams
Allows detection of:
- corners
- acceleration/deceleration
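A sketch of acceleration/deceleration detection, assuming the stream already provides strictly increasing timestamps in seconds and speeds in m/s: take finite differences and label each interval against a hypothetical threshold of 0.5 m/s². GPS speed is noisy, so in practice some smoothing before differencing would probably be needed.

```python
def accelerations(times_s, speeds_ms):
    """Finite-difference acceleration (m/s^2) for each interval between consecutive samples."""
    pairs = list(zip(times_s, speeds_ms))
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]


def label_intervals(accels, threshold=0.5):
    """Label each interval 'accelerating', 'decelerating' or 'steady' (threshold is hypothetical)."""
    labels = []
    for a in accels:
        if a > threshold:
            labels.append("accelerating")
        elif a < -threshold:
            labels.append("decelerating")
        else:
            labels.append("steady")
    return labels


# Made-up 1 Hz stream: speed up, hold, then brake hard for a corner
times = [0, 1, 2, 3, 4]
speeds = [5.0, 6.0, 7.0, 7.0, 5.0]
acc = accelerations(times, speeds)
print(acc, label_intervals(acc))
```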
Detecting Races?
A precondition for predicting race performance…
The Plan
• Collect data from Strava
• Transform raw (GPS) streams into concepts at a higher level of abstraction:
• Blocks and repetitions (= exercise types)
• Aggregates by period of exercise types
• Aggregates by period of other exercise properties
• Use machine learning to relate these to race performance
Personalized training advice for the masses!
Caveat: probably less suitable for elite athletes seeking marginal improvements
Need to deal with some “practicalities”
Getting the data
- REST API: all data is available
- Unfortunately, there’s no simple “download all” button or endpoint
- Need an access token for each user
- Host a web app that uses OAuth to get access to Strava
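A sketch of the two practical pieces, with endpoints and parameter names assumed to match Strava's current API v3 documentation (scope names in particular have changed since 2015, so treat them as assumptions): the URL that sends a data donor through Strava's OAuth authorization page, and a loop that works around the missing “download all” endpoint by paging through `/athlete/activities` until an empty page comes back. Exchanging the authorization code for an access token, refreshing tokens and respecting API rate limits are all left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoints assumed from Strava API v3 documentation; may differ from what was used in 2015
AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
ACTIVITIES_URL = "https://www.strava.com/api/v3/athlete/activities"


def authorization_url(client_id, redirect_uri, scope="activity:read_all"):
    """Where to send a data donor so they can grant our web app access (OAuth 2 code flow)."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": scope,
    })
    return f"{AUTHORIZE_URL}?{query}"


def http_get_json(url, token):
    """GET a URL with the donor's access token and decode the JSON body."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)


def fetch_all_activities(token, per_page=200, get_json=http_get_json):
    """No 'download all' endpoint, so page through the list until a page comes back empty.

    `get_json` is injectable so the paging logic can be exercised offline.
    The page size of 200 is a hypothetical choice.
    """
    activities, page = [], 1
    while True:
        url = f"{ACTIVITIES_URL}?{urlencode({'page': page, 'per_page': per_page})}"
        batch = get_json(url, token)
        if not batch:
            return activities
        activities.extend(batch)
        page += 1
```

Injecting `get_json` keeps the HTTP call out of the paging logic, so the loop can be checked with canned pages before any donor's token is used.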
25 “Data Donors”
• Recruited via Facebook
• Diverse group
• From hardly active to elite level
• Running, cycling, duathlon, triathlon
• 9434 activities (as of 2015-08-16)
Curse of Dimensionality
• Controlling some measure (e.g. race time) for gender, age and weight
• Dividing age and weight into 10 bins each gives 2 × 10 × 10 = 200 bins
• Assume 1000 users, uniformly distributed
• That gives n = 5 per bin :-(
• Need many more users!
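The arithmetic made explicit: the number of cells is the product of the bins per control variable (2 genders × 10 age bins × 10 weight bins = 200), and every extra variable multiplies it again. The extra 10-bin variable and the target of 30 users per cell are hypothetical numbers chosen for illustration.

```python
from math import prod


def cells_and_users_per_cell(n_users, bins_per_dimension):
    """Grid size and expected users per cell, assuming users are spread uniformly."""
    cells = prod(bins_per_dimension)
    return cells, n_users / cells


def users_needed(target_per_cell, bins_per_dimension):
    """Users required for a target number of users per cell (same uniformity assumption)."""
    return target_per_cell * prod(bins_per_dimension)


print(cells_and_users_per_cell(1000, [2, 10, 10]))      # gender x age x weight: (200, 5.0)
print(cells_and_users_per_cell(1000, [2, 10, 10, 10]))  # one extra 10-bin variable: (2000, 0.5)
print(users_needed(30, [2, 10, 10]))                    # 6000
```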
Level
Progress
Activity Feedback
Level by sport should be easy…
Wait, where’s “mountain bike” as an activity type?
- Some cyclists ride on the road as well as off-road
- This complicates the computation of “level”!
Exercise Type Detection
Challenge 1: Road vs. MTB
- Also using machine learning
- Supervised classification
astroml.org
Validation method:
- Split data into a training set and a test set
- Train the model on the training set
- Predict on the test set
- Compare predicted labels with actual labels
- Quantify the error (accuracy, F1 score, kappa, etc.)
- Interpret the error!!!
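The validation loop sketched with the standard library only. A one-feature decision stump on average speed stands in for the real model (the talk uses Gradient Boosting on three features), and the rows, the 30% test fraction and the seed are made up for illustration. Cohen's kappa is included because it corrects accuracy for chance agreement, which matters when one class dominates.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labelled rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_stump(train):
    """'Train' a one-feature model: the speed threshold with the best training accuracy.

    Rides faster than the threshold are predicted 'road', the rest 'offroad'.
    """
    best_correct, best_threshold = -1, None
    for threshold in sorted(speed for speed, _ in train):
        correct = sum((speed > threshold) == (label == "road") for speed, label in train)
        if correct > best_correct:
            best_correct, best_threshold = correct, threshold
    return best_threshold


def predict(threshold, speeds):
    return ["road" if s > threshold else "offroad" for s in speeds]


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohens_kappa(actual, predicted):
    """Agreement corrected for chance: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    p_observed = accuracy(actual, predicted)
    p_expected = sum(actual.count(c) / n * predicted.count(c) / n for c in labels)
    if p_expected == 1:
        return float("nan")  # only one class present: kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up (average speed km/h, label) rows
rows = [(12, "offroad"), (14, "offroad"), (17, "offroad"), (15, "offroad"),
        (26, "road"), (29, "road"), (24, "road"), (31, "road"), (22, "road"), (16, "road")]
train, test = train_test_split(rows, test_fraction=0.3)
threshold = fit_stump(train)
actual = [label for _, label in test]
predicted = predict(threshold, [speed for speed, _ in test])
print(threshold, accuracy(actual, predicted), cohens_kappa(actual, predicted))
```

With a test set this small the scores swing a lot between seeds, which is one more reason to interpret the error rather than just report it.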
First things first: labels
- Supervised learning needs examples
- Strava didn’t label our rides!
- Manually added “road”/“offroad” labels to ~500 ride activities
Data overview…
No nice bimodal distributions, so trying box plots…
- Speed separates the classes reasonably well
- Lots of overlap, so we probably need better features for good prediction
The 3-feature model
                 precision    recall  f1-score   support
    offroad           0.88      0.97      0.92        30
    road              0.98      0.94      0.96        65
    avg / total       0.95      0.95      0.95        95
- Gradient Boosting algorithm
- Precision: fraction of rides predicted as X that actually are X
- Recall: fraction of actual X rides that are predicted as X
- F1 score: harmonic mean of precision and recall
- Seems OK, but beware of the uneven distribution of road vs. offroad rides (65 vs. 30 in the test set)
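To make the precision and recall definitions concrete, this sketch recomputes the 3-feature table from a confusion matrix. The counts are not on the slide: they are back-calculated from the rounded figures (29 of 30 offroad rides and 61 of 65 road rides classified correctly, with 4 road rides predicted as offroad, are the only counts consistent with them). The last lines show why the uneven split deserves caution: always predicting “road” already scores about 0.68 accuracy.

```python
def per_class_scores(confusion):
    """Precision, recall, F1 and support per class from confusion[actual][predicted] counts."""
    labels = list(confusion)
    scores = {}
    for c in labels:
        tp = confusion[c][c]
        predicted_as_c = sum(confusion[a][c] for a in labels)  # column total
        support = sum(confusion[c].values())                   # row total
        precision = tp / predicted_as_c
        recall = tp / support
        f1 = 2 * precision * recall / (precision + recall)      # harmonic mean
        scores[c] = (precision, recall, f1, support)
    return scores


# Counts back-calculated from the rounded 3-feature table (not given on the slide):
# rows = actual class, columns = predicted class
confusion = {
    "offroad": {"offroad": 29, "road": 1},
    "road": {"offroad": 4, "road": 61},
}

scores = per_class_scores(confusion)
for label, (p, r, f1, support) in scores.items():
    print(f"{label:>8}  {p:.2f}  {r:.2f}  {f1:.2f}  {support}")

total = sum(s[3] for s in scores.values())
correct = sum(confusion[c][c] for c in confusion)
majority = max(s[3] for s in scores.values())
print(f"accuracy {correct / total:.2f}, always-'road' baseline {majority / total:.2f}")
```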
Refining raw data
Extract new features
How to quantify the difference? [figure: two rides compared side by side]
- It is easily detectable by the human eye
- It needs to be quantified for an algorithm
GPS stream = sequence of vectors
- Need to transform lat/long to x/y
- Each point in time is a vector (dx, dy) from the previous point in time
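One way to do the lat/long to x/y step, assuming an equirectangular (local flat-Earth) approximation with a mean Earth radius of 6,371 km; over the few metres between consecutive GPS samples the distortion is negligible. Each consecutive pair of points becomes one (dx, dy) step in metres, east and north. The slide does not say which projection was actually used.

```python
from math import cos, radians

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def to_step_vectors(latlngs):
    """Convert a GPS stream of (lat, lng) in degrees into (dx, dy) steps in metres.

    Local equirectangular approximation: dx points east, dy points north.
    """
    steps = []
    for (lat0, lng0), (lat1, lng1) in zip(latlngs, latlngs[1:]):
        mean_lat = radians((lat0 + lat1) / 2)
        dx = EARTH_RADIUS_M * radians(lng1 - lng0) * cos(mean_lat)
        dy = EARTH_RADIUS_M * radians(lat1 - lat0)
        steps.append((dx, dy))
    return steps


# Made-up points near Amsterdam: 0.001 degree north, then 0.001 degree east
v = to_step_vectors([(52.370, 4.890), (52.371, 4.890), (52.371, 4.891)])
print(v)
```

The cosine of the latitude is why an eastward step of 0.001 degree is shorter than a northward one at this latitude.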
There’s a cosine for vectors:
cos = 1: vectors point in the same direction
cos = 0: vectors are orthogonal
cos = -1: vectors point in opposite directions
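Written out for two consecutive steps v_{t-1} = (dx_{t-1}, dy_{t-1}) and v_t = (dx_t, dy_t) (notation introduced here for illustration), the cosine is the dot product divided by the product of the lengths. For example, steps (3, 4) and (4, 3) give cos = (12 + 12) / (5 · 5) = 0.96, a gentle bend. It is undefined when either step has zero length, e.g. while standing still.

```latex
\cos\theta_t
  = \frac{\mathbf{v}_{t-1} \cdot \mathbf{v}_t}{\lVert \mathbf{v}_{t-1} \rVert \, \lVert \mathbf{v}_t \rVert}
  = \frac{dx_{t-1}\,dx_t + dy_{t-1}\,dy_t}
         {\sqrt{dx_{t-1}^2 + dy_{t-1}^2}\;\sqrt{dx_t^2 + dy_t^2}}
```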
“Technicality” of the course
• Time series of (dx, dy) vectors
• Time series of the cosine between consecutive vectors (range [-1, 1])
• Transform to [0, 1]
• 0 = straight ahead
• 1 = 180-degree turn
• Take the sum
• Divide by distance
• Result: “corners per km”
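A sketch of the “corners per km” recipe, assuming the input is the stream of (dx, dy) steps in metres. The mapping (1 - cos) / 2 is one natural way to send [-1, 1] to [0, 1] with 0 for straight ahead and 1 for a 180-degree turn, but the slide does not give the exact formula. Zero-length steps (standing still) are dropped because their direction is undefined.

```python
from math import hypot


def corners_per_km(steps):
    """'Technicality' of a course from (dx, dy) steps in metres.

    For consecutive steps, map the cosine of the turning angle from [-1, 1]
    to a turn score in [0, 1] (0 = straight ahead, 1 = 180-degree turn),
    sum the scores and divide by the distance in km.
    """
    # Zero-length steps (standing still) have no direction: drop them.
    moving = [(dx, dy) for dx, dy in steps if hypot(dx, dy) > 0]
    turn_sum = 0.0
    for (ax, ay), (bx, by) in zip(moving, moving[1:]):
        cos_angle = (ax * bx + ay * by) / (hypot(ax, ay) * hypot(bx, by))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding just outside [-1, 1]
        turn_sum += (1 - cos_angle) / 2
    distance_km = sum(hypot(dx, dy) for dx, dy in moving) / 1000
    return turn_sum / distance_km if distance_km > 0 else 0.0


straight = [(10.0, 0.0)] * 100                                         # 1 km, no turns
square = [(100.0, 0.0), (0.0, 100.0), (-100.0, 0.0), (0.0, -100.0)]   # three 90-degree turns in 0.4 km
print(corners_per_km(straight), corners_per_km(square))
```

Because the score adds one term per pair of samples, it depends on the recording interval; resampling to fixed distance steps first might make rides recorded at different rates more comparable.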
Adding “technicality” as a feature
Many outliers!
- GPS glitches (in the mountains)
- Track rides
The 4-feature model
                 precision    recall  f1-score   support
    offroad           0.97      0.97      0.97        30
    road              0.98      0.98      0.98        65
    avg / total       0.98      0.98      0.98        95
The “corners per km” feature adds value :-)
Lessons Learned
Never, ever trust human input
- Subjective
- Error-prone
Optimizing the model
• Good features add predictive accuracy
• So does a sufficient number of (correctly) labelled observations
• Preferably “borderline” cases
• Model selection and tuning have a marginal effect compared to the above
Near Future
• Creating an app with the “bait” features
• Detecting races
• Detecting interval exercise types
• Asking users for feedback on predictions (meta-learning)
• Open-sourcing part of the code
We’re hiring / Questions? / Thank you!

Exercise type detection

  • 1. GoDataDriven PROUDLY PART OF THE XEBIA GROUP github.com/jsamoocha jonatansamoocha@godatadriven.com Exercise Type Detection Jonatan Samoocha Number Cruncher - Formal education in AI -Doing quantitative & computational stuff (nowadays called “data science”) -@ GoDataDriven - doing Big Data / Data Science consultancy -Nearly 20-person strong team, various backgrounds (CS, econometrics, theoretical physics, etc) -This project emerged from the GDD Friday -Why this topic?
  • 2. Google is our friend? - (Very) casual runner until 3 years ago - Decided, for no apparent reason, to run a 10k race in 45 minutes or faster… - Lots of “expert opinions” - Some seem more “reasonable” or “sophisticated” than others - Chose one, aimed at “beginning competitive” runners
  • 3. GPS devices becoming a commodity - Bought myself one (and a pair of new shoes :-)) - It uploads data to the vendor's web app
  • 4. “Fancy” analytics by GPS vendors - They just show history and fail to answer simple questions: - What's my level? - What's my progress? - Is goal X achievable? - Am I on track for goal X?
  • 5. Strava - Consumes data from most GPS device vendors - Focus on social features (friends, clubs, leaderboards) - Same analytics shortcomings: just representations of history - Huge user base, an estimated 1.2M active users - Potential for retrospective experiments with n > 1k (possibly > 100k) - E.g. what is the most efficient way (in time, injury risk, etc.) to train for a sub-45 10k or a famous cyclosportive?
  • 6. Explaining / Predicting Race Performance - Thought experiment: what could we do if we had all that data?
  • 7. Exercise data (overview) Potential aggregates by period: - total volume - ramp-up (rate of increase in volume) - variance of speed/heart rate/power/etc.
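The per-period aggregates listed above can be sketched in plain Python. This is a minimal illustration, not the author's code: the activity record layout `(date, distance_km, avg_speed_kmh)`, the choice of ISO weeks as the period, and the helper names `weekly_aggregates` and `ramp_up` are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import pvariance
from typing import Optional

# Hypothetical activity record: (date, distance in km, average speed in km/h).
Activity = tuple[date, float, float]


def weekly_aggregates(activities: list[Activity]) -> dict[tuple[int, int], dict[str, float]]:
    """Group activities by ISO (year, week) and compute per-period aggregates."""
    by_week: dict[tuple[int, int], list[Activity]] = defaultdict(list)
    for activity in activities:
        iso_year, iso_week, _ = activity[0].isocalendar()
        by_week[(iso_year, iso_week)].append(activity)

    result = {}
    for week, acts in sorted(by_week.items()):
        result[week] = {
            # total volume: summed distance in the period
            "volume_km": sum(a[1] for a in acts),
            # variance of average speed across the period's activities
            # (population variance; 0.0 when there is a single activity)
            "speed_variance": pvariance([a[2] for a in acts]),
        }
    return result


def ramp_up(volumes: list[float]) -> list[Optional[float]]:
    """Relative change in volume from one period to the next (0.1 means +10%).

    Returns None where the previous period had zero volume.
    """
    return [(cur - prev) / prev if prev else None for prev, cur in zip(volumes, volumes[1:])]
```

For example, two runs in ISO week 32 of 2015 (10 km and 8 km) and one 20 km run in week 33 give weekly volumes of 18 and 20 km, i.e. a ramp-up of about +11%.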
  • 8. Exercise data (streams) Potential aggregates by exercise: - within-activity variance - distribution of speed/power/heart rate/etc.
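Per-exercise aggregates work on a single activity's stream instead. A minimal sketch, assuming a speed stream sampled at a fixed rate and using an equal-width histogram plus quartiles to describe the "distribution"; the function names `stream_summary` and `histogram` are hypothetical.

```python
from statistics import mean, pvariance, quantiles


def histogram(values: list[float], n_bins: int) -> list[int]:
    """Equal-width histogram between the stream's min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant stream: avoid zero-width bins
    counts = [0] * n_bins
    for v in values:
        # the maximum value lands exactly on the upper edge; clamp it into the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def stream_summary(speeds: list[float], n_bins: int = 5) -> dict:
    """Per-activity aggregates of one speed stream (needs at least 2 samples)."""
    return {
        "mean": mean(speeds),
        "variance": pvariance(speeds),        # within-activity variance
        "quartiles": quantiles(speeds, n=4),  # 3 cut points: Q1, median, Q3
        "histogram": histogram(speeds, n_bins),
    }
```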
  • 9. Time series analysis on streams Allows detection of: - blocks - repetitions - structure
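One simple way to get at blocks and repetitions is to threshold the stream and run-length encode it. This is an illustrative sketch, not the method used in the talk: the fixed speed threshold, the 'fast'/'slow' labels, the median-length rule for repetitions and the names `find_blocks` and `count_repetitions` are all assumptions.

```python
from itertools import groupby
from statistics import median


def find_blocks(speeds: list[float], threshold: float) -> list[tuple[str, int, int]]:
    """Run-length encode a fixed-rate speed stream into (label, start_index, length) blocks.

    A sample is labelled 'fast' if speed >= threshold, otherwise 'slow'.
    """
    blocks = []
    start = 0
    for label, group in groupby(speeds, key=lambda v: "fast" if v >= threshold else "slow"):
        length = sum(1 for _ in group)
        blocks.append((label, start, length))
        start += length
    return blocks


def count_repetitions(blocks: list[tuple[str, int, int]], min_length: int = 5,
                      tolerance: float = 0.2) -> int:
    """Count 'fast' blocks whose length is within +/- tolerance of the median fast-block length."""
    fast = [length for label, _, length in blocks if label == "fast" and length >= min_length]
    if len(fast) < 2:  # a single effort is not a repetition
        return 0
    typical = median(fast)
    return sum(1 for length in fast if abs(length - typical) <= tolerance * typical)
```

Real GPS speed streams are noisy, so in practice some smoothing before thresholding would likely be needed; otherwise single noisy samples split one block into several.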
  • 10. Time series analysis, GPS streams Allows detection of: - corners - acceleration/deceleration
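Acceleration and deceleration follow from finite differences of the speed stream. A minimal sketch, assuming timestamps in seconds, speeds in m/s and a hypothetical cut-off of 1 m/s² for a "hard" change; the function names are illustrative only.

```python
def accelerations(times_s: list[float], speeds_ms: list[float]) -> list[float]:
    """Finite-difference acceleration (m/s^2) between consecutive samples.

    Assumes strictly increasing timestamps.
    """
    return [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(times_s, times_s[1:], speeds_ms, speeds_ms[1:])
    ]


def count_hard_changes(acc: list[float], threshold: float = 1.0) -> dict[str, int]:
    """Count samples with |acceleration| above a (hypothetical) threshold."""
    return {
        "accelerating": sum(1 for a in acc if a > threshold),
        "decelerating": sum(1 for a in acc if a < -threshold),
    }
```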
  • 11. Detecting races? A precondition for predicting race performance…
  • 12. The plan • Collect data from Strava • Transform raw (GPS) streams into concepts at a higher level of abstraction: • blocks and repetitions, which together define exercise types • aggregates of exercise types by period • aggregates of other exercise properties by period • Use machine learning to relate these to race performance
  • 13. Personalized training advice for the masses! Caveat: probably less suitable for elite athletes seeking marginal improvements
  • 14. Need to deal with some “practicalities”
  • 15. Getting the data - REST API: all data is available - Unfortunately, there’s no simple “download all” button or endpoint - An access token is needed for each user - Host a web app that lets users authorize access via OAuth with Strava
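Without a "download all" endpoint, collecting a donor's history means paging through the activity list with that donor's token. A hedged sketch using only the standard library: it assumes Strava API v3's list-athlete-activities endpoint with `page`/`per_page` query parameters and a Bearer token obtained earlier through the OAuth flow (not shown); verify both against the current Strava API reference. Rate limiting, token refresh and per-activity stream requests are not handled, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Strava API v3 "list athlete activities" endpoint (check the current API docs).
ACTIVITIES_URL = "https://www.strava.com/api/v3/athlete/activities"


def fetch_page(token: str, page: int, per_page: int) -> list[dict]:
    """Request one page of activity summaries for the user who granted `token`."""
    url = f"{ACTIVITIES_URL}?page={page}&per_page={per_page}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_activities(
    token: str,
    per_page: int = 100,
    fetch: Callable[[str, int, int], list[dict]] = fetch_page,
) -> Iterator[dict]:
    """No 'download all' endpoint: keep requesting pages until an empty one comes back."""
    page = 1
    while True:
        batch = fetch(token, page, per_page)
        if not batch:
            return
        yield from batch
        page += 1
```

The `fetch` parameter exists so the paging loop can be exercised with a fake page source instead of live HTTP calls.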
  • 16. 25 “Data Donors” • Recruited via Facebook • Diverse group • From hardly active to elite level • Running, cycling, duathlon, triathlon • 9,434 activities (as of 2015-08-16)
  • 17. Curse of dimensionality • Controlling some measure (e.g. race time) for gender, age and weight • Dividing age and weight into 10 bins each gives 2 × 10 × 10 = 200 bins • Assuming 1,000 users uniformly distributed over the bins • gives n = 5 per bin :-( • Need many more users!
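The bin arithmetic above generalizes to any number of control variables: the cell count is the product of the bins per dimension. A small sketch under the slide's uniformity assumption; the target of 30 users per bin in the usage example is a hypothetical choice, not from the talk.

```python
from math import prod


def users_per_bin(n_users: int, bins_per_dimension: list[int]) -> tuple[int, float]:
    """Total number of cells and expected users per cell under a uniform spread."""
    total_bins = prod(bins_per_dimension)
    return total_bins, n_users / total_bins


def users_needed(target_per_bin: int, bins_per_dimension: list[int]) -> int:
    """Users required to reach a target count per cell, still assuming uniformity."""
    return target_per_bin * prod(bins_per_dimension)
```

With gender (2) × age (10) × weight (10), `users_per_bin(1000, [2, 10, 10])` gives 200 bins and 5.0 users per bin; reaching 30 per bin would take `users_needed(30, [2, 10, 10])`, i.e. 6,000 users, and adding one more 10-bin variable drops 1,000 users to 0.5 per bin.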
  • 19. Level by sport should be easy… Wait, where’s “mountain bike” as an activity type? - Some cyclists ride on the road as well as in the dirt - This complicates the computation of “level”!
  • 21. Challenge 1: Road vs. MTB - Also solved with machine learning - Supervised classification
  • 22. Validation method (image credit: astroml.org): - Split the data into a train set and a test set - Train the model on the train set - Predict on the test set - Compare predicted with actual labels - Quantify the error (accuracy, F1 score, kappa, etc.) - Interpret the error!!!
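The validation steps above can be sketched end to end with the standard library. This is an illustration only: the single-feature speed-threshold "stump" stands in for a real classifier (the talk uses Gradient Boosting), and all function names are hypothetical. Accuracy and Cohen's kappa are computed by hand; interpreting the error remains a human step.

```python
import random


def train_test_split(rows: list, test_fraction: float = 0.25, seed: int = 42) -> tuple[list, list]:
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_speed_stump(train: list[tuple[float, str]]) -> float:
    """Toy model: the lowest speed threshold that best separates 'road' (>=) from 'offroad'."""
    def n_correct(threshold: float) -> int:
        return sum(("road" if speed >= threshold else "offroad") == label for speed, label in train)
    # max() returns the first maximal candidate, i.e. the lowest best threshold
    return max(sorted({speed for speed, _ in train}), key=n_correct)


def predict(threshold: float, speeds: list[float]) -> list[str]:
    return ["road" if s >= threshold else "offroad" for s in speeds]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: list[str], y_pred: list[str]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

Kappa matters here because the classes are uneven: a model that agrees with the labels mostly by predicting the majority class gets a high accuracy but a low kappa.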
  • 23. First things first: labels - Supervised learning needs examples - Strava didn’t label our rides! - Manually added “road”/“offroad” labels to ~500 ride activities
  • 24. Data overview… No nice bimodal distributions; trying box plots…
  • 25. - Speed separates the classes reasonably well - Lots of overlap; better features are probably needed for good prediction
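The box-plot comparison can also be done numerically: compute the five numbers a box plot draws for each class and check whether the interquartile boxes overlap. A minimal sketch; the "inclusive" quartile method is an assumption (plotting libraries may compute quartiles slightly differently), and the function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """min, Q1, median, Q3, max: the numbers a box plot draws."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def boxes_overlap(a: list[float], b: list[float]) -> bool:
    """True if the interquartile boxes of two groups overlap."""
    _, a_q1, _, a_q3, _ = five_number_summary(a)
    _, b_q1, _, b_q3, _ = five_number_summary(b)
    return a_q1 <= b_q3 and b_q1 <= a_q3
```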
  • 26. The 3-feature model
                   precision  recall  f1-score  support
    offroad             0.88    0.97      0.92       30
    road                0.98    0.94      0.96       65
    avg / total         0.95    0.95      0.95       95
    - Gradient Boosting algorithm
    - Precision: fraction of rides predicted as X that actually are X
    - Recall: fraction of actual X rides predicted as X
    - F1 score: harmonic mean of precision and recall
    - Seems OK, but beware of the uneven road/offroad distribution
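The metric definitions above can be checked against the table. The slide does not report a confusion matrix; the counts below are derived from the rounded recall values and supports (29 of 30 offroad and 61 of 65 road rides correct are the only integer counts that round to 0.97 and 0.94), and they reproduce every other cell. The "avg / total" row is a support-weighted average. Function names are hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def support_weighted_average(per_class: dict[str, tuple[float, float, float, int]]) -> tuple[float, ...]:
    """'avg / total' row: each metric averaged with class support as weight."""
    total = sum(metrics[3] for metrics in per_class.values())
    return tuple(
        sum(metrics[i] * metrics[3] for metrics in per_class.values()) / total for i in range(3)
    )


# Counts implied by the rounded 3-feature table (derived, not reported on the slide).
counts = {"offroad": (29, 4, 1), "road": (61, 1, 4)}  # (tp, fp, fn) per class
support = {"offroad": 30, "road": 65}
report = {label: (*precision_recall_f1(*c), support[label]) for label, c in counts.items()}
```

The uneven distribution matters because always predicting "road" would already score 65/95 ≈ 0.68 accuracy.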
  • 28. How to quantify this difference? Easily detectable by the human eye, but it needs to be quantified for an algorithm
  • 29. GPS stream = sequence of vectors - Need to transform lat/long to x/y - Each point in time is represented by the vector (dx, dy) from the previous point in time
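The slides do not say which lat/long-to-x/y transform was used. A minimal sketch, assuming a local equirectangular approximation (longitude scaled by the cosine of the mean latitude), which is adequate over the extent of a single ride but not for very long distances or near the poles; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_xy(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Project (lat, lon) in degrees to local x/y metres (equirectangular approximation).

    Longitude is scaled by cos(mean latitude) so that east-west distances are not inflated.
    """
    lat0 = math.radians(sum(lat for lat, _ in points) / len(points))
    return [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0), EARTH_RADIUS_M * math.radians(lat))
        for lat, lon in points
    ]


def displacement_vectors(xy: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """(dx, dy) from each point to the next: one vector per time step."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(xy, xy[1:])]
```

Near the equator, a step of 0.001 degrees in either direction corresponds to roughly 111 m.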
  • 30. There’s a cosine for vectors - cos = 1: vectors point in the same direction - cos = 0: vectors are orthogonal - cos = -1: vectors point in opposite directions
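For reference, the cosine between two displacement vectors u = (u_x, u_y) and v = (v_x, v_y) is their dot product divided by the product of their lengths (the standard definition; the symbol names are ours):

```latex
\cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
           = \frac{u_x v_x + u_y v_y}{\sqrt{u_x^2 + u_y^2}\,\sqrt{v_x^2 + v_y^2}}
```

For example, u = (3, 4) and v = (4, 3) give (12 + 12)/(5 · 5) = 24/25 = 0.96, a slight bend, while u = (1, 0) and v = (0, 1) give 0, a right-angle turn.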
  • 31. “Technicality” of the course • time series of (dx, dy) vectors • time series of cosines between consecutive vectors (range [-1, 1]) • transform to [0, 1]: • 0 = straight ahead • 1 = 180-degree turn • take the sum • divide by distance • result: “corners per km”
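The “technicality” steps above translate into a short function. A minimal sketch, assuming displacement vectors in metres and assuming the [-1, 1] to [0, 1] transform is the linear map (1 - cos)/2, which satisfies 0 = straight ahead and 1 = 180-degree turn; the slides do not state the exact transform. Zero-length vectors (standing still) are skipped because their direction is undefined. The name `corners_per_km` is hypothetical.

```python
import math


def corners_per_km(vectors: list[tuple[float, float]]) -> float:
    """Sum of per-step turning (0 = straight, 1 = U-turn) divided by distance in km."""
    turn_sum = 0.0
    for (ux, uy), (vx, vy) in zip(vectors, vectors[1:]):
        norm_u, norm_v = math.hypot(ux, uy), math.hypot(vx, vy)
        if norm_u == 0 or norm_v == 0:  # standing still: direction undefined, skip
            continue
        cos = (ux * vx + uy * vy) / (norm_u * norm_v)
        turn_sum += (1 - cos) / 2  # assumed linear map from [-1, 1] to [0, 1]
    distance_km = sum(math.hypot(dx, dy) for dx, dy in vectors) / 1000
    return turn_sum / distance_km if distance_km else 0.0
```

Under this mapping a 90-degree turn counts as half a “corner”. GPS jitter at low speed also adds spurious turning, which is one plausible source of outliers in this feature.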
  • 32. Adding “technicality” as a feature - Many outliers! - GPS freakouts (in the mountains) - Track rides
  • 33. The 4-feature model
                   precision  recall  f1-score  support
    offroad             0.97    0.97      0.97       30
    road                0.98    0.98      0.98       65
    avg / total         0.98    0.98      0.98       95
    - The “corners per km” feature adds value :-)
  • 35. Never, ever, trust human input - Subjective - Error-prone
  • 36. Optimizing the model • Good features add predictive accuracy • So does a sufficient amount of (correctly) labelled observations • Preferably “borderline” cases • Model selection and tuning matter only marginally compared to the above
  • 37. Near future • Creating an app with the “bait” features • Detect races • Detect interval exercise types • Ask users for feedback on predictions (meta learning) • Open-sourcing part of the code
  • 38. We’re hiring / Questions? / Thank you!