Data Science
By: Sachin Rastogi
Credit: All material (images, video, text) used in this presentation is available in the public domain. All rights are reserved by the respective owners. My purpose is solely to explain Data Science on a non-profit basis. If you have any objection, please let me know and I will remove the relevant content. My email is sachin.rastogi@yahoo.com
“To make everyone understand Data Science with the help of real stories.”
What is Data Science?
Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms and systems to extract
knowledge and insights from data in various forms.
Source: https://en.wikipedia.org/wiki/Data_science
What is Data Science?
Data science is about using data to create impact for your organization. Impact can come:
• In the form of insights,
• In the form of data products,
• In the form of product recommendations.
1. The Target Story
Target Corporation is the second-largest department store retailer in the United States.
• Shoppers generally don’t buy everything at one store.
• Target sells everything from milk to stuffed animals to lawn furniture to electronics.
• One of the company’s primary goals is to convince customers that the only store they need is Target. But how?
• There are specific periods in a person’s life, such as the arrival of a baby, when old routines fall apart and buying habits are suddenly in flux.
• Timing is everything.
• The key is to reach these customers early, before any other retailer knows a baby is on the way.
“Can you give us a list of such customers?”
Please watch this video @ https://youtu.be/RC5HNTj3Dag
2. The OSEMN (“awesome”) Model
A very practical definition by Mason & Wiggins (2010)
01 Obtain: Collect the data.
02 Scrub: Clean the data.
03 Explore: Understand the data.
04 Model: Build a mathematical representation of the data.
05 iNterpret: Tell the story and draw conclusions from the data.
01 Obtain
1. Query a database.
2. Read from CSV, HTML, or JSON files.
3. Generate data, e.g. from sensors.
4. Collect from surveys.
5. Download from another location (e.g. a web server).
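To make the Obtain step concrete, here is a minimal sketch of three of the options above (a database query, a CSV file, a JSON document) using only the Python standard library. The function names, the sample runner data, and the in-memory SQLite table are all assumptions made for illustration; they do not come from the deck.

```python
import csv
import io
import json
import sqlite3


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_database(connection, query):
    """Run a SQL query and return each result row as a dict."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


# Hypothetical sample data standing in for real files and databases.
csv_rows = rows_from_csv("runner,minutes\nasha,232\nben,251\n")
json_rows = rows_from_json('[{"runner": "cara", "minutes": 240}]')

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE races (runner TEXT, minutes INTEGER)")
db.execute("INSERT INTO races VALUES ('dev', 219)")
db_rows = rows_from_database(db, "SELECT runner, minutes FROM races")

print(csv_rows, json_rows, db_rows)
```

Note that values read from CSV arrive as text ("232"), while JSON and the database already carry numbers. Reconciling such differences is part of the Scrub step.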
02 Scrub
Real-world data may have missing values, inconsistencies, errors, stray characters, or uninteresting columns.
Common scrubbing operations include:
1. Filtering lines.
2. Extracting certain columns.
3. Replacing values.
4. Extracting words.
5. Handling missing values.
6. Converting data from one format to another.
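Several of the scrubbing operations listed above can be sketched in a few lines of plain Python. The records, column names, and the "unknown" placeholder below are hypothetical and chosen only for illustration.

```python
# Hypothetical raw records, as they might arrive from the Obtain step.
raw_rows = [
    {"runner": " Asha ", "minutes": "232", "shoe": "Vaporfly", "notes": "x"},
    {"runner": "Ben", "minutes": "", "shoe": "vaporfly ", "notes": ""},
    {"runner": "Cara", "minutes": "240", "shoe": "", "notes": ""},
    {"runner": "Dev", "minutes": "n/a", "shoe": "Other", "notes": ""},
]


def scrub(rows, keep=("runner", "minutes", "shoe")):
    """Return cleaned copies of the rows that have a usable finish time."""
    cleaned = []
    for row in rows:
        # Extract only the columns of interest and trim stray whitespace.
        record = {key: row.get(key, "").strip() for key in keep}
        # Filter out lines whose time is missing or not a whole number.
        if not record["minutes"].isdigit():
            continue
        # Convert the time from text to an integer.
        record["minutes"] = int(record["minutes"])
        # Normalize spelling and replace a missing shoe with a placeholder.
        record["shoe"] = record["shoe"].lower() or "unknown"
        cleaned.append(record)
    return cleaned


clean_rows = scrub(raw_rows)
print(clean_rows)
```

Here rows with an unusable time are dropped rather than repaired. Whether to drop, repair, or impute is a judgment call that depends on the question being asked.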
03 Explore
This is where it gets interesting, because here we really get into our data.
1. Understand the data.
2. Identify patterns and relationships in the data.
3. Derive statistics from the data.
4. Create interesting visualizations.
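As a small illustration of deriving statistics during Explore, the sketch below groups finish times by shoe and summarizes each group. The numbers are invented for the example and are not taken from any real race.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (shoe, finish time in minutes) pairs.
races = [
    ("vaporfly", 232),
    ("vaporfly", 228),
    ("other", 241),
    ("other", 239),
    ("other", 246),
]

# Group the finish times by shoe.
times_by_shoe = defaultdict(list)
for shoe, minutes in races:
    times_by_shoe[shoe].append(minutes)

# Derive simple summary statistics for each group.
summary = {
    shoe: {"count": len(times), "mean": mean(times), "median": median(times)}
    for shoe, times in times_by_shoe.items()
}

for shoe, stats in summary.items():
    print(shoe, stats)
```

A gap between group means is only a pattern to investigate, not a conclusion: the two groups may differ in many ways besides their shoes.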
04 Model
A model is a mathematical representation of the data (with respect to the assumptions we’re willing to make, the problem we’re trying to solve, and the data themselves).
04 Model
Here we’re using linear regression, one of the simplest techniques in data science. We’re fitting the model (the line) to a data series (the dots).
We know that the model will be of the form y = ax + b, and we’re trying to find the optimal values of a and b.
We draw the line that best fits the existing data points on average.
Once we’ve fitted the model, we can use it to predict outcomes (y axis) based on inputs (x axis).
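One common way to find a and b is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances to the points. The slide does not say which fitting method it has in mind, so treat the following as a sketch under that assumption; the data points and the function name fit_line are made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
prediction = a * 6 + b  # predict the outcome for a new input x = 6
print(a, b, prediction)
```

With real, noisy data the points will not sit exactly on the line, and the fitted a and b are estimates rather than exact values.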
“The purpose of computing is insight, not numbers.”
- Richard Hamming
05 iNterpret
1. Draw conclusions from your data.
2. Evaluate what your results mean.
3. Visualize your findings: keep it simple and priority driven.
4. Tell the story of the data: communicate the results effectively to non-technical audiences.
3. The Strava Story
What is Strava?
Strava is a social fitness networking application that is used to
track cycling, running, and swimming activities, among others,
using GPS data.
Strava in numbers
1. Activities recorded as at 31 December 2017: 1 billion
2. Runs uploaded in 2017: 136 million
3. Marathons uploaded in 2017: 627,239
4. Every 40 days, a million people join Strava.
Strava also counts commuters.
Strava Profile: Mr. R, Bike
Strava Profile: Mr. R, Run
Strava Profile: Mr. M, Run
4. The Nike Story
Nike Says Its $250 Running Shoes Will Make You
Run Much Faster. What if That’s Actually True?
Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
• Nike says the shoes are about 4 percent better than some of its best racing shoes.
• Based on profiles from more than 700 races in dozens of countries since 2014, The NY Times compiled results from about 280,000 marathon and 215,000 half marathon completed races.
• Using public race reports and shoe records from Strava, The Times found that runners in Vaporflys ran 3 to 4 percent faster than similar runners wearing other shoes.
How?
Obtain/Collection of Data
• An ideal experiment to measure how much shoes matter for race performance would involve a series of marathons on a variety of courses, with runners randomly assigned different running shoes.
• There is no such experiment, but something like it happens around the world almost every weekend.
• Every week, tens of thousands of amateur runners compete in races and upload their race data, collected on smartphones or satellite watches, to Strava.
Scrub/Cleaning of Data
Issues handled while cleaning the data:
1. Missing shoe information.
2. Erroneous or incomplete data.
3. Speeds above a plausible threshold.
4. Virtual road rides.
5. Spelling mistakes.
Explore/Model/Interpret
The Times measured the shoes’ effect in four ways:
1. Measuring shoe effects using statistical models.
2. Comparing groups of runners who completed the same two races.
3. Comparing the average change among shoe switchers with that of non-switchers.
4. Following all runners as they switch to a new kind of racing shoe.
Measuring shoe effects using statistical models.
Pros of this approach: Tries to control for race conditions, weather, gender, age, pre-race training and a runner’s previous race times.
Cons of this approach: Still not a randomized controlled trial.
Comparing groups of runners who completed the same two races (Boston 2017 and Boston 2018).
Pros of this approach: Follows athletes of similar ability who ran in identical conditions.
Cons of this approach: Runners could save their special shoes for when they expect to have a fast race.
Instead of directly comparing performances in the two races, we can compare the net change of runners who switched to Vaporflys with the net change of similar runners who did not.
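The net-change comparison described above can be sketched as follows: compute each runner's percent change between the two races, average it within each group, and subtract. The finish times are invented for illustration and are not The Times' data; the helper name net_change_percent is likewise hypothetical.

```python
from statistics import mean


def net_change_percent(first, second):
    """Percent change in finish time between two races (negative means faster)."""
    return 100.0 * (second - first) / first


# Hypothetical (first race, second race) finish times in minutes.
switchers = [(240, 228), (200, 192)]        # changed to Vaporflys for race two
non_switchers = [(240, 237.6), (200, 198)]  # wore the same kind of shoe twice

switcher_change = mean([net_change_percent(a, b) for a, b in switchers])
baseline_change = mean([net_change_percent(a, b) for a, b in non_switchers])

# The change shared by both groups (weather, course, training trends) cancels out.
shoe_effect = switcher_change - baseline_change
print(round(switcher_change, 2), round(baseline_change, 2), round(shoe_effect, 2))
```

Subtracting the non-switchers' change removes whatever affected everyone in the second race, but it cannot remove the bias noted above: runners may choose when to wear their special shoes.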
Average change among shoe switchers compared with non-switchers.
This approach uses hundreds of pairs of races in which large groups of runners ran the same two races and a subset of them switched shoes.
All runners as they switch to a new kind of racing shoe.
Pros of this approach: Accounts for runners of varying skills over several races.
Cons of this approach: Runners could save Vaporflys for when they expect to be faster than normal.
None of these approaches are perfect, but they all point to a similar conclusion.
Wherever we look for evidence that shoes matter in a marathon or half marathon, we find Vaporflys at or near the top of that list.
Runners who improved their performance in Vaporflys and then switched to other shoes got slower.
“Data will talk to you if you’re willing to listen to it.”
- Jim Bergeson
5. The Strava Heatmap for City Planners
What is the heatmap?
It is a graphical representation, drawn on a map, of the activities recorded on Strava and their GPS data. Activities include running, commuting, biking, swimming, etc.
To give a sense of scale, the heatmap consists of:
• 700 million activities
• 1.4 trillion latitude/longitude points
• A total distance of 16 billion km (10 billion miles)
• A total recorded activity duration of 100 thousand years
Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
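The core idea behind a heatmap can be illustrated with simple grid binning: divide the map into cells and count how many GPS points fall in each one, so that busier cells can be drawn "hotter". This is a deliberately simplified sketch and not Strava's actual pipeline; the function name, the 0.01-degree cell size, and the coordinates are assumptions for the example.

```python
import math
from collections import Counter


def heatmap_counts(points, cell_degrees=0.01):
    """Count GPS points per grid cell; each cell is cell_degrees wide and tall."""
    counts = Counter()
    for lat, lon in points:
        # Map each coordinate to the integer index of the cell containing it.
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        counts[cell] += 1
    return counts


# Hypothetical GPS points (latitude, longitude) from a few activities.
points = [
    (17.4474, 78.3556),
    (17.4479, 78.3551),
    (17.4612, 78.3556),
]

counts = heatmap_counts(points)
hottest_cell, hottest_count = counts.most_common(1)[0]
print(hottest_cell, hottest_count)
```

A real heatmap also has to decide how to turn raw counts into colors, since a few very popular routes would otherwise drown out everything else.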
Strava Heatmap
Bike counter Correlation
Strava Heatmap
In Seattle, city planners discovered that at one intersection:
• Cyclists coming from the south slowed down before crossing.
• Cyclists coming from the north came to a stop and then walked their bikes or rode slowly.
• The planners realized the intersection posed a risk to cyclists.
Similarly, the DOT installed rumble strips on a highway to keep motor vehicles from running off the road, but they’re a nightmare for cyclists.
Strava accuracy on map
The New Forest pony
The Strava proposal
Source: https://www.cyclingweekly.com/news/latest-news/five-best-strava-art-139034
Strava Heatmap - Running
Source: https://www.strava.com/heatmap#15.00/78.35556/17.44737/hot/run
Strava Heatmap - Biking
Strava Heatmap - Swimming
Strava Heatmap - Live
https://www.strava.com/heatmap#15.00/78.35152/17.44928/hot/run
Thanks!
Any questions?
Credits
Special thanks to all the people
who made and released these
awesome resources for free:
◎ Presentation template by
SlidesCarnival
◎ Photographs by Unsplash


Let's understand Data Science

  • 1. Data Science By: Sachin Rastogi 1 Credit : All information (images/video/text used for this presentations) is available in public domain. All rights are reserved with their actual owners. My purpose is just to explain Data Science for non-profit. If you still have any objection, please let me know I will remove respective contents. My email is “sachin.rastogi@yahoo.com”
  • 2. “To make everyone understand Data Science with the help of real stories. 2
  • 3. What is Data Science? Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms. 3 source https://en.wikipedia.org/wiki/Data_science
  • 4. What is Data Science? Data science is about using data to create impact for your organization, Impact can be • In the form of insights, • In the form of data products, • In the form of product recommendations. 4
  • 6. 6 Target CorporationTarget CorporationTarget CorporationTarget Corporation is the second-largest department store retailer in the United States. • Generally Shoppers don’t buy everything at one store. • Target sells everything from milk to stuffed animals to lawn furniture to electronics. • One of the company’s primary goals is to convince customers that they only need Target, but how? • Some specific periods in a person’s life when old routines fall apart and their buying habits are suddenly in flux. • TimingTimingTimingTiming is everything. • The key is to reach them earlier, before any other retailers know a baby is on the way. “Can you give us a list of such customers ?” The Target Story
  • 7. 7 Please watch this video @ https://youtu.be/RC5HNTj3Dag
  • 8. The OSEMN(awesome)Model 2222 8 A very practical definition by Mason & Wiggins (2010)
  • 9. 9 01010101 Collect the data. Obtain 02020202 Clean the data. Scrub 03030303 Understand the data. Explore 04040404 Mathematical representation of the data. Model 05050505 Storytelling and drawing conclusion from the data. iNterpret
  • 10. 10 01010101 1. Query from database. 2. Read from csv/html/Jason. 3. Generate data e.g. Sensors. 4. Collect from surveys. 5. Download from another location (e.g. webserver). Obtain
  • 11. 11 02 Real obtained data may have missing values, inconsistencies, errors, weird characters, or uninteresting columns. Common scrubbing operations include: 1. Filtering lines. 2. Extracting certain columns. 3. Replacing values. 4. Extracting words. 5. Handling missing values. 6. Converting data from one format to another. Scrub
  • 12. 12 03 This is where it gets interesting, because here we will get really into our data. 1. Understand the data. 2. Identify patterns & relationship among data. 3. Derive Statistics from the data. 4. Create interesting visualization. Explore
  • 13. 13
  • 14. 14 04 It is a mathematical representation of the data. (with respect to the assumptions we're willing to make, the problem we're trying to solve, and the data themselves). Model
  • 15. 15 04 Here we’re using linear regression, one of the simplest techniques in data science. We’re fitting the model (the line) to a data series (the dots). We know that the model will be on the form y =y =y =y = axaxaxax + b+ b+ b+ b and we’re trying to find the optimal values of a and b. We draw a line that best fits the existing data points on average. Once we’ve fitted the model, we can use it to predict outcomes (y axis) based on inputs (x axis). Model
  • 16. “The purpose of computing is insight, not numbers.” - Richard Hamming
  • 17. 05 iNterpret: 1. Draw conclusions from your data. 2. Evaluate what your results mean. 3. Visualize your findings; keep it simple and priority driven. 4. Tell a story with the data: communicate the results effectively to non-technical audiences.
  • 19. What is Strava? Strava is a social fitness networking application used to track cycling, running, and swimming activities, among others, using GPS data. Strava in numbers: 1. Activities recorded as of 31 December 2017: 1 billion. 2. Runs uploaded in 2017: 136 million. 3. Marathons uploaded in 2017: 627,239. 4. Every 40 days, a million people join Strava. Strava also counts commuters.
  • 26. Nike Says Its $250 Running Shoes Will Make You Run Much Faster. What if That’s Actually True? Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 27. • Nike says the shoes are about 4 percent better than some of its best racing shoes. • Based on profiles from more than 700 races in dozens of countries since 2014, The NY Times compiled results from about 280,000 marathon and 215,000 half marathon completed races. • Using public race reports and shoe records from Strava, The Times found that runners in Vaporflys ran 3 to 4 percent faster than similar runners wearing other shoes. How?
  • 28. Obtain/Collection of Data: • An ideal experiment to measure how much shoes matter for race performance would involve a series of marathons on a variety of courses, with runners randomly assigned different running shoes. • There is no such experiment, but something like it happens around the world almost every weekend. • Every week, tens of thousands of amateur runners compete in races and upload their race data (collected on smartphones or satellite watches) to Strava. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 30. Scrub/Cleaning of Data: 1. No shoe information. 2. Erroneous or incomplete data. 3. Higher speed threshold. 4. Virtual road rides. 5. Spelling mistakes. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 31. Explore/Model/Interpret: Below, we describe the four ways we measured the shoes’ effect. 1. Measuring shoe effects using statistical models. 2. Comparing groups of runners who completed the same two races. 3. Average change among shoe switchers compared with non-switchers. 4. All runners as they switch to a new kind of racing shoe. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 32. Explore/Model/Interpret: Measuring shoe effects using statistical models. Pros of this approach: Tries to control for race conditions, weather, gender, age, pre-race training and a runner’s previous race times. Cons of this approach: Still not a randomized controlled trial. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 33. Explore/Model/Interpret: Comparing groups of runners who completed the same two races (Boston 2017 and Boston 2018). Pros of this approach: Follows athletes of similar ability who ran in identical conditions. Cons of this approach: Runners could save their special shoes for when they expect to have a fast race. Instead of directly comparing performances in the two races, we can compare the net change of runners who switched to Vaporflys with the net change of similar runners who did not. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
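The "compare the net change" idea can be sketched as a simple difference of average changes between two groups. This is only an illustration of the arithmetic with invented finishing times; it is not The Times’ actual analysis, which is described in the linked article.

```python
from statistics import mean

# Hypothetical (race 1, race 2) finishing times in minutes per runner.
switchers = [(200.0, 192.0), (210.0, 204.0)]      # switched shoes for race 2
non_switchers = [(200.0, 199.0), (210.0, 207.0)]  # kept the same shoes


def mean_change(pairs):
    """Average change in time from the first race to the second."""
    return mean(second - first for first, second in pairs)


# Net change of switchers relative to the net change of non-switchers.
# A negative value means switchers improved more than non-switchers did.
switcher_change = mean_change(switchers)
baseline_change = mean_change(non_switchers)
shoe_effect = switcher_change - baseline_change

print(switcher_change, baseline_change, shoe_effect)
```

Subtracting the non-switchers’ change removes whatever affected everyone in both races, such as weather or course conditions, leaving the part associated with the shoe switch.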
  • 34. Explore/Model/Interpret. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 35. Explore/Model/Interpret: Average change among shoe switchers compared with non-switchers. Hundreds of pairs of races in which large groups of runners ran the same two races and in which a subset of them switched shoes. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 36. Explore/Model/Interpret: All runners as they switch to a new kind of racing shoe. Pros of this approach: Accounts for runners of varying skills over several races. Cons of this approach: Runners could save Vaporflys for when they expect to be faster than normal. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 37. Explore/Model/Interpret: All runners as they switch to a new kind of racing shoe. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 38. Explore/Model/Interpret: None of these approaches are perfect, but they all point to a similar conclusion. Wherever we look for evidence that shoes matter in a marathon or half marathon, we find Vaporflys at or near the top of that list. Runners who improved their performance in Vaporflys and then switched to other shoes got slower. Source: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
  • 39. “Data will talk to you if you’re willing to listen to it.” - Jim Bergeson
  • 40. Section 5: The Strava Heatmap for City Planners
  • 41. Strava Heatmap: What is a heatmap? It is a graphical representation of the different activities recorded on Strava, plotted on a map using their GPS data. Activities include running, commuting, biking, swimming, etc. To give a sense of scale, the heatmap consists of: • 700 million activities • 1.4 trillion latitude/longitude points • A total distance of 16 billion km (10 billion miles) • A total recorded activity duration of 100 thousand years. Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
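The basic idea behind a heatmap can be sketched as counting how many GPS points fall into each cell of a grid; the busier a cell, the "hotter" it is drawn. This is a toy illustration under that assumption, not Strava’s actual pipeline; the coordinates, the cell size, and the `bin_points` helper are made up.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=0.01):
    """Count GPS points per grid cell of cell_deg degrees on each side."""
    counts = Counter()
    for lat, lon in points:
        # Map each coordinate to the integer index of the cell containing it.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical (latitude, longitude) points from recorded activities.
points = [
    (17.4471, 78.3551),
    (17.4479, 78.3558),  # same cell as the first point
    (17.4601, 78.3551),  # a different cell further north
]

heat = bin_points(points)
hottest_cell, hottest_count = heat.most_common(1)[0]
print(hottest_cell, hottest_count)
```

A real heatmap would then map each cell’s count to a color and draw the cells over a base map.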
  • 42. Bike counter correlation. Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
  • 43. Strava Heatmap: In Seattle, at one intersection, city planners discovered: • Cyclists coming from the south would slow down before crossing. • Cyclists coming from the north would come to a stop and then walk their bikes or ride slowly. • City planners realized the intersection posed a risk to cyclists. Similarly, the DOT installed rumble strips on a highway to keep motor vehicles from running off the road, but they’re a nightmare for cyclists. Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
  • 44. Strava accuracy on map: The New Forest pony; The Strava proposal. Source: https://www.cyclingweekly.com/news/latest-news/five-best-strava-art-139034
  • 45. Strava Heatmap - Running. Source: https://www.strava.com/heatmap#15.00/78.35556/17.44737/hot/run
  • 46. Strava Heatmap - Biking. Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
  • 47. Strava Heatmap - Swimming. Source: https://medium.com/strava-engineering/the-global-heatmap-now-6x-hotter-23fc01d301de
  • 48. Strava Heatmap - Live: https://www.strava.com/heatmap#15.00/78.35152/17.44928/hot/run
  • 49. Our process is easy
  • 50. Thanks! Any questions?
  • 51. Credits: Special thanks to all the people who made and released these awesome resources for free: ◎ Presentation template by SlidesCarnival ◎ Photographs by Unsplash