DATA SCIENCE FOR CARS
AND CATTLE
ZINAYIDA KENSCHE, PHD, DATA SCIENTIST AT SPOTLIGHT @SAP
DATA SCIENCE IN A NUTSHELL
23.06.20
A DATA SCIENCE PROJECT
• WHAT IS THE PROBLEM?
• DATA COLLECTION
• WHICH FEATURES DO WE HAVE + WHICH FEATURES ARE REASONABLE
• BASE MODEL AND OTHER MODELS
• UPDATES / EVALUATIONS / ADAPTATIONS
• CELEBRATE VICTORIES AND LEARN FROM MISTAKES
CARS: DEDUPLICATION
100K per day
CARS: DATA COLLECTION
• MANUFACTURER & MODEL & VERSION: VW PASSAT VARIANT 2.0 TDI
• FUEL: DIESEL
• MOTOR: 150 PS
• COLOR: BLACK
• MILEAGE: 200,150 KM
• FIRST REGISTRATION: 02/2016
• …
CARS: DO TWO ADS REPRESENT ONE CAR?
• PREPARE YOUR DATA: FILTER, SORT, PRIORITIZE
• SELECT ONLY THOSE ADS THAT HAVE A HIGH PROBABILITY OF REPRESENTING ONE CAR
• SORT BY FIRST REGISTRATION, DATE OF APPEARANCE AND MILEAGE
• START WITH A SIMPLE MODEL
• CLASSIFICATION:
• A VECTOR WITH SIMILARITIES BETWEEN TWO ADS
• TWO ADS REPRESENT ONE CAR
• TWO ADS REPRESENT DIFFERENT CARS
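The preparation and classification input above can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the field names, the sample ads, the sliding-window pairing (a sorted-neighbourhood style shortcut chosen here for brevity) and the individual similarity scores are all assumptions.

```python
from datetime import date

# Hypothetical ad records; field names and values are made up for illustration.
ads = [
    {"id": 1, "model": "VW PASSAT VARIANT 2.0 TDI", "fuel": "DIESEL", "color": "BLACK",
     "mileage_km": 200150, "first_registration": "2016-02", "seen": date(2020, 6, 1)},
    {"id": 2, "model": "VW PASSAT VARIANT 2.0 TDI", "fuel": "DIESEL", "color": "BLACK",
     "mileage_km": 200300, "first_registration": "2016-02", "seen": date(2020, 6, 10)},
    {"id": 3, "model": "VW GOLF 1.6 TDI", "fuel": "DIESEL", "color": "WHITE",
     "mileage_km": 90000, "first_registration": "2017-05", "seen": date(2020, 6, 3)},
]

def candidate_pairs(ads, window=2):
    """Sort the ads, then pair each ad only with its next `window - 1` neighbours."""
    ordered = sorted(
        ads, key=lambda a: (a["first_registration"], a["seen"], a["mileage_km"])
    )
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:i + window]:
            pairs.append((left, right))
    return pairs

def similarity_vector(a, b):
    """One similarity score in [0, 1] per compared attribute (assumed scoring)."""
    mileage_gap = abs(a["mileage_km"] - b["mileage_km"])
    largest = max(a["mileage_km"], b["mileage_km"], 1)
    return [
        1.0 if a["model"] == b["model"] else 0.0,
        1.0 if a["fuel"] == b["fuel"] else 0.0,
        1.0 if a["color"] == b["color"] else 0.0,
        1.0 if a["first_registration"] == b["first_registration"] else 0.0,
        1.0 - mileage_gap / largest,
    ]

pairs = candidate_pairs(ads)
vectors = {(a["id"], b["id"]): similarity_vector(a, b) for a, b in pairs}
```

Sorting first keeps the number of compared pairs small, which matters at the volume of ads mentioned earlier; each resulting vector is what a classifier would label as "one car" or "different cars".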
CARS: NEXT STEPS
• USE LOGISTIC REGRESSION AS BASE MODEL, TRY OTHER MODELS
• FEATURE ENGINEERING
• MORE DATA: FEATURES, LABELED ENTITIES
• AUTOMATE, MONITOR, ADAPT
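A logistic regression base model on similarity vectors could look roughly like the sketch below. It is written with plain gradient descent so that it needs no external library; the three features, the labels, the learning rate and the number of epochs are made-up assumptions, and in practice a library implementation would be the more likely choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this pair of ads.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_same_car(w, b, x):
    """Estimated probability that the two ads behind vector x show one car."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up similarity vectors [model, color, mileage]; label 1 = same car.
X = [
    [1.0, 1.0, 0.99], [1.0, 1.0, 0.95], [1.0, 1.0, 0.97],
    [1.0, 0.0, 0.40], [0.0, 1.0, 0.30], [0.0, 0.0, 0.10],
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X, y)
probabilities = [predict_same_car(w, b, x) for x in X]
```

The learned weights give a first hint of which similarities matter most, which ties in with prioritizing features; other models can then be compared against this baseline on the same labelled pairs.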
CARS: LESSONS LEARNED
• + START SIMPLE
• + QUICK SOLUTION
• + PRIORITIZING FEATURES
• - LABELLED DATA IS COSTLY
• - DATA NEED TO BE PRE-FILTERED
• - MORE INTUITIVE APPROACH?!
CATTLE: WHEN IS IT READY TO BE FERTILIZED?
• PERIOD TRACKING APPS: CLUE, FLO, OVIA, EVE BY GLOW, …
• BIRTH - 11 MONTHS (FIRST HEAT) – HEAT - … - HEAT & INSEMINATION – PREGNANCY – CALVING
Repeats 3-6 times
CATTLE: DS PROJECT SET-UP
• DATA COLLECTION AND SYNTHETIC DATA GENERATION
• SYSTEMATIC SAMPLING & FEATURE ENGINEERING: FREQUENCY OF PEAKS
• HUNTING OUTLIERS WITH SVM
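The first two set-up steps can be illustrated with a small sketch. Everything specific in it is an assumption: the slides do not say which signal was measured, so the code generates a hypothetical periodic signal with one bump per cycle, thins it by systematic sampling and counts peaks per window. The outlier hunting with an SVM is not shown; the peak counts would be one possible input for it.

```python
import random

def synthetic_signal(length, period, center, half_width, height,
                     base=1.0, noise=0.0, seed=0):
    """Made-up periodic signal: a triangular bump once per `period` samples."""
    rng = random.Random(seed)
    values = []
    for t in range(length):
        offset = abs((t % period) - center)
        bump = height * max(0.0, 1.0 - offset / half_width)
        jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
        values.append(base + bump + jitter)
    return values

def systematic_sample(values, step):
    """Systematic sampling: keep every `step`-th value, starting at index 0."""
    return values[::step]

def count_peaks(values, threshold):
    """Count local maxima that rise above `threshold`."""
    peaks = 0
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if cur > threshold and cur > prev and cur >= nxt:
            peaks += 1
    return peaks

def peak_frequency(values, window, threshold):
    """Peaks per window: one feature value for each full window."""
    return [
        count_peaks(values[i:i + window], threshold)
        for i in range(0, len(values) - window + 1, window)
    ]

# Noise-free example: three cycles of ten samples each.
signal = synthetic_signal(length=30, period=10, center=5, half_width=2, height=10.0)
features = peak_frequency(systematic_sample(signal, 1), window=10, threshold=5.0)
```

Setting `noise` above zero makes the generated data less regular, which is closer to what a real sensor would deliver and gives an outlier detector something to find.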
CATTLE: LESSONS LEARNED
• + FILTER REAL-TIME DATA
• + NOT ENOUGH DATA – GENERATE IT
• - DO NOT TRUST YOUR DATA
• - ONE DATA SOURCE IS NOT ENOUGH
http://dilbert.com/strip/2004-12-26
HAPPY TO ANSWER YOUR QUESTIONS!
