Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

5,406 views

Published on

Presentation at Spark Summit 2015

Published in: Data & Analytics
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

  1. 1. TOYOTA Customer 360◦ on Apache SparkTM DATA DRIVEN Brian Kursar, Sr Data Scientist Toyota Motor Sales IT Research and Development Final
  2. 2. TOYOTA Big Data History 2015 2014 2013 2012 2011 2010 C360 - Next Gen Insights Platform Over 6B Records C360 - Customer Experience Analytics Over 700M Records C360 - Toyota Social Media Intelligence Center Over 500M Records Product Quality Analytics v2 Over 120M Records Marketing and Incentives Analytics 70M Records Product Quality Analytics Over 60M Records
  3. 3. SPARK SUMMIT 2014 TEAM TOYOTA R&D Data Engineering Infrastructure Enterprise Architecture
  4. 4. Genchi Genbutsu “Go Look, Go See”
  5. 5. •  Compute •  Streaming •  Machine Learning ACTIONABLE INSIGHTS
  6. 6. Customer Experience original Batch Job 160 hours (6.6 days) Same job re-written using Apache Spark … 4hours Data Engineering
  7. 7. Existing Tools
  8. 8. Existing Tools Jan – Feb Feb - Mar +1% +1% +1% +2% Toyota Social Opinion 2013
  9. 9. 40% Retailers Selling Toyota Vehicles 11% Opinions on Marketing Campaigns 10% Feedback on Dealer Sales and Service Experiences 9% Opinions on Product Styling and Features 8% People In Market for a Toyota 8% Incident Reports Involving a Toyota Vehicle 7% Feedback on Product Quality 5% Customers Advocating for the Brand 2% Completely Irrelevant Toyota Online Conversations by the Numbers 2014 Study
  10. 10. Toyota Online Conversations by the Numbers 40% Retailers Selling Toyota Vehicles 11% Opinions on Marketing Campaigns 10% Feedback on Dealer Sales and Service Experiences 9% Opinions on Product Styling and Features 8% People In Market for a Toyota 8% Incident Reports Involving a Toyota Vehicle 7% Feedback on Product Quality 5% Customers Advocating for the Brand 2% Completely Irrelevant 2014 Study 50% Noise
  11. 11. Categorize and Prioritize incoming Social Media interactions in Real-Time using Spark MLLib Campaign Opinions Customer Feedback Product Feedback Noise
  12. 12. First Spark MLlib Experiment •  Seat Cover Wrinkles/Cracking •  Brake Noise •  Shift Quality •  Oil Leaks •  HVAC Odor •  Dead Battery •  Rodent Wire Harness Damage •  Paint Chips Time-box project to 12 Weeks Classify at min 80% accuracy
  13. 13. Describe this issue…
  14. 14. NoiseBrakes
  15. 15. When I’m backing up in my 2012 Prius, it sounds like something hanging up or scraping as it rotates and only happens in the morning.. I hear a squeak coming from the back wheel of my Prius as I pull out from my driveway in the morning.
  16. 16. Extract features from the Verbatim Text Identify Training Data for Brake Noise Model Extract Category (Label) matching Model Objective
  17. 17. Social ML Pipeline Natural  Language  Processing  Options Feature  Engineering Options Statistical  Selection (Chi  Square) All Top TopRandomRandom Popular Vectorization   Options Machine  Learning   Model TF*IDF TF  Options Natural Boolean Log Augmented IDF  Options Unary Inverse Inverse  Smooth ProbDF Support  Vector  Machine  (MLlib) Validation  Set   (1  Fold) Cross  Validation:  Repeat  10  times Training  Set (9  Fold) Multiple  Iteration:  Optimal  SVM   Parameter Training  selection  filters StopwordsStemming N-­‐grams   extraction Cleaning   (#,’,  /,  @) Positive  &  Negative   training  Sets Labeled  Data (Surveys,  Hand   Scored  Social) Extracted  n-­‐grams
  18. 18. Extract Text Features
  19. 19. Train Predictive Model
  20. 20. Ver 1 56% Accuracy
  21. 21. Ver 3 36% Accuracy
  22. 22. Ver 8 35% Accuracy
  23. 23. False Positives I just had my friend at the toyota dealer rotate my tires and he said … that the brake pads are getting thin really fast. So what should I do when they get too thin in the future and start to squeak?
  24. 24. False Positives i cut the iac hose as shown in figure 20 in the manual but when i start the car, it started gasping for air... choking... sounds like it's about to die out. i bought the power brake check valve (80190 part for kragen)... but either i'm not installing it right or it's the wrong size... i have no idea.
  25. 25. Solution
  26. 26. Explicit Semantic Analysis
  27. 27. NoiseBrakes
  28. 28. Noise caliper pads rotor wheel squeak grind groan squeal Brakes drum
  29. 29. Noise squeak grind groan squeal Distance Similarity Between Concepts pads caliper rotor wheel drum Brakes Distance is calculated between concepts based on the Minkowski distance formula.
  30. 30. Ver 9 82% Accuracy
  31. 31. Kaizen = Continuous Improvement
  32. 32. Kaizen Train TestEvaluate Refine
  33. 33. Social ML Pipeline Natural  Language  Processing  Options Feature  Engineering Options Statistical  Selection (Chi  Square) All Top TopRandomRandom Popular Vectorization   Options Machine  Learning   Model TF*IDF TF  Options Natural Boolean Log Augmented IDF  Options Unary Inverse Inverse  Smooth ProbDF Support  Vector  Machine  (MLlib) Validation  Set   (1  Fold) Cross  Validation:  Repeat  10  times Training  Set (9  Fold) Multiple  Iteration:  Optimal  SVM   Parameter Training  selection  filters StopwordsStemming N-­‐grams   extraction Cleaning   (#,’,  /,  @) Positive  &  Negative   training  Sets Labeled  Data (Surveys,  Hand   Scored  Social) Extracted  n-­‐grams Synonyms POS Feature  Manager Negation  Evaluator Distance  Similarity Manhattan   Distance  (L1) -­‐ |x|+  |y| Euclidean   Distance  (L2) -­‐ √(|x|^  2  +  |y|^  2) Concept   Manager Features Concept  Interpreter
  34. 34. TODAY
  35. 35. FUTURE Manufacturing Data Connected Vehicle Data Consumer Data
  36. 36. TEAM TOYOTA Spark Tips •  Education and Inclusion •  Pace Yourself •  Design and Plan a Transitional Architecture to Incrementally Introduce Spark elements into your Applications •  Use Joda Time for Date Comparisons
  37. 37. TEAM TOYOTA Lessons Learned •  Be mindful of AKKA versions when trying to Build a new Spark release to a packaged Hadoop Distribution •  Use SparkSQL versus DSLs for Joins •  Remember to configure Memory Fraction based on the size of your data.
  38. 38. Brian Kursar – Sr Data Scientist Toyota Motor Sales IT Research and Development @briankursar Visit us at the TOYOTA Booth here at the Spark Summit Today.

×