TOYOTA Customer 360◦
on
Apache SparkTM
DATA DRIVEN
Brian Kursar, Sr Data Scientist
Toyota Motor Sales IT Research and Development
Final
TOYOTA Big Data History
2015
2014
2013
2012
2011
2010
C360 - Next Gen Insights Platform
Over 6B Records
C360 - Customer Experience Analytics
Over 700M Records
C360 - Toyota Social Media Intelligence Center
Over 500M Records
Product Quality Analytics v2
Over 120M Records
Marketing and Incentives Analytics
70M Records
Product Quality Analytics
Over 60M Records
SPARK SUMMIT 2014
TEAM TOYOTA
R&D
Data Engineering
Infrastructure
Enterprise Architecture
Genchi Genbutsu
“Go Look, Go See”
•  Compute
•  Streaming
•  Machine Learning
ACTIONABLE INSIGHTS
Customer Experience original Batch Job
160 hours (6.6 days)
Same job re-written using Apache Spark …
4hours
Data Engineering
Existing Tools
Existing Tools
Jan – Feb Feb - Mar
+1% +1%
+1% +2%
Toyota Social Opinion 2013
40% Retailers Selling Toyota Vehicles
11% Opinions on Marketing Campaigns
10% Feedback on Dealer Sales and Service Experiences
9% Opinions on Product Styling and Features
8% People In Market for a Toyota
8% Incident Reports Involving a Toyota Vehicle
7% Feedback on Product Quality
5% Customers Advocating for the Brand
2% Completely Irrelevant
Toyota Online Conversations by the Numbers
2014
Study
Toyota Online Conversations by the Numbers
40% Retailers Selling Toyota Vehicles
11% Opinions on Marketing Campaigns
10% Feedback on Dealer Sales and Service Experiences
9% Opinions on Product Styling and Features
8% People In Market for a Toyota
8% Incident Reports Involving a Toyota Vehicle
7% Feedback on Product Quality
5% Customers Advocating for the Brand
2% Completely Irrelevant
2014
Study
50%
Noise
Categorize and Prioritize incoming Social Media
interactions in Real-Time using Spark MLLib
Campaign
Opinions
Customer
Feedback
Product
Feedback Noise
First Spark MLlib Experiment
•  Seat Cover Wrinkles/Cracking
•  Brake Noise
•  Shift Quality
•  Oil Leaks
•  HVAC Odor
•  Dead Battery
•  Rodent Wire Harness Damage
•  Paint Chips
Time-box project to 12 Weeks
Classify at min 80% accuracy
Describe this
issue…
NoiseBrakes
When I’m backing up in my 2012 Prius, it sounds like
something hanging up or scraping as it rotates and only
happens in the morning..
I hear a squeak coming from the back wheel of
my Prius as I pull out from my driveway in the
morning.
Extract features from the
Verbatim Text
Identify Training Data for Brake Noise Model
Extract Category (Label)
matching Model Objective
Social ML Pipeline
Natural	
  Language	
  Processing	
  Options
Feature	
  Engineering
Options
Statistical	
  Selection
(Chi	
  Square)
All Top
TopRandomRandom
Popular
Vectorization	
  
Options
Machine	
  Learning	
  
Model
TF*IDF
TF	
  Options
Natural
Boolean
Log
Augmented
IDF	
  Options
Unary
Inverse
Inverse	
  Smooth
ProbDF
Support	
  Vector	
  Machine	
  (MLlib)
Validation	
  Set	
  
(1	
  Fold)
Cross	
  Validation:	
  Repeat	
  10	
  times
Training	
  Set
(9	
  Fold)
Multiple	
  Iteration:	
  Optimal	
  SVM	
  
Parameter
Training	
  selection	
  filters
StopwordsStemming
N-­‐grams	
  
extraction
Cleaning	
  
(#,’,	
  /,	
  @)
Positive	
  &	
  Negative	
  
training	
  Sets
Labeled	
  Data
(Surveys,	
  Hand	
  
Scored	
  Social)
Extracted	
  n-­‐grams
Extract Text Features
Train Predictive Model
Ver 1
56%
Accuracy
Ver 3
36%
Accuracy
Ver 8
35%
Accuracy
False Positives
I just had my friend at the toyota dealer rotate my tires and he
said … that the brake pads are getting thin really fast.
So what should I do when they get too thin in the future and start
to squeak?
False Positives
i cut the iac hose as shown in figure 20 in the manual but when i
start the car, it started gasping for air... choking...
sounds like it's about to die out.
i bought the power brake check valve (80190 part for
kragen)... but either i'm not installing it right or it's the wrong
size... i have no idea.
Solution
Explicit Semantic Analysis
NoiseBrakes
Noise
caliper
pads
rotor
wheel
squeak grind
groan
squeal
Brakes
drum
Noise
squeak grind
groan
squeal
Distance Similarity Between Concepts
pads
caliper
rotor
wheel
drum
Brakes
Distance is calculated between
concepts based on the
Minkowski distance formula.
Ver 9
82%
Accuracy
Kaizen
= Continuous Improvement
Kaizen
Train
TestEvaluate
Refine
Social ML Pipeline
Natural	
  Language	
  Processing	
  Options
Feature	
  Engineering
Options
Statistical	
  Selection
(Chi	
  Square)
All Top
TopRandomRandom
Popular
Vectorization	
  
Options
Machine	
  Learning	
  
Model
TF*IDF
TF	
  Options
Natural
Boolean
Log
Augmented
IDF	
  Options
Unary
Inverse
Inverse	
  Smooth
ProbDF
Support	
  Vector	
  Machine	
  (MLlib)
Validation	
  Set	
  
(1	
  Fold)
Cross	
  Validation:	
  Repeat	
  10	
  times
Training	
  Set
(9	
  Fold)
Multiple	
  Iteration:	
  Optimal	
  SVM	
  
Parameter
Training	
  selection	
  filters
StopwordsStemming
N-­‐grams	
  
extraction
Cleaning	
  
(#,’,	
  /,	
  @)
Positive	
  &	
  Negative	
  
training	
  Sets
Labeled	
  Data
(Surveys,	
  Hand	
  
Scored	
  Social)
Extracted	
  n-­‐grams
Synonyms POS
Feature	
  Manager
Negation	
  Evaluator
Distance	
  Similarity
Manhattan	
  
Distance	
  (L1)
-­‐
|x|+	
  |y|
Euclidean	
  
Distance	
  (L2)
-­‐
√(|x|^	
  2	
  +	
  |y|^	
  2)
Concept	
  
Manager
Features
Concept	
  Interpreter
TODAY
FUTURE
Manufacturing Data
Connected Vehicle Data
Consumer Data
TEAM TOYOTA Spark Tips
•  Education and Inclusion
•  Pace Yourself
•  Design and Plan a Transitional Architecture to Incrementally
Introduce Spark elements into your Applications
•  Use Joda Time for Date Comparisons
TEAM TOYOTA Lessons Learned
•  Be mindful of AKKA versions when trying to Build a new Spark release to a
packaged Hadoop Distribution
•  Use SparkSQL versus DSLs for Joins
•  Remember to configure Memory Fraction based on the size of your data.
Brian Kursar – Sr Data Scientist
Toyota Motor Sales IT Research and Development
@briankursar
Visit us at the TOYOTA
Booth here at the Spark
Summit Today.

Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

  • 1.
    TOYOTA Customer 360◦ on ApacheSparkTM DATA DRIVEN Brian Kursar, Sr Data Scientist Toyota Motor Sales IT Research and Development Final
  • 2.
    TOYOTA Big DataHistory 2015 2014 2013 2012 2011 2010 C360 - Next Gen Insights Platform Over 6B Records C360 - Customer Experience Analytics Over 700M Records C360 - Toyota Social Media Intelligence Center Over 500M Records Product Quality Analytics v2 Over 120M Records Marketing and Incentives Analytics 70M Records Product Quality Analytics Over 60M Records
  • 3.
    SPARK SUMMIT 2014 TEAMTOYOTA R&D Data Engineering Infrastructure Enterprise Architecture
  • 4.
  • 5.
    •  Compute •  Streaming • Machine Learning ACTIONABLE INSIGHTS
  • 6.
    Customer Experience originalBatch Job 160 hours (6.6 days) Same job re-written using Apache Spark … 4hours Data Engineering
  • 7.
  • 8.
    Existing Tools Jan –Feb Feb - Mar +1% +1% +1% +2% Toyota Social Opinion 2013
  • 9.
    40% Retailers SellingToyota Vehicles 11% Opinions on Marketing Campaigns 10% Feedback on Dealer Sales and Service Experiences 9% Opinions on Product Styling and Features 8% People In Market for a Toyota 8% Incident Reports Involving a Toyota Vehicle 7% Feedback on Product Quality 5% Customers Advocating for the Brand 2% Completely Irrelevant Toyota Online Conversations by the Numbers 2014 Study
  • 10.
    Toyota Online Conversationsby the Numbers 40% Retailers Selling Toyota Vehicles 11% Opinions on Marketing Campaigns 10% Feedback on Dealer Sales and Service Experiences 9% Opinions on Product Styling and Features 8% People In Market for a Toyota 8% Incident Reports Involving a Toyota Vehicle 7% Feedback on Product Quality 5% Customers Advocating for the Brand 2% Completely Irrelevant 2014 Study 50% Noise
  • 11.
    Categorize and Prioritizeincoming Social Media interactions in Real-Time using Spark MLLib Campaign Opinions Customer Feedback Product Feedback Noise
  • 13.
    First Spark MLlibExperiment •  Seat Cover Wrinkles/Cracking •  Brake Noise •  Shift Quality •  Oil Leaks •  HVAC Odor •  Dead Battery •  Rodent Wire Harness Damage •  Paint Chips Time-box project to 12 Weeks Classify at min 80% accuracy
  • 14.
  • 15.
  • 16.
    When I’m backingup in my 2012 Prius, it sounds like something hanging up or scraping as it rotates and only happens in the morning.. I hear a squeak coming from the back wheel of my Prius as I pull out from my driveway in the morning.
  • 17.
    Extract features fromthe Verbatim Text Identify Training Data for Brake Noise Model Extract Category (Label) matching Model Objective
  • 18.
    Social ML Pipeline Natural  Language  Processing  Options Feature  Engineering Options Statistical  Selection (Chi  Square) All Top TopRandomRandom Popular Vectorization   Options Machine  Learning   Model TF*IDF TF  Options Natural Boolean Log Augmented IDF  Options Unary Inverse Inverse  Smooth ProbDF Support  Vector  Machine  (MLlib) Validation  Set   (1  Fold) Cross  Validation:  Repeat  10  times Training  Set (9  Fold) Multiple  Iteration:  Optimal  SVM   Parameter Training  selection  filters StopwordsStemming N-­‐grams   extraction Cleaning   (#,’,  /,  @) Positive  &  Negative   training  Sets Labeled  Data (Surveys,  Hand   Scored  Social) Extracted  n-­‐grams
  • 19.
  • 20.
  • 22.
  • 23.
  • 24.
  • 25.
    False Positives I justhad my friend at the toyota dealer rotate my tires and he said … that the brake pads are getting thin really fast. So what should I do when they get too thin in the future and start to squeak?
  • 26.
    False Positives i cutthe iac hose as shown in figure 20 in the manual but when i start the car, it started gasping for air... choking... sounds like it's about to die out. i bought the power brake check valve (80190 part for kragen)... but either i'm not installing it right or it's the wrong size... i have no idea.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
    Noise squeak grind groan squeal Distance SimilarityBetween Concepts pads caliper rotor wheel drum Brakes Distance is calculated between concepts based on the Minkowski distance formula.
  • 32.
  • 34.
  • 35.
  • 36.
    Social ML Pipeline Natural  Language  Processing  Options Feature  Engineering Options Statistical  Selection (Chi  Square) All Top TopRandomRandom Popular Vectorization   Options Machine  Learning   Model TF*IDF TF  Options Natural Boolean Log Augmented IDF  Options Unary Inverse Inverse  Smooth ProbDF Support  Vector  Machine  (MLlib) Validation  Set   (1  Fold) Cross  Validation:  Repeat  10  times Training  Set (9  Fold) Multiple  Iteration:  Optimal  SVM   Parameter Training  selection  filters StopwordsStemming N-­‐grams   extraction Cleaning   (#,’,  /,  @) Positive  &  Negative   training  Sets Labeled  Data (Surveys,  Hand   Scored  Social) Extracted  n-­‐grams Synonyms POS Feature  Manager Negation  Evaluator Distance  Similarity Manhattan   Distance  (L1) -­‐ |x|+  |y| Euclidean   Distance  (L2) -­‐ √(|x|^  2  +  |y|^  2) Concept   Manager Features Concept  Interpreter
  • 37.
  • 38.
  • 39.
    TEAM TOYOTA SparkTips •  Education and Inclusion •  Pace Yourself •  Design and Plan a Transitional Architecture to Incrementally Introduce Spark elements into your Applications •  Use Joda Time for Date Comparisons
  • 40.
    TEAM TOYOTA LessonsLearned •  Be mindful of AKKA versions when trying to Build a new Spark release to a packaged Hadoop Distribution •  Use SparkSQL versus DSLs for Joins •  Remember to configure Memory Fraction based on the size of your data.
  • 41.
    Brian Kursar –Sr Data Scientist Toyota Motor Sales IT Research and Development @briankursar Visit us at the TOYOTA Booth here at the Spark Summit Today.