1. Girls Who Code and Do Data Science
@EstherVasiete
Data Scientist
July 12th, 2016
Girls Who Code
Summer Immersion Program
2. About me
• Born and raised in Barcelona
• Bachelor’s Degree in Electrical Engineering
3. About me
• Studied abroad in the UK
- Best time of my life
- Developed an interest in image processing and computer vision
- Also developed an interest in machine learning; I just didn’t know it then
4. About me
• Did my Master’s at CU Boulder
• Officially, I received my diploma in EE
- Unofficially, I like to think of it as a CS degree
- I managed to cross-list most of my courses and my thesis advisor so that I could feed my growing interest in machine learning
5. About me
• Once I graduated, I moved to San Francisco
- My first data science gig
14. In all industries, billions of data points represent opportunities for data science
• Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Stock Market
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25,000 data points per second
• Video Surveillance
• Medical Imaging
• Mobile Sensors
18. Data Sources for Predictive Maintenance
• Vehicle Data
- VIN
- Timestamp
- DTC Code
- Odometer
- Speed
- Acceleration
- Engine Temperature
- Engine Torque
- GPS Coordinates
- etc.
• Car Repairs Data
- VIN
- Date vehicle in
- Date vehicle out
- Repair code
- Parts replaced
- Warranty claims
- Repair Comments
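Both data sources carry a VIN, so one plausible first step is to group them into a per-vehicle history. The sketch below is only an illustration of that idea: the field names mirror the slide, while the rows, the `join_on_vin` helper and the repair code `R-ENG` are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical vehicle data rows (field names follow the slide; values are made up).
vehicle_data = [
    {"vin": "VIN001", "timestamp": datetime(2016, 1, 5, 8, 30), "dtc_code": "P0300", "odometer": 42000},
    {"vin": "VIN001", "timestamp": datetime(2016, 2, 1, 17, 10), "dtc_code": "B1234", "odometer": 43100},
    {"vin": "VIN002", "timestamp": datetime(2016, 1, 20, 12, 0), "dtc_code": "U0100", "odometer": 15500},
]

# Hypothetical car repairs rows for the same fleet.
repairs_data = [
    {"vin": "VIN001", "date_in": date(2016, 2, 3), "date_out": date(2016, 2, 4), "repair_code": "R-ENG"},
]


def join_on_vin(vehicle_rows, repair_rows):
    """Group both sources by VIN so each vehicle has one combined history."""
    by_vin = defaultdict(lambda: {"readings": [], "repairs": []})
    for row in vehicle_rows:
        by_vin[row["vin"]]["readings"].append(row)
    for row in repair_rows:
        by_vin[row["vin"]]["repairs"].append(row)
    return dict(by_vin)


history = join_on_vin(vehicle_data, repairs_data)
for vin, record in sorted(history.items()):
    print(vin, len(record["readings"]), "readings,", len(record["repairs"]), "repairs")
```

A vehicle with readings but no repairs (here `VIN002`) still appears in the result, which matters because most DTCs are never followed by a repair.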
19. Predicting Job Type from Diagnostic Trouble Codes (DTCs)
[Figure: a timeline on which groups of DTCs (types B, P, C and U) are observed ahead of repair visits labeled Job Type: Transmission; Job Type: Transmission, Engine; Job Type: Regular check.]
• For each repair visit: can the DTCs observed before it predict this Job Type?
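The question on this slide can be turned into labeled training samples by pairing each repair visit with the DTCs seen since the previous visit. This is a minimal sketch under that assumption; the windowing rule, the `label_windows` helper and all dates are hypothetical, since the slides do not say how the windows were actually cut.

```python
from datetime import date

# Hypothetical DTC observations for one vehicle: (date observed, DTC type letter).
dtc_events = [
    (date(2016, 1, 3), "B"),
    (date(2016, 1, 9), "P"),
    (date(2016, 1, 9), "C"),
    (date(2016, 2, 14), "U"),
    (date(2016, 3, 2), "B"),
]

# Hypothetical repair visits for the same vehicle: (date vehicle in, job type).
repair_visits = [
    (date(2016, 1, 15), "Transmission"),
    (date(2016, 3, 10), "Regular check"),
]


def label_windows(events, visits):
    """Pair each visit with the set of DTC types seen since the previous visit."""
    samples = []
    window_start = date.min
    for visit_date, job_type in sorted(visits):
        # Window is [window_start, visit_date): DTCs seen before the car came in.
        observed = {dtc for when, dtc in events if window_start <= when < visit_date}
        samples.append((observed, job_type))
        window_start = visit_date
    return samples


samples = label_windows(dtc_events, repair_visits)
for observed, job_type in samples:
    print(sorted(observed), "->", job_type)
```

Each `(observed DTCs, job type)` pair is one training example: the DTC set is the input and the job type is the label to predict.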
20. Predicting Job Type: a multi-class classification problem
[Figure: a grid labeled "Vehicle" and "Features", showing the codes DF 12 10, DF 12 15, DF 29 80, AB 10 29, AB 16 22, AB 16 25, AB 86 22, CT 34 02, CT 34 08, CT 35 60 and CT 24 09.]
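To make "multi-class classification" concrete: each vehicle becomes a row of features, and the model picks one of several job types. The sketch below uses a small Bernoulli Naive Bayes classifier written from scratch. That choice is an assumption made for illustration, not the model used in the talk, and the training data, `to_features`, `train` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Feature columns: one 0/1 flag per DTC type (Body, Chassis, Powertrain, Network).
DTC_TYPES = ["B", "C", "P", "U"]


def to_features(observed):
    """Encode a set of observed DTC types as a 0/1 feature row."""
    return [1 if dtc in observed else 0 for dtc in DTC_TYPES]


def train(samples):
    """Count, per job type, how often each DTC type was present."""
    class_counts = Counter(job for _, job in samples)
    feature_counts = defaultdict(lambda: [0] * len(DTC_TYPES))
    for observed, job in samples:
        for i, value in enumerate(to_features(observed)):
            feature_counts[job][i] += value
    return class_counts, dict(feature_counts)


def predict(model, observed):
    """Return the job type with the highest log-posterior score."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    row = to_features(observed)
    best_job, best_score = None, -math.inf
    for job, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for i, value in enumerate(row):
            p = (feature_counts[job][i] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if value else 1 - p)
        if score > best_score:
            best_job, best_score = job, score
    return best_job


# Made-up training samples: (DTC types seen before the visit, job type).
training = [
    ({"P"}, "Engine"),
    ({"P", "C"}, "Engine"),
    ({"U"}, "Transmission"),
    ({"U", "P"}, "Transmission"),
    ({"B"}, "Regular check"),
    ({"B"}, "Regular check"),
]

model = train(training)
print(predict(model, {"U"}))
```

With three possible job types this is a three-class problem; real data would have many more features and classes, and a library classifier would normally replace the hand-written one.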
22. Takeaways
• Diagnostic Trouble Codes (DTCs) are not always symptomatic of an ensuing repair.
• Hence, a rule-based approach that maps DTCs to repairs has been challenging to construct.
• A machine learning approach could be a better solution to infer the relationship between groups of DTCs and repairs.
• Become a mechanic and solve a few car repairs, or become a data scientist and solve millions!