4. What is Data Science?
"Data science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and
insights from data in various forms, both structured and
unstructured."
(Wikipedia)
Data Science is:
- About understanding what data says
- Analysis and Storytelling
- Based on math and statistics
Data Science is NOT:
- Big Data
- Robots
- Magic
5. Data Science in numbers
• The Data Science platform market is growing steadily and is projected
to reach $195.7 billion by 2023
• 61% of businesses said they implemented AI in 2017, up from just
38% in 2016
• 20% of all 4-year institutions in the U.S. have at least one analytics
program
• 2.35 million Data Science and Analytics job listings in 2015,
projected to grow to ~2.72 million by 2020
• Data scientist job openings on Glassdoor in 2016 had an average
salary of $116k.
Sources: Prescient & Strategic Intelligence, Narrative Science, Tableau, IBM, Glassdoor
7. Instead
• Listen carefully to our clients / stakeholders
• Explain where data science can and cannot
be applied
• Define the business metrics to influence
• Propose a type of solution
8. Then
• Understand the data and processes
• Translate the business problem into a data science
task
• Develop the solution code (mostly googling and
Stack Overflow)
• Productize the solution (god bless data
engineers)
9. Expectations
• Present the solution to client / stakeholder
• Deploy it
• Measure results
• Complete project
11. How do data scientists spend their
work time?
Data Preparation: 80%, Data Analysis: 20%
…not exactly… but:
Adapting the business problem (communication, research,
brainstorming): 80%, Data Analysis: 20%
14. Case study #1
Proactive inventory management
Problem: A large sports retail company lacks proper
stock planning.
[Figure: labels DEMAND, PRODUCT, STOCK]
15. Proactive inventory management
Use case: The company launches a brand-new model of sports
shoes. The supply team wants to know the precise distribution of
shoe sizes that needs to be delivered from the factory to the
stock.
[Figure: FACTORY, STOCK, SHOP; example quantities per shoe size: 100, 150, 160, 140, 80]
16. Proactive inventory management
Data Science Solution: A prediction model that recommends
the number of items needed to meet
customer demand.
[Figure: PREDICTION MODEL output, quantities per shoe size: 98, 154, 81, …]
17. HOW?
1. Collect historical sales data
2. Identify key descriptors / features of the product
3. Train a clustering algorithm that separates all products into
sub-categories
4. Apply the algorithm to a new product to determine which cluster it belongs to
5. Make a prediction based on the internal characteristics of that cluster
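Steps 3 and 4 can be sketched as follows. The slides do not name the clustering algorithm, so plain k-means is used here as an assumption, and the two product features (a scaled price and a scaled "sportiness" score) are hypothetical placeholders for whatever descriptors step 2 produces.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points are used as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical, already-scaled features of existing products: [price, sportiness].
existing_products = [[0.10, 0.20], [0.90, 0.80], [0.15, 0.25], [0.85, 0.90]]
centroids = kmeans(existing_products, k=2)

# Step 4: assign a new product to the cluster with the closest centroid.
new_product = [0.20, 0.30]
new_product_cluster = nearest(new_product, centroids)
```

In practice the number of clusters and the feature scaling would have to be tuned with the business team, as the results slide suggests.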
19. Once the cluster is detected,
make a prediction based on the
weighted average of
existing cluster members
[Figure: Shoes quantity distribution per size; x-axis: sizes 36 to 46; y-axis: quantity, 0 to 5000; series: Member 1, Member 2, Member 3, New member (prediction)]
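A minimal sketch of the weighted-average prediction. The slides do not say how the weights are chosen, so the weights below (for example, similarity of each member to the new product) and the quantities are hypothetical.

```python
def weighted_size_distribution(members, weights):
    """Weighted average quantity per size across existing cluster members.

    members: list of dicts mapping shoe size -> quantity sold
    weights: one non-negative weight per member (assumed, e.g. similarity)
    """
    total_weight = sum(weights)
    sizes = members[0].keys()
    return {
        size: sum(w * m[size] for m, w in zip(members, weights)) / total_weight
        for size in sizes
    }


# Hypothetical sales per size for two existing members of the detected cluster.
cluster_members = [
    {40: 100, 41: 200},
    {40: 300, 41: 400},
]
member_weights = [1.0, 3.0]  # the second member counts three times as much

predicted = weighted_size_distribution(cluster_members, member_weights)
```

With equal weights this reduces to a plain average of the cluster members.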
20. Results
1. The model is integrated into production
2. Improved stock planning (must be measured!)
3. Average prediction error is 2% (percentage difference
between predicted and actual values across all sizes)
4. The clustering algorithm is re-trained every month and tuned based
on communication with the business team
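The slides do not spell out the exact error formula. One common reading of "percentage difference between predicted and actual values across all sizes" is the mean absolute percentage error (MAPE), sketched here on hypothetical numbers.

```python
def mean_absolute_percentage_error(predicted, actual):
    """Average of |predicted - actual| / actual over all sizes, in percent."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical predicted and actual quantities for two shoe sizes.
error_percent = mean_absolute_percentage_error([98, 102], [100, 100])
```

Note that MAPE is undefined when an actual value is zero, so sizes with no sales would need separate handling.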
21. Case study #2
Flight ticket price predictor
Problem: During project engagements, consultants frequently
travel to the client site to deliver the best quality of service. The
most convenient transport is the plane, but flight ticket
costs consume a significant part of the project budget.
Aim: Reduce spending on flight tickets by choosing the optimal
offer
24. What if
• We could accurately forecast ticket price fluctuations up to the departure
date
• Choose the date with the lowest price and buy the ticket then
[Figure: Riga - Paris ticket price per day, 12-Jan to 24-Jan; price axis: 0 to 200; data labels as extracted: 80, 81, 81, 72, 81, 81, 85, 85, 90, 90, 120, 120, 180]
25. Why it may work?
[Figure: Riga - Paris ticket price per day, 12-Jan to 24-Jan, repeated from slide 24]
• Tickets tend to cost more as the departure date approaches, BUT!
• Prices are still fluctuating
• Widely cited insights like “book seven weeks in advance”, “the cheapest
day to buy a flight is Tuesday”
• Low demand
• Promotions
26. How?
1. Collect (web-scrape) historical data about price fluctuations on the
routes our consultants usually fly:
Date of collection  From  To         Departure Date  Carrier     Price
Dec 1               Riga  Paris      Jan 25          Air Baltic  80
Dec 1               Riga  Frankfurt  Jan 25          Lufthansa   92
Dec 2               Riga  Paris      Jan 24          Air Baltic  80
Dec 2               Riga  Frankfurt  Jan 24          Lufthansa   91
…                   …     …          …               …           …
27. How?
2. Generate additional features that may be good predictors:
• Days till flight
• Flight distance
• Weekday/Month of departure
• Departure/Arrival airport
• Segment of airline (low cost, premium)
• Number of days till public holiday from departure day
Date of collection  Days till flight  From  Dep airport  To         Arr airport  Flight dist  Dep date  Dep weekday  Days till public holiday  Carrier     Price
Dec 1               55                Riga  RIX          Paris      CDG          2241         Jan 25    Fr           84                        Air Baltic  80
Dec 1               55                Riga  RIX          Frankfurt  FRA          1205         Jan 25    Fr           84                        Lufthansa   92
Dec 2               53                Riga  RIX          Paris      CDG          2241         Jan 24    Thu          85                        Air Baltic  80
Dec 2               53                Riga  RIX          Frankfurt  FRA          1205         Jan 24    Thu          85                        Lufthansa   91
…                   …                 …     …            …          …            …            …         …            …                         …           …
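Several of these features can be derived from the scraped dates alone. The sketch below assumes the years 2018 and 2019 (the slides give only day and month) and uses April 19, 2019 as a hypothetical "next public holiday"; the function and field names are illustrative.

```python
from datetime import date

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def derive_date_features(collected, departure, next_public_holiday):
    """Build the date-based predictors from one scraped observation."""
    return {
        "days_till_flight": (departure - collected).days,
        "dep_weekday": WEEKDAY_NAMES[departure.weekday()],
        "dep_month": departure.month,
        "days_till_public_holiday": (next_public_holiday - departure).days,
    }


# First row of the table, with assumed years.
features = derive_date_features(
    collected=date(2018, 12, 1),
    departure=date(2019, 1, 25),
    next_public_holiday=date(2019, 4, 19),  # hypothetical holiday date
)
```

Flight distance, airports and airline segment would come from lookup tables rather than from the dates.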
28. How?
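3. Build a regression model that predicts ticket price for each collection day,
starting from the day data collection began:
Oct 1, 2017: $\beta_1 \cdot \mathit{Days\_till\_flight} + \beta_2 \cdot \mathit{From} + \dots + \beta_n \cdot \mathit{Carrier} = \mathit{price}$
Oct 2, 2017: $\beta_1 \cdot \mathit{Days\_till\_flight} + \beta_2 \cdot \mathit{From} + \dots + \beta_n \cdot \mathit{Carrier} = \mathit{price}$
…
Today, Dec 7, 2018: $\beta_1 \cdot \mathit{Days\_till\_flight} + \beta_2 \cdot \mathit{From} + \dots + \beta_n \cdot \mathit{Carrier} = \mathit{price}$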
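A minimal sketch of fitting one regression per collection day with ordinary least squares. The data are hypothetical, only two predictors are used (days till flight and a low-cost flag), and an intercept column is added as an assumption; the slide formula shows no intercept.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations update both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit coefficients")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, prices):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations grouped by collection day.
# Each feature row is [1.0 (intercept), days_till_flight, is_low_cost].
observations = {
    "Dec 1": ([[1.0, 55, 1], [1.0, 40, 0], [1.0, 20, 1], [1.0, 10, 0]],
              [80.0, 120.0, 150.0, 180.0]),
    "Dec 2": ([[1.0, 54, 1], [1.0, 39, 0], [1.0, 19, 1], [1.0, 9, 0]],
              [90.0, 132.0, 160.0, 192.0]),
}

# One coefficient vector per collection day, as on the slide.
coefficients_by_day = {day: fit_ols(x, y) for day, (x, y) in observations.items()}
```

Categorical predictors such as From or Carrier would need to be encoded numerically (for example, one-hot) before they can enter such a model.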
29. How?
4. Forecast the regression coefficients (using ARIMA, for example) until the
departure date of the desired tickets
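[Figure: timeline 1-Oct, 2-Oct, 3-Oct, 4-Oct, …, 7-Dec, 8-Dec, …, 24-Jan, 25-Jan; one series per coefficient (Beta1, Beta2, Beta3, …, BetaK, …, BetaN); the last segment is marked FORECAST; caption: "Coefficients that were calculated based on data collected today"]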
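To illustrate the idea without a full ARIMA implementation, the sketch below swaps in a first-order autoregression, AR(1) with a constant, fitted by least squares. This is a simplified stand-in, not the ARIMA procedure itself, and the coefficient history is hypothetical.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        return mean_curr, 0.0  # constant history: forecast its level
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def forecast_ar1(series, steps):
    """Iterate the fitted recursion forward from the last observed value."""
    c, phi = fit_ar1(series)
    forecasts, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


# Hypothetical daily history of one regression coefficient (for example Beta1).
beta1_history = [10.0, 6.0, 4.0, 3.0, 2.5]
beta1_forecast = forecast_ar1(beta1_history, steps=2)
```

Each coefficient would be forecast this way up to the departure date, and the forecast coefficients plugged back into the regression to estimate the price on each future purchase day.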
33. About me
• Vladyslav Yakovenko
• Senior Data Science consultant
at Deloitte
• Former Data Scientist at
Accenture
• Graduated with Master’s
degree in Statistics from Taras
Shevchenko National University
of Kyiv, Ukraine