The document discusses the importance of a data-driven culture for businesses. It provides the following key points:
1. Research has shown that companies that emphasize data-driven decision making have 5-6% higher productivity and output than comparable companies. This relationship also appears in other financial metrics like return on equity.
2. Data science draws from various fields like operations research, probability theory, analytics, and computer science. It is used for optimal decision making, handling uncertainties, generating insights from data, and implementing analytical solutions.
3. When adopting a data-driven approach, companies should focus on specific business goals and KPIs rather than just collecting data. Iterative testing is also important to measure impact.
2. WHY
Why “data-driven”
Brynjolfsson et al. (2011) on Data-Driven
Survey data on the business practices and IT investments of 179 large, publicly traded companies:
• Firms that emphasise “data driven decision making” have output and productivity that is 5-6% higher than what would be expected given other investments and IT usage.
• The relationship also appears in asset utilisation, return on equity and market value.
3. WHY
Data Science in business
Business acumen: what for
Operations Research: optimal decisions and actions
Probability theory: how to handle uncertainties
Analytics: insights and machine learning from data
Computer Science: how to implement all that
5. BASICS
Some dimensions
1. Business case
2. Analytical task
   1. Active - Passive system
   2. Informative - Operative aim
3. Modelling (model selection and fitting)
4. Data: structure, amount, velocity, and source
REAKTOR / JOHAN HIMBERG
FEBRUARY 2016
7. Beware of empty “data-speak”
A quote from my colleague Janne Sinkkonen, from a presentation at a Helsinki University machine learning course:
“Data-speak” hides the processes behind data.
What creates the data? What is done with the results?
The goal is not “data analysis”
Define your goal and setup without using the word ‘data’.
9. BUSINESS CASE
Operations
Create beneficial events
• marketing: targeting, cross-sell, up-sell, conversion
• find right product/service to sell or buy, find a good doctor, expert etc.
Avoid non-beneficial events
• churn, people leaving, waste, credit loss, fraud, system failures, …
Optimize
• customer value, work force, schedules, prices, discounts, stocks, relevancy for customer, production quality, speed
Rationalise
• process efficiency, lead times, handle complexity, search time, …
Understand: customer & product base, transactions, or processes
• internally: ERP, CRM, HR, sales systems, production, …
• externally: location, routes, weather, demographics, estates, …
10. BUSINESS CASE
Strategic
Efficiency and competition
React faster, streamlined decision making, risk awareness
Financial efficiency
Innovations
Well-informed strategic decisions
Understanding customer groups’ needs for product and service development
Understanding and predicting world events, economics, demographics, …
React to market fluctuation or changes in financial environment
Internal and external image and culture
Transparency, learning as a part of company culture
Customer satisfaction, personalisation, brand
11. VIRTUES
Example: Netflix
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
14. BASICS
Informative - Operative
Informative (for understanding)
Analysis results that help people understand things and help management make decisions: reports, predictions, what-if analyses, simulations, visualisations, …
Operative
An automated system that makes decisions based on rules or models, or results that can be acted on directly even when nothing is automated.
15. BASICS
Active - Passive
Active
You make an “intervention” and gather evidence in tests designed to reveal an effect.
Example: A/B testing.
Passive
Data is just collected, captured “as it happens”: customer transactions, sales, web-browsing, tweets.
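The A/B testing example of an active system can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from the slides: it assumes a two-variant test evaluated with a pooled two-proportion z-test, and the function name and visitor counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical numbers: 10,000 visitors per variant.
lift, z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.3f}")
```

The point of the intervention is that the two groups differ only in the variant they saw, so a difference in conversion can be attributed to the change. Passively collected data does not give that guarantee.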
16. BASICS
Use cases
The use cases fall on two axes: Passive - Active (system) and Informative - Operative (aim).
Descriptive (informative, passive): What has happened?
• Customer profiles
• Customer segmentation
• Shopping cart analysis
Diagnostic (informative, active): Why did it happen?
• Marketing impact analysis
• Price elasticity analysis
• Web design testing
Predictive (operative, passive): What will happen?
• Up-sell/cross-sell
• New customer acquisition
• Churn prediction
• Life-time value prediction
• Demography prediction
Prescriptive (operative, active): What should I do?
• Marketing impact optimisation
• Recommendation system in a dynamic environment
18. RISKS / PROBLEMS
Issues by analytics use case
Descriptive (informative, passive)
• isolated / ad hoc reports
• isolated ad hoc decisions
• feedback loop (report - decision - effect)
• ignoring statistics
• analysts as SQL monkeys
• UI / visualization
Diagnostic (informative, active)
• statistical skills
• testing and organisation
• correlation vs. causality
• requires lots of communication
Predictive (operative, passive)
• what to predict: how to quantify the target
• access to historical data
• quantifying and understanding the risk(s)
• prediction accuracy validation for future
Prescriptive (operative, active)
• what to optimize?
• complex software system
• technical feedback loop
• co-op between “human” and “artificial intelligence”
• monitoring
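One of the predictive issues listed above is validating prediction accuracy for the future. A minimal sketch of one common way to approach it, assuming a time-based split where a model is fitted only on records before a cutoff date and checked on later ones. The `temporal_split` helper and the monthly records are hypothetical.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split (date, value) records into training data before the cutoff
    and validation data from the cutoff onwards."""
    train = [r for r in records if r[0] < cutoff]
    validation = [r for r in records if r[0] >= cutoff]
    return train, validation

# Hypothetical monthly sales records for one year.
records = [(date(2015, month, 1), 100 + 5 * month) for month in range(1, 13)]
train, validation = temporal_split(records, cutoff=date(2015, 10, 1))
print(len(train), len(validation))
```

A random split would let the model see records from the same period it is later validated on, which tends to overstate how well it will do on genuinely new data.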
19. RISKS / PROBLEMS
Examples…
• Focusing on wrong things
  • not recognising the analytics use cases
  • “data first”: long time from investment to benefits
  • not starting from the beef: actions and decisions
  • thinking only IT solutions and products
  • careful examination and validation of the algorithms, but not setting targets and risks according to the business target
• Organisation
  • silos: communication through hierarchy
  • no access to data, internal politics
  • technical details decided by business people
  • business criteria set by technical people
20. RISKS / PROBLEMS
…more examples
• Underestimating complexity (time & scope)
  • both software and analytics need to be built simultaneously
  • the time and effort needed for “data wrangling”
  • the time used for UIs and visualisations
  • the feedback loop
• Unrealistic expectations (quality)
  • of analytical systems in general (they are not that intelligent); rules are needed
  • that a product, a model, an algorithm, or a data scientist solves all the problems
  • risks and targets cannot always be defined properly right away
  • there is no guarantee of accuracy in a particular case before trying
22. VIRTUES
Culture that helps to handle risk
Wise: solve the right problems with analytics!
Determined: aim at specific, concrete things
Curious: be ready to divert, seek evidence
Bayesian: understand uncertainties and risks
Truthful: don’t bend results to fit wishes; it’s data science
Courageous: act on evidence
Active and Agile: test, don’t just observe; inspect - adapt - learn
Transparent and Helpful: co-operate from end to end, don’t silo
24. VIRTUES
Aim at the right things
Netflix Prize competition (2006-2009)
Who gets the best RMSE (root mean squared error) on true user ratings?
BUT
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy."
"Netflix Prize objective... is just one of the many components of an effective recommendation system... We also need to take into account factors such as context, title popularity… Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
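The gap between the competition metric and the business goal on this slide can be illustrated with a toy calculation. This is a minimal sketch with made-up ratings, not Netflix's evaluation code: it assumes RMSE as the accuracy measure and a plain sort by predicted score as the ranking.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long lists of ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def ranking(scores):
    """Item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical ratings of five titles by one member.
actual = [5.0, 4.0, 3.0, 2.0, 1.0]
model_a = [4.0, 5.0, 3.0, 2.0, 1.0]  # small errors, but the top two titles are swapped
model_b = [4.0, 3.0, 2.0, 1.0, 0.0]  # every rating is off by one, order is preserved

for name, predicted in [("A", model_a), ("B", model_b)]:
    same_order = ranking(predicted) == ranking(actual)
    print(name, round(rmse(actual, predicted), 3), same_order)
```

In this toy case model A has the better RMSE while model B produces the better ordering, so which model is "best" depends on whether the aim is rating accuracy or ranking.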
25. VIRTUES
Curiosity
Always aim at something specific … but be open-minded and curious
Example: Röntgen and Fleming (Nobel laureates)
• their most famous findings were “accidental”, but
• they were skilled scientists doing disciplined research for some other aim
Explore occasionally “from data to insights”. But not aimlessly.
If you find something interesting, make a disciplined analysis, preferably a test.
27. VIRTUES
Understanding probabilities
The main ingredients of data science!
Making decisions based on data analysis requires the concepts of risk and probability.
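A small worked example may help show why risk and probability are needed for a decision. The following is a minimal sketch under invented assumptions: the two options, their probabilities, and the payoffs in euros are all hypothetical, and expected value is only one possible decision criterion.

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical decision: offer a retention discount to a customer, or do nothing.
# Each option lists (probability, payoff in euros) for "stays" and "leaves".
options = {
    "offer discount": [(0.80, 90.0), (0.20, -10.0)],
    "do nothing": [(0.60, 100.0), (0.40, 0.0)],
}

for name, outcomes in options.items():
    worst_case = min(payoff for _, payoff in outcomes)
    print(f"{name}: expected={expected_value(outcomes):.1f}, worst case={worst_case:.1f}")

best = max(options, key=lambda name: expected_value(options[name]))
print("highest expected value:", best)
```

Here the discount has the higher expected payoff but also the worse worst case, which is the kind of trade-off between expected benefit and risk that the decision maker has to weigh.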
29. Courage
“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making.”
- Wikipedia 2016-02-24
“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”
- Fictional character Dr. House in the Fox television series “House”
31. Agile - Transparent
Doing data-driven work and data science in any organisation model boils down to
“Involve everyone along the information path”
Agile development - Team decides details
Start from
• concrete actions that can be optimized
• decisions they require, and
• how to measure the effects properly
Remember the feedback loop!
Develop constantly
Lecture @AaltoBIZ, Johan Himberg, 2015
32. Think & plan from deployment to data
Pick an aim!
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
Action: optimize, decide, deploy
For example
• Automated decisions; recommendation, targeting
• Simulation
• prescriptive, predictive modelling
Information: report, visualize, model
For example
• documentation on meaning of the data
• KPIs, profiles, segments, factors, DW dashboards
• descriptive, diagnostic, predictive modelling
Data: big, small, open, local, web, meta, …
For example
• source integrations
• Extract - Load - Transform
• Metadata
• modelling for cleansing & consistency
[Diagram labels: wrangling, modelling, testing; what data means, what are the insights, what are the actions, what is the impact]
33. Data-driven work is inherently iterative and benefits from agility.
Data and processes are often not as assumed.
Be curious, keep a backlog, inspect, adapt.
Business drivers: aim 1, “start from here!”, aim 3, aim 4, aim 5
Action, for example
• Business: need optimising for customer retention
• Marketing: we could start with special offer by SMS
• Data Scientist: we’ll set up test & control groups!
Information, for example
• Solution expert: Field ZPOR means revenue per unit and it is calculated based on …
• Customer transactions are not in Data Warehouse, they’re aggregated on monthly level - Let’s get daily data from system Z
Data, for example
• Now we have transactions for 1M users for 1 yr fields a,b,c,d,e …
• …
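The data scientist's suggestion on this slide, setting up test and control groups, could look roughly like the following. This is a minimal sketch, assuming simple random assignment with a fixed seed. The `split_test_control` helper, the customer ids, and the 20% test fraction are hypothetical.

```python
import random

def split_test_control(customer_ids, test_fraction=0.5, seed=42):
    """Randomly assign customers to a test group and a control group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[:n_test], shuffled[n_test:]

# Hypothetical customer ids; the test group would get the SMS offer,
# the control group would not.
customers = [f"c{i:04d}" for i in range(1000)]
test_group, control_group = split_test_control(customers, test_fraction=0.2)
print(len(test_group), len(control_group))
```

Recording who was in which group is part of the feedback loop: without it, the effect of the campaign on retention cannot be measured afterwards.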
34. THE LOOP: results
Execute based on model, collect data
Action, for example
• deploy campaign, collect responses
Information, for example
• calibrate & apply model
Data, for example
• get data for modeling
• store results
35. Information path focused backlog
Action, backlog example
• test & control group handling in marketing automation
• Involve N.N. in the process
Information, backlog example
• define new information source
• Look for a new data source for determining income on zip code areas
• correct documentation
• automatization for the campaign modelling
Data, backlog example
• better system configuration & architecture
• automatization for the campaign process…
• new data: record information on all campaigns
36. Don’t silo
• A change of culture: information (not data) is everybody’s business, just as money is
• One data scientist can’t excel at all of this:
  • PO / Technical Account Manager
  • Business specialist
  • Solution owner / process owner
  • Data Steward
  • Developer
  • Visualization / UX expert
37. Data Scientists’ special role
• Data scientists’ main tasks are in methods, but also in the processes and machinery of
  • making evidence based decisions (automated if possible)
  • finding out confidence on the outcome (by active tests if possible)
  • getting insights based on models and data
• Data scientists often act as “glue”.
39. Technology
• Different analytical tasks need different tools. One has to integrate different systems. Remember that you need a feedback loop!
• Prefer systems
  • that give mass access to historical, transactional data on the individual level instead of just aggregates (avoid being “blinded by averages”)
  • from which you can get the data, transformations, and results out to another system (avoid being a “data hostage”)
  • where you can see what the analytics actually does, at least on a modular level (avoid being a “method hostage”). Prefer being able to see the actual implementation (open source).
• Pick a product when you know the task, your needs, and the product quality.
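The warning about being “blinded by averages” can be shown with a tiny example. This is a minimal sketch with invented numbers: two hypothetical customer segments have the same average monthly spend, and only the individual-level data reveals how different they are.

```python
from statistics import mean

def active_customers(spend):
    """Number of customers with any spend at all."""
    return sum(1 for s in spend if s > 0)

# Hypothetical monthly spend (euros) for individual customers in two segments.
segment_a = [48, 50, 52, 49, 51]
segment_b = [0, 0, 0, 0, 250]

for name, spend in [("A", segment_a), ("B", segment_b)]:
    print(f"segment {name}: mean={mean(spend)}, "
          f"active customers={active_customers(spend)}/{len(spend)}")
```

A system that only exposes the aggregate would report the two segments as identical, although one depends on a single customer.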
40. References
• Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486
• Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
• Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917
• Data science skills
• http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
• http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html