4. 4
IN REALITY
It is very easy to start using Hadoop
and Cloud now.
So it is true that most people are now doing
traditional things, just with larger data
sets.
And at much lower cost, of course.
So it looks like size matters, and this
is just another technology
5. 5
BUT IT IS …
Completely new mindset and
approach to analytics
Solution to satisfy new, “mass
market” analytics
And you cannot skip it
6. 6
YOU CAN FEEL THIS AS …
Developers (Java, .NET etc.), non-BI
and even non-IT people talk about and
work with analytics today.
That was not the case before.
So what happens?
7. 7
TRADITIONAL ANALYTICS
Expensive
Separate and isolated BI world
Analyzing transactions (data you
cannot afford to lose or calculate
with errors)
Historical data and strategic decisions
8. 8
AND TODAY THIS IS …
Very small % of analytics (1-5%?)
Analytics Boom
9. 9
EVERYTHING IS ABOUT DATA
Mindset: Data Analysis
not OLTP, DWH, ETL
Kimball/Inmon
Any application: UX+Analytics
(e.g. Machine Learning)
Competing on analytics, not just
product and service
Analytics becomes operational,
mass market
10. 10
THE NEXT BIG SHIFT?
Digital Transformation of Economy
IoT, VR, AR, Machine Learning, AI
Personalized UX
Heavily relies on analytics
11. 11
ANALYTICS TODAY
Fast, Advanced and Predictive
Analytics
o Personalization and customization: from
summary reports to a lot of tailored
data-driven actions (in near real time)
o Fast prototyping, implementation,
deployment and fast performance
o Data lakes
12. 12
EXAMPLE - YESTERDAY
Company sends a promo by email to
1M users, paying $1 for each email;
50,000 users purchase goods at
$25
Profit: 50,000 * $25 - $1M =
$250,000
This is what traditional analytics
does.
13. 13
EXAMPLE - TODAY
Company identifies just 100,000 users to
send the promo email to (at $1 each); now
30,000 users purchase goods at $25
Profit: 30,000 * $25 - $100K =
$650,000
No new customers, no new
contracts – just algorithms and more
data
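The arithmetic behind the "yesterday" and "today" examples can be checked with a few lines of Python. This is only a sketch: the function name campaign_profit and its parameter names are made up for illustration, and the $1 per email cost for the targeted campaign is inferred from the $100K figure on the slide.

```python
def campaign_profit(emails_sent, buyers, price=25.0, cost_per_email=1.0):
    """Revenue from buyers minus the cost of sending the emails."""
    return buyers * price - emails_sent * cost_per_email


# Yesterday: mass mailing to 1M users, 50,000 purchases.
yesterday = campaign_profit(emails_sent=1_000_000, buyers=50_000)

# Today: targeted mailing to 100,000 users, 30,000 purchases.
today = campaign_profit(emails_sent=100_000, buyers=30_000)

print(f"Yesterday: ${yesterday:,.0f}, conversion {50_000 / 1_000_000:.0%}")
print(f"Today:     ${today:,.0f}, conversion {30_000 / 100_000:.0%}")
```

The targeted campaign has fewer buyers in total, but the conversion rate rises from 5% to 30% and the mailing cost drops tenfold, which is where the extra profit comes from.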
14. 14
USE CASES
o Anomaly Detection
o Recommendation Systems
o Loyalty and Retention Programs
o Optimization
o A/B Testing
o Alarms, Scoring, Diagnosis
o Demand Forecasting and so on.
15. 15
NEW CORE SKILLS
Distributed Data Processing and
Streaming Analytics
Programming (Python, R, Spark)
Math, Statistics
Machine Learning
Deep Learning
16. 16
MACHINE LEARNING
Automation of discovery
Automatically adapt to new
circumstances
Detect patterns
In wide use now. “Self-testing”.
Few lines of code
17. 17
BUILDING BLOCKS
Enriching analysis, development and
quality in software development
o Generic algorithms vs hardcoding
endless IF-ELSE
o Discovering hidden, not obvious
patterns
o Finding anomalies, outliers vs test
cases
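As a rough illustration of "generic algorithms vs hardcoding endless IF-ELSE", the sketch below flags outliers with one generic statistical rule instead of a hand-written threshold per metric. The rule used here (a modified z-score based on the median and the median absolute deviation) is one common choice, not necessarily what the author had in mind; the function name, the threshold of 3.5 and the sample data are all assumptions.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme value does not distort the baseline.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All values are (almost) identical: nothing to flag with this rule.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical response times in milliseconds; no per-metric IF-ELSE needed.
response_times_ms = [102, 98, 105, 110, 97, 101, 99, 950, 103, 100]
outliers = find_outliers(response_times_ms)
print(outliers)
```

The same function works unchanged for any numeric metric, whereas a hardcoded check such as "if response time > 500" has to be rewritten and retuned for every new data source.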
18. 18
BI TOOLS NOW
Self-service (fewer jobs?)
Advanced analytics (requires
understanding stats and machine
learning fundamentals)
19. 19
SOURCE DATA
Non-transactional systems, weak or
no data model
Calculations with probability
Raw, unstructured data from
diverse data sources
Extracting small relevant pieces of
data from huge data sets
25. 25
TRADITIONAL EDW PLATFORMS
o Too expensive ($10,000 per TB and more)
o Large upfront cost
o Not easy procurement, setup and
maintenance
o Designed for relational data, SQL interface
only, limited schema flexibility
o Data must be loaded first (modeled,
prepared and moved)
o Marketing limitations for Appliances
26. 26
TRADITIONAL OPEN SOURCE PLATFORMS
• Designed for relational data, SQL interface
only, limited schema flexibility
• Data must be loaded first (modeled,
prepared and moved)
• Not easily scalable (scale up and down)
27. 27
TRADITIONAL DATA MINING TOOLS
• Expensive
• Smaller community (one more isolated
world)
• Targeted for enterprise users
• Longer release cycles, no way to mix tools
and try fresh new stuff etc.
• Scalability and integration issues
28. 28
WHY BIG DATA AND CLOUD
o Extremely economically attractive
o Scalable and elastic
o Self service
o Rich and diverse data tools
o Good enough quality (and
constantly improving)
29. 29
BIG DATA AND CLOUD DESIGN PRINCIPLES
Decoupling Data Storage and Computing
o Database engine does not own data anymore
o Simplified load/extract
o Schema on read
o Not just SQL interface
o Any computing engines on top of data
Commodity Hardware
o Fault tolerant
Scale up and down
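"Schema on read" can be sketched in plain Python: the raw records are stored as they arrive, and each consumer applies its own schema only when reading. Everything below (the sample events, the read_with_schema helper, the two schemas) is hypothetical and stands in for what engines on top of shared storage do at much larger scale.

```python
import json

# Raw events are stored as-is; records do not share a fixed set of fields.
raw_events = [
    '{"user": "u1", "amount": "25.00", "channel": "email"}',
    '{"user": "u2", "amount": "12.50"}',
    '{"user": "u3", "device": "mobile"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field -> (cast, default)) while reading raw JSON lines."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# Two consumers read the same stored data through different schemas.
revenue_schema = {"user": (str, None), "amount": (float, 0.0)}
channel_schema = {"user": (str, None), "channel": (str, "unknown")}

revenue_rows = list(read_with_schema(raw_events, revenue_schema))
channel_rows = list(read_with_schema(raw_events, channel_schema))
total = sum(row["amount"] for row in revenue_rows)
print(total)
print(channel_rows)
```

Nothing had to be modeled or loaded up front: adding a third view of the same data only requires a new schema, not a new load.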
30. 30
GROWTH PATH
From monolithic suites to diverse and rich tool set
SQL tools on Hadoop, Cloud
Advanced Data Analysis and Analytics
o Spark, MapReduce, NoSQL
o Python, R, Java, Scala
o Statistics
o Batch, Streaming, Real-time
Machine Learning and Deep Learning
o Understand use cases
o Understand specific algorithms and their
application
o Implementation
32. 32
LET’S WIN THIS CAR
Suppose you're on a game show, and
you're given the choice of three
doors:
Behind one door is a car; behind the
others, goats.
You pick a door, say No. 3
33. 33
SWITCH OR NOT?
Then the host, who knows what's
behind the doors, opens another
door, say No. 2, which has a goat.
He then says to you, "Do you want
to pick door No. 1?"
Is it to your advantage to switch
your choice?
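One way to approach the question is to simulate the game many times and compare the two strategies. The sketch below assumes the standard rules: the host always opens a door that hides a goat and is not the player's pick. The function names, trial count and seed are arbitrary choices for illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning for a given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(f"Stay:   {win_rate(switch=False):.3f}")
print(f"Switch: {win_rate(switch=True):.3f}")
```

With enough trials the estimates should come out near 1/3 for staying and 2/3 for switching, so the simulation suggests that switching is to your advantage.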