Five Reasons Enterprise Adoption Of Spark Is Unstoppable
Mike Gualtieri, Principal Analyst
February 17, 2016 New York
Twitter: @mgualtieri
#Customers
1. Customer experience is a top priority for enterprises.
© 2015 Forrester Research, Inc. Reproduction Prohibited 4
Improve the experience of our customers: 75%
Improve our products/services: 73%
Reduce costs: 66%
Improve our ability to innovate: 65%
Address rising customer expectations: 64%
Increase influence and brand reach in the market: 64%
Improve differentiation in the market: 58%
Better comply with regulations and requirements: 54%
Create a comprehensive digital marketing strategy: 53%
Create a comprehensive strategy for addressing digital technologies like mobile, social & smart products: 53%
Better leverage big data and analytics in business decision-making: 52%
A strong majority of business leaders prioritize
improved customer experience and products.
› Base: 3,005 global data and analytics decision-makers
› Source: Global Business Technographics Data And Analytics Online Survey, 2015
[Figure: Customer experience, 1800 to 2015: personal relationships (for you), mass relationships (for all), demographic relationships (for segments), and hyper-personal, real-time relationships (for you)]
Customers want and increasingly expect
to be treated like celebrities.
Celebrity experiences must:
• Learn individual customer characteristics and behaviors (understanding)
• Detect customer needs and desires in real time (context)
• Adapt applications to serve an individual customer (experience)
Fortunately, every industry is graced with more data
› Richer transactional data from a portfolio of hundreds of business applications
› Usage and behavior data from web and mobile apps
› IoT device sensor and event data
› Social media data
› Log data
› Data economy – firms buying and selling data
Enterprises have plenty of data from both internal and external sources
Using your best estimate, what is the size of all data stored within your company?
10-49 terabytes: 5%
50-99 terabytes: 12%
100-500 terabytes: 54%
Greater than 500 terabytes: 29%
What % of the data available is from internal business applications (ERP and business applications) versus external sources (social, IoT)?
Internal business data: 49%
External source data: 51%
Source: Forrester Research, September 2015
Base: 100 US managers and above currently using Hadoop for processing and analyzing data.
Four kinds of analytics are necessary
[Figure: Learn, Model, Detect, Adapt cycle mapped to descriptive, predictive, streaming, and prescriptive analytics (advanced analytics), on a scale from batch to real time]
Most firms invest in batch analytics. They must invest in real-time analytics too.
Source: Forrester Research
That’s why use of advanced analytics is surging
“What is your firm's/business unit's current use of the following technologies?”
Technology: 2015 / 2014
Reporting: 69% / 69%
Performance analytics: 57% / 50%
Dashboards: 56% / 50%
Web analytics: 55% / 54%
Embedded analytics: 54% / 53%
Process analytics: 52% / 43%
Predictive analytics: 48% / 31%
Location analytics: 46% / 35%
Text analytics: 43% / 22%
Advanced visualization: 43% / 22%
OLAP: 42% / 34%
Metadata generated analytics: 42% / 31%
Streaming analytics: 42% / 24%
Search/interactive discovery: 42% / 19%
Non-modeled data exploration and discovery: 39% / 19%
Source: Forrester's Global Business Technographics Data And Analytics Survey, 2015 and 2014
Base: 1805 (2015), 1063 (2014)
Most of your competitors still haven’t started!
#Hadooponomics
2. Hadoop and friends make analytics of all kinds cost-effective at scale.
100%
The share of enterprises that Forrester estimates will adopt Hadoop and friends!
Hadoop is designed for volume.
Spark is designed for speed.
Spark and Hadoop can coexist in the same
cluster.
#Perishable
3. Perishable insights must be captured and used before they expire (or rot).
Perishable insights can have exponentially more
value than sleepy, after-the-fact traditional
historical analytics.
All data is born fast!
[Figure: data streams from customer data, transactions, the data warehouse, and IoT]
But analytics is usually done much later.
#WhyWait
How can you prevent this dude from fleecing
you right now?
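Answering "right now" means scoring each event as it arrives instead of in a later batch. As a minimal sketch of that streaming idea, the plain-Python detector below flags a card used more than a set number of times within a sliding time window. It is not Spark Streaming code; the class name `VelocityDetector`, the three-uses-per-60-seconds rule, and the sample events are hypothetical, and timestamps are assumed to arrive in order.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flag a card that is used too many times within a sliding time window.

    Hypothetical rule for illustration: more than `max_events` uses of the
    same card within `window_seconds` is treated as suspicious.
    Assumes timestamps for a given card arrive in non-decreasing order.
    """

    def __init__(self, max_events=3, window_seconds=60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        # Per-card queue of recent event timestamps, oldest first.
        self._recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        """Record one transaction and return True if it looks suspicious."""
        events = self._recent[card_id]
        events.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_events


if __name__ == "__main__":
    detector = VelocityDetector(max_events=3, window_seconds=60.0)
    stream = [("card-1", 0), ("card-1", 10), ("card-2", 15),
              ("card-1", 20), ("card-1", 30), ("card-1", 200)]
    for card, ts in stream:
        if detector.observe(card, ts):
            print(f"ALERT: {card} at t={ts}s")
```

With the sample stream, only the fourth use of `card-1` within 30 seconds raises an alert; the use at t=200 does not, because the earlier events have aged out of the window by then.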
What offers should you make to your customer if
they are within proximity of your store right now?
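Proximity offers likewise depend on acting on fresh context. One common way to test "within proximity of your store" is a geofence: compute the great-circle (haversine) distance between the customer's reported position and the store, and trigger an offer inside a chosen radius. This sketch assumes a hypothetical store location, a 200-meter radius, and a single offer string; in practice, positions would arrive from a mobile app event stream.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def offer_if_nearby(customer_pos, store_pos, radius_m, offer):
    """Return the offer when the customer is inside the store's geofence, else None."""
    distance = haversine_m(*customer_pos, *store_pos)
    return offer if distance <= radius_m else None


if __name__ == "__main__":
    store = (40.7580, -73.9855)  # hypothetical store location
    for customer in [(40.7590, -73.9845), (40.7680, -73.9855)]:
        print(customer, offer_if_nearby(customer, store, 200, "10% off today"))
```

The first sample customer is roughly 140 meters from the store and receives the offer; the second is about 1.1 kilometers away and does not.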
A Resilient Distributed Dataset (RDD) is a generalized data structure that can cache data in memory and spool to disk if necessary.
58,000x
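The cache-in-memory, spool-to-disk behavior can be pictured with a toy partition cache. It keeps the most recently used partitions in memory up to a budget, pickles evicted ones to a temporary directory, and reads them back on demand. This is a conceptual sketch in plain Python, not Spark's implementation; the class `MemoryAndDiskCache` and its partition-count budget are hypothetical (Spark budgets memory in bytes and can also recompute lost partitions from lineage).

```python
import os
import pickle
import shutil
import tempfile
from collections import OrderedDict


class MemoryAndDiskCache:
    """Toy partition cache: recent partitions stay in memory, the rest spill to disk.

    Illustrative only; not Spark's implementation. The budget is a hypothetical
    count of partitions, whereas Spark budgets memory in bytes.
    """

    def __init__(self, max_in_memory=2):
        self.max_in_memory = max_in_memory
        self._memory = OrderedDict()          # partition_id -> rows, least recently used first
        self._on_disk = {}                    # partition_id -> path of the spilled file
        self._spill_dir = tempfile.mkdtemp()  # scratch directory for spilled partitions

    def put(self, partition_id, rows):
        """Cache a partition in memory, spilling least recently used ones past the budget."""
        stale = self._on_disk.pop(partition_id, None)
        if stale is not None:
            os.remove(stale)  # the in-memory copy supersedes any spilled copy
        self._memory[partition_id] = rows
        self._memory.move_to_end(partition_id)
        while len(self._memory) > self.max_in_memory:
            evicted_id, evicted_rows = self._memory.popitem(last=False)
            path = os.path.join(self._spill_dir, f"part-{evicted_id}.pkl")
            with open(path, "wb") as f:
                pickle.dump(evicted_rows, f)
            self._on_disk[evicted_id] = path

    def get(self, partition_id):
        """Return a cached partition, reading it back from disk if it was spilled."""
        if partition_id in self._memory:
            self._memory.move_to_end(partition_id)  # mark as most recently used
            return self._memory[partition_id]
        path = self._on_disk[partition_id]  # KeyError if the partition was never cached
        with open(path, "rb") as f:
            rows = pickle.load(f)
        self.put(partition_id, rows)  # promote back into memory; put() deletes the spilled file
        return rows

    def location(self, partition_id):
        """Report where a partition currently lives: 'memory', 'disk', or 'missing'."""
        if partition_id in self._memory:
            return "memory"
        return "disk" if partition_id in self._on_disk else "missing"

    def close(self):
        """Delete the scratch directory and everything spilled into it."""
        shutil.rmtree(self._spill_dir, ignore_errors=True)


if __name__ == "__main__":
    cache = MemoryAndDiskCache(max_in_memory=2)
    for pid in range(3):
        cache.put(pid, [pid * 10 + i for i in range(3)])
    print(cache.location(0), cache.location(2))  # disk memory
    print(cache.get(0))                          # [0, 1, 2]
    print(cache.location(0), cache.location(1))  # memory disk
    cache.close()
```

In Spark itself you request this behavior rather than build it, for example with `rdd.persist(StorageLevel.MEMORY_AND_DISK)` in PySpark.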
Spark data processing jobs run exponentially
faster when the data set fits in memory.
Why not just pop your data in-memory?
73% are planning, implementing, or expanding the use of an in-memory data platform.
Base: 1,805 global data and analytics decision-makers
Source: Forrester Global Business Technographics Data And Analytics Online Survey, 2015
#MMLA
4. Massive Machine Learning Automation (MMLA) is the future of data science.
Massive Machine Learning Automation (MMLA)
is the only competitive way forward.
Data scientists have slogged through the
same iterative process for 20 years
MASSIVE MACHINE LEARNING AUTOMATION
Tools and technologies that automate, through configuration rather than coding, the process of data preparation, model building using statistical and machine learning algorithms, model evaluation, and model monitoring at scale.
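To make "configuration rather than coding" concrete, here is a minimal sketch of the loop the definition describes. A config dictionary names the candidate models and the holdout split; one generic function prepares the data, fits every candidate, and evaluates each on held-out rows; a separate check monitors the chosen model against new data. It is plain Python with two deliberately tiny models (a mean baseline and one-feature least squares). The config keys, the model registry, and the function names are hypothetical and do not describe any particular MMLA product.

```python
from statistics import mean


def prepare(rows):
    """Data preparation: drop rows with a missing feature or target."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_mean(train):
    """Baseline model: always predict the mean target."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_linear(train):
    """Ordinary least squares for one feature, y = a + b * x (assumes x is not constant)."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


# Hypothetical registry: adding a model means registering its fitting function once.
MODEL_REGISTRY = {"mean": fit_mean, "linear": fit_linear}

# Hypothetical configuration: which candidates run and how data is split.
CONFIG = {"train_fraction": 0.8, "candidates": ["mean", "linear"], "max_error": 1.0}


def mse(model, rows):
    """Model evaluation: mean squared error on the given rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


def run_pipeline(rows, config):
    """Prepare, fit every configured candidate, and keep the lowest holdout MSE."""
    clean = prepare(rows)
    # Positional split (no shuffling) keeps this illustration deterministic.
    split = int(len(clean) * config["train_fraction"])
    train, holdout = clean[:split], clean[split:]
    models, scores = {}, {}
    for name in config["candidates"]:
        models[name] = MODEL_REGISTRY[name](train)
        scores[name] = mse(models[name], holdout)
    best = min(scores, key=scores.get)
    return best, models[best], scores


def needs_retraining(model, recent_rows, max_error):
    """Model monitoring: flag the model when its error on recent data exceeds a limit."""
    return mse(model, recent_rows) > max_error


if __name__ == "__main__":
    rows = [(x, 2 * x + 1) for x in range(10)] + [(None, 5), (3, None)]
    best_name, best_model, scores = run_pipeline(rows, CONFIG)
    print(best_name, scores)
    print(needs_retraining(best_model, [(10, 30)], CONFIG["max_error"]))
```

Once a fitting function is registered, changing which models run or how the data is split is an edit to `CONFIG` rather than new code, which is the distinction the definition draws.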
The seven characteristics of massive machine
learning automation.
5. The Spark community is diverse and innovating fast.
Only the analytical enterprise can compete and win in the age of the customer
[Figure: Learn, Model, Detect, Adapt cycle mapped to descriptive, predictive, streaming, and prescriptive analytics, running in real time and in continuous batch]
#Insights
"I need insights."
"You shall have none until you build a continuous analytics pipeline."
Generate industrial-strength analytics with Spark and Hadoop
forrester.com
Thank you
Mike Gualtieri
mgualtieri@forrester.com
Twitter: @mgualtieri
