SlideShare a Scribd company logo
1 of 37
Download to read offline
APACHE SPARK
SURVEY 2016
REPORT
® ™
Table of Contents
Introduction	3
Foreword: Matei Zaharia	 4
REPORT HIGHLIGHTS	 5
APACHE SPARK’S GROWTH CONTINUES	 13
The Apache Spark Community is Growing 	 14
Spark’s Fastest Growing Areas from 2015 to 2016 	 17
Spark Users are Growing 	 18
Spark Users Employ Multiple Languages 	 19
Spark Components Used in Production	 20
Spark is Used Widely in Organizations	 21
Users Solve Complex Problems	 22
Users Employ Multiple Components	 23
What Users Consider Important	 24
Top Three Storage Technologies 	 25
Section Summary	 26
APACHE SPARK IN THE CLOUD IS GROWING	 27
Trend: Increase in Public Cloud Deployments	 28
Trend: Percentage Decrease in On-Premises Deployments 	 29
Section Summary	 30
APACHE SPARK STREAMING AND MACHINE LEARNING
SURGE IN USAGE 	31
Apache Spark Streaming is Growing	 32
Apache Spark Streaming Engine is the Preferred Choice	 34
Section Summary	 35
Afterword: Reynold Xin	 36
About Databricks	 37
2
SPARK SURVEY 2016
Introduction
In July 2016, Databricks conducted an Apache® Spark™ Survey to
identify insights into how organizations are using Spark as well
as highlight growth trends since the last Spark Survey 2015. In
this report, the results reflect answers from over 900 distinct
organizations and 1615 respondents, who were predominantly
Apache Spark users.
As in 2015, which was a tremendous year in growth for Apache Spark,
this year, too, its growth remains unabated—not only in areas like
the public cloud, but also with the increased use of Spark Streaming
and the use of Machine Learning. 2016 also shows Spark’s robust
adoption across a variety of organizations and users from many
functional roles to build complex solutions, using multiple Spark
components. Of the roles represented in the survey, 41% identified
themselves as data engineers, while 23% as data scientists and 21%
as architects; the rest of the 10% came from technical management
and 5% from academia.
	
1615RESPONDENTS
900DISTINCT ORGANIZATIONS
DATA ENGINEERS
ARCHITECTS
TECHNICAL
MANAGEMENT
ACADEMICS
DATA SCIENTISTS
41%
21%
10%
5%
23%
3
Foreword: Matei Zaharia
I’m delighted to share the results of this year’s Databricks Apache
Spark Survey. As I noted in the previous Spark Survey 2015, we
witnessed a rapid adoption of Spark and the precipitous growth
of the Spark community. And this year’s Spark’s growth trajectory
and trends continue. In particular, I’m excited to see more Spark
deployments in the cloud and more interest in people building real-
time applications using Spark Streaming with multiple components,
such as Machine Learning. Given that Apache Spark 2.0 lays the
foundational steps for Structured Streaming, by providing simplified
and unified APIs to write end-to-end streaming applications called
continuous applications, I anticipate this interest will surge further in
the coming months—with subsequent releases of Spark.
Since its inception, Spark’s core mission has been to make Big
Data simple and accessible for everyone—for organizations of all
sizes and across all industries. And we have not deviated from that
mission. In Apache Spark 2.0, we strived to make Spark easier, faster
and smarter. And we remain committed to our vision of simplicity.
Seventy-six percent of respondents in this survey indicate ease-of-
programing as one of the most important features of Spark.
Since its inception, Spark’s core mission has been to
make Big Data simple and accessible for everyone—
for organizations of all sizes and across all industries.
And we have not deviated from that mission...
M A T E I Z A H A R I A
Chief Technologist at Databricks,
VP of Apache Spark at the Apache Software Foundation
@matei_zaharia
			
Spark’s growth continues across various industries building complex data
solutions by people in various functional roles. It has moved well beyond
the early-adopter phase at tech companies and is now mainstream in
large data-driven enterprises.
4
TOP THREE APACHE SPARK TAKEAWAYS
REPORT HIGHLIGHTS
SPARK STREAMING AND MACHINE
LEARNING SURGE IN USAGE
SPARK’S GROWTH
CONTINUES
SPARK IN THE CLOUD IS GROWING
5
REPORT HIGHLIGHTS
This year the growth trend continues in the
community. Increased growth of Apache Spark
Meetup members, a jump in Spark Summit
attendees, more code contributors, and a surge
in companies represented at the Spark Summit
(from several vertical industries) suggest a
growing and thriving Spark community.
67%
CODE
CONTRIBUTORS
240%
SPARK MEETUP
MEMBERS
2016
1000
2015
600
2016
225,000
2015
66,000
NOTABLE SPARK USERS WHO PRESENTED AT SPARK SUMMIT 2016
57%
NUMBER OF COMPANIES
AT SUMMITS
2016
1800
2015
1144
30%
SPARK SUMMIT
ATTENDEES
2016
5100
2015
3912
6
REPORT HIGHLIGHTS
Asked what Apache Spark components developers use to build complex solutions
for their use cases, 74% of respondents said they use two or more components
to build different types of products.
74%
USE TWO OR MORE
COMPONENTS
of respondents
64%
USE THREE OR
MORE COMPONENTS
of respondents
NUMBER OF COMPONENTS USEDTYPES OF PRODUCTS BUILT
68%
52%
45%
40%
37%
36%
29%
BUSINESS / CUSTOMER INTELLIGENCE
DATA WAREHOUSING
REAL-TIME / STREAMING SOLUTIONS
RECOMMENDATION ENGINES
LOG PROCESSING
USER-FACING SERVICES
FRAUD DETECTION / SECURITY
% of respondents who use Spark to create each product (more than one product could be selected)
7
REPORT HIGHLIGHTS
LANGUAGES USED IN SPARK YEAR-OVER-YEAR
18% 20%
2015 2016
R
36% 44%
2015 2016
SQL
31%
29%
JAVA
2015 2016
58%
62%
PYTHON
2015 2016
71%
65%
SCALA
2015 2016
% of respondents who use each language (more than one language could be selected)
SPARK COMPONENTS USED IN PRODUCTION YEAR-OVER-YEAR
% of respondents who use each component in production (more than one component could be selected)
SQL
24%
40%
2015 2016
DATAFRAMES
15%
38%
2015 2016
STREAMING
14%
22%
2015 2016
ADVANCED
ANALYTICS
(MLlib)
13% 18%
2015 2016
In addition to using multiple Apache Spark components, many respondents indicated that they
use multiple programing languages in Spark. They also are using multiple components in
production, including increased use of Spark Streaming and MLlib.
8
REPORT HIGHLIGHTS
APACHE SPARK’S FASTEST GROWING AREAS IN 2016
57%
STREAMING
USERS
2016
22%
OF RESPONDENTS
2015
14%
OF RESPONDENTS
38%
ADVANCED ANALYTICS
USERS (MLlib)
2016
18%
OF RESPONDENTS
2015
13%
OF RESPONDENTS
153%
DATAFRAME
USERS
2016
38%
OF RESPONDENTS
2015
15%
OF RESPONDENTS
67%
SPARK SQL
USERS
2016
40%
OF RESPONDENTS
2015
24%
OF RESPONDENTS
* * * *
*component used in production
9
REPORT HIGHLIGHTS
APACHE SPARK
DEPLOYMENT
IN PUBLIC CLOUDS
INCREASED BY 10%
SINCE 2015.
51% of users in the 2015 Spark Survey said they
deployed Apache Spark in the public cloud,
compared with 61% of users in 2016, showing
a growth of 20%.
51%
2015
of respondents deployed
in a public cloud
2016
of respondents deploy
in a public cloud
61%
While Apache Spark deployments in the public
cloud increased in 2016, the percentage of Spark
deployments on-premises decreased. For
example, 48% of users in 2015 Spark survey and
42% in 2016 survey said they used Standalone
cluster managers for their on-premises Spark
deployments, showing a 13% percentage decrease.
Similarly, YARN and Mesos show 10% and 36%
percentage decreases respectively in deployments.
2015 2016
40% 48%
36% 42%
2015 2016
STANDALONEYARN
ON-PREMISES DEPLOYMENTS YEAR-OVER-YEAR
% of respondents who use each (more than one deployment could be selected)
11%
7%
2015 2016
MESOS
10
Investments in fast data analytics has surged,
according to Datanami. Since companies
are shifting investments from batch to
real-time applications, respondents in this
survey show an affinity toward building real-
time applications using the Spark Streaming
framework.
Among all the streaming engines, 33% of
respondents said they were heavy users of
Spark Streaming.
REPORT HIGHLIGHTS
51%
35%
S O M E W H A T
I M P O R T A N T
N O T
I M P O R T A N T
14%
of respondents
CONSIDER APACHE
SPARK STREAMING
VERY IMPORTANT
33% of respondents
USE APACHE SPARK
STREAMING A LOT
11
Respondents indicated that Spark Streaming
is very important for building real-time
streaming, recommendation engines, and
fraud detection applications.
Machine Learning has seen an increase in
production usage.
MLlib USE IN PRODUCTION
% of respondents who use the component in production
REPORT HIGHLIGHTS
40%
of respondents develop
RECOMMENDATION
ENGINE PRODUCTS
of respondents develop
REAL-TIME STREAMING
PRODUCTS
45%29%
of respondents develop
FRAUD DETECTION /
SECURITY PRODUCTS
Q: WHICH KINDS OF PRODUCTS DOES YOUR
ORGANIZATION DEVELOP? Select all that apply.
13%
18%
2015 2016
38%
ADVANCED ANALYTICS
PRODUCTION CASES
12
APACHE SPARK’S
GROWTH
CONTINUES
13
The Apache Spark
Community is Growing
The section identifies key growth areas in all aspects
of Spark that are propelling this uptake. Both 2015
and 2016 have seen a tremendous growth in the
Spark community and Spark usage in many vertical
industries.
Spark today remains the most active open source
project in Big Data. Today, there are over 1000
Spark contributors, compared to 600 in 2015 from
250+ organizations. With such large numbers of
contributors and organizations investing in Spark’s
future development, it has engaged a community
of developers globally. The Apache Spark Meetup
groups’ membership continues to flourish, both
nationally and internationally.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
240%
SPARK MEETUP MEMBERS
67%
CODE CONTRIBUTORS
2016
1000
2015
600
2016
225,000
2015
66,000
14
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
57%
30%
COMPANIES REPRESENTED
AT SUMMITS
SPARK SUMMIT
ATTENDEES
2016
1800
2016
5100
2015
1144
2015
3912
Every year, more users attend Spark Summit, the
largest dedicated conference to the Apache Spark
project. In 2016 there has been an increased number
of attendees from a broad range of organizations
attending this event, with attendees ranging from
developers to data scientists and engineers; to
business users and analysts; and executive level
decision makers. A number of notable users
presented how they use Spark at the Spark Summit
San Francisco 2016.
NOTABLE SPARK USERS WHO PRESENTED AT SPARK SUMMIT 2016
15
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
4 RELEASES IN 2015
1.2, 1.3, 1.4, 1.5
2 MAJOR
RELEASES IN 2016
1.6, 2.0
75%
18%
7%
USE SPARK 1.6 USE SPARK 2.0
OTHER
In just two years, the Spark community has
released six Spark releases. When asked which
version of Apache Spark they are using, 75%
responded that they are using Spark 1.6, while 18%
are using Spark 2.0 (respondents could choose
multiple releases, such as 1.3, or 1.4 or 1.5).
as of September 2016
16
ADVANCED ANALYTICS USERS (MLlib)
IN PRODUCTION
Spark’s Fastest Growing
Areas from 2015 to 2016
Spark Streaming, in particular, has taken
a notable increase in its usage, so has SQL,
MLlib, and Windows users from 2015.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
57%
STREAMING USERS IN PRODUCTION
2016
22%
OF RESPONDENTS
2015
14%
OF RESPONDENTS
153%
DATAFRAME USERS IN PRODUCTION
2016
38%
OF RESPONDENTS
2015
15%
OF RESPONDENTS 67%
SPARK SQL USERS IN PRODUCTION
2016
40%
OF RESPONDENTS
2015
24%
OF RESPONDENTS
38% 2016
18%
OF RESPONDENTS
2015
13%
OF RESPONDENTS
39%
WINDOWS USERS IN DEVELOPMENT
2016
32%
OF RESPONDENTS
2015
23%
OF RESPONDENTS
17
Spark Users are Growing
Spark is attractive not only to highly-skilled and
technically advanced users. It crosses barriers,
and other users such as business analysts
increasingly use Spark and develop Spark-based
applications in environments other than Linux.
From last year, the percentage of Windows users
employing Spark has increased.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
LINUX / UNIX
75%
74%
2015 2016
DEVELOPMENT ENVIRONMENTS
WINDOWS
23%
32%
2015 2016
MAC OSX
14%
2015 2016
22%
39%
WINDOWS USERS
YEAR-OVER-YEAR
% of respondents who use each development environment (more than one environment could be selected)
18
71%
65%
2015 2016
SCALA
Spark Users Employ
Multiple Languages
Spark is becoming the key data processing and
computing platform used by a broad range of users.
These users span many vertical industries and use
a variety of programming languages. One reason for
this broad adoption is because Spark is easy to use
and supports familiar programming APIs across
these languages.
Usage of Spark in Python, SQL, and R increased,
while Scala and Java usage decreased. This
indicates that more data analysts are drawn
to Spark from areas other than pure data
engineering, suggesting that Spark usage is
expanding to new and diverse users.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
WHICH LANGUAGES DO YOU USE SPARK IN?Q:
58% 62%
2015 2016
PYTHON
31%
29%
2015 2016
JAVA
18% 20%
2015 2016
R
36% 44%
2015 2016
SQL
% of respondents who use each language (more than one language could be selected)
19
Spark Components Used
in Production
Since last year, the use of Spark components in
production has increased, especially in Spark
Streaming and advanced analytics with Apache
Spark MLlib (machine learning). This corroborates
with the observation in this report about increased
interest among Spark users to build real-time
streaming applications with Spark Streaming,
using multiple components, including MLlib.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
DATAFRAMES
15%
38%
2015 2016
SQL
24%
40%
2015 2016
STREAMING
14%
22%
2015 2016
ADVANCED
ANALYTICS (MLlib)
13%
18%
2015 2016
WHICH COMPONENTS OF THE
APACHE SPARK STACK ARE YOU USING?Q:
153%
57%
11%
STREAMING
USERS
ADVANCED ANALYTICS
USERS
67%
SQL
USERS
% of respondents who use each component in production (more than one component could be selected)
DATAFRAMES
USERS
20
WHAT INDUSTRY VERTICAL
BEST DESCRIBES YOUR
ORGANIZATION?
Spark is Used Widely
in Organizations
Spark’s adoption continues to grow across varied
industries because of its unified engine, and because
of its proven performance and versatility that
enables it to process diverse workloads.
The banking sector saw the highest percentage
change in the usage of Spark since 2015, as did the
Health, Medical, Biotech and Pharmacy verticals.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
18%
CONSULTING
(IT)
25%
SOFTWARE
(SAAS, WEB, MOBILE)
11%
BANKING /
FINANCE
7%
ADVERTISING /
MARKETING /
PR
6%
ECOMMERCE / RETAIL
5%
HEALTH / MEDICAL /
PHARMACY / BIOTECH
CARRIERS / TELECOM
5%
4%
3%
EDUCATION
PUBLISHING / MEDIA
COMPUTERS / HARDWARE
3% 13%
OTHER
Q:
29%
CONSULTING (IT)
USERS
39%
HEALTH / MEDICAL /
PHARMACY / BIOTECH USERS
63%
BANKING
USERS
2016
10.58%
2016
5.42%
2016
18.09%
2015
6.48%
2015
3.89%
2015
13.98%
Percentages rounded to the nearest integer.
21
APACHE SPARK’S GROWTH CONTINUES
Users Solve Complex
Problems
Users are solving complex data problems across
varied industry verticals, as Spark’s unified platform
enables users to build complex solutions using
multiple Spark components for their multiple
data workloads.
68%
52%
45%
40%
37%
36%
29%
BUSINESS / CUSTOMER INTELLIGENCE
DATA WAREHOUSING
REAL-TIME / STREAMING SOLUTIONS
RECOMMENDATION ENGINES
LOG PROCESSING
USER-FACING SERVICES
FRAUD DETECTION / SECURITY
WHICH KINDS OF PRODUCTS DOES
YOUR ORGANIZATION DEVELOP?Q: Select all that apply.
22
31%

APACHE SPARK’S GROWTH CONTINUES
Users Employ Multiple
Components
Because of Spark’s unified engine and its ability
to process multiple workloads within the same
cluster, many Spark users within organizations use
multiple components of Spark for their use cases
and their respective workloads.
Not only are Spark components used separately;
two or more components are often used in
prototyping and production. This unification
blurs the barriers between data scientists, data
engineers, and data analysts—all using the same
unified compute engine.
COMPONENTS USED IN PROTOTYPING
AND PRODUCTION
DATASETS
14%
43%
67%
43%
67%
74%
USE TWO OR MORE
COMPONENTS
of Spark users
64%
USE THREE OR
MORE COMPONENTS
of Spark users
GRAPHX
MLlib
SPARK SQL
SPARK STREAMING
DATAFRAMES
More than one component could be selected.
23
APACHE SPARK’S GROWTH CONTINUES
What Users Consider
Important
Users are drawn to Spark for a number of reasons:
it’s easier to get started quickly because of simple and
consistent APIs; it’s faster because of improvements
in Apache Spark 2.0; and it’s smarter because of
simplified Structured Streaming APIs, allowing users
to build end-to-end continuous applications.
According to our 2015 Spark Survey, 91% of users
consider performance as the most important
aspect of Apache Spark, along with ease of
programming, real-time streaming and advanced
analytics. In this year’s survey, Spark users reflect
these as equally important.
At the time of this survey, Apache Spark
2.0 had just been officially released, and
users displayed a keen interest in using
it. Even though most users run Spark 1.6,
the 2016 survey results suggest they had
quickly started using Spark 2.0.
% OF RESPONDENTS WHO CONSIDERED THE FEATURE
VERY IMPORTANT
PERFORMANCE
91% EASE OF
PROGRAMMING
76%EASE OF
DEPLOYMENT
69%
ADVANCED
ANALYTICS
82%
REAL-TIME
STREAMING
51%
RUN SPARK 1.6
75%
RUN
SPARK 2.0
18%
More than one feature could be selected.
24
73%
WHICH OF THESE TECHNOLOGIES DO YOU
CURRENTLY USE?
58%
82%
of respondents use
KEY-VALUE STORES (NoSQL)
of respondents use
OPEN-SOURCE SQL DATABASES
SPARKAPACHE SPARK’S GROWTH CONTINUES
Top Three Storage
Technologies
A large number of Spark users use technologies
for storage other than Apache®
Hadoop®
, such as
Cassandra, MongoDB and NoSQL as well as other
open-source and proprietary SQL data stores.
Q:
of respondents use
PROPRIETARY SQL DATABASES
Select all that apply.
25
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES
Section Summary
Apache Spark’s growth and adoption continues as users, industries, development
environments, disciplines, and programming languages embrace its ease of use and
programming, its unified compute engine, and its performance to solve complex data
problems at scale. Spark allows multiple components to work on multiple workloads
and access data from multiple data sources. All of these factors make Spark an attractive
choice as a unified compute data platform.
26
APACHE SPARK IN THE
CLOUD IS GROWING
27
2016
51%
2015
Trend: Increase in Public Cloud
Deployments
The rise of cloud computing is rapid, inexorable and
causing a huge upheaval in the tech industry, writes
The Economist. “Gartner estimates that about $205
billion, or 6% of the world’s IT budget of $3.4 trillion,
will be spent on cloud computing in 2016—a number
it expects to grow to $240 billion next year,” according
to another article in The Economist.
This survey reflects this trend, as many respondents
are electing to deploy Spark in the public cloud,
mitigating both cost and infrastructure headaches.
Since 2015, we have seen a 20% growth of users
deploying Spark in the public cloud. That is, 61% users
in the 2016 survey said they deployed Spark in the public
cloud compared to 51% in 2015.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK IN THE CLOUD IS GROWING
SPARK DEPLOYMENT IN PUBLIC CLOUDS
HAS INCREASED BY 10% SINCE 2015.
2016
61% of respondents
deploy Spark
in a public cloud
28
Trend: Percentage Decrease
in On-Premises Deployments
Although many Spark users run Spark
on-premises alongside Hadoop and other data
sources, some deployment modes in 2016 have
seen a percentage decrease.	
APACHE SPARK STREAMING IS IMPORTANT
2015 2016
WHERE DO YOU RUN SPARK?
Q:
11%
40%
48%
7%
36%
42%
2015 20162015 2016
STANDALONEYARNMESOS
APACHE SPARK IN THE CLOUD IS GROWING
36%
13%
MESOS
SPARK DEPLOYMENTS
STANDALONE
SPARK DEPLOYMENTS
10%
YARN
SPARK DEPLOYMENTS
Select all that apply.
29
Section Summary
Not only do cloud deployments have lower deployment costs and fewer management headaches,
they have higher and proven performance benefits.
Using Apache Spark on 206 EC2 machines, we sorted 100TB of data on disk in
23 minutes. In comparison, the previous world record set by Hadoop MapReduce
used 2100 machines and took 72 minutes. This means that Spark sorted the
same data 3X faster using 10X fewer machines.
APACHE SPARK IN THE CLOUD IS GROWING
R E Y N O L D X I N
Chief Architect & Co-Founder of Databricks
30
APACHE SPARK STREAMING
AND MACHINE LEARNING
SURGE IN USAGE
31
VERY
IMPORTANT
51%
35%
SOMEWHAT
IMPORTANT
NOT
IMPORTANT
14%
Q:
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE
Apache Spark Streaming
is Growing
Since its release, Spark Streaming has become
one of the most widely used distributed
streaming engines. Interest in developing real-time
applications and advanced analytics is on the rise.
Over half of the survey respondents indicate that
streaming is vital and important for developing
valuable real-time streaming, recommendation
engines, and fraud-detection and security solutions.
HOW IMPORTANT IS
SPARK STREAMING
TO YOUR USE CASE?
40%
of respondents develop
RECOMMENDATION
ENGINE PRODUCTS
45%
of respondents develop
REAL-TIME STREAMING
PRODUCTS
29%
of respondents develop
FRAUD DETECTION /
SECURITY PRODUCTS
Q:
WHICH KINDS OF PRODUCTS DOES YOUR
ORGANIZATION DEVELOP? Select all that apply.
32
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE
Organizations use Spark Streaming along with
Spark’s other multiple components to develop
streaming applications. Both Spark Streaming and
MLlib saw a notable increase in production use.
SPARK STREAMING AND MLlib USE IN PRODUCTION
2015 2016
13%
18%
2015 2016
STREAMING ADVANCED ANALYTICS (MLlib)
14%
22%
57%
STREAMING
PRODUCTION CASES
38%
ADVANCED ANALYTICS
PRODUCTION CASES
% of respondents who use the component in production (more than one component could be selected)
33
WHICH OF THESE TECHNOLOGIES DO YOU CURRENTLY
USE A LOT FOR STREAMING AND/OR COMPLEX EVENT
PROCESSING CASES?
Q:
APACHE SPARK STREAMING IS IMPORTANT
1% 	 APACHE APEX	
APACHE SPARK
4% 	 KINESIS	
6%	 APACHE STORM 	
1% 	 APACHE FLINK
29% 	 APACHE KAFKA 	
APACHE SPARK
COMPONENT POPULARITY
% of respondents who use the component
anywhere from evaluation to production
(more than one component could be selected) SPARK STREAMING
71%
MLlib
71%
SQL
88%
RDDS
8383%
DATAFRAMES
89%
33%
DO YOU
CURRENTLY
USE SPARK
STREAMING
IN PRODUCTION?
Q:
used it in 2015
14% +57%
are using it today
SPARK STREAMING
PRODUCTION CASES
22%
Apache Spark
Streaming Engine
is the Preferred Choice
Compared to other streaming engines, Spark
is the preferred choice at 33%.
When compared to other Spark components,
Spark Streaming matches MLlib at 71% in use,
from evaluation to production.
In the 2015 Spark survey, 14% of users said they
used Spark Streaming in production, compared to
22% of users in 2016. Overall, we saw a 57% growth
of users using Spark Streaming in production.
APACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE
Select all that apply.
Note: Respondents were predominately Spark users.
34
Section Summary
Spark Streaming is being used for real-time solutions, from evaluation to production, closer
in usage to Spark’s other commonly used components. As a preferred choice of streaming
engine over others, more organizations are building real-time streaming solutions as they
consider streaming an important Spark feature.
APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE
35
Afterword: Reynold Xin
2015 and 2016 have been exciting years for the adoption and increased
growth of Apache Spark and its community. Two releases—Spark 1.6
and 2.0—have seen major improvements in all aspects of Spark noted
by respondents in this survey as important. I continue to look forward,
and work with the community, to the exciting future ahead for the
Spark platform.
As Spark becomes easier, faster, and smarter, outside the predominantly
IT and Consulting Industry, a newer audience is adopting it, as results from
the survey suggest. Performance, ease-of-use, streaming, and reliability
top the list as most important features. At the time of this survey, we
released Apache Spark 2.0. Ongoing performance improvements, with
Project Tungsten, started in earlier releases and culminated in Spark 2.0.
In addition, Spark 2.0 delivered unified DataFrames and Datasets APIs and
simplified Structured Streaming APIs. All these make Spark an attractive
engine for performing advanced analytics across industry verticals in
solving complex data problems, by users from different functional roles.	
						
Your voice matters. We got an insightful glimpse into the growth and
trends from this year’s survey: who’s using Spark, how they are using it,
what’s important, what new features they use, and what they are using
it for. Just as the feedback from last year’s survey did, these insights will
drive major updates and help shape the future of the Spark platform.
Thank you to everyone who participated in Databricks’ Apache Spark
Survey 2016!
R E Y N O L D X I N
Chief Architect  Co-Founder of Databricks
@rxin
36
Databricks’ vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache® Spark™,
a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache
Spark project providing 10x more code than any other company. The company has also trained over 20,000 users on Apache Spark, and has the largest number of
customers deploying Spark to date. Databricks provides a just-in-time data platform, to simplify data integration, real-time experimentation, and robust deployment of
production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact info@databricks.com.
TRY DATABRICKS FOR FREE
databricks.com/try-databricks
CONTACT US FOR A PERSONALIZED DEMO
databricks.com/contact-databricks
© Databricks 2016. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.
37

More Related Content

Viewers also liked

An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Distributed ML in Apache Spark
Distributed ML in Apache SparkDistributed ML in Apache Spark
Distributed ML in Apache SparkDatabricks
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduceEdureka!
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibpumaranikar
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaEdureka!
 
Machine Learning with Spark MLlib
Machine Learning with Spark MLlibMachine Learning with Spark MLlib
Machine Learning with Spark MLlibTodd McGrath
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open PlatformJongwook Woo
 
Online Tweet Sentiment Analysis with Apache Spark
Online Tweet Sentiment Analysis with Apache SparkOnline Tweet Sentiment Analysis with Apache Spark
Online Tweet Sentiment Analysis with Apache SparkDavide Nardone
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark TutorialAhmet Bulut
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphXAndy Petrella
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesDatabricks
 
Big Data Day LA 2016 Keynote - Reynold Xin/ Databricks
Big Data Day LA 2016 Keynote - Reynold Xin/ DatabricksBig Data Day LA 2016 Keynote - Reynold Xin/ Databricks
Big Data Day LA 2016 Keynote - Reynold Xin/ DatabricksData Con LA
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Libraryjeykottalam
 
Python and Bigdata - An Introduction to Spark (PySpark)
Python and Bigdata -  An Introduction to Spark (PySpark)Python and Bigdata -  An Introduction to Spark (PySpark)
Python and Bigdata - An Introduction to Spark (PySpark)hiteshnd
 

Viewers also liked (20)

An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Distributed ML in Apache Spark
Distributed ML in Apache SparkDistributed ML in Apache Spark
Distributed ML in Apache Spark
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlib
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
 
Machine Learning with Spark MLlib
Machine Learning with Spark MLlibMachine Learning with Spark MLlib
Machine Learning with Spark MLlib
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
Online Tweet Sentiment Analysis with Apache Spark
Online Tweet Sentiment Analysis with Apache SparkOnline Tweet Sentiment Analysis with Apache Spark
Online Tweet Sentiment Analysis with Apache Spark
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
 
Big Data Day LA 2016 Keynote - Reynold Xin/ Databricks
Big Data Day LA 2016 Keynote - Reynold Xin/ DatabricksBig Data Day LA 2016 Keynote - Reynold Xin/ Databricks
Big Data Day LA 2016 Keynote - Reynold Xin/ Databricks
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
 
Python and Bigdata - An Introduction to Spark (PySpark)
Python and Bigdata -  An Introduction to Spark (PySpark)Python and Bigdata -  An Introduction to Spark (PySpark)
Python and Bigdata - An Introduction to Spark (PySpark)
 

Similar to 2016 spark survey

[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big DataLegacy Typesafe (now Lightbend)
 
The Evolution of Social & Revolution of Messaging Apps
The Evolution of Social & Revolution of Messaging AppsThe Evolution of Social & Revolution of Messaging Apps
The Evolution of Social & Revolution of Messaging AppsMaggie Schneider Huston
 
IoT Developer Survey 2016
IoT Developer Survey 2016IoT Developer Survey 2016
IoT Developer Survey 2016Ian Skerrett
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteDatabricks
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
State of API Integration Report 2017
State of API Integration Report 2017State of API Integration Report 2017
State of API Integration Report 2017Cloud Elements
 
Don't Migrate to S/4HANA until you've read this research
Don't Migrate to S/4HANA until you've read this researchDon't Migrate to S/4HANA until you've read this research
Don't Migrate to S/4HANA until you've read this researchResulting IT
 
Why change? Why Open Source? Why Red Hat? Why now?
Why change? Why Open Source? Why Red Hat? Why now?Why change? Why Open Source? Why Red Hat? Why now?
Why change? Why Open Source? Why Red Hat? Why now?Eric D. Schabell
 
Partner Managed Cloud for SAP S/4HANA
Partner Managed Cloud for SAP S/4HANAPartner Managed Cloud for SAP S/4HANA
Partner Managed Cloud for SAP S/4HANASAPPartnerCloud
 
You Can’t Live Without Open Source - Results from the Open Source 360 Survey
You Can’t Live Without Open Source - Results from the Open Source 360 SurveyYou Can’t Live Without Open Source - Results from the Open Source 360 Survey
You Can’t Live Without Open Source - Results from the Open Source 360 SurveyBlack Duck by Synopsys
 
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...apidays
 
2014 Future of Open Source - 8th Annual Survey results
2014 Future of Open Source - 8th Annual Survey results2014 Future of Open Source - 8th Annual Survey results
2014 Future of Open Source - 8th Annual Survey resultsMichael Skok
 
State of the Cloud DevOps Trends
State of the Cloud DevOps TrendsState of the Cloud DevOps Trends
State of the Cloud DevOps TrendsRightScale
 
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postman
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postmanapidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postman
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postmanapidays
 
2023 State of the API Report: Key Findings and Trends
2023 State of the API Report: Key Findings and Trends2023 State of the API Report: Key Findings and Trends
2023 State of the API Report: Key Findings and TrendsPostman
 
Modern IT Architecture Survey 2016
Modern IT Architecture Survey 2016Modern IT Architecture Survey 2016
Modern IT Architecture Survey 2016LeanIX GmbH
 
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance Management
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance ManagementPre-Con Ed: Hack that API—Your Data, Your Way With CA Performance Management
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance ManagementCA Technologies
 
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...SmartBear
 
'Shift-Right' - Rapid Evolution with DesignOps
'Shift-Right' - Rapid Evolution with DesignOps'Shift-Right' - Rapid Evolution with DesignOps
'Shift-Right' - Rapid Evolution with DesignOpsCA Technologies
 

Similar to 2016 spark survey (20)

[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
 
The Evolution of Social & Revolution of Messaging Apps
The Evolution of Social & Revolution of Messaging AppsThe Evolution of Social & Revolution of Messaging Apps
The Evolution of Social & Revolution of Messaging Apps
 
IoT Developer Survey 2016
IoT Developer Survey 2016IoT Developer Survey 2016
IoT Developer Survey 2016
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynote
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
State of API Integration Report 2017
State of API Integration Report 2017State of API Integration Report 2017
State of API Integration Report 2017
 
Don't Migrate to S/4HANA until you've read this research
Don't Migrate to S/4HANA until you've read this researchDon't Migrate to S/4HANA until you've read this research
Don't Migrate to S/4HANA until you've read this research
 
Why change? Why Open Source? Why Red Hat? Why now?
Why change? Why Open Source? Why Red Hat? Why now?Why change? Why Open Source? Why Red Hat? Why now?
Why change? Why Open Source? Why Red Hat? Why now?
 
Partner Managed Cloud for SAP S/4HANA
Partner Managed Cloud for SAP S/4HANAPartner Managed Cloud for SAP S/4HANA
Partner Managed Cloud for SAP S/4HANA
 
You Can’t Live Without Open Source - Results from the Open Source 360 Survey
You Can’t Live Without Open Source - Results from the Open Source 360 SurveyYou Can’t Live Without Open Source - Results from the Open Source 360 Survey
You Can’t Live Without Open Source - Results from the Open Source 360 Survey
 
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...
apidays LIVE Hong Kong 2021 - The API Trends for 2022 and beyond by Jimmy Tsa...
 
2014 Future of Open Source - 8th Annual Survey results
2014 Future of Open Source - 8th Annual Survey results2014 Future of Open Source - 8th Annual Survey results
2014 Future of Open Source - 8th Annual Survey results
 
State of the Cloud DevOps Trends
State of the Cloud DevOps TrendsState of the Cloud DevOps Trends
State of the Cloud DevOps Trends
 
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postman
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postmanapidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postman
apidays Australia 2023 - 2023 State of the API Report, Jordan Walsh, Postman
 
2023 State of the API Report: Key Findings and Trends
2023 State of the API Report: Key Findings and Trends2023 State of the API Report: Key Findings and Trends
2023 State of the API Report: Key Findings and Trends
 
Modern IT Architecture Survey 2016
Modern IT Architecture Survey 2016Modern IT Architecture Survey 2016
Modern IT Architecture Survey 2016
 
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance Management
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance ManagementPre-Con Ed: Hack that API—Your Data, Your Way With CA Performance Management
Pre-Con Ed: Hack that API—Your Data, Your Way With CA Performance Management
 
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...
The State of API 2020 Webinar – Exploring Trends, Tools & Takeaways to Drive ...
 
'Shift-Right' - Rapid Evolution with DesignOps
'Shift-Right' - Rapid Evolution with DesignOps'Shift-Right' - Rapid Evolution with DesignOps
'Shift-Right' - Rapid Evolution with DesignOps
 
OpenStack 2015 Marketing Plan
OpenStack 2015 Marketing PlanOpenStack 2015 Marketing Plan
OpenStack 2015 Marketing Plan
 

Recently uploaded

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 

Recently uploaded (20)

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 

2016 spark survey

  • 2. Table of Contents Introduction 3 Foreword: Matei Zaharia 4 REPORT HIGHLIGHTS 5 APACHE SPARK’S GROWTH CONTINUES 13 The Apache Spark Community is Growing 14 Spark’s Fastest Growing Areas from 2015 to 2016 17 Spark Users are Growing 18 Spark Users Employ Multiple Languages 19 Spark Components Used in Production 20 Spark is Used Widely in Organizations 21 Users Solve Complex Problems 22 Users Employ Multiple Components 23 What Users Consider Important 24 Top Three Storage Technologies 25 Section Summary 26 APACHE SPARK IN THE CLOUD IS GROWING 27 Trend: Increase in Public Cloud Deployments 28 Trend: Percentage Decrease in On-Premises Deployments 29 Section Summary 30 APACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE 31 Apache Spark Streaming is Growing 32 Apache Spark Streaming Engine is the Preferred Choice 34 Section Summary 35 Afterword: Reynold Xin 36 About Databricks 37 2
  • 3. SPARK SURVEY 2016 Introduction In July 2016, Databricks conducted an Apache® Spark™ Survey to identify insights into how organizations are using Spark as well as highlight growth trends since the last Spark Survey 2015. In this report, the results reflect answers from over 900 distinct organizations and 1615 respondents, who were predominantly Apache Spark users. As in 2015, which was a tremendous year in growth for Apache Spark, this year, too, its growth remains unabated—not only in areas like the public cloud, but also with the increased use of Spark Streaming and the use of Machine Learning. 2016 also shows Spark’s robust adoption across a variety of organizations and users from many functional roles to build complex solutions, using multiple Spark components. Of the roles represented in the survey, 41% identified themselves as data engineers, while 23% as data scientists and 21% as architects; the rest of the 10% came from technical management and 5% from academia. 1615RESPONDENTS 900DISTINCT ORGANIZATIONS DATA ENGINEERS ARCHITECTS TECHNICAL MANAGEMENT ACADEMICS DATA SCIENTISTS 41% 21% 10% 5% 23% 3
  • 4. Foreword: Matei Zaharia I’m delighted to share the results of this year’s Databricks Apache Spark Survey. As I noted in the previous Spark Survey 2015, we witnessed a rapid adoption of Spark and the precipitous growth of the Spark community. And this year’s Spark’s growth trajectory and trends continue. In particular, I’m excited to see more Spark deployments in the cloud and more interest in people building real- time applications using Spark Streaming with multiple components, such as Machine Learning. Given that Apache Spark 2.0 lays the foundational steps for Structured Streaming, by providing simplified and unified APIs to write end-to-end streaming applications called continuous applications, I anticipate this interest will surge further in the coming months—with subsequent releases of Spark. Since its inception, Spark’s core mission has been to make Big Data simple and accessible for everyone—for organizations of all sizes and across all industries. And we have not deviated from that mission. In Apache Spark 2.0, we strived to make Spark easier, faster and smarter. And we remain committed to our vision of simplicity. Seventy-six percent of respondents in this survey indicate ease-of- programing as one of the most important features of Spark. Since its inception, Spark’s core mission has been to make Big Data simple and accessible for everyone— for organizations of all sizes and across all industries. And we have not deviated from that mission... M A T E I Z A H A R I A Chief Technologist at Databricks, VP of Apache Spark at the Apache Software Foundation @matei_zaharia Spark’s growth continues across various industries building complex data solutions by people in various functional roles. It has moved well beyond the early-adopter phase at tech companies and is now mainstream in large data-driven enterprises. 4
  • 5. TOP THREE APACHE SPARK TAKEAWAYS REPORT HIGHLIGHTS SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE SPARK’S GROWTH CONTINUES SPARK IN THE CLOUD IS GROWING 5
  • 6. REPORT HIGHLIGHTS This year the growth trend continues in the community. Increased growth of Apache Spark Meetup members, a jump in Spark Summit attendees, more code contributors, and a surge in companies represented at the Spark Summit (from several vertical industries) suggest a growing and thriving Spark community. 67% CODE CONTRIBUTORS 240% SPARK MEETUP MEMBERS 2016 1000 2015 600 2016 225,000 2015 66,000 NOTABLE SPARK USERS WHO PRESENTED AT SPARK SUMMIT 2016 57% NUMBER OF COMPANIES AT SUMMITS 2016 1800 2015 1144 30% SPARK SUMMIT ATTENDEES 2016 5100 2015 3912 6
  • 7. REPORT HIGHLIGHTS Asked what Apache Spark components developers use to build complex solutions for their use cases, 74% of respondents said they use two or more components to build different types of products. 74% USE TWO OR MORE COMPONENTS of respondents 64% USE THREE OR MORE COMPONENTS of respondents NUMBER OF COMPONENTS USEDTYPES OF PRODUCTS BUILT 68% 52% 45% 40% 37% 36% 29% BUSINESS / CUSTOMER INTELLIGENCE DATA WAREHOUSING REAL-TIME / STREAMING SOLUTIONS RECOMMENDATION ENGINES LOG PROCESSING USER-FACING SERVICES FRAUD DETECTION / SECURITY % of respondents who use Spark to create each product (more than one product could be selected) 7
  • 8. REPORT HIGHLIGHTS LANGUAGES USED IN SPARK YEAR-OVER-YEAR 18% 20% 2015 2016 R 36% 44% 2015 2016 SQL 31% 29% JAVA 2015 2016 58% 62% PYTHON 2015 2016 71% 65% SCALA 2015 2016 % of respondents who use each language (more than one language could be selected) SPARK COMPONENTS USED IN PRODUCTION YEAR-OVER-YEAR % of respondents who use each component in production (more than one component could be selected) SQL 24% 40% 2015 2016 DATAFRAMES 15% 38% 2015 2016 STREAMING 14% 22% 2015 2016 ADVANCED ANALYTICS (MLlib) 13% 18% 2015 2016 In addition to using multiple Apache Spark components, many respondents indicated that they use multiple programing languages in Spark. They also are using multiple components in production, including increased use of Spark Streaming and MLlib. 8
  • 9. REPORT HIGHLIGHTS APACHE SPARK’S FASTEST GROWING AREAS IN 2016 57% STREAMING USERS 2016 22% OF RESPONDENTS 2015 14% OF RESPONDENTS 38% ADVANCED ANALYTICS USERS (MLlib) 2016 18% OF RESPONDENTS 2015 13% OF RESPONDENTS 153% DATAFRAME USERS 2016 38% OF RESPONDENTS 2015 15% OF RESPONDENTS 67% SPARK SQL USERS 2016 40% OF RESPONDENTS 2015 24% OF RESPONDENTS * * * * *component used in production 9
  • 10. REPORT HIGHLIGHTS APACHE SPARK DEPLOYMENT IN PUBLIC CLOUDS INCREASED BY 10% SINCE 2015. 51% of users in the 2015 Spark Survey said they deployed Apache Spark in the public cloud, compared with 61% of users in 2016, showing a growth of 20%. 51% 2015 of respondents deployed in a public cloud 2016 of respondents deploy in a public cloud 61% While Apache Spark deployments in the public cloud increased in 2016, the percentage of Spark deployments on-premises decreased. For example, 48% of users in 2015 Spark survey and 42% in 2016 survey said they used Standalone cluster managers for their on-premises Spark deployments, showing a 13% percentage decrease. Similarly, YARN and Mesos show 10% and 36% percentage decreases respectively in deployments. 2015 2016 40% 48% 36% 42% 2015 2016 STANDALONEYARN ON-PREMISES DEPLOYMENTS YEAR-OVER-YEAR % of respondents who use each (more than one deployment could be selected) 11% 7% 2015 2016 MESOS 10
  • 11. Investments in fast data analytics has surged, according to Datanami. Since companies are shifting investments from batch to real-time applications, respondents in this survey show an affinity toward building real- time applications using the Spark Streaming framework. Among all the streaming engines, 33% of respondents said they were heavy users of Spark Streaming. REPORT HIGHLIGHTS 51% 35% S O M E W H A T I M P O R T A N T N O T I M P O R T A N T 14% of respondents CONSIDER APACHE SPARK STREAMING VERY IMPORTANT 33% of respondents USE APACHE SPARK STREAMING A LOT 11
  • 12. Respondents indicated that Spark Streaming is very important for building real-time streaming, recommendation engines, and fraud detection applications. Machine Learning has seen an increase in production usage. MLlib USE IN PRODUCTION % of respondents who use the component in production REPORT HIGHLIGHTS 40% of respondents develop RECOMMENDATION ENGINE PRODUCTS of respondents develop REAL-TIME STREAMING PRODUCTS 45%29% of respondents develop FRAUD DETECTION / SECURITY PRODUCTS Q: WHICH KINDS OF PRODUCTS DOES YOUR ORGANIZATION DEVELOP? Select all that apply. 13% 18% 2015 2016 38% ADVANCED ANALYTICS PRODUCTION CASES 12
  • 14. The Apache Spark Community is Growing The section identifies key growth areas in all aspects of Spark that are propelling this uptake. Both 2015 and 2016 have seen a tremendous growth in the Spark community and Spark usage in many vertical industries. Spark today remains the most active open source project in Big Data. Today, there are over 1000 Spark contributors, compared to 600 in 2015 from 250+ organizations. With such large numbers of contributors and organizations investing in Spark’s future development, it has engaged a community of developers globally. The Apache Spark Meetup groups’ membership continues to flourish, both nationally and internationally. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES 240% SPARK MEETUP MEMBERS 67% CODE CONTRIBUTORS 2016 1000 2015 600 2016 225,000 2015 66,000 14
  • 15. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES 57% 30% COMPANIES REPRESENTED AT SUMMITS SPARK SUMMIT ATTENDEES 2016 1800 2016 5100 2015 1144 2015 3912 Every year, more users attend Spark Summit, the largest dedicated conference to the Apache Spark project. In 2016 there has been an increased number of attendees from a broad range of organizations attending this event, with attendees ranging from developers to data scientists and engineers; to business users and analysts; and executive level decision makers. A number of notable users presented how they use Spark at the Spark Summit San Francisco 2016. NOTABLE SPARK USERS WHO PRESENTED AT SPARK SUMMIT 2016 15
  • 16. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES 4 RELEASES IN 2015 1.2, 1.3, 1.4, 1.5 2 MAJOR RELEASES IN 2016 1.6, 2.0 75% 18% 7% USE SPARK 1.6 USE SPARK 2.0 OTHER In just two years, the Spark community has released six Spark releases. When asked which version of Apache Spark they are using, 75% responded that they are using Spark 1.6, while 18% are using Spark 2.0 (respondents could choose multiple releases, such as 1.3, or 1.4 or 1.5). as of September 2016 16
  • 17. ADVANCED ANALYTICS USERS (MLlib) IN PRODUCTION Spark’s Fastest Growing Areas from 2015 to 2016 Spark Streaming, in particular, has taken a notable increase in its usage, so has SQL, MLlib, and Windows users from 2015. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES 57% STREAMING USERS IN PRODUCTION 2016 22% OF RESPONDENTS 2015 14% OF RESPONDENTS 153% DATAFRAME USERS IN PRODUCTION 2016 38% OF RESPONDENTS 2015 15% OF RESPONDENTS 67% SPARK SQL USERS IN PRODUCTION 2016 40% OF RESPONDENTS 2015 24% OF RESPONDENTS 38% 2016 18% OF RESPONDENTS 2015 13% OF RESPONDENTS 39% WINDOWS USERS IN DEVELOPMENT 2016 32% OF RESPONDENTS 2015 23% OF RESPONDENTS 17
  • 18. Spark Users are Growing Spark is attractive not only to highly-skilled and technically advanced users. It crosses barriers, and other users such as business analysts increasingly use Spark and develop Spark-based applications in environments other than Linux. From last year, the percentage of Windows users employing Spark has increased. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES LINUX / UNIX 75% 74% 2015 2016 DEVELOPMENT ENVIRONMENTS WINDOWS 23% 32% 2015 2016 MAC OSX 14% 2015 2016 22% 39% WINDOWS USERS YEAR-OVER-YEAR % of respondents who use each development environment (more than one environment could be selected) 18
  • 19. 71% 65% 2015 2016 SCALA Spark Users Employ Multiple Languages Spark is becoming the key data processing and computing platform used by a broad range of users. These users span many vertical industries and use a variety of programming languages. One reason for this broad adoption is because Spark is easy to use and supports familiar programming APIs across these languages. Usage of Spark in Python, SQL, and R increased, while Scala and Java usage decreased. This indicates that more data analysts are drawn to Spark from areas other than pure data engineering, suggesting that Spark usage is expanding to new and diverse users. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES WHICH LANGUAGES DO YOU USE SPARK IN?Q: 58% 62% 2015 2016 PYTHON 31% 29% 2015 2016 JAVA 18% 20% 2015 2016 R 36% 44% 2015 2016 SQL % of respondents who use each language (more than one language could be selected) 19
  • 20. Spark Components Used in Production Since last year, the use of Spark components in production has increased, especially in Spark Streaming and advanced analytics with Apache Spark MLlib (machine learning). This corroborates with the observation in this report about increased interest among Spark users to build real-time streaming applications with Spark Streaming, using multiple components, including MLlib. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES DATAFRAMES 15% 38% 2015 2016 SQL 24% 40% 2015 2016 STREAMING 14% 22% 2015 2016 ADVANCED ANALYTICS (MLlib) 13% 18% 2015 2016 WHICH COMPONENTS OF THE APACHE SPARK STACK ARE YOU USING?Q: 153% 57% 11% STREAMING USERS ADVANCED ANALYTICS USERS 67% SQL USERS % of respondents who use each component in production (more than one component could be selected) DATAFRAMES USERS 20
  • 21. WHAT INDUSTRY VERTICAL BEST DESCRIBES YOUR ORGANIZATION? Spark is Used Widely in Organizations Spark’s adoption continues to grow across varied industries because of its unified engine, and because of its proven performance and versatility that enables it to process diverse workloads. The banking sector saw the highest percentage change in the usage of Spark since 2015, as did the Health, Medical, Biotech and Pharmacy verticals. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES 18% CONSULTING (IT) 25% SOFTWARE (SAAS, WEB, MOBILE) 11% BANKING / FINANCE 7% ADVERTISING / MARKETING / PR 6% ECOMMERCE / RETAIL 5% HEALTH / MEDICAL / PHARMACY / BIOTECH CARRIERS / TELECOM 5% 4% 3% EDUCATION PUBLISHING / MEDIA COMPUTERS / HARDWARE 3% 13% OTHER Q: 29% CONSULTING (IT) USERS 39% HEALTH / MEDICAL / PHARMACY / BIOTECH USERS 63% BANKING USERS 2016 10.58% 2016 5.42% 2016 18.09% 2015 6.48% 2015 3.89% 2015 13.98% Percentages rounded to the nearest integer. 21
  • 22. APACHE SPARK’S GROWTH CONTINUES Users Solve Complex Problems Users are solving complex data problems across varied industry verticals, as Spark’s unified platform enables users to build complex solutions using multiple Spark components for their multiple data workloads. 68% 52% 45% 40% 37% 36% 29% BUSINESS / CUSTOMER INTELLIGENCE DATA WAREHOUSING REAL-TIME / STREAMING SOLUTIONS RECOMMENDATION ENGINES LOG PROCESSING USER-FACING SERVICES FRAUD DETECTION / SECURITY WHICH KINDS OF PRODUCTS DOES YOUR ORGANIZATION DEVELOP?Q: Select all that apply. 22
  • 23. 31% APACHE SPARK’S GROWTH CONTINUES Users Employ Multiple Components Because of Spark’s unified engine and its ability to process multiple workloads within the same cluster, many Spark users within organizations use multiple components of Spark for their use cases and their respective workloads. Not only are Spark components used separately; two or more components are often used in prototyping and production. This unification blurs the barriers between data scientists, data engineers, and data analysts—all using the same unified compute engine. COMPONENTS USED IN PROTOTYPING AND PRODUCTION DATASETS 14% 43% 67% 43% 67% 74% USE TWO OR MORE COMPONENTS of Spark users 64% USE THREE OR MORE COMPONENTS of Spark users GRAPHX MLlib SPARK SQL SPARK STREAMING DATAFRAMES More than one component could be selected. 23
  • 24. APACHE SPARK’S GROWTH CONTINUES What Users Consider Important Users are drawn to Spark for a number of reasons: it’s easier to get started quickly because of simple and consistent APIs; it’s faster because of improvements in Apache Spark 2.0; and it’s smarter because of simplified Structured Streaming APIs, allowing users to build end-to-end continuous applications. According to our 2015 Spark Survey, 91% of users consider performance as the most important aspect of Apache Spark, along with ease of programming, real-time streaming and advanced analytics. In this year’s survey, Spark users reflect these as equally important. At the time of this survey, Apache Spark 2.0 had just been officially released, and users displayed a keen interest in using it. Even though most users run Spark 1.6, the 2016 survey results suggest they had quickly started using Spark 2.0. % OF RESPONDENTS WHO CONSIDERED THE FEATURE VERY IMPORTANT PERFORMANCE 91% EASE OF PROGRAMMING 76%EASE OF DEPLOYMENT 69% ADVANCED ANALYTICS 82% REAL-TIME STREAMING 51% RUN SPARK 1.6 75% RUN SPARK 2.0 18% More than one feature could be selected. 24
  • 25. 73% WHICH OF THESE TECHNOLOGIES DO YOU CURRENTLY USE? 58% 82% of respondents use KEY-VALUE STORES (NoSQL) of respondents use OPEN-SOURCE SQL DATABASES SPARKAPACHE SPARK’S GROWTH CONTINUES Top Three Storage Technologies A large number of Spark users use technologies for storage other than Apache® Hadoop® , such as Cassandra, MongoDB and NoSQL as well as other open-source and proprietary SQL data stores. Q: of respondents use PROPRIETARY SQL DATABASES Select all that apply. 25
  • 26. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK’S GROWTH CONTINUES Section Summary Apache Spark’s growth and adoption continues as users, industries, development environments, disciplines, and programming languages embrace its ease of use and programming, its unified compute engine, and its performance to solve complex data problems at scale. Spark allows multiple components to work on multiple workloads and access data from multiple data sources. All of these factors make Spark an attractive choice as a unified compute data platform. 26
  • 27. APACHE SPARK IN THE CLOUD IS GROWING 27
  • 28. 2016 51% 2015 Trend: Increase in Public Cloud Deployments The rise of cloud computing is rapid, inexorable and causing a huge upheaval in the tech industry, writes The Economist. “Gartner estimates that about $205 billion, or 6% of the world’s IT budget of $3.4 trillion, will be spent on cloud computing in 2016—a number it expects to grow to $240 billion next year,” according to another article in The Economist. This survey reflects this trend, as many respondents are electing to deploy Spark in the public cloud, mitigating both cost and infrastructure headaches. Since 2015, we have seen a 20% growth of users deploying Spark in the public cloud. That is, 61% users in the 2016 survey said they deployed Spark in the public cloud compared to 51% in 2015. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK IN THE CLOUD IS GROWING SPARK DEPLOYMENT IN PUBLIC CLOUDS HAS INCREASED BY 10% SINCE 2015. 2016 61% of respondents deploy Spark in a public cloud 28
  • 29. Trend: Percentage Decrease in On-Premises Deployments Although many Spark users run Spark on-premises alongside Hadoop and other data sources, some deployment modes in 2016 have seen a percentage decrease. APACHE SPARK STREAMING IS IMPORTANT 2015 2016 WHERE DO YOU RUN SPARK? Q: 11% 40% 48% 7% 36% 42% 2015 20162015 2016 STANDALONEYARNMESOS APACHE SPARK IN THE CLOUD IS GROWING 36% 13% MESOS SPARK DEPLOYMENTS STANDALONE SPARK DEPLOYMENTS 10% YARN SPARK DEPLOYMENTS Select all that apply. 29
  • 30. Section Summary Not only do cloud deployments have lower deployment costs and fewer management headaches, they have higher and proven performance benefits. Using Apache Spark on 206 EC2 machines, we sorted 100TB of data on disk in 23 minutes. In comparison, the previous world record set by Hadoop MapReduce used 2100 machines and took 72 minutes. This means that Spark sorted the same data 3X faster using 10X fewer machines. APACHE SPARK IN THE CLOUD IS GROWING R E Y N O L D X I N Chief Architect & Co-Founder of Databricks 30
  • 31. APACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE 31
  • 32. VERY IMPORTANT 51% 35% SOMEWHAT IMPORTANT NOT IMPORTANT 14% Q: APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE Apache Spark Streaming is Growing Since its release, Spark Streaming has become one of the most widely used distributed streaming engines. Interest in developing real-time applications and advanced analytics is on the rise. Over half of the survey respondents indicate that streaming is vital and important for developing valuable real-time streaming, recommendation engines, and fraud-detection and security solutions. HOW IMPORTANT IS SPARK STREAMING TO YOUR USE CASE? 40% of respondents develop RECOMMENDATION ENGINE PRODUCTS 45% of respondents develop REAL-TIME STREAMING PRODUCTS 29% of respondents develop FRAUD DETECTION / SECURITY PRODUCTS Q: WHICH KINDS OF PRODUCTS DOES YOUR ORGANIZATION DEVELOP? Select all that apply. 32
  • 33. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE Organizations use Spark Streaming along with Spark’s other multiple components to develop streaming applications. Both Spark Streaming and MLlib saw a notable increase in production use. SPARK STREAMING AND MLlib USE IN PRODUCTION 2015 2016 13% 18% 2015 2016 STREAMING ADVANCED ANALYTICS (MLlib) 14% 22% 57% STREAMING PRODUCTION CASES 38% ADVANCED ANALYTICS PRODUCTION CASES % of respondents who use the component in production (more than one component could be selected) 33
  • 34. WHICH OF THESE TECHNOLOGIES DO YOU CURRENTLY USE A LOT FOR STREAMING AND/OR COMPLEX EVENT PROCESSING CASES? Q: APACHE SPARK STREAMING IS IMPORTANT 1% APACHE APEX APACHE SPARK 4% KINESIS 6% APACHE STORM 1% APACHE FLINK 29% APACHE KAFKA APACHE SPARK COMPONENT POPULARITY % of respondents who use the component anywhere from evaluation to production (more than one component could be selected) SPARK STREAMING 71% MLlib 71% SQL 88% RDDS 8383% DATAFRAMES 89% 33% DO YOU CURRENTLY USE SPARK STREAMING IN PRODUCTION? Q: used it in 2015 14% +57% are using it today SPARK STREAMING PRODUCTION CASES 22% Apache Spark Streaming Engine is the Preferred Choice Compared to other streaming engines, Spark is the preferred choice at 33%. When compared to other Spark components, Spark Streaming matches MLlib at 71% in use, from evaluation to production. In the 2015 Spark survey, 14% of users said they used Spark Streaming in production, compared to 22% of users in 2016. Overall, we saw a 57% growth of users using Spark Streaming in production. APACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE Select all that apply. Note: Respondents were predominately Spark users. 34
  • 35. Section Summary Spark Streaming is being used for real-time solutions, from evaluation to production, closer in usage to Spark’s other commonly used components. As a preferred choice of streaming engine over others, more organizations are building real-time streaming solutions as they consider streaming an important Spark feature. APACHE SPARK STREAMING IS IMPORTANTAPACHE SPARK STREAMING AND MACHINE LEARNING SURGE IN USAGE 35
  • 36. Afterword: Reynold Xin 2015 and 2016 have been exciting years for the adoption and increased growth of Apache Spark and its community. Two releases—Spark 1.6 and 2.0—have seen major improvements in all aspects of Spark noted by respondents in this survey as important. I continue to look forward, and work with the community, to the exciting future ahead for the Spark platform. As Spark becomes easier, faster, and smarter, outside the predominantly IT and Consulting Industry, a newer audience is adopting it, as results from the survey suggest. Performance, ease-of-use, streaming, and reliability top the list as most important features. At the time of this survey, we released Apache Spark 2.0. Ongoing performance improvements, with Project Tungsten, started in earlier releases and culminated in Spark 2.0. In addition, Spark 2.0 delivered unified DataFrames and Datasets APIs and simplified Structured Streaming APIs. All these make Spark an attractive engine for performing advanced analytics across industry verticals in solving complex data problems, by users from different functional roles. Your voice matters. We got an insightful glimpse into the growth and trends from this year’s survey: who’s using Spark, how they are using it, what’s important, what new features they use, and what they are using it for. Just as the feedback from last year’s survey did, these insights will drive major updates and help shape the future of the Spark platform. Thank you to everyone who participated in Databricks’ Apache Spark Survey 2016! R E Y N O L D X I N Chief Architect Co-Founder of Databricks @rxin 36
  • 37. Databricks’ vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project providing 10x more code than any other company. The company has also trained over 20,000 users on Apache Spark, and has the largest number of customers deploying Spark to date. Databricks provides a just-in-time data platform, to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact info@databricks.com. TRY DATABRICKS FOR FREE databricks.com/try-databricks CONTACT US FOR A PERSONALIZED DEMO databricks.com/contact-databricks © Databricks 2016. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. 37