Spark in the Enterprise - 2 Years Later by Alan Saldich
1© Cloudera, Inc. All rights reserved.
Spark in the Enterprise – 2 Years Later
Alan Saldich – Vice President, Marketing
A busy 2 years for Cloudera & Apache Spark
Timeline, 2013 to 2016: Spark on YARN; announces initiative to make Spark the standard ...; publish O’Reilly Spark ...
Recent engineering contributions
Integration with Hadoop | Production-Ready Features | Ongoing Initiatives
• Spark-on-YARN integration
• Dynamic Resource
• Kafka Integration
• HBase Integration
• Fixed operational issues at
• Kerberos Integration
• HDFS Sync (Sentry)
• Cloudera Navigator integration (audit & lineage)
• Improved debugging
• Zero Data Loss
• Spark Streaming Resilience
• Standard Execution Engine
• Hive on Spark
• Pig on Spark
• Crunch on Spark
• Solr indexing on Spark
2 years, 200+ customers
What are they doing with Spark?
[Charts: Commonly Co-installed; Most Popular Use Cases (scale: 0% to 100%)]
What are they asking for?
• At a minimum, equivalent to a market-leading RDBMS
• At least as fast as the systems I’m familiar with today
• All the functionality I need to build my application, but no more
Current Security Architecture: Inconsistency = Limited
Some engines support more granular restrictions...
RecordService: Unified Authorization Enforcement
A new high-performance security layer that centrally enforces access control policy. Complementing Apache Sentry, which provides unified policy definition, it delivers unified row- and column-based security and dynamic data masking to every Hadoop access path.
• Security: Fine-grained permissions and enforcement across Hadoop, building on Sentry.
• Interoperability: Developers don’t need to be aware of on-disk formats; transparently swap
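To make the idea of centrally enforced row- and column-based security with dynamic masking concrete, here is a minimal sketch in plain Python. It does not use the RecordService or Sentry APIs; `Policy`, `enforce`, `mask_last4`, and the sample rows are all hypothetical names invented for illustration. The only point is that filtering, projection, and masking happen in one shared function rather than separately in each engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical per-role policy (not a Sentry or RecordService structure)."""
    visible_columns: List[str]
    masked_columns: Dict[str, Callable[[object], object]] = field(default_factory=dict)
    row_filter: Optional[Callable[[Row], bool]] = None


def mask_last4(value: object) -> str:
    """Replace everything except the last four characters with '*'."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def enforce(rows: Iterable[Row], policy: Policy) -> Iterator[Row]:
    """Single enforcement point: row filter, column projection, then masking."""
    for row in rows:
        if policy.row_filter is not None and not policy.row_filter(row):
            continue  # row-level restriction
        out: Row = {}
        for col in policy.visible_columns:  # column-level restriction
            masker = policy.masked_columns.get(col)
            out[col] = masker(row[col]) if masker else row[col]
        yield out


# Illustrative data and policy; both are assumptions, not from the slides.
ROWS: List[Row] = [
    {"name": "Ann", "region": "EU", "account": "1234567890"},
    {"name": "Bob", "region": "US", "account": "9876543210"},
]

ANALYST = Policy(
    visible_columns=["region", "account"],
    masked_columns={"account": mask_last4},
    row_filter=lambda r: r["region"] == "EU",
)


def read_as_analyst() -> List[Row]:
    """Every access path would call enforce() instead of reading ROWS directly."""
    return list(enforce(ROWS, ANALYST))
```

In this sketch any reader that goes through `enforce` sees the same restricted view, which is the consistency property the slide contrasts with per-engine enforcement.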
Kudu: Fast Analytics on Fast Changing Data
Kudu fills the gap
• Fast scans, analytics and processing of ... (on fast-changing or ...)
• ... require complex data flow & difficult integration work to move data between HBase & HDFS
• Chart axis: Pace of Analysis
• Spark in the enterprise => we’re well on our way
• Cloudera in the community => we’re doing our part
• The applications you can build => will only get more powerful,