Spark in the Enterprise - 2 Years Later by Alan Saldich

  1. © Cloudera, Inc. All rights reserved.
     Spark in the Enterprise – 2 Years Later
     Alan Saldich, Vice President, Marketing
  2. A busy 2 years for Cloudera & Apache Spark
     Timeline, 2013 to 2016:
     • Announced support for Spark
     • Shipped with CDH 4.4
     • Spark on YARN integration
     • Announced initiative to make Spark the standard execution engine
     • Launched first Spark training
     • Added Kerberos integration
     • Cloudera engineers published O'Reilly Spark book
  3. Recent engineering contributions
     Integration with Hadoop Ecosystem:
     • Spark-on-YARN integration
     • Dynamic Resource Allocation
     • Kafka Integration
     • HBase Integration
     Production-Ready Features:
     • Fixed operational issues at scale
     • Security: Kerberos Integration; HDFS Sync (Sentry)
     • Governance: Cloudera Navigator integration (audit & lineage)
     • Monitoring/Troubleshooting: Improved debugging
     • Zero Data Loss: Spark Streaming Resilience
     Ongoing Initiatives:
     • Standard Execution Engine: Hive on Spark, Pig on Spark, Crunch on Spark, Solr indexing on Spark
  4. 2 years, 200+ customers
  5. What are they doing with Spark?
     [Two bar charts on a 0–100% scale; bar values not recoverable. Most Popular Use Cases: Batch ETL, Predictive Machine Learning, MPI Alternative, Stream processing. Commonly Co-installed: Hive, HBase, Impala, Solr.]
  6. What are they asking for?
     • Security
       • At a minimum, equivalent to market-leading RDBMS
     • Performance
       • At least as fast as the systems I'm familiar with today
     • Simplicity
       • All the functionality I need to build my application, but not more
  7. Current Security Architecture: Inconsistency = Limited Access
     [Diagram: Sentry (policy definition) with separate policies (Policy A, Policy B) enforced differently per engine, e.g. Impala (column-level) vs. Spark (file-level). Some engines support more granular restrictions than others.]
     RecordService: Unified Authorization Enforcement
     [Diagram: Sentry (policy definition) → RecordService (policy enforcement) → Impala, Spark, MR.]
     Unified, Granular Policy Enforcement: a new high-performance security layer that centrally enforces access control policy. Complementing Apache Sentry, which provides unified policy definition, it delivers unified row- and column-based security and dynamic data masking to every Hadoop access path.
     Benefits:
     • Security: Fine-grained permissions and enforcement across Hadoop, building on Sentry.
     • Interoperability: Developers don't need to be aware of on-disk formats, and components can be swapped transparently.
  8. Kudu: Fast Analytics on Fast Changing Data
     [Diagram plotting storage by Pace of Data (from unchanging to fast-changing, frequent updates) against Pace of Analysis:
     • HDFS (append-only): fast scans, analytics and processing of stored data
     • HBase (real-time): fast online updates & data serving
     • Arbitrary storage: active archive
     • Analytic Gap between HDFS and HBase: fast analytics on fast-changing or frequently updated data, filled by Kudu]
     Kudu fills the gap: modern analytic applications often require complex data flow and difficult integration work to move data between HBase and HDFS.
  9. In conclusion
     • Spark in the enterprise => we're well on our way
     • Cloudera in the community => we're doing our part
     • The applications you can build => will only get more powerful, more valuable
  10. Thank You
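The RecordService slide (slide 7) describes one enforcement layer sitting between Sentry's policy definitions and every engine (Impala, Spark, MR), applying row-level filters, column restrictions and dynamic data masking. The Python sketch below is hypothetical: `Policy` and `EnforcementLayer` are illustrative names, not RecordService or Sentry APIs, and the in-memory table stands in for data on HDFS. It only shows why enforcing policy in one place gives every engine the same view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical policy, defined once (the role the slide gives Sentry).
    Not the Sentry API."""
    allowed_columns: List[str]
    row_filter: Callable[[Record], bool] = lambda row: True
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)


class EnforcementLayer:
    """Hypothetical single enforcement point that every engine reads through.
    Not the RecordService API."""

    def __init__(self, policies: Dict[str, Policy]):
        self.policies = policies

    def read(self, user: str, rows: List[Record]) -> List[Record]:
        policy = self.policies.get(user)
        if policy is None:
            return []  # deny by default when the user has no policy
        out = []
        for row in rows:
            if not policy.row_filter(row):
                continue  # row-level security
            # Column-level security: keep only permitted columns
            projected = {c: row[c] for c in policy.allowed_columns if c in row}
            # Dynamic masking: rewrite sensitive values on the way out
            for col, mask in policy.masks.items():
                if col in projected:
                    projected[col] = mask(projected[col])
            out.append(projected)
        return out


TABLE = [
    {"id": 1, "region": "EU", "email": "a@example.com", "amount": 10},
    {"id": 2, "region": "US", "email": "b@example.com", "amount": 20},
]

layer = EnforcementLayer({
    "analyst": Policy(
        allowed_columns=["id", "region", "email"],  # "amount" is hidden
        row_filter=lambda row: row["region"] == "EU",
        masks={"email": lambda value: "***"},
    ),
})

for engine in ("Impala", "Spark", "MR"):
    # Every engine reads through the same layer and sees the same result
    print(engine, layer.read("analyst", TABLE))
```

Because every engine calls the same `read`, a policy change takes effect everywhere at once, which is the consistency the slide contrasts with per-engine enforcement (column-level in Impala, file-level in Spark).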
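The Kudu slide (slide 8) contrasts append-only storage optimized for scans (HDFS) with storage optimized for updates (HBase). Kudu tables are columnar and have a primary key, which is how one system can serve both needs. The sketch below is a hypothetical in-memory toy, assuming nothing about Kudu's actual storage engine or client API; `ToyColumnarTable` is an illustrative name. It only shows the combination the slide describes: in-place upserts by key plus scans that read a single column.

```python
from typing import Dict, List


class ToyColumnarTable:
    """Hypothetical toy: columnar storage plus primary-key upserts.
    Illustrates the idea only; not Kudu's engine or API."""

    def __init__(self, key: str, columns: List[str]):
        self.key = key
        self.columns: Dict[str, list] = {c: [] for c in columns}  # one list per column
        self.index: Dict[object, int] = {}  # primary key -> row position

    def upsert(self, row: Dict[str, object]) -> None:
        k = row[self.key]
        pos = self.index.get(k)
        if pos is None:
            # New key: record its position, then append to every column
            self.index[k] = len(self.columns[self.key])
            for c, values in self.columns.items():
                values.append(row.get(c))
        else:
            # Existing key: update in place, no rewrite of other rows
            for c, v in row.items():
                self.columns[c][pos] = v

    def scan_sum(self, column: str) -> float:
        # Analytic scan touches only the one column it needs
        return sum(v for v in self.columns[column] if v is not None)


t = ToyColumnarTable(key="sensor", columns=["sensor", "reading"])
t.upsert({"sensor": "a", "reading": 1.0})
t.upsert({"sensor": "b", "reading": 2.0})
t.upsert({"sensor": "a", "reading": 5.0})  # fast-changing value updated in place
print(t.scan_sum("reading"))  # 7.0
```

In an append-only layout, the third write would add a new row and readers would have to reconcile versions; here the scan sees the current value directly.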