Apache Zeppelin and Spark for Enterprise Data Science
1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enabling Apache Zeppelin* and Spark*
for Data Science in the Enterprise
Bikas Saha
@bikassaha
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive,
HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper,
Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the
Apache Software Foundation.
2.
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
4.
Zeppelin makes Big Data Science Easy to Approach
Zero install – just connect via a web browser and you are ready to run
Support for multiple execution platforms (Apache Spark, JDBC, Hive…)
Support for multiple languages (Scala, SQL, Python…)
Support for built-in visualizations
Support for reporting
Support for sharing and collaborative work
Does NOT have machine learning built in; that is where Apache Spark comes in (or your
favorite SQL engine: Apache Flink, Drill, Hive… and 30+ others)
5.
Zeppelin for Sharing
6.
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
7.
Current Apache Zeppelin and Spark integration
[Diagram. Components: User, Zeppelin Server, Spark Driver, multiple Spark Executors.]
8.
Architectural Issue with Secure Data Access
[Diagram. Components: User 1, Zeppelin Server, Spark Driver, multiple Spark Executors, HDFS. Annotation on the HDFS access: "Zeppelin Server User".]
9.
Architectural Issues with Multi-Tenancy – Fault Tolerance
[Diagram. Components: User 1, User 2, Zeppelin Server, Spark Driver, multiple Spark Executors. Annotations: "User 1 failure affects User 2"; "Heavy-weight Spark drivers".]
10.
Architectural Issues with Multi-Tenancy – Privacy
[Diagram. Components: User 1, User 2, Zeppelin Server, Spark Driver, multiple Spark Executors. Annotation: "User 1 can access User 2 Data".]
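The privacy issue on this slide can be illustrated with a toy model. The sketch below is not Zeppelin or Spark code: a plain Python dict stands in for the state of a single shared Spark driver, and the names run_paragraph, shared_driver, and user2_table are made up for illustration. It only shows why two users executing code in one shared process can read each other's variables.

```python
# Toy model only: a dict stands in for the state held by one shared Spark driver.
# None of these names come from Zeppelin or Spark.

def run_paragraph(namespace, code):
    """Execute one notebook paragraph against the given interpreter namespace."""
    exec(code, namespace)

# A single namespace models a single driver process serving every user.
shared_driver = {}

# User 2 loads some data into a variable.
run_paragraph(shared_driver, "user2_table = {'alice': 100, 'bob': 90}")

# User 1 runs a paragraph in the same driver and can simply read that variable.
run_paragraph(shared_driver, "copied = dict(user2_table)")

print(shared_driver["copied"])
```

The same sharing is what makes one user's failure visible to the other: whatever happens to the shared process happens to everyone attached to it.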
11.
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Enterprise Ready Big Data Science
Future Roadmap
12.
Livy Server as a Session Management Service
[Diagram. Components: Livy Server exposing an Interactive REST API and a Batch REST API. The interactive side shows a Session with a Remote Spark Driver and Remote Context; the batch side shows a Standard Spark Batch Job. Both run on Spark Executors.]
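As a rough sketch of what the two REST APIs look like from a client, the Python below builds (but does not send) requests against Livy's /sessions, /sessions/{id}/statements, and /batches endpoints. The host name, the JAR path, and the class name are placeholders chosen for illustration, and the helper functions are not part of any Livy client library.

```python
import json
from urllib.request import Request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def json_post(path, payload):
    """Build (but do not send) a JSON POST request against the Livy server."""
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="pyspark"):
    # Interactive API: ask Livy to start a remote Spark driver for a new session.
    return json_post("/sessions", {"kind": kind})


def run_statement(session_id, code):
    # Interactive API: submit a code snippet to an existing session.
    return json_post(f"/sessions/{session_id}/statements", {"code": code})


def submit_batch(file, class_name=None, args=()):
    # Batch API: submit a standard Spark application, much like spark-submit.
    payload = {"file": file, "args": list(args)}
    if class_name is not None:
        payload["className"] = class_name
    return json_post("/batches", payload)


if __name__ == "__main__":
    requests_to_show = (
        create_session(),
        run_statement(0, "sc.parallelize(range(100)).sum()"),
        submit_batch("hdfs:///apps/example.jar", "com.example.Job"),
    )
    for req in requests_to_show:
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing any of these requests to urllib.request.urlopen would send it to a running Livy server; the sketch stops short of that so it can be read and run without a cluster.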
13.
Secure Data Access - Solved
[Diagram. Components: User, Zeppelin Server with a Livy Interpreter, Livy Server, a Session with a Remote Spark Driver and Remote Context, Spark Executors, HDFS. Annotation on the HDFS access: "User".]
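The slide does not spell out how the session ends up accessing HDFS as the end user. One mechanism Livy offers for this is impersonation: the body of POST /sessions may carry a proxyUser field naming the user the session should run as. The sketch below only builds that request body; the function name and the user name are illustrative, and impersonation also has to be enabled on the Livy and Hadoop side, which is outside this sketch.

```python
import json


def session_request_body(end_user, kind="spark"):
    """Body for Livy's POST /sessions asking Livy to run the session as end_user."""
    if not end_user:
        # Without an authenticated user there is nobody to impersonate.
        raise ValueError("an authenticated end user is required")
    return json.dumps({"kind": kind, "proxyUser": end_user}, sort_keys=True)


# "user1" is a placeholder for whoever is logged in to the notebook.
print(session_request_body("user1"))
```

The point of the design is that the identity is chosen per session rather than per server, so data access checks apply to the person running the notebook and not to the notebook server's own account.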
14.
Multi-Tenancy - Solved
[Diagram. Components: User 1 and User 2, Zeppelin Server with one Livy Interpreter per user, Livy Server with Session 1 and Session 2. Each session has its own Remote Spark Driver, Remote Context, and Spark Executor.]
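The isolation shown on this slide can be sketched with a toy model in which every user gets a separate session with its own state. This is not Livy code: Session and SessionManager are hypothetical classes, and a per-session dict stands in for a per-session remote Spark driver.

```python
class Session:
    """Toy stand-in for one session with its own remote Spark driver."""

    def __init__(self, owner):
        self.owner = owner
        self.namespace = {}  # state private to this session
        self.alive = True

    def run(self, code):
        if not self.alive:
            raise RuntimeError(f"session for {self.owner} is dead")
        try:
            exec(code, self.namespace)
        except Exception:
            # A crash only takes down this user's session.
            self.alive = False
            raise


class SessionManager:
    """Hands each user their own session, creating it on first use."""

    def __init__(self):
        self._sessions = {}

    def session_for(self, user):
        if user not in self._sessions:
            self._sessions[user] = Session(user)
        return self._sessions[user]


manager = SessionManager()
manager.session_for("user2").run("secret = 42")

# User 1 cannot see User 2's variable: the lookup fails inside User 1's own session.
try:
    manager.session_for("user1").run("print(secret)")
except NameError:
    pass

# User 1's session died, but User 2's session is unaffected.
print(manager.session_for("user1").alive, manager.session_for("user2").alive)
```

Both earlier multi-tenancy issues show up here in inverted form: the failed lookup demonstrates privacy, and the surviving second session demonstrates fault isolation.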
15.
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
16.
Near Term Improvements
Session Management
Debuggability
Unified session for all languages
Better visualizations for Machine Learning
Support for Spark 2.0
17.
Long Term Improvements
Controlled sharing of sessions for collaboration
Data exploration and browsing with metadata
Taking the model from training to production
18.
Thank You