Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hortonworks Premier Inside Out – Introducing
Data Science Experience (DSX)
IBM + Hortonworks = Unlocking Actionable Insights
Hortonworks
• #1 Pure Open Source Hadoop Distribution
• 1000+ customers and 2100+ ecosystem partners
• Employs the original architects, developers and operators of Hadoop from Yahoo!
• Best-in-class 24x7 customer support
• Leading professional services and training
+
IBM
• #1 Data Science Platform (Source: Gartner)
• OpenPOWER performance leadership
• Flexible, software-defined storage
• #1 SQL Engine for complex, analytical workloads
• Leader in on-premises and hybrid cloud solutions
Next Generation Data Science Problems
Multiple data sources & clusters
Data Scientists
Where is the data I need to answer the business questions?
Data Engineers
How do I move that data into a central repository?
How do I transform and cleanse that data?
Next Generation Data Science Problems
Too many tools and technologies
Data Scientists
How do I learn the latest library/technique?
I don’t (want to) know Hadoop/Hive, etc.
How do I bring my familiar R/Python libraries to the new data science platform?
Next Generation Data Science Problems
Socializing insights is challenging
Data Scientists
How do I collaborate and share my work with others in the organization?
Business Analyst
How do I move that data into a central repository?
What is the best visualization to tell my story?
Next Generation Data Science Problems
Going from prototype to production is cumbersome
Data Scientists
I created this awesome machine learning model; how do I put it into production?
Data Scientists / Data Engineers
How are my machine learning models performing, and how can I improve them?
Data Science Experience
• Explore & Learn
• Model & Evaluate
• Deploy & Predict
• Monitor & Measure
The leading data science platform that allows you to easily collaborate across teams, use the top open source tools and scale at the speed your business requires.
Data Science Solution
Data Science Experience
Community
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala/Python/R/SQL
• Zeppelin & Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Scale & Enterprise Security
• Data Science at Scale
• Run Spark Jobs on HDP Cluster
• Secure Hadoop Support
• Ranger and Atlas Support for Data
• Support for ABAC (attribute-based access control)
Model Management
• Data Shaping Pipeline UI
• Auto-data preparation & modeling
• Advanced Visualizations
• Model management & deployment
• Documented Model APIs
Enterprise Data Science at Scale
• Enterprise: secured, governed and managed
• Tools: leverage your favorite tools, technologies and libraries
• Deployment: from pilot to production
• Data: build models using all the data
Demo Scenario
Sensors monitoring trucks
• Stored long-term sensor data about various truckers' driving behavior
• New sensor data coming from trucks as they drive in various conditions
• Predict driving violations before they happen
• Alert the driver or manager
• The business monitors driver performance
Demo Flow
Insights from Data Science to Production
Data Scientists
Where is the data I need to answer the business questions?
Business Users
Where are the insights & predictions from the data?
HDP Cluster
Knox
Admins
How do I meet SLA, performance, .., feature needs?
Demo Scenario
Problems Solved
• Data scientists collaborate and learn new tools & frameworks
• Choice of tools, notebooks and languages
• Run your favorite notebook on all the data in the HDP cluster
• Deploy the model to production
• Leverage the production model to deliver insights to the business
• Monitor models and retrain them as new data comes in
DSX with HDP Roadmap
Summary plan
• DSX install with Ambari: DSX Ambari install, DSX in HDP, improve enterprise readiness
  Install DSX with Ambari; DSX runs on YARN node-labeled nodes; Ranger, Atlas integration for model management; SSO
• Deeper DSX YARN integration: improve YARN integration, model scoring on YARN
  DSX scales on all YARN nodes; model scoring and notebooks run on YARN
Customer Briefings Coming to a City Near You!
• 24 Oct: Silicon Valley
• 25 Oct: Salt Lake City
• 26 Oct: Dallas
• 1 Nov: Chicago
• 2 Nov: Toronto
• 7 Nov: Tysons
• 8 Nov: New York City
• 9 Nov: Boston
Join us for a Meetup Session
Enterprise Data Science at Scale Meetup
Silicon Valley 10/30
San Francisco 11/14
Chicago 11/08 (*)
Dallas 11/09 (*)
Toronto 11/09 (*)
NYC 11/15 (*)
Washington DC 11/16 (*)
London 11/24 (*)
Boston 12/01 (*)
(*) Tentative
Announcement 13 Jun 2017:
IBM and Hortonworks extend partnership to bring Data Science to HDP
Great Data + Great Data Science = Great Decisions
• IBM chooses Hortonworks Data Platform (HDP®) as its Hadoop distribution
• Hortonworks Data Platform (HDP) combined with IBM DSX (Data Science Experience) & IBM Big SQL in new integrated solutions