Democratizing Access
to Spark
Ali Ghodsi
Main Challenge
Big Data is Hard
Databricks
Goal
Democratize Big Data
Databricks Cloud Platform
Hosted Model
We ensure everything works end-to-end
Rapid releases
Iterate quickly based on customer feedback
Dynamic use
Customers scale dynamically based on needs
Databricks Cloud Platform
FS
CLOUD HADOOP DATA WAREHOUSE
Your Storage
DBMS
Databricks Platform
OPEN SOURCE
MANAGEMENT
Security Controls
BI connectivity
24x7 SLAs
Multi-tenancy
Production Jobs
Managed Clusters
SQL
Machine Learning
R
Graph
Streaming
Integrated Workspace NOTEBOOKS COLLABORATION REST APIs DASHBOARDS
How is this used so far?
Just-in-time Data Warehouse Use Case
• Separate compute from storage
• 3 top-10 mass media companies shortened time from idea to app
Advanced Analytics Use Case
• Machine learning and graph processing
• Top 2 gaming companies, Radius modeling of 20M companies
Real-time Use Case
• Data product using Spark Streaming
• Top 5 credit card company is doing loan approvals in real time
Main Lesson
Many companies struggle with big data projects
• Steep learning curve for many developers
Getting trained on big data is costly and time-consuming
• Acquiring machines
• Setting up and configuring infrastructure
• Building systems without access to much documentation
How do we empower more developers?
Trained 2,000 on Spark in 2014
Launched two Massive Open Online Courses (MOOCs) in 2015
• ~125,000 took our courses
• ~20,000 finished the course
• ~500,000 hours spent learning Spark
How do we multiply this to democratize access to Spark?
Announcing
Databricks Community Edition (beta)
Free edition of Databricks Platform
• Mini Spark clusters
• Notebooks, Dashboards
• REST APIs
Continuous delivery of content
• Course and MOOC material
• Spark how-tos and documentation
Every attendee gets access today!
Democratizing Big Data
for Organizations
Will provide seamless transition to production
• Large clusters
• Production pipelines
• Security and Governance
Databricks Community
Edition Demo
Michael Armbrust

2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
