Spark Summit 
June 2014
Apache Spark and Databricks
Adoption 
All major Hadoop distributions include Spark 
Beyond Hadoop
Partnerships 
Partner with Spark distributors to provide great 
experience to every Spark user 
Partners
Certification 
Build a strong application ecosystem 
Spark API 
Spark Distros 
… 
Distros Cert 
Spark Apps 
… App Cert
Certification 
Free certification process 
Scripts for certifying Spark distributions 
• Developed by community 
• Open-source 
Anyone will be able to certify any Spark distribution
Training 
We’ve been teaching Spark since 2012 
• 400+ people this year through Databricks 
Just launched a new training program 
• Already hold workshops in 5 cities 
300+ people signed up for training on Wednesday
Solve Big Data Challenges
Big Promise 
Great successes using Big Data
Big Promise 
Great successes using Big Data 
Every organization collects dataYour company here!
Big Challenge 
Great successes using Big Data 
Google, Facebook spend billions $ to develop, 
implement, and run data analysis tools and products 
Your company here! 
Every organization collects data
Typical Story 
Your company starts a Big Data initiative 
You are tasked to… 
1) Build a Hadoop cluster 
2) Build a data pipeline 
3) Get insights & 
build data products 
Clusters hard to set up 
and manage 
Need to integrate a zoo 
of tools 
Tools are hard to use 
(IT) 
(engineers, data scientists) 
(engineers, data scientists, analysts)
Typical Data Pipeline 
Data 
ETL 
Exploration 
Dashboards 
& Reports 
Data 
Products 
Advanced 
Analytics 
Integrate disparate, clunky tools 
Hard to navigate data, develop and deploy apps
Vision 
Make big data easy
From Challenges to Solutions 
Challenges 
Solutions 
Hosted platform 
Apache Spark 
Clusters hard to set up 
and manage 
Need to integrate a zoo 
of tools 
Tools are hard to useInteractive Workspace
Databricks Cloud 
Databricks Workspace 
Databricks Cloud 
Databricks Platform
Databricks Platform 
Databricks Workspace 
Databricks Platform 
… 
…
Databricks Platform 
Start clusters in seconds 
Zero-cost management 
Dynamically scale up & down
Apache Spark 
Unifies 
• Streaming 
• SQL 
• Machine learning 
• Graphs 
Single system, 
Databricks Workspace 
Databricks Platformsingle API
Databricks Workspace 
Databricks Workspace 
Notebooks 
Dashboards 
Jobs 
Apps 
Databricks Platform
Notebooks 
Support Python, SQL, Scala 
Interactive commands & plots 
On-line collaboration
Dashboards 
WYSIWYG builder 
Interactive plots 
One-click publishing
Job Launcher 
Run arbitrary Spark jobs, programmatically
Dramatically Simplify Data Pipeline 
Data 
ETL 
Exploration 
Advanced Analytics 
Dashboards & Reports 
Data Products 
Cloud
Dramatically Simplify Data Pipeline 
Data 
ETL 
Exploration 
Advanced Analytics 
Dashboards & Reports 
Data Products 
Cloud 
Free users to focus on 
finding answers & building products
Demo
Availability 
Started closed beta program earlier this year 
Limited availability soon 
• Gradually ramping up 
• Sign up on databricks.com!
3rd Party Apps 
Databricks 
Workspace 
Databricks Platform
3rd Party Apps 
Databricks Workspace 
Apps 
… 
Databricks Platform
Databricks Cloud and Spark 
Databricks Cloud runs 100% Apache Spark 
• No lock in: any Databricks Cloud app runs on any 
certified Spark distribution 
Databricks Cloud accelerates Spark adoption 
• Provide easiest way to learn and use Apache Spark
Databricks Cloud 
Databricks Workspace 
Databricks Platform 
Dramatically simplify 
• analyzing big data 
• building data products 
Fuel growth of Spark ecosystem 
Make big data easy
Thank You!

Announcing Databricks Cloud (Spark Summit 2014)

  • 1.
  • 2.
    Apache Spark andDatabricks
  • 3.
    Adoption All majorHadoop distributions include Spark Beyond Hadoop
  • 4.
    Partnerships Partner withSpark distributors to provide great experience to every Spark user Partners
  • 5.
    Certification Build astrong application ecosystem Spark API Spark Distros … Distros Cert Spark Apps … App Cert
  • 6.
    Certification Free certificationprocess Scripts for certifying Spark distributions • Developed by community • Open-source Anyone will be able to certify any Spark distribution
  • 7.
    Training We’ve beenteaching Spark since 2012 • 400+ people this year through Databricks Just launched a new training program • Already hold workshops in 5 cities 300+ people signed up for training on Wednesday
  • 8.
    Solve Big DataChallenges
  • 9.
    Big Promise Greatsuccesses using Big Data
  • 10.
    Big Promise Greatsuccesses using Big Data Every organization collects dataYour company here!
  • 11.
    Big Challenge Greatsuccesses using Big Data Google, Facebook spend billions $ to develop, implement, and run data analysis tools and products Your company here! Every organization collects data
  • 12.
    Typical Story Yourcompany starts a Big Data initiative You are tasked to… 1) Build a Hadoop cluster 2) Build a data pipeline 3) Get insights & build data products Clusters hard to set up and manage Need to integrate a zoo of tools Tools are hard to use (IT) (engineers, data scientists) (engineers, data scientists, analysts)
  • 13.
    Typical Data Pipeline Data ETL Exploration Dashboards & Reports Data Products Advanced Analytics Integrate disparate, clunky tools Hard to navigate data, develop and deploy apps
  • 14.
    Vision Make bigdata easy
  • 15.
    From Challenges toSolutions Challenges Solutions Hosted platform Apache Spark Clusters hard to set up and manage Need to integrate a zoo of tools Tools are hard to useInteractive Workspace
  • 16.
    Databricks Cloud DatabricksWorkspace Databricks Cloud Databricks Platform
  • 17.
    Databricks Platform DatabricksWorkspace Databricks Platform … …
  • 18.
    Databricks Platform Startclusters in seconds Zero-cost management Dynamically scale up & down
  • 19.
    Apache Spark Unifies • Streaming • SQL • Machine learning • Graphs Single system, Databricks Workspace Databricks Platformsingle API
  • 20.
    Databricks Workspace DatabricksWorkspace Notebooks Dashboards Jobs Apps Databricks Platform
  • 21.
    Notebooks Support Python,SQL, Scala Interactive commands & plots On-line collaboration
  • 22.
    Dashboards WYSIWYG builder Interactive plots One-click publishing
  • 23.
    Job Launcher Runarbitrary Spark jobs, programmatically
  • 24.
    Dramatically Simplify DataPipeline Data ETL Exploration Advanced Analytics Dashboards & Reports Data Products Cloud
  • 25.
    Dramatically Simplify DataPipeline Data ETL Exploration Advanced Analytics Dashboards & Reports Data Products Cloud Free users to focus on finding answers & building products
  • 26.
  • 27.
    Availability Started closedbeta program earlier this year Limited availability soon • Gradually ramping up • Sign up on databricks.com!
  • 28.
    3rd Party Apps Databricks Workspace Databricks Platform
  • 29.
    3rd Party Apps Databricks Workspace Apps … Databricks Platform
  • 30.
    Databricks Cloud andSpark Databricks Cloud runs 100% Apache Spark • No lock in: any Databricks Cloud app runs on any certified Spark distribution Databricks Cloud accelerates Spark adoption • Provide easiest way to learn and use Apache Spark
  • 31.
    Databricks Cloud DatabricksWorkspace Databricks Platform Dramatically simplify • analyzing big data • building data products Fuel growth of Spark ecosystem Make big data easy
  • 32.