Harnessing the Power of Spark with Databricks Cloud
Ion Stoica
Spark Summit East 2015 Keynote, March 18, 2015

Accelerating Spark Adoption

Certification
•  Applications (35+)
•  Distributions (11+)

Training
Spark training since 2011
~2,000 people trained in 2014
1,200+ people trained in 2015 through the end of March
–  500+ people trained at this Spark Summit alone!

MOOCs
“Intro to Big Data with Apache Spark”
–  Anthony Joseph, UC Berkeley
–  30,000+ already registered
“Scalable Machine Learning”
–  Ameet Talwalkar, UCLA
–  16,000+ already registered

Making Big Data Simple

Databricks Cloud
July 2014: Unveiled Databricks Cloud
3,500+ have registered to use Databricks Cloud
November 2014: Limited availability
100+ companies have been using Databricks Cloud

Big Data Projects are Hard
[Figure: typical project timeline: set up & maintain cluster (6-9 months), then data preparation (ingestion, ETL), exploration, insights, and production reports & dashboards, with stage durations labeled months, weeks, months]

Why Databricks Cloud?
Accelerate time-to-results from months to days
–  Zero management
–  Real-time
–  Unified platform
Open platform

Databricks Cloud
[Figure: Databricks Cloud components: Workspace (Notebooks, Dashboards, Jobs), Spark + Cluster Manager, and Cloud Infrastructure]

Zero Management
[Figure: Spark + Cluster Manager takes over the "set up & maintain cluster" stage: no need to set up clusters]

Real-Time
[Figure: Spark interactive queries & streaming across data preparation (ingestion, ETL), exploration, and insights]
[Figure: notebooks with interactive visualization]
[Figure: notebooks with real-time collaboration]

Unified Platform
[Figure: Spark: one API, one engine supporting all workloads, from data preparation (ingestion, ETL) and exploration to production jobs and reports & dashboards]

[Figure: Notebooks, dashboards, and jobs: one set of tools spanning data preparation (ingestion, ETL), exploration, insights, and production reports & dashboards]

Use notebooks to interactively develop
•  ETL
•  Data analysis
•  ML Models
•  …
Run notebooks as jobs!
•  Can take input arguments
•  No need to re-engineer

Run Notebooks as Jobs: No Code to Rewrite
[Figure: the same notebooks carry work from exploration and data preparation (ingestion, ETL) into production, yielding insights and reports & dashboards]

Drag and drop notebook plots to instantly create dashboards.
Use notebooks to compute and plot
•  KPIs
•  Funnels
•  …
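
The slide lists funnels among the metrics computed in notebooks. As a rough illustration of what such a computation involves, here is a minimal plain-Python sketch; it is not Spark or Databricks code, and the `funnel_counts` helper, the step names, and the sample events are hypothetical. It assumes each user's events are already sorted by time.

```python
from typing import Dict, List, Sequence


def funnel_counts(events: Dict[str, List[str]], steps: Sequence[str]) -> List[int]:
    """Count how many users reach each funnel step, in order.

    `events` maps a user id to that user's event names, already sorted by time
    (a hypothetical in-memory stand-in for a table of logged events).
    A user counts toward step k only after completing steps 0..k-1 first.
    """
    counts = [0] * len(steps)
    for user_events in events.values():
        reached = 0  # number of funnel steps this user has completed so far
        for event in user_events:
            if reached < len(steps) and event == steps[reached]:
                reached += 1
        # A user who reached step k also passed every earlier step.
        for k in range(reached):
            counts[k] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data: user id -> time-ordered event names.
    sample = {
        "u1": ["visit", "signup", "log_food"],
        "u2": ["visit", "signup"],
        "u3": ["visit"],
        "u4": ["signup", "visit"],  # signed up before visiting: only "visit" counts
    }
    print(funnel_counts(sample, ["visit", "signup", "log_food"]))  # [4, 2, 1]
```

Because a user is counted at step k only after completing every earlier step in order, the counts never increase from one step to the next, which is what lets the result be plotted as a funnel.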

Notebooks as Dashboards: Easily Go From Exploration to Production
[Figure: notebooks used for exploration become production reports & dashboards]

From Months to Days
[Figure: before: set up & maintain cluster (6-9 months), then data preparation (ingestion, ETL), exploration, insights, and production reports & dashboards, with stage durations labeled months, weeks, months]
[Figure: after: data preparation (ingestion, ETL), exploration, insights, and production with no cluster setup stage, with stage durations labeled days / weeks, days, days / weeks]

Open Platform
[Figure: Databricks Cloud (Notebooks, Dashboards, Jobs on Spark + Cluster Manager) connected to data sources such as S3, Redshift, Kinesis, … and to BI tools]

No Lock-In
Run Code
Certified Spark Distribution
External Packages
•  JARs
•  Libraries
•  ...

Spark for Health & Fitness
Chul Lee
Head of Data Engineering & Science
MyFitnessPal, Inc.

What is MyFitnessPal?
Simple & Effective Health/Fitness Tracking Tool
Big Engaged Community
–  80+ million registered users
–  #1 health & fitness app for iOS & Android
–  Over 1 million 5-star ratings in the App Store
Massive DB of Foods
–  Over 5 million food items
–  Over 14.5 billion logged foods
–  Over 36 million recipes
(plus a massive DB of exercise data)

Success Factors of Data Product Innovation
Large-Scale Algorithms (ML, NLP, etc.)
Solid & Highly Scalable Data Infrastructure
Big Data (Foods, Recipes, Diets, etc.)
MyFitnessPal’s food DB (and other related data) is the richest and largest in the industry
Spark provides easy access to large-scale ML and data mining algorithms (i.e., MLlib)
Databricks provides a flexible and scalable data infrastructure for the rapid, reliable development of data products

Product Fit
Databricks helps reduce “time to value,” allowing us to focus on data product innovation and customer understanding

Past
–  Food Data Cleaning
–  Search
–  Suggested Serving Sizes
–  And more…
Future
–  Ad Targeting/RecSys
–  Deep-Dive into Customer Understanding
–  Large-Scale ETL
–  And more…

Open Platform: 3rd Party Apps
[Figure: Databricks Cloud with Notebooks, Dashboards, Jobs, and 3rd party apps on top of Spark + Cluster Manager]

Databricks Cloud
Dramatically accelerate time-to-results for big data
Open platform, no lock-in

Everyone here will receive access to Databricks Cloud within the next week!

More Related Content

What's hot

Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeDatabricks
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkItai Yaffe
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleVasu S
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...Databricks
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016StampedeCon
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] SparktaStratio
 
Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0Stratio
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkDatabricks
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkLi Jin
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesDatabricks
 
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)Spark Summit
 
Einstieg in Machine Learning für Datenbankentwickler
Einstieg in Machine Learning für DatenbankentwicklerEinstieg in Machine Learning für Datenbankentwickler
Einstieg in Machine Learning für DatenbankentwicklerSascha Dittmann
 

What's hot (20)

Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta Lake
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
 
Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta Lakes
 
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
 
Einstieg in Machine Learning für Datenbankentwickler
Einstieg in Machine Learning für DatenbankentwicklerEinstieg in Machine Learning für Datenbankentwickler
Einstieg in Machine Learning für Datenbankentwickler
 

Viewers also liked

Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Hortonworks
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksHortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Platform as a Service with Kubernetes and Mesos
Platform as a Service with Kubernetes and Mesos Platform as a Service with Kubernetes and Mesos
Platform as a Service with Kubernetes and Mesos Miguel Zuniga
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedWork-Bench
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAiougVizagChapter
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesAlfredo Abate
 

Viewers also liked (20)

Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance Benchmarks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambari
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
PuttingItAllTogether
PuttingItAllTogetherPuttingItAllTogether
PuttingItAllTogether
 
Platform as a Service with Kubernetes and Mesos
Platform as a Service with Kubernetes and Mesos Platform as a Service with Kubernetes and Mesos
Platform as a Service with Kubernetes and Mesos
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_features
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_Features
 

Similar to Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica

1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...Jürgen Ambrosi
 
Preconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technologyPreconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technologyJen Stirrup
 
Building a Data-Driven Culture
Building a Data-Driven CultureBuilding a Data-Driven Culture
Building a Data-Driven CultureLucas Neo
 
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...Erika Roach
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleIdan Tohami
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
BDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingBDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingDavid Lauzon
 
Data Culture Series - Keynote & Panel - 19h May - London
Data Culture Series  - Keynote & Panel - 19h May - LondonData Culture Series  - Keynote & Panel - 19h May - London
Data Culture Series - Keynote & Panel - 19h May - LondonJonathan Woodward
 
Yhat 2017 Investor Deck
Yhat 2017 Investor DeckYhat 2017 Investor Deck
Yhat 2017 Investor DeckAustin Ogilvie
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
Marketing Digital Command Center
Marketing Digital Command CenterMarketing Digital Command Center
Marketing Digital Command CenterDataWorks Summit
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Introjeykottalam
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkDatabricks
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Big Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavBig Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavSwapnil (Neil) Jadhav
 
Data Culture Series - Keynote - 16th September 2014
Data Culture Series - Keynote - 16th September 2014Data Culture Series - Keynote - 16th September 2014
Data Culture Series - Keynote - 16th September 2014Jonathan Woodward
 
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...ICRISAT
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 

Similar to Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica (20)

1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
 
Preconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technologyPreconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technology
 
Hadoop and SAP BI
Hadoop and SAP BI   Hadoop and SAP BI
Hadoop and SAP BI
 
Building a Data-Driven Culture
Building a Data-Driven CultureBuilding a Data-Driven Culture
Building a Data-Driven Culture
 
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scale
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
BDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingBDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 Debriefing
 
Data Culture Series - Keynote & Panel - 19h May - London
Data Culture Series  - Keynote & Panel - 19h May - LondonData Culture Series  - Keynote & Panel - 19h May - London
Data Culture Series - Keynote & Panel - 19h May - London
 
Yhat 2017 Investor Deck
Yhat 2017 Investor DeckYhat 2017 Investor Deck
Yhat 2017 Investor Deck
 
On Big Data
On Big DataOn Big Data
On Big Data
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
Marketing Digital Command Center
Marketing Digital Command CenterMarketing Digital Command Center
Marketing Digital Command Center
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Big Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavBig Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil Jadhav
 
Data Culture Series - Keynote - 16th September 2014
Data Culture Series - Keynote - 16th September 2014Data Culture Series - Keynote - 16th September 2014
Data Culture Series - Keynote - 16th September 2014
 
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...
ICRISAT Global Planning Meeting 2019: Research Data Management by Abhishek Ra...
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 

Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica

  • 1. Harnessing the Power of Spark with Databricks Cloud. Ion Stoica, March 18, 2015
  • 4. Training: Spark training since 2011; ~2,000 people trained in 2014; 1,200+ people trained by the end of March 2015, including 500+ people trained at this Spark Summit alone!
  • 5. MOOCs: “Intro to Big Data with Apache Spark” (Anthony Joseph, UC Berkeley), 30,000+ already registered; “Scalable Machine Learning” (Ameet Talwalkar, UCLA), 16,000+ already registered.
  • 7. Databricks Cloud: July 2014, unveiled Databricks Cloud; over 3,500 people have registered to use it. November 2014, limited availability; 100+ companies have been using Databricks Cloud.
  • 8. Big Data Projects are Hard. [Timeline diagram: Set up & maintain cluster (6-9 MONTHS); Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards; remaining stage durations labeled MONTHS, WEEKS, MONTHS]
  • 9. Why Databricks Cloud? Accelerate time-to-results from months to days: zero management, real-time, unified platform. Open platform.
  • 10. Databricks Cloud: a Workspace (Notebooks, Dashboards, Jobs) on top of Spark + Cluster Manager, running on cloud infrastructure.
  • 12. Zero Management: Spark Cluster Manager, no need to set up clusters. [Pipeline diagram: Set up & maintain cluster; Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 14. Real-Time: Spark interactive queries & streaming. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 15. Real-Time: Notebooks, interactive visualization. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 16. Real-Time: Notebooks, real-time collaboration. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 18. Unified Platform: Spark, one API and one engine supporting all workloads. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 19. Unified Platform: Notebooks, Dashboards, Jobs; one set of tools. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 20. Unified Platform (Notebooks, Jobs): Use notebooks to interactively develop ETL, data analysis, ML models, … Run notebooks as jobs! Jobs can take input arguments; no need to re-engineer.
  • 21. Unified Platform (Notebooks, Jobs): Run notebooks as jobs, with no code to rewrite. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 22. Unified Platform (Notebooks, Dashboards): Use notebooks to compute and plot KPIs, funnels, … Drag and drop notebook plots to instantly create dashboards.
  • 23. Unified Platform (Notebooks, Jobs, Dashboards): Notebooks as dashboards. Easily go from exploration to production. [Pipeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards]
  • 24. From Months to Days. [Timeline diagram: Set up & maintain cluster (6-9 MONTHS); Data Preparation (Ingestion, ETL); Exploration; Insights; Production; Reports & Dashboards; remaining stage durations labeled MONTHS, WEEKS, MONTHS]
  • 25. From Months to Days. [Timeline diagram: Data Preparation (Ingestion, ETL); Exploration; Insights; Production; stage durations labeled DAYS / WEEKS, DAYS, DAYS / WEEKS]
  • 27. Open Platform: data sources (S3, Redshift, Kinesis, …) and BI tools (…) connect to Databricks Cloud (Notebooks, Dashboards, Jobs on Spark + Cluster Manager). No lock-in: run your code on a certified Spark distribution. External packages: JARs, libraries, ...
  • 29. Spark for Health & Fitness. Chul Lee, Head of Data Engineering & Science, MyFitnessPal, Inc.
  • 30. What is MyFitnessPal? A simple & effective health/fitness tracking tool. Big engaged community: 80+ million registered users; #1 health & fitness app for iOS & Android; over 1 million 5-star ratings in the App Store. Massive DB of foods: over 5 million food items, over 14.5 billion logged foods, over 36 million recipes (plus a massive DB of exercise data).
  • 31. Success Factors of Data Product Innovation (MyFitnessPal, Inc.). Big data (foods, recipes, diets, etc.): MyFitnessPal's food DB (and other related data) is the richest and largest in the industry. Large-scale algorithms (ML, NLP, etc.): Spark provides easy access to large-scale ML and data mining algorithms (i.e., MLlib). Solid & highly scalable data infrastructure: Databricks provides a flexible and scalable data infrastructure for the rapid and solid development of data products. Product fit: Databricks helps to reduce "time to value", allowing a focus on data product innovation and customer understanding.
  • 32. Past and Future (MyFitnessPal, Inc.). Past: food data cleaning, search, suggested serving sizes, and more. Future: ad targeting / RecSys, a deep dive into customer understanding, large-scale ETL, and more.
  • 34. Open Platform: 3rd Party Apps. 3rd party apps run alongside Notebooks, Dashboards, and Jobs on Databricks Cloud (Spark + Cluster Manager).
  • 38. Databricks Cloud: dramatically accelerate time-to-results for big data. Open platform, no lock-in.
  • 39. Everyone here will receive access to Databricks Cloud within the next week!

Editor's Notes

  1. This has been a great year not only for Spark but also for Databricks. We are proud that Databricks has played, and continues to play, a key role in accelerating the adoption of Apache Spark.
  2. Last year Databricks launched two certification programs, one for Spark distributions and one for applications. These certification programs ensure that every certified application will run on any certified distribution. This fuels the growth of the Spark ecosystem and reduces the risk of fragmentation. Since their launch we have certified more than 35 applications and more than 11 distributions; since the last summit, the number of certified distributions has doubled and the number of certified applications has almost tripled.
  3. We have trained Spark developers and data scientists since 2011, while Spark was still a research project at UC Berkeley. Since we started Databricks we have dramatically increased our training efforts. Last year alone we trained close to 2,000 people, and in just the first three months of this year we will train over 1,200. Of those, more than 500 will be trained at this Spark Summit. This is the largest number of people we have ever trained at a single event!
  4. Furthermore, over the next few months we will deliver two MOOCs… So far more than 46,000 people have registered for these two courses, exceeding our wildest expectations.
  5. Now let me talk about the Databricks product. Our vision is to make big data simple. As a first step towards that goal, last year we unveiled Databricks Cloud (DBC).
  6. Right after that we were overwhelmed by users' interest. Since then over 3,500 people have registered to try Databricks Cloud. In November last year we released Databricks Cloud in limited availability and started to slowly ramp up customers. Right now I'm happy to say that over 100 companies are using Databricks Cloud. Since our launch we have gathered a better understanding of how our customers use our product and the value we bring to them. In the rest of this talk, I'll articulate this value and some of the features behind it.
  7. Today, big data projects are hard. One consequence is that they take a very long time, which has a high opportunity cost and in some cases leads to the failure of these projects. First you need to set up a cluster, typically a Hadoop cluster. This alone can take at least 6 to 9 months. Next you need to prepare the data. Data is typically unstructured (for example logs, tweets, and Facebook posts) and may come from multiple sources. Ingesting, wrangling, and cleaning data is an iterative process which may take weeks. Once you have come up with a set of scripts that prepare the data, you want to put them into production to run on new data as it is generated. This may take weeks to months. Once you have prepared the data, you will want to quickly explore it, perhaps to compute some KPIs or other metrics of interest. This is another time-consuming process. Then you want to generate reports or create dashboards for your managers or the business organization. This may take days to months, depending on whether you use one of the existing BI tools or write it from scratch. Finally, you are collecting this data to ultimately extract value from it and improve your service, product, or business process. This requires leveraging ML and graph algorithms to develop predictive models and make better decisions. Again, this is a long process, and once you are done you want to put your model into production. This again requires re-engineering, as you typically need to reimplement the model to deploy it in production. This may take months. So at the end of the day you are looking at many months, even over a year, to successfully complete a data project.
  8. Databricks Cloud can dramatically reduce the time-to-value from months to weeks and even days. It does so by providing a zero-management, unified platform with real-time capabilities. Finally, Databricks Cloud is an open platform, which is another big benefit for customers.
  9. Databricks Cloud is a hosted service which currently runs on AWS. On top of that, it provides a sophisticated cluster manager for Spark and a set of tools (notebooks, dashboards, and jobs) that significantly simplify data processing and make it much faster.
  10. Next, let me illustrate these capabilities and their impact on accelerating time-to-results. First, zero management.
  11. DBC provides very powerful cluster management capabilities which allow users to create new clusters in seconds, dynamically scale them up and down, and share them with other users. This obviates the need to set up and maintain clusters.
  12. DBC provides real-time capabilities in several dimensions.
  13. First, Spark allows users to perform interactive queries and process data streams in real time. This can dramatically increase users' productivity when exploring data and getting insights.
  14. The notebooks are the central component of our workspace and go far beyond the functionality available in existing notebooks. For instance, they provide interactive visualization: at the click of a button, the user can visualize the data and make decisions. This accelerates exploration.
  15. Databricks notebooks also provide real-time collaboration, like Google Docs. This helps users share documents, work together, and build far more effectively on each other's work. Online and offline collaboration speeds up data cleaning, exploration, and getting insights.
  16. But perhaps most importantly, DBC provides a unified platform.
  17. Spark provides one API and one execution engine that can seamlessly support a large variety of workloads, including batch, interactive queries, streaming, machine learning, and graph processing.
  18. Second, the Databricks Workspace provides one set of tools that can be used for everything from data ingestion and ETL to interactive exploration, insights, and running production jobs. These tools are highly integrated. To illustrate this, let me take two examples.
  19. As all of us know, notebooks are great for exploration, data analysis, and training models. But what do you do when you are done? Well, if you are a data scientist, you give your code and model to some engineers who will re-implement the functionality, test it, and deploy it in production. This may take many weeks, even months. Wouldn't it be great if you could remove this step? Databricks Cloud allows you to do exactly this! Once you develop a notebook, you can simply run it programmatically as a job, either periodically or when the input changes. Notebooks can also take input arguments, which allows you to run them over different data sets. Furthermore, you can use notebooks to write complex workflows from which you can call other notebooks or jobs.
  20. Thus, Databricks Cloud allows you to develop and test your work in notebooks and then run it in production at the click of a button. This dramatically shortens the time it takes to move your work into production at scale.
  21. Notebooks are also tightly integrated with dashboards. Once you have computed and plotted the metrics of interest in your notebook, you can simply drag and drop these plots to create interactive dashboards, where you can do slice-and-dice analysis.
  22. This allows you to create plots and dashboards in minutes and then publish them at the click of a button.
  23. Putting everything together, with Databricks Cloud you can reduce the time-to-results from months to weeks or even days!
  24. In addition to providing these great capabilities, DBC is also an open platform.
  25. First, it can import and export data from a variety of sources, including HDFS, S3, Cassandra, Redshift, Kinesis, Kafka, and many more. Second, you can upload your own or existing libraries to use with the notebooks, or upload arbitrary JARs and run them programmatically as jobs! Third, DBC provides an ODBC driver so you can connect your favorite BI tool, such as Tableau, Qlik, or MicroStrategy. And finally, if you wish, you can download the code you developed on Databricks Cloud and run it on any certified Spark distribution. So, no lock-in! Rather than just listening to me describe how Databricks Cloud can reduce your time to value, I thought it would be more interesting to hear from actual users.
  26. First, I'd like to introduce Chul Lee, the Head of Data Engineering and Science at MyFitnessPal, now part of Under Armour. Sitting at the intersection of health, fitness, and nutrition in a digital world, MyFitnessPal is uniquely positioned to translate its rich repository of data into insights for its rapidly growing user community.
  27. Foursquare: Do we have more users eating at restaurants where Foursquare has check-ins?
  28. Foursquare: Do we have more users eating at restaurants where Foursquare has check-ins?
  29. Next, I'd like to introduce Rob Ferguson, the Director of Engineering at Automatic Labs, which brings the promise of Internet of Things-style analytics to the automotive industry.
  30. Our vision with 3rd party applications is to enable enterprises to interact with and consume data through the interface they're most comfortable with, while avoiding the pitfalls of connecting a disjoint set of best-of-breed tools. Users will be able to load their data into Databricks Cloud, perform any necessary transformations, and then launch their application with a single click, with all configuration and infrastructure taken care of under the covers. To demonstrate this capability, I'll be joined by two of our close partners today.
  31. First, I'd like to invite up Justin Langseth, CEO of ZoomData, an analytics visualization and exploration tool built from the ground up for Big Data.
  32. Next, I'd like to invite up Rob Harper, Lead Product Architect at Uncharted, to demonstrate Pantera, a new tool that uses Spark to create a Google Maps-like interface for exploring the richness of large datasets. The Databricks Cloud platform can support a wide variety of Spark-powered applications. While I wish we could demonstrate more of them, I've already been up here long enough.
  33. However, for another great example, please check out the talk by Tresata later today, which demonstrates a revolutionary new anti-money-laundering application integrated with Databricks Cloud.
  34. In summary, Databricks Cloud dramatically accelerates time-to-results for big data. In addition, it is an open platform, allowing you to import and export data from a variety of data sources, run arbitrary jobs programmatically, and export your code if you wish to run it on another certified Spark distribution. Furthermore, we look forward to seamlessly supporting your favorite applications on top of Databricks Cloud. And one final thing: we're working hard to give everyone access to Databricks Cloud as fast as possible. We know that many of you have waited for many months, and we really appreciate your patience.
  35. As a token of our appreciation for those here today, I'm excited to announce that you will receive an email within the next week giving you access to Databricks Cloud!