Democratizing AI with Apache Spark

Democratizing AI with
Apache Spark
Ali Ghodsi
Co-Founder and CEO

AI is changing the world
2
Why now?
AlphaGoSIRI/assistantsSelf-driving cars

Data is the catalyst
3
AI hasn’t been democratized
Better training, tuning,
validation
More data
Clickstreams
Sensor data (IoT)
Video
Speech
Handwriting
…

The hardest part of AI isn’t AI
4
“Hidden Technical Debt in Machine Learning Systems “, Google NIPS
2015
How do we democratize AI?

5
“Hidden Technical Debt in Machine Learning Systems “, Google
NIPS 2015
+ AI
FLEXIBLE FAST BIG DATA

Some gaps remain
6
Manage Data
infrastructure
• Create, configure, monitor resilient big data clusters.
• Securely access silos of disparate data sources.
• Enforce proper data governance.
•1
Empower teams to be
productive
• Interactively explore data and prototype ideas.
• Securely share big data clusters among analysts.
• Debug, troubleshoot, version-control big data applications.•
2
Establish Production-
Ready Applications
• Setup robust ML data pipelines for ETL/ELT.
• Productionize real-time applications with HA, FT.
• Build, serve, maintain advanced machine learning models.
•
3

Databricks: Closing the gap
7
• Separate compute &
storage
• Integrate existing data
stores
• Efficient cache on first
access
Just-in-Time
Data Platform1
Agile + Low TCO
• Interactive notebooks,
dashboards, reports
• Real-time exploration,
machine learning, graph
use cases
Integrated Workspace2
Accelerate Time to
Value
• Workflow scheduler for
ML, streaming, SQL, ETL
• Performance-optimized,
high availability, fault-
tolerant
Automated
Spark Management3
Performance

Enterprise AI use-cases
8
Predict credit score, credit limit, anomalies
Predict energy demand based on massive weather data
Natural language processing to extract author graph
Predict player churn, predicting network outages
Predict machine equipment failure

New Frontier of AI: Deep Learning
9
Detect cancer Understand
speech
Infer location
Identify landmarks in
photos
Recognize Mandarin and
English
Improve cancer detection

Faster and easier deep learning with Databricks
10
GPUs
• TensorFlow: The most
popular deep learning
framework.
• TensorFrames: Makes
TensorFlow computations
faster and easier to program
on Spark.
TensorFlow on
TensorFrames and GPUs support out-of-the-box
Massive parallelism

Deep Learning on Databricks
11
Data
Ingest
Feature
extractio
n
Model
Training
Product-
ionize
Clusters
Jobs &
Workflows
TensorFrames
+
GPUs
Interactive
exploration
Just-in-time data
platform
Automated
management

Deep Learning references
• Image recognition (Geo ID):
– https://www.technologyreview.com/s/600889/google-unveils-
neural-network-with-superhuman-ability-to-determine-the-location-
of-almost/
• Cancer screening:
– http://www.popsci.com/how-deep-learning-technology-could-be-
next-step-in-cancer-detection
– https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-
cancer-diagnosis/
• Speech translation:
– https://www.technologyreview.com/s/544651/baidus-deep-
learning-system-rivals-people-at-speech-recognition/
13

PHARM
A
MEDIA INDUSTRIA
L
Generating programs
based on Nielsen ratings
Predictive Analytics
Analytics Transforming Industries
15
Real-time detection
of failing wind-turbines
Anomaly Detection
Predicting Diabetes
in Rural Counties
Next-Gen Product R&D
Real-time Data-Driven Analytics Applications

DATA
WAREHOUSE
S
HADOOP /
DATA LAKESYour
Storage
CLOUD
STORAGE
Orchestrated
Spark In The
Cloud
16
Databricks Just-in-Time Data Platform
Integrated
WorkspaceDASHBOARD
S
Reports
NOTEBOOKS
github, viz,
collaboration
EnterpriseSecurity
AccessControl,Auditing,Encryption
BI Tools
Open
Source
Your Custom Spark
Apps
• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API
Databricks Managed Services
PRODUCTION JOBS
+
Powered by Apache Spark

Data Warehouses Hadoop / Data Lakes
YOUR
STORAGE
Cloud Storage
Orchestrated Spark In The Cloud
Integrated Workspace
Dashboards
Reports
Notebooks
github, viz,
collaboration
BI Tools
Open Source
Your Custom Spark
Apps
• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API
Production Jobs
Powered by Apache Spark
+ Managed Services
Your Storage
Enterprise SecurityAccess Control, Auditing, Encryption

Today’s Data Reality
18
Siloed, Unstructured, Fast-Growing Data
Data Warehouses Hadoop / Data LakesCloud Storage

Powered by Apache Spark® ™
Integrated Enterprise Security Framework
Integrated Workspace
Notebooks Dashboards
Your Custom Spark
Apps
Production Jobs
BI Tools
Orchestrated Apache® Spark™ in the Cloud
Open Source Managed Services+
Your Storage
Cloud Storage | Data Warehouses | Data Lakes
OPTIONAL

Democratizing AI with Apache Spark

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Democratizing AI with Apache Spark

Similar to Democratizing AI with Apache Spark (20)

More from Spark Summit

More from Spark Summit (20)

Recently uploaded

Recently uploaded (20)

Democratizing AI with Apache Spark

Editor's Notes