SlideShare a Scribd company logo
Deep Learning and
Streaming in Apache Spark
2.x
Matei Zaharia
@matei_zaharia
Welcome to Spark Summit Europe
Our largest European summit yet
102talks
1200attendees
11tracks
What’s New in Spark?
Cost-based optimizer (Spark 2.2)
Python and R improvements
• PyPI & CRAN packages (Spark 2.2)
• Python ML plugins (Spark 2.3)
• Vectorized Pandas UDFs (Spark 2.3)
Kubernetes support (targeting 2.3)
0
10
20
30
40
50
Time(s)
Spark 2.2
Vectorized UDFs
0
25
50
75
100
125
Q1 Q2
Time(s)
Pandas
Spark
Spark: The Definitive Guide
To be released this winter
Free preview chapters and
code on Databricks website:
dbricks.co/spark-guide
Two Fast-Growing Workloads
Both are important but complex with current tools
We think we can simplify both with Apache Spark!
Streaming Deep
Learning
&
Why are Streaming and DL
Hard?Similar to early big data tools!
Tremendous potential, but very hard to use at first:
• Low-level APIs (MapReduce)
• Separate systems for each task (SQL, ETL, ML,
etc)
Spark’s Approach
1) Composable, high-level APIs
• Build apps from components
2) Unified engine
• Run complete, end-to-end apps
SQLStreaming ML Graph
…
Expanding Spark to New
Areas
Structured Streaming
Deep Learning
1
2
Structured Streaming
Streaming today requires separate APIs & systems
Structured Streaming is a high-level, end-to-end API
• Simple interface: run any DataFrame or SQL code incrementally
• Complete apps: combine with batch & interactive queries
• End-to-end reliability: exactly-once processing
Became GA in Apache Spark 2.2
Structured Streaming Use
Cases
Monitor quality of live video streaming
Anomaly detection on millions of WiFi hotspots
100s of customer apps in production on Databricks
Largest apps process tens of trillions of records per month
Real-time game analytics at scale
KTable<String, String> kCampaigns = builder.table("campaigns", "cmp-state");
KTable<String, CampaignAd> deserCampaigns = kCampaigns.mapValues((value) -> {
Map<String, String> campMap = Json.parser.readValue(value);
return new CampaignAd(campMap.get("ad_id"), campMap.get("campaign_id"));
});
KStream<String, String> joined =
filtered.join(deserCampaigns, (value1, value2) -> {
return value2.campaign_id;
},
Serdes.String(), Serdes.serdeFrom(new ProjectedEventSerializer(),
new ProjectedEventDeserializer()));
KStream<String, ProjectedEvent> filtered = kEvents.filter((key, value) -> {
return value.event_type.equals("view");
}).mapValues((value) -> {
return new ProjectedEvent(value.ad_id, value.event_time);
});
KStream<String, String> keyedData = joined.selectKey((key, value) -> value);
KTable<Windowed<String>, Long> counts = keyedData.groupByKey()
.count(TimeWindows.of(10000), "time-windows");
streams
Example:
Benchmark DataFrames
events
.where("event_type = 'view'")
.join(table("campaigns"), "ad_id")
.groupBy(
window('event_time, "10 seconds"),
'campaign_id)
.count()
Batch Plan Incremental Plan
Scan Files
Aggregate
Write to Sink
Scan New
Files
Stateful Agg.
Update Sink
automatic
transformation
4xlower cost
Structured Streaming
reuses the Spark SQL
Optimizer and Tungsten
Engine.
https://data-artisans.com/blog/extending-the-yahoo-streaming-benchmark
Performance:
Benchmark System Throughput
700K
15M
65M
0
10
20
30
40
50
60
70
Kafka
Streams
Apache Flink Structured
Streaming
Millionsofrecords/s
4xfewer nodes
What About Latency?
Continuous processing mode to run without
microbatches
• <1 ms latency (same as per-record streaming systems)
• No changes to user code
• Proposal in SPARK-20928
Key idea: same API can target both streaming & batch
Find out more in today’s deep dive
Expanding Spark to New
Areas
Structured Streaming
Deep Learning
1
2
Deep Learning has Huge
PotentialUnprecedented ability to work with unstructured
data such as images and text
But Deep Learning is Hard to
UseCurrent APIs (TensorFlow, Keras, etc) are low-level
• Build a computation graph from scratch
Scale-out requires manual parallelization
Hard to use models in larger applications
Very similar to early big data APIs
Deep Learning on Spark
Image support in MLlib: SPARK-21866 (Spark 2.3)
DL framework integrations: TensorFlowOnSpark,
MMLSpark, Intel BigDL
Higher-level APIs: Deep Learning Pipelines
New in TensorFlowOnSpark
Library to run distributed TF on Spark clusters & data
• Built at Yahoo!, where it powers photos, videos & more
Yahoo! and Databricks collaborated to add:
• ML pipeline APIs
• Support for non-YARN and AWS clusters
github.com/yahoo/TensorFlowOnSpark
talk
tomorrow
at 17:00
Deep Learning Pipelines
Low-level DL frameworks are powerful, but common
use cases should be much simpler to build
Goal: Enable an order of magnitude more
users to build production apps using deep
learning
Deep Learning Pipelines
Key idea: High-level API built on ML Pipelines model
• Common use cases are just a few lines of code
• All operators automatically scale over Spark
• Expose models in batch, streaming & SQL apps
Uses existing DL engines (TensorFlow, Keras, etc)
Example: Using Existing
Modelpredictor = DeepImagePredictor(inputCol="image",
outputCol="labels",
modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
SELECT image, my_predictor(image) AS labels
FROM uploaded_images
Example: Model Search
est = KerasImageFileEstimator()
grid = ParamGridBuilder() 
.addGrid(est.modelFile, ["InceptionV3", "ResNet50"]) 
.addGrid(est.kerasParams, [{'batch': 32}, {'batch': 64}]) 
.build()
CrossValidator(est, eval, grid).fit(image_df)
InceptionV3
batch size 32
ResNet50
batch size 32
InceptionV3
batch size 64
ResNet50
batch size 64
Spark
Driver
Deep Learning Pipelines
DemoSue Ann Hong

More Related Content

What's hot

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Databricks
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Legacy Typesafe (now Lightbend)
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
Jen Aman
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Lightbend
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Lightbend
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
Databricks
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
StreamNative
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Lightbend
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
Alexis Seigneurin
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Databricks
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
Spark Summit
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
Xiao Li
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
Spark Summit
 
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed ApplicationsAkka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Lightbend
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Databricks
 
Do's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in productionDo's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in production
jglobal
 

What's hot (20)

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
 
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed ApplicationsAkka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed Applications
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Do's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in productionDo's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in production
 

Similar to Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia

Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Databricks
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
Spark Summit
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
Paulo Gutierrez
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
Xiao Li
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Databricks
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
Databricks
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
Databricks
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
Databricks
 
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on DatabricksJumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
Databricks
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 

Similar to Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia (20)

Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on DatabricksJumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 

More from Jen Aman

Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
Jen Aman
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
Jen Aman
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
Jen Aman
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
Jen Aman
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
Jen Aman
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Jen Aman
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
Jen Aman
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
Jen Aman
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
Jen Aman
 

More from Jen Aman (20)

Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
 

Recently uploaded

Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
6459astrid
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
tanupasswan6
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
kinni singh$A17
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
revolutionary575
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
sukaniyasunnu
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
gargnatasha985
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
rightmanforbloodline
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
norina2645
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
NABLAS株式会社
 

Recently uploaded (20)

Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia

  • 1. Deep Learning and Streaming in Apache Spark 2.x Matei Zaharia @matei_zaharia
  • 2. Welcome to Spark Summit Europe Our largest European summit yet 102talks 1200attendees 11tracks
  • 3. What’s New in Spark? Cost-based optimizer (Spark 2.2) Python and R improvements • PyPI & CRAN packages (Spark 2.2) • Python ML plugins (Spark 2.3) • Vectorized Pandas UDFs (Spark 2.3) Kubernetes support (targeting 2.3) 0 10 20 30 40 50 Time(s) Spark 2.2 Vectorized UDFs 0 25 50 75 100 125 Q1 Q2 Time(s) Pandas Spark
  • 4. Spark: The Definitive Guide To be released this winter Free preview chapters and code on Databricks website: dbricks.co/spark-guide
  • 5. Two Fast-Growing Workloads Both are important but complex with current tools We think we can simplify both with Apache Spark! Streaming Deep Learning &
  • 6. Why are Streaming and DL Hard?Similar to early big data tools! Tremendous potential, but very hard to use at first: • Low-level APIs (MapReduce) • Separate systems for each task (SQL, ETL, ML, etc)
  • 7. Spark’s Approach 1) Composable, high-level APIs • Build apps from components 2) Unified engine • Run complete, end-to-end apps SQLStreaming ML Graph …
  • 8. Expanding Spark to New Areas Structured Streaming Deep Learning 1 2
  • 9. Structured Streaming Streaming today requires separate APIs & systems Structured Streaming is a high-level, end-to-end API • Simple interface: run any DataFrame or SQL code incrementally • Complete apps: combine with batch & interactive queries • End-to-end reliability: exactly-once processing Became GA in Apache Spark 2.2
  • 10. Structured Streaming Use Cases Monitor quality of live video streaming Anomaly detection on millions of WiFi hotspots 100s of customer apps in production on Databricks Largest apps process tens of trillions of records per month Real-time game analytics at scale
  • 11. KTable<String, String> kCampaigns = builder.table("campaigns", "cmp-state"); KTable<String, CampaignAd> deserCampaigns = kCampaigns.mapValues((value) -> { Map<String, String> campMap = Json.parser.readValue(value); return new CampaignAd(campMap.get("ad_id"), campMap.get("campaign_id")); }); KStream<String, String> joined = filtered.join(deserCampaigns, (value1, value2) -> { return value2.campaign_id; }, Serdes.String(), Serdes.serdeFrom(new ProjectedEventSerializer(), new ProjectedEventDeserializer())); KStream<String, ProjectedEvent> filtered = kEvents.filter((key, value) -> { return value.event_type.equals("view"); }).mapValues((value) -> { return new ProjectedEvent(value.ad_id, value.event_time); }); KStream<String, String> keyedData = joined.selectKey((key, value) -> value); KTable<Windowed<String>, Long> counts = keyedData.groupByKey() .count(TimeWindows.of(10000), "time-windows"); streams Example: Benchmark DataFrames events .where("event_type = 'view'") .join(table("campaigns"), "ad_id") .groupBy( window('event_time, "10 seconds"), 'campaign_id) .count() Batch Plan Incremental Plan Scan Files Aggregate Write to Sink Scan New Files Stateful Agg. Update Sink automatic transformation
  • 12. 4xlower cost Structured Streaming reuses the Spark SQL Optimizer and Tungsten Engine. https://data-artisans.com/blog/extending-the-yahoo-streaming-benchmark Performance: Benchmark System Throughput 700K 15M 65M 0 10 20 30 40 50 60 70 Kafka Streams Apache Flink Structured Streaming Millionsofrecords/s 4xfewer nodes
  • 13. What About Latency? Continuous processing mode to run without microbatches • <1 ms latency (same as per-record streaming systems) • No changes to user code • Proposal in SPARK-20928 Key idea: same API can target both streaming & batch Find out more in today’s deep dive
  • 14. Expanding Spark to New Areas Structured Streaming Deep Learning 1 2
  • 15. Deep Learning has Huge PotentialUnprecedented ability to work with unstructured data such as images and text
  • 16. But Deep Learning is Hard to UseCurrent APIs (TensorFlow, Keras, etc) are low-level • Build a computation graph from scratch Scale-out requires manual parallelization Hard to use models in larger applications Very similar to early big data APIs
  • 17. Deep Learning on Spark Image support in MLlib: SPARK-21866 (Spark 2.3) DL framework integrations: TensorFlowOnSpark, MMLSpark, Intel BigDL Higher-level APIs: Deep Learning Pipelines
  • 18. New in TensorFlowOnSpark Library to run distributed TF on Spark clusters & data • Built at Yahoo!, where it powers photos, videos & more Yahoo! and Databricks collaborated to add: • ML pipeline APIs • Support for non-YARN and AWS clusters github.com/yahoo/TensorFlowOnSpark talk tomorrow at 17:00
  • 19. Deep Learning Pipelines Low-level DL frameworks are powerful, but common use cases should be much simpler to build Goal: Enable an order of magnitude more users to build production apps using deep learning
  • 20. Deep Learning Pipelines Key idea: High-level API built on ML Pipelines model • Common use cases are just a few lines of code • All operators automatically scale over Spark • Expose models in batch, streaming & SQL apps Uses existing DL engines (TensorFlow, Keras, etc)
  • 21. Example: Using Existing Modelpredictor = DeepImagePredictor(inputCol="image", outputCol="labels", modelName="InceptionV3") predictions_df = predictor.transform(image_df) SELECT image, my_predictor(image) AS labels FROM uploaded_images
  • 22. Example: Model Search est = KerasImageFileEstimator() grid = ParamGridBuilder() .addGrid(est.modelFile, ["InceptionV3", "ResNet50"]) .addGrid(est.kerasParams, [{'batch': 32}, {'batch': 64}]) .build() CrossValidator(est, eval, grid).fit(image_df) InceptionV3 batch size 32 ResNet50 batch size 32 InceptionV3 batch size 64 ResNet50 batch size 64 Spark Driver

Editor's Notes

  1. Make this more about how easy it is.
  2. Comparable latency to flink
  3. We’ve been experimenting with this at DB and we’re excited to contribute it back.