Open Source 𝜆-Architecture for Deep Learning
Use case
Patrick R Nicolas
Oct. 2020
pnicolasai@yahoo.com
Overview
“… and the wise man said,
thou shall embrace open source”.
21st century proverb
Patrick R. Nicolas - Open Source 𝜆 -Architecture for Deep Learning
Overview
Overview
Layers
Open-source components
Overview
The world of data scientists accustomed to Python
scientific libraries has been shaken up by the
emergence of ‘big data’ frameworks such as Apache
Hadoop, Spark and Kafka.
This presentation introduces a variant of the
𝜆-architecture and describes the seamless integration of
various open source components to train, validate and
test deep learning models.
Disclaimer
The concept and architecture are versatile enough to
accommodate a variety of open source and commercial
solutions and services besides the frameworks
prescribed in this presentation.
For instance, deep learning frameworks such as Keras
or TensorFlow are excellent alternatives to PyTorch.
Requirements
A ‘big data’ framework should be able to:
• Process batch and stream data concurrently
• Enforce data immutability
• Recover gracefully from human errors
• Handle hardware failures
• Minimize latency for real-time requests
• Scale to very large data sets
• Optimize the full life cycle of data sets
• Guarantee the quality and integrity of data
Optimizing data life cycle
The need for optimizing the data life cycle: 79% of data
scientists' time is spent collecting and organizing data
(source: Quora).
Data quality
Guaranteeing data quality and integrity:
Accuracy: Correct models and representative data
Completeness: No missing data
Consistency: Applies to semantics and format
Timeliness: Up-to-date data and notifications
Accessibility: Ease of use and high availability
Validity: Complies with constraints, rules and regulations
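Some of these dimensions can be checked mechanically. The following is a minimal sketch in plain Python that covers completeness, validity and timeliness only. The record fields (`id`, `value`, `timestamp`), the valid range and the freshness threshold are assumptions made for the example and do not come from the presentation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema and thresholds, chosen only for illustration.
REQUIRED_FIELDS = ("id", "value", "timestamp")
VALID_RANGE = (0.0, 1.0)
MAX_AGE = timedelta(days=1)


def check_record(record, now):
    """Return the list of data quality issues found in one record."""
    issues = []
    # Completeness: no missing data.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))
    # Validity: the value complies with a range constraint.
    value = record.get("value")
    if value is not None and not (VALID_RANGE[0] <= value <= VALID_RANGE[1]):
        issues.append("invalid: value out of range")
    # Timeliness: the record is recent enough.
    ts = record.get("timestamp")
    if ts is not None and now - ts > MAX_AGE:
        issues.append("stale: older than %s" % MAX_AGE)
    return issues


now = datetime(2020, 10, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "value": 0.4, "timestamp": now - timedelta(hours=2)},
    {"id": 2, "value": 1.7, "timestamp": now - timedelta(hours=2)},
    {"id": 3, "value": None, "timestamp": now - timedelta(days=3)},
]
# Map each record id to the issues detected for that record.
report = {r["id"]: check_record(r, now) for r in records}
```

In a real pipeline such checks would likely run inside the batch layer before training, with rejected records reported rather than silently dropped.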
Solution …
The 𝜆-architecture is a large-scale data processing
architecture that balances batch and real-time streamed data.
It is a one-stop shop for various data sources that
balances latency, redundancy, ease of access and
throughput.
It breaks down into 3 layers:
• Speed (streaming, real-time, …)
• Batch (training, analysis, …)
• Serving (query, visualization, …)
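The interplay of the three layers can be sketched with a toy event counter. This is only an illustration in plain Python under simplifying assumptions: the class name `LambdaPipeline`, its methods and the in-memory log are hypothetical, and a real deployment would rely on distributed storage and processing instead.

```python
from collections import Counter


class LambdaPipeline:
    """Toy lambda architecture that counts events per key."""

    def __init__(self):
        self.master_log = []          # append-only history of all events
        self.batch_view = Counter()   # recomputed from the full log
        self.speed_view = Counter()   # incremental, recent events only

    def ingest(self, key):
        # Every event goes to the master log and to the speed layer.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch, then reset the
        # speed view because the batch view now covers those events.
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge the batch view and the real-time view.
        return self.batch_view[key] + self.speed_view[key]


pipeline = LambdaPipeline()
for key in ["click", "view", "click"]:
    pipeline.ingest(key)
before = pipeline.query("click")   # answered by the speed view alone
pipeline.run_batch()
pipeline.ingest("click")
after = pipeline.query("click")    # batch view plus one recent event
```

Recomputing the batch view from the immutable log is what lets the system recover from a faulty computation: fix the code, rerun the batch, and the views are correct again.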
… using open source
𝜆-architecture using open source components?
The task consists of reviewing and evaluating the trove
of available open source libraries to build a robust
architecture that supports the rigor of training and
tuning deep learning models.
The libraries are woven together through a set of
language-agnostic REST APIs to form a coherent pipeline.
… for deep learning
𝜆-architecture for deep learning?
• Python scientific libraries have been the go-to tools
for data scientists to analyze data and build models.
• The PyTorch framework builds on these libraries to
support the design and execution of deep learning
models.
• Apache Spark and Kafka complement these
frameworks for very large data sets and real-time
processing.
Bird’s-eye view
Example open source 𝜆-architecture
Feel overwhelmed? Let’s break it down.
Layers
Overview
Layers
Open-source components
Batch layer
Batch layer objective: load batches of data to be distributed and
preprocessed to train deep learning models.
Typical use case:
1. Apache Spark loads the training set from Amazon S3
2. The Spark master partitions the training data
3. Spark workers preprocess the data and signal
completion through a Kafka event queue
4. PyTorch updates the model parameters from the
preprocessed training data
5. PyTorch broadcasts the model parameters and quality
metrics through Kafka
6. Apache Hive, powered by Spark, stores model-related
data and metrics
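The flow of these steps can be sketched without any of the actual frameworks. In the illustration below, every component is a stand-in chosen so the example runs with the Python standard library alone: a list replaces the S3 training set, `queue.Queue` replaces the Kafka event queue, a one-parameter linear model trained by gradient descent replaces PyTorch, and a dictionary replaces Hive. All names are hypothetical.

```python
import queue

# Stand-ins: a list for the S3 training set, queue.Queue for Kafka.
raw_data = [(x, 2.0 * x) for x in range(1, 9)]   # (feature, label), y = 2x
events = queue.Queue()


def partition(data, n):
    """Step 2: split the training set into n partitions."""
    return [data[i::n] for i in range(n)]


def preprocess(part_id, part, scale):
    """Step 3: scale the features, then signal completion through the queue."""
    cleaned = [(x / scale, y) for x, y in part]
    events.put({"type": "preprocessed", "partition": part_id, "data": cleaned})


def train(w, batch, lr=0.1):
    """Step 4: one gradient-descent pass over a batch for the model y = w * x."""
    for x, y in batch:
        w -= lr * 2 * (w * x - y) * x
    return w


# Steps 1-3: load, partition and preprocess the data.
scale = max(x for x, _ in raw_data)
for part_id, part in enumerate(partition(raw_data, 2)):
    preprocess(part_id, part, scale)

# Step 4: the trainer consumes the completion events and updates the model.
batches = []
while not events.empty():
    batches.append(events.get()["data"])
w = 0.0
for _ in range(50):
    for batch in batches:
        w = train(w, batch)   # after scaling, the ideal weight is 2 * scale

# Step 5: broadcast the parameter and a quality metric through the queue.
samples = [s for batch in batches for s in batch]
mse = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
events.put({"type": "model", "w": w, "mse": mse})

# Step 6: store model-related data (a dict stands in for Hive).
model_store = {"latest": events.get()}
```

The event queue decouples the preprocessing workers from the trainer, so either side could be scaled or replaced independently.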
Speed layer
Speed layer objective: process queries to predictive
models with very low latency.
Use case:
1. Kafka routes data streams to the Spark master
2. Spark preprocesses the requests and forwards them to
the deep learning model microservice
3. Flask converts each request into a prediction query for
the PyTorch model
4. The PyTorch model generates a prediction
5. Run-time metrics are broadcast through Kafka
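The request path above can be mimicked in a few lines. This is a hedged sketch with stand-ins only: two `queue.Queue` objects play the role of the Kafka request stream and metrics topic, and `predict` is a hypothetical weighted sum in place of a PyTorch model behind a Flask service.

```python
import queue
import time

request_stream = queue.Queue()   # stand-in for the Kafka request stream
metrics = queue.Queue()          # stand-in for the Kafka metrics topic


def preprocess(raw):
    """Step 2: turn a raw request into a numeric feature vector."""
    return [float(v) for v in raw["features"]]


def predict(features, weights=(0.5, -0.25)):
    """Steps 3-4: hypothetical model, a weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def serve_one():
    """Pull one request, answer it, and publish its latency (step 5)."""
    raw = request_stream.get()
    start = time.perf_counter()
    prediction = predict(preprocess(raw))
    metrics.put({"id": raw["id"], "latency_s": time.perf_counter() - start})
    return {"id": raw["id"], "prediction": prediction}


request_stream.put({"id": "r1", "features": ["2", "4"]})
response = serve_one()
```

Publishing latency alongside each prediction gives the serving layer the run-time metrics it needs to monitor model performance.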
Serving layer
Serving layer objective: process queries to analyze data and
model performance, and to execute statistical inference.
Use case:
1. The analyst queries the relational database (MySQL) for the
most recent data and statistics through the FineReport UI (low
latency)
2. The analyst asynchronously queries the Hive data warehouse
for archived data and statistics (high latency)
3. Hive processes the queries through Spark datasets
4. Spark regularly updates the short-term data in MySQL
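The split between a fast store for recent data and a slower archive can be sketched as follows. Two in-memory SQLite databases stand in for MySQL and the Hive warehouse, purely so the example runs with the standard library. The table name, column names and routing rule are assumptions for illustration.

```python
import sqlite3

# Stand-ins: SQLite in place of MySQL (recent data) and Hive (full archive).
recent_db = sqlite3.connect(":memory:")
archive_db = sqlite3.connect(":memory:")
for db in (recent_db, archive_db):
    db.execute("CREATE TABLE metrics (day INTEGER, accuracy REAL)")

archive_db.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [(1, 0.81), (2, 0.84), (3, 0.86), (4, 0.88)],
)


def refresh_recent(last_n_days):
    """Step 4: copy the most recent rows from the archive to the fast store."""
    rows = archive_db.execute(
        "SELECT day, accuracy FROM metrics ORDER BY day DESC LIMIT ?",
        (last_n_days,),
    ).fetchall()
    recent_db.execute("DELETE FROM metrics")
    recent_db.executemany("INSERT INTO metrics VALUES (?, ?)", rows)


def query(first_day, recent_from):
    """Steps 1-2: route a query to the fast store or to the archive."""
    db = recent_db if first_day >= recent_from else archive_db
    return db.execute(
        "SELECT day, accuracy FROM metrics WHERE day >= ? ORDER BY day",
        (first_day,),
    ).fetchall()


refresh_recent(last_n_days=2)
latest = query(first_day=3, recent_from=3)    # served by the fast store
history = query(first_day=1, recent_from=3)   # served by the archive
```

The routing rule here is deliberately naive. A production system would also have to handle queries that span both stores.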
Overview
Overview
Layers
Open-source components
PyTorch
PyTorch is an optimized tensor library for deep
learning using GPUs and CPUs.
It extends the functionality of NumPy and scikit-learn
to support the training, evaluation and
commercialization of complex machine learning
models.
https://pytorch.org/tutorials/
Alternatives:
TensorFlow: https://www.tensorflow.org/
Keras: https://keras.io
MXNet: https://mxnet.apache.org
Apache Spark
Apache Spark is an open source cluster computing
framework for fast batch and near-real-time processing.
It supports the Scala, Java, Python and R programming
languages and includes streaming, graph and machine
learning libraries.
https://www.scala-lang.org
https://spark.apache.org
Alternative:
PySpark (the Python API for Spark): https://databricks.com/glossary/pyspark
Streaming
Apache Kafka is an open-source distributed event
streaming framework for large-scale, real-time data
processing and analytics.
It captures data from various sources in real-time as a
continuous flow and routes it to the appropriate
processor.
https://kafka.apache.org
Alternatives:
Amazon SQS: https://aws.amazon.com/sqs/
RabbitMQ: https://www.rabbitmq.com
Model tuning
Ray Tune is a distributed hyperparameter
tuning framework particularly suited to deep learning
models.
It significantly reduces the cost of optimizing the
configuration of a model. It is a wrapper around other
open source libraries.
https://docs.ray.io/en/master/tune/index.html
Alternatives:
Amazon SageMaker: https://aws.amazon.com/sagemaker/
HyperOpt: https://github.com/hyperopt/hyperopt
Optuna: https://optuna.readthedocs.io
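To make the idea of hyperparameter tuning concrete, here is a minimal random search in plain Python. It does not use the Ray Tune API or any of the listed libraries; those tools add distributed execution and smarter search strategies on top of this basic loop. The objective function `validation_loss` is a made-up stand-in for training and validating a model.

```python
import random


def validation_loss(config):
    """Hypothetical objective: a pretend loss, minimal at lr=0.01, layers=3."""
    return (config["lr"] - 0.01) ** 2 * 1e4 + (config["layers"] - 3) ** 2


def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            "lr": 10 ** rng.uniform(-4, -1),   # log-uniform learning rate
            "layers": rng.randint(1, 6),       # number of hidden layers
        }
        loss = validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(num_trials=200)
```

Each trial is independent of the others, which is why this kind of search parallelizes well across a cluster.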
Python REST service
Flask is an easy-to-use framework for exposing
Python applications through a RESTful interface.
It works with most web and deployment technologies
such as Docker, React.js, Angular, HTML5 and WSGI
containers.
https://palletsprojects.com/p/flask/
Alternatives:
Falcon: https://falcon.readthedocs.io
FastAPI: https://fastapi.tiangolo.com
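The shape of a REST prediction endpoint can be sketched with a bare WSGI callable, the interface that Flask applications are built on. The example deliberately avoids Flask so that it runs with the standard library alone. With Flask, the routing and JSON handling written by hand here would be provided by the framework. The `/predict` path, the payload format and the `predict` function are assumptions for illustration.

```python
import io
import json


def predict(features):
    """Hypothetical model: the mean of the input features."""
    return sum(features) / len(features)


def app(environ, start_response):
    """WSGI application exposing POST /predict with a JSON body."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"prediction": predict(payload["features"])}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(method, path, data=b""):
    """Invoke the application in-process, the way a WSGI server would."""
    status = []
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


request_body = json.dumps({"features": [1, 2, 3]}).encode()
status, reply = call("POST", "/predict", request_body)
```

To serve the same callable over HTTP, the standard library's `wsgiref.simple_server.make_server` accepts it directly, as would any WSGI container.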
RDBMS
MySQL is an open source relational database
supporting partitioning, sharding and replication. It can
be extended with real-time analytics (HeatWave)
and enterprise clustering (MySQL Cluster CGE).
https://www.mysql.com
Alternatives:
PostgreSQL: https://www.postgresql.org
HyperSQL: http://www.hsqldb.org
Amazon RDS: http://aws.amazon.com/rds
Data warehouse
Apache Hive is a data warehouse framework that
leverages Spark to execute large-scale distributed SQL
queries.
It optimizes SQL queries through lazy evaluation of
an acyclic execution graph. It is integrated with
Spark datasets and HDFS.
https://hive.apache.org
Alternatives:
Vertica: http://www.vertica.com
Amazon Redshift: https://aws.amazon.com/redshift/
Dashboard
FineReport is a business intelligence and
dashboard tool that supports real-time analytics,
reporting and visualization. It accommodates the needs
of business managers and data scientists.
https://www.finereport.com
Alternatives:
Sisense: https://www.sisense.com
Tableau: https://www.tableau.com
Final disclaimer
This presentation is not an endorsement of the various
tools, libraries or frameworks described or suggested in
it.
Although the tools listed in the slides are known to work
in the context of the architecture, there are excellent
alternative libraries that may better meet your specific
needs.
Thank you!
Q&A

More Related Content

What's hot

Dspace OAI-PMH
Dspace OAI-PMHDspace OAI-PMH
Dspace OAI-PMH
Sem Gebresilassie
 
Bitquery GraphQL for Analytics on ClickHouse
Bitquery GraphQL for Analytics on ClickHouseBitquery GraphQL for Analytics on ClickHouse
Bitquery GraphQL for Analytics on ClickHouse
Altinity Ltd
 
Make Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For ItMake Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For It
Markus Michalewicz
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 
Why Architecting for Disaster Recovery is Important for Your Time Series Data...
Why Architecting for Disaster Recovery is Important for Your Time Series Data...Why Architecting for Disaster Recovery is Important for Your Time Series Data...
Why Architecting for Disaster Recovery is Important for Your Time Series Data...
InfluxData
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
 
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
VMware Tanzu
 
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Glen Hawkins
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
David Stein
 
B-tree & R-tree
B-tree & R-treeB-tree & R-tree
B-tree & R-tree
Shakil Ahmed
 
Introduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & DatabricksIntroduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & Databricks
CCG
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
C4Media
 
Pre-Con Ed: Using SQL to Access Your CA IDMS Databases
Pre-Con Ed: Using SQL to Access Your CA IDMS DatabasesPre-Con Ed: Using SQL to Access Your CA IDMS Databases
Pre-Con Ed: Using SQL to Access Your CA IDMS Databases
CA Technologies
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
Databricks
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
Jeff Klukas
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Toshihiro Suzuki
 

What's hot (20)

Dspace OAI-PMH
Dspace OAI-PMHDspace OAI-PMH
Dspace OAI-PMH
 
Bitquery GraphQL for Analytics on ClickHouse
Bitquery GraphQL for Analytics on ClickHouseBitquery GraphQL for Analytics on ClickHouse
Bitquery GraphQL for Analytics on ClickHouse
 
Make Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For ItMake Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For It
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Why Architecting for Disaster Recovery is Important for Your Time Series Data...
Why Architecting for Disaster Recovery is Important for Your Time Series Data...Why Architecting for Disaster Recovery is Important for Your Time Series Data...
Why Architecting for Disaster Recovery is Important for Your Time Series Data...
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
 
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
 
B-tree & R-tree
B-tree & R-treeB-tree & R-tree
B-tree & R-tree
 
Introduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & DatabricksIntroduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & Databricks
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Pre-Con Ed: Using SQL to Access Your CA IDMS Databases
Pre-Con Ed: Using SQL to Access Your CA IDMS DatabasesPre-Con Ed: Using SQL to Access Your CA IDMS Databases
Pre-Con Ed: Using SQL to Access Your CA IDMS Databases
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 

Similar to Open Source Lambda Architecture for deep learning

04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
Marco Quartulli
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Data Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into DatabricksData Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into Databricks
Knoldus Inc.
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
Paco Nathan
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache Spark
QuantUniversity
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_VeriticalsPeyman Mohajerian
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
Dan Eaton
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
Ian Duncan
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
Databricks
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Luciano Resende
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
Mark Kerzner
 

Similar to Open Source Lambda Architecture for deep learning (20)

04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Data Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into DatabricksData Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into Databricks
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache Spark
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 

More from Patrick Nicolas

Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformers
Patrick Nicolas
 
AI for electronic health records
AI for electronic health recordsAI for electronic health records
AI for electronic health records
Patrick Nicolas
 
Monadic genetic kernels in Scala
Monadic genetic kernels in ScalaMonadic genetic kernels in Scala
Monadic genetic kernels in Scala
Patrick Nicolas
 
Scala for Machine Learning
Scala for Machine LearningScala for Machine Learning
Scala for Machine Learning
Patrick Nicolas
 
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentimentStock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Patrick Nicolas
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
Patrick Nicolas
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
Patrick Nicolas
 
Data Modeling using Symbolic Regression
Data Modeling using Symbolic RegressionData Modeling using Symbolic Regression
Data Modeling using Symbolic Regression
Patrick Nicolas
 
Semantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia TaxonomySemantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia Taxonomy
Patrick Nicolas
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Patrick Nicolas
 
Taxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads TargetingTaxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads Targeting
Patrick Nicolas
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private Clouds
Patrick Nicolas
 

More from Patrick Nicolas (12)

Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformers
 
AI for electronic health records
AI for electronic health recordsAI for electronic health records
AI for electronic health records
 
Monadic genetic kernels in Scala
Monadic genetic kernels in ScalaMonadic genetic kernels in Scala
Monadic genetic kernels in Scala
 
Scala for Machine Learning
Scala for Machine LearningScala for Machine Learning
Scala for Machine Learning
 
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentimentStock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentiment
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
 
Data Modeling using Symbolic Regression
Data Modeling using Symbolic RegressionData Modeling using Symbolic Regression
Data Modeling using Symbolic Regression
 
Semantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia TaxonomySemantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia Taxonomy
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Taxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads TargetingTaxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads Targeting
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private Clouds
 



Open Source Lambda Architecture for deep learning

  • 1. Open Source 𝜆-Architecture for Deep Learning: Use case. Patrick R. Nicolas, Oct. 2020, pnicolasai@yahoo.com
  • 2. Overview: “… and the wise man said, thou shall embrace open source”. (21st century proverb)
  • 3. Overview (agenda): Overview, Layers, Open-source components
  • 4. Overview: The world of data scientists accustomed to Python scientific libraries has been shaken up by the emergence of ‘big data’ frameworks such as Apache Hadoop, Spark and Kafka. This presentation introduces a variant of the 𝜆-architecture and describes the seamless integration of various open source components to train, validate and test deep learning models.
  • 5. Disclaimer: The concept and architecture are versatile enough to accommodate a variety of open source and commercial solutions and services besides the frameworks prescribed in this presentation. For instance, deep learning frameworks such as Keras or TensorFlow are excellent alternatives to PyTorch.
  • 6. Requirements: A ‘big data’ framework should be able to:
      - Process batch and stream data concurrently
      - Enforce data immutability
      - Recover gracefully from human errors
      - Handle hardware failures
      - Minimize latency for real-time requests
      - Scale to very large data sets
      - Optimize the full life cycle of data sets
      - Guarantee the quality and integrity of data
  • 7. Optimizing data life cycle: The need for optimizing the data life cycle: 79% of data scientists' time is spent collecting and organizing data (source: Quora).
  • 8. Data quality: Guaranteeing data quality and integrity
      - Accuracy: correct models and representative data
      - Completeness: no missing data
      - Consistency: applies to both semantics and format
      - Timeliness: up-to-date data and notification
      - Accessibility: ease of use and high availability
      - Validity: compliance with constraints, rules and regulations
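Some of the quality dimensions listed above lend themselves to automated checks. The following is a minimal sketch, not taken from the slides: the `SCHEMA` fields, the validity ranges and the one-day staleness threshold are all hypothetical and would be replaced by the rules of the actual data set.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema (not from the slides): field -> (expected type, validity rule)
SCHEMA = {
    "user_id": (int, lambda v: v > 0),
    "score": (float, lambda v: 0.0 <= v <= 1.0),
    "updated_at": (datetime, lambda v: True),
}


def check_record(record, now, max_age=timedelta(days=1)):
    """Return the list of data-quality issues found in a single record."""
    issues = []
    for field, (expected_type, is_valid) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            # Completeness: every field of the schema must be present
            issues.append(f"completeness: missing {field}")
        elif not isinstance(value, expected_type):
            # Consistency: the format (type) must match the schema
            issues.append(f"consistency: {field} is not {expected_type.__name__}")
        elif not is_valid(value):
            # Validity: the value must satisfy its constraint
            issues.append(f"validity: {field} out of range")
    updated = record.get("updated_at")
    if isinstance(updated, datetime) and now - updated > max_age:
        # Timeliness: the record must have been refreshed recently enough
        issues.append("timeliness: record is stale")
    return issues


now = datetime(2020, 10, 1, tzinfo=timezone.utc)
stale = {"user_id": 7, "score": 1.5, "updated_at": now - timedelta(days=3)}
print(check_record(stale, now))
```

Accuracy and accessibility are not covered here; they usually need domain knowledge or infrastructure monitoring rather than per-record rules.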
  • 9. Solution …: The 𝜆-architecture is a large-scale data processing architecture that balances batch and real-time streamed data. It is a one-stop shop for various data sources that balances latency, redundancy, ease of access and throughput. It breaks down into 3 layers:
      - Speed (streaming, real-time, …)
      - Batch (training, analysis, …)
      - Serving (query, visualization, …)
  • 10. … using open source: 𝜆-architecture using open source components? The task consists of reviewing and evaluating the trove of available open source libraries to build a robust architecture that supports the rigor of training and tuning deep learning models. The libraries are woven together through a set of language-agnostic REST APIs to form a coherent pipeline.
  • 11. … for deep learning: 𝜆-architecture for deep learning?
      - Python scientific libraries have been the go-to tools for data scientists to analyze data and build models.
      - The PyTorch framework builds upon these libraries to support the design and execution of deep learning models.
      - Apache Spark and Kafka complement these frameworks for very large data sets and real-time processing.
  • 12. Bird's-eye view: Example open source 𝜆-architecture. Feel overwhelmed? ... Let’s break it down.
  • 13. Layers (agenda): Overview, Layers, Open-source components
  • 14. Batch layer: Objective: load batches of data to be distributed and preprocessed in order to train deep learning models.
  • 15. Batch layer: Typical use case:
      1. Apache Spark loads the training set from Amazon S3.
      2. The Spark master partitions the training data.
      3. Spark workers preprocess the data and notify completion through the Kafka event queue.
      4. PyTorch updates the model parameters from the preprocessed training data.
      5. PyTorch broadcasts model parameters and quality metrics through Kafka.
      6. Apache Hive, powered by Spark, stores model-related data and metrics.
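The batch-layer steps above can be sketched with standard-library stand-ins so the data flow is visible without a cluster. This is an illustration under stated assumptions, not the presenter's implementation: a thread pool plays the Spark workers, a `queue.Queue` plays the Kafka event queue, and a one-parameter gradient descent plays the PyTorch update. All function names and the toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

events = Queue()  # stand-in for a Kafka topic (assumption, not the Kafka API)


def partition(data, n):
    """Split the data into n partitions (the role of the Spark master)."""
    return [data[i::n] for i in range(n)]


def preprocess(part_id, rows):
    """Worker step: scale the feature, then announce completion on the queue."""
    cleaned = [(x / 10.0, y) for x, y in rows]
    events.put({"type": "preprocessed", "partition": part_id, "rows": len(cleaned)})
    return cleaned


def train(rows, epochs=200, lr=0.5):
    """Fit y = w * x by gradient descent (stand-in for the PyTorch update)."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
    loss = sum((w * x - y) ** 2 for x, y in rows) / len(rows)
    # Broadcast model parameter and quality metric, as in step 5
    events.put({"type": "model", "w": w, "loss": loss})
    return w, loss


# Toy training set where the target is 3 times the scaled feature
raw = [(float(x), 3.0 * x / 10.0) for x in range(1, 11)]
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(preprocess, range(2), partition(raw, 2)))
rows = [row for part in parts for row in part]
w, loss = train(rows)
print(f"w={w:.3f} loss={loss:.2e} events={events.qsize()}")
```

Decoupling the stages through an event queue is what lets the preprocessing and training components be swapped independently, which is the point the slide makes with Kafka.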
  • 16. Speed layer: Objective: process queries to predictive models with very low latency.
  • 17. Speed layer: Use case:
      1. Kafka routes data streams to the Spark master.
      2. Spark pre-processes requests and forwards them to the deep model micro-service.
      3. Flask converts each request into a prediction query to the PyTorch model.
      4. The PyTorch model generates a prediction.
      5. Run-time metrics are broadcast through Kafka.
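Steps 3 to 5 of the speed layer amount to a request handler wrapped around a model. The sketch below keeps to the standard library so it runs anywhere: `handle_request` is a hypothetical stand-in for a Flask view function, `model_predict` for a trained PyTorch model, and the `metrics` queue for a Kafka metrics topic. The request format (`{"features": [...]}`) is also an assumption.

```python
import json
import time
from queue import Queue

metrics = Queue()  # stand-in for a Kafka metrics topic (assumption)


def model_predict(features):
    """Stand-in for a trained PyTorch model: a fixed linear scorer."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def handle_request(body):
    """Convert a JSON request into a prediction and record its latency."""
    start = time.perf_counter()
    payload = json.loads(body)
    features = [float(v) for v in payload["features"]]
    prediction = model_predict(features)
    # Step 5: publish a run-time metric for every request served
    metrics.put({"latency_s": time.perf_counter() - start})
    return json.dumps({"prediction": prediction})


response = handle_request('{"features": [4.0, 2.0]}')
print(response)
```

In a real deployment the handler body would stay roughly the same, with the web framework supplying the request body and the model loaded once at start-up to keep latency low.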
  • 18. Serving layer: Objective: process queries to analyze data and model performance, and to execute statistical inference.
  • 19. Serving layer: Use case:
      1. The analyst queries the relational database, MySQL, for the most recent data and statistics using the FineReport UI (low latency).
      2. The analyst asynchronously queries the Hive data warehouse for archived data and statistics (high latency).
      3. Hive processes queries through Spark datasets.
      4. Spark regularly updates the short-term data in MySQL.
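The split between a low-latency store for recent data and a high-latency warehouse for archives can be sketched as a small router. This is an assumption-laden illustration: plain dictionaries stand in for MySQL and Hive, and the class name, the seven-day window and the sample values are hypothetical.

```python
from datetime import date, timedelta


class ServingLayer:
    """Routes metric queries to a short-term store or to the archive."""

    def __init__(self, today, window_days=7):
        self.today = today
        self.window = timedelta(days=window_days)
        self.short_term = {}  # stand-in for MySQL: recent rows only
        self.warehouse = {}   # stand-in for Hive: full history

    def ingest(self, day, value):
        """Store a daily metric in the warehouse."""
        self.warehouse[day] = value

    def refresh(self):
        """Step 4: rebuild the short-term store from recent warehouse rows."""
        cutoff = self.today - self.window
        self.short_term = {d: v for d, v in self.warehouse.items() if d >= cutoff}

    def query(self, day):
        """Return (source, value); recent days come from the fast store."""
        if day in self.short_term:
            return "short_term", self.short_term[day]
        return "warehouse", self.warehouse.get(day)


layer = ServingLayer(today=date(2020, 10, 31))
layer.ingest(date(2020, 10, 30), 0.91)
layer.ingest(date(2020, 9, 1), 0.84)
layer.refresh()
print(layer.query(date(2020, 10, 30)), layer.query(date(2020, 9, 1)))
```

The periodic `refresh` mirrors step 4: the short-term store is treated as a disposable view that can always be rebuilt from the warehouse.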
  • 20. Overview (agenda): Overview, Layers, Open-source components
  • 21. PyTorch: PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. It extends the functionality of NumPy and scikit-learn to support the training, evaluation and commercialization of complex machine learning models. https://pytorch.org/tutorials/ Alternatives: TensorFlow: https://www.tensorflow.org/ Keras: https://keras.io MXNet: https://mxnet.apache.org
  • 22. Apache Spark: Apache Spark is an open source cluster computing framework for fast real-time processing. It supports the Scala, Java, Python and R programming languages and includes streaming, graph and machine learning libraries. https://www.scala-lang.org https://spark.apache.org Alternative: PySpark: https://databricks.com/glossary/pyspark
  • 23. Streaming: Apache Kafka is an open-source distributed event streaming framework for large-scale, real-time data processing and analytics. It captures data from various sources in real time as a continuous flow and routes it to the appropriate processor. https://kafka.apache.org Alternatives: Amazon SQS: https://aws.amazon.com/sqs/ RabbitMQ: https://www.rabbitmq.com
  • 24. Model tuning: Ray Tune is a distributed hyperparameter tuning framework particularly suitable for deep learning models. It significantly reduces the cost of optimizing the configuration of a model. It is a wrapper around other open source libraries. https://docs.ray.io/en/master/tune/index.html Alternatives: Amazon SageMaker: https://aws.amazon.com/sagemaker/ HyperOpt: https://github.com/hyperopt/hyperopt Optuna: https://optuna.readthedocs.io
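To make concrete what a hyperparameter tuner automates, here is a plain random search written with the standard library only. It is not Ray Tune's API; it is a single-process sketch of the idea that such frameworks distribute and refine. The `validation_loss` objective, the search space and the trial count are hypothetical and stand in for a real training run.

```python
import random


def validation_loss(lr, hidden):
    """Hypothetical objective standing in for a real training run."""
    return (lr - 0.01) ** 2 * 1e4 + (hidden - 64) ** 2 / 1e3


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "lr": 10 ** rng.uniform(-4, -1),          # log-uniform learning rate
            "hidden": rng.choice([16, 32, 64, 128]),  # hidden layer width
        }
        loss = validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


best_loss, best_config = random_search(n_trials=50)
print(best_loss, best_config)
```

Each trial is independent of the others, which is why this kind of search parallelizes well across a cluster.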
  • 25. Python REST service: Flask is an easy-to-use implementation of a RESTful interface to Python applications. It supports most web and deployment standards such as Docker, React.js, Angular, HTML5 and WSGI containers. https://palletsprojects.com/p/flask/ Alternatives: Falcon: https://falcon.readthedocs.io FastAPI: https://fastapi.tiangolo.com
  • 26. RDBMS: MySQL is an open source relational database supporting partitioning, sharding and replication. It can be extended with real-time analytics (HeatWave) and enterprise clustering (CGE). https://www.mysql.com Alternatives: PostgreSQL: https://www.postgresql.org HyperSQL: http://www.hsqldb.org Amazon RDS: http://aws.amazon.com/rds
  • 27. Data warehouse: Apache Hive is a data warehouse framework that leverages Spark to execute large-scale distributed SQL queries. It optimizes SQL queries through lazy evaluation of an acyclic execution graph. It is integrated with Spark datasets and HDFS. https://hive.apache.org Alternatives: Vertica: http://www.vertica.com Amazon Redshift: https://aws.amazon.com/redshift/
  • 28. Dashboard: FineReport is a business intelligence and dashboard tool that supports real-time analytics, reporting and visualization. It accommodates the needs of business managers and data scientists. https://www.finereport.com Alternatives: Sisense: https://www.sisense.com Tableau: https://www.tableau.com
  • 29. Final disclaimer: This presentation is not an endorsement of the various tools, libraries or frameworks it describes or suggests. Although the tools listed in the slides are known to work in the context of the architecture, there are excellent alternative libraries that may better meet your specific needs.
  • 30. Thank you! Q&A