SlideShare a Scribd company logo
End-to-End Big Data AI with Analytics Zoo
Jason Dai – Sr. Principal Engineer
Fadi Zuhayri – Sr. Director
Intel Architecture, Graphics & Software
• Intel’s Transformation for Intelligence Era
• Analytics Zoo: Software Platform for Big Data AI
• Building Big Data AI Applications on Analytics Zoo
Outline
Fadi Zuhayri
Sr. Director
COMPUTE
1980 1990 2000 2010 2020 2030 2040
PC ERA
DIGITIZE
EVERYTHING
NETWORK
EVERYTHING
1 BILLION INTERNET
CONNECTED DEVICES
1018
109
104
1015
102
T E C H N O L O G Y L E D D I S R U P T I O N S
COMPUTE
DEMOCRATIZATION
COMPUTE
1980 1990 2000 2010 2020 2030 2040
PC ERA
DIGITIZE
EVERYTHING
NETWORK
EVERYTHING
1 BILLION INTERNET
CONNECTED DEVICES
1018
109
104
1015
102
CLOUD
EVERYTHING
MOBILE
EVERYTHING
MOBILE + CLOUD ERA
10 BILLION CLOUD CONNECTED DEVICES
T E C H N O L O G Y L E D D I S R U P T I O N S
COMPUTE
DEMOCRATIZATION
COMPUTE
1980 1990 2000 2010 2020 2030 2040
PC ERA
DIGITIZE
EVERYTHING
NETWORK
EVERYTHING
1 BILLION INTERNET
CONNECTED DEVICES
1018
109
104
1015
102
100 BILLION
INTELLIGENT
CONNECTED
DEVICES
INTELLIGENCE ERA
CLOUD
EVERYTHING
MOBILE
EVERYTHING
MOBILE + CLOUD ERA
10 BILLION CLOUD CONNECTED DEVICES
T E C H N O L O G Y L E D D I S R U P T I O N S
COMPUTE
DEMOCRATIZATION
COMPUTE
1980 1990 2000 2010 2020 2030 2040
PC ERA
DIGITIZE
EVERYTHING
NETWORK
EVERYTHING
1 BILLION INTERNET
CONNECTED DEVICES
1018
109
104
1015
102
T E C H N O L O G Y L E D D I S R U P T I O N S
COMPUTE
DEMOCRATIZATION F O R E V E R Y O N E
EXASCALE
CLOUD
EVERYTHING
MOBILE
EVERYTHING
MOBILE + CLOUD ERA
10 BILLION CLOUD CONNECTED DEVICES
Industry inflections are fueling the growth of data
Intelligent
Edge
Artificial
Intelligence
5G Network
Transformation
Cloudification
Move Faster Store More Process Everything
Software & System Level Optimized
Intel® Silicon Photonics
Intel® Ethernet
Intel® Tofino
Unleashing the Potential of Data
Analytics & AI Strategy
21
4 3 Hardware
Software
Ecosystem
OPTIMIZED
SOFTWARE
E2E DATA
SCIENCE
UNIFIED
APIs
CPU INFUSED
WITH AI
FLEXIBLE
ACCELERATION
OPTIMIZED
PLATFORM
A THRIVING
COMMUNITY
INTELLIGENT
SOLUTIONS
INNOVATION &
INVESTMENT
XPU: DIVERSE INTEL HW PORTFOLIO – from edge to cloud
CPU
General
compute,
AI Inference
& training
Xe
GPU
HPC & AI
AI training &
inference
Habana
Low power
vision
Low power
NLPGNA
Movidius
24 OPTIMIZED
TOPOLOGIES
44 OPTIMIZED
TOPOLOGIES
100+ OPTIMIZED
TOPOLOGIES
…
Foundation for AI
13
More built-in
AI acceleration &
optimized
topologies with
each new gen
OPTIMIZED LIBRARIES AND FRAMEWORKS
2017 1ST GEN
Intel® Advanced
Vector Extensions
512 (Intel AVX-512)
2019 2ND GEN
Intel Deep
Learning Boost
(with VNNI)
2020 3RD GEN
Intel Deep
Learning Boost
(VNNI, BF16)
Intel Deep
Learning Boost
(AMX)
2021 NEXT GEN
AIPERFORMANCE
CPU INFUSED
WITH AI
Languages and Libraries
Middleware & Frameworks
Application Workloads
CPU GPU FPGAOther
Accelerators
Scalar Vector Matrix Spatial
XPUs
Languages and Libraries
Middleware & Frameworks
Application Workloads
CPU GPU FPGAOther
Accelerators
Scalar Vector Matrix Spatial
2020 2021
Q1 Q2 Q3 Q4Q4Q3
0.6
Spec.
0.7
Spec.
0.8
Spec.
0.9
Spec.
1.0
Spec.
Industry
Initiative
Announced
More Soon…
Learn more at oneapi.com
Middleware, Frameworks & Runtimes
Applications & Services
XPUs
CPU GPU FPGA
Intel® oneAPI Product
Gold
Available December
2020
...
Compatibility
Tool
Languages Libraries
Analysis &
Debug Tools
Hardware Abstraction Layer
Run Locally Run in the Cloud
Get started quickly: code samples, quick-start guides, webinars, training
software.intel.com/oneapi
Downloads
Repositories
Containers
DevCloud
One Minute
to Code
No Hardware
Acquisition
No Download, Install
or Configuration
Support for Jupyter
Notebooks, VS Code
Easy Access to Samples
and Tutorials
ECOSYSTEM OF AI SOFTWARE STACK
HW
LIBRARIES &
COMPILERS
AI/ANALYTICS
SOLUTIONS
DL/ML/BIGDATA
FRAMEWORKS
XPU
oneDNN
CPU
oneDAL oneCCL
DATA SCIENTISTS &
DATA ANALYSTS
GPU ACCELERATERS
M O D E L Z O O A N A L Y T I C S Z O O
O P E N -
V I N O ™
T E N S O R -
F L O W
P Y T H O N
/
N U M B A
T V M
P Y -
T O R C H
M X N E T
S P A R K
S Q L + M L / DL S c a l e O ut
M O D I N
N U M P Y
X G -
BO O S T
S C I K I T -
L E A R N
P A N D A S
21
Achieving higher
yields and efficiency
Increasing production
and uptime
Transforming
learning
Enhancing
safety
Revolutionizing
patient outcomes
Turning data
into value
Pervasive Analytics & AI
Agriculture Energy Education Government Finance Healthcare
Empowering
industry 4.0
Creating thrilling
experiences
Modernizing
shopping
Enabling homes to
see, hear & respond
Fueling automated
driving
Driving network
efficiency
Industrial Media Retail Smart Home Telecom Transport
intel.com/customerspotlight
INFERENCETRAINING
FEATURE
ENGINEERING
AI TREND: END TO END
GROWING DEMAND FOR END-TO-END
AI PIPELINE
DATA
INGESTION
Big Data AI Open-Source Software Platform
Distributed TensorFlow, PyTorch, Keras, BigDL, RAY, and Apache Spark
Reference Use Cases, AI Models, High-level APIs, Feature Engineering, etc.
https://github.com/intel-analytics/analytics-zoo
AI on Big Data
Simplifying End-to-End Big Data AI Solutions Development
Jason Dai
Senior Principal Engineer
• Big Data AI
• Analytics Zoo overview
• Summary
Agenda
• Big Data AI
• Analytics Zoo overview
• Summary
Agenda
Seamless Scaling from Laptop to Distributed Big Data
Distributed, High-Performance
Deep Learning Framework
for Apache Spark
https://github.com/intel-analytics/bigdl
Unified Big Data AI Platform
for TensorFlow, PyTorch, Keras, BigDL,
OpenVINO, Ray and Apache Spark
https://github.com/intel-analytics/analytics-zoo
AI on Big Data
Transformation of Big Data
• Storing and processing more data
• Analyzing (querying) more data
• Real-time analysis
• Modelling and prediction (ML/DL)
AI is everywhere
• Moving from experimentation to production
• Applying to large-scale, distributed Big Data
Big Data AI
Case Study: Image Feature Extraction at JD.com
Query
Search Result
Source: “Bringing deep learning into big data analytics using BigDL”, Xianyan Jia and Zhenhua Wang, Strata Data Conference Singapore 2017
Similar Image Search Image Deduplication
Image Feature
Extraction:
Applications:
* https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
* “BigDL: A Distributed Deep Learning Framework for Big Data”, ACM SoCC 2019, https://arxiv.org/abs/1804.05839
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• End-to-end Big Data AI
pipeline (using BigDL
on Apache Spark)
• Efficiently scale out
(3.83x speed-up vs.
Nvidia GPU severs)*
Case Study: Image Feature Extraction at JD.com
Analytics Zoo: Software Platform for Big Data AI
End-to-End Pipelines
(Seamlessly scale AI models to distributed Big Data)
ML Workflow
(Automate tasks for building end-to-end pipelines)
Models
(Built-in models and algorithms)
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
https://github.com/intel-analytics/analytics-zoo
End-to-End Big Data Analytics and AI
Seamless Scaling from Laptop to Distributed Big Data
Big Data
Pipeline
Prototype on laptop
using sample data
Experiment on clusters
with history data
Production deployment w/
distributed data pipeline
• Easily prototype end-to-end pipelines that apply AI models to big data
• “Zero” code change from laptop to distributed cluster
• Seamlessly deployed on production Hadoop/K8s clusters
• Automate the process of applying machine learning to big data
• Big Data AI
• Analytics Zoo overview
• Summary
Agenda
Analytics Zoo: Software Platform for Big Data AI
Recommendation
Distributed TensorFlow & PyTorch on Spark
Spark Dataframes & ML Pipelines for DL
RayOnSpark
InferenceModel
Models &
Algorithms
End-to-end
Pipelines
Time Series Computer Vision NLP
ML Workflow AutoML Automatic Cluster Serving
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
https://github.com/intel-analytics/analytics-zoo
Analytics Zoo: Software Platform for Big Data AI
Recommendation
Spark Dataframes & ML Pipelines for DL
Distributed TensorFlow & PyTorch on Spark
InferenceModel
Models &
Algorithms
End-to-end
Pipelines
Time Series Computer Vision NLP
ML Workflow AutoML Automatic Cluster Serving
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
RayOnSpark
https://github.com/intel-analytics/analytics-zoo
Distributed TensorFlow/PyTorch on Spark
Write TensorFlow/PyTorch inline with Spark code
#pyspark code
train_rdd = spark.hadoopFile(…).map(…)
dataset = TFDataset.from_rdd(train_rdd,…)
#tensorflow code
import tensorflow as tf
slim = tf.contrib.slim
images, labels = dataset.tensors
with slim.arg_scope(lenet.lenet_arg_scope()):
logits, end_points = lenet.lenet(images, …)
loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy( 
logits=logits, labels=labels))
#distributed training on Spark
optimizer = TFOptimizer.from_loss(loss, Adam(…))
optimizer.optimize(end_trigger=MaxEpoch(5))
Analytics Zoo API in blue
Network Quality Prediction in SK Telecom
Distributed TensorFlow/PyTorch on Spark
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Analytics Zoo: Software Platform for Big Data AI
Recommendation
Spark Dataframes & ML Pipelines for DL
Distributed TensorFlow & PyTorch on Spark
InferenceModel
Models &
Algorithms
End-to-end
Pipelines
Time Series Computer Vision NLP
ML Workflow AutoML Automatic Cluster Serving
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
RayOnSpark
https://github.com/intel-analytics/analytics-zoo
• : distributed framework for
emerging AI applications
• RayOnSpark
• Directly run Ray programs on Big
Data cluster
• Integrate Ray programs into Spark
data pipeline
https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
RayOnSpark
Run Ray Programs Directly on Big Data Platform
RayOnSpark
Run Ray Programs Directly on Big Data Platform
Analytics Zoo API in blue
sc = init_spark_on_yarn(...)
ray_ctx = RayContext(sc=sc, ...)
ray_ctx.init()
#Ray code
@ray.remote
class TestRay():
def hostname(self):
import socket
return socket.gethostname()
actors = [TestRay.remote() for i in range(0, 100)]
print([ray.get(actor.hostname.remote()) for actor in actors])
ray_ctx.stop()
https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
Fast Food Recommendation in Burger King
End-to-End Training Pipeline w/ RayOnSpark
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-Aware Drive-thru Recommendation Service at Fast Food Restaurants”, https://arxiv.org/abs/2010.06197
DATA
INGESTIO N
FEATURE
ENGINEERING
TRAINING INFERENCE
on
Analytics Zoo: Software Platform for Big Data AI
Recommendation
Spark Dataframes & ML Pipelines for DL
Distributed TensorFlow & PyTorch on Spark
InferenceModel
Models &
Algorithms
End-to-end
Pipelines
Time Series Computer Vision NLP
ML Workflow AutoML Automatic Cluster Serving
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
RayOnSpark
https://github.com/intel-analytics/analytics-zoo
Scalable AutoML for Time Series Prediction
Automated feature generation, model selection and hyper parameter tuning
Analytics Zoo API in blue
tsp = TimeSequencePredictor( 
dt_col="datetime",
target_col="value")
pipeline = tsp.fit(train_df,
val_df, metric="mse",
recipe=RandomRecipe())
pipeline.predict(test_df)
https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-
b79a6fd08139
Scalable AutoML for Time Series Prediction
Automated feature generation, model selection and hyper parameter tuning
FeatureTransformer
Model
SearchEngine
Search presets
trial
trial
trial
trial
…best model
/parameters
trail jobs
Pipeline
with tunable parameters
with tunable parameters
configured with best parameters/model
Each trial runs a different combination
of hyper parameters
Ray Tune
rolling, scaling, feature generation, etc.
Spark + Ray
“Scalable AutoML for Time Series Forecasting using Ray”, USENIX OpML’20
TI-One ML Platform in Tencent Cloud
Scalable AutoML for Time Series Prediction
Using Analytics Zoo in Tencent Cloud TI-
One ML Platform
Predicting NYC Taxi Passengers Using AutoML
https://software.intel.com/content/www/us/en/develop/articles/tencent-cloud-leverages-analytics-zoo-to-improve-performance-of-ti-one-ml-platform.html
“Zouwu”
Open Source Framework for Time Series on Analytics Zoo
Application framework for building end-to-end
time series analysis
• Use case - reference time series use cases
• Models - built-in models for time series analysis
• AutoTS - AutoML support for building E2E time
series analysis pipelines
Project
Zouwu
Built-in Models
ML
Workflow
AutoML Workflow
End-to-End Pipelines
use-case
models autots
https://github.com/intel-analytics/analytics-
zoo/tree/master/pyzoo/zoo/zouwu
“Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics”, Ray Summit 2020
Analytics Zoo: Software Platform for Big Data AI
• E2E Big Data & AI pipeline (distributed TF/PyTorch/OpenVINO/Ray on Spark)
• Advanced AI workflow (AutoML, Time-Series, Cluster Serving, etc.)
Github
• Project repo: https://github.com/intel-analytics/analytics-zoo
• Use cases: https://analytics-zoo.github.io/master/#powered-by/
Technical paper/tutorials
• CVPR 2020 tutorial: https://jason-dai.github.io/cvpr2018/
• ACM SoCC 2019 paper: https://arxiv.org/abs/1804.05839
• AAAI 2019 tutorial: https://jason-dai.github.io/aaai2019/
• CVPR 2018 tutorial: https://jason-dai.github.io/cvpr2018/
Conclusion
Jason Dai
Senior Principal Engineer
Analytics Zoo: Software Platform for Big Data AI
End-to-End Pipelines
(Seamlessly scale AI models to distributed Big Data)
ML Workflow
(Automate tasks for building end-to-end pipelines)
Compute
Environment
K8s Cluster Cloud
Python Libraries
(Numpy/Pandas/sklearn/…)
DL Frameworks
(TF/PyTorch/BigDL/OpenVINO/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Laptop Hadoop Cluster
Powered by oneAPI
Recommendation Time Series Computer Vision NLP
https://github.com/intel-analytics/analytics-zoo
• Recommendation
• Time series analysis
• Computer vision
• Natural language processing (NLP)
Big Data AI Applications on Analytics Zoo
Food Recommendation Using Analytics Zoo in Burger King
Guest arrives
ODMB
Checks Menu
Board
Cashier enters
order
Checks Menu
Board
Guest
completes
order
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Food Recommendation Challenges
Challenges
• Lack of user identifiers
• Same session food compatibilities
• Other variables in our use case:
locations, weathers, time, etc.
• Deployment challenges
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Transformer Cross Transformer (TxT) Model
Model Components
• Sequence Transformer
• Taking item order sequence as input
• Context Transformer
• Taking multiple context features as input
• Latent Cross Joint Training
• Element-wise product for both transformer
outputs
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Unified Big Data Processing and Model Training on
Analytics Zoo
CurrentPrevious
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Food Recommendation Using Analytics Zoo in
Burger King
Offline Training Result
Model Top1
Accuracy
Top3
Accuracy
RNN 29.98% 46.24%
Contextual ItemCF 32.18% 48.37%
RNN Latent Cross 33.10% 49.98%
TxT 34.52% 52.37%
A/B Testing Result
Model Conversation
Rate Gain
Add-on
Sales Gain
RNN Latent
Cross (control)
- -
TxT +7.5% +4.7%
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Recommendation Using Analytic Zoo in Mastercard
https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
Train NCF Model
Features Models
Model
Candidates
Models
sampled
partition
Training Data
…
Load Parquet
Train Multiple Models
Train Wide & Deep Model
sampled
partition
sampled
partition
Spark ML Pipeline Stages
Test Data
Predictions
Test
Spark DataFramesParquet Files
Feature
Selections
SparkMLPipeline
Neural Recommender using Analytics Zoo
Estimator Transformer Model
Evaluation
& Fine
Tune
Train ALS Model
• Recommendation
• Time series analysis
• Computer vision
• Natural language processing (NLP)
Big Data AI Applications on Analytics Zoo
Time Series Based Network Quality Prediction in
SK Telecom
• Predict Network Quality Indicators (CQI, RSRP, RSRQ, SINR, …)*
for anomaly detection and real-time management
* CQI : Channel Quality Indicator
* RSRP : Reference Signal Received Power
* RSRQ : Reference Signal Received Quality
* SINR :Signal to Interference Noise Ratio
* “Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom”, Spark + AI Summit 2020
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Memory Augmented Network
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Memory Augmented Network – Test Result
Improved predictions for sudden change!
seq2seq
Mem-network
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Migrating to Analytics Zoo on Intel® Xeon
Data Loader
DRAM
Store tiering forked.
Flash
Store customized.
Data Source APIs
Spark-SQL
SQL Queries
(Web, Jupyter)
LegacyDesignwithGPU
Export Preprocessing AITraining/Inference
GPU
Servers
NewArchitecture: Unified DataAnalytic+AIPlatform
Preprocessing RDDofTensor
2nd Generation Intel®Xeon®
Scalable Processors
csv files pandas, dask
spark spark
spark
Manually manage separate clusters, and segregated workflow
E2E architecture that atomically scales deep learning on Spark
AIModelCodeofTF
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Inference Pipeline Speed-up with Analytics Zoo
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Up-to 6x speedup for
end-to-end inference
on Analytics Zoo in
SK Telecom*
Training Pipeline Speed-up with Analytics Zoo
Up-to 4x speedup for
end-to-end training
on Analytics Zoo in
SK Telecom*
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Wind Power Prediction using Analytics Zoo in
GoldWind
LSTNet
ETL Training Prediction
Deployment
Update
DB
Historical
Power
Wind Power Prediction
• Accuracy improved to 79%
(from previous 59%)*
• 4x training speedup*
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
* https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/create-power-forecasting-solutions.html
• Recommendation
• Time series analysis
• Computer vision
• Natural language processing (NLP)
Big Data AI Applications on Analytics Zoo
Industrial Vision Inspection Using Analytics Zoo in
Midea and KUKA
https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
Industrial Vision Inspection Using Analytics Zoo in
Midea and KUKA
Edge to Cloud architecture using
Analytics Zoo
• 99.8% accuracy*
• <50ms image processing latency*
• >8x inference speedup*
* https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
* https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/midea-case-study.html
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
AI-Assisted Radiology Using Analytics Zoo in
Dell EMC
Condition E
Condition D
Condition C
Condition B
Condition A
Patient A
Transfer Learning using ResNet-50 trained with ImageNet
https://www.delltaechnologies.com/resources/en-us/asset/white-papers/solutions/h17686_hornet_wp.pdf
chest X-rays
• Recommendation
• Time series analysis
• Computer vision
• Natural language processing (NLP)
Big Data AI Applications on Analytics Zoo
Customer Service Chatbot Using Analytics Zoo in
Microsoft Azure
*https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
*https://www.infoq.com/articles/analytics-zoo-qa-module/
Job Recommendation Using Analytics Zoo in Talroo
documents
(resume, job
description)
https://software.intel.com/content/www/us/en/develop/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations.html
Analytics Zoo: Software Platform for Big Data AI
• E2E Big Data & AI pipeline (distributed TF/PyTorch/OpenVINO/Ray on Spark)
• Advanced AI workflow (AutoML, Time-Series, Cluster Serving, etc.)
Github
• Project repo: https://github.com/intel-analytics/analytics-zoo
• Use cases: https://analytics-zoo.github.io/master/#powered-by/
Technical paper/tutorials
• CVPR 2020 tutorial: https://jason-dai.github.io/cvpr2018/
• ACM SoCC 2019 paper: https://arxiv.org/abs/1804.05839
• AAAI 2019 tutorial: https://jason-dai.github.io/aaai2019/
• CVPR 2018 tutorial: https://jason-dai.github.io/cvpr2018/
Big Data AI
Summary
INDUSTRY INFLECTIONS ARE FUELING THE GROWTH OF DATA
5G Network Transformation, Artificial Intelligence, Intelligent Edge, Cloudification
AI & ANALYTICS ARE THE DEFINING WORKLOADS OF THE NEXT DECADE
with growing demand for end-to-end AI pipeline
UNMATCHED PORTFOLIO BREADTH AND ECOSYSTEM SUPPORT
Intel delivers a silicon & software foundation designed for the
diverse range of use cases from the cloud to the edge
ANALYTICS ZOO OPEN-SOURCE SOFTWARE PLATFORM FOR BIG DATA AI
Simplifies End-to-End Big Data AI pipeline solutions development
Thank You
Notices & Disclaimers
• Performance varies by use, configuration and other factors. Learn more at www.intel.com/performanceIndex
• Performance may vary based on the specific game title and server configuration. To reference the full list of Intel Server GPU platform measurements, please refer to
http://www.intel.com/content/www/us/en/benchmarks/server/graphics/IntelServerGPU
• All product plans and roadmaps are subject to change without notice.
• Intel technologies may require enabled hardware, software or service activation.
• No product or component can be absolutely secure.
• Your costs and results may vary.
• Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
• All product plans and roadmaps are subject to change without notice.
• © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Intel Server GPU TCO analysis is based on internal Intel research. Pricing as of 10/01/2020. Analysis assumes standard serving pricing, GPU list pricing, and software pricing based on
estimated Nvidia software license costs of $1 per year for 5 years.
• Intel Server GPU Performance may vary based on the specific game title and server configuration. To reference the full list of Intel Server GPU platform measurements, please refer to
http://www.intel.com/content/www/us/en/benchmarks/server/graphics/IntelServerGPU
• Video game footage courtesy of Tencent Games and Gamestream.
• LEGO STAR WARS TITLES : © Lucasfilm Entertainment Company Ltd. or Lucasfilm Ltd. & ® or TM as indicated. All rights reserved.
• LEGO, the LEGO logo and the Minifigure are trademarks of The LEGO Group. © The LEGO Group. All rights reserved.
• “DiRT4”™ : © 2017 The Codemasters Software Company Limited ("Codemasters"). All rights reserved. "Codemasters"®, “EGO”®, the Codemasters logo, and “DiRT”® are registered
trademarks owned by Codemasters. “DiRT4”™ and “RaceNet”™ are trademarks of Codemasters. All rights reserved. Under licence from International Management Group (UK) Limited. All
other copyrights or trademarks are the property of their respective owners and are being used under license. Developed by Codemasters.

More Related Content

What's hot

Dell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data CenterDell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data Center
Renee Yao
 
OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar
Ganesan Narayanasamy
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
inside-BigData.com
 
WML OpenPOWER presentation
WML OpenPOWER presentationWML OpenPOWER presentation
WML OpenPOWER presentation
Ganesan Narayanasamy
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
Renee Yao
 
Artificial intelligence on the Edge
Artificial intelligence on the EdgeArtificial intelligence on the Edge
Artificial intelligence on the Edge
Usman Qayyum
 
A Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate ArraysA Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate Arrays
Taylor Riggan
 
NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
Alison B. Lowndes
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
Edge AI and Vision Alliance
 
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Mathieu Dumoulin
 
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsSimplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Renee Yao
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
Andrey Kudryavtsev
 
NVIDIA Developer Program Overview
NVIDIA Developer Program OverviewNVIDIA Developer Program Overview
NVIDIA Developer Program Overview
NVIDIA
 
Reduce the complexities of managing Kubernetes clusters anywhere 2
Reduce the complexities of managing Kubernetes clusters anywhere 2Reduce the complexities of managing Kubernetes clusters anywhere 2
Reduce the complexities of managing Kubernetes clusters anywhere 2
Ashnikbiz
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
NVIDIA Taiwan
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA Taiwan
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project Onboarding
Shuquan Huang
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
 

What's hot (20)

Dell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data CenterDell and NVIDIA for Your AI workloads in the Data Center
Dell and NVIDIA for Your AI workloads in the Data Center
 
OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
WML OpenPOWER presentation
WML OpenPOWER presentationWML OpenPOWER presentation
WML OpenPOWER presentation
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
 
Artificial intelligence on the Edge
Artificial intelligence on the EdgeArtificial intelligence on the Edge
Artificial intelligence on the Edge
 
A Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate ArraysA Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate Arrays
 
NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsSimplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
 
NVIDIA Developer Program Overview
NVIDIA Developer Program OverviewNVIDIA Developer Program Overview
NVIDIA Developer Program Overview
 
Reduce the complexities of managing Kubernetes clusters anywhere 2
Reduce the complexities of managing Kubernetes clusters anywhere 2Reduce the complexities of managing Kubernetes clusters anywhere 2
Reduce the complexities of managing Kubernetes clusters anywhere 2
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project Onboarding
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 

Similar to End-to-End Big Data AI with Analytics Zoo

Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and Beyond
James Huang
 
Bhadale group of companies our technology ecosystem
Bhadale group of companies our technology ecosystemBhadale group of companies our technology ecosystem
Bhadale group of companies our technology ecosystem
Vijayananda Mohire
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Mark Goldstein
 
Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for Telco
Michelle Holley
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI Platform
Indrajit Poddar
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on Microcontroller
Andri Yadi
 
2 pc enterprise summit cronin newfinal aug 18
2 pc enterprise summit cronin newfinal aug 182 pc enterprise summit cronin newfinal aug 18
2 pc enterprise summit cronin newfinal aug 18
IntelAPAC
 
Activeeon - Scale Beyond Limits
Activeeon - Scale Beyond LimitsActiveeon - Scale Beyond Limits
Activeeon - Scale Beyond Limits
Activeeon
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom Webinar
Bill Wong
 
Dell AI and HPC University Roadshow
Dell AI and HPC University RoadshowDell AI and HPC University Roadshow
Dell AI and HPC University Roadshow
Bill Wong
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
Avkash Chauhan
 
bhide_connected_raleigh2016 (1)
bhide_connected_raleigh2016 (1)bhide_connected_raleigh2016 (1)
bhide_connected_raleigh2016 (1)sandhibhide
 
oneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductoneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel Product
Tyrone Systems
 
Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機
Amazon Web Services
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to Green
John Archer
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
John Archer
 
Parallel universe-issue-29
Parallel universe-issue-29Parallel universe-issue-29
Parallel universe-issue-29
DESMOND YUEN
 
OpenPOWER foundation
OpenPOWER foundationOpenPOWER foundation
OpenPOWER foundation
Ganesan Narayanasamy
 
HPE Discover 2017 - Internet of Things Program Guide
HPE Discover 2017 - Internet of Things Program GuideHPE Discover 2017 - Internet of Things Program Guide
HPE Discover 2017 - Internet of Things Program Guide
Isaac Rodriguez
 
Resume march 20
Resume march 20Resume march 20
Resume march 20
Mohamed Rashad
 

Similar to End-to-End Big Data AI with Analytics Zoo (20)

Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and Beyond
 
Bhadale group of companies our technology ecosystem
Bhadale group of companies our technology ecosystemBhadale group of companies our technology ecosystem
Bhadale group of companies our technology ecosystem
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
 
Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for Telco
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI Platform
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on Microcontroller
 
2 pc enterprise summit cronin newfinal aug 18
2 pc enterprise summit cronin newfinal aug 182 pc enterprise summit cronin newfinal aug 18
2 pc enterprise summit cronin newfinal aug 18
 
Activeeon - Scale Beyond Limits
Activeeon - Scale Beyond LimitsActiveeon - Scale Beyond Limits
Activeeon - Scale Beyond Limits
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom Webinar
 
Dell AI and HPC University Roadshow
Dell AI and HPC University RoadshowDell AI and HPC University Roadshow
Dell AI and HPC University Roadshow
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
 
bhide_connected_raleigh2016 (1)
bhide_connected_raleigh2016 (1)bhide_connected_raleigh2016 (1)
bhide_connected_raleigh2016 (1)
 
oneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductoneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel Product
 
Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to Green
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Parallel universe-issue-29
Parallel universe-issue-29Parallel universe-issue-29
Parallel universe-issue-29
 
OpenPOWER foundation
OpenPOWER foundationOpenPOWER foundation
OpenPOWER foundation
 
HPE Discover 2017 - Internet of Things Program Guide
HPE Discover 2017 - Internet of Things Program GuideHPE Discover 2017 - Internet of Things Program Guide
HPE Discover 2017 - Internet of Things Program Guide
 
Resume march 20
Resume march 20Resume march 20
Resume march 20
 

Recently uploaded

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 

Recently uploaded (20)

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 

End-to-End Big Data AI with Analytics Zoo

  • 1. End-to-End Big Data AI with Analytics Zoo Jason Dai – Sr. Principal Engineer Fadi Zuhayri – Sr. Director Intel Architecture, Graphics & Software
  • 2. • Intel’s Transformation for Intelligence Era • Analytics Zoo: Software Platform for Big Data AI • Building Big Data AI Applications on Analytics Zoo Outline
  • 4. COMPUTE 1980 1990 2000 2010 2020 2030 2040 PC ERA DIGITIZE EVERYTHING NETWORK EVERYTHING 1 BILLION INTERNET CONNECTED DEVICES 1018 109 104 1015 102 T E C H N O L O G Y L E D D I S R U P T I O N S COMPUTE DEMOCRATIZATION
  • 5. COMPUTE 1980 1990 2000 2010 2020 2030 2040 PC ERA DIGITIZE EVERYTHING NETWORK EVERYTHING 1 BILLION INTERNET CONNECTED DEVICES 1018 109 104 1015 102 CLOUD EVERYTHING MOBILE EVERYTHING MOBILE + CLOUD ERA 10 BILLION CLOUD CONNECTED DEVICES T E C H N O L O G Y L E D D I S R U P T I O N S COMPUTE DEMOCRATIZATION
  • 6. COMPUTE 1980 1990 2000 2010 2020 2030 2040 PC ERA DIGITIZE EVERYTHING NETWORK EVERYTHING 1 BILLION INTERNET CONNECTED DEVICES 1018 109 104 1015 102 100 BILLION INTELLIGENT CONNECTED DEVICES INTELLIGENCE ERA CLOUD EVERYTHING MOBILE EVERYTHING MOBILE + CLOUD ERA 10 BILLION CLOUD CONNECTED DEVICES T E C H N O L O G Y L E D D I S R U P T I O N S COMPUTE DEMOCRATIZATION
  • 7. COMPUTE 1980 1990 2000 2010 2020 2030 2040 PC ERA DIGITIZE EVERYTHING NETWORK EVERYTHING 1 BILLION INTERNET CONNECTED DEVICES 1018 109 104 1015 102 T E C H N O L O G Y L E D D I S R U P T I O N S COMPUTE DEMOCRATIZATION F O R E V E R Y O N E EXASCALE CLOUD EVERYTHING MOBILE EVERYTHING MOBILE + CLOUD ERA 10 BILLION CLOUD CONNECTED DEVICES
  • 8. Industry inflections are fueling the growth of data Intelligent Edge Artificial Intelligence 5G Network Transformation Cloudification
  • 9.
  • 10. Move Faster Store More Process Everything Software & System Level Optimized Intel® Silicon Photonics Intel® Ethernet Intel® Tofino Unleashing the Potential of Data
  • 11. Analytics & AI Strategy 21 4 3 Hardware Software Ecosystem OPTIMIZED SOFTWARE E2E DATA SCIENCE UNIFIED APIs CPU INFUSED WITH AI FLEXIBLE ACCELERATION OPTIMIZED PLATFORM A THRIVING COMMUNITY INTELLIGENT SOLUTIONS INNOVATION & INVESTMENT
  • 12. XPU: DIVERSE INTEL HW PORTFOLIO – from edge to cloud CPU General compute, AI Inference & training Xe GPU HPC & AI AI training & inference Habana Low power vision Low power NLPGNA Movidius
  • 13. 24 OPTIMIZED TOPOLOGIES 44 OPTIMIZED TOPOLOGIES 100+ OPTIMIZED TOPOLOGIES … Foundation for AI 13 More built-in AI acceleration & optimized topologies with each new gen OPTIMIZED LIBRARIES AND FRAMEWORKS 2017 1ST GEN Intel® Advanced Vector Extensions 512 (Intel AVX-512) 2019 2ND GEN Intel Deep Learning Boost (with VNNI) 2020 3RD GEN Intel Deep Learning Boost (VNNI, BF16) Intel Deep Learning Boost (AMX) 2021 NEXT GEN AIPERFORMANCE CPU INFUSED WITH AI
  • 14. Languages and Libraries Middleware & Frameworks Application Workloads CPU GPU FPGAOther Accelerators Scalar Vector Matrix Spatial
  • 15. XPUs Languages and Libraries Middleware & Frameworks Application Workloads CPU GPU FPGAOther Accelerators Scalar Vector Matrix Spatial
  • 16. 2020 2021 Q1 Q2 Q3 Q4Q4Q3 0.6 Spec. 0.7 Spec. 0.8 Spec. 0.9 Spec. 1.0 Spec. Industry Initiative Announced More Soon… Learn more at oneapi.com
  • 17. Middleware, Frameworks & Runtimes Applications & Services XPUs CPU GPU FPGA Intel® oneAPI Product Gold Available December 2020 ... Compatibility Tool Languages Libraries Analysis & Debug Tools Hardware Abstraction Layer
  • 18. Run Locally Run in the Cloud Get started quickly: code samples, quick-start guides, webinars, training software.intel.com/oneapi Downloads Repositories Containers DevCloud
  • 19. One Minute to Code No Hardware Acquisition No Download, Install or Configuration Support for Jupyter Notebooks, VS Code Easy Access to Samples and Tutorials
  • 20. ECOSYSTEM OF AI SOFTWARE STACK HW LIBRARIES & COMPILERS AI/ANALYTICS SOLUTIONS DL/ML/BIGDATA FRAMEWORKS XPU oneDNN CPU oneDAL oneCCL DATA SCIENTISTS & DATA ANALYSTS GPU ACCELERATERS M O D E L Z O O A N A L Y T I C S Z O O O P E N - V I N O ™ T E N S O R - F L O W P Y T H O N / N U M B A T V M P Y - T O R C H M X N E T S P A R K S Q L + M L / DL S c a l e O ut M O D I N N U M P Y X G - BO O S T S C I K I T - L E A R N P A N D A S
  • 21. 21 Achieving higher yields and efficiency Increasing production and uptime Transforming learning Enhancing safety Revolutionizing patient outcomes Turning data into value Pervasive Analytics & AI Agriculture Energy Education Government Finance Healthcare Empowering industry 4.0 Creating thrilling experiences Modernizing shopping Enabling homes to see, hear & respond Fueling automated driving Driving network efficiency Industrial Media Retail Smart Home Telecom Transport intel.com/customerspotlight
  • 22. INFERENCETRAINING FEATURE ENGINEERING AI TREND: END TO END GROWING DEMAND FOR END-TO-END AI PIPELINE DATA INGESTION
  • 23. Big Data AI Open-Source Software Platform Distributed TensorFlow, PyTorch, Keras, BigDL, RAY, and Apache Spark Reference Use Cases, AI Models, High-level APIs, Feature Engineering, etc. https://github.com/intel-analytics/analytics-zoo AI on Big Data Simplifying End-to-End Big Data AI Solutions Development
  • 25. • Big Data AI • Analytics Zoo overview • Summary Agenda
  • 26. • Big Data AI • Analytics Zoo overview • Summary Agenda
  • 27. Seamless Scaling from Laptop to Distributed Big Data Distributed, High-Performance Deep Learning Framework for Apache Spark https://github.com/intel-analytics/bigdl Unified Big Data AI Platform for TensorFlow, PyTorch, Keras, BigDL, OpenVINO, Ray and Apache Spark https://github.com/intel-analytics/analytics-zoo AI on Big Data
  • 28. Transformation of Big Data • Storing and processing more data • Analyzing (querying) more data • Real-time analysis • Modelling and prediction (ML/DL) AI is everywhere • Moving from experimentation to production • Applying to large-scale, distributed Big Data Big Data AI
  • 29. Case Study: Image Feature Extraction at JD.com Query Search Result Source: “Bringing deep learning into big data analytics using BigDL”, Xianyan Jia and Zhenhua Wang, Strata Data Conference Singapore 2017 Similar Image Search Image Deduplication Image Feature Extraction: Applications:
  • 30. * https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom * “BigDL: A Distributed Deep Learning Framework for Big Data”, ACM SoCC 2019, https://arxiv.org/abs/1804.05839 For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. • End-to-end Big Data AI pipeline (using BigDL on Apache Spark) • Efficiently scale out (3.83x speed-up vs. Nvidia GPU severs)* Case Study: Image Feature Extraction at JD.com
  • 31. Analytics Zoo: Software Platform for Big Data AI End-to-End Pipelines (Seamlessly scale AI models to distributed Big Data) ML Workflow (Automate tasks for building end-to-end pipelines) Models (Built-in models and algorithms) Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI https://github.com/intel-analytics/analytics-zoo
  • 32. End-to-End Big Data Analytics and AI Seamless Scaling from Laptop to Distributed Big Data Big Data Pipeline Prototype on laptop using sample data Experiment on clusters with history data Production deployment w/ distributed data pipeline • Easily prototype end-to-end pipelines that apply AI models to big data • “Zero” code change from laptop to distributed cluster • Seamlessly deployed on production Hadoop/K8s clusters • Automate the process of applying machine learning to big data
  • 33. • Big Data AI • Analytics Zoo overview • Summary Agenda
  • 34. Analytics Zoo: Software Platform for Big Data AI Recommendation Distributed TensorFlow & PyTorch on Spark Spark Dataframes & ML Pipelines for DL RayOnSpark InferenceModel Models & Algorithms End-to-end Pipelines Time Series Computer Vision NLP ML Workflow AutoML Automatic Cluster Serving Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI https://github.com/intel-analytics/analytics-zoo
  • 35. Analytics Zoo: Software Platform for Big Data AI Recommendation Spark Dataframes & ML Pipelines for DL Distributed TensorFlow & PyTorch on Spark InferenceModel Models & Algorithms End-to-end Pipelines Time Series Computer Vision NLP ML Workflow AutoML Automatic Cluster Serving Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI RayOnSpark https://github.com/intel-analytics/analytics-zoo
  • 36. Distributed TensorFlow/PyTorch on Spark Write TensorFlow/PyTorch inline with Spark code #pyspark code train_rdd = spark.hadoopFile(…).map(…) dataset = TFDataset.from_rdd(train_rdd,…) #tensorflow code import tensorflow as tf slim = tf.contrib.slim images, labels = dataset.tensors with slim.arg_scope(lenet.lenet_arg_scope()): logits, end_points = lenet.lenet(images, …) loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy( logits=logits, labels=labels)) #distributed training on Spark optimizer = TFOptimizer.from_loss(loss, Adam(…)) optimizer.optimize(end_trigger=MaxEpoch(5)) Analytics Zoo API in blue
  • 37. Network Quality Prediction in SK Telecom Distributed TensorFlow/PyTorch on Spark https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 38. Analytics Zoo: Software Platform for Big Data AI Recommendation Spark Dataframes & ML Pipelines for DL Distributed TensorFlow & PyTorch on Spark InferenceModel Models & Algorithms End-to-end Pipelines Time Series Computer Vision NLP ML Workflow AutoML Automatic Cluster Serving Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI RayOnSpark https://github.com/intel-analytics/analytics-zoo
  • 39. • : distributed framework for emerging AI applications • RayOnSpark • Directly run Ray programs on Big Data cluster • Integrate Ray programs into Spark data pipeline https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a RayOnSpark Run Ray Programs Directly on Big Data Platform
  • 40. RayOnSpark Run Ray Programs Directly on Big Data Platform Analytics Zoo API in blue sc = init_spark_on_yarn(...) ray_ctx = RayContext(sc=sc, ...) ray_ctx.init() #Ray code @ray.remote class TestRay(): def hostname(self): import socket return socket.gethostname() actors = [TestRay.remote() for i in range(0, 100)] print([ray.get(actor.hostname.remote()) for actor in actors]) ray_ctx.stop() https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
  • 41. Fast Food Recommendation in Burger King End-to-End Training Pipeline w/ RayOnSpark * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-Aware Drive-thru Recommendation Service at Fast Food Restaurants”, https://arxiv.org/abs/2010.06197 DATA INGESTIO N FEATURE ENGINEERING TRAINING INFERENCE on
  • 42. Analytics Zoo: Software Platform for Big Data AI Recommendation Spark Dataframes & ML Pipelines for DL Distributed TensorFlow & PyTorch on Spark InferenceModel Models & Algorithms End-to-end Pipelines Time Series Computer Vision NLP ML Workflow AutoML Automatic Cluster Serving Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI RayOnSpark https://github.com/intel-analytics/analytics-zoo
  • 43. Scalable AutoML for Time Series Prediction Automated feature generation, model selection and hyper parameter tuning Analytics Zoo API in blue tsp = TimeSequencePredictor( dt_col="datetime", target_col="value") pipeline = tsp.fit(train_df, val_df, metric="mse", recipe=RandomRecipe()) pipeline.predict(test_df) https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo- b79a6fd08139
  • 44. Scalable AutoML for Time Series Prediction Automated feature generation, model selection and hyper parameter tuning FeatureTransformer Model SearchEngine Search presets trial trial trial trial …best model /parameters trail jobs Pipeline with tunable parameters with tunable parameters configured with best parameters/model Each trial runs a different combination of hyper parameters Ray Tune rolling, scaling, feature generation, etc. Spark + Ray “Scalable AutoML for Time Series Forecasting using Ray”, USENIX OpML’20
  • 45. TI-One ML Platform in Tencent Cloud Scalable AutoML for Time Series Prediction Using Analytics Zoo in Tencent Cloud TI- One ML Platform Predicting NYC Taxi Passengers Using AutoML https://software.intel.com/content/www/us/en/develop/articles/tencent-cloud-leverages-analytics-zoo-to-improve-performance-of-ti-one-ml-platform.html
  • 46. “Zouwu” Open Source Framework for Time Series on Analytics Zoo Application framework for building end-to-end time series analysis • Use case - reference time series use cases • Models - built-in models for time series analysis • AutoTS - AutoML support for building E2E time series analysis pipelines Project Zouwu Built-in Models ML Workflow AutoML Workflow End-to-End Pipelines use-case models autots https://github.com/intel-analytics/analytics- zoo/tree/master/pyzoo/zoo/zouwu “Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics”, Ray Summit 2020
  • 47. Analytics Zoo: Software Platform for Big Data AI • E2E Big Data & AI pipeline (distributed TF/PyTorch/OpenVINO/Ray on Spark) • Advanced AI workflow (AutoML, Time-Series, Cluster Serving, etc.) Github • Project repo: https://github.com/intel-analytics/analytics-zoo • Use cases: https://analytics-zoo.github.io/master/#powered-by/ Technical paper/tutorials • CVPR 2020 tutorial: https://jason-dai.github.io/cvpr2018/ • ACM SoCC 2019 paper: https://arxiv.org/abs/1804.05839 • AAAI 2019 tutorial: https://jason-dai.github.io/aaai2019/ • CVPR 2018 tutorial: https://jason-dai.github.io/cvpr2018/ Conclusion
  • 49. Analytics Zoo: Software Platform for Big Data AI End-to-End Pipelines (Seamlessly scale AI models to distributed Big Data) ML Workflow (Automate tasks for building end-to-end pipelines) Compute Environment K8s Cluster Cloud Python Libraries (Numpy/Pandas/sklearn/…) DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…) Distributed Analytics (Spark/Flink/Ray/…) Laptop Hadoop Cluster Powered by oneAPI Recommendation Time Series Computer Vision NLP https://github.com/intel-analytics/analytics-zoo
  • 50. • Recommendation • Time series analysis • Computer vision • Natural language processing (NLP) Big Data AI Applications on Analytics Zoo
  • 51. Food Recommendation Using Analytics Zoo in Burger King Guest arrives ODMB Checks Menu Board Cashier enters order Checks Menu Board Guest completes order * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
  • 52. Food Recommendation Challenges Challenges • Lack of user identifiers • Same session food compatibilities • Other variables in our use case: locations, weathers, time, etc. • Deployment challenges * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
  • 53. Transformer Cross Transformer (TxT) Model Model Components • Sequence Transformer • Taking item order sequence as input • Context Transformer • Taking multiple context features as input • Latent Cross Joint Training • Element-wise product for both transformer outputs * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
  • 54. Unified Big Data Processing and Model Training on Analytics Zoo CurrentPrevious * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
  • 55. Food Recommendation Using Analytics Zoo in Burger King Offline Training Result Model Top1 Accuracy Top3 Accuracy RNN 29.98% 46.24% Contextual ItemCF 32.18% 48.37% RNN Latent Cross 33.10% 49.98% TxT 34.52% 52.37% A/B Testing Result Model Conversation Rate Gain Add-on Sales Gain RNN Latent Cross (control) - - TxT +7.5% +4.7% * https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d * “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
  • 56. Recommendation Using Analytic Zoo in Mastercard https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service Train NCF Model Features Models Model Candidates Models sampled partition Training Data … Load Parquet Train Multiple Models Train Wide & Deep Model sampled partition sampled partition Spark ML Pipeline Stages Test Data Predictions Test Spark DataFramesParquet Files Feature Selections SparkMLPipeline Neural Recommender using Analytics Zoo Estimator Transformer Model Evaluation & Fine Tune Train ALS Model
  • 57. • Recommendation • Time series analysis • Computer vision • Natural language processing (NLP) Big Data AI Applications on Analytics Zoo
  • 58. Time Series Based Network Quality Prediction in SK Telecom • Predict Network Quality Indicators (CQI, RSRP, RSRQ, SINR, …)* for anomaly detection and real-time management * CQI : Channel Quality Indicator * RSRP : Reference Signal Received Power * RSRQ : Reference Signal Received Quality * SINR :Signal to Interference Noise Ratio * “Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom”, Spark + AI Summit 2020 * https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
  • 60. Memory Augmented Network – Test Result Improved predictions for sudden change! seq2seq Mem-network https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
  • 61. Migrating to Analytics Zoo on Intel® Xeon Data Loader DRAM Store tiering forked. Flash Store customized. Data Source APIs Spark-SQL SQL Queries (Web, Jupyter) LegacyDesignwithGPU Export Preprocessing AITraining/Inference GPU Servers NewArchitecture: Unified DataAnalytic+AIPlatform Preprocessing RDDofTensor 2nd Generation Intel®Xeon® Scalable Processors csv files pandas, dask spark spark spark Manually manage separate clusters, and segregated workflow E2E architecture that atomically scales deep learning on Spark AIModelCodeofTF https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
  • 62. Inference Pipeline Speed-up with Analytics Zoo * https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality Up-to 6x speedup for end-to-end inference on Analytics Zoo in SK Telecom*
  • 63. Training Pipeline Speed-up with Analytics Zoo Up-to 4x speedup for end-to-end training on Analytics Zoo in SK Telecom* * https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
  • 64. Wind Power Prediction using Analytics Zoo in GoldWind LSTNet ETL Training Prediction Deployment Update DB Historical Power Wind Power Prediction • Accuracy improved to 79% (from previous 59%)* • 4x training speedup* For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. * https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/create-power-forecasting-solutions.html
  • 65. • Recommendation • Time series analysis • Computer vision • Natural language processing (NLP) Big Data AI Applications on Analytics Zoo
  • 66. Industrial Vision Inspection Using Analytics Zoo in Midea and KUKA https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
  • 67. Industrial Vision Inspection Using Analytics Zoo in Midea and KUKA Edge to Cloud architecture using Analytics Zoo • 99.8% accuracy* • <50ms image processing latency* • >8x inference speedup* * https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics * https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/midea-case-study.html For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 68. AI-Assisted Radiology Using Analytics Zoo in Dell EMC Condition E Condition D Condition C Condition B Condition A Patient A Transfer Learning using ResNet-50 trained with ImageNet https://www.delltaechnologies.com/resources/en-us/asset/white-papers/solutions/h17686_hornet_wp.pdf chest X-rays
  • 69. • Recommendation • Time series analysis • Computer vision • Natural language processing (NLP) Big Data AI Applications on Analytics Zoo
  • 70. Customer Service Chatbot Using Analytics Zoo in Microsoft Azure *https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1 *https://www.infoq.com/articles/analytics-zoo-qa-module/
  • 71. Job Recommendation Using Analytics Zoo in Talroo documents (resume, job description) https://software.intel.com/content/www/us/en/develop/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations.html
  • 72. Analytics Zoo: Software Platform for Big Data AI • E2E Big Data & AI pipeline (distributed TF/PyTorch/OpenVINO/Ray on Spark) • Advanced AI workflow (AutoML, Time-Series, Cluster Serving, etc.) Github • Project repo: https://github.com/intel-analytics/analytics-zoo • Use cases: https://analytics-zoo.github.io/master/#powered-by/ Technical paper/tutorials • CVPR 2020 tutorial: https://jason-dai.github.io/cvpr2018/ • ACM SoCC 2019 paper: https://arxiv.org/abs/1804.05839 • AAAI 2019 tutorial: https://jason-dai.github.io/aaai2019/ • CVPR 2018 tutorial: https://jason-dai.github.io/cvpr2018/ Big Data AI
  • 73. Summary INDUSTRY INFLECTIONS ARE FUELING THE GROWTH OF DATA 5G Network Transformation, Artificial Intelligence, Intelligent Edge, Cloudification AI & ANALYTICS ARE THE DEFINING WORKLOADS OF THE NEXT DECADE with growing demand for end-to-end AI pipeline UNMATCHED PORTFOLIO BREADTH AND ECOSYSTEM SUPPORT Intel delivers a silicon & software foundation designed for the diverse range of use cases from the cloud to the edge ANALYTICS ZOO OPEN-SOURCE SOFTWARE PLATFORM FOR BIG DATA AI Simplifies End-to-End Big Data AI pipeline solutions development
  • 75. Notices & Disclaimers • Performance varies by use, configuration and other factors. Learn more at www.intel.com/performanceIndex • Performance may vary based on the specific game title and server configuration. To reference the full list of Intel Server GPU platform measurements, please refer to http://www.intel.com/content/www/us/en/benchmarks/server/graphics/IntelServerGPU • All product plans and roadmaps are subject to change without notice. • Intel technologies may require enabled hardware, software or service activation. • No product or component can be absolutely secure. • Your costs and results may vary. • Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy. • All product plans and roadmaps are subject to change without notice. • © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. Intel Server GPU TCO analysis is based on internal Intel research. Pricing as of 10/01/2020. Analysis assumes standard serving pricing, GPU list pricing, and software pricing based on estimated Nvidia software license costs of $1 per year for 5 years. • Intel Server GPU Performance may vary based on the specific game title and server configuration. To reference the full list of Intel Server GPU platform measurements, please refer to http://www.intel.com/content/www/us/en/benchmarks/server/graphics/IntelServerGPU • Video game footage courtesy of Tencent Games and Gamestream. • LEGO STAR WARS TITLES : © Lucasfilm Entertainment Company Ltd. or Lucasfilm Ltd. & ® or TM as indicated. All rights reserved. • LEGO, the LEGO logo and the Minifigure are trademarks of The LEGO Group. © The LEGO Group. All rights reserved. • “DiRT4”™ : © 2017 The Codemasters Software Company Limited ("Codemasters"). All rights reserved. "Codemasters"®, “EGO”®, the Codemasters logo, and “DiRT”® are registered trademarks owned by Codemasters. “DiRT4”™ and “RaceNet”™ are trademarks of Codemasters. All rights reserved. Under licence from International Management Group (UK) Limited. All other copyrights or trademarks are the property of their respective owners and are being used under license. Developed by Codemasters.