SlideShare a Scribd company logo
Maurice Nsabimana,
World Bank Development Data Group
Jiao (Jennie) Wang,
Big Data Technologies, Intel Corporation
Agenda
• Use Case Overview and Problem Statement
• Solution Architecture
• Dataset, Model Development and Results
• Next Steps
• Call To Action
2
World Bank
The World Bank is a vital source of financial and technical assistance
to developing countries around the world. It is not a bank in the
ordinary sense but a unique partnership to reduce poverty and
support development. The World Bank Group comprises five
institutions managed by their member countries.
Established in 1944, the World Bank Group is headquartered in
Washington, D.C. It has more than 10,000 employees in more than
120 offices worldwide.
The Development Data Group provides high-quality national and
international statistics to clients within and outside the World Bank
and to improve the capacity of member countries to produce and use
statistical information.
4
PROBLEM STATEMENT
The International Comparison Program team in the World Bank Development Data Group
collected crowdsourced images for a pilot data collection study through a privately-
operated network of paid on-the-ground contributors that had access to a smartphone and
a data collection application designed for the pilot.
Nearly 3 million labeled images were collected as ground truth/metadata attached to each
price observation of 162 tightly specified items for a variety of household goods and
services. The use of common item specifications aimed at ensuring the quality, as well as
intra- and inter-country comparability, of the collected data.
Goal is to reduce labor intensive tasks of manually moderating (reviewing, searching and
sorting) the crowd-sourced images before their release as a public image dataset that could
be used to train various deep learning models.
5
Our Challenges
• Crowdsourced images are of different quality (resolution, close-up, blur, etc.)
• Crowdsourced images are not very focused.(Multiple items, item is small, etc.)
• Some images are possibly corrupted (cannot be resized or transformed)
• Images sourced from 15 different countries – different language groups
represented in the text example of images
• Some text is typed, some text is handwritten
6
7
Classifying Real Food Images is Not a Cat vs. Dog
Problem
Project Layout
Phase 1:
• Image preprocessing (eliminate poor quality images and invalid images)
• Classify images (by food type) to validate existing labels
Phase 2:
• Identify texts in the image and make bounding box around them
• Text recognition (words/sentences in the image text)
• Determine whether text contains PII (personally identifiable information)
• Blur areas with PII text
8
A
Analytics Zoo
AWS Cloud
Deep Learning
BigDL
Storage
Databricks File System
Solution Architecture
• Analytics Zoo
• BigDL 0.5.0
• Databricks Spark
• AWS S3
• AWS R4 instances
Databricks Spark
AWS S3
10
What Is BigDL?
BigDL is a distributed deep learning library for Apache Spark*
11
BigDL is Designed for Big Data
• Distributed deep learning framework for Apache Spark*
• Make deep learning more accessible to big data users
and data scientists
• Write deep learning applications as standard Spark programs
• Run on existing Spark/Hadoop clusters (no changes needed)
• Feature parity with popular deep learning frameworks
• E.g., Caffe, Torch, Tensorflow, etc.
• High performance
• Powered by Intel MKL and multi-threaded programming
• Efficient scale-out
• Leveraging Spark for distributed training & inference
12
BigDL as a Standard Spark Program
Distributed Deep learning applications (training, fine-tuning & prediction) on
Apache Spark*
• No changes to the existing Hadoop/Spark clusters needed
13
Model Interoperability Support
• Model Snapshots
• Long training work checkpoint
• Model deployment and sharing
• Fine-tune
• Caffe/Torch/Tensorflow Model Support
• Model file loading
• Easy to migrate your Caffe/Torch/Tensorflow
work to Spark
• NEW - BigDL supports loading pre-defined Keras models (Keras 1.2.2)
Caffe Model
File
Torch Model
File
Storage
BigDL
BigDL Model
File
Load
Save
Tensorflow
Model File
14
BigDL: Python API
• Supports deep learning model training,
evaluation, inference
• Supports Spark v1.5/1.6/2.X
• Supports Python 2.7/3.5/3.6
• Based on PySpark, Python API in BigDL
allows use of existing Python libs
(Numpy, Scipy, Pandas, Scikit-learn,
NLTK, Matplotlib, etc.)
15
Works with Interactive Notebooks
Running BigDL applications directly in Jupyter,
Zeppelin, Databricks notebooks, etc.
✓ Share and Reproduce
– Notebooks can be shared with others
– Easy to reproduce and track
✓ Rich Content
– Texts, images, videos, LaTeX and JavaScript
– Code can also produce rich contents
✓ Rich toolbox
– Apache Spark, from Python, R and Scala
– Pandas, scikit-learn, ggplot2, dplyr, etc.
16
Visualization for Learning
BigDL integration with TensorBoard
• TensorBoard is a suite of web applications from Google for visualizing and understanding deep
learning applications
17
BigDL Analytics Zoo
Analytics + AI Pipelines for Spark and BigDL
“Out-of-the-box” ready for use
• Reference use cases
• Fraud detection, time series prediction, sentiment analysis, chatbot, etc.
• Predefined models
• Object detection, image classification, text classification, recommendations, etc.
• Feature transformations
• Vision, text, 3D imaging, etc.
• High level APIs
• DataFrames, ML Pipelines, Keras/Keras2, etc.
18
Model Development - Phase 1
20
Transfer learning/Fine tuning from pre-trained Inception model to do classification
• Load pre-trained Inception-v1 model to BigDL
• Add FC layer with SoftMax classifier
• Train on partial dataset with pre-trained weights using Analytics Zoo on Spark Cluster
• Scale training on multi-node cluster in AWS Databricks to train large food dataset (1M
images)
Model Development - Phase 1
21
F
C
F
C
L
o
g
s
o
f
t
m
a
x
Inception v1 Customized Classifier
Code - Phase 1
22
Transfer learning Train Predict and Evaluation
Code - Phase 1
23
Fine tune Train Predict and Evaluation
Results - Phase 1
24
Training from scratch vs Transfer Learning vs Fine Tuning (on a partial dataset)
Dataset: 1927 images, 9 categories
Epoch Training Time(s) Accuracy
Training from scratch 40 1266 23.9
Transfer Learning 40 210 65.4
Fine Tuning 15 489 81.5
* Accuracy numbers are in that range due to a small part of dataset being used
Results - Phase 1
25
Fine tune with inception v1 on Food dataset
Dataset: 994325 images, 69 categories, 1TB
Nodes Cores Batch Size Epoch Training
Time(s)
Throughput Accuracy
20 30 1200 12 61125 170 81.7
* This model training was performed using multinode cluster on AWS R4.8xlarge instance with 20 nodes
Model vCPUs Memory (GiB)
Networking
Performance
r4.xlarge 4 30.5 Up to 10 Gigabit
Scaling Results – Phase 1
Nodes BatchSize Epoch
Throughp
ut
Training
Time
8 256 20 56.7 745.6
16 256 20 99.6 424.7
8 128 20 55.3 764.9
16 128 20 80.7 524.4
26
56.7
99.6
0
20
40
60
80
100
120
8 16
Throughput
nodes nodes
Test was run on AWS R4 instance that include the following
features:
• dual socket Intel Xeon E5 Broadwell processors (2.3 GHz)
• DDR4 memory
• Hardware Virtualization (HVM) only
Next Steps – Phase 2
27
• Image Quality Preprocessing
• Filter print text only
• Rescaling, Binarization, Noise Removal, Rotation / Deskewing (OpenCV,
Python, etc.)
• Detect text and bounding box circle text
• Reference model:
• Object Detection (Fast-RCNN, SSD)
• Tesseract (https://github.com/tesseract-ocr/tesseract)
• CTPN (https://github.com/tianzhi0549/CTPN)
Next Steps – Phase 2
28
• Recognize text
• Tesseract(https://github.com/tesseract-ocr/tesseract)
• CRNN (https://github.com/bgshih/crnn)
• Determine whether text contains PII (personally identifiable
information)
• Recognize PII with leading words (RNN)
• Blur areas with PII text
• Image tools
30
Key Takeaways
• AI on Apache Spark is a reality with use cases like World Bank
• BigDL makes distributed deep learning and AI more accessible both for big data users
and data scientists
• Analytics Zoo provides high level APIs that are more accessible to Data Scientists and
Spark Users (Keras, Spark DataFrame, etc.)
• BigDL can leverage existing Spark/Hadoop infrastructure and also runs deep learning
applications in the cloud (AWS, Azure, GCP, …)
• Join in and contribute to the project:
• Analytics Zoo: github.com/intel-analytics/analytics-zoo
• BigDL: github.com/intel-analytics/BigDL
Call to Action
• Try BigDL on AWS - lookup BigDL AMI in AWS Marketplace
31
• Try image classification with Analytics Zoo using BigDL– this use case code
is shared on https://github.com/intel-analytics/WorldBankPoC
Resources
https://github.com/intel-
analytics/analytics-zoo
https://github.com/intel-analytics/BigDL
https://software.intel.com/bigdl
Join BigDL Mail List
bigdl-user-group+subscribe@googlegroups.com
Report Bugs and Create Feature Request
https://github.com/intel-analytics/BigDL/issues 32
Thank you
Special thanks to:
-Vegnonveg contributors (Dave Nielsen, Ashley Zhao, Alex Kalinin, Sujee Maniyam, Rashim
Khadka & Goutham Nekkalapu)
-BigDL user group
-Amazon AWS team
-Databricks 33
Alex Kalinin
Maniyam
Maniyam
Maniyam
Notices and Disclaimers
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such
as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of
those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more complete information visit:
http://www.intel.com/performance.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect
actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information
about performance and benchmark results, visit www.intel.com/benchmarks.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation.
Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or
retailer or learn more at intel.com.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications.
Current characterized errata are available on request.
Intel, the Intel logo, Xeon, Xeon Phi and Nervana are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© 2018 Intel Corporation. All rights reserved.
35
The findings, interpretations, and conclusions expressed in this document do not necessarily reflect the views of the World Bank Group.

More Related Content

What's hot

PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
Travis Oliphant
 
PyData Introduction
PyData IntroductionPyData Introduction
PyData Introduction
Travis Oliphant
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
Odinot Stanislas
 
Know thy logos
Know thy logosKnow thy logos
Know thy logos
Vishal V
 
Notes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop MapreduceNotes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop Mapreduce
Evert Lammerts
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
Mk Kim
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TES
SysFera
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
MLconf
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Saliya Ekanayake
 
Mahout Introduction BarCampDC
Mahout Introduction BarCampDCMahout Introduction BarCampDC
Mahout Introduction BarCampDC
Drew Farris
 
Hadoop
HadoopHadoop
Hadoop
Anil Reddy
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
Greg Landrum
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Mahantesh Angadi
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the Desktop
Marcus Hanwell
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
Bart Vandewoestyne
 
Scaling hadoopapplications
Scaling hadoopapplicationsScaling hadoopapplications
Scaling hadoopapplications
Milind Bhandarkar
 
BigData HUB Workshop
BigData HUB WorkshopBigData HUB Workshop
BigData HUB Workshop
Ahmed Salman
 
Social Networks Analysis
Social Networks AnalysisSocial Networks Analysis
Social Networks Analysis
Joud Khattab
 
Large-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with HadoopLarge-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with Hadoop
Evert Lammerts
 

What's hot (20)

PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
PyData Introduction
PyData IntroductionPyData Introduction
PyData Introduction
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
Know thy logos
Know thy logosKnow thy logos
Know thy logos
 
Notes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop MapreduceNotes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop Mapreduce
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TES
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
Mahout Introduction BarCampDC
Mahout Introduction BarCampDCMahout Introduction BarCampDC
Mahout Introduction BarCampDC
 
Hadoop
HadoopHadoop
Hadoop
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the Desktop
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Scaling hadoopapplications
Scaling hadoopapplicationsScaling hadoopapplications
Scaling hadoopapplications
 
BigData HUB Workshop
BigData HUB WorkshopBigData HUB Workshop
BigData HUB Workshop
 
Social Networks Analysis
Social Networks AnalysisSocial Networks Analysis
Social Networks Analysis
 
Large-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with HadoopLarge-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with Hadoop
 

Similar to Using Crowdsourced Images to Create Image Recognition Models with Analytics Zoo using BigDL

Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Databricks
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
Amazon Web Services
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
GDSC Cloud Jam.pptx
GDSC Cloud Jam.pptxGDSC Cloud Jam.pptx
GDSC Cloud Jam.pptx
GDSCIITBhilai
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchange
Nick Pentreath
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Hellmar Becker
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Databricks
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
Bill Liu
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
EDB
 
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr..."Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
Edge AI and Vision Alliance
 
Python Open CV
Python Open CVPython Open CV
Python Open CV
Tarun Bamba
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
jixuan1989
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
Mihai Criveti
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Databricks
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
DataWorks Summit
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
Stratebi
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
Big Data Value Association
 
SCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation EnvironmentsSCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation Environments
SCAPE Project
 

Similar to Using Crowdsourced Images to Create Image Recognition Models with Analytics Zoo using BigDL (20)

Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
GDSC Cloud Jam.pptx
GDSC Cloud Jam.pptxGDSC Cloud Jam.pptx
GDSC Cloud Jam.pptx
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchange
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
 
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr..."Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
 
Python Open CV
Python Open CVPython Open CV
Python Open CV
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
SCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation EnvironmentsSCAPE - Scalable Preservation Environments
SCAPE - Scalable Preservation Environments
 

Recently uploaded

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 

Recently uploaded (20)

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 

Using Crowdsourced Images to Create Image Recognition Models with Analytics Zoo using BigDL

  • 1. Maurice Nsabimana, World Bank Development Data Group Jiao (Jennie) Wang, Big Data Technologies, Intel Corporation
  • 2. Agenda • Use Case Overview and Problem Statement • Solution Architecture • Dataset, Model Development and Results • Next Steps • Call To Action 2
  • 3.
  • 4. World Bank The World Bank is a vital source of financial and technical assistance to developing countries around the world. It is not a bank in the ordinary sense but a unique partnership to reduce poverty and support development. The World Bank Group comprises five institutions managed by their member countries. Established in 1944, the World Bank Group is headquartered in Washington, D.C. It has more than 10,000 employees in more than 120 offices worldwide. The Development Data Group provides high-quality national and international statistics to clients within and outside the World Bank and to improve the capacity of member countries to produce and use statistical information. 4
  • 5. PROBLEM STATEMENT The International Comparison Program team in the World Bank Development Data Group collected crowdsourced images for a pilot data collection study through a privately- operated network of paid on-the-ground contributors that had access to a smartphone and a data collection application designed for the pilot. Nearly 3 million labeled images were collected as ground truth/metadata attached to each price observation of 162 tightly specified items for a variety of household goods and services. The use of common item specifications aimed at ensuring the quality, as well as intra- and inter-country comparability, of the collected data. Goal is to reduce labor intensive tasks of manually moderating (reviewing, searching and sorting) the crowd-sourced images before their release as a public image dataset that could be used to train various deep learning models. 5
  • 6. Our Challenges • Crowdsourced images are of different quality (resolution, close-up, blur, etc.) • Crowdsourced images are not very focused.(Multiple items, item is small, etc.) • Some images are possibly corrupted (cannot be resized or transformed) • Images sourced from 15 different countries – different language groups represented in the text example of images • Some text is typed, some text is handwritten 6
  • 7. 7 Classifying Real Food Images is Not a Cat vs. Dog Problem
  • 8. Project Layout Phase 1: • Image preprocessing (eliminate poor quality images and invalid images) • Classify images (by food type) to validate existing labels Phase 2: • Identify texts in the image and make bounding box around them • Text recognition (words/sentences in the image text) • Determine whether text contains PII (personally identifiable information) • Blur areas with PII text 8
  • 9.
  • 10. A Analytics Zoo AWS Cloud Deep Learning BigDL Storage Databricks File System Solution Architecture • Analytics Zoo • BigDL 0.5.0 • Databricks Spark • AWS S3 • AWS R4 instances Databricks Spark AWS S3 10
  • 11. What Is BigDL? BigDL is a distributed deep learning library for Apache Spark* 11
  • 12. BigDL is Designed for Big Data • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists • Write deep learning applications as standard Spark programs • Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks • E.g., Caffe, Torch, Tensorflow, etc. • High performance • Powered by Intel MKL and multi-threaded programming • Efficient scale-out • Leveraging Spark for distributed training & inference 12
  • 13. BigDL as a Standard Spark Program Distributed Deep learning applications (training, fine-tuning & prediction) on Apache Spark* • No changes to the existing Hadoop/Spark clusters needed 13
  • 14. Model Interoperability Support • Model Snapshots • Long training work checkpoint • Model deployment and sharing • Fine-tune • Caffe/Torch/Tensorflow Model Support • Model file loading • Easy to migrate your Caffe/Torch/Tensorflow work to Spark • NEW - BigDL supports loading pre-defined Keras models (Keras 1.2.2) Caffe Model File Torch Model File Storage BigDL BigDL Model File Load Save Tensorflow Model File 14
  • 15. BigDL: Python API • Supports deep learning model training, evaluation, inference • Supports Spark v1.5/1.6/2.X • Supports Python 2.7/3.5/3.6 • Based on PySpark, Python API in BigDL allows use of existing Python libs (Numpy, Scipy, Pandas, Scikit-learn, NLTK, Matplotlib, etc.) 15
  • 16. Works with Interactive Notebooks Running BigDL applications directly in Jupyter, Zeppelin, Databricks notebooks, etc. ✓ Share and Reproduce – Notebooks can be shared with others – Easy to reproduce and track ✓ Rich Content – Texts, images, videos, LaTeX and JavaScript – Code can also produce rich contents ✓ Rich toolbox – Apache Spark, from Python, R and Scala – Pandas, scikit-learn, ggplot2, dplyr, etc. 16
  • 17. Visualization for Learning BigDL integration with TensorBoard • TensorBoard is a suite of web applications from Google for visualizing and understanding deep learning applications 17
  • 18. BigDL Analytics Zoo Analytics + AI Pipelines for Spark and BigDL “Out-of-the-box” ready for use • Reference use cases • Fraud detection, time series prediction, sentiment analysis, chatbot, etc. • Predefined models • Object detection, image classification, text classification, recommendations, etc. • Feature transformations • Vision, text, 3D imaging, etc. • High level APIs • DataFrames, ML Pipelines, Keras/Keras2, etc. 18
  • 19.
  • 20. Model Development - Phase 1 20 Transfer learning/Fine tuning from pre-trained Inception model to do classification • Load pre-trained Inception-v1 model to BigDL • Add FC layer with SoftMax classifier • Train on partial dataset with pre-trained weights using Analytics Zoo on Spark Cluster • Scale training on multi-node cluster in AWS Databricks to train large food dataset (1M images)
  • 21. Model Development - Phase 1 21 F C F C L o g s o f t m a x Inception v1 Customized Classifier
  • 22. Code - Phase 1 22 Transfer learning Train Predict and Evaluation
  • 23. Code - Phase 1 23 Fine tune Train Predict and Evaluation
  • 24. Results - Phase 1 24 Training from scratch vs Transfer Learning vs Fine Tuning (on a partial dataset) Dataset: 1927 images, 9 categories Epoch Training Time(s) Accuracy Training from scratch 40 1266 23.9 Transfer Learning 40 210 65.4 Fine Tuning 15 489 81.5 * Accuracy numbers are in that range due to a small part of dataset being used
  • 25. Results - Phase 1 25 Fine tune with inception v1 on Food dataset Dataset: 994325 images, 69 categories, 1TB Nodes Cores Batch Size Epoch Training Time(s) Throughput Accuracy 20 30 1200 12 61125 170 81.7 * This model training was performed using multinode cluster on AWS R4.8xlarge instance with 20 nodes
  • 26. Model vCPUs Memory (GiB) Networking Performance r4.xlarge 4 30.5 Up to 10 Gigabit Scaling Results – Phase 1 Nodes BatchSize Epoch Throughp ut Training Time 8 256 20 56.7 745.6 16 256 20 99.6 424.7 8 128 20 55.3 764.9 16 128 20 80.7 524.4 26 56.7 99.6 0 20 40 60 80 100 120 8 16 Throughput nodes nodes Test was run on AWS R4 instance that include the following features: • dual socket Intel Xeon E5 Broadwell processors (2.3 GHz) • DDR4 memory • Hardware Virtualization (HVM) only
  • 27. Next Steps – Phase 2 27 • Image Quality Preprocessing • Filter print text only • Rescaling, Binarization, Noise Removal, Rotation / Deskewing (OpenCV, Python, etc.) • Detect text and bounding box circle text • Reference model: • Object Detection (Fast-RCNN, SSD) • Tesseract (https://github.com/tesseract-ocr/tesseract) • CTPN (https://github.com/tianzhi0549/CTPN)
  • 28. Next Steps – Phase 2 28 • Recognize text • Tesseract(https://github.com/tesseract-ocr/tesseract) • CRNN (https://github.com/bgshih/crnn) • Determine whether text contains PII (personally identifiable information) • Recognize PII with leading words (RNN) • Blur areas with PII text • Image tools
  • 29.
  • 30. 30 Key Takeaways • AI on Apache Spark is a reality with use cases like World Bank • BigDL makes distributed deep learning and AI more accessible both for big data users and data scientists • Analytics Zoo provides high level APIs that are more accessible to Data Scientists and Spark Users (Keras, Spark DataFrame, etc.) • BigDL can leverage existing Spark/Hadoop infrastructure and also runs deep learning applications in the cloud (AWS, Azure, GCP, …) • Join in and contribute to the project: • Analytics Zoo: github.com/intel-analytics/analytics-zoo • BigDL: github.com/intel-analytics/BigDL
  • 31. Call to Action • Try BigDL on AWS - lookup BigDL AMI in AWS Marketplace 31 • Try image classification with Analytics Zoo using BigDL– this use case code is shared on https://github.com/intel-analytics/WorldBankPoC
  • 32. Resources https://github.com/intel- analytics/analytics-zoo https://github.com/intel-analytics/BigDL https://software.intel.com/bigdl Join BigDL Mail List bigdl-user-group+subscribe@googlegroups.com Report Bugs and Create Feature Request https://github.com/intel-analytics/BigDL/issues 32
  • 33. Thank you Special thanks to: -Vegnonveg contributors (Dave Nielsen, Ashley Zhao, Alex Kalinin, Sujee Maniyam, Rashim Khadka & Goutham Nekkalapu) -BigDL user group -Amazon AWS team -Databricks 33 Alex Kalinin Maniyam Maniyam Maniyam
  • 34.
  • 35. Notices and Disclaimers Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel, the Intel logo, Xeon, Xeon Phi and Nervana are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others © 2018 Intel Corporation. All rights reserved. 35 The findings, interpretations, and conclusions expressed in this document do not necessarily reflect the views of the World Bank Group.