This document discusses optimizations for running TensorFlow deep-learning workloads on Intel CPUs. It outlines techniques for compiling TensorFlow from source with CPU-specific optimizations, choosing appropriate data formats and batch sizes, and reading input data with queues so that all cores of a multi-core CPU stay busy. It also covers distributed deep learning with TensorFlow Estimators and parameter servers, and model parallelism for splitting a graph across multiple machines. Resources for further reading on Intel's optimizations, library installation, and distributed TensorFlow are provided.
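As a minimal sketch of the parameter-server setup mentioned above: distributed TensorFlow jobs are described by a cluster specification that assigns tasks to `ps` and `worker` roles, and when Estimators are used this is conventionally passed to each process via the `TF_CONFIG` environment variable. The host names and ports below are hypothetical placeholders, not values from the document.

```python
import json
import os

# Hypothetical cluster: two parameter servers hold the model variables,
# two workers compute gradients. Hosts and ports are placeholders.
cluster = {
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}

# Each process learns its own role from TF_CONFIG; this one is worker 0.
tf_config = {
    "cluster": cluster,
    "task": {"type": "worker", "index": 0},
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)
```

At startup, an Estimator-based job reads `TF_CONFIG` to decide whether the current process should serve variables (a `ps` task) or run training steps (a `worker` task), which is how one training graph is spread across several machines.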