TensorFlow
Internal
Hyunghun Cho
(webofthink@snu.ac.kr)
Overview
■ Dataflow-like model
■ Runs on a wide variety of H/W platforms
※ Source: tensorflow.org
※ Source: github.com/zer0n/deepframeworks
Basic concepts
■ Tensor
– general definition: an array with more than two axes
– in T/F: a typed array of arbitrary dimensionality
■ Directed graph describes T/F computation
– node: instantiation of an Operation
■ Operation
– an abstract computation
– has attribute(s)
■ Kernel
– a particular implementation of an Operation
– runs on a particular type of device (e.g. CPU, GPU)
■ Variable
– a special Operation that returns a handle to a persistent, mutable Tensor
■ Session
– Created to interact with T/F system
[Figure: a node with 0…* inputs (in) and 0…* outputs (out)]
※ Source: T/F white paper
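The concepts on this slide can be tied together with a small toy model. The following is a minimal sketch in plain Python, not TensorFlow's API: the names `Node`, `KERNELS`, `Session`, and the `AssignAdd` operation are hypothetical and chosen only for illustration. It shows a graph of nodes, kernels selected by operation and device type, and a Variable whose value survives across `run()` calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """One instantiation of an operation in the graph (hypothetical toy type)."""
    op: str                                            # abstract operation name
    inputs: List[str] = field(default_factory=list)    # names of upstream nodes
    attrs: Dict[str, object] = field(default_factory=dict)


# Kernel registry: (operation name, device type) -> concrete implementation.
KERNELS: Dict[Tuple[str, str], Callable] = {
    ("Const", "CPU"): lambda attrs: attrs["value"],
    ("Add", "CPU"): lambda attrs, a, b: a + b,
}


class Session:
    """Evaluates nodes on one device type and owns the variable state."""

    def __init__(self, graph: Dict[str, Node], device: str = "CPU"):
        self.graph = graph
        self.device = device
        self.variables: Dict[str, object] = {}  # persists across run() calls

    def run(self, fetch: str):
        cache: Dict[str, object] = {}  # per-run results, discarded afterwards

        def evaluate(name: str):
            if name in cache:
                return cache[name]
            node = self.graph[name]
            if node.op == "Variable":
                # Initialise on first use, then keep returning the stored value.
                value = self.variables.setdefault(name, node.attrs["initial"])
            elif node.op == "AssignAdd":
                target, delta = node.inputs
                value = evaluate(target) + evaluate(delta)
                self.variables[target] = value  # mutate the persistent state
            else:
                args = [evaluate(i) for i in node.inputs]
                value = KERNELS[(node.op, self.device)](node.attrs, *args)
            cache[name] = value
            return value

        return evaluate(fetch)


graph = {
    "counter": Node("Variable", attrs={"initial": 0}),
    "one": Node("Const", attrs={"value": 1}),
    "inc": Node("AssignAdd", inputs=["counter", "one"]),
}

sess = Session(graph)
first = sess.run("inc")
second = sess.run("inc")
current = sess.run("counter")
```

Running `inc` twice and then fetching `counter` shows the point of a Variable: ordinary node outputs live only for one `run()` call, while the variable state is kept by the session.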
Programming Model
■ Example T/F code and corresponding computation graph
■ Single machine and distributed system architecture
※ Source: T/F white paper
Previous work
■ DistBelief
– Downpour SGD
– Sandblaster L-BFGS
■ Related to
– Project Adam (MSR)
– Parameter Server project
※ Source: Large Scale Distributed Deep Networks
※ Source: parameter server architecture github wiki
※ Source: Project Adam paper
Feature Comparison
Feature                      TensorFlow  Theano  Torch  Caffe  Chainer  CNTK
Run on single machine        O           O       O      O      O        O
Run on distributed machines  O           X       X      X      X        O
Symbolic differentiation     O           O       X      X      O        X
Implemented in C++           O           X       X      O      X        X
※ Source: T/F white paper
■ For detail, refer to Wikipedia
Execution Mode
■ Single Device
■ Multi Device
– Node placement
– Cross-Device Communication
■ Distributed
– Fault Tolerance
• Error detection in communication between a Send/Receive node pair
• Periodic health checks of worker processes
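Cross-device communication can be pictured as a graph rewrite that runs after node placement. The sketch below is a toy illustration in plain Python, not TensorFlow code: the graph encoding, the device strings, and the `send/…` and `recv/…` naming scheme are assumptions made for this example. Every edge whose endpoints sit on different devices is replaced by a Send node on the producer's device and a Recv node on the consumer's device.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: node name -> (device chosen by placement, input names).
placed_graph: Dict[str, Tuple[str, List[str]]] = {
    "x": ("CPU:0", []),
    "w": ("GPU:0", []),
    "matmul": ("GPU:0", ["x", "w"]),
    "loss": ("CPU:0", ["matmul"]),
}


def insert_send_recv(graph):
    """Return a new graph where each cross-device edge goes through a Send/Recv pair."""
    rewritten = {}
    for name, (device, inputs) in graph.items():
        new_inputs = []
        for src in inputs:
            src_device = graph[src][0]
            if src_device == device:
                new_inputs.append(src)  # same device: keep the edge as it is
                continue
            send = f"send/{src}->{device}"
            recv = f"recv/{src}@{device}"
            # Send lives on the producer's device, Recv on the consumer's device.
            rewritten[send] = (src_device, [src])
            rewritten[recv] = (device, [send])
            new_inputs.append(recv)
        rewritten[name] = (device, new_inputs)
    return rewritten


rewritten = insert_send_recv(placed_graph)
for name, (device, inputs) in rewritten.items():
    print(f"{name:24s} on {device}  inputs={inputs}")
```

Because the Send/Recv names depend only on the source node and the destination device, several consumers of the same tensor on one device share a single pair in this sketch, so the value is transferred once per destination device.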
Programming Idioms
■ Programming Idioms
– Data Parallel Training
• sequential SGD
– Model Parallel Training
• Recurrent deep LSTM
– Concurrent Steps
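As a rough illustration of data parallel training, the sketch below runs synchronous data-parallel SGD on a one-parameter model in plain Python. The model `y = w * x`, the toy data, and the function names are assumptions for this example. With equally sized shards, averaging the per-replica gradients gives the same update as one sequential SGD step on the whole batch.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y)


def gradient(w: float, batch: List[Example]) -> float:
    """Gradient of the mean squared error of the model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w: float, batch: List[Example], replicas: int, lr: float) -> float:
    """One synchronous step: shard the batch, average replica gradients, update w."""
    shard_size = len(batch) // replicas
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(replicas)]
    # On real hardware each shard's gradient would be computed on its own device.
    grads = [gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / replicas


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data generated with w = 3

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, replicas=4, lr=0.01)

sequential_step = 0.0 - 0.01 * gradient(0.0, data)          # one step on the full batch
parallel_step = data_parallel_step(0.0, data, replicas=4, lr=0.01)
```

The equivalence only holds exactly when the batch size is divisible by the number of replicas; otherwise the shards would need to be weighted by their sizes.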
Code Metrics
■ Source
– https://github.com/tensorflow/tensorflow
■ Code Summary
– Total 114MB
• 3373 files including C/C++, Python, HTML, …
– Top 5 languages for implementation
• C++ and Python are the major languages
• Protocol Buffers: provides a mechanism for serializing structured data
language          files  blank  comment    code
C++                1092  46473    43399  276160
C/C++ Header        779  23457    44727   86274
Python              641  27622    46660   97570
Protocol Buffers    179   2217     7294    8724
Java                167   8296    17325   49374
C#                  116   4285     8653   34347
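The blank / comment / code columns are the kind of figures a line-counting tool reports; the slides do not name the tool that was used. As a minimal sketch of how such counts can be derived, the hypothetical `count_lines` helper below classifies lines and handles only single-line comments with one prefix, which is far simpler than what a real counter does.

```python
from typing import Dict, Iterable


def count_lines(lines: Iterable[str], comment_prefix: str = "#") -> Dict[str, int]:
    """Classify each line as blank, comment, or code (single-line comments only)."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


sample = [
    "# add two numbers",
    "def add(a, b):",
    "    return a + b",
    "",
    "print(add(1, 2))",
]
counts = count_lines(sample)
```

For C++ sources the same function could be called with `comment_prefix="//"`, although block comments would then be miscounted as code.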
How it works
■ Python-C++ connection with SWIG wrapper
[Figure: code excerpts from tensorflow.i, py_func.i, py_func.h, and py_func.cc]
Code Structure
■ C++ implementation under /core folder
Folder                                      C/C++ Header      C++  Protocol Buffers   Total
./tensorflow/core/client/                              -      511                 -     511
./tensorflow/core/common_runtime/                   1384     8526                 -    9910
./tensorflow/core/common_runtime/gpu/                644     3674                 -    4318
./tensorflow/core/distributed_runtime/               581     2579                 -    3160
./tensorflow/core/distributed_runtime/rpc/           434     2759                 -    3193
./tensorflow/core/example/                           116      209                45     370
./tensorflow/core/framework/                        3539    14022               451   18012
./tensorflow/core/graph/                             952     5586                 -    6538
./tensorflow/core/kernels/                          9180    42188                11   51379
./tensorflow/core/lib/core/                          573     1240                25    1838
./tensorflow/core/lib/gtl/                          1452     1943                 -    3395
./tensorflow/core/lib/hash/                           36      400                 -     436
./tensorflow/core/lib/histogram/                      60      324                 -     384
./tensorflow/core/lib/io/                            340     2134                 -    2474
./tensorflow/core/lib/jpeg/                           78      767                 -     845
./tensorflow/core/lib/png/                            37      311                 -     348
./tensorflow/core/lib/random/                        690      856                 -    1546
./tensorflow/core/lib/strings/                       532     3111                 -    3643
./tensorflow/core/lib/wav/                            13      166                 -     179
./tensorflow/core/ops/                                 -     9346                 -    9346
./tensorflow/core/ops/compat/                         25      204                 -     229
./tensorflow/core/platform/                          805      738                 -    1543
./tensorflow/core/platform/default/                  349      290                 -     639
./tensorflow/core/platform/posix/                     31      656                 -     687
./tensorflow/core/protobuf/                            -        -               333     333
./tensorflow/core/public/                            202        -                 -     202
./tensorflow/core/user_ops/                            -       20                 -      20
./tensorflow/core/util/                             1354     4426               170    5950
./tensorflow/core/util/ctc/                          600      298                 -     898
./tensorflow/core/util/sparse/                       504      498                 -    1002
Total                                              24511   107782              1035  133328
C++ framework
■ Key classes
C++ kernels
■ Inherit from OpKernel
■ A kernel is implemented separately for each device type (CPU / GPU)
– the GPU version uses the CUDA library
[constant_op.h]
[constant_op.cc]
[constant_op_gpu.cu.cc]
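The pattern of one base class plus one implementation per device can be sketched in a few lines. In TensorFlow the kernels are C++ classes derived from OpKernel; the Python below is only a toy analogy, and `register_kernel`, `create_kernel`, the `compute` method, and the `Sum` operation are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List, Tuple, Type


class OpKernel:
    """Toy base class standing in for the C++ OpKernel base class."""

    def compute(self, inputs: List[float]) -> float:
        raise NotImplementedError


# (operation name, device type) -> kernel class
_REGISTRY: Dict[Tuple[str, str], Type[OpKernel]] = {}


def register_kernel(op: str, device: str) -> Callable[[Type[OpKernel]], Type[OpKernel]]:
    """Class decorator that records a kernel for one operation on one device type."""
    def decorator(cls: Type[OpKernel]) -> Type[OpKernel]:
        _REGISTRY[(op, device)] = cls
        return cls
    return decorator


@register_kernel("Sum", "CPU")
class SumCpuKernel(OpKernel):
    def compute(self, inputs: List[float]) -> float:
        return sum(inputs)


@register_kernel("Sum", "GPU")
class SumGpuKernel(OpKernel):
    def compute(self, inputs: List[float]) -> float:
        # Stand-in for a CUDA implementation: a pairwise reduction done on the host.
        values = list(inputs)
        while len(values) > 1:
            values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
        return values[0] if values else 0.0


def create_kernel(op: str, device: str) -> OpKernel:
    """Look up and instantiate the kernel registered for this operation and device."""
    return _REGISTRY[(op, device)]()


cpu_result = create_kernel("Sum", "CPU").compute([1.0, 2.0, 3.0, 4.0])
gpu_result = create_kernel("Sum", "GPU").compute([1.0, 2.0, 3.0, 4.0])
```

The caller asks for an operation and a device type and never names a concrete class, which is what lets the same graph run on whichever devices have a kernel registered.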
Code Structure
■ Python implementation under /python folder
Folder                                  C/C++ Header     C++  Protocol Buffers  Python   Total
./tensorflow/python/                               -       -                 -     168     168
./tensorflow/python/client/                       33     475                 -    2031    2539
./tensorflow/python/framework/                    13     686                 -    7097    7796
./tensorflow/python/kernel_tests/                  -       -                 -   25391   25391
./tensorflow/python/lib/core/                     26     316                 -       -     342
./tensorflow/python/lib/io/                       52      75                 -      31     158
./tensorflow/python/ops/                           -       -                 -   14995   14995
./tensorflow/python/platform/                      -       -                 -     888     888
./tensorflow/python/platform/default/              -       -                 -     389     389
./tensorflow/python/summary/                       -       -                 -    1168    1168
./tensorflow/python/summary/impl/                  -       -                 -     693     693
./tensorflow/python/tools/                         -       -                 -     280     280
./tensorflow/python/training/                      -       -                 6    7732    7738
./tensorflow/python/user_ops/                      -       -                 -       7       7
./tensorflow/python/util/                          -       -                 -      51      51
Total                                            124    1552                 6   60921   62603
Python Implementation
■ Operations
■ Training
Code Summary
■ The Python part
– Various operations and training routines
– API:
• the most complete and the easiest to use
■ The C++ part
– Framework and kernel functions
– API:
• offers some performance advantages
• supports deployment to small devices such as Android
Meta Framework
■ Keras
■ TensorFlow Slim
– a lightweight library for defining, training and evaluating models
■ Skflow
– provides a scikit-learn style API
■ PrettyTensor
– supports a chainable object syntax for quickly defining neural networks
■ TFLearn
– a modular and transparent deep learning library