ITRI CONFIDENTIAL DOCUMENT: DO NOT COPY OR DISTRIBUTE
TensorFlow Study (Part I)
劉得彥
Danny Liu
Information and Communications Research Laboratories (ICL)
Tensor
• A tensor is an n-dimensional structure
▪ n=0 : Single Value
▪ n=1 : List of Values
▪ n=2 : Matrix of values
▪ n=3 : Cube of values
▪ …
• The Tensor:
▪ Data is in TensorBuffer as an Eigen::Tensor
▪ Shape definition is in TensorShape
▪ Reference counting is in RefCounted
https://www.slideshare.net/EdurekaIN/introduction-to-tensorflow-deep-learning-using-tensorflow-tensorflow-tutorial-edureka
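The rank list above can be illustrated without TensorFlow. The following is a minimal sketch that uses nested Python lists as stand-ins for tensors; `shape_of` is a hypothetical helper, not a TensorFlow function, and it assumes the nesting is rectangular.

```python
def shape_of(value):
    """Return the shape of a nested-list 'tensor' as a tuple.

    Assumes rectangular nesting (every sub-list at a level has the same length).
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


scalar = 3.0                                              # n=0: single value
vector = [1.0, 2.0, 3.0]                                  # n=1: list of values
matrix = [[1.0, 0.0], [0.0, -1.0]]                        # n=2: matrix of values
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # n=3: cube of values

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("cube", cube)]:
    shape = shape_of(tensor)
    print(name, "rank:", len(shape), "shape:", shape)
```

The rank n is simply the length of the shape tuple, which is the information the slide attributes to TensorShape.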
Operation
• Here is a customized Sigmoid op example:
▪ Input Tensors: x = tf.constant([[1.0, 0.0], [0.0, -1.0]])
▪ Output Tensors: y = cpp_con_sigmoid.cpp_con_sigmoid(x)
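As a rough reference for what such an op might return, here is a sketch that applies the standard logistic sigmoid to the same input. This assumes the custom op computes a plain element-wise sigmoid; the slide does not say what the customization is, so the numbers are illustrative only.

```python
import math


def sigmoid(v):
    """Standard logistic function, assumed here to match the custom op."""
    return 1.0 / (1.0 + math.exp(-v))


# Same values as the tf.constant input on the slide.
x = [[1.0, 0.0], [0.0, -1.0]]

# Element-wise application, row by row.
y = [[sigmoid(v) for v in row] for row in x]
print(y)
```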
Operation
• Register the op definition with the kernel builder
Class CppConSigmoid
Expressing: Ops
http://public.kevinrobinsonblog.com/docs/A%20tour%20through%20the%20TensorFlow%20codebase%20-%20v4.pdf
Build a Computation Graph
• Tensors flow between operations
https://machinelearningblogs.com/2017/09/07/tensorflow-tutorial-part-1-introduction/
Tensor
Operator
Computation Graph
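To make the tensor / operator / graph relationship concrete, here is a minimal sketch of a computation graph in plain Python. The `Node` class and the `OPS` table are hypothetical illustrations, not TensorFlow types; values flow from constant nodes through operator nodes to the result.

```python
# Table of operators the toy graph understands (hypothetical, for illustration).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


class Node:
    """One graph node: either a constant or an operator over input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value

    def eval(self):
        # Constants hold their value; operators pull values from their inputs.
        if self.op == "const":
            return self.value
        args = [node.eval() for node in self.inputs]
        return OPS[self.op](*args)


a = Node("const", value=2.0)
b = Node("const", value=3.0)
c = Node("const", value=4.0)

# Graph for (a * b) + c: building it computes nothing, eval() runs it.
result = Node("add", [Node("mul", [a, b]), c])
print(result.eval())  # 10.0
```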
TensorFlow Framework
[Diagram: TensorFlow software stack]
• Applications: Python applications; C++ applications; applications using C, Golang, …
• Python APIs: Layers, Estimator, Keras, Canned Estimator
• C++ APIs
• swig, between the language APIs and the C APIs
• C APIs (tensorflow/c/c_api.h)
• TensorFlow Core libraries (C++): core, runtime, graph, grappler, ops, kernels, …
• Diagram note: limited functions, do inference (ongoing)
Build a Computation Graph
• The TF Graph is the computation graph held in the session (DirectSession)
[Diagram: graph construction paths]
• Python APIs: tf.matmul(a, b) goes through swig to the C APIs; see _create_c_op(graph, node_def, inputs, control_inputs)
• C++ APIs: MatMul(scope, a, b)
• TF Graph → GraphDef (protobuf text): convert to GraphDef with Graph::ToGraphDef()
• GraphDef (protobuf text) → Graph (nodes and edges): convert to Graph with ConvertGraphDefToGraph()
• Serialization / deserialization is used for distributed training
Computation Graph in details
• tf.get_default_graph().as_graph_def() generates the protobuf message of the graph (GraphDef)
• The distributed session uses this to prune the graph and send the pieces to other devices via gRPC.
node {
  name: "mse"
  op: "Mean"
  input: "Square"
  input: "Const"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "Tidx"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "keep_dims"
    value {
      b: false
    }
  }
}
node {
  name: "Const"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\000\000\000\000\001\000\000\000"
      }
    }
  }
}
node {
  name: "Square"
  op: "Square"
  input: "sub"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "sub"
  op: "Sub"
  input: "predictions"
  input: "y"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
}
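The pruning step mentioned above can be sketched on the node/input structure of this dump. The dictionary below mirrors the `name` and `input` fields of the four nodes; `predictions` and `y` are assumed to be leaf nodes, `unused_summary` is a hypothetical extra node added only to show something being pruned, and `prune` is an illustrative helper rather than TensorFlow's own pruning code.

```python
# node name -> list of input node names, mirroring the GraphDef dump
graph = {
    "mse": ["Square", "Const"],
    "Const": [],
    "Square": ["sub"],
    "sub": ["predictions", "y"],
    "predictions": [],                   # assumed leaf (not shown in the dump)
    "y": [],                             # assumed leaf (not shown in the dump)
    "unused_summary": ["predictions"],   # hypothetical node, not needed for "mse"
}


def prune(graph, fetches):
    """Return the set of nodes reachable backwards from the fetched nodes."""
    needed = set()
    stack = list(fetches)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    return needed


kept = prune(graph, ["mse"])
print(sorted(kept))
```

Only the nodes that `mse` transitively depends on survive; anything else can be dropped before the graph is partitioned and shipped.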
Computation Graph in details
• We can use the graph editor to manipulate our graph,
▪ for instance: swap a targeted output tensor out, then swap it back in at a gradient op.
• from tensorflow.contrib import graph_editor as ge
▪ ge.add_control_inputs()
▪ ge.connect()
▪ ge.sgv()
▪ ge.remap_inputs()
Re-use models
[Diagram: workflow for re-using a trained model]
• Define model/network in Python
• Train model/network in Python
• Save model: *.ckpt (model.data, model.index, and model.meta)
• Original way: freeze_graph, producing a protobuf binary (pb) that contains the weights data
• Another way to use the model in C++: protobuf, pb (binary) or pb_txt
• Load model/network in C++
• Inferencing, transfer learning
Compile C++ TensorFlow app/ops
• Here is an example that uses CMake instead of Bazel BUILD… It is more convenient and the binary file is much smaller.
Gradient Calculation
• TensorFlow uses reverse-mode autodiff
▪ It computes all the partial derivatives of the outputs with regard to all the inputs in just n_outputs + 1 graph traversals.
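The traversal count can be seen in a small reverse-mode sketch: one forward pass builds the graph and computes values, then one backward pass per output propagates gradients to every input. The `Var` class and `backward` function below are hypothetical and unrelated to TensorFlow's implementation; only addition and multiplication are supported.

```python
class Var:
    """A value in the graph plus links to the values it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])


def backward(output):
    """Single backward traversal: accumulate d(output)/d(node) for all nodes."""
    order, seen = [], set()

    def visit(var):
        # Post-order DFS gives a topological order (inputs before consumers).
        if id(var) not in seen:
            seen.add(id(var))
            for parent, _ in var.parents:
                visit(parent)
            order.append(var)

    visit(output)
    output.grad = 1.0
    # Walk consumers before their inputs and apply the chain rule.
    for var in reversed(order):
        for parent, local in var.parents:
            parent.grad += local * var.grad


x = Var(3.0)
y = Var(4.0)
f = x * x * y + y          # forward pass: f = x^2 * y + y
backward(f)                # backward pass for the single output f
print(f.value, x.grad, y.grad)  # 40.0 24.0 10.0
```

With one output, the two passes give both partial derivatives at once: df/dx = 2xy = 24 and df/dy = x^2 + 1 = 10.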
Where are the Gradient Ops?
• Gradient ops can be registered in either Python or C++
Python
C++
Computation Graph Execution
• A simple example that illustrates graph construction and execution using the C++ API
https://www.tensorflow.org/api_guides/cc/guide
How to know what happened?
1. export TF_CPP_MIN_VLOG_LEVEL=2
or
2. import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='0' (set this before importing tensorflow)
Log information
• The environment is 1 GPU and 32 CPU cores
▪ Decide Session Factory type (direct session)
a. Inter op parallelism threads: 32
▪ Build Executor
a. Find and add visible GPU devices
b. Create TF devices mapping to physical GPU device
c. Build 4 kinds of streams:
» CUDA stream
» Host_to_Device stream
» Device_to_Host stream
» Device_to_Device stream
▪ PoolAllocator for ProcessState CPU allocator
▪ BFCAllocator
a. Create Bins …
▪ Grappler ( computation graph optimization )
a. Do something …
▪ Op Kernel
a. Instantiating kernel for node
b. Processing / Computing node
» Allocate and deallocate tensors with allocators ( cpu or gpu: cuda_host_bfc )
▪ PollEvents
Callout: StreamGroupFactory (singleton)
Log: Op kernel processing
• 2018-02-23 11:19:04.563051: I tensorflow/core/common_runtime/executor.cc:1561] Process node: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT, transpose_a=false,
transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
• 2018-02-23 11:19:04.563053: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
• 2018-02-23 11:19:04.563059: I tensorflow/core/platform/default/device_tracer.cc:307] PushAnnotation output/output/MatMul:MatMul
• 2018-02-23 11:19:04.563065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445] GpuDevice::Compute output/output/MatMul op MatMul on GPU0 stream[0]
• 2018-02-23 11:19:04.563090: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 2 kernel_name: "output/output/MatMul"
tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc"
allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
• 2018-02-23 11:19:04.563118: I tensorflow/stream_executor/stream.cc:3521] Called Stream::ThenBlasGemm(transa=NoTranspose, transb=NoTranspose, m=10, n=1024, k=128,
alpha=1, a=0x1020dc15300, lda=10, b=0x1020dc62e00, ldb=128, beta=0, c=0x1020dc16700, ldc=10) stream=0x6006d80
• 2018-02-23 11:19:04.563129: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
• 2018-02-23 11:19:04.563130: I tensorflow/stream_executor/cuda/cuda_blas.cc:1881] doing cuBLAS SGEMM: at=0 bt=0 m=10 n=1024 k=128 alpha=1.000000 a=0x1020dc15300
lda=10 b=0x1020dc62e00 ldb=128 beta=0.000000 c=0x1020dc16700 ldc=10
• 2018-02-23 11:19:04.563156: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
• 2018-02-23 11:19:04.563164: I tensorflow/core/platform/default/device_tracer.cc:497] LAUNCH stream 0x5fd4490 correllation 1217 kernel sgemm_32x32x32_NN
• 2018-02-23 11:19:04.563168: I tensorflow/core/platform/default/device_tracer.cc:471] 1217 : output/output/MatMul:MatMul
• 2018-02-23 11:19:04.563198: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
• 2018-02-23 11:19:04.563210: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
• 2018-02-23 11:19:04.563222: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: "output/output/MatMul"
tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc"
allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
• 2018-02-23 11:19:04.563236: I tensorflow/core/common_runtime/executor.cc:1673] Synchronous kernel done: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT,
transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
• 2018-02-23 11:19:04.563244: I tensorflow/core/common_runtime/step_stats_collector.cc:264] Save dev /job:localhost/replica:0/task:0/device:GPU:0 nt 0x7fa0a7786830
Log: Tensor Allocation / Deallocation
• 2018-02-23 11:19:04.430051: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__
MemoryLogTensorOutput { step_id: 2 kernel_name: "pool3/dropout/cond/Merge" tensor { dtype: DT_FLOAT
shape { dim { size: 1024 } dim { size: 12544 } } allocation_description { requested_bytes: 51380224
allocated_bytes: 82837504 allocator_name: "GPU_0_bfc" allocation_id: 81 ptr: 1108467515392 } } }
• ... ...
• 2018-02-23 11:19:04.564922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445]
GpuDevice::Compute train/gradients/train/Mean_grad/Prod/_16 op _Send on GPU0 stream[0].
• .. ...
• 2018-02-23 11:19:04.566499: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__
MemoryLogTensorDeallocation { allocation_id: 81 allocator_name: "GPU_0_bfc" }
Callout: This tensor is going to be deallocated…
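Pairing the log lines by allocation_id is easy to script. The sketch below uses only the two __LOG_MEMORY__ lines quoted on this slide; the regular expression and the `events` structure are assumptions about how one might parse them, not part of TensorFlow.

```python
import re
from datetime import datetime

# The two memory-log lines from this slide, with wrapped lines joined.
LOG = [
    '2018-02-23 11:19:04.430051: I tensorflow/core/framework/log_memory.cc:35] '
    '__LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: '
    '"pool3/dropout/cond/Merge" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } '
    'dim { size: 12544 } } allocation_description { requested_bytes: 51380224 '
    'allocated_bytes: 82837504 allocator_name: "GPU_0_bfc" allocation_id: 81 '
    'ptr: 1108467515392 } } }',
    '2018-02-23 11:19:04.566499: I tensorflow/core/framework/log_memory.cc:35] '
    '__LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 81 '
    'allocator_name: "GPU_0_bfc" }',
]

# timestamp, record type, allocation id
PATTERN = re.compile(r"^(\S+ \S+): .*?(MemoryLog\w+) \{.*?allocation_id: (\d+)")

events = {}  # allocation_id -> list of (timestamp, record type)
for line in LOG:
    match = PATTERN.search(line)
    if match:
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        events.setdefault(int(match.group(3)), []).append((stamp, match.group(2)))

first, last = events[81][0], events[81][-1]
lifetime = (last[0] - first[0]).total_seconds()
print(first[1], "->", last[1], "after", lifetime, "seconds")
```

For allocation 81 this gives the time between the first logged output of the tensor and its deallocation; the requested size, 1024 × 12544 × 4 bytes = 51380224, matches a float tensor of the logged shape.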
Log: Event Manager
• 2018-02-23 11:19:04.565108: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 1 used_events_ 2
• 2018-02-23 11:19:04.565123: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:279 func: cuMemcpyDtoHAsync_v2
• 2018-02-23 11:19:04.565132: I tensorflow/core/platform/default/device_tracer.cc:471] 1286 : edge_69_train/gradients/train/Mean_grad/Prod
• 2018-02-23 11:19:04.565138: I tensorflow/stream_executor/cuda/cuda_driver.cc:1215] successfully enqueued async memcpy d2h of 4 bytes from
0x1020f310000 to 0x1020de00a00 on stream 0x7fa18db423b0
• 2018-02-23 11:19:04.565144: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:151] QueueInUse free_events_ 1 used_events_ 2
• 2018-02-23 11:19:04.565152: I tensorflow/stream_executor/stream.cc:302] Called Stream::ThenRecordEvent(event=0x7fa0a778b3f0)
stream=0x7fa18db1b7f0
• 2018-02-23 11:19:04.565159: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 3
IntraProcessRendezvous
• 2018-02-23 11:19:04.577400: I tensorflow/core/common_runtime/rendezvous_mgr.cc:42] IntraProcessRendezvous Send 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
• 2018-02-23 11:19:03.141265: I tensorflow/core/common_runtime/rendezvous_mgr.cc:119] IntraProcessRendezvous Recv 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
GPU:0
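The rendezvous key in these log lines is a semicolon-separated string. A minimal sketch of splitting it into its parts follows; the field names in `RendezvousKey` are chosen here for readability and are an assumption, as is reading the last field as a frame and iteration pair.

```python
from collections import namedtuple

# Field names are illustrative labels for the five ';'-separated parts.
RendezvousKey = namedtuple(
    "RendezvousKey",
    "src_device src_incarnation dst_device edge_name frame_iter",
)


def parse_key(key):
    """Split a rendezvous key string into its five components."""
    return RendezvousKey(*key.split(";"))


# Key copied from the Send/Recv log lines on this slide.
key = (
    "/job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;"
    "/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0"
)

parsed = parse_key(key)
print(parsed.src_device, "->", parsed.dst_device, "via", parsed.edge_name)
```

Send and Recv log the same key, which is how the two sides of the CPU:0 to GPU:0 transfer for edge_185_Conv2_SwapIn find each other.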
TensorFlow Graph Execution
[Diagram: levels of graph execution]
• Device Level: memory management
• Session Level: global control
• Executor Level: run graph async
• Op Level: compute forward and gradient
Callouts:
• Executors get created for each subgraph
• Get node (operation) from "ready" queue
• Call into Stream, which contains stream_executor
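The "ready" queue idea can be sketched in a few lines: a node becomes ready once all of its inputs have been computed. The `run_graph` function below is a simplified, single-threaded illustration under that assumption and is not the TensorFlow executor; the node names reuse the mse example from the GraphDef slide, with `predictions` and `y` assumed to be leaf nodes.

```python
from collections import deque


def run_graph(inputs_of, compute):
    """Run nodes in dependency order using a ready queue.

    inputs_of maps each node name to the list of its input node names.
    compute is called once per node when all its inputs are done.
    """
    pending = {node: len(ins) for node, ins in inputs_of.items()}
    consumers = {node: [] for node in inputs_of}
    for node, ins in inputs_of.items():
        for src in ins:
            consumers[src].append(node)

    # Nodes with no inputs are ready immediately.
    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        compute(node)
        order.append(node)
        # Finishing a node may make its consumers ready.
        for consumer in consumers[node]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return order


inputs_of = {
    "predictions": [],
    "y": [],
    "sub": ["predictions", "y"],
    "Square": ["sub"],
    "Const": [],
    "mse": ["Square", "Const"],
}

order = run_graph(inputs_of, compute=lambda node: None)
print(order)
```

A real executor would hand ready nodes to a thread pool and run kernels asynchronously; the pending-count bookkeeping is the part this sketch is meant to show.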

More Related Content

What's hot

The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...David Walker
 
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle ModelzkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle ModelAlex Pruden
 
Introduction to Homomorphic Encryption
Introduction to Homomorphic EncryptionIntroduction to Homomorphic Encryption
Introduction to Homomorphic EncryptionChristoph Matthies
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortStefan Marr
 
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Stefan Marr
 
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAlex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAI Frontiers
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic EncryptionGöktuğ Serez
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Daniel Lemire
 
ZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their ApplicationsZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their ApplicationsAlex Pruden
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 
19 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp0219 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp02Muhammad Aslam
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic EncryptionVictor Pereira
 
Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014Mark Rees
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and DatasetKazuaki Ishizaki
 
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4jEclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4jMax Pumperla
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryptionsecurityxploded
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017Yu-Hsun (lymanblue) Lin
 

What's hot (20)

The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle ModelzkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
 
Introduction to Homomorphic Encryption
Introduction to Homomorphic EncryptionIntroduction to Homomorphic Encryption
Introduction to Homomorphic Encryption
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low Effort
 
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...
 
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAlex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic Encryption
 
同態加密
同態加密同態加密
同態加密
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
ZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their ApplicationsZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their Applications
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
19 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp0219 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp02
 
Computing on Encrypted Data
Computing on Encrypted DataComputing on Encrypted Data
Computing on Encrypted Data
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic Encryption
 
Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4jEclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j
EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017
 

Similar to TensorFlow Study Part I

Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Intel® Software
 
Windbg랑 친해지기
Windbg랑 친해지기Windbg랑 친해지기
Windbg랑 친해지기Ji Hun Kim
 
ez-clang C++ REPL for bare-metal embedded devices
ez-clang C++ REPL for bare-metal embedded devicesez-clang C++ REPL for bare-metal embedded devices
ez-clang C++ REPL for bare-metal embedded devicesStefan Gränitz
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017Te-Yen Liu
 
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...Provectus
 
A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)PingCAP
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...David Beazley (Dabeaz LLC)
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Windows Developer
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconIoannis Psaras
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at NubankDatabricks
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
Score (smart contract for icon)
Score (smart contract for icon) Score (smart contract for icon)
Score (smart contract for icon) Doyun Hwang
 
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...InfluxData
 
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Priyanka Aash
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemCyber Security Alliance
 
C Programming Language
C Programming LanguageC Programming Language
C Programming LanguageRTS Tech
 

Similar to TensorFlow Study Part I (20)

Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
 
Windbg랑 친해지기
Windbg랑 친해지기Windbg랑 친해지기
Windbg랑 친해지기
 
ez-clang C++ REPL for bare-metal embedded devices
ez-clang C++ REPL for bare-metal embedded devicesez-clang C++ REPL for bare-metal embedded devices
ez-clang C++ REPL for bare-metal embedded devices
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017
 
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
 
A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
 
From logs to metrics
From logs to metricsFrom logs to metrics
From logs to metrics
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness Beacon
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at Nubank
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
Score (smart contract for icon)
Score (smart contract for icon) Score (smart contract for icon)
Score (smart contract for icon)
 
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
 
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
report
reportreport
report
 
C Programming Language
C Programming LanguageC Programming Language
C Programming Language
 

Recently uploaded

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
TensorFlow Study Part I

  • 1. TensorFlow Study (Part I). 劉得彥 (Danny Liu), Information and Communications Research Laboratories (ICL). ITRI CONFIDENTIAL DOCUMENT: DO NOT COPY OR DISTRIBUTE
  • 2. Tensor
      • A tensor is an n-dimensional structure
        ▪ n=0: single value
        ▪ n=1: list of values
        ▪ n=2: matrix of values
        ▪ n=3: cube of values
        ▪ …
      • The Tensor:
        ▪ Data is in TensorBuffer as an Eigen::Tensor
        ▪ Shape definition is in TensorShape
        ▪ Reference counting is in RefCounted
      • Source: https://www.slideshare.net/EdurekaIN/introduction-to-tensorflow-deep-learning-using-tensorflow-tensorflow-tutorial-edureka
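As a rough illustration of the rank list on this slide, the sketch below computes the shape and rank of nested Python lists. It is plain Python and does not touch TensorFlow's TensorShape; `shape_of` is a hypothetical helper that assumes the nesting is regular.

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a tuple (a scalar has shape ())."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:  # an empty list ends the walk
            break
        value = value[0]
    return tuple(shape)


examples = {
    "single value": 3.0,
    "list of values": [1.0, 2.0, 3.0],
    "matrix of values": [[1.0, 0.0], [0.0, -1.0]],
    "cube of values": [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
}

# The rank n is simply the number of dimensions in the shape.
ranks = {name: len(shape_of(value)) for name, value in examples.items()}
for name, value in examples.items():
    print(f"n={ranks[name]}: {name}, shape={shape_of(value)}")
```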
  • 3. Operation
      • Here is an example of a customized Sigmoid op:
        ▪ Input tensor: x = tf.constant([[1.0, 0.0], [0.0, -1.0]])
        ▪ Output tensor: y = cpp_con_sigmoid.cpp_con_sigmoid(x)
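The slide does not show what the op returns. Assuming the custom op computes the standard logistic sigmoid 1 / (1 + e^(-x)) element-wise (an assumption, since the kernel source is not in the transcript), the expected values for the same input can be sketched in plain Python:

```python
import math


def sigmoid(v):
    """Standard logistic sigmoid; assumed to match what the custom op computes."""
    return 1.0 / (1.0 + math.exp(-v))


# Same values as the tf.constant on the slide, as a nested list.
x = [[1.0, 0.0], [0.0, -1.0]]
y = [[sigmoid(v) for v in row] for row in x]

for row in y:
    print(row)
```

Because sigmoid(-v) = 1 - sigmoid(v), the two corner entries should sum to 1, which is a quick sanity check when comparing against the real op.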
  • 4. Operation
      • Register the op definition with the kernel builder (class CppConSigmoid)
  • 5. Expressing: Ops
      • Source: http://public.kevinrobinsonblog.com/docs/A%20tour%20through%20the%20TensorFlow%20codebase%20-%20v4.pdf
  • 6. Expressing: Ops
  • 7. Build a Computation Graph
      • Tensors flow between operations
      • Figure labels: Tensor, Operator, Computation Graph
      • Source: https://machinelearningblogs.com/2017/09/07/tensorflow-tutorial-part-1-introduction/
  • 8. TensorFlow Framework (layer diagram)
      • Applications: Python application; C++ application; application using C, Golang, …
      • Python APIs: Layers, Estimator, Keras, Canned Estimator
      • C++ APIs
      • swig
      • C APIs (tensorflow/c/c_api.h)
      • TensorFlow Core libraries (C++): core, runtime, graph, grappler, ops, kernels, …
      • Note on the diagram: limited functions, do inference (ongoing)
  • 9. Build a Computation Graph
      • The TF Graph is the computation graph held in the session (DirectSession)
      • Python APIs: tf.matmul(a, b), through swig and the C APIs
        ▪ def _create_c_op(graph, node_def, inputs, control_inputs):
      • C++ APIs: MatMul(scope, a, b)
      • TF Graph to GraphDef (Protobuf text): convert with Graph::ToGraphDef()
      • GraphDef (Protobuf text) to TF Graph (nodes and edges): convert with ConvertGraphDefToGraph()
      • Serialization / deserialization is used for distributed training
  • 10. Computation Graph in details
      • tf.get_default_graph().as_graph_def() returns the Protobuf message of the graph (GraphDef)
      • The distributed session uses this to prune the graph and send the pieces to other devices via gRPC
      • Example GraphDef:
        node { name: "mse" op: "Mean" input: "Square" input: "Const" attr { key: "T" value { type: DT_FLOAT } } attr { key: "Tidx" value { type: DT_INT32 } } attr { key: "keep_dims" value { b: false } } }
        node { name: "Const" op: "Const" attr { key: "dtype" value { type: DT_INT32 } } attr { key: "value" value { tensor { dtype: DT_INT32 tensor_shape { dim { size: 2 } } tensor_content: "000000000000001 000000000" } } } }
        node { name: "Square" op: "Square" input: "sub" attr { key: "T" value { type: DT_FLOAT } } }
        node { name: "sub" op: "Sub" input: "predictions" input: "y" attr { key: "T" value { type: DT_FLOAT } } }
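To make the pruning step more concrete, the sketch below walks backwards from a fetched node along the `input` edges and keeps only the nodes it reaches. It is plain Python over a dictionary built from the node names in the GraphDef on this slide, not TensorFlow's actual pruning code. `prune` is a hypothetical helper, the leaf entries for `predictions` and `y` are filled in because the excerpt only references them, and the node `unused` is invented purely so that something gets pruned.

```python
# node name -> names of its input nodes, following the GraphDef excerpt
graph = {
    "mse": ["Square", "Const"],
    "Const": [],
    "Square": ["sub"],
    "sub": ["predictions", "y"],
    "predictions": [],  # referenced as an input in the excerpt; assumed to be a leaf
    "y": [],            # referenced as an input in the excerpt; assumed to be a leaf
    "unused": ["y"],    # hypothetical node that nothing fetched depends on
}


def prune(graph, fetches):
    """Keep only the nodes reachable backwards from the fetched nodes."""
    keep, stack = set(), list(fetches)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(graph[name])  # follow the input edges
    return {name: inputs for name, inputs in graph.items() if name in keep}


pruned = prune(graph, ["mse"])
print(sorted(pruned))
```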
  • 11. Computation Graph in details
      • We can use the graph editor to manipulate our graph
        ▪ For instance: swap a targeted output tensor out, then swap it back in for a gradient op
      • from tensorflow.contrib import graph_editor as ge
        ▪ ge.add_control_inputs()
        ▪ ge.connect()
        ▪ ge.sgv()
        ▪ ge.remap_inputs()
  • 12. Re-use models
      • Define the model/network in Python
      • Train the model/network in Python
      • Save the model: *.ckpt (model.data, model.index, and model.meta)
      • Protobuf: pb (binary) or pb_txt
      • freeze_graph
      • Protobuf binary: pb (it contains the weights data)
      • Load the model/network in C++
      • Inference
      • Transfer learning
      • Another way to use the model in C++; original way:
  • 13. Compile C++ TensorFlow app/ops
      • Here is an example that uses CMake instead of Bazel BUILD… It is more convenient and the binary file is much smaller.
  • 14. Gradient Calculation
      • TensorFlow uses reverse-mode autodiff
        ▪ It computes all the partial derivatives of the outputs with regard to all the inputs in just n_outputs + 1 graph traversals.
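A minimal sketch of reverse-mode autodiff, assuming a scalar-only graph with just addition and multiplication. This is not TensorFlow's implementation; `Var` and `backward` are hypothetical names. With one output, it takes one forward traversal (building the graph while computing values) plus one backward traversal, which matches the n_outputs + 1 count for n_outputs = 1.

```python
class Var:
    """A scalar node that remembers its parents and the local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Single backward traversal: push gradients from the output to every input."""
    order, seen = [], set()

    def visit(node):  # post-order DFS gives a topological order
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # every consumer is handled before its inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


x, y = Var(3.0), Var(4.0)
f = x * x * y + y  # f(x, y) = x^2 * y + y
backward(f)
print(f.value, x.grad, y.grad)
```

Here the analytic gradients are df/dx = 2xy and df/dy = x^2 + 1, so both partial derivatives come out of the same backward pass.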
  • 15. Where are the Gradient Ops?
      • Gradient ops can be registered in Python or in C++
      • Figure labels: Python, C++
  • 16. Computation Graph Execution
      • A simple example that illustrates graph construction and execution using the C++ API: https://www.tensorflow.org/api_guides/cc/guide
      • How to know what happened? Either:
        1. export TF_CPP_MIN_VLOG_LEVEL=2
        2. import os; os.environ['TF_CPP_MIN_LOG_LEVEL']='0'
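The two logging switches on this slide can also be combined from Python. The sketch below sets both before TensorFlow would be imported, on the assumption that the C++ runtime reads them when it starts logging, so setting them afterwards may have no effect. The import itself is left commented out so the snippet runs without TensorFlow installed.

```python
import os

# Set the variables before importing TensorFlow.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"   # 0 keeps INFO, WARNING and ERROR messages
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "2"  # verbose-log level, as in the export line on the slide

# import tensorflow as tf  # uncomment where TensorFlow is available

print({k: v for k, v in os.environ.items() if k.startswith("TF_CPP_")})
```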
  • 17. Log information
      • The environment is 1 GPU and 32 CPU cores
        ▪ Decide the Session Factory type (direct session)
          a. Inter-op parallelism threads: 32
        ▪ Build Executor
          a. Find and add visible GPU devices
          b. Create TF devices mapping to the physical GPU device
          c. Build 4 kinds of streams:
             » CUDA stream
             » Host_to_Device stream
             » Device_to_Host stream
             » Device_to_Device stream
        ▪ PoolAllocator for ProcessState CPU allocator
        ▪ BFCAllocator
          a. Create Bins …
        ▪ Grappler (computation graph optimization)
          a. Do something …
        ▪ Op Kernel
          a. Instantiating kernel for node
          b. Processing / computing node
             » Allocate and deallocate tensors with allocators (cpu or gpu: cuda_host_bfc)
        ▪ PollEvents (singleton)
        ▪ StreamGroupFactory
  • 18. Log: Op kernel processing
      • 2018-02-23 11:19:04.563051: I tensorflow/core/common_runtime/executor.cc:1561] Process node: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
      • 2018-02-23 11:19:04.563053: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
      • 2018-02-23 11:19:04.563059: I tensorflow/core/platform/default/device_tracer.cc:307] PushAnnotation output/output/MatMul:MatMul
      • 2018-02-23 11:19:04.563065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445] GpuDevice::Compute output/output/MatMul op MatMul on GPU0 stream[0]
      • 2018-02-23 11:19:04.563090: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 2 kernel_name: "output/output/MatMul" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc" allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
      • 2018-02-23 11:19:04.563118: I tensorflow/stream_executor/stream.cc:3521] Called Stream::ThenBlasGemm(transa=NoTranspose, transb=NoTranspose, m=10, n=1024, k=128, alpha=1, a=0x1020dc15300, lda=10, b=0x1020dc62e00, ldb=128, beta=0, c=0x1020dc16700, ldc=10) stream=0x6006d80
      • 2018-02-23 11:19:04.563129: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
      • 2018-02-23 11:19:04.563130: I tensorflow/stream_executor/cuda/cuda_blas.cc:1881] doing cuBLAS SGEMM: at=0 bt=0 m=10 n=1024 k=128 alpha=1.000000 a=0x1020dc15300 lda=10 b=0x1020dc62e00 ldb=128 beta=0.000000 c=0x1020dc16700 ldc=10
      • 2018-02-23 11:19:04.563156: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
      • 2018-02-23 11:19:04.563164: I tensorflow/core/platform/default/device_tracer.cc:497] LAUNCH stream 0x5fd4490 correllation 1217 kernel sgemm_32x32x32_NN
      • 2018-02-23 11:19:04.563168: I tensorflow/core/platform/default/device_tracer.cc:471] 1217 : output/output/MatMul:MatMul
      • 2018-02-23 11:19:04.563198: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
      • 2018-02-23 11:19:04.563210: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
      • 2018-02-23 11:19:04.563222: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: "output/output/MatMul" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc" allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
      • 2018-02-23 11:19:04.563236: I tensorflow/core/common_runtime/executor.cc:1673] Synchronous kernel done: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
      • 2018-02-23 11:19:04.563244: I tensorflow/core/common_runtime/step_stats_collector.cc:264] Save dev /job:localhost/replica:0/task:0/device:GPU:0 nt 0x7fa0a7786830
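The numbers in these log entries can be cross-checked with a little arithmetic. The sketch below uses only values copied from the log; the one assumption is that DT_FLOAT is a 4-byte float. It shows that requested_bytes equals the element count of the [1024, 10] output times 4, and that the GEMM dimensions m and n cover the same number of output elements.

```python
FLOAT32_BYTES = 4  # assumption: DT_FLOAT is a 32-bit float

# Values copied from the MemoryLogTensorAllocation entry.
shape = (1024, 10)
requested_bytes_logged = 40960
allocated_bytes_logged = 68608

# Values copied from the Stream::ThenBlasGemm entry.
m, n, k = 10, 1024, 128

requested_bytes = shape[0] * shape[1] * FLOAT32_BYTES
allocator_slack = allocated_bytes_logged - requested_bytes_logged
output_elements = m * n
multiply_adds = m * n * k  # one multiply-add per (output element, k) pair

print("requested bytes:", requested_bytes)
print("extra bytes handed out by the allocator:", allocator_slack)
print("GEMM output elements:", output_elements)
print("GEMM multiply-adds:", multiply_adds)
```

The gap between allocated_bytes and requested_bytes shows that the GPU_0_bfc allocator handed out a larger chunk than the tensor strictly needs.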
  • 19. Log: Tensor Allocation / Deallocation
      • 2018-02-23 11:19:04.430051: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: "pool3/dropout/cond/Merge" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 12544 } } allocation_description { requested_bytes: 51380224 allocated_bytes: 82837504 allocator_name: "GPU_0_bfc" allocation_id: 81 ptr: 1108467515392 } } }
      • ... ...
      • 2018-02-23 11:19:04.564922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445] GpuDevice::Compute train/gradients/train/Mean_grad/Prod/_16 op _Send on GPU0 stream[0].
      • ... ...
      • 2018-02-23 11:19:04.566499: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 81 allocator_name: "GPU_0_bfc" }
      • Note: this tensor is going to be deallocated…
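Entries like these can be paired up by allocation_id to see how long a tensor stays alive. The sketch below is a plain-Python log reader, not a TensorFlow tool. The regular expression is an assumption based only on the two __LOG_MEMORY__ lines shown on this slide, and the measured span runs from the MemoryLogTensorOutput entry (not the original allocation) to the deallocation.

```python
import re
from datetime import datetime

# The two __LOG_MEMORY__ entries from the slide, copied verbatim.
log_lines = [
    '2018-02-23 11:19:04.430051: I tensorflow/core/framework/log_memory.cc:35] '
    '__LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: '
    '"pool3/dropout/cond/Merge" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } '
    'dim { size: 12544 } } allocation_description { requested_bytes: 51380224 '
    'allocated_bytes: 82837504 allocator_name: "GPU_0_bfc" allocation_id: 81 '
    'ptr: 1108467515392 } } }',
    '2018-02-23 11:19:04.566499: I tensorflow/core/framework/log_memory.cc:35] '
    '__LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 81 '
    'allocator_name: "GPU_0_bfc" }',
]

# timestamp, message type, allocation id
pattern = re.compile(r"^(\S+ \S+): .*__LOG_MEMORY__ (\w+) .*allocation_id: (\d+)")

events = {}  # allocation_id -> {message type: timestamp}
for line in log_lines:
    match = pattern.match(line)
    if match is None:
        continue  # not a memory log entry
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    events.setdefault(int(match.group(3)), {})[match.group(2)] = stamp

entry = events[81]
lifetime = entry["MemoryLogTensorDeallocation"] - entry["MemoryLogTensorOutput"]
lifetime_seconds = lifetime.total_seconds()
print(f"allocation 81 was released {lifetime_seconds:.6f} s after it was logged as an output")
```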
  • 20. Log: Event Manager
      • 2018-02-23 11:19:04.565108: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 1 used_events_ 2
      • 2018-02-23 11:19:04.565123: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:279 func: cuMemcpyDtoHAsync_v2
      • 2018-02-23 11:19:04.565132: I tensorflow/core/platform/default/device_tracer.cc:471] 1286 : edge_69_train/gradients/train/Mean_grad/Prod
      • 2018-02-23 11:19:04.565138: I tensorflow/stream_executor/cuda/cuda_driver.cc:1215] successfully enqueued async memcpy d2h of 4 bytes from 0x1020f310000 to 0x1020de00a00 on stream 0x7fa18db423b0
      • 2018-02-23 11:19:04.565144: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:151] QueueInUse free_events_ 1 used_events_ 2
      • 2018-02-23 11:19:04.565152: I tensorflow/stream_executor/stream.cc:302] Called Stream::ThenRecordEvent(event=0x7fa0a778b3f0) stream=0x7fa18db1b7f0
      • 2018-02-23 11:19:04.565159: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 3
  • 21. IntraProcessRendezvous
      • 2018-02-23 11:19:04.577400: I tensorflow/core/common_runtime/rendezvous_mgr.cc:42] IntraProcessRendezvous Send 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
      • 2018-02-23 11:19:03.141265: I tensorflow/core/common_runtime/rendezvous_mgr.cc:119] IntraProcessRendezvous Recv 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
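The long string at the end of each Send/Recv entry is a semicolon-separated rendezvous key. The sketch below splits it into five parts. The field names are an interpretation of the key layout (source device, source incarnation in hex, destination device, edge name, frame and iteration) rather than names taken from the slide, and `parse_key` is a hypothetical helper.

```python
from typing import NamedTuple


class RendezvousKey(NamedTuple):
    src_device: str
    src_incarnation: int  # printed as hex in the log
    dst_device: str
    edge_name: str
    frame_iter: str


def parse_key(key):
    """Split a rendezvous key of the form seen in the log into its five fields."""
    src, incarnation, dst, name, frame_iter = key.split(";")
    return RendezvousKey(src, int(incarnation, 16), dst, name, frame_iter)


key = ("/job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;"
       "/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0")
parsed = parse_key(key)
print(parsed.src_device, "->", parsed.dst_device, "for", parsed.edge_name)
```

Read this way, both log entries describe the same tensor travelling from CPU:0 to GPU:0, which is why the Send and Recv lines carry an identical key.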
  • 22. TensorFlow Graph Execution
      • Device level: memory management
      • Session level: global control
      • Executor level: run the graph asynchronously
      • Op level: compute forward and gradient
      • Notes on the diagram:
        ▪ Executors get created for each subgraph
        ▪ Get a node (operation) from the “ready” queue
        ▪ Call into Stream, which contains stream_executor
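The "ready" queue idea can be sketched as follows: a node becomes ready once all of its inputs have been produced, and finishing a node may make its consumers ready. This is a simplified, synchronous, single-threaded illustration in plain Python, whereas the real executor runs kernels asynchronously on devices and streams. `run_graph` and the tiny mean-squared-error graph are hypothetical.

```python
from collections import deque


def run_graph(nodes):
    """nodes maps a name to (list of input names, function computing the output)."""
    pending = {name: len(inputs) for name, (inputs, _) in nodes.items()}
    consumers = {name: [] for name in nodes}
    for name, (inputs, _) in nodes.items():
        for source in inputs:
            consumers[source].append(name)

    # Nodes without inputs are ready immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    values, order = {}, []
    while ready:
        name = ready.popleft()
        inputs, fn = nodes[name]
        values[name] = fn(*(values[i] for i in inputs))  # "compute" the op
        order.append(name)
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:  # all inputs available
                ready.append(consumer)
    return values, order


nodes = {
    "predictions": ([], lambda: [1.0, 2.0, 3.0]),
    "y": ([], lambda: [1.0, 1.0, 1.0]),
    "sub": (["predictions", "y"], lambda a, b: [p - q for p, q in zip(a, b)]),
    "Square": (["sub"], lambda d: [v * v for v in d]),
    "mse": (["Square"], lambda s: sum(s) / len(s)),
}

values, order = run_graph(nodes)
print(order)
print(values["mse"])
```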