SlideShare a Scribd company logo
1 of 14
Data generator in Python
for Keras (& Tensorflow)
Sean Yu, 2017/08/17
1
What is data generator and why we need it?
• 假設大家都寫過 keras or tensorflow
• Generally, we will have …
• training set / validation set / testing set
• We always load them ALL into memory
• MNIST – 60k images, 28 x 28 x 1
• Cifar10 – 60k images, 32 x 32 x 3
• EASY!
• However, if you have 100k+ 400 x 300 x 3 images,
it is impossible to load them all.
• We need to real-time load data
2
In GPU memory
In RAM (by CPU)
In Disk
Real-time data loading
• With real-time data loading, we can do many
manipulations on the data.
• Augmentation
• Add random noise
• …
• 你想對 data 幹什麼就幹什麼
• Principle of data generator
• A infinite loop that can ‘yield’ data when being requested
3
Prepared data
This batch
(for model)
Data in folder
…
Coding flow
4
Import basic libraries and
customized functions
Make data list & preload
Define data/image generator
Define model &
model related parameters
Train the model !! Evaluate the model
Today’s example data
• Kaggle, cats and dogs classification
• https://www.kaggle.com/c/dogs-vs-cats
• Keras blog 上面其實有類似的 example code 了, 改
寫一下而已
• Classification problem: cat or dog
• Training set: 25k
• Testing set 12.5k
5
Note
• Train generator 跟 Keras 的 Image generator 在
yield 打到怎麼辦?
• 看 code, 簡單來說, break 它
6
Note
• Validation augmentation?
• We only need to augment it at begin (not dynamically!)
• 乾五郝? – 我的經驗, 好像有用捏
• 增加 validation 的難度 (複雜性) – 避免 validation 進步太
神速而太早被 earlystop
7
Note
• Testing set 可以做 augmentation 嗎?
8
Note
• Testing set 可以做 augmentation 嗎?
• 本質上不是不同影像
9
Conclusion
Data generator 很好用
我希望人人都有一個
10
其他 GPU 相關使用注意事項
• 在單張卡上, 限制每個 GPU memory fraction
• 在多張卡的機器上 (如 server), 選定使用特定編號之
GPU
• 關閉 jupyter notebook 占用之 GPU 空間
11
其他 GPU 相關使用注意事項
• 在單張卡上, 限制每個 GPU memory fraction
• 在多張卡的機器上 (如 server), 選定使用特定編號之
GPU
• 關閉 jupyter notebook 占用之 GPU 空間
12
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5 # take
50% of gpu memory
set_session(tf.Session(config=config))
其他 GPU 相關使用注意事項
• 在單張卡上, 限制每個 GPU memory fraction
• 在多張卡的機器上 (如 server), 選定使用特定
編號之 GPU
• 關閉 jupyter notebook 占用之 GPU 空間
13
- For Unix (python script)
In the terminal,
CUDA_VISIBLE_DEVICES=0 python
your_script.py
- For python script
At the script begin
Import os
os.environ[‘CUDA_VISIBLE_DEVICES’] = 0
- For jupyter notebook
At the notebook begin
%env CUDA_VISIBLE_DEVICES=0
其他 GPU 相關使用注意事項
• 在單張卡上, 限制每個 GPU memory fraction
• 在多張卡的機器上 (如 server), 選定使用特定編號之
GPU
• 關閉 jupyter notebook 占用之 GPU 空間
14
把這段加在 notebook 最後並執行
%%javascript
Jupyter.notebook.session.delete();

More Related Content

What's hot

准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究Min Zhou
 
服务器基准测试-叶金荣@CYOU-20121130
服务器基准测试-叶金荣@CYOU-20121130服务器基准测试-叶金荣@CYOU-20121130
服务器基准测试-叶金荣@CYOU-20121130Jinrong Ye
 
Make your DVR playground using DevStack
Make your DVR playground using DevStackMake your DVR playground using DevStack
Make your DVR playground using DevStackJiang Jun
 
系统性能分析和优化.ppt
系统性能分析和优化.ppt系统性能分析和优化.ppt
系统性能分析和优化.pptFrank Cai
 
Ceph bluestore-tiering-2018-11-15
Ceph bluestore-tiering-2018-11-15Ceph bluestore-tiering-2018-11-15
Ceph bluestore-tiering-2018-11-15Jiaying Ren
 
goCloud!雲端電腦教室
goCloud!雲端電腦教室 goCloud!雲端電腦教室
goCloud!雲端電腦教室 Pake Cheng
 
Sheepdog介绍
Sheepdog介绍Sheepdog介绍
Sheepdog介绍Liu Yuan
 
Ceph in UnitedStack
Ceph in UnitedStackCeph in UnitedStack
Ceph in UnitedStackRongze Zhu
 
Ceph Day Shanghai - Ceph in Chinau Unicom Labs
Ceph Day Shanghai - Ceph in Chinau Unicom LabsCeph Day Shanghai - Ceph in Chinau Unicom Labs
Ceph Day Shanghai - Ceph in Chinau Unicom LabsCeph Community
 
数据库极限性能测试
数据库极限性能测试数据库极限性能测试
数据库极限性能测试helbreathszw
 
Chasingice
ChasingiceChasingice
Chasingice冰 白
 
Deskpool vdi solution introduction
Deskpool vdi solution introductionDeskpool vdi solution introduction
Deskpool vdi solution introductionDongLiwu
 
Ceph Day Beijing - Leverage Ceph for SDS in China Mobile
Ceph Day Beijing - Leverage Ceph for SDS in China MobileCeph Day Beijing - Leverage Ceph for SDS in China Mobile
Ceph Day Beijing - Leverage Ceph for SDS in China MobileDanielle Womboldt
 
Sheepdog内部实现机制
Sheepdog内部实现机制Sheepdog内部实现机制
Sheepdog内部实现机制Liu Yuan
 
排行榜V3项目总结
排行榜V3项目总结排行榜V3项目总结
排行榜V3项目总结Frank Xu
 

What's hot (16)

准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究准实时海量数据分析系统架构探究
准实时海量数据分析系统架构探究
 
服务器基准测试-叶金荣@CYOU-20121130
服务器基准测试-叶金荣@CYOU-20121130服务器基准测试-叶金荣@CYOU-20121130
服务器基准测试-叶金荣@CYOU-20121130
 
Make your DVR playground using DevStack
Make your DVR playground using DevStackMake your DVR playground using DevStack
Make your DVR playground using DevStack
 
系统性能分析和优化.ppt
系统性能分析和优化.ppt系统性能分析和优化.ppt
系统性能分析和优化.ppt
 
MogileFS
MogileFSMogileFS
MogileFS
 
Ceph bluestore-tiering-2018-11-15
Ceph bluestore-tiering-2018-11-15Ceph bluestore-tiering-2018-11-15
Ceph bluestore-tiering-2018-11-15
 
goCloud!雲端電腦教室
goCloud!雲端電腦教室 goCloud!雲端電腦教室
goCloud!雲端電腦教室
 
Sheepdog介绍
Sheepdog介绍Sheepdog介绍
Sheepdog介绍
 
Ceph in UnitedStack
Ceph in UnitedStackCeph in UnitedStack
Ceph in UnitedStack
 
Ceph Day Shanghai - Ceph in Chinau Unicom Labs
Ceph Day Shanghai - Ceph in Chinau Unicom LabsCeph Day Shanghai - Ceph in Chinau Unicom Labs
Ceph Day Shanghai - Ceph in Chinau Unicom Labs
 
数据库极限性能测试
数据库极限性能测试数据库极限性能测试
数据库极限性能测试
 
Chasingice
ChasingiceChasingice
Chasingice
 
Deskpool vdi solution introduction
Deskpool vdi solution introductionDeskpool vdi solution introduction
Deskpool vdi solution introduction
 
Ceph Day Beijing - Leverage Ceph for SDS in China Mobile
Ceph Day Beijing - Leverage Ceph for SDS in China MobileCeph Day Beijing - Leverage Ceph for SDS in China Mobile
Ceph Day Beijing - Leverage Ceph for SDS in China Mobile
 
Sheepdog内部实现机制
Sheepdog内部实现机制Sheepdog内部实现机制
Sheepdog内部实现机制
 
排行榜V3项目总结
排行榜V3项目总结排行榜V3项目总结
排行榜V3项目总结
 

Similar to [Python - Deep Learning] Data generator

Maximize Your Production Effort (Chinese)
Maximize Your Production Effort (Chinese)Maximize Your Production Effort (Chinese)
Maximize Your Production Effort (Chinese)slantsixgames
 
D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版Jackson Tian
 
Data Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseData Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseJack Gao
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoopTianwei Liu
 
寫出高性能的服務與應用 那些你沒想過的事
寫出高性能的服務與應用 那些你沒想過的事寫出高性能的服務與應用 那些你沒想過的事
寫出高性能的服務與應用 那些你沒想過的事Chieh (Jack) Yu
 
Node.js在淘宝的应用实践
Node.js在淘宝的应用实践Node.js在淘宝的应用实践
Node.js在淘宝的应用实践taobao.com
 
賽門鐵克 NetBackup 7.5 完整簡報
賽門鐵克 NetBackup 7.5 完整簡報賽門鐵克 NetBackup 7.5 完整簡報
賽門鐵克 NetBackup 7.5 完整簡報Wales Chen
 
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法Jazz Yao-Tsung Wang
 
用 Keras 玩 Machine Learning
用 Keras 玩 Machine Learning用 Keras 玩 Machine Learning
用 Keras 玩 Machine Learning家弘 周
 
Continuous Delivery Workshop with Ansible x GitLab CI
Continuous Delivery Workshop with Ansible x GitLab CIContinuous Delivery Workshop with Ansible x GitLab CI
Continuous Delivery Workshop with Ansible x GitLab CIChu-Siang Lai
 
Running a Service in Production without Losing Your Sanity
Running a Service in Production without Losing Your SanityRunning a Service in Production without Losing Your Sanity
Running a Service in Production without Losing Your SanityPoga Po
 
ProtoStar:“次世代”的移动端渲染
ProtoStar:“次世代”的移动端渲染ProtoStar:“次世代”的移动端渲染
ProtoStar:“次世代”的移动端渲染Ning Hu
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data邦宇 叶
 
如何設計電腦 -- 還有讓電腦變快的那些方法
如何設計電腦  -- 還有讓電腦變快的那些方法如何設計電腦  -- 還有讓電腦變快的那些方法
如何設計電腦 -- 還有讓電腦變快的那些方法鍾誠 陳鍾誠
 
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018Will Huang
 
OpenStack Introduction Ecosystem
OpenStack Introduction EcosystemOpenStack Introduction Ecosystem
OpenStack Introduction EcosystemNUTC, imac
 
Behavior+tree+ai lite
Behavior+tree+ai liteBehavior+tree+ai lite
Behavior+tree+ai lite勇浩 赖
 

Similar to [Python - Deep Learning] Data generator (20)

Maximize Your Production Effort (Chinese)
Maximize Your Production Effort (Chinese)Maximize Your Production Effort (Chinese)
Maximize Your Production Effort (Chinese)
 
D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版
 
Data Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseData Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouse
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoop
 
寫出高性能的服務與應用 那些你沒想過的事
寫出高性能的服務與應用 那些你沒想過的事寫出高性能的服務與應用 那些你沒想過的事
寫出高性能的服務與應用 那些你沒想過的事
 
Node.js在淘宝的应用实践
Node.js在淘宝的应用实践Node.js在淘宝的应用实践
Node.js在淘宝的应用实践
 
賽門鐵克 NetBackup 7.5 完整簡報
賽門鐵克 NetBackup 7.5 完整簡報賽門鐵克 NetBackup 7.5 完整簡報
賽門鐵克 NetBackup 7.5 完整簡報
 
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
 
Something about Kafka - Why Kafka is so fast
Something about Kafka - Why Kafka is so fastSomething about Kafka - Why Kafka is so fast
Something about Kafka - Why Kafka is so fast
 
用 Keras 玩 Machine Learning
用 Keras 玩 Machine Learning用 Keras 玩 Machine Learning
用 Keras 玩 Machine Learning
 
Continuous Delivery Workshop with Ansible x GitLab CI
Continuous Delivery Workshop with Ansible x GitLab CIContinuous Delivery Workshop with Ansible x GitLab CI
Continuous Delivery Workshop with Ansible x GitLab CI
 
Running a Service in Production without Losing Your Sanity
Running a Service in Production without Losing Your SanityRunning a Service in Production without Losing Your Sanity
Running a Service in Production without Losing Your Sanity
 
LabView with Lego NXT
LabView  with Lego NXTLabView  with Lego NXT
LabView with Lego NXT
 
Zabbix in PPTV
Zabbix in PPTVZabbix in PPTV
Zabbix in PPTV
 
ProtoStar:“次世代”的移动端渲染
ProtoStar:“次世代”的移动端渲染ProtoStar:“次世代”的移动端渲染
ProtoStar:“次世代”的移动端渲染
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
如何設計電腦 -- 還有讓電腦變快的那些方法
如何設計電腦  -- 還有讓電腦變快的那些方法如何設計電腦  -- 還有讓電腦變快的那些方法
如何設計電腦 -- 還有讓電腦變快的那些方法
 
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018
開發人員必須知道的 Kubernetes 核心技術 - Kubernetes Summit 2018
 
OpenStack Introduction Ecosystem
OpenStack Introduction EcosystemOpenStack Introduction Ecosystem
OpenStack Introduction Ecosystem
 
Behavior+tree+ai lite
Behavior+tree+ai liteBehavior+tree+ai lite
Behavior+tree+ai lite
 

More from Sean Yu

AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineSean Yu
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingSean Yu
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningSean Yu
 
NTU DBME5028 Week5 Introduction to Machine Learning
NTU DBME5028 Week5 Introduction to Machine Learning NTU DBME5028 Week5 Introduction to Machine Learning
NTU DBME5028 Week5 Introduction to Machine Learning Sean Yu
 
Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Sean Yu
 
Baisc Deep Learning HandsOn
Baisc Deep Learning HandsOnBaisc Deep Learning HandsOn
Baisc Deep Learning HandsOnSean Yu
 
[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examplesSean Yu
 
日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講Sean Yu
 
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅Sean Yu
 
R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探Sean Yu
 

More from Sean Yu (10)

AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision Medicine
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
 
NTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer LearningNTU DBME5028 Week8 Transfer Learning
NTU DBME5028 Week8 Transfer Learning
 
NTU DBME5028 Week5 Introduction to Machine Learning
NTU DBME5028 Week5 Introduction to Machine Learning NTU DBME5028 Week5 Introduction to Machine Learning
NTU DBME5028 Week5 Introduction to Machine Learning
 
Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)Practical aspects of medical image ai for hospital (IRB course)
Practical aspects of medical image ai for hospital (IRB course)
 
Baisc Deep Learning HandsOn
Baisc Deep Learning HandsOnBaisc Deep Learning HandsOn
Baisc Deep Learning HandsOn
 
[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples[Taiwan AI Academy] Machine learning and deep learning application examples
[Taiwan AI Academy] Machine learning and deep learning application examples
 
日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講日常生活中的機器學習與 AI 應用 - 院區公開演講
日常生活中的機器學習與 AI 應用 - 院區公開演講
 
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
台灣人工智慧年會 - 兩位跨領域者的深度學習之旅
 
R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探R 語言教學: 探索性資料分析與文字探勘初探
R 語言教學: 探索性資料分析與文字探勘初探
 

[Python - Deep Learning] Data generator

  • 1. Data generator in Python for Keras (& Tensorflow) Sean Yu, 2017/08/17 1
  • 2. What is data generator and why we need it? • 假設大家都寫過 keras or tensorflow • Generally, we will have … • training set / validation set / testing set • We always load them ALL into memory • MNIST – 60k images, 28 x 28 x 1 • Cifar10 – 60k images, 32 x 32 x 3 • EASY! • However, if you have 100k+ 400 x 300 x 3 images, it is impossible to load them all. • We need to real-time load data 2
  • 3. In GPU memory In RAM (by CPU) In Disk Real-time data loading • With real-time data loading, we can do many manipulations on the data. • Augmentation • Add random noise • … • 你想對 data 幹什麼就幹什麼 • Principle of data generator • A infinite loop that can ‘yield’ data when being requested 3 Prepared data This batch (for model) Data in folder …
  • 4. Coding flow 4 Import basic libraries and customized functions Make data list & preload Define data/image generator Define model & model related parameters Train the model !! Evaluate the model
  • 5. Today’s example data • Kaggle, cats and dogs classification • https://www.kaggle.com/c/dogs-vs-cats • Keras blog 上面其實有類似的 example code 了, 改 寫一下而已 • Classification problem: cat or dog • Training set: 25k • Testing set 12.5k 5
  • 6. Note • Train generator 跟 Keras 的 Image generator 在 yield 打到怎麼辦? • 看 code, 簡單來說, break 它 6
  • 7. Note • Validation augmentation? • We only need to augment it at begin (not dynamically!) • 乾五郝? – 我的經驗, 好像有用捏 • 增加 validation 的難度 (複雜性) – 避免 validation 進步太 神速而太早被 earlystop 7
  • 8. Note • Testing set 可以做 augmentation 嗎? 8
  • 9. Note • Testing set 可以做 augmentation 嗎? • 本質上不是不同影像 9
  • 11. 其他 GPU 相關使用注意事項 • 在單張卡上, 限制每個 GPU memory fraction • 在多張卡的機器上 (如 server), 選定使用特定編號之 GPU • 關閉 jupyter notebook 占用之 GPU 空間 11
  • 12. 其他 GPU 相關使用注意事項 • 在單張卡上, 限制每個 GPU memory fraction • 在多張卡的機器上 (如 server), 選定使用特定編號之 GPU • 關閉 jupyter notebook 占用之 GPU 空間 12 import tensorflow as tf from keras.backend.tensorflow_backend import set_session config = tf.ConfigProto() config.gpu_options.per_process_gpu_memory_fraction = 0.5 # take 50% of gpu memory set_session(tf.Session(config=config))
  • 13. 其他 GPU 相關使用注意事項 • 在單張卡上, 限制每個 GPU memory fraction • 在多張卡的機器上 (如 server), 選定使用特定 編號之 GPU • 關閉 jupyter notebook 占用之 GPU 空間 13 - For Unix (python script) In the terminal, CUDA_VISIBLE_DEVICES=0 python your_script.py - For python script At the script begin Import os os.environ[‘CUDA_VISIBLE_DEVICES’] = 0 - For jupyter notebook At the notebook begin %env CUDA_VISIBLE_DEVICES=0
  • 14. 其他 GPU 相關使用注意事項 • 在單張卡上, 限制每個 GPU memory fraction • 在多張卡的機器上 (如 server), 選定使用特定編號之 GPU • 關閉 jupyter notebook 占用之 GPU 空間 14 把這段加在 notebook 最後並執行 %%javascript Jupyter.notebook.session.delete();