[232] Optimizing Deep Learning Inference with TensorRT
NAVER D2
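14.
Step 1: Convert the TF model to the TRT format
Step 2: Create the model parser
Step 3: Provide the input/output layer information
Step 4: Optimize the model and build the runtime engine
Step 5: Save the engine to a file
Step 6: Load the engine from the file
Step 7: Run inference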
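Steps 5 and 6 amount to writing an opaque byte blob to disk and reading it back. The following is a minimal sketch in plain C++, assuming the blob has already been obtained from the engine (in TensorRT, `ICudaEngine::serialize()` returns such a buffer). `saveBlob` and `loadBlob` are hypothetical helper names used only for illustration; they are not part of the TensorRT API.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write a serialized engine (treated as an opaque byte blob) to `path`.
// Returns false if the file cannot be opened or the write fails.
bool saveBlob(const std::string& path, const std::vector<char>& blob) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return static_cast<bool>(out);
}

// Read the whole file back into memory. Returns an empty vector if the
// file cannot be opened.
std::vector<char> loadBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The loaded bytes and their size are the kind of arguments that slide 35 passes to `deserializeCudaEngine`.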
24.
PReLUPlugin::PReLUPlugin(const Weights *weights, int nbWeights)
{
    mWeights = weights[0];
    mWeights.values = malloc(mWeights.count * type2size(mWeights.type));
    memcpy(const_cast<void *>(mWeights.values), weights[0].values,
           mWeights.count * type2size(mWeights.type));
}
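The constructor relies on a `type2size` helper that the slides do not show. A minimal sketch of what such a helper might look like follows. The local `DataType` enum is a stand-in for `nvinfer1::DataType` so that the example compiles without TensorRT, and the byte sizes are the usual ones for these element types; the speaker's actual helper may differ.

```cpp
#include <cstddef>

// Stand-in for nvinfer1::DataType (assumption: only these four values matter here).
enum class DataType { kFLOAT, kHALF, kINT8, kINT32 };

// Size in bytes of one element of the given type.
inline std::size_t type2size(DataType type) {
    switch (type) {
        case DataType::kFLOAT: return 4;  // 32-bit float
        case DataType::kHALF:  return 2;  // 16-bit float
        case DataType::kINT8:  return 1;  // 8-bit integer
        case DataType::kINT32: return 4;  // 32-bit integer
    }
    return 0;  // unreachable for valid enum values
}
```

With this helper, the buffer allocated in the constructor is `count * type2size(type)` bytes, for example 128 bytes for 64 half-precision slopes.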
25.
int PReLUPlugin::enqueue(int batchSize, const void *const *inputs, void **outputs,
                         void *workspace, cudaStream_t stream)
{
    const float zerof{0.0f};
    const __half zeroh = fp16::__float2half(0.0f);
    if (mWeights.type == DataType::kFLOAT) {
        CHECK(Forward_gpu<float>(batchSize * mNbInputCount,
                                 mNbInputChannels,
                                 mNbInputHeight * mNbInputWidth,
                                 reinterpret_cast<const float *>(mDeviceKernel),
                                 reinterpret_cast<const float *>(inputs[0]),
                                 reinterpret_cast<float *>(outputs[0]),
                                 zerof,
                                 mChannelShared ? mNbInputChannels : 1,
                                 stream));
    } else {
        // DataType::kHALF: the same call instantiated with __half and zeroh (omitted here)
    }
    return 0;
}
26.
template <typename Ftype>
__global__ void PReLUForward(const int n, const int channels, const int dim,
                             const Ftype *slope_data, const Ftype *in, Ftype *out,
                             const Ftype zero, const int div_factor)
{
    CUDA_KERNEL_LOOP(index, n) {
        int c = (index / dim) % channels / div_factor;
        out[index] = (in[index] > (Ftype(zero)))
                         ? in[index]
                         : in[index] * *(reinterpret_cast<const Ftype *>(slope_data) + c);
    }
}
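A CPU reference that mirrors the kernel's indexing can be handy for checking the GPU output. The sketch below assumes an NCHW layout with `dim = H * W`, and that `div_factor` equals `channels` when a single slope is shared across channels and 1 otherwise (which matches how `enqueue` passes it). `preluReference` is a hypothetical name, not part of the presented plugin.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the PReLU forward pass: out = in if in > 0, else in * slope[c].
// The channel index c is derived from the flat index exactly as in the kernel.
std::vector<float> preluReference(const std::vector<float>& in, int channels, int dim,
                                  const std::vector<float>& slope, int div_factor) {
    std::vector<float> out(in.size());
    for (std::size_t index = 0; index < in.size(); ++index) {
        const int channel = static_cast<int>((index / static_cast<std::size_t>(dim)) %
                                             static_cast<std::size_t>(channels));
        const int c = channel / div_factor;  // 0 for every channel when the slope is shared
        out[index] = in[index] > 0.0f ? in[index] : in[index] * slope[c];
    }
    return out;
}
```

For a two-channel input `{1, -2, 3, -4}` with `dim = 2` and per-channel slopes `{0.5, 0.25}`, the negative entries are scaled by their own channel's slope while the positive entries pass through unchanged.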
27.
template <typename Ftype>
cudaError_t Forward_gpu(const int count, const int channels, const int dim,
                        const Ftype *mDeviceKernel, const Ftype *bottom_data, Ftype *top_data,
                        const Ftype zero, const int div_factor, const cudaStream_t stream)
{
    PReLUForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS, 0, stream>>>(
        count, channels, dim, mDeviceKernel, bottom_data, top_data, zero, div_factor);
    return cudaGetLastError();
}
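The launch uses Caffe-style macros that are not defined on the slides. As a rough host-side illustration, `CAFFE_GET_BLOCKS` is a ceiling division by the block size, and `CUDA_KERNEL_LOOP` is a grid-stride loop. The sketch below simulates both in plain C++; the value 512 for the threads per block is an assumption based on Caffe's usual default, and `getBlocks` and `simulateGridStride` are hypothetical names.

```cpp
#include <vector>

// Assumption: threads per block, matching Caffe's customary default.
constexpr int kNumThreads = 512;

// Number of blocks so that blocks * threads >= n (ceiling division).
inline int getBlocks(int n, int threads = kNumThreads) {
    return (n + threads - 1) / threads;
}

// Host-side simulation of a grid-stride loop. Each simulated thread starts at its
// global index and advances by the total number of threads in the grid.
// Returns how many times each element index was visited.
std::vector<int> simulateGridStride(int n, int blocks, int threads) {
    std::vector<int> visits(n, 0);
    for (int b = 0; b < blocks; ++b) {
        for (int t = 0; t < threads; ++t) {
            for (int i = b * threads + t; i < n; i += blocks * threads) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

Every index is visited exactly once whether the grid covers the whole range or is smaller than it, which is why the kernel stays correct for any launch configuration.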
29.
IPluginExt *PReLUPlugin::clone() const
{
    return new PReLUPlugin(&mWeights, 1);
}

// Called by the Caffe parser for each layer that needs a plugin.
IPlugin *PluginFactory::createPlugin(const char *layerName, const Weights *weights, int nbWeights)
{
    return new PReLUPlugin(weights, nbWeights);
}
30.
PluginFactory parserPluginFactory;
parser->setPluginFactoryExt(&parserPluginFactory);
const IBlobNameToTensor *blobNameToTensor =
    parser->parse(gParams.deployFile.c_str(),  // caffe deploy file
                  gParams.modelFile.c_str(),   // caffe model file
                  *network,                    // network definition that the parser will populate
                  gParams.fp16 ? DataType::kHALF : DataType::kFLOAT);
31.
builder->setMaxBatchSize(gParams.batchSize);
builder->setMaxWorkspaceSize(size_t(gParams.workspaceSize) << 20);
builder->setFp16Mode(gParams.fp16);
ICudaEngine *engine = builder->buildCudaEngine(*network);
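The expression `size_t(gParams.workspaceSize) << 20` converts a size given in MiB into bytes. A small sketch of that arithmetic follows, assuming the parameter is a plain `int` number of MiB; `mibToBytes` is a hypothetical helper name. Widening to `size_t` before shifting matters because shifting a 32-bit signed `int` holding 2048 or more would overflow.

```cpp
#include <cstddef>

// Convert a size in MiB to bytes (1 MiB = 2^20 bytes).
// The cast happens before the shift so the shift is done in an unsigned type
// at least as wide as size_t rather than in a 32-bit signed int.
inline std::size_t mibToBytes(int mib) {
    return static_cast<std::size_t>(mib) << 20;
}
```

For instance, a 16 MiB workspace corresponds to 16777216 bytes.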
32.
void PReLUPlugin::serialize(void *buffer)
{
    char *d = static_cast<char *>(buffer), *a = d;
    write(d, mNbInputChannels);
    write(d, mNbInputHeight);
    write(d, mNbInputWidth);
    write(d, mNbInputCount);
    write(d, mChannelShared);
    write(d, mWeights.count);
    write(d, mWeights.type);
    convertAndCopyToBuffer(d, mWeights);
    assert(d == a + getSerializationSize());
}
33.
PReLUPlugin::PReLUPlugin(const void *data, size_t length)
{
    const char *d = static_cast<const char *>(data), *a = d;
    read<int>(d, mNbInputChannels);
    read<int>(d, mNbInputHeight);
    read<int>(d, mNbInputWidth);
    read<int>(d, mNbInputCount);
    read<bool>(d, mChannelShared);
    read<int64_t>(d, mWeights.count);
    read<DataType>(d, mWeights.type);
    mWeights.values = malloc(mWeights.count * type2size(mWeights.type));
    memcpy(const_cast<void *>(mWeights.values), d, mWeights.count * type2size(mWeights.type));
    deserializeToDevice(d, mDeviceKernel, mWeights.count * type2size(mWeights.type));
    assert(d == a + length);
}
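Both `serialize` and the deserializing constructor depend on `write` and `read` helpers that the slides do not define. A minimal sketch of cursor-advancing helpers with the same call shape is given below. It is an assumption about how they behave, not the speaker's code; `std::memcpy` is used here so that unaligned buffer positions are handled safely.

```cpp
#include <cstdint>
#include <cstring>

// Copy `val` into the buffer at the cursor and advance the cursor past it.
template <typename T>
void write(char*& buffer, const T& val) {
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}

// Copy a T from the buffer at the cursor into `val` and advance the cursor.
template <typename T>
void read(const char*& buffer, T& val) {
    std::memcpy(&val, buffer, sizeof(T));
    buffer += sizeof(T);
}
```

Because fields are stored back to back with no tags, the reader has to consume them in exactly the order and with exactly the types the writer used, which is what the closing `assert` on the cursor position checks in both functions.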
34.
// Called by the runtime when a serialized engine that contains the plugin is loaded.
IPlugin *PluginFactory::createPlugin(const char *layerName, const void *serialData, size_t serialLength)
{
    return new PReLUPlugin(serialData, serialLength);
}
35.
PluginFactory pluginFactory;
engine = infer->deserializeCudaEngine(trt_plan_file, size, &pluginFactory);
36.
cudaStreamCreate(&stream);
IExecutionContext *context = engine->createExecutionContext();

// Copy the input to the device, run inference, and copy the result back, all on one stream.
cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_SIZE * sizeof(float),
                cudaMemcpyHostToDevice, stream);
context->enqueue(gParams.batchSize, &buffers[0], stream, nullptr);
cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float),
                cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);