CUDA Speedup for DVC System
Members: 黃琮閔, 王品翔, 呂侃翰 (698470271)
Presented by 王品翔
GPU Programming, final presentation
Topic Review
Low-complexity encoder, high-complexity decoder
Project Goal
-> Side Information (SI) Generation: motion estimation procedure
DISCOVER codec: LDPC, SI Generation
LDPC -> CUDA speedup by 小小白 (a senior labmate)
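The slides name motion estimation as the core of SI generation but do not show how it is computed. As a rough illustration only, the sketch below assumes a plain full-search block-matching scheme with a sum-of-absolute-differences (SAD) cost. The function names, block size, and search range are hypothetical and are not taken from the DISCOVER codec or from the presenters' code.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Frames are lists of rows of pixel values."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search] on both axes.
    Returns ((dx, dy), cost) for the lowest-SAD candidate inside `ref`."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_cost = cost
                best = (dx, dy)
    return best, best_cost


# Hypothetical 8 x 8 test frames: `cur` is `ref` shifted by (1, 2).
ref = [[17 * y + x * x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 2, 7)][min(x + 1, 7)] for x in range(8)] for y in range(8)]
print(best_motion_vector(cur, ref, 2, 2, 2, 2))
```

In a scheme like this, every block and every candidate displacement can be evaluated independently of the others, which is the kind of workload that maps naturally onto GPU threads.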
Step 1) Low-pass Filter: 2 ms
Step 2) Upsampling (FIR Filter): 63 ms (12%)
Step 3) Forward Motion Estimation: 442 ms (87%)
Step 4) Bidirectional Motion Estimation: 1 ms
Step 5) Motion Filter and Compensation: 1 ms
Total (avg.) = 510 ms
CUDA speedup! Step 3 -> 10 ms, total -> 77 ms
Step 1) Low-pass Filter: 2 ms (4%)
Step 2) Upsampling (FIR Filter): 63 ms (78%)
Step 3) Forward Motion Estimation: 10 ms (13%)
Step 4) Bidirectional Motion Estimation: 1 ms (2%)
Step 5) Motion Filter and Compensation: 1 ms (2%)
Total (avg.) = 77 ms
CUDA speedup! Step 2 -> 7 ms, total -> 21 ms (sequential / parallel = 24)
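The totals and the final ratio quoted on the timing slides can be rechecked with a few lines of arithmetic. The sketch below uses only the per-step figures from the slides. The dictionary and helper names are made up for illustration, and the rounded per-step values sum to 509 ms rather than the quoted average of 510 ms.

```python
# Per-step times in ms, as quoted on the slides for the sequential run.
sequential = {
    "Low-pass Filter": 2,
    "Upsampling (FIR Filter)": 63,
    "Forward Motion Estimation": 442,
    "Bidirectional Motion Estimation": 1,
    "Motion Filter and Compensation": 1,
}


def total(times):
    """Sum of all step times in ms."""
    return sum(times.values())


def share(times, step):
    """Share of the total spent in one step, as a rounded percentage."""
    return round(100 * times[step] / total(times))


# First pass: only forward motion estimation runs on the GPU (442 ms -> 10 ms).
me_on_gpu = dict(sequential, **{"Forward Motion Estimation": 10})

# Second pass: upsampling is moved to the GPU as well (63 ms -> 7 ms).
both_on_gpu = dict(me_on_gpu, **{"Upsampling (FIR Filter)": 7})

print(total(sequential), total(me_on_gpu), total(both_on_gpu))
print(share(sequential, "Forward Motion Estimation"))
print(round(total(sequential) / total(both_on_gpu)))
```

The rounded ratio comes out at 24 whether the sequential total is taken as the 509 ms sum or the quoted 510 ms average, which matches the "sequential / parallel = 24" figure on the slide.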
Demo
1. Sequential Mode
2. CUDA Mode
Thank you



Editor's Notes

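  1. Why we want to accelerate this DVC codec: decoding is the high-complexity side, so we try to find the main CO and accelerate it.
  2. The part we actually accelerate is SI generation in the DVC decoder. This part is basically motion estimation (ME). In a conventional codec it is the most time-consuming part, except that here it is moved to the decoder. However, the most time-consuming part of DVC is in [...]. A senior labmate of mine works on accelerating the LDPC part of DVC, and we modify the SI part on top of her system, mainly for the sake of [...] comparison.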