
Bp.On.Cuda


Belief Propagation Algorithm using CUDA

Published in: Education, Technology

  1. Disparity-Map Generation using GPUs
     Yan Xu
     Tutor: Hui Chen
     School of Information Science and Engineering
     Aug. 1, 2009
  2. Tsukuba Right Image
     Tsukuba Left Image
     Ground Truth
  3. Overview
     Disparity-Map in Stereo Vision
     Parallel Programming
     Programming on GPUs
     Belief Propagation
     BP on CUDA
     Experiment Results
     Conclusions and Future Work
  4. Disparity-Map Generation
     Pipeline: Calibration → Rectification → Stereo Matching → Disparity Map
  5. Disparity-Map Generation
     Local Algorithms
     Belief Propagation
     Graph Cut
     Dynamic Programming
  6. Tsukuba Left Image
     Tsukuba Right Image
     Ground Truth
     Disparity Image by BP (P. Felzenszwalb)
     Disparity Image by DP
     Disparity Image by GC (Kolmogorov)
  7. Parallel Programming
     Serial Programming vs. Parallel Programming
  8. Parallel Programming
     Traditionally, software has been written for serial computation:
     • To be run on a single computer having a single Central Processing Unit (CPU).
     • A problem is broken into a discrete series of instructions.
     • Instructions are executed one after another.
     • Only one instruction may execute at any moment in time.
  9. Parallel Programming
     In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
     • To be run using multiple CPUs.
     • A problem is broken into discrete parts that can be solved concurrently.
     • Each part is further broken down into a series of instructions.
     • Instructions from each part execute simultaneously on different CPUs.
  10. Serial vs. Parallel (diagram)
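To make the serial/parallel contrast concrete, here is a minimal sketch (not from the slides) of the same element-wise addition written once as a serial CPU loop and once as a CUDA kernel, where each GPU thread handles a single element:

```cuda
// Serial version: one CPU thread walks the array element by element.
void add_serial(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Parallel version: each GPU thread computes exactly one element.
__global__ void add_parallel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}
```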
  11. Programming on GPUs
      CPU (Host)
      GPU (Device)
  12. Programming on GPUs
      (Diagram: CUDA execution and memory hierarchy. The host launches kernels on the device as a grid of thread blocks; each thread has its own registers and local memory, each block shares per-block shared memory, and all blocks access the device's global, constant, and texture memory.)
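A small sketch (illustrative, not from the slides) showing this hierarchy in use: each thread reads from global memory, the block stages values in shared memory, and `__syncthreads()` orders accesses within the block:

```cuda
// Per-block reduction: global memory -> shared memory -> one result per block.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[256];          // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // global -> shared
    __syncthreads();                             // wait for the whole block

    // Tree reduction inside the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block -> global
}
```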
  13. Programming on GPUs
      int main() {
          // Allocate memory on the GPU
          float *Md;
          cudaMalloc((void**)&Md, size);
          // Copy data from CPU to GPU
          cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
          // Call the GPU kernel function
          kernel<<<dimGrid, dimBlock>>>(arguments);
          // Copy the result from GPU back to CPU
          cudaMemcpy(M, Md, size, cudaMemcpyDeviceToHost);
          // Free device memory
          cudaFree(Md);
      }
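The skeleton on this slide leaves the kernel, the data, and the launch configuration abstract. A self-contained version, assuming a trivial element-doubling kernel purely for illustration, might look like:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the slides): doubles each element in place.
__global__ void double_elements(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    const size_t size = n * sizeof(float);
    float M[1024];
    for (int i = 0; i < n; ++i) M[i] = 1.0f;

    float *Md;
    cudaMalloc((void**)&Md, size);                     // allocate on GPU
    cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);   // host -> device
    double_elements<<<(n + 255) / 256, 256>>>(Md, n);  // launch kernel
    cudaMemcpy(M, Md, size, cudaMemcpyDeviceToHost);   // device -> host
    cudaFree(Md);                                      // free GPU memory

    printf("M[0] = %f\n", M[0]);
    return 0;
}
```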
  14. Programming on GPUs
      • CUDA (Compute Unified Device Architecture) is a computing architecture developed by NVIDIA to use the graphics processing unit as a general-purpose parallel processor.
      NVIDIA GeForce 8800
  15. Belief Propagation Algorithm
      m labels, s sites
      Energy = data costs + discontinuity costs
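In the standard MRF formulation that this slide refers to (the notation below is the conventional one from Felzenszwalb and Huttenlocher, not taken from the slides), the energy over a labeling $f$ of the sites $\mathcal{P}$ with neighborhood system $\mathcal{N}$ is

```latex
E(f) = \sum_{p \in \mathcal{P}} D_p(f_p) \;+\; \sum_{(p,q) \in \mathcal{N}} V(f_p, f_q)
```

where $D_p(f_p)$ is the data cost of assigning label $f_p$ to site $p$ and $V(f_p, f_q)$ is the discontinuity cost between neighboring labels. Min-sum belief propagation minimizes this energy approximately by iterating the message update

```latex
m^{t}_{p \to q}(f_q) = \min_{f_p} \Bigl( D_p(f_p) + V(f_p, f_q)
    + \sum_{s \in \mathcal{N}(p) \setminus \{q\}} m^{t-1}_{s \to p}(f_p) \Bigr)
```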
  16. 16. Belief Propagation Algorithm<br />
  17. Belief Propagation on CUDA
      1. Allocate GPU global memory
      2. Load the original images (left and right) into GPU global memory
      3. (If real-world images) Pre-process the images with Sobel / Residual filters
      4. Calculate the data cost
      5. Build the data (Gaussian) pyramid
      6. Pass messages over the created pyramid
      7. Compute the disparity map from the messages and the data cost
      8. Copy the disparity map back to local (host) memory
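Step 4 above maps naturally to one GPU thread per pixel. A sketch of such a data-cost kernel (illustrative only, not the authors' code) using a truncated absolute-difference cost, with the label count and truncation threshold as assumed constants:

```cuda
// Illustrative data-cost kernel: one thread per pixel computes the cost of
// every disparity hypothesis and writes it to global memory.
__global__ void data_cost_kernel(const unsigned char *left,
                                 const unsigned char *right,
                                 float *cost,           // [height][width][LABELS]
                                 int width, int height) {
    const int LABELS = 16;      // number of disparity labels (assumed)
    const float TRUNC = 15.0f;  // data-cost truncation threshold (assumed)

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    for (int d = 0; d < LABELS; ++d) {
        int xr = max(x - d, 0);  // matching pixel in the right image, shifted by d
        float diff = fabsf((float)left[y * width + x] -
                           (float)right[y * width + xr]);
        cost[(y * width + x) * LABELS + d] = fminf(diff, TRUNC);  // truncated cost
    }
}
```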
  18. Experiment Results (video)
  19. Experiment Results: Original
  20. Experiment Results: Sobel
  21. Experiment Results (video): Residual
  22. Conclusions and Future Work
      • Improve Belief Propagation (faster and better)
      • Implement other stereo algorithms in parallel (such as DP, GC…)
      • Apply the algorithm to stereo images captured by Truck
  23. Thank you for your attention!
      Questions?
