Accelerating Real-Time LiDAR Data Processing Using GPUs
    Presentation Transcript

    • Accelerating Real-time LiDAR Data Processing using GPUs. Vivek Venugopal and Suresh Kannan, United Technologies Research Center (venugov@utrc.utc.com). August 7, 2013. The 2013 56th Midwest Symposium on Circuits and Systems (MWSCAS-2013). This page contains no technical data subject to the EAR or the ITAR.
    • Introduction and Motivation
        • Unmanned Autonomous Vehicles (UAVs) are being considered for induction into the National Airspace System (NAS) to perform missions of vital importance to national security and defense, emergency management, science, and commercial applications.
        • Safety is critical, so there is a requirement for better sense-and-avoid concepts that are interoperable with the NAS.
    • LiDAR operation
        • The UAV is a commercial off-the-shelf Maxi-Joker with a gross takeoff weight of 25 lbs, including avionics and sensors. It carries a spinning LiDAR that provides 1024 points per frame at up to 40 frames/s.
        • The LiDAR sensor emits multiple rays in all directions, and these rays intersect various triangles in the scene.
        • Once the intersections are determined, they can be used to establish the location of the UAV and to identify the surrounding environment.
        [Figure: the UAV with its labeled components (monocular camera, Hokuyo UTM-30-LX LiDAR, flight control and navigation computer, planning computer), and a diagram of the LiDAR above the terrain showing the axis of rotation, the rays, and the ray intersection points.]
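The frame rate and point count quoted on this slide imply a processing budget for any real-time implementation. The following is a minimal sketch of that arithmetic, using only the two figures from the slide; the variable names are illustrative and not from the presentation.

```python
# Figures quoted on the slide.
POINTS_PER_FRAME = 1024
MAX_FRAMES_PER_SECOND = 40

# Peak number of LiDAR returns per second.
points_per_second = POINTS_PER_FRAME * MAX_FRAMES_PER_SECOND

# Time available to process one frame before the next one arrives.
frame_budget_ms = 1000.0 / MAX_FRAMES_PER_SECOND

# Average time available per point at the peak rate.
per_point_budget_us = 1_000_000.0 / points_per_second

print(points_per_second)               # 40960
print(frame_budget_ms)                 # 25.0
print(round(per_point_budget_us, 2))   # 24.41
```

At the peak rate, all ray-triangle tests for one frame would need to finish within about 25 ms to keep up with the sensor.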
    • Graphics Processing Unit (GPU)
        • NVIDIA's Kepler series GPU consists of a maximum of 15 streaming multiprocessor (SMX) units and up to six 64-bit memory controllers.
        • Existing algorithms need to be rewritten and optimized to increase core utilization while minimizing divergent threads.
        • The GeForce GT 650M GPU is clocked at 900 MHz and consists of 384 cores (2 SMX processors with 192 cores each) and 1 GB RAM.
        [Figure: CUDA execution model. The host (CPU) launches kernel 1 and kernel 2 on the device (GPU); each kernel runs as a grid of blocks, Block (0,0) to Block (1,2), and each block, for example Block (1,1), contains threads Thread (0,0) to Thread (2,3).]
        [Figure: SMX block diagram. Instruction cache, warp schedulers and dispatch units, register file (64K x 32-bit), four groups of 48 cores and 16 DP units, two groups of 16 LD/ST units and 16 SFUs, interconnection network, 64 KB shared memory / L1 cache, 48 KB read-only data cache, 16 texture units.]
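The grid, block, and thread hierarchy in the execution-model figure can be illustrated by flattening the two-level coordinates into one linear work-item index. The sketch below emulates this in plain Python rather than CUDA. It assumes the figure's labels are (row, column) pairs, giving a 2 x 3 grid of blocks and 3 x 4 threads per block; `global_thread_id` is a hypothetical helper, not part of any NVIDIA API.

```python
from itertools import product

GRID_DIM = (2, 3)   # blocks per grid, read as (rows, columns) from the figure
BLOCK_DIM = (3, 4)  # threads per block, read as (rows, columns) from the figure


def global_thread_id(block, thread):
    """Flatten (block, thread) coordinates into one linear work-item index."""
    block_linear = block[0] * GRID_DIM[1] + block[1]
    thread_linear = thread[0] * BLOCK_DIM[1] + thread[1]
    threads_per_block = BLOCK_DIM[0] * BLOCK_DIM[1]
    return block_linear * threads_per_block + thread_linear


# Enumerate every thread of every block, as a kernel launch would.
all_ids = [
    global_thread_id((br, bc), (tr, tc))
    for br, bc, tr, tc in product(
        range(GRID_DIM[0]), range(GRID_DIM[1]),
        range(BLOCK_DIM[0]), range(BLOCK_DIM[1]),
    )
]

print(len(all_ids))                        # 72
print(global_thread_id((1, 1), (2, 3)))    # 59
```

Each index is produced exactly once, which is what lets a kernel assign one unit of work (for example one triangle or one ray) to each thread without overlap.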
    • Implementation
        • CPU implementation: compute-intensive, as each triangle is visited multiple times for multiple ray directions.
        • GPU implementation:
            • Baseline implementation: unoptimized.
            • Memory-optimized implementation: caches the ray directions in GPU shared memory and uses the GPU's atomic functions to modify data in the shared memory location, which prevents race conditions.
        [Flowchart: copy triangles, ray positions, and ray directions to the GPU; then, in parallel for triangle 0 through triangle n: load the ray origin and ray directions into shared memory, calculate the distance from the vertex to the ray origin, calculate the determinant, calculate the barycentric coordinates, check whether the rays lie in the plane of the triangle, and calculate the intersection points; finally, copy the intersection points to the CPU.]
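The slide does not say which shared value the atomic functions protect. One common pattern in ray casting, assumed here purely for illustration, is keeping the nearest hit distance per ray while many triangles are tested concurrently (the role `atomicMin` plays in CUDA). The sketch below emulates that pattern with Python threads and a lock; `SharedNearestHit` and the hit data are hypothetical.

```python
import threading


class SharedNearestHit:
    """Per-ray nearest-hit distances guarded by a lock (stand-in for an atomic min)."""

    def __init__(self, num_rays):
        self._lock = threading.Lock()
        self.distance = [float("inf")] * num_rays

    def atomic_min(self, ray, candidate):
        # The read-compare-write sequence must not interleave between workers.
        with self._lock:
            if candidate < self.distance[ray]:
                self.distance[ray] = candidate


def worker(shared, hits):
    """One worker per triangle: report each (ray index, distance) hit it found."""
    for ray, dist in hits:
        shared.atomic_min(ray, dist)


# Hypothetical hits found by three triangles, as (ray index, distance) pairs.
hits_per_triangle = [
    [(0, 5.0), (1, 2.5)],
    [(0, 3.0)],
    [(1, 4.0), (2, 7.5)],
]

shared = SharedNearestHit(num_rays=3)
threads = [threading.Thread(target=worker, args=(shared, hits))
           for hits in hits_per_triangle]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.distance)  # [3.0, 2.5, 7.5]
```

Because taking a minimum is order-independent, the result is the same regardless of how the workers are scheduled, provided each update is atomic.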
    • Results - Unoptimized baseline implementation
        [Figure: CPU time (ms) and GPU time (ms) on a logarithmic execution-time axis (1 to 100000 ms) versus the number of ray directions per given LiDAR position (1, 128, 256, 512, 1024).]
        • Speedup labels on the chart, in axis order: 2.854x, 17.24x, 17.42x, 17.71x, 17.52x.
    • Results - Memory optimized implementation
        [Figure: CPU time (ms) and GPU time (ms) on a logarithmic execution-time axis (1 to 100000 ms) versus the number of ray directions per given LiDAR position (1, 128, 256, 512, 1024).]
        • Speedup labels on the chart, in axis order: 2.854x, 46.42x, 89x, 148x, 179x.
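The speedup labels on the two results charts can be compared directly to see how much the memory optimization adds over the baseline GPU version. This sketch assumes the labels correspond, in order, to 1, 128, 256, 512, and 1024 ray directions; the transcript lists them in that order but does not state the pairing explicitly.

```python
ray_counts = [1, 128, 256, 512, 1024]

# GPU-over-CPU speedups as labeled on the two charts.
baseline_speedup = [2.854, 17.24, 17.42, 17.71, 17.52]
optimized_speedup = [2.854, 46.42, 89.0, 148.0, 179.0]

# Gain of the memory-optimized version relative to the baseline GPU version.
relative_gain = [opt / base
                 for opt, base in zip(optimized_speedup, baseline_speedup)]

for n, gain in zip(ray_counts, relative_gain):
    print(f"{n:5d} ray directions: {gain:.2f}x over baseline GPU")
# Gains: 1.00x, 2.69x, 5.11x, 8.36x, 10.22x
```

Under that pairing, the baseline speedup flattens near 17x while the optimized speedup keeps growing with the number of ray directions, reaching about 10x the baseline at 1024.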
    • Results
        [Figure: rendered scene showing the LiDAR; yellow dots indicate LiDAR intersections.]
    • Conclusion
        • The optimized shared-memory implementation achieves roughly 10x the speedup of the naive GPU implementation (179x versus 17.52x over the CPU, the last labels on the two results charts).
        • Future research may also address a customized FPGA implementation in order to compare latency and throughput between GPUs and FPGAs.
    • Backup
    • Algorithm
        for r = 0 to R, where R is the maximum number of rays in a given LiDAR position, do
            for each triangle $\in$ scene do
                $\vec{t} = O - X$
                $edge_1 = Y - X$
                $edge_2 = Z - X$
                $\vec{p} = \vec{D} \times edge_2$, where $\times$ denotes the cross product
                $det = edge_1 \cdot \vec{p}$
                $det = 1 / det$
                $u = (\vec{t} \cdot \vec{p}) * det$
                $\vec{q} = \vec{t} \times edge_1$
                $v = (\vec{D} \cdot \vec{q}) * det$
                if ($u < 0.0$ || $u > 1.0$) then
                    return 0
                else if ($v < 0.0$ || $(u + v) > 1.0$) then
                    return 0
                else
                    return $(edge_2 \cdot \vec{q}) * det$
                end if
            end for
        end for
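The per-triangle test on this slide has the shape of the Möller-Trumbore ray-triangle intersection. The following is a minimal CPU sketch in plain Python, not the authors' GPU kernel. It assumes O is the ray origin, D the ray direction, and X, Y, Z the triangle vertices, which the slide does not define. It also departs from the slide in two labeled ways: it returns `None` instead of 0 on a miss, and it adds a guard for a near-zero determinant. The helper names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_distance(O, D, X, Y, Z, eps=1e-9):
    """Return the distance along D from O to triangle XYZ, or None on a miss."""
    t = sub(O, X)
    edge1 = sub(Y, X)
    edge2 = sub(Z, X)
    p = cross(D, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:
        # Assumption (not on the slide): ray parallel to the triangle plane.
        return None
    inv_det = 1.0 / det
    u = dot(t, p) * inv_det          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t, edge1)
    v = dot(D, q) * inv_det          # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    return dot(edge2, q) * inv_det


# Unit right triangle in the z = 0 plane, ray pointing straight down at it.
X, Y, Z = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
hit = ray_triangle_distance((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), X, Y, Z)
miss = ray_triangle_distance((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), X, Y, Z)
print(hit)    # 1.0
print(miss)   # None
```

The intersection point itself follows as O plus the returned distance times D. Like the slide's pseudocode, this sketch does not reject hits behind the ray origin (negative distances).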