GPU_based Searching

Search Optimization using GPU
February 2010
J.Pavan Kumar
T.Polender Reddy
G.Sravan Kumar
T.Raghava Jyothi

Agenda
• Introduction
▫ History
▫ Why ‘String Searching’?
▫ Generations of sequential search
▫ Searching through the GPU approach
▫ Architecture of a GPU system
▫ CPU vs. GPU
▫ Features
▫ Algorithm implementation and output analysis
September 28, 2016

History
 1980’s – No GPU. PC used VGA controller.
 1990’s – Add more function into VGA controller.
 1997 – 3D acceleration functions:
a) Hardware for triangle setup and rasterization.
b) Texture mapping.
c) Shading.
 2000 – A single chip graphics processor ( beginning
of GPU term).
 2005 – Massively parallel programmable
processors.
 2007 – CUDA (Compute Unified Device
Architecture).

GPU Computing Application Areas

Why ‘String Searching’ ?
Text: ATGCATGCATGACTAG
Search pattern: AGCTAGA
• String matching is a very important subject in the wider domain of text processing.
• String-matching algorithms are basic components in the implementation of practical software on most operating systems.

Generations
CPU
Sequential
Message Passing Interface (MPI)
Open MPI
PThreads
Multi Core
Single Core

Sequential Search Pain Points
• Performance
• Power consumption
• Heat generation
• Accuracy
• Time complexity

GPU Vision
Researchers
Customers
Business Applications
(Programming, Gaming, custom . . .)
• Computer vision tasks are computationally intensive and repetitive, and they often exceed the real-time capabilities of the CPU, leaving little time for higher-level tasks.
• However, many computer vision operations map efficiently onto the modern GPU, whose programmability allows a wide variety of computer vision algorithms to be implemented.

Technology Convergence
Single Core Systems
Sequential Searching “CPU”
MultiCore Systems
Search using “Pthreads”
Search using “MPI” & “Open MP”
CUDA Architecture
Graphics Processing Unit (GPU)
Portal, Virtual, and More
2005
2007
2000

What is a GPU?
• A Graphics Processing Unit (GPU) is a processor optimized for 2D/3D graphics, video, visual computing, and display.
• It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
• It provides real-time visual interaction with computed objects via graphics, images, and video.
• It serves as both a programmable graphics processor and a scalable parallel computing platform.
• Heterogeneous systems combine a GPU with a CPU.

Compute Unified Device Architecture
• CUDA is a scalable parallel programming model and a software environment for parallel computing.
• Minimal extensions to the familiar C/C++ environment.
• Heterogeneous serial-parallel programming model.
• CUDA also maps well to multicore CPUs.

CUDA C Example: Add Arrays

Logical Architecture of a GPU System

Physical Architecture
Strengths
• Fast & easy
• Network considerations
Limitations
• Limited data & user load capability, memory bandwidth
• Availability & reliability
GPU
Availability
Performance
Resource requirements:
• 5 KB of shared memory (SMEM) per block
• 30 registers
• 128 threads per block

Sequential CPU vs. Graphics Processing Unit (GPU)
CPU
• Optimized for low-latency access to cached data sets.
• Control logic for out-of-order and speculative execution.
GPU
• Optimized for data-parallel, throughput computation.
• Architecture tolerant of memory latency.
• More transistors dedicated to computation.

Testing - Matrices
• Test the multiplication of two matrices.
• Creates two matrices with random floating-point values.
• We tested with matrices of various dimensions…
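
The CPU side of such a test can be sketched as follows. This is a minimal C++ reference, not the code behind the measurements in this deck: the function names `random_matrix` and `matmul`, the row-major layout, and the `seed` parameter are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Build an n x n row-major matrix of random floats between 0 and 1.
// The seed parameter is hypothetical, added so runs are repeatable.
std::vector<float> random_matrix(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> m(n * n);
    for (float& x : m) x = dist(gen);
    return m;
}

// Naive O(n^3) product C = A * B of two n x n row-major matrices.
// Every element of C can be computed independently of the others,
// which is the property a GPU version would exploit.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b, std::size_t n) {
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Timing `matmul(random_matrix(n, 1), random_matrix(n, 2), n)` for growing `n` gives a sequential baseline of the kind the CPU column reports.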

Results:
Dimension    CUDA (GPU) time   CPU time
64x64        0.417465 ms       18.0876 ms
128x128      0.41691 ms        18.3007 ms
256x256      2.146367 ms       145.6302 ms
512x512      8.093004 ms       1494.7275 ms
768x768      25.97624 ms       4866.3246 ms
1024x1024    52.42811 ms       66097.1688 ms
2048x2048    407.648 ms        No results
4096x4096    3.1 seconds       No results

Features
• Random access to memory
▫ Thread can access any memory location
• Unlimited access to memory
▫ Thread can read/write as many locations as needed
• User-managed cache (per block)
▫ Threads can cooperatively load data into SMEM
▫ Any thread can then access any SMEM location
• Low learning curve
▫ Just a few extensions to C
▫ No knowledge of graphics is required
• No graphics API overhead

Brute Force Algorithm
• No preprocessing phase.
• Constant extra space needed.
• Always shifts the window by exactly 1 position to the right.
• Comparisons can be done in any order.
• Searching phase in O(mn) time complexity, for a pattern of length m and a text of length n.
• 2n expected text character comparisons.
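
The properties listed above can be sketched in C++ as follows. This is a CPU reference only, assuming hypothetical helper names `matches_at` and `brute_force_search`. The check at each window position is independent of every other position, which is why the comparisons can be done in any order; a GPU version could plausibly assign one position to each thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if `pattern` occurs in `text` starting at index `pos`.
// Uses no state shared with other positions, so calls can run in any order.
bool matches_at(const std::string& text, const std::string& pattern,
                std::size_t pos) {
    if (pos + pattern.size() > text.size()) return false;
    for (std::size_t j = 0; j < pattern.size(); ++j) {
        if (text[pos + j] != pattern[j]) return false;
    }
    return true;
}

// Slide the window one position at a time and record every match.
// No preprocessing; the only extra space is the result list.
std::vector<std::size_t> brute_force_search(const std::string& text,
                                            const std::string& pattern) {
    std::vector<std::size_t> hits;
    if (pattern.empty() || pattern.size() > text.size()) return hits;
    for (std::size_t i = 0; i + pattern.size() <= text.size(); ++i) {
        if (matches_at(text, pattern, i)) hits.push_back(i);
    }
    return hits;
}
```

For example, searching the text `ATGCATGCATGACTAG` for `ATGC` reports matches at positions 0 and 4.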

Example
Text
Search Pattern

Boyer-Moore Algorithm
• Performs the comparisons from right to left.
• Preprocessing phase in O(m + σ) time and space complexity, where σ is the alphabet size.
• Searching phase in O(mn) time complexity.
• 3n text character comparisons in the worst case when searching for a non-periodic pattern.
• O(n / m) best performance.
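
A simplified C++ sketch of the right-to-left scan follows. It uses only the bad-character rule and omits the good-suffix rule of full Boyer-Moore, so it illustrates the idea but does not carry the 3n worst-case bound. The function name `boyer_moore_bad_char` and the 256-entry byte alphabet are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Boyer-Moore search using only the bad-character rule (simplified variant).
std::vector<std::size_t> boyer_moore_bad_char(const std::string& text,
                                              const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) return hits;

    // Preprocessing, O(m + alphabet size): last index of each byte in the
    // pattern, or -1 if the byte does not occur.
    std::array<std::ptrdiff_t, 256> last;
    last.fill(-1);
    for (std::size_t j = 0; j < m; ++j) {
        last[static_cast<unsigned char>(pattern[j])] =
            static_cast<std::ptrdiff_t>(j);
    }

    std::size_t s = 0;  // current alignment of the pattern against the text
    while (s + m <= n) {
        // Compare from the right end of the window toward the left.
        std::ptrdiff_t j = static_cast<std::ptrdiff_t>(m) - 1;
        while (j >= 0 && pattern[j] == text[s + j]) --j;
        if (j < 0) {
            hits.push_back(s);
            s += 1;  // conservative shift so overlapping matches are found
        } else {
            // Align the mismatched text byte with its last occurrence in the
            // pattern; fall back to a shift of 1 if that would move left.
            const unsigned char c = static_cast<unsigned char>(text[s + j]);
            const std::ptrdiff_t shift = j - last[c];
            s += static_cast<std::size_t>(shift > 1 ? shift : 1);
        }
    }
    return hits;
}
```

When the mismatched text character does not occur in the pattern at all, the window jumps past it, which is where the O(n / m) best case comes from.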

Example
Text
Search Pattern

Results: searching for a four-character string
Data size   Brute Force GPU   Brute Force CPU   Boyer-Moore GPU   Boyer-Moore CPU
64 MB       135.48 sec        145.7 sec         219.35 sec        208.76 sec
128 MB      279.86 sec        327 sec           388.69 sec        382.7 sec
256 MB      520.55 sec        575.42 sec        759.52 sec        753.21 sec
512 MB      1045.33 sec       1133 sec          970.56 sec        1520.71 sec
768 MB      1258.67 sec       1689 sec          1105.6 sec        2258.4 sec
1 GB        1389.96 sec       1896.45 sec       1506.34 sec       3018.2 sec
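
One way to read these timings is as a speedup ratio, CPU time divided by GPU time. The C++ sketch below computes it for the Boyer-Moore columns; the type `Timing` and the function names are hypothetical, and the numbers are copied from the table above.

```cpp
#include <vector>

// One row of measurements, in seconds.
struct Timing {
    double gpu_sec;
    double cpu_sec;
};

// Speedup of the GPU run over the CPU run; values above 1 mean the GPU won.
double speedup(const Timing& t) { return t.cpu_sec / t.gpu_sec; }

// Boyer-Moore GPU and CPU timings from the table, 64 MB through 1 GB.
std::vector<Timing> boyer_moore_rows() {
    return {{219.35, 208.76},  {388.69, 382.7},  {759.52, 753.21},
            {970.56, 1520.71}, {1105.6, 2258.4}, {1506.34, 3018.2}};
}
```

With these figures the ratio is slightly below 1 up to 256 MB, meaning the GPU run was marginally slower there, and it reaches roughly 2 at 1 GB.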

Graphs: Brute Force
Graph 1 Graph 2

Graphs: Boyer-Moore
Graph 1 Graph 2

Conclusion
• GPU computing has gathered tremendous interest as a solution that can handle complex computational problems and massive data sets in real time.
• At the core of GPU computing, the C-based programming environment, CUDA, has become a key piece of the next generation of application development.

Questions
