“The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that ‘nobody ever got fired for using Hadoop on a cluster’!”
Breaking News
IBM Keynote at JavaOne 2013: “Java Flies in Blue Skies and Open Clouds”
Java and GPUs open up a world of new opportunities for GPU accelerators and Java programmers alike.
Breaking News
Duimovich showed an example of GPU-accelerated sorting using standard NVIDIA CUDA libraries that are already available. The speedups are phenomenal, ranging from 2x to 48x.
Breaking News?
Breaking Hadoop
10,000x faster
Hadoop vs GPU
Hadoop & GPU
Hadoop + GPU
HPC
Big Data
GPGPU in Java
Heterogeneous systems
Horizontal and vertical scalability
Hadoop horizontal scalability
[Diagram: file01, file02, and file03 are split into ten HDFS blocks (01–10) and spread across Node 1–Node 3, which end up holding 3, 4, and 3 blocks respectively. Adding Node 4–Node 6 redistributes the same ten blocks so that each node holds only one or two (2, 2, 1, 1, 2, 2), spreading the work over twice as many machines.]
Use GPU to scale vertically
[Diagram: the same six-node cluster; instead of adding more nodes, a GPU is added to a node, cutting its processing time (e.g., from 2 time units down to 0.5) while the block layout stays the same (worked through in the sketch that follows).]
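To make the horizontal-versus-vertical trade-off concrete, here is a minimal Java sketch of the arithmetic behind these two diagrams. The per-node numbers are read straight from the figures above and are illustrative only; the class and method names (`ScalingSketch`, `makespan`) are ours.

```java
import java.util.Arrays;

/** Toy model: a MapReduce job is roughly as slow as its busiest node. */
public class ScalingSketch {

    /** Makespan = the largest per-node processing time. */
    static double makespan(double[] perNodeTime) {
        return Arrays.stream(perNodeTime).max().orElse(0);
    }

    public static void main(String[] args) {
        // Horizontal scaling: 10 blocks on 3 nodes vs. the same 10 blocks on 6 nodes
        // (block counts from the diagram above, 1 time unit per block).
        double[] threeNodes = {3, 4, 3};
        double[] sixNodes   = {2, 2, 1, 1, 2, 2};
        System.out.println("3 nodes: makespan = " + makespan(threeNodes)); // 4.0
        System.out.println("6 nodes: makespan = " + makespan(sixNodes));   // 2.0

        // Vertical scaling: keep 6 nodes, add GPUs to some of them;
        // per-node times as shown in the diagram above.
        double[] sixNodesWithGpus = {0.5, 1, 1, 0.5, 1, 1};
        System.out.println("6 nodes + GPUs: makespan = " + makespan(sixNodesWithGpus)); // 1.0
    }
}
```

Doubling the node count halves the makespan (4 to 2 time units); adding GPUs to the existing nodes halves it again (2 to 1) without buying more machines.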
Profit estimation
“Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU” by Intel
“OpenCL: the advantages of heterogeneous approach” by Intel
NVIDIA GTX 280 vs. Intel Core i7-960
How to use OpenCL with Hadoop?
Hadoop streaming: any executable that reads records from stdin and writes key/value pairs to stdout can act as a mapper or reducer, so a native OpenCL binary can be plugged straight into a job.
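As a rough illustration of the streaming route, the sketch below spells out the stdin/stdout contract a streaming mapper must follow; the class name and the placeholder per-record computation are ours, standing in for whatever OpenCL-accelerated step the job actually needs.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Skeleton of a Hadoop-streaming mapper. Hadoop streaming feeds input records
 * to the mapper process on stdin and expects "key<TAB>value" lines on stdout,
 * so any executable that follows this contract can be a mapper, including a
 * native OpenCL program.
 */
public class StreamingMapperSkeleton {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            // Placeholder computation: in a real job this is where the record
            // (or a batch of records) would be handed to OpenCL-accelerated
            // code, e.g. through JOCL/JNI or a native helper process.
            int value = line.length();
            System.out.println(line.hashCode() + "\t" + value);
        }
    }
}
```

With that contract in place, the executable passed as the `-mapper` to the hadoop-streaming JAR can be this class or, just as well, the native OpenCL binary itself.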
Aparapi
Expands Java's “Write Once, Run Anywhere” to include APU and GPU devices by expressing data-parallel algorithms as extensions of the Kernel base class.
Execution flow for MyKernel.class: does the platform support OpenCL, and can the kernel's bytecode be converted to OpenCL? If so, convert it and execute the OpenCL kernel on the device; otherwise, execute it using a Java thread pool.
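As a concrete illustration of that flow, here is a minimal Aparapi kernel that squares an array of floats. It assumes the classic `com.amd.aparapi` package of that era (newer forks use `com.aparapi`); the class and variable names are ours.

```java
import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

/**
 * Minimal Aparapi kernel: squares every element of a float array.
 * If the platform has OpenCL and the run() bytecode can be translated,
 * Aparapi executes it on the GPU/APU; otherwise it falls back to a
 * Java thread pool, exactly as in the flow described above.
 */
public class SquareKernelDemo {
    public static void main(String[] args) {
        final int size = 1_000_000;
        final float[] input = new float[size];
        final float[] output = new float[size];
        for (int i = 0; i < size; i++) {
            input[i] = i * 0.5f;
        }

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();              // index of this work-item
                output[gid] = input[gid] * input[gid];
            }
        };

        kernel.execute(Range.create(size));           // one work-item per element
        System.out.println("mode = " + kernel.getExecutionMode());
        kernel.dispose();
    }
}
```

Aparapi makes the decision at execute() time; getExecutionMode() reports whether the kernel actually ran on the GPU or fell back to the Java thread pool (JTP).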
Aparapi
Further directions: lambda (expressing kernels with Java 8 lambdas) and HSA (Heterogeneous System Architecture).
Aparapi
Characteristics of an ideal data-parallel workload:
Code that iterates over large arrays of primitives
- 32/64-bit data types preferred
- the order of iterations is not critical; avoid data dependencies between iterations
- each iteration contains sequential code (few branches)
A balance between data size (low) and compute (high)
- data transfer to/from the GPU can be costly (see the sketch after this list)
- trivial compute is not worth the transfer cost
- may still benefit by freeing up the CPU for other work(?)
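The transfer-cost point is worth a sketch. Assuming the same classic Aparapi API as before, the example below keeps an array resident on the device across many kernel passes via explicit buffer management (setExplicit, put, get), so the host-to-device copy is paid once rather than per pass; everything other than the Aparapi calls is our own naming.

```java
import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

/**
 * Sketch of keeping data on the GPU across passes to amortize transfer cost.
 * With explicit buffer management, Aparapi only copies arrays when told to,
 * so a multi-pass computation pays for the host/device transfer once.
 */
public class ExplicitBufferDemo {
    public static void main(String[] args) {
        final int size = 4 * 1024 * 1024;
        final float[] data = new float[size];
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                // A few flops per element, no branches, no cross-iteration
                // dependencies: the shape of an "ideal" data-parallel body.
                data[gid] = data[gid] * 0.5f + 1.0f;
            }
        };

        kernel.setExplicit(true);          // we manage transfers ourselves
        kernel.put(data);                  // host -> device, once
        Range range = Range.create(size);
        for (int pass = 0; pass < 100; pass++) {
            kernel.execute(range);         // 100 passes, data stays on the device
        }
        kernel.get(data);                  // device -> host, once
        kernel.dispose();

        System.out.println("data[1] after 100 passes = " + data[1]);
    }
}
```

Without setExplicit(true), Aparapi conservatively copies the captured arrays to and from the device on every execute() call.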
HadoopCL (Rice University, AMD): integrates OpenCL with Hadoop MapReduce, using Aparapi to run map and reduce functions on heterogeneous devices.
HadoopCL
2 × six-core Intel Xeon X5660 (48 GB memory)
2 × NVIDIA Tesla M2050 (2 × 2.5 GB memory)
AMD A10-5800K APU (16 GB memory)
WHY?
Back to OpenCL, Aparapi, and heterogeneous computing
OpenCL, Aparapi, and heterogeneous computing
[Diagram: relative throughput along the data path: GPU cache, GPU GDDR5, CPU cache, SATA 3.0 (HDD), SATA 2.0 (SSD), 1 Gbit network.]
Formula in terms of time:
(CPU calc1) + disk read + disk write
>
(CPU calc2 + GPU calc + GPU write + GPU read) + disk read + disk write
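As a back-of-the-envelope check of that inequality, the sketch below plugs in made-up timings (assumptions for illustration, not measurements) and reports whether offloading to the GPU would pay off; all names are ours.

```java
/**
 * Evaluates the offload criterion from the slide:
 *   cpuCalc1 + diskRead + diskWrite > (cpuCalc2 + gpuCalc + gpuWrite + gpuRead) + diskRead + diskWrite
 * i.e. offloading wins only when the CPU work it replaces exceeds the remaining
 * CPU work plus the GPU compute and both host/device transfers.
 */
public class OffloadCriterion {

    static boolean gpuWorthIt(double cpuCalc1, double cpuCalc2, double gpuCalc,
                              double gpuWrite, double gpuRead) {
        // Disk read/write appear on both sides of the inequality and cancel out.
        return cpuCalc1 > cpuCalc2 + gpuCalc + gpuWrite + gpuRead;
    }

    public static void main(String[] args) {
        // Illustrative numbers only (seconds): heavy CPU compute, modest transfers.
        double cpuCalc1 = 120.0;  // CPU-only version of the computation
        double cpuCalc2 = 10.0;   // CPU work that remains after offloading
        double gpuCalc  = 8.0;    // GPU kernel time
        double gpuWrite = 4.0;    // host -> device transfer
        double gpuRead  = 4.0;    // device -> host transfer

        System.out.println("Offload to GPU pays off: "
                + gpuWorthIt(cpuCalc1, cpuCalc2, gpuCalc, gpuWrite, gpuRead)); // true
    }
}
```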
OpenCL future
http://streamcomputing.eu/
Questions?
Big Data Experts FB group
