HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
This talk proposes an integration of HPC and Apache technologies. HPC-ABDS+ integration areas include file systems, cluster resource management, file and object data management, inter-process and thread communication, analytics libraries, workflow, and monitoring.
1.
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
1st JTC 1 SGBD Meeting
SDSC San Diego March 19 2014
Judy Qiu
Shantenu Jha (Rutgers)
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington
2.
Enhanced Apache Big Data Stack (ABDS)
• ~120 capabilities
• >40 Apache projects
• Green layers have strong HPC integration opportunities
• Goal: the functionality of ABDS with the performance of HPC
3.
Broad Layers in HPC-ABDS
• Workflow-Orchestration
• Application and Analytics
• High level Programming
• Basic Programming model and runtime
– SPMD, Streaming, MapReduce, MPI
• Inter process communication
– Collectives, point to point, publish‐subscribe
• In memory databases/caches
• Object‐relational mapping
• SQL and NoSQL, File management
• Data Transport
• Cluster Resource Management (Yarn, Slurm, SGE)
• File systems (HDFS, Lustre …)
• DevOps (Puppet, Chef …)
• IaaS Management from HPC to hypervisors (OpenStack)
• Cross Cutting
– Message Protocols
– Distributed Coordination
– Security & Privacy
– Monitoring
4.
Getting High Performance on Data Analytics (e.g. Mahout, R …)
• On the systems side, we have two principles:
– The Apache Big Data Stack, with ~120 projects, has important broad functionality backed by a vital, large support organization
– HPC, including MPI, has striking success in delivering high performance, but with a fragile sustainability model
• There are key systems abstractions, corresponding to levels in the HPC-ABDS software stack, where the Apache approach needs careful integration with HPC:
– Resource management
– Storage
– Programming model ‐‐ horizontal scaling parallelism
– Collective and Point to Point communication
– Support of iteration
– Data interface (not just key‐value)
• In application areas, we define application abstractions to support
– Graphs/network
– Geospatial
– Images etc.
5.
4 Forms of MapReduce
[Figure: the four forms, from Input → map → Output, through Input → map → reduce, to iterative map/reduce and loosely synchronous Pij exchanges.]
• (a) Map Only: BLAST analysis, parametric sweeps, pleasingly parallel applications
• (b) Classic MapReduce: High Energy Physics (HEP) histograms, distributed search
• (c) Iterative MapReduce: expectation maximization, clustering (e.g. Kmeans), linear algebra, PageRank
• (d) Loosely Synchronous: classic MPI, PDE solvers and particle dynamics
Forms (a)-(c) are the domain of MapReduce and its iterative extensions, run on science clouds; MPI and Giraph cover forms (c) and (d).
MPI is Map followed by point-to-point or collective communication, as in styles (c) plus (d).
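To make form (c) concrete, below is a minimal, single-process sketch of the iterative MapReduce pattern for Kmeans: each iteration "maps" points to their nearest center and "reduces" the partial sums into new centers, which feed the next iteration. It is illustrative only, not tied to Twister, Harp, or any other framework named in these slides; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Single-process sketch of iterative MapReduce (form c) applied to Kmeans.
public class IterativeKMeansSketch {
    public static void main(String[] args) {
        int n = 10_000, dim = 2, k = 3, iterations = 10;
        Random rnd = new Random(42);
        double[][] points = new double[n][dim];
        for (double[] p : points) for (int d = 0; d < dim; d++) p[d] = rnd.nextDouble();

        // Initialize centers from the first k points.
        double[][] centers = new double[k][dim];
        for (int c = 0; c < k; c++) centers[c] = points[c].clone();

        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];

            // "Map": assign each point to its nearest center, accumulating (center, point) pairs.
            for (double[] p : points) {
                int best = nearest(p, centers);
                for (int d = 0; d < dim; d++) sums[best][d] += p[d];
                counts[best]++;
            }

            // "Reduce": average the partial sums to produce the centers for the next iteration.
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / counts[c];
        }
        System.out.println("Final centers: " + Arrays.deepToString(centers));
    }

    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) dist += (p[d] - centers[c][d]) * (p[d] - centers[c][d]);
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```

In a distributed runtime the map step runs over data partitions and the reduce step becomes the communication that the rest of this deck is about.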
6.
HPC-ABDS Hourglass
[Figure: hourglass with high performance applications at the top, system abstractions/standards at the narrow waist, and the HPC-ABDS system (middleware) of 120 software projects at the base.]
• Application abstractions/standards: graphs, networks, images, geospatial …; SPIDAL (Scalable Parallel Interoperable Data Analytics Library) or high performance Mahout, R, Matlab …
• System abstractions/standards:
– HPC Yarn for resource management
– Horizontally scalable parallel programming model
– Collective and point-to-point communication
– Support of iteration
– Data format
– Storage
8.
We are sort of working on Use Cases with HPC‐ABDS
• Use Case 10 Internet of Things: Yarn, Storm, ActiveMQ
• Use Case 19, 20 Genomics. Hadoop, Iterative MapReduce, MPI; much better analytics than Mahout
• Use Case 26 Deep Learning. High performance distributed GPU (optimized collectives) with Python front end (planned)
• Variant of Use Case 26, 27 Image classification using Kmeans: Iterative MapReduce
• Use Case 28 Twitter, with optimized index for HBase, Hadoop and Iterative MapReduce
• Use Case 30 Network Science. MPI and Giraph for network structure and dynamics (planned)
• Use Case 39 Particle Physics. Iterative MapReduce (wrote proposal)
• Use Case 43 Radar Image Analysis. Hadoop for multiple individual images, moving to Iterative MapReduce for global integration over "all" images
• Use Case 44 Radar Images. Running on Amazon
9.
Features of Harp Hadoop Plug-in
• Hadoop plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
• Hierarchical data abstraction on arrays, key-values and graphs for easy programming expressiveness
• Collective communication model to support various communication operations on the data abstractions (see the sketch after this list)
• Caching with buffer management for the memory allocation required by computation and communication
• BSP style parallelism
• Fault tolerance with check-pointing
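The bullets above describe Harp's map-collective model in words. Below is a minimal, hypothetical sketch of that idea, using plain Java threads to stand in for Hadoop map tasks and a hand-rolled allreduce-style combine for the collective; it is not the actual Harp API, and all class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative map-collective pattern: map tasks produce partial results in memory,
// then an allreduce-style collective merges them so every task can start the next
// superstep from the same global state. NOT the Harp API; names are hypothetical.
public class MapCollectiveSketch {
    public static void main(String[] args) throws Exception {
        int tasks = 4, k = 3, dim = 2;
        ExecutorService pool = Executors.newFixedThreadPool(tasks);

        // "Map" superstep: each task computes a partial centroid-sum table over its partition.
        List<Future<double[][]>> partials = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            final int taskId = t;
            partials.add(pool.submit(() -> computePartialSums(taskId, k, dim)));
        }

        // "Allreduce" collective: merge all partial tables into one global table.
        double[][] global = new double[k][dim];
        for (Future<double[][]> f : partials) {
            double[][] p = f.get();
            for (int c = 0; c < k; c++)
                for (int d = 0; d < dim; d++) global[c][d] += p[c][d];
        }
        // In a real map-collective runtime the merged table would be broadcast back to
        // every task for the next iteration instead of going through a disk-based shuffle.
        System.out.println("Global sums for next superstep: " + java.util.Arrays.deepToString(global));
        pool.shutdown();
    }

    // Stand-in for per-partition work; real tasks would read their own data split.
    static double[][] computePartialSums(int taskId, int k, int dim) {
        double[][] sums = new double[k][dim];
        for (int c = 0; c < k; c++)
            for (int d = 0; d < dim; d++) sums[c][d] = taskId + c + d; // dummy values
        return sums;
    }
}
```

Keeping partials in memory and merging them with a collective is broadly the advantage the benchmark slides that follow attribute to Harp's ~MPI collectives over a MapReduce shuffle between iterations.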
11.
Performance on Madrid Cluster (8 nodes)
[Figure: K-means Clustering, Harp vs. Hadoop on Madrid — execution time (s) vs. problem size (100m points/500 centers, 10m/5k, 1m/50k) for Hadoop and Harp at 24, 48 and 96 cores.]
Note: the computation is identical in each case, since the product of centers times points is the same; communication increases across the problem sizes.
12.
• Mahout and Hadoop MR: slow due to MapReduce
• Python: slow as scripting
• Spark: Iterative MapReduce, non-optimal communication
• Harp: Hadoop plug-in with ~MPI collectives
• MPI: fastest, as C not Java
[Figure annotations: increasing communication, identical computation.]
13.
Performance of MPI Kernel Operations
[Figures: average time (µs) vs. message size (bytes).
Performance of MPI send and receive operations and of the MPI allreduce operation: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG.
Performance of MPI send and receive and of MPI allreduce on Infiniband and Ethernet: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG.]
Pure Java, as in FastMPJ, is slower than Java interfacing to the C version of MPI.
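For context, a micro-benchmark of this kind can be sketched in a few lines against the Open MPI Java bindings. The sketch below is an assumption-laden illustration, not the code used for these plots: the mpi package and the camelCase names (MPI.Init, Comm.allReduce, MPI.DOUBLE, MPI.SUM) follow the mpiJava-style API shipped with Open MPI and may differ between versions.

```java
import mpi.Comm;
import mpi.MPI;

// Hedged sketch of an allreduce timing loop over increasing message sizes.
// Method and constant names are assumptions about the Open MPI Java bindings.
public class AllReduceBenchSketch {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        Comm comm = MPI.COMM_WORLD;
        int rank = comm.getRank();
        int reps = 100;

        for (int count = 1; count <= 1 << 20; count *= 4) {      // 8 B ... 8 MB of doubles
            double[] buf = new double[count];
            comm.barrier();                                        // line ranks up before timing
            long start = System.nanoTime();
            for (int r = 0; r < reps; r++) {
                comm.allReduce(buf, count, MPI.DOUBLE, MPI.SUM);   // in-place allreduce
            }
            long elapsedUs = (System.nanoTime() - start) / (1000L * reps);
            if (rank == 0) {
                System.out.println(count * 8 + " bytes: " + elapsedUs + " us per allreduce");
            }
        }
        MPI.Finalize();
    }
}
```

Launched with mpirun in the usual way, such a loop reproduces the shape of the curves above; whether Infiniband or Ethernet is used depends on the transport the MPI library selects.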
14.
Use case 28: Truthy: Information diffusion research from Twitter Data
• Building blocks:
– Yarn
– Parallel query evaluation using Hadoop MapReduce
– Related hashtag mining algorithm using Hadoop MapReduce
– Meme daily frequency generation using MapReduce over index tables (a minimal sketch of such a job follows this list)
– Parallel force-directed graph layout algorithm using Twister (Harp) iterative MapReduce
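As a concrete illustration of the MapReduce building blocks listed above, here is a minimal Hadoop MapReduce job that counts hashtag occurrences per day. The tab-separated "date<TAB>tweet text" input layout is an assumption made for the example; the real Truthy jobs run over custom index tables (e.g. in HBase) rather than raw text.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal daily hashtag frequency job, word-count style.
public class HashtagDailyCount {

    public static class TagMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);   // assumed: date<TAB>tweet text
            if (parts.length < 2) return;
            for (String token : parts[1].split("\\s+")) {
                if (token.startsWith("#")) {
                    outKey.set(parts[0] + "\t" + token.toLowerCase());
                    context.write(outKey, ONE);                  // emit (date, hashtag) -> 1
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));            // daily frequency per hashtag
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hashtag daily count");
        job.setJarByClass(HashtagDailyCount.class);
        job.setMapperClass(TagMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```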
15.
Use case 28: Truthy: Information diffusion research from Twitter Data
[Figures: two months' data loading for varied cluster size; scalability of the iterative graph layout algorithm on Twister. Note: Hadoop-FS not indexed.]
16.
Pig Performance
[Figure: different Kmeans implementations — total execution time (s) vs. number of mappers (24, 48, 96) for Hadoop, Harp-Hadoop, Pig + HD1 (Hadoop) and Pig + Yarn, at problem sizes 100m points/500 centers, 10m/5,000 and 1m/50,000.]
18.
DACIDR for Gene Analysis (Use Case 19,20)
• Deterministic Annealing Clustering and Interpolative Dimension Reduction Method (DACIDR)
• Use Hadoop for pleasingly parallel applications, and Twister (being replaced by Yarn) for iterative MapReduce applications
• Sequences – Cluster Centers
• Add Existing data and find Phylogenetic Tree
[Figure: simplified flow chart of DACIDR — all-pair sequence alignment → streaming → pairwise clustering → multidimensional scaling → visualization.]
19.
Summarize a Million Fungi Sequences: Spherical Phylogram Visualization
[Figures: RAxML result visualized in FigTree; spherical phylogram from the new MDS method visualized in PlotViz.]
20.
Lessons / Insights
• Integrate (don't compete) HPC with "Commodity Big Data" (Google to Amazon to enterprise data analytics)
– i.e. improve Mahout; don't compete with it
– Use Hadoop plug-ins rather than replacing Hadoop
– Enhanced Apache Big Data Stack HPC-ABDS has 120 members – please improve!
• HPC-ABDS+ integration areas include
– file systems,
– cluster resource management,
– file and object data management,
– inter-process and thread communication,
– analytics libraries,
– workflow,
– monitoring