1. Large-scale Neural Modeling
in MapReduce and Giraph
Co-authors
Nicholas D. Spielman
Neuroscience Program
University of St. Thomas
Presenter
Shuo Yang
Graduate Programs in Software
University of St. Thomas
Special thanks
Bhabani Misra, PhD
Graduate Programs in Software
University of St. Thomas
Jadin C. Jackson, PhD
Department of Biology
University of St. Thomas
Bradley S. Rubin, PhD
Graduate Programs in Software
University of St. Thomas
2. Why Hadoop & What is Hadoop
Why not supercomputers?
Expensive
Limited access
Limited scalability
Why Hadoop?
Runs on commodity hardware
Scalable
Full-fledged ecosystem & community
Open-source, Java-based implementation of MapReduce
MapReduce Model
[Diagram: a client submits a job; input data in HDFS is split across Map tasks, whose intermediate output is sorted and shuffled to Reduce tasks that write the final output.]
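Not from the slides: to make the Map → Shuffle → Reduce flow concrete, here is the canonical Hadoop example in Java (word count). It is illustrative only; the talk's own neuron job appears on slide 5.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      for (String tok : line.toString().split("\\s+")) {
        word.set(tok);
        ctx.write(word, ONE);       // emit (word, 1) for each token
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) sum += c.get();  // all counts for this word
      ctx.write(word, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```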
3. Neural Model (Izhikevich model)
[Figure: each neuron sums the input currents I1, I2, …, In arriving from its neighbors, updates its membrane potential by Δv, and sends currents to all of its neighbors through the synaptic weight matrix. A raster plot of the simulation results shows neuron ID (0 to 2500) against time step (0 to 1000).]
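The slide names but does not write out the model. For reference, the standard Izhikevich (2003) equations are:

```latex
\begin{aligned}
\frac{dv}{dt} &= 0.04v^2 + 5v + 140 - u + I\\
\frac{du}{dt} &= a(bv - u)\\
\text{if } v \ge 30\,\text{mV:}\quad & v \leftarrow c,\ u \leftarrow u + d
\end{aligned}
```

Here v is the membrane potential, u the recovery variable, I the summed synaptic input current (the ∑I in the figure), and a, b, c, d the per-neuron parameters.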
4. Neural Model (Izhikevich model)
[Same figure as slide 3.]
This is a graph structure: neurons are vertices, synapses are weighted edges, and currents are messages passed along the edges.
5. Basic MapReduce Implementation
[Diagram: the initial input is read from HDFS (on later iterations, the input comes from the previous job). Each mapper receives one neuron (N1, N2, N3) together with its local structure and emits both the neuron record itself and the synaptic currents it sends to its neighbors (e.g. N1 emits I2 and I3). After the sort & shuffle, each reducer receives one neuron plus all currents addressed to it, sums those currents, updates the neuron, and writes the result back to HDFS.]
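A minimal sketch of the mapper/reducer pair the diagram describes, assuming a made-up record layout (a SequenceFile of neuron id → "v u fired n1:w1,n2:w2,...") and fixed Izhikevich parameters; none of this is the authors' actual code.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical record: key = neuron id, value = "v u fired n1:w1,n2:w2,..."
public class BasicNeuronJob {

  public static class NeuronMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable id, Text record, Context ctx)
        throws IOException, InterruptedException {
      String[] f = record.toString().split(" ");
      // (a) forward the whole neuron record to its own reducer
      ctx.write(id, new Text("N " + record));
      // (b) if the neuron fired last step, send a current to each neighbor
      if (Boolean.parseBoolean(f[2])) {
        for (String edge : f[3].split(",")) {
          String[] e = edge.split(":");
          ctx.write(new LongWritable(Long.parseLong(e[0])),
                    new Text("I " + e[1]));   // current = synaptic weight
        }
      }
    }
  }

  public static class NeuronReducer
      extends Reducer<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable id, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String[] neuron = null;
      double current = 0.0;
      for (Text val : values) {
        String s = val.toString();
        if (s.startsWith("N ")) neuron = s.substring(2).split(" ");
        else current += Double.parseDouble(s.substring(2)); // sum currents
      }
      if (neuron == null) return;   // current addressed to a missing vertex
      double v = Double.parseDouble(neuron[0]);
      double u = Double.parseDouble(neuron[1]);
      // one Euler step of the Izhikevich model (a=0.02, b=0.2, c=-65, d=8)
      v += 0.04 * v * v + 5 * v + 140 - u + current;
      u += 0.02 * (0.2 * v - u);
      boolean fired = v >= 30;
      if (fired) { v = -65; u += 8; }
      ctx.write(id, new Text(v + " " + u + " " + fired + " " + neuron[3]));
    }
  }
}
```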
6. Basic MapReduce Implementation
[Same diagram as slide 5.]
Problems:
Synaptic currents are sent directly to the reducers without local aggregation
The graph structure is shuffled in each iteration
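The first problem is what in-mapper combining (IMC, described by Lin & Dyer) addresses, and the conclusion notes this work builds on IMC. A sketch of the pattern under the same assumed record layout as above (the graph-structure pass-through is omitted for brevity): partial sums of currents are buffered per destination neuron inside the mapper and emitted once in cleanup(), so each map task sends at most one value per destination instead of one per synapse.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: aggregate currents locally before the shuffle.
public class ImcNeuronMapper
    extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

  private final Map<Long, Double> partialSums = new HashMap<>();

  @Override
  protected void map(LongWritable id, Text record, Context ctx)
      throws IOException, InterruptedException {
    String[] f = record.toString().split(" ");
    if (Boolean.parseBoolean(f[2])) {            // neuron fired last step
      for (String edge : f[3].split(",")) {
        String[] e = edge.split(":");
        long dst = Long.parseLong(e[0]);
        double w = Double.parseDouble(e[1]);
        partialSums.merge(dst, w, Double::sum);  // aggregate locally
      }
    }
  }

  @Override
  protected void cleanup(Context ctx) throws IOException, InterruptedException {
    // one emit per destination neuron seen in this map task
    for (Map.Entry<Long, Double> e : partialSums.entrySet()) {
      ctx.write(new LongWritable(e.getKey()), new DoubleWritable(e.getValue()));
    }
  }
}
```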
9. Schimmy (introduced by Lin & Schatz)
[Diagram: as in slide 5, except that the mappers emit only the synaptic currents; the graph structure (N1, N2, N3 with their local structure) is no longer shuffled. Instead, each reducer remotely reads its partition of the graph structure from HDFS, merges it with the summed currents, updates the neurons, and writes the results back to HDFS.]
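A sketch of how the Schimmy merge might look in a reducer, under two assumptions that are not spelled out on the slide: the graph partition files are sorted by neuron id and partitioned exactly like the shuffle, and the "graph/part-r-NNNNN" naming is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Schimmy: only currents are shuffled; the reducer streams its own graph
// partition from HDFS and merge-joins it with the incoming sorted keys.
public class SchimmyReducer
    extends Reducer<LongWritable, DoubleWritable, LongWritable, Text> {

  private BufferedReader graph;   // this reducer's partition, sorted by id
  private String pending;        // lookahead line from the partition file

  @Override
  protected void setup(Context ctx) throws IOException {
    // assumption: partition files are named after the reducer task number
    int part = ctx.getTaskAttemptID().getTaskID().getId();
    Path p = new Path(String.format("graph/part-r-%05d", part));
    graph = new BufferedReader(new InputStreamReader(
        FileSystem.get(ctx.getConfiguration()).open(p)));
    pending = graph.readLine();
  }

  @Override
  protected void reduce(LongWritable id, Iterable<DoubleWritable> currents,
                        Context ctx) throws IOException, InterruptedException {
    double sum = 0.0;
    for (DoubleWritable c : currents) sum += c.get();  // total input current
    // advance through the partition file; neurons that received no messages
    // pass through unchanged until we reach this key's record (equal ids,
    // since partitioning and sort order are assumed consistent)
    while (pending != null) {
      String[] f = pending.split(" ", 2);
      long gid = Long.parseLong(f[0]);
      pending = graph.readLine();
      if (gid < id.get()) {
        ctx.write(new LongWritable(gid), new Text(f[1]));  // no input current
      } else {
        ctx.write(id, new Text(update(f[1], sum)));        // merge-join hit
        break;
      }
    }
  }

  @Override
  protected void cleanup(Context ctx) throws IOException, InterruptedException {
    while (pending != null) {     // flush trailing neurons with no input
      String[] f = pending.split(" ", 2);
      ctx.write(new LongWritable(Long.parseLong(f[0])), new Text(f[1]));
      pending = graph.readLine();
    }
    graph.close();
  }

  // one Izhikevich Euler step; state = "v u fired neighbors"
  private String update(String state, double current) {
    String[] f = state.split(" ");
    double v = Double.parseDouble(f[0]), u = Double.parseDouble(f[1]);
    v += 0.04 * v * v + 5 * v + 140 - u + current;
    u += 0.02 * (0.2 * v - u);
    boolean fired = v >= 30;
    if (fired) { v = -65; u += 8; }
    return v + " " + u + " " + fired + " " + f[3];
  }
}
```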
10. Schimmy (introduced by Lin & Schatz)
[Same diagram as slide 9.]
Problems:
Remote reading from HDFS (no data locality)
The graph structure is still read and written in each iteration
11. Schimmy (introduced by Lin & Schatz)
[Same diagram as slide 9.]
Observation:
The graph structure is read-only!
13. Drawbacks of graph algorithms in MapReduce
Non-intuitive and hard to implement
Iterative algorithms are not expressed efficiently
Not optimized for large numbers of iterations
[Diagram: each iteration is a separate MapReduce job that reads its input from HDFS and writes its output back to HDFS; every pass through the Mapper → intermediate files → Reducer pipeline pays a job-startup penalty plus two disk penalties.]
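To make the per-iteration penalties concrete, here is an illustrative driver loop (reusing the hypothetical classes from the slide 5 sketch); every simulated time step launches a fresh job whose output directory becomes the next step's input:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SimulationDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int steps = Integer.parseInt(args[0]);
    Path in = new Path(args[1]);
    for (int t = 0; t < steps; t++) {
      Path out = new Path(args[1] + "-step-" + (t + 1));
      Job job = Job.getInstance(conf, "neural step " + t);
      job.setJarByClass(SimulationDriver.class);
      job.setMapperClass(BasicNeuronJob.NeuronMapper.class);
      job.setReducerClass(BasicNeuronJob.NeuronReducer.class);
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, in);
      FileOutputFormat.setOutputPath(job, out);
      // startup penalty: a full JVM/job launch per simulated time step
      if (!job.waitForCompletion(true)) System.exit(1);
      // disk penalty: state round-trips through HDFS between steps
      in = out;
    }
  }
}
```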
14. Giraph
[Diagram: the graph (N1, N2, N3 with their currents) is loaded once from HDFS, then processed in a sequence of supersteps separated by synchronous barriers; the results are written back to HDFS only at the end.]
Iterative graph processing system
Powers Facebook graph search
Highly scalable
Based on the BSP (Bulk Synchronous Parallel) model
Mapper-only job on Hadoop
In-memory computation
“Think like a vertex”
More intuitive APIs
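A sketch of what the per-neuron computation might look like against Giraph's vertex-centric API (BasicComputation in Giraph 1.x); the vertex-value layout, the parameter choices, and the fixed superstep cap are assumptions, not the authors' code.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

/** Vertex value: the two Izhikevich state variables. */
class NeuronState implements Writable {
  double v = -65, u = -13;
  @Override public void write(DataOutput out) throws IOException {
    out.writeDouble(v); out.writeDouble(u);
  }
  @Override public void readFields(DataInput in) throws IOException {
    v = in.readDouble(); u = in.readDouble();
  }
}

/** "Think like a neuron": one superstep = one simulated time step. */
public class IzhikevichComputation extends
    BasicComputation<LongWritable, NeuronState, FloatWritable, DoubleWritable> {

  private static final int MAX_STEPS = 1000;  // iteration count is global

  @Override
  public void compute(Vertex<LongWritable, NeuronState, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    NeuronState s = vertex.getValue();
    double current = 0.0;
    for (DoubleWritable m : messages) current += m.get();  // sum input currents

    // one Euler step of the Izhikevich model (a=0.02, b=0.2, c=-65, d=8)
    s.v += 0.04 * s.v * s.v + 5 * s.v + 140 - s.u + current;
    s.u += 0.02 * (0.2 * s.v - s.u);
    if (s.v >= 30) {                     // spike: reset and notify neighbors
      s.v = -65; s.u += 8;
      for (Edge<LongWritable, FloatWritable> e : vertex.getEdges()) {
        sendMessage(e.getTargetVertexId(),
                    new DoubleWritable(e.getValue().get())); // current = weight
      }
    }
    vertex.setValue(s);

    if (getSuperstep() >= MAX_STEPS) vertex.voteToHalt();
  }
}
```

Note how the number of iterations has to live in a global constant checked against getSuperstep(); this is exactly the limitation raised in the conclusion.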
15. Giraph
[Same slide as 14, with the vertex-centric slogan recast for this application: "Think like a NEURON".]
18. Conclusion
Hadoop is capable of modeling large-scale neural networks.
Building on IMC and Schimmy, our Mapper-side Schimmy improves MapReduce graph algorithms where the graph structure is read-only.
Vertex-centric approaches such as Giraph showed superior performance; however:
The number of iterations must be specified as a global variable
Computation is limited by the memory per node
Giraph is not yet widely adopted by industry
19. Large-scale Neural Modeling in MapReduce and Giraph
[Closing slide: title and credits repeated from slide 1.]
20. Comparison of speeds
[Charts: comparison of speeds for a 40 ms simulation, and for 20 ms to 40 ms simulations; the underlying data did not survive the export.]