1. Set number of cores and heap per job
2. Internal map tasks reuse JVM
3. RDDs feel like Scala collections
4. I can read the source!! (Storm)
5. GraphX
1. Static cluster wide settings
2. JVM startup time costly
3. Reducer values iterator… yuck
4. Time for a beer, or six
5. Giraph
Poor setup docs, mostly high-level dev
Serializability, runtime
We at least know how to do this
Most any code can fit in a mapper
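
To illustrate the "RDDs feel like Scala collections" point, here is a minimal word-count sketch written against plain Scala collections only, so it runs without a cluster. The object name, helper name, and sample lines are hypothetical. On a Spark RDD the same `flatMap`, `filter`, and `map` calls are available, and the grouping step would typically be written with `reduceByKey(_ + _)` instead of `groupBy`.

```scala
// Plain Scala collections only; no Spark dependency is needed to run this.
// CollectionsWordCount and wordCount are illustrative names, not from the slides.
object CollectionsWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into whitespace-separated words
      .filter(_.nonEmpty)         // drop empty tokens produced by blank lines
      .map(word => (word, 1))     // pair each word with a count of 1
      .groupBy(_._1)              // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input
    val counts = wordCount(Seq("spark and hadoop", "spark and storm"))
    println(counts)
  }
}
```

The chain reads as one pipeline of small transformations, in contrast to the reducer-side values iterator that the MapReduce column complains about.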