The amount of collected data is growing almost exponentially: more than 75% of all data has been collected in the past five years. To store this data and process it in reasonable time, you need to partition it and parallelize the processing of reports and analytics.
This talk demonstrates how to parallelize data processing using Hazelcast and its underlying distributed data structures. After a quick introduction to the relevant terms, a few short live-coding examples take us on a journey into distributed computing.
Source code for the demonstrations is available here:
1. https://github.com/noctarius/hazelcast-mapreduce-presentation
2. https://github.com/noctarius/hazelcast-distributed-computing
20. WHY A DISTRIBUTED EXECUTORSERVICE?
j.l.Runnable / j.u.c.Callable
Only needs to be serializable
Same task on all / multiple nodes
Should not work on data
www.hazelcast.com
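The "only needs to be serializable" point can be illustrated without a cluster. The following is a minimal sketch, assuming plain `java.io` serialization stands in for whatever wire format the cluster actually uses (Hazelcast has its own serialization layer, which is not shown here). The names `GreetingTask` and `SerializableTaskDemo` are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

// Hypothetical task: it carries only its own small state, no data set.
class GreetingTask implements Callable<String>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String greeting;

    GreetingTask(String greeting) {
        this.greeting = greeting;
    }

    @Override
    public String call() {
        return greeting + " from a round-tripped task";
    }
}

class SerializableTaskDemo {
    // Serializes the task to bytes and reads it back, roughly what has to
    // happen before a task can run on another node (assumption: Java
    // serialization is used here purely as a stand-in).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Task is not serializable", e);
        }
    }

    public static void main(String[] args) {
        GreetingTask copy = roundTrip(new GreetingTask("Hello"));
        System.out.println(copy.call()); // prints "Hello from a round-tripped task"
    }
}
```

A task that references a non-serializable field would fail in `roundTrip`, which is the kind of error a distributed executor would surface when submitting it.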
21. Print node name on all nodes
QUICK EXAMPLE
Runnable runnable = () -> println("Running on Node: " + member.node);
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
executorService.executeOnAllMembers(runnable);
32. Data are mapped / transformed into a set of key-value pairs
SOME PSEUDO CODE (1/3)
MAPPING
map(key: String, document: String): Void ->
  for each w: Word in document:
    emit(w, 1)
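The mapping pseudocode above could look roughly like this in plain Java. This is a sketch, not Hazelcast's API: the class `WordCountMapper` is hypothetical, `emit` is modeled by appending to a list, and lowercasing plus splitting on non-word characters are assumptions about what counts as a word.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordCountMapper {
    // Emits one (word, 1) pair per word occurrence. The key (for example a
    // document name) is not used, matching the pseudocode.
    static List<Map.Entry<String, Integer>> map(String key, String document) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        // Assumption: words are lowercased and separated by non-word characters.
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                emitted.add(new SimpleEntry<>(word, 1));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(map("doc1", "to be or not to be"));
    }
}
```

Note that repeated words are emitted repeatedly; nothing is summed at this stage.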
33. Multiple values are combined into an intermediate result to reduce traffic
SOME PSEUDO CODE (2/3)
COMBINING
combine(word: Word, counts: List[Int]): Void ->
  emit(word, sum(counts))
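To make the traffic argument concrete, here is a small plain-Java sketch of the combining step on a single node. `WordCountCombiner` and `combineAll` are hypothetical names, and the input map is assumed to hold the counts that one node emitted locally for each word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCountCombiner {
    // combine(word, counts) from the pseudocode: collapse the local counts
    // of one word into a single intermediate sum.
    static int combine(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Applies the combiner to everything a single node emitted, so only one
    // (word, partialSum) pair per word has to leave the node.
    static Map<String, Integer> combineAll(Map<String, List<Integer>> localCounts) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        localCounts.forEach((word, counts) -> partial.put(word, combine(word, counts)));
        return partial;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> local = new LinkedHashMap<>();
        local.put("to", Arrays.asList(1, 1));
        local.put("be", Arrays.asList(1, 1));
        local.put("or", Arrays.asList(1));
        // Five emitted values shrink to three pairs: prints {to=2, be=2, or=1}
        System.out.println(combineAll(local));
    }
}
```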
34. Values are reduced / aggregated to the requested result
SOME PSEUDO CODE (3/3)
REDUCING
reduce(word: String, counts: List[Int]): Int ->
  return sum(counts)
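The reducing step can be sketched in plain Java as follows. The assumption here is that each node has already produced a map of intermediate sums per word; `WordCountReducer` and `reduceAll` are hypothetical names, and the grouping by word is done in memory rather than by a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountReducer {
    // reduce(word, counts) from the pseudocode: sum all values for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Groups the intermediate sums of all nodes by word, then reduces each
    // group to the final count. TreeMap keeps the output sorted by word.
    static Map<String, Integer> reduceAll(List<Map<String, Integer>> partialsPerNode) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> partial : partialsPerNode) {
            partial.forEach((word, count) ->
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(count));
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeA = new TreeMap<>();
        nodeA.put("to", 2);
        nodeA.put("be", 2);
        Map<String, Integer> nodeB = new TreeMap<>();
        nodeB.put("be", 1);
        nodeB.put("not", 1);
        // prints {be=3, not=1, to=2}
        System.out.println(reduceAll(Arrays.asList(nodeA, nodeB)));
    }
}
```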