SlideShare a Scribd company logo
1 of 28
Download to read offline
. 
Hadoop/MapReduce Problems 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 2/23 
. 
2/23
. 
Hadoop/MapReduce Problems 
 upper limit on throughput, both in HDFS and MapReduce jobs 06 09 
 one machine is not so bad anymore: good RAM, multicore 09 
 MapReduce is key-value hashes only, very restrictive 
 MapReduce performs badly under heterogeneous workloads 
 Small file problem is hard to solve in Hadoop 10 
 ... 
06 K.Shvachko HDFS Scalability: the Limits to Growth the Magazine of USENIX, vol.35, no.2 (2012) 
09 A.Rowstron+4 Nobody ever got fired for using Hadoop on a cluster 1st HTCDP (2012) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 3/23 
. 
3/23
. 
Hadoop/MapReduce Upgrades 
 solutions for geterogeneous jobs 17 
 R for statistical processing on top of HDFS shards 18 
 searchable shards -- HBase + Lucene 19 
 .... not enough 
17 A.Rasooli+1 COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop... McMaster (2013) 
18 S.Das+5 Ricardo: Integrating R and Hadoop SIGMOD (2010) 
19 X.Gao+2 Experimenting with Lucene Index on HBase in an HPC Environment 1st HPCDB (2012) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 4/23 
. 
4/23
. 
Example : Superspreaders 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 5/23 
. 
5/23
. 
Example Problem : Superspreaders 
. 
A Superspreader... 
. 
... is a one-to-many (o2m) traffic artifact where one host tries to contact 
many other hosts within short timespan 
. 
 reverse of Superspreader is a Flash Crowd -- same algorithm 
 many-to-many (m2m, groups) is even more complex 
 a known problem 16 
. 
QUIZ 
. 
.How to detect superspreaders using MapReduce? Any ideas? 
16 S.Venkataraman+3 New Streaming Algorithms for Fast Detection of Superspreaders NDSS (2005) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 6/23 
. 
6/23
. 
Superspreaders: (1) raw packets 
 time, sip, sport, dip, dport, psize, protocol -- the usual packet tuple 
 convert into text for MapReduce jobs 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 7/23 
. 
7/23
. 
Superspreaders: (2) 1st MapReduce 
. 
Step 1 
. 
....is to extract unique sip-dip pairs 
 3rd column is count (took word counting for 
basis) 
 ordered by sip 
 takes time because needs to process all the 
data 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 8/23 
. 
8/23
. 
Superspreaders: (3) 2nd MapReduce 
. 
Step 2 
. 
... is to count unique sips in sip-dip 
pairs 
. 
 based on the output of the 1st job 
 faster because data is relatively small 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 9/23 
. 
9/23
. 
Superspreaders: Problem NOT Solved 
 no time/sequence in MapReduce, no way to process data in a time 
window 
 do not know what to discard (tail, small counts) midway until all data is 
aggregated -- ineffective for large datasets 
 ... 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 10/23 
. 
10/23
. 
Proposal: TABID + Sketches 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 11/23 
. 
11/23
. 
TABID: Time-Aware BIg Data 
 TABID: acronym of Time Aware BIg Data 
 the grain is larger then key-value store, but better than HDFS shards 19 
KV 
Store 
Hadoop 
(HDFS) and 
MapReduce 
TABID Time-Aware 
Big Data 
(this demo) 
HDFS 
+ 
Lucene 
Index 
19 X.Gao+2 Experimenting with Lucene Index on HBase in an HPC Environment 1st HPCDB (2012) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 12/23 
. 
12/23
. 
Sketches (Data Streaming) 
. 
Data Streaming 
. 
... is a new concept where data is processed in realtime without 
using any storage 
. 
 a relatively recent but well defined and mathematically/statistically 
formulated 11 
 many known interesting algorithms 12 
 algorithms for specific problems 14 15 16 
 main features: space efficiency, statistical rigidity (information 
theory), speed 
11 S.Muthukrishnan Data Streams: Algorithms and Applications Foundations and Trends... (2005) 
12 M.Sung+3 Scalable and Efficient Data Streaming Algorithms for Detecting .... ICDEW (2006) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 13/23 
. 
13/23
. 
Hadoop/MapReduce 
. 
Hadoop is... 
. 
...a platform where your software 
meets with data shards 
. 
One Physical Machine (1 shard) 
file A 
file B 
file C 
ā€¦ 
Hadoop Space 
Read/parse Find 
Hadoop Job 
(your code) 
Hadoop Job 
(your code) 
Hadoop Job 
(your code) 
Hadoop Job 
(your code) 
many many 
Manager 
Name 
Server(s) 
Client Machine 
Hadoop Client 
Start Use 
Your 
Code 
You 
Deploy 
 shards are distributed across 
the cluster 
 nameserver is the 
bottleneck 
 fairness problems are 
when heavy and light jobs run 
together at the same shard 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 14/23 
. 
14/23
. 
TABID/Sketches 
. 
TABID is... 
. 
...an API that helps your jobs get access to 
data in its natural time order 
. 
One Physical Machine (1 shard) 
STuimb-eSltinoere 
Store 
TABID 
Node 
TABID 
Manager 
Schedule 
Client Machine 
TABID Client 
Start Use 
Your 
Sketcher 
You 
Multicore 
Replay 
 data shards are downloaded 
and replayed 
 jobs run on multicore 
 jobs can use any data 
structure (well beyond 
key-value) 
 jobs are data streaming 
sketches -- can be selected 
from a library 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 15/23 
. 
15/23
. 
TABID: BigData Replay 
Core X 
Core 1 
Core 1 
Read/prepare 
One SketcOhne Sketch One Sketch 
Start End End End 
TABID 
Manager 
Now(replay) 
ā€¦. 
Time-Aligned Big Data 
Cursor 
Time 
Direction 
Shared Memory 
Start 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 16/23 
. 
16/23
. 
TABID: Lockfree Shared Memory 
 lockfree shared memory is a new branch of parallel 
processing 
 locks are bad for multicore (memory fences 20) 
 a generic lockfree design in 05 and software implementation in 23 
 some attempts to apply multicore to MapReduce 21 22 
20 M.Aldinucci+2 FastFlow: Efficient Parallel Streaming Applications on Multi-core Universita di Pisa (2009) 
23 current MCoreMemory project page https://github.com/maratishe/mcorememory (myself) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 17/23 
. 
17/23
. 
Wrapup (Problemā†’Solution) 
1. MapReduce is about counting, no rich datatypes 
 SOLVED ā†’ any datatype, stored as JSON 
2. MapReduce has no solution for heterogeneous jobs 
 SOLVED ā†’ TABID optimizes jobs-to-core mapping (bin packing) 
3. MapReduce has accountability problem because clients make their 
own jobs 
 SOLVED ā†’ a library of sketches based on known data 
streaming algorithms 
4. ... many smaller solutions 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 18/23 
. 
18/23
. 
Hadoop vs Tabid DEMO (today) 
 working Hadoop and TABID platforms 
 can play with various configurations on the spot 
. Replayable BigData for Multicore 
Processing and Stat... Rigid Sketching 
.Marat Zhanikeev ā€“ maratishe@gmail.com ā€“ maratishe.github.io ā€“ http://bit.do/marat141105 1/1 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 19/23 
. 
19/23
. 
Thatā€™s all, thank you ... 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 20/23 
. 
20/23
. 
Performance under Distributions 
 comparative efficiency (right) and job-to-core mapping efficiency 
(left) 
 all under heterogenious (variance) job distributions 
Tuples: min lifespan, max lifespan, exponent 
10,1000,0.01 
10,1000,0.05 
10,1000,0.1 
10,1000,0.3 
10,1000,0.7 
0 200 400 600 800 1000 
Number of Sketches 
5.4 
5.2 
5 
4.8 
4.6 
4.4 
4.2 
4 
3.8 
Log of Sketchbyte Ratio (HADOOP/TABID) 
10,1000,0 
More longer lifespans 
Tuples: min overhead, max overhead, exponent 
0 200 400 600 800 1000 
Number of Sketches 
6.65 
5.95 
5.25 
4.55 
3.85 
3.15 
Log of Max TABID Overhead (per core) 
100,10000,0 
100,10000,0.7 
100,10000,0.3 
100,10000,0.1 
100,10000,0.05 
100,10000,0.01 
Mostly 
large overhead 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 21/23 
. 
21/23
. 
Shared Memory : Logic 
Shared 
Memory 
IDLE 
All sketches 
caught up 
IDLE 
Cursor 
Time 
Sketch 
time 
... 
Data 
Stream 
ā€¦ 
All 
sketches Ring buffer Process 
Data 
Set Sketch 
Time 
Add 
data 
Sketch 
started 
Monitor Sketch Manager 
cursor 
Cursor 
moved 
Each 
item 
Cursor = 
sketch 
End time 
of life 
Global 
start 
Wait for 
all sketches 
End of 
data 
 coursor written by 
manager and read by jobs on 
cores 
 only manager writes, cores only 
read -- lockfree 
design 
 zero collissions guaranteed by 
API 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 22/23 
. 
22/23
. 
Related Subjects 
 rigid statistics in traffic analysis 
 QoS of user communities 01, dynamic network management, etc. 
 isn't BigData replay a bottleneck? 
 not really, circuits for bulk transfer 02 or multisource aggregation 03 can support very 
high throughputs 
 MapReduce has several bottlenecks: namespace lookup, HDFS, etc. -- see 06 
for details 
 Why BigData Replay? -- it's a new ecosystem 
 bigdata hoarders announce replays, openly collect public jobs until a deadline, open 
outcome 
 one good solution for bigdata ā†’ opendata innitiative 
01 myself+0 A holistic community-based architecture for measuring end-to-end QoS at data centres IJCSE (2014) 
02 myself+0 Circuit Emulation for Big Data Transfers in Clouds Networking for Big Data, Wiley (in print) (2015) 
03 myself+0 Multi-Source Stream Aggregation in the Cloud Wiley (2014) 
06 K.Shvachko HDFS Scalability: the Limits to Growth the Magazine of USENIX, vol.35, no.2 (2012) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23
. 
[01] myself+0 (2014) 
A holistic community-based architecture for measuring end-to-end QoS at data 
centres 
IJCSE 
[02] myself+0 (2015) 
Circuit Emulation for Big Data Transfers in Clouds 
Networking for Big Data, Wiley (in print) 
[03] myself+0 (2014) 
Multi-Source Stream Aggregation in the Cloud 
Wiley 
[04] myself+0 (2014) 
Optimizing Virtual Machine Migration for Energy-Efficient Clouds 
IEICEJ 
[05] myself+0 (2014) 
A lock-free shared memory design for high-throughput multicore packet traffic 
capture 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23
. 
IJNM 
[06] K.Shvachko (2012) 
HDFS Scalability: the Limits to Growth 
the Magazine of USENIX, vol.35, no.2 
[07] Y.Chen+3 (2011) 
The Case for Evaluating MapReduce Performance Using Workload Suites 
MASCOTS 
[08] Z.Ren+4 (2012) 
Workload Characterization on a Production Hadoop Cluster... 
IEEE Workload... 
[09] A.Rowstron+4 (2012) 
Nobody ever got fired for using Hadoop on a cluster 
1st HTCDP 
[10] (current) 
Small File Problem in Hadoop (blog) 
http://amilaparanawithana.blogspot.jp/2012 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23
. 
[11] S.Muthukrishnan (2005) 
Data Streams: Algorithms and Applications 
Foundations and Trends... 
[12] M.Sung+3 (2006) 
Scalable and Efficient Data Streaming Algorithms for Detecting .... 
ICDEW 
[13] Z.Bar-Yossef+2 (2002) 
...streaming algorithms, with an application to counting triangles in graphs 
ACM SODA 
[14] M.Charikar+2 (2002) 
Finding frequent items in data streams 
29th International Colloquium on Automata... 
[15] M.Datar+3 (2002) 
Maintaining stream statistics over sliding windows 
SIAM 
[16] S.Venkataraman+3 (2005) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23
. 
New Streaming Algorithms for Fast Detection of Superspreaders 
NDSS 
[17] A.Rasooli+1 (2013) 
COSHH: A Classification and Optimization based Scheduler for Heterogeneous 
Hadoop... 
McMaster 
[18] S.Das+5 (2010) 
Ricardo: Integrating R and Hadoop 
SIGMOD 
[19] X.Gao+2 (2012) 
Experimenting with Lucene Index on HBase in an HPC Environment 
1st HPCDB 
[20] M.Aldinucci+2 (2009) 
FastFlow: Efficient Parallel Streaming Applications on Multi-core 
Universita di Pisa 
[21] R.Brightwell (2008) 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23
. 
Workshop on Managed Many-Core Systems 
1st Workshop ...Many-Core Systems 
[22] R.Chen+2 (2010) 
Tiled-MapReduce: Optimizing Resource Usages of Data-parallel... with Tiling 
19th PACT 
[23] current (myself) 
MCoreMemory project page 
https://github.com/maratishe/mcorememory 
M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 
. 
23/23

More Related Content

Viewers also liked

Viewers also liked (6)

20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
Ā 
Microsoft Machine Learning Smackdown
Microsoft Machine Learning SmackdownMicrosoft Machine Learning Smackdown
Microsoft Machine Learning Smackdown
Ā 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
Ā 
AWS for the SQL Server Pro
AWS for the SQL Server ProAWS for the SQL Server Pro
AWS for the SQL Server Pro
Ā 
Machine Learning on the Microsoft Stack
Machine Learning on the Microsoft StackMachine Learning on the Microsoft Stack
Machine Learning on the Microsoft Stack
Ā 
Personalized and Adaptive Math Learning: Recent Research and What It Means fo...
Personalized and Adaptive Math Learning: Recent Research and What It Means fo...Personalized and Adaptive Math Learning: Recent Research and What It Means fo...
Personalized and Adaptive Math Learning: Recent Research and What It Means fo...
Ā 

Similar to Replayable BigData for Multicore Processing and Statistically Rigid Sketching

EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
Jonathan Dursi
Ā 

Similar to Replayable BigData for Multicore Processing and Statistically Rigid Sketching (20)

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Ā 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
Ā 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Ā 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMR
Ā 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
Ā 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
Ā 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
Ā 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
Ā 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
Ā 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
Ā 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
Ā 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Ā 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
Ā 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Ā 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
Ā 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
Ā 
Hadoop Mapreduce
Hadoop MapreduceHadoop Mapreduce
Hadoop Mapreduce
Ā 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
Ā 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
Ā 
E031201032036
E031201032036E031201032036
E031201032036
Ā 

More from Tokyo University of Science

More from Tokyo University of Science (20)

A Method for Cloud-Assisted Secure Wireless Grouping of Client Devices at Net...
A Method for Cloud-Assisted Secure Wireless Grouping of Client Devices at Net...A Method for Cloud-Assisted Secure Wireless Grouping of Client Devices at Net...
A Method for Cloud-Assisted Secure Wireless Grouping of Client Devices at Net...
Ā 
Ultrasound Relative Positioning for IoT Devices in Dense Wireless Spaces
Ultrasound Relative Positioning for IoT Devices in Dense Wireless SpacesUltrasound Relative Positioning for IoT Devices in Dense Wireless Spaces
Ultrasound Relative Positioning for IoT Devices in Dense Wireless Spaces
Ā 
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
Ā 
What if We Atomize Student Data and Apps and Put Them on Docker Containers?
What if We Atomize Student Data and Apps and Put Them on Docker Containers?What if We Atomize Student Data and Apps and Put Them on Docker Containers?
What if We Atomize Student Data and Apps and Put Them on Docker Containers?
Ā 
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
Ā 
Taking the Step from Software to Product Development \\ when teaching PBL at ...
Taking the Step from Software to Product Development \\ when teaching PBL at ...Taking the Step from Software to Product Development \\ when teaching PBL at ...
Taking the Step from Software to Product Development \\ when teaching PBL at ...
Ā 
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
Ā 
The Switchboard Optimization Problem and Heuristics for Cut-Through Networking
The Switchboard Optimization Problem and Heuristics for Cut-Through NetworkingThe Switchboard Optimization Problem and Heuristics for Cut-Through Networking
The Switchboard Optimization Problem and Heuristics for Cut-Through Networking
Ā 
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
Ā 
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless SpacesBulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
Ā 
Fog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
Fog Cloud Caching at Network Edge via Local Hardware Awareness SpacesFog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
Fog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
Ā 
On a Hybrid Packets-and-Circuits Switching Logic
On a Hybrid Packets-and-Circuits Switching LogicOn a Hybrid Packets-and-Circuits Switching Logic
On a Hybrid Packets-and-Circuits Switching Logic
Ā 
Image-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
Image-Related Uses for Roadside Infrastructure \\ based on Wireless BeaconsImage-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
Image-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
Ā 
Complexity Resolution Control for Context Based on Metromaps
Complexity Resolution Control for Context Based on MetromapsComplexity Resolution Control for Context Based on Metromaps
Complexity Resolution Control for Context Based on Metromaps
Ā 
The Declarative-Coordinated Model for Self-Optimization of Service Networks
The Declarative-Coordinated Model for Self-Optimization of Service NetworksThe Declarative-Coordinated Model for Self-Optimization of Service Networks
The Declarative-Coordinated Model for Self-Optimization of Service Networks
Ā 
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
Ā 
3-Way Scripts as a Base Unit for Flexible Scale-Out Code
3-Way Scripts as a Base Unit for Flexible Scale-Out Code3-Way Scripts as a Base Unit for Flexible Scale-Out Code
3-Way Scripts as a Base Unit for Flexible Scale-Out Code
Ā 
Towards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
Towards Social Robotics on Smartphones with Simple XYZV Sensor FeedbackTowards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
Towards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
Ā 
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
Ā 
Browser Visualization using PNGs Generated by HTML5 Workers on Multicore
Browser Visualization using PNGs Generated by HTML5 Workers on MulticoreBrowser Visualization using PNGs Generated by HTML5 Workers on Multicore
Browser Visualization using PNGs Generated by HTML5 Workers on Multicore
Ā 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
Ā 

Recently uploaded (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
Ā 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
Ā 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
Ā 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Ā 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Ā 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Ā 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
Ā 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
Ā 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
Ā 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Ā 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Ā 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Ā 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
Ā 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Ā 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Ā 

Replayable BigData for Multicore Processing and Statistically Rigid Sketching

  • 1.
  • 2. . Hadoop/MapReduce Problems M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 2/23 . 2/23
  • 3. . Hadoop/MapReduce Problems upper limit on throughput, both in HDFS and MapReduce jobs 06 09 one machine is not so bad anymore: good RAM, multicore 09 MapReduce is key-value hashes only, very restrictive MapReduce performs badly under heterogeneous workloads Small file problem is hard to solve in Hadoop 10 ... 06 K.Shvachko HDFS Scalability: the Limits to Growth the Magazine of USENIX, vol.35, no.2 (2012) 09 A.Rowstron+4 Nobody ever got fired for using Hadoop on a cluster 1st HTCDP (2012) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 3/23 . 3/23
  • 4. . Hadoop/MapReduce Upgrades solutions for geterogeneous jobs 17 R for statistical processing on top of HDFS shards 18 searchable shards -- HBase + Lucene 19 .... not enough 17 A.Rasooli+1 COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop... McMaster (2013) 18 S.Das+5 Ricardo: Integrating R and Hadoop SIGMOD (2010) 19 X.Gao+2 Experimenting with Lucene Index on HBase in an HPC Environment 1st HPCDB (2012) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 4/23 . 4/23
  • 5. . Example : Superspreaders M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 5/23 . 5/23
  • 6. . Example Problem : Superspreaders . A Superspreader... . ... is a one-to-many (o2m) traffic artifact where one host tries to contact many other hosts within short timespan . reverse of Superspreader is a Flash Crowd -- same algorithm many-to-many (m2m, groups) is even more complex a known problem 16 . QUIZ . .How to detect superspreaders using MapReduce? Any ideas? 16 S.Venkataraman+3 New Streaming Algorithms for Fast Detection of Superspreaders NDSS (2005) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 6/23 . 6/23
  • 7. . Superspreaders: (1) raw packets time, sip, sport, dip, dport, psize, protocol -- the usual packet tuple convert into text for MapReduce jobs M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 7/23 . 7/23
  • 8. . Superspreaders: (2) 1st MapReduce . Step 1 . ....is to extract unique sip-dip pairs 3rd column is count (took word counting for basis) ordered by sip takes time because needs to process all the data M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 8/23 . 8/23
  • 9. . Superspreaders: (3) 2nd MapReduce . Step 2 . ... is to count unique sips in sip-dip pairs . based on the output of the 1st job faster because data is relatively small M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 9/23 . 9/23
  • 10. . Superspreaders: Problem NOT Solved no time/sequence in MapReduce, no way to process data in a time window do not know what to discard (tail, small counts) midway until all data is aggregated -- ineffective for large datasets ... M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 10/23 . 10/23
  • 11. . Proposal: TABID + Sketches M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 11/23 . 11/23
  • 12. . TABID: Time-Aware BIg Data TABID: acronym of Time Aware BIg Data the grain is larger then key-value store, but better than HDFS shards 19 KV Store Hadoop (HDFS) and MapReduce TABID Time-Aware Big Data (this demo) HDFS + Lucene Index 19 X.Gao+2 Experimenting with Lucene Index on HBase in an HPC Environment 1st HPCDB (2012) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 12/23 . 12/23
  • 13. . Sketches (Data Streaming) . Data Streaming . ... is a new concept where data is processed in realtime without using any storage . a relatively recent but well defined and mathematically/statistically formulated 11 many known interesting algorithms 12 algorithms for specific problems 14 15 16 main features: space efficiency, statistical rigidity (information theory), speed 11 S.Muthukrishnan Data Streams: Algorithms and Applications Foundations and Trends... (2005) 12 M.Sung+3 Scalable and Efficient Data Streaming Algorithms for Detecting .... ICDEW (2006) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 13/23 . 13/23
  • 14. . Hadoop/MapReduce . Hadoop is... . ...a platform where your software meets with data shards . One Physical Machine (1 shard) file A file B file C ā€¦ Hadoop Space Read/parse Find Hadoop Job (your code) Hadoop Job (your code) Hadoop Job (your code) Hadoop Job (your code) many many Manager Name Server(s) Client Machine Hadoop Client Start Use Your Code You Deploy shards are distributed across the cluster nameserver is the bottleneck fairness problems are when heavy and light jobs run together at the same shard M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 14/23 . 14/23
  • 15. . TABID/Sketches . TABID is... . ...an API that helps your jobs get access to data in its natural time order . One Physical Machine (1 shard) STuimb-eSltinoere Store TABID Node TABID Manager Schedule Client Machine TABID Client Start Use Your Sketcher You Multicore Replay data shards are downloaded and replayed jobs run on multicore jobs can use any data structure (well beyond key-value) jobs are data streaming sketches -- can be selected from a library M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 15/23 . 15/23
  • 16. . TABID: BigData Replay Core X Core 1 Core 1 Read/prepare One SketcOhne Sketch One Sketch Start End End End TABID Manager Now(replay) ā€¦. Time-Aligned Big Data Cursor Time Direction Shared Memory Start M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 16/23 . 16/23
  • 17. . TABID: Lockfree Shared Memory lockfree shared memory is a new branch of parallel processing locks are bad for multicore (memory fences 20) a generic lockfree design in 05 and software implementation in 23 some attempts to apply multicore to MapReduce 21 22 20 M.Aldinucci+2 FastFlow: Efficient Parallel Streaming Applications on Multi-core Universita di Pisa (2009) 23 current MCoreMemory project page https://github.com/maratishe/mcorememory (myself) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 17/23 . 17/23
  • 18. . Wrapup (Problemā†’Solution) 1. MapReduce is about counting, no rich datatypes SOLVED ā†’ any datatype, stored as JSON 2. MapReduce has no solution for heterogeneous jobs SOLVED ā†’ TABID optimizes jobs-to-core mapping (bin packing) 3. MapReduce has accountability problem because clients make their own jobs SOLVED ā†’ a library of sketches based on known data streaming algorithms 4. ... many smaller solutions M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 18/23 . 18/23
  • 19. . Hadoop vs Tabid DEMO (today) working Hadoop and TABID platforms can play with various configurations on the spot . Replayable BigData for Multicore Processing and Stat... Rigid Sketching .Marat Zhanikeev ā€“ maratishe@gmail.com ā€“ maratishe.github.io ā€“ http://bit.do/marat141105 1/1 M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 19/23 . 19/23
  • 20. . Thatā€™s all, thank you ... M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 20/23 . 20/23
  • 21. . Performance under Distributions comparative efficiency (right) and job-to-core mapping efficiency (left) all under heterogenious (variance) job distributions Tuples: min lifespan, max lifespan, exponent 10,1000,0.01 10,1000,0.05 10,1000,0.1 10,1000,0.3 10,1000,0.7 0 200 400 600 800 1000 Number of Sketches 5.4 5.2 5 4.8 4.6 4.4 4.2 4 3.8 Log of Sketchbyte Ratio (HADOOP/TABID) 10,1000,0 More longer lifespans Tuples: min overhead, max overhead, exponent 0 200 400 600 800 1000 Number of Sketches 6.65 5.95 5.25 4.55 3.85 3.15 Log of Max TABID Overhead (per core) 100,10000,0 100,10000,0.7 100,10000,0.3 100,10000,0.1 100,10000,0.05 100,10000,0.01 Mostly large overhead M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 21/23 . 21/23
  • 22. . Shared Memory : Logic Shared Memory IDLE All sketches caught up IDLE Cursor Time Sketch time ... Data Stream ā€¦ All sketches Ring buffer Process Data Set Sketch Time Add data Sketch started Monitor Sketch Manager cursor Cursor moved Each item Cursor = sketch End time of life Global start Wait for all sketches End of data coursor written by manager and read by jobs on cores only manager writes, cores only read -- lockfree design zero collissions guaranteed by API M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 22/23 . 22/23
  • 23. . Related Subjects rigid statistics in traffic analysis QoS of user communities 01, dynamic network management, etc. isn't BigData replay a bottleneck? not really, circuits for bulk transfer 02 or multisource aggregation 03 can support very high throughputs MapReduce has several bottlenecks: namespace lookup, HDFS, etc. -- see 06 for details Why BigData Replay? -- it's a new ecosystem bigdata hoarders announce replays, openly collect public jobs until a deadline, open outcome one good solution for bigdata ā†’ opendata innitiative 01 myself+0 A holistic community-based architecture for measuring end-to-end QoS at data centres IJCSE (2014) 02 myself+0 Circuit Emulation for Big Data Transfers in Clouds Networking for Big Data, Wiley (in print) (2015) 03 myself+0 Multi-Source Stream Aggregation in the Cloud Wiley (2014) 06 K.Shvachko HDFS Scalability: the Limits to Growth the Magazine of USENIX, vol.35, no.2 (2012) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23
  • 24. . [01] myself+0 (2014) A holistic community-based architecture for measuring end-to-end QoS at data centres IJCSE [02] myself+0 (2015) Circuit Emulation for Big Data Transfers in Clouds Networking for Big Data, Wiley (in print) [03] myself+0 (2014) Multi-Source Stream Aggregation in the Cloud Wiley [04] myself+0 (2014) Optimizing Virtual Machine Migration for Energy-Efficient Clouds IEICEJ [05] myself+0 (2014) A lock-free shared memory design for high-throughput multicore packet traffic capture M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23
  • 25. . IJNM [06] K.Shvachko (2012) HDFS Scalability: the Limits to Growth the Magazine of USENIX, vol.35, no.2 [07] Y.Chen+3 (2011) The Case for Evaluating MapReduce Performance Using Workload Suites MASCOTS [08] Z.Ren+4 (2012) Workload Characterization on a Production Hadoop Cluster... IEEE Workload... [09] A.Rowstron+4 (2012) Nobody ever got fired for using Hadoop on a cluster 1st HTCDP [10] (current) Small File Problem in Hadoop (blog) http://amilaparanawithana.blogspot.jp/2012 M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23
  • 26. . [11] S.Muthukrishnan (2005) Data Streams: Algorithms and Applications Foundations and Trends... [12] M.Sung+3 (2006) Scalable and Efficient Data Streaming Algorithms for Detecting .... ICDEW [13] Z.Bar-Yossef+2 (2002) ...streaming algorithms, with an application to counting triangles in graphs ACM SODA [14] M.Charikar+2 (2002) Finding frequent items in data streams 29th International Colloquium on Automata... [15] M.Datar+3 (2002) Maintaining stream statistics over sliding windows SIAM [16] S.Venkataraman+3 (2005) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23
  • 27. . New Streaming Algorithms for Fast Detection of Superspreaders NDSS [17] A.Rasooli+1 (2013) COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop... McMaster [18] S.Das+5 (2010) Ricardo: Integrating R and Hadoop SIGMOD [19] X.Gao+2 (2012) Experimenting with Lucene Index on HBase in an HPC Environment 1st HPCDB [20] M.Aldinucci+2 (2009) FastFlow: Efficient Parallel Streaming Applications on Multi-core Universita di Pisa [21] R.Brightwell (2008) M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23
  • 28. . Workshop on Managed Many-Core Systems 1st Workshop ...Many-Core Systems [22] R.Chen+2 (2010) Tiled-MapReduce: Optimizing Resource Usages of Data-parallel... with Tiling 19th PACT [23] current (myself) MCoreMemory project page https://github.com/maratishe/mcorememory M.Zhanikeev -- maratishe@gmail.com -- Replayable BigData for Multicore Processing and ... Sketching -- http://bit.do/marat141105 -- 23/23 . 23/23