Big Data, Beyond the Data Center
Increasingly, the next scientific discoveries and the next industrial breakthroughs will depend on the capacity to extract knowledge and sense from gigantic amounts of information. Examples range from processing data provided by scientific instruments such as CERN's LHC; collecting data from large-scale sensor networks; grabbing, indexing and nearly instantaneously mining and searching the Web; building and traversing billion-edge social network graphs; and anticipating market and customer trends through multiple channels of information. Collecting information from various sources, recognizing patterns and distilling insights constitutes what is called the Big Data challenge. However, as the volume of data grows exponentially, managing these data becomes proportionally more complex. A key challenge is to handle the complexity of data management on hybrid distributed infrastructures, i.e. assemblages of Clouds, Grids and Desktop Grids. In this talk, I will give an overview of our work in this research area, starting with BitDew, a middleware for large-scale data management on Clouds and Desktop Grids. Then I will present our approach to enabling MapReduce on Desktop Grids. Finally, I will present our latest results around Active Data, a programming model for managing the data life cycle on heterogeneous systems and infrastructures.
1. Big Data, Beyond the Data Center
Gilles Fedak
Gilles.Fedak@inria.fr
INRIA, University of Lyon, France
Cluj Economics and Business Seminar Series (CEBSS)
University Babes-Bolyai Faculty of Economics and Business
Administration
Cluj-Napoca, Romania
6/11/2014
2. AVALON Team
I Located in Lyon, France
I Joint Research Group
I INRIA : French National Institute for Research in Informatics
I ENS-Lyon : École Normale Supérieure
I University of Lyon
G. Fedak(INRIA/Avalon Team) BitDew/Active Data 6/11/2014
3. AVALON Members
Avalon Members @ April 1st, 2014
Faculty Members (8)
(4 INRIA, 1 CNRS, 2 UCBL, 1 ENSL)
• Eddy Caron, MCF ENS Lyon, HDR (80%)
• Frédéric Desprez, DR INRIA, HDR (30%)
• Gilles Fedak, CR INRIA
• Jean-Patrick Gelas, MCF UCBL
• Olivier Glück, MCF UCBL
• Laurent Lefèvre, CR INRIA, HDR
• Christian Perez, DR INRIA, HDR, Project leader
• Frédéric Suter, CR CNRS
PhD students (6)
• Maurice-Djibril Faye, ENS-Lyon / Université
Gaston Berger (Sénégal)
• Sylvain Gault, MapReduce, INRIA
• Anthony Simonet, MapReduce, INRIA
• Vincent Lanore, ENSL
• Arnaud Lefray, SEED4C, ENSIB
• Daniel Balouek, CIFRE New Generation SR
Engineers (3+4+1)
• Simon Delamare, IR CNRS (80%)
• Jean-Christophe Mignot, IR CNRS (20%)
• Matthieu Imbert, INRIA SED (40%)
• Sylvain Bernard, CloudPower
• François Rossigneux, XCLOUD
• Guillaume Verger, SEED4C
• Yulin Zhang Huaxi, SEED4C
• Laurent Pouilloux (AE Héméra)
Postdoc
• Jonathan Rouzaud-Cornabas, CNRS
Temporary Teacher-Researcher
• Ghislain Landry Tsafack, UCBL
Assistant
• Evelyne Blesle, INRIA
Avalon Team Overview
4. AVALON Topics
Avalon: Research Activities
Super-computers
(Exascale)
Desktop
Grids
Clouds
(IaaS, PaaS)
Grids
(EGI)
Energy Application Profiling and Modeling
• Large Scale Energy Consumption Analysis for
Physical and Virtual Resources
• Energy Efficiency of Next Generation Large Scale
Platforms
Data-intensive Application Profiling, Modeling,
and Management
• Performance Prediction of Parallel Regular
Applications
• Modeling Large Scale Storage Infrastructure
• Data Management for Hybrid Computing
Infrastructures
Resource Agnostic Application Description
Model
• Moldable Application Description Model
• Dynamic Adaptation of the Application Structure
Application Mapping and Scheduling
• Application Mapping and Software Deployment
• Non-Deterministic Workflow Scheduling
• Security Management in Cloud Infrastructure
5. Big Data ...
I Huge and growing volume of information originating from
multiple sources.
I Impacts many scientific instruments (LSST, LHC, OOI), but not only (sequencing machines)
I Internet and Social Networks (Google, Facebook, Twitter, etc.)
I Open Data (Open Library, Governmental, Genomics)
→ Impacts the whole process of scientific discovery (4th paradigm of science)
9. ... or Big Bottlenecks ?
I Big Data creates several challenges:
I How to scale the infrastructure? End-to-end performance improvement, inter-system optimization.
I How to improve the productivity of data-intensive scientists? Workflows, programming languages, quality of data provenance.
I How to enable collaborative data science? Incentives for data publication, data-set sharing, collaborative workflows.
I New models and software are needed to represent and manipulate large and distributed scientific data-sets.
11. BitDew: Large Scale Data
Management
Haiwu He (CAS/CNIC), Franck Cappello (ANL, UIUC)
I G. Fedak, H. He, and F. Cappello. BitDew: A Programmable Environment for Large-Scale Data Management and Distribution. In Proceedings of the ACM/IEEE SuperComputing Conference (SC08), pages 1-12, Austin, USA, November 2008.
I G. Fedak, H. He, and F. Cappello. BitDew: A Data Management and Distribution Service with Multi-Protocol and Reliable File Transfer. Journal of Network and Computer Applications, 32(5):961-975, 2009.
I H. He, G. Fedak, B. Tran, and F. Cappello. BLAST Application with Data-aware Desktop Grid Middleware. In Proceedings of the 9th IEEE International Symposium on Cluster Computing and the Grid (CCGRID'09), pages 284-291, Shanghai, China, May 2009.
13. Towards Data Desktop Grid
Desktop Grids or Volunteer Computing Systems
I High-throughput computing over large sets of idle desktop computers
I Mature technology
I EU support : European Desktop Grid Infrastructures
But ...
I High number of resources
I Volatility
I Lack of trust
I Owned by volunteers
I Scalable, but mainly for embarrassingly parallel applications with few I/O requirements
I Enabling data-intensive applications
I Bridge with Cloud and Grid infrastructures
14. Large Scale Data Management
BitDew : a Programmable Environment for Large Scale Data
Management
Key Idea 1: provides an API and a runtime environment which
integrates several P2P technologies in a consistent
way
Key Idea 2: relies on metadata (Data Attributes) to transparently drive data management operations: replication, fault tolerance, distribution, placement, life cycle.
15. BitDew : the Big Cloudy Picture
I Aggregates storage in a
single Data Space:
I Clients put and get data
from the data space
I Clients define data attributes
[Figure: clients put and get data from the Data Space; a datum carries the attribute REPLICA = 3.]
17. BitDew : the Big Cloudy Picture
I Distinguishes service nodes
(stable), client and Worker
nodes (volatile)
I Service : ensure fault
tolerance, indexing and
scheduling of data to
Worker nodes
I Worker : stores data on
Desktop PCs
I push/pull protocol between client, service and Worker nodes
[Figure: clients put/get data to the Data Space through the stable Service Nodes; the volatile Reservoir Nodes pull data from them.]
18. Data Attributes
replica : indicates how many occurrences of data should be available at
the same time in the system
resilience : controls the resilience of data in the presence of machine crashes
lifetime : is a duration, absolute or relative to the existence of other data, which indicates when a datum becomes obsolete
affinity : drives movement of data according to dependency rules
transfer protocol : gives the runtime environment hints about the file transfer protocol appropriate to distribute the data
distribution : indicates the maximum number of pieces of Data with the same Attribute that should be sent to a particular node
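As a concrete illustration of attribute-driven data management, the sketch below models an attribute set and a placement decision driven by the replica attribute. This is a hypothetical toy, not the BitDew API: the names Attribute and placeReplicas are illustrative only.

```java
// Hypothetical sketch (NOT the real BitDew API): a minimal attribute
// record mirroring the slide, and a placement decision that honours
// the "replica" attribute.
import java.util.ArrayList;
import java.util.List;

public class AttributeSketch {
    // Mirrors the attributes listed on the slide.
    static class Attribute {
        int replica = 1;           // desired simultaneous copies
        boolean resilient = false; // re-schedule on machine crash
        long lifetime = -1;        // milliseconds; -1 = unbounded
        String affinity = null;    // id of data this datum should follow
        String protocol = "http";  // hint: http, ftp, bittorrent
        int distribution = 1;      // max pieces per node
    }

    // The runtime replicates a datum until "replica" copies exist.
    static List<String> placeReplicas(Attribute attr, List<String> nodes) {
        List<String> placement = new ArrayList<>();
        for (int i = 0; i < attr.replica && i < nodes.size(); i++)
            placement.add(nodes.get(i));
        return placement;
    }

    public static void main(String[] args) {
        Attribute a = new Attribute();
        a.replica = 3;              // like REPLICA = 3 on the earlier slide
        a.protocol = "bittorrent";
        List<String> placement = placeReplicas(a, List.of("n1", "n2", "n3", "n4"));
        System.out.println(placement); // three nodes receive a copy
    }
}
```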
20. Architecture Overview
[Figure: BitDew runtime environment. APIs (Active Data, Transfer Manager, Master/Worker, Command-line Tool) are built on services (Data Catalog, Data Repository, Data Scheduler, Data Transfer, Service Container) and back-ends (File System, HTTP, FTP, BitTorrent, SQL Server, DHT, Storage).]
I Programming APIs to create Data and Attributes, manage file transfers and program applications
I Services (DC, DR, DT, DS) to store, index, distribute, schedule, transfer and provide resiliency to the Data
I Several information storage back-ends
23. Examples of BitDew Applications
I Data-Intensive Application
I DI-BOT : data-driven master-worker Arabic characters
recognition (M. Labidi, University of Sfax)
I MapReduce vs Hadoop, (X. Shi, L. Lu HUST, Wuhan China)
I Data Management Utilities
I File Sharing for Social Network (N. Kourtellis, Univ. Florida)
I Distributed Checkpoint Storage (F. Bouabache, Univ. Paris XI)
I Grid Data Stagging (IEP, Chinese Academy of Science)
24. MapReduce for Hybrid Distributed
Computing Infrastructures
Haiwu He (CAS/CSNET), Bing Tang (WUST), Xuanhua Shi, Lu Lu (HUST), Mircea Moca, Gheorghe Silaghi (Univ. Babes-Bolyai)
I B. Tang, M. Moca, S. Chevalier, H. He, and G. Fedak. Towards MapReduce for Desktop Grid Computing. In Fifth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC'10), Fukuoka, Japan, November 2010.
I M. Moca, G. C. Silaghi and G. Fedak. Distributed Results Checking for MapReduce on Volunteer Computing. In 4th Workshop on Desktop Grids and Volunteer Computing Systems (PCGrid 2010), IPDPS'2011, Anchorage, Alaska.
I L. Lu, H. Jin, X. Shi and G. Fedak. Assessing MapReduce for Internet Computing: a Comparison of Hadoop and BitDew-MapReduce. In the 13th ACM/IEEE International Conference on Grid Computing (Grid 2012), Beijing, China, 2012.
I H. Lin, W.-C. Feng and G. Fedak. Data-Intensive Computing on Desktop Grids. Book chapter in Desktop Grid Computing, CRC Press, 2012.
I B. Tang, H. He, and G. Fedak. Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce. In 4th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP'14), LNCS/Springer, August 24-27, Dalian, China, 2014.
25. What is MapReduce ?
I Programming model for data-intensive applications
I Proposed by Google in 2004
I Simple, inspired by functional programming
I The programmer simply defines Map and Reduce tasks
I Building block for other parallel programming tools
I Strong open-source implementation: Hadoop
I Highly scalable
I Accommodates large-scale clusters: faulty and unreliable resources
MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat, in OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
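The model can be illustrated with a minimal word count in plain Java (no Hadoop): map emits a (word, 1) pair for every word, and reduce sums the counts per key.

```java
// Minimal word-count in the MapReduce style, in plain Java:
// map emits (word, 1) pairs, reduce sums the counts per word.
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    static Map<String, Integer> mapReduce(String[] documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents)                 // map phase: one doc per task
            for (String word : doc.split("\\s+"))    // emit (word, 1)
                counts.merge(word, 1, Integer::sum); // reduce phase: sum per key
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> r = mapReduce(new String[]{"big data", "big deal"});
        System.out.println(r.get("big")); // 2
    }
}
```

In a real MapReduce runtime the map calls run in parallel on many nodes and the per-key sums are merged by the reducers; the sequential loop above only shows the two roles.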
28. Challenge of MapReduce over the Internet
[Figure: MapReduce dataflow: input files are split and sent to Mappers; intermediate results are combined by Reducers into output files and a final output.]
I No shared file system nor direct communication between hosts
I Faults and host churn
I Result certification of intermediate data
I Collective operations (scatter + gather/reduction)
31. Implementing MapReduce over BitDew
Latency Hiding
I Multi-threaded workers to overlap communication and computation.
I The maximum number of concurrent Map and Reduce tasks can be configured, as well as the minimum number of tasks in the queue before computations can start.
Barrier-free computation
I Reducers detect duplication of intermediate results (which happens because of faults and/or lags).
I Early reduction : process intermediate results (IR) as they arrive → allowed us to remove the barrier between Map and Reduce tasks.
I But ... IR are not sorted.
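The barrier-free scheme can be sketched as follows; EarlyReducer and its method names are illustrative, not the actual implementation. The reducer folds each intermediate result into a running value as it arrives and drops duplicates caused by faults or lagging hosts.

```java
// Sketch of barrier-free "early reduction": fold each intermediate
// result (IR) into a running value on arrival, skipping duplicated IRs
// produced by replicated or lagging Map tasks.
import java.util.HashSet;
import java.util.Set;

public class EarlyReducer {
    private final Set<String> seen = new HashSet<>(); // IR ids already reduced
    private int accumulator = 0;                      // running reduction value

    // Returns true if the IR was new and folded in, false if duplicate.
    boolean onIntermediateResult(String irId, int value) {
        if (!seen.add(irId)) return false; // duplicate from a replica: ignore
        accumulator += value;              // reduce without waiting for a barrier
        return true;
    }

    int result() { return accumulator; }

    public static void main(String[] args) {
        EarlyReducer r = new EarlyReducer();
        r.onIntermediateResult("map-1", 4);
        r.onIntermediateResult("map-1", 4); // duplicated replica, ignored
        r.onIntermediateResult("map-2", 6);
        System.out.println(r.result()); // 10
    }
}
```

Note the trade-off stated on the slide: because IRs are folded in arrival order, they are not sorted, unlike in classic MapReduce.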
33. Scheduling and Fault-tolerance
2-level scheduling
1. Data placement is ensured by the BitDew scheduler, which is mainly guided by the data attributes.
2. Workers periodically report the state of their ongoing computation to the MR-scheduler running on the master node.
3. The MR-scheduler determines whether there are more nodes available than tasks to execute, which can avoid the lagger effect.
Fault tolerance
I In Desktop Grids, computing resources have high failure rates:
→ during the computation (execution of either Map or Reduce tasks);
→ during the communication, that is, file upload and download.
I MapInput data and the ReduceToken have the resilient attribute enabled.
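The second scheduling level can be reduced to one decision; this is an illustrative sketch with hypothetical names, not the MR-scheduler's actual code: when more workers are idle than tasks remain, replicate running tasks to dodge laggers.

```java
// Illustrative sketch of the MR-scheduler decision described above:
// when available nodes outnumber remaining tasks, the surplus nodes
// get replicas of running tasks to avoid the lagger effect.
public class MRSchedulerSketch {
    // Number of running tasks worth replicating onto idle nodes.
    static int tasksToReplicate(int availableNodes, int remainingTasks) {
        return Math.max(0, availableNodes - remainingTasks);
    }

    public static void main(String[] args) {
        // 8 workers report in, only 5 tasks left: replicate 5 of the
        // running tasks onto up to 3 idle nodes.
        System.out.println(tasksToReplicate(8, 5)); // 3
    }
}
```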
35. MapReduce Evaluation
[Figure 6: Scalability evaluation on the WordCount application (2.7 GB input): the y axis presents the throughput in MB/s and the x axis the number of nodes varying from 1 to 512.]
Table II: Evaluation of the performance according to the number of mappers and reducers.
#Mappers   4   8  16  32  32  32  32
#Reducers  1   1   1   1   4   8  16
36. Data Security and Privacy
Distributed Result Checking
I Traditional DG or VC projects implement result checking on the server.
I Intermediate results (IR) are too large to be sent back to the server
→ distributed result checking:
I replicate the MapInput and the Reducers
I select correct results by majority voting
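A minimal sketch of the majority-voting rule, assuming (as in the cited papers) that a reducer accepts an intermediate result once rm/2 + 1 identical versions out of rm Map replicas have arrived; the class and method names are illustrative.

```java
// Hedged sketch of distributed result checking by majority voting:
// a reducer accepts an intermediate result once floor(rm/2) + 1
// identical versions (out of rm replicas of the Map task) are received.
import java.util.HashMap;
import java.util.Map;

public class MajorityVote {
    private final int rm;                                        // Map replication factor
    private final Map<String, Integer> votes = new HashMap<>();  // result hash -> count

    MajorityVote(int rm) { this.rm = rm; }

    // Feed the hash of one received version; returns true once a
    // majority of identical versions has been reached.
    boolean accept(String resultHash) {
        int n = votes.merge(resultHash, 1, Integer::sum);
        return n >= rm / 2 + 1;
    }

    public static void main(String[] args) {
        MajorityVote v = new MajorityVote(3);  // rm = 3, majority = 2
        System.out.println(v.accept("h(ir)")); // false: only 1 version so far
        System.out.println(v.accept("h(ir)")); // true: 2 identical versions
    }
}
```

Comparing hashes of the IRs rather than the IRs themselves keeps the check cheap, which matters precisely because the IRs are too large to ship to the server.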
[Figure 2: Dataflows in the MapReduce implementation: (a) no replication; (b) replication of the Map tasks; (c) replication of both Map and Reduce tasks.]
With replicated Map tasks, the reducer receives a set of rm versions of the intermediate files corresponding to a map input fi. After receiving rm/2 + 1 identical versions, the reducer considers the respective result correct and accepts it as input for a Reduce task; Figure 2b illustrates the dataflow of a MapReduce execution where rm = 3.
Ensuring Data Privacy
I Use a hybrid infrastructure composed of private and public resources
I Use an IDA (Information Dispersal Algorithms) approach to distribute and store the data securely
37. Active Data: Data Life-Cycle
Management
Anthony Simonet (INRIA), Matei Ripeanu (UBC), Samer Al-Kiswany (UBC)
I A. Simonet, G. Fedak, M. Ripeanu and S. Al-Kiswany. Active Data: A Data-Centric Approach to Data Life-Cycle Management. 8th Parallel Data Storage Workshop (PDSW'13), Proceedings of SC13 workshops, Denver, November 2013 (position paper, 5 pages).
I A. Simonet, G. Fedak and M. Ripeanu. Active Data: A Programming Model for Data Life-Cycle Management on Heterogeneous Systems and Infrastructures. Technical report, under evaluation.
38. Focus on Data Life-Cycle
Data Life Cycle: the course of operational stages through which
data pass from the time when they enter a system to the time
when they leave it.
39. Use Case : The Advanced Photon Source
I 3 to 5 TB of data per week on this detector
I Raw data are pre-processed and registered in the Globus
Catalog :
I Data are curated by several applications
I Data are shared amongst scientific users
[Figure: APS dataflow: the Instrument (Beamline) transfers raw data to Local Storage, where metadata are extracted and registered in the Metadata Catalog; data are then transferred to a Remote Data Center and to an Academic Cluster for analysis and more analysis, before results are uploaded and result metadata registered.]
41. Objectives
We are aiming at:
I A model to capture the essential life-cycle stages and properties: creation, deletion, faults, replication, error checking . . .
I Allowing legacy systems to expose their intrinsic data life cycle.
I Allowing reasoning about data sets handled by heterogeneous software and infrastructures.
I Simplifying the programming of applications that implement data life-cycle management.
42. Active Data principles
System programmers expose their system's internal data life cycle with a model based on Petri Nets.
A life cycle model is made of Places and Transitions; here, the places Created, Written, Read and Terminated are linked by the transitions t1 to t4.
Each token has a unique identifier, corresponding to the actual data item's.
A transition is fired whenever a data state changes.
Code may be plugged by clients to transitions; it is executed whenever the transition is fired:
public void handler () {
compressFile ();
}
48. Active Data features
The Active Data programming model and runtime environment:
I Allows reacting to life-cycle progression
I Transparently exposes distributed data sets
I Can be integrated with existing systems
I Has scalable performance and minimal overhead over existing systems
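The principles above can be sketched as a toy life-cycle model in which client handlers run when a transition fires. All names here (LifeCycleSketch, Handler, fire) are illustrative, not the Active Data API.

```java
// Hypothetical sketch of the Active Data idea: a tiny life-cycle model
// (places + transitions) where client handlers run when a transition
// fires for a token. Not the real API.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LifeCycleSketch {
    interface Handler { void onFire(String tokenId); }

    private final Map<String, String> tokenPlace = new HashMap<>();       // token -> current place
    private final Map<String, List<Handler>> handlers = new HashMap<>();  // transition -> handlers

    // A new token starts its life in the Created place.
    void create(String tokenId) { tokenPlace.put(tokenId, "Created"); }

    // Clients plug code to a named transition (e.g. "t1").
    void subscribe(String transition, Handler h) {
        handlers.computeIfAbsent(transition, k -> new ArrayList<>()).add(h);
    }

    // Fire a transition for one token: move it and notify subscribers.
    void fire(String transition, String tokenId, String toPlace) {
        tokenPlace.put(tokenId, toPlace);
        for (Handler h : handlers.getOrDefault(transition, List.of()))
            h.onFire(tokenId);
    }

    String placeOf(String tokenId) { return tokenPlace.get(tokenId); }

    public static void main(String[] args) {
        LifeCycleSketch model = new LifeCycleSketch();
        model.create("data-1");
        model.subscribe("t1", id -> System.out.println("t1 fired for " + id));
        model.fire("t1", "data-1", "Written"); // prints "t1 fired for data-1"
    }
}
```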
49. Integration with Data Management Systems
[Figure 3: Data life cycle models for four data management systems, expressed as Petri nets: (a) BitDew Scheduler, (b) BitDew File Transfer, (c) inotify, (d) iRODS, (e) Globus Online.]
Reading the source code of BitDew, we observe that data items are managed by instances of the Data class, and that this class has a status variable which holds the data item's state. Therefore, we simply deduce the set of corresponding places in the Petri Net from the enumeration of the possible values of status (see Figures 2a and 2b). By further analyzing the source code, we construct the model and summarize how high-level DLM features are modeled with Active Data. Scheduling and replication: part of the complexity of the data life cycle in BitDew comes from the Data Scheduler.
I BitDew (INRIA), programmable environment for data management.
I inotify, Linux kernel subsystem: notification of write, movement and deletion.
I iRODS (DICE, Univ. North Carolina), rule-oriented data management system.
I Globus Online (ANL) offers a fast, simple and reliable service to transfer large volumes of data.
53. Implementation
I Prototype implemented in Java (~2,800 LOC)
I Client/Service communication is Publish/Subscribe
I 2 types of subscription:
I Every transition for a given data item
I Every data item for a given transition
[Figure: clients subscribe to the Active Data Service.]
54. Implementation
I Several ways to publish transitions:
I Instrument the code
I Read the logs
I Rely on an existing notification system
I The service orders transitions by time of arrival
[Figure: clients publish transitions to the Active Data Service, which notifies subscribed clients.]
56. Implementation
I Clients run transition handler code locally
I Transition handlers are executed:
I serially
I in a blocking way
I in the order transitions were published
[Figure: publishers send transitions to the Active Data Service; subscribed clients are notified.]
57. Data Surveillance FrameWork for APS
Anthony Simonet (INRIA), Kyle Chard (ANL), Ian Foster (ANL/UC)
I A. Simonet, K. Chard, G. Fedak, I. Foster. Active Data to Provide Smart Data Surveillance to E-Science Users. In Proceedings of Euromicro PDP'15, Turku, Finland, March 4-6, 2015.
58. Problems with APS
[Figure: Detector → 1. local transfer to Local Storage → 2. extract metadata into the Globus Catalog → 3. Globus transfer → 4. Swift parallel analysis on the Compute Cluster.]
What is inefficient in this workflow?
I Many error-prone tasks are performed manually
I Users can't monitor the whole process at once
I Small failures are difficult to detect
I A system alone can't recover from failures caused outside its scope
59. Data Surveillance Framework
4 goals (that would otherwise require a lot of scripting and hacking):
I Monitoring data set progress
I Better automation
I Sharing and notification
I Error discovery and recovery
61. Active Data Advanced Features
A. Progress Monitoring
I Associate Tags to Data
I Install Taggers on Transitions
I Guarded Transitions: handlers only execute on tokens which carry specific tags
Scientists require mechanisms to monitor their workflows from a high level, generate reports on progress, and identify potential errors without painstakingly auditing every dataset. Monitoring is not limited to estimating completion time, but also: i) receiving a single relevant notification when several related events occurred in different systems; ii) quickly noticing that an operation failed within the mass of operations that completed normally; iii) identifying steps that take longer to run than usual, backtracking the chain of causality, fixing the problem at runtime and optimizing the workflow for future executions; iv) accelerating data sharing with the community by pushing notifications to collaborators and colleagues.
B. Automation
I Handlers: Push.co, Twitter, gdoc, ifttt, etc.
The APS workflow, like many scientific workflows, requires explicit human intervention to progress between stages and to recover from unexpected events. Such interventions include running scripts on generated datasets on the shared cluster, registering datasets in the Globus catalog, and executing Swift analysis scripts on the compute cluster. Such interventions cannot be easily integrated in a traditional workflow system, because they reside a level of abstraction above the workflow system. In fact, they are the operations that start the workflow systems.
C. Sharing and Notification
Scientific sharing can be made more efficient by allowing other scientists to be notified of new dataset availability (in the catalog), with powerful filters to extract only the notifications they need, and even to start processes as soon as files are available. We believe the best way for scientists to automatically integrate new datasets in their workflows is to rely on widely used dissemination mechanisms.
[Fig. 2: Data surveillance framework design: a life cycle view over files, datasets, file transfers and metadata, with guards, code execution, tagged tokens and notification.]
IV. SYSTEM DESIGN
We next present the data surveillance framework that we designed to satisfy the APS users' needs presented above. We also elaborate on the design of specific features.
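The guarded-transition idea above can be sketched as follows; GuardedTransition and its methods are hypothetical names, not the framework's API. A handler runs only for tokens carrying the guard's tag.

```java
// Hypothetical sketch of a "guarded transition": the handler code runs
// only for tokens that carry a required tag.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GuardedTransition {
    private final Map<String, Set<String>> tags = new HashMap<>(); // token -> tags
    private final StringBuilder log = new StringBuilder();         // stands in for handler work

    // A tagger attaches a tag to a token.
    void tag(String token, String tag) {
        tags.computeIfAbsent(token, k -> new HashSet<>()).add(tag);
    }

    // Fire the transition: run the handler only if the guard tag is present.
    void fire(String token, String guardTag) {
        if (tags.getOrDefault(token, Set.of()).contains(guardTag))
            log.append("handled:").append(token);
    }

    String log() { return log.toString(); }

    public static void main(String[] args) {
        GuardedTransition g = new GuardedTransition();
        g.tag("d1", "important");
        g.fire("d1", "important"); // guard satisfied, handler runs
        g.fire("d2", "important"); // no tag, handler skipped
        System.out.println(g.log()); // handled:d1
    }
}
```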
64. APS Data Life Cycle Model
Use-case: APS data life cycle model
[Figure: composed Petri nets for the six systems involved: Detector, Globus transfer, Shared storage, Globus Catalog, a second Globus transfer, and Swift.]
Data life cycle model composed of 6 systems.
66. Example: Data Provenance
Definition: complete history of data derivations and operations
I Assess dataset quality
I Records the context of data acquisition and transformation
I PASS: Provenance Aware Storage Systems
→ What about heterogeneous systems?
Example with Globus Online (file transfer service) and iRODS (data store and metadata catalog)
71. Provenance Scenario with Active Data
Data events coming from Globus Online and iRODS.
[Figure: composed life cycle model; the Globus Online net (Created, Succeeded, Failed, Terminated) and the iRODS net (Created, Put, Get, Terminated) are traversed by the same token.]
I The token is first known under its Globus Online identifier: Id: {GO: 7b9e02c4-925d-11e2}
I A handler plugged on a Globus Online transition pushes the transferred file into iRODS:
public void handler () {
iput (...) ;
}
I After the Put, the token carries both identifiers: Id: {GO: 7b9e02c4-925d-11e2, iRODS: 10032}
I A second handler annotates the iRODS object with provenance metadata:
public void handler () {
annotate ();
}
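A hedged sketch of the annotation step: the handler gathers the Globus Online transfer metadata as attribute/value pairs (AVUs) matching the imeta output shown below. In a real handler these pairs would be attached to the iRODS object (e.g. via imeta); the class and method names here are illustrative only.

```java
// Hypothetical sketch of the provenance handler: build the AVU
// (attribute/value) pairs describing a Globus Online transfer, to be
// attached to the matching iRODS data object.
import java.util.LinkedHashMap;
import java.util.Map;

public class ProvenanceHandler {
    static Map<String, String> buildAvus(String taskId, String source,
                                         String destination, int faults) {
        Map<String, String> avus = new LinkedHashMap<>(); // keep insertion order
        avus.put("GO_TASK_ID", taskId);
        avus.put("GO_SOURCE", source);
        avus.put("GO_DESTINATION", destination);
        avus.put("GO_FAULTS", Integer.toString(faults));
        return avus;
    }

    public static void main(String[] args) {
        // Values taken from the imeta listing on the next slide.
        System.out.println(buildAvus("7b9e02c4-925d-11e2", "go#ep1/~/test",
                "asimonet#fraise/~/out_test_4628", 0));
    }
}
```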
75. iRODS Provenance Result
$ imeta ls -d test/out_test_4628
AVUs defined for dataObj test/out_test_4628:
attribute: GO_FAULTS
value: 0
----
attribute: GO_COMPLETION_TIME
value: 2013-03-21 19:28:41Z
----
attribute: GO_REQUEST_TIME
value: 2013-03-21 19:28:17Z
----
attribute: GO_TASK_ID
value: 7b9e02c4-925d-11e2-97ce-123139404f2e
----
attribute: GO_SOURCE
value: go#ep1/~/test
----
attribute: GO_DESTINATION
value: asimonet#fraise/~/out_test_4628
76. Conclusion
We proposed several approaches to handle Big Data on hybrid distributed computing infrastructures: data management, data processing and programming models.
Next
I Data collection and data streams
I Incentive systems for collaborative data science
Want to learn more?
I Book on Desktop Grid Computing, ed. C. Cerin and G. Fedak, CRC Press
I Home page for the Big Data class: http://graal.ens-lyon.fr/gfedak/pmwiki-lri/pmwiki.php/Main/MarpReduceClass
I Our websites: http://www.bitdew.net and http://www.xtremweb-hep.org