Designing and Implementing a 
cloud-hosted SaaS for data 
movement and Sharing with 
SlapOS 
Authors: Walid Saad, Heithem Abbes, Mohamed Jemni 
and Christophe Cerin 
Journal: International Journal of Big Data Intelligence 
Online Date: Thursday, July 24, 2014 
By:- Arnob Saha (L20339084) 
Hari Prasad Dhonju Shrestha (L20352046) 
1
Outlines 
• Abstract 
• Introduction 
• Motivation and fundamental issues 
• Related work 
• SlapOS overview 
• Design and implementation issues 
• Experimental results 
• Conclusion and future work 
• Acknowledgements 
• References 
2
Abstract 
• Many tools and frameworks have been developed to manage and handle 
large amounts of data on grid platforms. 
• These tools are often not adopted because of the complexity of their 
installation and configuration processes. 
• SlapOS (Simple Language for Accounting and Provisioning 
Operating System) emerged in this context 
• Main aim -> to hide the complexity of IT infrastructure and 
software deployment from users 
• The paper proposes a cloud-hosted data grid using the SlapOS 
cloud 
• Through a software as a service (SaaS) solution, users can 
automatically request and install data movement and sharing tools 
such as Stork and BitDew, without any intervention from a system 
administrator 
3
Introduction 
• Many real-world scientific and enterprise applications deal with 
huge amounts of data. The emergence of data-intensive applications 
has prompted scientists around the world to build data grids. 
Examples include bioinformatics, medical imaging, high-energy physics, 
coastal and environmental modelling, and geospatial analysis. 
• To process large datasets, users need to access, process and 
transfer data stored in distributed repositories. 
• The paper proposes a self-configurable desktop grid (DG) platform on 
demand. 
• The Simple Language for Accounting and Provisioning Operating 
System (SlapOS) cloud provides a configurable environment, in 
terms of both the OS and the software stack, without the need for 
virtualisation techniques. 
4
Introduction (contd…) 
• This paper focuses on a subset of the overall research on 
interoperability between DGs and clouds, namely data tools 
delivered as hosted software as a service (SaaS) frameworks. 
• We present the design and implementation of two SaaS tools for 
data management. The first service provides a means for users to 
transfer data from their sites to the computation or simulation 
sites. The second service is used to share data in a widely 
distributed environment. 
• The challenge is how to: 
• design automatic data management tools that mask the installation 
and configuration difficulties of data management software 
• deliver data management functionality as hosted services via web user 
interfaces. 
5
Introduction (contd…) 
6
Motivations and fundamental issues 
• e-Science applications require efficient data management and 
transfer software in wide-area, distributed computing environments. 
• To achieve data management on demand, users need a resilient 
service that moves data transparently 
• No IT knowledge required; no software 
download/installation/configuration steps. 
• Implementations based on: 
• Stork data scheduler: manages data movement over wide-area networks, 
using intermediate data grid storage and different protocols 
• BitDew: makes data accessible and shareable from other resources, 
including end-user desktops and servers 
• SlapOS: with a single 'one-click' process, instantiates and configures 
the data managers (Stork + BitDew) and deploys them over the Internet 
7
Related Work 
• Managing the low-level data handling issues on grid systems 
• High-level tools for co-scheduling of data and computation in grid 
environments 
• Research in data management using SaaS-based services 
• Data management and transfer in grid environments 
• GridFTP is the most widely used tool; it transfers data through 
parallel streams. 
• Representative examples of storage systems include SRMs, SRB, 
IBP and NeST. 
• The FreeLoader framework is designed to aggregate space and I/O 
bandwidth contributions from volatile desktop storage. 
• Farsite builds a secure file system using untrusted desktop 
computers. 
• Chirp is a user-level file system for collaboration across distributed 
systems such as clusters, clouds and grids. 
8
Related Work (contd...) 
• BitDew is an open-source data management framework for grid, DG 
and cloud computing. 
Higher-level tools for data scheduling 
• Stork: a scheduler for data placement activities in a grid environment 
• Using Stork, input data will be queued, scheduled, monitored, 
managed and even check-pointed. 
• Stork provides solutions for data placement problems in both grid 
and DG environments, since it can interact with different data 
transfer protocols such as FTP, GridFTP, HTTP and DiskRouter. 
Data orchestration through SaaS technologies 
• Globus Online (GO) is a project that delivers data management 
functionalities not as downloadable software but as hosted SaaS. 
• Allows users to move, synchronize and share their data using a web 
browser. 
9
SlapOS overview 
• An open source distributed operating system 
• Provides an environment for automating the deployment of 
applications 
• Based on the idea that 'everything is a process', SlapOS combines 
grid computing, in particular concepts inherited from BonjourGrid, 
with techniques inherited from the field of ERP, in order to manage, 
through the SlapGrid daemon, IaaS, PaaS and SaaS cloud services. 
• SlapOS's strengths are its compatibility with any operating 
system (in particular GNU/Linux) and with all software technologies, 
and its support for several infrastructures. 
• More than 500 different recipes are available for consumer 
applications, such as the Linux-Apache-MySQL-PHP (LAMP) stack. 
10
SlapOS key concepts 
• SlapOS architecture is composed of two types of 
components: SlapOS master and SlapOS node 
• SlapOS master: acts as a centralized directory for all SlapOS nodes; 
it knows where each software release is located and which software 
is installed. 
• SlapOS node: can be a dedicated or a volunteer node. The master's 
role is to install applications and run processes on SlapOS nodes. 
• In comparison with traditional clouds, SlapOS is based on 
an opportunistic view. 
• In normal utilisation, requests are serviced by the data 
centre nodes. Whenever the number of requests reaches a 
peak, SlapOS can redirect some of them to volunteer nodes. 
11
SlapOS key concepts 
• By doing so, the system wins on two points: 
• it maintains a good response time when handling requests 
• when the number of cloud customers increases, it offers a good 
alternative for guaranteeing the SLAs without buying new machines. 
• A SlapOS node consists essentially of a basic Linux distribution, a 
daemon named SlapGrid, a Buildout environment for bootstrapping 
applications, and supervisord to control processes. 
• A node can receive a request from the master to install software, and 
the master can be asked to deploy an instance of that software on the node. 
• SlapOS software on a node is called a 'Software Release', consisting of 
all the binaries needed to run the software; a 'Software Instance' is one 
of possibly many running instances of the corresponding software (see 
the sketch below). 
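• As an illustration of this Release/Instance split, the SlapGrid daemon 
processes the two in separate steps. A minimal sketch using the 
slapos.core command-line tool (exact subcommands and flags may vary 
across SlapOS versions): 

    # Build every Software Release the master has requested for this node 
    slapos node software 
    # Create and configure Software Instances from the installed releases 
    slapos node instance 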
12
How to join SlapOS? 
• SlapOS is a voluntary cloud, which means that anyone can potentially 
add their own server to the cloud. 
• To participate in a BOINC and/or Condor project, one has to: 
• Register on a SlapOS master 
• Install the SlapOS node software on the machine 
• Add a virtual server on the master and link it to the physical server by 
configuring the node installed on the physical server 
• Select and install applications, from the list of available applications 
on the master, that will be allowed to be deployed on the node. 
• The number of instances that can run on the node depends on the 
capacity and the configuration of SlapOS on the server. 
• To make applications available on the SlapOS master, it is necessary to 
integrate them into SlapOS. 
• Integrating an application into SlapOS involves writing Buildout 
profiles, mainly the file software.cfg, which then references all other 
required files. 
13
Design and Implementation Issues 
• Implementation steps: 
• SlapOS uses Buildout technology to install software and deploy instances. 
• In the Stork case, the software is divided into three profiles (a sketch 
of a Software Release profile follows this list): 
1. Component (slapos/component/stork/buildout.cfg): contains all the 
dependencies used by Stork. Buildout integrates the profile and its 
dependencies through the extends rule, in order to install mainly the 
Globus client and the Globus GSI (Grid Security Infrastructure). 
2. Software Release profile (SR): located on a remote git server and defined 
by its URL ( http://git-repository/slapos/software/stork/software.cfg ). 
The SR describes the installation of Stork and its dependencies, without 
configuration files or disk image creation. When SlapOS installs a Stork 
SR, it launches a Buildout command with the correct URL. 
3. Software Instance: reuses an installed Software Release by 
creating wrappers, configuration files and anything else specific to an 
instance. The whole process creates a Stork configuration file. 
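• To make this concrete, a minimal Software Release profile could look 
like the sketch below. It follows standard Buildout/SlapOS conventions; 
the part names, component paths and download URL are placeholders, not 
the actual profiles from the paper's repository: 

    [buildout] 
    # Pull in reusable component profiles (e.g., Globus client, Stork) 
    extends = 
      ../../component/globus/buildout.cfg 
      ../../component/stork/buildout.cfg 
    # Parts Buildout must build for this release 
    parts = 
      globus-client 
      stork 

    [stork] 
    # slapos.recipe.cmmi runs the usual configure / make / make install; 
    # the URL below is a placeholder, not the real Stork distribution 
    recipe = slapos.recipe.cmmi 
    url = http://example.org/dist/stork.tar.gz 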
14
Design and Implementation Issues (contd..) 
• Architecture overview: SlapOS is based on a master-slave 
paradigm. The steps that allow a user to participate in the SlapOS 
community and exploit the Stork services are as follows (step 6 is 
sketched after the list): 
1. Slapos-connect(Login, Password) 
2. Request-stork-software(Slave_Node_Name, Software_Release_Name) 
3. Download-stork-software(Stork_Software_Release_URL) 
4. Request-instance-parameter(Slap-Parameters_List) 
5. Deploy-instance(Slap_Parameter_List) 
6. Submit-data-job(submit_dap_file, stork_server) 
7. Move-data(src_data_url, dest_data_url) 
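• For step 6, the submit_dap_file is a Stork job description written in 
ClassAd-style syntax. A minimal sketch with illustrative URLs (the actual 
endpoints depend on the deployment; the source here mirrors the BLAST 
transfer used later in the experiments): 

    [ 
      dap_type = transfer; 
      src_url  = "ftp://ftp.ncbi.nih.gov/blast/db/swissprot.tar.gz"; 
      dest_url = "file:///srv/slapgrid/stork/data/swissprot.tar.gz"; 
    ] 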
15
Architecture overview 
• Figure 2 Schematic of the Stork SaaS via SlapOS cloud 
16
Security and authentication process 
• Security in Stork is an important issue with many aspects to 
consider. The most important is the way in which users want to run 
the Stork daemons. Current Stork releases fall into three main 
schemes: 
1. Single host: Stork_Server and Stork_Client run on the same 
machine. 
2. Multiple hosts: Stork_Server in one location and Stork_Client in 
another. 
3. Multiple hosts and third-party transfer: Stork_Server manages the 
movement of data among two or more remote locations. 
• Many authentication mechanisms are available, such as SSL, Kerberos, 
PASSWORD and GSI. 
• Stork_Server provides only GSI authentication to allow different 
client machines to connect to it. 
17
Security configuration 
• Users can easily run 100+ Stork instances on a ‘small cluster’, each 
of them with its own independent daemons and configuration. 
• The security setting depends on the manner in which users want to 
deploy their Stork instances. 
1. Running Stork in the SlapOS cloud: after installation of the 
SlapOS slave node, the user requests one instance that includes 
the two Stork components (server and client tools); both use the 
same configuration file. 
2. Submitting jobs to an external Stork server: an important 
property of our approach is the ability to handle transfers using 
an existing Stork_Server. 
3. Remote GridFTP transfer: to use GSIFTP transfers with Stork, 
users need a valid grid proxy and user credentials in place (see 
the sketch below). 
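• Creating the proxy uses the standard Globus tooling, independently of 
SlapOS. A minimal sketch, assuming X.509 user credentials are already 
installed under ~/.globus: 

    # Create a 12-hour proxy certificate from the user's credentials 
    grid-proxy-init -valid 12:00 
    # Verify the proxy subject and remaining lifetime 
    grid-proxy-info 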
18
Security configuration 
19
Data sharing via SaaS 
• Once data are placed on SlapOS, a second SaaS, based on BitDew, is 
automatically launched to publish and distribute the data over the 
SlapOS community. 
• BitDew is a programmable framework for large-scale data 
management and distribution in DG systems. 
• BitDew distinguishes two sets of nodes: server (service host) and 
client (consumer) 
• To share data with BitDew, end-users need to connect to SlapOS, 
request the BitDew software and specify information for instance 
deployment. 
• The cloud-hosted approach divides the world into three sets of nodes: 
• the cloud-middleware node (SlapOS master), the cloud-provider nodes 
(SlapOS slave nodes), and the SaaS instances (BitDew server and client) 
20
Data sharing via SaaS 
• The SlapOS user must invoke the following steps (an illustrative 
properties file follows this list): 
• Request-instance-parameters: for client instances, slap-parameters 
fall into two groups 
a. Bitdew_Server: the user sets information about the remote server 
hostname 
b. data information parameters: the user must specify the protocol 
used to get the remote data and the signature of the file. 
• Deploy-instance 
• Share-data(transfer_protocol, data_path, properties.json) 
• Get-data(transfer_protocol, file_md5_ID) 
• BitDew Buildout profiles 
• Integrating BitDew into SlapOS requires writing multiple Buildout 
profiles, divided into three types (Component, Software Release, 
Software Instance) and organized into several directories. 
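• For illustration, the properties.json passed to Share-data carries 
BitDew-style data attributes: the replication factor, fault tolerance 
(reschedule the data if a host fails), lifetime, and transfer protocol. 
The key names below are assumptions for this sketch, not the paper's 
exact schema: 

    { 
      "replica": 3, 
      "resilience": true, 
      "lifetime": "7d", 
      "protocol": "http" 
    } 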
21
BitDew Buildout profiles 
22
Experimental results 
• Experiments were performed on the Grid'5000 experimental grid 
computing infrastructure. They were conducted on four clusters of the 
Lyon site, using more than 50 machines, with two Debian Linux images 
of SlapOS. 
• Deployment steps of SlapOS on Grid'5000 
• SlapOS is designed to work natively with IPv6. Several restrictions are 
applied to limit access to and from outside the Grid'5000 infrastructure. 
To overcome these restrictions, we prepared pre-compiled images containing 
all the standard install files of SlapOS: the kernel and the runtime 
daemons. These images are also configured to run IPv6 at startup. The 
slapos-vifib image is implemented and the slapos-image is used. 
• Usage scenario 
• The goal is to show the capacity of the cloud-hosted model to build a 
scalable platform for managing bag-of-tasks applications with 
intensive data. 
23
Experimental results 
• Two types of metrics: 
• Scalability, in terms of how many instance requests are supported: if 
the master is overloaded, the time needed to respond to an instance 
request may increase. 
• The time required to create Stork and BitDew instances as a 
function of the number of SlapOS nodes. 
• In our experiments, we use the blastn program to search for human 
DNA sequences in DNA databases. Running BLAST jobs requires the BLAST 
application package, the DNA Genebase (a large compressed archive 
containing millions of sequences), and the DNA sequence to compare 
against the Genebase (see the sketch below). 
• The scenario used in our experiments is shown in the paper's algorithm 
listing. At the end of the computation, each job creates a result file 
containing all matched sequences. 
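• Each BLAST job then boils down to a single blastn invocation over the 
locally transferred database. A sketch using the BLAST+ command line 
(file names are illustrative; the paper's exact invocation is not shown 
in the slides): 

    # Search the query sequence against the local database copy; 
    # matches are written to the per-job result file 
    blastn -db genebase/nt -query human_seq.fasta -out result.txt 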
24
Experimental results 
25
Experimental results 
• Experimentation steps: 
26
Experimental results 
• Result Analysis 
• Data movement service completion time 
• All instances are launched simultaneously and completed successfully; 
the total completion time includes the time to: 
• register the SlapOS nodes with the master 
• deploy the Stork instances 
• transfer the BLAST files from the NCBI FTP server to the SlapOS nodes. 
• The completion time of instances is proportional to: 
• the number of nodes connected to the master 
• the number of instances requested simultaneously. 
• Data sharing service completion time 
• deployment of server instances 
• deployment of client instances 
• BLAST execution 
27
Experimental results 
• This figure illustrates the total completion time for two Stork 
instances per node on 50 SlapOS nodes (a total of 100 instances). All 
instances are launched simultaneously and completed successfully. 
28
Conclusion and future work 
• The emergence of data-intensive applications and cloud SaaS 
technologies brought the flexibility to introduce new data 
management mechanisms that help scientists and grid users to 
easily deploy their distributed platforms. 
• This work focuses on data management as SaaS-based solutions, 
with the purpose of masking the complexity of the installation and 
configuration processes and the IT infrastructure requirements. 
• Since the SaaS solutions are already in production in the SlapOS cloud 
at Paris 13 University, our future research focuses on self-configuration, 
scalability and secure transfers. 
29
Acknowledgements 
• In France, this work is funded by the FUI-12 Resilience project from 
the Ministry of Industry. Experiments presented in this paper were 
partly carried out using the Grid'5000 testbed, supported by a 
scientific interest group hosted by Inria and including CNRS, 
RENATER and several universities as well as other organisations 
(see https://www.grid5000.fr). Some experiments were carried out 
on the SlapOS cloud available at University of Paris 13 (see 
https://slapos.cloud.univ-paris13.fr). 
30
References 
• http://pypi.python.org/pypi/slapos.cookbook/ 
• Abbes, H., Cerin, C. and Jemni, M. (2008) 'BonjourGrid as 
a decentralized scheduler', IEEE APSCC, December. 
• Foster, I. (2011) ‘Globus online: accelerating and democratizing 
science through cloud-based services’, IEEE Internet Computing, Vol. 
15, No. 3, pp.70–73. 
Thank YOU!!! 
31
