Multi-Task Scheduling Framework for Distributed Systems
1. Slide 1
Framework for Multi-Task Scheduling in
Distributed Systems using Mobile Agent
2. Outline
1. Introduction
2. Problem Statement
3. Recent Studies
4. Objective
5. Motivations
6. Proposed Strategies and Results
7. Concluding Remarks and Future Work
3. Introduction
Distributed System Concept
Distributed System (DS):
- A DS is a collection of independent computers that appears to its users as a
single coherent system [1]. Distributed computing refers to the use of a
DS to solve large-scale scientific and engineering problems.
- An application is divided into tasks, and these tasks are processed
concurrently on the different machines of the DS. The problem is then how
to distribute the application tasks onto the machines.
[1] Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2007.
4. Problem statements
- Given: a parallel application consisting of M communicating tasks and a
distributed computing system (DCS) of N computers.
- The problem is how to allocate (schedule) the tasks of the given
application onto the computers of the DCS so as to minimize the
turnaround time of the application and improve the reliability and
availability of the DCS.
How are tasks assigned to machines?
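As a point of reference for the allocation question, the following is a minimal sketch of a greedy "longest task first" allocation. It is not the MRAM strategy: it assumes identical machines, ignores communication between tasks, and the task names and costs are hypothetical.

```python
import heapq

def greedy_allocate(task_costs, n_machines):
    """Assign each task to the machine that currently finishes earliest."""
    # Heap of (accumulated load, machine index); all machines start idle.
    heap = [(0.0, m) for m in range(n_machines)]
    heapq.heapify(heap)
    assignment = {}
    # Placing the longest tasks first usually gives a tighter schedule.
    for task in sorted(task_costs, key=task_costs.get, reverse=True):
        load, machine = heapq.heappop(heap)
        assignment[task] = machine
        heapq.heappush(heap, (load + task_costs[task], machine))
    # The turnaround time of the schedule is the load of the busiest machine.
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical task costs (arbitrary time units), M = 4 tasks, N = 2 machines.
costs = {"t1": 4.0, "t2": 3.0, "t3": 3.0, "t4": 2.0}
assignment, makespan = greedy_allocate(costs, n_machines=2)
print(assignment, makespan)
```

A static heuristic like this cannot react to link or node failures, which is the gap the later phases address.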
6. Recent Studies
[8] S. N. Gujar, G. R. Bamnote, R. S. Apare, M. A. Pund and S. R. Gupta, "Mobile Agent Based Distributed System Computing in
Network", International Journal of Recent Trends in Engineering, Vol. 2, No. 4, pp. 117-119, November 2009.
• Mobile Agent Based Distributed System Computing in Network [8].
• S. N. Gujar, G. R. Bamnote, R. S. Apare, M. A. Pund and S. R. Gupta
developed an e-commerce application to compare the performance and
scalability of a mobile agent network with client-server
approaches.
• The application consists of a multi-agent system whose agents cooperate
and communicate in a network that allows agent mobility.
• Gujar et al. find that mobile agents are an appropriate technology for
implementing net-centric applications.
7. Recent Studies
[9] N. V. Blamah and A. O. Adewumi, "A multi-agent-based model for distributed system processing",
International Journal of the Physical Sciences, Vol. 7, No. 43, pp. 5297-5303, 6 September 2012.
• A multi-agent-based model for distributed system processing [9].
• N. V. Blamah and A. O. Adewumi created a multi-agent system model
for distributed processing of tasks, in which the processes continue
to operate despite network disconnection.
• The model was designed as a hybrid of static and mobile agents. It
reduces the requirement for network connectivity and provides an
opportunity for data to be processed offline.
• The model provides a more reliable and cost-effective means of
distributed processing of tasks.
8. Weaknesses
1. Most of the proposed algorithms are concerned with improving performance.
2. Current studies do not support task scheduling using mobile agents.
3. They do not adapt when a change occurs in the environment.
4. They suffer from the centralized node problem.
9. Objective
- The work presented in this thesis is motivated by the need to develop a new
framework for task scheduling on distributed computing systems in order to:
1- Overcome the drawbacks of the existing strategies that affect the
performance and reliability of the whole system.
2- Ensure efficient execution of parallel applications, including Big Data, on
distributed computing systems.
10. Research Motivations
The main motivations of this research are to:
• Overcome the problem of the centralized strategy.
• Improve DCS reliability against:
1- Network failures.
2- Node shutdown.
3- Node failure.
• Handle new complex applications such as Big Data processing.
• Implement process migration.
11. Proposed Strategies
Proposed Framework
1st Proposal: MRAM framework for achieving acceptable performance
and improving reliability and availability of DCS.
- 1st phase: Create MRAM framework.
- 2nd phase: Platform selection.
- 3rd phase: Implement task scheduling.
- 4th phase: Overcome link failure.
- 5th phase: Overcome node failure.
2nd Proposal: Enhanced MRAM (EMRAM) to support Big Data processing on DS.
- 1st phase: Enhance MapReduce workflow.
- 2nd phase: Overcome task replication.
- 3rd phase: Overcome centralization.
12. Agent Based Computing
- What is an agent?
An agent is something that acts in an environment: it interacts with the
environment through a body, receives information through its sensors, and
acts in the world through its actuators, also called effectors.
- Agent versus object [5]
1st difference:
- Agents interact with the environment to get the information they need to work.
- Objects are not intended to interact with the environment.
2nd difference:
- An agent is explicitly associated with a goal.
- An object is not explicitly associated with a goal.
[5] Alex Berson, Client/Server Architecture, McGraw-Hill Computer Communications Series, 2nd
edition, 1996.
13. Agent classification [6]
[6] Michael Wooldridge, An Introduction to MultiAgent Systems, John Wiley & Sons Ltd, 2002.
Several types of software agent are specified:
1- Autonomous agent.
2- Interactive agent.
3- Adaptive agent.
4- Stationary agent: this kind of agent resides in the node where it was
created and does not move anywhere else. It is system dependent
and is increasingly used in DS.
5- Mobile agent: this kind of agent is able to move between computer hosts
on the network. It can suspend its execution at one network
element and then resume it at another to accomplish its assigned task.
Once it finishes its task, it returns to its home or disposes of itself.
14. Notion of Mobility [7]
[7] Peter Braun, Wilhelm Rossak, Mobile Agents: Basic Concepts, Mobility Models, and the Tracy
Toolkit, Elsevier Inc. and dpunkt.verlag, 2009.
- A- Strong mobility:
1. Moves the code, the data state and the execution state.
2. Execution restarts exactly from the point where it was stopped
before the movement.
- B- Weak mobility:
1. Moves the code and the data state only.
2. After the movement, the agent is restarted and the values of its
variables are restored, but its execution restarts from the beginning of
a given procedure (a method in the case of objects).
15. First Phase: Creating MRAM Framework
- The goal of the first phase is to create the MRAM framework on the
JADE platform, implement process distribution using mobile agents,
and make sure the proposed framework works on heterogeneous
systems.
16. First Phase: Creating MRAM Framework
- Distribution Strategy:
[Figures: Independent task graph; Agent movement in MRAM]
17. First Phase: Results
- Two scenarios are used, and the application turnaround time is measured in
each scenario.
(a) First Scenario:
- An application consisting of the multiplication of two matrices is
processed on a DCS composed of 3 machines.
- Each matrix is two-dimensional, of size 6000 x 6000.
[Figure: DCS architecture of the first scenario]
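To make the first scenario concrete, here is a minimal sketch of how a matrix multiplication can be divided into independent row-block tasks, one per machine. The function names are hypothetical and the blocks are computed locally in a loop; in the framework described here each block would instead be carried to a machine by a mobile agent.

```python
def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges, one per worker, as even as possible."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def multiply_block(a_rows, b):
    """The task one machine would run: its rows of A times all of B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]

def distributed_multiply(a, b, n_workers=3):
    """Compute A x B block by block and concatenate the partial results."""
    result = []
    for start, stop in split_rows(len(a), n_workers):
        # Each block is independent, so it could run on a different machine.
        result.extend(multiply_block(a[start:stop], b))
    return result

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
identity = [[1, 0], [0, 1]]
product = distributed_multiply(a, identity)
print(split_rows(6000, 3))
```

Because the row blocks do not depend on each other, this corresponds to the independent task graph case; dependent tasks are handled in the third phase.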
18. First Phase: Results
The total execution time for each case
                               1st state   2nd state   3rd state
Processing time (minutes)      28.32967    15.6552     9.8474
Communication time (seconds)   0           0.51510     1.08187
Start-up time (seconds)        0.36033     0.72066     0.51510
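The measurements above can be combined into a total time and a speedup relative to the 1st state. The sketch below assumes that the total is simply processing plus communication plus start-up time; the slide does not state how the components are combined, so treat that as an assumption.

```python
# Values copied from the results table of the first scenario.
processing_min = [28.32967, 15.6552, 9.8474]    # processing time, minutes
communication_s = [0.0, 0.51510, 1.08187]       # communication time, seconds
startup_s = [0.36033, 0.72066, 0.51510]         # start-up time, seconds

def total_seconds(i):
    """Assumed total: processing + communication + start-up, in seconds."""
    return processing_min[i] * 60 + communication_s[i] + startup_s[i]

totals = [total_seconds(i) for i in range(3)]
# Speedup of each state relative to the 1st state.
speedups = [totals[0] / t for t in totals]

for state, (t, s) in enumerate(zip(totals, speedups), start=1):
    print(f"state {state}: total = {t:.2f} s, speedup = {s:.2f}")
```

Under this assumption the communication and start-up overheads are small next to the processing time, so the speedup is dominated by how the processing time shrinks.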
19. First Phase: Results
(b) Second Scenario:
- An application consisting of the summation of two matrices is
processed on a DCS composed of 10 machines.
- Each matrix is two-dimensional, of size 6000 x 6000.
[Figure: DCS architecture of the second scenario]
[Figure: Application turnaround time for MRAM and BLAMAH. x-axis: number of machines (1, 2, 3, 4, 6, 8, 10); y-axis: time in seconds (0 to 1750)]
20. Second Phase: Platform Selection
- The same application (multiplication of two matrices) and the same DCS
of 3 machines are considered.
- The purpose is to select the best platform for the proposed MRAM
framework.
- To do so:
1- The behaviour of the proposed MRAM is evaluated on
different platforms:
A- CORBA
B- Java sockets
C- RMI
2- The results are compared with those obtained using the JADE
platform.
21. Second Phase: Results
[Figure: Comparison between JADE, Java sockets, RMI and CORBA. x-axis: number of machines (single machine, two machines, three machines); y-axis: time in minutes (0 to 35)]
22. Third Phase: Handle Tasks Dependencies
- The proposed MRAM is modified to support dependencies between
application tasks by using strong-mobility agents.
- The agent achieves strong mobility by:
1- Serializing the agent object.
2- Saving the last line that has been executed.
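The two ingredients named above, serializing the object and saving the point reached in execution, can be sketched as follows. This is an illustrative toy, not the JADE implementation: the class `SummingAgent` and its methods are hypothetical, and the "last line executed" is approximated by an index into the work list.

```python
import pickle

class SummingAgent:
    """Toy agent that sums a list and remembers where it stopped."""

    def __init__(self, data, position=0, total=0):
        self.data = data
        self.position = position  # index of the next item to process
        self.total = total

    def run(self, max_steps=None):
        """Process items; return True when finished, False if suspended."""
        steps = 0
        while self.position < len(self.data):
            if max_steps is not None and steps >= max_steps:
                return False
            self.total += self.data[self.position]
            self.position += 1
            steps += 1
        return True

    def snapshot(self):
        """Serialize the data state together with the resume point."""
        state = {"data": self.data, "position": self.position, "total": self.total}
        return pickle.dumps(state)

    @classmethod
    def restore(cls, payload):
        """Rebuild the agent on the destination host from a snapshot."""
        return cls(**pickle.loads(payload))


agent = SummingAgent(list(range(10)))
agent.run(max_steps=4)           # interrupted on the first host after 4 items
payload = agent.snapshot()       # bytes that would travel over the network
moved = SummingAgent.restore(payload)
finished = moved.run()           # resumes at item 4 instead of starting over
print(finished, moved.total)
```

With weak mobility only the variable values would be restored and `run` would start again from the first item; saving the resume point is what lets the work continue where it stopped.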
23. Third Phase: Results
[Figure: Task dependence graph]
(a) First Scenario:
The DCS consists of 4 machines connected in a star topology. Three machines
have the same specifications as the machines in the 1st phase, while the fourth
machine has an Intel Core i7 2.53 GHz CPU and 4 GB of RAM and runs
Windows 8.
25. Third Phase: Results
[Figure: Task dependence graph]
(b) Second Scenario:
- The DCS consists of the 10 machines shown in phase one.
- The task graph consists of 13 tasks; each task performs a specific operation as
follows:
t1 = t2 = t3 = t5 = t6 = t7 = t8 = t10 = t12 = m1[600 x 600] + m2[600 x 600]
t4 = m3[300 x 300] + m4[300 x 300]
t9 = t11 = t13 = m5[100 x 100] + m6[100 x 100]
27. MRAM Vs. Most Recent Strategies
MRAM vs. Most Recent Strategies
                               MRAM                 Blamah et al. [9]   Gujar et al. [8]
Allocation                     Yes                  Yes                 Yes
Scheduling                     Yes                  No                  No
Architecture                   Distributed agents   Client-server       Client-server
Mobility                       Yes                  Yes                 Yes
Offline processing             Yes                  Yes                 No
Solved node shutdown           Yes                  No                  No
Solved network disconnection   Yes                  No                  No
Methodology                    Object-oriented      Object-oriented     Object-oriented
28. Phase Four: Link Failure
- In this phase the agent behaviour of the proposed MRAM framework is
modified to solve the problem of link failure and so improve the DCS
reliability.
- Agent behaviour under link failure:
[Figures: Agent behaviour for link failure; Searching for another path using a mobile agent]
29. Phase Four: Link Failure
- While the agent searches for a path, there are two different cases:
1- Case 1: The faulty link goes back to operational mode before the agent
finds an alternative link, and the total time is described by:
Total time = execution time + migration time + Tm
where Tm is the time elapsed until the faulty link goes back to operational mode.
2- Case 2: The agent moves through an alternative operational link before the
faulty link recovers, and the total time is expressed in terms of Ts, the
search time (the time taken by the agent to find an alternative operational link).
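The two link-failure cases can be sketched as a single function. Case 1 follows the formula given on the slide. The expression for Case 2 is not legible in the source, so the form used here (Ts taking the place of Tm) is an assumption made by analogy, and the function and parameter names are hypothetical.

```python
def total_time_link_failure(execution, migration, t_m, t_s):
    """Total time when a link fails while the agent is migrating.

    t_m: time until the faulty link is operational again.
    t_s: time the agent needs to find an alternative operational link.
    All arguments must use the same time unit.
    """
    if t_m <= t_s:
        # Case 1 (from the slide): the faulty link recovers first.
        return execution + migration + t_m
    # Case 2 (assumed form): the agent finds an alternative link first.
    return execution + migration + t_s

# Hypothetical values in seconds.
print(total_time_link_failure(execution=100, migration=5, t_m=60, t_s=20))
print(total_time_link_failure(execution=100, migration=5, t_m=10, t_s=20))
```

Under this reading the agent always pays the smaller of the two waiting times, which is why searching for an alternative path can only help when links stay down for long.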
30. Phase Four : Results
(a) First Scenario:
- The same application (multiplication of two matrices) is processed on a
DCS in two cases.
- First case: the DCS has two machines.
- Second case: the DCS has three machines, and one or more links between
the machines fail for a period of time.
(b) Second Scenario:
MRAM and the improved BLAMAH algorithm are applied to the
same matrix summation application on the DCS of 10 machines.
31. Phase Four : Results
[Figure: Turnaround time on ten machines under link failure, MRAM versus improved BLAMAH. x-axis: sample times t0 to t6 with a fixed period of 60 seconds; y-axis: time in seconds (0 to 500)]
32. Phase Five: Machine Failure
- Since a DCS consists of a set of distributed machines interconnected by a
communication network, a machine failure occurs if:
1- A user tries to shut down a machine while it is processing a task
assigned to it.
2- A user logs in to an idle machine after tasks have been assigned to it
according to its availability state at task distribution, so that the state of the
machine changes.
33. Phase Five: Machine Failure
• Agent behaviour under machine failure:
The figure shows the overall MRAM framework with the behaviour of the mobile
agent under machine failure. In the total time for this scenario, the
migration time is the time required for the agent to move tasks from the
master machine to the target machine, plus the time required for the agent
to migrate suspended tasks from one machine to another (which depends on
the number of machine failures), plus the time required to return the
partial results to the master.
[Figure: MRAM framework with agent behaviour under machine failure]
34. Phase Five: Results
(a) First Scenario:
- The first application is the same multiplication of two matrices, processed
on a DCS under one and two machine failures.
- The second application fetches rows from a large database under one and two
machine failures.
- In case 1, the machine failure occurs before the corresponding task has
finished processing. In case 2, the machine failure occurs after the
application tasks are assigned to the different machines and before a
machine starts processing its task.
35. Phase Five: Results
[Figure: Performance of MRAM and Sockets for the multiplication of two matrices, Case 1 and Case 2; y-axis: time in minutes. (a) Turnaround time under one node failure. (b) Turnaround time under two node failures.]
[Figure: Query time for MRAM and Sockets, Case 1 and Case 2; y-axis: time in seconds. (a) Turnaround time under one node failure. (b) Turnaround time under two node failures.]
36. Phase Five: Results
(b) Second Scenario:
- MRAM and the improved BLAMAH algorithm are applied to the
same matrix summation application on the DCS of 10 machines under
machine failure.
[Figure: Performance when one node is shut down, MRAM versus improved BLAMAH. x-axis: number of machines (2, 3, 4, 6, 8, 10); y-axis: time in seconds (0 to 250)]
38. Hadoop
- The Hadoop architecture consists of the Hadoop Distributed File System (HDFS)
and the MapReduce programming framework.
[Figure: HDFS, the Hadoop Distributed File System]
39. Map Reduce
[Figure: MapReduce technique]
40. Hadoop Weakness
- Hadoop is based on a master/slave model.
- Hadoop doesn't support task dependencies.
- Replication problem.
41. First phase: Enhance MapReduce
- The goal of the first phase is to enhance MRAM to overcome the
limitations of Hadoop's MapReduce workflow and so improve its performance.
[Figures: Hadoop workflow; MRAM workflow]
42. Hardware Specification
          Server                          PC1, PC2, PC3
Model     IBM x3650                       IBM
CPU       Intel Dual Core2Quad 2.56 GHz   Intel Dual Core2Duo 2.53 GHz
RAM       16 GB                           2 GB
OS        Linux                           Linux
Sun JRE   JRE 7u25                        JRE 7u25
JADE      4.1                             4.1
43. First phase: Results
[Figure: Performance comparison between EMRAM and Hadoop]
- To evaluate the performance of EMRAM and compare it with Hadoop,
the word count application is run on each platform.
- The word count application is a simple program that takes a text file, counts
the number of occurrences of each word,
- and saves the output as a list of the form (<Word>, <Count>).
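For readers unfamiliar with the benchmark, word count in MapReduce style can be sketched in a few lines. This is a single-process illustration with hypothetical function names, not Hadoop or EMRAM code; in a real run each chunk would be mapped on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Sum the counts per word and return a sorted (word, count) list."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return sorted(counts.items())

def word_count(chunks):
    """Run the map phase over every chunk, then reduce all emitted pairs."""
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(pairs)

chunks = ["the agent moves", "the agent returns"]
result = word_count(chunks)
print(result)
```

The output has the (<Word>, <Count>) form described above.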
44. Second Phase: Overcome Replication
- Hadoop provides fault tolerance by replicating data on three or more
machines to avoid data loss, but this method causes some problems.
- Basic idea to overcome replication in EMRAM:
[Figure: EMRAM behaviour under machine failure]
45. Phase two: Results
Position of indexes (i, j) at fixed period t
                                                          0.25 s       0.5 s        1 s          2 s          4 s          8 s
Start snapshot of objects                                 (1, 1988)    (1, 1983)    (1, 2088)    (1, 2001)    (1, 2001)    (1, 1912)
Current snapshot of task processing before machine down   (3, 2240)    (5, 1255)    (8, 338)     (17, 150)    (38, 613)    (65, 2236)
Last update of processing status                          (1, 1886)    (2, 1246)    (4, 1676)    (8, 2420)    (19, 1271)   (37, 871)
- The application processed on the DCS, using the machines in the hardware table, is
the multiplication of two matrices of size 3000 x 3000, with the first dimension
indexed by "i" and the second dimension indexed by "j". This table shows the exact
position of the indexes i and j at two different snapshots, taken at the start of
processing and after the system has worked for a fixed period of time t.
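One way to read the index table is to estimate how much work would be redone after a failure, that is, the distance between the position reached before the machine went down and the last status update. The sketch below assumes the (i, j) pairs advance in row-major order over a 3000-wide dimension; the slide does not state the loop order, so this is an illustration rather than the author's metric.

```python
N = 3000  # matrix dimension given on the slide

def flat_index(i, j, n=N):
    """Row-major position of (i, j); the ordering is an assumption."""
    return i * n + j

# Values copied from the table, keyed by update period in seconds.
periods = [0.25, 0.5, 1, 2, 4, 8]
current = [(3, 2240), (5, 1255), (8, 338), (17, 150), (38, 613), (65, 2236)]
last_update = [(1, 1886), (2, 1246), (4, 1676), (8, 2420), (19, 1271), (37, 871)]

# Index positions that would have to be recomputed after a failure.
lost = {
    t: flat_index(*cur) - flat_index(*upd)
    for t, cur, upd in zip(periods, current, last_update)
}

for t in periods:
    print(f"period {t} s: {lost[t]} index positions to redo")
```

Under this assumption, longer update periods leave more work to redo, while shorter periods mean more frequent data exchange, which is the trade-off behind choosing a fixed period.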
46. Phase two: Results
[Figures: Size of data at different periods; Time of data exchange; Data losses under different platforms]
- Based on the above factors, EMRAM finds that the best fixed period for
transferring update data is one second.
47. Phase two: Results
- The cost of HadoopF is described by the following equation:
CostHadoopF = tf + ts + tcom
where tf is the total time spent executing a task on a machine before it fails,
ts is the time spent executing the task on the final machine, and tcom is the time
between the failure of the first machine and the start of the task on the second machine.
- The cost of HadoopC is described by the following equation:
CostHadoopC = 3 * Tfa
where Tfa is the time spent executing the task on the fastest machine.
- The cost of MRAM depends on the execution time of the task (Te) and the total
migration time between machines (Tc), and is described by the following
equation:
CostMRAM = Te + Tc
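The three cost expressions can be written directly as functions for side-by-side comparison. The function names and the sample values below are hypothetical and are not taken from the measurements in the slides.

```python
def cost_hadoop_f(t_f, t_s, t_com):
    """CostHadoopF = tf + ts + tcom."""
    return t_f + t_s + t_com

def cost_hadoop_c(t_fa):
    """CostHadoopC = 3 * Tfa."""
    return 3 * t_fa

def cost_mram(t_e, t_c):
    """CostMRAM = Te + Tc."""
    return t_e + t_c

# Hypothetical times in minutes, for illustration only.
costs = {
    "HadoopF": cost_hadoop_f(t_f=4.0, t_s=10.0, t_com=1.0),
    "HadoopC": cost_hadoop_c(t_fa=10.0),
    "MRAM": cost_mram(t_e=10.0, t_c=0.5),
}
print(costs)
```

All arguments must be expressed in the same time unit for the comparison to be meaningful.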
48. Phase two: Results
[Figure: Cost values for HadoopF, HadoopC and EMRAM. (a) Cost values when one machine fails. (b) Cost values when two machines fail. x-axis: time periods (0.25, 0.5, 1, 2, 4, 8); y-axis: time in minutes (0 to 35)]
49. Third phase: Overcome Centralization
- The Hadoop platform is based on the workstation-server model.
- In this model, a problem arises: when the master node fails, the entire
system fails.
- This phase leverages the mobility features of mobile agents, which can react
dynamically and autonomously to changes in their environment, and presents a
new strategy to solve the problem of single master failure.
51. Phase three: Result
- Total time = execution time + communication time + Tm + Tstart-up
- The communication time is the time required for the agent to migrate
from the master machine to the target machine and return home.
- Tm is the time taken for the master machine to come back online.
- Tstart-up is the time taken for the new master to start work; it
depends on the number of times the master machine has failed.
- The word count application was used in this phase, and Tm was
assumed to be 1 minute.
52. Phase three: Result
[Figure: Performance of EMRAM when the master machine fails]
53. Summary of Second Proposal
Factors            Hadoop            EMRAM
Architecture       Client/Server     Distributed Agent
Startup time       Long              Shorter
Performance        Lower             Better
Reliability        Reliable          More reliable
Algorithm          Map-Reduce        Map-Reduce
Mobility           N/A               Supported
Disk management    Supported         N/A
Task allocation    Supported         Supported
Task scheduling    N/A               Supported
Methodology        Object-Oriented   Object-Oriented
54. Concluding Remarks
In this work:
Based on the evaluation and comparative study carried out in chapters 3 and 4,
the proposed MRAM provides better performance and more reliability for the DCS
than the most recent frameworks, such as Blamah et al. [9] and Gujar et al. [8]. This is
because the proposed MRAM has several features that improve its behaviour.
The proposed EMRAM provides better performance and more reliability for
big data analysis in a DCS environment than the well-known Hadoop platform.
This is because EMRAM has several features that overcome the drawbacks of
Hadoop.
55. Future Work
• Enhancing MRAM to support various dynamic scheduling algorithms.
• Enhancing the Hadoop platform with the features implemented in
EMRAM, given the widespread use of Hadoop by many organizations.
• Proposing new strategies to overcome the small-files problem of Hadoop.
56. Our Achievements
1- Y. Essa, G. Attiya, A. Elsayed, "Mobile Agent based A New Framework for
Improving Big Data Analysis," IEEE International Conference on Cloud
Computing and Big Data (IEEE CloudCom-Asia), Dec. 2013, Fuzhou, China.
2- Y. Essa, G. Attiya, A. Elsayed, "New Framework For Improving Big Data
Analysis Using Mobile Agent", submitted to International Journal of Advanced
Computer Science and Applications.
3- Y. Essa, G. Attiya, A. Elsayed, "Improving the Reliability of Distributed
Computing Systems by Using Mobile Agent", ready now.