The document discusses protecting applications from transient failures during cloud deployments. It presents the AURA tool, which uses a directed acyclic graph and message passing to deploy applications and recover from failures. The tool monitors deployment and retries failed nodes to overcome transient errors. The author proposes implementing error recovery and message passing in AURA using ASP.NET and SQL Server for improved failure handling and snapshots.
2. Outline of the talk
• Introduction
• Related Work
• Application Deployment
- Architecture
- Idea Architecture
- Deployment Model
- Error Recovery Process Dependency Graph
- Error Recovery Basic Flowchart
- Error Recovery Flow Chart
• Implementation Aspects
- Cloud deployment graph
- Execute the Deployment Script
- Experimental Evaluation
- ASP.NET Message Passing
- ASP.NET Message Passing Exception Handling
- ASP.NET Message Passing Console Display
- Microsoft Azure Cloud Dashboard
- Microsoft Azure Cloud Message Routing
• Conclusions and Future Work
• References Useful Links
Dhaka University of Engineering and Technology, Gazipur
3. Introduction
• Transient failures are temporary faults that last only a short period of time.
• AURA is an OpenStack application deployment tool with error-recovery
improvements. [https://github.com/giagiannis/aura]
• To overcome such failures, AURA models the deployment as a Directed
Acyclic Graph, traverses the graph in a breadth-first manner and finds the
deployment script that failed.
• The deployment is monitored through a graphical user interface, and only
the minimum set of failed script nodes is re-executed.
• A failed call waits for a threshold time and retries fetching the resource
before giving up.
• After a fixed number of retries within the threshold time, the failure is
considered permanent.
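As a rough illustration of the breadth-first search over the deployment graph mentioned above, the following Python sketch visits a small DAG level by level and collects the scripts marked as failed. It is not AURA's actual code: the graph layout, the node names and the "ok"/"failed" status values are hypothetical and chosen only for the example.

```python
from collections import deque

def find_failed_scripts(graph, roots, status):
    """Traverse a deployment DAG breadth-first and return failed nodes in visit order.

    graph  -- dict: node -> list of nodes that follow it (hypothetical layout)
    roots  -- nodes without incoming edges, where the traversal starts
    status -- dict: node -> "ok" or "failed" (hypothetical status values)
    """
    visited = set(roots)
    pending = deque(roots)
    failed = []
    while pending:
        node = pending.popleft()          # FIFO order gives breadth-first traversal
        if status.get(node) == "failed":
            failed.append(node)
        for successor in graph.get(node, []):
            if successor not in visited:  # a DAG node may have several parents
                visited.add(successor)
                pending.append(successor)
    return failed

# Hypothetical two-module deployment: the web configuration needs the database.
graph = {
    "db_install": ["db_configure"],
    "web_install": ["web_configure"],
    "db_configure": ["web_configure"],
    "web_configure": [],
}
status = {
    "db_install": "ok",
    "web_install": "ok",
    "db_configure": "failed",
    "web_configure": "ok",
}
print(find_failed_scripts(graph, ["db_install", "web_install"], status))
```

Only the nodes returned by such a search would need to be re-executed, which matches the goal of retrying the minimum number of failed scripts.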
4. Related Work
Wrangler receives application descriptions in XML format and deploys them to
different cloud providers (Amazon EC2, OpenNebula and Eucalyptus), with
protection against transient failures. [1]
Liu provides database-style transactions and implements the mechanisms needed
to achieve the ACID properties, protecting against transient failures. [1]
Katsuno studies the problem of deployment parallelization in multi-module
applications and protection against transient failures. [1]
Pegasus deploys scientific workloads with protection against transient
failures. [1]
NPACI Rocks [1] attempts to automate the provisioning of high-performance
computing clusters in order to simplify software installation and version
tracking.
5. Application Deployment Architecture
AURA Master: Deployed in a dedicated VM in the cloud.
AURA Executors: Responsible for executing the appropriate deployment scripts.
Each software module is deployed by a dedicated AURA Executor.
Application: Consists of multiple modules (methods).
Queue: Stores messages in FIFO order.
REST API: Stores data in JSON format in a file and serves as the application interface.
Scheduler: Responsible for monitoring the entire deployment process.
Cloud Connector: Provides the interface to the cloud provider.
WEB UI: Used to describe new applications, issue new deployment requests and obtain
real-time monitoring statistics.
Figure 1. AURA Architecture (diagram: the AURA Master contains the WEB UI, REST API,
Queue, Scheduler and Cloud Connector; the Application consists of Module (1), Module (2)
and Module (3), each deployed by its own AURA Executor on the Cloud Provider)
6. Idea Application Deployment Architecture
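Programming: ASP.NET and SQL Server 2012 database
namespace AURA
{
    // Sketch only: the class name and the string parameter type are assumptions,
    // and the method bodies are placeholders.
    public static class Deployment
    {
        public static void Module1(string parameter) { /* send/receive a message */ }
        public static void Module2(string parameter) { /* send/receive a message */ }
        public static void Module3(string parameter) { /* send/receive a message */ }
        public static void Queue(string parameter) { /* store messages in FIFO order and act on them */ }
        public static void Scheduler(string parameter) { /* retry within the threshold time limit */ }
        public static void data_manage(string parameter) { /* manipulate JSON-format data as the database */ }
        public static void Cloud_connector(string parameter) { /* cloud connector => cloud provider => message passing to modules */ }
        public static void Web_UI(string parameter) { /* graphical representation for the user */ }
    }
}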
7. Application Deployment Model
Figure 2 shows three modules, (1), (2) and (3), each of which works as a method.
The horizontal solid lines define message passing from source to
destination points.
The vertical dotted lines define the time slices.
When a module waits for a message (e.g. points A’, B’), it blocks until the
message is received. [1]
When a module sends a message at point A, module (1) proceeds with the
script execution between points A and C’. [1]
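The blocking receive and the continuing sender described above can be sketched with two Python threads that share a queue. This is only an illustration under assumptions: the function names, the message text and the 5-second timeout are hypothetical, and AURA's real message passing between Executors is not shown in these slides.

```python
import queue
import threading

def module_1(outbox, log):
    """Sender: emits a message at point A and keeps executing its script."""
    log.append("module 1: run script up to point A")
    outbox.put("db ready")   # put() on an unbounded queue returns immediately
    log.append("module 1: continue script after point A")

def module_2(inbox, log, timeout_s=5.0):
    """Receiver: blocks at point A' until the message arrives."""
    message = inbox.get(timeout=timeout_s)   # raises queue.Empty after the assumed timeout
    log.append(f"module 2: received '{message}', continue script")

def run_demo():
    channel = queue.Queue()
    log = []
    receiver = threading.Thread(target=module_2, args=(channel, log))
    sender = threading.Thread(target=module_1, args=(channel, log))
    receiver.start()         # the receiver starts first and has to wait
    sender.start()
    sender.join()
    receiver.join()
    return log

for entry in run_demo():
    print(entry)
```

The receiver can never log its line before the sender has reached point A, while the relative order of the last two lines depends on thread scheduling.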
8. Error Recovery Process
Every module can send messages to the other modules. A message must arrive
within a specific threshold time. Otherwise, the failed script is re-executed
several times within the threshold time. If it still fails, the failed script
node is deleted from its layer and a new empty layer is added. [1]
Figure 2 (a). The vertical edges represent the script executions. [1]
The horizontal lines represent message passing between different modules. [2]
Figure 2 (b). It represents the dependency graph, in which a module blocks
and scans the health of its neighbor nodes. [2]
9. Transient Failure Handling Flowchart
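• Start, then Call Service.
• Success? Yes: Call Successful, continue with the Next Action. No: check for a Transient Failure.
• Transient Failure? No: Permanent call fail. Yes: check whether to Retry.
• Retry? Yes: Delay for the Threshold Time, then Call Service again. No: Call failed.
Figure 4: Error Recovery Basic Flowchart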
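The retry flow of Figure 4 can be written as a short loop. The sketch below is an illustration, not the presented implementation: `TransientError`, `call_with_retry`, the default of three retries and the one-second threshold are all hypothetical names and values introduced for the example.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (e.g. a timeout)."""

def call_with_retry(service, max_retries=3, threshold_s=1.0, sleep=time.sleep):
    """Call the service; on a transient failure wait for the threshold time and retry.

    Returns the service result. Any other exception counts as a permanent
    failure and propagates unchanged. RuntimeError is raised once the
    retry budget is spent.
    """
    attempts = 0
    while True:
        try:
            return service()                 # Success? Yes: next action
        except TransientError as error:      # Transient failure? Yes
            if attempts >= max_retries:      # Retry? No: call failed
                raise RuntimeError("call failed: retries exhausted") from error
            attempts += 1
            sleep(threshold_s)               # delay threshold time, then call again

# Hypothetical service that fails twice before succeeding.
calls = {"count": 0}

def flaky_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary network error")
    return "ok"

print(call_with_retry(flaky_service, threshold_s=0.01))
```

Passing `sleep` as a parameter is a design choice that lets the delay be replaced in tests; a constant delay is used here, and other backoff policies would only change the argument given to `sleep`.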
10. Error Recovery Flowchart
• Start. Two parameters: Graph T and Failed Node n.
• failed ← {n}, healthy ← ∅
• While failed ≠ ∅:
- v = POP(failed)
- For each t ∈ depends (T, v): if failed (t), then failed ← failed ∪ {t};
otherwise healthy ← healthy ∪ {t}
• Return healthy.
• End.
Figure 5: Error Recovery Flowchart
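The steps of Figure 5 translate almost line by line into Python. The sketch below follows the flowchart as it appears on the slide; the dictionary representation of the graph, the `is_failed` callback and the `seen` guard are assumptions added for the example, and the node names are hypothetical.

```python
def health_check(depends, failed_node, is_failed):
    """Return the healthy dependencies reached from a failed node.

    depends     -- dict: node -> list of nodes it depends on (stands in for depends(T, v))
    failed_node -- the node n that was reported as failed
    is_failed   -- callable: node -> bool (stands in for failed(t))
    """
    failed = {failed_node}
    healthy = set()
    seen = set()                 # assumption: guard so that no node is expanded twice
    while failed:
        v = failed.pop()
        if v in seen:
            continue
        seen.add(v)
        for t in depends.get(v, []):
            if is_failed(t):
                failed.add(t)    # keep walking back through failed dependencies
            else:
                healthy.add(t)   # healthy dependency: the walk stops here
    return healthy

# Hypothetical graph: web depends on app, app depends on db and cache.
depends = {"web": ["app"], "app": ["db", "cache"], "db": [], "cache": []}
broken = {"web", "app"}
print(sorted(health_check(depends, "web", lambda node: node in broken)))
```

In this example the walk starts at `web`, moves back through the failed `app` node and stops at `db` and `cache`, the healthy nodes from which re-execution could resume.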
11. Cloud Deployment Graph
The Web server starts before the database server has been successfully and correctly configured.
Each Executor runs the necessary deployment scripts.
The deployment status is monitored through a real-time monitoring UI. [2]
Specifically, the UI shows each script’s logs, a real-time view of the Deployment Graph and the
status of the respective script execution.
Figure 6. Cloud deployment graph with 1 Web Server and 1 Database Server
12. Execute the Deployment Script
Green edges represent completed scripts.
Blue edges represent running scripts.
Red edges represent failed executions.
Gray edges represent pending scripts.
The Web Server stores the deployment script.
The DB Server handles the data manipulation activities of the Application. [2]
Figure 7. Cloud deployment graph with 1 Web Server and 1 Database Server
(node labels shown in the figure: DBServer1 to DBServer3, WEBServer1 to WEBServer4,
APPServer1 to APPServer4)
13. Experimental Evaluation
To recover from a failed module deployment script, a Retry Policy is applied.
Candidate policies include No Backoff, Constant Backoff, Linear Backoff,
Fibonacci Backoff, Quadratic Backoff, Exponential Backoff and Polynomial Backoff.
Backoff Time per strategy:
# Retries   Constant   Linear   Fibonacci   Quadratic   Exponential   Polynomial
1           1s         0s       0s          0s          1s            0s
3           3s         3s       2s          5s          7s            9s
5           5s         10s      7s          30s         31s           100s
10          10s        45s      88s         285s        1023s         2025s
20          20s        190s     10945s      2470s       1048575s      36100s
Table 1: Retries Backoff Approaches
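The values in Table 1 are consistent with the total waiting time accumulated over all retries, where retry number i (counting from 0) waits 1, i, F(i), i², 2^i or i³ seconds respectively, with F the Fibonacci sequence starting at F(0) = 0. This reading is inferred from the numbers and is not stated on the slide; the Python sketch below, with hypothetical names, recomputes the table under that assumption.

```python
def fibonacci(i):
    """F(0) = 0, F(1) = 1, F(i) = F(i-1) + F(i-2)."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

# Delay in seconds before retry number i (i starts at 0); inferred from Table 1.
DELAYS = {
    "constant":    lambda i: 1,
    "linear":      lambda i: i,
    "fibonacci":   fibonacci,
    "quadratic":   lambda i: i ** 2,
    "exponential": lambda i: 2 ** i,
    "polynomial":  lambda i: i ** 3,
}

def total_backoff(strategy, retries):
    """Total seconds spent waiting after the given number of retries."""
    delay = DELAYS[strategy]
    return sum(delay(i) for i in range(retries))

for retries in (1, 3, 5, 10, 20):
    row = [total_backoff(name, retries) for name in DELAYS]
    print(retries, row)
```

For 20 retries this gives 20, 190, 10945, 2470, 1048575 and 36100 seconds, matching the last row of the table. Exponential backoff therefore waits the longest in total, which gives a struggling service the most time to recover.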
14. Conclusion and Future Work
In our work, we use the AURA deployment tool to detect transient failures and
re-execute the failed module script.
The strength of the AURA tool is that it re-executes the minimum number of failed
scripts under a retry strategy. The retry policy delay depends on the number of
retries. Among the retry strategies, we choose exponential backoff to recover
from transient failures in an effective and optimal way.
I have tested several backoff strategies using C programming, obtained many
results and compared them. I have selected exponential backoff to protect
against transient failures in an effective and optimal way.
The AURA (Error Recovery) tool has been developed in the Python programming
language. I want to implement message passing and the re-execution (retry
strategies) of failed nodes using ASP.NET programming and a SQL Server 2012
database, and to take snapshots after a certain time.
18. ASP.NET Message Passing Console Display
19. Microsoft Azure Cloud Dashboard
20. Microsoft Azure Cloud Message Routing
21. References
[1] Ioannis Giannakopoulos, Ioannis Konstantinou, Dimitrios Tsoumakos and
Nectarios Koziris, “Cloud application deployment with transient failure
recovery.” Journal of Cloud Computing: Advances, Systems and Applications,
2018.
[2] Ioannis Giannakopoulos, Ioannis Konstantinou, Dimitrios Tsoumakos and
Nectarios Koziris, “AURA: Recovering from Transient Failures in Cloud
Deployments.” 2017 17th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing.
22. Questions and Answers
Thanks