3/25/2015 This is a project for the CS7930 Class
-Distributed Systems Development with Research
An Infectious Disease
Surveillance Simulation
(IDSS) in the Cloud
Prepared by: Jorge Edison Lascano
edison_lascano@yahoo.com
PhD student at Utah State University
Associate Professor at Universidad de las Fuerzas Armadas ESPE-Ecuador
Presented at the Industry Day at Utah State University
Computer Science Department
3/25/2015 [Infectious Disease Surveillance
Simulation in the Cloud]
Introduction
•  Objective
–  Understand the behavior of Vector Timestamps in a
Distributed System Deployed in the Cloud.
•  Results
–  Infectious Disease Surveillance System
•  Implemented and tested
•  Setup for automated deployment to the cloud
(AWS EC2 instances with S3 storage)
•  Demonstrated effective use of Vector
Timestamps
Agenda
•  Project Overview
•  Tasks Details
•  Technology
•  Outstanding Risks and Issues
•  Demo (available on GitHub)
•  Results
•  Conclusion
Project Overview
•  Project description
–  This project is divided into three phases:
•  Implement an IDSS using communication
protocols such as UDP, TCP, and HTTP
•  Implement automated deployment that
–  Starts up virtual machines in a cloud
–  Deploys software to those VMs
–  Launches the IDSS
•  Study the resulting Vector Timestamps
–  This project allowed us to observe the behavior of
asynchronous calls in a distributed environment
and to deploy and run its components automatically
IDSS Diagram
Vector Timestamp
Source http://compquiz.blogspot.com/2010/12/logical-and-vector-timestamps-in.html
Tasks Details
Ord. | Task | Description
1 | Implement HDS | A Health District System (HDS) collects infectious disease information from EMRs, keeps track of diseases, and sends information to every DOA (DOA1: Influenza; DOA2: Chicken pox; DOA3: Measles). It saves its Vector Timestamp to the log file.
2 | Implement EMR | The Electronic Medical Record (EMR) simulator periodically sends disease information to its HDS. It saves its Vector Timestamp to the log file.
3 | Implement DOA | Every DOA keeps track of a disease rate; if a threshold is reached, it sends a notification to every HDS. It saves its Vector Timestamp to the log file.
4 | Local Tests | The EMR, HDS, and DOA processes generate traffic and Vector Timestamps locally.
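The DOA behavior in task 3 could be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name `DiseaseTracker`, the notification shape, and the use of a plain case count as the "rate" are all assumptions, since the slides do not say how the rate is computed.

```javascript
// Hypothetical sketch of a DOA: it counts case reports for one disease and
// decides when every HDS should be notified. All names are illustrative.
class DiseaseTracker {
  constructor(disease, threshold, hdsNames) {
    this.disease = disease;
    this.threshold = threshold; // assumed: a plain case count, not a true rate
    this.hdsNames = hdsNames;
    this.cases = 0;
  }

  // Record cases reported by an HDS. Returns the notifications to send
  // (one per HDS) once the threshold is reached, otherwise an empty list.
  report(count) {
    this.cases += count;
    if (this.cases < this.threshold) return [];
    return this.hdsNames.map((hds) => ({
      to: hds,
      disease: this.disease,
      cases: this.cases,
    }));
  }
}

const influenza = new DiseaseTracker("Influenza", 10, ["HDS1", "HDS2", "HDS3"]);
console.log(influenza.report(4).length); // 0: still below the threshold
console.log(influenza.report(7).length); // 3: one notification per HDS
```

Returning the notifications instead of sending them keeps the decision logic separate from the transport, so the same check could sit behind a UDP or an HTTP handler.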
Tasks Details (continued)
Ord. | Task | Description
5 | Launching / starting instances | A number of EC2 instances are launched; these instances host the EMR, HDS, and DOA processes.
6 | Name Resolution | A shell script registers the 9 EMRs, 3 HDSs, and 3 DOAs and updates the hosts file on every EC2 instance. The number of launched instances varies from 1 to 15. A simple round-robin algorithm assigns processes to instances when fewer than 15 instances are launched.
7 | Deployment | A script pushes updates and automatically distributes the system to the EC2 instances.
8 | Execution | After the system is deployed, a script starts each of the 15 processes according to the name resolution.
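The round-robin assignment in task 6 can be illustrated with a short sketch. The project implements this step as a shell script; the JavaScript below only shows the idea, and the function name `assignProcesses` and the host names are assumptions made for the example.

```javascript
// Hypothetical sketch: spread the 15 process names over the launched
// instances in round-robin order. Host names here are made up.
function assignProcesses(processNames, instanceHosts) {
  if (instanceHosts.length === 0) {
    throw new Error("at least one instance is required");
  }
  const assignment = {};
  processNames.forEach((name, i) => {
    // Wrap around when there are fewer instances than processes.
    assignment[name] = instanceHosts[i % instanceHosts.length];
  });
  return assignment;
}

// 9 EMRs, 3 HDSs and 3 DOAs, as in the task description.
const processes = [];
for (let i = 1; i <= 9; i++) processes.push(`EMR${i}`);
for (let i = 1; i <= 3; i++) processes.push(`HDS${i}`);
for (let i = 1; i <= 3; i++) processes.push(`DOA${i}`);

console.log(assignProcesses(processes, ["host-a", "host-b", "host-c", "host-d"]));
```

With 15 instances every process gets its own host; with one instance all 15 names resolve to the same host.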
Technology
•  Framework: node.js
•  IDE: WebStorm
•  Cloud Provisioning: AWS (S3 + EC2 + EBS)
•  Languages
–  JavaScript (node.js)
–  shell script (bash, awk, sed)
•  Communication Protocols
–  UDP, HTTP
–  SSH, AWS CLI
•  Partial Ordering Algorithm: Vector Timestamps
Outstanding Risks and Issues
•  Project Risks
–  Phase 1: Implementation of the system
•  Risk:
–  New language, new IDE
•  Solution:
–  Tutorials
–  Phase 2: Distribution in the Cloud
•  Risk :
–  Running out of money (I still cannot find my money)
•  Solution:
–  Monitor, detach, and shut down every resource
•  Project Issues
–  Phase 1:
•  Debugging
–  Phase 2:
•  Testing
•  S3 sync will not update if the file size is the same
Demo
•  https://github.com/elascano/HW2VectorTimestamp_on_AWS
[Screenshots: EC2 instances; Vector Timestamps]
Results
•  A set of vector timestamps that can be used to
partially order the execution of the events.
•  These vector timestamps show which processes
carry most of the load in the whole system.
•  A non-blocking, asynchronous system that can
process information from any client at any time;
the asynchronous behavior was easy to implement
using node.js.
•  An automated process for deploying
distributed systems.
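The partial ordering mentioned in the first result can be made concrete with a comparison function. This is a sketch of the standard vector timestamp comparison, not code taken from the project; the function name `compareTimestamps` and the sample vectors are assumptions.

```javascript
// Compare two vector timestamps of equal length.
// Returns "before", "after", "equal" or "concurrent".
function compareTimestamps(a, b) {
  if (a.length !== b.length) {
    throw new Error("timestamps must have the same length");
  }
  let aSmaller = false; // some entry of a is smaller than in b
  let bSmaller = false; // some entry of b is smaller than in a
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) aSmaller = true;
    if (a[i] > b[i]) bSmaller = true;
  }
  if (aSmaller && bSmaller) return "concurrent"; // neither happened first
  if (aSmaller) return "before";
  if (bSmaller) return "after";
  return "equal";
}

console.log(compareTimestamps([1, 0, 0], [2, 1, 0])); // "before"
console.log(compareTimestamps([1, 0, 0], [0, 1, 0])); // "concurrent"
```

Two events are ordered only when one vector is less than or equal to the other in every position; otherwise they are concurrent, which is why the resulting order is partial.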
Conclusion
•  This project helped us understand the order (and
disorder) in which processes and their events
execute, given the communication model, in a real
distributed environment.
•  If there is no direct or indirect communication
between processes x and y, then the y-th element
of x's timestamp is never updated.
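The second point can be reproduced with a small simulation. The sketch below uses the usual vector clock rules (increment your own entry on every event, take the element-wise maximum on receive); the class name `VectorClock` and the three processes x, y, and z are assumptions for illustration, not the project's code.

```javascript
// Minimal vector clock for a fixed set of processes (illustrative names).
class VectorClock {
  constructor(index, size) {
    this.index = index;               // this process's position in the vector
    this.v = new Array(size).fill(0); // one counter per process
  }

  // Local event: advance only our own entry.
  tick() {
    this.v[this.index] += 1;
    return this.v.slice();
  }

  // Sending is a local event; the returned copy travels with the message.
  send() {
    return this.tick();
  }

  // Receiving: take the element-wise maximum, then count the receive event.
  receive(remote) {
    for (let i = 0; i < this.v.length; i++) {
      this.v[i] = Math.max(this.v[i], remote[i]);
    }
    return this.tick();
  }
}

// x = index 0, y = index 1, z = index 2 (hypothetical processes).
const x = new VectorClock(0, 3);
const y = new VectorClock(1, 3);
const z = new VectorClock(2, 3);

y.tick();            // y works alone and never exchanges messages with x
x.receive(z.send()); // z -> x is the only message in this run
console.log(x.v);    // [1, 0, 1]: the y entry of x's timestamp is still 0
console.log(y.v);    // [0, 1, 0]
```

Because no message from y ever reaches x, directly or through z, the entry for y in x's timestamp keeps its initial value.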
edison_lascano@yahoo.com