Integrated Data, Message, and Process Recovery for Failure Masking in Web Services
1. Max Planck Institute for Informatics
AG5: Databases and Information Systems
Integrated
Data, Process, and Message
Recovery
for Failure Masking in Web Services
Doctoral Thesis Colloquium
German Shegalov
funded by
Saarbrücken, Aug. 26th, 2005
2. Outline
Problem Statement and Background
Interaction Contracts Framework
• Formal Specification of the Committed IC
• Verification of IC's with model checking
• Verification of Web Service IC Model
Implementation: Exactly-Once Web Service (EOS)
• Overview
• EOS-PHP
• Demo
Summary
3. Problem Statement
Non-idempotence (math)
• $f^n(x) \neq f(x)$, $n > 1$
Non-idempotence (Web, ERP, etc.)
• "Request timeout" ≠ "request failure"
• "Request send" ≠ "request resend"
• 8 Medicare cards for a 3 member family
• Order one, get many , pay many
4. Transaction Recovery
Accounts (LSN=0)        Accounts (LSN=3)
Number  Balance         Number  Balance
1       1000,00         1        900,00
2       2000,00         2       2100,00
At most once semantics
BEGIN TRANSACTION;
/* LSN = 1: log for undo and redo in main-memory buffer */
UPDATE Accounts SET balance = balance - 100.00
WHERE Number = 1;
/* LSN = 2: log for undo and redo in main-memory buffer */
UPDATE Accounts SET balance = balance + 100.00
WHERE Number = 2;
/* LSN = 3: log commit and force log to disk (5-6 orders of magnitude slower) */
COMMIT TRANSACTION;
Redo Committed, Undo Uncommitted
• LSN test guarantees idempotence
5. However, …
Transactions alone are not a panacea!!!
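[Sequence diagram along a timeline between Web Client, Web Application Server, and Database Server. Message labels in order: Purchase Request, Start Transaction, SQL Request, SQL Response, SQL Request, SQL Response, Commit Transaction, ACK, Order Confirmation. Failure handling: Transaction Restart at the application server and Purchase Request Resubmission at the client lead to non-idempotent execution!]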
6. Real-World n-Tier App
Don't panic! Peer-to-peer apps
may be even worse.
[Architecture diagram: a Client calls a Web Server, which calls several App Servers (labeled Expedia, Sabre, Amadeus) that in turn access the databases DB1, DB2, DB3, DB4]
8. IC Framework
Components and Guarantees
• Persistent Pcom: Persistent, testable
state and messages
• External Xcom (e.g., humans): No
guarantees
Interaction Contracts
• Xcom ↔ Pcom = External IC (XIC)
• Pcom ↔ Pcom = Committed IC (CIC)
Exactly-Once Semantics
• Forget rollbacks, exactly-once execution is
guaranteed
9. Pcom Design
Redo Log & Recovery Managers
Piecewise determinism + Logging =
Full Determinism
Deterministic replay recovers Pcoms
Installation Points speed up replay
Failure model
• Crashes
• Message losses
  (both are transient failures due to nondeterministic Heisenbugs)
• No malicious manipulations
• No disk corruption (sufficient redundancy)
10. CIC's Informal Design
CIC sender (Pcom) obligations
• Persist state before send
• Tag message with an MSN
• Resend on timeout until stable ack
• Resend on receiver's "get msg"
• Forget interaction on installed ack
CIC receiver (Pcom) obligations
• Eliminate duplicates by MSN
• Persist interaction before stable ack
• Request the message ("get msg") if it is not in the log after a failure
• Ensure autonomous recovery before installed ack
12. Committed IC Monitor
Statechart = Behavioral View
• Finite State Automaton (FSA) +
• Nesting + Orthogonal substates +
• E[C]/A transitions: on Event while Condition, leave source, enter target, execute Action
  E.g., A = E' means generate event E'
• Configuration = set of entered states
• Execution context = variable valuation
• Step $i$: $(conf_i, ctxt_i) \rightarrow (conf_{i+1}, ctxt_{i+1})$
[Statechart CIC_SC with two orthogonal components. SNDR_S (label: not SNDR_CRASH) contains the state SENDING with the transition [not active(CIC_SNDR_AC)]/start!(CIC_SNDR_AC). RCVR_S (label: not RCVR_CRASH) contains the state RECEIVING with the transition [not active(CIC_RCVR_AC)]/start!(CIC_RCVR_AC).]
13. Committed IC Sender
[Statechart CIC_SNDR_SC. Labels recoverable from the diagram: RECOVERY (default substate), PREPARE_PERSISTENCE, SENDING, STABLE_S, MSG_LOOKUP, INSTALLED_S, and the termination connector T reached on SNDR_CRASH. Events and actions: SNDR_TRIGGER, SNDR_ND, SEND_MSG, STABLE_OK, INSTALLED_OK, GET_MSG_OK, IS_INSTALLED, SNDR_MSG_TM, SNDR_STABLE_TM, MSG_RECOVERED_TM. Guards test SNDR_LAST_LOGGED=='' and SNDR_LAST_LOGGED=='INSTALLED'; INSTALLED_OK triggers SNDR_LAST_LOGGED:='INSTALLED'.]
Legend: EVENT_OK = EVENT and not LINK_OUTAGE; _TM means TIMEOUT
14. Committed IC Receiver
[Statechart CIC_RCVR_SC. Labels recoverable from the diagram: RECOVERY, MSG_RECOVERY, MSG_RECEIVED, MSG_PROCESSED, STABLE_R, INSTALLED_R, and the termination connector T reached on RCVR_CRASH. Events and actions: SEND_MSG_OK, GET_MSG, GET_MSG_TM, MSG_EXEC_TM, RECEIVED, RCVR_ND [MSG_ORDER_MATTERS], RCVR_STABLE_TM, RCVR_INSTALL_TM, IS_INSTALLED, STABLE, INSTALLED. Guards test RCVR_LAST_LOGGED against '', 'STABLE', and 'INSTALLED' and distinguish the ICIC case; the actions RCVR_LAST_LOGGED:='STABLE' and RCVR_LAST_LOGGED:='INSTALLED' model log forcing. In STABLE_R and INSTALLED_R, SEND_MSG or IS_INSTALLED is answered with STABLE and INSTALLED, respectively.]
Legend: EVENT_OK = EVENT and not LINK_OUTAGE; _TM means TIMEOUT
15. Execution Abstraction
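Kripke structure $K = (S, R, L)$ over $P$
• $P$ is a finite set of atomic propositions
• Software: $P$ is a union of all memory bits
• $S$: finite set of states
• $R \subseteq S \times S$: state transitions
• $L: S \times P \rightarrow \{\text{true}, \text{false}\}$: valuation
• Non-determinism to determinism
Computation Tree vs. Sequence
[Figure: a state transition graph with nodes labeled $p$ and $q$ ($p, q \in P$), unwound into a computation tree]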
16. Computation Tree Logic
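Basic Syntax
• Atomic propositions: $P \subseteq CTL(P)$
• If $p, q \in CTL(P)$, then so are
  Propositional logic formulas ($\neg p$, $p \lor q$, etc.)
  Path quantifiers Exists, All + modality neXt, Until
  $EX\,p$
  $\{E, A\}\,(p\,U\,q)$
Derived Syntax
  $AX\,p \equiv \neg(EX\,\neg p)$
  A Finally p: $AF\,p \equiv A(\text{true}\,U\,p)$
  $EF\,p \equiv E(\text{true}\,U\,p)$
  A Globally p: $AG\,p \equiv \neg E(\text{true}\,U\,\neg p)$
  $EG\,p \equiv \neg A(\text{true}\,U\,\neg p)$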
17. CIC Verification
Safety
For all log values $v \in \{\text{'stable'}, \text{'installed'}\}$
$AG\,\big( (\mathit{written}(log) \land log = v) \rightarrow AX\,AG\,\neg(\mathit{written}(log) \land log = v) \big)$
i.e., a value is written at most once
Liveness for timeouts < 30 steps
• $F_{<n}$: eventually after at most $n$ steps
• $AF_{<500}\,AG\,\neg\mathit{failures} \rightarrow AF_{<700}\,\text{CIC installed}$
18. IC's & Web Service
[Activity chart: CUSTOMER interacts with the browser (HTML_PROMPT, BUTTON_CLICKED, HTML_REPLY) through BROWSER_INPUT and BROWSER_OUTPUT, instances of XIC_I_AC and XIC_O_AC within USER1_REQ (controlled by USER1_SC). Instances of CIC_AC connect the tiers: WEBSRVR_REQ, WEBSRVR_REP, APPSRVR1_REQ, APPSRVR2_REQ, APPSRVR1_REP, APPSRVR2_REP. XACT_UPDATE is an instance of TIC_AC and ends with XACT_COMMITTED. Coupling events: CLICK_CAPTURED, WEBSRVR_REQ_RCVD, APPSRVR1_REQ_RCVD, APPSRVR2_REQ_RCVD, APPSRVR1_REP_RCVD, APPSRVR2_REP_RCVD, WEBSRVR_REP_RCVD.]
Failure events (LOCAL_FAILURES, GLOBAL_FAILURES): BROWSER_CRASH, WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, XACT_{USER, INTERNAL}_ABORT, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE, BROWSER_WEBSRVR_LINK_OUTAGE
Web server reply's SNDR_ND = App server replies' RCVR_ND = WEBSRVR_ND,
i.e., commits app server reply order
$AG\,\big(\text{websrvr\_rep:send\_msg} \rightarrow \bigwedge_{i=1,2} (\text{appsrvr}_i\text{:rcvr\_log} = \text{'stable'} \ldots)\big)$
19. Explicit Model Checking
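For $K = (S, R, L)$ over $P$, $s \in S$, $f \in CTL(P)$
• $s \models f$, $f \in P$ $\Leftrightarrow$ $L(s, f) = \text{true}$
• $s \models f$, $f = \neg f_1$ $\Leftrightarrow$ $s \not\models f_1$
• $s \models f$, $f = f_1 \lor f_2$ $\Leftrightarrow$ $s \models f_1$ or $s \models f_2$
• $s \models f$, $f = EX\,f_1$ $\Leftrightarrow$ $\exists (s, r) \in R$ with $r \models f_1$
• $s \models f$, $f = E(f_1\,U\,f_2)$
  if $s$ already checked then false else check
  if $s \models f_2$ then true
  if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true
• $s \models f$, $f = A(f_1\,U\,f_2)$
  if $s$ already checked then false else check
  if $s \models f_2$ then true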
22. PHP and Zend Engine
<html>
<?php
// Keep a per-user call counter in the session
session_start();
$HTTP_SESSION_VARS["count"]++;
printf("Script called %d times", $HTTP_SESSION_VARS["count"]);
// Call the backend script on another server through the CURL module
$ch = curl_init("http://eos-php.net/b2b.php");
// Return the reply as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$b2b_reply = curl_exec($ch);
printf("Other server reports: %s", $b2b_reply);
curl_close($ch);
?>
</html>
Browser output:
Script called 5 times
Other server reports: Script called 1000 times
[Figure: Web clients send requests to Zend Engine instances, each equipped with the Session and CURL modules]
23. EOS
Exactly-once semantics with
• Transparent browser recovery
• Concurrent accesses to shared data
• Nondeterm. functions: time, curl_exec, rand
• Any n in n-tier, any fanout
• Failure masking: no changes to application code,
neither to the PHP scripts nor to the browser
Performance enhancements (side effects)
• Log structured data access (sequential I/O)
• LRU buffers for state and log data
• Latches (Shared/Exclusive)
• session_start(bool $read_only)
24. Experiment Setup
eBay-like auction service
User settings at frontend (private)
Auction items at backend (shared)
5 concurrent end users, synthetic load
Frontend Server: P4 3 GHz, 1 GB
Backend Server: P4 3 GHz, 1 GB
[Figure: the Web client sends POST (ICIC) action=increment to the frontend server, which increments a private count (2 to 3) and sends POST (ICIC) action=increment b2b=true to the backend server, which increments a shared count (1234 to 1235). The HTML reply shows "Private Count: 3" and "Shared Count: 1235".]
25. Run-Time Overhead
Session length                        1 step    5 steps   10 steps
PHP elapsed time [sec]                0.1560    0.7900    1.6100
EOS-PHP elapsed time [sec]            0.3140    1.6850    3.1000
Overhead (elapsed time) [%]           101%      113%      93%
PHP frontend CPU time [sec]           0.0390    0.2708    0.5727
EOS-PHP frontend CPU time [sec]       0.0815    0.6000    1.1545
Overhead (frontend CPU) [%]           109%      122%      102%
PHP backend CPU time [sec]            0.0090    0.0550    0.1200
EOS-PHP backend CPU time [sec]        0.0130    0.0750    0.1600
Overhead (backend CPU) [%]            44%       36%       33%
27. Summary
Generic IC framework specification
Formal verification at IC and app level
• To do: Overcome "model checking" non-scalability
Efficient implementation: EOS
• Rigorous recovery guarantees
Based on the formally verified models
• Many enhancements to PHP
LRU buffer management
Mostly sequential disk accesses
Concurrency control with latches
28. EOS Demo
Frontend Backend
Server Server
B2C_LINK B2B_LINK
USER 1
Welcome to my colloquium. Today I present the research results of my dissertation entitled "Integrated Data, Process, and Message Recovery for Failure Masking in Web Services".
My presentation consists of the following points. I will state the problem of providing recovery guarantees for multi-tier applications. Then I will introduce our solution, a family of recovery protocols called the "interaction contracts framework". I will show you a generic state-and-activity chart specification of the committed IC that is easily adaptable to a concrete application scenario. First we verify a single instance of the generic specification. Then we prove that it also behaves correctly in a composed Web Service model that uses IC instances as building blocks. In the second part of my talk I present a prototype system, EOS, that I have built to demonstrate the viability of the IC framework for Web services. It enables failure masking in arbitrarily distributed Web applications written in the PHP programming language. Beyond that, it provides recovery guarantees for the end user by incorporating the IC functionality into the Web browser, specifically Microsoft Internet Explorer. I conclude the talk with a short summary.
The problem of doing business over the Internet, or with a distributed application infrastructure in general, can be characterized by the term "non-idempotence". The mathematical definition of this term is rather simple: applying a function once and applying it several times do not yield the same result. In a distributed information system, developers and users need to realize that a request timeout may simply result from high delays during peak load rather than from a failure. Users have learned that hitting the refresh or submit button several times is tempting but leads to unexpected results. For instance, a friend of mine applied for new health insurance and got 8 smart cards for his 3-member family. Ordering one item and getting many does not always sound like a bad deal, unless you have to pay for all of them.
A traditional approach to doing business in a failure-prone environment manages the application state in a transactional database. Suppose we have a banking application with accounts stored in a relational table that maps account numbers to their balances. The transaction shown on this slide transfers 100 euros from account 1 to account 2, as indicated by the 2 SQL statements. Declaring this operation sequence as a transaction, using begin and commit statements, guarantees that the sequence is executed atomically: either completely or not at all. A situation where account 1 simply loses 100 euros is not possible, even if the transaction is interrupted in the middle. To achieve this, each operation is logged ahead of its execution. The log entry contains the log sequence number (LSN) and the information on how to undo and redo the operation. Logging is initially done in main memory. However, on transaction commit all log entries have to be written to disk synchronously, which is 5 to 6 orders of magnitude slower. This operation is called log forcing. After a failure, the log on disk is analyzed; the operations of committed transactions are redone, whereas transactions without a commit log entry are undone. Since the database server may fail several times before recovery completes, we need to make sure that undo and redo operations are not applied more than once. This is achieved by stamping the disk pages with the LSN of the most recent operation they reflect. A simple LSN test guarantees that recovery is idempotent.
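The LSN test can be illustrated with a minimal Python sketch. It is not the recovery code of any particular database system: the names `Page`, `LogRecord`, and `redo` are hypothetical, and the sketch assumes one page per account and covers only the redo pass.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A disk page stamped with the LSN of the latest operation it reflects."""
    balance: float
    page_lsn: int = 0


@dataclass
class LogRecord:
    lsn: int
    account: int
    delta: float  # redo information: amount to add to the balance


def redo(pages, log):
    """Replay the log; the LSN test skips operations a page already reflects."""
    for rec in log:
        page = pages[rec.account]
        if page.page_lsn < rec.lsn:  # the LSN test
            page.balance += rec.delta
            page.page_lsn = rec.lsn


# Hypothetical data mirroring the transfer of 100 euros on the slide.
pages = {1: Page(1000.0), 2: Page(2000.0)}
log = [LogRecord(1, 1, -100.0), LogRecord(2, 2, +100.0)]

redo(pages, log)
redo(pages, log)  # a repeated recovery pass finds nothing left to apply
```

Running `redo` twice leaves the balances where a single pass puts them, because every page already carries the LSN of the operation that the second pass would otherwise reapply.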
Consider now a scenario with a 3-tier Web application where an end user submits a purchase request to the Web application server. A transaction is started on the database server on behalf of the user. Assume that the database successfully commits the transaction, but the acknowledgement message does not reach the Web application server due to either a database server crash or a network failure. Developers usually handle this failure by retrying the transaction because they assume that the transaction has been aborted, which is not necessarily true, as we have seen. Unfortunately, this is not the end of the story. How is the end user supposed to react to the server timeout message? People love hitting the refresh button of the browser. I am aware of some of those in this room. It is a very bad idea because Web servers normally do not eliminate duplicates. The bottom line is that recovery needs to treat messages as well as state to ensure correct execution.
If even that simple 3-tier system was complicated, how long does it take to analyze all possible failure combinations and their implications in a system with 10 components spread over 4 tiers? And how about ad-hoc interactions in a peer-to-peer network?
These problems have motivated the IC framework. It views applications as sets of components that exchange messages. In this talk we concentrate on persistent components (Pcoms). They can recreate their state and messages after a failure and can determine whether they have executed a particular message. Another relevant component type, the external component (Xcom), covers end users and conventional components outside the IC framework. Interaction contracts define how components need to exchange messages to keep the interactions recoverable. We will cover the Committed IC (CIC) in this talk as it is the most important IC in the framework. The main design goal is exactly-once semantics: once an interaction has started, it is guaranteed to be executed exactly once. All failures are masked.
To provide recovery guarantees, all Pcoms, such as client and server components, need to be equipped with logging and recovery capabilities. Unlike database systems, we neither want nor need to enable undo. Components are piecewise deterministic: they execute deterministically between two consecutive nondeterministic events, such as incoming messages from other components or reading the system clock. So logging the nondeterministic events turns piecewise-deterministic components into truly deterministic ones. We can recreate a Pcom's state and messages by simply replaying the log from some initial state. To accelerate the deterministic replay, the component needs to truncate the log on a regular basis. Before doing this, it has to dump its current state to disk. We call such state dumps "installation points". Our failure model includes crashes of the sending and receiving components as well as network failures causing message losses. Such transient failures are due to nondeterministic so-called Heisenbugs, which are too hard to reproduce to be eliminated. We do not consider malicious manipulations, called commission failures. And we do not deal with the corruption of stable storage, as this can be avoided by sufficient replication.
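The idea that piecewise determinism plus logging yields full determinism can be sketched in a few lines of Python. This is an illustration only, not the EOS implementation: the class `Pcom` and its methods are hypothetical, an in-memory list stands in for the log on stable storage, and installation points are left out.

```python
import random
import time


class Pcom:
    """Toy persistent component: logs nondeterministic events, replays them on recovery."""

    def __init__(self, log=None):
        self.log = [] if log is None else log  # stands in for the log on stable storage
        self.replay_pos = 0
        self.state = 0

    def _nondeterministic(self, produce):
        # During replay, return the logged value instead of re-executing the event.
        if self.replay_pos < len(self.log):
            value = self.log[self.replay_pos]
        else:
            value = produce()
            self.log.append(value)
        self.replay_pos += 1
        return value

    def step(self):
        # Deterministic computation between two nondeterministic events.
        r = self._nondeterministic(lambda: random.randint(0, 99))
        t = self._nondeterministic(time.time)
        self.state = self.state * 31 + r + int(t) % 7
        return self.state


original = Pcom()
for _ in range(3):
    original.step()

# "Crash": the volatile state is lost, only the log survives.
recovered = Pcom(log=list(original.log))
for _ in range(3):
    recovered.step()
```

After the replay, `recovered.state` equals `original.state` although the random numbers and clock readings were never generated a second time.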
The CIC can be informally described as follows. By sending a message to a different component, the CIC sender commits its state. Usually, it forces the log to disk to make its state and the message recoverable. The sender deterministically tags its message with a unique ID, a message sequence number (MSN). The sender keeps resending the message periodically until it gets a stable notification from the receiver. It keeps the message because the receiver may request it again after a failure. The sender is released from all of its obligations when it gets an installed notification from the receiver. The CIC receiver eliminates message duplicates based on the MSN. It persists an interaction before sending a stable notification to the sender. Normally this is done by logging the message header and forcing the log. The receiver requests the original message from the sender after a failure if its log contains only the message header. The receiver ensures its autonomous recovery by forcing the complete message to disk or creating an installation point before sending an installed notification to the sender.
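A much simplified Python sketch of these obligations is given below. It covers only three of them (persist before send, resend until a stable notification arrives, eliminate duplicates by MSN) and omits crashes, the "get msg" request, and the installed notification. The classes `Sender` and `Receiver`, the simulated lossy link, and the loss rate are assumptions made for illustration.

```python
import random


class Receiver:
    def __init__(self):
        self.log = {}       # MSN -> payload; stands in for the forced log
        self.executed = []  # effects of message execution

    def on_message(self, msn, payload):
        """Handle a (possibly duplicate) message and return the notification."""
        if msn not in self.log:            # duplicate elimination by MSN
            self.log[msn] = payload        # persist the interaction first
            self.executed.append(payload)  # execute exactly once
        return ("STABLE", msn)             # duplicates are re-acknowledged


class Sender:
    def __init__(self, receiver, loss_rate, rng):
        self.receiver = receiver
        self.loss_rate = loss_rate
        self.rng = rng
        self.next_msn = 0
        self.log = {}  # messages persisted before the first send

    def _lost(self):
        return self.rng.random() < self.loss_rate

    def send(self, payload, max_attempts=1000):
        msn = self.next_msn
        self.next_msn += 1
        self.log[msn] = payload  # persist state before send
        for attempt in range(1, max_attempts + 1):
            if self._lost():
                continue  # message lost: timeout, resend
            ack = self.receiver.on_message(msn, payload)
            if self._lost():
                continue  # notification lost: timeout, resend
            assert ack == ("STABLE", msn)
            return attempt
        raise TimeoutError("no stable notification received")


rng = random.Random(42)
receiver = Receiver()
sender = Sender(receiver, loss_rate=0.5, rng=rng)
attempts = [sender.send(f"order-{i}") for i in range(5)]
```

Even though half of all transmissions are dropped and some messages reach the receiver several times, `receiver.executed` ends up containing each order exactly once and in sending order.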
We use the state-and-activity chart language to formally specify the interaction contracts. This language comes with Statemate, a leading tool for the specification of reactive systems. The specification process begins with an activity chart that provides the functional view of the system. Internal activities are represented by solid-line boxes. Dashed-line boxes specify external activities, the execution environment, and external applications. The arrows represent the data flow. Labels indicate which data or events are concerned. In this concrete scenario we specify an activity ensuring that a message is passed from one CIC component to another according to the CIC rules in a failure-prone environment that nondeterministically supplies failure events (crashes and link outages). All the application needs to know is that it should activate the "sender trigger" and await an occurrence of the event "message processed". This is important, please memorize it. The system administrator specifies the timeout values suitable for the given application along with some other options. A manager may stop the specification process at this stage. Activities are hierarchical and allow for stepwise refinement. The next employee will say that the behavior of the CIC activity is actually controlled by a so-called control activity cic_sc (sc stands for statechart), depicted as a green rounded box, and that it has two further subactivities, cic_sender and cic_receiver, which exchange the messages and notifications as I have described informally before. The behaviors of these subactivities are defined by the corresponding control activities.
A control activity is defined by a statechart. A statechart is basically a finite state automaton with some additional features. First, we again have nesting, here of states. Dashed lines separate so-called orthogonal components, which represent processes that run simultaneously. In this case, the orthogonal components are the sender and the receiver. The system is initialized by entering states through a default transition, a transition without a source state. A state targeted by a default transition is called a default substate. When a state is entered, its orthogonal substates are entered within the very same step, and so is its default substate. Ordinary transitions are labeled with event-condition-action rules. The transition is taken if the event was generated in the previous step while the condition was true. When the transition is taken, the action is executed. The action might be as simple as generating an event or starting an activity, as in this example, or as complex as a branching or loop statement. The only purpose of the given statechart is to restart the sender and the receiver activities after a crash. The condition "not active" guards the system against starting duplicate activity instances while the original one is still running. The set of entered states is called a configuration. The current variable valuation defines the execution context of the system. Based on the current configuration and execution context, the system performs a step by computing a new configuration and execution context.
This is the statechart controlling the behavior of a CIC sender. It is of course impossible to work out all the details in this short talk. Let us, however, take a look at some important specification techniques. The system starts in the default substate RECOVERY. Further behavior depends on the content of the log. If the log is empty, the sender does not start sending; it awaits a trigger event. The log is modeled by a string variable, SNDR_LAST_LOGGED in this example. Log forcing is represented by value assignments to the log variable. A regular message or an acknowledgement is considered delivered if its generation does not coincide with a LINK_OUTAGE event, which is represented by compound events suffixed _OK. Before sending a message, the sender signals sender nondeterminism. Sending out a message usually commits the order of the received messages. Normal operation can be nondeterministically interrupted by a sender crash event. Transitions originating in a higher-level state dominate all transitions connecting substates. So the sender activity stops by entering the termination connector, represented by an orange circle labeled T. The activity terminates logically when it enters the state "installed".
This statechart defines the behavior of the sender's counterpart, the receiver component. The difference from the sender is that the log variable can assume two values, stable and installed, and that the log is forced only when we have a nondeterministic situation and the message order matters for the given application, as specified by the developer. The receiver nondeterminism event is usually coupled with the sender nondeterminism events generated by the sender activities running on the same component. Again, the receiver activity terminates logically in the state installed.
Before we start with the verification of the IC, we need some additional definitions. A finite-state computational system, e.g., a Statemate specification, can be represented as a Kripke structure. It contains a finite state transition graph whose nodes are labeled with the atomic propositions that are valid in that node. In a software system, these atomic propositions would refer to individual memory bits. If we unwind the state transition graph, we obtain a computation tree with potentially infinite branches.
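As a small illustration of unwinding, the following Python sketch encodes a three-state Kripke structure and cuts its computation tree off at a fixed depth. The structure itself (states `s0` to `s2` and their labels) is made up for this example, and the helper names `unwind` and `paths` are hypothetical.

```python
# States, transition relation R (a subset of S x S), and labeling L
# (state -> set of atomic propositions that hold in it).
S = {"s0", "s1", "s2"}
R = {("s0", "s1"), ("s0", "s2"), ("s1", "s1"), ("s2", "s0")}
L = {"s0": {"p"}, "s1": {"p", "q"}, "s2": {"q"}}


def unwind(state, depth):
    """Return the computation tree rooted at `state`, cut off at `depth`."""
    if depth == 0:
        return (state, [])
    successors = sorted(r for (s, r) in R if s == state)
    return (state, [unwind(r, depth - 1) for r in successors])


def paths(tree):
    """Enumerate the root-to-leaf state sequences of an unwound tree."""
    state, children = tree
    if not children:
        return [[state]]
    return [[state] + rest for child in children for rest in paths(child)]


tree = unwind("s0", 2)
all_paths = paths(tree)
```

Because `s1` has a self-loop and `s2` leads back to `s0`, increasing the depth keeps producing longer branches, which is why the full computation tree is infinite although the structure is finite.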
A computation tree over the set of atomic propositions P can be characterized by the temporal logic called CTL. Its syntax is inductively defined as shown on this slide. The temporal aspects of the execution paths originating in a given state can be characterized by the path quantifiers Exists and All combined with the temporal modalities Next and Until, and the derived modalities Finally and Globally. The modality Finally means that some property holds eventually. Globally means that a property holds in every state of a path.
In my dissertation, I have proved many interesting safety and liveness properties using Statemate's integrated model checker. I present the most important ones here. I show that my CIC specification, for the sender as well as for the receiver, never logs an interaction twice. We show for all execution paths that if a value is written to a log variable, as indicated by the internal Statemate event written, it is never written again. To show liveness we use the Statemate-specific modality F less than n, meaning that the property holds eventually after at most n steps. So I have proved that if failures no longer occur after at most 500 steps, the CIC terminates after at most 700 steps, provided that the maximum timeout value does not exceed 30 steps. Altogether this shows the exactly-once character of the CIC specification.
As the next step, we would like to specify and verify the interaction contract framework applied to a complex Web Service scenario. We consider a 4-tier application encompassing a browser, a Web server, two application servers, and last but not least a database server. Internal activities are instances of the generic IC specifications. The arrows couple the event MSG_PROCESSED in one interaction with the SNDR_TRIGGER in another one. The user submits a request to the Web server. The Web server calls both application servers asynchronously. One app server starts a transaction on the database server. The other responds immediately. When both app server replies have arrived, the Web server generates a reply to the browser, which is displayed to the user. An interesting observation here is that some instances share the same failure events. For example, the sender crash in the Web server reply is the same as the receiver crash in the application server reply. Analogously, the sender nondeterminism event of the Web server reply and the receiver nondeterminism event in the application server replies are identical. Consequently, the Web server reply commits the order of the application server reply messages, which we can verify by stating the following CTL formula. It says that when the Web server reply is sent, the application server interactions are already captured in the log.
Explicit model checking is a rather simple recursive algorithm with quadratic run time. There are heuristic solutions using ordered binary decision diagrams, as in Statemate's symbolic model checker. Other model checkers use SAT solvers.
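To make explicit model checking concrete, here is a minimal Python sketch of a CTL checker. Note the swapped-in technique: instead of the recursive scheme with "already checked" marks shown on the slide, it uses the standard fixpoint labeling formulation, which computes for each subformula the set of states satisfying it. The tuple encoding of formulas and the function name `sat` are assumptions for illustration, and the sketch is unrelated to Statemate's model checker.

```python
def sat(formula, S, R, L):
    """Return the set of states of the Kripke structure (S, R, L) satisfying formula.

    Formulas are nested tuples: ("ap", p), ("not", f), ("or", f, g),
    ("EX", f), ("EU", f, g) for E(f U g), and ("AU", f, g) for A(f U g).
    """
    succ = {s: {r for (x, r) in R if x == s} for s in S}
    op = formula[0]
    if op == "ap":
        return {s for s in S if formula[1] in L[s]}
    if op == "not":
        return S - sat(formula[1], S, R, L)
    if op == "or":
        return sat(formula[1], S, R, L) | sat(formula[2], S, R, L)
    if op == "EX":
        target = sat(formula[1], S, R, L)
        return {s for s in S if succ[s] & target}
    if op in ("EU", "AU"):
        f1 = sat(formula[1], S, R, L)
        result = set(sat(formula[2], S, R, L))
        changed = True
        while changed:  # least fixpoint: grow the set until it is stable
            changed = False
            for s in f1 - result:
                if op == "EU":
                    ok = bool(succ[s] & result)  # some successor qualifies
                else:
                    ok = bool(succ[s]) and succ[s] <= result  # all successors qualify
                if ok:
                    result.add(s)
                    changed = True
        return result
    raise ValueError(f"unknown operator: {op}")


# Hypothetical example: s0 can loop forever on itself or move on to s1, then s2.
S = {"s0", "s1", "s2"}
R = {("s0", "s0"), ("s0", "s1"), ("s1", "s2"), ("s2", "s2")}
L = {"s0": {"p"}, "s1": {"p"}, "s2": {"q"}}
p, q = ("ap", "p"), ("ap", "q")

exists_until = sat(("EU", p, q), S, R, L)
always_until = sat(("AU", p, q), S, R, L)
```

In this example E(p U q) holds in all three states, whereas A(p U q) fails in `s0` because the path that stays in `s0` forever never reaches a state labeled q.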
In the end, we learned that we need to make compromises between the realism of the models and their verifiability. A Web service model using integer expressions to generate timeouts periodically, as would happen in a real system, could not be verified. We succeeded after replacing the integer-based timeouts by nondeterministic 1-bit timeouts, which is a more general case. However, no engineering tricks helped to obtain any results for a multi-user model or for the liveness of the single-user model.
Now I would like to briefly introduce the prototype system EOS.
I implemented the committed and external interaction contracts for PHP-based Web services. PHP is a scripting language that is embedded into ordinary HTML pages. PHP is interpreted by the Zend engine, which has a great variety of modules extending the capabilities of the PHP language. With PHP we can manage the application state across multiple HTTP requests using the Session module. There are a number of options for invoking remote Web services to build a complex multi-tier application. In my work I concentrated on the CURL module. The reply message of a PHP script is normally an HTML page that is displayed by the browser.
Our prototype implements exactly-once semantics. It delivers the recovery guarantees to the end user by implementing the external and the committed interaction contracts for Internet Explorer. On the PHP side we can recover concurrent requests accessing shared objects. We can recover calls to the nondeterministic functions time, curl_exec, and the random number generator rand. We really do support n-tier applications for any n with any fanout in the call structure. We have enhanced the performance of the original PHP implementation with regard to disk I/O and refined the concurrency control. For instance, it is now possible to access the session data read-only.
We performed measurements to evaluate the overhead of the interaction contracts in a 3-tier application that has a structure similar to an eBay-like auction service. The frontend server manages private user settings that are accessed simultaneously without contention. The backend server manages the current highest bids for auction items, which are accessed concurrently. The load was generated by the synthetic load generator Apache JMeter from 5 different machines.
The run-time overhead of EOS-PHP is on average about 100% in terms of both elapsed time and frontend CPU time, and between 33% and 44% in terms of backend CPU time. At this price we support failure masking, which radically simplifies the development process and provides a correct and highly available service to customers.
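The overhead percentages can be recomputed from the measured times on the "Run-Time Overhead" slide. The short Python sketch below does this arithmetic; the variable and function names are made up, and the pairing of the values with session lengths of 1, 5, and 10 steps is read off the slide.

```python
# (PHP times, EOS-PHP times) in seconds for sessions of 1, 5, and 10 steps.
measurements = {
    "elapsed time": ([0.1560, 0.7900, 1.6100], [0.3140, 1.6850, 3.1000]),
    "frontend CPU": ([0.0390, 0.2708, 0.5727], [0.0815, 0.6000, 1.1545]),
    "backend CPU": ([0.0090, 0.0550, 0.1200], [0.0130, 0.0750, 0.1600]),
}


def overhead_percent(php, eos_php):
    """Relative overhead of EOS-PHP over plain PHP, rounded to whole percent."""
    return round((eos_php / php - 1.0) * 100)


overheads = {
    metric: [overhead_percent(p, e) for p, e in zip(php, eos)]
    for metric, (php, eos) in measurements.items()
}
```

For example, the elapsed time of a 1-step session grows from 0.1560 s to 0.3140 s, which is an overhead of (0.3140 / 0.1560 - 1) * 100, or about 101%.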
I conclude my talk. I presented formal specifications of the recently proposed interaction contracts, which had only been described informally in the original literature. We mathematically proved many safety and liveness properties of the ICs. We have learned that model checking technology has its limitations due to the state-explosion problem. There are several ways to cope with this. For example, some researchers have explored opportunities for combining manual induction proofs with the model checker. Last but not least, there are other verification technologies such as theorem proving. Another major part of my dissertation is a viable implementation of the IC framework for PHP-based Web services. We provided rigorous recovery guarantees for applications and end users at a reasonable price. In the context of this work, we added some brand-new features as well as optimizations of existing ones to both Internet Explorer and the PHP language.
Thank you very much for your attention. And I know you have questions.