Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

  1. Max Planck Institute for Informatics, AG5: Databases and Information Systems
     Integrated Data, Process, and Message Recovery for Failure Masking in Web Services
     Doctoral Thesis Colloquium, German Shegalov
     Saarbrücken, Aug. 26th, 2005
  2. Outline
     Problem Statement and Background
     Interaction Contracts Framework
     • Formal Specification of the Committed IC
     • Verification of ICs with model checking
     • Verification of Web Service IC Model
     Implementation: Exactly-Once Web Service (EOS)
     • Overview
     • EOS-PHP
     • Demo
     Summary
  3. Problem Statement
     Non-idempotence (math)
     • $f^n(x) \neq f(x)$ for $n > 1$
     Non-idempotence (Web, ERP, etc.)
     • "Request timeout" ≠ "request failure"
     • "Request send" ≠ "request resend"
     • 8 Medicare cards for a 3-member family
     • Order one, get many, pay many
  4. Transaction Recovery
     At-most-once semantics
     Accounts (LSN = 0)            Accounts (LSN = 3)
     Number   Balance              Number   Balance
     1        1000,00              1        900,00
     2        2000,00              2        2100,00
```sql
BEGIN TRANSACTION;
/* LSN = 1: log for undo and redo in the main-memory (MM) buffer */
UPDATE Accounts SET balance = balance - 100.00 WHERE Number = 1;
/* LSN = 2: log for undo and redo in the MM buffer */
UPDATE Accounts SET balance = balance + 100.00 WHERE Number = 2;
/* LSN = 3: log the commit and force the log to disk (5 to 6 orders of magnitude slower) */
COMMIT TRANSACTION;
```
     Redo committed, undo uncommitted
     • The LSN test guarantees idempotence
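The LSN test from the slide can be illustrated with a minimal Python sketch. It makes several simplifying assumptions. Both accounts live on one page, and amounts are in cents. Only the redo pass is shown; undo of uncommitted updates is left out. `Page`, `RedoRecord`, and `redo` are hypothetical names, not part of any real DBMS API.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A data page; page_lsn is the LSN of the last logged update applied to it."""
    page_lsn: int = 0
    balances: dict = field(default_factory=dict)  # account number -> balance in cents


@dataclass(frozen=True)
class RedoRecord:
    lsn: int
    account: int
    delta: int  # change in cents


def redo(page: Page, log: list) -> Page:
    """Redo pass: reapply logged updates, skipping those the page already reflects."""
    for rec in log:
        if rec.lsn <= page.page_lsn:
            continue  # LSN test: the effect is already on the page, do not apply it twice
        page.balances[rec.account] = page.balances.get(rec.account, 0) + rec.delta
        page.page_lsn = rec.lsn
    return page


# Transfer of 100,00 from account 1 to account 2 (LSN 1 and 2 on the slide).
log = [RedoRecord(1, 1, -10000), RedoRecord(2, 2, +10000)]
page = Page(balances={1: 100000, 2: 200000})
redo(page, log)
redo(page, log)  # e.g., a crash during recovery forces a second redo pass
print(page.balances)  # {1: 90000, 2: 210000}
```

Running the redo pass twice leaves the balances unchanged after the first pass. That is the idempotence the slide attributes to the LSN test.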
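  5. However, …
     Transactions alone are not a panacea!
     Participants: Web client, Web application server, Database server
     • The client sends a purchase request to the web application server
     • The application server starts a transaction, sends SQL requests, receives SQL responses, commits the transaction, and receives the commit ACK
     • The client does not receive the order confirmation in time
     • The client resubmits the purchase request, and the application server starts the transaction again
     • Result: non-idempotent execution (the purchase is executed twice)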
  6. Real-World n-Tier App
     Don't panic! Peer-to-peer apps may be even worse.
     [Diagram: a client, a web server, Expedia, Sabre, and Amadeus application servers, and databases DB1 to DB4]
  8. IC Framework
     Components and guarantees
     • Persistent component (Pcom): persistent, testable state and messages
     • External component (Xcom, e.g., humans): no guarantees
     Interaction Contracts
     • Xcom ↔ Pcom = External IC (XIC)
     • Pcom ↔ Pcom = Committed IC (CIC)
     Exactly-once semantics
     • Forget rollbacks: exactly-once execution is guaranteed
  9. Pcom Design
     Redo log and recovery managers
     • Piecewise determinism + logging = full determinism
     • Deterministic replay recovers Pcoms
     • Installation points speed up replay
     Failure model
     • Crashes (transient failures due to nondeterministic Heisenbugs)
     • Message losses
     • No malicious manipulations
     • No disk corruption (sufficient redundancy)
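The following minimal sketch shows how piecewise determinism, logging, and installation points fit together. It assumes a hypothetical Pcom whose only nondeterminism is the content and order of received messages. It also assumes that the log and the installed state survive crashes. `step`, `Pcom`, `install`, and `recover` are illustrative names, not EOS code.

```python
from dataclasses import dataclass, field


def step(state: int, msg: int) -> int:
    """Hypothetical deterministic piece of execution: same state and input, same result."""
    return state * 31 + msg


@dataclass
class Pcom:
    log: list = field(default_factory=list)  # stable redo log of received messages
    installed_state: int = 0                 # state persisted at the last installation point
    installed_pos: int = 0                   # number of log records covered by it
    state: int = 0                           # volatile state, lost on a crash

    def receive(self, msg: int) -> None:
        self.log.append(msg)                 # log the nondeterministic input first
        self.state = step(self.state, msg)   # then execute deterministically

    def install(self) -> None:
        """Installation point: persist the current state together with its log position."""
        self.installed_state, self.installed_pos = self.state, len(self.log)

    def recover(self) -> int:
        """Reload the installed state, replay only the log suffix; return #records replayed."""
        self.state = self.installed_state
        suffix = self.log[self.installed_pos:]
        for msg in suffix:
            self.state = step(self.state, msg)
        return len(suffix)


p = Pcom()
for m in (5, 7, 11):
    p.receive(m)
p.install()
for m in (13, 17):
    p.receive(m)
before_crash = p.state
p.state = 0                      # crash: the volatile state is gone
replayed = p.recover()
print(p.state == before_crash, replayed)  # True 2
```

Replay starts at the installation point, so only the two messages received after `install()` are re-executed. Without the installation point, recovery would replay the whole log from the initial state.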
  10. CICs: Informal Design
      CIC sender (Pcom) obligations
      • Persist state before sending
      • Tag the message with a message sequence number (MSN)
      • Resend on timeout until a stable ack arrives
      • Resend on the receiver's "get msg" request
      • Forget the interaction on an installed ack
      CIC receiver (Pcom) obligations
      • Eliminate duplicates by MSNs
      • Persist the interaction before sending a stable ack
      • Issue "get msg" if the message is not in the log after a failure
      • Ensure autonomous recovery before sending an installed ack
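The sender and receiver obligations can be exercised in a small simulation. This is a sketch under several assumptions. In-memory dictionaries stand in for stable storage, and there are no crashes. The "get msg" request and the installed ack are not modelled, and the class and method names are hypothetical. The sketch shows that resending on timeout, combined with duplicate elimination by MSN, yields exactly-once processing even when requests and acks are lost.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Receiver:
    log: dict = field(default_factory=dict)        # stable log: MSN -> message
    processed: list = field(default_factory=list)  # effects of message processing

    def on_message(self, msn: int, msg: str) -> str:
        if msn not in self.log:        # eliminate duplicates by MSN
            self.log[msn] = msg        # persist the interaction before the stable ack
            self.processed.append(msg)
        return "STABLE"                # duplicates are acknowledged again


@dataclass
class Sender:
    next_msn: int = 1
    pending: dict = field(default_factory=dict)  # kept until an installed ack (not modelled)

    def send(self, msg: str, rcvr: Receiver, rng: random.Random, loss: float) -> int:
        msn = self.next_msn
        self.next_msn += 1
        self.pending[msn] = msg        # persist state before sending
        attempts = 0
        while True:                    # resend on timeout until a stable ack arrives
            attempts += 1
            if rng.random() < loss:    # request lost on the link -> timeout
                continue
            ack = rcvr.on_message(msn, msg)
            if rng.random() < loss:    # ack lost on the link -> timeout, resend same MSN
                continue
            if ack == "STABLE":
                return attempts


rng = random.Random(7)
sender, receiver = Sender(), Receiver()
orders = [f"order-{i}" for i in range(20)]
attempts = sum(sender.send(o, receiver, rng, loss=0.4) for o in orders)
print(receiver.processed == orders, attempts >= len(orders))  # True True
```

A lost ack makes the sender resend the same MSN. The receiver then only re-acknowledges that message, so each order is processed once, however many attempts it took.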
  11. Committed IC Activities
      Activitychart = functional view
      [Activitychart CIC_AC with the sender activity CIC_SNDR_AC (@CIC_SNDR_SC) and the receiver activity CIC_RCVR_AC (@CIC_RCVR_SC), controlled by @CIC_SC; external activities EXTERNAL_APP_LOGIC, FAILURE_PRONE_ENVIRONMENT, and SYSTEM_ADMINISTRATOR; flows SNDR_TRIGGER, SEND_MSG, GET_MSG, STABLE, INSTALLED, MSG_PROCESSED, ICIC, SNDR_CRASH, RCVR_CRASH, LINK_OUTAGE, and TIMEOUTS]
  12. Committed IC Monitor
      Statechart = behavioral view
      • Finite state automaton (FSA)
        + nesting + orthogonal substates
        + E[C]/A transitions: on event E while condition C holds, leave the source state, enter the target state, and execute action A (e.g., an action A that consists of an event name E generates event E)
      • Configuration = set of entered states
      • Execution context = variable valuation
      • $\mathit{step}_i: \mathit{conf}_i \times \mathit{ctxt}_i \to \mathit{conf}_{i+1} \times \mathit{ctxt}_{i+1}$
      Example: CIC_SC with the orthogonal substates SNDR_S and RCVR_S
      • SNDR_S (not SNDR_CRASH): SENDING [not active(CIC_SNDR_AC)]/start!(CIC_SNDR_AC)
      • RCVR_S (not RCVR_CRASH): RECEIVING [not active(CIC_RCVR_AC)]/start!(CIC_RCVR_AC)
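The E[C]/A transition rule can be sketched as a step function over a flat automaton. Nesting and orthogonal substates are deliberately left out. The state `IDLE`, the context variables `msn` and `crashed`, and the action `send` are hypothetical. Only `SNDR_TRIGGER`, `SENDING`, and `SEND_MSG` are names taken from the CIC charts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transition:
    source: str
    event: str                          # E: triggering event
    cond: Callable[[dict], bool]        # C: condition over the execution context
    target: str
    action: Callable[[dict], list]      # A: may update the context; returns generated events


def step(conf: str, ctxt: dict, events: set, transitions: list):
    """One step (conf_i, ctxt_i) -> (conf_{i+1}, ctxt_{i+1}) of a flat automaton.
    Also returns the events generated by the action."""
    for t in transitions:
        if t.source == conf and t.event in events and t.cond(ctxt):
            new_ctxt = dict(ctxt)                  # the action works on a copy of the context
            generated = t.action(new_ctxt)
            return t.target, new_ctxt, set(generated)
    return conf, ctxt, set()                       # no enabled transition: nothing changes


def send(ctxt: dict) -> list:
    ctxt["msn"] += 1       # hypothetical context update
    return ["SEND_MSG"]    # an action naming an event generates that event


# Hypothetical transition: IDLE --SNDR_TRIGGER [not crashed] / send--> SENDING
transitions = [Transition("IDLE", "SNDR_TRIGGER", lambda c: not c["crashed"], "SENDING", send)]
conf, ctxt, out = step("IDLE", {"msn": 0, "crashed": False}, {"SNDR_TRIGGER"}, transitions)
print(conf, ctxt["msn"], out)  # SENDING 1 {'SEND_MSG'}
```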
  13. Committed IC Sender
      [Statechart CIC_SNDR_SC: states include PREPARE_PERSISTENCE, SENDING, MSG_LOOKUP, STABLE_S, INSTALLED_S, and RECOVERY; transitions are triggered by SNDR_TRIGGER, SNDR_ND, SNDR_CRASH, GET_MSG_OK, STABLE_OK, INSTALLED_OK, and the timeouts SNDR_MSG_TM, SNDR_STABLE_TM, and MSG_RECOVERED_TM; guards test and actions update SNDR_LAST_LOGGED (e.g., INSTALLED_OK/SNDR_LAST_LOGGED:=INSTALLED)]
      Legend: EVENT_OK = EVENT ∧ ¬LINK_OUTAGE; the suffix _TM means TIMEOUT
  14. Committed IC Receiver
      [Statechart CIC_RCVR_SC: states include MSG_RECEIVED, STABLE_R, INSTALLED_R, MSG_RECOVERY, and RECOVERY; transitions are triggered by SEND_MSG_OK, IS_INSTALLED, RCVR_CRASH, RCVR_ND, and the timeouts GET_MSG_TM, MSG_EXEC_TM, RCVR_STABLE_TM, and RCVR_INSTALL_TM; guards test RCVR_LAST_LOGGED, ICIC, and MSG_ORDER_MATTERS; actions include GET_MSG, MSG_PROCESSED, STABLE, INSTALLED, and updates of RCVR_LAST_LOGGED]
      Legend: EVENT_OK = EVENT ∧ ¬LINK_OUTAGE; the suffix _TM means TIMEOUT
  15. Execution Abstraction
      Kripke structure $K = (S, R, L)$ over $P$
      • $P$ is a finite set of atomic propositions (software: $P$ is the union of all memory bits)
      • $S$ is a finite set of states
      • $R \subseteq S \times S$ are the state transitions
      • $L: S \times P \to \{\mathit{true}, \mathit{false}\}$ is the valuation
      • Non-determinism to determinism: computation tree vs. sequence
      [Figure: a state graph labeled with $p, q \in P$ unwound into a computation tree]
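To make the computation-tree view concrete, here is a small sketch of a Kripke structure and its unwinding to a fixed depth. The encoding is an assumption: `L` maps each state to the set of propositions that are true there, which is equivalent to $L: S \times P \to \{\mathit{true}, \mathit{false}\}$. The three-state example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kripke:
    states: frozenset
    R: frozenset   # transition relation: set of (s, r) pairs, R ⊆ S × S
    L: dict        # state -> set of atomic propositions that are true in it

    def succ(self, s):
        return sorted(r for (x, r) in self.R if x == s)


def holds(K: Kripke, s, p) -> bool:
    """L(s, p) = true?"""
    return p in K.L[s]


def unwind(K: Kripke, s, depth: int):
    """Unwind K from s into a computation tree of the given depth: (state, children)."""
    if depth == 0:
        return (s, [])
    return (s, [unwind(K, r, depth - 1) for r in K.succ(s)])


def paths(tree):
    """Every root-to-leaf branch of the tree is one possible execution sequence."""
    s, children = tree
    if not children:
        return [[s]]
    return [[s] + rest for child in children for rest in paths(child)]


K = Kripke(
    states=frozenset({"s0", "s1", "s2"}),
    R=frozenset({("s0", "s1"), ("s0", "s2"), ("s1", "s1"), ("s2", "s0")}),
    L={"s0": {"p"}, "s1": {"p", "q"}, "s2": {"q"}},
)
print(paths(unwind(K, "s0", 2)))  # [['s0', 's1', 's1'], ['s0', 's2', 's0']]
```

The nondeterministic choice at `s0` becomes a branching point in the tree. Each branch is a deterministic execution sequence.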
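  16. Computation Tree Logic
      Basic syntax
      • Atomic propositions: $P \subseteq \mathrm{CTL}(P)$
      • If $p, q \in \mathrm{CTL}(P)$, then so are
        - propositional logic formulas ($\neg p$, $p \lor q$, etc.)
        - path quantifiers Exists and All combined with the modalities neXt and Until: $\mathrm{EX}\,p$, $\{\mathrm{E}, \mathrm{A}\}(p\ \mathrm{U}\ q)$
      Derived syntax
      • $\mathrm{AX}\,p \equiv \neg\mathrm{EX}\,\neg p$
      • A Finally: $\mathrm{AF}\,p \equiv \mathrm{A}(\mathit{true}\ \mathrm{U}\ p)$
      • $\mathrm{EF}\,p \equiv \mathrm{E}(\mathit{true}\ \mathrm{U}\ p)$
      • A Globally: $\mathrm{AG}\,p \equiv \neg\mathrm{E}(\mathit{true}\ \mathrm{U}\ \neg p)$
      • $\mathrm{EG}\,p \equiv \neg\mathrm{A}(\mathit{true}\ \mathrm{U}\ \neg p)$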
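  17. CIC Verification
      Safety: for all log values $v \in \{\mathit{stable}, \mathit{installed}\}$
        $\mathrm{AG}\big((\mathit{written}(\mathit{log}) \land \mathit{log} = v) \rightarrow \mathrm{AX}\,\mathrm{AG}\,\neg(\mathit{written}(\mathit{log}) \land \mathit{log} = v)\big)$
        i.e., each value is written at most once
      Liveness (for timeouts < 30 steps)
      • $\mathrm{F}_{<n}$: eventually, after at most n steps
      • $\mathrm{AF}_{<500}\,\mathrm{AG}\,\neg\mathit{failures} \rightarrow \mathrm{AF}_{<700}\,\text{CIC installed}$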
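Bounded liveness can be checked recursively. $s \models \mathrm{AF}_{\le n}\,p$ holds if $p$ holds at $s$, or if $n > 0$ and every successor satisfies $\mathrm{AF}_{\le n-1}\,p$. The sketch below follows that recursion and reads the slide's $\mathrm{F}_{<n}$ as "within at most n steps", as described there. The sender abstraction (`sent1`, `lost1`, `sent2`, `acked`) is hypothetical, not the verified CIC model.

```python
from functools import lru_cache


def make_af_bounded(succ: dict, goal: set):
    """Return af(s, n): does every path from s reach a goal state within at most n steps?
    Assumes a total transition relation (every state has a successor)."""
    @lru_cache(maxsize=None)
    def af(s: str, n: int) -> bool:
        if s in goal:
            return True                       # goal holds now
        if n == 0:
            return False                      # step budget exhausted on this branch
        return all(af(r, n - 1) for r in succ[s])
    return af


# Hypothetical CIC abstraction: the first send may be lost, the resend always succeeds.
bounded = {"sent1": ["lost1", "acked"], "lost1": ["sent2"], "sent2": ["acked"], "acked": ["acked"]}
af = make_af_bounded(bounded, goal={"acked"})
print(af("sent1", 2), af("sent1", 3))   # False True

# With unbounded link outages (lost1 -> sent1) the goal is never guaranteed.
unbounded = dict(bounded, lost1=["sent1"])
print(make_af_bounded(unbounded, goal={"acked"})("sent1", 50))  # False
```

This mirrors the structure of the slide's liveness property. Delivery can be guaranteed within a bound only once failures stop, which is why the premise $\mathrm{AF}_{<500}\,\mathrm{AG}\,\neg\mathit{failures}$ is needed.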
  18. ICs & Web Service
      [Activitychart of a web service built from ICs: the CUSTOMER (@USER1_SC) interacts with the browser through XICs (XIC_I_AC for input, XIC_O_AC for output); browser, web server, and app servers 1 and 2 interact through CICs (CIC_AC); TIC_AC covers the transactional interaction with the database server]
      Nondeterminism: web server replies: SNDR_ND = APPSRVR1_REQ_RCVD; app server replies: RCVR_ND = WEBSRVR_ND, XACT_UPDATE, XACT_COMMITTED (i.e., commits app server reply order)
      Failures: BROWSER_CRASH, WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, XACT_{USER, INTERNAL}_ABORT, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE, BROWSER_WEBSRVR_LINK_OUTAGE (grouped into LOCAL_FAILURES and GLOBAL_FAILURES)
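  19. Explicit Model Checking
      For $K = (S, R, L)$ over $P$, $s \in S$, $f \in \mathrm{CTL}(P)$:
      • $f \in P$: $s \models f \iff L(s, f) = \mathit{true}$
      • $f = \neg f_1$: $s \models f \iff s \not\models f_1$
      • $f = f_1 \lor f_2$: $s \models f \iff s \models f_1$ or $s \models f_2$
      • $f = \mathrm{EX}\,f_1$: $s \models f \iff \exists (s, r) \in R$ with $r \models f_1$
      • $f = \mathrm{E}(f_1\ \mathrm{U}\ f_2)$: if $s$ was already checked, then false; otherwise check:
        - if $s \models f_2$, then true
        - if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$, then true
      • $f = \mathrm{A}(f_1\ \mathrm{U}\ f_2)$: if $s$ was already checked, then false; otherwise check:
        - if $s \models f_2$, then true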
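The recursive procedure above can be sketched in Python. This is an illustration, not the checker used for the verification. The sketch relies on three assumptions:
- Formulas use a hypothetical tuple encoding.
- "Already checked" means "already visited on the current branch of the Until search".
- The A(f1 U f2) case continues by analogy with the E case, requiring all successors instead of some.

The search is exponential in the worst case; practical tools use fixpoint labeling or symbolic (OBDD-based) state sets.

```python
def sat(succ: dict, label: dict, s: str, f: tuple, path: frozenset = frozenset()) -> bool:
    """Explicit-state check of s |= f. Formulas are tuples (hypothetical encoding):
    ("true",), ("ap", p), ("not", f1), ("or", f1, f2), ("EX", f1),
    ("EU", f1, f2), ("AU", f1, f2). `path` holds the states already checked
    for the current Until formula on the current branch."""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in label[s]                      # L(s, f) = true
    if op == "not":
        return not sat(succ, label, s, f[1])
    if op == "or":
        return sat(succ, label, s, f[1]) or sat(succ, label, s, f[2])
    if op == "EX":
        return any(sat(succ, label, r, f[1]) for r in succ[s])
    if op in ("EU", "AU"):
        if s in path:
            return False                             # already checked: a cycle without f2
        if sat(succ, label, s, f[2]):
            return True                              # s |= f2
        if not sat(succ, label, s, f[1]):
            return False                             # neither f2 nor f1 holds at s
        quantifier = any if op == "EU" else all      # E: some successor, A: every successor
        return quantifier(sat(succ, label, r, f, path | {s}) for r in succ[s])
    raise ValueError(f"unknown operator: {op}")


# Derived operators, as on the CTL slide.
def EF(p): return ("EU", ("true",), p)
def AF(p): return ("AU", ("true",), p)
def AG(p): return ("not", EF(("not", p)))
def EG(p): return ("not", AF(("not", p)))


q = ("ap", "q")
succ = {"s0": ["s1", "s2"], "s1": ["s1"], "s2": ["s2"]}
label = {"s0": set(), "s1": {"q"}, "s2": set()}
print(sat(succ, label, "s0", EF(q)), sat(succ, label, "s0", AF(q)))   # True False
print(sat(succ, label, "s2", AG(("not", q))), sat(succ, label, "s0", EG(("not", q))))  # True True
```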
  20. Verification Run-Times
      Property            Specification type          OBDD size    Time
      IC-level safety     Integer timeout             ~10^4        ~5 seconds
      IC-level safety     Nondeterministic timeout    ~10^3        ~1 second
      IC-level liveness   Integer timeout             ~10^6        ~10 hours
      IC-level liveness   Nondeterministic timeout    ~10^5        ~10 hours
      1-user WS safety    Integer timeout             ~10^7        not terminated
      1-user WS safety    Nondeterministic timeout    ~10^6        ~10 hours
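  22. PHP and Zend Engine
      [Diagram: web clients send requests to a web server whose Zend Engine runs the script below with the Session and CURL modules; the script calls a second web server (also Zend Engine with Session and CURL) via CURL. The browser shows "Script called 5 times" and "Other server reports: Script called 1000 times".]
```php
<?php
// session_start() sends a cookie header, so it must run before any HTML output.
session_start();
// $_SESSION replaces $HTTP_SESSION_VARS, which PHP 5.4 removed.
$_SESSION["count"] = ($_SESSION["count"] ?? 0) + 1;
?>
<html>
<?php
// %d, not %i: PHP's printf has no %i conversion.
printf("Script called %d times", $_SESSION["count"]);
$ch = curl_init("http://eos-php.net/b2b.php");
// Return the response body instead of echoing it, so curl_exec() yields a string.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$b2b_reply = curl_exec($ch);
printf("Other server reports: %s", $b2b_reply);
curl_close($ch);
?>
</html>
```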
  23. 23. EOS Exactly-once semantics with • Transparent browser recovery • Concurrent accesses to shared data • Nondeterm. functions: time, curl_exec, rand • Any n in n-tier, any fanout • Failure masking: no changes to app code neither to PHP scripts, nor to the browser Performance enhancements (side effects) • Log structured data access (sequential I/O) • LRU buffers for state and log data • Latches (Shared/Exclusive) • session_start(bool $read_only)
  24. Experiment Setup
      eBay-like auction service
      • User settings at the frontend (private)
      • Auction items at the backend (shared)
      • 5 concurrent end users, synthetic load
      Frontend server: P4 3 GHz, 1 GB; backend server: P4 3 GHz, 1 GB
      [Diagram: the web client sends POST (ICIC) action=increment to the frontend server, which keeps one private count per user; the frontend server sends POST (ICIC) action=increment, b2b=true to the backend server, which keeps the shared count. The client page shows "Private Count: 3" and "Shared Count: 1235".]
  25. Run-Time Overhead
      Session length                      1 step    5 steps   10 steps
      PHP elapsed time [sec]              0.1560    0.7900    1.6100
      EOS-PHP elapsed time [sec]          0.3140    1.6850    3.1000
      Overhead (elapsed time) [%]         101       113       93
      PHP frontend CPU time [sec]         0.0390    0.2708    0.5727
      EOS-PHP frontend CPU time [sec]     0.0815    0.6000    1.1545
      Overhead (frontend CPU) [%]         109       122       102
      PHP backend CPU time [sec]          0.0090    0.0550    0.1200
      EOS-PHP backend CPU time [sec]      0.0130    0.0750    0.1600
      Overhead (backend CPU) [%]          44        36        33
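The overhead rows follow from the time rows as (EOS-PHP time / PHP time - 1) × 100%. The short check below recomputes them from the run-time table's numbers; the dictionary layout is only a convenience.

```python
# (PHP times, EOS-PHP times) for session lengths of 1, 5, and 10 steps, in seconds.
MEASUREMENTS = {
    "elapsed":      ((0.1560, 0.7900, 1.6100), (0.3140, 1.6850, 3.1000)),
    "frontend CPU": ((0.0390, 0.2708, 0.5727), (0.0815, 0.6000, 1.1545)),
    "backend CPU":  ((0.0090, 0.0550, 0.1200), (0.0130, 0.0750, 0.1600)),
}


def overhead_pct(php: float, eos: float) -> int:
    """Relative overhead of EOS-PHP over plain PHP, rounded to whole percent."""
    return round((eos / php - 1) * 100)


overheads = {
    name: [overhead_pct(p, e) for p, e in zip(php, eos)]
    for name, (php, eos) in MEASUREMENTS.items()
}
print(overheads)
```

All nine recomputed values match the table. EOS-PHP roughly doubles the elapsed and frontend CPU times, while the backend CPU overhead stays between 33% and 44%.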
  27. Summary
      Generic IC framework specification
      Formal verification at the IC and application level
      • To do: overcome the non-scalability of model checking
      Efficient implementation: EOS
      • Rigorous recovery guarantees based on the formally verified models
      • Many enhancements to PHP
        - LRU buffer management
        - Mostly sequential disk accesses
        - Concurrency control with latches
  28. EOS Demo
      [Demo setup: USER 1 connects to the frontend server via B2C_LINK; the frontend server connects to the backend server via B2B_LINK]
  29. Thank You!
