Max Planck Institute for Informatics
AG5: Databases and Information Systems

Integrated Data, Process, and Message Recovery
for Failure Masking in Web Services

Doctoral Thesis Colloquium
German Shegalov

Saarbrücken, Aug. 26th, 2005
Outline
   Problem Statement and Background
   Interaction Contracts Framework
    • Formal Specification of the Committed IC
    • Verification of IC's with model checking
    • Verification of Web Service IC Model
   Implementation: Exactly-Once Web
    Service (EOS)
    • Overview
    • EOS-PHP
    • Demo
   Summary
Problem Statement

   Non-idempotence (math)
    • $f^n(x) \neq f(x),\ n > 1$

   Non-idempotence (Web, ERP, etc.)
    • A request timeout does not imply a request failure
    • Hence a request send turns into a request resend
    • 8 Medicare cards for a 3-member family
    • Order one, get many, pay for many
Transaction Recovery
       Accounts (LSN=0)            Accounts (LSN=3)
       Number  Balance             Number  Balance
       1       1000.00             1        900.00
       2       2000.00             2       2100.00

   At-most-once semantics
     BEGIN TRANSACTION
     /* LSN = 1: undo/redo log record written to the main-memory (MM) log buffer */
        UPDATE Accounts SET balance = balance - 100.00
          WHERE Number = 1;
     /* LSN = 2: undo/redo log record written to the MM log buffer */
        UPDATE Accounts SET balance = balance + 100.00
          WHERE Number = 2;
     /* LSN = 3: commit record; force the log to disk (5-6 orders of magnitude slower than MM) */
     COMMIT TRANSACTION;
   Redo committed, undo uncommitted transactions
    • The LSN test (redo a logged update only if its LSN exceeds the page LSN) guarantees idempotence of recovery
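The LSN test can be sketched in a few lines of Python. This is a minimal, hypothetical sketch assuming a single page that holds both accounts, balances in cents, and a redo pass over update records of committed transactions only (undo is omitted); `LogRecord`, `Page`, and `redo` are illustrative names, not part of any real recovery manager.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    lsn: int      # log sequence number, strictly increasing
    account: int  # account number updated by this record
    delta: int    # redo information: amount in cents added to the balance


@dataclass
class Page:
    balances: Dict[int, int]  # account number -> balance in cents
    page_lsn: int = 0         # LSN of the newest update already reflected in the page


def redo(page: Page, log: List[LogRecord]) -> None:
    """Redo pass over committed updates; the LSN test makes it idempotent."""
    for rec in log:
        if rec.lsn > page.page_lsn:   # update is missing from the page: apply it
            page.balances[rec.account] += rec.delta
            page.page_lsn = rec.lsn
        # otherwise the update reached the page before the crash: skip it


# The transfer from the slide: LSN 1 debits account 1, LSN 2 credits account 2.
log = [LogRecord(1, 1, -10_000), LogRecord(2, 2, +10_000)]

# Crash after the page was flushed with only the first update applied.
page = Page({1: 90_000, 2: 200_000}, page_lsn=1)
redo(page, log)
redo(page, log)  # crash during recovery and redo again: nothing changes
print(page.balances, page.page_lsn)  # {1: 90000, 2: 210000} 2
```

Without the `rec.lsn > page.page_lsn` check, the first `redo` call would debit account 1 a second time.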
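However, …
   Transactions alone are not a panacea!

   Figure: timeline of a purchase across Web Client, Web Application Server, and Database Server
    • Web Client → Web Application Server: Purchase Request
    • Web Application Server: Start Transaction
    • Web Application Server → Database Server: SQL Request; Database Server → Web Application Server: SQL Response (repeated)
    • Web Application Server → Database Server: Commit Transaction; Database Server: ACK
    • Web Application Server → Web Client: Order Confirmation; Web Application Server: Transaction Restart
    • Web Client → Web Application Server: Purchase Request Resubmission
    • Non-idempotent execution!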
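The failure above can be reproduced with a toy model. This hypothetical sketch assumes an order service whose transaction commits before the confirmation is lost (simulated by the `lose_reply` flag) and a client that naively resubmits on timeout; `AppServer`, `purchase`, and `client_purchase` are illustrative names, not taken from the slides.

```python
import itertools


class AppServer:
    """Toy order service: each purchase runs as one committed transaction."""

    def __init__(self):
        self.orders = []                     # committed database state
        self._next_id = itertools.count(1)

    def purchase(self, item, lose_reply=False):
        order_id = next(self._next_id)
        self.orders.append((order_id, item))  # the transaction commits here
        if lose_reply:                         # confirmation lost after commit
            raise TimeoutError("order confirmation lost")
        return order_id


def client_purchase(server, item):
    try:
        return server.purchase(item, lose_reply=True)  # first attempt times out
    except TimeoutError:
        return server.purchase(item)                   # naive resubmission


server = AppServer()
client_purchase(server, "book")
print(server.orders)  # [(1, 'book'), (2, 'book')]: one click, two committed orders
```

Each transaction is atomic, yet the end-to-end request is executed twice.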
Real-World n-Tier App
   Don't panic! Peer-to-peer apps may be even worse.

   Figure: tiers from top to bottom: Client; Web Server; Expedia, Sabre, and Amadeus app servers; databases DB1, DB2, DB3, DB4
IC Framework
   Components and Guarantees
    • Persistent components (Pcom): persistent, testable
      state and messages
    • External components (Xcom, e.g., humans): no
      guarantees

   Interaction Contracts
    • Xcom ↔ Pcom = External IC (XIC)
    • Pcom ↔ Pcom = Committed IC (CIC)

   Exactly-Once Semantics
    • Forget rollbacks: exactly-once execution is
      guaranteed
Pcom Design
   Redo Log & Recovery Managers
   Piecewise determinism + Logging =
    Full Determinism
   Deterministic replay recovers Pcoms
   Installation Points speed up replay
   Failure model
    • Crashes and message losses: transient failures due to nondeterministic Heisenbugs
    • No malicious manipulations
    • No disk corruption (sufficient redundancy)
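Piecewise determinism plus logging can be illustrated with a small replay log. In this hypothetical sketch, execution between nondeterministic calls (`random.randint` and `time.time`, standing in for PHP's `rand` and `time`) is assumed to be deterministic, and an in-memory list stands in for the persistent redo log; `ReplayLog`, `nondet`, and `handler` are illustrative names, not EOS internals.

```python
import random
import time


class ReplayLog:
    """Logs results of nondeterministic calls; replays them after a crash."""

    def __init__(self, entries=None):
        self.entries = list(entries) if entries is not None else []
        self.replaying = entries is not None   # recovery mode if a log is given
        self.pos = 0

    def nondet(self, fn):
        if self.replaying and self.pos < len(self.entries):
            value = self.entries[self.pos]      # replay: reuse the logged result
        else:
            value = fn()                        # normal execution: call and log
            self.entries.append(value)
            self.replaying = False              # log exhausted: back to normal mode
        self.pos += 1
        return value


def handler(log):
    # Deterministic apart from the two logged calls (piecewise determinism)
    ticket = log.nondet(lambda: random.randint(1, 10**6))
    stamp = log.nondet(time.time)
    return f"ticket {ticket} issued at {stamp:.6f}"


original = ReplayLog()
first = handler(original)                          # normal execution, results logged
recovered = handler(ReplayLog(original.entries))   # replay after a crash
print(first == recovered)                          # True
```

During replay the real functions are not called until the log is exhausted; after that the handler continues in normal mode and keeps logging.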
CIC's Informal Design
   CIC sender (Pcom) obligations
    • Persist state before send
    • Tag each message with a message sequence number (MSN)
    • Resend on timeout until a stable ack arrives
    • Resend on the receiver's "get msg" request
    • Forget the interaction on an installed ack
   CIC receiver (Pcom) obligations
    • Eliminate duplicates by MSN
    • Persist the interaction before sending a stable ack
    • Request the message ("get msg") if it is not in the log after a failure
    • Ensure autonomous recovery before sending an installed ack
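The obligations above can be sketched for a single interaction. This is a simplified, hypothetical model: in-memory dictionaries stand in for forced logs, lost acknowledgements are simulated by the `lost_acks` parameter, and one ack plays both the stable and the installed role; `Sender`, `Receiver`, and their methods are illustrative names.

```python
class Receiver:
    """Hypothetical CIC receiver Pcom."""

    def __init__(self):
        self.stable_log = {}   # MSN -> payload; stands in for the forced log
        self.executions = 0    # how often the message was actually processed
        self.deliveries = 0    # how many copies arrived

    def deliver(self, msn, payload):
        self.deliveries += 1
        if msn not in self.stable_log:        # eliminate duplicates by MSN
            self.stable_log[msn] = payload    # persist the interaction before acking
            self.executions += 1
        return msn                            # stable ack, tagged with the MSN


class Sender:
    """Hypothetical CIC sender Pcom."""

    def __init__(self):
        self.next_msn = 1
        self.outbox = {}       # MSN -> payload; persisted before the send

    def send(self, receiver, payload, lost_acks=0):
        msn = self.next_msn
        self.next_msn += 1
        self.outbox[msn] = payload            # persist state before send
        attempts = 0
        while True:
            ack = receiver.deliver(msn, payload)  # (re)send with the same MSN
            attempts += 1
            if attempts > lost_acks and ack == msn:
                break                         # the ack got through
            # otherwise the ack was lost: the timeout fires and we resend
        del self.outbox[msn]                  # forget the interaction once acked
        return msn


receiver, sender = Receiver(), Sender()
msn = sender.send(receiver, "buy book", lost_acks=3)
print(msn, receiver.deliveries, receiver.executions, sender.outbox)  # 1 4 1 {}
```

Four copies arrive, but the MSN check lets the receiver process the message exactly once.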
Committed IC Activities
   Activitychart = Functional View
   Figure: activitychart CIC_AC, controlled by @CIC_SC, with subactivities CIC_SNDR_AC (@CIC_SNDR_SC) and CIC_RCVR_AC (@CIC_RCVR_SC)
    • EXTERNAL_APP_LOGIC: SNDR_TRIGGER, MSG_PROCESSED
    • FAILURE_PRONE_ENVIRONMENT: SNDR_CRASH, RCVR_CRASH, LINK_OUTAGE
    • SYSTEM_ADMINISTRATOR: ICIC, TIMEOUTS
    • Sender/receiver flows: SEND_MSG, STABLE, GET_MSG, INSTALLED
Committed IC Monitor
   Statechart = Behavioral View
    • Finite State Automaton (FSA) +
    • Nesting + Orthogonal substates +
    • E[C]/A transitions: on Event while Condition,
       leave source, enter target, execute Action
       (e.g., A = E' means generate event E')
    • Configuration = set of entered states
    • Execution context = variable valuation
       $\mathrm{Step}_i: \mathit{conf}_i \times \mathit{ctxt}_i \to \mathit{conf}_{i+1} \times \mathit{ctxt}_{i+1}$

   Figure: statechart CIC_SC with orthogonal components SNDR_S and RCVR_S
    • SNDR_S, state SENDING: (not SNDR_CRASH)[not active(CIC_SNDR_AC)]/start!(CIC_SNDR_AC)
    • RCVR_S, state RECEIVING: (not RCVR_CRASH)[not active(CIC_RCVR_AC)]/start!(CIC_RCVR_AC)
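The E[C]/A step semantics can be sketched for a flat statechart. This is a simplified, hypothetical model without nesting or orthogonal components; the transitions are a fragment of the CIC sender statechart (CIC_SNDR_SC) as we read it (retransmission on SNDR_MSG_TM, STABLE_OK leading to STABLE_S, INSTALLED_OK logging 'INSTALLED'), and all Python names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Ctx = Dict[str, str]   # execution context: variable valuation
Events = Set[str]


@dataclass
class Transition:
    source: str
    trigger: Callable[[Events], bool]     # E: event expression over current events
    action: Callable[[Ctx], Events]       # A: may assign variables, returns generated events
    target: str
    condition: Callable[[Ctx], bool] = lambda ctx: True   # C: guard over the context


def step(conf: str, ctx: Ctx, events: Events,
         transitions: List[Transition]) -> Tuple[str, Ctx, Events]:
    """One step (conf_i, ctxt_i) -> (conf_i+1, ctxt_i+1), plus the generated events."""
    for t in transitions:
        if t.source == conf and t.trigger(events) and t.condition(ctx):
            new_ctx = dict(ctx)            # leave ctxt_i untouched
            generated = t.action(new_ctx)  # leave source, enter target, execute action
            return t.target, new_ctx, generated
    return conf, ctx, set()                # no enabled transition: stay in place


def log_installed(ctx: Ctx) -> Events:
    ctx["SNDR_LAST_LOGGED"] = "INSTALLED"
    return set()


sender_fragment = [
    Transition("SENDING",
               lambda ev: "SNDR_MSG_TM" in ev and not (ev & {"STABLE_OK", "INSTALLED_OK"}),
               lambda ctx: {"SEND_MSG"}, "SENDING"),
    Transition("SENDING", lambda ev: "STABLE_OK" in ev, lambda ctx: set(), "STABLE_S"),
    Transition("SENDING", lambda ev: "INSTALLED_OK" in ev, log_installed, "INSTALLED_S"),
    Transition("STABLE_S", lambda ev: "INSTALLED_OK" in ev, log_installed, "INSTALLED_S"),
]

conf, ctx = "SENDING", {"SNDR_LAST_LOGGED": ""}
trace = []
for events in ({"SNDR_MSG_TM"}, {"STABLE_OK"}, {"INSTALLED_OK"}):
    conf, ctx, out = step(conf, ctx, events, sender_fragment)
    trace.append((conf, sorted(out)))
print(trace)  # [('SENDING', ['SEND_MSG']), ('STABLE_S', []), ('INSTALLED_S', [])]
```

The first transition shows the E[C]/A form with an action that generates an event (SEND_MSG); the last two assign a context variable instead.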
Committed IC Sender
CIC_SNDR_SC
   Figure: sender statechart with states PREPARE_PERSISTENCE, SENDING, STABLE_S, INSTALLED_S, MSG_LOOKUP, and RECOVERY
    • Triggers: SNDR_TRIGGER, SNDR_ND, SNDR_MSG_TM, SNDR_STABLE_TM, MSG_RECOVERED_TM, STABLE_OK, INSTALLED_OK, GET_MSG_OK, SNDR_CRASH
    • Actions: SEND_MSG, IS_INSTALLED, SNDR_LAST_LOGGED:='INSTALLED'
    • Guards: [SNDR_LAST_LOGGED==''], [SNDR_LAST_LOGGED=='INSTALLED']
    • Notation: EVENT_OK = EVENT ∧ ¬LINK_OUTAGE; _TM means TIMEOUT
Committed IC Receiver
CIC_RCVR_SC
   Figure: receiver statechart with states MSG_RECOVERY, MSG_RECEIVED, MSG_PROCESSED, STABLE_R, INSTALLED_R, and RECOVERY
    • Triggers: SEND_MSG_OK, GET_MSG_TM, MSG_EXEC_TM, RCVR_STABLE_TM, RCVR_ND [MSG_ORDER_MATTERS], RCVR_INSTALL_TM, ICIC, SEND_MSG or IS_INSTALLED, RCVR_CRASH
    • Actions: GET_MSG, RECEIVED, STABLE, INSTALLED, RCVR_LAST_LOGGED:='STABLE', RCVR_LAST_LOGGED:='INSTALLED'
    • Guards: [RCVR_LAST_LOGGED==''], [RCVR_LAST_LOGGED=='STABLE'], [RCVR_LAST_LOGGED=='INSTALLED'], [not ICIC and RCVR_LAST_LOGGED=='']
    • Notation: EVENT_OK = EVENT ∧ ¬LINK_OUTAGE; _TM means TIMEOUT
Execution Abstraction
   Kripke structure $K = (S, R, L)$ over $P$
    • $P$ is a finite set of atomic propositions
    • Software: $P$ is the union of all memory bits
    • $S$ is a finite set of states
    • $R \subseteq S \times S$ are the state transitions
    • $L: S \times P \to \{\mathit{true}, \mathit{false}\}$ is the valuation
    • Non-determinism to determinism: computation tree vs. sequence
      Figure: computation tree whose nodes are labeled with propositions $p, q \in P$
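Computation Tree Logic
   Basic Syntax
    • Atomic propositions: $P \subseteq \mathrm{CTL}(P)$
    • If $p, q \in \mathrm{CTL}(P)$, then so are
       - propositional logic formulas ($\neg p$, $p \vee q$, etc.)
       - path quantifiers Exists, All + modalities neXt, Until:
         $\mathrm{EX}\,p$ and $\{\mathrm{E}, \mathrm{A}\}(p\ \mathrm{U}\ q)$
   Derived Syntax
       - $\mathrm{AX}\,p \equiv \neg\mathrm{EX}\,\neg p$
       - A Finally: $\mathrm{AF}\,p \equiv \mathrm{A}(\mathit{true}\ \mathrm{U}\ p)$
       - $\mathrm{EF}\,p \equiv \mathrm{E}(\mathit{true}\ \mathrm{U}\ p)$
       - A Globally: $\mathrm{AG}\,p \equiv \neg\mathrm{E}(\mathit{true}\ \mathrm{U}\ \neg p)$
       - $\mathrm{EG}\,p \equiv \neg\mathrm{A}(\mathit{true}\ \mathrm{U}\ \neg p)$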
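CIC Verification
   Safety
    For all log values $v \in \{\text{'stable'}, \text{'installed'}\}$:
       $\mathrm{AG}\big(\mathit{written}(\mathit{log}) \wedge \mathit{log} = v \Rightarrow \mathrm{AX}\,\mathrm{AG}\,\neg(\mathit{written}(\mathit{log}) \wedge \mathit{log} = v)\big)$
    i.e., a value is written at most once
   Liveness for timeouts < 30 steps
    • $\mathrm{F}_{<n}$: eventually, after at most $n$ steps
    • $\mathrm{AF}_{<500}\,\mathrm{AG}\,\neg\mathit{failures} \Rightarrow \mathrm{AF}_{<700}\,\text{CIC installed}$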
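The bounded eventuality used for liveness can be checked by growing the set of satisfying states backwards from the goal, one step per iteration. This sketch reads $\mathrm{F}_{<n}$ as "within at most n steps", as the slide does, and uses a tiny hypothetical three-state abstraction of one CIC (state and proposition names are illustrative); it shows why the liveness property needs the premise that failures eventually stop.

```python
from typing import Dict, FrozenSet, Set


def af_within(states: Set[str], succ: Dict[str, Set[str]],
              label: Dict[str, FrozenSet[str]], p: str, n: int) -> Set[str]:
    """States satisfying AF<=n p: on every path, p holds within at most n steps."""
    sat = {s for s in states if p in label[s]}             # n = 0: p holds now
    for _ in range(n):
        # s qualifies if it has successors and all of them already qualify
        grown = sat | {s for s in states if succ[s] and succ[s] <= sat}
        if grown == sat:
            break                                          # fixpoint reached early
        sat = grown
    return sat


states = {"sending", "stable", "installed"}
label = {"sending": frozenset(), "stable": frozenset({"stable"}),
         "installed": frozenset({"installed"})}
lossy = {"sending": {"sending", "stable"},                 # a lost message loops
         "stable": {"installed"}, "installed": {"installed"}}
reliable = {"sending": {"stable"}, "stable": {"installed"},
            "installed": {"installed"}}

print("sending" in af_within(states, reliable, label, "installed", 2))  # True
print("sending" in af_within(states, reliable, label, "installed", 1))  # False
print("sending" in af_within(states, lossy, label, "installed", 700))   # False
```

With unbounded message loss, no step bound suffices; only once the lossy self-loop is excluded does the goal become reachable within a fixed number of steps.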
IC's & Web Service
   Figure: activitychart of a one-user Web service (controlled by @USER1_SC, event USER1_REQ) composed of IC instances
    • CUSTOMER: HTML_PROMPT, BUTTON_CLICKED, HTML_REPLY
    • BROWSER_INPUT (<XIC_I_AC): CLICK_CAPTURED; BROWSER_OUTPUT (<XIC_O_AC): WEBSRVR_REP_RCVD
    • CIC instances (<CIC_AC): WEBSRVR_REQ, WEBSRVR_REP, APPSRVR1_REQ, APPSRVR2_REQ, APPSRVR1_REP, APPSRVR2_REP
    • Transaction: XACT_UPDATE, XACT_COMMITTED (<TIC_AC)
    • LOCAL_FAILURES: BROWSER_CRASH, XACT_{USER, INTERNAL}_ABORT, BROWSER_WEBSRVR_LINK_OUTAGE
    • GLOBAL_FAILURES: WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE

   Web server reply's SNDR_ND = APPSRVR1_REQ_RCVD
   App server replies' RCVR_ND = WEBSRVR_ND,
    i.e., the order in which the app server replies arrive is committed

   $\mathrm{AG}\big(\mathit{websrvr\_rep{:}send\_msg} \Rightarrow \bigwedge_{i=1,2} (\mathit{appsrvr}_i{:}\mathit{rcvr\_log} = \text{'stable'} \ldots)\big)$
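Explicit Model Checking
   For $K = (S, R, L)$ over $P$, $s \in S$, $f \in \mathrm{CTL}(P)$:
    • $s \models f$, $f \in P$: $L(s, f) = \mathit{true}$
    • $s \models f$, $f = \neg f_1$: $s \not\models f_1$
    • $s \models f$, $f = f_1 \vee f_2$: $s \models f_1$ or $s \models f_2$
    • $s \models f$, $f = \mathrm{EX}\,f_1$: $\exists (s, r) \in R$ with $r \models f_1$
    • $s \models f$, $f = \mathrm{E}(f_1\ \mathrm{U}\ f_2)$:
        if $s$ already checked then false, else check:
        if $s \models f_2$ then true
        if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true
    • $s \models f$, $f = \mathrm{A}(f_1\ \mathrm{U}\ f_2)$:
        if $s$ already checked then false, else check:
        if $s \models f_2$ then true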
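The recursive procedure above translates into a small explicit-state checker. This hypothetical sketch covers true, atomic propositions, negation, disjunction, EX, EU, and AU; formulas are encoded as tuples of our own choosing. As a deliberate refinement of "already checked then false", it memoizes finished results and returns false only for states on the current search path, which keeps AU correct when two branches share a successor.

```python
from typing import Dict, FrozenSet, Set

# Formulas as tuples: ("true",), ("ap", p), ("not", f), ("or", f, g),
# ("EX", f), ("EU", f, g) for E(f U g), ("AU", f, g) for A(f U g)


class Kripke:
    def __init__(self, succ: Dict[str, Set[str]], label: Dict[str, FrozenSet[str]]):
        self.succ = succ    # R as a successor map, assumed total
        self.label = label  # L: atomic propositions that are true in each state

    def sat(self, s: str, f: tuple) -> bool:
        op = f[0]
        if op == "true":
            return True
        if op == "ap":
            return f[1] in self.label[s]
        if op == "not":
            return not self.sat(s, f[1])
        if op == "or":
            return self.sat(s, f[1]) or self.sat(s, f[2])
        if op == "EX":
            return any(self.sat(r, f[1]) for r in self.succ[s])
        if op in ("EU", "AU"):
            return self._until(s, f, {}, set())
        raise ValueError(f"unknown operator {op!r}")

    def _until(self, s: str, f: tuple, memo: Dict[str, bool], on_path: Set[str]) -> bool:
        if s in memo:
            return memo[s]
        if s in on_path:     # state already being checked: a cycle that has not
            return False     # reached f2 cannot witness the Until
        op, f1, f2 = f
        if self.sat(s, f2):
            result = True
        elif not self.sat(s, f1):
            result = False
        else:
            on_path.add(s)
            quantifier = any if op == "EU" else all   # E: some successor, A: all
            result = quantifier(self._until(r, f, memo, on_path) for r in self.succ[s])
            on_path.discard(s)
        memo[s] = result
        return result


label = {"sending": frozenset(), "stable": frozenset({"stable"}),
         "installed": frozenset({"installed"})}
lossy = {"sending": {"sending", "stable"}, "stable": {"installed"},
         "installed": {"installed"}}
reliable = {"sending": {"stable"}, "stable": {"installed"}, "installed": {"installed"}}

AF_installed = ("AU", ("true",), ("ap", "installed"))   # AF p = A(true U p)
EF_installed = ("EU", ("true",), ("ap", "installed"))   # EF p = E(true U p)
k_lossy, k_reliable = Kripke(lossy, label), Kripke(reliable, label)
print(k_lossy.sat("sending", EF_installed), k_lossy.sat("sending", AF_installed))  # True False
print(k_reliable.sat("sending", AF_installed))  # True
```

The derived operators from the CTL slide (AX, AG, EG) can be expressed with "not" around these basic cases.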
Verification Run-Times

   Property/Specification   Timeout Type       OBDD Size   Verification Time
   IC-level safety          Integer            ~10⁴        ~5 sec.
                            Nondeterministic   ~10³        ~1 sec.
   IC-level liveness        Integer            ~10⁶        ~10 hours
                            Nondeterministic   ~10⁵        ~10 hours
   1-user WS safety         Integer            ~10⁷        Not terminated
                            Nondeterministic   ~10⁶        ~10 hours
PHP and Zend Engine
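   Figure: several Web clients access a Web server running the PHP script below; the script calls another server via CURL. Each server runs the Zend Engine with the Session and CURL modules. Sample output: "Script called 5 times", "Other server reports: Script called 1000 times".

<?php
    session_start();   // must run before any output so the session cookie can be sent
    if (!isset($_SESSION["count"])) {
        $_SESSION["count"] = 0;   // first request of this session
    }
    $_SESSION["count"]++;
?>
<html>
<?php
    printf("Script called %d times", $_SESSION["count"]);

    $ch = curl_init("http://eos-php.net/b2b.php");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the reply instead of printing it
    $b2b_reply = curl_exec($ch);
    printf("Other server reports: %s", $b2b_reply);
    curl_close($ch);
?>
</html>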
EOS
   Exactly-once semantics with
    • Transparent browser recovery
    • Concurrent accesses to shared data
    • Nondeterministic functions: time, curl_exec, rand
    • Any n in n-tier, any fanout
    • Failure masking without changes to application code:
      neither to PHP scripts nor to the browser
   Performance enhancements (side effects)
    • Log structured data access (sequential I/O)
    • LRU buffers for state and log data
    • Latches (Shared/Exclusive)
    • session_start(bool $read_only)
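An LRU buffer of the kind listed above can be sketched with `collections.OrderedDict`. This is a hypothetical illustration, not the EOS-PHP implementation: pages are plain values keyed by an id, a dict stands in for the on-disk store, and dirty pages are written back only when they are evicted.

```python
from collections import OrderedDict


class LRUBuffer:
    """Hypothetical LRU buffer; dirty pages are written back on eviction."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict standing in for the on-disk store
        self.pages = OrderedDict()    # page id -> (value, dirty), oldest first
        self.reads = 0                # disk reads caused by buffer misses

    def get(self, pid):
        if pid in self.pages:
            self.pages.move_to_end(pid)           # hit: now most recently used
            return self.pages[pid][0]
        self.reads += 1                           # miss: fetch from disk
        value = self.disk[pid]
        self._install(pid, value, dirty=False)
        return value

    def put(self, pid, value):
        self._install(pid, value, dirty=True)     # update in the buffer only

    def _install(self, pid, value, dirty):
        self.pages[pid] = (value, dirty)
        self.pages.move_to_end(pid)
        while len(self.pages) > self.capacity:
            victim, (old, was_dirty) = self.pages.popitem(last=False)  # LRU page
            if was_dirty:
                self.disk[victim] = old           # write back before dropping


disk = {"a": 1, "b": 2, "c": 3}
buf = LRUBuffer(2, disk)
buf.get("a")
buf.get("b")
buf.put("a", 10)   # "a" is dirty and most recently used
buf.get("c")       # evicts clean "b"
buf.get("a")       # hit, returns 10
buf.get("b")       # evicts clean "c"
buf.get("c")       # evicts dirty "a", which is written back
print(buf.reads, disk["a"], list(buf.pages))  # 5 10 ['b', 'c']
```

Deferring writes until eviction batches them, which fits the slide's goal of mostly sequential, log-structured disk access.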
Experiment Setup
    eBay-like auction service
    User settings at frontend (private)
    Auction items at backend (shared)
    5 concurrent end users, synthetic load
   Figure: Web Client → POST (ICIC) action=increment → Frontend Server (P4 3 GHz, 1 GB)
           → POST (ICIC) action=increment, b2b=true → Backend Server (P4 3 GHz, 1 GB)
    • The frontend keeps a private count per user; the backend keeps the shared count (1234 → 1235)
    • Page returned to the client: "Private Count: 3", "Shared Count: 1235"
Run-Time Overhead
   Session length                     1 step    5 steps   10 steps
   PHP elapsed time [sec]             0.1560    0.7900    1.6100
   EOS-PHP elapsed time [sec]         0.3140    1.6850    3.1000
   Overhead (elapsed time) [%]        101       113       93
   PHP frontend CPU time [sec]        0.0390    0.2708    0.5727
   EOS-PHP frontend CPU time [sec]    0.0815    0.6000    1.1545
   Overhead (frontend CPU) [%]        109       122       102
   PHP backend CPU time [sec]         0.0090    0.0550    0.1200
   EOS-PHP backend CPU time [sec]     0.0130    0.0750    0.1600
   Overhead (backend CPU) [%]         44        36        33
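The overhead rows follow from the time rows as (EOS-PHP - PHP) / PHP. The short check below recomputes them from the table values; it is only arithmetic on the published numbers, and `overhead_pct` is an illustrative helper name.

```python
def overhead_pct(php: float, eos: float) -> int:
    """Relative overhead of EOS-PHP over plain PHP, in whole percent."""
    return round((eos - php) / php * 100)


# Measurements from the table for sessions of 1, 5, and 10 steps
measurements = {
    "elapsed":      ([0.1560, 0.7900, 1.6100], [0.3140, 1.6850, 3.1000]),
    "frontend CPU": ([0.0390, 0.2708, 0.5727], [0.0815, 0.6000, 1.1545]),
    "backend CPU":  ([0.0090, 0.0550, 0.1200], [0.0130, 0.0750, 0.1600]),
}

overheads = {name: [overhead_pct(p, e) for p, e in zip(php, eos)]
             for name, (php, eos) in measurements.items()}
print(overheads)
# {'elapsed': [101, 113, 93], 'frontend CPU': [109, 122, 102], 'backend CPU': [44, 36, 33]}
```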
Summary
   Generic IC framework specification
   Formal verification at IC and app level
    • To do: overcome the limited scalability of model checking
   Efficient implementation: EOS
    • Rigorous recovery guarantees
       - based on the formally verified models
    • Many enhancements to PHP
       - LRU buffer management
       - Mostly sequential disk accesses
       - Concurrency control with latches
EOS Demo
   Figure: USER 1 → (B2C_LINK) → Frontend Server → (B2B_LINK) → Backend Server
Thank You!
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
 
Logging Last Resource Optimization for Distributed Transactions in Oracle…
Logging Last Resource Optimization for Distributed Transactions in  Oracle…Logging Last Resource Optimization for Distributed Transactions in  Oracle…
Logging Last Resource Optimization for Distributed Transactions in Oracle…Gera Shegalov
 
Transaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal DatabasesTransaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal DatabasesGera Shegalov
 
Unstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web ServicesUnstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web ServicesGera Shegalov
 

More from Gera Shegalov (6)

#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
 
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
 
Logging Last Resource Optimization for Distributed Transactions in Oracle…
Logging Last Resource Optimization for Distributed Transactions in  Oracle…Logging Last Resource Optimization for Distributed Transactions in  Oracle…
Logging Last Resource Optimization for Distributed Transactions in Oracle…
 
Transaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal DatabasesTransaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal Databases
 
Unstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web ServicesUnstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web Services
 

Integrated Data Recovery for Web Services

  • 1. Max Planck Institute for Informatics, AG5: Databases and Information Systems. Integrated Data, Process, and Message Recovery for Failure Masking in Web Services. Doctoral Thesis Colloquium, German Shegalov. Saarbrücken, Aug. 26th, 2005
  • 2. Outline: Problem Statement and Background; Interaction Contracts Framework (Formal Specification of the Committed IC, Verification of ICs with model checking, Verification of the Web Service IC Model); Implementation: Exactly-Once Web Service (EOS) (Overview, EOS-PHP, Demo); Summary
  • 3. Problem Statement: Non-idempotence (math): $f^n(x) \neq f(x)$ for $n > 1$. Non-idempotence (Web, ERP, etc.): "request timeout" does not imply "request failure"; "request resend" is not the same as "request send"; 8 Medicare cards for a 3-member family; order one, get many, pay many.
  • 4. Transaction Recovery
      Accounts (LSN=0)      Accounts (LSN=3)
      Number  Balance       Number  Balance
      1       1000,00       1       900,00
      2       2000,00       2       2100,00
    At-most-once semantics:
      BEGIN TRANSACTION;
      /* LSN = 1: log for undo and redo in main-memory (MM) buffer */
      UPDATE Accounts SET balance = balance - 100.00 WHERE Number = 1;
      /* LSN = 2: log for undo and redo in main-memory (MM) buffer */
      UPDATE Accounts SET balance = balance + 100.00 WHERE Number = 2;
      /* LSN = 3: log commit and force (5-6 orders of magnitude slower) */
      COMMIT TRANSACTION;
    Redo committed, undo uncommitted: the LSN test guarantees idempotence.
  • 5. However, … transactions alone are not a panacea! [Sequence diagram over a timeline between Web Client, Web Application Server, and Database Server; labels: Purchase Request, Start Transaction, SQL Request, SQL Response, Commit Transaction, ACK, Order Confirmation, Transaction Restart, Resubmission, "Non-idempotent execution!"]
  • 6. Real-World n-Tier App: Don't panic! Peer-to-peer apps may be even worse. [Diagram: Client, Web Server, Expedia, Sabre, and Amadeus servers and app servers, databases DB1 to DB4]
  • 7. Outline (same as slide 2)
  • 8. IC Framework: Components and guarantees: persistent component (Pcom): persistent, testable state and messages; external component (Xcom, e.g., humans): no guarantees. Interaction contracts: Xcom ↔ Pcom = External IC (XIC); Pcom ↔ Pcom = Committed IC (CIC). Exactly-once semantics: forget rollbacks, exactly-once execution is guaranteed.
  • 9. Pcom Design: Redo log and recovery managers; piecewise determinism + logging = full determinism; deterministic replay recovers Pcoms; installation points speed up replay. Failure model: crashes and message losses (transient failures due to nondeterministic Heisenbugs); no malicious manipulations; no disk corruption (sufficient redundancy).
  • 10. CIC's Informal Design: CIC sender (Pcom) obligations: persist state before send; tag the message with an MSN; resend on timeout until stable ack; resend on the receiver's "get msg"; forget the interaction on installed ack. CIC receiver (Pcom) obligations: eliminate duplicates by MSNs; persist the interaction before stable ack; "get msg" if the message is not in the log after a failure; ensure autonomous recovery before installed ack.
  • 11. Committed IC Activities: Activitychart = functional view. [Activity chart CIC_AC with control activity @CIC_SC and subactivities CIC_SNDR_AC (@CIC_SNDR_SC) and CIC_RCVR_AC (@CIC_RCVR_SC); external activities EXTERNAL_APP_LOGIC, FAILURE_PRONE_ENVIRONMENT, SYSTEM_ADMINISTRATOR; flows SNDR_TRIGGER, MSG_PROCESSED, SNDR_CRASH, RCVR_CRASH, LINK_OUTAGE, TIMEOUTS, ICIC, SEND_MSG, GET_MSG, STABLE, INSTALLED]
  • 12. Committed IC Monitor: Statechart = behavioral view: finite state automaton (FSA) + nesting + orthogonal substates + E[C]/A transitions: on event E while condition C, leave the source, enter the target, and execute action A (e.g., A = E' means generate event E'). Configuration = set of entered states; execution context = variable valuation; $\mathit{Step}_i: \mathit{conf}_i \times \mathit{ctxt}_i \rightarrow \mathit{conf}_{i+1} \times \mathit{ctxt}_{i+1}$. [Statechart CIC_SC with orthogonal states SNDR_S and RCVR_S and substates SENDING and RECEIVING; transition labels (not SNDR_CRASH) [not active(CIC_SNDR_AC)]/start!(CIC_SNDR_AC) and (not RCVR_CRASH) [not active(CIC_RCVR_AC)]/start!(CIC_RCVR_AC)]
  • 13. Committed IC Sender: [Statechart CIC_SNDR_SC: states RECOVERY, PREPARE_PERSISTENCE, SENDING, MSG_LOOKUP, STABLE_S, IS_INSTALLED, INSTALLED_S; log variable SNDR_LAST_LOGGED; SNDR_CRASH leads to the termination connector T.] Legend: EVENT_OK means EVENT without a simultaneous LINK_OUTAGE; the suffix _TM means TIMEOUT.
  • 14. Committed IC Receiver: [Statechart CIC_RCVR_SC: states RECOVERY, MSG_RECOVERY, MSG_RECEIVED, STABLE_R, INSTALLED_R; log variable RCVR_LAST_LOGGED with values '', 'STABLE', 'INSTALLED'; RCVR_CRASH leads to the termination connector T.] Legend: EVENT_OK means EVENT without a simultaneous LINK_OUTAGE; the suffix _TM means TIMEOUT.
  • 15. Execution Abstraction: Kripke structure $K = (S, R, L)$ over $P$: $P$ is a finite set of atomic propositions (in software, $P$ is the union of all memory bits); $S$ is a finite set of states; $R \subseteq S \times S$ are the state transitions; $L: S \times P \rightarrow \{true, false\}$ is the valuation. Non-determinism to determinism. [Figure: computation tree vs. sequence, with states labeled by $p, q \in P$]
  • 16. Computation Tree Logic: Basic syntax: atomic propositions $P \subseteq CTL(P)$; if $p, q \in CTL(P)$, then so are propositional logic formulas ($\neg p$, $p \vee q$, etc.) and path quantifiers Exists, All combined with the modalities neXt and Until: $EX\,p$, $\{E, A\}(p\;U\;q)$. Derived syntax: $AX\,p \equiv \neg EX\,\neg p$; A Finally: $AF\,p \equiv A(true\;U\;p)$; $EF\,p \equiv E(true\;U\;p)$; A Globally: $AG\,p \equiv \neg E(true\;U\;\neg p)$; $EG\,p \equiv \neg A(true\;U\;\neg p)$.
  • 17. CIC Verification: Safety: for all log values $v \in \{\text{'stable'}, \text{'installed'}\}$: $AG\,((written(log) \wedge log = v) \rightarrow AX\,AG\,\neg(written(log) \wedge log = v))$, i.e., a value is written at most once. Liveness for timeouts < 30 steps, where $F_{<n}$ means eventually after at most $n$ steps: $AF_{<500}\,AG\,\neg failures \rightarrow AF_{<700}\,\text{CIC installed}$.
  • 18. IC's & Web Service: [Activity chart: CUSTOMER (@USER1_SC), browser (XIC_I_AC, XIC_O_AC), Web server, two application servers, and a database server; browser and Web server as well as Web server and application servers are coupled by CIC instances (CIC_AC), one application server and the database by a TIC instance (TIC_AC: XACT_UPDATE, XACT_COMMITTED)]. The Web server reply's SNDR_ND and the application server replies' RCVR_ND are both WEBSRVR_ND, i.e., the Web server reply commits the order of the application server replies. Failure events (LOCAL_FAILURES, GLOBAL_FAILURES): BROWSER_CRASH, WEBSERVER_CRASH, APPSERVER{1;2}_CRASH, DBSRVR_CRASH, XACT_{USER, INTERNAL}_ABORT, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE, BROWSER_WEBSRVR_LINK_OUTAGE. Verified property: $AG\,(websrvr\_rep{:}send\_msg \rightarrow \bigwedge_{i=1,2} appsrvr_i{:}rcvr\_log = \text{'stable'})$
  • 19. Explicit Model Checking: For $K = (S, R, L)$ over $P$, $s \in S$, $f \in CTL(P)$: if $f \in P$: $s \models f \iff L(s, f) = true$; if $f = \neg f_1$: $s \models f \iff s \not\models f_1$; if $f = f_1 \vee f_2$: $s \models f \iff s \models f_1$ or $s \models f_2$; if $f = EX\,f_1$: $s \models f \iff \exists (s, r) \in R$ with $r \models f_1$; if $f = E(f_1\,U\,f_2)$: if $s$ was already checked then false, else check: if $s \models f_2$ then true; if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true; if $f = A(f_1\,U\,f_2)$: if $s$ was already checked then false, else check: if $s \models f_2$ then true.
  • 20. Verification Run-Times:
      Verification property   Timeout type       OBDD size   Time
      IC-level safety         Integer            ~10^4       ~5 seconds
      IC-level safety         Nondeterministic   ~10^3       ~1 second
      IC-level liveness       Integer            ~10^6       ~10 hours
      IC-level liveness       Nondeterministic   ~10^5       ~10 hours
      1-user WS safety        Integer            ~10^7       not terminated
      1-user WS safety        Nondeterministic   ~10^6       ~10 hours
  • 21. Outline (same as slide 2)
  • 22. PHP and Zend Engine: [Figure: several Web clients, each served by a Zend Engine with Session and CURL modules; the page shows "Script called 5 times" and "Other server reports: Script called 1000 times"]. Example script:
<?php
session_start();  // must run before any output, so it precedes <html>
?>
<html>
<?php
$_SESSION["count"] = isset($_SESSION["count"]) ? $_SESSION["count"] + 1 : 1;
printf("Script called %d times", $_SESSION["count"]);
$ch = curl_init("http://eos-php.net/b2b.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the reply instead of printing it
$b2b_reply = curl_exec($ch);
printf("Other server reports: %s", $b2b_reply);
curl_close($ch);
?>
</html>
  • 23. EOS: Exactly-once semantics with transparent browser recovery; concurrent accesses to shared data; nondeterministic functions (time, curl_exec, rand); any n in n-tier, any fanout; failure masking with no changes to application code, neither to PHP scripts nor to the browser. Performance enhancements (side effects): log-structured data access (sequential I/O); LRU buffers for state and log data; latches (shared/exclusive); session_start(bool $read_only).
  • 24. Experiment Setup: eBay-like auction service; user settings at the frontend (private); auction items at the backend (shared); 5 concurrent end users, synthetic load. [Figure: Web client, Frontend Server (P4 3 GHz, 1 GB), and Backend Server (P4 3 GHz, 1 GB); requests POST (ICIC) action=increment to the frontend and POST (ICIC) action=increment, b2b=true to the backend; private counts per user at the frontend, one shared count at the backend; sample reply page showing "Private Count: 3" and "Shared Count: 1235"]
  • 25. Run-Time Overhead (setup as on slide 24):
      Session                              1 step    5 steps   10 steps
      PHP elapsed time [sec]               0.1560    0.7900    1.6100
      EOS-PHP elapsed time [sec]           0.3140    1.6850    3.1000
      Overhead (elapsed time) [%]          101       113       93
      PHP frontend CPU time [sec]          0.0390    0.2708    0.5727
      EOS-PHP frontend CPU time [sec]      0.0815    0.6000    1.1545
      Overhead (frontend CPU) [%]          109       122       102
      PHP backend CPU time [sec]           0.0090    0.0550    0.1200
      EOS-PHP backend CPU time [sec]       0.0130    0.0750    0.1600
      Overhead (backend CPU) [%]           44        36        33
  • 26. Outline (same as slide 2)
  • 27. Summary: Generic IC framework specification. Formal verification at IC and application level (to do: overcome the non-scalability of model checking). Efficient implementation: EOS, with rigorous recovery guarantees based on the formally verified models and many enhancements to PHP: LRU buffer management, mostly sequential disk accesses, concurrency control with latches.
  • 28. EOS Demo: [Diagram: USER 1, Frontend Server, Backend Server, connected by B2C_LINK and B2B_LINK]
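The LSN test from slide 4 and note 4 can be illustrated with a short Python sketch. It assumes a simplified, redo-only setting: the transaction is already known to be committed, each account lives on its own page, and log records carry the logical delta of the slide's UPDATE statements. `LogRecord`, `Page`, and `redo` are hypothetical names, not any database system's API. Because a logged operation is applied only when the page LSN is smaller than the record's LSN, running recovery twice leaves the balances unchanged, even though `balance = balance - 100` itself is not idempotent.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int        # log sequence number, increasing with every logged operation
    account: int    # account (one page per account, for simplicity)
    delta: float    # non-idempotent update: balance = balance + delta

@dataclass
class Page:
    balance: float
    page_lsn: int = 0   # LSN of the most recent operation reflected on this page

def redo(pages, log):
    """Redo the logged operations of a committed transaction; safe to repeat."""
    for rec in log:
        page = pages[rec.account]
        if page.page_lsn < rec.lsn:     # LSN test: operation not yet on the page
            page.balance += rec.delta
            page.page_lsn = rec.lsn     # stamp the page with the operation's LSN

# Committed transfer of 100.00 from account 1 to account 2 (commit record: LSN 3).
log = [LogRecord(lsn=1, account=1, delta=-100.0),
       LogRecord(lsn=2, account=2, delta=+100.0)]

# Before the crash, only the page of account 1 had been flushed after LSN 1.
pages = {1: Page(balance=900.0, page_lsn=1), 2: Page(balance=2000.0, page_lsn=0)}

redo(pages, log)
redo(pages, log)   # the server crashed during recovery, so redo runs again
print(pages[1].balance, pages[2].balance)   # 900.0 2100.0
```

Without the `page_lsn` comparison, the first `redo` call would already debit account 1 a second time, and every further call would transfer another 100.00, which is the non-idempotence problem of slide 3.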
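The sender and receiver obligations of slide 10 (note 9) can be simulated in a few lines. This is a hypothetical, single-process Python sketch: the lossy link is a seeded random number generator, log forcing is an append to a Python list, and `CicReceiver`, `simulate`, and the loss and install rates are illustrative assumptions, not part of EOS. It shows that resending on timeout combined with MSN-based duplicate elimination executes the request exactly once, and that each log value is written at most once, the safety property stated on slide 17.

```python
import random

class CicReceiver:
    """CIC receiver: duplicate elimination by MSN, persist before acknowledging."""

    def __init__(self):
        self.log = []          # stands in for the forced (stable) log
        self.status = {}       # MSN -> "stable" | "installed"
        self.executions = 0    # how often the request was actually executed

    def receive(self, msn, payload):
        """SEND_MSG: execute and persist the first copy; re-acknowledge duplicates."""
        if msn not in self.status:
            self.executions += 1
            self.log.append((msn, "stable"))     # persist before the stable ack
            self.status[msn] = "stable"
        return self.status[msn]

    def install(self, msn):
        """E.g., after an installation point: recovery no longer needs the sender."""
        if self.status.get(msn) == "stable":
            self.log.append((msn, "installed"))
            self.status[msn] = "installed"

    def is_installed(self, msn):
        """IS_INSTALLED query from the sender: report the current status."""
        return self.status.get(msn)


def simulate(loss_rate=0.5, install_rate=0.3, seed=1, max_timeouts=10_000):
    """Run one CIC interaction over a lossy link; each loop iteration is a timeout."""
    rng = random.Random(seed)

    def link_ok():                        # models LINK_OUTAGE with probability loss_rate
        return rng.random() >= loss_rate

    msn, payload = 1, "order one item"
    sender_log = [(msn, payload)]         # sender persists state and message before sending
    receiver = CicReceiver()
    stable, sends = False, 0
    for _ in range(max_timeouts):
        if not stable:                    # resend the MSN-tagged message until stable ack
            sends += 1
            ack = receiver.receive(msn, payload) if link_ok() else None
        else:                             # afterwards only ask whether it is installed
            ack = receiver.is_installed(msn) if link_ok() else None
        if rng.random() < install_rate:
            receiver.install(msn)         # receiver installs at some later point
        if ack is None or not link_ok():  # request or acknowledgement was lost
            continue
        stable = True
        if ack == "installed":
            sender_log.clear()            # installed ack: the sender forgets the interaction
            break
    return receiver, sends, sender_log


receiver, sends, sender_log = simulate()
print(sends, receiver.executions, receiver.log)   # sends may exceed 1; executions stays 1
```

Duplicates arise whenever an acknowledgement is lost; the receiver then re-acknowledges without executing again, which is exactly what the Web server on slide 5 fails to do.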
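Slide 19 outlines the explicit, recursive CTL check. For illustration, the Python sketch below uses the standard fixpoint labeling formulation instead: it computes the set of states satisfying each subformula bottom-up, which yields the same answers as a correct recursive check but is not a transcription of the slide. The tuple encoding of formulas and the names `sat`, `implies`, and the toy structures are assumptions; the derived operators follow the identities on slide 16, and, as usual for Kripke structures, every state needs at least one successor. The last lines apply a slide-17-style safety formula to a toy log that writes a value once and to a faulty variant that writes it again.

```python
def sat(states, succ, label, f):
    """Return the set of states of a Kripke structure satisfying CTL formula f.

    Formulas are nested tuples: "true", an atomic proposition (a string),
    ("not", f), ("or", f, g), ("and", f, g), ("EX", f), ("EU", f, g), ("AU", f, g).
    succ[s] is the set of successors of s and must be non-empty for every s.
    """
    if f == "true":
        return set(states)
    if isinstance(f, str):
        return {s for s in states if f in label[s]}
    op, *args = f
    sub = [sat(states, succ, label, g) for g in args]
    if op == "not":
        return set(states) - sub[0]
    if op == "or":
        return sub[0] | sub[1]
    if op == "and":
        return sub[0] & sub[1]
    if op == "EX":
        return {s for s in states if succ[s] & sub[0]}
    if op in ("EU", "AU"):
        z = set(sub[1])                   # states where f2 already holds
        changed = True
        while changed:                    # least fixpoint iteration
            changed = False
            for s in sub[0] - z:          # f1 holds, f2 not yet established
                if (op == "EU" and succ[s] & z) or (op == "AU" and succ[s] <= z):
                    z.add(s)
                    changed = True
        return z
    raise ValueError(f"unknown operator {op!r}")

# Derived operators, following the identities on slide 16.
def AX(p): return ("not", ("EX", ("not", p)))
def EF(p): return ("EU", "true", p)
def AF(p): return ("AU", "true", p)
def AG(p): return ("not", EF(("not", p)))
def EG(p): return ("not", AF(("not", p)))
def implies(p, q): return ("or", ("not", p), q)

# Small Kripke structure: 0 -> {1, 2}, 1 -> {1}, 2 -> {2}.
states = {0, 1, 2}
succ = {0: {1, 2}, 1: {1}, 2: {2}}
label = {0: {"p"}, 1: {"q"}, 2: {"p"}}
print(sat(states, succ, label, EF("q")), sat(states, succ, label, AF("q")))  # {0, 1} {1}

# Safety in the style of slide 17: once written, a log value is never written again.
w = "written_stable"
log_states = {0, 1, 2}
once = {0: {1}, 1: {2}, 2: {2}}       # the value is written once, then the log stays
twice = {0: {1}, 1: {2}, 2: {1}}      # faulty variant that writes the value again
log_label = {0: set(), 1: {w}, 2: set()}
safety = AG(implies(w, AX(AG(("not", w)))))
print(sat(log_states, once, log_label, safety), sat(log_states, twice, log_label, safety))
# {0, 1, 2} set()
```

This naive iteration may revisit states several times; a worklist over predecessors gives the usual bound that is linear in the number of states and transitions.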
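Slide 25 does not state how its overhead rows are computed; the relative increase of EOS-PHP over plain PHP reproduces all nine of them, for example for the one-step elapsed time and the one-step backend CPU time:

```latex
\text{Overhead} = \frac{t_{\text{EOS-PHP}} - t_{\text{PHP}}}{t_{\text{PHP}}},
\qquad
\frac{0.3140 - 0.1560}{0.1560} \approx 1.01 = 101\,\%,
\qquad
\frac{0.0130 - 0.0090}{0.0090} \approx 0.44 = 44\,\%
```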

Editor's Notes

  1. Welcome to my colloquium. Today I present the research results of my dissertation, entitled "Integrated Data, Process, and Message Recovery for Failure Masking in Web Services".
  2. My presentation consists of the following points. I will state the problem of providing recovery guarantees for multi-tier applications. Then I will introduce our solution, a family of recovery protocols coined the "interaction contracts framework". I show you a generic state-and-activity chart specification of the committed IC that is easily adaptable to a concrete application scenario. First we verify a single instance of the generic specification. Then we prove that it also behaves correctly in a composed Web Service model that uses IC instances as building blocks. In the second part of my talk I present EOS, a prototype system I have built to demonstrate the viability of the IC framework for Web services. It enables failure masking in arbitrarily distributed Web applications written in the PHP programming language. Beyond that, it provides recovery guarantees for the end user by incorporating the IC functionality into the Web browser, specifically Microsoft Internet Explorer. I conclude the talk with a short summary.
  3. The problem of doing business over the Internet, or with a distributed application infrastructure in general, can be characterized by the term "non-idempotence". The mathematical definition of this term is rather simple: applying a function once and applying it several times do not yield the same result. With a distributed information system, developers and users need to realize that a request timeout may simply result from high delays during a peak load of the system rather than from a failure. Users have learned that hitting the refresh or submit button several times is tempting but leads to unexpected results. For instance, a friend of mine applied for new health insurance and got 8 smart cards for his 3-member family. Ordering one item and getting many does not always sound like a bad deal, unless you have to pay for all of them.
  4. The traditional approach to doing business in a failure-prone environment manages the application state in a transactional database. Suppose we have a banking application with accounts stored in a relational table that maps account numbers to balances. The transaction shown on this slide transfers 100 euros from account 1 to account 2, as indicated by the two SQL statements. Declaring this operation sequence as a transaction, using begin and commit statements, guarantees that the sequence is executed atomically: either completely or not at all. A situation where account 1 simply loses 100 euros is not possible, even if the transaction is interrupted in the middle. To achieve this, each operation is logged ahead. The log entry contains the log sequence number (LSN) and the information on how to undo and redo the operation. Logging is initially done in main memory. However, on transaction commit all log entries have to be written to disk synchronously, which is 5 to 6 orders of magnitude slower. This operation is called log forcing. After a failure, the log on disk is analyzed: the operations of committed transactions are redone, whereas transactions without a commit log entry are undone. Since the database server may fail several times before recovery completes, we need to make sure that undo and redo operations are not applied more than once. This is achieved by stamping each disk page with the LSN of the most recent operation it reflects: a logged operation is redone only if the page LSN is smaller than the operation's LSN. This simple LSN test guarantees recovery idempotence.
  5. Consider now a 3-tier Web application where an end user submits a purchase request to the Web application server. A transaction is started on the database server on behalf of the user. Assume that the database successfully commits the transaction, but the acknowledgement message does not reach the Web application server, due to either a database server crash or a network failure. Developers usually handle this failure by retrying the transaction because they assume that it has been aborted, which, as we have seen, is not necessarily true. Unfortunately, this is not the end of the story. How is the end user supposed to react to the server timeout message? People love hitting the browser's refresh button, and I know some of them are in this room. It is a very bad idea because Web servers normally do not eliminate duplicates. The bottom line is that recovery needs to treat messages as well as states to ensure correct execution.
  6. If even that simple 3-tier system was complicated, how long does it take to analyze all possible failure combinations and their implications in a system with 10 components spread over 4 tiers? And what about ad-hoc interactions in a peer-to-peer network?
  7. These problems motivated the IC framework. It considers applications as sets of components that exchange messages. In this talk we concentrate on persistent components (Pcoms). They can recreate their state and messages after a failure and can determine whether they have already executed a particular message. Another relevant component type, external components (Xcoms), covers end users and conventional components outside the IC framework. Interaction contracts define how components must exchange messages to keep their interactions recoverable. In this talk we cover the Committed IC (CIC), as it is the most important IC in the framework. The main design goal is exactly-once semantics: once an interaction has started, it is executed exactly once, and all failures are masked.
  8. To provide recovery guarantees, all Pcoms, such as client and server components, need to be equipped with logging and recovery capabilities. Unlike database systems, we neither want nor need undo. Components are piecewise deterministic: they execute deterministically between two consecutive nondeterministic events, such as incoming messages from other components or reading the system clock. So, logging the nondeterministic events turns piecewise-deterministic components into fully deterministic ones. We can recreate a Pcom's state and messages simply by replaying the log from some initial state. To accelerate deterministic replay, the component needs to truncate the log on a regular basis. Before doing so, it has to dump its current state to disk. We call such state dumps "installation points". Our failure model includes crashes of the sending and receiving components as well as network failures causing message losses. Such transient failures are due to nondeterministic so-called Heisenbugs, which are practically impossible to reproduce and therefore to eliminate. We do not consider malicious manipulations, so-called commission failures. And we do not deal with corruption of stable storage, as this can be avoided by sufficient replication.
  9. The CIC can be informally described as follows. By sending a message to another component, the CIC sender commits its state: usually, it forces the log to disk to make its state and the message recoverable. The sender deterministically tags its message with a unique id, the message sequence number (MSN). The sender keeps resending the message periodically until it gets a stable notification from the receiver. It keeps the message afterwards because the receiver may request it again after a failure. The sender is released from all of its obligations when it gets an installed notification from the receiver. The CIC receiver eliminates message duplicates based on their MSNs. It persists an interaction before sending a stable notification to the sender; normally, this is done by logging the message header and forcing the log. After a failure, the receiver requests the original message from the sender if its log contains only the message header. The receiver ensures its autonomous recovery by forcing the complete message to disk or creating an installation point before sending an installed notification to the sender.
  10. We use the state-and-activity chart language to formally specify the interaction contracts. This language is supported by Statemate, a leading tool for specifying reactive systems. The specification process begins with an activity chart providing the functional view of the system. Internal activities are represented by solid-line boxes. Dashed-line boxes specify external activities: an execution environment and external applications. The arrows represent the data flow, and their labels indicate which data or events are concerned. In this concrete scenario we specify an activity ensuring that a message is passed from one CIC component to another according to the CIC rules in a failure-prone environment that nondeterministically supplies failure events (crashes and link outages). All the application needs to know is that it should activate the "sender trigger" and await the event "message processed". This is important, please memorize it. The system administrator specifies timeout values suitable for the given application, along with some other options. A manager may stop reading the specification at this stage. Activities are hierarchical and allow for stepwise refinement. At the next level, we see that the behavior of the CIC activity is actually controlled by a so-called control activity, CIC_SC (SC stands for statechart), depicted as a green rounded box, and that it has two further subactivities, CIC_SNDR_AC and CIC_RCVR_AC, which exchange the messages and notifications I have described informally before. The behaviors of these subactivities are defined by the corresponding control activities.
  11. A control activity is defined by a statechart. A statechart is basically a finite state automaton with some additional features. First, again we have nested states. Dashed lines separate so-called orthogonal components, which represent processes running concurrently; in this case, the orthogonal components are the sender (SNDR) and the receiver (RCVR). The system is initialized by entering states through a default transition, a transition without a source state. A state targeted by a default transition is called a default substate. When a state is entered, its orthogonal substates and its default substate are entered within the very same step. Regular transitions are labeled with event-condition-action rules. A transition is taken if the event was generated in the previous step and the condition holds. When the transition is taken, the action is executed. The action might be as simple as generating an event or starting an activity, as in this example, or a complex branching or loop statement. The only purpose of this statechart is to restart the sender and receiver activities after a crash. The condition "not active" prevents the system from starting duplicate activity instances while the original one is still running. The set of entered states is called a configuration, and the current variable valuation defines the execution context of the system. Based on the current configuration and execution context, the system performs a step by computing a new configuration and execution context.
  12. This is the statechart controlling the behavior of a CIC sender. Of course, it is impossible to work out all the details in this short talk, but let us take a look at some important specification techniques. The system starts in the default substate RECOVERY. Further behavior depends on the content of the log. If the log is empty, the sender does not start sending; it awaits a trigger event. The log is modeled by a string variable, SNDR_LAST_LOGGED in this example, and log forcing is represented by value assignments to this variable. A regular message or an acknowledgement is considered delivered if its generation does not coincide with a LINK_OUTAGE event; this is represented by compound events with the suffix _OK. Before sending a message, the sender signals sender nondeterminism (SNDR_ND), because sending out a message usually commits the order of the received messages. Normal operation can be nondeterministically interrupted by a sender crash event. Transitions originating in a higher-level state dominate all transitions connecting its substates, so the sender activity stops by entering the termination connector, represented by an orange circle labeled T. The activity terminates logically when it enters the installed state.
  13. This statechart defines the behavior of the sender's counterpart, the receiver. It differs from the sender in that its log variable can assume two values, stable and installed, and in that the log is force-written only in a nondeterministic situation where the message order matters for the given application, as specified by the developer. The receiver nondeterminism event is usually coupled with the sender nondeterminism events generated by the sender activities running on the same component. Again, the receiver activity terminates logically in the installed state.
  14. Before we start with the verification of the IC, we need some additional definitions. A finite-state computational system, e.g., a Statemate specification, can be represented as a Kripke structure: a finite state transition graph whose nodes are labeled with the atomic propositions that hold in them. In a software system, these atomic propositions refer to individual memory bits. If we unwind the state transition graph, we obtain a computation tree with potentially infinite branches.
  15. A computation tree over the set of atomic propositions P can be characterized by the temporal logic CTL. Its syntax is inductively defined as shown on this slide. The temporal aspects of the execution paths originating in a given state are characterized by the path quantifiers Exists and All, combined with the temporal modalities Next and Until and the derived modalities Finally and Globally. Finally means that a property holds eventually; Globally means that a property holds in every state of a path.
  16. In my dissertation, I have proved many interesting safety and liveness properties using Statemate's integrated model checker. I present the most important ones here. I show that my CIC specification, for the sender as well as for the receiver, never logs an interaction twice: for all execution paths, if a value is written to a log variable, as indicated by the internal Statemate event written, it is never written again. To show liveness, we use the Statemate-specific bounded modality F<n, meaning that a property holds eventually after at most n steps. I have proved that if failures no longer occur after at most 500 steps, the CIC terminates after at most 700 steps, provided the maximum timeout value does not exceed 30 steps. Altogether, this shows the exactly-once character of the CIC specification.
  17. As the next step, we specify and verify the interaction contract framework applied to a complex Web Service scenario. We consider a 4-tier application encompassing a browser, a Web server, two application servers, and, last but not least, a database server. Internal activities are instances of the generic IC specifications. The arrows couple the event MSG_PROCESSED of one interaction with the SNDR_TRIGGER of another one. The user submits a request to the Web server. The Web server calls both application servers asynchronously. One application server starts a transaction on the database server; the other responds immediately. When both application server replies have arrived, the Web server generates a reply to the browser, which is displayed to the user. An interesting observation is that some instances share the same failure events. For example, the sender crash in the Web server reply is the same as the receiver crash in the application server replies. Analogously, the sender nondeterminism event of the Web server reply and the receiver nondeterminism event of the application server replies are identical. Consequently, the Web server reply commits the order of the application server reply messages, which we can verify with the CTL formula on this slide: when the Web server reply is sent, the application server interactions are already captured in the log.
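  18. Explicit CTL model checking is a rather simple algorithm that recurses over the subformulas. Each step is a graph search whose run time is linear in the number of states plus transitions, and hence at most quadratic in the number of states. Because the number of states grows exponentially with the number of model variables, there are heuristic solutions using ordered binary decision diagrams (OBDDs), as in Statemate's symbolic model checker. Other model checkers use SAT solvers.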
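To make the explicit algorithm concrete, here is a minimal sketch of one of its building blocks: labelling the states that satisfy E[p U q] by a backward search from the q-states. A full checker would apply such procedures recursively, innermost subformula first. The function name and the small example graph are hypothetical. Each transition is examined a bounded number of times, so the procedure runs in time linear in states plus transitions.

```python
from collections import deque
from typing import Dict, Hashable, Set

def check_eu(succ: Dict[Hashable, Set[Hashable]],
             p: Set[Hashable], q: Set[Hashable]) -> Set[Hashable]:
    """States satisfying E[p U q]: some path stays in p-states until it reaches a q-state."""
    # Build the predecessor relation once: O(|transitions|).
    pred: Dict[Hashable, Set[Hashable]] = {}
    for s, nxt in succ.items():
        for t in nxt:
            pred.setdefault(t, set()).add(s)
    sat = set(q)          # every q-state satisfies E[p U q]
    queue = deque(q)
    while queue:
        t = queue.popleft()
        for s in pred.get(t, ()):
            # a p-state with a successor already in sat also satisfies the formula
            if s not in sat and s in p:
                sat.add(s)
                queue.append(s)
    return sat

# Hypothetical graph: d -> a -> b -> c, and c loops on itself.
G = {"a": {"b"}, "b": {"c"}, "c": {"c"}, "d": {"a"}}

print(sorted(check_eu(G, {"a", "b"}, {"c"})))  # ['a', 'b', 'c']
# EF c is E[true U c]: pass all states as p.
print(sorted(check_eu(G, set(G), {"c"})))      # ['a', 'b', 'c', 'd']
```

Operators such as EX and EG are handled by similar graph procedures. The explicit approach is easy to implement, but it must enumerate the state set, which is what symbolic OBDD- or SAT-based techniques try to avoid.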
  19. In the end, we learned that we need to make compromises between the realism of the models and their verifiability. A Web service model that used integer expressions to generate timeouts periodically, as would happen in a real system, could not be verified. We succeeded after replacing the integer-based timeouts with nondeterministic 1-bit timeouts, which cover a more general case. However, no engineering trick helped us obtain any results for a multi-user model or for the liveness of the single-user model.
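Why a nondeterministic 1-bit timeout is "a more general case" can be shown by comparing behaviors. The sketch below treats a behavior as the sequence of steps at which the timeout fires. It then checks that every pattern produced by a periodic integer counter is also a possible pattern of the free 1-bit variable. Both models and all names are hypothetical simplifications of the Statemate models. The sketch shows only the behavior-inclusion argument, not the effect on verification time. Properties that hold on all behaviors of the abstract model, such as the safety properties above, therefore also hold for the periodic one.

```python
from itertools import product
from typing import Set, Tuple

Trace = Tuple[bool, ...]  # trace[i] is True if the timeout fires at step i

def periodic_timeouts(period: int, steps: int) -> Trace:
    """Concrete model: an integer counter fires the timeout every `period` steps."""
    return tuple((i + 1) % period == 0 for i in range(steps))

def nondeterministic_timeouts(steps: int) -> Set[Trace]:
    """Abstract model: a free 1-bit variable may or may not fire at each step."""
    return set(product((False, True), repeat=steps))

STEPS = 8
abstract = nondeterministic_timeouts(STEPS)
print(len(abstract))  # 256, i.e. 2**STEPS possible firing patterns

# Every periodic behavior is included in the abstract model, whatever the period.
print(all(periodic_timeouts(p, STEPS) in abstract for p in range(1, 50)))  # True
```

The inclusion only goes one way. The abstract model also permits a timeout that never fires, so liveness properties that depend on timeouts eventually firing cannot be transferred in the same way.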
  20. Now I would like to briefly introduce the prototype system EOS.
  21. I implemented the committed and external interaction contracts for PHP-based Web services. PHP is a scripting language that is embedded into ordinary HTML pages. PHP is interpreted by the Zend engine and comes with a great variety of modules extending the capabilities of the language. With PHP, we can manage the application state across multiple HTTP requests using the Session module. There are a number of options for invoking remote Web services to build a complex multi-tier application. In my work, I concentrated on the cURL module. The reply message of a PHP script is normally an HTML page that is displayed by the browser.
  22. Our prototype implements exactly-once semantics. It extends the recovery guarantees to the end user by implementing the external and the committed interaction contracts for Internet Explorer. On the PHP side, we can recover concurrent requests accessing shared objects. We can recover calls to the nondeterministic functions time, curl_exec, and the random number generator rand. We support n-tier applications for any n, with any fan-out in the call structure. We have also improved the performance of the original PHP implementation with regard to disk I/O and refined its concurrency control. For instance, session data can now be accessed read-only.
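Recovering nondeterministic calls such as time, rand, or curl_exec generally relies on logging their results during normal execution and replaying them during recovery, so the re-executed script takes the same path. The following is a minimal, generic sketch of that log-and-replay idea in Python, not the EOS-PHP implementation. The class `NondeterminismLog` and the function `script` are hypothetical. The log is kept in memory, whereas a real system would have to make it stable before results become visible. `time.time` and `random.randint` stand in for PHP's time and rand. A curl_exec-like call would be wrapped the same way and is omitted here to avoid network access.

```python
import random
import time
from typing import Any, Callable, List, Optional

class NondeterminismLog:
    """Hypothetical log-and-replay wrapper for nondeterministic calls."""

    def __init__(self, replay_from: Optional[List[Any]] = None) -> None:
        self.log: List[Any] = list(replay_from) if replay_from else []
        self.replaying = replay_from is not None
        self.pos = 0  # index of the next nondeterministic call

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.replaying and self.pos < len(self.log):
            result = self.log[self.pos]  # recovery: reuse the logged outcome
        else:
            self.replaying = False       # log exhausted: continue normal execution
            result = fn(*args)
            self.log.append(result)      # normal execution: record the outcome
        self.pos += 1
        return result

def script(nd: NondeterminismLog) -> tuple:
    """A toy 'script' whose result depends on two nondeterministic calls."""
    ts = nd.call(time.time)
    die = nd.call(random.randint, 1, 6)
    return ts, die

first = NondeterminismLog()
original = script(first)                                  # normal run, results logged
recovered = script(NondeterminismLog(replay_from=first.log))  # re-execution after a crash
print(recovered == original)  # True
```

Replaying by position assumes the re-executed code issues the same calls in the same order up to the point of failure. This holds when all nondeterminism that could change the control flow is routed through the log.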
  23. We performed measurements to evaluate the overhead of the interaction contracts in a 3-tier application whose structure is similar to that of an eBay-like auction service. The front-end server manages private user settings that are accessed simultaneously without contention. The back-end server manages the current highest bids for auction items, which are accessed concurrently. The load was generated by the synthetic load generator Apache JMeter running on 5 different machines.
  24. The run-time overhead of EOS-PHP is about 100% on average, in terms of both elapsed and CPU time; that is, both roughly double. At this price, we support failure masking, which radically simplifies the development process and provides a correct and highly available service to customers.
  25. Let me conclude my talk. I presented formal specifications of the recently proposed interaction contracts, which had previously been described only informally in the original literature. We mathematically proved many safety and liveness properties of the ICs. We also learned that model checking technology has its limitations due to the state-explosion problem. There are several ways to cope with this. For example, some researchers have explored combining manual induction proofs with model checking. Last but not least, there are other verification technologies, such as theorem proving. Another major part of my dissertation is a viable implementation of the IC framework for PHP-based Web services. We provided rigorous recovery guarantees for applications and end users at a reasonable price. In the context of this work, we added brand-new features and optimized existing ones for both Internet Explorer and the PHP language.
  26. Thank you very much for your attention. I know you have questions.