Faculty of Sciences, University of Porto




On the Integration of Real-Time and Fault-Tolerance
                 in P2P Middleware

                                  Rolando Martins



                                     Scientific Advisors:
                     Luís Lopes, Faculty of Sciences - University of Porto
                   Fernando Silva, Faculty of Sciences - University of Porto




 Rolando Martins             On the Integration of RT & FT in P2P                   May 7, 2012



Target Systems
   EFACEC’s Oporto light-train deployment
         5 lines, 70 stations, trains multiplexed over 5 lines
                70+ computational nodes (peers), 200+ sensors, arbitrary topology
         Traffic consists of normal operations, critical events and alarms
         Tight timing, e.g., 2 s end-to-end response time
   Deployments across cities/regions can be overwhelmingly large
   What is needed to support such systems?
         Peer-to-peer (P2P) infrastructure that mirrors physical deployment
         Combined real-time and fault-tolerance guarantees
         Hierarchical abstraction (cells) to scale to large deployments







In Search of a Solution

   [Figure: Venn diagram of the RT, FT and P2P design space.]
         RT only: DDS
         FT only: CORBA FT
         P2P only: Pastry
         RT+FT: CORBA RT FT
         RT+P2P: video streaming
         FT+P2P: distributed storage
         RT+FT+P2P: Stheno







Research Challenges and Opportunities


   Challenges
         FT mechanisms consume additional resources
         FT mechanisms add overhead (e.g., additional latency)
         Different traffic types have different soft-RT requirements
         Different traffic types may require different FT configurations
         RT requirements must continue to be met even under faults
   Opportunities
         P2P infrastructures have network-aware resilience
         COTS operating systems have priority-based scheduling,
         multi-threading and resource-reservation mechanisms
         Proven FT configuration options exist (replication styles)







Research Question


Can we opportunistically leverage and integrate these proven strategies to
simultaneously support soft-RT and FT to meet the needs of our target
systems?







Scope


   Non-Goals
         Handling value faults and Byzantine faults
         Formal specification and verification of the system
         Support for hard real-time
         Fully optimized implementation
         Testing in production (not yet)

   Assumptions
         Fault model: crash of a peer, message loss
         Resource-reservation mechanisms are always available







Stheno: System Architecture







Stheno: Operating-System Interface

   Problem: Control and monitor resource usage from userspace
   Solution:
         Leverage threads, priorities, /proc
         Resource reservation
         CPU partitioning
   Example:
         A highly critical surveillance feed has a reserved share of CPU for
         processing
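
The monitoring side of this layer can be illustrated with a small sketch. It assumes the Linux /proc/stat format, whose aggregate `cpu` line lists cumulative jiffies for user, nice, system, idle, iowait, irq, softirq and steal. The helper names `parse_cpu_line` and `cpu_utilization` are hypothetical and are not part of Stheno.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) jiffies.

    Hypothetical helper for illustration; not Stheno's API.
    """
    fields = line.split()
    if len(fields) < 5 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Columns: user, nice, system, idle, iowait, irq, softirq, steal.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total


def cpu_utilization(before, after):
    """Return the fraction of CPU time spent busy between two samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return (busy1 - busy0) / elapsed
```

On a Linux host the two samples would be the first line of /proc/stat read at two instants; a monitor could compare the result against the share reserved for a critical task.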







Stheno: Support Framework


   Problem: Tasks have different RT requirements
   Solution:
         Leverage threading policies
         QoS Daemon
   Example:
         Thread-per-Connection used for critical events in our target system to
         achieve low latency
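
The Thread-per-Connection policy can be sketched as follows. This is a minimal illustration under stated assumptions: connections are modelled as in-memory queues rather than sockets, and the names `serve_connection`, `thread_per_connection` and `demo` are hypothetical rather than taken from Stheno.

```python
import queue
import threading

_CLOSE = object()  # sentinel marking the end of a (simulated) connection


def serve_connection(conn, handler, results):
    """Drain one connection on its own thread until it is closed."""
    while True:
        message = conn.get()
        if message is _CLOSE:
            break
        results.append(handler(message))


def thread_per_connection(connections, handler):
    """Spawn one dedicated thread per connection and collect its replies."""
    workers, outputs = [], []
    for conn in connections:
        results = []
        worker = threading.Thread(
            target=serve_connection, args=(conn, handler, results)
        )
        worker.start()
        workers.append(worker)
        outputs.append(results)
    for worker in workers:
        worker.join()
    return outputs


def demo():
    """Two simulated connections: critical alarms and ordinary telemetry."""
    alarms, telemetry = queue.Queue(), queue.Queue()
    for msg in ("door fault", "signal lost"):
        alarms.put(msg)
    telemetry.put("speed=40")
    alarms.put(_CLOSE)
    telemetry.put(_CLOSE)
    return thread_per_connection([alarms, telemetry], str.upper)
```

Because each connection has a dedicated thread, a message never queues behind traffic from another connection, which is the property that makes the policy attractive for critical events.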







Stheno: P2P Overlay and FT Configuration

   Problem: Tailor choice of P2P overlay and FT configuration to
   application needs
   Solution:
         High-level API to support alternative overlays, e.g., P3, Pastry
         Leverage proven replication styles, e.g., active, semi-active, passive
         Configure replication properties, e.g., number and placement of replicas
         Support service discovery
   Example:
         P3 mirrors regional hierarchy of target system
         Active replication for critical tasks needing instantaneous fail-over
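
A configuration of this kind might be expressed as a small value object. The sketch below is an assumption for illustration only: `ReplicationStyle`, `FTConfig` and the cell names are hypothetical and do not reflect Stheno's real API.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationStyle(Enum):
    """The replication styles named on the slide."""
    ACTIVE = "active"
    SEMI_ACTIVE = "semi-active"
    PASSIVE = "passive"


@dataclass(frozen=True)
class FTConfig:
    """Hypothetical fault-tolerance configuration for one service."""
    style: ReplicationStyle
    replicas: int
    placement: tuple  # one (hypothetical) cell name per replica

    def __post_init__(self):
        if self.replicas < 0:
            raise ValueError("replica count cannot be negative")
        if len(self.placement) != self.replicas:
            raise ValueError("one placement entry is needed per replica")


# A critical task that needs fast fail-over, replicated in two cells.
critical = FTConfig(ReplicationStyle.ACTIVE, 2, ("cell-north", "cell-south"))
```

Validating the placement against the replica count at construction time keeps an inconsistent configuration from ever reaching the overlay.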







Stheno: Core

   Problem: Manage services with different RT and FT requirements
   Solution:
         QoS daemon proxy
         Service repository
         Creator and coordinator of service instances and clients
         Delegation of service discovery to the P2P layer
   Example:
         Service repository could include RPC, streaming service, etc.
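
One way to picture the service repository and the creation of service instances is a registry of factories. This is a minimal sketch, not Stheno's actual interface: `ServiceRepository`, `RpcService`, `StreamingService` and their parameters are hypothetical names chosen for illustration.

```python
class RpcService:
    """Hypothetical RPC service; the timeout default is an assumption."""
    def __init__(self, timeout_ms=2000):
        self.timeout_ms = timeout_ms


class StreamingService:
    """Hypothetical streaming service; the frame-rate default is an assumption."""
    def __init__(self, frame_rate=25):
        self.frame_rate = frame_rate


class ServiceRepository:
    """Maps service names to factories and creates instances on demand."""
    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        if name in self._factories:
            raise ValueError(f"service {name!r} is already registered")
        self._factories[name] = factory

    def create(self, name, **params):
        if name not in self._factories:
            raise LookupError(f"unknown service {name!r}")
        return self._factories[name](**params)

    def names(self):
        return sorted(self._factories)


def build_default_repository():
    """Register the two example services mentioned on the slide."""
    repo = ServiceRepository()
    repo.register("rpc", RpcService)
    repo.register("streaming", StreamingService)
    return repo
```

A client would then request an instance by name, for example `build_default_repository().create("streaming", frame_rate=15)`.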







Stheno: Application and Services


    Problem: Expose system functionalities and configuration options to
    the user
    Solution:
          High-level APIs for querying and configuring different layers
    Example:
          Create a video streaming service from a light-train station and set
          its frame rate and replication style







Stheno: Interaction between Layers







Proof-of-Concept Prototype


   First prototype implementation, in Java, had more than 50k SLOC
   Current (unoptimized) prototype implementation, in C/C++, has
   more than 60k SLOC
   P3 overlay plugin implementation
   CPU resource reservation
   Thread priorities: three classes corresponding to low, medium and
   high criticality
   Threading policies: Thread-per-Connection, Thread-per-Request,
   Leader-Followers
   Semi-active replication style






Empirical Evaluation

    Goals: To quantify
          Overhead of fault-tolerance mechanisms with/without faults
          Impact of background workload and faults on end-to-end latency
    Metrics:
          End-to-end latency, jitter, recovery time
    Experimental setup:
          20 nodes, each quad-core AMD Phenom with 4GB RAM
          100 Mbit/s switch
    Experimental procedure:
          Used a P3-based overlay with semi-active replication
          Runs of 1000 invocations
          Fault injection mid-way through each run
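
The procedure above can be sketched as a small measurement harness. It is an illustration under stated assumptions, not the harness used for the reported results: `run_experiment` and `summarize` are hypothetical names, jitter is taken here to be the population standard deviation of the latencies, and recovery time is not computed.

```python
import statistics
import time


def run_experiment(invoke, inject_fault, invocations=1000,
                   clock=time.perf_counter):
    """Time each invocation, injecting one fault half-way through the run."""
    latencies = []
    for i in range(invocations):
        if i == invocations // 2:
            inject_fault()  # e.g. crash the primary replica
        start = clock()
        invoke()
        latencies.append(clock() - start)
    return latencies


def summarize(latencies):
    """Reduce a run to a few metrics (jitter definition is an assumption)."""
    return {
        "max": max(latencies),
        "mean": statistics.fmean(latencies),
        "jitter": statistics.pstdev(latencies),
    }
```

Passing the clock as a parameter makes the harness deterministic under test, since a simulated clock can stand in for `time.perf_counter`.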






End-to-End Latency Results
   4 replicas, without resource reservation: max time of 1 s/invocation
   4 replicas, with resource reservation: max time of 1 ms/invocation

   [Figure: latency (ms, log scale) versus load (%, 0 to 100) for No FT,
   1 replica, 2 replicas and 4 replicas.
   (a) Without resource reservation. (b) With resource reservation.]

   Stheno’s RT+FT support meets and exceeds target system
   requirements (2 s end-to-end response time, even under a fault)



Fail-over Latency Results
   Without resource reservation: max fail-over time of 3 s
   With resource reservation: max fail-over time of 30 ms

   [Figure: fail-over latency (ms, log scale) versus load (%, 0 to 100) for
   1 replica, 2 replicas and 4 replicas.
   (a) Without resource reservation. (b) With resource reservation.]

   Stheno’s RT+FT provides low fail-over latency that meets
   target system requirements



Thesis Contributions

    Stheno, an RT+FT+P2P middleware
          Motivated by the timing, reliability and physical deployment
          characteristics of our target systems
    To the best of our knowledge, Stheno is the first system that
          Supports traffic types with different soft-RT requirements
          Supports different FT configurations
          Supports configurability at multiple levels: P2P, RT and FT
          Continues to meet RT requirements even under faults
    Implementation of a proof-of-concept prototype
    Empirical evaluation demonstrates that
          Stheno meets and exceeds target system requirements for end-to-end
          latency and fail-over latency







Thank You


   Stheno, in Greek mythology, was the eldest of the three Gorgons. She was
   known to be the most independent and ferocious, having killed more men
   than both of her sisters combined. (source: Wikipedia)

   In many ways, Stheno represents the complexity of the problem that we set
   out to solve.







Publications
    Rolando Martins, Luís Lopes, and Fernando Silva. Lightweight Fault-Tolerance for Peer-to-Peer
    Middleware (full version). Technical Report DCC-2011-01, Department of Computer Science, Faculty
    of Sciences, University of Porto, 2011.

    Rolando Martins, Priya Narasimhan, Luís Lopes, and Fernando Silva. Lightweight Fault-Tolerance for
    Peer-to-Peer Middleware. In Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems
    (SRDS’10), pages 313-317, November 2010.

    Rolando Martins, Priya Narasimhan, Luís Lopes, and Fernando Silva. On the Impact of Fault-Tolerance
    Mechanisms in a Peer-to-Peer Middleware. Technical Report DCC-2010-02, Department of Computer
    Science, Faculty of Sciences, University of Porto, 2010.

    Rolando Martins, Luís Lopes, and Fernando Silva. A Peer-to-Peer Middleware Platform for QoS and
    Soft Real-Time Computing. Technical Report DCC-2008-02, Department of Computer Science,
    Faculty of Sciences, University of Porto, 2008.

    Rolando Martins, Luís Lopes, and Fernando Silva. A Peer-To-Peer Middleware Platform for
    Fault-Tolerant, QoS, Real-Time Computing. In Proceedings of the 2nd Workshop on
    Middleware-Application Interaction, part of DisCoTec 2008, pages 1-6, New York, NY, USA, June
    2008. ACM.



Replication Groups Over Group Communications




             (a) Semi-active                                           (b) Passive





Resource Reservation Daemon







Multicore: Examples of CPU Partitioning.




   (a) Quad-core partitioning.                     (b) Six-core partitioning.




                          (c) Eight-core partitioning.

   Core OS: Threads belonging to the operating system
   BE: Best-effort threads served by the SCHED_OTHER scheduling policy
   RT: Threads served by the SCHED_FIFO and SCHED_RR scheduling policies
   Isolated RT: RT threads that are isolated from all other threads
   present in the system
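
A partitioning plan along these lines can be sketched as a pure function. The layout it produces is an assumption for illustration and does not reproduce the exact assignments in the figures; `partition_cores` and its parameters are hypothetical names.

```python
def partition_cores(core_count, isolated_rt=1, rt=1):
    """Split core ids into OS/best-effort, RT and isolated-RT partitions.

    Illustrative layout only: the highest-numbered cores are given to the
    isolated-RT partition, the next ones to RT, and the rest are shared by
    the operating system and best-effort (SCHED_OTHER) threads.
    """
    if isolated_rt < 0 or rt < 0:
        raise ValueError("partition sizes cannot be negative")
    if core_count < isolated_rt + rt + 1:
        raise ValueError("at least one core must remain for OS and BE threads")
    cores = list(range(core_count))
    shared_end = core_count - isolated_rt - rt
    return {
        "os+be": cores[:shared_end],
        "rt": cores[shared_end:shared_end + rt],
        "isolated_rt": cores[shared_end + rt:],
    }
```

For a quad-core machine the defaults yield two shared cores, one RT core and one isolated-RT core. On Linux, a plan like this could then be applied per thread with `os.sched_setaffinity`, although how Stheno enforces its partitions is not stated here.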



RT Support: Object-to-Object interactions.




           (a) Direct calling with different partitions.
           (b) Direct calling within the same partition.




                     (c) Deferred calling with different partitions.






Threading Strategies







Minimizing Priority Inversion Through Traffic
Demultiplexing














Putting It All Together







Putting It All Together (Continuation)







Execution Context/Execution Model (ECEM) Design
Pattern







Comparison with other Middlewares (RPC)
   [Figure: RPC latency (µs, log scale) versus load (%, 0 to 100) for
   Stheno without QoS, Stheno with QoS, ICE, TAO and RMI.]

   Our approach enables us to provide a latency of 200 µs even in the
   presence of a 95% CPU workload




Related Work
   1 - Decentralized scalability:
         Licínio Oliveira, Luís Lopes, and Fernando Silva. P3: Parallel Peer to Peer - An Internet Parallel Programming
         Environment. In Workshop on Web Engineering & Peer-to-Peer Computing, part of Networking 2002, volume
         2376 of Lecture Notes in Computer Science, pages 274-288. Springer-Verlag, May 2002.
         A. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale
         Peer-to-Peer Systems. In Proceedings of the 2nd ACM/IFIP/USENIX International Middleware Conference
         (Middleware’01), pages 329-350, November 2001.
   2 - Modular FT:
         Tudor Dumitraş, Deepti Srivastava, and Priya Narasimhan. Architecting and Implementing Versatile
         Dependability. In Rogério de Lemos, Cristina Gacek, and Alexander Romanovsky, editors, Architecting
         Dependable Systems III, volume 3549 of Lecture Notes in Computer Science, pages 212-231. Springer Berlin /
         Heidelberg, 2005.
         P. Bond, P. Barrett, A. Hilborne, Luís Rodrigues, D. Seaton, N. Speirs, and Paulo Veríssimo. The Delta-4 Extra
         Performance Architecture (XPA). In 20th International Symposium on Fault-Tolerant Computing, pages 481-488,
         1990.
   3 - Resource reservation + CPU partitioning:
         Chen Lee, R. Rajkumar, and Cliff Mercer. Experiences with Processor Reservation and Dynamic QoS in
         Real-Time Mach. In Proceedings of Multimedia Japan, March 1996.
         Luigi Palopoli, Tommaso Cucinotta, Luca Marzario, and Giuseppe Lipari. AQuoSA - Adaptive Quality of Service
         Architecture. Software: Practice and Experience, 39(1):1-31, April 2009.
   4 - Real-time support:
         Priya Narasimhan, Tudor Dumitraş, Aaron Paulos, Soila Pertet, Carlos Reverte, Joseph Slember, and Deepti
         Srivastava. MEAD: Support for Real-Time Fault-Tolerant CORBA: Research Articles. Concurrency and
         Computation: Practice & Experience, 17(12):1527-1545, October 2005.
         Douglas Schmidt, David Levine, and Sumedh Mungee. The Design of the TAO Real-Time Object Request
         Broker. Computer Communications, 21(4):294-324, 1998.

    Rolando Martins                 On the Integration of RT & FT in P2 P                        May 7, 2012          32

On the Integration of Real-Time and Fault-Tolerance in P2P Middleware

  • 1.
    Faculty of Sciences,University of Porto On the Integration of Real-Time and Fault-Tolerance in P2 P Middleware Rolando Martins Scientific Advisors: Lu´ Lopes, Faculty of Science - University of Porto ıs Fernando Silva, Faculty of Science - University of Porto Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 1
  • 2.
    Faculty of Sciences,University of Porto Target Systems EFACEC’s Oporto light-train deployment 5 lines, 70 stations, trains multiplexed over 5 lines 70+ computational nodes (peers), 200+ sensors, arbitrary topology Traffic comprised of normal operations, critical events, alarms Tight timing, e.g., 2s for end-to-end response time Deployments across cities/regions can be overwhelmingly large What is needed to support such systems? Peer-to-peer (P2P) infrastructure that mirrors physical deployment Combined real-time and fault-tolerance guarantees Hierarchical abstraction (cells) to scale to large deployments Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 2
  • 3.
    Faculty of Sciences,University of Porto In Search of a Solution DDS Video Streaming RT RT+P2P CORBA RT FT RT+FT RT+FT+P2P P2P FT+P2P FT Pastry Distributed storage CORBA FT Stheno Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 3
  • 4.
    Faculty of Sciences,University of Porto Research Challenges and Opportunities Challenges FT mechanisms consume additional resources FT mechanisms add overhead (e.g., additional latency) Different traffic types have different soft-RT requirements Different traffic types may require different FT configurations RT requirements must continue to be met even under faults Opportunities P2P infrastructures have network-aware resilience COTS operating systems have priority-based scheduling, multi-threading and resource-reservation mechanisms Proven FT configuration options exist (replication styles) Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 4
  • 5.
    Faculty of Sciences,University of Porto Research Question Can we opportunistically leverage and integrate these proven strategies to simultaneously support soft-RT and FT to meet the needs of our target systems? Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 5
  • 6.
    Faculty of Sciences,University of Porto Scope Non-Goals Handling value faults and byzantine faults Formal specification and verification of the system Support for hard real-time Fully optimized implementation Testing in production (not yet) Assumptions Fault model: crash of a peer, message loss Resource-reservation mechanisms are always available Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 6
  • 7.
    Faculty of Sciences,University of Porto Stheno: System Architecture Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 7
  • 8.
    Faculty of Sciences,University of Porto Stheno: Operating-System Interface Problem: Control and monitor resource usage from userspace Solution: Leverage threads, priorities, /proc Resource reservation CPU partitioning Example: Highly critical surveillance feed has reserved amount of CPU for processing Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 8
  • 9.
    Faculty of Sciences,University of Porto Stheno: Support Framework Problem: Tasks have different RT requirements Solution: Leverage threading policies QoS Daemon Example: Thread-per-Connection used for critical events in our target system to achieve low latency Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 9
  • 10.
    Faculty of Sciences,University of Porto Stheno: P2P Overlay and FT Configuration Problem: Tailor choice of P2P overlay and FT configuration to application needs Solution: High-level API to support alternative overlays, e.g., P3, Pastry Leverage proven replication styles, e.g., active, semi-active, passive Configure replication properties, e.g., number and placement of replicas Support service discovery Example: P3 mirrors regional hierarchy of target system Active replication for critical tasks needing instantaneous fail-over Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 10
  • 11.
    Faculty of Sciences,University of Porto Stheno: Core Problem: Manage services with different RT and FT requirements Solution: QoS daemon proxy Service repository Creator and coordinator of service instances and clients Delegation of service discovery to the P2 P layer Example: Service repository could include RPC, streaming service, etc Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 11
  • 12.
    Faculty of Sciences,University of Porto Stheno: Application and Services Problem: Expose system functionalities and configuration options to the user Solution: High-level APIs for querying and configuring different layers Example: Create a video streaming service from light-train station and set the frame rate and replication style Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 12
  • 13.
    Faculty of Sciences,University of Porto Stheno: Interaction between Layers Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 13
  • 14.
    Faculty of Sciences,University of Porto Proof-of-Concept Prototype First prototype implementation in Java had more than 50k SLOC Current (unoptimized) prototype implementation in C/C++ with more than 60k SLOC P3 overlay plugin implementation CPU resource reservation Thread priorities: three classes corresponding to low, medium and high criticality Threading policies: Thread-per-Connection, Thread-per-Request, Leader-Followers Semi-active replication style Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 14
  • 15.
    Faculty of Sciences,University of Porto Empirical Evaluation Goals: To quantify Overhead of fault-tolerance mechanisms with/without faults Impact of background workload and faults on end-to-end latency Metrics: End-to-end latency, jitter, recovery time Experimental setup: 20 nodes, each quad-core AMD Phenom with 4GB RAM 100 Mbit/s switch Experimental procedure: Used a P3 -based overlay, semi-active replication Run of 1000 invocations Fault-injection mid-way through each run Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 15
  • 16.
    Faculty of Sciences,University of Porto End-to-End Latency Results 4 replicas, without resource reservation: max time of 1s/invocation 4 replicas, With resource reservation: max time of 1ms/invocation 104 104 Legend: Legend: No FT 2 Replicas No FT 2 Replicas 1 Replica 4 Replicas 1 Replica 4 Replicas 103 103 102 102 Latency (ms) Latency (ms) 101 101 100 100 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100 Load (%) Load (%) (a) Without resource reservation. (b) With resource reservation. Stheno’s RT+FT support meets and exceeds target system requirements (2s end-to-end response time, even under a fault) Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 16
  • 17.
    Faculty of Sciences,University of Porto Fail-over Latency Results Without resource reservation: max fail-over time of 3s With resource reservation: max fail-over time of 30ms 104 104 Legend: Legend: 1 Replica 4 Replicas 1 Replica 4 Replicas 2 Replicas 2 Replicas 103 103 Latency (ms) Latency (ms) 102 102 101 101 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100 Load (%) Load (%) (a) Without resource reservation. (b) With resource reservation. Stheno’s RT+FT provides low fail-over latency that meets target system requirements Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 17
  • 18.
    Faculty of Sciences,University of Porto Thesis Contributions Stheno, an RT+FT+P2 P middleware Motivated by the timing, reliability and physical deployment characteristics of our target systems To the best of our knowledge, Stheno is the first system that Supports traffic types with different soft-RT requirements Supports different FT configurations Supports configurability at multiple levels: P2P, RT and FT Continues to meet RT requirements even under faults Implementation of a proof-of-concept prototype Empirical evaluation demonstrates that Stheno meets and exceeds target system requirements for end-to-end latency and fail-over latency Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 18
  • 19.
    Faculty of Sciences,University of Porto Thank You Stheno, in Greek mythology, was the eldest of the three Gorgons. She was known to be the most independent and ferocious, hav- ing killed more men than both of her sisters combined. (source Wikipedia) In many ways, Stheno represents the complexity of the problem that we set out to solve. Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 19
  • 20.
    Faculty of Sciences,University of Porto Publications Rolando Martins, Lu´ Lopes and Fernando Silva. Lightweight Fault-Tolerance for Peer-to-Peer ıs Middleware (full version). Technical Report DCC-2011-01, Department of Computer Science, Faculty of Sciences, University of Porto, 2011. Rolando Martins, Priya Narasimhan, Lu´ Lopes, and Fernando Silva. Lightweight Fault-Tolerance for ıs Peer-to-Peer Middleware. In Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems (SRDS’10), pages 313-317, November 2010. Rolando Martins, Priya Narasimhan, Lu´ Lopes and Fernando Silva. On the Impact of Fault-Tolerance ıs Mechanisms in a Peer-to-Peer Middleware. Technical Report DCC-2010-02, Department of Computer Science, Faculty of Sciences, University of Porto, 2010. Rolando Martins, Lu´ Lopes, and Fernando Silva. A Peer-to-Peer Middleware Platform for QoS and ıs Soft Real-Time Computing. Technical Report DCC-2008-02, Department of Computer Science, Faculty of Sciences, University of Porto, 2008. Rolando Martins, Lu´ Lopes, and Fernando Silva. A Peer-To-Peer Middleware Platform for ıs Fault-Tolerant, QoS, Real-Time Computing. In Proceedings of the 2nd Workshop on Middleware-Application Interaction, part of DisCoTec 2008, pages 1-6, New York, NY, USA, June 2008. ACM. Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 20
  • 21.
    Faculty of Sciences,University of Porto Replication Groups Over Group Communications (a) Semi-active (b) Passive Rolando Martins On the Integration of RT & FT in P2 P May 7, 2012 21
  • 22.
    Resource Reservation Daemon
  • 23.
    Multicore: Examples of CPU Partitioning
      [Figure panels: (a) Quad-core partitioning; (b) Six-core partitioning; (c) Eight-core partitioning]
      Core legend:
        OS: Threads belonging to the operating system
        BE: Threads served by the SCHED_OTHER scheduling policy
        RT: Threads served by the SCHED_FIFO and SCHED_RR scheduling policies
        Isolated RT: RT threads that are isolated from all other threads present in the system
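    The partitioning idea on this slide can be illustrated with a minimal Python sketch, assuming a Linux host. The function names `partition_cores` and `pin_current_thread`, and the rule used to split the cores, are hypothetical and are not taken from the slide's figures or from Stheno itself; only the four partition names and the scheduling policies come from the legend.

```python
import os


def partition_cores(n_cores, os_cores=1, isolated_rt_cores=1):
    """Split core ids 0..n_cores-1 into OS, BE, RT and Isolated RT sets.

    Hypothetical policy (an assumption, not the slide's layout): the first
    cores go to the OS, the last cores to isolated RT threads, and the
    remaining cores are split between BE and RT (RT gets any odd core).
    """
    if n_cores < os_cores + isolated_rt_cores + 2:
        raise ValueError("not enough cores for four partitions")
    cores = list(range(n_cores))
    shared = cores[os_cores:n_cores - isolated_rt_cores]
    half = len(shared) // 2
    return {
        "OS": set(cores[:os_cores]),
        "BE": set(shared[:half]),
        "RT": set(shared[half:]),
        "Isolated RT": set(cores[n_cores - isolated_rt_cores:]),
    }


def pin_current_thread(cores, policy=None, priority=0):
    """Pin the calling thread to `cores` and optionally change its policy.

    Linux only. Switching to os.SCHED_FIFO or os.SCHED_RR normally
    requires elevated privileges.
    """
    os.sched_setaffinity(0, cores)
    if policy is not None:
        os.sched_setscheduler(0, policy, os.sched_param(priority))


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, partition_cores(n))
```

    A real-time thread would then call, for example, `pin_current_thread(parts["RT"], os.SCHED_FIFO, 10)`, where the priority value 10 is an arbitrary illustration. Keeping the partition computation separate from the system calls makes the layout easy to inspect without special privileges.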
  • 24.
    RT Support: Object-to-Object Interactions
      [Figure panels: (a) Direct calling with different partitions; (b) Direct calling within the same partition; (c) Deferred calling with different partitions]
  • 25.
    Threading Strategies
  • 26.
    Minimizing Priority Inversion Through Traffic Demultiplexing
  • 27.
    Minimizing Priority Inversion Through Traffic Demultiplexing
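    The general idea behind traffic demultiplexing can be sketched as follows. This is a single-threaded illustration under stated assumptions, not Stheno's implementation: the class name `Demultiplexer`, the numeric priority levels and the message strings are all hypothetical. Instead of one shared FIFO, where an urgent message can wait behind lower-priority traffic, each priority level gets its own queue.

```python
from collections import deque


class Demultiplexer:
    """Route messages into one FIFO queue per priority level.

    Assumption: a higher level number means more urgent traffic.
    """

    def __init__(self, levels):
        self.queues = {level: deque() for level in levels}

    def push(self, level, message):
        # Each level has its own queue, so urgent messages never
        # line up behind lower-priority ones.
        self.queues[level].append(message)

    def pop(self):
        """Return the oldest message of the most urgent non-empty level."""
        for level in sorted(self.queues, reverse=True):
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


if __name__ == "__main__":
    demux = Demultiplexer(levels=[0, 1, 2])
    demux.push(0, "routine status")
    demux.push(0, "routine log")
    demux.push(2, "urgent alarm")
    print(demux.pop())
```

    In a multi-threaded middleware each queue would typically be served by its own thread at a matching scheduling priority; that part is omitted here to keep the sketch small.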
  • 28.
    Putting It All Together
  • 29.
    Putting It All Together (Continuation)
  • 30.
    Execution Context/Execution Model (ECEM) Design Pattern
  • 31.
    Comparison with other Middlewares (RPC)
      [Figure: latency (µs, logarithmic axis from 10^1 to 10^5) versus CPU load (0 to 100%) for Stheno without QoS, Stheno with QoS, TAO, RMI and ICE]
      Our approach enables us to provide a latency of 200 µs even in the presence of a 95% CPU workload
  • 32.
    Related Work
      1 - Decentralized scalability:
        Licínio Oliveira, Luís Lopes, and Fernando Silva. P3: Parallel Peer to Peer - An Internet Parallel Programming Environment. In Workshop on Web Engineering & Peer-to-Peer Computing, part of Networking 2002, volume 2376 of Lecture Notes in Computer Science, pages 274-288. Springer-Verlag, May 2002.
        A. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In Proceedings of the 2nd ACM/IFIP/USENIX International Middleware Conference (Middleware'01), pages 329-350, November 2001.
      2 - Modular FT:
        Tudor Dumitraş, Deepti Srivastava, and Priya Narasimhan. Architecting and Implementing Versatile Dependability. In Rogério de Lemos, Cristina Gacek, and Alexander Romanovsky, editors, Architecting Dependable Systems III, volume 3549 of Lecture Notes in Computer Science, pages 212-231. Springer Berlin / Heidelberg, 2005.
        P. Bond, P. Barrett, A. Hilborne, Luís Rodrigues, D. Seaton, N. Speirs, and Paulo Veríssimo. The Delta-4 Extra Performance Architecture (XPA). In 20th International Symposium on Fault-Tolerant Computing, pages 481-488, 1990.
      3 - Resource reservation + CPU partitioning:
        Chen Lee, R. Rajkumar, and Cliff Mercer. Experiences with Processor Reservation and Dynamic QoS in Real-Time Mach. In Proceedings of Multimedia Japan, March 1996.
        Luigi Palopoli, Tommaso Cucinotta, Luca Marzario, and Giuseppe Lipari. AQuoSA - Adaptive Quality of Service Architecture. Software: Practice and Experience, 39(1):1-31, April 2009.
      4 - Real-time support:
        Priya Narasimhan, Tudor Dumitraş, Aaron Paulos, Soila Pertet, Carlos Reverte, Joseph Slember, and Deepti Srivastava. MEAD: Support for Real-Time Fault-Tolerant CORBA: Research Articles. Concurrency and Computation: Practice & Experience, 17(12):1527-1545, October 2005.
        Douglas Schmidt, David Levine, and Sumedh Mungee. The Design of the TAO Real-Time Object Request Broker. Computer Communications, 21(4):294-324, 1998.