On the Integration of Real-Time and Fault-Tolerance in P2P Middleware

Faculty of Sciences, University of Porto


Rolando Martins

Scientific Advisors:
Luís Lopes, Faculty of Sciences, University of Porto
Fernando Silva, Faculty of Sciences, University of Porto

Rolando Martins, On the Integration of RT & FT in P2P, May 7, 2012


Target Systems
EFACEC’s Oporto light-train deployment
5 lines, 70 stations, trains multiplexed over 5 lines
70+ computational nodes (peers), 200+ sensors, arbitrary topology
Traffic consisting of normal operations, critical events and alarms
Tight timing, e.g., 2s for end-to-end response time
Deployments across cities/regions can be overwhelmingly large
What is needed to support such systems?
Peer-to-peer (P2P) infrastructure that mirrors physical deployment
Combined real-time and fault-tolerance guarantees
Hierarchical abstraction (cells) to scale to large deployments



In Search of a Solution

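Figure: Venn diagram of the RT, FT and P2P domains and their intersections (RT+FT, RT+P2P, FT+P2P, RT+FT+P2P), positioning existing systems such as DDS, video streaming, CORBA RT, CORBA FT, Pastry and distributed storage, with Stheno at the RT+FT+P2P intersection.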



Research Challenges and Opportunities

Challenges
FT mechanisms consume additional resources
FT mechanisms add overhead (e.g., additional latency)
Different traffic types have different soft-RT requirements
Different traffic types may require different FT configurations
RT requirements must continue to be met even under faults
Opportunities
P2P infrastructures have network-aware resilience
COTS operating systems have priority-based scheduling,
multi-threading and resource-reservation mechanisms
Proven FT configuration options exist (replication styles)



Research Question

Can we opportunistically leverage and integrate these proven strategies to
simultaneously support soft-RT and FT to meet the needs of our target
systems?



Scope

Non-Goals
Handling value faults and Byzantine faults
Formal speciﬁcation and veriﬁcation of the system
Support for hard real-time
Fully optimized implementation
Testing in production (not yet)

Assumptions
Fault model: crash of a peer, message loss
Resource-reservation mechanisms are always available



Stheno: System Architecture



Stheno: Operating-System Interface

Problem: Control and monitor resource usage from userspace
Solution:
Leverage threads, priorities, /proc
Resource reservation
CPU partitioning
Example:
A highly critical surveillance feed has a reserved amount of CPU for its
processing
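The two userspace mechanisms above (monitoring through /proc and CPU partitioning) can be illustrated with a minimal, Linux-only C++ sketch. It is an assumption about how such an interface might look, not Stheno's code: the helper names process_cpu_ticks and pin_to_first_allowed_cpu are hypothetical, while /proc/self/stat (fields 14 and 15, per proc(5)), sched_getaffinity and sched_setaffinity are standard Linux interfaces.

```cpp
// Hypothetical sketch, not Stheno's implementation: read this process's CPU
// usage from /proc and confine the calling thread to one CPU. Linux only.
#include <fstream>
#include <sstream>
#include <string>
#include <sched.h>

// Hypothetical helper: user + system CPU time of this process, in clock
// ticks, taken from fields 14 (utime) and 15 (stime) of /proc/self/stat.
long process_cpu_ticks() {
    std::ifstream f("/proc/self/stat");
    std::string line;
    if (!std::getline(f, line)) return -1;
    // Field 2 (comm) may contain spaces, so parse from after the last ')'.
    auto pos = line.rfind(')');
    if (pos == std::string::npos) return -1;
    std::istringstream rest(line.substr(pos + 1));
    std::string field;
    long utime = 0, stime = 0;
    for (int i = 3; i <= 15; ++i) {  // the first token after ')' is field 3
        if (!(rest >> field)) return -1;
        if (i == 14) utime = std::stol(field);
        if (i == 15) stime = std::stol(field);
    }
    return utime + stime;
}

// Hypothetical helper: pins the calling thread to the first CPU it is
// already allowed to use (a minimal form of CPU partitioning).
// Returns the chosen CPU index, or -1 on failure.
int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
    return -1;
}
```

Sampling process_cpu_ticks periodically and dividing the difference by sysconf(_SC_CLK_TCK) yields CPU seconds consumed, which a monitor could compare against a reservation.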



Stheno: Support Framework

Problem: Tasks have different RT requirements
Solution:
Leverage threading policies
QoS Daemon
Example:
Thread-per-Connection is used for critical events in our target system to
achieve low latency
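A hedged C++ sketch of the Thread-per-Connection policy: one dedicated std::thread serves all requests of a single connection. The Connection and Handler types and serve_thread_per_connection are hypothetical, and real connections would be sockets rather than in-memory request lists.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical connection: an ordered stream of requests from one client.
struct Connection {
    std::vector<std::string> requests;
    std::vector<std::string> replies;  // written only by this connection's thread
};

using Handler = std::function<std::string(const std::string&)>;

// Thread-per-Connection: each connection gets its own thread for its whole
// lifetime, so a critical connection never waits for a shared worker.
void serve_thread_per_connection(std::vector<Connection>& conns, const Handler& handle) {
    std::vector<std::thread> workers;
    workers.reserve(conns.size());
    for (auto& c : conns) {
        workers.emplace_back([&c, &handle] {
            for (const auto& req : c.requests) c.replies.push_back(handle(req));
        });
    }
    for (auto& t : workers) t.join();  // wait until every connection is drained
}
```

The trade-off is one thread per connection: latency for a critical channel is low because nothing queues ahead of it, but the policy scales poorly to many clients, which is why it fits a small set of critical-event connections.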



Stheno: P2P Overlay and FT Configuration

Problem: Tailor choice of P2P overlay and FT configuration to
application needs
Solution:
High-level API to support alternative overlays, e.g., P3, Pastry
Leverage proven replication styles, e.g., active, semi-active, passive
Configure replication properties, e.g., number and placement of replicas
Support service discovery
Example:
P3 mirrors regional hierarchy of target system
Active replication for critical tasks needing instantaneous fail-over



Stheno: Core

Problem: Manage services with different RT and FT requirements
Solution:
QoS daemon proxy
Service repository
Creator and coordinator of service instances and clients
Delegation of service discovery to the P2P layer
Example:
Service repository could include RPC, streaming services, etc.
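The service repository can be pictured as a registry that maps a service type to a factory, letting the core create instances on demand. This is a minimal sketch under that assumption; ServiceRepository, Service, RpcService and StreamingService are hypothetical names, not Stheno's API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical service interface managed by the core.
struct Service {
    virtual ~Service() = default;
    virtual std::string name() const = 0;
};

struct RpcService : Service { std::string name() const override { return "rpc"; } };
struct StreamingService : Service { std::string name() const override { return "streaming"; } };

// Sketch of a service repository: service type -> factory.
class ServiceRepository {
public:
    using Factory = std::function<std::unique_ptr<Service>()>;

    void register_service(const std::string& type, Factory f) {
        factories_[type] = std::move(f);
    }

    // Creates a new instance, or returns nullptr for an unknown type.
    std::unique_ptr<Service> create(const std::string& type) const {
        auto it = factories_.find(type);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

    // Registered types in sorted order (std::map keeps keys sorted).
    std::vector<std::string> types() const {
        std::vector<std::string> out;
        for (const auto& kv : factories_) out.push_back(kv.first);
        return out;
    }

private:
    std::map<std::string, Factory> factories_;
};
```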



Stheno: Application and Services

Problem: Expose system functionalities and configuration options to
the user
Solution:
High-level APIs for querying and configuring different layers
Example:
Create a video streaming service from a light-train station and set its
frame rate and replication style
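The example above could be expressed through a fluent configuration API. The sketch below is hypothetical (StreamingServiceBuilder, StreamingParams, the default of 25 fps and the station id are all assumptions) and only shows the idea of validating each option before a service instance is created.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

enum class ReplicationStyle { Active, SemiActive, Passive };

// Hypothetical high-level parameters for a video streaming service.
struct StreamingParams {
    std::string station;                 // light-train station id
    int frame_rate = 25;                 // frames per second (assumed default)
    ReplicationStyle style = ReplicationStyle::SemiActive;
    int replicas = 1;
};

// Fluent builder sketch: each setter validates its argument immediately,
// so an invalid configuration is rejected before deployment.
class StreamingServiceBuilder {
public:
    explicit StreamingServiceBuilder(std::string station) { p_.station = std::move(station); }

    StreamingServiceBuilder& frame_rate(int fps) {
        if (fps <= 0) throw std::invalid_argument("frame rate must be positive");
        p_.frame_rate = fps;
        return *this;
    }

    StreamingServiceBuilder& replication(ReplicationStyle s, int replicas) {
        if (replicas < 0) throw std::invalid_argument("replica count must be non-negative");
        p_.style = s;
        p_.replicas = replicas;
        return *this;
    }

    StreamingParams build() const { return p_; }

private:
    StreamingParams p_;
};
```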



Stheno: Interaction between Layers



Proof-of-Concept Prototype

First prototype implementation in Java had more than 50k SLOC
Current (unoptimized) prototype implementation in C/C++ with
more than 60k SLOC
P3 overlay plugin implementation
CPU resource reservation
Thread priorities: three classes corresponding to low, medium and
high criticality
Threading policies: Thread-per-Connection, Thread-per-Request,
Leader-Followers
Semi-active replication style
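The three criticality classes could map onto POSIX real-time priorities. The sketch assumes all three classes use SCHED_FIFO and are spread evenly across the range reported by sched_get_priority_min/max; that mapping, and the names Criticality, fifo_priority_for and apply_criticality, are hypothetical rather than the prototype's actual scheme.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sched.h>

enum class Criticality { Low, Medium, High };

// Assumed mapping: spread the three classes across the SCHED_FIFO range
// (on Linux typically 1..99, giving 33, 65 and 99).
int fifo_priority_for(Criticality c) {
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int step = (hi - lo) / 3;
    switch (c) {
        case Criticality::Low:    return lo + step;
        case Criticality::Medium: return lo + 2 * step;
        case Criticality::High:   return hi;
    }
    return lo;
}

// Applies the class to the calling thread. Returns 0 on success, or an error
// number such as EPERM when the process lacks real-time privileges.
int apply_criticality(Criticality c) {
    sched_param param{};
    param.sched_priority = fifo_priority_for(c);
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```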



Empirical Evaluation

Goals: To quantify
Overhead of fault-tolerance mechanisms with/without faults
Impact of background workload and faults on end-to-end latency
Metrics:
End-to-end latency, jitter, recovery time
Experimental setup:
20 nodes, each quad-core AMD Phenom with 4GB RAM
100 Mbit/s switch
Experimental procedure:
Used a P3-based overlay with semi-active replication
Runs of 1000 invocations each
Fault injection mid-way through each run
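The metrics above can be computed from per-invocation measurements roughly as follows. The definitions are assumptions for illustration: jitter as the population standard deviation of latency, and recovery time as the delay from fault injection to the first reply completed afterwards; summarize, recovery_time_ms and fault_injection_index are hypothetical helpers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencyStats {
    double mean_ms;
    double max_ms;
    double jitter_ms;  // assumed definition: population standard deviation
};

// Summarizes the end-to-end latencies (ms) of one run.
LatencyStats summarize(const std::vector<double>& lat_ms) {
    LatencyStats s{0.0, 0.0, 0.0};
    if (lat_ms.empty()) return s;
    double n = static_cast<double>(lat_ms.size());
    s.mean_ms = std::accumulate(lat_ms.begin(), lat_ms.end(), 0.0) / n;
    s.max_ms = *std::max_element(lat_ms.begin(), lat_ms.end());
    double sq = 0.0;
    for (double x : lat_ms) sq += (x - s.mean_ms) * (x - s.mean_ms);
    s.jitter_ms = std::sqrt(sq / n);
    return s;
}

// Assumed definition: time from fault injection to the first reply completed
// at or after it. reply_times_ms must be sorted ascending; -1 if none.
double recovery_time_ms(double fault_at_ms, const std::vector<double>& reply_times_ms) {
    for (double t : reply_times_ms)
        if (t >= fault_at_ms) return t - fault_at_ms;
    return -1.0;
}

// Invocation index at which the fault is injected "mid-way" through a run.
std::size_t fault_injection_index(std::size_t n) { return n / 2; }
```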



End-to-End Latency Results
4 replicas without resource reservation: maximum latency of 1 s per invocation
4 replicas with resource reservation: maximum latency of 1 ms per invocation
Figure: End-to-end latency (ms, log scale) vs. load (%) for No FT, 1 Replica, 2 Replicas and 4 Replicas. (a) Without resource reservation. (b) With resource reservation.

Stheno’s RT+FT support meets and exceeds target system
requirements (2s end-to-end response time, even under a fault)
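A quick arithmetic check of this claim against the 2 s budget, using the maximum per-invocation latencies for 4 replicas given on this slide:

```latex
\begin{aligned}
\text{slack without reservation} &= 2\,\mathrm{s} - 1\,\mathrm{s} = 1\,\mathrm{s} \\
\text{ratio with reservation} &= \frac{2\,\mathrm{s}}{1\,\mathrm{ms}} = \frac{2000\,\mathrm{ms}}{1\,\mathrm{ms}} = 2000
\end{aligned}
```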


Fail-over Latency Results
Without resource reservation: max fail-over time of 3s
With resource reservation: max fail-over time of 30ms
Figure: Fail-over latency (ms, log scale) vs. load (%) for 1 Replica, 2 Replicas and 4 Replicas. (a) Without resource reservation. (b) With resource reservation.

With resource reservation, Stheno’s RT+FT support provides low fail-over
latency (at most 30 ms) that meets target system requirements
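In numbers, using the maximum fail-over times on this slide and the 2 s end-to-end budget of the target system, resource reservation cuts worst-case fail-over time by a factor of 100, and the reserved worst case consumes 1.5% of the budget:

```latex
\begin{aligned}
\frac{3\,\mathrm{s}}{30\,\mathrm{ms}} &= \frac{3000\,\mathrm{ms}}{30\,\mathrm{ms}} = 100 \\
\frac{30\,\mathrm{ms}}{2\,\mathrm{s}} &= \frac{30\,\mathrm{ms}}{2000\,\mathrm{ms}} = 1.5\,\%
\end{aligned}
```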


Thesis Contributions

Stheno, an RT+FT+P2P middleware
Motivated by the timing, reliability and physical deployment
characteristics of our target systems
To the best of our knowledge, Stheno is the first system that
Supports traffic types with different soft-RT requirements
Supports different FT configurations
Supports configurability at multiple levels: P2P, RT and FT
Continues to meet RT requirements even under faults
Implementation of a proof-of-concept prototype
Empirical evaluation demonstrates that
Stheno meets and exceeds target system requirements for end-to-end
latency and fail-over latency



Thank You

Stheno, in Greek mythology, was the eldest of the three Gorgons. She was known to be the most independent and ferocious, having killed more men than both of her sisters combined (source: Wikipedia).
In many ways, Stheno represents the complexity of the problem that we set out to solve.



Publications
Rolando Martins, Luís Lopes, and Fernando Silva. Lightweight Fault-Tolerance for Peer-to-Peer Middleware (full version). Technical Report DCC-2011-01, Department of Computer Science, Faculty of Sciences, University of Porto, 2011.

Rolando Martins, Priya Narasimhan, Luís Lopes, and Fernando Silva. Lightweight Fault-Tolerance for Peer-to-Peer Middleware. In Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems (SRDS’10), pages 313-317, November 2010.

Rolando Martins, Priya Narasimhan, Luís Lopes, and Fernando Silva. On the Impact of Fault-Tolerance Mechanisms in a Peer-to-Peer Middleware. Technical Report DCC-2010-02, Department of Computer Science, Faculty of Sciences, University of Porto, 2010.

Rolando Martins, Luís Lopes, and Fernando Silva. A Peer-to-Peer Middleware Platform for QoS and Soft Real-Time Computing. Technical Report DCC-2008-02, Department of Computer Science, Faculty of Sciences, University of Porto, 2008.

Rolando Martins, Luís Lopes, and Fernando Silva. A Peer-To-Peer Middleware Platform for Fault-Tolerant, QoS, Real-Time Computing. In Proceedings of the 2nd Workshop on Middleware-Application Interaction, part of DisCoTec 2008, pages 1-6, New York, NY, USA, June 2008. ACM.


Replication Groups Over Group Communications

(a) Semi-active (b) Passive
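The difference between the two replication styles can be sketched with a toy, deterministic replicated counter. This is an illustration under stated assumptions, not Stheno's implementation: group communication is reduced to a loop that delivers each request to every live replica in the same order, and the leader's role in resolving nondeterministic decisions (part of semi-active replication proper) is not modeled. Replica, SemiActiveGroup, PassiveGroup and the checkpoint interval are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Toy replica whose state is a running total (hypothetical service state).
struct Replica {
    long state = 0;
    bool alive = true;
};

// Index of the first live replica, or rs.size() if none is alive.
std::size_t first_alive(const std::vector<Replica>& rs) {
    for (std::size_t i = 0; i < rs.size(); ++i)
        if (rs[i].alive) return i;
    return rs.size();
}

// Semi-active: every live replica executes every request in the same order;
// only the leader's reply is used. Fail-over needs no replay.
struct SemiActiveGroup {
    std::vector<Replica> rs;
    explicit SemiActiveGroup(std::size_t n) : rs(n) {}
    long invoke(long delta) {
        for (auto& r : rs)
            if (r.alive) r.state += delta;
        std::size_t leader = first_alive(rs);
        return leader < rs.size() ? rs[leader].state : -1;  // -1: group is down
    }
};

// Passive: only the primary executes; backups receive a checkpoint every
// `every` requests, and a promoted backup replays requests logged since then.
struct PassiveGroup {
    std::vector<Replica> rs;
    std::size_t every;
    std::vector<long> pending;  // requests since the last checkpoint
    PassiveGroup(std::size_t n, std::size_t checkpoint_every) : rs(n), every(checkpoint_every) {}

    long invoke(long delta) {
        std::size_t p = first_alive(rs);
        if (p == rs.size()) return -1;
        rs[p].state += delta;
        pending.push_back(delta);
        if (pending.size() >= every) {  // checkpoint: copy primary state to all
            for (auto& r : rs) r.state = rs[p].state;
            pending.clear();
        }
        return rs[p].state;
    }

    // Crashes the primary; returns how many requests the new primary replayed.
    std::size_t crash_primary() {
        std::size_t p = first_alive(rs);
        if (p == rs.size()) return 0;
        rs[p].alive = false;
        std::size_t np = first_alive(rs);
        if (np == rs.size()) return 0;
        for (long d : pending) rs[np].state += d;
        return pending.size();
    }
};
```

The sketch makes the trade-off visible: semi-active spends CPU on every replica for every request but fails over without replay, while passive is cheaper in steady state but pays a replay cost proportional to the checkpoint interval.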



Resource Reservation Daemon



Multicore: Examples of CPU Partitioning.

(a) Quad-core partitioning. (b) Six-core partitioning.

(c) Eight-core partitioning.

Core OS: threads belonging to the operating system
BE: best-effort threads served by the SCHED_OTHER scheduling policy
RT: threads served by the SCHED_FIFO or SCHED_RR scheduling policies
Isolated RT: RT threads isolated from all other threads present in the
system


RT Support: Object-to-Object interactions.

(a) Direct calling with different partitions. (b) Direct calling within the same partition.

(c) Deferred calling with different partitions.
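Direct and deferred calling can be contrasted in a short sketch: a direct call runs on the caller's thread, while a deferred call is queued and executed by a thread owned by the callee's partition, with the result returned through a std::future. PartitionExecutor and defer are hypothetical names; modelling a partition as a single worker thread is an assumption.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical executor standing in for one partition's thread.
class PartitionExecutor {
public:
    PartitionExecutor() : worker_([this] { run(); }) {}
    ~PartitionExecutor() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join();  // remaining queued calls are drained first
    }

    // Deferred calling: package the call and hand it to the partition's thread.
    template <class F>
    auto defer(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> fut = task->get_future();
        { std::lock_guard<std::mutex> lk(m_); q_.emplace_back([task] { (*task)(); }); }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
                if (stop_ && q_.empty()) return;
                job = std::move(q_.front());
                q_.pop_front();
            }
            job();  // executes outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the members it uses
};
```

Deferred calling decouples the caller's timing from the callee's, at the cost of a queue hop; direct calling avoids that hop but runs the callee under the caller's priority and partition.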



Threading Strategies



Minimizing Priority Inversion Through Traffic Demultiplexing
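The idea behind traffic demultiplexing can be shown with an in-process sketch: if all traffic shares one FIFO, a critical event can wait behind bulk messages (a form of priority inversion), whereas splitting traffic into one queue per priority class lets the dispatcher always serve the highest non-empty class first. TrafficDemux, Message and the three-class Priority enum are hypothetical; the slide does not state at which layer Stheno performs the split.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical three priority classes (low, medium, high criticality).
enum Priority : std::size_t { kLow = 0, kMedium = 1, kHigh = 2 };

struct Message {
    Priority prio;
    std::string payload;
};

// One queue per priority class instead of a single shared FIFO.
class TrafficDemux {
public:
    void enqueue(Message m) {
        std::size_t p = m.prio;
        queues_[p].push_back(std::move(m));
    }

    // Returns the oldest message of the highest non-empty class, if any.
    std::optional<Message> next() {
        for (std::size_t p = queues_.size(); p-- > 0;) {
            if (!queues_[p].empty()) {
                Message m = std::move(queues_[p].front());
                queues_[p].pop_front();
                return m;
            }
        }
        return std::nullopt;
    }

private:
    std::array<std::deque<Message>, 3> queues_;
};
```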



Putting It All Together



Putting It All Together (Continuation)



Execution Context/Execution Model (ECEM) Design Pattern



Comparison with Other Middleware (RPC)
Figure: RPC latency (µs, log scale) vs. load (%) for Stheno (No QoS), Stheno (QoS), TAO, RMI and ICE.

Our approach enables us to provide 200 µs latency even in the presence
of a 95% CPU workload



Related Work
1 - Decentralized scalability:
Licínio Oliveira, Luís Lopes, and Fernando Silva. P3: Parallel Peer to Peer - An Internet Parallel Programming Environment. In Workshop on Web Engineering & Peer-to-Peer Computing, part of Networking 2002, volume 2376 of Lecture Notes in Computer Science, pages 274-288. Springer-Verlag, May 2002.
A. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale
Peer-to-Peer Systems. In Proceedings of the 2nd ACM/IFIP/USENIX International Middleware Conference
(Middleware’01), pages 329-350, November 2001.
2 - Modular FT:
Tudor Dumitraş, Deepti Srivastava, and Priya Narasimhan. Architecting and Implementing Versatile Dependability. In Rogério de Lemos, Cristina Gacek, and Alexander Romanovsky, editors, Architecting Dependable Systems III, volume 3549 of Lecture Notes in Computer Science, pages 212-231. Springer Berlin / Heidelberg, 2005.
P. Bond, P. Barrett, A. Hilborne, Luís Rodrigues, D. Seaton, N. Speirs, and Paulo Veríssimo. The Delta-4 Extra Performance Architecture (XPA). In 20th International Symposium on Fault-Tolerant Computing, pages 481-488, 1990.
3 - Resource reservation + CPU partitioning:
Chen Lee, R. Rajkumar, and Cliff Mercer. Experiences with Processor Reservation and Dynamic QoS in Real-Time Mach. In Proceedings of Multimedia Japan, March 1996.
Luigi Palopoli, Tommaso Cucinotta, Luca Marzario, and Giuseppe Lipari. AQuoSA - Adaptive Quality of Service
Architecture. Software: Practice and Experience, 39(1):1-31, April 2009.
4 - Real-time support:
Priya Narasimhan, Tudor Dumitraş, Aaron Paulos, Soila Pertet, Carlos Reverte, Joseph Slember, and Deepti Srivastava. MEAD: Support for Real-Time Fault-Tolerant CORBA: Research Articles. Concurrency and Computation: Practice & Experience, 17(12):1527-1545, October 2005.
Douglas Schmidt, David Levine, and Sumedh Mungee. The Design of the TAO Real-Time Object Request
Broker. Computer Communications, 21(4):294-324, 1998.

