Open-NFP Summer Webinar Series
CAANS: Consensus As A Network Service
Huynh Tu Dang, Daniele Sciascia, Pietro Bressana,
Han Wang, Ki Suh Lee, Hakim Weatherspoon,
Marco Canini, Fernando Pedone, and Robert Soulé
Università della Svizzera italiana (USI),
Cornell University, and Université catholique de Louvain
Outline
• Introduction
• CAANS
• Demo
• Results
• Conclusion
Introduction
Many distributed problems can be reduced to consensus
▪ E.g., atomic commit, leader election, atomic broadcast
Consensus protocols are the foundation for fault-tolerant systems
▪ E.g., OpenReplica, Ceph, Chubby
Paxos: one of the most widely used consensus protocols
▪ Paxos is slow
▪ Optimizations: Generalized Paxos, Fast Paxos
Motivations
Network devices are becoming:
▪ More powerful: NFP-6xxx, PISA chips, FlexPipe
▪ More programmable: custom pipeline processing
High-level languages:
▪ OpenFlow, PX, P4
Co-design networks and consensus protocols
Background: Consensus Problem
Background on Paxos Consensus
Roles in the protocol:
▪ Clients submit requests
▪ Proposers coordinate requests
▪ Acceptors accept (or reject) proposals
▪ Learners learn the outcome
Message flow, with one coordinator (a distinguished proposer) and three acceptors:
1. The coordinator sends prepare(2) to the acceptors (2 is the proposal number).
2. Case 1: an acceptor that has NOT accepted any proposal previously replies promise(2, ø, ø) and records proposal 2 with value ø.
3. Case 2: an acceptor that has already ACCEPTED proposal (1, x) replies promise(2, 1, x) and records proposal 2 with value x.
4. After a quorum of promise(2, ø, ø) replies, the coordinator takes the client's request(v) and pairs value v with proposal 2.
5. The coordinator sends accept(2, v) to the acceptors.
6. The acceptors send accepted(2, v) to the learners, which learn that v has been chosen.
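To make the message flow above concrete, here is a minimal, illustrative Python sketch of the acceptor's side of the protocol (a single-instance simplification with made-up class and method names, not the CAANS implementation): it answers prepare (phase 1a) with a promise carrying any previously accepted proposal, and accepts a value in phase 2a only if the proposal number is at least as high as the one it has promised.

```python
# Minimal single-instance Paxos acceptor sketch (illustrative only).
from typing import Optional, Tuple

class Acceptor:
    def __init__(self) -> None:
        self.promised_n = 0                 # highest proposal number promised ("acceptor's N")
        self.accepted: Optional[Tuple[int, str]] = None  # (proposal number, value), if any

    def on_prepare(self, n: int) -> Optional[Tuple[int, Optional[int], Optional[str]]]:
        """Phase 1a: prepare(n) -> promise(n, accepted_n, accepted_value), or no reply."""
        if n >= self.promised_n:
            self.promised_n = n
            if self.accepted is None:
                return (n, None, None)      # Case 1: promise(n, ø, ø)
            acc_n, acc_v = self.accepted
            return (n, acc_n, acc_v)        # Case 2: promise(n, acc_n, acc_v)
        return None                         # ignore stale proposals

    def on_accept(self, n: int, value: str) -> Optional[Tuple[int, str]]:
        """Phase 2a: accept(n, v) -> accepted(n, v) announced to learners, or no reply."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return (n, value)               # announce accepted(n, v)
        return None

# Example run mirroring the slides: prepare(2), then accept(2, v).
a = Acceptor()
print(a.on_prepare(2))      # (2, None, None): promise(2, ø, ø)
print(a.on_accept(2, "v"))  # (2, 'v'): accepted(2, v)
```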
Bottleneck Experiment Setup
Three servers, each running several Paxos roles:
▪ Server 1: Client + Acceptor + Learner
▪ Server 2: Coordinator + Acceptor + Learner
▪ Server 3: Acceptor + Learner
Coordinator Bottleneck
[Bar chart: CPU usage (%) per process — Client, Coordinator, Acceptor, Learner]
Client offered load at maximum rate
Messages are 102 bytes
Minimum latency is 96 µs
Acceptor Bottleneck
[Bar chart: CPU usage (%) vs. number of learners (4, 8, 12, 16, 20) for Acceptor, Client, Coordinator, and Learner processes]
To increase the replication factor, we launch multiple learner processes on each server
CAANS:
Consensus As A Network Service
Consensus / Network Design Space
Two dimensions:
▪ Network service guarantees: best effort → FIFO delivery → no lost messages
▪ Device processing: stateless processing → persistent storage → stateful processing
Points in the design space:
▪ Traditional Paxos: assumes only a best-effort network
▪ Exploit better service guarantees offered by the network
▪ Implement Paxos in the network: stateful processing on the devices
Design Goals
A network consensus library:
▪ Compatible with the libpaxos software library
▪ Independent of the hardware target
▪ High consensus throughput
▪ Low end-to-end latency
CAANS
Consensus as a network service: applications submit requests, the network coordinates them, and the chosen values are delivered back to the applications.
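As a rough illustration of what "submit" and "deliver" look like from the application's point of view, the sketch below shows a hypothetical library-style interface; the names (ConsensusService, submit, the deliver callback) and the behavior of choosing every submitted value immediately are assumptions for illustration, not the actual CAANS/libpaxos API.

```python
# Hypothetical application-facing interface for a consensus service
# (names and behavior are illustrative; not the actual CAANS/libpaxos API).
from typing import Callable

class ConsensusService:
    def __init__(self, deliver_cb: Callable[[int, bytes], None]) -> None:
        # deliver_cb(instance, value) is invoked once a value has been chosen.
        self.deliver_cb = deliver_cb
        self._next_instance = 0

    def submit(self, value: bytes) -> None:
        """Hand a value to the service; in CAANS the coordination runs in the network."""
        # For illustration only, pretend every submitted value is chosen immediately.
        instance = self._next_instance
        self._next_instance += 1
        self.deliver_cb(instance, value)

# Application code: replicate commands by submitting them and applying them on delivery.
log = {}
svc = ConsensusService(lambda inst, val: log.__setitem__(inst, val))
svc.submit(b"set x=1")
svc.submit(b"set y=2")
print(log)  # {0: b'set x=1', 1: b'set y=2'}
```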
P4Paxos Data Flow
Parser: extracts the Ethernet, IPv4, UDP, and Paxos headers.
Ingress tables and actions:
▪ round_tbl → read_round
▪ paxos_tbl → handle_phase1a, handle_phase2a, or drop
▪ forward_tbl → forward
Processing logic:
▪ Is it a Paxos packet? If so, read_round via round_tbl.
▪ Compare the packet's N with the acceptor's N (N: proposal number).
▪ If the packet's N >= the acceptor's N, dispatch on the message type to handle_phase1a or handle_phase2a; otherwise drop.
▪ forward_tbl forwards the packet at egress.
https://github.com/open-nfpsw/NetPaxos
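The control flow above can be summarized in plain code; the Python sketch below mirrors the match-action decisions (Paxos header check, round comparison, phase dispatch, forward/drop) purely as a model of the logic, not the P4 program itself. The message-type constants and field/register names (instance, rnd, vrnd, vval) are assumptions for illustration.

```python
# Illustrative model of the P4Paxos ingress logic described above (not the P4 code).
from dataclasses import dataclass, field
from typing import Dict, Optional

PHASE_1A, PHASE_2A = 1, 2   # message-type values are assumptions for illustration

@dataclass
class PaxosHeader:
    msgtype: int
    instance: int
    rnd: int                 # the packet's proposal number N
    value: Optional[bytes] = None

@dataclass
class AcceptorState:
    rnd: Dict[int, int] = field(default_factory=dict)     # promised round per instance
    vrnd: Dict[int, int] = field(default_factory=dict)    # round of the accepted value
    vval: Dict[int, bytes] = field(default_factory=dict)  # accepted value

def ingress(pkt: Optional[PaxosHeader], st: AcceptorState) -> str:
    if pkt is None:
        return "forward"                     # not a Paxos packet: skip the Paxos tables
    promised = st.rnd.get(pkt.instance, 0)   # round_tbl / read_round
    if pkt.rnd < promised:
        return "drop"                        # packet's N < acceptor's N
    st.rnd[pkt.instance] = pkt.rnd
    if pkt.msgtype == PHASE_1A:              # paxos_tbl / handle_phase1a
        return "forward promise(rnd=%d, vrnd=%s)" % (pkt.rnd, st.vrnd.get(pkt.instance))
    if pkt.msgtype == PHASE_2A:              # paxos_tbl / handle_phase2a
        st.vrnd[pkt.instance] = pkt.rnd
        st.vval[pkt.instance] = pkt.value or b""
        return "forward accepted(rnd=%d)" % pkt.rnd
    return "drop"

st = AcceptorState()
print(ingress(PaxosHeader(PHASE_1A, instance=0, rnd=2), st))
print(ingress(PaxosHeader(PHASE_2A, instance=0, rnd=2, value=b"v"), st))
```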
DEMO: Debugging Paxos Acceptor
DEMO
Paxos acceptor:
▪ Store accept messages in registers
▪ Modify the Paxos header
▪ Read the accepted proposal from registers
Testbed: a packet generator node (Intel XL710) connected over a 40 GbE link to the acceptor node (Agilio-CX), which forwards traffic to "vf0".
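A packet generator for this kind of demo could be scripted with Scapy; the sketch below builds and sends a UDP packet with a made-up Paxos payload layout (message type, instance, round, value). The field layout, UDP port, destination address, and interface name are assumptions for illustration, not the CAANS wire format.

```python
# Illustrative Paxos test-packet generator using Scapy (field layout, UDP port,
# addresses, and interface name are assumptions, not the CAANS wire format).
import struct
from scapy.all import Ether, IP, UDP, Raw, sendp  # requires scapy and root privileges

PAXOS_PORT = 0x8888
PHASE_2A = 2

def paxos_payload(msgtype: int, instance: int, rnd: int, value: bytes) -> bytes:
    # Assumed layout: 1-byte type, 4-byte instance, 2-byte round, 8-byte value.
    return struct.pack("!BIH8s", msgtype, instance, rnd, value.ljust(8, b"\x00"))

pkt = (Ether()
       / IP(dst="10.0.0.2")                       # acceptor node (assumed address)
       / UDP(sport=34951, dport=PAXOS_PORT)
       / Raw(load=paxos_payload(PHASE_2A, instance=0, rnd=2, value=b"v")))

# Send a burst out of the interface facing the Agilio-CX (interface name assumed).
sendp(pkt, iface="ens1f0", count=10, verbose=False)
```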
Results: Absolute Performance
Result: Forwarding Latency
[Bar chart: latency (ns) for *Forwarding, Coordinator, and Acceptor on Agilio-CX and NetFPGA SUME; values shown: 790, 720, 40, 810, 340, 40]
*Numbers are provided by other sources
Measurement setup: Packet Generator → DUT
software 96 µs vs. hardware 340 ns
Experiment Setup
Topology: clients, one coordinator, three acceptors, and two learners.
Coordinator & Acceptors:
▪ Libpaxos: software processes
▪ CAANS: hardware (https://github.com/usi-systems/p4paxos)
Clients & Learners:
software processes
End-to-End Latency
[Plot: latency (µs, 0-400) vs. throughput (messages/second, 10,000-120,000) for Libpaxos and CAANS]
CAANS: Fault Tolerance
[Plot: throughput (msg/s, 0-180,000) over time (seconds 1-7) for a coordinator failure and an acceptor failure]
Links are taken down at the 3rd second.
Coordinator failure: requests are switched to a software coordinator.
Acceptor failure: Paxos tolerates the failure of a minority of nodes.
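One simple way to realize such a coordinator switch on the client side is a timeout-driven fallback from the in-network coordinator to a software one. The sketch below is purely illustrative (addresses, port, timeout, and retry policy are assumptions), not the mechanism used in the experiment.

```python
# Illustrative client-side fail-over between coordinator endpoints (addresses,
# port, and timeout are assumptions; not the mechanism used in the experiment).
import socket

COORDINATORS = [("10.0.0.2", 34952),   # in-network (hardware) coordinator
                ("10.0.0.9", 34952)]   # software backup coordinator

def submit(value: bytes, timeout: float = 0.05) -> bytes:
    """Send a request and fall back to the next coordinator on timeout."""
    last_err: Exception = TimeoutError("no coordinator reachable")
    for addr in COORDINATORS:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(value, addr)
            reply, _ = sock.recvfrom(2048)   # wait for the acknowledgement
            return reply
        except socket.timeout as exc:
            last_err = exc                   # try the next coordinator
        finally:
            sock.close()
    raise last_err
```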
Summary
Consensus is important in distributed systems
▪ Optimizations exploit network assumptions
SDN allows network programmability
▪ OpenFlow, P4
CAANS: Co-design Network and Consensus
▪ Implement Paxos in network devices
▪ Achieve high performance
Future Work
• Checkpoint Acceptor’s log
• Fast fail-over to software/hardware coordinator
• Use kernel-bypass API to increase learner’s performance
• Full-fledged deployment in traditional networks
Acknowledgements
• We thank Bapi Vinnakota, Mary Pham, Jici Gao, and Netronome for providing us with a hardware testbed and for support with using their toolchain
• We thank Gordon Brebner and Xilinx for donating two SUME boards, providing access to SDNet, and performing measurements
• Thanks to Google: a Faculty award helped support this research
The End
Huynh Tu Dang, Università della Svizzera italiana (USI)
tudang.github.io
NetPaxos
http://www.inf.usi.ch/faculty/soule/netpaxos.html
