Università degli Studi di Trieste
Dipartimento di Ingegneria e Architettura
Master’s degree in Computer and Electronic Engineering
Performance assessment of the MASQUE
extension for proxying scenarios in the QUIC
transport protocol
Graduating:
Alessandro Nuzzi
Supervisor:
Prof. Alberto Bartoli
Co-supervisor:
Prof. Martino Trevisan
ACADEMIC YEAR 2022-2023
Abstract
Web protocols are in constant evolution, and recent developments in this area have aimed
at improving the performance of existing protocols in order to achieve a better browsing
experience for users and greater efficiency in certain domains. Privacy and security
represent another fundamental aspect of the Internet and, for this reason, another goal of
the latest innovations has been to protect sensitive information and provide secure
communications.
The growing demands for performance and security on the Internet in recent years have led
to the emergence of two new protocols, HTTP/3 and QUIC, which aim to optimize Internet
communications by reducing delays, improving reliability and guaranteeing greater
security. In particular, QUIC (Quick UDP Internet Connections) is a transport protocol
based on UDP and developed to overcome the limitations of TCP, reducing latency,
supporting connection migration and encrypting headers. The usage of QUIC is certainly
growing, and today it represents a large share of the traffic of the most popular
applications, used daily by millions of users.
The integration of these protocols in scenarios that make use of proxies, however, can be
complex because of QUIC's security properties. The encryption applied to headers can
prevent proxies from carrying out their traffic inspection and modification tasks,
negatively affecting performance. Moreover, traditional HTTP proxies do not natively
support protocols that are not based on TCP.
The MASQUE working group was created precisely to solve the problem of enabling the
proxying of UDP and IP over HTTP. This is made possible by the specification of the new
CONNECT-UDP method, which allows a UDP tunnel to be created with a proxy, within which
QUIC datagrams are exchanged. MASQUE was also designed to protect sensitive data, offering
users several security guarantees. These guarantees include hiding the client's IP address
from the server and obfuscating the destination of users' traffic from their Internet
service providers, by transferring this information to the MASQUE proxy.
The goal of this thesis is to analyze the performance of MASQUE in scenarios that use
proxies, examining various network conditions such as additional delays and limited
bandwidth. The purpose of the analysis is to quantify the cost, in terms of performance,
associated with the use of a MASQUE proxy that provides the promised privacy guarantees.
Moreover, the performance of MASQUE will be compared with that of other existing protocols
such as QUIC and TCP+TLS. Scenarios in which the use of MASQUE could bring benefits will
also be discussed.
The results obtained reveal that the use of TCP in proxied scenarios is the most
advantageous in terms of throughput and transfer times. At the same time, the adoption of
MASQUE can prove a suitable choice in certain contexts that already use QUIC, given its
relatively low performance cost.
Contents
1 Introduction 4
2 Background 6
2.1 Problem introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 QUIC Transport Protocol . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 QUIC relevant features . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Proxying in HTTP/1.1 and HTTP/2 . . . . . . . . . . . . . . . . 9
2.2 Problems related to proxying . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Protocol ossification . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Proxying in HTTP/3 with QUIC . . . . . . . . . . . . . . . . . . 11
2.3 Support to QUIC proxying . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 QUIC built-in encryption . . . . . . . . . . . . . . . . . . . . . . 11
2.3.2 DATAGRAM extension . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.3 MASQUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.4 MASQUE objectives and guarantees . . . . . . . . . . . . . . . . 14
3 Related work 16
3.1 MASQUE available implementations . . . . . . . . . . . . . . . . . . . . 16
3.2 QUIC available implementations . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 State of the art and literature review . . . . . . . . . . . . . . . . . . . . 16
3.3.1 MASQUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.2 QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Related work discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Testbed technologies 24
4.1 Software technologies used . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Tools used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Testing environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.1 Docker-based emulation setup . . . . . . . . . . . . . . . . . . . . 25
4.3.2 Bash scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5 Experimental campaigns 28
5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.1.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2.1 Delay distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2.2 Packet loss distribution . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2.3 Tools selection and evolution . . . . . . . . . . . . . . . . . . . . . 32
6 Results 32
6.1 Data and graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.1.1 Collected data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.1.2 Data visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.1.3 Considered scenarios . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.2 Low bandwidth scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2.1 1MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2.2 10MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.3 Medium bandwidth scenario . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3.1 1MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3.2 10MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.4 High bandwidth scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.4.1 1MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.4.2 10MB file download . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.5 Other scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.5.1 1MB file download, 100Mbps bandwidth, 10ms delay, variable packet
loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.6 Result discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.7 Limitations and assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 50
7 Conclusions 51
7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8 Appendix 53
8.1 Script usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.1.1 Inputs and outputs . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.1.2 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.2 Discarded scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.2.1 1Mbps bandwidth, 1MB file download . . . . . . . . . . . . . . . 54
8.2.2 10Mbps bandwidth, 1MB file download with curl client . . . . . . 55
8.3 Other plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.3.1 Low bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.3.2 Medium bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.3.3 High bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1 Introduction
The QUIC transport protocol, one of the latest additions to web protocols, aims to bring
several enhancements in performance and security with respect to TLS over TCP, in terms
of improved management of head-of-line blocking, connection establishment latency and
connection migration. While TCP is still the most used transport protocol, QUIC usage
is certainly growing across the world, representing a big part of the traffic for the most
popular applications and websites, making it a protocol that is used on a daily basis and
is widely supported.
The usage of QUIC poses challenges in networking scenarios that use proxies, as tradi-
tional HTTP proxies do not natively support non-TCP protocols. The HTTP CONNECT
method allows the creation of a TCP tunnel through an HTTP proxy, but there is no
equivalent method for UDP traffic. Moreover, QUIC's built-in encryption can interfere
with the operation of proxies. The fact that QUIC encrypts almost all of its packet header
fields, such as packet numbers, as well as the payload, can hamper proxies' ability to
inspect and modify traffic, impacting performance. The rise of QUIC created the need
for new proxying technologies which would allow HTTP to create tunnels for proxying
any non-TCP-based protocols, such as QUIC.
The MASQUE working group was chartered to solve these problems and to allow the
proxying of UDP and IP over HTTP. MASQUE specifies the new CONNECT-UDP
method, which enables tunneling UDP to a proxy over HTTP. Inside this QUIC-secured
UDP tunnel it is possible to exchange unreliable QUIC datagrams.
While the primary goal of the MASQUE working group is providing the building blocks
to effectively use QUIC as a substrate protocol, MASQUE design is also focused on
protecting sensitive information and providing an Internet-accessible node that can relay
client traffic in order to provide privacy guarantees. Such guarantees include hiding the
client IP address from the target server and obfuscating the destination of the client traffic
from their network provider to prevent data collection, by transferring that information
to the MASQUE proxy [52].
The goal of this thesis is to evaluate the performance of the MASQUE proposal in prox-
ying scenarios, testing a set of basic network conditions such as different delays and
bandwidth limits. The evaluation aims to assess the performance impact of employing
a MASQUE proxy to achieve the promised privacy guarantees. Additionally, MASQUE
will be compared to existing protocols such as end-to-end QUIC and TCP+TLS. We will
also discuss in which scenarios the usage of MASQUE might be beneficial.
The achieved results reveal that, while TCP-based proxying proved to still be the best
choice in terms of throughput and performance, MASQUE-based proxying could be a good
choice in certain contexts that already use QUIC, given the relatively modest performance
cost associated with its use.
The work is organised as follows.
Chapter 2 provides an overview of the relevant context, with a brief description of the
QUIC protocol and special attention to proxying and the mechanisms related to it. It
also describes how MASQUE works and what it proposes.
Chapter 3 includes a comprehensive examination of existing research, current advance-
ments, and the latest developments in the field, providing a context for the research being
conducted in this thesis and a baseline for the results.
Chapter 4 focuses on the practical aspects of this work, outlining the tools, software
technologies and testing environment used to evaluate the various MASQUE aspects
discussed in this thesis.
Chapter 5 describes the methodologies and approaches employed in the study to address
the research objectives.
Chapter 6 presents and discusses the findings and outcomes of the work. It highlights the
data collected, the scenarios considered, and the interpretation of the results in relation
to the research objectives.
Finally, chapter 7 presents the conclusions drawn from the work of this thesis.
This thesis was carried out at the Machine Learning Lab of the University of Trieste.
2 Background
2.1 Problem introduction
Web protocols are in a constant state of evolution, and the latest additions to this land-
scape are HTTP/3 and QUIC. These protocols have been standardized with the goal
of enhancing both performance and security in web communications. Middleboxes are
relevant components in many networking scenarios; they include network devices or
software components such as firewalls, NATs, load balancers and, most importantly,
proxies, which sit between the source and destination of network traffic to provide a
variety of functionalities, especially in enterprise contexts. For instance, when using a
forward proxy, a client establishes an end-to-end tunnel to a target server with the aid
of a proxy server, allowing multiple clients to route traffic to an external network. The
normal operation of proxies is challenged by new web protocols such as HTTP/3 and QUIC,
as they implement end-to-end encryption with various implications. Furthermore, traffic
through an HTTP/3 proxy would go through two nested layers of encryption and
congestion control, which is known to severely impact performance.
2.1.1 QUIC Transport Protocol
QUIC (Quick UDP Internet Connections) [36] is a user-space, stream-based and multi-
plexed transport protocol developed by Google. Initially designed and developed for
HTTP, it was later declared a general-purpose transport-layer protocol. QUIC provides
security and reliability along with reduced connection and transport latency. Google has
widely deployed QUIC on its servers, where it is currently in use [29]. Unlike traditional
transport-layer protocols such as TCP or UDP, QUIC sits in the application layer and
runs on UDP, with the idea of overcoming TCP limitations while remaining compatible
with existing middleboxes [16, 42]. UDP is an unreliable, best-effort protocol and provides
none of the features needed for a reliable connection, so all reliability features reside in
the application layer. QUIC features such as congestion control algorithms, stream
multiplexing and connection management are implemented in user space rather than
kernel space, allowing developers to iterate and experiment with different approaches
without requiring modifications to the kernel or operating system. In this way, QUIC
provides a more accessible development environment, allows for rapid prototyping, and
facilitates easier customization and extensibility for specific use cases or applications.
QUIC advantages over TCP include [42]:
• Lower Connection establishment latency
• Improved Congestion Control
• Multiplexing without head-of-line blocking
• Forward Error Correction
• Connection migration support
In fact, TCP, combined with TLS to guarantee security, integrity and confidentiality,
requires 3-4 RTTs to establish a connection, introducing a significant overhead. Further-
more, TCP connections are identified by source port, source IP, destination port and
destination IP. Port numbers may change when the application or device changes, forcing
the connection to be re-established. Another limitation of TCP is half-open connections:
receiving data in TCP is passive, so dropped connections are detected only by the sender,
and the receiver has no way of detecting them. Finally, TCP suffers from a limitation
called Head-of-Line blocking, which occurs when resources are held up by some entity
waiting to complete an action. Head-of-Line blocking can significantly increase packet
reordering delay, wasting resources and adding latency as packets are processed by the
network stack and the application. QUIC aims to solve all these problems [16, 42, 40].
The HTTP/3 and QUIC stack is on the rise across the world [29, 47], representing 48%
of the traffic of large, efficiency-critical applications such as Facebook, Netflix, YouTube
and Instagram. Considering total network data worldwide, QUIC accounts for more
than 46% of traffic in Latin America, 42% in Europe and 32% in the United States,
with enhanced performance over TCP as its main goal. QUIC is also widely supported
and adopted by providers such as Cloudflare, Fastly and Akamai.
2.1.2 QUIC relevant features
In order to achieve a reduced overhead compared to TCP, QUIC takes at most 1 RTT for
a fresh connection and 0 RTT for subsequent connections. If a connection is being
established between client and server for the first time, a 1-RTT handshake must be
performed to obtain the necessary information. The client sends an empty Inchoate
Client Hello (CHLO) packet, also called CI (Client Initial), to the server to create a
connection. Upon receiving a CHLO packet, the server immediately sends a Rejection
(REJ) message to the client. The REJ contains information such as a source address
token and the server's certificates. The next time the client sends a CHLO packet, it can
use the cached credentials from the previous connection to immediately send encrypted
requests to the server. In the 0-RTT scenario, server and client have communicated in
the past and there is no need to establish the connection again: the first packet itself
carries the data, without any connection establishment or key exchange.
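As a back-of-the-envelope illustration of the round-trip counts above, the following sketch compares connection-setup latency; the 50 ms RTT is an assumed value for illustration:

```python
# Rough connection-setup cost model for the handshakes described above.
# It counts only round trips spent before the first byte of application
# data can flow; the RTT value is an assumption, not a measurement.

def setup_latency_ms(rtt_ms: float, rtts_before_data: int) -> float:
    """Latency spent on handshakes before application data can flow."""
    return rtt_ms * rtts_before_data

RTT = 50.0  # assumed network round-trip time in milliseconds

tcp_tls12 = setup_latency_ms(RTT, 3)  # TCP handshake + TLS 1.2 handshake
quic_1rtt = setup_latency_ms(RTT, 1)  # fresh QUIC connection
quic_0rtt = setup_latency_ms(RTT, 0)  # resumed QUIC connection: data in first flight

print(tcp_tls12, quic_1rtt, quic_0rtt)
```

On a 50 ms path, the model gives 150 ms of setup for TCP+TLS 1.2 against 50 ms (fresh) or 0 ms (resumed) for QUIC, which is the gap the handshake design targets.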
For congestion control, QUIC uses the TCP CUBIC algorithm by default. Unlike TCP,
QUIC has pluggable congestion control, allowing different congestion control algorithms
to be used. Furthermore, each packet in QUIC, whether an original transmission or a
re-transmission, is assigned a unique sequence number. In this way, the QUIC sender can
differentiate between original and re-transmitted packets without ambiguity, unlike TCP
[37, 63]. For additional clarity, ACK frames in QUIC carry an ACK Delay field, reporting
the time between the receipt of the largest packet and the transmission of the
acknowledgement packet.
Unlike TCP, where a client opens multiple TCP connections to a server to fetch data,
QUIC allows multiple streams over one UDP pipeline: there is only one UDP connection
for transport, whereas with TCP multiple connections are established.
Figure 1: All QUIC handshakes [42]
Figure 2: QUIC header format [16]
QUIC also uses a Forward Error Correction (FEC) mechanism that enables the recovery
of lost packets without resorting to re-transmission. QUIC achieves this by supplementing
a group of packets with an FEC packet, similarly to RAID (Redundant Array of
Independent Disks), where the FEC packet acts as parity for the packets in the FEC
group [40]. In the event of packet loss, the content of the lost packet can be reconstructed
using the FEC packet and the remaining packets in the group. FEC packets may or may
not be used, according to the sender's preference.
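The parity idea above can be sketched in a few lines. The packet contents and group size are invented for the demo, and real FEC framing differs in detail:

```python
# Sketch of XOR-based parity recovery in the spirit of the FEC mechanism
# described above (and of RAID parity). Packets in a group are assumed to
# have equal length; contents here are illustrative.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_fec_packet(group: list[bytes]) -> bytes:
    """Parity packet: XOR of all packets in the FEC group."""
    parity = group[0]
    for pkt in group[1:]:
        parity = xor_bytes(parity, pkt)
    return parity

def recover(received: list[bytes], parity: bytes) -> bytes:
    """Reconstruct the single missing packet from the survivors and parity."""
    missing = parity
    for pkt in received:
        missing = xor_bytes(missing, pkt)
    return missing

group = [b"pkt1data", b"pkt2data", b"pkt3data"]
fec = make_fec_packet(group)
# Suppose the second packet is lost in transit:
restored = recover([group[0], group[2]], fec)
assert restored == b"pkt2data"
```

Note that one parity packet can only repair a single loss per group, which is why FEC complements rather than replaces retransmission.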
QUIC supports connection migration by keeping Connection ID the same even when a
parameter in the 5-tuple is changed, without ending a connection in case of changes.
QUIC provides built-in security: with TCP, security requires TLS, which imposes an
additional header on top of TCP as well as an extra TLS handshake. In QUIC, the
cryptographic handshake is part of the transport handshake, in which endpoints exchange
cryptographic parameters. As said before, this mechanism reduces the number of RTTs
needed, going from 3-4 RTTs for TCP+TLS to a single RTT [42].
Figure 3: Comparison between HTTP versions using TCP and HTTP/3 using QUIC [22]

Regarding head-of-line blocking, QUIC works on top of UDP, allowing the network stack
to treat QUIC packets as individual datagrams without enforcing any specific order for
their delivery. QUIC receives these datagrams and, if supported by the implementation,
can deliver them to the application layer in any order required. This approach ensures
that packets are promptly received by the application as soon as they are pushed by
the network stack, effectively addressing the head-of-line blocking problem commonly
associated with TCP.
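The per-stream independence described above can be illustrated with a toy reassembler. The frame layout here is invented for the demo and is not the QUIC wire format:

```python
# Toy illustration of why independent streams avoid TCP-style head-of-line
# blocking: a gap on one stream delays only that stream's delivery, while
# other streams keep flowing to the application.

from collections import defaultdict

class StreamReassembler:
    def __init__(self):
        self.buffers = defaultdict(dict)       # stream_id -> {offset: data}
        self.delivered = defaultdict(bytes)    # stream_id -> delivered bytes
        self.next_offset = defaultdict(int)

    def on_frame(self, stream_id: int, offset: int, data: bytes):
        self.buffers[stream_id][offset] = data
        # Deliver contiguous data for this stream only; a gap here never
        # blocks delivery on other streams.
        while self.next_offset[stream_id] in self.buffers[stream_id]:
            chunk = self.buffers[stream_id].pop(self.next_offset[stream_id])
            self.delivered[stream_id] += chunk
            self.next_offset[stream_id] += len(chunk)

r = StreamReassembler()
r.on_frame(0, 0, b"AA")
r.on_frame(4, 2, b"bb")   # stream 4 has a gap at offset 0 -> buffered
r.on_frame(0, 2, b"AA")   # stream 0 keeps progressing despite stream 4's gap
assert r.delivered[0] == b"AAAA" and r.delivered[4] == b""
r.on_frame(4, 0, b"bb")   # retransmission fills the gap
assert r.delivered[4] == b"bbbb"
```

With a single TCP byte stream, the gap on "stream 4" would have stalled everything behind it; here only stream 4's own data waits.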
Finally, another inherent feature of QUIC is pacing [35], used to regulate the rate at
which packets are transmitted, ensuring a controlled flow of data. QUIC’s pacing mech-
anism is designed to improve congestion control, reduce packet loss, and enhance overall
network performance. The pacing mechanism in QUIC allows for a more granular control
over packet transmission, enabling a smoother and more efficient utilization of available
network capacity. It helps prevent bursts of packets that may lead to congestion and
provides a more consistent and controlled transmission rate.
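The pacing behavior described above can be sketched with the rate formula suggested by the QUIC recovery specification (RFC 9002): the congestion window is spread over a smoothed RTT, scaled by a gain slightly above 1. The window, RTT and packet size below are assumed values:

```python
# Pacing interval sketch following RFC 9002 (sec. 7.7): the sending rate is
# N * cwnd / smoothed_rtt, so consecutive packets are spaced by the time one
# packet takes at that rate, instead of being sent in a single burst.

def pacing_interval_ms(smoothed_rtt_ms: float, cwnd_bytes: int,
                       packet_bytes: int, gain: float = 1.25) -> float:
    """Time to wait between sending two packets of `packet_bytes` bytes."""
    rate_bytes_per_ms = gain * cwnd_bytes / smoothed_rtt_ms
    return packet_bytes / rate_bytes_per_ms

# Example: 50 ms RTT, 48 kB congestion window, 1200-byte packets (assumed).
interval = pacing_interval_ms(50.0, 48_000, 1200)
print(round(interval, 3))  # 1.0 ms between packets
```

Spacing packets roughly 1 ms apart, rather than emitting the whole window at once, is exactly the burst-avoidance behavior the paragraph describes.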
In order to further extend the future usability of the protocol and add performance
improvements, several extensions have also been introduced to enhance existing features
or provide new capabilities within QUIC as well as HTTP/3. The most relevant ones for
the goals of this thesis will be discussed in the next paragraphs.
2.1.3 Proxying in HTTP/1.1 and HTTP/2
Proxies are very important devices in Web communication as they enable various net-
work optimizations, enhance security, and facilitate the efficient distribution of network
resources [20]. When using a forward proxy, a client establishes an end-to-end tunnel to
a target server with the aid of a proxy server, allowing multiple clients to route traffic
to an external network. A reverse proxy, on the other hand, routes traffic on behalf of
multiple servers, for example the server sits behind the firewall in a private network and
directs client requests to the appropriate backend server. Proxying is also possible and
very common in TCP. One of the many TCP proxying options is the HTTP CONNECT
method, which transforms HTTPS connections into opaque byte streams. This method
can be used to establish an end-to-end TCP tunnel to a target server via a proxy server.
The way HTTP CONNECT acts and its result depend on the HTTP version used. In
HTTP/1.1, a client sends a CONNECT request to the proxy server, asking it to open a
TCP connection to the target server on the desired port. If the connection is opened
successfully, a tunnel is established and two independent TCP connections exist.
An HTTP CONNECT request looks like this [32]:
CONNECT server.example.com:80 HTTP/1.1
Host: server.example.com:80
HTTP/2 [59], on the other hand, introduces streams on top of TCP: independent,
bidirectional sequences of frames exchanged between a client and a server within a single
connection. Multiple open streams can operate concurrently in a single connection.
Each stream has an identifier, carried in frames. Streams allow frame multiplexing:
frames from multiple streams are combined and transmitted over a single byte stream
within a TCP connection. This means that an HTTP/2 CONNECT request converts a
stream into an end-to-end tunnel. An HTTP/2 CONNECT request has this form [59]:
:method = CONNECT
:authority = target.example.com:443
It’s important to note that, in HTTP/1.1, the TCP packets themselves are not tunneled,
but rather the data on the logical byte stream. In HTTP/2, DATA frames sent by a
client are put into TCP packets and forwarded to the target server.
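The HTTP/1.1 request shown above can be produced programmatically. A minimal sketch follows; the host and port are the example values from the text, and actually sending the bytes to a proxy socket is omitted so the sketch stays self-contained:

```python
# Builder for the HTTP/1.1 CONNECT request shown earlier. A client would
# send these bytes over a TCP socket to the proxy and, on a "HTTP/1.1 200"
# reply, treat the same socket as an opaque byte stream (typically then
# wrapping it in TLS for HTTPS).

def build_connect_request(host: str, port: int) -> bytes:
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        f"\r\n"
    ).encode("ascii")

req = build_connect_request("server.example.com", 80)
print(req.decode())
```

The request-target and Host header carry the same authority form, per the CONNECT semantics; everything after the proxy's success response is tunneled verbatim.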
2.2 Problems related to proxying
2.2.1 Protocol ossification
One very common problem with proxying, present especially in TCP, is protocol ossifica-
tion [30]. When middleboxes are deployed in a network, they need to inspect network
protocols and determine which traffic is acceptable and which is not. Ossification happens
because these devices were deployed against older versions of protocols, with a certain
feature set, so introducing new features or changes in behavior risks being considered bad
or illegal by middleboxes. This phenomenon is also encouraged by the fact that network
operators and manufacturers aim for interoperability across different devices and vendors;
to achieve this, they adhere strictly to standardized protocol specifications, making it
difficult to deviate from those specifications or introduce new features. The result can be
traffic being dropped or delayed because it is seen as potentially malicious or unknown.
Some middleboxes might also implement feature-disabling policies or similar
impairments [30].
In the context of QUIC, traditional HTTP proxies have been designed to support TCP.
HTTP provides the CONNECT method for creating a TCP tunnel to a HTTP proxy, but
lacks a method for performing the same operation with UDP. Considering that HTTP/3
operates over QUIC, which in turn operates over UDP, it is necessary to define a way to
create a UDP tunnel to a server acting as a proxy over HTTP.
2.2.2 Proxying in HTTP/3 with QUIC
QUIC carries application data on streams, unidirectional or bidirectional channels identi-
fied by a unique stream ID that is assigned by the sender. Applications that use streams
must define how they are used. The payload of QUIC packets, after removing packet
protection, consists of a sequence of complete frames (though some packet types do not
include frames). In particular, a STREAM frame is a type of frame that is used to carry
data between the sender and receiver over a particular stream. Unlike HTTP, in which
large frames might span multiple TLS records or TCP segments on the wire, QUIC does
not allow fragmentation: a QUIC packet must fit entirely within a UDP datagram.
Furthermore, there are two more key differences in terms of loss scenarios:
first of all, QUIC lets implementations decide how rescheduling of packetization of lost
data is performed and secondly, multiplexed streams are independent from each other,
since STREAM frames might belong to different stream identifiers. HTTP/3 [28], as
QUIC requires, defines how streams are used, while the QUIC layer handles segmenting
HTTP/3 frames over STREAM frames for sending in packets. HTTP/3 also provides
the CONNECT method that works similarly to HTTP/2 CONNECT described earlier,
but it is specifically designed to work with TCP-based connections.
Given the stream-based mode of operation and the structure of QUIC as a transport
protocol built on top of UDP, a proxy, when it receives a QUIC packet from a client,
needs to decapsulate its UDP payload in order to forward it to a target server in the form
of a UDP datagram. Moreover, the proxy needs to know where to send that UDP
datagram and must follow a standard procedure to open a UDP association, for
example a UDP tunnel, to a target server [21].
In normal stream mode, QUIC packets are encrypted and encapsulated within other
packets, making it difficult for middleboxes to inspect the traffic and apply policy rules.
Middleboxes that are not designed to work with QUIC may treat the traffic as unknown
or potentially malicious, and either block it or apply suboptimal policies, leading to
performance degradation or connection failures.
2.3 Support to QUIC proxying
2.3.1 QUIC built-in encryption
One effective way to fight ossification is to maximize encryption for the communica-
tion, aiming to minimize the visibility of the protocol passing through middle-boxes [16].
Some of QUIC’s built-in features limit this problem [42, 16, 39]. As said previously, QUIC
encrypts the entire communication, including the protocol headers, preventing middle-
boxes from inspecting or modifying the protocol-specific information. This reduces the
reliance on middleboxes that may have been designed for specific protocol versions or
features. Furthermore, QUIC supports version negotiation mechanisms that allow clients
and servers to negotiate the protocol version and capabilities during the initial hand-
shake. This enables the graceful introduction of new protocol versions and features while
maintaining backward compatibility with older versions.
However, QUIC built-in encryption is a double-edged sword in this context. Since QUIC
encrypts the entire communication, including the protocol headers, it can limit the ability
of proxies and middleboxes to inspect or modify the protocol-specific information. It pre-
vents these devices from performing deep packet inspection or applying protocol-specific
optimizations, which they may have been designed for, leading to some performance
impacts.
Figure 4: QUIC encryption as opposed to TCP [15]
2.3.2 DATAGRAM extension
To address the first problem with QUIC stream mode, the unreliable DATAGRAM
extension [45] has recently been added to QUIC, providing application protocols running
over QUIC with a mechanism to send unreliable data while leveraging QUIC's security
and congestion control properties. This extension introduces a new DATAGRAM mode,
in addition to the traditional stream mode, so that an application using both a reliable
stream and an unreliable datagram flow to the same peer can benefit from sharing a
single handshake and authentication context between the reliable QUIC stream and the
flow of unreliable QUIC datagrams. DATAGRAM frames are individual messages, unlike
a long QUIC stream, and do not contain a multiplexing identifier. They are subject to
congestion control but, when a loss occurs, they are not retransmitted. Applications can
nevertheless define identifiers to multiplex different kinds of datagrams or flows of
datagrams. All application data transmitted in DATAGRAM frames, like that in
STREAM frames, must be protected by either 0-RTT or 1-RTT keys, providing the same
security guarantees described in the QUIC specification.
DATAGRAM mode is designed to work better with middleboxes than stream mode be-
cause it uses a simpler packet format that is more easily understood by network devices
such as firewalls, NATs, and load balancers. Just like UDP, DATAGRAM mode can also
be useful for optimizing streaming, gaming, and other real-time network applications.
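The application-defined multiplexing mentioned above can be sketched as follows. The flow-ID prefix framing is illustrative rather than the MASQUE wire format, though the variable-length integer encoding follows RFC 9000:

```python
# Toy demultiplexer for application-defined datagram flows: each unreliable
# datagram is prefixed with a flow ID encoded as a QUIC-style variable-length
# integer (RFC 9000, sec. 16), and the receiver sorts payloads per flow.

def encode_varint(v: int) -> bytes:
    # Handles values up to 2**30 - 1, enough for this sketch.
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (0x4000 | v).to_bytes(2, "big")
    return (0x80000000 | v).to_bytes(4, "big")

def decode_varint(buf: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed); length is in the top two bits."""
    length = 1 << (buf[0] >> 6)
    value = int.from_bytes(buf[:length], "big") & ~(0xC0 << (8 * (length - 1)))
    return value, length

def demux(datagrams: list[bytes]) -> dict[int, list[bytes]]:
    flows: dict[int, list[bytes]] = {}
    for dgram in datagrams:
        flow_id, consumed = decode_varint(dgram)
        flows.setdefault(flow_id, []).append(dgram[consumed:])
    return flows

wire = [encode_varint(0) + b"dns-query", encode_varint(2) + b"game-state"]
flows = demux(wire)
assert flows[0] == [b"dns-query"] and flows[2] == [b"game-state"]
```

Because the identifier lives in the datagram payload rather than in the frame itself, each application is free to choose its own multiplexing scheme, which is exactly what MASQUE later standardizes.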
2.3.3 MASQUE
With the introduction of the DATAGRAM mode, QUIC provides the necessary means to
support proxying goals with the STREAM and DATAGRAM primitives, but the way they
are used is the responsibility of the application layer. The MASQUE (Multiplexed Appli-
cation Substrate over QUIC Encryption) working group [7] has the goal and responsibility
to define how an application establishes an end-to-end tunnel, providing instructions to
a proxy server regarding the destination where to send UDP datagrams and where to
receive them from. MASQUE first provided a document describing HTTP Datagrams
[54], a convention for conveying multiplexed, potentially unreliable datagrams inside an
HTTP connection, allowing HTTP/3 to work with QUIC DATAGRAMs, then defined
CONNECT-UDP [51], a new kind of HTTP request that initiates a UDP socket to a
target server. This method was first defined for HTTP/2, then it was ported to HTTP/3
[34]. A client sends an extended CONNECT request to a proxy server, which identifies a
target server in the path section of the pseudo-header. If the proxy succeeds in opening
a UDP socket, it responds with a 2xx (Successful) status code. After this, an end-to-end
flow of unreliable messages between the client and target is possible; the client and proxy
exchange QUIC DATAGRAM frames with an encapsulated payload, and the proxy and
target exchange UDP datagrams bearing that payload. A CONNECT-UDP request looks
like this [21, 51]:
:method = CONNECT
:protocol = connect-udp
:scheme = https
:path = /target.example.com/443/
:authority = proxy.example.com
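For illustration, the target host and port can be recovered from the :path template shown above. The helper below is a hypothetical sketch, not code from any of the implementations discussed:

```python
def parse_connect_udp_path(path: str) -> tuple[str, int]:
    """Extract the target host and port from a CONNECT-UDP :path of the
    form '/<target_host>/<target_port>/', as in the request above."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"unexpected CONNECT-UDP path: {path!r}")
    host, port = parts[0], int(parts[1])
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return host, port

host, port = parse_connect_udp_path("/target.example.com/443/")
# host == "target.example.com", port == 443
```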
With the CONNECT-UDP method, the client asks the proxy to establish a UDP association
with the specified target. In the case of HTTP/3, QUIC DATAGRAM frames are employed,
enabling a proxied, unreliable exchange between the client and the server. This
mechanism allows connections to multiple servers to be carried within the same HTTP/3
connection between client and proxy. The multiplexing and de-multiplexing of these
connections are performed using the Datagram-Flow-Id, an identifier that links a
datagram flow to an HTTP message. Flows are conceptually similar to streams, but they
provide neither ordering nor reliability [50, 39].
Figure 5: MASQUE scheme after a successful CONNECT-UDP request [21]
The DATAGRAM frame carries a Quarter Stream ID field [53], whose value is the ID of
the associated client-initiated bidirectional stream divided by four, followed by the
payload. A Context ID [51] is placed directly after the Quarter Stream ID field and is
set to zero for UDP packets encoded using HTTP Datagrams. As said previously,
fragmentation is prohibited in QUIC [36], and QUIC DATAGRAM frames have a limited
capacity determined by the QUIC connection configuration and the Path MTU. From this
limit, it is also necessary to subtract the overhead of the UDP datagram header, the
QUIC packet header, and the QUIC DATAGRAM frame header, resulting in tunneled
messages limited to between 1,200 and 1,300 bytes [21].
Figure 6: A QUIC packet encapsulated into a UDP datagram [21]
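The encapsulation just described can be sketched as follows. `http_datagram_payload` is a hypothetical helper that builds the payload of a DATAGRAM frame using QUIC variable-length integers (RFC 9000), assuming Context ID 0 as used for plain UDP payloads:

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 1 << 30:
        return (v | 0x8000_0000).to_bytes(4, "big")
    if v < 1 << 62:
        return (v | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("value too large for a varint")

def http_datagram_payload(stream_id: int, udp_payload: bytes) -> bytes:
    # Client-initiated bidirectional stream IDs are multiples of 4,
    # so the Quarter Stream ID is simply stream_id // 4.
    quarter_stream_id = stream_id // 4
    context_id = 0  # zero marks a plain UDP payload
    return (encode_varint(quarter_stream_id)
            + encode_varint(context_id)
            + udp_payload)

# A CONNECT-UDP request sent on stream 0 yields a single zero byte
# as the Quarter Stream ID prefix:
frame = http_datagram_payload(0, b"hello")  # b"\x00\x00hello"
```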
With MASQUE, it is also possible to implement nested tunneling [46, 52]. Suppose we
have a client, a server and two proxies: first, a QUIC connection is established
between the client and the first proxy. Then, another QUIC connection is opened
between the client and the second proxy, running through a CONNECT tunnel in the first
connection. Finally, an end-to-end byte stream is established between the client and
the target server, running via a CONNECT tunnel in the second connection. A TCP connection exists
between the second proxy and the target server. When a proxy receives a CONNECT-
UDP request, it can establish a connection either with the specified target or with an
upstream proxy. The proxy can make a decision based on its configuration and routing
rules to determine the appropriate course of action for handling the CONNECT-UDP
request [51].
2.3.4 MASQUE objectives and guarantees
In summary, the DATAGRAM extension in QUIC allows the transport of unreliable and
unordered datagrams within a QUIC connection. MASQUE defines how applications
can establish end-to-end tunnels over QUIC using the CONNECT-UDP method, which
instructs a proxy to establish a UDP connection. In this way:
• The client sends a CONNECT-UDP request to the proxy, specifying the target
URI.
• The proxy can either directly connect to the target or forward the request to an
upstream proxy.
Figure 7: An example of nested MASQUE proxying in iCloud Private Relay [21]
• If the proxy establishes a direct connection, it uses the DATAGRAM extension to
transport unreliable datagrams.
• If the proxy forwards the request, the same process continues with the upstream
proxy.
The usage of a MASQUE proxy should provide several privacy guarantees [52]. In
particular, user agents running their traffic through a MASQUE proxy have their IP
address hidden from the target server, which only sees the proxy address.
Furthermore, obfuscation provided by the usage of MASQUE should prevent network
providers from collecting the data obtained from the network traffic in ways that go
against the user’s intentions. To further increase client privacy towards the target server,
a MASQUE proxy can also perform address translation or, when the proxy is closer to the
client, DNS resolution. Nested tunneling, like the one used by iCloud Private Relay [17],
provides enhanced privacy by nesting MASQUE tunnels, allowing traffic to pass through
multiple MASQUE proxies to prevent user data correlation. Finally, network observers
able to inspect the unencrypted bits of the QUIC connection would not be able to
detect the usage of MASQUE, since the visible data appears identical to that of a
typical web browsing connection. In other words, the MASQUE proxy would be seen as a
regular Web server.
Having listed and discussed the MASQUE value proposition, the objective of this thesis
is to evaluate MASQUE performance in various network scenarios. In this way, it will
be possible to assess the performance cost of its adoption and discuss the contexts that
might find its usage beneficial.
3 Related work
3.1 MASQUE available implementations
iCloud Private Relay [17] is a first implementation of QUIC proxying using MASQUE, in
particular with nested tunneling. Suppose a client connected to a network communicates
with a server: two devices, called the Ingress Relay and the Egress Relay, sit between
them. The network and the Ingress Relay can see the client IP address, but the server
name is encrypted and therefore not visible to them. This encrypted data is sent from
the Ingress to the Egress Relay (provided, for example, by Cloudflare [19] or Akamai),
which forwards the traffic to the target server. Similarly, the Egress Relay only
knows that the sender uses iCloud Private Relay, but not its IP address. MASQUE is
also implemented in Google's QUIC stack, residing in the Google quiche library [26],
which powers Chromium as well as Google's QUIC servers and some other
projects. It provides a simple MASQUE client
and proxy to be used with the already existing QUIC server tool. Other implementations
also exist on GitHub, such as Masquerade [62], written in Rust and based on Cloudflare
quiche library [4], and masque-go [56], written in Go and based on the quic-go library
[8].
3.2 QUIC available implementations
There is currently a wide range of QUIC implementations [23] in the form of libraries
and frameworks written in many programming languages. The most important ones
include Microsoft’s msquic [24], lsquic [58], more focused on speed, Cloudflare’s quiche
[4], Google quiche [26], currently used in Chromium and Envoy, quic-go and Mozilla’s
Neqo [25]. Moreover, many popular web servers, such as NGINX and Apache, have
added QUIC support in their implementations, as well as the command-line utility curl
in its development build. Microsoft recently introduced SMB (Server Message Block)
over QUIC [44] in order to provide an alternative to the TCP network transport to its
own file-sharing protocol.
3.3 State of the art and literature review
The following paragraphs will provide an overview of the state of the art of QUIC and
MASQUE, with a focus on benchmarks and performance analyses.
3.3.1 MASQUE
Towards a tectonic traffic shift? Investigating Apple’s new relay network
This paper [48] aims to provide researchers and network operators with an analysis of
iCloud Private Relay, including its goals, architecture, and behavior. By examining the
system, the paper aims to offer valuable insights and expectations for researchers and
network operators as the service gains popularity.
Differently from this thesis, this paper does not conduct any performance assessment,
but it rather focuses on providing an overview of iCloud Private Relay and its effect on
the Internet.
Goals:
• Analysing iCloud Private Relay from a networking point of view, such as IP address
providers, geographical distribution and interaction between Ingress and Egress
Relays
• Assessing privacy and security implications and enhancements introduced
Measurements and setup:
• Local DNS server with the ECS extension, local Web Server, MacBook Pro with
PR enabled and RIPE Atlas measurement platform
• Collection of Ingress Relay IPv4 addresses using A-type DNS queries with the ECS
extension to the Authoritative Name Server, to obtain an IP address close to the
requester
• Collection of Ingress Relay IPv6 addresses using AAAA measurements on RIPE
Atlas
• Scans using curl, ipecho and a local DNS resolver to assess the relationship between
Ingress and Egress Relays
Results:
• Most Ingress IP addresses are provided by Akamai and only a small part by
Apple
• Egress nodes are provided by Akamai, Fastly and Cloudflare and are distributed
mostly in the USA
• IP addresses are periodically changed for each client, confirming Apple's privacy
promises
Measuring the Performance of iCloud Private Relay
In this work [61], the authors focus on analyzing the impact of iCloud Private Relay
on web performance, assessing Apple's claims, in particular the statement that,
although iCloud Private Relay can negatively affect web speed tests, which use several
simultaneous connections to deliver the highest possible result, the actual browsing
experience remains fast. In fact, the architecture of the service can introduce
computational and routing overheads, as well as performance impairments.
The goals and the methodology of this work are very similar to the ones this thesis is
based on. On the other hand, this work focused more on the transfer time measurements
and its methodology heavily relied on traffic control.
Goals:
• Performance assessment of iCloud Private Relay in different scenarios and from
different countries (Italy, France and Hawaii)
• Collection of Quality of Experience metrics such as throughput, download time and
page loading time
Measurements and setup:
• MacBook Pro with macOS Monterey located in Trieste, Lyon and Hawaii and
connected to a Gbit/s university Ethernet network. Automated testing using scripts
that enable and disable PR as needed
• Throughput measurements using Ookla Speedtest with Selenium for automating
tasks such as accepting the cookie policy and pressing the start button, running
100 tests with PR enabled and disabled
• Transfer time measurements downloading a 1GB test file from Hetzner using curl,
running 200 tests with PR enabled and disabled
• Page load time measurements, collecting the onLoad parameter with and without
PR, visiting 100 most popular websites 5 times each
Results:
• Performance degradation with iCloud Private Relay enabled due to different traffic
paths introduced by PR, observing a 60% increase in page load time in some cases.
• Generally unstable throughput whose value strongly depends on the Egress nodes
chosen.
• Performance impairments also occur in cases where a single connection is used to
download a large file, thus questioning the claim that several simultaneous connec-
tions are the root cause of performance penalties.
Evaluation of QUIC-based MASQUE proxying
In this paper [41], the authors investigate impacts on end-to-end QUIC performance when
using a MASQUE-based tunnel setup. Specifically, this analysis focuses on the compari-
son between reliable and unreliable transmission of packets in the tunnel connection. It
also explores the effects of nested congestion control between the inner connection and
the outer tunnel connection.
Almost the same traffic control and containerization techniques employed in this work
have been used in this thesis. Moreover, while this paper sets less complex network
conditions, it deals with different protocol configurations, such as congestion control
algorithms and packet size.
Goals:
• Performance assessment of QUIC using MASQUE tunneling, comparing end-to-end
and tunneling scenarios, both in stream mode and DATAGRAM mode
• Analysis of the nested congestion control
Measurements and setup:
• Modified version of aioquic to enable MASQUE and allow packet size setting and
pluggable congestion control. Docker for containerization of client, server and proxy
and tc for traffic control with defined delay, buffer depth and throughput
• Two different packet sizes for end-to-end mode and tunneled mode, 1380 bytes and
1280 bytes respectively
• 25 HTTP/3 requests to the server with specified payload size, with a preliminary
CONNECT-UDP request for the tunneled case and a simple redirection using ipt-
ables in the proxy for the end-to-end case
• Transmission times and byte overhead registered in qlog files
• Fixed one-way delay of 25 ms between proxy and server, variable delay, from 1 ms to
50 ms, between client and proxy. A 10 MB message is used to measure the transmission
time
• Three congestion control scenarios in the proxy: Reno and Cubic in DATAGRAM
and stream mode, disabled congestion control in DATAGRAM mode, Reno in end-
to-end mode.
Results:
• DATAGRAM mode has smaller transfer times when packets are bigger, but a larger
overhead when using more than one connection in download. Stream mode shows a
performance degradation when the packet size exceeds the MTU; its packet loss is
higher, as is the additional overhead due to more connections, but transfer times are
better when using parallel connections
• DATAGRAM mode also has smaller transfer times, no matter which CC algorithm is
used. In DATAGRAM mode, Reno has better transfer times when the one-way delay is
larger, while Cubic has longer transfer times when used in stream mode
Real-time Emulation of MASQUE-based QUIC Proxying in LTE networks
using ns-3
This paper [49] introduces a new net device within the ns-3 open source network sim-
ulator, which enables end-to-end real-time emulation of LTE networks with actual end-
points. The rationale, design, and prototype implementation of this novel net device are
presented. Subsequently, the performance evaluation of a QUIC proxy implemented on
MASQUE is demonstrated using the emulated LTE setup.
Similarly to this paper, this thesis categorizes measurement campaigns based on the
bandwidth limit. On the other hand, it does not analyse LTE networks specifically, but
focuses on emulating network conditions that could resemble several network types.
Goals:
• Performance analysis of QUIC with MASQUE in LTE networks, emulated using
ns-3 in high throughput and low throughput scenarios
Measurements and setup:
• ns-3 discrete-event network simulator with the implementation of a novel module
that combines two existing modules, respectively for connecting to an emulated LTE
network and for reading and writing traffic using file descriptors. Four Docker
containers for the client, proxy, server and the emulated LTE network. The
communication with the proxy is implemented with a device called RightNode.
• The client packet has a fixed path: from eNodeB to SGW and then to PGW. When
it arrives at the RightNode, its IP address is changed to the User Equipment one.
• RightNode, eNodeB and User Equipment are arranged as a right triangle, spaced
99 m apart
• High and low throughput measurements are performed, based on the RB size, and 10
tests consisting of downloading a 10 MB file are executed in three scenarios: DNAT
forwarding (no MASQUE), DATAGRAM mode MASQUE and stream mode MASQUE
Results:
• Low throughput: 31% larger transfer times in stream mode, 4.5% larger transfer
times in DATAGRAM mode, smaller RTT than DNAT
• High throughput: 19% larger transfer times in stream mode, 6% larger transfer
times in DATAGRAM mode, generally higher RTT
3.3.2 QUIC
Evaluating QUIC Performance over Web, Cloud Storage and Video Work-
loads
This paper [57] assesses the performance of QUIC in various workloads, including web,
cloud storage, and video applications. It compares the performance of QUIC to that of
traditional TLS/TCP protocols.
Goals:
• Assessing the performance of QUIC on web, cloud storage and video workloads, also
comparing different QUIC versions
Measurements and setup:
• Four active measurement tests written in C: two for each scenario (TLS and QUIC),
using respectively libcurl and lsquic, that connect to web servers, and two for video
downloads and video streaming, mimicking the playout of YouTube on the command
line.
• Three-year-long measurements from different vantage points: an educational network,
a high-bandwidth low-RTT residential link in Germany and a low-bandwidth high-RTT
residential link in India.
• Collection of several metrics targeting almost 6K websites supporting QUIC,
extracted from the Alexa Top-1M, and comparing several versions of Google QUIC, also
called gQUIC (Q050, Q046, Q044, Q043, Q039 and Q035). These metrics include
connection time, download time, DNS lookup, protocol version and time to first byte
(TTFB)
• Throughput measurements by uploading files of different sizes (namely 1 KB to 2
GB) to Google Drive and repeatedly downloading them using the tests with various
QUIC versions and TLS 1.2 over TCP.
• Connection establishment times, achievable throughput and CPU utilization mea-
surement in video downloading scenarios.
• CPU utilization measurements when downloading large files from Google Drive.
Results:
• For web workloads, QUIC demonstrates faster handshake times compared to TLS
1.2 and 1.3 over both IPv4 and IPv6. In particular, the IETF QUIC version exhibits
approximately 50% lower latency than gQUIC versions in half of the samples over
IPv6. Among the gQUIC versions tested, Q050 performs the best in terms of
latency. However, as the connection state prolongs, the latency benefits of QUIC
diminish.
• For cloud storage workloads, QUIC has a higher mean throughput for small file
sizes. However, for larger file sizes ranging from over 20 MB to 2 GB, TLS/TCP
performs better. This is because for smaller files, the connection times and time
to first byte (TTFB) have a greater impact on the total download time, but this
advantage diminishes as the file size increases. QUIC also exhibits high CPU usage,
because of certain in-kernel optimizations missing for UDP flows.
• For video download, QUIC reduces connection times by 550 ms in India (410 ms
in Germany) compared to TLS 1.2 for half of the samples. In terms of overall
download rate, TLS 1.2 performs better than QUIC.
• For video streaming, TLS/TCP experiences longer startup delays compared to
QUIC. This performance gap widens in networks with higher packet loss. De-
spite having a lower overall download rate, QUIC provides better video content
delivery with reduced stall events and shorter stall durations. This improvement
is attributed to QUIC’s reduced latency overheads and more efficient loss recovery
mechanism.
How quick is QUIC?
The authors of this paper [43] present a comprehensive study of the performance of
QUIC, SPDY and HTTP, particularly of how they affect page load time, conducted at a
time when QUIC was a very recent protocol.
Goals:
• Present the performance of QUIC in a wide range of live network scenarios, starting
from Google's early research results and comparing it to HTTP and SPDY
Measurements and setup:
• Client Ubuntu laptop with the Chrome browser and browser caching disabled to
download the content of websites in HAR format. Server hosting four different pages
from Google Sites, with either small (400 B-8 KB) or large (128 KB) objects and either
a small or a large number (50) of objects. Shaper server with tc between client and
server to emulate network conditions.
• Page load time measurements at 2 Mbps, 10 Mbps and 50 Mbps, with either no
loss or 2% loss in upstream and downstream and with an additional delay of either
zero or 100ms.
Results:
• QUIC performs poorly at very high bandwidth when large amounts of data need to be
downloaded, but it performs very well compared to HTTP and SPDY at high RTT values,
especially when the bandwidth is low
• Small object size favors QUIC and SPDY over HTTP due to multiplexing
Measuring HTTP/3: Adoption and Performance
After the final standardization of HTTP/3, built on top of QUIC, the aim of the
authors [60] is to assess the adoption and performance of HTTP/3 in various network
scenarios, also with respect to the previous versions. The paper also evaluates the
usage by websites of third-party servers still running HTTP/2.
Goals:
• Running a first comprehensive large-scale measurement study on the adoption and
performance of HTTP/3.
Measurements and setup:
• Starting list with more than 14k websites with HTTP/3 support. Two high-end
servers connected to the Internet via a university 1 Gbit/s Ethernet.
• Network configuration using tc with four different configurations for each of the
following three parameters: extra latency, extra packet loss and bandwidth limit
• Automated website visiting using Google Chrome and BrowserTime with three
HTTP versions: HTTP/1.1, HTTP/2 and HTTP/3. For each network configura-
tion, four scenarios are considered, based on the HTTP version support: three for
each version and one for all of them.
• Collection, via BrowserTime, of the onLoad and SpeedIndex metrics, corresponding
respectively to the time in which all elements of the page have been downloaded
and parsed and the time at which visible portions of the page are displayed.
• More than 2.5 million visits over a period of one month
Results:
• Almost all the HTTP/3-supporting websites are hosted by Google, Facebook and
Cloudflare, though most web page objects are still hosted on third-party servers
without HTTP/3 support
• Most performance benefits were achieved only in scenarios with high latency or
poor network bandwidth
• The performance improvements primarily depend on the infrastructure hosting the
website, potentially due to server-side optimizations. The number of connections
required to load objects also affects the benefits of the protocol (the fewer, the
better)
Same Standards, Different Decisions: A Study of QUIC and HTTP/3 Imple-
mentation Diversity
The main focus of this paper lies on the various QUIC implementations currently
available, delving into their features, functionality, and performance
characteristics.
Goals:
• Running a comparative analysis of several available QUIC implementations, exploring
protocol aspects that pose challenges for automated measurements and are expected to
exhibit significant heterogeneity across different implementations, rather than
interoperability.
Measurements and setup:
• Running each test at least 5 times on two different Belgian WAN networks: the
Hasselt University network (1 Gbps downlink/10 Mbps uplink) and a residential Wi-Fi
network (35 Mbps/2 Mbps).
• Usage of qlog and qvis to evaluate 15 active IETF QUIC implementations, and scripts
to automatically post-process such files. The quic-interop runner for client-side
behaviour analysis and aioquic for server-side behaviour analysis. No servers are run
manually; public Internet endpoints are used instead.
• Assessment of features such as flow control, congestion control, multiplexing, pri-
oritization and packetization and 0-RTT connection setup over a 4-month period.
Results:
• Only Cloudflare quiche uses autotuning to dynamically change the receive buffer
size using RTT estimates and the data consumption rate. Most implementations keep the
receive buffer unchanged, linearly increasing the maximum allowance; Google Chrome,
for example, simply sets an initial high allowance of 15 MB. In general, the absence
of better flow control schemes is due to the lack of fine-tuning of such schemes by
implementers, as well as memory limitations.
• With downloads bigger than 1 MB, 5 out of 13 implementations generate small DATA
frames (less than 100 kB), while 6 out of 13 prefer large ones (more than 1 MB). 3
stacks also dynamically change the size of the frames. Path MTU Discovery (PMTUD),
used to improve efficiency by increasing the packet size, is employed in 3
implementations, but the size is simply increased by adding PADDING frames, without
more sophisticated mechanisms.
• 9/18 stacks use a form of Round Robin for packet prioritization, while the
remaining ones prefer approaches like sequential prioritization or FIFO.
• Regarding congestion control, most stacks (11/13) adopt an initial congestion
window of 12 kB-15 kB; the remaining implementations choose a much bigger window of
more than 40 kB. Facebook tunes the size using machine learning algorithms. QUIC
pacing is present in only 8 implementations. ACK frequency varies between 1 and 10
packets in most implementations.
3.4 Related work discussion
All the papers mentioned in this chapter focused on assessing the operation and
performance of the QUIC transport protocol and the MASQUE extension, also with respect
to other state-of-the-art protocols, such as TCP. The methodologies and tools used in
these works were among the starting points of this thesis in choosing the approaches
and techniques to employ, especially when it comes to MASQUE. In fact, many of these
approaches included downloading files, emulating network conditions and simulating
the entities involved and their communication.
The reviewed MASQUE papers assess more aspects of the protocol, while setting fewer
or less varied network conditions. This thesis, on the other hand, is based on a
single setup, the download of an arbitrarily-sized file. In spite of this, some
specific network conditions vary within the same experiment, so as to span the
simulation of multiple communication types and geographical distances. Furthermore,
MASQUE and QUIC are also compared to TCP with TLS. The conclusions and insights
provided by the state of the art were useful for comparison with the results of this
thesis, serving as a reference point for interpreting and validating the results
obtained.
4 Testbed technologies
A dedicated testing environment has been developed in order to evaluate the perfor-
mance of MASQUE and compare it to other protocols under several simulated network
conditions.
4.1 Software technologies used
In order to achieve the objectives of this thesis, the following software technologies have
been used:
• Docker [18] for the containerization of the involved entities to be analyzed and to
ensure isolation and consistency across different environments
• Bash scripting to automate various tasks and streamline the execution of repetitive
or complex operations. It has also been used for file management, orchestration
and data manipulation tasks
• Linux networking stack for networking functionalities and configuration, as well as
tc [11] and tc-netem for traffic control and for simulating network conditions such
as additional delay or fixed bandwidth
• Wireshark [14] and TCPDump [12] for capturing traffic
• R programming language for data manipulation and plotting
• Several Linux utility tools, including datamash [6], time [13], sed [9] and awk [1]
for calculations and file manipulation
4.2 Tools used
The basic building blocks for the evaluation of MASQUE are the tools obtained by
manually compiling the Chromium source [3], provided by Google. From the output
folder, it is possible to run a MASQUE client (masque_client) and a MASQUE proxy
(masque_server) that can be used in combination with a generic QUIC server. Even
though Google provides a server as well, it was decided, for greater versatility and
customizability, to use a server from the Cloudflare quiche implementation
(quiche_server) [4], also for its compatibility with other clients. In fact, the same
server is also used for the QUIC end-to-end scenario, with the client being the QUIC
client provided by Google (quic_client). For TCP+TLS, the choices were curl, the
Twisted server [5], an HTTPS-enabled version of SimpleHTTPServer and the Squid proxy
[10].
4.3 Testing environment
An experimental testing environment has been implemented and set up for the network
emulation, test automation, post-processing and result saving and plotting. This envi-
ronment has been used to evaluate MASQUE and compare it to other protocols under
several network conditions. The tools used include Docker, Bash scripting, the Linux net-
working stack and traffic control techniques. All test cases described in this thesis consist
of a client requesting a file of variable size using the HTTP GET method. Network con-
ditions and characteristics are emulated using the Linux traffic control (tc) tool [11] and
they consist of artificial delays, resulting in an additional RTT, bandwidth limitation to a
fixed amount and a simulated packet loss. In particular, four different test categories are
considered and compared, two with proxies and two end-to-end: MASQUE, TCP+TLS
with proxy, QUIC, TCP+TLS without proxy. In each measurement, a client sends a
certain number of requests for a file to a server, either through a proxy or not, with
a set delay, a fixed bandwidth and, optionally, a packet loss. An experiment, in turn,
is a set of measurements with a starting delay that is increased for each measurement,
according to a step. The parameters of an experiment are the number of measurements,
the number of iterations (requests) per measurement, the starting delay, the step, the file
size, the bandwidth and finally the packet loss. An experiment is performed for a certain
category, so a complete test includes four experiments, one for each possible category.
After a test, the results are saved and several plots are drawn.
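The experiment structure described above can be sketched as producing one delay value per measurement, swept over the four test categories. The names below are illustrative only, since the actual scripts receive these values as parameters:

```python
# Hypothetical category names, one per test category described above.
CATEGORIES = ["masque", "tcp_tls_proxy", "quic", "tcp_tls_e2e"]

def experiment_delays(measurements: int, start_delay_ms: int, step_ms: int) -> list[int]:
    """One artificial delay per measurement, increased by a fixed step."""
    return [start_delay_ms + i * step_ms for i in range(measurements)]

# A complete test = one experiment per category, e.g. 5 measurements
# starting at a 10 ms delay with a 20 ms step:
plan = {cat: experiment_delays(5, 10, 20) for cat in CATEGORIES}
# each category is measured at delays [10, 30, 50, 70, 90] ms
```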
4.3.1 Docker-based emulation setup
The emulation setup consists of three Docker containers simulating a client, proxy and
server for the tunneled scenario and two Docker containers simulating a client and a
server for the end-to-end scenario. In any case, the server container hosts a sample
website with a very simple index page and a file of arbitrary size created using the
Linux dd command. The site data is stored in /dev/shm shared memory for greater
efficiency and speed. A user-defined bridge has been created for the communication
between containers, which share the same address space. The traffic control rules are
applied to the eth0 interface of each container for the outgoing traffic, depending on the
scenario considered. In particular, using tc-netem, the delay is symmetrically set on the
client and server interface in the tunneled case and on the client interface only in the
end-to-end cases, so that the total additional RTT is the same in all scenarios. On the
other hand, the bandwidth is set by applying a Token Bucket Filter (tbf) filter on the
client and server interface in all cases. As said, the delay changes incrementally for each
measurement, while the bandwidth limitation stays the same. The packet loss is applied
to each entity, but, like the delay, it is distributed so that the total loss is the same in
the two sets of protocols. Each Docker container has net_admin capabilities in order to
be able to apply traffic control rules, and server containers also have a dynamically set
shared memory size, based on the size of the file to host.
Figure 8: How the three containers communicate inside Docker
Inside the Docker containers, the rules are applied in the following way, for example with
a starting delay of 10ms and a bandwidth of 10Mbps:
tc qdisc add dev eth0 root handle 1: tbf rate 10mbit \
    burst 100kb limit 100kb
tc qdisc add dev eth0 parent 1:1 handle 10: netem delay 10ms
They first create a Token Bucket Filter (tbf) queueing discipline (qdisc) on the eth0
interface, setting the average rate at which traffic is allowed to pass through the
qdisc. Then they add a netem queueing discipline to the parent identified by "1:1",
introducing a delay to the outgoing traffic. Optionally, a packet loss can also be
added. In particular,
the Token Bucket Filter maintains a token bucket that represents the available tokens for
transmission. Tokens are generated at a fixed rate and added to the bucket over time.
When a packet needs to be transmitted, a corresponding number of tokens is required
from the bucket: sending a packet consumes tokens, and if there are insufficient
tokens, the packet is temporarily delayed or possibly dropped. The burst size determines
the maximum amount of data that can be sent at once without being delayed, while the
limit restricts the total number of tokens that can be accumulated in the bucket.
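When packet loss must also be emulated, it can be added to the same netem qdisc. A hedged sketch (the 0.5% value is only illustrative; `tc qdisc change` updates the qdisc created above and requires the same net_admin privileges):

```shell
# Update the existing netem qdisc to add random packet loss on top of the
# configured delay (run inside the container, as root / with net_admin):
tc qdisc change dev eth0 parent 1:1 handle 10: netem delay 10ms loss 0.5%
```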
4.3.2 Bash scripts
The scripts are divided into internal scripts and orchestration scripts.
The Docker emulation setup is regulated by three internal Bash scripts: one serves as the
entrypoint of the containers, while the other two are utility scripts. The scripts are as follows,
in bottom-up order:
• execute.sh: configures and runs an entity, which can be a client, a server, or a
proxy, in the QUIC, MASQUE or TCP+TLS (tunneled and end-to-end) scenarios,
for a total of seven possible entities. The configuration includes generating and
loading certificates for the server, creating an index.html file as well as a text file
of a specified size to be downloaded, and writing the needed headers on such files.
• cmd_timestat.sh: calls a specified command a certain number of times and mea-
sures its execution time using the Linux time command. Furthermore, it saves inter-
mediate results, logs any errors, and calculates and saves some useful statistics
from the collected times, including the mean, median, minimum, maximum and
quartiles. In case of a failed request, it repeats it without saving the elapsed time.
• run_measurements.sh: the Docker entrypoint. It first applies traffic control rules (de-
lay, bandwidth and packet loss), if any, on the specified entity and, if it is a server
or a proxy, it simply runs it using the execute.sh script. If it is a client, it runs
execute.sh through cmd_timestat.sh and saves the statistics in a csv
file. To sum up, this script runs a measurement consisting of a specific
number of requests for a certain scenario under certain traffic conditions.
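The core of such a timing loop can be sketched as follows. This is a hypothetical, simplified version (the function name and output file are ours, not the thesis script's): it uses the bash `time` builtin with `TIMEFORMAT` to capture the real time of each successful run, retrying failed runs without recording them.

```shell
#!/usr/bin/env bash
# Simplified sketch of a cmd_timestat.sh-style loop: run a command N times,
# record the real time of each successful run, repeat failed runs without
# saving their time, and finally print the mean.
export LC_ALL=C                    # ensure a dot as decimal separator

time_stat() {                      # usage: time_stat N command [args...]
    local n=$1; shift
    local i=0 t
    : > times.txt                  # one elapsed time per line
    while [ "$i" -lt "$n" ]; do
        TIMEFORMAT='%R'            # bash time builtin: real seconds only
        t=$( { time "$@" >/dev/null 2>&1; } 2>&1 )
        if [ $? -eq 0 ]; then      # keep only successful iterations
            echo "$t" >> times.txt
            i=$((i + 1))
        fi                         # failures are retried, time discarded
    done
    awk '{ s += $1 } END { printf "mean %.4f over %d runs\n", s / NR, NR }' times.txt
}

time_stat 3 sleep 0.05
```

A real script would additionally log the errors and derive the median, quartiles and percentiles from the collected times.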
Orchestration scripts are present on the host machine and they serve the purpose of
setting up and managing the tests using Docker. The scripts are as follows:
• docker_tests.sh: performs a specified number of measurements of a certain cat-
egory by running Docker containers: a client, a server and, depending on the
measurement category, possibly a proxy, each with a certain delay, bandwidth
limit and packet loss. The delay can be applied to any entity and is
updated according to a specified step for each measurement, for which the contain-
ers are stopped, removed and started again. At the end of each measurement, the
csv result files and error logs are copied from the container to the host machine
and further modified to include other useful information. When the measurements
are concluded, the files are merged.
• main_tests.sh: as the name suggests, the main script to launch in order to run
a full test. It creates a specific folder for the test, along with the four subfolders
for each category, which in turn contain the necessary subfolders for intermediate
iterations, measurements, experiments and errors. It then runs four experiments,
one for each scenario (MASQUE, TCP+TLS tunneled, QUIC and TCP+TLS end-
to-end) and then merges the results in two main csv files, one with the summary
of the results and one with all the intermediate results for easier consultation.
Optionally, it calls an R script to draw boxplots, ECDFs and CPU time plots.
To sum up, a full test is made of four experiments, one for each category or scenario
(MASQUE, TCP-TLS with proxy, QUIC and TCP-TLS without proxy). In turn, each
experiment is made of M measurements, each with a different additional delay, spaced
according to a predefined step. In fact, a single measurement is described by an artificial
delay, which results in an additional RTT, a fixed bandwidth, a packet loss, a file size
and a summary of the measured time. This summary is calculated based on N iterations,
each corresponding to a request that the client sends to the server for a file, either
through a proxy or not. In other words, a test contains four experiments, each with M
measurements, each with N iterations.
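This test = experiments × measurements × iterations hierarchy can be sketched with nested loops; a hypothetical skeleton in which echo stands in for an actual request and M, N and STEP take illustrative values:

```shell
M=3; N=2; STEP=10     # illustrative: 3 measurements of 2 iterations each

for category in masque tcp-tls_pro quic tcp-tls_no-pro; do   # 4 experiments
    for ((m = 0; m < M; m++)); do                            # M measurements
        delay=$((m * STEP))                                  # delay grows per step
        for ((i = 0; i < N; i++)); do                        # N iterations
            echo "$category delay=${delay}ms iteration=$i"
        done
    done
done | tee runs.txt | wc -l     # 4 * M * N = 24 lines in total
```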
More details about the inputs and outputs of the scripts and their usage can be found in
the Appendix (Section 8.1).
Figure 9: Internal and orchestration scripts scheme
5 Experimental campaigns
5.1 Methodology
5.1.1 Setup
As previously explained, a full test is described by the number of measurements, the number of
iterations, the starting delay in ms, the delay step in ms, the file size in bytes, the
bandwidth limit in Mbps and the packet loss percentage.
All tests have been run on a virtual machine running on a remote dedicated computer
located at the BigData@Polito cluster [2], a computing infrastructure specifically designed
and implemented by Politecnico di Torino (Polito) for handling big data processing and
analytics tasks. Using a dedicated computer ensures that all resources are devoted solely
to running the tests, preventing any interference from a personal computer's normal usage,
such as background processes, applications or network activity that could impact
the test results. The computer and the virtual machine are accessed via SSH and have
the following specifications:
• AMD EPYC 7301 16-core 64-bit 2.70GHz processor, 500GB of RAM, 64GB of which
are dedicated to the virtual machine
• Ubuntu 18.04.25 Bionic LTS
• Docker version 24.0.2
Furthermore, with the usage of the --cpuset-cpus flag, the three Docker containers corre-
sponding to the server, proxy and client are run respectively on logical CPU cores 1, 2 and
3, in turn mapped to three different physical CPU cores, for a more efficient concurrent
execution. Intermediate and final results are saved in csv files and error logs, if any, are
saved in log files. These results are then used to create several plots that will be explained
in the next chapter.
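The CPU pinning described above could look like the following docker run invocations. This is a hedged sketch: the image name, container names and command arguments are placeholders, not the actual ones used in the thesis; only the --cpuset-cpus pinning and the net_admin capability reflect the setup described in the text.

```shell
# Hypothetical container startup: one logical CPU per container, NET_ADMIN
# capability so each entrypoint can apply its own tc rules.
docker run -d --name server --cpuset-cpus 1 --cap-add NET_ADMIN masque-bench server
docker run -d --name proxy  --cpuset-cpus 2 --cap-add NET_ADMIN masque-bench proxy
docker run -d --name client --cpuset-cpus 3 --cap-add NET_ADMIN masque-bench client
```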
Figure 10: Experiments scheme with the created folders and results
With respect to the tools mentioned earlier, the four test categories are as follows:
• MASQUE (masque)
– Client: Google masque_client communicating to target server through proxy
– Proxy: Google masque_server at address 172.18.0.3 and port 6122
– Server: Cloudflare quiche_server at address 172.18.0.2 and port 6121
• TCP+TLS with proxy (tcp-tls_pro)
– Client: curl communicating to target server through proxy
– Proxy: Squid proxy at address 172.18.0.3 and port 3128
– Server: Twisted HTTPS server at address 172.18.0.2 and port 8081
• QUIC (quic)
– Client: Google quic_client directly communicating to target server
– Server: Cloudflare quiche_server at address 172.18.0.2 and port 6121
• TCP+TLS without proxy (tcp-tls_no-pro)
– Client: Google masque_client directly communicating to target server
– Server: Twisted HTTPS server at address 172.18.0.2 and port 8081
It is worth noting that in QUIC both endpoints make authenticated declarations of their
transport parameters during connection establishment. These parameters are encoded
in the cryptographic handshake and they have been checked before running the tests by
inspecting the traffic inside the Docker containers. In fact, the first Initial packet sent
by the client contains a CRYPTO frame carrying the TLS ClientHello, which can be used
to read the transport parameters. The most important ones are listed below, each followed
by its definition taken from RFC 9000 [36]:
• initial_max_stream_data_bidi_local: integer value specifying the initial flow
control limit for locally initiated bidirectional streams. This limit applies to newly
created bidirectional streams opened by the endpoint that sends the transport pa-
rameter.
• initial_max_stream_data_bidi_remote: integer value specifying the initial flow
control limit for peer-initiated bidirectional streams. This limit applies to newly
created bidirectional streams opened by the endpoint that receives the transport
parameter.
• initial_max_stream_data_uni: integer value specifying the initial flow control
limit for unidirectional streams. This limit applies to newly created unidirectional
streams opened by the endpoint that receives the transport parameter.
• initial_max_data: integer value that contains the initial value for the maximum
amount of data that can be sent on the connection.
• max_udp_payload_size: integer value that limits the size of UDP payloads that
the endpoint is willing to receive. UDP datagrams with payloads larger than this
limit are not likely to be processed by the receiver.
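One possible way to perform this check, assuming tcpdump and tshark are available inside the container (the command lines and the capture file name are illustrative, not taken from the thesis), is to capture the client's first Initial packet and let tshark dissect the transport parameters extension inside the ClientHello:

```shell
# Capture the handshake on the container interface (root / net_admin needed)...
tcpdump -i eth0 -w handshake.pcap 'udp port 6121' &
# ...run the client, then stop the capture. tshark can decrypt Initial
# packets (their keys are derived from the connection ID), so the transport
# parameters extension is visible in the verbose dissection:
tshark -r handshake.pcap -Y quic -V | grep -i -A2 'transport parameter'
```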
All these values, except one, are the same in the connection between the client and the
proxy and between the proxy and the server.
In fact, the first three parameters are all equal to 6MB, initial_max_data is 15MB and
finally max_udp_payload_size equals 1472 bytes for the connection between client and
proxy and 1250 for the connection between proxy and server.
Finally, QUIC's default congestion control algorithm is CUBIC, and the following experiments
use it.
5.2 Considerations
5.2.1 Delay distribution
The artificial additional delay introduced with tc-netem sums up to the round-trip time
(RTT) of the normal communication. The way the artificial delay is set is different
between the end-to-end and tunneled scenarios, because of the presence (or the absence)
of the proxy. In fact, when there is a proxy involved in the communication between a
client and a server, the round-trip time consists of multiple contributions: the client-to-
proxy RTT, the proxy processing time, and the proxy-to-server RTT. To add an artificial
delay that simulates the RTT on the client and server side, it is necessary to distribute the
delay appropriately, taking into account that the proxy acts as an intermediary between
a client and a server: it forwards requests from the client to the server and responses in
the opposite direction, so any delay applied to its interface would be counted twice in
the total RTT. On the other hand, without a proxy, the entire RTT is experienced
directly between the client and server. That is why, in the tunneled case, the additional
delay is split between the client interface and the server interface, without applying any
rule to the proxy interface, while, conversely, in the end-to-end case, where no intermediary
processing or forwarding steps are involved, the full amount of delay can be applied on
either the client or the server side. For simplicity, the
delay specified in the involved scripts refers to the delay added on the client and on
the server interface in the tunneled case, so it is doubled and added only on the client
interface in the end-to-end case. For instance, if the test specifies a starting delay of 5ms,
it means that 5ms of delay is applied to the client interface and another 5ms of delay
is applied to the server interface in the MASQUE and TCP+TLS with proxy case, and
10ms of delay is applied only on the client interface in the QUIC and TCP+TLS without
proxy case. In this way, the total RTT introduced by the traffic control rules is the same
on all the test categories (10ms in the example considered).
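The splitting rule just described can be made concrete with a small helper; this is a hypothetical sketch (the function name and scenario labels are ours, not code from the thesis scripts):

```shell
# Given the per-side delay D (ms) used in the tunneled case, print the netem
# delay each interface should receive so that every scenario adds the same
# total RTT of 2*D milliseconds.
delay_for() {                                   # usage: delay_for scenario entity D_ms
    local scenario=$1 entity=$2 d=$3
    case "$scenario:$entity" in
        tunneled:client|tunneled:server) echo "$d" ;;        # D on each side
        tunneled:proxy)                  echo 0 ;;           # proxy untouched
        end-to-end:client)               echo $((2 * d)) ;;  # full 2*D on the client
        end-to-end:server)               echo 0 ;;
        *) echo "unknown combination" >&2; return 1 ;;
    esac
}

delay_for tunneled client 5     # -> 5
delay_for tunneled server 5     # -> 5
delay_for end-to-end client 5   # -> 10
```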
5.2.2 Packet loss distribution
Similarly to the delay, the packet loss probability has also been carefully distributed over the
network interfaces, so that all test categories experience the same total value. In particular,
in the test categories without a proxy, the specified packet loss is simply halved and applied
to the interfaces of the two endpoints. In the presence of a proxy, on the other hand, the loss
is divided by four, taking into account that the proxy communicates with both the client and
the server through the same interface and forwards requests and responses in both directions.
For instance, if the test specifies a packet loss of 2%, in the end-to-end cases 1% of loss is
applied to the client's interface and 1% to the server's interface. In the case with a proxy,
instead, 0.5% of loss is applied to each of the client, server and proxy interfaces, so the
same overall packet loss probability is achieved in all the test categories.
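As with the delay, the division rule can be sketched as a small helper (a hypothetical example, names ours; awk is used only for the fractional arithmetic):

```shell
# Given the total target loss L (percent), print the per-interface loss so
# that all test categories experience the same end-to-end loss probability.
loss_for() {                        # usage: loss_for scenario L_percent
    case "$1" in
        end-to-end) awk -v l="$2" 'BEGIN { print l / 2 }' ;;  # two interfaces
        tunneled)   awk -v l="$2" 'BEGIN { print l / 4 }' ;;  # proxy crossed twice
    esac
}

loss_for end-to-end 2   # -> 1   (1% on client, 1% on server)
loss_for tunneled 2     # -> 0.5 (0.5% on client, proxy and server)
```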
5.2.3 Tools selection and evolution
It is worth noting that the tools used to carry out the experiments were not fixed and
changed during the implementation and experimentation phases. As the testing progressed,
and as a thorough analysis of the results was conducted, some of the tools initially chosen
were replaced with alternatives based on the evolving requirements and findings. Other tools
were discarded at the beginning and adopted again at the end, until the final configuration
was reached. For
instance, at the beginning, for both the QUIC and MASQUE case, we relied solely on
Google tools. In particular, Google MASQUE has been a constant throughout the tests,
even though some other implementations, such as the ones mentioned at the beginning,
have been tried but very soon discarded for their lack of features or very early development
phase. As for QUIC, the Google quic_client and quic_server were the initial choice
for a good part of the experiments, but they were later replaced with a curl development
build with HTTP/3 support and with the Cloudflare quiche_server, respectively. The
former was chosen for its apparently enhanced performance, especially compared
to TCP+TLS, the latter for its curl support and better customization options.
In the end, curl was replaced again by the Google quic_client, which proved to have more
realistic performance, especially at medium to high latency.
The results presented in the next chapter are based on the configuration described in this
chapter. More results achieved with the curl configuration can be found in the Appendix
(Section 8.3).
6 Results
6.1 Data and graphs
6.1.1 Collected data
When a client requests a file from the server (either through a proxy or not) during a
measurement iteration, it downloads said file and then exits with a return code. This
means that a client execution corresponds to a complete file transfer attempt, whether the
operation was successful or not. For this reason, for each request, corresponding to one
iteration of a single measurement, the client command execution time is collected using
the Linux built-in time command. The whole time command output is collected and
saved, and it consists of:
• Real time (real): the actual elapsed time from the start of the command until its
completion. It includes both the time spent on the command’s execution and any
waiting time (e.g., for input/output or system resources).
• User CPU time (user): the amount of CPU time consumed by the command while
executing in user mode. It includes the execution time spent on the command itself
and any user-level function calls made by the command.
• System CPU time (sys): the CPU time used by the command while executing in
kernel mode. It includes the time spent in the kernel while executing system calls
or handling system-related tasks on behalf of the command.
The sum of user time and system time provides the total CPU time used by the command
and comparing it with real time can provide insights into the efficiency and performance
of a command or program. When an error occurs, such as a request timeout or invalid
headers, it is logged, but the execution time is discarded and the request is repeated
with an additional iteration. In this way, errors are detected and accessible,
but they do not hamper the measurement. After N iterations are successfully performed,
the collected real times are used to create a measurement summary. This includes mean
time, standard deviation, median, maximum and minimum time, first and third quartiles
and 5th and 95th percentiles.
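The comparison between total CPU time (user + sys) and real time can be reproduced for a single run with the bash time builtin; a minimal sketch, with sleep as a stand-in workload:

```shell
export LC_ALL=C                       # dot as decimal separator
TIMEFORMAT='%R %U %S'                 # real, user and sys seconds on one line
read -r real user sys < <( { time sleep 0.2 >/dev/null; } 2>&1 )
# Total CPU time is user + sys; comparing it with real time shows how much
# of the wall-clock time the command actually kept the CPU busy.
awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { cpu = u + s; printf "cpu=%.2fs real=%.2fs busy=%.1f%%\n", cpu, r, 100 * cpu / r }'
```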
6.1.2 Data visualization
Several plots have been produced to interpret and visually understand the time measure-
ments. In particular, the result summary has been used to draw:
• Boxplots to display the distribution of the data and for a quick understanding of
the data’s central tendency, spread, and skewness.
• Normalized plots, in which the mean measured time for each total additional delay
is normalized by the value at the minimum delay, which is zero. This is done to
better understand the temporal evolution and to compare the four scenarios.
• Line plots with error bars, to more easily provide a visual representation of the
variability associated with the data points.
In any case, the data is grouped by the total additional delay.
The single-iteration results, instead, have been used to draw:
• CPU time VS real time graphs, to compare the real time and CPU time evolution
as a function of the total additional delay. The purpose is to assess how much the
command relies on CPU resources and the waiting time or I/O overhead.
• Empirical Cumulative Distribution Functions (ECDFs), to visualize the entire dis-
tribution of the data.
All plots, except the CPU time vs real time graphs, compare the four categories in the
same graph with different colors.
6.1.3 Considered scenarios
Testing scenarios differ based on file size to be downloaded, bandwidth limit and packet
loss, while, in all of them, the delay varies for each measurement according to a step. This
delay is symmetrically set on client and server on proxying cases and only on the client
on end-to-end cases. Setting a delay, in fact, can simulate geographical distances, as well
33
as communication types, such as wireless, wired or satellite [55, 33]. Three measurement
campaigns have been run, each one with a different bandwidth limit, each in turn with
two downloaded file sizes. In particular, bandwidth has been divided into low, medium
and high, corresponding respectively to 10Mbps, 100Mbps and 1Gbps. In each campaign,
the download sizes considered are 1MB and 10MB. In any case, for each experiment, 21
measurements containing 100 requests have been made. This means that each scenario
has a total additional artificial delay varying from 0ms (no delay added) to 200ms and the
time to complete a request for each applied delay has been evaluated 100 times. Packet
loss has never been applied, except for one extra scenario that is listed at the end. In the
following scenarios, it is implied that no packet loss is added.
In each scenario, the time evolution of each test category will be assessed, as well as the
overall stability and the behaviour as the additional artificial delay increases. Further-
more, there will also be a comparison between the average real time and CPU time used
throughout the test, in order to understand the CPU usage and the waiting times.
In the next paragraphs, after the result description for each scenario, only line plots with
error bars and CPU time VS real time plots will be included. The other plots, such as
boxplots and relevant ECDFs, have been included in the Appendix (Section 8.3).
6.2 Low bandwidth scenario
A bandwidth limit of 10Mbps might correspond to some home internet connections, small
business networks and mobile cellular networks.
6.2.1 1MB file download
In all the test categories, the measured time grows as the total delay increases.
In particular, MASQUE appears to be the slowest protocol of all, staying above QUIC
most of the time and above TCP+TLS the whole time. However, in the range of
140ms to 170ms of additional delay, it performs significantly better than QUIC.
QUIC, on the other hand, appears to have a more irregular trend overall. In fact, it per-
forms slightly better than MASQUE and similarly to TCP+TLS without proxy, especially
between 80ms and 110ms of total delay, where its median stays below that of its
equivalent TCP+TLS category. As said previously, at 140ms of total delay, QUIC abruptly starts
behaving worse not only than TCP+TLS without proxy, but also than MASQUE,
experiencing a sudden slope change between 140ms and 150ms. It keeps growing with
a less steep slope until, starting from 170ms of total delay, it goes down again and starts
performing significantly better than MASQUE and TCP+TLS without proxy. The trend
of end-to-end TCP+TLS has already been implicitly discussed when talking about QUIC
and MASQUE, so it is worth mentioning that its standard deviation, compared to all the
other protocols, is generally higher, especially at higher latency (from 140ms on), being
up to 6 times larger than that of the other protocols in the worst cases. The tunneled TCP+TLS
case clearly has the best performance and stability in terms of measured time.
QUIC and MASQUE have similar performance at the very beginning of the test and as the
delay increases, keeping a time difference from 2% to 5%. However, in the time interval
in which QUIC performs worse than MASQUE, this difference reaches a maximum of
11%.
Regarding the comparison between CPU time and real time, it can be observed that, in
the TCP+TLS cases, the CPU time is more or less constant and most of the
variation comes from the real time. MASQUE, on the other hand, shows a slowly decreasing
CPU time trend. In the QUIC case, the CPU time almost perfectly follows the trend of the
real time.
Figure 11: Line plot with error bars for the low bandwidth scenario with a 1MB file download
6.2.2 10MB file download
In this configuration, both QUIC and MASQUE perform generally worse than the equiv-
alent cases in TCP+TLS. In particular, it was observed that MASQUE failed its request
in almost 9% of the iterations, reporting the following errors:
Failed to connect with client 0 server <cid> to 172.18.0.2:6121.
Error: QUIC_NETWORK_IDLE_TIMEOUT
Missing :status response header
Because of these errors, even though the measurements corresponding to the failed requests
have been discarded, some outliers are present and a more unstable and irregular trend
Figure 12: CPU time VS Real time plots for each category in the low bandwidth scenario with a 1MB file download
is observed, especially at 60ms, 120ms and 150ms of delay. In fact, MASQUE appears to
perform worse than all the other protocols and it is the only one having failed requests.
It is worth noting that the first error displayed does not occur because of a failed communication
between the client and the proxy, but between the proxy and the target server.
The second error, instead, occurs when the :status response header is missing: the client
treats the proxying attempt as failed and aborts the request, as specified by RFC 9298 [51]. QUIC has
better times than MASQUE, but it shows a relatively high standard deviation and never
reaches the performance of its equivalent TCP+TLS case, though it achieves similar re-
sults for high latencies, from 170ms on, in which its median time can also be lower than
TCP+TLS without proxy. TCP+TLS without proxy has slightly better performance
compared to the previous case.
QUIC and MASQUE are close with little to no added delay and they have an average
difference between 1% and 3%, the latter happening at 120ms, where the two have the
highest performance difference.
TCP+TLS cases have an almost constant mean CPU time throughout the experiments,
while MASQUE keeps having a decreasing trend, though it is a bit more irregular. On
the other hand, QUIC’s CPU time almost perfectly follows the real time.
Figure 13: Line plot with error bars for the low bandwidth scenario with a 10MB file download
6.3 Medium bandwidth scenario
Typical medium bandwidth scenarios, with a limit of 100Mbps, include situations where
network connectivity is relatively fast, such as residential areas with FTTC connections and
educational institutions.
6.3.1 1MB file download
In this scenario, for very low additional delays, the usual situation in which MASQUE and
QUIC have worse performance than the TCP+TLS cases can be observed. QUIC, during
the whole test, has better times than MASQUE, except at 30ms of delay. Moreover,
starting from this time, QUIC resumes having better performance than MASQUE, but
a gap also opens up between the group formed by MASQUE, QUIC and end-to-end TCP+TLS
and the tunneled TCP+TLS case. This gap widens as the delay increases.
Within this group, starting from around 80ms until the end of the test,
QUIC performs better than TCP+TLS without proxy, also showing way less variability.
MASQUE also starts performing better than TCP+TLS without proxy, with similar
considerations as in the QUIC case. However, while QUIC manages to stay below the 25th
percentile of the equivalent TCP+TLS box, MASQUE mostly stays near or below its
median. For very high latencies of 190ms and 200ms, MASQUE also breaks away from
the end-to-end TCP+TLS 25th percentile. The tunneled TCP+TLS case appears to have a
Figure 14: CPU time VS Real time plots for each category in the low bandwidth scenario with a 10MB file download
higher standard deviation than before, which keeps growing as the total delay increases.
In spite of this, it is still the best-performing protocol among the four, even though it
is slightly surpassed by the end-to-end case at the very beginning of the test. Both
MASQUE and QUIC show less variability, with a very low standard deviation compared
to their TCP+TLS counterparts, especially when latencies grow higher. For example, at
160ms of total additional delay, the end-to-end TCP+TLS standard deviation is 8 times
higher than QUIC and MASQUE.
The difference between QUIC and MASQUE is remarkable with little to no added latency,
reaching 31%. This difference, however, stays between 6% and 8% as
the latency increases.
In this scenario, QUIC keeps having CPU time equal to real time and TCP+TLS cases
keep showing an almost constant CPU time evolution. MASQUE’s CPU time has a
decreasing trend that stops at 30ms, and then becomes constant.
6.3.2 10MB file download
At the very beginning of the test for this scenario, a very clear separation between QUIC
and MASQUE and between QUIC and the TCP+TLS cases can be observed, when the
additional delay goes from 0ms to 10ms, with almost a whole second of difference. In this
Figure 15: Line plot with error bars for the medium bandwidth scenario with a 1MB file download
case, QUIC performs significantly worse than TCP+TLS (both with and without proxy)
and, in turn, MASQUE performs significantly worse than QUIC. Afterwards, MASQUE and
QUIC become comparable in terms of times, with the former staying slightly above the
latter; this happens because MASQUE experiences an abrupt slope change. Starting
from 100ms, in which QUIC and MASQUE perform almost the same in terms of mean
elapsed time, QUIC starts linearly growing, obtaining, from this moment on, worse per-
formance than MASQUE. In fact, between 100ms and 140ms, MASQUE’s mean time
grows in a non-linear fashion, while, starting from around 140ms, also MASQUE grows
linearly, but with a smaller slope than QUIC. This allows MASQUE to keep staying
below QUIC’s curve. Both TCP+TLS cases grow linearly, but the proxied case keeps
performing better than the end-to-end case, also having a less steep growth curve and
less data variability. In no case do MASQUE and QUIC perform better than or similarly
to the TCP+TLS cases, and a quite clear separation between the two protocol
groups is always maintained.
Also here, the initial difference between MASQUE and QUIC reaches 29%; when
the former performs worse than the latter, the difference is between 6% and 12%, while
it sits in the 6% to 11% interval in the opposite situation.
Also here, like all the cases seen so far, in QUIC the sum of user time and system time
is equal to real time, resulting in CPU time and real time having the same evolution.
Figure 16: CPU time VS Real time plots for each category in the medium bandwidth scenario with a 1MB file download
TCP+TLS keeps showing a linear trend like before in both cases. In MASQUE, on the
other hand, CPU time and real time almost have the same trend, but, starting from
around 30ms, the CPU time starts drifting from the real time and showing an evolution
similar to the 10MB case of the previous scenario.
6.4 High bandwidth scenario
A scenario with a bandwidth of 1Gbps offers a significant amount of data transfer capacity
and corresponds to various possible scenarios, including residential areas with FTTH fiber
or Gigabit Ethernet.
6.4.1 1MB file download
This scenario is very similar to the medium bandwidth one, with the same download size.
In fact, for very low additional delays, both QUIC and MASQUE perform worse than the
TCP+TLS cases. QUIC, during the whole test, has better performance than MASQUE,
except at 30ms of delay, in which MASQUE has slightly better transfer times. Differently
from before, the grouping behaviour and the change in QUIC's performance with respect to
end-to-end TCP+TLS start happening much earlier, at around 40ms. Starting from this
time, in fact, QUIC, MASQUE and end-to-end TCP+TLS form a group, and there is a gap
Figure 17: Line plot with error bars for the medium bandwidth scenario with a 10MB file download
between this group and tunneled TCP+TLS, which grows as the delay increases. As said,
starting from around 40ms, QUIC performs better than TCP+TLS without proxy until
the end of the test, with some exceptions being 100ms, 120ms and 130ms, in which the two
protocols perform in a comparable way, even though the TCP+TLS equivalent case has a
way higher standard deviation and therefore more variability and taller boxes. MASQUE,
on the other hand, performs comparably to TCP+TLS without proxy starting from
around 70ms, and gets better than it from 140ms until the end of the test.
Also here, both MASQUE and QUIC show less variability, with a very low standard
deviation compared to their TCP+TLS counterparts, especially with higher latency. The
only exception to this rule is at 150ms, where end-to-end TCP+TLS has a low standard
deviation but still shows long whiskers. Tunneled TCP+TLS has a similar behaviour to
the previous scenario, showing the best performance among the four protocols considered.
Also here, except for the very beginning of the test, MASQUE and QUIC perform similarly,
with their difference staying between 6% and 8%.
CPU time vs real time plots are very similar to the previous scenario with the same
download size.
Figure 18: CPU time VS Real time plots for each category in the medium bandwidth scenario with a 10MB file download
6.4.2 10MB file download
With this configuration, the behaviour is very similar to the previous scenario, with
medium bandwidth and the same downloaded file size. In fact, just like before, there is a
clear separation between MASQUE and QUIC and between QUIC and the TCP+TLS
cases, and this happens for a very low additional delay, between 0ms (no additional delay)
and 10ms. Differently from before, where the size of the two gaps was about the
same, here QUIC is closer to MASQUE than to TCP+TLS. Starting from 20ms, MASQUE
and QUIC become comparable and get closer, with QUIC performing generally better
than MASQUE, until 100ms of additional delay. Starting from this time, for the second
half of the test, QUIC performs worse than MASQUE. Just like the previous scenario,
MASQUE’s mean time grows in a non-linear fashion between 100ms and 140ms, then
almost linearly from 140ms until the end of the test. Starting from 100ms, MASQUE’s
curve stays below QUIC’s curve. The same abrupt slope change as the previous scenario
can be observed in MASQUE between 10ms and 20ms. About TCP+TLS, the cases with
and without proxy have a similar relative behaviour as the previous scenario, with the
end-to-end TCP+TLS curve being less linear in the 150ms to 200ms interval and the
boxes being overall taller, especially for higher latencies.
The highest difference between MASQUE and QUIC when the former performs worse
than the latter is 13%, while, in the opposite situation, after 110ms, this difference, in
Figure 19: Line plot with error bars for the high bandwidth scenario with a 1MB file
download
favour of MASQUE, reaches 11%.
The same behaviour can be observed here: QUIC's CPU time matches its real time, and
TCP+TLS shows a linear trend in CPU time. MASQUE's trend is the same as in the
previous scenario.
6.5 Other scenarios
This section covers the tests that do not fall into one of the previous scenarios.
6.5.1 1MB file download, 100Mbps bandwidth, 10ms delay, variable packet loss
This is the only scenario that involves simulated packet loss, with a fixed additional
delay of 10ms. Its purpose is to compare the four protocol categories in the presence
of four different packet loss rates: no loss, 1%, 2% and 5%. For this
scenario, only a boxplot is shown because it provides the best readability. Also,
since the additional delay is fixed, the boxes are grouped by simulated packet loss rate.
Because of the way the loss has been distributed, it is harder to directly compare the
cases with and without proxy; the description therefore focuses on comparing
the two cases with proxy (MASQUE and TCP+TLS with proxy) and the
Figure 20: CPU time VS Real time plots for each category in the high bandwidth scenario
with a 1MB file download
two cases without proxy (QUIC and TCP+TLS without proxy).
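Network conditions of this kind are typically emulated on Linux with tc/netem; a hedged sketch of how a fixed 10ms delay with a variable loss rate could be configured between runs (the interface name and exact option values are illustrative, not taken from the thesis testbed), shown as a configuration fragment:

```shell
# Add 10ms of delay and 1% random packet loss on eth0 (illustrative interface).
tc qdisc add dev eth0 root netem delay 10ms loss 1%

# Change the loss rate between runs, e.g. to 5%, keeping the delay fixed:
tc qdisc change dev eth0 root netem delay 10ms loss 5%

# Remove the emulated impairment after the test.
tc qdisc del dev eth0 root
```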
It can first be observed that, as the loss increases, the measured time of each box increases
too, resulting in a growing trend. In all groups, MASQUE performs the
worst throughout the test, while QUIC manages to keep its median below its TCP+TLS
counterpart when the loss is highest. The end-to-end TCP+TLS case, as well as
MASQUE, shows variability that grows with the packet loss rate more than in the
other two cases. The only errors in the experiments come from QUIC and
involve only two out of 100 requests in the 5% loss scenario, due to undecryptable packets.
Figure 21: Line plot with error bars for the high bandwidth scenario with a 10MB file
download
Figure 22: CPU time VS Real time plots for each category in the high bandwidth scenario
with a 10MB file download
Figure 23: Boxplot for the medium bandwidth scenario with a 1MB file download and in
presence of varying packet loss
  • 1. Università degli Studi di Trieste Dipartimento di Ingegneria e Architettura Master’s degree in Computer and Electronic Engineering Performance assessment of the MASQUE extension for proxying scenarios in the QUIC transport protocol Graduating: Alessandro Nuzzi Supervisor: Prof. Alberto Bartoli Co-supervisor: Prof. Martino Trevisan ACADEMIC YEAR 2022-2023
  • 2. Abstract I protocolli web sono in continua evoluzione e i recenti sviluppi in questo ambito hanno avuto come obiettivo quello di migliorare le prestazioni di protocolli esistenti al fine di ottenere una migliore esperienza nella navigazione degli utenti e maggior efficienza in alcuni ambiti. Privacy e sicurezza rappresentano un altro aspetto fondamentale su Internet e, per questo motivo, un altro scopo delle ultime innovazioni è stato quello di proteggere le informazioni sensibili e fornire sicurezza nelle comunicazioni. Le crescenti esigenze di prestazioni e sicurezza su Internet degli ultimi anni hanno portato all’emergere di due nuovi protocolli, HTTP/3 e QUIC, che mirano a ottimizzare le comu- nicazioni su Internet, riducendo i ritardi, migliorando l’affidabilità e garantendo una mag- giore sicurezza. In particolare, QUIC (Quick UDP Internet Connections) è un protocollo di trasporto basato su UDP e sviluppato per superare le limitazioni di TCP, riducendo la latenza, fornendo supporto a migrazione delle connessioni e applicando cifratura agli header. L’utilizzo di QUIC sta certamente crescendo e rappresenta ad oggi una grossa fetta del traffico delle più popolari applicazioni, usate quotidianamente da milioni di utenti. L’integrazione di tali protocolli in scenari che fanno uso di proxy può però risultare complessa a causa delle proprietà di sicurezza di QUIC. La cifratura applicata agli header, infatti, può impedire ai proxy di portare a termine i loro compiti di ispezione e modifica del traffico, influenzando negativamente le prestazioni. Inoltre, i tradizionali proxy HTTP non supportano nativamente protocolli non basati su TCP. Il gruppo di lavoro MASQUE è nato proprio per risolvere il problema di consentire il proxying di UDP e IP al di sopra di HTTP. Ciò è reso possibile dalla specifica del nuovo metodo CONNECT-UDP , che consente di creare un tunnel UDP con un proxy, all’interno del quale vengono scambiati datagrammi QUIC. 
MASQUE è stato progettato anche per la protezione di dati sensibili, offrendo diverse garanzie di sicurezza agli utenti. Tali garanzie includono nascondere l’indirizzo IP del client al server e offuscare la destinazione del traffico degli utenti ai loro fornitori di servizi Internet, trasferendo questa informazione al proxy MASQUE. L’obiettivo di questa tesi è di analizzare le prestazioni di MASQUE in scenari che uti- lizzano proxy, esaminando varie condizioni di rete come ritardi aggiuntivi e larghezza di banda limitata. Lo scopo dell’analisi è di quantificare il costo, in termini di prestazioni, associato all’utilizzo di un proxy MASQUE che fornisca le garanzie di privacy promesse. Inoltre, le prestazioni di MASQUE saranno confrontate con quelle di altri protocolli esistenti come QUIC e TCP+TLS. Saranno anche discussi scenari in cui l’utilizzo di MASQUE potrebbe portare benefici. I risultati ottenuti rivelano che l’uso di TCP in scenari con proxy sia il più vantaggioso in termini di throughput e tempi di trasferimento. Allo stesso tempo, l’adozione di MASQUE può rivelarsi una scelta idonea in determinati contesti che già utilizzano QUIC, a causa del suo costo relativamente basso in termini di prestazioni. 1
  • 3. Contents
1 Introduction 4
2 Background 6
 2.1 Problem introduction 6
  2.1.1 QUIC Transport Protocol 6
  2.1.2 QUIC relevant features 7
  2.1.3 Proxying in HTTP/1.1 and HTTP/2 9
 2.2 Problems related to proxying 10
  2.2.1 Protocol ossification 10
  2.2.2 Proxying in HTTP/3 with QUIC 11
 2.3 Support to QUIC proxying 11
  2.3.1 QUIC built-in encryption 11
  2.3.2 DATAGRAM extension 12
  2.3.3 MASQUE 12
  2.3.4 MASQUE objectives and guarantees 14
3 Related work 16
 3.1 MASQUE available implementations 16
 3.2 QUIC available implementations 16
 3.3 State of the art and literature review 16
  3.3.1 MASQUE 16
  3.3.2 QUIC 20
 3.4 Related work discussion 24
4 Testbed technologies 24
 4.1 Software technologies used 24
 4.2 Tools used 25
 4.3 Testing environment 25
  4.3.1 Docker-based emulation setup 25
  4.3.2 Bash scripts 27
5 Experimental campaigns 28
 5.1 Methodology 28
  5.1.1 Setup 28
 5.2 Considerations 31
  5.2.1 Delay distribution 31
  5.2.2 Packet loss distribution 31
  5.2.3 Tools selection and evolution 32
6 Results 32
 6.1 Data and graphs 32
  6.1.1 Collected data 32
  6.1.2 Data visualization 33
  6.1.3 Considered scenarios 33
 6.2 Low bandwidth scenario 34
2
  • 4.
  6.2.1 1MB file download 34
  6.2.2 10MB file download 35
 6.3 Medium bandwidth scenario 37
  6.3.1 1MB file download 37
  6.3.2 10MB file download 38
 6.4 High bandwidth scenario 40
  6.4.1 1MB file download 40
  6.4.2 10MB file download 42
 6.5 Other scenarios 43
  6.5.1 1MB file download, 100Mbps bandwidth, 10ms delay, variable packet loss 43
 6.6 Result discussion 48
 6.7 Limitations and assumptions 50
7 Conclusions 51
 7.1 Future work 52
8 Appendix 53
 8.1 Script usage 53
  8.1.1 Inputs and outputs 53
  8.1.2 Usage 54
 8.2 Discarded scenarios 54
  8.2.1 1Mbps bandwidth, 1MB file download 54
  8.2.2 10Mbps bandwidth, 1MB file download with curl client 55
 8.3 Other plots 57
  8.3.1 Low bandwidth 57
  8.3.2 Medium bandwidth 60
  8.3.3 High bandwidth 65
3
  • 5. 1 Introduction
The QUIC transport protocol, one of the latest additions to web protocols, aims to bring several enhancements in performance and security with respect to TLS over TCP, in terms of improved management of head-of-line blocking, connection establishment latency and connection migration. While TCP is still the most used transport protocol, QUIC usage is certainly growing across the world, representing a big part of the traffic of the most popular applications and websites, making it a protocol that is used on a daily basis and is widely supported. The usage of QUIC poses challenges in networking scenarios that use proxies, as traditional HTTP proxies do not natively support non-TCP protocols. The HTTP CONNECT method allows the creation of a TCP tunnel to an HTTP proxy, but there is no equivalent method for UDP traffic. Moreover, QUIC's built-in encryption can interfere with the operation of proxies. The fact that QUIC encrypts almost all of its packet header fields, such as packet numbers, as well as the payload, can hamper the proxies' ability to inspect and modify traffic, with an impact on performance. The rise of QUIC created the need for new proxying technologies that would allow HTTP to create tunnels for proxying non-TCP-based protocols, such as QUIC itself. The MASQUE working group was chartered to solve these problems and to allow the proxying of UDP and IP over HTTP. MASQUE specifies the new CONNECT-UDP method, which enables tunneling UDP to a proxy over HTTP. Inside this QUIC-secured UDP tunnel it is possible to exchange unreliable QUIC datagrams. While the primary goal of the MASQUE working group is providing the building blocks to effectively use QUIC as a substrate protocol, the MASQUE design is also focused on protecting sensitive information and providing an Internet-accessible node that can relay client traffic in order to provide privacy guarantees.
Such guarantees include hiding the client IP address from the target server and obfuscating the destination of the client traffic from their network provider to prevent data collection, by transferring that information to the MASQUE proxy [52]. The goal of this thesis is to evaluate the performance of the MASQUE proposal in proxying scenarios, testing a set of basic network conditions such as different delays and bandwidth limits. The evaluation aims to assess the performance impact of employing a MASQUE proxy to achieve the promised privacy guarantees. Additionally, MASQUE will be compared to existing protocols such as end-to-end QUIC and TCP+TLS. We will also discuss in which scenarios the usage of MASQUE might be beneficial. The achieved results reveal that, while proxying over TCP proved to still be the best choice in terms of throughput and performance, MASQUE-based proxying could be a good choice in certain contexts that use QUIC, given the relatively modest performance cost associated with its employment. The work is organised as follows. Chapter 2 provides an overview of the relevant context, with a brief description of the QUIC protocol, paying special attention to proxying and the mechanisms related to it. It also describes how MASQUE works and its propositions.
  • 6. Chapter 3 includes a comprehensive examination of existing research, current advancements, and the latest developments in the field, providing a context for the research conducted in this thesis and a baseline for the results. Chapter 4 focuses on the practical aspects of this work, outlining the tools, software technologies and testing environment used to evaluate the various MASQUE aspects discussed in this thesis. Chapter 5 describes the methodologies and approaches employed in the study to address the research objectives. Chapter 6 presents and discusses the findings and outcomes of the work. It highlights the data collected, the scenarios considered, and the interpretation of the results in relation to the research objectives. Finally, chapter 7 presents the conclusions drawn from the work of this thesis. This thesis was carried out at the Machine Learning Lab of the University of Trieste.
  • 7. 2 Background
2.1 Problem introduction
Web protocols are in a constant state of evolution, and the latest additions to this landscape are HTTP/3 and QUIC. These protocols have been standardized with the goal of enhancing both performance and security in web communications. Middleboxes are relevant components in many networking scenarios: they include network devices or software components such as firewalls, NATs, load balancers and, most importantly, proxies, which sit between the source and destination of network traffic to provide a variety of functionalities, especially in enterprise contexts. For instance, when using a forward proxy, a client establishes an end-to-end tunnel to a target server with the aid of a proxy server, allowing multiple clients to route traffic to an external network. The normal operation of proxies is challenged by the new web protocols, such as HTTP/3 and QUIC, as they implement end-to-end encryption with various implications. Furthermore, traffic through an HTTP/3 proxy would go through two nested layers of encryption and congestion control, which is known to severely impact performance.
2.1.1 QUIC Transport Protocol
QUIC (Quick UDP Internet Connections) [36] is a user-space, stream-based and multiplexed transport protocol developed by Google. Initially this protocol was designed and developed for HTTP, but it was later declared a general-purpose transport layer protocol. The QUIC protocol provides security and reliability along with reduced connection and transport latency. Google has widely deployed QUIC on its servers, where it is currently in use [29]. QUIC sits in the application layer, unlike other traditional transport layer protocols such as TCP or UDP, and runs on UDP with the idea of overcoming TCP limitations while remaining compatible with middleboxes [16, 42].
UDP is an unreliable, best-effort protocol and doesn't provide any of the features needed for a reliable connection, so all the connection reliability features are implemented in the application layer. QUIC features such as congestion control algorithms, stream multiplexing and connection management are implemented in user space rather than kernel space, in order to allow developers to iterate and experiment with different approaches without requiring modifications to the kernel or operating system. In this way, QUIC provides a more accessible development environment, allows for rapid prototyping, and facilitates easier customization and extensibility for specific use cases or applications. QUIC advantages over TCP include [42]:
• Lower connection establishment latency
• Improved congestion control
• Multiplexing without head-of-line blocking
• Forward Error Correction
• Connection migration support
In fact, TCP, combined with TLS to guarantee security, integrity and confidentiality,
  • 8. requires 3-4 RTTs to establish a connection, introducing a significant overhead. Furthermore, TCP connections are identified by source port number, source IP, destination port number and destination IP. Port numbers might change when the application or the device changes, leading to a re-establishment of the connection. Another limitation of TCP is half-open connections. Receiving data in TCP is passive: dropped connections are only detected by the sender, and the receiver has no way of detecting them. Finally, TCP suffers from a limitation called head-of-line blocking, which occurs when resources are held up by some entity in order to complete an action. Head-of-line blocking can significantly increase the packet reordering delay, consumes resources and adds latency as packets are processed by the network stack and the application. QUIC aims to solve all these problems [16, 42, 40]. The HTTP/3 and QUIC stack is on the rise across the world [29, 47], representing 48% of the traffic of big applications requiring high efficiency such as Facebook, Netflix, YouTube and Instagram. Considering total network data worldwide, QUIC covers more than 46% of traffic in Latin America, 42% of traffic in Europe and 32% of traffic in the United States, targeting enhanced performance compared to TCP as its main goal. QUIC is also widely supported and adopted by providers like Cloudflare, Fastly and Akamai.
2.1.2 QUIC relevant features
In order to achieve a reduced overhead compared to TCP, QUIC takes at most 1 RTT for a fresh connection and 0 RTT for repeated connections. If a connection is being established between client and server for the first time, a 1-RTT handshake must be performed in order to get the necessary information. This is done by the client sending an empty Inchoate Client Hello (CHLO) packet to the server, also called CI (Client Initial), used to create a connection.
Upon receiving a CHLO packet, the server immediately sends a Rejection (REJ) message to the client. The REJ contains information such as the source address token and the server's certificates. The next time the client sends a CHLO packet, it can use the cached credentials from the previous connection to immediately send encrypted requests to the server. In the 0-RTT scenario, server and client have communicated in the past and there is no need to establish the connection again: the first packet itself carries the data, with no need for connection establishment or key exchanges. Regarding congestion control, QUIC uses the TCP CUBIC algorithm by default. Unlike TCP, QUIC has pluggable congestion control, allowing different congestion control algorithms to be used. Furthermore, each packet in QUIC, whether it is an original transmission or a re-transmission, is assigned a unique sequence number. In this way, the QUIC sender can differentiate between original and re-transmitted packets without ambiguity, unlike in TCP [37, 63]. For even more clarity, ACK frames in QUIC carry an ACK Delay field, reporting the time between the receipt of the largest acknowledged packet and the transmission of the acknowledgement packet. Differently from TCP, where a client opens multiple TCP connections to a server to fetch data, QUIC allows multiple streams over one UDP pipeline. In QUIC, there is only one UDP connection for transport, while with TCP multiple connections are established.
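The benefit of unique packet numbers combined with the ACK Delay field can be made concrete with a small sketch. The names below are illustrative, not taken from any QUIC implementation: the point is that a retransmission gets a fresh packet number, so an RTT sample is always attributable to exactly one transmission, and the peer-reported ACK Delay can then be subtracted from it.

```python
from dataclasses import dataclass

@dataclass
class SentPacket:
    number: int          # unique, never reused -- even for retransmissions
    send_time_ms: float

def rtt_sample_ms(sent, acked_number, ack_arrival_ms, ack_delay_ms):
    """RTT estimate for one ACK: time elapsed since the (unambiguous)
    send, minus the delay the receiver reports it held the ACK for."""
    pkt = sent[acked_number]
    return (ack_arrival_ms - pkt.send_time_ms) - ack_delay_ms

# Packet 5 is lost and its data is retransmitted as packet 6; an ACK for 6
# can never be mistaken for a late ACK of 5, unlike a TCP retransmission.
sent = {5: SentPacket(5, 0.0), 6: SentPacket(6, 30.0)}
print(rtt_sample_ms(sent, 6, 70.0, 5.0))  # -> 35.0
```

In TCP, by contrast, a segment and its retransmission share the same sequence number, which is exactly the ambiguity (Karn's problem) that forces TCP to discard RTT samples for retransmitted segments.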
  • 9. Figure 1: All QUIC handshakes [42]
Figure 2: QUIC header format [16]
QUIC also uses a Forward Error Correction (FEC) mechanism that enables the recovery of lost packets without resorting to re-transmission. QUIC achieves this by supplementing a group of packets with an FEC packet, in a similar way as RAID (Redundant Array of Independent Disks), where the FEC packet acts as parity for the packets in the FEC group [40]. In the event of packet loss, the content of the lost packet can be reconstructed using the FEC packet and the remaining packets in the group. FEC packets may or may not be used, according to the sender's preference. QUIC supports connection migration by keeping the Connection ID the same even when a parameter in the 5-tuple changes, without ending the connection. QUIC provides built-in security: TCP relies on TLS, which imposes an additional header on top of TCP as well as the extra TLS handshake. In QUIC, the cryptographic handshake is part of the usual handshake, in which endpoints can exchange cryptographic parameters. As said before, this mechanism reduces the number of RTTs needed, going from 3-4 RTTs for TCP to only one RTT [42]. Regarding head-of-line blocking, QUIC works on top of UDP, allowing the network stack
  • 10. Figure 3: Comparison between HTTP versions using TCP and HTTP/3 using QUIC [22]
to treat QUIC packets as individual datagrams without enforcing any specific order for their delivery. QUIC receives these datagrams and, if supported by the implementation, can deliver them to the application layer in any order required. This approach ensures that packets are promptly received by the application as soon as they are pushed by the network stack, effectively addressing the head-of-line blocking problem commonly associated with TCP. Finally, another inherent feature of QUIC is pacing [35], used to regulate the rate at which packets are transmitted, ensuring a controlled flow of data. QUIC's pacing mechanism is designed to improve congestion control, reduce packet loss, and enhance overall network performance. The pacing mechanism in QUIC allows for more granular control over packet transmission, enabling a smoother and more efficient utilization of the available network capacity. It helps prevent bursts of packets that may lead to congestion and provides a more consistent and controlled transmission rate. In order to further extend the future usability of the protocol and add performance improvements, several extensions have also been introduced to enhance existing features or provide new capabilities within QUIC as well as HTTP/3. The most relevant ones for the goals of this thesis will be discussed in the next paragraphs.
2.1.3 Proxying in HTTP/1.1 and HTTP/2
Proxies are very important devices in Web communication, as they enable various network optimizations, enhance security, and facilitate the efficient distribution of network resources [20]. When using a forward proxy, a client establishes an end-to-end tunnel to a target server with the aid of a proxy server, allowing multiple clients to route traffic to an external network.
A reverse proxy, on the other hand, routes traffic on behalf of multiple servers: for example, the server sits behind the firewall in a private network and the reverse proxy directs client requests to the appropriate backend server. Proxying is also possible and very common with TCP. One of the many TCP proxying options is the HTTP CONNECT method, which transforms HTTPS connections into opaque byte streams. This method can be used to establish an end-to-end TCP tunnel to a target server via a proxy server. The way HTTP CONNECT acts and its result depend on the HTTP version used. In
  • 11. HTTP/1.1, a client sends a CONNECT request to the proxy server, requesting that it open a TCP connection to the target server on the desired port. If the connection is opened successfully, a tunnel is established and two independent TCP connections exist. This is what an HTTP CONNECT request looks like [32]:
CONNECT server.example.com:80 HTTP/1.1
Host: server.example.com:80
HTTP/2 [59], on the other hand, introduces streams on top of TCP: independent and bidirectional sequences of frames that can be exchanged between a client and a server within a single connection. Multiple open streams can operate concurrently in a single connection. Each stream has an identifier, contained in frames. The usage of streams allows frame multiplexing: frames from multiple streams are combined and transmitted over a single byte stream within a TCP connection. This means that an HTTP/2 CONNECT request converts a stream into an end-to-end tunnel. An HTTP/2 CONNECT request has this form [59]:
:method = CONNECT
:authority = target.example.com:443
It's important to note that, in HTTP/1.1, the TCP packets themselves are not tunneled, but rather the data on the logical byte stream. In HTTP/2, DATA frames sent by a client are put into TCP packets and forwarded to the target server.
2.2 Problems related to proxying
2.2.1 Protocol ossification
One very common problem with proxying, present especially in TCP, is protocol ossification [30]. When middleboxes are deployed in a network, they need to inspect network protocols and determine what traffic is acceptable and what is not. Ossification happens because these devices were deployed based on older versions of protocols, when they had a certain feature set; therefore, introducing new features or changes in behavior risks being considered bad or illegal by middleboxes. This phenomenon can also occur because network operators and manufacturers aim for interoperability across different devices and vendors.
To achieve this, they adhere strictly to standardized protocol specifications, making it difficult to deviate from those specifications or introduce new features. This can result in traffic being dropped or delayed because it is seen as potentially malicious or unknown. Some middleboxes might also implement feature-disabling policies or similar impairments [30]. In the context of QUIC, traditional HTTP proxies have been designed to support TCP. HTTP provides the CONNECT method for creating a TCP tunnel to an HTTP proxy, but lacks a method for performing the same operation with UDP. Considering that HTTP/3 operates over QUIC, which in turn operates over UDP, it is necessary to define a way to create a UDP tunnel to a server acting as a proxy over HTTP.
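To make the HTTP/1.1 CONNECT request shown in section 2.1.3 concrete, here is a minimal sketch that builds the exact bytes a client would send to a proxy. The function name is ours; a real client would of course also open the TCP socket to the proxy and parse the 2xx response before starting to tunnel data.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """HTTP/1.1 CONNECT request asking a proxy to open a TCP tunnel
    to host:port; header lines are CRLF-terminated, then a blank line."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        f"\r\n"
    ).encode("ascii")

print(build_connect_request("server.example.com", 80).decode("ascii"))
```

After the proxy answers with a 2xx status line, the connection stops being HTTP: every byte the client writes is relayed verbatim to the target, which is precisely why CONNECT turns the connection into an opaque byte stream.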
  • 12. 2.2.2 Proxying in HTTP/3 with QUIC
QUIC carries application data on streams, unidirectional or bidirectional channels identified by a unique stream ID assigned by the sender. Applications that use streams must define how they are used. The payload of QUIC packets, after removing packet protection, consists of a sequence of complete frames (though some packet types do not include frames). In particular, a STREAM frame is a type of frame used to carry data between the sender and receiver over a particular stream. Differently from HTTP over TCP, in which large frames might span multiple TLS records or TCP segments when sent over the network, QUIC doesn't allow fragmentation and a QUIC packet must entirely fit within a UDP datagram. Furthermore, there are two more key differences in terms of loss scenarios: first of all, QUIC lets implementations decide how the re-packetization of lost data is performed and, secondly, multiplexed streams are independent from each other, since STREAM frames might belong to different stream identifiers. HTTP/3 [28], as QUIC requires, defines how streams are used, while the QUIC layer handles segmenting HTTP/3 frames over STREAM frames for sending in packets. HTTP/3 also provides the CONNECT method, which works similarly to the HTTP/2 CONNECT described earlier, but it is specifically designed to work with TCP-based connections. Given the stream-based mode of operation and the structure of QUIC as a transport protocol built on top of UDP, a proxy, when it receives a QUIC packet from a client, needs to decapsulate its UDP payload in order to forward it to a target server in the form of a UDP datagram. Moreover, the proxy needs to know where to send the aforementioned UDP datagram and needs to follow a standard procedure to open a UDP association, for example a UDP tunnel, to a target server [21].
In normal stream mode, QUIC packets are encrypted and encapsulated within other packets, making it difficult for middleboxes to inspect the traffic and apply policy rules. Middleboxes that are not designed to work with QUIC may treat the traffic as unknown or potentially malicious, and either block it or apply suboptimal policies, leading to performance degradation or connection failures.
2.3 Support to QUIC proxying
2.3.1 QUIC built-in encryption
One effective way to fight ossification is to maximize encryption of the communication, aiming to minimize the visibility of the protocol passing through middleboxes [16]. Some of QUIC's built-in features limit this problem [42, 16, 39]. As said previously, QUIC encrypts the entire communication, including the protocol headers, preventing middleboxes from inspecting or modifying protocol-specific information. This reduces the reliance on middleboxes that may have been designed for specific protocol versions or features. Furthermore, QUIC supports version negotiation mechanisms that allow clients and servers to negotiate the protocol version and capabilities during the initial handshake. This enables the graceful introduction of new protocol versions and features while maintaining backward compatibility with older versions. However, QUIC built-in encryption is a double-edged sword in this context. Since QUIC
  • 13. encrypts the entire communication, including the protocol headers, it can limit the ability of proxies and middleboxes to inspect or modify protocol-specific information. It prevents these devices from performing deep packet inspection or applying protocol-specific optimizations, which they may have been designed for, leading to some performance impacts.
Figure 4: QUIC encryption as opposed to TCP [15]
2.3.2 DATAGRAM extension
To address the first problem with the QUIC stream mode, the unreliable DATAGRAM extension [45] has recently been added to QUIC, providing application protocols running over QUIC with a mechanism to send unreliable data while leveraging the security and congestion control properties of QUIC, and allowing a reliable QUIC stream to carry encapsulated datagram payloads. This extension introduces a new DATAGRAM mode for QUIC, in addition to the traditional stream mode, so that applications using both a reliable stream and an unreliable datagram flow to the same peer can benefit from sharing a single handshake and authentication context. DATAGRAM frames are individual messages, unlike a long QUIC stream, and do not contain a multiplexing identifier. They are subject to congestion control but, when a loss occurs, they are not retransmitted. Applications can, however, decide to define identifiers used to multiplex different kinds of datagrams or flows of datagrams. All application data transmitted with the DATAGRAM frame, like the STREAM frame, must be protected by either 0-RTT or 1-RTT keys in order to provide the same security guarantees described in the QUIC specification. DATAGRAM mode is designed to work better with middleboxes than stream mode because it uses a simpler packet format that is more easily understood by network devices such as firewalls, NATs, and load balancers.
Just like UDP, DATAGRAM mode can also be useful for optimizing streaming, gaming, and other real-time network applications.
2.3.3 MASQUE
With the introduction of the DATAGRAM mode, QUIC provides the necessary means to support proxying goals with the STREAM and DATAGRAM primitives, but the way they are used is the responsibility of the application layer. The MASQUE (Multiplexed Application Substrate over QUIC Encryption) working group [7] has the goal and responsibility
  • 14. to define how an application establishes an end-to-end tunnel, providing instructions to a proxy server regarding the destination where UDP datagrams are to be sent and received from. MASQUE first produced a document describing HTTP Datagrams [54], a convention for conveying multiplexed, potentially unreliable datagrams inside an HTTP connection, allowing HTTP/3 to work with QUIC DATAGRAMs; it then defined CONNECT-UDP [51], a new kind of HTTP request that initiates a UDP socket to a target server. This method was first defined for HTTP/2 and was then ported to HTTP/3 [34]. A client sends an extended CONNECT request to a proxy server, which identifies a target server in the path section of the pseudo-header. If the proxy succeeds in opening a UDP socket, it responds with a 2xx (Successful) status code. After this, an end-to-end flow of unreliable messages between the client and target is possible: the client and proxy exchange QUIC DATAGRAM frames with an encapsulated payload, and the proxy and target exchange UDP datagrams bearing that payload. This is what a CONNECT-UDP request looks like [21, 51]:
:method = CONNECT
:protocol = connect-udp
:scheme = https
:path = /target.example.com/443/
:authority = proxy.example.com
By using the CONNECT-UDP method, the client communicates to the proxy its request to establish a UDP connection with a specified URI. In the case of HTTP/3, QUIC DATAGRAM frames are employed, enabling the establishment of a proxied, unreliable connection between the client and the server. This mechanism allows for the transportation of connections to multiple servers within the same HTTP/3 connection between proxy and client. The multiplexing and de-multiplexing of these connections are performed using the Datagram-Flow-Id. This identifier is used to link a datagram flow with an HTTP message. In turn, flows are conceptually similar to streams, but they do not provide ordering or reliability [50, 39].
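The extended CONNECT request above can be assembled programmatically. The sketch below is our own helper, not an API of any MASQUE implementation, and the exact path template is proxy-specific: the one used here simply mirrors the example request above.

```python
def connect_udp_headers(proxy_authority: str, target_host: str,
                        target_port: int) -> dict:
    """Pseudo-headers for an extended CONNECT (CONNECT-UDP) request:
    the target is encoded in :path, the proxy itself in :authority."""
    return {
        ":method": "CONNECT",
        ":protocol": "connect-udp",
        ":scheme": "https",
        ":path": f"/{target_host}/{target_port}/",
        ":authority": proxy_authority,
    }

hdrs = connect_udp_headers("proxy.example.com", "target.example.com", 443)
print(hdrs[":path"])  # -> /target.example.com/443/
```

A client would send these pseudo-headers on a bidirectional HTTP/3 request stream; that stream then stays open for the lifetime of the UDP tunnel it negotiated.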
Figure 5: MASQUE scheme after a successful CONNECT-UDP request [21]
The DATAGRAM frame is made up of a Quarter Stream ID field [53], which links the DATAGRAM to the client-initiated bidirectional stream by carrying its stream ID divided by four, followed by a payload. A Context ID [51] is placed directly after the Quarter Stream ID field. This field is set to
  • 15. zero for UDP packets encoded using HTTP Datagrams. As said previously, fragmentation is prohibited in QUIC [36] and QUIC DATAGRAM frames have a limited capacity determined by the QUIC connection configuration and the path MTU. From this limit, it is also necessary to subtract the overheads of the UDP datagram header, the QUIC packet header, and the QUIC DATAGRAM frame header. This results in tunneled messages with a limit between 1,200 and 1,300 bytes [21].
Figure 6: A QUIC packet encapsulated into a UDP datagram [21]
With MASQUE, it is also possible to implement nested tunneling [46, 52]. Suppose there are a client, a server and two proxies: first, a QUIC connection between the client and the first proxy is established. After that, another QUIC connection, between the client and the second proxy, is opened, running through a CONNECT tunnel in the first connection. Finally, an end-to-end byte stream between the client and the target server, running via a CONNECT tunnel in the second connection, is established. A TCP connection exists between the second proxy and the target server. When a proxy receives a CONNECT-UDP request, it can establish a connection either with the specified target or with an upstream proxy. The proxy can make a decision based on its configuration and routing rules to determine the appropriate course of action for handling the CONNECT-UDP request [51].
2.3.4 MASQUE objectives and guarantees
In summary, the DATAGRAM extension in QUIC allows the transport of unreliable and unordered datagrams within a QUIC connection. MASQUE defines how applications can establish end-to-end tunnels over QUIC using the CONNECT-UDP method, which instructs a proxy to establish a UDP connection. In this way:
• The client sends a CONNECT-UDP request to the proxy, specifying the target URI.
• The proxy can either directly connect to the target or forward the request to an upstream proxy.
  • 16. Figure 7: An example of nested MASQUE proxying in iCloud Private Relay [21]
• If the proxy establishes a direct connection, it uses the DATAGRAM extension to transport unreliable datagrams.
• If the proxy forwards the request, the same process continues with the upstream proxy.
The usage of a MASQUE proxy should provide several privacy guarantees [52]. In particular, user agents running their traffic through a MASQUE proxy will have their IP address hidden from the target server, which will have access only to the proxy address. Furthermore, the obfuscation provided by the usage of MASQUE should prevent network providers from collecting data obtained from the network traffic in ways that go against the user's intentions. To further increase client privacy towards the target server, a MASQUE proxy can also perform address translation or, when the proxy is closer to the client, DNS resolution. Nested tunneling, like the one used by iCloud Private Relay [17], provides enhanced privacy by nesting MASQUE tunnels, allowing traffic to pass through multiple MASQUE proxies to prevent user data correlation. Finally, network observers able to inspect the unencrypted bits of the QUIC connection would not be able to detect the usage of MASQUE, since the data appears identical to the unencrypted data of a typical web browsing connection. In other words, the MASQUE proxy would be seen as a regular Web server. Having listed and discussed the MASQUE value proposition, the objective of this thesis is to evaluate MASQUE performance in various network scenarios. In this way, it will be possible to assess the performance cost of its adoption and discuss the contexts that might find its usage beneficial.
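The encapsulation described in section 2.3.3 — a Quarter Stream ID, then a zero Context ID, then the raw UDP payload — can be sketched using QUIC's variable-length integer encoding (RFC 9000), in which the two most significant bits of the first byte give the field's length. The helper names below are ours, not taken from any library.

```python
def quic_varint(v: int) -> bytes:
    """Encode v as a QUIC variable-length integer (1, 2, 4 or 8 bytes;
    the two high bits of the first byte encode the total length)."""
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 1 << 30:
        return (v | 0x8000_0000).to_bytes(4, "big")
    return (v | 0xC000_0000_0000_0000).to_bytes(8, "big")

def proxied_udp_datagram(request_stream_id: int, udp_payload: bytes) -> bytes:
    """DATAGRAM frame payload for one tunneled UDP packet: the Quarter
    Stream ID (request stream ID / 4), then Context ID 0, then the payload."""
    return quic_varint(request_stream_id // 4) + quic_varint(0) + udp_payload

# CONNECT-UDP sent on client bidirectional stream 0 -> Quarter Stream ID 0;
# for small IDs only two bytes of framing precede the tunneled packet.
print(proxied_udp_datagram(0, b"example-quic-packet").hex())
```

This also makes the capacity discussion above tangible: the per-datagram MASQUE framing itself is tiny (two varints), so the 1,200-1,300-byte ceiling comes almost entirely from the outer UDP, QUIC packet and DATAGRAM frame headers that must fit inside the path MTU.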
3 Related work

3.1 MASQUE available implementations

iCloud Private Relay [17] is a first implementation of QUIC proxying using MASQUE, in particular with nested tunneling. Suppose there are a client connected to a network and a server: two devices, called Ingress Relay and Egress Relay, sit between them. The network and the Ingress Relay can see the client IP address, but the server name is encrypted and therefore not visible to them. This encrypted data is sent from the Ingress to the Egress Relay (provided, for example, by Cloudflare [19] or Akamai) for it to forward the traffic to a target server. Similarly, the Egress Relay only knows that the sender uses iCloud Private Relay, but not its IP address. MASQUE is also implemented in the Google implementation of QUIC and related protocols that powers Chromium, Google's QUIC servers and some other projects, residing in the Google quiche library [26]. It provides a simple MASQUE client and proxy to be used with the already existing QUIC server tool. Other implementations also exist on GitHub, such as Masquerade [62], written in Rust and based on the Cloudflare quiche library [4], and masque-go [56], written in Go and based on the quic-go library [8].

3.2 QUIC available implementations

There is currently a wide range of implementations of QUIC [23] in the form of libraries and frameworks written in many programming languages. The most important ones include Microsoft's msquic [24], lsquic [58], which is more focused on speed, Cloudflare's quiche [4], Google quiche [26], currently used in Chromium and Envoy, quic-go and Mozilla's Neqo [25]. Moreover, many popular web servers, such as NGINX and Apache, have added QUIC support to their implementations, as has the command-line utility curl in its development build.
Microsoft recently introduced SMB (Server Message Block) over QUIC [44] in order to provide an alternative to TCP as the network transport for its file-sharing protocol.

3.3 State of the art and literature review

The following paragraphs provide an overview of the state of the art of QUIC and MASQUE, with a focus on benchmarks and performance analyses.

3.3.1 MASQUE

Towards a tectonic traffic shift? Investigating Apple's new relay network

This paper [48] aims to provide researchers and network operators with an analysis of iCloud Private Relay, including its goals, architecture, and behavior. By examining the system, it offers valuable insights and expectations for researchers and network operators as the service gains popularity. Differently from this thesis, this paper does not conduct any performance assessment, but rather focuses on providing an overview of iCloud Private Relay and its effect on
the Internet.

Goals:
• Analysing iCloud Private Relay from a networking point of view, considering aspects such as IP address providers, geographical distribution and the interaction between Ingress and Egress Relays
• Assessing the privacy and security implications and enhancements introduced

Measurements and setup:
• Local DNS server with the ECS extension, local Web server, MacBook Pro with PR enabled and the RIPE Atlas measurement platform
• Collection of Ingress Relay IPv4 addresses using A-type DNS queries with the ECS extension to the Authoritative Name Server, to obtain an IP address near the requester.
• Collection of Ingress Relay IPv6 addresses using AAAA measurements on RIPE Atlas
• Scans using curl, ipecho and a local DNS resolver to assess the relationship between Ingress and Egress Relays

Results:
• Most Ingress IP addresses are provided by Akamai and only a small part by Apple
• Egress nodes are provided by Akamai, Fastly and Cloudflare and are distributed mostly in the USA
• IP addresses are periodically changed for each client, confirming Apple's privacy promises

Measuring the Performance of iCloud Private Relay

In this work [61], the authors focus on analyzing the impact of iCloud Private Relay on web performance, verifying Apple's declarations, in particular the claim that, although iCloud Private Relay can negatively affect web speed tests (which use several simultaneous connections to deliver the highest possible result), the actual browsing experience remains fast. In fact, the architecture of the service can introduce computational and routing overheads, as well as performance impairments. The goals and the methodology of this work are very similar to the ones this thesis is based on. On the other hand, this work focused more on transfer time measurements and its methodology relied heavily on traffic control.

Goals:
• Performance assessment of iCloud Private Relay in different scenarios and from different countries (Italy, France and Hawaii)
• Collection of Quality of Experience metrics such as throughput, download time and page loading time

Measurements and setup:
• MacBook Pro with macOS Monterey located in Trieste, Lyon and Hawaii and connected to a Gbit/s university Ethernet network. Automated testing using scripts that enable and disable PR as needed
• Throughput measurements using Ookla Speedtest with Selenium for automating tasks such as accepting the cookie policy and pressing the start button, running 100 tests with PR enabled and disabled
• Transfer time measurements downloading a 1GB test file from Hetzner using curl, running 200 tests with PR enabled and disabled
• Page load time measurements, collecting the onLoad parameter with and without PR, visiting the 100 most popular websites 5 times each

Results:
• Performance degradation with iCloud Private Relay enabled due to the different traffic paths introduced by PR, with a 60% increase in page load time observed in some cases.
• Generally unstable throughput whose value strongly depends on the Egress nodes chosen.
• Performance impairments also occur in cases where a single connection is used to download a large file, thus questioning the claim that several simultaneous connections are the root cause of the performance penalties.

Evaluation of QUIC-based MASQUE proxying

In this paper [41], the authors investigate the impact on end-to-end QUIC performance of using a MASQUE-based tunnel setup. Specifically, the analysis focuses on the comparison between reliable and unreliable transmission of packets in the tunnel connection. It also explores the effects of nested congestion control between the inner connection and the outer tunnel connection. Almost the same traffic control and containerization techniques employed in this work have been used in this thesis. Moreover, while this paper sets less complex network conditions, it deals with different protocol configurations, such as congestion control algorithms and packet size.
Goals:
• Performance assessment of QUIC using MASQUE tunneling, comparing end-to-end and tunneling scenarios, both in stream mode and DATAGRAM mode
• Analysis of the nested congestion control

Measurements and setup:
• Modified version of aioquic to enable MASQUE and allow packet size setting and pluggable congestion control. Docker for containerization of client, server and proxy and tc for traffic control with defined delay, buffer depth and throughput
• Two different packet sizes for end-to-end mode and tunneled mode, 1380 bytes and 1280 bytes respectively
• 25 HTTP/3 requests to the server with a specified payload size, with a preliminary CONNECT-UDP request for the tunneled case and a simple redirection using iptables in the proxy for the end-to-end case
• Transmission times and byte overhead recorded in qlog files
• Fixed one-way delay of 25ms between proxy and server, variable delay, from 1ms to 50ms, between client and proxy. A 10MB message is used to measure the transmission time
• Three congestion control scenarios in the proxy: Reno and Cubic in DATAGRAM and stream mode, disabled congestion control in DATAGRAM mode, Reno in end-to-end mode.

Results:
• DATAGRAM mode has smaller transfer times when packets are bigger, but a larger overhead when using more than one connection in download. Stream mode shows a performance degradation when the packet size exceeds the MTU. Packet loss is higher, as is the additional overhead due to more connections, but transfer times are better when using parallel connections
• DATAGRAM mode has smaller transfer times regardless of the CC algorithm used. In DATAGRAM mode, Reno has better transfer times when the one-way delay is bigger, while Cubic has longer transfer times when used in stream mode

Real-time Emulation of MASQUE-based QUIC Proxying in LTE networks using ns-3

This paper [49] introduces a new net device within the ns-3 open source network simulator, which enables end-to-end real-time emulation of LTE networks with actual endpoints. The rationale, design, and prototype implementation of this novel net device are presented.
Subsequently, the performance evaluation of a QUIC proxy based on MASQUE is demonstrated using the emulated LTE setup. Similarly to this paper, this thesis categorizes measurement campaigns based on the bandwidth limit. On the other hand, it does not analyse LTE networks specifically, but focuses on emulating network conditions that could resemble several network types.

Goals:
• Performance analysis of QUIC with MASQUE in LTE networks, emulated using ns-3 in high throughput and low throughput scenarios

Measurements and setup:
• ns-3 as a discrete-event network simulator, with the implementation of a novel module that combines two existing modules, respectively for connecting to an emulated LTE network and for reading and writing traffic using file descriptors. Four Docker containers for client, proxy, server and the emulated LTE network. The communication with the proxy is implemented through a device called RightNode.
• The client packet has a fixed path: from eNodeB to SGW and then to PGW. When it arrives at the RightNode, its IP address is changed to the User Equipment one.
• RightNode, eNodeB and User Equipment are arranged as a right triangle at a distance of 99m
• High and low throughput measurements are performed, based on the RB size, and 10 tests consisting of downloading a 10MB file are executed in three scenarios: DNAT forwarding (no MASQUE), DATAGRAM mode MASQUE, stream mode MASQUE

Results:
• Low throughput: 31% larger transfer times in stream mode, 4.5% larger transfer times in DATAGRAM mode, smaller RTT than DNAT
• High throughput: 19% larger transfer times in stream mode, 6% larger transfer times in DATAGRAM mode, generally greater RTT

3.3.2 QUIC

Evaluating QUIC Performance over Web, Cloud Storage and Video Workloads

This paper [57] assesses the performance of QUIC in various workloads, including web, cloud storage, and video applications, comparing it to that of traditional TLS/TCP protocols.

Goals:
• Assessing the performance of QUIC over web, cloud storage and video workloads, also comparing different QUIC versions

Measurements and setup:
• Four active measurement tests written in C: two for each scenario (TLS and QUIC), using respectively libcurl and lsquic, that connect to web servers, and two for video downloads and video streaming, mimicking the playout of YouTube on the command line.
• Three-year-long measurements from different vantage points: an educational network, a high-bandwidth low-RTT residential link in Germany and a low-bandwidth high-RTT residential link in India.
• Collection of several metrics targeting almost 6K websites supporting QUIC, extracted from the Alexa Top-1M, also comparing several versions of Google QUIC, also called
gQUIC (Q050, Q046, Q044, Q043, Q039 and Q035). These metrics include connection time, download time, DNS lookup, protocol version and time to first byte (TTFB)
• Throughput measurements by uploading files of different sizes (namely 1 KB to 2 GB) to Google Drive and repeatedly downloading them using the tests with various QUIC versions and TLS 1.2 over TCP.
• Connection establishment times, achievable throughput and CPU utilization measurements in video downloading scenarios.
• CPU utilization measurements when downloading large files from Google Drive.

Results:
• For web workloads, QUIC demonstrates faster handshake times compared to TLS 1.2 and 1.3 over both IPv4 and IPv6. In particular, the IETF QUIC version exhibits approximately 50% lower latency than gQUIC versions in half of the samples over IPv6. Among the gQUIC versions tested, Q050 performs the best in terms of latency. However, as the connection state prolongs, the latency benefits of QUIC diminish.
• For cloud storage workloads, QUIC has a higher mean throughput for small file sizes. However, for larger file sizes ranging from over 20 MB to 2 GB, TLS/TCP performs better. This is because for smaller files, the connection times and time to first byte (TTFB) have a greater impact on the total download time, but this advantage diminishes as the file size increases. QUIC also exhibits high CPU usage, because of certain in-kernel optimizations missing for UDP flows.
• For video download, QUIC reduces connection times by 550 ms in India (410 ms in Germany) compared to TLS 1.2 for half of the samples. In terms of overall download rate, TLS 1.2 performs better than QUIC.
• For video streaming, TLS/TCP experiences longer startup delays compared to QUIC. This performance gap widens in networks with higher packet loss. Despite having a lower overall download rate, QUIC provides better video content delivery with reduced stall events and shorter stall durations.
This improvement is attributed to QUIC's reduced latency overheads and more efficient loss recovery mechanism.

How quick is QUIC?

The authors of this paper [43] present a comprehensive study of the performance of QUIC, SPDY and HTTP, particularly of how they affect page load time, conducted when QUIC was still a very recent protocol.

Goals:
• Present the performance of QUIC in a wide range of live network scenarios, starting from Google's early research results and comparing it to HTTP and SPDY

Measurements and setup:
• Client Ubuntu laptop with Chrome browser and browser caching disabled to download the content of websites in HAR format. Server hosting four different pages from Google Sites with objects of either small size (400B-8KB) or large size (128KB) and in either small or large number (50). Shaper server with tc between client and server to emulate network conditions.
• Page load time measurements at 2 Mbps, 10 Mbps and 50 Mbps, with either no loss or 2% loss in upstream and downstream and with an additional delay of either zero or 100ms.

Results:
• QUIC performs poorly under very high bandwidth when a large amount of data needs to be downloaded, but it performs very well compared to HTTP and SPDY under high RTT values, especially when the bandwidth is low
• Small object size favors QUIC and SPDY over HTTP due to multiplexing

Measuring HTTP/3: Adoption and Performance

After the final standardization of HTTP/3, built on top of QUIC, the aim of the authors [60] is to assess the adoption and the performance of HTTP/3 in various network scenarios, also with respect to the previous versions. The paper also evaluates websites' usage of third-party servers still running HTTP/2.

Goals:
• Running a first comprehensive large-scale measurement study on the adoption and performance of HTTP/3.

Measurements and setup:
• Starting list with more than 14k websites with HTTP/3 support. Two high-end servers connected to the Internet via a university 1 Gbit/s Ethernet.
• Network configuration using tc with four different configurations for each of the following three parameters: extra latency, extra packet loss and bandwidth limit
• Automated website visiting using Google Chrome and BrowserTime with three HTTP versions: HTTP/1.1, HTTP/2 and HTTP/3. For each network configuration, four scenarios are considered, based on the HTTP version support: three for each version and one for all of them.
• Collection, via BrowserTime, of the onLoad and SpeedIndex metrics, corresponding respectively to the time at which all elements of the page have been downloaded and parsed and the time at which visible portions of the page are displayed.
• More than 2.5 million visits over a period of one month

Results:
• Almost all the HTTP/3-supporting websites are hosted by Google, Facebook and Cloudflare, though most web page objects are still hosted on third-party servers without HTTP/3 support
• Most performance benefits were achieved only in scenarios with high latency or poor network bandwidth
• The performance improvements primarily rely on the infrastructure that hosts the website, potentially due to optimizations implemented on the server-side infrastructure. The number of connections required to load objects also affects the benefits of the protocol (the fewer, the better)

Same Standards, Different Decisions: A Study of QUIC and HTTP/3 Implementation Diversity

The main focus of this paper lies predominantly on the various QUIC implementations that are currently available, delving into their features, functionalities, and performance characteristics.

Goals:
• Running a comparative analysis of several available QUIC implementations to explore protocol aspects that pose challenges for automated measurements and are anticipated to exhibit significant heterogeneity across different implementations, rather than interoperability.

Measurements and setup:
• Running each test at least 5 times on two different Belgian WAN networks: the Hasselt University network (1 Gbps downlink/10 Mbps uplink) and a residential Wi-Fi network (35 Mbps/2 Mbps).
• Usage of qlog and qvis to evaluate 15 active IETF QUIC implementations, along with scripts to automatically postprocess such files. Quic-interop runner for client-side behaviour analysis and aioquic for server-side behaviour analysis. No servers are manually run; public Internet endpoints are used instead.
• Assessment of features such as flow control, congestion control, multiplexing, prioritization, packetization and 0-RTT connection setup over a 4-month period.

Results:
• Only Cloudflare quiche uses autotuning to dynamically change the receive buffer size using RTT estimates and the data consumption rate. Most implementations keep the receive buffer unchanged, linearly increasing the maximum allowance. Google Chrome, for example, simply sets an initial high allowance of 15MB.
In general, the absence of better flow control schemes is due to the lack of fine-tuning of such schemes by implementers, as well as memory limitations.
• With downloads larger than 1MB, 5 out of 13 implementations generate small DATA frames, while 6 of 13 prefer large ones (respectively, less than 100kB and more than 1MB). 3 stacks also dynamically change the size of the frames. Path MTU Discovery (PMTUD), used to improve efficiency by increasing packet size, is employed in 3 implementations, but the size is simply increased by adding PADDING frames, without using more sophisticated mechanisms.
• 9/18 stacks use a form of Round Robin for packet prioritization, while the remaining ones prefer approaches like sequential prioritization or FIFO.
• Regarding congestion control, most stacks (11/13) adopt an initial congestion window size of 12kB-15kB. The remaining implementations choose a much bigger window of more than 40kB. Facebook tunes the size by using machine learning algorithms. QUIC pacing is present only in 8 implementations. ACK frequency varies between 1 and 10 packets in most implementations.

3.4 Related work discussion

All the papers mentioned in this chapter focused on assessing the operation and performance of the QUIC transport protocol and the MASQUE extension, also with respect to other state-of-the-art protocols, such as TCP. The methodologies and the tools used in these works were one of the starting points of this thesis in choosing the approaches and techniques to employ, especially when it comes to MASQUE. In fact, many of the approaches included downloading files, emulating network conditions and simulating the entities involved and their communication. The reviewed MASQUE papers focus on assessing more aspects of the protocol, while setting fewer or less varied network conditions. This thesis, on the other hand, is based on a single setup, the download of an arbitrarily-sized file. In spite of this, some specific network conditions vary within the same experiment to span the simulation of multiple communication types and geographical distances. Furthermore, MASQUE and QUIC are also compared to TCP with TLS. The conclusions and insights provided by the state of the art were useful for comparison with the results of this thesis, as they served as a reference point for interpreting and validating the results obtained.
4 Testbed technologies

A dedicated testing environment has been developed in order to evaluate the performance of MASQUE and compare it to other protocols under several simulated network conditions.

4.1 Software technologies used

In order to achieve the objectives of this thesis, the following software technologies have been used:
• Docker [18] for the containerization of the involved entities to be analyzed and to ensure isolation and consistency across different environments
• Bash scripting to automate various tasks and streamline the execution of repetitive or complex operations. It has also been used for file management, orchestration and data manipulation tasks
• Linux networking stack for networking functionalities and configuration, as well as tc [11] and tc-netem for traffic control and for simulating network conditions such as
  • 26. as additional delay or fixed bandwidth • Wireshark [14] and TCPDump [12] for capturing traffic • R programming language for data manipulation and plotting • Several Linux utility tools, including datamash [6], time [13], sed [9] and awk [1] for calculations and file manipulation 4.2 Tools used The basic building blocks for the evaluation of MASQUE are the tools provided by Google by manually compiling the Chromium source [3]. From the output folder, it’s possible to run a MASQUE client (masque_client) and a MASQUE proxy (masque_server) that can be used in combination with a generic QUIC server. Even though this was provided by Google as well, for more versatility and for its better customization, it was decided to use a server from the Cloudflare quiche implementation (quiche_server) [4], also for its compatibility with other clients. In fact, the same server is also used for the QUIC end-to- end scenario, with the client being the QUIC client provided by Google (quic_client). For TCP+TLS, the choice was curl, Twisted server [5], a HTTPS enabled version of SimpleHTTPServer and Squid proxy [10]. 4.3 Testing environment An experimental testing environment has been implemented and set up for the network emulation, test automation, post-processing and result saving and plotting. This envi- ronment has been used to evaluate MASQUE and compare it to other protocols under several network conditions. The tools used include Docker, Bash scripting, the Linux net- working stack and traffic control techniques. All test cases described in this thesis consist of a client requesting a file of variable size using the HTTP GET method. Network con- ditions and characteristics are emulated using the Linux traffic control (tc) tool [11] and they consist of artificial delays, resulting in an additional RTT, bandwidth limitation to a fixed amount and a simulated packet loss. 
In particular, four different test categories are considered and compared, two with proxies and two end-to-end: MASQUE, TCP+TLS with proxy, QUIC, TCP+TLS without proxy. In each measurement, a client sends a certain number of requests for a file to a server, either through a proxy or not, with a set delay, a fixed bandwidth and, optionally, a packet loss. An experiment, in turn, is a set of measurements with a starting delay that is increased for each measurement according to a step. The parameters of an experiment are the number of measurements, the number of iterations (requests) per measurement, the starting delay, the step, the file size, the bandwidth and finally the packet loss. An experiment is performed for a certain category, so a complete test includes four experiments, one for each possible category. After a test, the results are saved and several plots are drawn.

4.3.1 Docker-based emulation setup

The emulation setup consists of three Docker containers simulating a client, proxy and server for the tunneled scenario and two Docker containers simulating a client and a
server for the end-to-end scenario. In any case, the server container hosts a sample website with a very simple index page and a file of arbitrary size created using the Linux dd command. The site data is stored in the /dev/shm shared memory for more efficiency and speed. A user-defined bridge has been created for the communication between containers, which share the same address space. The traffic control rules are applied to the eth0 interface of each container for the outgoing traffic, depending on the scenario considered. In particular, using tc-netem, the delay is symmetrically set on the client and server interfaces in the tunneled case and on the client interface only in the end-to-end cases, so that the total additional RTT is the same in all scenarios. On the other hand, the bandwidth is set by applying a Token Bucket Filter (tbf) on the client and server interfaces in all cases. As said, the delay changes incrementally for each measurement, while the bandwidth limitation stays the same. The packet loss is applied to each entity but, like the delay, it is distributed so that the total loss is the same in the two sets of protocols. Each Docker container has net_admin capabilities in order to be able to apply traffic control rules, and server containers also have a dynamically set shared memory size, based on the size of the file to host.

Figure 8: How the three containers communicate inside Docker

Inside the Docker containers, the rules are applied in the following way, for example with a starting delay of 10ms and a bandwidth of 10Mbps:

tc qdisc add dev eth0 root handle 1: tbf rate 10mbit burst 100kb limit 100kb
tc qdisc add dev eth0 parent 1:1 handle 10: netem delay 10ms

They first create a Token Bucket Filter (tbf) queueing discipline (qdisc) on the eth0 interface, setting the average rate at which traffic is allowed to pass through the qdisc.
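The token-bucket mechanism behind the tbf parameters above (rate, burst, limit) can be illustrated with a minimal simulation. This is illustrative Python, not part of the testbed; a real tbf qdisc also queues over-limit packets rather than simply refusing them, and the packet sizes here are arbitrary.

```python
# Minimal token-bucket model: tokens accrue at a fixed rate, capped at
# the burst size; sending a packet consumes tokens equal to its size.

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes  # maximum tokens the bucket can hold
        self.tokens = burst_bytes    # start with a full bucket
        self.t = 0.0                 # simulated clock, seconds

    def send(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.t) * self.rate)
        self.t = now
        if self.tokens >= size:      # enough tokens: the packet passes
            self.tokens -= size
            return True
        return False                 # otherwise it is delayed (or dropped)

# 10 Mbit/s rate with a 100 kB burst, matching the tc example above.
tb = TokenBucket(rate_bytes_per_s=10_000_000 / 8, burst_bytes=100_000)
sent = sum(tb.send(1500, now=0.0) for _ in range(100))  # back-to-back packets
print(sent)  # 66: the 100 kB burst admits floor(100000 / 1500) packets at t=0
```

This shows why the burst value caps how much data can pass instantaneously, while the long-term average is bounded by the rate.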
Then they add a netem queueing discipline to the parent qdisc identified by "1:1", introducing a delay on the outgoing traffic. If needed, a packet loss can also be added. In particular, the Token Bucket Filter maintains a bucket of tokens available for transmission. Tokens are generated at a fixed rate and added to the bucket over time. Transmitting a packet requires a certain number of tokens from the bucket: when the packet is sent, those tokens are consumed, but if there are insufficient tokens, the packet is temporarily delayed or possibly dropped. The burst size determines
the maximum amount of data that can be sent at once without being delayed, while the limit restricts the total number of tokens that can be accumulated in the bucket.

4.3.2 Bash scripts

The scripts are divided into internal scripts and orchestration scripts. The Docker emulation setup is regulated by three internal Bash scripts, one serving as the entrypoint of the containers and two being utility scripts. The scripts are as follows, in a bottom-up order:
• execute.sh: configures and runs an entity, which can be a client, a server, or a proxy, in the QUIC, MASQUE or TCP+TLS (tunneled and end-to-end) scenarios, for a total of seven possible entities. The configuration includes generating and loading certificates for the server, creating an index.html file as well as a text file of a specified size to be downloaded and writing the needed headers on such files.
• cmd_timestat.sh: calls a specified command a certain number of times and measures its execution time using the Linux time command. Furthermore, it saves intermediate results, logs any errors and calculates and saves some useful statistics on the collected times, including mean, median, minimum, maximum and quartiles. In case of a failed request, it repeats the request without saving the time elapsed.
• run_measurements.sh: the Docker entrypoint. It first applies traffic control rules (delay, bandwidth and packet loss), if any, on the specified entity and, if it is a server or a proxy, it simply runs it using the execute.sh script. If it is a client, it executes cmd_timestat.sh on the execute.sh script and saves the statistics in a csv file. To sum up, this script runs a measurement consisting of a specific number of requests for a certain scenario with certain traffic conditions.

Orchestration scripts are present on the host machine and they serve the purpose of setting up and managing the tests using Docker.
The scripts are as follows:
• docker_tests.sh: performs a specified number of measurements of a certain category by running Docker containers: a client, a server and a proxy (which, depending on the measurement category, may or may not be present), each with a certain delay, a bandwidth limit and a packet loss. The delay can be applied to any entity and it is updated according to a specified step for each measurement, in which the containers are stopped, removed and run again. At the end of each measurement, the result csv files and error logs are copied from the container to the host machine and further modified to include other useful information. When the measurements are concluded, the files are merged.
• main_tests.sh: as the name suggests, the main script to launch in order to run a full test. It creates a specific folder for the test, along with the four subfolders for each category, which in turn contain the necessary subfolders for intermediate iterations, measurements, experiments and errors. It then runs four experiments, one for each scenario (MASQUE, TCP+TLS tunneled, QUIC and TCP+TLS end-to-end) and then merges the results in two main csv files, one with the summary
of the results and one with all the intermediate results for easier consultation. Optionally, it calls an R script to draw boxplots, ECDFs and CPU time plots.

To sum up, a full test is made of four experiments, one for each category or scenario (MASQUE, TCP-TLS with proxy, QUIC and TCP-TLS without proxy). In turn, each experiment is made of M measurements, each with a different additional delay, spaced according to a predefined step. A single measurement is described by an artificial delay, which results in an additional RTT, a fixed bandwidth, a packet loss, a file size and a summary of the measured time. This summary is calculated over N iterations, each corresponding to a request that the client sends to the server for a file, either through a proxy or not. In other words, a test contains four experiments, each with M measurements, each with N iterations. More details about the inputs and outputs of the scripts and their usage can be found in the Appendix (Section 8.1).

Figure 9: Internal and orchestration scripts scheme

5 Experimental campaigns

5.1 Methodology

5.1.1 Setup

As previously explained, a full test is described by the number of measurements, the number of iterations, the starting delay in ms, the delay step in ms, the file size in bytes, the bandwidth limit in
Mbps and the packet loss in percentage. All tests have been run on a virtual machine running on a remote dedicated computer located at the BigData@Polito cluster [2], a computing infrastructure specifically designed and implemented by Politecnico di Torino (Polito) for handling big data processing and analytics tasks. Using a dedicated computer ensures that all resources are solely dedicated to running the tests, preventing any interference from a personal computer's normal usage, such as background processes, applications, or network activity, that could impact the test results. The computer and the virtual machine are accessed via SSH and have the following specifications:
• AMD EPYC 7301 16-Core 64bit 2.70GHz processor, 500GB of RAM, 64 GB of which dedicated to the virtual machine
• Ubuntu 18.04.25 Bionic LTS
• Docker version 24.0.2

Furthermore, with the usage of the --cpuset-cpus flag, the three Docker containers corresponding to server, proxy and client are run respectively on logical CPU cores 1, 2 and 3, in turn mapped to three different physical CPU cores, for a more efficient concurrent execution. Intermediate and final results are saved in csv files and error logs, if any, are saved in log files. These results are then used to create several plots that will be explained in the next chapter.

Figure 10: Experiments scheme with the created folders and results

With respect to the tools mentioned earlier, the four test categories are as follows:
• MASQUE (masque)
– Client: Google masque_client communicating with the target server through the proxy
– Proxy: Google masque_server at address 172.18.0.3 and port 6122
– Server: Cloudflare quiche_server at address 172.18.0.2 and port 6121

• TCP+TLS with proxy (tcp-tls_pro)

– Client: curl communicating with the target server through the proxy
– Proxy: Squid proxy at address 172.18.0.3 and port 3128
– Server: Twisted HTTPS server at address 172.18.0.2 and port 8081

• QUIC (quic)

– Client: Google quic_client communicating directly with the target server
– Server: Cloudflare quiche_server at address 172.18.0.2 and port 6121

• TCP+TLS without proxy (tcp-tls_no-pro)

– Client: curl communicating directly with the target server
– Server: Twisted HTTPS server at address 172.18.0.2 and port 8081

It is worth noting that in QUIC both endpoints make authenticated declarations of their transport parameters during connection establishment. These parameters are encoded in the cryptographic handshake and they have been checked before running the tests by inspecting the traffic inside the Docker containers. In fact, the first Initial packet sent by the client contains a CRYPTO frame carrying the TLS Client Hello, from which the transport parameters can be read. The most important ones are listed below, each followed by its definition from RFC 9000 [36]:

• initial_max_stream_data_bidi_local: integer value specifying the initial flow control limit for locally initiated bidirectional streams. This limit applies to newly created bidirectional streams opened by the endpoint that sends the transport parameter.
• initial_max_stream_data_bidi_remote: integer value specifying the initial flow control limit for peer-initiated bidirectional streams. This limit applies to newly created bidirectional streams opened by the endpoint that receives the transport parameter.
• initial_max_stream_data_uni: integer value specifying the initial flow control limit for unidirectional streams.
This limit applies to newly created unidirectional streams opened by the endpoint that receives the transport parameter.

• initial_max_data: integer value that contains the initial value for the maximum amount of data that can be sent on the connection.
• max_udp_payload_size: integer value that limits the size of the UDP payloads that the endpoint is willing to receive. UDP datagrams with payloads larger than this limit are not likely to be processed by the receiver.
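As a side note, the 1472-byte max_udp_payload_size observed for the client-proxy connection matches a standard 1500-byte Ethernet MTU once the IPv4 and UDP headers are subtracted; a quick sanity check of this arithmetic (assuming IPv4 without options):

```python
# Sanity check: a max_udp_payload_size of 1472 bytes fills a standard
# 1500-byte Ethernet MTU exactly, once the IPv4 and UDP headers are
# subtracted. (Assumption: IPv4 without options; IPv6 headers are 40 bytes.)
ETHERNET_MTU = 1500
IPV4_HEADER = 20  # minimum IPv4 header size, no options
UDP_HEADER = 8

max_udp_payload = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER
print(max_udp_payload)  # prints 1472
```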
All these values, except one, are the same in the connection between the client and the proxy and in the connection between the proxy and the server. In fact, the first three parameters are all equal to 6MB and initial_max_data is 15MB, while max_udp_payload_size equals 1472 bytes for the connection between client and proxy and 1250 bytes for the connection between proxy and server. Finally, QUIC's default congestion control algorithm is CUBIC, and the following experiments use it.

5.2 Considerations

5.2.1 Delay distribution

The artificial additional delay introduced with tc-netem adds to the round-trip time (RTT) of the normal communication. The way the artificial delay is set differs between the end-to-end and tunneled scenarios, because of the presence (or absence) of the proxy. When a proxy is involved in the communication between a client and a server, the round-trip time consists of multiple contributions: the client-to-proxy RTT, the proxy processing time and the proxy-to-server RTT. To add an artificial delay that simulates the RTT, it is necessary to distribute the delay appropriately: since the proxy acts as an intermediary, forwarding requests from the client to the server and responses in the opposite direction, a delay applied to the proxy interface would be counted twice in the total RTT calculation. Without a proxy, on the other hand, the entire RTT is experienced directly between the client and the server. That is why, in the tunneled case, the additional delay is split between the client interface and the server interface, without applying any rule to the proxy interface, while, in the end-to-end case, the full amount of delay can be applied on either the client or the server side, since there are no intermediary processing or forwarding steps involved.
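The splitting rule described above can be sketched as a small helper (a hypothetical function for illustration, not part of the actual test scripts): given the per-side delay from the test specification, it returns the netem delay to apply to each interface so that the total added RTT is identical across scenario families.

```python
def netem_delays(per_side_delay_ms: int, with_proxy: bool) -> dict:
    """Return the artificial delay (in ms) to apply to each interface.

    In the tunneled (proxy) case the delay is split between the client
    and server interfaces, leaving the proxy untouched; in the
    end-to-end case the doubled delay goes on the client interface only,
    so the total added RTT is the same in both families of scenarios.
    """
    if with_proxy:
        return {"client": per_side_delay_ms,
                "proxy": 0,
                "server": per_side_delay_ms}
    return {"client": 2 * per_side_delay_ms,
            "proxy": None,  # no proxy in the end-to-end case
            "server": 0}

# Example from the text: a starting delay of 5ms yields 5ms on the client
# and 5ms on the server interface with a proxy, and 10ms on the client
# interface alone without one: 10ms of added RTT either way.
print(netem_delays(5, with_proxy=True))
print(netem_delays(5, with_proxy=False))
```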
For simplicity, the delay specified in the involved scripts refers to the delay added on the client and on the server interface in the tunneled case; it is therefore doubled and added only on the client interface in the end-to-end case. For instance, if the test specifies a starting delay of 5ms, 5ms of delay is applied to the client interface and another 5ms to the server interface in the MASQUE and TCP+TLS with proxy cases, while 10ms of delay is applied only on the client interface in the QUIC and TCP+TLS without proxy cases. In this way, the total RTT introduced by the traffic control rules is the same in all the test categories (10ms in the example considered).

5.2.2 Packet loss distribution

Similarly to the delay, the packet loss probability has also been carefully distributed over the network interfaces, so that all test categories experience the same overall value. In particular, in the test categories without a proxy, the specified packet loss is simply halved and applied to the interfaces of the two endpoints. In the presence of a proxy, instead, this loss is divided by four, taking into account that the proxy communicates with the client and the server using the same interface and forwards requests and responses from the client to the server and from the server to the client. For instance, if the test specifies a packet
loss of 2%, in the end-to-end cases 1% of loss will be applied to the client's interface and 1% to the server's interface. In the cases with a proxy, instead, 0.5% of loss will in turn be applied to the client, server and proxy interfaces, so that the same chance of packet loss is achieved in all the test categories.

5.2.3 Tools selection and evolution

It is worth noting that the tools used to carry out the experiments have not remained the same and were subject to change during the implementation and throughout the experimentation phase. As the testing and work progressed, and as a thorough analysis of the results was conducted, certain tools that were initially chosen were subsequently replaced with alternatives based on the evolving requirements and findings. Some other tools were discarded at the beginning and used again at the end, until the final configuration. For instance, at the beginning, for both the QUIC and MASQUE cases, we relied solely on Google tools. In particular, Google MASQUE has been a constant throughout the tests, even though some other implementations, such as the ones mentioned at the beginning, have been tried but very soon discarded for their lack of features or very early development stage. As for QUIC, the Google quic_client and quic_server were the initial choice for a good part of the experiments, but they were replaced with the curl development build with HTTP/3 support and with the Cloudflare quiche_server, respectively. The former was chosen because of its apparently enhanced performance, especially compared to TCP+TLS, the latter for its support for curl and its better customization options. In the end, curl was replaced again by the Google quic_client, which proved to have more realistic performance, especially for medium to high latency. The results presented in the next chapter are based on the configuration described in this chapter.
More results achieved with the curl configuration can be found in the Appendix (Section 8.3).

6 Results

6.1 Data and graphs

6.1.1 Collected data

When a client requests a file from the server (either through a proxy or not) during a measurement iteration, it downloads said file and then exits with a return code. This means that a client execution corresponds to a complete file transfer, no matter whether the operation was successful or not. For this reason, for each request, corresponding to an iteration of a single measurement, the client command execution time is collected using the Linux built-in time command. The whole time command output is collected and saved, and it consists of:

• Real time (real): the actual elapsed time from the start of the command until its completion. It includes both the time spent on the command's execution and any waiting time (e.g., for input/output or system resources).
• User CPU time (user): the amount of CPU time consumed by the command while
executing in user mode. It includes the execution time spent on the command itself and any user-level function calls made by the command.
• System CPU time (sys): the CPU time used by the command while executing in kernel mode. It includes the time spent in the kernel while executing system calls or handling system-related tasks on behalf of the command.

The sum of user time and system time gives the total CPU time used by the command, and comparing it with the real time can provide insights into the efficiency and performance of a command or program. When an error occurs, such as a request timeout or invalid headers, it is logged, but the execution time is discarded and the request is repeated by increasing the number of iterations. In this way, errors are detected and accessible, but they do not hamper the measurement. After N iterations are successfully performed, the collected real times are used to create a measurement summary. This includes the mean time, standard deviation, median, maximum and minimum times, first and third quartiles and 5th and 95th percentiles.

6.1.2 Data visualization

Several plots have been laid out to interpret and visually understand the time measurements. In particular, the result summary has been used to draw:

• Boxplots, to display the distribution of the data and for a quick understanding of the data's central tendency, spread and skewness.
• Normalized plots, in which the mean measured time for each total additional delay is normalized by the value at the minimum delay, that is zero. This is done to better understand the temporal evolution and to compare the four scenarios.
• Line plots with error bars, to more easily provide a visual representation of the variability associated with the data points.

In any case, the data is grouped by the total additional delay.
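As an aside, the per-iteration timing collection and the per-measurement summary described in Section 6.1.1 can be reproduced with a short sketch; parse_time_output and summarize are hypothetical helpers assuming the POSIX `time -p` output format, not the actual test scripts.

```python
import statistics

def parse_time_output(output: str) -> dict:
    """Parse POSIX `time -p` output ("real", "user", "sys" lines, seconds)."""
    fields = {}
    for line in output.strip().splitlines():
        name, value = line.split()
        fields[name] = float(value)
    # Total CPU time is the sum of user-mode and kernel-mode time.
    fields["cpu"] = fields["user"] + fields["sys"]
    return fields

def summarize(real_times: list[float]) -> dict:
    """Summary statistics computed for each measurement over N iterations."""
    qs = statistics.quantiles(real_times, n=100, method="inclusive")
    return {
        "mean": statistics.mean(real_times),
        "std": statistics.stdev(real_times),
        "median": statistics.median(real_times),
        "min": min(real_times), "max": max(real_times),
        "q1": qs[24], "q3": qs[74],  # first and third quartiles
        "p5": qs[4], "p95": qs[94],  # 5th and 95th percentiles
    }

sample = parse_time_output("real 1.42\nuser 0.31\nsys 0.12\n")
print(round(sample["cpu"], 2))  # 0.43
```

Comparing the "cpu" field against "real" over many iterations is what the CPU time vs real time plots visualize.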
The single iteration results, instead, have been used to draw:

• CPU time vs real time graphs, to compare the real time and CPU time evolution as a function of the total additional delay. The purpose is to assess how much the command relies on CPU resources as opposed to waiting time or I/O overhead.
• Empirical Cumulative Distribution Functions (ECDFs), to visualize the entire distribution of the data.

All plots, except the CPU time vs real time graphs, compare the four categories in the same graph with different colors.

6.1.3 Considered scenarios

Testing scenarios differ in the size of the file to be downloaded, the bandwidth limit and the packet loss, while, in all of them, the delay varies for each measurement according to a step. This delay is set symmetrically on client and server in the proxying cases and only on the client in the end-to-end cases. Setting a delay, in fact, can simulate geographical distances, as well
as communication types, such as wireless, wired or satellite [55, 33]. Three measurement campaigns have been run, each one with a different bandwidth limit and each in turn with two downloaded file sizes. In particular, bandwidth has been divided into low, medium and high, corresponding respectively to 10Mbps, 100Mbps and 1Gbps. In each campaign, the download sizes considered are 1MB and 10MB. In any case, for each experiment, 21 measurements containing 100 requests each have been made. This means that each scenario has a total additional artificial delay varying from 0ms (no delay added) to 200ms, and the time to complete a request for each applied delay has been evaluated 100 times. Packet loss has never been applied, except for one extra scenario that is listed at the end; in the following scenarios, it is implied that no packet loss is added. In each scenario, the time evolution of each test category will be assessed, as well as the overall stability and the behaviour as the additional artificial delay increases. Furthermore, there will also be a comparison between the average real time and CPU time used throughout the test, in order to understand the CPU usage and the waiting times. In the next paragraphs, after the result description for each scenario, only line plots with error bars and CPU time vs real time plots will be included. The other plots, such as boxplots and relevant ECDFs, have been included in the Appendix (Section 8.3).

6.2 Low bandwidth scenario

A bandwidth limit of 10Mbps may correspond to some home internet connections, small business networks and mobile cellular networks.

6.2.1 1MB file download

In all the test categories, the trend grows as the total delay increases. In particular, MASQUE appears to be the slowest protocol of all, staying above QUIC most of the time and above TCP+TLS the whole time. However, in the range of 140ms to 170ms of additional delay, it performs significantly better than QUIC.
QUIC, on the other hand, appears to have a more irregular trend overall. It performs slightly better than MASQUE and similarly to TCP+TLS without proxy, especially between 80ms and 110ms of total delay, where its median stays below that of its TCP+TLS equivalent category. As said previously, at 140ms of total delay, QUIC abruptly starts behaving worse not only than TCP+TLS without proxy but also than MASQUE, with a sudden slope change between 140ms and 150ms. It keeps growing with a less steep slope until, starting from 170ms of total delay, it goes down again and starts performing significantly better than MASQUE and TCP+TLS without proxy. The trend of end-to-end TCP+TLS has already been implicitly discussed when talking about QUIC and MASQUE, so it is worth mentioning that its standard deviation, compared to all the other protocols, is generally higher, especially at higher latency (from 140ms on), being up to 6 times larger than the other protocols in the worst cases. The tunneled TCP+TLS case certainly has the best performance and stability in terms of measured time. QUIC and MASQUE have similar performance at the very beginning of the test and as the delay increases, keeping a time difference from 2% to 5%. However, in the time interval
in which QUIC performs worse than MASQUE, this difference reaches a maximum of 11%. Concerning the comparison between CPU time and real time, it can be observed that, in the TCP+TLS cases, the trend of the CPU time is more or less constant and most of the variation comes from the real time. MASQUE, on the other hand, has a slowly decreasing trend instead. In the QUIC case, the CPU time almost perfectly follows the trend of the real time.

Figure 11: Line plot with error bars for the low bandwidth scenario with a 1MB file download

6.2.2 10MB file download

In this configuration, both QUIC and MASQUE generally perform worse than the equivalent TCP+TLS cases. In particular, it was observed that MASQUE failed its request in almost 9% of the iterations, reporting the following errors:

Failed to connect with client 0 server <cid> to 172.18.0.2:6121. Error: QUIC_NETWORK_IDLE_TIMEOUT
Missing :status response header

Because of these errors, though the measurements corresponding to the failed requests have been discarded, some outliers are present and a more unstable and irregular trend
Figure 12: CPU time vs real time plots for each category in the low bandwidth scenario with a 1MB file download

is observed, especially at 60ms, 120ms and 150ms of delay. In fact, MASQUE appears to perform worse than all the other protocols and it is the only one with failed requests. It is worth noting that the first error displayed does not occur because of a failed communication between the client and the proxy, but between the proxy and the target server. Conversely, with the :status response header missing, the client treats the proxying attempt as failed and aborts the request, as specified by RFC 9298 [51]. QUIC has better times than MASQUE, but it shows a relatively high standard deviation and never reaches the performance of its equivalent TCP+TLS case, though it achieves similar results for high latencies, from 170ms on, where its median time can also be lower than TCP+TLS without proxy. TCP+TLS without proxy has slightly better performance compared to the previous case. QUIC and MASQUE are close with little to no added delay and they have an average difference between 1% and 3%, the latter occurring at 120ms, where the two have the highest performance difference. The TCP+TLS cases have an almost constant mean CPU time throughout the experiments, while MASQUE keeps having a decreasing trend, though a bit more irregular. On the other hand, QUIC's CPU time almost perfectly follows the real time.
Figure 13: Line plot with error bars for the low bandwidth scenario with a 10MB file download

6.3 Medium bandwidth scenario

Typical medium bandwidth scenarios, with a limit of 100Mbps, include situations where network connectivity is relatively fast, like residential areas with FTTC optical fiber and educational institutions.

6.3.1 1MB file download

In this scenario, for very low additional delays, the usual situation in which MASQUE and QUIC have worse performance than the TCP+TLS cases can be observed. QUIC, during the whole test, has better times than MASQUE, except at 30ms of delay. Moreover, starting from this point, QUIC resumes having better performance than MASQUE, but a gap also emerges between the group formed by MASQUE, QUIC and end-to-end TCP+TLS and the tunneled TCP+TLS case. This gap widens as the delay increases. Within this group, starting from around 80ms until the end of the test, QUIC performs better than TCP+TLS without proxy, also showing far less variability. MASQUE also starts performing better than TCP+TLS without proxy, with similar considerations as in the QUIC case. However, while QUIC manages to stay below the 25th percentile of the equivalent TCP+TLS box, MASQUE mostly stays near or below its median. For very high latencies of 190ms and 200ms, MASQUE also breaks away from the end-to-end TCP+TLS 25th percentile. The tunneled TCP+TLS case appears to have a
Figure 14: CPU time vs real time plots for each category in the low bandwidth scenario with a 10MB file download

higher standard deviation than before, which keeps growing as the total delay increases. In spite of this, it is still the best-performing protocol among the four, even though it is slightly surpassed by the end-to-end case at the very beginning of the test. Both MASQUE and QUIC show less variability, with a very low standard deviation compared to their TCP+TLS counterparts, especially as latencies grow higher. For example, at 160ms of total additional delay, the end-to-end TCP+TLS standard deviation is 8 times higher than that of QUIC and MASQUE. The difference between QUIC and MASQUE is remarkable with little to no added latency, reaching 31%. This difference, however, stays between 6% and 8% as the latency increases. In this scenario, QUIC keeps having a CPU time equal to the real time and the TCP+TLS cases keep showing an almost constant CPU time evolution. MASQUE's CPU time has a decreasing trend that stops at 30ms and then becomes constant.

6.3.2 10MB file download

At the very beginning of the test for this scenario, a very clear separation between QUIC and MASQUE and between QUIC and the TCP+TLS cases can be observed, when the additional delay goes from 0ms to 10ms, with almost one whole second of difference. In this
Figure 15: Line plot with error bars for the medium bandwidth scenario with a 1MB file download

case, QUIC performs significantly worse than TCP+TLS (both with and without proxy) and, in turn, MASQUE performs significantly worse than QUIC. Afterwards, MASQUE and QUIC become comparable in terms of times, with the former staying slightly above the latter. This happens because MASQUE experiences an abrupt slope change. Starting from 100ms, where QUIC and MASQUE perform almost the same in terms of mean elapsed time, QUIC starts growing linearly, obtaining, from this moment on, worse performance than MASQUE. In fact, between 100ms and 140ms, MASQUE's mean time grows in a non-linear fashion, while, starting from around 140ms, MASQUE also grows linearly, but with a smaller slope than QUIC. This allows MASQUE to keep staying below QUIC's curve. Both TCP+TLS cases grow linearly, but the proxied case keeps performing better than the end-to-end case, also having a less steep growth curve and less data variability. In no case do MASQUE and QUIC perform better than, or similarly to, the TCP+TLS cases, always keeping a quite clear separation between the two protocol groups. The initial difference between MASQUE and QUIC reaches 29% here as well and, when the former performs worse than the latter, the difference is between 6% and 12%, while it sits in the 6% to 11% interval when the opposite happens. Here too, like in all the cases seen so far, in QUIC the sum of user time and system time is equal to the real time, resulting in CPU time and real time having the same evolution.
Figure 16: CPU time vs real time plots for each category in the medium bandwidth scenario with a 1MB file download

TCP+TLS keeps showing a linear trend as before in both cases. In MASQUE, on the other hand, CPU time and real time have almost the same trend, but, starting from around 30ms, the CPU time starts drifting from the real time and showing an evolution similar to the 10MB case of the previous scenario.

6.4 High bandwidth scenario

A scenario with a bandwidth of 1Gbps offers a significant amount of data transfer capacity and corresponds to various possible settings, including residential areas with FTTH fiber or Gigabit Ethernet.

6.4.1 1MB file download

This scenario is very similar to the medium bandwidth one with the same download size. In fact, for very low additional delays, both QUIC and MASQUE perform worse than the TCP+TLS cases. QUIC, during the whole test, has better performance than MASQUE, except at 30ms of delay, where MASQUE has slightly better transfer times. Differently from before, the grouping behaviour and the change in QUIC's performance with respect to end-to-end TCP+TLS start happening much earlier, at around 40ms. Starting from this point, in fact, QUIC, MASQUE and end-to-end TCP+TLS form a group and there is a gap
Figure 17: Line plot with error bars for the medium bandwidth scenario with a 10MB file download

between this group and tunneled TCP+TLS, which grows as the delay increases. As said, starting from around 40ms, QUIC performs better than TCP+TLS without proxy until the end of the test, with some exceptions at 100ms, 120ms and 130ms, where the two protocols perform in a comparable way, even though the TCP+TLS equivalent case has a much higher standard deviation and therefore more variability and taller boxes. MASQUE, on the other hand, performs comparably to TCP+TLS without proxy starting from around 70ms, and gets better than it from 140ms until the end of the test. Here too, both MASQUE and QUIC show less variability, with a very low standard deviation compared to their TCP+TLS counterparts, especially at higher latency. The only exception to this rule is at 150ms, where end-to-end TCP+TLS has a low standard deviation while still showing long whiskers. Tunneled TCP+TLS behaves similarly to the previous scenario, showing the best performance among the four protocols considered. Here too, except at the very beginning of the test, MASQUE and QUIC perform similarly, with their difference staying between 6% and 8%. The CPU time vs real time plots are very similar to the previous scenario with the same download size.
Figure 18: CPU time vs real time plots for each category in the medium bandwidth scenario with a 10MB file download

6.4.2 10MB file download

With this configuration, the behaviour is very similar to the previous scenario with medium bandwidth and the same downloaded file size. In fact, just like before, there is a clear separation between MASQUE and QUIC and between QUIC and the TCP+TLS cases, and this happens for a very low additional delay, between 0ms (no additional delay) and 10ms. Differently from before, where the size of the two gaps was about the same, here QUIC is closer to MASQUE than to TCP+TLS. Starting from 20ms, MASQUE and QUIC become comparable and get closer, with QUIC performing generally better than MASQUE, until 100ms of additional delay. From this point, for the second half of the test, QUIC performs worse than MASQUE. Just like in the previous scenario, MASQUE's mean time grows in a non-linear fashion between 100ms and 140ms, then almost linearly from 140ms until the end of the test. Starting from 100ms, MASQUE's curve stays below QUIC's curve. The same abrupt slope change as in the previous scenario can be observed in MASQUE between 10ms and 20ms. Concerning TCP+TLS, the cases with and without proxy have a relative behaviour similar to the previous scenario, with the end-to-end TCP+TLS curve being less linear in the 150ms to 200ms interval and the boxes being overall taller, especially for higher latencies. The highest difference between MASQUE and QUIC when the former performs worse than the latter is 13%, while, in the opposite situation, after 110ms, this difference, in
Figure 19: Line plot with error bars for the high bandwidth scenario with a 1MB file download

favour of MASQUE, reaches 11%. The same behaviour can be observed regarding QUIC's equal CPU and real times and the linear trend of the TCP+TLS CPU time. The trend of MASQUE is the same as in the previous scenario.

6.5 Other scenarios

This paragraph includes tests that do not fall into one of the previous scenarios.

6.5.1 1MB file download, 100Mbps bandwidth, 10ms delay, variable packet loss

This is the only scenario that involves a simulated packet loss, and it has a fixed additional delay of 10ms. The purpose of this scenario is to compare the four protocol categories in the presence of four different packet losses: no packet loss, 1%, 2% and 5%. For this scenario, only a boxplot will be shown because it provides the best readability. Also, since the additional delay is fixed, the boxes are grouped by the simulated packet loss. Because of the way the loss has been distributed, it is harder to directly compare the cases with proxy and without proxy, so the description will focus more on comparing the two cases with proxy (MASQUE and TCP+TLS with proxy) and the
Figure 20: CPU time vs real time plots for each category in the high bandwidth scenario with a 1MB file download

two cases without proxy (QUIC and TCP+TLS without proxy). It can first be observed that, as the loss increases, the measured time of each box increases too, resulting in a growing trend. In all groups, MASQUE is the one performing the worst for the whole test, while QUIC manages to keep its median below its TCP+TLS counterpart when the loss is the highest. The end-to-end TCP+TLS case, as well as MASQUE, shows increasing variability as the packet loss grows, compared to the other two cases. The only errors present in the experiments come from QUIC and they involve only two out of 100 requests in the 5% loss scenario, due to undecryptable packets.
Figure 21: Line plot with error bars for the high bandwidth scenario with a 10MB file download
Figure 22: CPU time vs real time plots for each category in the high bandwidth scenario with a 10MB file download
Figure 23: Boxplot for the medium bandwidth scenario with a 1MB file download and in presence of varying packet loss