THE UNIVERSITY OF CALGARY
On the Security of the BB84 Quantum Key Distribution Protocol
by
Richard Cannings
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
CROSS-DISCIPLINARY DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF MATHEMATICS AND STATISTICS
and
DEPARTMENT OF COMPUTER SCIENCE
CALGARY, ALBERTA
March, 2004
© Richard Cannings 2004
Abstract
The BB84 quantum key distribution (QKD) protocol enables two authenticated par-
ties to generate a secret key over an insecure quantum channel. Using a standardized
security definition, we prove that BB84 is secure and include explicit bounds on its
security. Furthermore, our use of quantum circuit diagrams simplifies the Shor–Preskill
proof. Namely, we can reduce the Modified Lo–Chau QKD to a practical version
of BB84 using the observation from Shor and Preskill that one may ignore a cor-
rectable number of phase errors, and the fact that computational basis measurements
commute with controls of CNOT operations.
The first four chapters provide the required background material on quantum
computing, information theory, cryptography, coding theory, and quantum error
correcting codes. Chapter 5 presents protocols for entanglement purification. Chap-
ter 6 reduces an entanglement purification protocol to the Modified Lo-Chau QKD,
and proves that it is secure. Finally, a reduction from the Modified Lo-Chau QKD
to BB84 establishes the security of the latter.
Acknowledgments
Many people helped me write this thesis. First and foremost, I thank my entire thesis
committee: Richard Cleve, Barry Sanders, Renate Scheidler, John Watrous and
Hugh Williams,¹ who focused so much attention on problems dear to me—especially
Richard Cleve and Renate Scheidler for their conscientious support throughout my
entire graduate degree.
Writing this thesis was only possible with financial support from the iCORE Chair
in Algorithmic Number Theory and Cryptography (ICANTC), Renate Scheidler’s
NSERC grant, and funding from Richard Cleve’s NSERC and MITACS grants.
Finally, I want to express special thanks to:
– Richard Cleve for providing clear, simple, and stunningly beautiful solutions
to the most complex problems,
– Claude Laflamme for believing in me and coaching me through my undergrad-
uate and graduate degree,
– Christiane Lemieux for kindly guiding me through the horrors of probability
theory numerous times,
– Renate Scheidler for her devotion to my thesis, persistent focus on mathemati-
cal rigor, and her direct nature (for which I may not have been as immediately
appreciative as I should have been),
– John Watrous for challenging me and exposing me to some very beautiful
mathematics in CPSC 601.86, and
– Hugh Williams for his wit, sarcasm, and financial support.
¹ All names in the Acknowledgments are intentionally listed in alphabetical order by surname.
Dedication
I dedicate this thesis to my mother for her constant love, support, and encourage-
ment.
Table of Contents
Abstract iii
Acknowledgments iv
Dedication v
Table of Contents vi
Epigraph xii
1 Quantum Computing 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Postulates of Quantum Mechanics . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Qubits and State Space . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Quantum Evolution and Circuit Diagrams . . . . . . . . . . . 8
1.2.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Reversible Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Circuit Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 The Density Operator Formalism . . . . . . . . . . . . . . . . . . . . 22
1.5.1 Mixed States . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.2 Quantum Evolution . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.4 Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6.1 The Partial Trace . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Information Theory and Cryptography 29
2.1 Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Quantum Information Theory . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.1 Creating a Private Channel . . . . . . . . . . . . . . . . . . . 34
2.3.2 The One-Time Pad . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.3 The Diffie–Hellman Key Distribution Protocol . . . . . . . . . 39
2.3.4 The BB84 Quantum Key Distribution Protocol . . . . . . . . 41
2.3.5 On the Security of Key Distribution Protocols . . . . . . . . . 44
3 Coding Theory 47
3.1 Repetition Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1.1 Performance of Repetition Codes . . . . . . . . . . . . . . . . 49
3.2 Binary Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 Nearest Neighbour Decoding . . . . . . . . . . . . . . . . . . . 54
3.2.3 The Parity Check Matrix . . . . . . . . . . . . . . . . . . . . . 56
3.2.4 Syndrome Decoding . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Asymptotic Performance of Linear Codes . . . . . . . . . . . . . . . . 61
3.3.1 Code Duals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Parameterized Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4 Quantum Error Correcting Codes 65
4.1 Quantum Bit Flip Code . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.2 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.1.3 Transporting Classical Information Over A t-out-of-n X-Channel 73
4.2 Quantum Phase Flip Code . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 CSS Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.2 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4 The Steane Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.1 The Steane code is a [[7, 1]] CSS Code . . . . . . . . . . . . . 90
4.4.2 Encoding and Decoding Circuits for the Steane Code . . . . . 91
4.5 Parameterized CSS Codes . . . . . . . . . . . . . . . . . . . . . . . . 94
4.6 The Quantum Gilbert–Varshamov Bound . . . . . . . . . . . . . . . . 98
4.7 Quantum Noisy Channels . . . . . . . . . . . . . . . . . . . . . . . . 99
4.7.1 CSS Codes are Robust . . . . . . . . . . . . . . . . . . . . . . 100
4.7.2 Definitions of a Quantum Noisy Channel . . . . . . . . . . . . 106
5 Entanglement Purification Using CSS Codes 108
5.1 Entanglement Purification Problems . . . . . . . . . . . . . . . . . . 108
5.2 Solving the Simple Entanglement Purification Problems . . . . . . . . 111
5.2.1 Solving SEPP-1 . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2.2 Solving SEPP-2 . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.3 Solving GEPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.3.1 The Random Sample Test . . . . . . . . . . . . . . . . . . . . 121
5.3.2 Using the RST in Quantum Problems . . . . . . . . . . . . . . 125
5.3.3 Using DRST in the GEPP Solution . . . . . . . . . . . . . . . 132
6 On The Security of a Practical BB84 QKD 142
6.1 The Modified Lo–Chau QKD . . . . . . . . . . . . . . . . . . . . . . 142
6.1.1 On The Security of The Modified Lo–Chau QKD . . . . . . . 148
6.2 A Practical BB84 QKD . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.3 On the Success of a Practical BB84 QKD . . . . . . . . . . . . . . . . 154
7 Concluding Remarks 158
7.1 Simplifying the Shor–Preskill Reductions . . . . . . . . . . . . . . . . 158
7.2 Simplifying the Modified Lo–Chau QKD Security Proof . . . . . . . . 159
7.3 Explicit Security Bounds for BB84 . . . . . . . . . . . . . . . . . . . 160
Bibliography 161
A The Chernoff–Hoeffding Bounds 167
List of Figures
1.1 A quantum circuit of the Pauli X operation. . . . . . . . . . . . . . . 10
1.2 The controlled NOT operation on two qubits. . . . . . . . . . . . . . 11
1.3 A quantum circuit illustrating the inverted CNOT operation. . . . . . 12
1.4 The Toffoli operation on three qubits. . . . . . . . . . . . . . . . . . . 13
1.5 An example of a generalized n-controlled NOT operation. . . . . . . . 13
1.6 The quantum circuit for measurement in the computational basis. . . 14
1.7 Measuring two qubits of a 2-qubit system. . . . . . . . . . . . . . . . 15
1.8 Measuring one qubit of a 2-qubit system. . . . . . . . . . . . . . . . . 15
1.9 Alternate output representation of measuring one qubit of a 2-qubit
system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.10 An illustration of a measurement in the computational basis commut-
ing with the control of a CNOT operation. . . . . . . . . . . . . . . . 16
1.11 A quantum circuit implementing the logical OR. . . . . . . . . . . . . 20
1.12 A quantum circuit generating the Bell basis states. . . . . . . . . . . 26
1.13 A quantum circuit representing the partial trace. . . . . . . . . . . . 28
2.1 An example of the original BB84 QKD where no errors occurred. . . 42
3.1 A visual representation of a binary symmetric channel with error prob-
ability p. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1 UG3 based on the [3, 1, 3] binary linear code. . . . . . . . . . . . . . . 67
4.2 UG1 based on the [7, 4, 3] binary linear code. . . . . . . . . . . . . . . 68
4.3 A multiple controlled CNOT gate (left) is composed of two standard
CNOT gates (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4 UH3 based on the [3, 1, 3] code using parity check matrix H3. . . . . . 71
4.5 UH1 based on the [7, 4, 3] code using parity check matrix H1 . . . . . 72
4.6 Creating the state $\frac{1}{\sqrt{|C_2|}}\sum_{c\in C_2}|c\rangle$ . . . . . . . . . . . . . . . . . . . . . 78
4.7 The quantum circuit for encoding, Uencode. . . . . . . . . . . . . . . . 80
4.8 An alternative quantum circuit for encoding, Uencode. . . . . . . . . . 81
4.9 The quantum circuit for error detecting, error correcting a CSS code-
word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.10 High level conceptualization of error detection, error correction, and
decoding a CSS codeword. . . . . . . . . . . . . . . . . . . . . . . . . 89
4.11 The encoding circuit diagram for the Steane code. . . . . . . . . . . . 92
4.12 The complete Udecode circuit for the Steane code with error detection,
error correction and decoding circuit. . . . . . . . . . . . . . . . . . . 95
4.13 Encoding $|b_{x,z}\rangle_L$ with Uencode and state preparation . . . . . . . . . . . 97
4.14 A conceptualization of a quantum noisy channel. . . . . . . . . . . . . 100
4.15 An example of Mother Nature’s adversarial strategies. . . . . . . . . . 101
5.1 The quantum circuit for Protocol 5.1, a solution to SEPP-1. . . . . . 112
5.2 The quantum circuit for Protocol 5.1 with quantum communication. . 119
5.3 Illustration of the DRST with our claim. . . . . . . . . . . . . . . . . 138
5.4 Two equivalent measurement operations. . . . . . . . . . . . . . . . . 139
5.5 An illustration of the GEPP Solution. . . . . . . . . . . . . . . . . . . 141
6.1 An illustration of the Modified Lo–Chau QKD. . . . . . . . . . . . . 147
6.2 A slightly altered Modified Lo–Chau QKD. . . . . . . . . . . . . . . . 153
6.3 An illustration of a practical BB84 QKD. . . . . . . . . . . . . . . . . 156
List of Protocols
2.1 The Vernam One-Time Pad Cipher . . . . . . . . . . . . . . . . . . . 36
2.2 The Diffie–Hellman key distribution protocol . . . . . . . . . . . . . . 40
2.3 The Original BB84 QKD . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.1 SEPP-1 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 A solution to SEPP-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.3 The Random Sample Test (RST) . . . . . . . . . . . . . . . . . . . . 122
5.4 The Double Random Sample Test (DRST) . . . . . . . . . . . . . . . 129
5.5 GEPP Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.1 The Modified Lo–Chau QKD . . . . . . . . . . . . . . . . . . . . . . 146
6.2 A Practical BB84 QKD . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Epigraph
Anyone who is not shocked by quantum mechanics has not fully understood it.
—Niels Bohr
Chapter 1
Quantum Computing
1.1 Introduction
Before the existence of modern computers, Turing proposed a simple definition of a
theoretical computer called the universal Turing machine that was used to simplify
the study of computing devices in the context of evaluating functions. Turing [Tur36]
and Church [Chu36] proposed that:
Every “computing device can be simulated by a [universal] Turing ma-
chine.” [Sho94]
This proposal is commonly known as Church’s thesis, and simplified the study of
computing devices, because it reduced the study of numerous theoretical computing
devices to a single theoretical model.
With the realization of modern computers and their subsequent widespread adop-
tion, researchers began to focus on efficient physical computing devices. It is gen-
erally accepted that an efficient physical computing device is one that uses at most
a polynomial number of steps relative to its input size. Thus, in the back of their
minds, researchers envisioned a strong Church’s thesis. Namely,
“Any physical computing device can be simulated by a [universal] Turing
machine in a number of steps polynomial in the resources used by the
computing device.” [Sho94]
However, Deutsch noted that a “physics experiment” is itself a physical computing
device, because it is a process with input and output connected together by some
sequence of events. Yet physics experiments based on quantum theory cannot
be perfectly simulated by a universal Turing machine using only a polynomial number
of steps. Both Deutsch [Deu85] and Feynman [Fey86] addressed this problem by
proposing a theoretical model of computing based on quantum physics, called a
quantum computer. This led to the Church–Turing principle which states:
“Every [physical] system can be perfectly simulated by a universal model
of computing machine operating” in a number of steps polynomial in the
resources used by the physical system. [Deu85]
In a number of cases, quantum computing and communication offer substantial
advantages over classical computing and communication, and have affected the cross-
disciplinary field of cryptology tremendously. In 1994, Shor created two probabilis-
tic polynomial-time quantum algorithms to solve two difficult mathematical and
computational problems: the discrete logarithm problem and the factoring prob-
lem [Sho94]. As a result, most number theoretic key distribution protocols such as
RSA [RSA79] and Diffie–Hellman [DH76] will be rendered useless upon the physical
realization of a quantum computer. More recently, Hallgren described a probabilis-
tic polynomial-time quantum algorithm to solve Pell’s Equation and the Principal
Ideal Problem [Hal02], hence breaking even more cryptosystems, including the
possibly stronger Buchmann–Williams public-key cryptosystem [BW89].
Even though quantum computers break the most popular key distribution pro-
tocols, the idea of security is not lost upon the physical realization of quantum com-
puters. The theory of quantum communication (i.e., quantum networks) provides a
mathematical formalism to prove that some quantum key distribution (QKD) pro-
tocols are information theoretically secure—essentially unbreakable! In 1984, Ben-
nett and Brassard introduced the BB84 quantum key distribution protocol for two
authenticated parties to generate a secret cryptographic key via an insecure commu-
nication channel [BB84]. The BB84 QKD is physically realizable and commercially
available¹ today. So proving its security has direct applications to industry. This
led to [May01], [SP00], and others proving that BB84 was information theoretically
secure.
BB84 and its security is the focus of this thesis. We begin the journey of proving
the security of BB84 by first introducing the postulates of quantum mechanics, fo-
cusing on how they apply to quantum computing. The material in this chapter can
be found in most quantum information and quantum computation text books such
as [NC00]. Quantum mechanics can be expressed in the language of linear algebra.
The texts [Nic90] and [HJ85] combined provide a great reference to the mathematical
foundations of quantum mechanics.
1.2 Postulates of Quantum Mechanics
There are many useful analogies between the postulates of quantum mechanics and
classical computing. Consider describing classical computing as being based on three
postulates: a state space, logical evolution, and observation. The state of a classical
computer is held within one or more registers called bits each taking on the value
0 or 1. The state space of a classical computer is the set of all possible states the
computer may be in. For example, the state space of an n-bit classical computer
is the set of all n-bit strings, $\{0,1\}^n$. We also assume bits evolve and change only
through a series of basic logical operations—a logical evolution. Finally, we can
observe the inner workings of a classical computer at any moment—especially at
the end in order to observe the output. We believe that observing the output of an
algorithm, or even pausing an algorithm to look at the computer’s state, does not
affect the computation.
¹ Purchase yours at http://www.magiqtech.com/, or http://www.idquantique.com/ today!
Quantum computing is based on the three postulates of quantum mechanics: a
state space, unitary evolution, and measurement. Like classical bits, one or more
quantum bits called qubits hold the state of a quantum system. The qubits evolve
through a series of unitary operations. Unlike classical computing, observing—or,
equivalently, measuring—a qubit may alter its value.
In this section, we introduce the three postulates of quantum mechanics in the
pure state formalism as they relate to quantum computing. The text [NC00] is an
excellent reference for quantum computing. In fact, much of this chapter is based
on [NC00].
1.2.1 Qubits and State Space
Before discussing qubits, let us briefly continue discussing classical computing. An-
other form of classical computing is probabilistic classical computing which associates
probabilities with each state. For instance, let B be a random variable representing
a state of a 1-bit classical system. With probability p (i.e. p = Pr[B = 0]), B takes
on the value 0, and with probability q = 1 − p, B takes on the value 1. Such a state
can be represented as a probability vector
$$\begin{pmatrix} p \\ q \end{pmatrix} = p\begin{pmatrix} 1 \\ 0 \end{pmatrix} + q\begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
where $p+q=1$ and $p,q\geq 0$. The $i$th component of the vector contains the probability
that the state is $i$. In this case, the vector $\begin{pmatrix}1\\0\end{pmatrix}$ represents the state 0 with certainty,
and the vector $\begin{pmatrix}0\\1\end{pmatrix}$ represents the state 1 with certainty.
In quantum computing, a qubit is a register whose possible values are pure quantum
states,² and which is somewhat similar to probabilistic classical information.
For instance, a pure quantum state is represented as a unit vector
$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta\begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
where $\alpha,\beta\in\mathbb{C}$ and $|\alpha|^2+|\beta|^2=1$. The values $\alpha$ and $\beta$ are called amplitudes and
do not represent probabilities. A linear combination of basis states is said to be a
superposition, meaning that when $\alpha,\beta\neq 0$ a qubit is “both” 0 and 1, rather than
being “either” 0 or 1, in the classical probabilistic sense. The power of quantum
computing arises from the difference between probabilities and amplitudes.
All such quantum states are in a vector space called a Hilbert space. For the
purposes of this thesis, we present a limited, yet suitable definition of a Hilbert
space.
Definition 1.1 A Hilbert space is a finite-dimensional complex inner product space
with the dot product as the inner product. $\mathcal{H}$ denotes a 2-dimensional Hilbert space.
This leads us to a preliminary version of the first postulate of quantum mechanics:
Preliminary Postulate 1 (State Space) A one-qubit quantum system is completely
described by a unit length state vector in $\mathcal{H}$.
We describe the state of a quantum system as a superposition, or linear combi-
nation, of basis states that span the Hilbert space. We may use any basis, but the
² Pure quantum states are also called pure states, or sometimes just states.
most common basis consists of the computational basis states,
$$|0\rangle \overset{\text{def}}{=} \begin{pmatrix}1\\0\end{pmatrix}, \quad\text{and}\quad |1\rangle \overset{\text{def}}{=} \begin{pmatrix}0\\1\end{pmatrix}.$$
The symbol $|\cdot\rangle$ is called a ket and represents a column vector. Associated with a ket
is a bra, symbolized by $\langle\cdot|$. A bra is the conjugate transpose of the ket. For instance,
$$\langle 0| \overset{\text{def}}{=} |0\rangle^{T} = \begin{pmatrix}1 & 0\end{pmatrix}, \quad\text{and}\quad \langle 1| \overset{\text{def}}{=} |1\rangle^{T} = \begin{pmatrix}0 & 1\end{pmatrix}.$$
In general, let $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ and $|\phi\rangle = \gamma|0\rangle + \delta|1\rangle$ be arbitrary quantum states,
where $|\alpha|^2+|\beta|^2 = |\gamma|^2+|\delta|^2 = 1$. Then $\langle\psi| = \bar\alpha\langle 0| + \bar\beta\langle 1|$ and $\langle\phi| = \bar\gamma\langle 0| + \bar\delta\langle 1|$.
Multiplying a bra and a ket together forms a bracket, $\langle\,\cdot\,|\,\cdot\,\rangle$. More precisely, the
bracket is the inner product. Namely,
$$\langle\psi|\phi\rangle \overset{\text{def}}{=} (\langle\psi|)\cdot(|\phi\rangle) = \begin{pmatrix}\bar\alpha & \bar\beta\end{pmatrix}\cdot\begin{pmatrix}\gamma\\\delta\end{pmatrix} = \bar\alpha\gamma + \bar\beta\delta.$$
Also $|\psi\rangle = |\phi\rangle$ if and only if $\langle\psi|\phi\rangle = \langle\psi|\psi\rangle = 1$.
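To make the bra-ket arithmetic concrete, the following is a minimal illustrative sketch (not from the thesis), assuming Python with NumPy, representing kets as column vectors and bras as their conjugate transposes:

```python
import numpy as np

# Computational basis kets as column vectors.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)

# An arbitrary unit vector |psi> = alpha|0> + beta|1>.
alpha, beta = 3/5, 4j/5          # |alpha|^2 + |beta|^2 = 1
psi = alpha * ket0 + beta * ket1

# The bra <psi| is the conjugate transpose of the ket.
bra_psi = psi.conj().T

# The bracket <psi|psi> is the inner product; it equals 1 for unit vectors.
print(bra_psi @ psi)             # [[1.+0.j]]
print(ket0.conj().T @ ket1)      # [[0.+0.j]]; the basis states are orthogonal
```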
Like classical computing, more qubits allow for more complex and useful quantum
computing. Classically, bits are concatenated together into finite length bit strings
in $\{0,1\}^n$. Qubits are “concatenated” using the Kronecker (a.k.a. tensor) product,
symbolized by “⊗.”
Definition 1.2 Let $A$ be an $m\times n$ matrix with entries $a_{i,j}$ and let $B$ be any matrix.
The Kronecker product of $A$ and $B$ is the matrix
$$A\otimes B \overset{\text{def}}{=} \begin{pmatrix} a_{1,1}B & a_{1,2}B & \cdots & a_{1,n}B \\ a_{2,1}B & a_{2,2}B & \cdots & a_{2,n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1}B & a_{m,2}B & \cdots & a_{m,n}B \end{pmatrix}.$$
Typically, tensor products are implicitly assumed and often omitted, so that for
$b_1, b_2 \in \{0,1\}$, $|b_1\rangle|b_2\rangle$, $|b_1,b_2\rangle$, and even $|b_1 b_2\rangle$ are all assumed to be $|b_1\rangle\otimes|b_2\rangle$.
Definition 1.3 Let $|\phi_0\rangle, |\phi_1\rangle$ be basis vectors in $\mathcal{H}$. Then $\mathcal{H}^{\otimes n}$ is a $2^n$-dimensional
Hilbert space with basis vectors
$$|\phi_{b_1}\rangle\otimes|\phi_{b_2}\rangle\otimes\cdots\otimes|\phi_{b_n}\rangle,$$
for all $b = b_1 b_2\ldots b_n \in \{0,1\}^n$.
For example, let $|0\rangle, |1\rangle$ be basis vectors in $\mathcal{H}$. We define the space $\mathcal{H}^{\otimes 2}$ to be a
Hilbert space with basis vectors $|0\rangle\otimes|0\rangle$, $|0\rangle\otimes|1\rangle$, $|1\rangle\otimes|0\rangle$, $|1\rangle\otimes|1\rangle$. Thus, the
computational basis of $\mathcal{H}^{\otimes n}$ is simply given by all unit vectors $|b\rangle$ where $b\in\{0,1\}^n$.
This leads us to the full version of the state space postulate of quantum mechanics.
Postulate 1 (State Space (Discrete Version)) An n-qubit quantum system is
completely described by a unit length state vector in $\mathcal{H}^{\otimes n}$.
Tensor products and kets combined generate a descriptive and compact representation
of a quantum state. Let $|\psi\rangle = \alpha|0\rangle+\beta|1\rangle \in \mathcal{H}$ and $|\phi\rangle = \gamma|0\rangle+\delta|1\rangle \in \mathcal{H}$ be
two arbitrary qubits. Then the state of a two qubit quantum system is the vector
$$\begin{pmatrix}\alpha\\\beta\end{pmatrix}\otimes\begin{pmatrix}\gamma\\\delta\end{pmatrix} = \begin{pmatrix}\alpha\gamma\\\alpha\delta\\\beta\gamma\\\beta\delta\end{pmatrix}$$
in $\mathcal{H}^{\otimes 2}$. Using kets, the state above is
$$|\psi\rangle\otimes|\phi\rangle = (\alpha|0\rangle+\beta|1\rangle)\otimes(\gamma|0\rangle+\delta|1\rangle) = \alpha\gamma\,|0\rangle\otimes|0\rangle + \alpha\delta\,|0\rangle\otimes|1\rangle + \beta\gamma\,|1\rangle\otimes|0\rangle + \beta\delta\,|1\rangle\otimes|1\rangle = \alpha\gamma|00\rangle + \alpha\delta|01\rangle + \beta\gamma|10\rangle + \beta\delta|11\rangle.$$
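The Kronecker product of Definition 1.2 is available in NumPy as np.kron; the following sketch (illustrative, not part of the thesis) builds a two-qubit state exactly as above:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# |psi> = alpha|0> + beta|1>,  |phi> = gamma|0> + delta|1>
alpha, beta  = 1/np.sqrt(2), 1/np.sqrt(2)
gamma, delta = 1, 0
psi = alpha * ket0 + beta * ket1
phi = gamma * ket0 + delta * ket1

# np.kron computes the Kronecker product, so this is the 4-dimensional
# vector (alpha*gamma, alpha*delta, beta*gamma, beta*delta).
two_qubits = np.kron(psi, phi)
print(two_qubits)   # [0.707.., 0, 0.707.., 0] = (|00> + |10>)/sqrt(2)
```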
We will generally discuss n-qubit quantum states, such as the arbitrary state
$$|\psi\rangle = \begin{pmatrix}\alpha_0\\\alpha_1\\\vdots\\\alpha_{2^n-1}\end{pmatrix} = \alpha_0\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix} + \alpha_1\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix} + \cdots + \alpha_{2^n-1}\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix} = \sum_{i\in\{0,1\}^n}\alpha_i|i\rangle$$
in $\mathcal{H}^{\otimes n}$, where $\sum_i|\alpha_i|^2 = 1$. Note that we identify $\{0,1\}^n \equiv \{0,1,\ldots,2^n-1\}$ above.
We shall commonly identify $\{0,1\}^n \equiv \{0,1,\ldots,2^n-1\}$ and $\{0,1\}^n \equiv \mathbb{Z}_2^n$ when
appropriate.
Now that we feel comfortable with kets, bras and Hilbert spaces, let us move on
to the second postulate of quantum mechanics.
1.2.2 Quantum Evolution and Circuit Diagrams
Postulate 2 (Quantum Evolution) The evolution of a quantum system is represented
by a unitary transformation. The state $|\psi\rangle$ at time $t$ is related to the state
$|\psi'\rangle$ at time $t'$ by a unitary transformation matrix $U$ given by the equation
$$U|\psi\rangle = |\psi'\rangle, \quad\text{or equally}\quad |\psi\rangle \xrightarrow{U} |\psi'\rangle.$$
The above definition is a slightly altered version of [NC00, page 81]. As a reminder,
a unitary matrix is a square matrix with complex entries whose inverse is
its conjugate transpose. We denote the conjugate transpose of a matrix $M$ by $M^\dagger$.
The unitary evolution of a quantum system is somewhat analogous to logical
evolution in classical computing. For example, consider applying the logical NOT
operation “¬” to a bit: $\neg 0 = 1$, or $\neg 1 = 0$. The $X$ Pauli matrix,
$$X = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},$$
is a quantum operation similar to “¬” since $X|0\rangle = |1\rangle$ and $X|1\rangle = |0\rangle$.
We often represent quantum operations in a conceptually helpful illustration
called a quantum circuit diagram. Figure 1.1 illustrates applying the unitary operation
$X$ to the state $\alpha|0\rangle+\beta|1\rangle$, and mathematically represents:
$$\alpha|0\rangle+\beta|1\rangle = \alpha\begin{pmatrix}1\\0\end{pmatrix}+\beta\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}\alpha\\\beta\end{pmatrix} \xrightarrow{X} \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \begin{pmatrix}\beta\\\alpha\end{pmatrix} = \beta|0\rangle+\alpha|1\rangle.$$
Figure 1.1: A quantum circuit of the Pauli X operation.
Quantum circuits are always read left to right, or top to bottom depending on
the orientation of the diagram. The solid horizontal lines are paths carrying one or
more qubits through the circuitry, where unitary operations are usually represented
as boxes marked by symbols.
We commonly use the following seven unitary operations. The first four are called
the Pauli operations and act on one qubit:
$$I = \begin{pmatrix}1&0\\0&1\end{pmatrix}, \quad X = \begin{pmatrix}0&1\\1&0\end{pmatrix}, \quad Y = \begin{pmatrix}0&-i\\i&0\end{pmatrix}, \quad\text{and}\quad Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
Another single qubit operation is the Hadamard operation
$$H = \frac{1}{\sqrt 2}\begin{pmatrix}1&1\\1&-1\end{pmatrix},$$
that maps
$$|0\rangle \xrightarrow{H} \frac{1}{\sqrt 2}(|0\rangle+|1\rangle) \overset{\text{def}}{=} |+\rangle, \quad\text{and}\quad |1\rangle \xrightarrow{H} \frac{1}{\sqrt 2}(|0\rangle-|1\rangle) \overset{\text{def}}{=} |-\rangle,$$
which are both equal superpositions of $|0\rangle$ and $|1\rangle$, but differing by the $-1$ factor in
front of $|1\rangle$, which is also known as a phase factor.
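These seven operations are easy to experiment with numerically. The sketch below (an illustration assuming Python with NumPy, not code from the thesis) defines the Pauli and Hadamard matrices, checks that each is unitary, and applies H to the basis states:

```python
import numpy as np

# The Pauli operations and the Hadamard operation as 2x2 matrices.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Each is unitary: U @ U^dagger = I.
for U in (I, X, Y, Z, H):
    assert np.allclose(U @ U.conj().T, np.eye(2))

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
print(H @ ket0)   # (|0> + |1>)/sqrt(2) = |+>
print(H @ ket1)   # (|0> - |1>)/sqrt(2) = |->
```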
The most common multiple qubit operation is the controlled NOT, or CNOT,
operation which acts on two qubits: a control and a target. It is a quantum version of
the exclusive-or (XOR) logical operation, symbolized by “⊕.” The CNOT operation
is mathematically defined as the matrix
$$\mathrm{CNOT} = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{pmatrix}.$$
However, it is best described by the quantum circuit in Figure 1.2, where $c,t\in\{0,1\}$,
$|c\rangle$ is the control qubit, and $|t\rangle$ is the target qubit. When the control qubit $|c\rangle$ is
set to $|1\rangle$, we apply $X$ to the target qubit $|t\rangle$. When the control qubit is set to $|0\rangle$,
we apply $I$ to the target qubit. In Figure 1.2, we define the CNOT operation by
describing how it acts on the basis inputs $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. By linearity, defining
CNOT on the basis vectors of $\mathcal{H}^{\otimes 2}$ defines the operation for every vector in $\mathcal{H}^{\otimes 2}$.
Figure 1.2: The controlled NOT operation on two qubits, mapping $|c\rangle|t\rangle$ to $|c\rangle|c\oplus t\rangle$.
Sometimes we wish to invert the control of the CNOT operation so that $|c\rangle|t\rangle \longrightarrow |c\rangle|\neg c\oplus t\rangle$. This is achieved by combining the CNOT and X operations. Namely,
$$|c\rangle|t\rangle \xrightarrow{X\otimes I} |\neg c\rangle|t\rangle \tag{1.1}$$
$$\xrightarrow{\mathrm{CNOT}} |\neg c\rangle|\neg c\oplus t\rangle \tag{1.2}$$
$$\xrightarrow{X\otimes I} |\neg(\neg c)\rangle|\neg c\oplus t\rangle = |c\rangle|\neg c\oplus t\rangle.$$
We illustrate the inverted CNOT in the circuit diagram in Figure 1.3. The left-hand
side shows the exact computations. Each vertical dashed line identifies the state of
the qubits at that point with the corresponding equation number. The right-hand
side of Figure 1.3 represents the equivalent shorthand illustration of the inverted
CNOT operation.
Figure 1.3: A quantum circuit illustrating the inverted CNOT operation.
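The identity behind Figure 1.3 can be checked numerically. This sketch (illustrative, assuming NumPy) builds the inverted CNOT as $(X\otimes I)\,\mathrm{CNOT}\,(X\otimes I)$ and verifies its action on all four basis inputs:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# The inverted CNOT of Figure 1.3: conjugate the control wire by X.
inv_cnot = np.kron(X, I) @ CNOT @ np.kron(X, I)

# Check |c>|t> -> |c>|~c XOR t> on all four basis inputs.
for c in (0, 1):
    for t in (0, 1):
        basis = np.zeros(4, dtype=complex)
        basis[2 * c + t] = 1                 # index of |c,t>
        out = inv_cnot @ basis
        expected = 2 * c + ((1 - c) ^ t)     # index of |c, ~c XOR t>
        assert out[expected] == 1
print("inverted CNOT verified")
```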
The seventh and last quantum operation is the Toffoli gate which is a three
qubit version of the CNOT. It has two control qubits and one target qubit and is
represented by a matrix that is usually denoted by T. In Figure 1.4, we define the
Toffoli gate by how it acts on 3 qubit basis vectors, where c1, c2, t ∈ {0, 1}.
At times, we generalize the CNOT and Toffoli gates to an n-controlled NOT gate.
Figure 1.5 provides an example of one such gate, where c1, . . . , cn, t ∈ {0, 1}.
Figure 1.4: The Toffoli operation on three qubits, mapping $|c_1\rangle|c_2\rangle|t\rangle$ to $|c_1\rangle|c_2\rangle|(c_1\wedge c_2)\oplus t\rangle$.
Figure 1.5: An example of a generalized n-controlled NOT operation, here mapping the target to $|(c_1\wedge\cdots\wedge\neg c_{n-1}\wedge c_n)\oplus t\rangle$.
1.2.3 Measurement
Unlike classical computers, we may not observe qubits at any given time without
potentially harmful side-effects. In general, quantum measurement transforms a
quantum state to a probabilistic state, where the amplitudes become probability
distributions.
The postulate of quantum measurement is very elaborate. However, we can represent
almost every measurement in this thesis in a simple form called the measurement
in the computational basis, in which an arbitrary qubit $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ collapses
to the classical probabilistic state
$$\begin{pmatrix}|\alpha|^2\\|\beta|^2\end{pmatrix},$$
where one observes the bit 0 with probability $|\alpha|^2$ and the bit 1 with probability
$|\beta|^2$. Furthermore, immediately after measurement, the qubit $|\psi\rangle$ becomes the value
observed. Namely, $|\psi\rangle$ becomes either $|0\rangle$ or $|1\rangle$.
In quantum circuit diagrams, computational basis measurements are represented
as a half circle. Figure 1.6 illustrates how measurement in the computational basis
works. As before, the single line represents a qubit path, but the new double line
represents the path of one classical bit.
Figure 1.6: The quantum circuit for measurement in the computational basis: on input $\alpha|0\rangle+\beta|1\rangle$, one observes 0 with probability $|\alpha|^2$ and 1 with probability $|\beta|^2$.
Consider measuring the state $|0\rangle = 1\,|0\rangle + 0\,|1\rangle$ (i.e., $\alpha = 1$, $\beta = 0$) in the
computational basis. Figure 1.6 shows that the resulting outcome will be 0 with
certainty. Similarly, measuring $|1\rangle$ in the computational basis results in the outcome
1 with certainty. We can extend measurements in the computational basis to multiple
qubits. For example, Figure 1.7 describes measuring two qubits of a 2-qubit system.
What about only measuring the first qubit of a 2-qubit system? The result is
best explained in Figure 1.8.
Figure 1.7: Measuring two qubits of a 2-qubit system: given $\alpha|00\rangle+\beta|01\rangle+\gamma|10\rangle+\delta|11\rangle$, one observes 00 with probability $|\alpha|^2$, 01 with probability $|\beta|^2$, 10 with probability $|\gamma|^2$, and 11 with probability $|\delta|^2$.
Figure 1.8: Measuring one qubit of a 2-qubit system: the first qubit yields 0 with probability $|\alpha|^2+|\beta|^2$, leaving the second qubit in the state $\frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$, and yields 1 with probability $|\gamma|^2+|\delta|^2$, leaving $\frac{\gamma|0\rangle+\delta|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}$.
Intuitively, the measured qubit immediately becomes the value observed, and
the unmeasured qubit takes on the renormalized superposition not destroyed by the
measurement. Since measuring an arbitrary state in the computational basis will
collapse the state to $|0\rangle$ or $|1\rangle$, at times, it is convenient to assume that the classical
data path holds a known quantum state $|0\rangle$ or $|1\rangle$. So the output of measuring one
qubit in a 2-qubit system can also be represented as in Figure 1.9.
Figure 1.9: Alternate output representation of measuring one qubit of a 2-qubit system: the output is $|0\rangle \otimes \frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$ with probability $|\alpha|^2+|\beta|^2$, and $|1\rangle \otimes \frac{\gamma|0\rangle+\delta|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}$ with probability $|\gamma|^2+|\delta|^2$.
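The rule of Figures 1.8 and 1.9 is straightforward to simulate. Below is a hedged sketch (illustrative Python/NumPy, not from the thesis) that samples an outcome for the first qubit and returns the renormalized post-measurement state:

```python
import numpy as np

rng = np.random.default_rng(0)

def measure_first_qubit(state):
    """Measure qubit 1 of a 2-qubit state (alpha, beta, gamma, delta)
    in the computational basis, as in Figure 1.8."""
    alpha, beta, gamma, delta = state
    p0 = abs(alpha)**2 + abs(beta)**2        # probability of outcome 0
    if rng.random() < p0:
        post = np.array([alpha, beta, 0, 0]) / np.sqrt(p0)
        return 0, post                        # renormalized survivor
    p1 = abs(gamma)**2 + abs(delta)**2
    post = np.array([0, 0, gamma, delta]) / np.sqrt(p1)
    return 1, post

state = np.array([0.5, 0.5, 0.5, 0.5], dtype=complex)  # uniform superposition
outcome, post = measure_first_qubit(state)
print(outcome, post)   # e.g. 0 and (|00> + |01>)/sqrt(2)
```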
From this basic definition of measurement, we can prove the following proposition.
Proposition 1.4 Computational basis measurements commute with the controls of
CNOT operations.
Proposition 1.4 is best described in Figure 1.10. The lower circuit uses classical
input for the control.
Proof: Let $\alpha|00\rangle+\beta|01\rangle+\gamma|10\rangle+\delta|11\rangle$ be an arbitrary 2-qubit quantum state.
Figure 1.10: An illustration of a measurement in the computational basis commuting with the control of a CNOT operation. The top circuit applies the CNOT and then measures the first qubit; the bottom circuit measures the first qubit and uses the classical outcome to control the CNOT. Both yield $\frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$ with probability $|\alpha|^2+|\beta|^2$, and $\frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}$ with probability $|\gamma|^2+|\delta|^2$.
The top circuit in Figure 1.10 performs the following steps:
$$\alpha|00\rangle+\beta|01\rangle+\gamma|10\rangle+\delta|11\rangle \xrightarrow{\mathrm{CNOT}} \alpha|00\rangle+\beta|01\rangle+\delta|10\rangle+\gamma|11\rangle \tag{1.3}$$
$$\xrightarrow{\text{measure}\,\otimes\, I} \begin{cases} |0\rangle\otimes\frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}} & \text{w.p. } |\alpha|^2+|\beta|^2\\[4pt] |1\rangle\otimes\frac{\delta|0\rangle+\gamma|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}} & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \;=\; \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}} & \text{w.p. } |\alpha|^2+|\beta|^2\\[4pt] \frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}} & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \tag{1.4}$$
The lower circuit in Figure 1.10 performs the following steps:
$$\alpha|00\rangle+\beta|01\rangle+\gamma|10\rangle+\delta|11\rangle \xrightarrow{\text{measure}\,\otimes\, I} \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}} & \text{w.p. } |\alpha|^2+|\beta|^2\\[4pt] \frac{\gamma|10\rangle+\delta|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}} & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \tag{1.5}$$
$$\xrightarrow{\mathrm{CNOT}} \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}} & \text{w.p. } |\alpha|^2+|\beta|^2\\[4pt] \frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}} & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \tag{1.6}$$
Since the distributions in (1.4) and (1.6) are equal, computational basis measurements
commute with the controls of CNOT operations.
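Proposition 1.4 can also be checked numerically. Using the density-matrix description that will be introduced in Section 1.5, the computational basis measurement of the control is the channel $\rho \mapsto \sum_b (|b\rangle\langle b|\otimes I)\,\rho\,(|b\rangle\langle b|\otimes I)$; the following sketch (illustrative, assuming NumPy, not from the thesis) verifies that applying it before or after the CNOT gives the same result on a random state:

```python
import numpy as np

rng = np.random.default_rng(1)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)          # |0><0|
P1 = np.diag([0, 1]).astype(complex)          # |1><1|
I  = np.eye(2, dtype=complex)

def measure_control(rho):
    """Computational basis measurement of the first qubit, as a channel."""
    return sum(np.kron(P, I) @ rho @ np.kron(P, I) for P in (P0, P1))

# A random pure 2-qubit state as a density matrix.
v = rng.normal(size=4) + 1j * rng.normal(size=4)
v /= np.linalg.norm(v)
rho = np.outer(v, v.conj())

lhs = measure_control(CNOT @ rho @ CNOT.conj().T)   # CNOT, then measure
rhs = CNOT @ measure_control(rho) @ CNOT.conj().T   # measure, then CNOT
assert np.allclose(lhs, rhs)
print("measurement commutes with the CNOT control")
```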
Let us continue by defining the most general form of quantum measurement.
Postulate 3 (Quantum Measurement) Quantum measurements are described by
a collection of linear mappings $\{M_0,\ldots,M_m\}$ from $\mathcal{H}^{\otimes n}$ to $\mathcal{H}^{\otimes n}$, where
$$\sum_{x=0}^{m} M_x^\dagger M_x = I. \tag{1.7}$$
The measurement $\{M_0,\ldots,M_m\}$ acts on the state space being measured. The index
$x = 0,\ldots,m$ refers to the measurement outcome that may occur in the experiment.
If the state of the quantum system is $|\psi\rangle$ immediately before measurement, then the
probability that result $x = 0,\ldots,m$ occurs is
$$\langle\psi|M_x^\dagger M_x|\psi\rangle.$$
The state of the quantum system immediately after measurement is defined to be
$$\frac{M_x|\psi\rangle}{\sqrt{\langle\psi|M_x^\dagger M_x|\psi\rangle}}. \tag{1.8}$$
The postulate above is an adaptation of [NC00, pp. 84–85]. Equation (1.7) is called
the completeness equation, which expresses the fact that the probabilities sum to 1.
Namely,
$$\sum_{x=0}^{m} \langle\psi|M_x^\dagger M_x|\psi\rangle = 1.$$
When the measurement operators {M0, . . . , Mm} are all projections, we refer to
the measurement as a projective measurement. Projective measurements are said to
project the qubits to be measured into an outcome space.
The following proposition shows that the measurement in the computational basis
abides by the above postulate.
Proposition 1.5 Measurement in the computational basis is the quantum measurement
$\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$.
Proof: First note that $|0\rangle\langle 0| + |1\rangle\langle 1| = I$, thus satisfying the completeness equation.
Let $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ be an arbitrary quantum state, and set $M_0 = |0\rangle\langle 0|$. The
probability of observing 0 is
$$\langle\psi|M_0^\dagger M_0|\psi\rangle = \langle\psi|(|0\rangle\langle 0|)^\dagger(|0\rangle\langle 0|)|\psi\rangle = \langle\psi|(|0\rangle\langle 0|)|\psi\rangle = (\bar\alpha\langle 0| + \bar\beta\langle 1|)(|0\rangle\langle 0|)(\alpha|0\rangle + \beta|1\rangle) = (\bar\alpha\langle 0|0\rangle + \bar\beta\langle 1|0\rangle)(\alpha\langle 0|0\rangle + \beta\langle 0|1\rangle) = \bar\alpha\alpha = |\alpha|^2.$$
The state of the quantum system immediately after measurement is
$$\frac{M_0|\psi\rangle}{\sqrt{\langle\psi|M_0^\dagger M_0|\psi\rangle}} = \frac{(|0\rangle\langle 0|)(\alpha|0\rangle + \beta|1\rangle)}{|\alpha|} = \frac{\alpha|0\rangle\langle 0|0\rangle + \beta|0\rangle\langle 0|1\rangle}{|\alpha|} = \frac{\alpha}{|\alpha|}|0\rangle.$$
So, when we observe 0, the resulting state is $\frac{\alpha}{|\alpha|}|0\rangle$. The state $\frac{\alpha}{|\alpha|}|0\rangle$ is equivalent to
$|0\rangle$ in the sense that any quantum evolution $U$ mapping $|0\rangle \xrightarrow{U} U|0\rangle$ maps
$\frac{\alpha}{|\alpha|}|0\rangle \xrightarrow{U} \frac{\alpha}{|\alpha|}U|0\rangle$, and measuring $U|0\rangle$ and $\frac{\alpha}{|\alpha|}U|0\rangle$ results in the same outcome with the
same probability. Without loss of generality, we assume our outcome is $|0\rangle$.
The probability of observing a 1 is computed similarly.
1.3 Reversible Computing
Since classical computers abide by the laws of physics, we should be able to describe
a classical computer based on the postulates of quantum mechanics. In this section,
we show that a classical computer can be implemented on a quantum computer.
Thus, a quantum computer is at least as powerful as a classical computer. To do
so, we implement a special kind of classical computer called a reversible computer
whereby no information is lost while performing the algorithm, so we can always
infer the input from the output.
To simulate a classical computer on a quantum computer, we represent the bit
0 as $|0\rangle$ and the bit 1 as $|1\rangle$. For classical systems requiring multiple bits, we shall
represent the bit string $b \in \{0,1\}^n$ as $|b\rangle \in \mathcal{H}^{\otimes n}$. The most common universal set
of logical operations consists of AND, OR, NOT and FANOUT. The Toffoli gate
implements the logical AND. Let $b_1, b_2 \in \{0,1\}$. Then
$$|b_1\rangle|b_2\rangle|0\rangle \xrightarrow{T} |b_1\rangle|b_2\rangle|b_1\wedge b_2\rangle,$$
where the third qubit holds the desired outcome. The Toffoli gate outputs the
original input, thus it is trivial to reverse. Please note that the third qubit is an
extra ancillary qubit used to store the solution. We call ancillary qubits added for
computation ancilla.
Using De Morgan's law, we can implement logical OR with the Toffoli and Pauli
X operations as $X^{\otimes 3}\,T\,(X\otimes X\otimes I)$. Namely,
$$|b_1\rangle|b_2\rangle|0\rangle \xrightarrow{X\otimes X\otimes I} |\neg b_1\rangle|\neg b_2\rangle|0\rangle \xrightarrow{T} |\neg b_1\rangle|\neg b_2\rangle|\neg b_1\wedge\neg b_2\rangle \xrightarrow{X^{\otimes 3}} |\neg(\neg b_1)\rangle|\neg(\neg b_2)\rangle|\neg(\neg b_1\wedge\neg b_2)\rangle = |b_1\rangle|b_2\rangle|b_1\vee b_2\rangle.$$
Again, the third qubit holds the desired outcome. The quantum circuit for the
implementation of OR is shown in Figure 1.11.
Figure 1.11: A quantum circuit implementing the logical OR.
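The OR construction above can be tested directly. The sketch below (illustrative Python/NumPy, not from the thesis) builds the Toffoli matrix, forms $X^{\otimes 3}\,T\,(X\otimes X\otimes I)$, and checks the truth table:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I = np.eye(2, dtype=complex)
T = np.eye(8, dtype=complex)       # Toffoli: flip qubit 3 iff qubits 1,2 are 1
T[[6, 7], [6, 7]] = 0              # swap the |110> and |111> rows
T[6, 7] = T[7, 6] = 1

# OR circuit of Figure 1.11: X on the two inputs, Toffoli, then X on all wires.
OR = np.kron(np.kron(X, X), X) @ T @ np.kron(np.kron(X, X), I)

for b1 in (0, 1):
    for b2 in (0, 1):
        basis = np.zeros(8, dtype=complex)
        basis[4 * b1 + 2 * b2] = 1            # |b1, b2, 0>
        out = OR @ basis
        assert out[4 * b1 + 2 * b2 + (b1 | b2)] == 1
print("|b1>|b2>|0> -> |b1>|b2>|b1 OR b2> verified")
```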
The NOT operation is simply the Pauli X operation as previously described.
FANOUT is implemented by a CNOT operation, because $|b_1\rangle|0\rangle \xrightarrow{\mathrm{CNOT}} |b_1\rangle|b_1\oplus 0\rangle = |b_1 b_1\rangle$. However, cloning any arbitrary quantum state, akin to how FANOUT performs
on $|0\rangle$ and $|1\rangle$, is not possible. This was first proved in [WZ82] and is proved
below.
Theorem 1.6 (The No-Cloning Theorem) There does not exist a quantum operation
that maps $|\psi\rangle$ to $|\psi\rangle|\psi\rangle$ for all states $|\psi\rangle$.
Proof: Suppose we have such a quantum operation $U$. Let $|\psi\rangle$ and $|\phi\rangle$ be any two
pure quantum states. By definition,
$$|\psi\rangle|0\rangle \xrightarrow{U} |\psi\rangle|\psi\rangle, \quad |\phi\rangle|0\rangle \xrightarrow{U} |\phi\rangle|\phi\rangle, \quad\text{and}$$
$$(|\psi\rangle+|\phi\rangle)|0\rangle \xrightarrow{U} (|\psi\rangle+|\phi\rangle)(|\psi\rangle+|\phi\rangle) = |\psi\rangle|\psi\rangle + |\psi\rangle|\phi\rangle + |\phi\rangle|\psi\rangle + |\phi\rangle|\phi\rangle. \tag{1.9}$$
But by linearity,
$$(|\psi\rangle+|\phi\rangle)|0\rangle \xrightarrow{U} |\psi\rangle|\psi\rangle + |\phi\rangle|\phi\rangle. \tag{1.10}$$
But (1.9) and (1.10) differ in general.
Finally, we can observe the state of a classical computer at any time without
any side-effects to our simulation of a classical computer on a quantum computer.
This is because we are only using the computational basis states. We may measure
a computational basis state in the computational basis with certainty and without
altering its state.
Therefore, we can simulate a classical computer on a quantum computer as
claimed.
1.4 Circuit Complexity
In this section, we informally discuss classical and quantum complexity. It is assumed
that the reader has basic knowledge of classical complexity theory.
One form of measuring the complexity of a classical algorithm is circuit complexity
where we assume that a set of circuits (C1, C2, . . .) performs a certain algorithm. To
account for varying input sizes, for n = 1, 2, . . ., the circuit Cn has an n-bit input
string [Pap94, pp. 267-268]. We say that a set of circuits (C1, C2, . . .) is in O(f(n))
when there exist positive constants c and n0 such that the circuit Cn requires at
most c · f(n) logical operations from some set of universal logical operations for all
n ≥ n0 [CLR99, page 26].
As noted previously, the logical operations AND, OR, NOT, and FANOUT form a
universal set of logical operations. Any classical algorithm can be reduced to a series
of operations in this set. In the quantum scenario, the CNOT operation combined
with all 2 × 2 unitary operations form a universal set of quantum operations, such
that any quantum algorithm can be described as a series of these operations [NC00,
pp. 191-193]. The complexity of a quantum circuit is analogous to the classical circuit
complexity, where we count the number of quantum operations from a universal set
of quantum operations rather than a set of universal logical operations to bound the
complexity of a quantum circuit.
1.5 The Density Operator Formalism
Measurements in quantum computing induce probability distributions, so it seems
natural to consider a probabilistic quantum computer. We describe a probabilistic
quantum computer in the density operator formalism. We have taken great steps to
minimize the use of density operators and maximize the use of pure states because
pure states are typically easier to understand. However, there are some occasions in
this thesis where we must use density operators. The following is a brief summary of
the density operator formalism which is enough to understand its use in this thesis.
For more information on the density operator formalism, see [NC00, pp. 98-108].
1.5.1 Mixed States
Previously, we described a quantum system as a certain pure state $|\psi\rangle \in \mathcal{H}^{\otimes n}$.
Now suppose that a quantum system is in one of a number of probable pure states
$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_n\rangle \in \mathcal{H}^{\otimes n}$, where the probability that the quantum system is in the
state $|\psi_x\rangle$ is $p(x)$. We describe such a state as a density operator, or mixed state,
$$\rho = \sum_x p(x)|\psi_x\rangle\langle\psi_x|.$$
For example, recall that (1.6) described a quantum system as being in the state
$\frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$ with probability $|\alpha|^2+|\beta|^2$, and in the state $\frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\delta|^2+|\gamma|^2}}$ with probability
$|\delta|^2+|\gamma|^2$. Using the mixed state formalism, we represent this probabilistic mixture
as
$$\rho = (\alpha|00\rangle+\beta|01\rangle)\big(\bar\alpha\langle 00|+\bar\beta\langle 01|\big) + (\delta|10\rangle+\gamma|11\rangle)\big(\bar\delta\langle 10|+\bar\gamma\langle 11|\big).$$
1.5.2 Quantum Evolution
Previously, we described quantum evolution as a unitary operation $U$ mapping $|\psi\rangle \in \mathcal{H}^{\otimes n}$ to $U|\psi\rangle \in \mathcal{H}^{\otimes n}$. Quantum evolution is still described by a unitary operation $U$.
Namely, if a system was in the state $|\psi_x\rangle$ with probability $p(x)$, then after applying
$U$, the system will be in the state $U|\psi_x\rangle$ with probability $p(x)$. Using the density
operator formalism, we describe quantum evolution as
$$\rho = \sum_x p(x)|\psi_x\rangle\langle\psi_x| \xrightarrow{U} \sum_x p(x)(U|\psi_x\rangle)(\langle\psi_x|U^\dagger) = \sum_x p(x)\,U|\psi_x\rangle\langle\psi_x|U^\dagger = U\Big(\sum_x p(x)|\psi_x\rangle\langle\psi_x|\Big)U^\dagger = U\rho U^\dagger.$$
1.5.3 Measurement
Quantum measurements are still described by a collection $\{M_0,\ldots,M_m\}$ of linear
operations. Recall that in the pure state formalism, given that the initial state was
the pure state $|\psi\rangle$, the probability that we measure $y = 0,\ldots,m$ is
$$\langle\psi|M_y^\dagger M_y|\psi\rangle,$$
and the state immediately after measurement is defined to be
$$\frac{M_y|\psi\rangle}{\sqrt{\langle\psi|M_y^\dagger M_y|\psi\rangle}}.$$
This implies that, using the mixed state formalism, the probability that we measure
$y = 0,1,\ldots,m$ is
$$\sum_x p(x)\langle\psi_x|M_y^\dagger M_y|\psi_x\rangle = \sum_x p(x)\cdot\mathrm{tr}(|\psi_x\rangle\langle\psi_x|M_y^\dagger M_y) = \mathrm{tr}\Big(\sum_x p(x)|\psi_x\rangle\langle\psi_x|M_y^\dagger M_y\Big) = \mathrm{tr}(\rho M_y^\dagger M_y),$$
where $\mathrm{tr}(\cdot)$ is the trace of a matrix. It is also possible to derive that the state after
the measurement is
$$\rho_y = \frac{M_y\,\rho\,M_y^\dagger}{\mathrm{tr}(\rho M_y^\dagger M_y)}. \tag{1.11}$$
We omit this derivation here (see [NC00, pp. 99–100] for the derivation).
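The quantities $\mathrm{tr}(\rho M_y^\dagger M_y)$ and (1.11) are easy to evaluate numerically. Below is a small hedged sketch (illustrative, assuming NumPy; the example mixture is mine, not the thesis's):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

# A mixed state: |0> with probability 1/2 and |+> with probability 1/2.
rho = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(plus, plus.conj())

# Measurement in the computational basis: M_0 = |0><0|, M_1 = |1><1|.
M0 = np.outer(ket0, ket0.conj())
M1 = np.outer(ket1, ket1.conj())

for y, M in enumerate((M0, M1)):
    p = np.trace(rho @ M.conj().T @ M).real      # tr(rho M^dagger M)
    post = M @ rho @ M.conj().T / p              # equation (1.11)
    print(f"outcome {y}: probability {p:.2f}")   # 0.75 and 0.25
```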
1.5.4 Fidelity
At times, it is necessary to see how “close” a mixed state is to some pure state. We
quantify this with the notion of fidelity, defined below.
Definition 1.7 Let $|\psi\rangle$ be a pure state and $\rho$ a mixed state. The fidelity between
$|\psi\rangle$ and $\rho$ is
$$F(|\psi\rangle,\rho) = \sqrt{\langle\psi|\rho|\psi\rangle}.$$
The above definition is a specialized version of fidelity. See [NC00, pp. 409–415]
for more information. If $\rho = |\phi\rangle\langle\phi|$ (i.e., we are certain that the quantum system is
in the pure state $|\phi\rangle$), then
$$F(|\psi\rangle,\rho) = \sqrt{\langle\psi|\rho|\psi\rangle} = \sqrt{\langle\psi|\phi\rangle\langle\phi|\psi\rangle} = |\langle\psi|\phi\rangle|.$$
So the fidelity between two pure states is simply the absolute value of the inner
product.
1.6 Entanglement
Entanglement is an astonishing property that quantum physical systems can acquire
which cannot be represented classically. The Bell states, also known as EPR pairs
after Einstein, Podolsky and Rosen [EPR35], best demonstrate entanglement. They
are the states
$$|\beta_{0,0}\rangle \overset{\text{def}}{=} \frac{|00\rangle+|11\rangle}{\sqrt 2}, \quad |\beta_{0,1}\rangle \overset{\text{def}}{=} \frac{|01\rangle+|10\rangle}{\sqrt 2}, \quad |\beta_{1,0}\rangle \overset{\text{def}}{=} \frac{|00\rangle-|11\rangle}{\sqrt 2}, \quad\text{and}\quad |\beta_{1,1}\rangle \overset{\text{def}}{=} \frac{|01\rangle-|10\rangle}{\sqrt 2}.$$
Note that they cannot be written as a tensor product state of two qubits. For
instance,
$$\frac{|00\rangle+|11\rangle}{\sqrt 2} \neq |\psi\rangle\otimes|\phi\rangle$$
for any $|\psi\rangle, |\phi\rangle \in \mathcal{H}$. This property is called entanglement.
Measuring the Bell states results in a surprising outcome. Consider measuring
$|\beta_{0,0}\rangle$ or $|\beta_{1,0}\rangle$ in the computational basis. Measuring the first qubit of $|\beta_{0,0}\rangle$ or
$|\beta_{1,0}\rangle$ innocuously results in observing 0 with probability $\frac12$ and 1 with probability $\frac12$.
However, when one measures the second qubit at the same time or any time after
the first measurement, the outcome is identical to the first measurement. Measuring
$|\beta_{0,1}\rangle$ or $|\beta_{1,1}\rangle$ results in anti-correlated measurements. We may consider these states
as distributed random number generators, because if two parties share a Bell state
and require an identical and perfectly random bit shared between them, then they
simply measure their half of the Bell state.
Figure 1.12 illustrates a quantum circuit generating the Bell states.
Figure 1.12: A quantum circuit generating the Bell basis states: starting from $|0\rangle|0\rangle$, apply $H$ to the first qubit, a CNOT, and then $Z^z$ and $X^x$ to the second qubit to obtain $|\beta_{z,x}\rangle$.
Let us quickly follow the circuit starting with
$$|00\rangle \xrightarrow{H\otimes I} |+\rangle|0\rangle = \frac{1}{\sqrt 2}(|0\rangle+|1\rangle)|0\rangle = \frac{|00\rangle+|10\rangle}{\sqrt 2} \xrightarrow{\mathrm{CNOT}} \frac{|00\rangle+|11\rangle}{\sqrt 2}.$$
Next, we perform the operation $I\otimes Z^z$. Note that when $z=1$, we perform the
operation $I\otimes Z^1 = I\otimes Z$, which changes the phase when the second qubit is $|1\rangle$, and
when $z=0$ we perform the operation $I\otimes Z^0 = I\otimes I$, which does not change the
state at all. Thus,
$$\frac{|00\rangle+|11\rangle}{\sqrt 2} \xrightarrow{I\otimes Z^z} \frac{|00\rangle+(-1)^z|11\rangle}{\sqrt 2}.$$
Similarly, performing $I\otimes X^x$ results in
$$\frac{|00\rangle+(-1)^z|11\rangle}{\sqrt 2} \xrightarrow{I\otimes X^x} \frac{|0,x\rangle+(-1)^z|1,\neg x\rangle}{\sqrt 2} = |\beta_{z,x}\rangle.$$
Finally, note that the Bell states form an orthonormal basis of $\mathcal{H}^{\otimes 2}$. In Chapter
5, it will be useful to describe any state in $\mathcal{H}^{\otimes 2n}$ as a superposition of $n$ Bell states.
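The walkthrough above translates directly into a few matrix products. The following sketch (illustrative Python/NumPy, not from the thesis) runs the circuit of Figure 1.12 for any choice of $z$ and $x$:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def bell(z, x):
    """Run the circuit of Figure 1.12 on |00> to produce |beta_{z,x}>."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1                                   # |00>
    state = CNOT @ (np.kron(H, I) @ state)         # (|00> + |11>)/sqrt(2)
    Zz = np.linalg.matrix_power(Z, z)              # Z^z on the second qubit
    Xx = np.linalg.matrix_power(X, x)              # X^x on the second qubit
    return np.kron(I, Xx) @ (np.kron(I, Zz) @ state)

print(bell(0, 0))   # (1, 0, 0, 1)/sqrt(2)  = |beta_00>
print(bell(1, 1))   # (0, 1, -1, 0)/sqrt(2) = |beta_11>
```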
1.6.1 The Partial Trace
Sometimes we have some state in $\mathcal{H}^{\otimes n}$ and wish to disregard the last $m$ qubits.
Unfortunately, the last $m$ qubits may be entangled with the first $n-m$ qubits, so
we cannot represent them as a product of two states. We use the partial trace to
represent one or more qubits entangled with a larger space without discussing the
larger space. For instance, say we had the Bell state
$$\frac{|00\rangle+|11\rangle}{\sqrt 2},$$
and wanted to describe the first qubit alone. What would the state of the first qubit
be? Consider the case where we threw the second qubit into the trash, where it
inadvertently got measured in the computational basis. Now, our first qubit is no
longer entangled with the qubit in the trash because it was measured. Also, our
first qubit is identical to the ignored qubit in the trash, again, because the two were
entangled. Thus, without looking at our first qubit, we know it is $|0\rangle$ with probability
$\frac12$ and $|1\rangle$ with probability $\frac12$. Namely, the mixed state
$$\frac12|0\rangle\langle 0| + \frac12|1\rangle\langle 1| = \frac12 I,$$
which is a random qubit called the completely mixed state. On the other hand, say
we had the pure state $|00\rangle$. Since we can represent it as a product of two states (i.e.,
$|00\rangle = |0\rangle\otimes|0\rangle$), if we threw away the second qubit, then the first qubit will keep
the state $|0\rangle$ as long as the second qubit does not jump out of the trash to reunite
with the first qubit.
To adequately represent a quantum subsystem in $\mathcal{H}^{\otimes(n-m)}$ of a larger quantum
system in $\mathcal{H}^{\otimes n}$ that cannot be represented as a product of two states $|\phi\rangle\otimes|\psi\rangle$, where
$|\phi\rangle \in \mathcal{H}^{\otimes(n-m)}$ and $|\psi\rangle \in \mathcal{H}^{\otimes m}$, we must trace out the last $m$ qubits using the partial
trace. The first $n-m$ qubits are described by the state $\mathrm{tr}_{\mathcal{H}^{\otimes m}}\,\rho$ defined below.
Definition 1.8 Let $\rho$ be a mixed state representing an ensemble of pure states in
$\mathcal{H}^{\otimes n}$. Then the partial trace of $\rho$ is
$$\mathrm{tr}_{\mathcal{H}^{\otimes m}}\,\rho = \sum_{i=1}^{2^m} (I\otimes\langle\phi_i|)\,\rho\,(I\otimes|\phi_i\rangle),$$
where $|\phi_1\rangle,\ldots,|\phi_{2^m}\rangle$ is a set of orthonormal basis vectors in $\mathcal{H}^{\otimes m}$.
The partial trace is represented as a “trash can” in quantum circuit diagrams,
such as in Figure 1.13; however, it should not be considered a quantum operation.
Rather, consider tracing out a qubit as safely ignoring it.
Figure 1.13: A quantum circuit representing the partial trace.
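Definition 1.8 also translates into a short computation. The sketch below (illustrative, assuming NumPy; the function name is mine) traces out the last $m$ qubits and reproduces the completely mixed state from half of a Bell pair:

```python
import numpy as np

def partial_trace_last(rho, n, m):
    """Trace out the last m of n qubits from a 2^n x 2^n density matrix,
    following Definition 1.8 with the computational basis of the last m qubits."""
    d_keep, d_out = 2 ** (n - m), 2 ** m
    rho = rho.reshape(d_keep, d_out, d_keep, d_out)
    return np.einsum('ajbj->ab', rho)      # sum over the traced-out index

# Tracing out half of |beta_00> = (|00> + |11>)/sqrt(2) gives I/2.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
print(partial_trace_last(np.outer(bell, bell.conj()), n=2, m=1))
# [[0.5 0. ]
#  [0.  0.5]]
```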
Chapter 2
Information Theory and Cryptography
2.1 Information Theory
Information theory is the mathematical formalism used to quantify the amount of
randomness versus the amount of useful data within one random variable, or shared
amongst many random variables. Both [McE77] and [MS77] are excellent resources
on this topic. In fact, the pedagogy and proofs in this section closely follow [McE77,
pp. 15–26].
Probability theory is the foundation of information theory. As such, we use the
following definitions and theorems from probability theory:
• Let A be a discrete random variable taking on the values a ∈ {0, 1}n
with
probability Pr[A = a]. Let p(a) = Pr[A = a] be the probability mass function
of A. Likewise, let B be a discrete random variable taking on the values
b ∈ {0, 1}n
with respect to the probability mass function p(b) = Pr[B = b],
and let E be a discrete random variable taking on the values e ∈ {0, 1}m
with
respect to the probability mass function p(e) = Pr[E = e].
• For $a\in\{0,1\}^n$ and $b\in\{0,1\}^n$ define
$$p(a,b) = \Pr[A=a \text{ and } B=b], \qquad p(a|b) = \frac{p(a,b)}{p(b)} \text{ when } p(b)\neq 0.$$
So $p(a,b) = p(a|b)\,p(b)$.
• Bayes' Theorem states: if $p(b) > 0$, then
$$p(a|b) = \frac{p(a)\,p(b|a)}{p(b)}.$$
• Let $x_1,\ldots,x_n \in (0,1)$ be such that $\sum_{i=1}^n x_i = 1$, and let $y_1,\ldots,y_n \in \mathbb{R}_{\geq 0}$. Then
Jensen's Inequality for logarithms states:
$$\sum_{i=1}^n x_i\log_2 y_i \leq \log_2\Big(\sum_i x_i y_i\Big),$$
with equality when $y_1 = \cdots = y_n$.
Bayes’ Theorem is proved in many elementary probability theory texts including
[WMS96, pp. 62–63]. Among others, Bollobas [Bol90, pp. 3–4] contains a nice proof
of Jensen’s Inequality.
Our first information theoretic quantity is the binary Shannon entropy, or simply,
the Shannon entropy.
Definition 2.1 The binary Shannon entropy of $A$ is defined as
$$H(A) = \sum_{a\in\{0,1\}^n} p(a)\log_2\frac{1}{p(a)},$$
where we follow the convention that $0\log_2\frac{1}{0} \overset{\text{def}}{=} 0$.
The Shannon entropy $H(A)$ can be thought of as quantifying the number of
random bits contained in the outcome of $A$. For instance, let $p(a) = 1$ for some
chosen $a$; then there is no randomness in $A$, because we will only observe the chosen
$a$, and we can intuitively assume $H(A) = 0$. On the other extreme, say $p(a) = \frac{1}{2^n}$
for all $a$; then each outcome occurs with equal probability and, intuitively, every bit
we observe will be random. These intuitions are proved in the next lemma.
Lemma 2.2 $H(A) \geq 0$, with equality when $p(a) = 1$ for some $a \in \{0,1\}^n$; and
$H(A) \leq n$, with equality when $p(a) = \frac{1}{2^n}$ for all $a \in \{0,1\}^n$.
Proof: Since $p(a) \leq 1$ for all $a$, then for all $p(a) \neq 0$, $p(a)\log_2\frac{1}{p(a)} \geq 0$. Furthermore,
$p(a)\log_2\frac{1}{p(a)} = 0$ if and only if $p(a) = 1$ or $p(a) = 0$. So, $H(A) = 0$ if and only
if $p(a) = 1$ for some $a$.
By Jensen's inequality for logarithms,
$$H(A) = \sum_{a\in\{0,1\}^n} p(a)\log_2\frac{1}{p(a)} \leq \log_2\Big(\sum_{a\in\{0,1\}^n} p(a)\,\frac{1}{p(a)}\Big) = n,$$
with equality when $p(a) = \frac{1}{2^n}$ for all $a \in \{0,1\}^n$.
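Definition 2.1 and Lemma 2.2 are easy to sanity-check numerically. A minimal sketch (illustrative Python, not from the thesis):

```python
import numpy as np

def shannon_entropy(p):
    """Binary Shannon entropy H(A) = sum_a p(a) log2(1/p(a)),
    with the convention 0*log2(1/0) = 0 (Definition 2.1)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

n = 3
print(shannon_entropy([1.0] + [0.0] * (2**n - 1)))   # 0.0: no randomness
print(shannon_entropy([1 / 2**n] * 2**n))            # 3.0 = n: maximal
```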
Another measure of information is called mutual information which quantifies the
amount of information A provides about B. Mutual information is defined below.
Definition 2.3 The mutual information between $A$ and $B$ is
$$I(A;B) = \sum_{a,b} p(a,b)\log_2\frac{p(b|a)}{p(b)}.$$
Two useful points regarding mutual information are: since $p(b,a) = p(b|a)\,p(a)$ for
$p(a), p(b) > 0$, we have $I(A;B) = \sum_{a,b} p(a,b)\log_2\frac{p(a,b)}{p(a)p(b)}$; and by Bayes' Theorem,
$I(A;B) = I(B;A)$.
If we have three random variables, we can derive the amount of mutual information
that $E$ provides about $A$ and $B$ as
$$I(A,B;E) = \sum_{a,b,e} p(a,b,e)\log_2\frac{p(e|a,b)}{p(e)},$$
where $p(e|a,b) = \frac{p(a,b,e)}{p(a,b)} = \frac{\Pr[A=a \text{ and } B=b \text{ and } E=e]}{\Pr[A=a \text{ and } B=b]}$.
2.2 Quantum Information Theory
In this section, we motivate and define a quantum version of Shannon entropy. Let
$X$ be a random variable taking on the value $x \in \{0,1\}^n$ with probability $p(x)$. Let
$\rho_1,\ldots,\rho_n$ be mixed states. Given a mixed state $\rho = \sum_{x=1}^n p(x)\rho_x$, we can ask: “How
much randomness is in $\rho$?” For instance, how certain are we of the outcome of
measuring $\rho$? If $\rho = |0\rangle\langle 0|$, then there is no randomness or uncertainty, because
the outcome is always 0 upon measuring it in the computational basis. However,
if $\rho = \frac{1}{n}\sum_{x=1}^n |x\rangle\langle x|$, then measuring $\rho$ will output $\log_2 n$ random bits. Hence, it
seems appropriate to define an uncertainty quantity for mixed states—the entropy
of $\rho$. We call this entropy the von Neumann entropy, defined below.
Definition 2.4 Let $\rho$ be a mixed state with eigenvalues $\lambda_1,\ldots,\lambda_n$. The von Neumann
entropy of $\rho$ is
$$S(\rho) = \sum_{x=1}^n \lambda_x\log_2\frac{1}{\lambda_x}.$$
The above definition is from [NC00, page 510]. Also, [NC00, Theorem 2.5, pp. 101–102]
proves that $\mathrm{tr}(\rho) = 1$ and that $\rho$ is a positive operator, so $S(\rho) \geq 0$.
The weak Holevo Bound identifies a relationship between quantum and classical
information. It is stated below without proof. See the original work of Holevo
[Hol73]. Alternatively, see [Wat03, Lectures 14–17] or [NC00, pp. 531–534] for more
understandable proofs.
Theorem 2.5 (The Weak Holevo Bound) Suppose Alice prepares a state $\rho_a$, where
$A = a$ with probability $p(a)$, and gives Eve the state $\rho = \sum_a p(a)\rho_a$. Eve performs a
quantum measurement described by the elements $\{M_0,\ldots,M_m\}$ on $\rho$. Eve's measurement
outcome is represented by the random variable $E$ taking on the values $0,\ldots,m$.
Then for any such measurement $\{M_0,\ldots,M_m\}$ done by Eve,
$$I(A;E) \leq S(\rho).$$
2.3 Cryptography
Cryptography is the mathematical study of information security. There are many
objectives to information security. Three very important objectives are quoted below
from [MvOV97, page 3]:
Privacy: “keeping information secret from all but those who are authorized to see
it,”
Entity Authentication: “corroboration of the identity of an entity (e.g. a person,
a computer terminal, etc.),” and
Data Integrity: “ensuring information has not been altered by unauthorized or
unknown means.”
Privacy is the core focus of this thesis. We define a private channel as a means for
two parties to communicate with privacy. Typically, a private channel is realized by
a protocol utilizing an existing insecure public channel: a means of communication
whereby anyone can observe all communications.
Entity authentication is another very important objective in information security
deserving far more discussion than given in this thesis. We do not discuss entity
authentication in this thesis and assume that our three main entities, Alice, Bob,
and Eve are always authenticated when communicating over a channel exchanging
classical information.
Data integrity is discussed to some extent in this thesis. Chapters 3 and 4 are
dedicated to data integrity; however, they are not discussed in a cryptographic sense.
Only in Chapter 6 do we use our data integrity techniques developed in the previous
chapters for cryptographic data integrity.
For more information on the many aspects of cryptography, refer to the texts
[MvOV97] and [Sti95]. Unfortunately, there are many equivalent definitions in
classical¹ and quantum cryptography under different names. We tend to follow the
definitions used in quantum cryptography while noting the equivalent definitions in
classical cryptography.
2.3.1 Creating a Private Channel
The process of creating a private channel is best described as a game among three
entities we name Alice, Bob, and Eve. Alice’s and Bob’s goal is for Alice to communi-
cate a message to Bob in private, whereby Eve “the adversary” gains no knowledge of
the private message. Eve’s goal is to acquire as much information about the message
as possible.
To play the game, Alice and Bob use a symmetric cryptosystem, equivalently
known as a cipher, in conjunction with a key distribution protocol, also known as a
key establishment protocol.
Definition 2.6 A symmetric cryptosystem is a five-tuple (K, M, C, E, D), consist-
ing of five nonempty finite sets: a key space K, a message space M, a ciphertext
space C, an encryption function space E = {ek : M → C|k ∈ K}, and a decryption
function space D = {dk : C → M|k ∈ K}. For each k ∈ K, there is an encryp-
tion/decryption function pair (ek, dk) satisfying the property that for all messages
m ∈ M, dk(ek(m)) = m. The elements of K are called secret keys, the elements of
M messages and the elements of C ciphertexts [Sti95, adapted from Definition 1.1].
Consider applying a symmetric cryptosystem to the game of creating a private
channel. If we allow Alice and Bob to communicate in private before playing the
game, then Alice and Bob may exchange a secret k ∈ K in private prior to starting.
Upon starting the game, Alice chooses a message m, encrypts the message to the
¹ “Classical cryptography” is short for cryptography based on classical physics. However, some literature defines classical cryptography as cryptography used prior to the end of World War II.
ciphertext c = ek(m), and sends the ciphertext c to Bob. By Definition 2.6, Bob
may decrypt c to the original message m by applying dk(c) = dk(ek(m)) = m.
The success and failure of the game rests on Eve's shoulders. Since k was exchanged
in private before the game began, Eve only observes c. Eve wins the game,
or equivalently breaks the cipher, if she extracts a significant amount (to be specified
shortly) of m from c. Otherwise, the cipher is said to be secure. To make the key as
difficult as possible to guess, Alice and Bob attempt to choose the secret key k ∈ K
uniformly at random. We will investigate the probability distributions of the key
space, message space, and ciphertext space shortly.
There are many different forms of security. For instance, if Eve cannot break
the cipher before some predetermined amount of time, then we say the cipher is
computationally secure. Adding such constraints only makes the definition of security
weaker, but in this thesis, we define and use only the strongest definitions of security.
When Alice and Bob do not have the luxury of secretly distributing keys a priori,
they must somehow generate keys which are identical, secret and random in the
presence of Eve. To do so, Alice and Bob use a key distribution protocol which is a
multi-party process whereby a shared secret key becomes available to two parties, for
subsequent use in a symmetric cryptosystem [MvOV97, page 490]. Key distribution
protocols will be discussed shortly, but for now, let us continue to focus our attention
on symmetric cryptosystems.
2.3.2 The One-Time Pad
The Vernam one-time pad cipher, or simply the one-time pad, ensures that Alice
and Bob win the private channel game. It is an unconditionally secure symmetric
cryptosystem in which Eve receives absolutely no information about the message
from the ciphertext, thus allowing Alice and Bob to communicate in absolute privacy.
In this section, we define the one-time pad, define unconditional security, and prove
that the one-time pad is unconditionally secure.
The one-time pad was first published by Gilbert Vernam in [Ver26] and is described
in Protocol 2.1.
Protocol 2.1 The Vernam One-Time Pad Cipher
Given: Let K = M = C = {0, 1}^n. Assume Alice and Bob privately exchanged a
secret key k ∈ K prior to starting the protocol.
1: Alice encrypts an n-bit message m as c = ek(m) = m ⊕ k, and sends c to Bob
over an insecure channel.
2: Bob receives c and decrypts with dk(c) = c ⊕ k to regain Alice's message m.
The one-time pad correctly encrypts and decrypts any message because for any
k ∈ K, m ∈ M, and c ∈ C,

dk(ek(m)) = (m ⊕ k) ⊕ k = m ⊕ (k ⊕ k) = m ⊕ 0 = m.
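Since encryption and decryption are both just bitwise XOR, the entire cipher fits
in a few lines. The following is a minimal Python sketch of Protocol 2.1 (our
illustration, with bit strings represented as lists of integers):

    import secrets

    def keygen(n):
        # Choose k uniformly at random from K = {0,1}^n.
        return [secrets.randbelow(2) for _ in range(n)]

    def xor(u, v):
        # Bitwise XOR of two equal-length bit lists.
        return [a ^ b for a, b in zip(u, v)]

    m = [0, 1, 0, 1, 0, 1, 0, 1]   # Alice's message
    k = keygen(len(m))             # the pre-shared secret key
    c = xor(m, k)                  # encryption: c = e_k(m) = m XOR k
    assert xor(c, k) == m          # decryption: d_k(c) = c XOR k = m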
Previously, we considered K, C, and M as sets. At times, it is appropriate to
assign probability distributions to these sets. Consider K, C and M as both sets
and random variables. Let K be a random variable taking on the key value k ∈ K
with probability Pr[K = k]. If the key is chosen uniformly at random in order to
make it as difficult as possible for Eve to guess, then Pr[K = k] = 1/2^n. Let M be
a random variable taking on the message value m ∈ M. The a priori probability
that the message m occurs is Pr[M = m]. The two random variables K and M
induce the probability distribution of the ciphertext. Let C be a random variable
taking on the ciphertext c ∈ C with probability Pr[C = c]. We find Pr[C = c]
based on the probability distributions of K and M by fixing a k ∈ K and letting
Ck = {ek(m) : m ∈ M} be the set of all possible ciphertexts given that the key was
k. Then for every c ∈ C,

Pr[C = c] = Σ_{k ∈ K : c ∈ Ck} Pr[K = k] · Pr[M = dk(c)].

It is also useful to know the probability that the ciphertext c ∈ C was obtained when
the message was m ∈ M. This probability is

Pr[C = c|M = m] = Σ_{k ∈ K : m = dk(c)} Pr[K = k].
Using this notation, we can now define unconditional security.
Definition 2.7 Let (K, M, C, E, D) be a symmetric cryptosystem. Let m ∈ M be a
message chosen by Alice and c = ek(m). Then (K, M, C, E, D) is unconditionally
secure if Pr[M = m|C = c] = Pr[M = m], for all m ∈ M and c ∈ C.
In plain English, a cipher is unconditionally secure if Eve’s chance of guessing
the message m is the same with or without knowing the ciphertext c. Therefore, the
only data Eve sees, the ciphertext, has absolutely no value! Unconditional security
is also known as perfect secrecy, and was defined by Claude Shannon in [Sha49]. In
the same paper, Shannon proved that the one-time pad is unconditionally secure.
Theorem 2.8 If each key is chosen with equal likelihood, then the one-time pad is
unconditionally secure.
Proof: This proof is based on the proof of a similar cryptosystem in [Sti95, pp. 48–49].
The theorem assumes Alice and Bob chose the key k ∈ K uniformly at random.
Namely, Pr[K = k] = 1/2^n for all k ∈ K.
Our next step is to show Pr[C = c] = Pr[C = c|M = m] for all m ∈ M and
c ∈ C. Let c ∈ C and m ∈ M. First consider Pr[C = c]. Since

Ck = {ek(m) : m ∈ {0, 1}^n} = {k ⊕ m : m ∈ {0, 1}^n} = k ⊕ {0, 1}^n = {0, 1}^n,

we can express Pr[C = c] as

Pr[C = c] = Σ_{k ∈ {0,1}^n} Pr[K = k] · Pr[M = dk(c)]
          = Σ_{k ∈ {0,1}^n} (1/2^n) · Pr[M = c ⊕ k]
          = (1/2^n) Σ_{m ∈ {0,1}^n} Pr[M = m]
          = 1/2^n.

The third equality is based on the fact that for any fixed c, the map k → c ⊕ k is a
permutation on the set {0, 1}^n. Thus Σ_{k ∈ {0,1}^n} Pr[M = c ⊕ k] is just
Σ_{m ∈ {0,1}^n} Pr[M = m] with the summands permuted. The last equality holds
because the probabilities of any probability distribution sum to 1.
Next, consider Pr[C = c|M = m], which is

Pr[C = c|M = m] = Σ_{k ∈ K : m = dk(c)} Pr[K = k] = Pr[K = c ⊕ m] = 1/2^n.

The second equality holds because exactly one key, namely k = c ⊕ m, decrypts the
fixed ciphertext c to the fixed message m.
Finally, by Bayes' Theorem,

Pr[M = m|C = c] = Pr[M = m] · Pr[C = c|M = m] / Pr[C = c]
                = Pr[M = m] · (1/2^n) / (1/2^n)
                = Pr[M = m].

Therefore, by Definition 2.7, the one-time pad is unconditionally secure.
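For small n, Definition 2.7 can even be checked exhaustively. The sketch below is
our illustration: it picks an arbitrary non-uniform message distribution for n = 2,
tabulates the joint distribution of (M, C) by brute force, and confirms that
Pr[M = m|C = c] = Pr[M = m] holds exactly:

    from itertools import product
    from fractions import Fraction

    n = 2
    space = list(product([0, 1], repeat=n))
    pK = {k: Fraction(1, 2**n) for k in space}                   # uniform keys
    pM = {m: Fraction(i + 1, 10) for i, m in enumerate(space)}   # any distribution summing to 1

    def xor(u, v):
        return tuple(a ^ b for a, b in zip(u, v))

    # Joint distribution of (M, C) where C = M xor K.
    pMC = {}
    for m in space:
        for k in space:
            c = xor(m, k)
            pMC[(m, c)] = pMC.get((m, c), 0) + pM[m] * pK[k]

    pC = {c: sum(pMC.get((m, c), 0) for m in space) for c in space}
    for m in space:
        for c in space:
            assert pMC[(m, c)] / pC[c] == pM[m]   # Pr[M=m|C=c] = Pr[M=m]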
2.3.3 The Diffie–Hellman Key Distribution Protocol
If we do not allow Alice and Bob to communicate in private before playing the
game, then Alice and Bob must use a key distribution protocol to generate a secret
key k ∈ K, and then continue using a cipher as before. If Alice and Bob use the key
for the one-time pad, then all of Eve's attacks are based on her knowledge gained
from the key distribution protocol. In this case, the security of the key distribution
protocol is the weakest link in creating a private channel.
The Diffie–Hellman key distribution protocol [DH76] was the first key distribution
protocol that did not require a trusted third party. It is described in Protocol 2.2.
Protocol 2.2 The Diffie–Hellman key distribution protocol
Given: Alice and Bob publicly select a prime p and a generator g of Z∗_p.
1: Alice randomly chooses a secret 1 ≤ a ≤ p − 2, and sends a′ ≡ g^a (mod p) to
Bob.
2: Bob randomly chooses a secret 1 ≤ b ≤ p − 2, and sends b′ ≡ g^b (mod p) to
Alice.
3: Upon receiving a′, Bob computes the key k ≡ (a′)^b (mod p), 0 ≤ k < p.
4: Upon receiving b′, Alice computes the key k ≡ (b′)^a (mod p), 0 ≤ k < p.

The Diffie–Hellman key distribution protocol correctly generates identical keys
because

(a′)^b ≡ (g^a)^b ≡ (g^b)^a ≡ (b′)^a (mod p).

Let us briefly analyze the security of the Diffie–Hellman key distribution protocol.
During the communication, Eve can acquire g, p, a′ and b′. One possible attack is
for Eve to find a ≡ log_g a′ (mod p) and calculate k ≡ (b′)^a (mod p). Equivalently,
Eve may find b ≡ log_g b′ (mod p) and calculate k ≡ (a′)^b (mod p).
The problem of finding a from a′ is called the discrete logarithm problem. With
current technology, the discrete logarithm problem seems quite difficult to solve. The
fastest known classical discrete logarithm solver is the number field sieve, requiring
a staggering 2^{O(n^{1/3} log^{2/3} n)} operations, where n = log2 p (see [Gor93]
and [Sch00]). Using current technology, it is possible to compute the discrete
logarithm of numbers with a 399-bit modulus [JL02]. Besides a brute-force
search, solving the discrete logarithm problem is the only known classical attack on
Diffie–Hellman.
Extracting discrete logarithms represents one of a number of practical problems
where quantum computers excel. In 1994, Shor created a quantum algorithm to
solve the discrete logarithm problem in probabilistic polynomial time, specifically in
O(n^2 log n log log n) quantum operations [Sho94].
Upon the physical realization of a general purpose quantum computer, most key
distribution protocols, including Diffie–Hellman, will be rendered useless. Moreover,
since [Gor93] is merely the best currently known algorithm, it is possible that someone
may find a better classical technique to break Diffie–Hellman even sooner. Therefore,
it is necessary to design key distribution protocols that are provably secure under
strict security definitions. A few such key distribution protocols exist, one of which
is the BB84 quantum key distribution (QKD) protocol.
2.3.4 The BB84 Quantum Key Distribution Protocol
The BB84 QKD was developed by Charles Bennett and Gilles Brassard in 1984;
the acronym is simply based on its citation [BB84]. The original BB84 protocol
is described in Protocol 2.3 with a slight modification: the original protocol randomly
selected test bits, rather than randomly permuting the bits and selecting the first
half as test bits. We provide an example of BB84 where no errors occur in Figure 2.1.
Protocol 2.3 The Original BB84 QKD
1: Alice chooses two random n-bit strings r, b, creates the state

|ψ⟩ = (H^{b1} ⊗ · · · ⊗ H^{bn})|r⟩ = H^(A)|r⟩,

and sends |ψ⟩ to Bob.
2: Bob chooses a random n-bit string b′, receives a noisy version of |ψ⟩, applies
H^(B) = (H^{b′1} ⊗ · · · ⊗ H^{b′n}) to |ψ⟩, and measures the qubits in the
computational basis to form the bit string r′.
3: Alice and Bob publicly disclose b and b′. If bi ≠ b′i, then they discard the ith bit
of r and r′.
4: Alice and Bob publicly decide on a random permutation P and apply P to their
respective remaining bits. The first half of Alice's permuted bits are her test bits
t, and the last half are her key bits k. The first half of Bob's permuted bits are his
test bits t′, and the last half are his key bits k′.
5: Alice and Bob publicly disclose t and t′. If t = t′, then they assume k = k′ and
that it is a secure key. Otherwise, they abort.
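Steps 1–3 are easy to simulate classically. The sketch below is our illustration of
basis sifting over a noiseless channel with no eavesdropper; a measurement in a
mismatched basis (e.g. measuring |±⟩ in the computational basis) is modelled as a
uniformly random bit:

    import secrets

    n = 16
    r = [secrets.randbelow(2) for _ in range(n)]    # Alice's raw bits
    b = [secrets.randbelow(2) for _ in range(n)]    # Alice's bases
    b2 = [secrets.randbelow(2) for _ in range(n)]   # Bob's bases b'

    # Noiseless channel: if the bases match, Bob observes Alice's bit exactly;
    # otherwise he observes a uniformly random bit.
    r2 = [r[i] if b[i] == b2[i] else secrets.randbelow(2) for i in range(n)]

    # Step 3: keep only the positions where the bases agree.
    sifted_alice = [r[i] for i in range(n) if b[i] == b2[i]]
    sifted_bob = [r2[i] for i in range(n) if b[i] == b2[i]]
    assert sifted_alice == sifted_bob   # always holds without noise or Eve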
For over 10 years, BB84 was presumed to be secure because of the No-Cloning
Theorem (Theorem 1.6).
Description, Classical Data, and Quantum Data:
1: Alice randomly chooses r = [01010101] and b = [00110011], calculates
(H^{b1} ⊗ · · · ⊗ H^{bn})|r⟩ resulting in |ψ⟩ = |01+−01+−⟩, and sends |ψ⟩ to Bob.
2: Bob receives |ψ⟩, chooses b′ = [01111101], applies H^{b′1} ⊗ · · · ⊗ H^{b′n} to |ψ⟩
to get |ψ′⟩ = |0−01+−+1⟩, and measures |ψ′⟩ to get r′ = [00011101].
3: Alice and Bob exchange b and b′, keeping only the bits where bi = b′i. Alice now
has the bit string r = [0011], and Bob now has the bit string r′ = [0011].
4: Alice and Bob decide on a random permutation matrix

P =
[ 1 0 0 0 ]
[ 0 0 0 1 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ].

Alice applies P to r, giving rP = [tk] = [0110], and Bob applies P to r′, giving
r′P = [t′k′] = [0110].
5: Alice and Bob disclose t and t′. Since t = t′, they assume k = k′ and that k is secret.
Measuring |±⟩ in the computational basis results in observing a random bit. The italicized
bits indicate instances of a random observation.
Figure 2.1: An example of the original BB84 QKD where no errors occurred.
Since Eve cannot perfectly copy the qubits sent to Bob, there is little chance she
shares the same result as Bob. Furthermore, measurement implies disturbance, so if
Eve is too aggressive, then Bob's test bits will differ from Alice's test bits, forcing
Alice and Bob to abort. We did not comment on the one-time pad or Diffie–Hellman
aborting, but these protocols are susceptible to such an attack as well: Eve could
simply block or manipulate transmitted messages, forcing Alice and Bob to abort.
This is commonly known as a denial of service (DoS) attack. A DoS attack is not a
complete attack because it does not allow Alice to transmit a message to Bob; hence
Eve has nothing to eavesdrop on.
The original BB84 was also impractical. If any errors occurred, whether from an
adversary or from natural noise, the protocol would abort. Since noise is common
in nature, the original BB84 will almost always abort in practice. This led to a
movement to make BB84 more practical by adding classical post-processing, such
as additional classical computation and/or classical communication. Adding classical
post-processing to BB84 has the benefit of increasing its security without increasing
the difficulty of implementing it. However, such additions made BB84 more complex
and seemingly more difficult to prove secure.
In 1994, Mayers posted a preprint, later published as [May01], which contained
the first proof that the original BB84 and a practical BB84 are secure. Unfortunately,
[May01] was understood by few, and many still questioned the security of
BB84. In 2000, Shor and Preskill found a "simple" proof of BB84's security [SP00]
based on Mayers' unique use of error correcting codes. This thesis proves the security
of a practical BB84 in the Shor–Preskill style introduced in [SP00], and provides
all necessary background material to understand the Shor–Preskill proof.
2.3.5 On the Security of Key Distribution Protocols
In this section, we define a strict security model for key distribution protocols and
discuss its application to Diffie–Hellman. In the context of this thesis, we must define
two types of attacks: a passive attack and an active attack. Eve performs a passive
attack when she does not alter or interrupt communication between Alice and Bob.
Otherwise, Eve’s attack is an active attack.
Definition 2.9 Let 0 ≤ ε < 1, let 0 ≤ δ < 1/2, and let KDP be a key distribution
protocol performed between two parties, Alice and Bob, which either aborts or
produces output to both Alice and Bob. Let A, B and E be random variables taking on
some k-bit value: A and B represent Alice's and Bob's respective outcomes upon
performing KDP, and E represents Eve's outcome upon performing any eavesdropping
strategy on KDP. Then KDP is (k, ε, δ)-conditionally secure if:
1. Correctness and Privacy: If Eve performs a passive attack, then with probability
1 − δ, Alice and Bob complete the protocol, and there exists a perfectly uniform
k-bit string represented by the random variable C (i.e. H(C) = k) such that

Pr[A = B = C] ≥ 1 − ε, (2.1)

and

I(C; E) ≤ ε. (2.2)

2. Robustness: If Eve performs an active attack, then with probability 1 − δ, Alice
and Bob either abort or complete the protocol, satisfying both (2.1) and (2.2).
In information theoretic cryptography, a (k, ε, δ)-conditionally secure protocol is
also known as a robust (P_ABE, k, ε, δ)-protocol [MW03, Definition 5], where P_ABE is
the joint probability distribution of A, B, and E. In fact, the above definition is
adapted from [MW03, Definition 5].
In plain English, Definition 2.9 states that a key distribution protocol is (k, ε, δ)-
conditionally secure if, in the event that Alice and Bob complete the protocol without
aborting, then with high probability they have identical and random keys of which
Eve has little knowledge. The value ε bounds both the probability that Alice and
Bob fail to create a uniformly random shared key and the amount of information Eve
obtains. The value δ bounds the probability that the protocol fails to perform
correctly under attack. Thus, the smaller ε and δ get, the more secure and robust
the key distribution protocol is.
Conditional security shares some similarity with unconditional security. A (k, 0, δ)-
conditionally secure key distribution protocol is said to be unconditionally secure,
since

I(C; E) = Σ_{c,e} p(c, e) log2 (p(c|e) / p(c)) = 0

implies p(c|e)/p(c) = 1, or equivalently, p(c|e) = p(c) for all c, e where p(c) > 0. If one
interprets the outcome of a key distribution protocol as a "message" and all public
communications as "ciphertext" (i.e. c is the message and e is the ciphertext), then
a (k, 0, δ)-conditionally secure key distribution protocol is unconditionally secure as
per Definition 2.7.
On the other extreme, when no 0 ≤ ε < 1 exists for which a key distribution protocol
is (k, ε, δ)-conditionally secure, we say it is information theoretically insecure.
Proposition 2.10 The Diffie–Hellman key distribution protocol is information
theoretically insecure.

This proposition seems intuitive because we do not impose a time limit or physical
limit on Eve. We simply assume that Eve must follow the laws of physics. This
allows her to either take the time to solve the required discrete logarithm on a
classical computer, or quickly solve the discrete logarithm on a quantum computer,
and subsequently find the key. Hence C = E, so

I(C; E) = Σ_{c,e} p(c, e) log2 (p(c|e) / p(c))
        = Σ_c p(c) log2 (1 / p(c))
        = H(C)
        = k ≥ 1.

The second equality holds because C = E, so p(c, e) = p(c, c) = p(c) and p(c|e) =
p(c|c) = 1. See [MW03, Corollary 6] for a formal proof.
Chapter 6 is dedicated to proving that a practical version of BB84 is conditionally
secure. Proving so requires a base knowledge of the theory of classical and quantum
error correcting codes. We discuss classical error correcting codes in the next chapter.
Chapter 3
Coding Theory
Suppose we wish to send digital information as electrical pulses over a wire. Due to
environmental disturbances and physical imperfections in the wire, the transmitter,
and the receiver, some 0's may be damaged and mistaken as 1's and vice versa. We
call this scenario a noisy channel, or more formally, a (memoryless) binary symmetric
channel, and characterize it by the probability p that a 0 erroneously flips to
1, or a 1 erroneously flips to 0. A binary symmetric channel with error probability
p is often denoted as BSC(p). We assume p < 1/2 because a BSC(1/2) is simply a
random number generator and useless for communication. Also, if two parties share
a BSC(p), where p > 1/2, then the sender may invert the input or the receiver may
invert the output to simulate a BSC(1 − p).
[Diagram: each input bit, 0 or 1, is delivered intact with probability 1 − p and
flipped with probability p.]
Figure 3.1: A visual representation of a binary symmetric channel with error
probability p.
To send a digital message reliably over a binary symmetric channel, the sender
encodes a message with a code into a codeword and then sends the codeword over the
binary symmetric channel to the receiver. The receiver decodes the codeword back
to the original message.
In this chapter, we introduce the concept of coding theory, beginning with
repetition codes, and then rigorously define binary linear codes. This chapter is based on
three excellent books on coding theory, namely [MS77], [McE77], and [Ber74]. The
text [Ber74] is a compilation of many republished fundamental papers on coding
theory.
3.1 Repetition Codes
To understand the general process of coding, we shall first informally introduce
repetition codes by example. A repetition code of length n encodes a single bit into
n copies. For example, a repetition code of length 3 has the following encoding:

0 → 000, and
1 → 111.

To decode, the receiver merely takes a majority vote: more 0's implies the original
message was most likely 0, and more 1's implies the original message was most
likely 1. If zero or one bit flips occur in the codeword, then majority decoding works
correctly. If two or three bit flips occur, then majority decoding will err. In
general, a repetition code of length n protects against up to ⌊(n − 1)/2⌋ bit flip errors
in the codewords. The possible correctable errors for the repetition code of length 3
can be represented as the following four error strings:

e0 = 000 (i.e. no error),
e1 = 100,
e2 = 010, and
e3 = 001.
We represent an erroneous codeword by XORing an error string with the original
codeword. For example, the error e2 has the effect

000 → 000 ⊕ 010 = 010, (3.1)

and

111 → 111 ⊕ 010 = 101. (3.2)

Upon receiving the erroneous codeword, the receiver decodes by taking the majority
vote. In this case, (3.1) would correctly decode to 0, and (3.2) would correctly
decode to 1.
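The following Python sketch is our illustration of this encode/decode cycle for an
odd-length repetition code, with an error applied as XOR exactly as in (3.1):

    def encode(bit, n):
        # Repetition code of length n: one message bit -> n copies.
        return [bit] * n

    def decode(word):
        # Majority vote: correct whenever at most (n-1)//2 bits flipped.
        return 1 if sum(word) > len(word) // 2 else 0

    codeword = encode(0, 3)                                  # [0, 0, 0]
    received = [c ^ e for c, e in zip(codeword, [0, 1, 0])]  # apply e2 = 010
    assert decode(received) == 0                             # one flip is corrected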
3.1.1 Performance of Repetition Codes
So far, we have discussed how to make repetition codes of length n that correct
t = ⌊(n − 1)/2⌋ errors. How will these codes perform over a binary symmetric channel
with bit flip error probability p?
Consider a binary symmetric channel with bit flip error probability 1/10. Sending
one message bit unencoded would be reliably received 90% of the time. If we were to
encode the message bit with a repetition code of length 3, then the code could correct
up to one bit flip error. We will now calculate the probability that a correctable error
occurs. Let Z^n_2 = {[x1 x2 . . . xn] | xi ∈ Z2}, and let E be a random variable
that takes on error vectors1 in Z^3_2. The probability that no errors occur
(i.e. E = [000]) is

Pr[E = [000]] = (1 − 1/10)^3 = 729/1000.

1 When applicable, we assume that binary strings of length n are equivalent to n-dimensional
binary vectors. Namely, {0, 1}^n ≡ Z^n_2.
The probability of one bit flip occurring is Pr[E = [100]] + Pr[E = [010]] + Pr[E =
[001]]. Since Pr[E = [100]] = Pr[E = [010]] = Pr[E = [001]],

Pr[E = [100]] + Pr[E = [010]] + Pr[E = [001]] = 3 · Pr[E = [100]]
                                              = 3 · (1/10) · (1 − 1/10)^2
                                              = 243/1000.

Thus, the probability that the message can be decoded correctly is 729/1000 +
243/1000 = 972/1000 ≈ 97%, which is about 7% better than sending a single bit
unencoded.
What if a 97% chance of success is not good enough? Let us consider the general
case. Let C be a repetition code of length n, and let BSC(p) be a binary symmetric
channel with error probability p. We can properly decode using "majority vote" if
at most t = ⌊(n − 1)/2⌋ errors occur to the codeword. Recall that Pr[E = [001]] =
Pr[E = [010]] = Pr[E = [100]]. This is because they all represent one error, or
equivalently, they all represent error vectors of weight 1. The general concept of
weight is defined below.

Definition 3.1 Let x = [x1 x2 . . . xn] ∈ Z^n_2 be an n-bit binary vector. The
(Hamming) weight wt(x) is the number of 1's in the vector, or equivalently,

wt(x) = Σ_{i=1}^n xi.

The number of error vectors of weight b is (n choose b) = n!/((n − b)! b!), the
binomial coefficient.
The probability that a given error vector e ∈ Z^n_2 occurs over BSC(p) is easily
verified to be

p^{wt(e)} (1 − p)^{n − wt(e)}. (3.3)
Thus, C will reliably send a single bit message over BSC(p) with probability

Σ_{i=0}^t (n choose i) p^i (1 − p)^{n−i}. (3.4)

So, given some BSC(p), we can choose a repetition code of length n with n large
enough so that the reliability is to our liking.
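Equation (3.4) is straightforward to evaluate numerically when choosing n. The
sketch below is our illustration; it reproduces the 97.2% figure from the length-3
example above:

    from math import comb

    def success_probability(n, p):
        # Probability that at most t = (n-1)//2 bit flips occur on BSC(p),
        # i.e. that majority decoding recovers the message (equation (3.4)).
        t = (n - 1) // 2
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

    print(success_probability(3, 0.1))   # 0.972, matching the example above
    for n in (3, 5, 7, 9):
        print(n, success_probability(n, 0.1))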
In §3.3, we show that repetition codes are poor in the sense that, as we demand
higher success rates, the codeword length grows disproportionately to the message
length. However, there exist binary linear codes with reasonable message length to
codeword length ratios and high success rates.
3.2 Binary Linear Codes
We can mathematically describe repetition codes using linear algebra. Consider the
repetition code of length 3, and let the messages, 0 and 1, be the respective vectors [0]
and [1]. The messages form the vector space Z^1_2, called the message space. The
codewords, [000] and [111], form a subspace of Z^3_2 called the code space, or simply,
the code. We will later refer to the code {[000], [111]} as C3. The difference between
the codewords in C3 is quite large since the two codewords differ in every bit position.
We quantify the difference by the distance defined below.

Definition 3.2 Let x = [x1 . . . xn], y = [y1 . . . yn] ∈ Z^n_2 be two n-bit binary vectors.
The (Hamming) distance dist(x, y) between x and y is the number of bits where the
two binary vectors differ. Namely,

dist(x, y) = Σ_{i=1}^n xi ⊕ yi.
For example, dist([000], [111]) = 3. Note the connection between distance and
weight: dist(x, y) = wt(x ⊕ y).

Definition 3.3 Let V be a set of n-bit binary vectors. We define the minimum
distance d of V as

d = min_{u,v ∈ V : u ≠ v} dist(u, v).
For example, the minimum distance of C3 is 3. Now, we can formally define binary
linear codes.

Definition 3.4 An [n, k, d] (binary) linear code C is a k-dimensional subspace of
the n-dimensional binary vector space Z^n_2. The value n is the code length, k the code
dimension, and d the minimum distance of all codewords in C. The rate of C is k/n.

The message space of an [n, k, d] binary linear code is Z^k_2. Note that the code
can be any k-dimensional subspace of Z^n_2, so there exist many [n, k, d] binary linear
codes for fixed parameters n, k, and d.
The repetition code of length 3 is a [3, 1, 3] binary linear code. In general,
repetition codes of length n are [n, 1, n] binary linear codes.
3.2.1 Encoding
Encoding is a linear function defined by a generator matrix.

Definition 3.5 Let C be an [n, k, d] linear code. A generator matrix G for C is a
k × n matrix with row space equal to C.

Note that we can define a code by its generator matrix alone. If G is a matrix
with entries in Z2, its row space is the code generated by G.
Let C be an [n, k, d] binary linear code with generator matrix G. To encode a
message m ∈ Z^k_2 into a codeword c ∈ C, one simply performs the following operation:

mG = c.

At times, we write m →G c to represent the operation mG = c.
We will regularly use the following three generator matrices defined below as
examples. They are

G1 =
[ 1 0 0 0 0 1 1 ]
[ 0 1 0 0 1 0 1 ]
[ 0 0 1 0 1 1 0 ]
[ 0 0 0 1 1 1 1 ],   (3.5)

G2 =
[ 0 1 1 1 1 0 0 ]
[ 1 0 1 1 0 1 0 ]
[ 1 1 0 1 0 0 1 ],   (3.6)

G3 = [1 1 1].

The row space of G1 is a famous binary linear code called the Hamming [7, 4, 3]
binary linear code; the row space of G2 is a [7, 3, 4] binary linear code; and the row
space of G3 is the [3, 1, 3] binary linear repetition code, since

[0] →G3 [0][111] = [000],
[1] →G3 [1][111] = [111].
Encoding the message m = [m1, m2, m3, m4] with G1 produces the codeword

mG1 = [m1, m2, m3, m4, m2 ⊕ m3 ⊕ m4, m1 ⊕ m3 ⊕ m4, m1 ⊕ m2 ⊕ m4].

The ordering of the elements does not affect the performance of the code when sent
over a binary symmetric channel, because each bit will flip with equal probability.
This property applies to all linear codes used to transmit codewords over a binary
symmetric channel. Also note that row operations on a generator matrix do not
change its row space. Row operations on a matrix are interchanging two rows,
multiplying one row by a nonzero number, and adding a multiple of one row to a
different row [Nic90, page 8]. We combine these two observations to define code
equivalence.

Definition 3.6 Two binary linear codes are equivalent if they are generated by two
generator matrices such that one generator matrix can be obtained from the other by
a sequence of column swaps and row operations.

If possible, we express the generator matrix in row reduced echelon form. Note
that (3.5) is expressed in row reduced echelon form.
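The map m → mG is just a matrix-vector product with arithmetic in Z2. The sketch
below is our illustration using numpy (an implementation choice, not required by
the theory) for the Hamming [7, 4, 3] generator matrix G1 from (3.5):

    import numpy as np

    G1 = np.array([[1, 0, 0, 0, 0, 1, 1],
                   [0, 1, 0, 0, 1, 0, 1],
                   [0, 0, 1, 0, 1, 1, 0],
                   [0, 0, 0, 1, 1, 1, 1]])

    def encode(m, G):
        # c = mG with all arithmetic reduced mod 2.
        return (np.array(m) @ G) % 2

    print(encode([1, 0, 1, 1], G1))   # -> [1 0 1 1 0 1 0]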
3.2.2 Nearest Neighbour Decoding
Consider the following scenario: a sender sends the codeword c ∈ C. The codeword
experiences bit flip errors as it travels over the binary symmetric channel to become
b ∈ Z^n_2. The difference e = b − c is called the error vector, or simply, the error.
A simple but inefficient decoding procedure is to compare b with every codeword
in C: we decode b as c′, where c′ is a codeword such that dist(b, c′) is minimal. Any
such c′ is a most likely choice for the original codeword since (3.3) shows that fewer
errors are more probable than more errors. Namely, if p < 1/2 is the probability that
a bit gets flipped, then for any fixed error vector e of weight i,

p^i (1 − p)^{n−i} > p^{i+1} (1 − p)^{n−(i+1)},

since 1 − p > p; so any particular error vector of weight i is more probable than any
particular error vector of weight i + 1.
All decoding methods assume that the erroneous codeword b originated from the
codeword c′ closest to b. We use the term "most likely" when using this assumption.
This decoding procedure is called nearest neighbour decoding and is used to prove
the following theorem.

Theorem 3.7 (Hamming Bound [Ham50]) An [n, k, d] binary linear code C
corrects up to t = ⌊(d − 1)/2⌋ bit flip errors under nearest neighbour decoding.
Proof: Consider the scenario in which c ∈ C is sent over a binary symmetric
channel where it acquires at most t errors. The erroneous codeword is received as
b = c ⊕ e, where e ∈ Z^n_2 is such that wt(e) ≤ t. The receiver performs nearest
neighbour decoding, producing the codeword c′ ∈ C.
We now show that c = c′. By assumption, 2t < d, so

dist(c, b) = wt(c ⊕ b) = wt(c ⊕ (c ⊕ e)) = wt(e) ≤ t,

and by nearest neighbour decoding, dist(c′, b) ≤ dist(c, b) ≤ t. So

dist(c, c′) = wt(c ⊕ c′)
            = wt((c ⊕ b) ⊕ (c′ ⊕ b))
            ≤ wt(c ⊕ b) + wt(c′ ⊕ b)
            = dist(c, b) + dist(c′, b)
            ≤ 2t
            < d.

The first inequality is the triangle inequality, since for any vector v ∈ Z^n_2,
wt(v) = ‖v‖1 (the l1-norm), and the last inequality holds by our assumption. However,
by Definition 3.3, d = min_{c1,c2 ∈ C : c1 ≠ c2} dist(c1, c2). Thus, c = c′.
We will sometimes refer to an [n, k, d] binary linear code as a t-error correcting
binary linear code, where t = ⌊(d − 1)/2⌋. Furthermore, the larger d is, the more
errors a code can correct. So to create linear codes that correct more errors, we must
increase d, and in turn, increase n, since d ≤ n.
To conclude, the nearest neighbour decoding method is sufficient for proving
properties of linear codes; however, the algorithm is computationally time consuming:
one must search all 2^k codewords to decode, an exponential time algorithm. Hence
we require better methods for decoding. A more useful decoding method uses a
special matrix called the parity check matrix.
3.2.3 The Parity Check Matrix
A matrix closely related to the generator matrix is the parity check matrix. The
parity check matrix is used to identify errors in codewords and assists in a more
useful decoding procedure.

Definition 3.8 Let C be an [n, k, d] binary linear code. A parity check matrix H
for C is an (n − k) × n matrix with the property that

Hc^T = 0 ⟺ c ∈ C.

Since C = ker H = {v ∈ Z^n_2 | Hv^T = 0}, like the generator matrix, the parity check
matrix alone also defines the code C.
Recall that every binary linear code has an equivalent code with a generator
matrix in row reduced echelon form. This is a useful form because if the generator
matrix of the code, or its equivalent, is of the form

G = [Ik | A],

where Ik is a k × k identity matrix and A is a k × (n − k) matrix, then a parity check
matrix for the same code is

H = [A^T | I_{n−k}].
For example, the parity check matrix H1 of the Hamming [7, 4, 3] binary linear
code with generator matrix G1 is derived as follows. From observing (3.5), let

A =
[ 0 1 1 ]
[ 1 0 1 ]
[ 1 1 0 ]
[ 1 1 1 ],  so

H1 =
[ 0 1 1 1 1 0 0 ]
[ 1 0 1 1 0 1 0 ]
[ 1 1 0 1 0 0 1 ].

Notice that H1 = G2. This is not a coincidence and will be discussed in §3.3.1
shortly.
3.2.4 Syndrome Decoding
Using a parity check matrix, we can perform a useful decoding procedure called
syndrome decoding [McE77, pp. 137–140]. Let c ∈ C be a codeword and e ∈ E =
{e ∈ Z^n_2 | wt(e) ≤ t} be a correctable error. Given the erroneous codeword b = c ⊕ e,
we decode by performing the following three steps:
1. error detection: find the error vector e from b by using the parity check matrix,
2. error correction: remove the error e from b to regain the codeword c, and
3. unencoding: convert the codeword c back to the message m such that c = mG.
On the occasion when e ∉ E, decoding may not correctly extract the original
message. In the following exposition, we assume e ∈ E.
Error Detection
We use the parity check matrix to speed up decoding by calculating the syndrome
s = Hb^T. The syndrome depends only on the error e ∈ E and not on the codeword,
because

s = Hb^T = H(c ⊕ e)^T = Hc^T ⊕ He^T = 0 ⊕ He^T = He^T.

The second last equality follows from Definition 3.8. The set of solutions a ∈ Z^n_2
with Ha^T = s forms a coset of C. Namely,

C ⊕ e = {c ⊕ e | c ∈ C},

where e ∈ E.
We use syndromes to detect or "diagnose" errors. In fact, for every e ∈ E, there
is a unique syndrome identifying e. This is proved below.
Theorem 3.9 Let C be a t-error correcting [n, k, d] binary linear code with parity
check matrix H. For every e ∈ E = {e ∈ Z^n_2 | wt(e) ≤ t}, where t = ⌊(d − 1)/2⌋,
there is a unique syndrome s such that

s = He^T.
Before proving the theorem above, we state and prove the following lemma.

Lemma 3.10 Let H be a parity check matrix of an [n, k, d] binary linear code C.
Then the following four properties are equivalent:
1. vectors u and w are in the same coset of C,
2. u ⊕ w ∈ C,
3. H(u ⊕ w)^T = 0, and
4. Hu^T = Hw^T.

Proof: Let C ⊕ x be a coset of C, and let u, w ∈ Z^n_2. To show Property 1 implies
Property 2, assume u, w ∈ C ⊕ x. So u = u′ ⊕ x and w = w′ ⊕ x for some u′, w′ ∈ C.
Then

u ⊕ w = (u′ ⊕ x) ⊕ (w′ ⊕ x) = u′ ⊕ w′ ∈ C.

Property 2 implies Property 3 by Definition 3.8. Property 3 implies Property 4
because, by distributivity, 0 = H(u ⊕ w)^T = Hu^T ⊕ Hw^T, which implies Hu^T = Hw^T.
Finally, Property 4 implies Property 1 because Hu^T = Hw^T implies H(u ⊕ w)^T = 0,
so u ⊕ w ∈ C by Definition 3.8 and hence u ⊕ C = w ⊕ C.
Using Lemma 3.10 we can prove Theorem 3.9.

Proof: (of Theorem 3.9) Let e, f ∈ E be two correctable errors with the same
syndrome. Consider the distance between e and f:

dist(e, f) = wt(e ⊕ f) ≤ wt(e) + wt(f) ≤ 2t < d.

The first inequality is the triangle inequality, since the weight is equivalent to the
l1-norm in Z^n_2, and the last holds by assumption. However, by Lemma 3.10,
e ⊕ f ∈ C, and the only codeword of weight less than d is the zero vector, so we
must have e = f.
As an example, consider a parity check matrix for C3:

H3 =
[ 1 1 0 ]
[ 1 0 1 ].

The possible syndromes are [00]^T, [01]^T, [10]^T, and [11]^T, associated to the
correctable error vectors [000], [001], [010], and [100], respectively.
In summary, using the parity check matrix, we perform error detection by
calculating the syndrome of an erroneous codeword. Assuming that the error vector
has at most t bit flips, the unique error vector is determined from the syndrome. In
practice, we can find it by pre-computing all syndrome/error pairs in a table indexed
by the syndrome and looking up the error by the syndrome. Such a table requires
O(2^t) time to pre-compute. In fact, for general binary linear codes, finding the error
vector associated with the syndrome requires O(2^t) time.
Error Correction
Given the error e and the erroneous codeword b, we remove the error (i.e. error
correct) by subtracting the error from the erroneous codeword. Namely,

b → b ⊕ e = (c ⊕ e) ⊕ e = c ⊕ (e ⊕ e) = c,

which recovers the original codeword in O(n) time.
Unencoding
Unencoding is the final step of the decoding procedure; it regains the message
m ∈ Z^k_2 such that c = mG, where G is the generator matrix for the code C.
In the preferable case when G = [I|A], we simply truncate the last n − k bits,
recovering the original message m in linear time. One might even argue that
truncation requires no time because we are simply changing our perspective to look
at the first k bits rather than all n bits. Similarly, one can imagine a polynomial
time unencoding procedure when the generator matrix is a column permutation of
[I|A]. In the general case, we must solve the system mG = c for the message m,
where c is our codeword in C and G is the generator matrix for C.
In conclusion, for an arbitrary binary linear code, syndrome decoding requires
exponential time. In fact, the decision problem related to decoding is NP-complete
[BMvT78]. However, there exist a number of families of binary linear codes that
decode in polynomial time, such as Goppa codes [McE77, Chapter 8]. Discussing
such codes is beyond the scope of this thesis.
3.3 Asymptotic Performance of Linear Codes
As discussed in §3.2.2, the better the code protects, the longer the codewords need
to be. However, the rate of growth of the codeword length relative to the message
length must not grow too fast or else the code will be unusably long. In this section,
we consider families of codes parameterized by some parameter a, such as the family
of [a, 1, a] repetition codes and the family of [2^a − 1, 2^a − a − 1, 3] Hamming codes,
and see how well these families perform asymptotically.

Definition 3.11 A family of [na, ka, da] codes {Ca}, parameterized by a, is said to
be good if the following limits are achieved:
1. lim_{a→∞} na = ∞,
2. lim_{a→∞} ka/na = R > 0, and
3. lim_{a→∞} da/na = δ > 0.

R is called the asymptotic rate and δ is called the asymptotic distance.
Unfortunately, neither the family of repetition codes nor the family of Hamming
codes is good: the family of repetition codes has R = 0, and the family of Hamming
codes has δ = 0. However, good codes do exist. The Gilbert–Varshamov2 lower
bound is a theorem, based on a constructive proof, that builds a family of good
linear codes to protect messages over a binary symmetric channel that applies errors
at random.
2 In selected literature, "Varshamov" is often spelled "Varsharmov." We follow the spelling in
[MS77].

Theorem 3.12 (The Gilbert–Varshamov Lower Bound) There exists a family
of codes with asymptotic rate R and asymptotic distance δ such that

R ≥ 1 − H(δ),

where H(δ) is the binary Shannon entropy H(δ) = −δ log2 δ − (1 − δ) log2(1 − δ).

The Gilbert–Varshamov lower bound is stated here without proof. See [Sud02,
Lecture 5 Notes] and [vL82, pp. 66–67] for understandable proofs, and [Gil52] for the
original proof.
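The bound is easy to evaluate numerically. The following sketch is our illustration
of the guaranteed asymptotic rate 1 − H(δ) for a few values of δ:

    from math import log2

    def H(d):
        # Binary Shannon entropy H(d) = -d log2 d - (1-d) log2 (1-d).
        if d in (0, 1):
            return 0.0
        return -d * log2(d) - (1 - d) * log2(1 - d)

    # Gilbert-Varshamov: some family of codes achieves rate R >= 1 - H(delta).
    for delta in (0.01, 0.05, 0.11, 0.25):
        print(delta, 1 - H(delta))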
3.3.1 Code Duals
There exists a beautiful and useful symmetry between some binary linear codes. If
C is a binary linear code with generator matrix G and parity check matrix H, then
there is another code C⊥ with generator matrix G⊥ = H and parity check matrix
H⊥ = G. This is formally discussed below.
Definition 3.13 If C is an [n, k, d] binary linear code, then its dual code C⊥ is the
set of vectors orthogonal to C. Namely,

C⊥ = {v ∈ Z^n_2 | c · v = 0, ∀ c ∈ C}.

Theorem 3.14 If C is an [n, k, d] binary linear code with generator matrix G and
parity check matrix H, then C⊥ is an [n, n − k, d′] binary linear code with generator
matrix G⊥ = H and parity check matrix H⊥ = G, for suitable d′.

Proof: The dimension of C is dim(C) = k. Since dim(C) + dim(C⊥) = n, then
dim(C⊥) = n − k. Let d′ be the minimum distance of C⊥. By Definition 3.4, C⊥ is
an [n, n − k, d′] binary linear code.
By Definition 3.5, G⊥ = H is a generator matrix of C⊥ since its row space is
equal to C⊥.
Since the row space of G equals C, then by Definition 3.13,

Gx^T = 0 ⟺ x ∈ C⊥,

which, by Definition 3.8, defines H⊥ = G to be a parity check matrix of C⊥.
Not all code duals are useful. The dual of any [3, 1, 3] binary linear code is a
[3, 2, 2] binary linear code, which corrects ⌊(2 − 1)/2⌋ = 0 errors. However, the
Hamming [7, 4, 3] binary linear code has a [7, 3, 4] binary linear code for its dual.
Both a [7, 4, 3] binary linear code and a [7, 3, 4] binary linear code correct up to one
error. Together, the [7, 4, 3] Hamming code and its dual [7, 3, 4] code are used as a
quantum error correcting code known as the Steane code [Ste96]. The Steane code
is discussed shortly, in §4.4.
3.4 Parameterized Codes
In this section, we show that any specific linear code gives rise to a class of
functionally equivalent codes by mapping the code C to a collection of cosets of C.

Definition 3.15 Let C be an [n, k, d] binary linear code, and let x be any vector in
Z^n_2. Then an [n, k, d] parameterized code Cx is the coset C ⊕ x.

Encoding and decoding parameterized codes requires minor additions to what we
have seen thus far. Let C be an [n, k, d] binary linear code with generator matrix
G and parity check matrix H. Previously, encoding some message m ∈ Z^k_2 with
respect to C was c = mG. In the case of a parameterized code Cx, the codeword cx
corresponding to the message m is produced by the function

cx = mG ⊕ x.

Any error e ∈ Z^n_2 acting on a codeword cx ∈ Cx acts the same as if it were acting on
the corresponding codeword c ∈ C, where cx = c ⊕ x. Let b = cx ⊕ e be the erroneous
codeword. So

b = cx ⊕ e = (c ⊕ x) ⊕ e = x ⊕ (c ⊕ e).

Thus, calculating the syndrome s of b is simply

s = H(b ⊕ x)^T,

because

H(b ⊕ x)^T = H((x ⊕ (c ⊕ e)) ⊕ x)^T = H(c ⊕ e)^T = He^T = s.

Thus, decoding a parameterized code via syndrome decoding is straightforward:
merely map the received erroneous codeword b to b′ = b ⊕ x and process b′ as per
§3.2.4.
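The earlier syndrome-decoding sketch extends to a parameterized code with two
one-line changes: XOR x into the encoder and XOR x out before computing the
syndrome. The following is our illustration for the [3, 1, 3] repetition code with an
arbitrarily chosen offset x = [101]:

    import numpy as np

    G3 = np.array([[1, 1, 1]])    # generator of the [3,1,3] repetition code
    H3 = np.array([[1, 1, 0],
                   [1, 0, 1]])
    x = np.array([1, 0, 1])       # coset offset defining C_x = C + x

    def encode_x(m):
        # c_x = mG + x over Z_2.
        return (np.array(m) @ G3 + x) % 2

    def syndrome_x(b):
        # s = H(b + x)^T depends only on the error, as shown above.
        return (H3 @ ((b + x) % 2)) % 2

    c = encode_x([1])             # [1 1 1] + [1 0 1] = [0 1 0]
    e = np.array([0, 0, 1])
    print(syndrome_x((c + e) % 2))   # -> [0 1], the syndrome of e alone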
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004
thesis-cannings-2004

More Related Content

What's hot

Ali-Dissertation-5June2015
Ali-Dissertation-5June2015Ali-Dissertation-5June2015
Ali-Dissertation-5June2015Ali Farznahe Far
 
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimization
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimizationDavid_Mateos_Núñez_thesis_distributed_algorithms_convex_optimization
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimizationDavid Mateos
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksArinto Murdopo
 
LChen_diss_Pitt_FVDBM
LChen_diss_Pitt_FVDBMLChen_diss_Pitt_FVDBM
LChen_diss_Pitt_FVDBMLeitao Chen
 
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...Alexander Zhdanov
 
Modeling_Future_All_Optical_Networks_without_Buff
Modeling_Future_All_Optical_Networks_without_BuffModeling_Future_All_Optical_Networks_without_Buff
Modeling_Future_All_Optical_Networks_without_BuffMiguel de Vega, Ph. D.
 
Classical mechanics
Classical mechanicsClassical mechanics
Classical mechanicshue34
 
Stochastic Processes in R
Stochastic Processes in RStochastic Processes in R
Stochastic Processes in RTomas Gonzalez
 

What's hot (16)

Ali-Dissertation-5June2015
Ali-Dissertation-5June2015Ali-Dissertation-5June2015
Ali-Dissertation-5June2015
 
Thesis lebanon
Thesis lebanonThesis lebanon
Thesis lebanon
 
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimization
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimizationDavid_Mateos_Núñez_thesis_distributed_algorithms_convex_optimization
David_Mateos_Núñez_thesis_distributed_algorithms_convex_optimization
 
Transport
TransportTransport
Transport
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible Attacks
 
add_2_diplom_main
add_2_diplom_mainadd_2_diplom_main
add_2_diplom_main
 
LChen_diss_Pitt_FVDBM
LChen_diss_Pitt_FVDBMLChen_diss_Pitt_FVDBM
LChen_diss_Pitt_FVDBM
 
dcorreiaPhD
dcorreiaPhDdcorreiaPhD
dcorreiaPhD
 
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
 
Time series Analysis
Time series AnalysisTime series Analysis
Time series Analysis
 
Phd dissertation
Phd dissertationPhd dissertation
Phd dissertation
 
CADances-thesis
CADances-thesisCADances-thesis
CADances-thesis
 
Modeling_Future_All_Optical_Networks_without_Buff
Modeling_Future_All_Optical_Networks_without_BuffModeling_Future_All_Optical_Networks_without_Buff
Modeling_Future_All_Optical_Networks_without_Buff
 
Erlangga
ErlanggaErlangga
Erlangga
 
Classical mechanics
Classical mechanicsClassical mechanics
Classical mechanics
 
Stochastic Processes in R
Stochastic Processes in RStochastic Processes in R
Stochastic Processes in R
 

Similar to thesis-cannings-2004

An_Introduction_WSNS_V1.8.pdf
An_Introduction_WSNS_V1.8.pdfAn_Introduction_WSNS_V1.8.pdf
An_Introduction_WSNS_V1.8.pdfAnil Sagar
 
Michael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFMichael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFAniruddh Tyagi
 
Michael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFMichael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFaniruddh Tyagi
 
Morton john canty image analysis and pattern recognition for remote sensing...
Morton john canty   image analysis and pattern recognition for remote sensing...Morton john canty   image analysis and pattern recognition for remote sensing...
Morton john canty image analysis and pattern recognition for remote sensing...Kevin Peña Ramos
 
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdf
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdfvdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdf
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdfquandao25
 
Wireless Communications Andrea Goldsmith, Stanford University.pdf
Wireless Communications Andrea Goldsmith, Stanford University.pdfWireless Communications Andrea Goldsmith, Stanford University.pdf
Wireless Communications Andrea Goldsmith, Stanford University.pdfJanviKale2
 
Implementation of a Localization System for Sensor Networks-berkley
Implementation of a Localization System for Sensor Networks-berkleyImplementation of a Localization System for Sensor Networks-berkley
Implementation of a Localization System for Sensor Networks-berkleyFarhad Gholami
 
Math for programmers
Math for programmersMath for programmers
Math for programmersmustafa sarac
 
Pulse Preamplifiers for CTA Camera Photodetectors
Pulse Preamplifiers for CTA Camera PhotodetectorsPulse Preamplifiers for CTA Camera Photodetectors
Pulse Preamplifiers for CTA Camera Photodetectorsnachod40
 
Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspectivee2wi67sy4816pahn
 
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdf
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdfNEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdf
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdfTrieuDoMinh
 
The Cellular Automaton Interpretation of Quantum Mechanics
The Cellular Automaton Interpretation of Quantum MechanicsThe Cellular Automaton Interpretation of Quantum Mechanics
The Cellular Automaton Interpretation of Quantum MechanicsHunter Swart
 
Crypto notes
Crypto notesCrypto notes
Crypto notesvedshri
 

Similar to thesis-cannings-2004 (20)

D-STG-SG02.16.1-2001-PDF-E.pdf
D-STG-SG02.16.1-2001-PDF-E.pdfD-STG-SG02.16.1-2001-PDF-E.pdf
D-STG-SG02.16.1-2001-PDF-E.pdf
 
BenThesis
BenThesisBenThesis
BenThesis
 
An_Introduction_WSNS_V1.8.pdf
An_Introduction_WSNS_V1.8.pdfAn_Introduction_WSNS_V1.8.pdf
An_Introduction_WSNS_V1.8.pdf
 
Michael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFMichael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDF
 
Michael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDFMichael_Lavrentiev_Trans trating.PDF
Michael_Lavrentiev_Trans trating.PDF
 
Morton john canty image analysis and pattern recognition for remote sensing...
Morton john canty   image analysis and pattern recognition for remote sensing...Morton john canty   image analysis and pattern recognition for remote sensing...
Morton john canty image analysis and pattern recognition for remote sensing...
 
book_dziekan
book_dziekanbook_dziekan
book_dziekan
 
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdf
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdfvdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdf
vdoc.pub_static-timing-analysis-for-nanometer-designs-a-practical-approach-.pdf
 
Wireless Communications Andrea Goldsmith, Stanford University.pdf
Wireless Communications Andrea Goldsmith, Stanford University.pdfWireless Communications Andrea Goldsmith, Stanford University.pdf
Wireless Communications Andrea Goldsmith, Stanford University.pdf
 
iosdft
iosdftiosdft
iosdft
 
Implementation of a Localization System for Sensor Networks-berkley
Implementation of a Localization System for Sensor Networks-berkleyImplementation of a Localization System for Sensor Networks-berkley
Implementation of a Localization System for Sensor Networks-berkley
 
Communication
CommunicationCommunication
Communication
 
thesis
thesisthesis
thesis
 
Math for programmers
Math for programmersMath for programmers
Math for programmers
 
phd_unimi_R08725
phd_unimi_R08725phd_unimi_R08725
phd_unimi_R08725
 
Pulse Preamplifiers for CTA Camera Photodetectors
Pulse Preamplifiers for CTA Camera PhotodetectorsPulse Preamplifiers for CTA Camera Photodetectors
Pulse Preamplifiers for CTA Camera Photodetectors
 
Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspective
 
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdf
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdfNEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdf
NEW METHODS FOR TRIANGULATION-BASED SHAPE ACQUISITION USING LASER SCANNERS.pdf
 
The Cellular Automaton Interpretation of Quantum Mechanics
The Cellular Automaton Interpretation of Quantum MechanicsThe Cellular Automaton Interpretation of Quantum Mechanics
The Cellular Automaton Interpretation of Quantum Mechanics
 
Crypto notes
Crypto notesCrypto notes
Crypto notes
 

thesis-cannings-2004

  • 1. THE UNIVERSITY OF CALGARY On the Security of the BB84 Quantum Key Distribution Protocol by Richard Cannings A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE CROSS-DISCIPLINARY DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS AND STATISTICS and DEPARTMENT OF COMPUTER SCIENCE CALGARY, ALBERTA March, 2004 c Richard Cannings 2004
  • 2. Abstract The BB84 quantum key distribution (QKD) protocol enables two authenticated par- ties to generate a secret key over an insecure quantum channel. Using a standardized security definition, we prove that BB84 is secure and include explicit bounds on its se- curity. Furthermore, our use of quantum circuit diagrams simplify the Shor–Preskill proof. Namely, we can reduce the Modified Lo-Chau QKD to a practical version of BB84 using the observation from Shor and Preskill that one may ignore a cor- rectable number of phase errors, and the fact that computational basis measurements commute with controls of CNOT operations. The first four chapters provide the required background material on quantum computing, information theory, cryptography, coding theory, and quantum error correcting codes. Chapter 5 presents protocols for entanglement purification. Chap- ter 6 reduces an entanglement purification protocol to the Modified Lo-Chau QKD, and proves that it is secure. Finally, a reduction from the Modified Lo-Chau QKD to BB84 establishes the security of the latter. iii
  • 3. Acknowledgments Many people helped me write this thesis. First and foremost, I thank my entire thesis committee: Richard Cleve, Barry Sanders, Renate Scheidler, John Watrous and Hugh Williams1 who focused so much attention on problems dear to me—especially Richard Cleve and Renate Scheidler for their conscientious support throughout my entire graduate degree. Writing this thesis was only possible with financial support from the iCORE Chair in Algorithmic Number Theory and Cryptography (ICANTC), Renate Scheidler’s NSERC grant, and funding from Richard Cleve’s NSERC and MITACS grants. Finally, I want to express special thanks to: – Richard Cleve for providing clear, simple, and stunningly beautiful solutions to the most complex problems, – Claude Laflamme for believing in me and coaching me through my undergrad- uate and graduate degree, – Christiane Lemieux for kindly guiding me through the horrors of probability theory numerous times, – Renate Scheidler for her devotion to my thesis, persistent focus on mathemati- cal rigor, and her direct nature (for which I may not have been as immediately appreciative as I should have been), – John Watrous for challenging me and exposing me to some very beautiful mathematics in CPSC 601.86, and – Hugh Williams for his wit, sarcasm, and financial support. 1 All names in the Acknowledgments are intentionally listed in alphabetical order by surname. iv
  • 4. Dedication I dedicate this thesis to my mother for her constant love, support, and encourage- ment. v
  • 5. Table of Contents Abstract iii Acknowledgments iv Dedication v Table of Contents vi Epigraph xii 1 Quantum Computing 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Postulates of Quantum Mechanics . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Qubits and State Space . . . . . . . . . . . . . . . . . . . . . . 4 1.2.2 Quantum Evolution and Circuit Diagrams . . . . . . . . . . . 8 1.2.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.3 Reversible Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.4 Circuit Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.5 The Density Operator Formalism . . . . . . . . . . . . . . . . . . . . 22 1.5.1 Mixed States . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.5.2 Quantum Evolution . . . . . . . . . . . . . . . . . . . . . . . . 23 1.5.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 1.5.4 Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.6 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.6.1 The Partial Trace . . . . . . . . . . . . . . . . . . . . . . . . . 27 2 Information Theory and Cryptography 29 2.1 Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2 Quantum Information Theory . . . . . . . . . . . . . . . . . . . . . . 32 2.3 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.3.1 Creating a Private Channel . . . . . . . . . . . . . . . . . . . 34 2.3.2 The One-Time Pad . . . . . . . . . . . . . . . . . . . . . . . . 35 2.3.3 The Diffie–Hellman Key Distribution Protocol . . . . . . . . . 39 2.3.4 The BB84 Quantum Key Distribution Protocol . . . . . . . . 41 2.3.5 On the Security of Key Distribution Protocols . . . . . . . . . 44 vi
  • 6. vii 3 Coding Theory 47 3.1 Repetition Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.1.1 Performance of Repetition Codes . . . . . . . . . . . . . . . . 49 3.2 Binary Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.2.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2.2 Nearest Neighbour Decoding . . . . . . . . . . . . . . . . . . . 54 3.2.3 The Parity Check Matrix . . . . . . . . . . . . . . . . . . . . . 56 3.2.4 Syndrome Decoding . . . . . . . . . . . . . . . . . . . . . . . . 57 3.3 Asymptotic Performance of Linear Codes . . . . . . . . . . . . . . . . 61 3.3.1 Code Duals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4 Parameterized Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4 Quantum Error Correcting Codes 65 4.1 Quantum Bit Flip Code . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.1.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.1.2 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.1.3 Transporting Classical Information Over A t-out-of-n X-Channel 73 4.2 Quantum Phase Flip Code . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3 CSS Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.3.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.3.2 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 4.4 The Steane Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.4.1 The Steane code is a [[7, 1]] CSS Code . . . . . . . . . . . . . 90 4.4.2 Encoding and Decoding Circuits for the Steane Code . . . . . 91 4.5 Parameterized CSS Codes . . . . . . . . . . . . . . . . . . . . . . . . 94 4.6 The Quantum Gilbert–Varshamov Bound . . . . . . . . . . . . . . . . 98 4.7 Quantum Noisy Channels . . . . . . . . . . . . . . . . . . . . . . . . 99 4.7.1 CSS Codes are Robust . . . . . . . . . . . . . . . . . . . . . . 100 4.7.2 Definitions of a Quantum Noisy Channel . . . . . . . . . . . . 106 5 Entanglement Purification Using CSS Codes 108 5.1 Entanglement Purification Problems . . . . . . . . . . . . . . . . . . 108 5.2 Solving the Simple Entanglement Purification Problems . . . . . . . . 111 5.2.1 Solving SEPP-1 . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.2.2 Solving SEPP-2 . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.3 Solving GEPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 5.3.1 The Random Sample Test . . . . . . . . . . . . . . . . . . . . 121 5.3.2 Using the RST in Quantum Problems . . . . . . . . . . . . . . 125 5.3.3 Using DRST in the GEPP Solution . . . . . . . . . . . . . . . 132
6 On The Security of a Practical BB84 QKD . . . 142
6.1 The Modified Lo–Chau QKD . . . 142
6.1.1 On The Security of The Modified Lo–Chau QKD . . . 148
6.2 A Practical BB84 QKD . . . 151
6.3 On the Success of a Practical BB84 QKD . . . 154

7 Concluding Remarks . . . 158
7.1 Simplifying the Shor–Preskill Reductions . . . 158
7.2 Simplifying the Modified Lo–Chau QKD Security Proof . . . 159
7.3 Explicit Security Bounds for BB84 . . . 160

Bibliography . . . 161

A The Chernoff–Hoeffding Bounds . . . 167
List of Figures

1.1 A quantum circuit of the Pauli X operation. . . . 10
1.2 The controlled NOT operation on two qubits. . . . 11
1.3 A quantum circuit illustrating the inverted CNOT operation. . . . 12
1.4 The Toffoli operation on three qubits. . . . 13
1.5 An example of a generalized n-controlled NOT operation. . . . 13
1.6 The quantum circuit for measurement in the computational basis. . . . 14
1.7 Measuring two qubits of a 2-qubit system. . . . 15
1.8 Measuring one qubit of a 2-qubit system. . . . 15
1.9 Alternate output representation of measuring one qubit of a 2-qubit system. . . . 15
1.10 An illustration of a measurement in the computational basis commuting with the control of a CNOT operation. . . . 16
1.11 A quantum circuit implementing the logical OR. . . . 20
1.12 A quantum circuit generating the Bell basis states. . . . 26
1.13 A quantum circuit representing the partial trace. . . . 28
2.1 An example of the original BB84 QKD where no errors occurred. . . . 42
3.1 A visual representation of a binary symmetric channel with error probability p. . . . 47
4.1 $U_{G_3}$ based on the [3, 1, 3] binary linear code. . . . 67
4.2 $U_{G_1}$ based on the [7, 4, 3] binary linear code. . . . 68
4.3 A multiple controlled CNOT gate (left) is composed of two standard CNOT gates (right). . . . 70
4.4 $U_{H_3}$ based on the [3, 1, 3] code using parity check matrix $H_3$. . . . 71
4.5 $U_{H_1}$ based on the [7, 4, 3] code using parity check matrix $H_1$. . . . 72
4.6 Creating the state $\frac{1}{\sqrt{|C_2|}}\sum_{c \in C_2} |c\rangle$. . . . 78
4.7 The quantum circuit for encoding, $U_{\text{encode}}$. . . . 80
4.8 An alternative quantum circuit for encoding, $U_{\text{encode}}$. . . . 81
4.9 The quantum circuit for error detecting and error correcting a CSS codeword. . . . 88
4.10 High level conceptualization of error detection, error correction, and decoding a CSS codeword. . . . 89
4.11 The encoding circuit diagram for the Steane code. . . . 92
4.12 The complete $U_{\text{decode}}$ circuit for the Steane code with error detection, error correction and decoding circuit. . . . 95
4.13 Encoding $|b_{x,z}\rangle_L$ with $U_{\text{encode}}$ and state preparation. . . . 97
4.14 A conceptualization of a quantum noisy channel. . . . 100
4.15 An example of Mother Nature's adversarial strategies. . . . 101
5.1 The quantum circuit for Protocol 5.1, a solution to SEPP-1. . . . 112
5.2 The quantum circuit for Protocol 5.1 with quantum communication. . . . 119
5.3 Illustration of the DRST with our claim. . . . 138
5.4 Two equivalent measurement operations. . . . 139
5.5 An illustration of the GEPP Solution. . . . 141
6.1 An illustration of the Modified Lo–Chau QKD. . . . 147
6.2 A slightly altered Modified Lo–Chau QKD. . . . 153
6.3 An illustration of a practical BB84 QKD. . . . 156
List of Protocols

2.1 The Vernam One-Time Pad Cipher . . . 36
2.2 The Diffie–Hellman key distribution protocol . . . 40
2.3 The Original BB84 QKD . . . 41
5.1 SEPP-1 Solution . . . 113
5.2 A solution to SEPP-2 . . . 120
5.3 The Random Sample Test (RST) . . . 122
5.4 The Double Random Sample Test (DRST) . . . 129
5.5 GEPP Solution . . . 133
6.1 The Modified Lo–Chau QKD . . . 146
6.2 A Practical BB84 QKD . . . 155
Epigraph

"Anyone who is not shocked by quantum mechanics has not fully understood it."
—Niels Bohr
Chapter 1
Quantum Computing

1.1 Introduction

Before the existence of modern computers, Turing proposed a simple definition of a theoretical computer, called the universal Turing machine, that was used to simplify the study of computing devices in the context of evaluating functions. Turing [Tur36] and Church [Chu36] proposed that:

Every "computing device can be simulated by a [universal] Turing machine." [Sho94]

This proposal is commonly known as Church's thesis. It simplified the study of computing devices because it reduced the study of numerous theoretical computing devices to a single theoretical model.

With the realization of modern computers and their subsequent widespread adoption, researchers began to focus on efficient physical computing devices. It is generally accepted that an efficient physical computing device is one that uses at most a polynomial number of steps relative to its input size. Thus, in the back of their minds, researchers envisioned a strong Church's thesis. Namely,

"Any physical computing device can be simulated by a [universal] Turing machine in a number of steps polynomial in the resources used by the computing device." [Sho94]

However, Deutsch noted that a "physics experiment" is a physical computing device, because it is a process with input and output connected together by some
sequence of events. Physics experiments based on quantum theory, however, cannot be perfectly simulated by a universal Turing machine using only a polynomial number of steps. Both Deutsch [Deu85] and Feynman [Fey86] addressed this problem by proposing a theoretical model of computing based on quantum physics, called a quantum computer. This led to the Church–Turing principle, which states:

"Every [physical] system can be perfectly simulated by a universal model of computing machine operating" in a number of steps polynomial in the resources used by the physical system. [Deu85]

In a number of cases, quantum computing and communication have substantial power over classical computing and communication, and have affected the cross-disciplinary field of cryptology tremendously. In 1994, Shor created two probabilistic polynomial-time quantum algorithms to solve two difficult mathematical and computational problems: the discrete logarithm problem and the factoring problem [Sho94]. As a result, most number-theoretic key distribution protocols such as RSA [RSA79] and Diffie–Hellman [DH76] will be rendered useless upon the physical realization of a quantum computer. More recently, Hallgren described a probabilistic polynomial-time quantum algorithm to solve Pell's Equation and the Principal Ideal Problem [Hal02], thereby breaking even more cryptosystems, including the possibly stronger Buchmann–Williams public-key cryptosystem [BW89].

Even though quantum computers break the most popular key distribution protocols, the idea of security is not lost upon the physical realization of quantum computers. The theory of quantum communication (i.e., quantum networks) provides a mathematical formalism to prove that some quantum key distribution (QKD) protocols are information theoretically secure—essentially unbreakable! In 1984, Bennett and Brassard introduced the BB84 quantum key distribution protocol for two authenticated parties to generate a secret cryptographic key via an insecure communication channel [BB84].
The BB84 QKD is physically realizable and commercially available¹ today, so proving its security has direct applications to industry. This led to [May01], [SP00], and others proving that BB84 is information theoretically secure.

BB84 and its security is the focus of this thesis. We begin the journey of proving the security of BB84 by first introducing the postulates of quantum mechanics, focusing on how they apply to quantum computing. The material in this chapter can be found in most quantum information and quantum computation textbooks such as [NC00]. Quantum mechanics can be expressed in the language of linear algebra; the texts [Nic90] and [HJ85] combined provide a great reference to the mathematical foundations of quantum mechanics.

1.2 Postulates of Quantum Mechanics

There are many useful analogies between the postulates of quantum mechanics and classical computing. Consider describing classical computing as being based on three postulates: a state space, logical evolution, and observation. The state of a classical computer is held within one or more registers called bits, each taking on the value 0 or 1. The state space of a classical computer is the set of all possible states the computer may be in. For example, the state space of an n-bit classical computer is the set of all n-bit strings, $\{0,1\}^n$. We also assume bits evolve and change only through a series of basic logical operations—a logical evolution. Finally, we can observe the inner workings of a classical computer at any moment—especially at the end, in order to observe the output. We believe that observing the output of an algorithm, or even pausing an algorithm to look at the computer's state, does not affect the computation.

¹ Purchase yours at http://www.magiqtech.com/, or http://www.idquantique.com/ today!
Quantum computing is based on the three postulates of quantum mechanics: a state space, unitary evolution, and measurement. Like classical bits, one or more quantum bits, called qubits, hold the state of a quantum system. The qubits evolve through a series of unitary operations. Unlike classical computing, observing—or, equivalently, measuring—a qubit may alter its value.

In this section, we introduce the three postulates of quantum mechanics in the pure state formalism as they relate to quantum computing. The text [NC00] is an excellent reference for quantum computing; in fact, much of this chapter is based on [NC00].

1.2.1 Qubits and State Space

Before discussing qubits, let us briefly continue discussing classical computing. Another form of classical computing is probabilistic classical computing, which associates probabilities with each state. For instance, let B be a random variable representing the state of a 1-bit classical system. With probability p (i.e., p = Pr[B = 0]), B takes on the value 0, and with probability q = 1 − p, B takes on the value 1. Such a state can be represented as a probability vector

$\begin{pmatrix} p \\ q \end{pmatrix} = p \begin{pmatrix} 1 \\ 0 \end{pmatrix} + q \begin{pmatrix} 0 \\ 1 \end{pmatrix},$

where p + q = 1 and p, q ≥ 0. The i-th component of the vector contains the probability that the state is i. In this case, the vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ represents the state 0 with certainty, and the vector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$
represents the state 1 with certainty.

In quantum computing, a qubit is a register whose possible values are pure quantum states,² and which is somewhat similar to probabilistic classical information. For instance, a pure quantum state is represented as a unit vector

$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix},$

where $\alpha, \beta \in \mathbb{C}$ and $|\alpha|^2 + |\beta|^2 = 1$. The values α and β are called amplitudes and do not represent probabilities. A linear combination of basis states is said to be a superposition, meaning that when $\alpha, \beta \neq 0$ a qubit is "both" 0 and 1, rather than being "either" 0 or 1 in the classical probabilistic sense. The power of quantum computing arises from the difference between probabilities and amplitudes.

All such quantum states live in a vector space called a Hilbert space. For the purposes of this thesis, we present a limited, yet suitable, definition of a Hilbert space.

Definition 1.1 A Hilbert space is a finite-dimensional complex inner product space with the dot product as the inner product. H denotes a 2-dimensional Hilbert space.

This leads us to a preliminary version of the first postulate of quantum mechanics:

Preliminary Postulate 1 (State Space) A one-qubit quantum system is completely described by a unit length state vector in H.

We describe the state of a quantum system as a superposition, or linear combination, of basis states that span the Hilbert space.

² Pure quantum states are also called pure states, or sometimes just states.
We may use any basis, but the most common basis consists of the computational basis states,

$|0\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad |1\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$

The symbol $|\cdot\rangle$ is called a ket and represents a column vector. Associated with a ket is a bra, symbolized by $\langle\cdot|$. A bra is the conjugate transpose of the ket. For instance,

$\langle 0| \stackrel{\text{def}}{=} |0\rangle^{T} = (1\ \ 0), \quad\text{and}\quad \langle 1| \stackrel{\text{def}}{=} |1\rangle^{T} = (0\ \ 1).$

In general, let $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ and $|\phi\rangle = \gamma|0\rangle + \delta|1\rangle$ be arbitrary quantum states, where $|\alpha|^2 + |\beta|^2 = |\gamma|^2 + |\delta|^2 = 1$. Then $\langle\psi| = \bar{\alpha}\langle 0| + \bar{\beta}\langle 1|$ and $\langle\phi| = \bar{\gamma}\langle 0| + \bar{\delta}\langle 1|$.

Multiplying a bra and a ket together forms a bracket, $\langle\cdot|\cdot\rangle$. More precisely, the bracket is the inner product. Namely,

$\langle\psi|\phi\rangle \stackrel{\text{def}}{=} (\langle\psi|)\cdot(|\phi\rangle) = (\bar{\alpha}\ \ \bar{\beta})\cdot\begin{pmatrix}\gamma\\\delta\end{pmatrix} = \bar{\alpha}\gamma + \bar{\beta}\delta.$

Also, $|\psi\rangle = |\phi\rangle$ if and only if $\langle\psi|\phi\rangle = \langle\psi|\psi\rangle = 1$.

Like classical computing, more qubits allow for more complex and useful quantum computing. Classically, bits are concatenated together into finite length bit strings in $\{0,1\}^n$. Qubits are "concatenated" using the Kronecker (a.k.a. tensor) product, symbolized by "⊗."
Definition 1.2 Let A be an m × n matrix with entries $a_{i,j}$ and let B be any matrix. The Kronecker product of A and B is the matrix

$A \otimes B \stackrel{\text{def}}{=} \begin{pmatrix} a_{1,1}B & a_{1,2}B & \dots & a_{1,n}B \\ a_{2,1}B & a_{2,2}B & \dots & a_{2,n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1}B & a_{m,2}B & \dots & a_{m,n}B \end{pmatrix}.$

Typically, tensor products are implicitly assumed and often omitted, so that for $b_1, b_2 \in \{0,1\}$, the expressions $|b_1\rangle|b_2\rangle$, $|b_1, b_2\rangle$, and even $|b_1 b_2\rangle$ are all assumed to be $|b_1\rangle \otimes |b_2\rangle$.

Definition 1.3 Let $|\phi_0\rangle, |\phi_1\rangle$ be basis vectors in H. Then $H^{\otimes n}$ is a $2^n$-dimensional Hilbert space with basis vectors $|\phi_{b_1}\rangle \otimes |\phi_{b_2}\rangle \otimes \cdots \otimes |\phi_{b_n}\rangle$, for all $b = b_1 b_2 \dots b_n \in \{0,1\}^n$.

For example, let $|0\rangle, |1\rangle$ be basis vectors in H. We define the space $H^{\otimes 2}$ to be a Hilbert space with basis vectors $|0\rangle\otimes|0\rangle$, $|0\rangle\otimes|1\rangle$, $|1\rangle\otimes|0\rangle$, $|1\rangle\otimes|1\rangle$. Thus, the computational basis of $H^{\otimes n}$ is simply given by all unit vectors $|b\rangle$ where $b \in \{0,1\}^n$. This leads us to the full version of the state space postulate of quantum mechanics.

Postulate 1 (State Space (Discrete Version)) An n-qubit quantum system is completely described by a unit length state vector in $H^{\otimes n}$.

Tensor products and kets combined generate a descriptive and compact representation of a quantum state. Let $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \in H$ and $|\phi\rangle = \gamma|0\rangle + \delta|1\rangle \in H$ be two arbitrary qubits. Then the state of a two-qubit quantum system is the vector

$\begin{pmatrix}\alpha\\\beta\end{pmatrix} \otimes \begin{pmatrix}\gamma\\\delta\end{pmatrix} = \begin{pmatrix}\alpha\gamma\\\alpha\delta\\\beta\gamma\\\beta\delta\end{pmatrix}$
in $H^{\otimes 2}$. Using kets, the state above is

$|\psi\rangle \otimes |\phi\rangle = (\alpha|0\rangle + \beta|1\rangle) \otimes (\gamma|0\rangle + \delta|1\rangle) = \alpha\gamma\,|0\rangle\otimes|0\rangle + \alpha\delta\,|0\rangle\otimes|1\rangle + \beta\gamma\,|1\rangle\otimes|0\rangle + \beta\delta\,|1\rangle\otimes|1\rangle = \alpha\gamma\,|00\rangle + \alpha\delta\,|01\rangle + \beta\gamma\,|10\rangle + \beta\delta\,|11\rangle.$

We will generally discuss n-qubit quantum states, such as the arbitrary state

$|\psi\rangle = \begin{pmatrix}\alpha_0\\\alpha_1\\\vdots\\\alpha_{2^n-1}\end{pmatrix} = \alpha_0\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix} + \alpha_1\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix} + \cdots + \alpha_{2^n-1}\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix} = \sum_{i\in\{0,1\}^n}\alpha_i|i\rangle$

in $H^{\otimes n}$, where $\sum_i |\alpha_i|^2 = 1$. Note that we identify $\{0,1\}^n \equiv \{0,1,\dots,2^n-1\}$ above. We shall commonly identify $\{0,1\}^n \equiv \{0,1,\dots,2^n-1\}$ and $\{0,1\}^n \equiv \mathbb{Z}_2^n$ when appropriate.

Now that we feel comfortable with kets, bras and Hilbert spaces, let us move on to the second postulate of quantum mechanics.

1.2.2 Quantum Evolution and Circuit Diagrams

Postulate 2 (Quantum Evolution) The evolution of a quantum system is represented by a unitary transformation. The state $|\psi\rangle$ at time t is related to the state $|\psi'\rangle$ at time t′ by a unitary transformation matrix U given by the equation $U|\psi\rangle = |\psi'\rangle$, or equally $|\psi\rangle \xrightarrow{U} |\psi'\rangle$.

The above definition is a slightly altered version of [NC00, page 81]. As a reminder, a unitary matrix is a square matrix with complex entries whose inverse is its conjugate transpose. We denote the conjugate transpose of a matrix M by $M^\dagger$.
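The linear algebra above is easy to experiment with directly. The following minimal numpy sketch (our own illustration, not part of the thesis; all variable names are our own) builds a two-qubit product state with the Kronecker product and evolves it with a unitary, exercising Postulates 1 and 2.

    import numpy as np

    # Computational basis kets |0> and |1>.
    ket0 = np.array([1, 0], dtype=complex)
    ket1 = np.array([0, 1], dtype=complex)

    # An arbitrary normalized qubit and a basis qubit.
    psi = (ket0 + ket1) / np.sqrt(2)
    phi = ket0

    # "Concatenate" qubits with the Kronecker product: |psi> (x) |phi> in H^{(x)2}.
    state = np.kron(psi, phi)
    assert np.isclose(np.linalg.norm(state), 1.0)  # unit length (Postulate 1)

    # Unitary evolution (Postulate 2): apply the Pauli X gate to the second qubit.
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    I = np.eye(2, dtype=complex)
    state = np.kron(I, X) @ state  # (I (x) X)(|psi> (x) |phi>) = |psi> (x) X|phi>

    # Inner products <a|b> are conjugated dot products.
    print(np.vdot(np.kron(psi, ket1), state))  # -> (1+0j)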
The unitary evolution of a quantum system is somewhat analogous to logical evolution in classical computing. For example, consider applying the logical NOT operation "¬" to a bit: ¬0 = 1, or ¬1 = 0. The Pauli matrix

$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

is a quantum operation similar to "¬", since $X|0\rangle = |1\rangle$ and $X|1\rangle = |0\rangle$.

We often represent quantum operations in a conceptually helpful illustration called a quantum circuit diagram. Figure 1.1 illustrates applying the unitary operation X to the state $\alpha|0\rangle + \beta|1\rangle$, and mathematically represents:

$\alpha|0\rangle + \beta|1\rangle = \alpha\begin{pmatrix}1\\0\end{pmatrix} + \beta\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}\alpha\\\beta\end{pmatrix} \xrightarrow{X} \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \begin{pmatrix}\beta\\\alpha\end{pmatrix} = \beta|0\rangle + \alpha|1\rangle.$
Figure 1.1: A quantum circuit of the Pauli X operation, mapping $\alpha|0\rangle + \beta|1\rangle$ to $\beta|0\rangle + \alpha|1\rangle$.

Quantum circuits are always read left to right, or top to bottom depending on the orientation of the diagram. The solid horizontal lines are paths carrying one or more qubits through the circuitry, and unitary operations are usually represented as boxes marked by symbols.

We commonly use the following seven unitary operations. The first four are called the Pauli operations and act on one qubit:

$I = \begin{pmatrix}1&0\\0&1\end{pmatrix},\quad X = \begin{pmatrix}0&1\\1&0\end{pmatrix},\quad Y = \begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad\text{and}\quad Z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.$

Another single-qubit operation is the Hadamard operation

$H = \frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix},$

which maps

$|0\rangle \xrightarrow{H} \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \stackrel{\text{def}}{=} |+\rangle,$
and

$|1\rangle \xrightarrow{H} \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \stackrel{\text{def}}{=} |-\rangle,$

which are both equal superpositions of $|0\rangle$ and $|1\rangle$, differing by the −1 factor in front of $|1\rangle$, also known as a phase factor.

The most common multiple-qubit operation is the controlled NOT, or CNOT, operation, which acts on two qubits: a control and a target. It is a quantum version of the exclusive-or (XOR) logical operation, symbolized by "⊕." The CNOT operation is mathematically defined as the matrix

$\mathrm{CNOT} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}.$

However, it is best described by the quantum circuit in Figure 1.2, where $c, t \in \{0,1\}$, $|c\rangle$ is the control qubit, and $|t\rangle$ is the target qubit. When the control qubit $|c\rangle$ is set to $|1\rangle$, we apply X to the target qubit $|t\rangle$. When the control qubit is set to $|0\rangle$, we apply I to the target qubit. In Figure 1.2, we define the CNOT operation by describing how it acts on the basis inputs $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. By linearity, defining CNOT on the basis vectors of $H^{\otimes 2}$ defines the operation for every vector in $H^{\otimes 2}$.

Figure 1.2: The controlled NOT operation on two qubits, mapping $|c\rangle|t\rangle$ to $|c\rangle|c \oplus t\rangle$.

Sometimes we wish to invert the control of the CNOT operation so that $|c\rangle|t\rangle \longrightarrow$
$|c\rangle|\neg c \oplus t\rangle$. This is achieved by combining the CNOT and X operations. Namely,

$|c\rangle|t\rangle \xrightarrow{X\otimes I} |\neg c\rangle|t\rangle \quad(1.1)$
$\xrightarrow{\mathrm{CNOT}} |\neg c\rangle|\neg c \oplus t\rangle \quad(1.2)$
$\xrightarrow{X\otimes I} |\neg(\neg c)\rangle|\neg c \oplus t\rangle = |c\rangle|\neg c \oplus t\rangle.$

We illustrate the inverted CNOT in the circuit diagram in Figure 1.3. The left-hand side shows the exact computations. Each vertical dashed line identifies the state of the qubits at that point with the corresponding equation number. The right-hand side of Figure 1.3 represents the equivalent shorthand illustration of the inverted CNOT operation.

Figure 1.3: A quantum circuit illustrating the inverted CNOT operation.

The seventh and last quantum operation is the Toffoli gate, which is a three-qubit version of the CNOT. It has two control qubits and one target qubit, and is represented by a matrix that is usually denoted by T. In Figure 1.4, we define the Toffoli gate by how it acts on 3-qubit basis vectors, where $c_1, c_2, t \in \{0,1\}$.

At times, we generalize the CNOT and Toffoli gates to an n-controlled NOT gate. Figure 1.5 provides an example of one such gate, where $c_1, \dots, c_n, t \in \{0,1\}$.

1.2.3 Measurement

Unlike classical computers, we may not observe qubits at any given time without potentially harmful side-effects. In general, quantum measurement transforms a
Figure 1.4: The Toffoli operation on three qubits, mapping $|c_1\rangle|c_2\rangle|t\rangle$ to $|c_1\rangle|c_2\rangle|(c_1 \wedge c_2)\oplus t\rangle$.

Figure 1.5: An example of a generalized n-controlled NOT operation, mapping $|c_1\rangle\cdots|c_{n-1}\rangle|c_n\rangle|t\rangle$ to $|c_1\rangle\cdots|c_{n-1}\rangle|c_n\rangle|(c_1 \wedge \cdots \wedge \neg c_{n-1} \wedge c_n)\oplus t\rangle$.
quantum state to a probabilistic state, where the amplitudes become probability distributions.

The postulate of quantum measurement is very elaborate. However, we can represent almost every measurement in this thesis in a simple form called measurement in the computational basis, in which an arbitrary qubit $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ collapses to the classical probabilistic state

$\begin{pmatrix}|\alpha|^2\\|\beta|^2\end{pmatrix},$

where one observes the bit 0 with probability $|\alpha|^2$ and the bit 1 with probability $|\beta|^2$. Furthermore, immediately after measurement, the qubit $|\psi\rangle$ becomes the value observed. Namely, $|\psi\rangle$ becomes either $|0\rangle$ or $|1\rangle$.

In quantum circuit diagrams, computational basis measurements are represented as a half circle. Figure 1.6 illustrates how measurement in the computational basis works. As before, the single line represents a qubit path, but the new double line represents the path of one classical bit.

Figure 1.6: The quantum circuit for measurement in the computational basis: on input $\alpha|0\rangle + \beta|1\rangle$, it outputs 0 with probability $|\alpha|^2$ and 1 with probability $|\beta|^2$.

Consider measuring the state $|0\rangle = 1\,|0\rangle + 0\,|1\rangle$ (i.e., α = 1, β = 0) in the computational basis. Figure 1.6 shows that the resulting outcome will be 0 with certainty. Similarly, measuring $|1\rangle$ in the computational basis results in the outcome 1 with certainty.

We can extend measurements in the computational basis to multiple qubits. For example, Figure 1.7 describes measuring two qubits of a 2-qubit system. What about only measuring the first qubit of a 2-qubit system? The result is best explained in Figure 1.8.
Figure 1.7: Measuring two qubits of a 2-qubit system: on input $\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle$, one observes 00 with probability $|\alpha|^2$, 01 with probability $|\beta|^2$, 10 with probability $|\gamma|^2$, and 11 with probability $|\delta|^2$.

Figure 1.8: Measuring one qubit of a 2-qubit system: on input $\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle$, one observes 0 with probability $|\alpha|^2 + |\beta|^2$, leaving the state $\frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$, and 1 with probability $|\gamma|^2 + |\delta|^2$, leaving the state $\frac{\gamma|0\rangle+\delta|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}$.

Intuitively, the measured qubit immediately becomes the value observed, and the unmeasured qubit takes on the renormalized superposition not destroyed by the measurement. Since measuring an arbitrary state in the computational basis will collapse the state to $|0\rangle$ or $|1\rangle$, at times it is convenient to assume that the classical data path holds a known quantum state $|0\rangle$ or $|1\rangle$. So the output of measuring one qubit in a 2-qubit system can also be represented as in Figure 1.9.

Figure 1.9: Alternate output representation of measuring one qubit of a 2-qubit system: the output is $|0\rangle \otimes \frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$ with probability $|\alpha|^2+|\beta|^2$, and $|1\rangle \otimes \frac{\gamma|0\rangle+\delta|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}$ with probability $|\gamma|^2+|\delta|^2$.
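The rule in Figures 1.8 and 1.9 is straightforward to simulate classically. Below is a small sketch (our own, not from the thesis; measure_first_qubit is a hypothetical helper) that samples an outcome for the first qubit and renormalizes the surviving superposition.

    import numpy as np

    rng = np.random.default_rng()

    def measure_first_qubit(state):
        """Measure qubit 1 of [a, b, c, d] ~ a|00>+b|01>+c|10>+d|11> in the
        computational basis; return (outcome, renormalized post-state)."""
        a, b, c, d = state
        p0 = abs(a)**2 + abs(b)**2            # Pr[outcome 0]
        if rng.random() < p0:
            return 0, np.array([a, b, 0, 0]) / np.sqrt(p0)
        p1 = abs(c)**2 + abs(d)**2            # Pr[outcome 1]
        return 1, np.array([0, 0, c, d]) / np.sqrt(p1)

    # Example: the uniform superposition over two qubits.
    state = np.full(4, 0.5, dtype=complex)
    outcome, post = measure_first_qubit(state)
    print(outcome, np.round(post, 3))  # e.g. 0 [0.707 0.707 0 0]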
From this basic definition of measurement, we can prove the following proposition.

Proposition 1.4 Computational basis measurements commute with the controls of CNOT operations.

Proposition 1.4 is best described in Figure 1.10. The lower circuit uses classical input for the control.

Figure 1.10: An illustration of a measurement in the computational basis commuting with the control of a CNOT operation.

Proof: Let $\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle$ be an arbitrary 2-qubit quantum state. The top circuit in Figure 1.10 performs the following steps:

$\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle \xrightarrow{\mathrm{CNOT}} \alpha|00\rangle + \beta|01\rangle + \delta|10\rangle + \gamma|11\rangle \quad(1.3)$

$\xrightarrow{\text{measure}\otimes I} \begin{cases} |0\rangle \otimes \frac{\alpha|0\rangle+\beta|1\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}, & \text{w.p. } |\alpha|^2+|\beta|^2 \\ |1\rangle \otimes \frac{\delta|0\rangle+\gamma|1\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}, & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \;=\; \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}, & \text{w.p. } |\alpha|^2+|\beta|^2 \\ \frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}, & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \quad(1.4)$

The lower circuit in Figure 1.10 performs the following steps:

$\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle \xrightarrow{\text{measure}\otimes I} \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}, & \text{w.p. } |\alpha|^2+|\beta|^2 \\ \frac{\gamma|10\rangle+\delta|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}, & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \quad(1.5)$

$\xrightarrow{\mathrm{CNOT}} \begin{cases} \frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}, & \text{w.p. } |\alpha|^2+|\beta|^2 \\ \frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\gamma|^2+|\delta|^2}}, & \text{w.p. } |\gamma|^2+|\delta|^2 \end{cases} \quad(1.6)$
Since the distributions in (1.4) and (1.6) are equal, computational basis measurements commute with the controls of CNOT operations. □

Let us continue by defining the most general form of quantum measurement.

Postulate 3 (Quantum Measurement) Quantum measurements are described by a collection of linear mappings $\{M_0, \dots, M_m\}$ from $H^{\otimes n}$ to $H^{\otimes n}$, where

$\sum_{x=0}^{m} M_x^\dagger M_x = I. \quad(1.7)$

The measurement $\{M_0, \dots, M_m\}$ acts on the state space being measured. The index $x = 0, \dots, m$ refers to the measurement outcome that may occur in the experiment. If the state of the quantum system is $|\psi\rangle$ immediately before measurement, then the probability that result $x = 0, \dots, m$ occurs is $\langle\psi| M_x^\dagger M_x |\psi\rangle$. The state of the quantum system immediately after measurement is defined to be

$\frac{M_x|\psi\rangle}{\sqrt{\langle\psi| M_x^\dagger M_x |\psi\rangle}}. \quad(1.8)$

The postulate above is an adaptation of [NC00, pp. 84–85]. Equation (1.7) is called the completeness equation, which expresses the fact that the probabilities sum to 1. Namely,

$\sum_{x=0}^{m} \langle\psi| M_x^\dagger M_x |\psi\rangle = 1.$

When the measurement operators $\{M_0, \dots, M_m\}$ are all projections, we refer to the measurement as a projective measurement. Projective measurements are said to project the qubits to be measured into an outcome space. The following proposition shows that measurement in the computational basis abides by the above postulate.
Proposition 1.5 Measurement in the computational basis is the quantum measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$.

Proof: First note that $|0\rangle\langle 0| + |1\rangle\langle 1| = I$, thus satisfying the completeness equation. Let $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ be an arbitrary quantum state, and set $M_0 = |0\rangle\langle 0|$. The probability of observing 0 is

$\langle\psi| M_0^\dagger M_0 |\psi\rangle = \langle\psi| (|0\rangle\langle 0|)^\dagger (|0\rangle\langle 0|) |\psi\rangle = \langle\psi| (|0\rangle\langle 0|) |\psi\rangle = (\bar{\alpha}\langle 0| + \bar{\beta}\langle 1|)(|0\rangle\langle 0|)(\alpha|0\rangle + \beta|1\rangle) = (\bar{\alpha}\langle 0|0\rangle + \bar{\beta}\langle 1|0\rangle)(\alpha\langle 0|0\rangle + \beta\langle 0|1\rangle) = \bar{\alpha}\alpha = |\alpha|^2.$

The state of the quantum system immediately after measurement is

$\frac{M_0|\psi\rangle}{\sqrt{\langle\psi| M_0^\dagger M_0 |\psi\rangle}} = \frac{(|0\rangle\langle 0|)(\alpha|0\rangle + \beta|1\rangle)}{\sqrt{|\alpha|^2}} = \frac{\alpha|0\rangle\langle 0|0\rangle + \beta|0\rangle\langle 0|1\rangle}{|\alpha|} = \frac{\alpha}{|\alpha|}|0\rangle.$

So, when we observe 0, the resulting state is $\frac{\alpha}{|\alpha|}|0\rangle$. The state $\frac{\alpha}{|\alpha|}|0\rangle$ is equivalent to $|0\rangle$ in the sense that any quantum evolution U mapping $|0\rangle \xrightarrow{U} U|0\rangle$ maps $\frac{\alpha}{|\alpha|}|0\rangle \xrightarrow{U} \frac{\alpha}{|\alpha|}U|0\rangle$, and measuring $U|0\rangle$ and $\frac{\alpha}{|\alpha|}U|0\rangle$ results in the same outcome with the same probability. Without loss of generality, we assume our outcome is $|0\rangle$. The argument for the probability of observing a 1 is similar. □
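For readers who prefer a numerical check, the following sketch (our own, under the convention that states are numpy column vectors) verifies the completeness equation (1.7) and the post-measurement rule (1.8) for the measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$.

    import numpy as np

    # Measurement operators for the computational basis (Proposition 1.5).
    ket0 = np.array([[1], [0]], dtype=complex)
    ket1 = np.array([[0], [1]], dtype=complex)
    M = [ket0 @ ket0.conj().T, ket1 @ ket1.conj().T]   # |0><0|, |1><1|

    # Completeness equation (1.7): the sum of M†M equals the identity.
    assert np.allclose(sum(m.conj().T @ m for m in M), np.eye(2))

    # Outcome probabilities and post-measurement states for an arbitrary qubit.
    alpha, beta = 0.6, 0.8j
    psi = alpha * ket0 + beta * ket1
    for x, m in enumerate(M):
        p = (psi.conj().T @ m.conj().T @ m @ psi).real.item()  # <psi|M†M|psi>
        post = (m @ psi) / np.sqrt(p)                          # equation (1.8)
        print(x, round(p, 2), post.ravel().round(3))
    # -> 0 0.36 [1.+0.j 0.+0.j]  and  1 0.64 [0.+0.j 0.+1.j]
    # Note the leftover phase alpha/|alpha| (here i on outcome 1), as in the proof.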
1.3 Reversible Computing

Since classical computers abide by the laws of physics, we should be able to describe a classical computer based on the postulates of quantum mechanics. In this section, we show that a classical computer can be implemented on a quantum computer. Thus, a quantum computer is at least as powerful as a classical computer. To do so, we implement a special kind of classical computer called a reversible computer, whereby no information is lost while performing the algorithm, so we can always infer the input from the output.

To simulate a classical computer on a quantum computer, we represent the bit 0 as $|0\rangle$ and the bit 1 as $|1\rangle$. For classical systems requiring multiple bits, we shall represent the bit string $b \in \{0,1\}^n$ as $|b\rangle \in H^{\otimes n}$.

The most common universal set of logical operations consists of AND, OR, NOT and FANOUT. The Toffoli gate implements the logical AND. Let $b_1, b_2 \in \{0,1\}$. Then

$|b_1\rangle|b_2\rangle|0\rangle \xrightarrow{T} |b_1\rangle|b_2\rangle|b_1 \wedge b_2\rangle,$

where the third qubit holds the desired outcome. The Toffoli gate outputs the original input, thus it is trivial to reverse. Please note that the third qubit is an extra ancillary qubit used to store the solution. We call ancillary qubits added for computation ancilla.

Using De Morgan's law, we can implement logical OR with the Toffoli and Pauli X operations as $X^{\otimes 3}\, T\, (X \otimes X \otimes I)$. Namely,

$|b_1\rangle|b_2\rangle|0\rangle \xrightarrow{X\otimes X\otimes I} |\neg b_1\rangle|\neg b_2\rangle|0\rangle \xrightarrow{T} |\neg b_1\rangle|\neg b_2\rangle|\neg b_1 \wedge \neg b_2\rangle \xrightarrow{X^{\otimes 3}} |\neg(\neg b_1)\rangle|\neg(\neg b_2)\rangle|\neg(\neg b_1 \wedge \neg b_2)\rangle = |b_1\rangle|b_2\rangle|b_1 \vee b_2\rangle.$
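As a sanity check on this construction (our own sketch, not from the thesis), one can multiply out the 8 × 8 permutation matrices for $X^{\otimes 3}\, T\, (X \otimes X \otimes I)$ and confirm that the third qubit ends up holding $b_1 \vee b_2$ on every basis input.

    import numpy as np
    from functools import reduce

    X = np.array([[0, 1], [1, 0]])
    I = np.eye(2, dtype=int)

    # Toffoli: flips the last basis bit when the first two bits are both 1.
    T = np.eye(8, dtype=int)
    T[[6, 7]] = T[[7, 6]]            # swap |110> <-> |111>

    def kron(*ops):
        return reduce(np.kron, ops)

    OR = kron(X, X, X) @ T @ kron(X, X, I)   # X^{(x)3} T (X (x) X (x) I)

    for b1 in (0, 1):
        for b2 in (0, 1):
            basis_in = np.zeros(8, dtype=int)
            basis_in[(b1 << 2) | (b2 << 1)] = 1      # |b1, b2, 0>
            out = int(np.argmax(OR @ basis_in))
            assert out & 1 == (b1 | b2)              # third qubit = b1 OR b2
            print(b1, b2, "->", format(out, "03b"))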
Again, the third qubit holds the desired outcome. The quantum circuit for the implementation of OR is given in Figure 1.11.

Figure 1.11: A quantum circuit implementing the logical OR.

The NOT operation is simply the Pauli X operation, as previously described. FANOUT is implemented by a CNOT operation, because $|b_1\rangle|0\rangle \xrightarrow{\mathrm{CNOT}} |b_1\rangle|b_1 \oplus 0\rangle = |b_1 b_1\rangle$. However, cloning an arbitrary quantum state, akin to how FANOUT performs on $|0\rangle$ and $|1\rangle$, is not possible. This was first proved in [WZ82] and is proved below.

Theorem 1.6 (The No-Cloning Theorem) There does not exist a quantum operation that maps $|\psi\rangle$ to $|\psi\rangle|\psi\rangle$ for all states $|\psi\rangle$.

Proof: Suppose we have such a quantum operation U. Let $|\psi\rangle$ and $|\phi\rangle$ be any two pure quantum states. By definition,

$|\psi\rangle|0\rangle \xrightarrow{U} |\psi\rangle|\psi\rangle, \qquad |\phi\rangle|0\rangle \xrightarrow{U} |\phi\rangle|\phi\rangle,$

and

$(|\psi\rangle + |\phi\rangle)|0\rangle \xrightarrow{U} (|\psi\rangle + |\phi\rangle)(|\psi\rangle + |\phi\rangle) = |\psi\rangle|\psi\rangle + |\psi\rangle|\phi\rangle + |\phi\rangle|\psi\rangle + |\phi\rangle|\phi\rangle. \quad(1.9)$

But by linearity,

$(|\psi\rangle + |\phi\rangle)|0\rangle \xrightarrow{U} |\psi\rangle|\psi\rangle + |\phi\rangle|\phi\rangle. \quad(1.10)$
But (1.9) and (1.10) differ in general. □

Finally, we can observe the state of a classical computer at any time without any side-effects to our simulation of a classical computer on a quantum computer. This is because we are only using the computational basis states: we may measure a computational basis state in the computational basis with certainty and without altering its state. Therefore, we can simulate a classical computer on a quantum computer, as claimed.

1.4 Circuit Complexity

In this section, we informally discuss classical and quantum complexity. It is assumed that the reader has basic knowledge of classical complexity theory.

One way of measuring the complexity of a classical algorithm is circuit complexity, where we assume that a set of circuits $(C_1, C_2, \dots)$ performs a certain algorithm. To account for varying input sizes, for $n = 1, 2, \dots$, the circuit $C_n$ has an n-bit input string [Pap94, pp. 267–268]. We say that a set of circuits $(C_1, C_2, \dots)$ is in $O(f(n))$ when there exist positive constants c and $n_0$ such that the circuit $C_n$ requires at most $c \cdot f(n)$ logical operations from some set of universal logical operations for all $n \geq n_0$ [CLR99, page 26].

As noted previously, the logical operations AND, OR, NOT, and FANOUT form a universal set of logical operations. Any classical algorithm can be reduced to a series of operations in this set. In the quantum scenario, the CNOT operation combined with all 2 × 2 unitary operations forms a universal set of quantum operations, such that any quantum algorithm can be described as a series of these operations [NC00, pp. 191–193]. The complexity of a quantum circuit is analogous to classical circuit complexity, where we count the number of quantum operations from a universal set
of quantum operations, rather than a set of universal logical operations, to bound the complexity of a quantum circuit.

1.5 The Density Operator Formalism

Measurements in quantum computing induce probability distributions, so it seems natural to consider a probabilistic quantum computer. We describe a probabilistic quantum computer in the density operator formalism. We have taken great steps to minimize the use of density operators and maximize the use of pure states, because pure states are typically easier to understand. However, there are some occasions in this thesis where we must use density operators. The following is a brief summary of the density operator formalism, which is enough to understand its use in this thesis. For more information on the density operator formalism, see [NC00, pp. 98–108].

1.5.1 Mixed States

Previously, we described a quantum system as a certain pure state $|\psi\rangle \in H^{\otimes n}$. Now suppose that a quantum system is in one of a number of probable pure states $|\psi_1\rangle, |\psi_2\rangle, \dots, |\psi_n\rangle \in H^{\otimes n}$, where the probability that the quantum system is in the state $|\psi_x\rangle$ is $p(x)$. We describe such a state as a density operator, or mixed state,

$\rho = \sum_x p(x)\,|\psi_x\rangle\langle\psi_x|.$

For example, recall that (1.6) described a quantum system as being in the state $\frac{\alpha|00\rangle+\beta|01\rangle}{\sqrt{|\alpha|^2+|\beta|^2}}$ with probability $|\alpha|^2+|\beta|^2$, and in the state $\frac{\delta|10\rangle+\gamma|11\rangle}{\sqrt{|\delta|^2+|\gamma|^2}}$ with probability $|\delta|^2+|\gamma|^2$. Using the mixed state formalism, we represent this probabilistic mixture as

$\rho = (\alpha|00\rangle + \beta|01\rangle)(\bar{\alpha}\langle 00| + \bar{\beta}\langle 01|) + (\delta|10\rangle + \gamma|11\rangle)(\bar{\delta}\langle 10| + \bar{\gamma}\langle 11|).$
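The following short sketch (ours; the amplitudes are chosen arbitrarily) builds this ρ numerically and checks that it has unit trace, as every density operator must.

    import numpy as np

    def ket(bits):
        """Computational basis ket |bits> as a column vector."""
        v = np.zeros((2 ** len(bits), 1), dtype=complex)
        v[int(bits, 2)] = 1
        return v

    alpha, beta, gamma, delta = 0.5, 0.5, 0.5, 0.5

    # The two (unnormalized) branches from equation (1.6); the branch
    # probabilities are absorbed into the vectors' norms, exactly as in
    # the expression for rho above.
    branch0 = alpha * ket("00") + beta * ket("01")
    branch1 = delta * ket("10") + gamma * ket("11")

    rho = branch0 @ branch0.conj().T + branch1 @ branch1.conj().T

    assert np.isclose(np.trace(rho).real, 1.0)  # density operators have trace 1
    print(rho.real)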
1.5.2 Quantum Evolution

Previously, we described quantum evolution as a unitary operation U mapping $|\psi\rangle \in H^{\otimes n}$ to $U|\psi\rangle \in H^{\otimes n}$. Quantum evolution is still described by a unitary operation U. Namely, if a system was in the state $|\psi_x\rangle$ with probability $p(x)$, then after applying U, the system will be in the state $U|\psi_x\rangle$ with probability $p(x)$. Using the density operator formalism, we describe quantum evolution as

$\rho = \sum_x p(x)|\psi_x\rangle\langle\psi_x| \xrightarrow{U} \sum_x p(x)(U|\psi_x\rangle)(\langle\psi_x|U^\dagger) = \sum_x p(x)U|\psi_x\rangle\langle\psi_x|U^\dagger = U\Big(\sum_x p(x)|\psi_x\rangle\langle\psi_x|\Big)U^\dagger = U\rho U^\dagger.$

1.5.3 Measurement

Quantum measurements are still described by a collection $\{M_0, \dots, M_m\}$ of linear operations. Recall that in the pure state formalism, given that the initial state was the pure state $|\psi\rangle$, the probability that we measure $y = 0, \dots, m$ is $\langle\psi|M_y^\dagger M_y|\psi\rangle$, and the state immediately after measurement is defined to be

$\frac{M_y|\psi\rangle}{\sqrt{\langle\psi|M_y^\dagger M_y|\psi\rangle}}.$
This implies that, using the mixed state formalism, the probability that we measure $y = 0, 1, \dots, m$ is

$\sum_x p(x)\langle\psi_x|M_y^\dagger M_y|\psi_x\rangle = \sum_x p(x)\cdot\mathrm{tr}(|\psi_x\rangle\langle\psi_x|M_y^\dagger M_y) = \mathrm{tr}\Big(\sum_x p(x)|\psi_x\rangle\langle\psi_x|M_y^\dagger M_y\Big) = \mathrm{tr}(\rho M_y^\dagger M_y),$

where $\mathrm{tr}(\cdot)$ is the trace of a matrix. It is also possible to derive that the state after the measurement is

$\rho_y = \frac{M_y \rho M_y^\dagger}{\mathrm{tr}(\rho M_y^\dagger M_y)}. \quad(1.11)$

We omit this derivation here (see [NC00, pp. 99–100] for the derivation).

1.5.4 Fidelity

At times, it is necessary to see how "close" a mixed state is to some pure state. We quantify this with the notion of fidelity, defined below.

Definition 1.7 Let $|\psi\rangle$ be a pure state and ρ a mixed state. The fidelity between $|\psi\rangle$ and ρ is

$F(|\psi\rangle, \rho) = \sqrt{\langle\psi|\rho|\psi\rangle}.$

The above definition is a specialized version of fidelity. See [NC00, pp. 409–415] for more information. If $\rho = |\phi\rangle\langle\phi|$ (i.e., we are certain that the quantum system is in the pure state $|\phi\rangle$), then

$F(|\psi\rangle, \rho) = \sqrt{\langle\psi|\rho|\psi\rangle} = \sqrt{\langle\psi|\phi\rangle\langle\phi|\psi\rangle} = |\langle\psi|\phi\rangle|.$
So the fidelity between two pure states is simply the absolute value of the inner product.

1.6 Entanglement

Entanglement is an astonishing property that quantum physical systems can acquire which cannot be represented classically. The Bell states, also known as EPR pairs after Einstein, Podolsky and Rosen [EPR35], best demonstrate entanglement. They are the states

$|\beta_{0,0}\rangle \stackrel{\text{def}}{=} \frac{|00\rangle + |11\rangle}{\sqrt{2}},\quad |\beta_{0,1}\rangle \stackrel{\text{def}}{=} \frac{|01\rangle + |10\rangle}{\sqrt{2}},\quad |\beta_{1,0}\rangle \stackrel{\text{def}}{=} \frac{|00\rangle - |11\rangle}{\sqrt{2}},\quad\text{and}\quad |\beta_{1,1}\rangle \stackrel{\text{def}}{=} \frac{|01\rangle - |10\rangle}{\sqrt{2}}.$

Note that they cannot be written as a tensor product state of two qubits. For instance,

$\frac{|00\rangle + |11\rangle}{\sqrt{2}} \neq |\psi\rangle \otimes |\phi\rangle$

for any $|\psi\rangle, |\phi\rangle \in H$. This property is called entanglement.

Measuring the Bell states results in a surprising outcome. Consider measuring $|\beta_{0,0}\rangle$ or $|\beta_{1,0}\rangle$ in the computational basis. Measuring the first qubit of $|\beta_{0,0}\rangle$ or $|\beta_{1,0}\rangle$ innocuously results in observing 0 with probability 1/2 and 1 with probability 1/2. However, when one measures the second qubit at the same time or any time after the first measurement, the outcome is identical to the first measurement. Measuring $|\beta_{0,1}\rangle$ or $|\beta_{1,1}\rangle$ results in anti-correlated measurements. We may consider these states as distributed random number generators, because if two parties share a Bell state
and require an identical and perfectly random bit shared between them, then they simply measure their halves of the Bell state.

Figure 1.12 illustrates a quantum circuit generating the Bell states.

Figure 1.12: A quantum circuit generating the Bell basis states: starting from $|0\rangle|0\rangle$, apply $H \otimes I$, then a CNOT, then $I \otimes Z^z$ and $I \otimes X^x$ to obtain $|\beta_{z,x}\rangle$.

Let us quickly follow the circuit, starting with

$|00\rangle \xrightarrow{H\otimes I} |+\rangle|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle = \frac{|00\rangle + |10\rangle}{\sqrt{2}} \xrightarrow{\mathrm{CNOT}} \frac{|00\rangle + |11\rangle}{\sqrt{2}}.$

Next, we perform the operation $I \otimes Z^z$. Note that when z = 1, we perform the operation $I \otimes Z^1 = I \otimes Z$, which changes the phase when the second qubit is $|1\rangle$; when z = 0, we perform the operation $I \otimes Z^0 = I \otimes I$, which does not change the state at all. Thus,

$\frac{|00\rangle + |11\rangle}{\sqrt{2}} \xrightarrow{I\otimes Z^z} \frac{|00\rangle + (-1)^z|11\rangle}{\sqrt{2}}.$

Similarly, performing $I \otimes X^x$ results in

$\frac{|00\rangle + (-1)^z|11\rangle}{\sqrt{2}} \xrightarrow{I\otimes X^x} \frac{|0,x\rangle + (-1)^z|1,\neg x\rangle}{\sqrt{2}} = |\beta_{z,x}\rangle.$

Finally, note that the Bell states form an orthonormal basis in $H^{\otimes 2}$. In Chapter 5, it will be useful to describe any state in $H^{\otimes 2n}$ as a superposition of n Bell states.
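The derivation above can be replayed numerically. The sketch below (our own illustration, not part of the thesis) follows Figure 1.12 gate by gate and prints all four Bell states.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I2 = np.eye(2)
    X = np.array([[0., 1.], [1., 0.]])
    Z = np.array([[1., 0.], [0., -1.]])
    CNOT = np.array([[1., 0., 0., 0.],
                     [0., 1., 0., 0.],
                     [0., 0., 0., 1.],
                     [0., 0., 1., 0.]])

    def bell(z, x):
        """|00> -> (H (x) I) -> CNOT -> (I (x) Z^z) -> (I (x) X^x), per Figure 1.12."""
        state = np.array([1., 0., 0., 0.])                       # |00>
        state = np.kron(H, I2) @ state
        state = CNOT @ state
        state = np.kron(I2, np.linalg.matrix_power(Z, z)) @ state
        state = np.kron(I2, np.linalg.matrix_power(X, x)) @ state
        return state

    for z in (0, 1):
        for x in (0, 1):
            print((z, x), np.round(bell(z, x) * np.sqrt(2)))
    # (0,0): |00>+|11>,  (0,1): |01>+|10>,  (1,0): |00>-|11>,  (1,1): |01>-|10>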
1.6.1 The Partial Trace

Sometimes we have some state in $H^{\otimes n}$ and wish to disregard the last m qubits. Unfortunately, the last m qubits may be entangled with the first n − m qubits, so we cannot represent them as a product of two states. We use the partial trace to represent one or more qubits entangled with a larger space, without discussing the larger space.

For instance, say we had the Bell state $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$ and wanted to describe the first qubit alone. What would the state of the first qubit be? Consider the case where we threw the second qubit into the trash, where it inadvertently got measured in the computational basis. Now, our first qubit is no longer entangled with the qubit in the trash, because it was measured. Also, our first qubit is identical to the ignored qubit in the trash, again, because the two were entangled. Thus, without looking at our first qubit, we know it is $|0\rangle$ with probability 1/2 and $|1\rangle$ with probability 1/2. Namely, it is the mixed state

$\frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2}I,$

which is a random qubit called the completely mixed state.

On the other hand, say we had the pure state $|00\rangle$. Since we can represent it as a product of two states (i.e., $|00\rangle = |0\rangle \otimes |0\rangle$), if we threw away the second qubit, then the first qubit will keep the state $|0\rangle$, as long as the second qubit does not jump out of the trash to reunite with the first qubit.

To adequately represent a quantum subsystem in $H^{\otimes(n-m)}$ of a larger quantum system in $H^{\otimes n}$ that cannot be represented as a product of two states $|\phi\rangle \otimes |\psi\rangle$, where $|\phi\rangle \in H^{\otimes(n-m)}$ and $|\psi\rangle \in H^{\otimes m}$, we must trace out the last m qubits using the partial trace. The first n − m qubits are described by the state $\mathrm{tr}_{H^{\otimes m}}\,\rho$, defined below.
Definition 1.8 Let ρ be a mixed state representing an ensemble of pure states in $H^{\otimes n}$. Then the partial trace of ρ is

$\mathrm{tr}_{H^{\otimes m}}\,\rho = \sum_{i=1}^{2^m} (I \otimes \langle\phi_i|)\,\rho\,(I \otimes |\phi_i\rangle),$

where $|\phi_1\rangle, \dots, |\phi_{2^m}\rangle$ is a set of orthogonal basis vectors in $H^{\otimes m}$.

The partial trace is represented as a "trash can" in quantum circuit diagrams, such as in Figure 1.13; however, it should not be considered a quantum operation. Rather, consider tracing out a qubit as safely ignoring it.

Figure 1.13: A quantum circuit representing the partial trace, mapping ρ to $\mathrm{tr}_{H^{\otimes m}}\,\rho$.
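Definition 1.8 reduces to a reshape-and-trace computation on density matrices. The sketch below (ours; trace_out_last is a hypothetical helper) reproduces the two examples discussed above: tracing out half of a Bell state yields the completely mixed state, while tracing out half of $|00\rangle$ leaves $|0\rangle\langle 0|$.

    import numpy as np

    def trace_out_last(rho, m):
        """Partial trace over the last m qubits of a density matrix rho,
        following Definition 1.8 with the computational basis for H^{(x)m}."""
        n = int(np.log2(rho.shape[0]))
        keep, drop = 2 ** (n - m), 2 ** m
        # Reshape to (keep, drop, keep, drop) and sum over the dropped indices.
        return rho.reshape(keep, drop, keep, drop).trace(axis1=1, axis2=3)

    # Tracing out half of the Bell state (|00>+|11>)/sqrt(2) ...
    bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
    print(trace_out_last(np.outer(bell, bell.conj()), 1))
    # ... yields the completely mixed state I/2: [[0.5 0. ] [0.  0.5]]

    # Tracing out half of the product state |00> leaves the pure state |0><0|.
    prod = np.array([1., 0., 0., 0.])
    print(trace_out_last(np.outer(prod, prod), 1))   # [[1. 0.] [0. 0.]]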
Chapter 2
Information Theory and Cryptography

2.1 Information Theory

Information theory is the mathematical formalism used to quantify the amount of randomness versus the amount of useful data within one random variable, or shared amongst many random variables. Both [McE77] and [MS77] are excellent resources on this topic. In fact, the pedagogy and proofs in this section closely follow [McE77, pp. 15–26].

Probability theory is the foundation of information theory. As such, we use the following definitions and theorems from probability theory:

• Let A be a discrete random variable taking on the values $a \in \{0,1\}^n$ with probability Pr[A = a]. Let $p(a) = \Pr[A = a]$ be the probability mass function of A. Likewise, let B be a discrete random variable taking on the values $b \in \{0,1\}^n$ with respect to the probability mass function $p(b) = \Pr[B = b]$, and let E be a discrete random variable taking on the values $e \in \{0,1\}^m$ with respect to the probability mass function $p(e) = \Pr[E = e]$.

• For $a \in \{0,1\}^n$ and $b \in \{0,1\}^n$, define

$p(a, b) = \Pr[A = a \text{ and } B = b], \qquad p(a|b) = \frac{p(a,b)}{p(b)} \text{ when } p(b) \neq 0.$

So $p(a, b) = p(a|b)\,p(b)$.
• Bayes' Theorem states: if $p(b) > 0$, then

$p(a|b) = \frac{p(a)\,p(b|a)}{p(b)}.$

• Let $x_1, \dots, x_n \in (0,1)$ such that $\sum_{i=1}^n x_i = 1$, and let $y_1, \dots, y_n \in \mathbb{R}_{\geq 0}$. Then Jensen's Inequality for logarithms states:

$\sum_{i=1}^n x_i \log_2 y_i \leq \log_2\Big(\sum_i x_i y_i\Big),$

with equality when $y_1 = \dots = y_n$.

Bayes' Theorem is proved in many elementary probability theory texts, including [WMS96, pp. 62–63]. Among others, Bollobás [Bol90, pp. 3–4] contains a nice proof of Jensen's Inequality.

Our first information theoretic quantity is the binary Shannon entropy, or simply, the Shannon entropy.

Definition 2.1 The binary Shannon entropy of A is defined as

$H(A) = \sum_{a\in\{0,1\}^n} p(a)\log_2\frac{1}{p(a)},$

where we follow the convention that $0\log_2\frac{1}{0} \stackrel{\text{def}}{=} 0$.

The Shannon entropy H(A) can be thought of as quantifying the number of random bits contained in the outcome of A. For instance, let p(a) = 1 for some chosen a; then there is no randomness in A, because we will only ever observe the chosen a, and we can intuitively assume H(A) = 0. At the other extreme, say $p(a) = \frac{1}{2^n}$ for all a; then each outcome occurs with equal probability and, intuitively, every bit we observe will be random. These intuitions are proved in the next lemma.

Lemma 2.2 $H(A) \geq 0$, with equality when $p(a) = 1$ for some $a \in \{0,1\}^n$; and $H(A) \leq n$, with equality when $p(a) = \frac{1}{2^n}$ for all $a \in \{0,1\}^n$.
Proof: Since $p(a) \leq 1$ for all a, we have $p(a)\log_2\frac{1}{p(a)} \geq 0$ for all a with $p(a) \neq 0$. Furthermore, $p(a)\log_2\frac{1}{p(a)} = 0$ if and only if $p(a) = 1$ or $p(a) = 0$. So H(A) = 0 if and only if $p(a) = 1$ for some a. By Jensen's Inequality for logarithms,

$H(A) = \sum_{a\in\{0,1\}^n} p(a)\log_2\frac{1}{p(a)} \leq \log_2\Big(\sum_{a\in\{0,1\}^n} p(a)\frac{1}{p(a)}\Big) = n,$

with equality when $p(a) = \frac{1}{2^n}$ for all $a \in \{0,1\}^n$. □

Another measure of information is called mutual information, which quantifies the amount of information A provides about B. Mutual information is defined below.

Definition 2.3 The mutual information between A and B is

$I(A;B) = \sum_{a,b} p(a,b)\log_2\frac{p(b|a)}{p(b)}.$

Two useful points regarding mutual information are: since $p(b,a) = p(b|a)p(a)$ for $p(a), p(b) > 0$, we have $I(A;B) = \sum_{a,b} p(a,b)\log_2\frac{p(a,b)}{p(a)p(b)}$; and, by Bayes' Theorem, $I(A;B) = I(B;A)$.

If we have three random variables, we can derive the amount of mutual information that E provides about A and B as

$I(A,B;E) = \sum_{a,b,e} p(a,b,e)\log_2\frac{p(e|a,b)}{p(e)},$

where $p(e|a,b) = \frac{p(a,b,e)}{p(a,b)} = \frac{\Pr[A=a \text{ and } B=b \text{ and } E=e]}{\Pr[A=a \text{ and } B=b]}$.
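These quantities are simple to compute for explicit distributions. The following sketch (our own, not from [McE77]) implements H(A) and I(A;B) directly from the definitions.

    import numpy as np

    def shannon_entropy(p):
        """H(A) = sum_a p(a) log2(1/p(a)), with the convention 0 log2(1/0) = 0."""
        p = np.asarray(p, dtype=float)
        nz = p[p > 0]
        return float(np.sum(nz * np.log2(1.0 / nz)))

    def mutual_information(joint):
        """I(A;B) = sum_{a,b} p(a,b) log2( p(a,b) / (p(a)p(b)) ), joint pmf input."""
        joint = np.asarray(joint, dtype=float)
        pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
        pb = joint.sum(axis=0, keepdims=True)   # marginal p(b)
        mask = joint > 0
        return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])))

    print(shannon_entropy([1.0, 0.0]))        # 0.0: no randomness
    print(shannon_entropy([0.25] * 4))        # 2.0: two uniformly random bits
    print(mutual_information([[0.5, 0.0],     # A = B uniformly: I(A;B) = 1 bit
                              [0.0, 0.5]]))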
2.2 Quantum Information Theory

In this section, we motivate and define a quantum version of the Shannon entropy. Let X be a random variable taking on the value $x \in \{0,1\}^n$ with probability $p(x)$, and let $\rho_1, \dots, \rho_n$ be mixed states. Given a mixed state $\rho = \sum_{x=1}^n p(x)\rho_x$, we can ask: "how much randomness is in ρ?" For instance, how certain are we of the outcome of measuring ρ? If $\rho = |0\rangle\langle 0|$, then there is no randomness or uncertainty, because the outcome is always 0 upon measuring it in the computational basis. However, if $\rho = \frac{1}{n}\sum_{x=1}^n |x\rangle\langle x|$, then measuring ρ will output $\log_2 n$ random bits. Hence, it seems appropriate to define an uncertainty quantity for mixed states—the entropy of ρ. We call this entropy the von Neumann entropy, defined below.

Definition 2.4 Let ρ be a mixed state with eigenvalues $\lambda_1, \dots, \lambda_n$. The von Neumann entropy of ρ is

$S(\rho) = \sum_{x=1}^n \lambda_x \log_2\frac{1}{\lambda_x}.$

The above definition is from [NC00, page 510]. Also, [NC00, Theorem 2.5, pp. 101–102] proves that $\mathrm{tr}(\rho) = 1$ and that ρ is a positive operator, so $S(\rho) \geq 0$.

The weak Holevo Bound identifies a relationship between quantum and classical information. It is stated below without proof. See the original work of Holevo [Hol73]; alternatively, see [Wat03, Lectures 14–17] or [NC00, pp. 531–534] for more understandable proofs.

Theorem 2.5 (The Weak Holevo Bound) Suppose Alice prepares a state $\rho_a$ where A = a with probability $p(a)$, and gives Eve the state $\rho = \sum_a p(a)\rho_a$. Eve performs a quantum measurement described by the elements $\{M_0, \dots, M_m\}$ on ρ. Eve's measurement outcome is represented by the random variable E taking on the values $0, \dots, m$. Then for any such measurement $\{M_0, \dots, M_m\}$ done by Eve,

$I(A;E) \leq S(\rho).$
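Since S(ρ) depends only on the eigenvalues of ρ, it is a one-line computation for explicit density matrices; the sketch below (ours, not from the thesis) checks the two motivating examples.

    import numpy as np

    def von_neumann_entropy(rho):
        """S(rho) = sum_x lambda_x log2(1/lambda_x) over rho's eigenvalues."""
        eigs = np.linalg.eigvalsh(rho)        # eigenvalues of a Hermitian matrix
        eigs = eigs[eigs > 1e-12]             # convention: 0 log2(1/0) = 0
        return float(np.sum(eigs * np.log2(1.0 / eigs)))

    pure = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
    mixed = np.eye(2) / 2                              # completely mixed state I/2

    print(von_neumann_entropy(pure))    # 0.0: a pure state carries no uncertainty
    print(von_neumann_entropy(mixed))   # 1.0: one fully random bit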
2.3 Cryptography

Cryptography is the mathematical study of information security. There are many objectives to information security; three very important ones are quoted below from [MvOV97, page 3]:

Privacy: "keeping information secret from all but those who are authorized to see it,"

Entity Authentication: "corroboration of the identity of an entity (e.g. a person, a computer terminal, etc.)," and

Data Integrity: "ensuring information has not been altered by unauthorized or unknown means."

Privacy is the core focus of this thesis. We define a private channel as a means for two parties to communicate with privacy. Typically, a private channel is realized by a protocol utilizing an existing insecure public channel: a means of communication whereby anyone can observe all communications.

Entity authentication is another very important objective in information security, deserving far more discussion than given in this thesis. We do not discuss entity authentication here, and assume that our three main entities, Alice, Bob, and Eve, are always authenticated when communicating over a channel exchanging classical information.

Data integrity is discussed to some extent in this thesis. Chapters 3 and 4 are dedicated to data integrity; however, they do not treat it in a cryptographic sense. Only in Chapter 6 do we use the data integrity techniques developed in the previous chapters for cryptographic data integrity.

For more information on the many aspects of cryptography, refer to the texts [MvOV97] and [Sti95]. Unfortunately, there are many equivalent definitions in
classical¹ and quantum cryptography under different names. We tend to follow the definitions used in quantum cryptography, while noting the equivalent definitions in classical cryptography.

2.3.1 Creating a Private Channel

The process of creating a private channel is best described as a game among three entities we name Alice, Bob, and Eve. Alice's and Bob's goal is for Alice to communicate a message to Bob in private, whereby Eve, "the adversary," gains no knowledge of the private message. Eve's goal is to acquire as much information about the message as possible. To play the game, Alice and Bob use a symmetric cryptosystem, equivalently known as a cipher, in conjunction with a key distribution protocol, also known as a key establishment protocol.

Definition 2.6 A symmetric cryptosystem is a five-tuple (K, M, C, E, D) consisting of five nonempty finite sets: a key space K, a message space M, a ciphertext space C, an encryption function space $E = \{e_k : M \to C \mid k \in K\}$, and a decryption function space $D = \{d_k : C \to M \mid k \in K\}$. For each k ∈ K, there is an encryption/decryption function pair $(e_k, d_k)$ satisfying the property that for all messages m ∈ M, $d_k(e_k(m)) = m$. The elements of K are called secret keys, the elements of M messages, and the elements of C ciphertexts [Sti95, adapted from Definition 1.1].

Consider applying a symmetric cryptosystem to the game of creating a private channel. If we allow Alice and Bob to communicate in private before playing the game, then Alice and Bob may exchange a secret k ∈ K in private prior to starting.

¹ "Classical cryptography" is short for cryptography based on classical physics. However, some literature defines classical cryptography as cryptography used prior to the end of World War II.
Upon starting the game, Alice chooses a message m, encrypts the message to the ciphertext $c = e_k(m)$, and sends the ciphertext c to Bob. By Definition 2.6, Bob may decrypt c to the original message m by applying $d_k(c) = d_k(e_k(m)) = m$.

The success or failure of the game rests on Eve's shoulders. Since k was exchanged in private before the game began, Eve only observes c. Eve wins the game, or equivalently breaks the cipher, if she extracts a significant amount (to be specified shortly) of m from c. Otherwise, the cipher is said to be secure. To make the key as difficult as possible to guess, Alice and Bob attempt to choose the secret key k ∈ K uniformly at random. We will investigate the probability distributions of the key space, message space, and ciphertext space shortly.

There are many different forms of security. For instance, if Eve cannot break the cipher before some predetermined amount of time, then we say the cipher is computationally secure. Adding such constraints only makes the definition of security weaker; in this thesis, we define and use only the strongest definitions of security.

When Alice and Bob do not have the luxury of secretly distributing keys a priori, they must somehow generate keys which are identical, secret, and random in the presence of Eve. To do so, Alice and Bob use a key distribution protocol, which is a multi-party process whereby a shared secret key becomes available to two parties, for subsequent use in a symmetric cryptosystem [MvOV97, page 490]. Key distribution protocols will be discussed shortly, but for now, let us continue to focus our attention on symmetric cryptosystems.

2.3.2 The One-Time Pad

The Vernam one-time pad cipher, or simply the one-time pad, ensures that Alice and Bob win the private channel game. It is an unconditionally secure symmetric cryptosystem whereby Eve receives absolutely no information about the message from the ciphertext, thus allowing Alice and Bob to communicate in absolute privacy. In this section, we define the one-time pad, define unconditional security, and prove
that the one-time pad is unconditionally secure.

The one-time pad was first published by Gilbert Vernam in [Ver26] and is described in Protocol 2.1.

Protocol 2.1 The Vernam One-Time Pad Cipher
Given: Let $K = M = C = \{0,1\}^n$. Assume Alice and Bob privately exchanged a secret key k ∈ K prior to starting the protocol.
1: Alice encrypts an n-bit message m with $c = e_k(m) = m \oplus k$, and sends c to Bob over an insecure channel.
2: Bob receives c and decrypts with $d_k(c) = c \oplus k$ to regain Alice's message m.

The one-time pad correctly encrypts and decrypts any message, because for any k ∈ K, m ∈ M, and c ∈ C,

$d_k(e_k(m)) = (m \oplus k) \oplus k = m \oplus (k \oplus k) = m \oplus 0 = m.$
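For concreteness, here is a minimal implementation of Protocol 2.1 (our own sketch; the thesis states the protocol over bit strings, and we operate byte-wise, eight bits at a time, purely for convenience).

    import secrets

    def keygen(n):
        """Choose an n-byte key uniformly at random (Alice and Bob share it privately)."""
        return secrets.token_bytes(n)

    def otp(data, key):
        """XOR data with key; the same function encrypts and decrypts since k XOR k = 0."""
        assert len(data) == len(key), "the pad must be as long as the message"
        return bytes(d ^ k for d, k in zip(data, key))

    message = b"BB84"
    key = keygen(len(message))
    ciphertext = otp(message, key)          # Alice: c = m XOR k
    assert otp(ciphertext, key) == message  # Bob:   c XOR k = m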
Previously, we considered K, C, and M as sets. At times, it is appropriate to assign probability distributions to these sets, so consider K, C and M as both sets and random variables. Let K be a random variable taking on the key value k ∈ K with probability Pr[K = k]. If the key is chosen uniformly at random, in order to make it as difficult as possible for Eve to guess, then $\Pr[K = k] = \frac{1}{2^n}$. Let M be a random variable taking on the message value m ∈ M. The a priori probability that the message m occurs is Pr[M = m]. The two random variables K and M induce the probability distribution of the ciphertext. Let C be a random variable taking on the ciphertext c ∈ C with probability Pr[C = c]. We find Pr[C = c] based on the probability distributions of K and M by fixing a k ∈ K and letting $C_k = \{e_k(m) : m \in M\}$ be the set of all possible ciphertexts given that the key was k. Then for every c ∈ C,

$\Pr[C = c] = \sum_{k\in K\,:\,c\in C_k} \Pr[K = k]\cdot\Pr[M = d_k(c)].$

It is also useful to know the probability that the ciphertext c ∈ C was obtained when the message was m ∈ M. This probability is

$\Pr[C = c\,|\,M = m] = \sum_{k\in K\,:\,m = d_k(c)} \Pr[K = k].$

Using this notation, we can now define unconditional security.

Definition 2.7 Let (K, M, C, E, D) be a symmetric cryptosystem. Let m ∈ M be a message chosen by Alice and $c = e_k(m)$. Then (K, M, C, E, D) is unconditionally secure if

$\Pr[M = m\,|\,C = c] = \Pr[M = m]$

for all m ∈ M and c ∈ C.

In plain English, a cipher is unconditionally secure if Eve's chance of guessing the message m is the same with or without knowing the ciphertext c. Therefore, the only data Eve sees, the ciphertext, has absolutely no value! Unconditional security is also known as perfect secrecy, and was defined by Claude Shannon in [Sha49]. In the same paper, Shannon proved that the one-time pad is unconditionally secure.

Theorem 2.8 If each key is chosen with equal likelihood, then the one-time pad is unconditionally secure.

Proof: This proof is based on the proof of a similar cryptosystem in [Sti95, pp. 48–49].
The theorem assumes Alice and Bob choose the key k ∈ K uniformly at random; namely, $\Pr[K = k] = \frac{1}{2^n}$ for all k ∈ K. Our next step is to show that Pr[C = c] = Pr[C = c | M = m] for all m ∈ M and c ∈ C. Let c ∈ C and m ∈ M. First consider Pr[C = c]. Since

$C_k = \{e_k(m) : m \in \{0,1\}^n\} = \{k \oplus m : m \in \{0,1\}^n\} = k \oplus \{0,1\}^n = \{0,1\}^n,$

we can express Pr[C = c] as

$\Pr[C = c] = \sum_{k\in\{0,1\}^n} \Pr[K = k]\cdot\Pr[M = d_k(c)] = \sum_{k\in\{0,1\}^n} \frac{1}{2^n}\cdot\Pr[M = c \oplus k] = \frac{1}{2^n}\sum_{m\in\{0,1\}^n} \Pr[M = m] = \frac{1}{2^n}.$

The third equality is based on the fact that for any fixed c, the map $k \mapsto c \oplus k$ is a permutation on the set $\{0,1\}^n$; thus $\sum_{k\in\{0,1\}^n} \Pr[M = c \oplus k]$ is just $\sum_{m\in\{0,1\}^n} \Pr[M = m]$ with the summands permuted. The last equality above holds because the sum of the probabilities of any probability distribution is 1.

Next, consider Pr[C = c | M = m], which is

$\Pr[C = c\,|\,M = m] = \sum_{k\in K\,:\,m = d_k(c)} \Pr[K = k] = \Pr[K = k] = \frac{1}{2^n}.$
The second equality holds because there is only one key which decrypts a fixed ciphertext to a fixed message. Finally, by Bayes' Theorem,

$\Pr[M = m\,|\,C = c] = \frac{\Pr[M = m]\cdot\Pr[C = c\,|\,M = m]}{\Pr[C = c]} = \frac{\Pr[M = m]\cdot\frac{1}{2^n}}{\frac{1}{2^n}} = \Pr[M = m].$

Therefore, by Definition 2.7, the one-time pad is unconditionally secure. □

2.3.3 The Diffie–Hellman Key Distribution Protocol

If we disallow Alice and Bob to communicate in private before playing the game, then Alice and Bob must use a key distribution protocol to generate a secret key k ∈ K, and then continue using a cipher as before. If Alice and Bob use the key for the one-time pad, then all of Eve's attacks are based on her knowledge gained from the key distribution protocol. In this case, the security of the key distribution protocol is the weakest link in creating a private channel.

The Diffie–Hellman key distribution protocol [DH76] was the first key distribution protocol that did not require a trusted third party. It is described in Protocol 2.2. The Diffie–Hellman key distribution protocol correctly generates identical keys because

$(a')^b \equiv (g^a)^b \equiv (g^b)^a \equiv (b')^a \pmod{p}.$

Let us briefly analyze the security of the Diffie–Hellman key distribution protocol. During the communication, Eve can acquire g, p, a′ and b′.
Protocol 2.2 The Diffie–Hellman key distribution protocol
Given: Alice and Bob publicly select a prime p and a generator g of $\mathbb{Z}_p^*$.
1: Alice randomly chooses a secret $1 \leq a \leq p - 2$, and sends $a' \equiv g^a \pmod{p}$ to Bob.
2: Bob randomly chooses a secret $1 \leq b \leq p - 2$, and sends $b' \equiv g^b \pmod{p}$ to Alice.
3: Upon receiving a′, Bob computes the key $k \equiv (a')^b \pmod{p}$, $0 \leq k < p$.
4: Upon receiving b′, Alice computes the key $k \equiv (b')^a \pmod{p}$, $0 \leq k < p$.

One possible attack is for Eve to find $a \equiv \log_g a' \pmod{p}$ and calculate $k \equiv (b')^a \pmod{p}$. Equivalently, Eve may find $b \equiv \log_g b' \pmod{p}$ and calculate $k \equiv (a')^b \pmod{p}$. The problem of finding a from a′ is called the discrete logarithm problem. With current technology, the discrete logarithm problem seems quite difficult to solve. The fastest known classical discrete logarithm solver is the number field sieve, requiring a staggering number of operations, specifically $2^{O(n^{1/3}\log^{2/3} n)}$ operations, where $n = \log_2 p$ (see [Gor93] and [Sch00]). Using current technology, it is possible to compute the discrete logarithm of numbers with a 399-bit modulus [JL02]. Besides a brute-force search, solving the discrete logarithm problem is the only known classical attack on Diffie–Hellman.
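A toy run of Protocol 2.2 is shown below (our own sketch with deliberately small, hypothetical parameters; a real deployment would use a modulus of thousands of bits and a verified generator g).

    import secrets

    p, g = 2_147_483_647, 7   # hypothetical small prime and base, for illustration

    a = secrets.randbelow(p - 2) + 1        # Alice's secret, 1 <= a <= p-2
    b = secrets.randbelow(p - 2) + 1        # Bob's secret,   1 <= b <= p-2

    a_pub = pow(g, a, p)                    # a' = g^a mod p, sent to Bob
    b_pub = pow(g, b, p)                    # b' = g^b mod p, sent to Alice

    k_alice = pow(b_pub, a, p)              # (b')^a mod p
    k_bob = pow(a_pub, b, p)                # (a')^b mod p
    assert k_alice == k_bob                 # both equal g^(ab) mod p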
Therefore, it is necessary to design provably secure key distribution protocols under strict security definitions. A few such key distribution protocols exist, one of which is the BB84 quantum key distribution (QKD) protocol.

2.3.4 The BB84 Quantum Key Distribution Protocol

The BB84 QKD was developed by Charles Bennett and Gilles Brassard in 1984; the acronym is simply based on its citation [BB84]. The original BB84 protocol is described in Protocol 2.3 with a slight modification². We provide an example of BB84 in which no errors occurred in Figure 2.1.

Protocol 2.3 The Original BB84 QKD
1: Alice chooses two random n-bit strings r, b, creates the state |ψ⟩ = (H^b_1 ⊗ · · · ⊗ H^b_n)|r⟩ = H^(A)|r⟩, and sends |ψ⟩ to Bob.
2: Bob chooses a random n-bit string b', receives a noisy version of |ψ⟩, applies H^(B) = H^b'_1 ⊗ · · · ⊗ H^b'_n to |ψ⟩, and measures the qubits in the computational basis to form the bit string r'.
3: Alice and Bob publicly disclose b and b'. If b_i ≠ b'_i, then they discard the ith bit of r and r'.
4: Alice and Bob publicly decide on a random permutation P and apply P to their respective remaining bits. The first half of Alice's permuted bits are her test bits t, and the last half are her key bits k. The first half of Bob's permuted bits are his test bits t', and the last half are his key bits k'.
5: Alice and Bob publicly disclose t and t'. If t = t', then they assume k = k' and that it is a secure key. Otherwise, they abort.

For over 10 years, BB84 was presumed to be secure because of the No-Cloning Theorem (Theorem 1.6): since Eve cannot perfectly copy the qubits sent to Bob, there is little chance she shares the same result as Bob.

² The original BB84 protocol randomly selected test bits, rather than randomly permuting the bits and selecting the first half as test bits.
1: Alice randomly chooses r = [01010101] and b = [00110011], calculates (H^b_1 ⊗ · · · ⊗ H^b_n)|r⟩ resulting in |ψ⟩ = |01+−01+−⟩, and sends |ψ⟩ to Bob.

2: Bob receives |ψ⟩, chooses b' = [01111101], applies H^b'_1 ⊗ · · · ⊗ H^b'_n to |ψ⟩ to get |ψ'⟩ = |0−01+−+1⟩, and measures |ψ'⟩ to get r' = [00011101].

3: Alice and Bob exchange b and b', keeping only the bits where b_i = b'_i. Alice now has the bit string r = [0011], and Bob now has the bit string r' = [0011].

4: Alice and Bob decide on a random permutation matrix

P = [1 0 0 0]
    [0 0 0 1]
    [0 1 0 0]
    [0 0 1 0]

Alice applies P to r: rP = [t k] = [0110], and Bob applies P to r': r'P = [t' k'] = [0110].

5: Alice and Bob disclose t and t'. Since t = t', they assume k = k' and that k is secret.

Measuring |+⟩ or |−⟩ in the computational basis results in observing a uniformly random bit; in this example, bits 2, 5, 6 and 7 of r' arose from such random observations.

Figure 2.1: An example of the original BB84 QKD where no errors occurred.
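The bookkeeping of Figure 2.1 is easy to simulate classically. The following Python sketch is ours, not from the thesis; it runs Protocol 2.3 on a noiseless channel with no eavesdropper, modeling a mismatched-basis measurement as a uniformly random bit.

```python
import random

n = 8
r = [random.randint(0, 1) for _ in range(n)]   # Alice's raw bits
b = [random.randint(0, 1) for _ in range(n)]   # Alice's bases
b2 = [random.randint(0, 1) for _ in range(n)]  # Bob's bases b'

# Matching bases reproduce Alice's bit on a noiseless channel;
# mismatched bases yield a uniformly random measurement result.
r2 = [r[i] if b[i] == b2[i] else random.randint(0, 1) for i in range(n)]

# Step 3: keep only the positions where the bases agree.
keep = [i for i in range(n) if b[i] == b2[i]]
alice, bob = [r[i] for i in keep], [r2[i] for i in keep]

# Step 4: a shared random permutation; first half test bits, last half key bits.
perm = list(range(len(keep)))
random.shuffle(perm)
alice, bob = [alice[i] for i in perm], [bob[i] for i in perm]
half = len(keep) // 2
t, k = alice[:half], alice[half:]
t2, k2 = bob[:half], bob[half:]

# Step 5: with no noise and no eavesdropper, the test and key bits agree.
assert t == t2 and k == k2
```

On a noisy channel or under attack, some kept positions would disagree and the final assertion would fail, which is precisely the event that triggers an abort in step 5.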
Furthermore, measurement implies disturbance, so if Eve is too aggressive, then Bob's test bits will differ from Alice's test bits, forcing Alice and Bob to abort.

We did not comment on the one-time pad or Diffie–Hellman aborting, but these protocols are susceptible to such an attack as well: in either protocol, Eve could simply block or manipulate transmitted messages, forcing Alice and Bob to abort. This is commonly known as a denial of service (DoS) attack. A DoS attack is not a complete attack because it does not allow Alice to transmit a message to Bob; hence Eve has nothing to eavesdrop on.

The original BB84 was also impractical. If any errors occurred, whether from an adversary or from natural noise, the protocol would abort. Since noise is common in nature, the original BB84 will almost always abort in practice. This led to a movement to alter BB84 to make it more practical by adding classical post-processing, such as additional classical computation and/or classical communication. Adding classical post-processing to BB84 has the benefit of increasing its security without increasing the difficulty of implementing it. However, such additions made BB84 more complex and made its security seemingly more difficult to prove.

In 1994, Mayers posted a preprint, later published as [May01], which was the first proof that the original BB84 and a practical BB84 are secure. Unfortunately, [May01] was understood by few, and many still questioned the security of BB84. In 2000, Shor and Preskill found a "simple" proof of BB84's security [SP00] based on Mayers' unique use of error correcting codes. This thesis proves the security of a practical BB84 using the Shor–Preskill style introduced in [SP00], and provides all necessary background material to understand the Shor–Preskill proof.
2.3.5 On the Security of Key Distribution Protocols

In this section, we define a strict security model for key distribution protocols and discuss its application to Diffie–Hellman. In the context of this thesis, we must define two types of attacks: a passive attack and an active attack. Eve performs a passive attack when she does not alter or interrupt communication between Alice and Bob. Otherwise, Eve's attack is an active attack.

Definition 2.9 Let 0 ≤ ε < 1, let 0 ≤ δ < 1/2, and let KDP be a key distribution protocol performed between two parties, Alice and Bob, which either aborts or produces output to both Alice and Bob. Let A, B and E be random variables taking on some k-bit value: A and B represent Alice's and Bob's respective outcomes upon performing KDP, and E represents Eve's outcome upon performing any eavesdropping strategy on KDP. Then KDP is (k, ε, δ)-conditionally secure if:

1. Correctness and Privacy: If Eve performs a passive attack, then with probability 1 − δ, Alice and Bob complete the protocol, and there exists a perfectly uniform k-bit string represented by the random variable C (i.e. H(C) = k) such that

Pr[A = B = C] ≥ 1 − ε,   (2.1)

and

I(C; E) ≤ ε.   (2.2)

2. Robustness: If Eve performs an active attack, then with probability 1 − δ, Alice and Bob either abort or complete the protocol satisfying both (2.1) and (2.2).

In information theoretic cryptography, a (k, ε, δ)-conditionally secure protocol is also known as a robust (P_ABE, k, ε, δ)-protocol [MW03, Definition 5], where P_ABE is the joint probability distribution of A, B, and E. In fact, the above definition is adapted from [MW03, Definition 5].
In plain English, Definition 2.9 states that a key distribution protocol is (k, ε, δ)-conditionally secure if, whenever Alice and Bob complete the protocol without aborting, then with high probability they have identical, random keys of which Eve has little knowledge. The value ε bounds both the probability that Alice and Bob fail to create a uniformly random shared key and the amount of information Eve obtains. The value δ bounds the probability that the protocol fails to perform correctly under attack. Thus, the smaller ε and δ get, the more secure and robust the key distribution protocol is.

Conditional security shares some similarity with unconditional security. A (k, 0, δ)-conditionally secure key distribution protocol is said to be unconditionally secure, since

I(C; E) = ∑_{c,e} p(c, e) log_2 (p(c|e)/p(c)) = 0

implies p(c|e)/p(c) = 1, or equivalently, p(c|e) = p(c) for all c, e where p(c) > 0. If one interprets the outcome of a key distribution protocol as a "message" and all public communications as "ciphertext" (i.e. c is the message and e is the ciphertext), then a (k, 0, δ)-conditionally secure key distribution protocol is unconditionally secure as per Definition 2.7. At the other extreme, when no 0 ≤ ε < 1 exists for which a key distribution protocol is (k, ε, δ)-conditionally secure, we say it is information theoretically insecure.

Proposition 2.10 The Diffie–Hellman key distribution protocol is information theoretically insecure.

This proposition seems intuitive because we impose no time limit or physical limit on Eve; we simply assume that Eve must follow the laws of physics. This allows her either to take the time to solve the required discrete logarithm on a
classical computer, or to quickly solve the discrete logarithm on a quantum computer, and subsequently find the key. Hence C = E, so

I(C; E) = ∑_{c,e} p(c, e) log_2 (p(c|e)/p(c)) = ∑_c p(c) log_2 (1/p(c)) = H(C) = k ≥ 1.

The second equality holds because C = E, so p(c, e) = p(c, c) = p(c) and p(c|e) = p(c|c) = 1. See [MW03, Corollary 6] for a formal proof.

Chapter 6 is dedicated to proving that a practical version of BB84 is conditionally secure. Proving so requires a base knowledge of the theory of classical and quantum error correcting codes. We discuss classical error correcting codes in the next chapter.
Chapter 3

Coding Theory

Suppose we wish to send digital information as electrical pulses over a wire. Due to environmental disturbances and physical imperfections in the wire, the transmitter, and the receiver, some 0's may be damaged and mistaken as 1's, and vice versa. We call this scenario a noisy channel or, more formally, a (memoryless) binary symmetric channel, and characterize it by the probability p that a 0 erroneously flips to a 1, or a 1 erroneously flips to a 0. A binary symmetric channel with error probability p is often denoted BSC(p). We assume p < 1/2, because a BSC(1/2) is simply a random number generator and useless for communication; also, if two parties share a BSC(p) where p > 1/2, then the sender may invert the input or the receiver may invert the output to simulate a BSC(1 − p).

[Figure 3.1: A visual representation of a binary symmetric channel with error probability p; each input bit is received unchanged with probability 1 − p and flipped with probability p.]

To send a digital message reliably over a binary symmetric channel, the sender encodes a message with a code into a codeword and then sends the codeword over the binary symmetric channel to the receiver. The receiver decodes the codeword back to the original message.

In this chapter, we introduce the concept of coding theory, beginning with repetition codes, and then rigorously define binary linear codes. This chapter is based on
three excellent books on coding theory, namely [MS77], [McE77], and [Ber74]. The text [Ber74] is a compilation of many republished fundamental papers on coding theory.

3.1 Repetition Codes

To understand the general process of coding, we shall first informally introduce repetition codes by example. A repetition code of length n encodes a single bit into n copies. For example, a repetition code of length 3 has the following encoding:

0 → 000, and 1 → 111.

To decode, the receiver merely takes a majority vote: more 0's typically implies the original message was 0, and more 1's typically implies the original message was 1. If zero or one bit flips occur in the codeword, then majority decoding works correctly. If two or three bit flips occur, then majority decoding will err. In general, a repetition code of length n protects against up to ⌊(n − 1)/2⌋ bit flip errors in the codewords.

The possible correctable errors for the repetition code of length 3 can be represented as the four following error strings: e0 = 000 (i.e. no error), e1 = 100, e2 = 010, and e3 = 001. We represent an erroneous codeword by XORing an error string with the original codeword. For example, the error e2 has the effect
000 → 000 ⊕ 010 = 010,   (3.1)

and

111 → 111 ⊕ 010 = 101.   (3.2)

Upon receiving the erroneous codeword, the receiver decodes by taking the majority vote. In this case, (3.1) would correctly decode to 0, and (3.2) would correctly decode to 1.

3.1.1 Performance of Repetition Codes

So far, we have discussed how to make repetition codes of length n that correct t = ⌊(n − 1)/2⌋ errors. How will these codes perform over a binary symmetric channel with bit flip error probability p?

Consider a binary symmetric channel with bit flip error probability 1/10. Sending one message bit unencoded would be reliably received 90% of the time. If we were to encode the message bit with a repetition code of length 3, then the code could correct up to one bit flip error. We will now calculate the probability that a correctable error occurs. Let Z_2^n = {[x_1 x_2 . . . x_n] | x_i ∈ Z_2}, and let E be a random variable that takes on error vectors¹ in Z_2^3, distributed according to the channel's errors. The probability that no errors occur (i.e. E = [000]) is

Pr[E = [000]] = (1 − 1/10)^3 = 729/1000.

¹ When applicable, we assume that binary strings of length n are equivalent to n-dimensional binary vectors. Namely, {0, 1}^n ≡ Z_2^n.
The probability of one bit flip occurring is Pr[E = [100]] + Pr[E = [010]] + Pr[E = [001]]. Since Pr[E = [100]] = Pr[E = [010]] = Pr[E = [001]],

Pr[E = [100]] + Pr[E = [010]] + Pr[E = [001]] = 3 · Pr[E = [100]] = 3 · (1/10) · (1 − 1/10)^2 = 243/1000.

Thus, the probability that the message can be decoded correctly is

729/1000 + 243/1000 = 972/1000 ≈ 97%,

which is about 7% better than sending a single bit unencoded.

What if a 97% chance of success is not good enough? Let us consider the general case. Let C be a repetition code of length n, and let BSC(p) be a binary symmetric channel with error probability p. We can properly decode using "majority vote" if at most t = ⌊(n − 1)/2⌋ errors occur to the codeword. Recall that Pr[E = [001]] = Pr[E = [010]] = Pr[E = [100]]. This is because they all represent one error, or equivalently, they all represent error vectors of weight 1. The general concept of weight is defined below.

Definition 3.1 Let x = [x_1 x_2 . . . x_n] ∈ Z_2^n be an n-bit binary vector. The (Hamming) weight wt(x) is the number of 1's in the vector, or equivalently,

wt(x) := ∑_{i=1}^{n} x_i.

The number of error vectors of weight b is (n choose b) = n!/((n − b)! b!), the binomial coefficient. The probability that any particular error vector e ∈ Z_2^n occurs over BSC(p) is easily verified to be

p^wt(e) (1 − p)^(n − wt(e)).   (3.3)

Thus, C will reliably send a single-bit message over BSC(p) with probability

∑_{i=0}^{t} (n choose i) p^i (1 − p)^(n − i).   (3.4)
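As a quick numerical check of (3.4), the following Python snippet (ours) shows how fast the success probability approaches 1 at p = 1/10 as the repetition length grows:

```python
from math import comb

def repetition_success(n: int, p: float) -> float:
    """Probability that a length-n repetition code decodes correctly over BSC(p),
    i.e. that at most t = (n-1)//2 bit flips occur; this is formula (3.4)."""
    t = (n - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

for n in (1, 3, 5, 9):
    print(n, round(repetition_success(n, 0.1), 6))
# n = 3 reproduces the 972/1000 computed above; n = 9 already exceeds 99.9%.
```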
So, given some BSC(p), we can choose a repetition code of length n with n large enough that the reliability is to our liking. In §3.3, we show that repetition codes are poor in the sense that, as we approach higher success rates, the codeword length grows without bound relative to the message length. However, there exist binary linear codes with reasonable message-to-codeword length ratios and high success rates.

3.2 Binary Linear Codes

We can mathematically describe repetition codes using linear algebra. Consider the repetition code of length 3, and let the messages, 0 and 1, be the respective vectors [0] and [1]. The messages form the vector space Z_2^1, called the message space. The codewords, [000] and [111], form a subspace of Z_2^3 called the code space, or simply, the code. We will later refer to the code {[000], [111]} as C3.

The difference between the codewords in C3 is quite large, since the two codewords differ in every bit position. We quantify the difference by the distance defined below.

Definition 3.2 Let x = [x_1 . . . x_n], y = [y_1 . . . y_n] ∈ Z_2^n be two n-bit binary vectors. The (Hamming) distance dist(x, y) between x and y is the number of bits where the two binary vectors differ. Namely,

dist(x, y) := ∑_{i=1}^{n} x_i ⊕ y_i.

For example, dist([000], [111]) = 3. Note the connection between distance and weight, namely dist(x, y) = wt(x ⊕ y).
Definition 3.3 Let V be a set of n-bit binary vectors. We define the minimum distance d of V as

d = min_{u,v ∈ V : u ≠ v} dist(u, v).

For example, the minimum distance of C3 is 3. Now we can formally define binary linear codes.

Definition 3.4 An [n, k, d] (binary) linear code C is a k-dimensional subspace of the n-dimensional binary vector space Z_2^n. The value n is the code length, k the code dimension, and d the minimum distance of all codewords in C. The rate of C is k/n.

The message space of an [n, k, d] binary linear code is Z_2^k. Note that the code can be any k-dimensional subspace of Z_2^n, so there exist many [n, k, d] binary linear codes for fixed parameters n, k, and d.

The repetition code of length 3 is a [3, 1, 3] binary linear code. In general, repetition codes of length n are [n, 1, n] binary linear codes.

3.2.1 Encoding

Encoding is a linear function defined by a generator matrix.

Definition 3.5 Let C be an [n, k, d] linear code. A generator matrix G for C is a k × n matrix with row space equal to C.

Note that we can define a code by its generator matrix alone: if G is a matrix with entries in Z_2, its row space is the code generated by G. Let C be an [n, k, d] binary linear code with generator matrix G. To encode a message m ∈ Z_2^k into a codeword c ∈ C, one simply performs the following operation: mG = c.
At times, we use the notation m →G c to represent the operation mG = c. We will regularly use the following three generator matrices as examples:

G1 = [1 0 0 0 0 1 1]
     [0 1 0 0 1 0 1]
     [0 0 1 0 1 1 0]
     [0 0 0 1 1 1 1]   (3.5)

G2 = [0 1 1 1 1 0 0]
     [1 0 1 1 0 1 0]
     [1 1 0 1 0 0 1]   (3.6)

G3 = [1 1 1].

The row space of G1 is a famous binary linear code called the Hamming [7, 4, 3] binary linear code; the row space of G2 is a [7, 3, 4] binary linear code; and the row space of G3 is the [3, 1, 3] binary linear repetition code, since

[0] G3 = [0][111] = [000],
[1] G3 = [1][111] = [111].

Encoding the message m = [m1, m2, m3, m4] with G1 produces the codeword

mG1 = [m1, m2, m3, m4, m2 ⊕ m3 ⊕ m4, m1 ⊕ m3 ⊕ m4, m1 ⊕ m2 ⊕ m4].

The ordering of the elements does not affect the performance of the code when sent over a binary symmetric channel, because each bit will flip with equal probability. This property applies to all linear codes used to transmit codewords over a binary symmetric channel.
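For illustration (this snippet is ours, not from the thesis), encoding with G1 is a single vector-matrix multiplication over Z_2; in Python with NumPy:

```python
import numpy as np

# Generator matrix G1 of the Hamming [7,4,3] code, in row reduced echelon form.
G1 = np.array([[1, 0, 0, 0, 0, 1, 1],
               [0, 1, 0, 0, 1, 0, 1],
               [0, 0, 1, 0, 1, 1, 0],
               [0, 0, 0, 1, 1, 1, 1]])

m = np.array([1, 0, 1, 1])   # a 4-bit message
c = m @ G1 % 2               # codeword c = mG1 over Z_2
print(c)                     # [1 0 1 1 0 1 0]
```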
Also note that row operations on a generator matrix do not change its row space. Row operations on a matrix are interchanging two rows, multiplying one row by a nonzero number, and adding a multiple of one row to a different row [Nic90, page 8]. We combine these two observations to define code equivalence.

Definition 3.6 Two binary linear codes are equivalent if they are generated by two generator matrices such that one generator matrix can be obtained from the other by a sequence of column swaps and row operations.

If possible, we express the generator matrix in row reduced echelon form. Note that (3.5) is expressed in row reduced echelon form.

3.2.2 Nearest Neighbour Decoding

Consider the following scenario: a sender sends the codeword c ∈ C. The codeword experiences bit flip errors as it travels over the binary symmetric channel to become b ∈ Z_2^n. The difference e = b − c is called the error vector, or simply, the error.

A simple but inefficient decoding procedure is to compare b with every codeword in C. We decode b as c', where c' is a codeword such that dist(b, c') is minimal. Any such c' is a most likely choice for the original codeword, since (3.3) shows that fewer errors are more probable than more errors. Namely, if p < 1/2 is the probability that a bit gets flipped, then the probability of any particular error vector of weight i satisfies

p^i (1 − p)^(n − i) ≥ p^(i+1) (1 − p)^(n − (i+1)),

since 1 − p ≥ p; that is, a given weight-i error is at least as likely as a given weight-(i + 1) error.
All decoding methods assume that the erroneous codeword b was originally closest to the codeword c'. We use the term "most likely" when using this assumption. This decoding procedure is called nearest neighbour decoding and is used to prove the following theorem.

Theorem 3.7 (Hamming Bound [Ham50]) An [n, k, d] binary linear code C corrects up to t = ⌊(d − 1)/2⌋ bit flip errors under nearest neighbour decoding.

Proof: Consider the scenario in which c ∈ C is sent over a binary symmetric channel where it acquires at most t errors. The erroneous codeword is received as b = c ⊕ e, where e ∈ Z_2^n is such that wt(e) ≤ t. The receiver performs nearest neighbour decoding, producing the codeword c' ∈ C. We now show that c = c'. By assumption, 2t < d, so

dist(c, b) = wt(c ⊕ b) = wt(c ⊕ (c ⊕ e)) = wt(e) ≤ t,

and by nearest neighbour decoding,

dist(c', b) ≤ dist(c, b) ≤ t.

So

dist(c, c') = wt(c ⊕ c') = wt((c ⊕ b) ⊕ (c' ⊕ b)) ≤ wt(c ⊕ b) + wt(c' ⊕ b) = dist(c, b) + dist(c', b) ≤ 2t < d.
The first inequality is by the triangle inequality, since for any vector v ∈ Z_2^n, wt(v) = ||v||_1 (the l1-norm), and the last inequality holds by our assumption. However, by Definition 3.3, d = min_{c1,c2 ∈ C : c1 ≠ c2} dist(c1, c2), so two distinct codewords are at distance at least d. Thus, c = c'.

We will sometimes refer to an [n, k, d] binary linear code as a t-error correcting binary linear code, where t = ⌊(d − 1)/2⌋. Furthermore, the larger d is, the more errors a code can correct. So to create linear codes that correct more errors, we must increase d, and in turn increase n, since d ≤ n.

To conclude, the nearest neighbour decoding method is sufficient for proving properties of linear codes; however, the algorithm is computationally time consuming: one must search all 2^k codewords to decode, an exponential-time algorithm. Hence we require better methods for decoding. A more useful decoding method uses a special matrix called the parity check matrix.

3.2.3 The Parity Check Matrix

A matrix closely related to the generator matrix is the parity check matrix. The parity check matrix is used to identify errors in codewords and assists in a more useful decoding procedure.

Definition 3.8 Let C be an [n, k, d] binary linear code. A parity check matrix H for C is an (n − k) × n matrix with the property that

Hc^T = 0 ⟺ c ∈ C.

Since C = ker H = {v ∈ Z_2^n | Hv^T = 0}, like the generator matrix, the parity check matrix alone also defines the code C.

Recall that every binary linear code has an equivalent code with a generator
matrix in row reduced echelon form. This is a useful form because if the generator matrix of the code, or its equivalent, is of the form G = [I_k | A], where I_k is a k × k identity matrix and A is a k × (n − k) matrix, then a parity check matrix for the same code is H = [A^T | I_{n−k}]. For example, the parity check matrix H1 of the Hamming [7, 4, 3] binary linear code with generator matrix G1 is derived as follows. From observing (3.5), let

A = [0 1 1]
    [1 0 1]
    [1 1 0]
    [1 1 1]

so

H1 = [0 1 1 1 1 0 0]
     [1 0 1 1 0 1 0]
     [1 1 0 1 0 0 1]

Notice that H1 = G2. This is not a coincidence and will be discussed in §3.3.1 shortly.

3.2.4 Syndrome Decoding

Using a parity check matrix, we can perform a useful decoding procedure called syndrome decoding [McE77, pp. 137-140]. Let c ∈ C be a codeword and e ∈ E = {e ∈ Z_2^n | wt(e) ≤ t} be a correctable error. Given the erroneous codeword b = c ⊕ e, we decode by performing the following three steps:

1. error detection: find the error vector e from b by using the parity check matrix,
2. error correction: remove the error e from b to regain the codeword c, and
3. unencoding: convert the codeword c back to the message m such that c = mG.

On the occasion when e ∉ E, decoding may not correctly extract the original message. In the following exposition, we assume e ∈ E.

Error Detection

We use the parity check matrix to speed up decoding by calculating the syndrome s = Hb^T. The syndrome depends only on the error e ∈ E and not on the codeword, because

s = Hb^T = H(c ⊕ e)^T = Hc^T ⊕ He^T = 0 ⊕ He^T = He^T.

The second last equality follows from Definition 3.8. The set of solutions a ∈ Z_2^n with Ha^T = s forms a coset of C, namely C ⊕ e = {c ⊕ e | c ∈ C}, where e ∈ E. We use syndromes to detect or "diagnose" errors. In fact, for every e ∈ E, there is a unique syndrome identifying e. This is proved below.

Theorem 3.9 Let C be a t-error correcting [n, k, d] binary linear code with parity check matrix H. For every e ∈ E = {e ∈ Z_2^n | wt(e) ≤ t}, where t = ⌊(d − 1)/2⌋, there is a unique syndrome s such that s = He^T.
Before proving the theorem above, we state and prove the following lemma.

Lemma 3.10 Let H be a parity check matrix of an [n, k, d] binary linear code C. Then the following four properties are equivalent:

1. vectors u and w are in the same coset of C,
2. u ⊕ w ∈ C,
3. H(u ⊕ w)^T = 0, and
4. Hu^T = Hw^T.

Proof: Let C ⊕ x be a coset of C, and let u, w ∈ Z_2^n. To show Property 1 implies Property 2, assume u, w ∈ C ⊕ x. So u = u' ⊕ x and w = w' ⊕ x for some u', w' ∈ C. Then u ⊕ w = (u' ⊕ x) ⊕ (w' ⊕ x) = u' ⊕ w' ∈ C. Property 2 implies Property 3 by Definition 3.8. Property 3 implies Property 4 because, by distributivity, 0 = H(u ⊕ w)^T = Hu^T ⊕ Hw^T, which implies Hu^T = Hw^T. Finally, Property 4 implies Property 1 because Hu^T = Hw^T implies H(u ⊕ w)^T = 0, so u ⊕ w ∈ C by Definition 3.8, and hence u ⊕ C = w ⊕ C.

Using Lemma 3.10, we can prove Theorem 3.9.

Proof (of Theorem 3.9): Let e, f ∈ E be two correctable errors with the same syndrome. Consider the distance between e and f:

dist(e, f) = wt(e ⊕ f) ≤ wt(e) + wt(f) ≤ 2t < d.
The first inequality is by the triangle inequality, since the weight is equivalent to the l1-norm in Z_2^n, and the last holds by assumption. However, by Lemma 3.10, e ⊕ f ∈ C; since C contains no nonzero codeword of weight less than d, we must have e ⊕ f = 0, i.e. e = f.

As an example, consider a parity check matrix for C3:

H3 = [1 1 0]
     [1 0 1]

The possible syndromes are [00]^T, [01]^T, [10]^T, and [11]^T, associated to the correctable error vectors [000], [001], [010], and [100], respectively.

In summary, using the parity check matrix, we perform error detection by calculating the syndrome of an erroneous codeword. Assuming that the error vector has at most t bit flips, the unique error vector is determined from the syndrome. In practice, we can find it by pre-computing all syndrome/error pairs in a table indexed by the syndrome and looking up the error by its syndrome. Such a table requires O(2^t) time to pre-compute. In fact, for general binary linear codes, finding the error vector associated with the syndrome requires O(2^t) time.

Error Correction

Given the error e and the erroneous codeword b, we remove the error (i.e. error correct) by subtracting the error from the erroneous codeword. Namely, the vector

b ⊕ e = (c ⊕ e) ⊕ e = c ⊕ (e ⊕ e) = c

results in the original codeword in O(n) time.
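The complete procedure can be sketched in a few lines of Python (our illustration, using the Hamming [7, 4, 3] code with parity check matrix H1 and a precomputed syndrome/error table as described above):

```python
import numpy as np

H1 = np.array([[0, 1, 1, 1, 1, 0, 0],
               [1, 0, 1, 1, 0, 1, 0],
               [1, 1, 0, 1, 0, 0, 1]])

# Precompute the syndrome/error table for all correctable errors (wt(e) <= 1).
table = {}
for e in [np.zeros(7, dtype=int)] + [np.eye(7, dtype=int)[i] for i in range(7)]:
    table[tuple(H1 @ e % 2)] = e

c = np.array([1, 0, 1, 1, 0, 1, 0])      # the codeword for m = [1,0,1,1] under G1
b = c ^ np.array([0, 0, 0, 0, 1, 0, 0])  # the channel flips bit 5

s = tuple(H1 @ b % 2)                    # error detection: compute the syndrome
e = table[s]                             # look up the unique error for this syndrome
assert np.array_equal(b ^ e, c)          # error correction recovers the codeword
m = (b ^ e)[:4]                          # unencoding: G1 = [I|A], so take the first k bits
```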
Unencoding

Unencoding is the final step of the decoding procedure: it regains the message m ∈ Z_2^k such that c = mG, where G is the generator matrix for the code C. In the preferable case when G = [I | A], we simply truncate the last n − k bits, recovering the original message m in linear time. One might even argue that truncation requires no time, because we are simply changing our perspective to look at the first k bits rather than all n bits. Similarly, one can imagine a polynomial-time unencoding procedure when the generator matrix is a column permutation of [I | A]. In the general case, we must solve the system mG = c for the message m, where c is our codeword in C and G is the generator matrix for C.

In conclusion, for an arbitrary binary linear code, syndrome decoding requires exponential time. In fact, the decision problem related to decoding is NP-complete [BMvT78]. However, there exist a number of types of binary linear codes that decode in polynomial time, such as Goppa codes [McE77, Chapter 8]. Discussing such codes is beyond the scope of this thesis.

3.3 Asymptotic Performance of Linear Codes

As discussed in §3.2.2, the better the code protects, the longer the codewords need to be. However, the codeword length must not grow too fast relative to the message length, or else the code will be unusably long. In this section, we consider families of codes parameterized by some parameter a, such as the family of [a, 1, a] repetition codes and the family of [2^a − 1, 2^a − a − 1, 3] Hamming codes, and see how well these families perform asymptotically.

Definition 3.11 A family of [n_a, k_a, d_a] codes {C_a}, parameterized by a, is said to be good if the following limits are achieved:

1. lim_{a→∞} n_a = ∞,
2. lim_{a→∞} k_a/n_a = R > 0, and
3. lim_{a→∞} d_a/n_a = δ > 0.

R is called the asymptotic rate and δ is called the asymptotic distance.

Unfortunately, the family of repetition codes and the family of Hamming codes are not good. For example, the family of repetition codes has R = 0, and the family of Hamming codes has δ = 0. However, good codes do exist. The Gilbert–Varshamov² lower bound is a theorem based on a constructive proof that builds a family of good linear codes to protect messages over a binary symmetric channel that applies errors at random.

Theorem 3.12 (The Gilbert–Varshamov Lower Bound) There exists a family of codes with asymptotic rate R and asymptotic distance δ such that

R ≥ 1 − H(δ),

where H(δ) is the binary Shannon entropy H(δ) = −δ log_2 δ − (1 − δ) log_2(1 − δ).

The Gilbert–Varshamov lower bound is stated here without proof. See [Sud02, Lecture 5 Notes] and [vL82, pp. 66-67] for understandable proofs, and [Gil52] for the original proof.

3.3.1 Code Duals

There exists a beautiful and useful symmetry between some binary linear codes. If C is a binary linear code with generator matrix G and parity check matrix H, then there is another code C⊥ with generator matrix G⊥ = H and parity check matrix H⊥ = G. This is formally discussed below.

² In selected literature, "Varshamov" is often spelled "Varsharmov." We follow the spelling in [MS77].
Definition 3.13 If C is an [n, k, d] binary linear code, then its dual code C⊥ is the set of vectors orthogonal to C. Namely,

C⊥ = {v ∈ Z_2^n | c · v = 0 for all c ∈ C}.

Theorem 3.14 If C is an [n, k, d] binary linear code with generator matrix G and parity check matrix H, then C⊥ is an [n, n − k, d'] binary linear code with generator matrix G⊥ = H and parity check matrix H⊥ = G, for suitable d'.

Proof: The dimension of C is dim(C) = k. Since dim(C) + dim(C⊥) = n, we have dim(C⊥) = n − k. Let d' be the minimum distance of C⊥. By Definition 3.4, C⊥ is an [n, n − k, d'] binary linear code. By Definition 3.5, G⊥ = H is a generator matrix of C⊥, since its row space is equal to C⊥. Since the row space of G equals C, then by Definition 3.13, Gx^T = 0 ⟺ x ∈ C⊥, which, by Definition 3.8, defines H⊥ = G to be a parity check matrix of C⊥.

Not all code duals are useful. The dual of any [3, 1, 3] binary linear code is a [3, 2, 2] binary linear code, which corrects ⌊(2 − 1)/2⌋ = 0 errors. However, the Hamming [7, 4, 3] binary linear code has a [7, 3, 4] binary linear code as its dual. Both a [7, 4, 3] binary linear code and a [7, 3, 4] binary linear code correct up to one error. Together, the [7, 4, 3] Hamming code and its dual [7, 3, 4] code are used as a quantum error correcting code known as the Steane code [Ste96]. The Steane code is discussed shortly in §4.4.
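This duality is easy to verify numerically. The following check (ours) confirms that every row of G1 is orthogonal over Z_2 to every row of G2 = H1, so each matrix is a parity check matrix for the other's row space:

```python
import numpy as np

G1 = np.array([[1, 0, 0, 0, 0, 1, 1],
               [0, 1, 0, 0, 1, 0, 1],
               [0, 0, 1, 0, 1, 1, 0],
               [0, 0, 0, 1, 1, 1, 1]])
G2 = np.array([[0, 1, 1, 1, 1, 0, 0],
               [1, 0, 1, 1, 0, 1, 0],
               [1, 1, 0, 1, 0, 0, 1]])

# Every codeword of the [7,4,3] code is orthogonal to every codeword of its
# [7,3,4] dual, so all pairwise inner products of generator rows vanish mod 2.
assert not (G1 @ G2.T % 2).any()
```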
3.4 Parameterized Codes

In this section, we show that any specific linear code gives rise to a class of functionally equivalent codes by mapping the code C to a collection of cosets of C.

Definition 3.15 Let C be an [n, k, d] binary linear code, and let x be any vector in Z_2^n. Then an [n, k, d] parameterized code C_x is the coset C ⊕ x.

Encoding and decoding parameterized codes requires minor additions to what we have seen thus far. Let C be an [n, k, d] binary linear code with generator matrix G and parity check matrix H. Previously, encoding a message m ∈ Z_2^k with respect to C was c = mG. In the case of a parameterized code C_x, the codeword c_x corresponding to the message m is produced by the function c_x = mG ⊕ x.

Any error e ∈ Z_2^n acting on a codeword c_x ∈ C_x acts the same as if it were acting on the corresponding codeword c ∈ C, where c_x = c ⊕ x. Let b = c_x ⊕ e be the erroneous codeword, so b = c_x ⊕ e = (c ⊕ x) ⊕ e = x ⊕ (c ⊕ e). Thus, calculating the syndrome s of b is simply s = H(b ⊕ x)^T, because

H(b ⊕ x)^T = H((x ⊕ (c ⊕ e)) ⊕ x)^T = H(c ⊕ e)^T = He^T = s.

Thus, decoding a parameterized code via syndrome decoding is straightforward: merely map the received erroneous codeword b to b' = b ⊕ x and process b' as per §3.2.4, as in the sketch below.
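Here is a short sketch (ours) of the full parameterized encode/decode cycle for the [3, 1, 3] repetition code with an arbitrary offset x:

```python
import numpy as np

G = np.array([[1, 1, 1]])              # generator of the [3,1,3] repetition code
H = np.array([[1, 1, 0], [1, 0, 1]])   # its parity check matrix H3
x = np.array([1, 0, 1])                # the parameter: C_x is the coset C + x

# Syndrome table for the correctable errors of weight <= 1.
errors = [np.array(e) for e in ([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1])]
table = {tuple(H @ e % 2): e for e in errors}

m = np.array([1])
cx = (m @ G % 2) ^ x                   # parameterized encoding: c_x = mG + x
b = cx ^ np.array([0, 1, 0])           # a single bit flip on the channel

bp = b ^ x                             # map back to an erroneous codeword of C
e = table[tuple(H @ bp % 2)]           # detect the error from its syndrome
c = bp ^ e                             # correct it: c = mG
print(c[0])                            # unencode: for G = [1 1 1], m is the first bit
```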