The document describes Muhammad Ahsan's Ph.D. defense, which focused on developing an architecture framework and a performance simulation tool for trapped-ion quantum computers in order to estimate the reliability and resource requirements of large-scale quantum applications. The research involved defining quantum hardware and architecture models, benchmarking application circuits such as Shor's algorithm, and using the simulation tool to evaluate performance metrics under different architecture parameters and device limitations.
3. My Research Background
[Diagram: background in Computer Systems (undergraduate) and Ph.D. research in Computer Architecture and Quantum Computing, drawing on Computer Science, Electrical Engineering, and Physics.]
Topics of the defense:
• Fault Tolerance
• Quantum Error Correction
• Quantum Computer Architecture
• Resource Performance Estimation Tool
• Some interesting findings!
4. Quick Introduction to Quantum Computing (1)
• A quantum computer consists of
  • Quantum bits (qubits): store information in the binary basis states |0⟩ and |1⟩ (e.g., energy levels of a trapped 171Yb+ ion)
  • Gates: process information (e.g., lasers that drive transitions between energy levels)
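[Figure: a quantum circuit (qubits as horizontal lines, time running left to right) next to the trapped-ion hardware (ions as qubits, lasers as gates).]
• Example gates acting on basis states:
  • X|a⟩ = |1 ⊕ a⟩
  • CNOT: |b⟩|c⟩ → |b⟩|b ⊕ c⟩
  • Toffoli: |a⟩|b⟩|c⟩ → |a⟩|b⟩|c ⊕ ab⟩
• Gates are unitary operations (UU† = I): U|a⟩ → |a′⟩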
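The basis-state behavior of the gates above can be checked with a few lines of Python. This is a minimal illustrative sketch (the function names are hypothetical, not part of the simulation tool): it models only the classical truth tables of X, CNOT and Toffoli, not superposition or phases, and confirms that each gate is a reversible, self-inverse map, consistent with UU† = I.

```python
from itertools import product


def x_gate(a):
    """X|a> = |1 xor a>."""
    return (1 ^ a,)


def cnot(b, c):
    """CNOT|b>|c> = |b>|b xor c>: the control b is left unchanged."""
    return (b, b ^ c)


def toffoli(a, b, c):
    """Toffoli|a>|b>|c> = |a>|b>|c xor (a AND b)>."""
    return (a, b, c ^ (a & b))


def is_reversible(gate, n_bits):
    """A gate is reversible if it maps the 2**n basis states one-to-one."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_bits)}
    return len(outputs) == 2 ** n_bits


def is_self_inverse(gate, n_bits):
    """X, CNOT and Toffoli each undo themselves (U·U = I)."""
    return all(gate(*gate(*bits)) == bits for bits in product((0, 1), repeat=n_bits))


if __name__ == "__main__":
    for name, gate, n in [("X", x_gate, 1), ("CNOT", cnot, 2), ("Toffoli", toffoli, 3)]:
        print(name, is_reversible(gate, n), is_self_inverse(gate, n))
```

Because these gates only permute basis states, their matrices are permutation matrices, which are always unitary.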
5. Quick Introduction to Quantum Computing (2)
• What makes quantum computers interesting (non-conventional):
  • Superposition of two states: a|0⟩ + b|1⟩, where |a|^2 + |b|^2 = 1
  • Entanglement between qubits: a|00⟩ + b|11⟩
• What makes quantum computers more powerful:
  • Phase gates, e.g., S: a|0⟩ + b|1⟩ → a|0⟩ + e^(iπ/2) b|1⟩
  • Amplitudes (a, b) are complex (e.g., a|0⟩ − b|1⟩)
• Quantum speedup:
  • Amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem
• Universal quantum computation:
  • Clifford gates {H, CNOT, X, Z, S} alone are insufficient for arbitrary quantum computation
  • Clifford gates plus the T gate (or the Toffoli gate) suffice
  • Gate glossary: H (Hadamard), Z (π phase shift), S (π/2 phase shift), T (π/4 phase shift), CNOT, Toffoli (controlled-CNOT)
• Toffoli and T gates are double-edged swords:
  • Needed for practical quantum speedup
  • Practically resource-consuming
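The phase-gate relations behind the Clifford+T gate set, and the amplitude cancellation mentioned above, can be verified numerically. A minimal sketch in plain Python, assuming the standard matrix definitions of H, Z, S and T (the helper names are illustrative): it checks that T·T = S, S·S = Z and H·H = I, and that applying H twice to |0⟩ cancels the |1⟩ amplitude.

```python
import cmath


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state [amp0, amp1]."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices within a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def phase(theta):
    """Phase-shift gate: |0> -> |0>, |1> -> e^(i*theta)|1>."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


R = 1 / 2 ** 0.5
H = [[R, R], [R, -R]]        # Hadamard
Z = phase(cmath.pi)          # pi phase shift
S = phase(cmath.pi / 2)      # pi/2 phase shift
T = phase(cmath.pi / 4)      # pi/4 phase shift (T gate)
IDENTITY = [[1, 0], [0, 1]]

if __name__ == "__main__":
    print(close(matmul(T, T), S), close(matmul(S, S), Z), close(matmul(H, H), IDENTITY))
    # H|0> = (|0> + |1>)/sqrt(2); a second H makes the two |1> contributions cancel.
    print(apply(H, apply(H, [1, 0])))
```

The last line is the smallest example of interference: both paths to |1⟩ carry amplitudes of opposite sign and cancel, while the |0⟩ amplitudes add.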
6. Quantum Computing in a Nutshell
• Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers:
  • Shor’s integer factorization algorithm (exponential speedup)
• Practically, quantum device components (qubits, gates) are much noisier and less reliable than classical ones:
  • Failure probability p = 10^-3: 1 in 1,000 quantum gates fails
  • Mean time to failure: classical ~10^7 to 10^8 hours; quantum ~seconds to minutes
• Error correction (redundancy) is needed to protect quantum information
8. Multiple Layers of Encoding in the [[7,1,3]] Code
Values are given as: no encoding / single-layer (L1) encoding / two-layer (L2) encoding
• Qubits: 1 / 7 / 7^2 = 49
• Clifford gates {H, X, Z, CNOT}: 1 / 7 / 7^2 = 49
• Non-Clifford gates (e.g., Toffoli): 1 / O(10^3) / O(10^5)
• Ancilla qubits: 0 / O(10^1) / O(10^2)
• Noise level, failure probability: p (e.g., p = 10^-7) / O(p^2) (e.g., 10^-10) / O(p^4) (e.g., 10^-16)
Good news: the gain in reliability outweighs the qubit and gate overhead (overhead ↑ by ~10^2 to 10^3x, failure probability ↓ by ~10^6x).
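The O(p) → O(p^2) → O(p^4) pattern follows the standard concatenation estimate p_L = p_th · (p/p_th)^(2^L). A minimal sketch, assuming a threshold p_th = 10^-4 (an assumption, chosen only because it reproduces the example values p = 10^-7 → 10^-10 → 10^-16 above) and counting only the 7^L data qubits per logical qubit:

```python
# Assumed threshold: chosen only because it reproduces the example values above.
P_TH = 1e-4


def logical_failure(p, level, p_th=P_TH):
    """Failure probability after `level` layers of concatenation (standard estimate)."""
    return p_th * (p / p_th) ** (2 ** level)


def data_qubits(level, block_size=7):
    """Physical data qubits per logical qubit for [[7,1,3]] (ancillas not counted)."""
    return block_size ** level


if __name__ == "__main__":
    for level in range(3):
        print(f"L{level}: {data_qubits(level)} qubits, p_L ~ {logical_failure(1e-7, level):.0e}")
```

Each extra layer squares the (normalized) failure probability but multiplies the data qubits only by 7, which is why the reliability gain outpaces the overhead. Constant factors hidden in the O(·) notation are folded into the assumed threshold.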
9. Fundamental Research Question
To estimate how reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g., 2,048-bit factorization)
10. Answer depends on …
• The quantum application (e.g., Shor’s algorithm)
• Compilation of the quantum application into fault-tolerant gates (e.g., gate decomposition methods)
• Fault-tolerance overhead (e.g., error-correcting codes)
• Integration of qubits and gates on hardware (e.g., trapped ions)
[Diagram axes: research progress (theory) and research progress (experiment)]
A precise estimate needs information about the quantum hardware.
11. Fundamental Research Question
How reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g., 2,048-bit factorization)?
The answer depends heavily on the architecture of the quantum hardware.
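12. Impact of Hardware Assumptions on the Speed of a Quantum Computer
(Included with permission of rdv, TDL, KMI, quant-ph/0507023)
[Figure: time to factor a 1,024-bit number on classical and quantum machines; annotations "Hours -> Days" and "Days -> Years" show the estimates shifting with the hardware assumptions.]
Architecture assumptions matter!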
13. Question
• Why think about architecture for a large quantum computer when we do not yet know exactly how to build a small one?
14. Answer…
• Architecture can
  1. Compensate for technology limitations (memory hierarchy, multi-core designs)
  2. Reveal performance-limiting factors
  3. Guide future advances in technology
• Example from history: slow, unreliable computers built from discrete transistors gave way to fast, reliable computers built from integrated circuits (ICs), following Moore’s law
15. Research Methodology
• Need to define the mechanism by which a very large number of qubits are
  • Allocated
  • Assigned a function
  • Protected
  • Connected
  in a realistically constructible quantum computer system
• Need a method to efficiently
  • Model the quantum computer architecture
  • Map the quantum application onto the architecture
  • Evaluate the performance-limiting factors
[Diagram: the two crucial components of my research, Quantum Computer Architecture and Performance Simulation, linked by a communication channel.]
16. Tool: Taxonomy of Important Terms
• Device parameters (DPs)
  • e.g., physical gate times, failure probability
• Resource investment
  • e.g., total physical qubits used in the system
• Architecture parameters
  • Functional allocation (data, ancilla) and connectivity of qubits
• Performance metrics
  • e.g., total execution time (TEXEC) and failure probability (PFAIL), the probability that the quantum circuit gives an incorrect output
Together these terms span the design space.
18. Papers/Publications
• Performance simulator based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013)
• Optimization of a Quantum Computer Architecture Using Resource Performance Simulator (DATE 2015)
• Designing Million-Qubit Quantum Computer Using Resource Performance Simulator [in submission (2nd attempt)]
Challenge: the target community is mostly unfamiliar with quantum computing.
19. Philosophy of Performance Simulation Toolset (1)
[Diagram components: quantum circuit, device parameters, architecture parameters, resource overhead, performance evaluation.]
Prior work:
• Svore et al. (2004)
• Balensiefer et al. (2005)
• Whitney et al. (2007)
• Dousti and Pedram (2012)
Diagram annotations on prior work: FIXED; LESS FLEXIBLE; LESS FLEXIBLE; INSUFFICIENT INSIGHT.
20. Philosophy of Performance Simulation Toolset (2)
[Diagram: the same components (quantum circuit, device parameters, architecture parameters, resource overhead, performance evaluation), annotated with my contribution: a fine-tuning knob and a magnifying glass.]
21. Philosophy of Performance Simulation Toolset (3)
[Diagram: the same components with my contribution (knob and magnifying glass) used in a loop: while the desired performance is not reached (No), revisit how reliable the qubits must be and how many qubits are needed.]
22. What We Have Learned So Far…
• Question: Is a quantum computer practically faster than a classical computer?
  • What quality and amount of resources are needed?
• Answer: It depends on a study of the quantum computer architecture
• Performance simulation tool for architecture study
  • Quantum computer design cycle
  • Flexible tool to balance improvements in device parameters against investment in resources and architecture
25. Fault-tolerant and Scalable Quantum Computer Architecture
[Figure: segments of planar ion traps built from basic ion-trap cells holding different qubit blocks, with empty channels for ballistic shuttling and a communication port; segments are connected by photonic links through a Layer-1 Optical Switch (OS) and a Layer-2 OS, which is 2x as time-expensive as the Layer-1 OS.]
Architecture idea: combine the strengths of
• IONS: reliable storage and computation
• PHOTONS: communication
26. Hardware Description and Device Parameters
[Figure: hardware components and their device parameters (DPs); component labels U, M, I]
• 2^L × 5,000 μs: 5,000 μs at L = 0; 10,000 μs at L = 1; 20,000 μs at L = 2
Speed and reliability of computation > speed and reliability of communication
28. Shor’s Integer Factorization Algorithm Circuit
For an n-bit integer N, pick a < N with GCD(a, N) = 1.
• Controlled modular exponentiation, U(x) = a^x mod N (the bulk of Shor’s algorithm):
  • Contains O(n^2) adder calls: 512-bit → ~1 million adders; 1,024-bit → ~4 million adders; 2,048-bit → ~16 million adders
  • The period r is hidden in the eigenvalues of U(x) = a^x mod N
  • Since a^r ≡ 1 (mod N), N divides a^r − 1 = (a^(r/2) − 1)(a^(r/2) + 1); when r is even and a^(r/2) ≢ −1 (mod N), GCD(a^(r/2) ± 1, N) gives nontrivial factors of N
• Inverse quantum Fourier transform:
  • Contains O(n^2) small-angle phase rotations, down to Rz(π/2^(n+1)); depth = O(n)
• Complexity: classical, super-polynomial in n for the best known algorithm; quantum, polynomial O(n^3)
[Circuit: an m-qubit register (m = 2n) initialized to |0⟩ controls U, U^2, U^4, …, U^(2^(m−1)) applied to an n-qubit register; the m-qubit register then goes through the inverse quantum Fourier transform and Z-basis measurements (MZ).]
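The classical post-processing of Shor’s algorithm can be sketched in Python. Here the period r is found by brute force, which takes exponential time in n and merely stands in for the quantum part (controlled modular exponentiation plus the inverse QFT); the function names are illustrative.

```python
from math import gcd


def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force; stands in for quantum period finding)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to N")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_period(a, n):
    """Return two nontrivial divisors of n found from the period, or None if this a fails."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None  # odd period: try another a
    half = pow(a, r // 2, n)  # a^(r/2) mod N
    if half == n - 1:
        return None  # a^(r/2) = -1 (mod N): try another a
    return gcd(half - 1, n), gcd(half + 1, n)


if __name__ == "__main__":
    print(find_period(7, 15), factor_from_period(7, 15))
```

For N = 15 and a = 7 the period is r = 4, and GCD(7^2 ± 1, 15) yields the factors 3 and 5; a = 14 fails because 14^(r/2) = 14 ≡ −1 (mod 15), so another a must be drawn.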
29. Benchmark Circuits (Approx. QFT)
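[Circuit: approximate QFT built from controlled Rz(π/2^k) rotations; each small-angle rotation is approximated by a long sequence of H and T gates (e.g., H T′ H T Z … T).]
• Rz(π/2^8) is sufficient to factor integers more than 4,096 bits long
• ~375 gates (~150 T gates) approximate a rotation to within 10^-16
• Each T gate consumes a magic state:
  • Magic state preparation (latency: 78 ms): produces T|+⟩ from |0⟩_L using a 7-qubit cat state, a TXT′ operation, decoding, and a Z measurement
  • Data teleportation into the magic state (latency: 12 ms): a measurement M and an SX correction turn |Data⟩ into T|Data⟩
  • [Figure: ancilla blocks A1, A2, A3, … prepare magic states over time; waiting for them appears as execution delay]
References: V. Kliuchnikov et al. (2013); Fowler et al. (2005)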
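The data-teleportation step can be checked on unencoded qubits with a two-qubit state-vector calculation. A minimal sketch, assuming the textbook T-gate injection gadget (CNOT from the magic state T|+⟩ onto the data qubit, Z measurement of the data qubit, then an S·X correction when the outcome is 1); the encoded 7-qubit cat-state preparation and the latencies are not modeled, and the function names are illustrative.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 4)  # T = diag(1, OMEGA)


def inject_t(alpha, beta, outcome):
    """Teleport data alpha|0> + beta|1> into the magic state T|+> and return the
    former magic-state qubit after the given Z outcome and the S·X fix-up."""
    s = 1 / 2 ** 0.5
    # Amplitudes indexed [magic][data]; magic state = (|0> + OMEGA|1>)/sqrt(2).
    state = [[s * alpha, s * beta], [s * OMEGA * alpha, s * OMEGA * beta]]
    # CNOT: magic qubit is the control, data qubit is the target.
    state[1] = [state[1][1], state[1][0]]
    # Measure the data qubit in Z; keep and renormalize the magic-qubit amplitudes.
    amp0, amp1 = state[0][outcome], state[1][outcome]
    norm = (abs(amp0) ** 2 + abs(amp1) ** 2) ** 0.5
    amp0, amp1 = amp0 / norm, amp1 / norm
    if outcome == 1:
        amp0, amp1 = amp1, amp0  # X correction
        amp1 = amp1 * 1j         # then S = diag(1, i)
    return amp0, amp1


def equal_up_to_global_phase(u, v, tol=1e-12):
    """True if normalized 2-vectors u and v differ only by a global phase."""
    overlap = u[0].conjugate() * v[0] + u[1].conjugate() * v[1]
    return abs(abs(overlap) - 1) < tol


if __name__ == "__main__":
    alpha, beta = 0.6, 0.8j
    target = (alpha, OMEGA * beta)  # T|Data>
    for outcome in (0, 1):
        print(outcome, equal_up_to_global_phase(inject_t(alpha, beta, outcome), target))
```

Both measurement outcomes leave the former magic-state qubit in T|Data⟩ (up to a global phase), so the expensive non-Clifford work is concentrated in preparing the magic state.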
30. Benchmark Circuits (Quantum Adders)
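• Quantum Carry-Look-Ahead Adder (QCLA), log-depth: Draper et al. (2004)
• Quantum Ripple-Carry Adder (QRCA), linear-depth: Cuccaro et al. (2004)
• Both are built from CNOT and Toffoli gates; e.g., the sum bit maps |x⟩|y⟩|z⟩ → |x⟩|y⟩|x ⊕ y ⊕ z⟩
• [Figure: each fault-tolerant Toffoli gate requires magic state preparation followed by data injection into the magic state]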
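A classical bit-level sketch of the linear-depth ripple-carry adder of Cuccaro et al. (2004), using its MAJ and UMA gates, shows how an adder reduces to CNOT and Toffoli gates. The wire layout and function names here are illustrative assumptions. Simulating on classical bits is valid only because this circuit contains nothing but CNOT and Toffoli gates, which permute basis states.

```python
def maj(bits, c, b, a):
    """MAJ on wires (c, b, a): leaves majority(a, b, c), the next carry, on wire a."""
    bits[b] ^= bits[a]            # CNOT a -> b
    bits[c] ^= bits[a]            # CNOT a -> c
    bits[a] ^= bits[c] & bits[b]  # Toffoli (c, b) -> a


def uma(bits, c, b, a):
    """UMA (2-CNOT version): restores a and c, leaves the sum bit on wire b."""
    bits[a] ^= bits[c] & bits[b]  # Toffoli (c, b) -> a
    bits[c] ^= bits[a]            # CNOT a -> c
    bits[b] ^= bits[c]            # CNOT c -> b


def ripple_carry_add(x, y, n):
    """Add two n-bit numbers with the ripple-carry circuit; return (sum, Toffoli count)."""
    # Wire layout: [carry-in, b0, a0, b1, a1, ..., b_(n-1), a_(n-1), carry-out]
    bits = [0] * (2 * n + 2)
    b_w = [1 + 2 * i for i in range(n)]
    a_w = [2 + 2 * i for i in range(n)]
    z = 2 * n + 1
    for i in range(n):
        bits[a_w[i]] = (x >> i) & 1
        bits[b_w[i]] = (y >> i) & 1
    toffolis = 0
    # Forward sweep: each MAJ computes the carry into the next bit position.
    for i in range(n):
        carry_wire = 0 if i == 0 else a_w[i - 1]
        maj(bits, carry_wire, b_w[i], a_w[i])
        toffolis += 1
    bits[z] ^= bits[a_w[n - 1]]  # copy the final carry to the carry-out wire
    # Backward sweep: each UMA uncomputes a carry and writes a sum bit onto b.
    for i in reversed(range(n)):
        carry_wire = 0 if i == 0 else a_w[i - 1]
        uma(bits, carry_wire, b_w[i], a_w[i])
        toffolis += 1
    total = sum(bits[b_w[i]] << i for i in range(n)) + (bits[z] << n)
    return total, toffolis


if __name__ == "__main__":
    print(ripple_carry_add(11, 6, 4))
```

For 4-bit inputs every sum x + y (including the carry-out) is reproduced with 2n = 8 Toffoli gates, each of which, in the fault-tolerant setting above, involves magic state preparation and data injection. The carries ripple through one position at a time, which is the source of the linear depth.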
37. Main Algorithms/Heuristics
• Mapper:
  • Goal: turn circuit-level connectivity into hardware-level proximity
  • Algorithm: polynomial-time graph-theoretic algorithm solving the optimal linear arrangement problem
• Scheduler:
  • Goal: reduce execution time; insert error correction to minimize failure probability
  • Algorithm: [greedy] dispatch each gate for execution as soon as resources become available
• Error Analyzer:
  • Goal: evaluate PFAIL by fully counting logical error events
  • Algorithm: O(n^3) fault-path counting
Details in the thesis.
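The Scheduler’s greedy dispatch rule can be sketched as a resource-constrained list scheduler. This is a hypothetical, simplified model, not the thesis’s scheduler: identical resource units (e.g., ancilla blocks), gates dispatched in topological order, and no error-correction insertion or communication delays. It does reproduce the optimality check used for the Scheduler: X-fold more resources should give an X-fold lower TEXEC until TEXEC reaches its unconstrained value.

```python
import heapq


def greedy_schedule(gates, durations, deps, n_resources):
    """Return the makespan (TEXEC) of a greedy resource-constrained schedule.

    gates: gate ids in topological order; durations: id -> time;
    deps: id -> list of predecessor ids. Each gate starts as soon as its
    predecessors have finished and the earliest-free resource unit is available.
    """
    free_at = [0.0] * n_resources  # min-heap of times at which each unit becomes free
    heapq.heapify(free_at)
    finish = {}
    for g in gates:
        ready = max((finish[p] for p in deps.get(g, [])), default=0.0)
        unit_free = heapq.heappop(free_at)
        start = max(ready, unit_free)
        finish[g] = start + durations[g]
        heapq.heappush(free_at, finish[g])
    return max(finish.values(), default=0.0)


if __name__ == "__main__":
    toffolis = list(range(64))  # 64 independent 50 ms Toffoli gates
    dur = {g: 50 for g in toffolis}
    for k in (1, 2, 64, 128):
        print(k, greedy_schedule(toffolis, dur, {}, k))
```

With 64 independent 50 ms gates, going from 1 to 2 units halves TEXEC (3,200 ms to 1,600 ms), and beyond 64 units TEXEC stays at the unconstrained 50 ms; a fully dependent chain gains nothing from extra resources.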
38. Validity/Optimality of Scheduler
• Correctness:
  • Detailed analysis of the complete schedule (smaller circuits)
  • Overall validation using the Visualizer output (larger circuits)
• Optimality:
  • Compare the TEXEC of the circuit with and without resource constraints
  • An X-fold increase in resources should yield an X-fold (or greater) decrease in TEXEC
  • TEXEC with constraints should approach TEXEC without constraints
[Figure: 64-bit Quantum Carry-Look-Ahead Adder circuit, showing a fewer-resource regime and a sufficient-resource regime]
This shows that the Mapper and Scheduler are ‘good’ at
1. Working with fewer resources
2. Achieving optimal performance with sufficient resources
39. Demo
[Figure: breakdown of TEXEC along the critical path of a small circuit to be scheduled (gates A1, B1, C1 in segment S1 and A2, B2, C2 in segment S2) that requires cross-segment swapping. Time components: magic state preparation overhead, data teleportation into the magic state, Toffoli gate execution, CNOT gate execution, and cross-segment swapping overhead.]
40. Demo Visualization
[Figure: an adder circuit (QCLA) and the same adder circuit after scheduling, plotted as qubits / segment ID number vs. time steps, with execution time in seconds.]
• Horizontal lines: delays due to too few ancilla qubits
• Non-horizontal lines: delays due to too few communication qubits
41. What We Have Learned So Far…
• Benchmark circuits
  • Approximate Quantum Fourier Transform (AQFT: long sequences of approximating gates)
  • Quantum Ripple-Carry Adder (QRCA: nearest-neighbor gates)
  • Quantum Carry-Look-Ahead Adder (QCLA: highly parallelizable circuit, long-distance gates)
• Performance simulation tool
  • Mapper, Scheduler, Error Analyzer (standard components)
  • Can work with varying device and architecture parameters
  • Advanced components:
    • Visualizer
    • Performance-metrics decomposer
  • Validation and optimality of the Mapper and Scheduler
42. Remember this picture…
[Figure: the same architecture with tile types labeled: each segment of planar ion traps (basic ion-trap cells) contains DATA, ANCILLA, and COMM tiles, with empty channels for ballistic shuttling; optical links connect segments through a Layer-1 Optical Switch (OS) and a Layer-2 OS (2x as time-expensive as the Layer-1 OS).]
Architecture parameters:
• NSeg: number of segments
• NData: data tiles per segment
• NComm: communication tiles per segment
• NAnc: total ancilla tiles
44. First Set of Simulations (Setup)
Goal: study performance-limiting architecture and device parameters
• Benchmark circuit
  • 1,024-bit QCLA
• Error correction (Steane [[7,1,3]] code)
  • Two layers of concatenation (L1, L2)
  • L2 error correction after each gate
  • Qubits sitting idle for long enough: perform L2 error correction only (no L1 error correction)
  • Qubits can decohere for ~4.8 ms
• Computation
  • CNOT (local, non-local)
  • Toffoli (local): can be executed in any segment
Precomputed tile performance numbers (time, failure probability):
• L2 Toffoli: ~50 ms, O(10^-14)
• L2 cross-segment teleportation: ~10 ms, O(10^-11)
45. Labelling the Architecture Space
[Figure: TEXEC as a function of architecture parameters]
• Large segments (less distributed system): TEXEC depends on ancilla qubits
• Small segments (highly distributed system): TEXEC depends on communication qubits
47. Reducing PFAIL using Device Parameters (DPs)
• Improving DPs: 10x better qubit memory → ~100x to 1000x reduction in failure probability
• 4 million adder calls need an adder failure probability << 2.5 × 10^-7 (= 1/(4 × 10^6)) to run the 1,024-bit modular exponentiation of Shor’s algorithm
[Figure panels: Ancilla-regime optimal architecture; Tel-regime optimal architecture]
48. Second Set of Simulations (Setup)
Goal: study the performance scaling of circuits
• Benchmark circuits
  • QCLA, QRCA, and AQFT
• Error correction (Steane [[7,1,3]] code)
  • Two layers of concatenation (TEXEC(L1) < TEXEC(L2))
  • L1 error correction after each gate
  • Qubits sitting idle for long enough: perform L2 error correction; otherwise, perform L1 error correction
  • Qubits can decohere for ~0.6 ms
• Computation
  • CNOT (local, non-local)
  • Toffoli and T gates (local): executed in computational segments (CS)
Architecture configuration: [#Data Tiles, #Ancilla Tiles, #Comm. Tiles], #CS
49. Resource-Performance Scalability of the MUSIQC Architecture
[Figure panels: QCLA, QRCA, AQFT]
• Requirement: TEXEC scales logarithmically with problem size and resources
• Requirement: TEXEC scales linearly with problem size and resources
• PFAIL scales linearly with problem size and resources
The MUSIQC architecture can maintain ‘desirable’ performance per resource as the problem size increases.
50. TEXEC as a Function of (#Data, #Ancilla, #Comm. Qubits), #CS
[Figure panels: 1,024-bit QCLA, 1,024-bit QRCA, 1,024-bit AQFT]
Resource consumption of the benchmark circuits (NCS/ancilla qubits; comm. qubits; overall resource):
• QCLA: high; high; high
• QRCA: low; modest; modest
• AQFT: low; none; low
51. TEXEC as a Function of Problem Size
(Total qubits < 1.5 million)
Optimized architecture configurations (up to 1,024-bit circuits), given as [Data, Ancilla, Comm.], #CS:
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
• QCLA: [48, 4, 2], 25
The 2,048-bit QCLA adder doesn’t fit well into a 1.5-million-qubit machine.
[Figure annotation: QCLA [30, 8, 5], 137; ~90x ↓ in ancilla and comm. tiles per data tile]
52. Running 2,048-bit Shor’s Algorithm in < 5 Months
• 2,048-bit Shor’s algorithm ≈ 2,048-bit modular exponentiation circuit
• 2,048-bit modular exponentiation → 16 million calls to the adder
• So the TEXEC of each adder must be < 0.8 s (16 × 10^6 × 0.8 s ≈ 148 days, just under 5 months)
• Choose QCLA [current TEXEC = 260 s]:
  • 2x ↑ resource budget → TEXEC = 2.76 s (96x reduction)
  • 10x ↓ physical operation time (CNOT, measurement, shuttling time) → TEXEC = 0.69 s (~4x further reduction)
Resources can be more important than gate speed for decreasing TEXEC.
53. Reducing PFAIL using Device Parameters
• To run 2,048-bit modular exponentiation, we need 16 million calls to the adder
• Adder PFAIL = 2.77 × 10^-7 needs to be << 1/(16 × 10^6) = 6.25 × 10^-8
• 10x decrease in entanglement infidelity → ~100x reduction in PFAIL: 2.37 × 10^-9 << 6.25 × 10^-8
[Figure: breakdown of adder PFAIL by noise source: gate noise (GATE), teleportation noise (TEL), shuttling noise (SHUT), memory noise (MEM)]
(a) EPR-pair infidelity 10^-4: PFAIL = 2.77 × 10^-7 (TEL 99.6%, GATE 0.31%, MEM 0.08%, SHUT 0.01%)
(b) EPR-pair infidelity 10^-5: PFAIL = 2.37 × 10^-9 (GATE 75.3%, MEM 22.3%, TEL 1.9%, SHUT 0.55%)
54. Overall Performance of 2,048-bit Shor’s Algorithm within 3 × 10^6 Qubits
• QCLA: TEXEC = 0.68 s, PFAIL = 2.37 × 10^-9; AQFT: TEXEC = 1 day, PFAIL = 6 × 10^-5
• Total execution time: 16 × 10^6 × 0.68 s + 86,400 s ≈ 127 days
• Total failure probability: 16 × 10^6 × 2.37 × 10^-9 + 6 × 10^-5 ≈ 0.04
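The budget arithmetic on these slides can be reproduced in a few lines. A minimal sketch, assuming (as the sums above do) that the 16 million adder calls run one after another and that small failure probabilities simply add (a union bound); the QCLA TEXEC of 0.68 s is the value used on this slide, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def per_adder_budget(adder_calls):
    """Per-adder PFAIL at which the expected number of failed calls reaches 1;
    the actual adder PFAIL must be well below this."""
    return 1 / adder_calls


def shor_totals(adder_calls, adder_texec_s, adder_pfail, aqft_texec_s, aqft_pfail):
    """Total run time in days and total failure probability, assuming sequential
    adder calls and additive failure probabilities (union bound)."""
    total_s = adder_calls * adder_texec_s + aqft_texec_s
    total_pfail = adder_calls * adder_pfail + aqft_pfail
    return total_s / SECONDS_PER_DAY, total_pfail


if __name__ == "__main__":
    calls = 16e6  # 2,048-bit modular exponentiation
    print(per_adder_budget(calls))             # PFAIL budget per adder
    print(calls * 0.8 / SECONDS_PER_DAY)       # days if each adder takes 0.8 s
    days, pfail = shor_totals(calls, 0.68, 2.37e-9, SECONDS_PER_DAY, 6e-5)
    print(f"{days:.0f} days, PFAIL ~ {pfail:.2f}")
```

The same per-adder budget gives 2.5 × 10^-7 for the 4 million calls of the 1,024-bit case. The adder term dominates both totals, which is why the adder’s TEXEC and PFAIL were the optimization targets.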
55. Summary of My Results (1)
Studying the architecture space for a variety of benchmarks of varying sizes:
• Architecture space exploration
  • Overall performance of a fully error-corrected circuit can be bottlenecked by
    • Non-Clifford ancilla qubits
    • Long-distance communication qubits
• Performance scalability of the architecture
  • Resource allocation should match the computational workload of the benchmark application circuit
56. Summary of My Results (2)
• Reliability
  • Overall PFAIL can be limited either by memory coherence time or by the fidelity of the communication channel (depending on the error-correction strategy)
• Speed
  • For overall TEXEC, the resource budget (#qubits) can be more crucial than speeding up the quantum gates
• Using 3 million qubits, each failing with probability 10^-7 per operation, a 2,048-bit number can be factored in less than five months.
57. Future Work
Interesting avenues of future research:
• More sophisticated models of physical components
  • Cross-talk, qubit loss, etc.
• Other quantum technologies
  • Superconductors
  • Quantum dots
• Error-correcting codes and noise models
  • Topological quantum codes
  • Exploiting X/Z error asymmetry
  • Correlated errors
60. First Set of Simulations (Results [2])
[Figure: PFAIL as a function of architecture parameters]
• Ancilla regime: PFAIL depends on shuttling noise
• Tel regime: PFAIL depends on memory noise and teleportation noise
61. More Answers…
• Architectures can compensate for technology limitations
  • Memory/cache hierarchy
    • Hides the slowness of dynamic RAM and secondary memory devices
  • Multi-core designs
    • Reduce power dissipation without compromising performance
62. Research in Classical Architecture
• Classical era (lasted until the 1990s)
  • Human intuition and experience guided the search of the computer design space
• Modern era
  • Software tools find good computer designs [Todd Austin’s SimpleScalar toolset]
This motivates the idea of a performance simulation toolset for quantum computers.
63. Fault Tolerance and Resource Overhead
• Encode 1 logical qubit into 7 physical qubits with an error-correcting code (e.g., Steane [[7,1,3]])
  • Construct DATA QUBITS
• Perform error correction regularly on encoded qubits to remove errors
  • Requires ANCILLA QUBITS to perform parity checks
• Apply fault-tolerant gates on encoded (logical) qubits
  • Generally requires ANCILLA QUBITS for correct operation and error prevention
The noise level drops from O(p) to O(p^2) with each layer of encoding.
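The parity checks that the ancilla qubits measure can be illustrated classically. The Steane [[7,1,3]] code uses the three checks of the classical [7,4,3] Hamming code for both bit-flip (X) and phase-flip (Z) errors. A minimal sketch under that framing: it computes the syndrome of a 7-bit error pattern directly, and does not model the fault-tolerant, ancilla-based extraction; the names are illustrative.

```python
# Parity-check matrix of the [7,4,3] Hamming code: column j (1-based) is j in binary.
H_CHECKS = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(error):
    """Parity of each check over a 7-bit error pattern."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in H_CHECKS)


def locate_single_error(error):
    """Read the syndrome as a binary number: 0 means no error, else the 1-based position."""
    s = syndrome(error)
    return s[0] * 4 + s[1] * 2 + s[2]


if __name__ == "__main__":
    err = [0] * 7
    err[4] = 1  # flip on position 5
    print(syndrome(err), locate_single_error(err))
```

Any single flip is located exactly, but two flips (e.g., positions 1 and 2) produce the syndrome of a third position (3). A distance-3 code therefore corrects only one error per block, and a second layer of encoding is what pushes the noise level from O(p^2) to O(p^4).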
64. (TEXEC, PFAIL) as a Function of Gate Speed and Resources
[Figure: three panels plotting failure probability (10^-8 to 10^-4) against execution time (0 to 1,200 s) for the 2,048-bit QRCA and QCLA, comparing gate speed 10x ↑ and resources 2x ↑.]
• Resources can be more important than gate speed for decreasing TEXEC (100x ↓ TEXEC for QCLA)
• PFAIL doesn’t decrease appreciably when more resources are added
• 2,048-bit QCLA: TEXEC = 0.69 s, PFAIL = 2.77 × 10^-7 with 2x ↑ resources and 10x ↓ gate times
65. Quick Introduction to Quantum Computing
• A quantum computer consists of
  • Qubits: store information (e.g., energy levels of a trapped 171Yb+ ion)
  • Gates: process information (e.g., lasers that drive transitions between energy levels)
[Figure: a quantum circuit (qubits vs. time) computing a sum bit, |x⟩|y⟩|z⟩ → |x⟩|y⟩|x ⊕ y ⊕ z⟩, next to the trapped-ion hardware (ions as qubits, lasers as gates).]
68. TEXEC as a Function of Problem Size
(Total qubits < 1.5 million)
Optimal architecture configurations (up to 1,024-bit circuits), given as [Data, Ancilla, Comm.], #CS:
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
• QCLA: [48, 4, 2], 25 (figure annotation, time breakdown: ancilla prep 10%, comm. 85%, gate 4%)
The 2,048-bit QCLA adder doesn’t fit well into a 1.5-million-qubit machine.
• QCLA: [30, 8, 5], 137 (figure annotation, time breakdown: ancilla prep 38%, comm. 29%, gate 33%)
69. Second Set of Simulations (Setup)
Goal: study the performance scaling of circuits
• Benchmark circuits
  • QCLA, QRCA, and AQFT
• Error correction (Steane [[7,1,3]] code)
  • Two layers of concatenation (T_L2 = 70 × T_L1)
  • L1 error correction after each gate
  • Qubits sitting idle for long enough: perform L2 error correction; otherwise, perform L1 error correction
  • Physical qubits can decohere for < 0.6 ms
• Computation
  • CNOT (local, non-local)
  • Toffoli and T gates (local): executed in computational segments (CS)
Architecture configuration: (#Data, #Ancilla, #Comm. qubits), #CS
Precomputed tile performance numbers (time, failure probability):
• L2 Toffoli: ~51 ms, O(10^-14)
• L2 cross-segment teleportation: ~5 ms, O(10^-11)