Architecture Framework for Trapped Ion Quantum Computer Simulation

Muhammad Ahsan
[Ph.D. Defense]
Department of Computer Science, Duke University

Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results

My Research Background
[Diagram: overlapping areas Computer Architecture, Quantum Computing, Fault Tolerance and Quantum Error Correction, leading to Quantum Computer Architecture and a Resource Performance Estimation Tool (the topics of the defense, with "some interesting findings!"). Background labels: Undergraduate, Ph.D. Research, Computer Systems, Computer Science, Electrical Engineering, Physics.]

Quick Introduction to Quantum Computing (1)
• A quantum computer consists of
• Quantum bits (qubits): store information in the binary basis states |0>, |1> (e.g. energy levels of a trapped 171Yb+ ion)
• Gates: process information (e.g. lasers that drive transitions between energy levels)
• Gates are unitary operations (U U† = I): U|a> → |a’>
• Example gates: X|a> = |1 ⊕ a>; CNOT|b>|c> = |b>|b ⊕ c>
[Figure: a quantum circuit on qubits |a>, |b>, |c> with time running left to right, mapped onto quantum hardware in which ions serve as qubits and lasers implement gates.]
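The gate definitions on this slide can be checked with a few lines of code. The sketch below is illustrative only: the function names and the nested-list matrix representation are assumptions made for this example, not part of the simulation tool described in the talk.

```python
# Basis-state action of the two gates named on the slide.
def x_gate(a):
    """NOT gate: |a> -> |1 XOR a>."""
    return 1 ^ a


def cnot(b, c):
    """Controlled-NOT: |b>|c> -> |b>|b XOR c> (b is the control)."""
    return b, b ^ c


def is_unitary(u, tol=1e-12):
    """Check U U^dagger = I for a square matrix given as nested lists."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of U U^dagger is sum_k U[i][k] * conj(U[j][k]).
            entry = sum(u[i][k] * complex(u[j][k]).conjugate() for k in range(n))
            expected = 1.0 if i == j else 0.0
            if abs(entry - expected) > tol:
                return False
    return True


# X in the basis (|0>, |1>); CNOT in the basis (|00>, |01>, |10>, |11>).
X = [[0, 1], [1, 0]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

if __name__ == "__main__":
    for b in (0, 1):
        for c in (0, 1):
            out_b, out_c = cnot(b, c)
            print(f"CNOT|{b}{c}> = |{out_b}{out_c}>")
    print("X unitary:", is_unitary(X), "| CNOT unitary:", is_unitary(CNOT))
```

Both gates only permute basis states, which is why their matrices pass the unitarity check.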

Quick Introduction to Quantum Computing (2)
• What makes quantum computers interesting (non-conventional)
• Superposition of two states: a|0> + b|1>, with |a|^2 + |b|^2 = 1
• Entanglement between qubits: a|00> + b|11>
• What makes quantum computers more powerful
• Phase gates: a|0> + e^(iπ/2) b|1>
• Amplitudes (a, b) are complex (e.g. a|0> - b|1>)
• Quantum speedup:
• Amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem
• Universal quantum computation
• Clifford gates {H, CNOT, X, Z, S} are insufficient for arbitrary quantum computation
• H: Hadamard; S: π/2 phase-shift gate; Z: π phase-shift gate; CNOT: controlled-NOT
• Adding the T gate (π/4 phase-shift gate) or the Toffoli gate (controlled-CNOT) makes the set universal
• Toffoli and T gates are double-edged swords
• Practical quantum speedup
• Practically resource consuming

Quantum Computing in a Nutshell
• Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers:
• Shor’s integer factorization algorithm (exponential speedup)
• Practically, quantum device components (qubits, gates) are far noisier and less reliable than their classical counterparts
• Mean time to failure: classical ~10^7 to 10^8 hours; quantum ~seconds to minutes
• Gate failure probability p = 10^-3: 1 in 1,000 quantum gates fails
• Error correction (redundancy) is needed to protect quantum information
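To see why a gate failure probability of 10^-3 forces error correction, it helps to estimate how long an unprotected circuit can run. The sketch below assumes independent gate failures and treats any single failure as fatal; both are simplifying assumptions for this example, and the function names are hypothetical.

```python
import math


def circuit_success_probability(p_gate, n_gates):
    """Probability that none of n_gates independent gates fails."""
    return (1.0 - p_gate) ** n_gates


def max_gates_for_success(p_gate, target_success):
    """Largest gate count whose success probability stays >= target_success."""
    return math.floor(math.log(target_success) / math.log(1.0 - p_gate))


if __name__ == "__main__":
    p = 1e-3  # failure probability per gate, as quoted on the slide
    n = max_gates_for_success(p, 0.5)
    print(f"At p = {p}, success drops below 50% after about {n} gates")
    print(f"Success probability of a 10,000-gate circuit: "
          f"{circuit_success_probability(p, 10_000):.2e}")
```

Under these assumptions an unprotected circuit is limited to several hundred gates, far short of what factoring requires.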

Example: Fault Tolerant 3-qubit (Toffoli) Gate
[Figure: an unprotected quantum gate (Toffoli: |x>|y>|z> → |x>|y>|z ⊕ xy>) beside its fault-tolerant version acting on encoded logical qubits |x>_L, |y>_L, |z>_L.]
• LOGICAL QUBIT: each data qubit is encoded, e.g. with the Steane [[7,1,3]] code
• ANCILLA QUBITS: used for parity checks (4-qubit cat states, later decoded) and for a special entangled qubit state
• Encoding, error correction and recovery surround the gate
A large number of additional qubits and gates reduces the effective noise level from O(p) to O(p^2)

Multiple Layers of Encoding in [[7,1,3]] code
                                    No encoding      Single-layer (L1)      Two-layer (L2)
Qubits                              1                7                      7^2 = 49
Noise level, failure prob. (p)      p [e.g. 10^-7]   O(p^2) [e.g. 10^-10]   O(p^4) [e.g. 10^-16]
Clifford gates {H, X, Z, CNOT}      1                7                      7^2 = 49
Non-Clifford gates {e.g. Toffoli}   1                O(10^3)                O(10^5)
Ancilla qubits                      0                O(10^1)                O(10^2)
Good news: the gain in reliability (10^6x ↓ in noise) outweighs the qubit and gate overhead (10^2 to 10^3x ↑)
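The scaling on this slide (7 qubits per layer, noise squared per layer) can be sketched numerically. The recursion p → c·p^2 and the constant c = 10^4 are assumptions made for this example; c was picked only because it maps the slide's p = 10^-7 to its quoted 10^-16 after two layers, and it is not a value taken from the thesis.

```python
def physical_qubits_per_logical(levels, block_size=7):
    """Physical qubits per logical qubit after `levels` layers of [[7,1,3]] encoding."""
    return block_size ** levels


def logical_failure_probability(p, levels, c=1e4):
    """Apply p -> c * p^2 once per encoding layer (c is an assumed constant)."""
    for _ in range(levels):
        p = c * p * p
    return p


if __name__ == "__main__":
    for level in range(3):
        qubits = physical_qubits_per_logical(level)
        p_fail = logical_failure_probability(1e-7, level)
        print(f"L{level}: {qubits} qubits per logical qubit, failure prob. ~{p_fail:.0e}")
```

With these inputs each layer multiplies the qubit count by 7 while the failure probability falls by about six orders of magnitude between L1 and L2.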

Fundamental Research Question
To estimate:
How reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale
(e.g., 2,048-bit factorization)

Answer depends on …
• Quantum application (e.g. Shor’s algorithm)
• Compilation of the quantum application into fault-tolerant gates (e.g. gate decomposition methods)
• Fault-tolerance overhead (e.g. error-correcting codes)
• Integration of qubits and gates on hardware (e.g. trapped ions)
[Diagram labels: research progress (theory), research progress (experiment)]
A precise estimate needs information about the quantum hardware

Fundamental Research Question
How reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale
(e.g., 2,048-bit factorization)
The answer depends heavily on the architecture of the quantum hardware

Impact of Hardware Assumptions on the Speed of a Quantum Computer
(Included with permission of rdv, TDL, KMI, quant-ph/0507023)
[Figure: classical vs. quantum running time for 1,024-bit factorization; annotations "Hours -> Days" and "Days -> Years".]
Architecture assumption matters!

Question
• Why think about the architecture of a large quantum computer when we do not yet know exactly how to build a small one?

Answer…
• Architecture can
1. Compensate for technology limitations (memory hierarchy, multi-core designs)
2. Reveal performance-limiting factors
3. Guide future advances in technology
• Example from history
[Diagram: discrete transistors (slower, unreliable computers) → integrated circuits (fast and reliable computers); MOORE’S LAW]

Research Methodology
• Need to define the mechanism by which a very large number of qubits are
• Allocated
• Operated
• Protected
• Connected
in a realistically constructible quantum computer system
• Need a method to efficiently
• Model the quantum computer architecture
• Map a quantum application onto the architecture
• Evaluate the performance-limiting factors
Crucial components of my research: quantum computer architecture, communication channel, performance simulation

Tool: Taxonomy of Important Terms
• Device Parameters (DPs)
• e.g. physical gate times, failure probability
• Resource Investment
• e.g. Total physical qubits used in the system
• Architecture Parameters
• Functional Allocation (Data, Ancilla) and Connectivity of qubits
• Performance Metrics
• e.g. total execution time (TEXEC) and failure probability (PFAIL), the probability that the quantum circuit gives an incorrect output
Design Space

Research Methodology
Quantum Circuit (Quantum Adder)
Quantum Hardware (e.g., Trapped-ion)
Quantum Architecture (MUSIQC)
Mapping
(Qubits -> physical
resources)
Scheduling
(Gates -> Sequence of
physical operations)
Performance Analysis
(Latency, Reliability)

Papers/Publications
• Performance Simulator Based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013)
• Optimization of a Quantum Computer Architecture Using Resource
Performance Simulator (DATE 2015)
• Designing Million-Qubit Quantum Computer Using Resource
Performance Simulator [In Submission (2nd Attempt)]
Challenge: Target Community Mostly Unfamiliar with Quantum Computing

Philosophy of Performance Simulation Toolset(1)
[Diagram: a quantum circuit, device parameters and architecture parameters feed into resource overhead and performance evaluation.]
Prior work:
• Svore et al. (2004)
• Balensiefer et al. (2005)
• Whitney et al. (2007)
• Dousti and Pedram (2012)
Labels on the prior-work flow: FIXED; LESS FLEXIBLE (twice); INSUFFICIENT INSIGHT

Philosophy of Performance Simulation Toolset (2)
[Diagram: the same flow (quantum circuit, device parameters and architecture parameters feeding resource overhead and performance evaluation).]
My contribution: a fine-tuning knob and a magnifying glass added to this flow

Philosophy of Performance Simulation Toolset (2)
[Diagram: the same flow with the knob and magnifying glass (my contribution), now closed into a loop: while the desired performance is not reached ("No"), the parameters are adjusted and the evaluation is repeated.]
Questions the loop answers: how reliable must the qubits be, and how many qubits are needed?

What We Have Learned So Far…
• Question: when is a quantum computer practically faster than a classical computer?
• What quality and amount of resources are needed?
• Answer: it depends on a study of the quantum computer architecture
• Performance simulation tool for architecture study
• Quantum computer design cycle
• Flexible tool to balance improvement in device parameters against investment in resources and architecture

Quantum Gate and Hardware Model
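[Figure: trapped-ion hardware. Ions (qubits) are held above electrodes and driven by lasers (gates); a ballistic shuttling channel moves ions; photons from the ions pass through an optical switch and a beam splitter to photon detectors, producing an entangled pair (EPR pair). Video credit: Jason Amini]
Quantum gates on quantum bits (qubits):
• Single-qubit gate: |x> → U|x>
• CNOT: |x>|y> → |x>|x ⊕ y>
• Toffoli: |x>|y>|z> → |x>|y>|z ⊕ xy>
• Measurement: M|x> → {+1, -1}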

Fault-tolerant and Scalable Quantum
Computer Architecture
[Figure: planar ion traps form basic ion-trap cells containing different qubit blocks, empty channels for ballistic shuttling and a communication port. Cells are grouped into segments; segments are joined by photonic links through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]
Architecture idea: combine the strengths of
• IONS: reliable storage and computation
• PHOTONS: communication

Hardware Description and Device Parameters
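Device Parameters (DPs)
[Figure: hardware operations labelled U, M and I, annotated with their device parameters.]
Time cost of 2^L x 5000 μs at level L:
• 5,000 μs @ L = 0
• 10,000 μs @ L = 1
• 20,000 μs @ L = 2
Speed and reliability of computation > speed and reliability of communication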
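The 2^L x 5000 μs rule quoted on this slide is easy to tabulate. In the sketch below the function name is hypothetical, and the slide text does not say what L indexes, so the code treats it simply as a non-negative integer level.

```python
BASE_LATENCY_US = 5000  # from the slide: 5000 microseconds at L = 0


def latency_us(level):
    """Return 2^L x 5000 microseconds for a non-negative integer level L."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return (2 ** level) * BASE_LATENCY_US


if __name__ == "__main__":
    for level in range(3):
        print(f"L = {level}: {latency_us(level):,} microseconds")
```

Each additional level doubles the time cost, which is consistent with communication being slower than local computation.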

Shor’s Integer Factorization Algorithm Circuit
For an n-bit integer N, pick a < N with GCD(a, N) = 1
• The period r of a^x mod N is hidden in the eigenvalues of U(x) = a^x mod N
• Factors follow from (a^(r/2) - 1)(a^(r/2) + 1) ≡ 0 (mod N)
• Classical complexity: exponential in n
• Quantum complexity: polynomial, O(n^3)
[Circuit: an m-qubit register (m = 2n) initialized to |0> controls U, U^2, U^4, …, U^(2^(m-1)) acting on an n-qubit register, followed by the inverse quantum Fourier transform and Z-basis measurements (MZ).]
Controlled modular exponentiation, U(x) = a^x mod N (the bulk of Shor’s algorithm):
• Contains O(n^2) adder calls: 512-bit → ~1 million adders; 1,024-bit → ~4 million adders; 2,048-bit → ~16 million adders
Inverse quantum Fourier transform:
• Contains O(n^2) small-angle phase gates R_z(π/2^(n+1))
• Depth = O(n)
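Two pieces of arithmetic on this slide can be reproduced classically: the adder-call counts and the way a period yields factors. In the sketch below, the constant 4 in `adder_calls` is an assumption inferred from the slide's three data points (it is not stated in the source), and `find_period` is a brute-force classical stand-in for the quantum period-finding step, usable only for tiny N.

```python
from math import gcd


def adder_calls(n_bits, constant=4):
    """Approximate adder calls for n-bit modular exponentiation: constant * n^2."""
    return constant * n_bits ** 2


def find_period(a, n):
    """Smallest r > 0 with a^r = 1 (mod n), found by brute force."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factors_from_period(a, n):
    """Return (gcd(a^(r/2) - 1, n), gcd(a^(r/2) + 1, n)), or None if a is unlucky."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None  # odd period: pick another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # a^(r/2) = -1 (mod n): only trivial factors
    return gcd(half - 1, n), gcd(half + 1, n)


if __name__ == "__main__":
    for bits in (512, 1024, 2048):
        print(f"{bits}-bit: about {adder_calls(bits):,} adder calls")
    print("Factors of 15 using a = 7:", factors_from_period(7, 15))
```

For N = 15 and a = 7 the period is 4, so a^(r/2) = 4 (mod 15) and the two GCDs give 3 and 5.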

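Benchmark Circuits (Approx. QFT)
[Circuit: approximate QFT built from controlled rotations R_z(π/2^1), R_z(π/2^2), R_z(π/2^3), R_z(π/2^4), …, R_z(π/2^k).]
• Each R_z(π/2^k) is approximated by a sequence of H, T, T’ and Z gates (H T’ H T Z … T)
• ~375 gates (~150 T gates) for approximation accuracy within 10^-16
• R_z(π/2^8) is sufficient to factorize integers > 4,096 bits long
• References: V. Kliuchnikov et al. (2013); Fowler et al. (2005)
• T gates are implemented with magic states:
• Magic state preparation (latency: 78 ms): |0>_L, 7-qubit cat state, decode, measure → T|+>
• Data teleportation into magic state (latency: 12 ms): |Data>, T|+> → T|Data>
[Timeline: ancilla blocks A1, A2, A3, … prepare magic states over time; the execution delay of |Data> is marked.]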

Benchmark Circuits (Quantum Adders)
• Quantum Carry Look-Ahead Adder (QCLA), log-depth: Draper et al. (2004)
• Quantum Ripple-Carry Adder (QRCA), linear-depth: Cuccaro et al. (2004)
• Toffoli gate: |x>|y>|z> → |x>|y>|z ⊕ xy>, implemented by magic state preparation followed by data injection into the magic state

Non-Local Toffoli Gate
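[Figure: a Toffoli gate |x>|y>|z> → |x>|y>|z ⊕ xy> whose operand qubits sit in different segments. Segment-1 and Segment-2 share an entangled pair |00> + |11> (qubits e1, e2); quantum teleportation over this pair brings the operands together, and local quantum operations complete the gate.]
Qubit types: DATA qubits, ancilla qubits, communication (COMM) qubits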

Fault-tolerant and Scalable Quantum
Computer Architecture
[Figure: each segment is built from planar ion traps (basic ion-trap cells) arranged into data tiles (data qubit block), ancilla tiles (ancilla qubit block) and comm. tiles (COMM. qubit block), with empty channels for ballistic shuttling; segments are connected by optical links.]
Architecture parameters list:
• NSeg: number of segments
• NData: data tiles per segment
• NComm: comm. tiles per segment
• NAnc: total ancilla tiles
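The four architecture parameters can be captured in a small record type. The sketch below is a hypothetical data structure, not the tool's actual input format; the tile total follows from the slide's definitions (NData and NComm are per segment, NAnc is a machine-wide total), and the example values are placeholders rather than results from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureParameters:
    n_seg: int   # NSeg: number of segments
    n_data: int  # NData: data tiles per segment
    n_comm: int  # NComm: communication tiles per segment
    n_anc: int   # NAnc: total ancilla tiles in the machine

    def __post_init__(self):
        for name in ("n_seg", "n_data", "n_comm", "n_anc"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    def total_tiles(self):
        """Per-segment tiles times the number of segments, plus all ancilla tiles."""
        return self.n_seg * (self.n_data + self.n_comm) + self.n_anc


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the arithmetic.
    arch = ArchitectureParameters(n_seg=4, n_data=10, n_comm=1, n_anc=20)
    print(arch, "->", arch.total_tiles(), "tiles")
```

Freezing the dataclass keeps a configuration immutable, which is convenient when many configurations are compared in a design-space sweep.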

Toolbox Description(1)
Inputs: device parameters (DPs), architecture parameters
Tile Designer and Performance Analyzer (TDPA)
• Fault-tolerant circuit generator
• Low-level mapper, low-level scheduler, low-level error analyzer
• Tile designer and tile database
Application-Level Designer and Performance Analyzer (ADPA)
• Quantum application circuit generator
• High-level mapper, high-level scheduler, high-level error analyzer
• Visualizer and performance metrics decomposer
Output: failure probability, latency, resource count
FLEXIBILITY (critical tool components): do not assume fixed device or architecture parameter values
DEEPER INSIGHTS: advanced output analysis components

Toolbox Description (2)
TDPA (input: error correction circuit; output: tile performance, TEXEC and PFAIL)
• Low-level mapper: physical qubits to ions
• Low-level scheduler: physical gates to ion manipulation and movement
• Low-level error analyzer: calculates logical failure probability from physical failure probability
ADPA (inputs: Quantum Carry Look-Ahead Adder, MUSIQC architecture, tile performance)
• High-level mapper: logical qubits to tiles
• High-level scheduler: logical gates to tile operations and cross-segment movement
• High-level error analyzer: calculates application-level failure probability

Tool and Analysis Method
[Figure: a quantum circuit on qubits q1 to q10 is grouped by the HIGH-LEVEL MAPPER onto tiles T1 to T4 and then ordered by the HIGH-LEVEL SCHEDULER.]
Inputs: quantum circuit, device parameters, architecture definition, pre-computed tile performance (TDPA)
Outputs: PERFORMANCE METRICS, DECOMPOSER/VISUALIZER

Main Algorithms/Heuristics
• Mapper:
• Goal: turn circuit-level connectivity into hardware-level proximity
• Algorithm: polynomial-time graph-theoretic algorithm for the Optimal Linear Arrangement problem
• Scheduler:
• Goal: reduce execution time; insert error correction to minimize failure probability
• Algorithm: [greedy] dispatch a gate for execution AS SOON AS resources become available
• Error Analyzer:
• Goal: evaluate PFAIL by fully counting logical error events
• Algorithm: O(n^3) fault-path counting
Details in the Thesis
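The greedy rule named for the Scheduler (dispatch a gate as soon as resources become available) can be illustrated with a small event-driven loop. This is a minimal sketch under stated assumptions, not the thesis scheduler: the only resource modelled is a pool of ancilla units, error-correction insertion and cross-segment communication are left out, and the gate table format is hypothetical.

```python
import heapq


def greedy_schedule(gates, n_ancilla):
    """Greedy list scheduling.

    gates: dict mapping name -> (duration, ancilla_needed, [predecessor names]).
    Returns a dict mapping name -> (start_time, finish_time).
    """
    for name, (_, need, _) in gates.items():
        if need > n_ancilla:
            raise ValueError(f"gate {name} needs more ancilla than exist")
    done = set()
    schedule = {}
    running = []  # min-heap of (finish_time, gate_name)
    free = n_ancilla
    now = 0.0
    while len(done) < len(gates):
        # Dispatch every ready gate that fits into the free ancilla pool.
        for name, (duration, need, deps) in gates.items():
            if name in schedule:
                continue
            if all(dep in done for dep in deps) and need <= free:
                schedule[name] = (now, now + duration)
                heapq.heappush(running, (now + duration, name))
                free -= need
        if not running:
            raise ValueError("dependency cycle or unknown predecessor")
        # Advance time to the next completion and release its ancilla.
        now, finished = heapq.heappop(running)
        done.add(finished)
        free += gates[finished][1]
    return schedule


if __name__ == "__main__":
    example = {"A": (2, 1, []), "B": (3, 1, []), "C": (1, 1, ["A", "B"])}
    for pool in (1, 2):
        result = greedy_schedule(example, pool)
        makespan = max(finish for _, finish in result.values())
        print(f"{pool} ancilla unit(s): makespan {makespan}, schedule {result}")
```

In this toy circuit the second ancilla unit lets A and B overlap, shortening the total time, which is the kind of resource-versus-latency trade-off the tool is meant to expose.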

Validity/Optimality of Scheduler
• Correctness:
• Detailed analysis of the complete schedule (smaller circuits)
• Overall validation using the Visualizer output (larger circuits)
• Optimality:
• Compare the TEXEC of a circuit with and without resource constraints
• An X-fold ↑ in resources should yield an X-fold (or more) ↓ in TEXEC
• TEXEC with constraints approaches TEXEC with no constraints
[Plot: 64-bit Quantum Carry Look-Ahead Adder circuit, showing a fewer-resource regime and a sufficient-resource regime.]
The plot shows that the Mapper and Scheduler are ‘good’ at
1. Working with fewer resources
2. Achieving optimal performance with sufficient resources

Demo
[Figure: circuit to be scheduled, with qubits A1, B1, C1 in Segment S1 and A2, B2, C2 in Segment S2; one gate requires cross-segment swapping. In each segment the schedule alternates magic state preparation with data teleportation into the magic state.]
Breakdown of TEXEC along the critical path (over time):
• Magic state preparation overhead
• Toffoli gate execution
• CNOT gate execution
• Cross-segment swapping overhead

Demo Visualization
[Figure: the adder circuit (QCLA) drawn as qubits vs. time steps, beside the scheduled adder circuit (QCLA) drawn as segment ID number vs. execution time (seconds).]
• Horizontal lines: delays due to too few ancilla qubits
• Non-horizontal lines: delays due to too few comm. qubits

What We Have Learned So Far …
• Benchmark Circuits
• Approximate Quantum Fourier Transform (AQFT: Long sequence of Approx. gates)
• Quantum Ripple Carry Adder (QRCA: nearest-neighbor gates)
• Quantum Carry Look-Ahead Adder (QCLA: Highly Parallelizable circuit, long-distance
gates)
• Performance-Simulation Tool
• Mapper, Scheduler, Error Analyzer (Standard components)
• Can work with varying device and architecture parameters
• Advanced components:
• Visualizer
• Performance metrics Decomposer
• Validation and optimality of Mapper, Scheduler

Remember this picture…
[Figure: a segment built from planar ion traps (basic ion-trap cells) arranged into DATA tiles, ANCILLA tiles and COMM tiles, with empty channels for ballistic shuttling; segments are connected by optical links.]
Architecture parameters list:
• NSeg: number of segments
• NData: data tiles per segment
• NComm: comm. tiles per segment
• NAnc: total ancilla tiles

First Set of Simulations (Setup)
Goal: study performance-limiting architecture and device parameters
• Benchmark circuit
• 1,024-bit QCLA
• Error correction (Steane [[7,1,3]] code)
• Two layers of concatenation (L1, L2)
• L2 error correction after each gate
• Qubits sitting idle for long enough: perform L2 error correction only (no L1 error correction)
• Qubits can decohere for ~4.8 ms
• Computation
• CNOT (local, non-local)
• TOFFOLI (local)
• Toffoli can execute in any segment
Pre-computed tile performance numbers:
• L2 Toffoli: ~50 ms, O(10^-14)
• L2 cross-segment teleportation: ~10 ms, O(10^-11)

Labelling the Architecture Space
TEXEC as function of Architecture Parameters
Large Segments, TEXEC depends on Ancilla
(Less Distributed System)
Small Segments, TEXEC depends on Comm. Qubits
(Highly Distributed System)

Optimal Architecture Selection
(Minimizing TEXEC - Qubits product)
Ancilla-regime architecture: minimum exec. time-qubit product for (NSeg = 4, NAnc = 1362, NComm = 1)
Tel-regime architecture: minimum exec. time-qubit product for (NSeg = 16, NAnc = 1362, NComm = 6)

Reducing PFAIL using Device Parameters (DP)
Improving a DP: 10x better qubit memory → ~100x to 1000x reduction in failure probability
4 million adder calls need a per-adder failure probability << 2.5 x 10^-7 to run 1,024-bit modular exponentiation (Shor’s algorithm)
[Plots: ancilla-regime optimal architecture; tel-regime optimal architecture]

Second Set of Simulations (Setup)
Goal: Study Performance-Scaling of Circuits
• QCLA, QRCA and AQFT
• Two layers of concatenation (TEXEC at L1 < TEXEC at L2)
• Perform L2 Error Correction
• Else, Perform L1 Error-Correction
• Qubits can decohere for ~ 0.6ms
• Computation
• TOFFOLI, T gates (Local)
• Execute TOFFOLI, T gates in
Computational Segment (CS)
Architecture Configuration:
[#Data Tiles, #Ancilla Tiles, #Comm. Tiles], #CS

Resource-Performance Scalability of MUSIQC
Architecture
[Plots: QCLA, QRCA, AQFT]
• Requirement (QCLA): TEXEC scales logarithmically with problem size and resources
• Requirement (QRCA, AQFT): TEXEC scales linearly with problem size and resources
• PFAIL scales linearly with problem size and resources
The MUSIQC architecture can maintain ‘desirable’ performance per resource as problem size increases

TEXEC as function of (#Data, #Ancilla, #Comm. Qubits), #CS
[Plots: 1,024-bit QCLA, 1,024-bit QRCA, 1,024-bit AQFT]
Resource consumption of the benchmark circuits:
Circuit   NCS/Ancilla qubits   Comm. qubits   Overall resource
QCLA      High                 High           High
QRCA      Low                  Modest         Modest
AQFT      Low                  None           Low

TEXEC as function of Problem Size
(Total Qubits < 1.5 million qubits)
Optimized architecture configurations (up to 1,024-bit circuits), given as [Data, Ancilla, Comm.], #CS:
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
The 2,048-bit QCLA adder does not fit well into a 1.5-million-qubit machine
• QCLA configurations compared in the plot: [30, 8, 5], 137 and [48, 4, 2], 25
• ~90x ↓ in ancilla and comm. per data tile

Running 2,048-bit Shor’s algorithm
in < 5 months
• 2,048-bit Shor’s algorithm ≈ 2,048-bit Modular Exponentiation Circuit
• 2,048-bit Modular Exponentiation → 16 million calls to Adder
• TEXEC of each Adder < 0.8 sec
• Choose QCLA: [Current TEXEC = 260 sec]
• 2x ↑ Resource Budget → TEXEC = 2.76 sec (96x reduction)
• 10x ↓ Physical Op. Time (CNOT, Measurement, Shuttling Time) → TEXEC = 0.69
sec (~4x reduction)
Resources can be more important than gate speed for decreasing TEXEC
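The per-adder time budget on this slide follows from dividing the target runtime by the number of adder calls. The sketch below takes five months as 150 days, which is an assumption made for this example, and uses hypothetical function names.

```python
SECONDS_PER_DAY = 86_400


def per_adder_budget_s(budget_days, n_adder_calls):
    """Time available to each adder call if the whole run must fit in budget_days."""
    return budget_days * SECONDS_PER_DAY / n_adder_calls


def runtime_days(t_adder_s, n_adder_calls):
    """Total runtime in days when every adder call takes t_adder_s seconds."""
    return t_adder_s * n_adder_calls / SECONDS_PER_DAY


if __name__ == "__main__":
    calls = 16_000_000  # adder calls for 2,048-bit modular exponentiation
    print(f"Budget per adder: {per_adder_budget_s(150, calls):.2f} s")
    for t_exec in (260, 2.76, 0.69):  # QCLA TEXEC values quoted on the slide
        print(f"TEXEC = {t_exec} s -> {runtime_days(t_exec, calls):,.0f} days")
```

Only the last of the three quoted TEXEC values brings the adder part of the run under the five-month target.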

Reducing PFAIL using Device Parameters
• To run 2,048-bit modular exponentiation we need 16 million calls to the adder
• Adder PFAIL = 2.77 x 10^-7 needs to be << 1/(16 x 10^6) = 6.25 x 10^-8
• 10x decrease in entanglement infidelity → 100x reduction in PFAIL: 2.37 x 10^-9 << 6.25 x 10^-8
Breakdown of PFAIL by noise source:
                            (a) EPR pair infidelity 10^-4   (b) EPR pair infidelity 10^-5
PFAIL                       2.77 x 10^-7                    2.37 x 10^-9
Teleportation noise (TEL)   99.6%                           1.9%
Gate noise (GATE)           0.31%                           75.3%
Memory noise (MEM)          0.08%                           22.3%
Shuttling noise (SHUT)      0.01%                           0.55%
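The percentage breakdowns can be converted into absolute contributions and compared against the per-adder target. The sketch below uses only the figures quoted on this slide; the function names are hypothetical, and the target is taken as 1 divided by the number of adder calls, which evaluates to 6.25 x 10^-8 for 16 million calls.

```python
def absolute_contributions(p_fail, shares_percent):
    """Split a total failure probability into per-source contributions."""
    return {source: p_fail * pct / 100.0 for source, pct in shares_percent.items()}


def dominant_source(shares_percent):
    """Noise source with the largest share of the failure probability."""
    return max(shares_percent, key=shares_percent.get)


def adder_pfail_target(n_adder_calls):
    """Per-adder failure probability at which the expected number of failures is 1."""
    return 1.0 / n_adder_calls


# (total PFAIL, percentage shares) for the two pie charts on the slide.
CASE_A = (2.77e-7, {"TEL": 99.6, "MEM": 0.08, "SHUT": 0.01, "GATE": 0.31})
CASE_B = (2.37e-9, {"TEL": 1.9, "SHUT": 0.55, "MEM": 22.3, "GATE": 75.3})

if __name__ == "__main__":
    target = adder_pfail_target(16_000_000)
    for label, (p_fail, shares) in (("a", CASE_A), ("b", CASE_B)):
        parts = absolute_contributions(p_fail, shares)
        print(f"({label}) PFAIL = {p_fail:.2e}, dominant: {dominant_source(shares)}, "
              f"below target {target:.2e}: {p_fail < target}")
        for source, value in sorted(parts.items(), key=lambda kv: -kv[1]):
            print(f"    {source}: {value:.2e}")
```

Once teleportation noise stops dominating, gate and memory noise become the leading terms, so further gains would have to come from those device parameters.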

Overall Performance of 2,048-bit Shor’s Algorithm within 3 x 10^6 Qubits
• QCLA (TEXEC = 0.68 sec, PFAIL = 2.37 x 10^-9), AQFT (TEXEC = 1 day, PFAIL = 6 x 10^-5)
• Total execution time: 16 x 10^6 x 0.68 sec + 86,400 sec ≈ 128 days
• Total failure probability: 16 x 10^6 x 2.37 x 10^-9 + 6 x 10^-5 ≈ 0.04
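The two totals above combine per-component figures in a simple way, sketched below. Summing failure probabilities is a first-order (union bound) estimate that is reasonable only while the result stays small; that reading, and the function names, are assumptions of this example.

```python
SECONDS_PER_DAY = 86_400


def total_runtime_days(n_adder_calls, t_adder_s, t_aqft_s):
    """Sequential adder calls followed by one AQFT, converted to days."""
    return (n_adder_calls * t_adder_s + t_aqft_s) / SECONDS_PER_DAY


def total_failure_probability(n_adder_calls, p_adder, p_aqft):
    """First-order estimate: sum of the individual failure probabilities."""
    return n_adder_calls * p_adder + p_aqft


if __name__ == "__main__":
    calls = 16_000_000
    days = total_runtime_days(calls, t_adder_s=0.68, t_aqft_s=SECONDS_PER_DAY)
    p_fail = total_failure_probability(calls, p_adder=2.37e-9, p_aqft=6e-5)
    print(f"Total execution time: {days:.1f} days")
    print(f"Total failure probability: {p_fail:.4f}")
```

With the quoted inputs the run takes roughly four months and fails with probability near 0.04, so a handful of repetitions would be expected to give a correct answer.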

Summary of My Results (1)
Studying the architecture space for a variety of benchmarks of varying sizes
• Architecture space exploration
• Overall performance of a fully error-corrected circuit can be bottlenecked by
• Non-Clifford ancilla qubits
• Long-distance communication qubits
• Performance scalability of the architecture
• Resource allocation should meet the computational workload of the benchmark application circuit

Summary of My Results (2)
• Reliability
• Overall PFAIL can be limited either by memory coherence time or by the fidelity of the communication channel (depending on the error-correction strategy)
• Speed
• For overall TEXEC, the resource budget (#qubits) can be more crucial than speeding up the quantum gates
• Using 3 million qubits, each failing with probability 10^-7 per operation, a 2,048-bit number can be factored in less than five months.

Future Work
• Sophisticated Models of Physical Components
• Cross-talk, loss of qubits etc.
• Quantum Technologies
• Superconductors
• Quantum dots
• Error-Correcting Codes and Noise Models
• Topological quantum codes
• Exploiting X/Z error asymmetry
• Correlated errors
Interesting Avenues for Future Research

First Set of Simulations (Results[2])
PFAIL as function of Architecture Parameters
Ancilla-Regime, PFAIL depends on Shuttling Noise
Tel-Regime, PFAIL depends on Memory Noise, Teleportation Noise

More Answers…
• Architectures can compensate for the technology limitations
• Memory/Cache-Hierarchy
• Hides the slowness of dynamic RAM,
Secondary memory devices
• Multi-core designs
• Reduce power dissipation without compromising
the performance

Research in Classical Architecture
• Classical Era (lasted until 90s)
• Human intuition and experience for searching the computer design space
• Modern Era
• Software tools find good computer design. [Todd Austin’s SIMPLESCALAR
Toolset]
The idea of performance Simulation Toolset for Quantum Computers

Fault tolerance and resource overhead
• Encode 1 logical qubit in a 7-physical-qubit error-correcting code (e.g., Steane [[7,1,3]])
• Constructs DATA QUBITS
• Perform error correction regularly on encoded qubits to remove errors
• Requires ANCILLA QUBITS to perform parity checks
• Apply fault-tolerant gates on encoded (logical) qubits
• Generally requires ANCILLA QUBITS for correct operation and error prevention
Noise level reduces from O(p) to O(p^2) with each layer of encoding

(TEXEC, PFAIL) as function of Gate Speed, Resources
[Figure: three panels plotting failure probability (log scale, 1.0E-8 to 1.0E-4) against execution time (0 to 1200 sec) for QRCA 2048 and QCLA 2048, with panels labelled "Gate Speed: 10x ↑" and "Resources: 2x ↑".]
• Resources can be more important than gate speed for decreasing TEXEC
• PFAIL doesn’t decrease appreciably by adding more resources
• 100x ↓ in TEXEC for QCLA
• 2,048-bit QCLA: TEXEC = 0.69 s, PFAIL = 2.77 x 10^-7 with 2x ↑ resources and 10x ↓ gate times

Quick Introduction to Quantum Computing
• A quantum computer consists of
• Qubits: store information (e.g. energy levels of a trapped 171Yb+ ion)
• Gates: process information (e.g. lasers that drive transitions between energy levels)
[Figure: a quantum circuit on qubits |x>, |y>, |z> containing a Toffoli gate (|x>|y>|z> → |x>|y>|z ⊕ xy>), with time running left to right, mapped onto quantum hardware in which ions serve as qubits and lasers implement gates.]

TEXEC as function of Problem Size
(Total Qubits < 1.5 million qubits)
Optimal architecture configurations (up to 1,024-bit circuits), given as [Data, Ancilla, Comm.], #CS:
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
The 2,048-bit QCLA adder does not fit well into a 1.5-million-qubit machine. Breakdown for the two QCLA configurations shown:
• QCLA: [48, 4, 2], 25: ancilla prep. 10%, comm. 85%, gate 4%
• QCLA: [30, 8, 5], 137: ancilla prep. 38%, comm. 29%, gate 33%

Second Set of Simulations (Setup)
Goal: Study Performance-Scaling of Circuits
• QCLA, QRCA and AQFT
• Two layers of concatenation (T_L2 = 70 T_L1)
• Perform L2 Error Correction
• Else, Perform L1 Error-Correction
• Physical qubits can decohere for < 0.6ms
• Computation
• TOFFOLI, T gates (Local)
• Execute TOFFOLI, T gates in Computational Segment (CS)
Architecture configuration: (#Data, #Ancilla, #Comm. (qubits)), #CS
Pre-computed tile performance numbers:
• L2 Toffoli: ~51 ms, O(10^-14)
• L2 cross-segment teleportation: ~5 ms, O(10^-11)
