Architecture Framework for
Trapped Ion Quantum Computer
Muhammad Ahsan
[Ph.D. Defense]
Department of Computer Science, Duke University
Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results
My Research Background
• Computer Architecture
• Quantum Computing
• Fault Tolerance
• Quantum Error Correction
• Quantum Computer Architecture
• Resource Performance Estimation Tool
Topics of the defense
Some Interesting
findings!
Computer Systems
Undergraduate
Ph.D. Research
Computer Science, Electrical Engineering
Physics
Quick Introduction to Quantum Computing (1)
• Quantum Computer consists of
• Quantum bit (qubit): stores information in binary basis states |0>, |1> (e.g., energy levels of a trapped 171Yb+ ion)
• Gates: process information (e.g., lasers that cause transitions between energy levels)
• X gate: X|a> = |1 ⊕ a>
• CNOT gate: |b>|c> → |b>|b ⊕ c>
• Gates are unitary operations (UU† = I): U|a> → |a'>
[Figure: a quantum circuit (qubits |a>, |b>, |c> against time) next to the quantum hardware, where ions act as qubits and lasers act as gates]
Quick Introduction to Quantum Computing (2)
• What makes quantum computers interesting (non-conventional)
• Superposition of two states: a|0> + b|1>, with |a|^2 + |b|^2 = 1
• Entanglement between qubits: a|00> + b|11>
• What makes quantum computers more powerful
• Phase gates: a|0> + e^(iπ/2) b|1>
• Amplitudes (a, b) are complex (e.g., a|0> - b|1>)
• Quantum speedup:
• Amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem
• Universal Quantum Computation
• Clifford gates {H, CNOT, X, Z, S}: insufficient for arbitrary quantum computation
• H: Hadamard gate
• S: π/2 phase-shift gate
• Z: π phase-shift gate
• Universality needs, in addition, the T gate (π/4 phase-shift gate) or the Toffoli gate (controlled-CNOT)
• Toffoli and T gates are double-edged swords
• Practical quantum speedup
• Practically resource consuming
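The gate definitions above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides; the helper names (`matmul`, `apply`, `norm_sq`, `close`) are hypothetical. It verifies that a Hadamard gate produces a normalized superposition and that the phase gates nest as T·T = S and S·S = Z.

```python
import cmath
import math

# Single-qubit gates as 2x2 matrices (nested lists of complex numbers).
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]      # Hadamard
Z = [[1, 0], [0, -1]]                            # pi phase shift
S = [[1, 0], [0, 1j]]                            # pi/2 phase shift
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]   # pi/4 phase shift


def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(gate, state):
    """Apply a 2x2 gate to a state [a, b], meaning a|0> + b|1>."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def norm_sq(state):
    """Return |a|^2 + |b|^2, which must equal 1 for a valid state."""
    return sum(abs(x) ** 2 for x in state)


def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))


plus = apply(H, [1, 0])               # (|0> + |1>) / sqrt(2)
print("norm of H|0>:", norm_sq(plus))
print("T*T == S:", close(matmul(T, T), S))
print("S*S == Z:", close(matmul(S, S), Z))
```

The nesting T·T = S is one way to see why T is the "finer" gate that the Clifford set lacks.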
Quantum Computing in a Nutshell
• Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers:
• Shor's Integer Factorization Algorithm (exponential speedup)
• Practically, quantum device components (qubits, gates) are much noisier and less reliable than those of classical computers
Need Error Correction (Redundancy) to protect quantum information
Mean Time to Failure:
• Classical: ~10^7 – 10^8 hours
• Quantum: ~seconds – minutes
Failure Prob. p = 10^-3: 1 in 1,000 quantum gates fails
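To see why p = 10^-3 is a problem, here is a rough sketch that assumes every gate fails independently with the same probability (a simplification introduced here, not a model from the slides). The function names are hypothetical.

```python
import math


def circuit_success_probability(p_gate, n_gates):
    """Probability that none of n_gates fails, assuming independent failures."""
    return (1.0 - p_gate) ** n_gates


def max_gates_for_success(p_gate, target=0.5):
    """Largest gate count whose success probability is still >= target."""
    return int(math.log(target) / math.log(1.0 - p_gate))


p = 1e-3
print("success after 1,000 gates:", circuit_success_probability(p, 1_000))
print("gates until success drops below 50%:", max_gates_for_success(p))
```

Under this assumption an unprotected circuit is more likely to fail than succeed after only a few hundred gates, far short of the millions of operations Shor's algorithm needs.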
Example: Fault Tolerant 3-qubit (Toffoli) Gate
[Figure: an unprotected quantum gate (Toffoli on |x>, |y>, |z>) next to the fault-tolerant quantum gate on logical qubits |x>_L, |y>_L, |z>_L. The fault-tolerant version adds encoding, repeated error correction (parity checks with 4-cat ancilla states, decoding, and recovery), and a special entangled qubit state prepared on ancilla qubits.]
Large number of additional qubits and gates reduces the effective noise level from O(p) to O(p^2)
e.g., Steane [[7,1,3]] code
Multiple Layers of Encoding in [[7,1,3]] code
Values are listed as: No Encoding / Single-Layer (L1) Encoding / Two-Layer (L2) Encoding
• Qubits: 1 / 7 / 7^2 = 49
• Clifford gates {H, X, Z, CNOT}: 1 / 7 / 7^2 = 49
• Non-Clifford gates {e.g., Toffoli}: 1 / O(10^3) / O(10^5)
• Ancilla qubits: 0 / O(10^1) / O(10^2)
• Noise level, failure prob. (p): p [e.g., p = 10^-7] / O(p^2) [e.g., 10^-10] / O(p^4) [e.g., 10^-16]
Good News: Gain in Reliability (10^6x ↓ in noise) > Qubit, Gate Overhead (10^2-10^3x ↑)
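The example values in the noise-level row (10^-7, 10^-10, 10^-16) follow the simple recursion p_next = c · p^2. The sketch below assumes c = 10^4, a constant inferred here from those three example values rather than taken from the slides, and it is a simplified scaling model, not the error analysis used in the thesis.

```python
def concatenation_table(p_physical, levels, c=1e4, code_size=7):
    """Return (level, physical qubits per logical qubit, failure probability).

    Assumes each encoding layer maps p to c * p**2; c = 1e4 is an assumed
    constant chosen to match the example values on the slide.
    """
    rows = []
    p = p_physical
    for level in range(levels + 1):
        rows.append((level, code_size ** level, p))
        p = c * p * p
    return rows


for level, qubits, p in concatenation_table(1e-7, 2):
    print(f"L{level}: {qubits:>2} qubits per logical qubit, p ~ {p:.1e}")
```

Each extra layer multiplies the qubit count by 7 while squaring the (scaled) failure probability, which is why the reliability gain outpaces the overhead once p is below the threshold 1/c.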
Fundamental Research Question
To estimate how reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g., 2,048-bit factorization)
Answer depends on …
• Quantum application (e.g., Shor's algorithm)
• Compilation of the quantum application into fault-tolerant gates (e.g., gate decomposition methods)
• Fault-tolerance overhead (e.g., error-correcting codes)
• Integration of qubits and gates on hardware (e.g., trapped-ion)
Research Progress (Theory)
Research Progress (Experiment)
Precise Estimate Needs Information about the Quantum Hardware
Answer Heavily Depends on the Architecture of Quantum Hardware
Impact of Hardware Assumptions on the Speed of
Quantum Computer
(Included with permission of rdv, TDL, KMI, quant-ph/0507023)
[Figure: classical vs. quantum run time for 1,024-bit factorization under different architecture assumptions; annotations: "Hours -> Days" and "Days -> Years"]
Architecture Assumption Matters!
Question
• Why think about the architecture of a large quantum computer when we do not yet know exactly how to build a small quantum computer?
Answer…
• Architecture can
1. Compensate for technology limitations (memory hierarchy, multi-core designs)
2. Reveal performance-limiting factors
3. Guide future advances in technology
• Example from history: discrete transistors (slower, unreliable computers) → integrated circuits (IC) (fast and reliable computers, MOORE'S LAW)
Research Methodology
• Need to define the mechanism by which a very large number of qubits are
• Allocated
• Operated
• Protected
• Connected
in a realistically constructible quantum computer system
• Need a method to efficiently
• Model the quantum computer architecture
• Map the quantum application onto the architecture
• Evaluate the performance-limiting factors
Crucial components of my research: Quantum Computer Architecture and Performance Simulation
Communication Channel
Tool: Taxonomy of Important Terms
• Device Parameters (DPs)
• e.g. physical gate times, failure probability
• Resource Investment
• e.g. Total physical qubits used in the system
• Architecture Parameters
• Functional Allocation (Data, Ancilla) and Connectivity of qubits
• Performance Metrics
• e.g. Total Execution Time (TEXEC) and Failure Probability (PFAIL), the probability that the quantum circuit gives an incorrect output
Design Space
Research Methodology
Quantum Circuit (Quantum Adder)
Quantum Hardware (e.g., Trapped-ion)
Quantum Architecture (MUSIQC)
Mapping (Qubits -> physical resources)
Scheduling (Gates -> Sequence of physical operations)
Performance Analysis (Latency, Reliability)
Papers/Publications
• Performance Simulator Based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013)
• Optimization of a Quantum Computer Architecture Using Resource Performance Simulator (DATE 2015)
• Designing Million-Qubit Quantum Computer Using Resource Performance Simulator [In Submission (2nd Attempt)]
Challenge: Target Community Mostly Unfamiliar with Quantum Computing
Philosophy of Performance Simulation Toolset(1)
Prior Work:
• Svore et al. (2004)
• Balensiefer et al. (2005)
• Whitney et al. (2007)
• Dousti and Pedram (2012)
[Figure: tool flow from Device Parameters, Architecture Parameters, and Quantum Circuit to Resource Overhead and Performance Evaluation. Annotations on prior tools: "FIXED", "LESS FLEXIBLE" (twice), and "INSUFFICIENT INSIGHT".]
Philosophy of Performance Simulation Toolset (2)
[Figure: the same tool flow (Device Parameters, Architecture Parameters, Quantum Circuit -> Resource Overhead, Performance Evaluation) annotated with My Contribution: a "Fine Tuning Knob" and a "Magnifying Glass". A feedback loop repeats while the desired performance is not reached ("Desired Performance? No"), answering: Qubits how reliable? Qubits how many?]
What We Have Learned So Far…
• Question: Can a quantum computer be practically faster than a classical computer?
• What is the quality and the amount of resources needed?
• Answer: Depends upon a study of the quantum computer architecture
• Performance Simulation Tool for Architecture Study
• Quantum Computer Design Cycle
• Flexible tool to balance improvement in Device Parameters against investment in Resources and Architecture
Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results
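Quantum Gate and Hardware Model
[Figure: trapped-ion hardware. Ions (qubits) sit above electrodes and are driven by lasers (gates); a ballistic shuttling channel moves ions; photons from the ions pass through an optical switch and a beam splitter to photon detectors, producing an entangled pair (EPR pair). Video credit: Jason Amini]
Quantum bits (qubits) and quantum gates:
• Single-qubit gate: |x> → U|x>
• CNOT: |x>|y> → |x>|x ⊕ y>
• Toffoli: |x>|y>|z> → |x>|y>|z ⊕ xy>
• Measurement: M|x> → {+1, -1}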
Fault-tolerant and Scalable Quantum Computer Architecture
[Figure: planar ion traps organized into Segments. Each Segment is built from basic ion-trap cells with empty channels for ballistic shuttling, holds different qubit blocks, and has a communication port. Segments are joined by photonic links through a Layer-1 Optical Switch (OS) and a Layer-2 Optical Switch (2x more time-expensive than the Layer-1 OS).]
Architecture Idea: combines the good of
• IONS: reliable storage and computation
• PHOTONS: communication
Hardware Description and Device Parameters
[Figure: hardware diagram with operation labels U, M, I and the table of Device Parameters (DPs)]
• 2^L x 5000 μs: 5000 μs @ L=0, 10,000 μs @ L=1, 20,000 μs @ L=2
Speed and Reliability of Computation > Speed and Reliability of Communication
Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results
Shor’s Integer Factorization Algorithm Circuit
For n-bit integer N
• Choose a with GCD(a, N) = 1, a < N
• Period r is hidden in the eigenvalues of U(x) = a^x mod N
• (a^(r/2) - 1)(a^(r/2) + 1) ≡ 0 (mod N), so the factors of N follow from GCD(a^(r/2) ± 1, N)
• Classical complexity: exponential in n
• Quantum complexity: polynomial, O(n^3)
Controlled Modular Exponentiation: U(x) = a^x mod N (bulk of Shor's algorithm)
• Contains O(n^2) adder calls:
• 512-bit → ~1 million adders
• 1024-bit → ~4 million adders
• 2048-bit → ~16 million adders
Inverse Quantum Fourier Transform
• Contains O(n^2) small-angle phase rotations Rz(π/2^(n+1))
• Depth = O(n)
[Figure: Shor's circuit. An m-qubit register (m = 2n) initialized to |0> controls U, U^2, U^4, ..., U^(2^(m-1)) acting on an n-qubit register, followed by the Inverse Quantum Fourier Transform and Z-basis measurements (MZ).]
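The classical part of the factoring argument above can be sketched in a few lines. This is an illustrative example with hypothetical function names: the period r is found by brute force here, which is exactly the step that the quantum circuit (modular exponentiation plus inverse QFT) is meant to replace for large N.

```python
from math import gcd


def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force, for small n only)."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factors_from_period(a, n):
    """Derive factors of n from the period of a, or None if a is unlucky."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None                      # odd period: pick another a
    half = pow(a, r // 2, n)             # a^(r/2) mod n
    if half == n - 1:
        return None                      # trivial case: pick another a
    return tuple(sorted((gcd(half - 1, n), gcd(half + 1, n))))


print(find_period(7, 15))
print(factors_from_period(7, 15))
```

For N = 15 and a = 7 the period is 4, so a^(r/2) = 49 and the factors come from GCD(48, 15) and GCD(50, 15).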
Benchmark Circuits (Approx. QFT)
[Figure: approximate QFT circuit built from controlled rotations Rz(π/2^k), k = 1, 2, 3, 4, ...; each Rz(π/2^k) is approximated by a sequence of H, T, T', and Z gates]
• Magic State Preparation (latency: 78 ms): |0>_L → T|+>, using a 7-qubit cat state, decoding, and measurement
• Data Teleportation into Magic State (latency: 12 ms): T|+> and |Data> → T|Data>
[Figure: timeline of ancilla preparations A1, A2, A3, … against |Data>, showing the execution delay]
• V. Kliuchnikov et al. (2013)
• Fowler et al. (2005)
• Rz(π/2^8) sufficient to factorize integers > 4096 bits long
• ~375 gates (~150 T gates) for approximation accuracy within 10^-16
Benchmark Circuits (Quantum Adders)
• Quantum Carry Look-Ahead Adder (QCLA), log-depth: Draper et al. (2004)
• Quantum Ripple-Carry Adder (QRCA), linear-depth: Cuccaro et al. (2004)
• Toffoli: Magic State Preparation followed by Data Injection into Magic State
• Non-Local Toffoli Gate: the operands sit in Segment-1 and Segment-2; an entangled pair |00> + |11> (e1, e2) and Quantum Teleportation bring them together, and the remaining steps are local quantum operations
Qubit types: DATA Qubits, Ancilla Qubits, Communication (COMM) Qubits
Fault-tolerant and Scalable Quantum Computer Architecture
[Figure: planar ion traps organized into Segments of basic ion-trap cells with empty channels for ballistic shuttling. Each Segment contains a Data Qubit Block, an Ancilla Qubit Block, and a COMM. Qubit Block. Segments are joined by optical links through a Layer-1 Optical Switch (OS) and a Layer-2 Optical Switch (2x more time-expensive than the Layer-1 OS). Tile types: Data Tile, Ancilla Tile, Comm. Tile.]
Architecture Parameters List
• NSeg: Number of Segments
• NData: Data Tiles/Segment
• NComm: Comm. Tiles/Segment
• NAnc: Total Ancilla Tiles
Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results
Toolbox Description (1)
Tile Designer and Performance Analyzer (TDPA)
• Tile Designer
• Fault Tolerant Circuit Generator
• Low Level Mapper
• Low Level Scheduler
• Low Level Error Analyzer
• Tile Database
Application-Level Designer and Performance Analyzer (ADPA)
• Quantum Application Circuit Generator
• High Level Mapper
• High Level Scheduler
• High Level Error Analyzer
• Visualizer
• Performance Metrics Decomposer
Input: DPs, Architecture Parameters
Output: Failure Probability, Latency, Resource count
FLEXIBILITY: critical tool components do not assume fixed device or architecture parameter values
DEEPER INSIGHTS: advanced output analysis components
Toolbox Description (2)
TDPA (e.g., Error Correction Circuit)
• Low-Level Mapper: Physical Qubits to Ions
• Low-Level Scheduler: Physical Gates to Ion Manipulation, Movement
• Low-Level Error Analyzer: Calculates Logical Failure Prob. from Physical Failure Prob.
• Output: Tile and Tile Performance (TEXEC, PFAIL)
ADPA (e.g., Quantum Carry Look-Ahead Adder on the MUSIQC Architecture)
• High-Level Mapper: Logical Qubits to Tiles
• High-Level Scheduler: Logical Gates to Tile Operations and Cross-Segment Movement
• High-Level Error Analyzer: Calculates Application-Level Failure Prob.
Tool and Analysis Method
[Figure: a quantum circuit on qubits q1 to q10 is assigned to tiles T1 to T4 by the HIGH-LEVEL MAPPER and then ordered by the HIGH-LEVEL SCHEDULER. Inputs: Quantum Circuit, Device Parameters, Architecture Definition, and Pre-Computed Tile Performance (TDPA). Outputs: PERFORMANCE METRICS and DECOMPOSER/VISUALIZER.]
Main Algorithms/Heuristics
• Mapper:
• Goal: Map circuit-level connectivity to hardware-level proximity
• Algorithm: Polynomial-time graph-theoretic algorithm for the Optimal Linear Arrangement Problem
• Scheduler:
• Goal: Reduce execution time; insert error correction to minimize failure prob.
• Algorithm: [Greedy] Dispatch a gate for execution AS SOON AS resources become available
• Error Analyzer:
• Goal: Evaluate PFAIL by fully counting logical error events
• Algorithm: O(n^3) fault-path counting
Details in the Thesis
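The greedy dispatch rule can be sketched as a simple list scheduler. This is an illustration under stated assumptions, not the thesis implementation: gates are visited in program order, each gate needs one interchangeable execution resource (for example an ancilla tile), and the function name, gate names, and durations are hypothetical.

```python
import heapq


def greedy_schedule(gates, n_units):
    """Schedule gates in program order on n_units identical resources.

    gates: list of (name, qubits, duration). A gate starts as soon as all
    of its qubits are free AND a resource unit is free.
    Returns (schedule, makespan), schedule = [(name, start, end), ...].
    """
    unit_free = [0.0] * n_units          # min-heap of unit release times
    heapq.heapify(unit_free)
    qubit_ready = {}                     # qubit -> time its last gate ends
    schedule = []
    for name, qubits, duration in gates:
        data_ready = max((qubit_ready.get(q, 0.0) for q in qubits),
                         default=0.0)
        unit_ready = heapq.heappop(unit_free)   # earliest available unit
        start = max(data_ready, unit_ready)
        end = start + duration
        heapq.heappush(unit_free, end)
        for q in qubits:
            qubit_ready[q] = end
        schedule.append((name, start, end))
    makespan = max((end for _, _, end in schedule), default=0.0)
    return schedule, makespan


# Hypothetical three-gate circuit; durations in ms are illustrative only.
circuit = [
    ("toffoli_1", ("a", "b", "c"), 50.0),
    ("toffoli_2", ("d", "e", "f"), 50.0),
    ("cnot", ("c", "d"), 10.0),
]
for units in (1, 2):
    _, total = greedy_schedule(circuit, units)
    print(units, "unit(s): makespan", total, "ms")
```

With one unit the two Toffoli gates serialize; with two units they run in parallel and the CNOT only waits for its operands, which is the resource-versus-time trade-off the tool explores.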
Validity/Optimality of Scheduler
• Correctness:
• Detailed analysis of the complete schedule (smaller circuits)
• Overall validation using the Visualizer output (larger circuits)
• Optimality
• Comparing the TEXEC of the circuit with and without resource constraints
• An X-fold ↑ in resources should yield an X-fold (or more) ↓ in TEXEC
• TEXEC with constraints approaches TEXEC with no constraints
[Figure: 64-bit Quantum Carry Look-Ahead Adder circuit, showing a Fewer Resource Regime and a Sufficient Resource Regime]
Show that the Mapper and Scheduler are 'good' at
1. Working with fewer resources
2. Achieving optimal performance with sufficient resources
Demo: Breakdown of TEXEC
[Figure: breakdown of the critical path for a circuit to be scheduled on qubits A1, B1, C1 (Segment S1) and A2, B2, C2 (Segment S2), part of which requires cross-segment swapping. Legend: Magic State Preparation Overhead, Data Teleportation into Magic State, Toffoli Gate Execution, CNOT Gate Execution, Cross-Segment Swapping Overhead. Axes: Segment ID Number against Execution Time (seconds).]
Demo Visualization
[Figure: Adder Circuit (QCLA) and Scheduled Adder Circuit (QCLA), Qubits against Time Steps]
• Horizontal lines: delays due to fewer Ancilla qubits
• Non-horizontal lines: delays due to fewer Comm. qubits
What We Have Learned So Far …
• Benchmark Circuits
• Approximate Quantum Fourier Transform (AQFT: long sequence of approx. gates)
• Quantum Ripple Carry Adder (QRCA: nearest-neighbor gates)
• Quantum Carry Look-Ahead Adder (QCLA: highly parallelizable circuit, long-distance gates)
• Performance-Simulation Tool
• Mapper, Scheduler, Error Analyzer (Standard components)
• Can work with varying device and architecture parameters
• Advanced components:
• Visualizer
• Performance metrics Decomposer
• Validation and optimality of Mapper, Scheduler
Remember this picture…
[Figure: the MUSIQC architecture again. Planar ion traps organized into Segments of basic ion-trap cells with empty channels for ballistic shuttling; each Segment holds DATA, ANCILLA, and COMM Tiles; Segments are joined by optical links through a Layer-1 Optical Switch (OS) and a Layer-2 Optical Switch (2x more time-expensive than the Layer-1 OS).]
Architecture Parameters List
• NSeg: Number of Segments
• NData: Data Tiles/Segment
• NComm: Comm. Tiles/Segment
• NAnc: Total Ancilla Tiles
Agenda
• Background and Motivation
• Quantum Hardware and Architecture Models
• Benchmark Application Circuits
• Performance Simulation Tool
• Results
First Set of Simulations (Setup)
Goal: Study Performance-Limiting Architecture and Device Parameters
• Benchmark Circuit
• 1024-bit QCLA
• Error Correction (Steane [[7,1,3]] code)
• Two layers of concatenation (L1, L2)
• L2 Error Correction after each gate
• If a qubit sits idle for a long enough time
• Perform L2 Error Correction only
• (No L1 Error Correction)
• Qubits can decohere for ~4.8 ms
• Computation
• CNOT (Local, Non-Local)
• TOFFOLI (Local)
• Can execute Toffoli in any Segment
Precomputed Tile performance numbers
• L2 Toffoli: ~50 ms, O(10^-14)
• L2 cross-Seg Teleportation: ~10 ms, O(10^-11)
Labelling the Architecture Space
TEXEC as function of Architecture Parameters
Large Segments, TEXEC depends on Ancilla
(Less Distributed System)
Small Segments, TEXEC depends on Comm. Qubits
(Highly Distributed System)
Optimal Architecture Selection
(Minimizing TEXEC - Qubits product)
Ancilla Regime Architecture Tel-Regime Architecture
Minimum Exec. Time-Qubit Product for
(NSeg = 4, NAnc = 1362, NComm = 1)
Minimum Exec. Time-Qubit Product for
(NSeg = 16, NAnc = 1362 NComm = 6)
Reducing PFAIL using Device Parameters (DP)
Improving DP: Qubit Memory 10x → Failure Prob. Reduction ~100x – 1000x
4 million Adder calls need Failure Prob. << 2.5 x 10^-7 to run 1024-bit Modular Exponentiation in Shor's algorithm
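The 2.5 x 10^-7 figure is simply the reciprocal of the number of adder calls. A minimal sketch of that budget follows, using a union bound (total failure at most the sum of per-call failures) as a simplifying assumption; the function names are hypothetical.

```python
def per_call_failure_budget(n_calls):
    """Per-call failure probability at which the union bound reaches 1."""
    return 1.0 / n_calls


def total_failure_bound(p_call, n_calls):
    """Union bound on the failure probability of n_calls sequential calls."""
    return min(1.0, p_call * n_calls)


print(per_call_failure_budget(4_000_000))     # 1024-bit case
print(per_call_failure_budget(16_000_000))    # 2048-bit case
print(total_failure_bound(2.5e-9, 4_000_000))
```

The adder's PFAIL has to sit well below the budget, since a per-call value equal to the budget only guarantees a vacuous bound of 1.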
Ancilla-Regime Optimal Architecture Tel-Regime Optimal Architecture
Second Set of Simulations (Setup)
Goal: Study Performance-Scaling of Circuits
• Benchmark Circuit
• QCLA, QRCA and AQFT
• Error Correction (Steane [[7,1,3]] code)
• Two layers of concatenation (TEXEC at L1 < TEXEC at L2)
• L1 Error-correction after each gate
• Qubit sitting idle for long enough time
• Perform L2 Error Correction
• Else, Perform L1 Error-Correction
• Qubits can decohere for ~ 0.6ms
• Computation
• CNOT (Local, Non-Local)
• TOFFOLI, T gates (Local)
• Execute TOFFOLI, T gates in
Computational Segment (CS)
Architecture Configuration:
[#Data Tiles, #Ancilla Tiles, #Comm. Tiles], #CS
Resource-Performance Scalability of MUSIQC Architecture
[Figure: panels for QCLA, QRCA, and AQFT]
• Requirement: TEXEC scales logarithmically with Problem Size and Resources
• Requirement: TEXEC scales linearly with Problem Size and Resources
• PFAIL scales linearly with Problem Size and Resources
MUSIQC Architecture can maintain 'desirable' Performance/Resource as Problem Size increases
TEXEC as function of (#Data, #Ancilla, #Comm. Qubits), #CS
1,024-bit QCLA 1,024-bit QRCA 1,024-bit AQFT
Resource Consumption per Benchmark Circuit (NCS/Ancilla Qubits, Comm. Qubits, Overall Resource)
• QCLA: High, High, High
• QRCA: Low, Modest, Modest
• AQFT: Low, None, Low
TEXEC as function of Problem Size
(Total Qubits < 1.5 million qubits)
Optimized Architecture Configurations (up to 1024-bit Circuit), given as [Data, Ancilla, Comm.], #CS
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
2,048-bit QCLA Adder Doesn't Fit Well into 1.5 Million-Qubit Machine
• QCLA: [48, 4, 2], 25
• QCLA: [30, 8, 5], 137
• ~90x ↓ in Ancilla, Comm. / Data Tile
Running 2,048-bit Shor's algorithm in < 5 months
• 2,048-bit Shor's algorithm ≈ 2,048-bit Modular Exponentiation Circuit
• 2,048-bit Modular Exponentiation → 16 million calls to Adder
• TEXEC of each Adder must be < 0.8 sec
• Choose QCLA: [Current TEXEC = 260 sec]
• 2x ↑ Resource Budget → TEXEC = 2.76 sec (96x reduction)
• 10x ↓ Physical Op. Time (CNOT, Measurement, Shuttling Time) → TEXEC = 0.69 sec (~4x reduction)
Resources can be more important than gate speed for decreasing TEXEC
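The 0.8 sec target follows from dividing the time budget by the number of adder calls. The sketch below makes two assumptions that are not stated on the slides: a month is taken as 30 days, and the adder-call count is modeled as 4·n^2, a fit chosen here because it reproduces the quoted ~1, ~4, and ~16 million calls for 512, 1024, and 2048 bits.

```python
SECONDS_PER_DAY = 86_400


def adder_calls(n_bits):
    """Assumed O(n^2) call count; the factor 4 is a fit to the slide values."""
    return 4 * n_bits ** 2


def per_adder_time_budget(total_days, n_calls):
    """Seconds available per adder call within the total time budget."""
    return total_days * SECONDS_PER_DAY / n_calls


for bits in (512, 1024, 2048):
    print(bits, "bits ->", adder_calls(bits), "adder calls")

# 5 months ~ 150 days (assumption), 16 million calls as quoted above.
print("budget per adder:", per_adder_time_budget(150, 16_000_000), "sec")
```

Against this budget, the slide's two improvements (2x resources, then 10x faster physical operations) bring the QCLA from 260 sec down to 0.69 sec.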
Reducing PFAIL using Device Parameters
• To run 2048-bit modular exponentiation we need 16 million calls to the Adder
• Adder PFAIL = 2.77 x 10^-7, needs to be << 1/(16 x 10^6) = 6.25 x 10^-8
• 10x decrease in Entanglement Infidelity → 100x reduction in PFAIL
• 2.37 x 10^-9 << 6.25 x 10^-8
(a) PFAIL = 2.77 x 10^-7, Infidelity of EPR Pair: 10^-4
• TEL: 99.6%
• GATE: 0.31%
• MEM: 0.08%
• SHUT: 0.01%
(b) PFAIL = 2.37 x 10^-9, Infidelity of EPR Pair: 10^-5
• GATE: 75.3%
• MEM: 22.3%
• TEL: 1.9%
• SHUT: 0.55%
Legend: Gate Noise (GATE), Teleportation Noise (TEL), Shuttling Noise (SHUT), Memory Noise (MEM)
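The percentage breakdown can be turned into absolute contributions per noise source. This is a small sketch of that arithmetic using the numbers quoted above; the function name `decompose` is hypothetical and is not the tool's Decomposer.

```python
def decompose(p_fail, shares_percent):
    """Split a total failure probability into per-source contributions."""
    return {source: p_fail * share / 100.0
            for source, share in shares_percent.items()}


case_a = decompose(2.77e-7, {"TEL": 99.6, "MEM": 0.08,
                             "SHUT": 0.01, "GATE": 0.31})
case_b = decompose(2.37e-9, {"TEL": 1.9, "SHUT": 0.55,
                             "MEM": 22.3, "GATE": 75.3})

for label, case in (("(a)", case_a), ("(b)", case_b)):
    dominant = max(case, key=case.get)
    print(label, "dominant source:", dominant, f"{case[dominant]:.2e}")
```

In case (a) teleportation noise dominates; after the 10x improvement in EPR-pair infidelity it drops out and gate noise becomes the largest remaining term.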
Overall Performance of 2,048-bit Shor's Algorithm within 3 x 10^6 Qubits
• QCLA (TEXEC = 0.68 sec, PFAIL = 2.37 x 10^-9), AQFT (TEXEC = 1 day, PFAIL = 6 x 10^-5)
• Total execution time: 16 x 10^6 x 0.68 sec + 86,400 sec ≈ 127 days
• Total failure probability: 16 x 10^6 x 2.37 x 10^-9 + 6 x 10^-5 ≈ 0.04
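The two totals above are plain sums over the adder calls plus the AQFT. A short sketch of the arithmetic, with a hypothetical function name and the failure probabilities simply added (a first-order approximation):

```python
SECONDS_PER_DAY = 86_400


def overall_performance(n_calls, t_adder, p_adder, t_aqft, p_aqft):
    """Return (total days, total failure probability) for adders + AQFT."""
    total_seconds = n_calls * t_adder + t_aqft
    total_pfail = n_calls * p_adder + p_aqft
    return total_seconds / SECONDS_PER_DAY, total_pfail


days, pfail = overall_performance(
    n_calls=16_000_000,   # adder calls for 2,048-bit modular exponentiation
    t_adder=0.68,         # QCLA TEXEC in seconds
    p_adder=2.37e-9,      # QCLA PFAIL
    t_aqft=86_400,        # AQFT TEXEC: 1 day in seconds
    p_aqft=6e-5,          # AQFT PFAIL
)
print(f"{days:.1f} days, failure probability {pfail:.3f}")
```

Both totals are dominated by the adder calls; the AQFT contributes one day and a negligible share of the failure probability.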
Summary of My Results (1)
Studying Architecture Space for a Variety of Benchmarks of Varying Sizes
• Architecture Space Exploration
• Overall performance of a fully error-corrected circuit can be bottlenecked by
• Non-Clifford Ancilla qubits
• Long-distance Communication qubits
• Performance-Scalability of Architecture
• Resource allocation should meet the computational workload of the benchmark application circuit
Summary of My Results (2)
• Reliability
• Overall PFAIL can be limited either by Memory Coherence Time or by Fidelity of the Communication Channel (depending on the error-correction strategy)
• Speed
• For overall TEXEC, the Resource budget (#qubits) can be more crucial than speeding up the quantum gates
• Using 3 million qubits, each failing with prob. 10^-7/op, a 2,048-bit number can be factored in less than five months.
Future Work
• Sophisticated Models of Physical Components
• Cross-talk, loss of qubits etc.
• Quantum Technologies
• Superconductors
• Quantum dots
• Error-Correcting Codes and Noise Models
• Topological quantum codes
• Exploiting X/Z error asymmetry
• Correlated errors
Interesting Avenues of Future Research
THANK YOU
Q & A
Supplemental Slides…
First Set of Simulations (Results[2])
PFAIL as function of Architecture Parameters
Ancilla-Regime, PFAIL depends on Shuttling Noise
Tel-Regime, PFAIL depends on Memory Noise, Teleportation Noise
More Answers…
• Architectures can compensate for the technology limitations
• Memory/Cache-Hierarchy
• Hides the slowness of dynamic RAM,
Secondary memory devices
• Multi-core designs
• Reduce power dissipation without compromising
the performance
Research in Classical Architecture
• Classical Era (lasted until 90s)
• Human intuition and experience for searching the computer design space
• Modern Era
• Software tools find good computer design. [Todd Austin’s SIMPLESCALAR
Toolset]
The idea of performance Simulation Toolset for Quantum Computers
Fault Tolerance and Resource Overhead
• Encode 1 logical qubit into a 7-physical-qubit error-correcting code (e.g., Steane [[7,1,3]])
• Construct DATA QUBITS
• Perform error correction regularly on encoded qubits to remove errors
• Requires ANCILLA QUBITS to perform parity checks
• Apply fault-tolerant gates on encoded (logical) qubits
• Generally requires ANCILLA QUBITS for correct operation and error prevention
Noise level reduces O(p) → O(p^2) with each layer of encoding
(TEXEC, PFAIL) as function of Gate Speed, Resources
[Figure: three panels plotting Failure Probability (1.0E-8 to 1.0E-4) against Execution Time (0 to 1200 sec) for QRCA 2048 and QCLA 2048: baseline, Gate Speed 10X ↑, and Resources 2X ↑]
• Resources can be more important than gate speed for decreasing TEXEC
• PFAIL doesn't decrease appreciably by adding more resources
• 100x ↓ TEXEC for QCLA
• 2,048-bit QCLA: TEXEC = 0.69 s, PFAIL = 2.77 x 10^-7 with 2x ↑ Resources and 10x ↓ Gate Times
Quick Introduction to Quantum Computing
• Quantum Computer consists of
• Qubits: store information (e.g., energy levels of a trapped 171Yb+ ion)
• Gates: process information (e.g., lasers that cause transitions between energy levels)
[Figure: a quantum circuit (qubits |x>, |y>, |z> against time) next to the quantum hardware, where ions act as qubits and lasers act as gates]
TEXEC as function of Problem Size
(Total Qubits < 1.5 million qubits)
Optimal Architecture Configurations (up to 1024-bit Circuit), given as [Data, Ancilla, Comm.], #CS
• QCLA: [30, 8, 5], 137
• QRCA: [19, 12, 6], 108
• AQFT: [1, 8, 1], 16
2,048-bit QCLA Adder doesn't fit well into a 1.5 million-qubit machine
• QCLA: [48, 4, 2], 25 (Ancilla Prep: 10%, Comm.: 85%, Gate: 4%)
• QCLA: [30, 8, 5], 137 (Ancilla Prep: 38%, Comm.: 29%, Gate: 33%)
Second Set of Simulations (Setup)
Goal: Study Performance-Scaling of Circuits
• Benchmark Circuit
• QCLA, QRCA and AQFT
• Error Correction (Steane [[7,1,3]] code)
• Two layers of concatenation (T_L2 = 70 x T_L1)
• L1 Error-correction after each gate
• Qubit sitting idle for long enough time
• Perform L2 Error Correction
• Else, Perform L1 Error-Correction
• Physical qubits can decohere for < 0.6ms
• Computation
• CNOT (Local, Non-Local)
• TOFFOLI, T gates (Local)
• Execute TOFFOLI, T gates in Computational Segment (CS)
Pre-computed Tile performance numbers
Architecture Configuration: (#Data, #Ancilla, #Comm (qubits)), # CS
L2 Toffoli: ~51 ms, O(10^-14)
L2 cross-Seg Teleportation: ~5 ms, O(10^-11)

More Related Content

What's hot

COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcsitconf
 
Matt Purkeypile's Doctoral Dissertation Defense Slides
Matt Purkeypile's Doctoral Dissertation Defense SlidesMatt Purkeypile's Doctoral Dissertation Defense Slides
Matt Purkeypile's Doctoral Dissertation Defense Slidesmpurkeypile
 
MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhangcanlin zhang
 
Computational Method to Solve the Partial Differential Equations (PDEs)
Computational Method to Solve the Partial Differential  Equations (PDEs)Computational Method to Solve the Partial Differential  Equations (PDEs)
Computational Method to Solve the Partial Differential Equations (PDEs)Dr. Khurram Mehboob
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Rediet Moges
 
Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Shiang-Yun Yang
 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMIJCSEA Journal
 
Digital Signal Processing[ECEG-3171]-Ch1_L04
Digital Signal Processing[ECEG-3171]-Ch1_L04Digital Signal Processing[ECEG-3171]-Ch1_L04
Digital Signal Processing[ECEG-3171]-Ch1_L04Rediet Moges
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsAmr E. Mohamed
 
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...Takahiro Katagiri
 
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay Approach
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay ApproachSampled-Data Piecewise Affine Slab Systems: A Time-Delay Approach
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay ApproachBehzad Samadi
 
Digital Signal Processing[ECEG-3171]-Ch1_L05
Digital Signal Processing[ECEG-3171]-Ch1_L05Digital Signal Processing[ECEG-3171]-Ch1_L05
Digital Signal Processing[ECEG-3171]-Ch1_L05Rediet Moges
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesArithmer Inc.
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Shiang-Yun Yang
 

What's hot (20)

COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMCOMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHM
 
Qualifier
QualifierQualifier
Qualifier
 
OPTICALQuantum
OPTICALQuantumOPTICALQuantum
OPTICALQuantum
 
220exercises2
220exercises2220exercises2
220exercises2
 
NLP_KASHK:Markov Models
NLP_KASHK:Markov ModelsNLP_KASHK:Markov Models
NLP_KASHK:Markov Models
 
Matt Purkeypile's Doctoral Dissertation Defense Slides
Matt Purkeypile's Doctoral Dissertation Defense SlidesMatt Purkeypile's Doctoral Dissertation Defense Slides
Matt Purkeypile's Doctoral Dissertation Defense Slides
 
MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhang
 
Computational Method to Solve the Partial Differential Equations (PDEs)
Computational Method to Solve the Partial Differential  Equations (PDEs)Computational Method to Solve the Partial Differential  Equations (PDEs)
Computational Method to Solve the Partial Differential Equations (PDEs)
 
Adaptive dynamic programming algorithm for uncertain nonlinear switched systems
Adaptive dynamic programming algorithm for uncertain nonlinear switched systemsAdaptive dynamic programming algorithm for uncertain nonlinear switched systems
Adaptive dynamic programming algorithm for uncertain nonlinear switched systems
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03
 
Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)
 
Lec5
Lec5Lec5
Lec5
 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
 
Digital Signal Processing[ECEG-3171]-Ch1_L04
Digital Signal Processing[ECEG-3171]-Ch1_L04Digital Signal Processing[ECEG-3171]-Ch1_L04
Digital Signal Processing[ECEG-3171]-Ch1_L04
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
 
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...
Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Usi...
 
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay Approach
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay ApproachSampled-Data Piecewise Affine Slab Systems: A Time-Delay Approach
Sampled-Data Piecewise Affine Slab Systems: A Time-Delay Approach
 
Digital Signal Processing[ECEG-3171]-Ch1_L05
Digital Signal Processing[ECEG-3171]-Ch1_L05Digital Signal Processing[ECEG-3171]-Ch1_L05
Digital Signal Processing[ECEG-3171]-Ch1_L05
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave Machines
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)
 


Architecture Framework for Trapped Ion Quantum Computer Simulation

  • 1. Architecture Framework for Trapped Ion Quantum Computer Muhammad Ahsan [Ph.D. Defense] Department of Computer Science, Duke University
  • 2. Agenda • Background and Motivation • Quantum Hardware and Architecture Models • Benchmark Application Circuits • Performance Simulation Tool • Results
  • 3. My Research Background Computer Architecture Quantum Computing Fault Tolerance Quantum Error Correction Quantum Computer Architecture Resource Performance Estimation Tool Topics of the defense Some Interesting findings! Computer Systems Undergraduate Ph.D. Research Computer Science Electrical Engineering Physics
  • 4. Quick Introduction to Quantum Computing (1) • A quantum computer consists of • Quantum bits (qubits): store information in binary basis states |0>, |1> (e.g. energy levels of a trapped 171Yb+ ion) • Gates: process information (e.g. lasers that drive transitions between energy levels) • Quantum circuit (qubits vs. time): X gate |a> → |1 ⊕ a>; CNOT gate |b>|c> → |b>|b ⊕ c> • Quantum hardware: ions (qubits), lasers (gates) • Gates are unitary operations (UU† = I): U|a> → |a’>
  • 5. Quick Introduction to Quantum Computing (2) • What makes quantum computers interesting (non-conventional) • Superposition of two states: a|0> + b|1>, with |a|^2 + |b|^2 = 1 • Entanglement between qubits: a|00> + b|11> • What makes quantum computers more powerful • Phase gates: a|0> + e^(iπ/2) b|1> • Amplitudes (a, b) are complex (e.g. a|0> - b|1>) • Quantum speedup: • Amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem • Universal quantum computation • Clifford gates {H, CNOT, X, Z, S}, with H (Hadamard), S (π/2 phase-shift gate), Z (π phase-shift gate), and CNOT, are insufficient for arbitrary quantum computation • Universality also needs T (π/4 phase-shift gate) or controlled-CNOT (Toffoli gate) • Toffoli and T gates are double-edged swords • Practical quantum speedup • Practically resource consuming
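
The phase-shift gates named on this slide can be checked numerically. The sketch below is illustrative only: it uses plain 2x2 Python lists rather than any quantum library, and the helper names (`phase_gate`, `matmul`, `dagger`, `close`) are hypothetical, not part of the author's tool. It checks that two T gates compose to S, two S gates compose to Z, and that the Hadamard gate satisfies UU† = I.

```python
import cmath
import math

def phase_gate(theta):
    """Diagonal gate: leaves |0> unchanged and multiplies |1> by e^(i*theta)."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(u):
    """Conjugate transpose of a 2x2 matrix."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Element-wise comparison up to floating-point tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

IDENTITY = [[1, 0], [0, 1]]
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]          # Hadamard
T = phase_gate(math.pi / 4)    # pi/4 phase shift
S = phase_gate(math.pi / 2)    # pi/2 phase shift
Z = phase_gate(math.pi)        # pi phase shift

print("T*T == S:", close(matmul(T, T), S))
print("S*S == Z:", close(matmul(S, S), Z))
print("H*H^dagger == I:", close(matmul(H, dagger(H)), IDENTITY))
```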
  • 6. Quantum Computing in a Nutshell • Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers: • Shor’s integer factorization algorithm (exponential speedup) • Practically, quantum device components (qubits, gates) are much noisier and less reliable than their classical counterparts • Error correction (redundancy) is needed to protect quantum information • Mean time to failure: classical ~10^7 – 10^8 hours; quantum ~seconds – minutes • Failure prob. p = 10^-3: 1 in 1,000 quantum gates fails
  • 7. Example: Fault-Tolerant 3-Qubit (Toffoli) Gate • Unprotected quantum gate: acts directly on qubits |x>, |y>, |z> • Fault-tolerant quantum gate: acts on encoded logical qubits |x>_L, |y>_L, |z>_L (e.g. Steane [[7,1,3]] code), with encoding, error correction (parity checks), and recovery • Ancilla qubits: special entangled qubit states (4-cat preparation and 4-cat decoding) accompany each logical qubit • A large number of additional qubits and gates reduces the effective noise level from O(p) to O(p^2)
  • 8. Multiple Layers of Encoding in the [[7,1,3]] Code • Columns: No Encoding | Single-Layer (L1) Encoding | Two-Layer (L2) Encoding • Qubits: 1 | 7 | 7^2 = 49 • Clifford gates {H, X, Z, CNOT}: 1 | 7 | 7^2 = 49 • Non-Clifford gates {e.g. Toffoli}: 1 | O(10^3) | O(10^5) • Ancilla qubits: 0 | O(10^1) | O(10^2) • Noise level, failure prob. (p): p [e.g. p = 10^-7] | O(p^2) [e.g. 10^-10] | O(p^4) [e.g. 10^-16] • Good news: gain in reliability (10^6x ↓) > qubit and gate overhead (10^2-10^3x ↑)
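
The noise-level examples on this slide are consistent with a simple per-layer recursion, sketched below. This is an illustration under an assumption, not the author's error analyzer: each layer maps p to c·p², and the constant c = 10^4 is inferred here only because it reproduces the slide's three example values (10^-7, 10^-10, 10^-16). The function names are hypothetical.

```python
def logical_failure_prob(p, layers, c=1e4):
    """Apply p -> c * p**2 once per layer of encoding.

    c is an assumed constant chosen to match the slide's example values.
    """
    for _ in range(layers):
        p = c * p * p
    return p

def physical_qubits(layers, block=7):
    """Physical qubits per logical qubit for a concatenated 7-qubit code."""
    return block ** layers

for layers in range(3):
    print(f"L{layers}: qubits = {physical_qubits(layers)}, "
          f"failure prob. ~ {logical_failure_prob(1e-7, layers):.1e}")
```

Under this assumed recursion, each extra layer multiplies the qubit count by 7 but squares the failure probability, which is the trade-off the slide calls the good news.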
  • 9. Fundamental Research Question • To estimate: how reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot on a realistic time scale (e.g., 2,048-bit factorization)?
  • 10. Answer depends on … • Quantum application (e.g. Shor’s algorithm) • Compilation of the quantum application into fault-tolerant gates (e.g. gate decomposition methods) • Fault-tolerance overhead (e.g. error-correcting codes) • Integration of qubits and gates on hardware (e.g. trapped ion) • Research progress (theory), research progress (experiment) • A precise estimate needs information about the quantum hardware
  • 11. Fundamental Research Question • How reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot on a realistic time scale (e.g., 2,048-bit factorization)? • The answer depends heavily on the architecture of the quantum hardware
  • 12. Impact of Hardware Assumptions on the Speed of Quantum Computer (Included with permission of rdv, TDL, KMI, quant-ph/0507023) Days -> Years Hours -> Days Classical Quantum Architecture Assumption Matters! 1,024-bit Factorization
  • 13. Question • Why think about the architecture of a large quantum computer when we do not yet know exactly how to build a small one?
  • 14. Answer… • Architecture can 1. Compensate Technology Limitations (Memory hierarchy, Multi-Core Designs) 2. Reveal performance-limiting factors 3. Guide future advances in technology • Example from the History Slower Unreliable Computers Discrete Transistors Integrated Circuits (IC) Fast and Reliable Computers MOORE’S LAW
  • 15. Research Methodology • Need to define the mechanism by which a very large number of qubits are • Allocated • Operated • Protected • Connected (communication channel) in a realistically constructible quantum computer system • Need a method to efficiently • Model the quantum computer architecture • Map a quantum application onto the architecture • Evaluate the performance-limiting factors • Crucial components of my research: quantum computer architecture, performance simulation
  • 16. Tool: Taxonomy of Important Terms • Device Parameters (DPs) • e.g. physical gate times, failure probability • Resource Investment • e.g. total physical qubits used in the system • Architecture Parameters • Functional allocation (data, ancilla) and connectivity of qubits • Performance Metrics • e.g. total execution time (TEXEC) and failure probability (PFAIL), the probability that the quantum circuit gives an incorrect output • Design Space
  • 17. Research Methodology Quantum Circuit (Quantum Adder) Quantum Hardware (e.g., Trapped-ion) Quantum Architecture (MUSIQC) Mapping (Qubits -> physical resources) Scheduling (Gates -> Sequence of physical operations) Performance Analysis (Latency, Reliability)
  • 18. Papers/Publications • Performance Simulator Based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013) • Optimization of a Quantum Computer Architecture Using Resource Performance Simulator (DATE 2015) • Designing Million-Qubit Quantum Computer Using Resource Performance Simulator [In Submission (2nd Attempt)] • Challenge: Target Community Mostly Unfamiliar with Quantum Computing
  • 19. Philosophy of Performance Simulation Toolset (1) Device Parameters Resource Overhead Performance Evaluation Architecture Parameters Quantum Circuit Prior Work: • Svore et al. (2004) • Balensiefer et al. (2005) • Whitney et al. (2007) • Dousti and Pedram (2012) FIXED LESS FLEXIBLE LESS FLEXIBLE INSUFFICIENT INSIGHT
  • 20. Philosophy of Performance Simulation Toolset (2) Device Parameters Resource Overhead Performance Evaluation Architecture Parameters Quantum Circuit My Contribution Fine Tuning Knob Magnifying Glass My Contribution
  • 21. Philosophy of Performance Simulation Toolset (2) Device Parameters Resource Overhead Performance Evaluation Architecture Parameters Quantum Circuit My Contribution Knob Magnifying Glass Desired Performance? No No No Qubits: How Reliable? Qubits: How Many?
  • 22. What We Have Learned So Far… • Question: can a quantum computer be practically faster than a classical computer? • What quality and amount of resources are needed? • Answer: depends on a study of the quantum computer architecture • Performance simulation tool for architecture study • Quantum computer design cycle • Flexible tool to balance improvement in device parameters against investment in resources and architecture
  • 23. Agenda • Background and Motivation • Quantum Hardware and Architecture Models • Benchmark Application Circuits • Performance Simulation Tool • Results
  • 24. Quantum Gate and Hardware Model • Hardware: laser (gate), ions (qubits), electrodes, ballistic shuttling channel, optical switch, beam splitter, photons, photon detectors, entangled pair (EPR pair) • Quantum gates on quantum bits (qubits): three-qubit Toffoli gate on |x>, |y>, |z>; CNOT |x>|y> → |x>|x ⊕ y>; single-qubit gate |x> → U|x>; measurement M|x> → {+1, -1} • Video credit: Jason Amini
  • 25. Fault-tolerant and Scalable Quantum Computer Architecture Planar ion traps . . . Layer-1 Optical Switch (OS) . . . . . . ... ... . . . . . . ... ... . . . . . . ... ... Segment photonic Links Basic Ion-trap Cell Layer-2 Optical Switch (2x time expensive than Layer-1 OS) Empty Channels For Ballistic Shuttling Different Qubit Blocks Communication Port Architecture Idea: Combines the good of IONS: Reliable Storage and Computation PHOTONS: Communication
  • 26. Hardware Description and Device Parameters • Device Parameters (DPs): U, M, I • 2^L x 5000 μs: 5,000 μs @ L=0, 10,000 μs @ L=1, 20,000 μs @ L=2 • Speed and reliability of computation > speed and reliability of communication
  • 27. Agenda • Background and Motivation • Quantum Hardware and Architecture Models • Benchmark Application Circuits • Performance Simulation Tool • Results
  • 28. Shor’s Integer Factorization Algorithm Circuit • For an n-bit integer N, choose a < N with GCD(a, N) = 1; the period r of a^x mod N is hidden in the eigenvalues of U(x) = a^x mod N, and a^r − 1 = (a^(r/2) − 1)(a^(r/2) + 1) ≡ 0 (mod N) yields the factors • Circuit: an m-qubit register (m = 2n) initialized to |0> controls U, U^2, U^4, …, U^(2^(m−1)) applied to an n-qubit register, followed by the inverse quantum Fourier transform and measurements (MZ) • Controlled modular exponentiation, U(x) = a^x mod N: the bulk of Shor’s algorithm; contains O(n^2) adder calls: 512-bit → ~1 million adders, 1,024-bit → ~4 million adders, 2,048-bit → ~16 million adders • Inverse QFT: contains O(n^2) small-angle phase rotations Rz(π/2^(n+1)); depth O(n) • Classical complexity: exponential in n • Quantum complexity: polynomial, O(n^3)
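
The classical step that turns the period r into factors can be sketched as follows. This is a toy illustration, not the author's tool: the period is found here by brute force, which is exactly the part the quantum circuit on this slide replaces, and the function names are hypothetical.

```python
from math import gcd

def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force, small n only)."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factors_from_period(a, n):
    """Return a pair of nontrivial factors of n, or None if this a is unlucky."""
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None                  # odd period: pick another a
    half = pow(a, r // 2, n)         # a^(r/2) mod n
    if half == n - 1:
        return None                  # a^(r/2) = -1 mod n: pick another a
    return gcd(half - 1, n), gcd(half + 1, n)

print(factors_from_period(7, 15))
print(factors_from_period(2, 21))
```

For a = 7 and N = 15 the period is 4, so a^(r/2) = 49 mod 15 = 4 and the factors come from GCD(3, 15) and GCD(5, 15).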
  • 29. Benchmark Circuits (Approx. QFT) • Circuit: Hadamard gates and controlled rotations Rz(π/2^k), k = 1, 2, 3, 4, …; each rotation is approximated by a sequence of H, T, T’, X, and Z gates • Magic state preparation of T|+> from a 7-cat state and |0>_L (latency: 78 ms) • Data teleportation into magic state, |Data> → T|Data> (latency: 12 ms) • Rz(π/2^8) is sufficient to factor integers > 4,096 bits long • ~375 gates (~150 T gates) for approximation accuracy within 10^-16 • References: V. Kliuchnikov et al. (2013), Fowler et al. (2005)
  • 30. Benchmark Circuits (Quantum Adders) • Quantum Carry Look-Ahead Adder (QCLA), log-depth: Draper et al. (2004) • Quantum Ripple-Carry Adder (QRCA), linear-depth: Cuccaro et al. (2004) • Toffoli gate on |x>, |y>, |z>: magic state preparation followed by data injection into the magic state
  • 31. Non-Local Toffoli Gate • Toffoli gate on |x>, |y>, |z> with operands held in different segments (Segment-1, Segment-2) • Local quantum operations within each segment • Quantum teleportation between segments using an entangled pair (e1, e2) in the state (|00> + |11>)/√2 • Qubit types: DATA qubits, Ancilla qubits, Communication (COMM) qubits
  • 32. Fault-tolerant and Scalable Quantum Computer Architecture Planar ion traps . . . Layer-1 Optical Switch (OS) COMM. Qubit Block . . . . . . ... ... . . . . . . ... ... . . . . . . ... ... Segment Ancilla Qubit Block Data Qubit Block Optical Links Basic Ion-trap Cell Layer-2 Optical Switch (2x time expensive than Layer-1 OS) Empty Channels For Ballistic Shuttling Architecture Parameters List NSeg: Number of Segments NData: Data Tiles/Segment NComm: Comm. Tiles /Segment NAnc: Total Ancilla Tiles Comm. Tile Ancilla Tile Data Tile
  • 33. Agenda • Background and Motivation • Quantum Hardware and Architecture Models • Benchmark Application Circuits • Performance Simulation Tool • Results
  • 34. Toolbox Description (1) • Tile Designer and Performance Analyzer (TDPA): Low-Level Scheduler, Low-Level Mapper, Low-Level Error Analyzer • Application-Level Designer and Performance Analyzer (ADPA): High-Level Scheduler, High-Level Mapper, High-Level Error Analyzer • Other components: Tile Designer, Tile Database, Fault-Tolerant Circuit Generator, Quantum Application Circuit Generator • Input: DPs, Architecture Parameters • Output: Failure Probability, Latency, Resource Count • FLEXIBILITY: critical tool components do not assume fixed device or architecture parameter values • DEEPER INSIGHTS: advanced output analysis components (Visualizer, Performance Metrics Decomposer)
  • 35. Toolbox Description (2) Error Correction Circuit Quantum Carry Look-Ahead Adder MUSIQC Architecture TDPA ADPA Low-Level Mapper: Physical Qubits to Ions Low-Level Scheduler: Physical Gates to Ion Manipulation, Movement Low-Level Error Analyzer: Calculates Logical Failure Prob. from Physical Fail Prob. High-Level Mapper: Logical Qubits to Tiles High-Level Scheduler: Logical Gates to Tile operations and Cross-Segment Movement High-Level Error Analyzer: Calculates Application level Failure Prob. Tile Tile Performance (TEXEC, PFAIL)
  • 36. Tool and Analysis Method • Inputs: Quantum Circuit, Device Parameters, Architecture Definition, Pre-Computed Tile Performance (TDPA) • HIGH-LEVEL MAPPER: assigns qubits q1 … q10 to tiles T1 … T4 • HIGH-LEVEL SCHEDULER • PERFORMANCE METRICS DECOMPOSER / VISUALIZER
  • 37. Main Algorithms/Heuristics • Mapper: • Goal: translate circuit-level connectivity into hardware-level proximity • Algorithm: polynomial-time graph-theoretic algorithm for the Optimal Linear Arrangement problem • Scheduler: • Goal: reduce execution time; insert error correction to minimize failure prob. • Algorithm: [greedy] dispatch each gate for execution as soon as resources become available • Error Analyzer: • Goal: evaluate PFAIL by fully counting logical error events • Algorithm: O(n^3) fault-path counting • Details in the thesis
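
The greedy dispatch rule for the scheduler can be illustrated with a small event-driven sketch. This is not the author's scheduler: it assumes a pool of identical execution units standing in for the limited ancilla and communication resources, and the gate names, dependencies, and durations are made up for illustration.

```python
import heapq

def greedy_schedule(gates, deps, duration, num_units):
    """Dispatch each gate as soon as its predecessors are done and a unit is free.

    gates: gate names in program order; deps: gate -> set of predecessor gates;
    duration: gate -> execution time; num_units: identical units (assumption).
    Returns gate -> (start_time, finish_time).
    """
    if num_units < 1:
        raise ValueError("need at least one execution unit")
    remaining = {g: set(deps.get(g, ())) for g in gates}
    ready = [g for g in gates if not remaining[g]]
    running = []                      # min-heap of (finish_time, gate)
    times, now, free = {}, 0.0, num_units
    while ready or running:
        while ready and free > 0:     # greedy: start everything that can start
            g = ready.pop(0)
            times[g] = (now, now + duration[g])
            heapq.heappush(running, (now + duration[g], g))
            free -= 1
        now, done = heapq.heappop(running)   # advance to the next completion
        free += 1
        for g in gates:               # release gates that waited on `done`
            if done in remaining[g]:
                remaining[g].discard(done)
                if not remaining[g]:
                    ready.append(g)
    return times

def makespan(times):
    """Total execution time of the schedule."""
    return max(finish for _, finish in times.values())

# Hypothetical circuit: C needs A and B; D is independent.
gates = ["A", "B", "C", "D"]
deps = {"C": {"A", "B"}}
duration = {"A": 2, "B": 2, "C": 1, "D": 2}
for units in (1, 2, 3):
    print(units, "unit(s):", makespan(greedy_schedule(gates, deps, duration, units)))
```

In this toy case the total time drops as units are added until it reaches the dependency-limited critical path, which mirrors the resource-constrained versus unconstrained comparison used to validate the scheduler.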
  • 38. Validity/Optimality of Scheduler • Correctness: • Detailed analysis of the complete schedule (smaller circuits) • Overall validation using Visualizer output (larger circuits) • Optimality: • Compare the TEXEC of a circuit with and without resource constraints • An X-fold ↑ in resources should yield an X-fold (or greater) ↓ in TEXEC • TEXEC with constraints should approach TEXEC with no constraints • Example: 64-bit Quantum Carry Look-Ahead Adder circuit (fewer-resource regime vs. sufficient-resource regime) • Shows that the Mapper and Scheduler are ‘good’ at 1. Working with fewer resources 2. Achieving optimal performance with sufficient resources
  • 39. Demo: Breakdown of TEXEC • Circuit to be scheduled: gates on A1, B1, C1 (Segment S1) and A2, B2, C2 (Segment S2); requires cross-segment swapping • Time breakdown of the critical path: magic state preparation overhead, data teleportation into magic state, Toffoli gate execution, CNOT gate execution, cross-segment swapping overhead
  • 40. Demo Visualization • Adder circuit (QCLA): qubits vs. time steps • Scheduled adder circuit (QCLA): segment ID number vs. execution time (seconds) • Horizontal lines: delays due to too few ancilla qubits • Non-horizontal lines: delays due to too few comm. qubits
  • 41. What We Have Learned So Far … • Benchmark Circuits • Approximate Quantum Fourier Transform (AQFT: long sequence of approx. gates) • Quantum Ripple Carry Adder (QRCA: nearest-neighbor gates) • Quantum Carry Look-Ahead Adder (QCLA: highly parallelizable circuit, long-distance gates) • Performance-Simulation Tool • Mapper, Scheduler, Error Analyzer (standard components) • Can work with varying device and architecture parameters • Advanced components: • Visualizer • Performance Metrics Decomposer • Validation and optimality of Mapper, Scheduler
  • 42. Remember this picture… Planar ion traps . . . Layer-1 Optical Switch (OS) COMM Tile . . . . . . ... ... . . . . . . ... ... . . . . . . ... ... Segment ANCILLA Tile DATA Tile Optical Links Basic Ion-trap Cell Layer-2 Optical Switch (2x time expensive than Layer-1 OS) Empty Channels For Ballistic Shuttling Architecture Parameters List NSeg: Number of Segments NData: Data Tiles/Segment NComm: Comm. Tiles /Segment NAnc: Total Ancilla Tiles
  • 43. Agenda • Background and Motivation • Quantum Hardware and Architecture Models • Benchmark Application Circuits • Performance Simulation Tool • Results
  • 44. First Set of Simulations (Setup) Goal: Study Performance-Limiting Architecture and Device Parameters • Benchmark Circuit • 1,024-bit QCLA • Error Correction (Steane [[7,1,3]] code) • Two layers of concatenation (L1, L2) • L2 error correction after each gate • If a qubit sits idle for long enough • Perform L2 error correction only (no L1 error correction) • Qubits can decohere for ~4.8 ms • Computation • CNOT (Local, Non-Local) • TOFFOLI (Local) • Can execute Toffoli in any Segment • Precomputed tile performance numbers • L2 Toffoli: ~50 ms, O(10^-14) • L2 cross-segment teleportation: ~10 ms, O(10^-11)
  • 45. Labelling the Architecture Space TEXEC as function of Architecture Parameters Large Segments, TEXEC depends on Ancilla (Less Distributed System) Small Segments, TEXEC depends on Comm. Qubits (Highly Distributed System)
  • 46. Optimal Architecture Selection (Minimizing TEXEC - Qubits product) Ancilla Regime Architecture Tel-Regime Architecture Minimum Exec. Time-Qubit Product for (NSeg = 4, NAnc = 1362, NComm = 1) Minimum Exec. Time-Qubit Product for (NSeg = 16, NAnc = 1362 NComm = 6)
  • 47. Reducing PFAIL using Device Parameters (DP) • Improving a DP: qubit memory 10x → failure prob. reduction ~100x – 1000x • 4 million adder calls need failure prob. << 2.5 x 10^-7 to run 1,024-bit modular exponentiation in Shor’s algorithm • Shown for the Ancilla-regime optimal architecture and the Tel-regime optimal architecture
  • 48. Second Set of Simulations (Setup) Goal: Study Performance-Scaling of Circuits • Benchmark Circuit • QCLA, QRCA and AQFT • Error Correction (Steane [[7,1,3]] code) • Two layers of concatenation (TEXEC L1 < TEXEC L2) • L1 Error-correction after each gate • Qubit sitting idle for long enough time • Perform L2 Error Correction • Else, Perform L1 Error-Correction • Qubits can decohere for ~ 0.6ms • Computation • CNOT (Local, Non-Local) • TOFFOLI, T gates (Local) • Execute TOFFOLI, T gates in Computational Segment (CS) Architecture Configuration: [#Data Tiles, #Ancilla Tiles, #Comm. Tiles], #CS
  • 49. Resource-Performance Scalability of MUSIQC Architecture Requirement: TEXEC scales logarithmically with Problem Size and Resources Requirement: TEXEC scales linearly with Problem Size and Resources QCLA QRCA AQFT PFAIL scales linearly with Problem Size and Resources MUSIQC Architecture Can Maintain ‘Desirable’ Performance/Resource as Problem Size Increases
  • 50. TEXEC as function of (#Data, #Ancilla, #Comm. Qubits), #CS • Panels: 1,024-bit QCLA, 1,024-bit QRCA, 1,024-bit AQFT • Resource consumption of benchmark circuits (NCS/Ancilla Qubits | Comm. Qubits | Overall Resource) • QCLA: High | High | High • QRCA: Low | Modest | Modest • AQFT: Low | None | Low
  • 51. TEXEC as function of Problem Size (Total Qubits < 1.5 million qubits) Optimized Architecture Configurations (up to 1,024-bit Circuit) [Data, Ancilla, Comm.], #CS QCLA: [30, 8, 5], 137 QRCA: [19, 12, 6], 108 AQFT: [1, 8, 1], 16 QCLA: [48, 4, 2], 25 The 2,048-bit QCLA Adder Doesn’t Fit Well into a 1.5-Million-Qubit Machine QCLA: [30, 8, 5], 137 ~90x ↓ in Ancilla, Comm. / Data Tile
  • 52. Running 2,048-bit Shor’s algorithm in < 5 months • 2,048-bit Shor’s algorithm ≈ 2,048-bit modular exponentiation circuit • 2,048-bit modular exponentiation → 16 million calls to the adder • TEXEC of each adder must be < 0.8 sec • Choose QCLA: [current TEXEC = 260 sec] • 2x ↑ resource budget → TEXEC = 2.76 sec (96x reduction) • 10x ↓ physical op. time (CNOT, measurement, shuttling time) → TEXEC = 0.69 sec (~4x reduction) • Resources can be more important than gate speed for decreasing TEXEC
  • 53. Reducing PFAIL using Device Parameters • To run 2,048-bit modular exponentiation we need 16 million calls to the adder • Adder PFAIL = 2.77 x 10^-7 needs to be << 1/(16 x 10^6) = 6.25 x 10^-8 • 10x decrease in entanglement infidelity → 100x reduction in PFAIL: 2.37 x 10^-9 << 6.25 x 10^-8 • (a) PFAIL = 2.77 x 10^-7, infidelity of EPR pair 10^-4: TEL 99.6%, GATE 0.31%, MEM 0.08%, SHUT 0.01% • (b) PFAIL = 2.37 x 10^-9, infidelity of EPR pair 10^-5: GATE 75.3%, MEM 22.3%, TEL 1.9%, SHUT 0.55% • Legend: gate noise (GATE), teleportation noise (TEL), shuttling noise (SHUT), memory noise (MEM)
  • 54. Overall Performance of 2,048-bit Shor’s Algorithm within 3 x 10^6 Qubits • QCLA (TEXEC = 0.68 sec, PFAIL = 2.37 x 10^-9), AQFT (TEXEC = 1 day, PFAIL = 6 x 10^-5) • Total execution time: 16 x 10^6 x 0.68 sec + 86,400 sec ≈ 128 days • Total failure probability: 16 x 10^6 x 2.37 x 10^-9 + 6 x 10^-5 ≈ 0.04
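
The totals on this slide can be reproduced with a few lines of arithmetic. The sketch below only restates the slide's numbers; the function and constant names are hypothetical, and summing the per-call failure probabilities is a simple union-bound estimate rather than the author's error analysis.

```python
SECONDS_PER_DAY = 86_400

ADDER_CALLS = 16e6      # adder calls in 2,048-bit modular exponentiation
T_ADDER_S = 0.68        # QCLA execution time per call, seconds
P_ADDER = 2.37e-9       # QCLA failure probability per call
T_AQFT_S = 86_400       # AQFT execution time: 1 day, in seconds
P_AQFT = 6e-5           # AQFT failure probability

def total_days(calls, t_adder_s, t_aqft_s):
    """Total run time in days: all adder calls run back to back, plus the AQFT."""
    return (calls * t_adder_s + t_aqft_s) / SECONDS_PER_DAY

def total_failure_prob(calls, p_adder, p_aqft):
    """Union-bound estimate: sum of per-call failure probabilities."""
    return calls * p_adder + p_aqft

def per_call_budget(calls):
    """Per-adder failure probability must stay well below 1/calls."""
    return 1 / calls

print(f"total time   ~ {total_days(ADDER_CALLS, T_ADDER_S, T_AQFT_S):.1f} days")
print(f"total PFAIL  ~ {total_failure_prob(ADDER_CALLS, P_ADDER, P_AQFT):.3f}")
print(f"per-call cap = {per_call_budget(ADDER_CALLS):.2e}")
```

With 0.68 sec per adder call the total comes to roughly 127 days, and with 0.69 sec to roughly 129 days, either way under five months.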
  • 55. Summary of My Results (1): Studying Architecture Space for a Variety of Benchmarks of Varying Sizes • Architecture Space Exploration • Overall performance of a fully error-corrected circuit can be bottlenecked by • Non-Clifford ancilla qubits • Long-distance communication qubits • Performance-Scalability of Architecture • Resource allocation should meet the computational workload of the benchmark application circuit
  • 56. Summary of My Results (2) • Reliability • Overall PFAIL can be limited either by memory coherence time or by fidelity of the communication channel (depending on error-correction strategy) • Speed • For overall TEXEC, the resource budget (#qubits) can be more crucial than speeding up the quantum gates • Using 3 million qubits, each failing with prob. 10^-7/op, a 2,048-bit number can be factored in less than five months.
  • 57. Future Work • Sophisticated Models of Physical Components • Cross-talk, loss of qubits, etc. • Quantum Technologies • Superconductors • Quantum dots • Error-Correcting Codes and Noise Models • Topological quantum codes • Exploiting X/Z error asymmetry • Correlated errors • Interesting Avenues of Future Research
  • 60. First Set of Simulations (Results[2]) PFAIL as function of Architecture Parameters Ancilla-Regime, PFAIL depends on Shuttling Noise Tel-Regime, PFAIL depends on Memory Noise, Teleportation Noise
  • 61. More Answers… • Architectures can compensate for the technology limitations • Memory/Cache-Hierarchy • Hides the slowness of dynamic RAM, Secondary memory devices • Multi-core designs • Reduce power dissipation without compromising the performance
  • 62. Research in Classical Architecture • Classical Era (lasted until 90s) • Human intuition and experience for searching the computer design space • Modern Era • Software tools find good computer design. [Todd Austin’s SIMPLESCALAR Toolset] The idea of performance Simulation Toolset for Quantum Computers
  • 63. Fault tolerance and resource overhead • Encode 1 logical qubit → 7 physical qubits with an error-correcting code (e.g., Steane [[7,1,3]]) • Construct DATA QUBITS • Perform error correction regularly on encoded qubits to remove errors • Requires ANCILLA QUBITS to perform parity checks • Apply fault-tolerant gates on encoded (logical) qubits • Generally requires ANCILLA QUBITS for correct operation and error prevention • Noise level reduces O(p) → O(p^2) with each layer of encoding
  • 64. (TEXEC, PFAIL) as function of Gate Speed, Resources • Plots of failure probability (1.0E-8 to 1.0E-4) vs. execution time (0 to 1,200 sec) for QRCA 2048 and QCLA 2048, including panels labeled Resources: 2X ↑ and Gate Speed: 10X ↑ • Resources can be more important than gate speed for decreasing TEXEC: 100x ↓ TEXEC for QCLA • PFAIL doesn’t decrease appreciably by adding more resources • 2,048-bit QCLA: TEXEC = 0.69 s, PFAIL = 2.77 x 10^-7 with 2x ↑ resources and 10x ↓ gate times
  • 65. Quick Introduction to Quantum Computing • A quantum computer consists of • Qubits: store information (e.g. energy levels of a trapped 171Yb+ ion) • Gates: process information (e.g. lasers that drive transitions between energy levels) • Quantum circuit: gates act on qubits |x>, |y>, |z> over time • Quantum hardware: ions (qubits), lasers (gates)
  • 66.
  • 67. Labelling the Architecture Space TEXEC as function of Architecture Parameters Large Segments, TEXEC depends on Ancilla (Less Distributed System) Small Segments, TEXEC depends on Comm. Qubits (Highly Distributed System)
  • 68. TEXEC as function of Problem Size (Total Qubits < 1.5 million qubits) Optimal Architecture Configurations (up to 1,024-bit Circuit) [Data, Ancilla, Comm.], #CS QCLA: [30, 8, 5], 137 QRCA: [19, 12, 6], 108 AQFT: [1, 8, 1], 16 QCLA: [48, 4, 2], 25 Ancilla Prep: 10% Comm.: 85% Gate: 4% The 2,048-bit QCLA adder doesn’t fit well into a 1.5-million-qubit machine QCLA: [30, 8, 5], 137 Ancilla Prep: 38% Comm.: 29% Gate: 33%
  • 69. Second Set of Simulations (Setup) Goal: Study Performance-Scaling of Circuits • Benchmark Circuit • QCLA, QRCA and AQFT • Error Correction (Steane [[7,1,3]] code) • Two layers of concatenation (TL2 = 70 x TL1) • L1 error correction after each gate • If a qubit sits idle for long enough • Perform L2 error correction • Else, perform L1 error correction • Physical qubits can decohere for < 0.6 ms • Computation • CNOT (Local, Non-Local) • TOFFOLI, T gates (Local) • Execute TOFFOLI, T gates in Computational Segment (CS) • Pre-computed tile performance numbers • L2 Toffoli: ~51 ms, O(10^-14) • L2 cross-segment teleportation: ~5 ms, O(10^-11) • Architecture Configuration: (#Data, #Ancilla, #Comm. (qubits)), #CS