SlideShare a Scribd company logo
1 of 40
ECE/CS 552: Nanophotonics
Instructor: Mikko H Lipasti
Fall 2010
University of Wisconsin-Madison
Optional lecture – just for “fun”
Good News
 Technology advances at astounding rate
– 19th century: attempts to build mechanical computers
– Early 20th century: mechanical counting systems (cash
registers, etc.)
– Mid 20th century: vacuum tubes as switches
– Since: transistors, integrated circuits
 1965: Moore’s law [Gordon Moore]
– Predicted doubling of IC capacity every 18 months
 Drives functionality, performance, cost
– Exponential improvement for 40+ years
– Built on Von Neumann model (fetch/execute)
0.00E+00
1.00E+07
1965 1970 1975 1980 1985 1990 1995 2000
IC Capacity 1965-1995
0.00E+00
5.00E+09
1.00E+10
1965 1975 1985 1995 2005 2015
IC Capacity 1965-2010
Distributed processing on chip
 Future chips rely on distributed processing
– Many computation/cache/DRAM/IO nodes
– Placement, topology, core uarch/strength, tbd
 Conventional interconnects may not suffice
– Buses not viable
– Crossbars are slow, power-hungry, expensive
– NOCs impose latency, power overhead
 Nanophotonics to the rescue
– Communicate with photons
– Inherent bandwidth, latency, energy advantages
– Silicon integration becoming a reality
 Challenges & opportunities remain
~3.5 μm
[HP]
Ring Resonator
• Wavelength Magnet
Si Photonics: How it works
[Intel]
0.5 μm
Waveguide
• Optical Wire
Laser
• Off-Chip Power
[Koch ‘07]
OFF
ON
Ring Resonators
5
Ge Doped
OFF ON : Diverting ON : Diverting
+ Detecting
ON : Injecting
Key attributes of Si photonics
 Very low latency, very high bandwidth
 Up to 1000x energy efficiency gain
 Challenges
– Resonator thermal tuning: heaters
– Integration, fabrication, is this real?
 Opportunities
– Static power dominant (laser, thermal)
– Destructive reads: fast wired or
© Hill, Lipasti
7
Nanophotonics
 Nanophotonics overview
 Sharing the nanophotonic channel
– Light-speed arbitration [MICRO 09]
 Utilizing the nanophotonic channel
– Atomic coherence [HPCA 11]
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
Dateline
Corona substrate [ISCA08]
Targeting Year 2017
– Logically a ring topology
– One concentric ring per node
– 3D stacked: optical, analog, digital
8
Multiple writer single reader
(MWSR) interconnects
Arbitration prevents corruption of in-flight data
latchless/
wave-pipelined
Motivating an optical arbitration solution
MWSR Arbiter must be:
1. Global - Many writers requesting access
2. Very fast – Otherwise bottleneck
Optical arbiter avoids OEO conversion delays,
provides light-speed arbitration
Proposed optical protocols
 Token-based protocols
– Inspired by classic token ring
– Token == transmission rights
– Fits well with ring-shaped interconnect
– Distributed, Scalable
– (limited to ring)
Baseline
 Based on traditional
token protocols
 Repeat token at each
node
– But data is not
repeated!
– Poor utilization
Token - Inject Token - Seize Token - Pass
Optical arbitration basics
Waveguide
Ring resonator
Power
•No Repeat!
•Token latency bounded
by the time of flight
between requesters.
Token Slot
Token Channel
Single Token / Serial Writes
Multiple Tokens / Simultaneous Writes
Arbitration solutions
Token passing allows token to pace transmission tail (no bubbles) Token passing allows token to directly precede slot
Flow control and fairness
Flow Control:
 Use token refresh as opportunity to encode
flow control information (credits available)
 Arbitration winners decrement credit count
Fairness:
 Upstream nodes get first shot at tokens
 Need mechanism to prevent starvation of
downstream nodes
Results - Performance
Uniform HotSpot
Token Slot benefits from
• the availability of multiple tokens (multiple writers)
• fast turn-around time of flow-control mechanism
Results - Latency
Token Slot has the lowest latency and saturates at 80%+ load
Uniform HotSpot
Optical arbitration summary
 Arbitration speed has to match transfer
speed for fine-grained communication
– Arbiter has to be optical
 High throughput is achievable
– 85+% for token slot
 Limited to simple topologies (MWSR)
 Implementation challenges
– Opt-elec-logic-elec-opt in 200ps (@5GHz)
© Hill, Lipasti
19
Nanophotonics
 Nanophotonics overview
 Sharing the nanophotonic channel
– Light-speed arbitration [MICRO 09]
 Utilizing the nanophotonic channel
– Atomic coherence [HPCA 11]
What makes coherence hard?
Unordered interconnects
– split transaction buses,
meshes, etc
Speculation
– Sharer-prediction, speculative data use, etc.
Multiple initiators of coherence requests
– L1-to-L2, Directory Caches, Coherence Domains,
etc
State-event pair explosion
 Verification headache
Example: MSI (SGI-Origin-like, directory, invalidate)
21
Stable States
Example: MSI (SGI-Origin-like, directory, invalidate)
22
Stable States
Busy States
Example: MSI (SGI-Origin-like, directory, invalidate)
23
Stable States
Busy States
Races
“unexpected” events from
concurrent requests to
same block
Cache coherence complexity
24
[Lepak Thesis, ‘03]
L2 MOETSI Transitions
Cache coherence
verification headache
25
Papers:
So Many States, So Little Time:
Verifying Memory Coherence in the Cray X1
Formal Methods:
e.g. Leslie Lamport’s TLA+
specification language @
Intel
Intel Core 2 Duo Errata:
AI39. Cache Data Access Request from One Core
Hitting a Modified Line in the L1 Data Cache of the
Other Core May Cause Unpredictable System Behavior
Complex Protocol
=
Complex
Verification
Simple
Simple
Atomic Coherence: Simplicity
26
w/ races w/o races
Race resolution
• Cause:
 Concurrently active coherence requests to block A
• Remedy:
 Only allow one coherence request to block A to be
active at a time.
27
Core 0
$CACHE$
Core 1
$CACHE$
A
A
Race resolution
28
Core 0
$CACHE$
Core 1
$CACHE$
A
Atomic
Substrate
Coherence
Substrate
M
S
I
Race resolution
29
Atomic
Substrate
M
S
I
Coherence
Substrate
-- Atomic Substrate is on critical path
+ Can optimize substrates separately
Atomic & Coherence Substrates
M
O
I F
S
E
Coherence
Substrate
Atomic
Substrate
(Apply Fancy
Nanophotonics Here)
(Add speculation to a
traditional protocol)
aggres
sive
aggres
sive
Mutexes circulate on ring
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
31
Single out mutex:
hash(addr X) λ Y @ cycle Z
Mutex acquire
[Requesting
Mutex]
P1
P0
P2
P3
Detector
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
[Requesting
Mutex]
[Won Mutex]
32
Exploits OFF-resonance rings: mutex
passes P1, P2 uninterrupted
[Requesting
Mutex]
[Won Mutex]
[Release Mutex]
[Requesting
Mutex]
[Won Mutex]
Mutex release
Detector
Detector
Injector
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3
P1
P0
P2
P3 33
Mutexes on ring
34
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
Dateline
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
Dateline
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
Dateline
$ $
$
$ $
$
$
$
$
$ $ $ $
$
$
$
Dateline
Detectors Injectors
1 mutex = 200 ps = ~2cm = 1 cycle @ 5 GHz
2 cm
waveguides
waveguide
mutex
4
64
4





# Mutex
1024

Latency To:
• seize free mutex : ≤ 4 cycles
• tune ring resonator: < 1 cycle
35
Static:
Dynamic:
(random tester)
* Atomic Coherence
reduces complexity
Atomic Coherence:
Complexity
Performance
(128 in-order cores, optical data interconnect, MOEFSI directory)
Slowdown relative to
non-atomic MOEFSI
What is causing the
slowdown?
coherence
agnostic
Optimizing coherence
37
O.wned and F.orward State:
• Responsible for satisfying on-chip read misses
Opportunity:
• Try to keep O/F alive
• If O (or F) block evicted:
While mutex is held, ‘shift’ O/F state to sharer
Observation:
Holding Block B’s mutex gives holder free reign over
coherence activity related to block B
(or hand-off
responsibility)
Optimizing coherence
38
• If O (or F) block evicted: ‘Shift’ O/F state to sharer
# L2 transitions
(b/c less variety in
sharing possibilities)
Speedup relative to
atomic MOEFSI
Complexity
:
Performance:
Atomic Coherence Summary
 Nanophotonics as enabler
– Very fast chip-wide consensus
 Atomic Protocols are simpler
protocols
– And can have minimal cost to
performance (w/ nanophotonics)
– Opportunity for straightforward
protocol enhancements: ShiftF
 More details in HPCA-11 paper
– Push protocol (update-like)
39
races
coherence
© Hill, Lipasti
40
Nanophotonics
 Nanophotonics overview
 Sharing the nanophotonic channel
– Light-speed arbitration [MICRO 09]
 Utilizing the nanophotonic channel
– Atomic coherence [HPCA 11]

More Related Content

Similar to lec16_nanophotonics.pptx

Silicon Photonics and Photonic NoCs: A Survey
Silicon Photonics and Photonic NoCs: A SurveySilicon Photonics and Photonic NoCs: A Survey
Silicon Photonics and Photonic NoCs: A Surveydrv11291
 
Signal Integrity - A Crash Course [R Lott]
Signal Integrity - A Crash Course [R Lott]Signal Integrity - A Crash Course [R Lott]
Signal Integrity - A Crash Course [R Lott]Ryan Lott
 
Performance beyond moore's law
Performance beyond moore's lawPerformance beyond moore's law
Performance beyond moore's lawAnand Haridass
 
Lecture24 clockpower routing
Lecture24 clockpower routingLecture24 clockpower routing
Lecture24 clockpower routingfreeloadtailieu
 
OPTICAL COMMUNICATION Unit 5
OPTICAL COMMUNICATION Unit 5OPTICAL COMMUNICATION Unit 5
OPTICAL COMMUNICATION Unit 5Asif Iqbal
 
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...Xi'an Jiaotong-Liverpool University
 
CMOS VLSI design
CMOS VLSI designCMOS VLSI design
CMOS VLSI designRajan Kumar
 
Very Large Scale Integrated Circuits VLSI Overview
Very Large Scale Integrated Circuits VLSI OverviewVery Large Scale Integrated Circuits VLSI Overview
Very Large Scale Integrated Circuits VLSI OverviewEngr. Bilal Sarwar
 
Emerging Technologies in On-Chip and Off-Chip Interconnection Networks
Emerging Technologies in On-Chip and Off-Chip Interconnection NetworksEmerging Technologies in On-Chip and Off-Chip Interconnection Networks
Emerging Technologies in On-Chip and Off-Chip Interconnection NetworksAshif Sikder
 
The Optiputer - Toward a Terabit LAN
The Optiputer - Toward a Terabit LANThe Optiputer - Toward a Terabit LAN
The Optiputer - Toward a Terabit LANLarry Smarr
 
Sonet fall2011
Sonet fall2011Sonet fall2011
Sonet fall2011kongara
 
Accurate Synchronization of EtherCAT Systems Using Distributed Clocks
Accurate Synchronization of EtherCAT Systems Using Distributed ClocksAccurate Synchronization of EtherCAT Systems Using Distributed Clocks
Accurate Synchronization of EtherCAT Systems Using Distributed ClocksDesign World
 
Silicon to software share
Silicon to software shareSilicon to software share
Silicon to software shareNarendra Patel
 
6TiSCH @Telecom Bretagne 2015
6TiSCH @Telecom Bretagne 20156TiSCH @Telecom Bretagne 2015
6TiSCH @Telecom Bretagne 2015Pascal Thubert
 
Optical network and architecture
Optical network and architectureOptical network and architecture
Optical network and architectureRadha Mahalle
 

Similar to lec16_nanophotonics.pptx (20)

Silicon Photonics and Photonic NoCs: A Survey
Silicon Photonics and Photonic NoCs: A SurveySilicon Photonics and Photonic NoCs: A Survey
Silicon Photonics and Photonic NoCs: A Survey
 
Signal Integrity - A Crash Course [R Lott]
Signal Integrity - A Crash Course [R Lott]Signal Integrity - A Crash Course [R Lott]
Signal Integrity - A Crash Course [R Lott]
 
Performance beyond moore's law
Performance beyond moore's lawPerformance beyond moore's law
Performance beyond moore's law
 
Get Real
Get RealGet Real
Get Real
 
Lecture24 clockpower routing
Lecture24 clockpower routingLecture24 clockpower routing
Lecture24 clockpower routing
 
OPTICAL COMMUNICATION Unit 5
OPTICAL COMMUNICATION Unit 5OPTICAL COMMUNICATION Unit 5
OPTICAL COMMUNICATION Unit 5
 
L1 introduction
L1 introductionL1 introduction
L1 introduction
 
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...
Photonics21 – Next-Generation Optical Internet Access: Roadmap for Broadband ...
 
CMOS VLSI design
CMOS VLSI designCMOS VLSI design
CMOS VLSI design
 
Very Large Scale Integrated Circuits VLSI Overview
Very Large Scale Integrated Circuits VLSI OverviewVery Large Scale Integrated Circuits VLSI Overview
Very Large Scale Integrated Circuits VLSI Overview
 
Emerging Technologies in On-Chip and Off-Chip Interconnection Networks
Emerging Technologies in On-Chip and Off-Chip Interconnection NetworksEmerging Technologies in On-Chip and Off-Chip Interconnection Networks
Emerging Technologies in On-Chip and Off-Chip Interconnection Networks
 
The Optiputer - Toward a Terabit LAN
The Optiputer - Toward a Terabit LANThe Optiputer - Toward a Terabit LAN
The Optiputer - Toward a Terabit LAN
 
Ph.D. Defense
Ph.D. DefensePh.D. Defense
Ph.D. Defense
 
Thesis
ThesisThesis
Thesis
 
Thesis
ThesisThesis
Thesis
 
Sonet fall2011
Sonet fall2011Sonet fall2011
Sonet fall2011
 
Accurate Synchronization of EtherCAT Systems Using Distributed Clocks
Accurate Synchronization of EtherCAT Systems Using Distributed ClocksAccurate Synchronization of EtherCAT Systems Using Distributed Clocks
Accurate Synchronization of EtherCAT Systems Using Distributed Clocks
 
Silicon to software share
Silicon to software shareSilicon to software share
Silicon to software share
 
6TiSCH @Telecom Bretagne 2015
6TiSCH @Telecom Bretagne 20156TiSCH @Telecom Bretagne 2015
6TiSCH @Telecom Bretagne 2015
 
Optical network and architecture
Optical network and architectureOptical network and architecture
Optical network and architecture
 

Recently uploaded

VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 

Recently uploaded (20)

VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 

lec16_nanophotonics.pptx

  • 1. ECE/CS 552: Nanophotonics Instructor: Mikko H Lipasti Fall 2010 University of Wisconsin-Madison Optional lecture – just for “fun”
  • 2. Good News  Technology advances at astounding rate – 19th century: attempts to build mechanical computers – Early 20th century: mechanical counting systems (cash registers, etc.) – Mid 20th century: vacuum tubes as switches – Since: transistors, integrated circuits  1965: Moore’s law [Gordon Moore] – Predicted doubling of IC capacity every 18 months  Drives functionality, performance, cost – Exponential improvement for 40+ years – Built on Von Neumann model (fetch/execute) 0.00E+00 1.00E+07 1965 1970 1975 1980 1985 1990 1995 2000 IC Capacity 1965-1995 0.00E+00 5.00E+09 1.00E+10 1965 1975 1985 1995 2005 2015 IC Capacity 1965-2010
  • 3. Distributed processing on chip  Future chips rely on distributed processing – Many computation/cache/DRAM/IO nodes – Placement, topology, core uarch/strength, tbd  Conventional interconnects may not suffice – Buses not viable – Crossbars are slow, power-hungry, expensive – NOCs impose latency, power overhead  Nanophotonics to the rescue – Communicate with photons – Inherent bandwidth, latency, energy advantages – Silicon integration becoming a reality  Challenges & opportunities remain
  • 4. ~3.5 μm [HP] Ring Resonator • Wavelength Magnet Si Photonics: How it works [Intel] 0.5 μm Waveguide • Optical Wire Laser • Off-Chip Power [Koch ‘07] OFF ON
  • 5. Ring Resonators 5 Ge Doped OFF ON : Diverting ON : Diverting + Detecting ON : Injecting
  • 6. Key attributes of Si photonics  Very low latency, very high bandwidth  Up to 1000x energy efficiency gain  Challenges – Resonator thermal tuning: heaters – Integration, fabrication, is this real?  Opportunities – Static power dominant (laser, thermal) – Destructive reads: fast wired or
  • 7. © Hill, Lipasti 7 Nanophotonics  Nanophotonics overview  Sharing the nanophotonic channel – Light-speed arbitration [MICRO 09]  Utilizing the nanophotonic channel – Atomic coherence [HPCA 11]
  • 8. $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ Dateline Corona substrate [ISCA08] Targeting Year 2017 – Logically a ring topology – One concentric ring per node – 3D stacked: optical, analog, digital 8
  • 9. Multiple writer single reader (MWSR) interconnects Arbitration prevents corruption of in-flight data latchless/ wave-pipelined
  • 10. Motivating an optical arbitration solution MWSR Arbiter must be: 1. Global - Many writers requesting access 2. Very fast – Otherwise bottleneck Optical arbiter avoids OEO conversion delays, provides light-speed arbitration
  • 11. Proposed optical protocols  Token-based protocols – Inspired by classic token ring – Token == transmission rights – Fits well with ring-shaped interconnect – Distributed, Scalable – (limited to ring)
  • 12. Baseline  Based on traditional token protocols  Repeat token at each node – But data is not repeated! – Poor utilization
  • 13. Token - Inject Token - Seize Token - Pass Optical arbitration basics Waveguide Ring resonator Power •No Repeat! •Token latency bounded by the time of flight between requesters.
  • 14. Token Slot Token Channel Single Token / Serial Writes Multiple Tokens / Simultaneous Writes Arbitration solutions Token passing allows token to pace transmission tail (no bubbles) Token passing allows token to directly precede slot
  • 15. Flow control and fairness Flow Control:  Use token refresh as opportunity to encode flow control information (credits available)  Arbitration winners decrement credit count Fairness:  Upstream nodes get first shot at tokens  Need mechanism to prevent starvation of downstream nodes
  • 16. Results - Performance Uniform HotSpot Token Slot benefits from • the availability of multiple tokens (multiple writers) • fast turn-around time of flow-control mechanism
  • 17. Results - Latency Token Slot has the lowest latency and saturates at 80%+ load Uniform HotSpot
  • 18. Optical arbitration summary  Arbitration speed has to match transfer speed for fine-grained communication – Arbiter has to be optical  High throughput is achievable – 85+% for token slot  Limited to simple topologies (MWSR)  Implementation challenges – Opt-elec-logic-elec-opt in 200ps (@5GHz)
  • 19. © Hill, Lipasti 19 Nanophotonics  Nanophotonics overview  Sharing the nanophotonic channel – Light-speed arbitration [MICRO 09]  Utilizing the nanophotonic channel – Atomic coherence [HPCA 11]
  • 20. What makes coherence hard? Unordered interconnects – split transaction buses, meshes, etc Speculation – Sharer-prediction, speculative data use, etc. Multiple initiators of coherence requests – L1-to-L2, Directory Caches, Coherence Domains, etc State-event pair explosion  Verification headache
  • 21. Example: MSI (SGI-Origin-like, directory, invalidate) 21 Stable States
  • 22. Example: MSI (SGI-Origin-like, directory, invalidate) 22 Stable States Busy States
  • 23. Example: MSI (SGI-Origin-like, directory, invalidate) 23 Stable States Busy States Races “unexpected” events from concurrent requests to same block
  • 24. Cache coherence complexity 24 [Lepak Thesis, ‘03] L2 MOETSI Transitions
  • 25. Cache coherence verification headache 25 Papers: So Many States, So Little Time: Verifying Memory Coherence in the Cray X1 Formal Methods: e.g. Leslie Lamport’s TLA+ specification language @ Intel Intel Core 2 Duo Errata: AI39. Cache Data Access Request from One Core Hitting a Modified Line in the L1 Data Cache of the Other Core May Cause Unpredictable System Behavior Complex Protocol = Complex Verification Simple Simple
  • 27. Race resolution • Cause:  Concurrently active coherence requests to block A • Remedy:  Only allow one coherence request to block A to be active at a time. 27 Core 0 $CACHE$ Core 1 $CACHE$ A A
  • 28. Race resolution 28 Core 0 $CACHE$ Core 1 $CACHE$ A Atomic Substrate Coherence Substrate M S I
  • 29. Race resolution 29 Atomic Substrate M S I Coherence Substrate -- Atomic Substrate is on critical path + Can optimize substrates separately
  • 30. Atomic & Coherence Substrates M O I F S E Coherence Substrate Atomic Substrate (Apply Fancy Nanophotonics Here) (Add speculation to a traditional protocol) aggres sive aggres sive
  • 31. Mutexes circulate on ring P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 31 Single out mutex: hash(addr X) λ Y @ cycle Z
  • 33. [Requesting Mutex] [Won Mutex] [Release Mutex] [Requesting Mutex] [Won Mutex] Mutex release Detector Detector Injector P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 P1 P0 P2 P3 33
  • 34. Mutexes on ring 34 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ Dateline $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ Dateline $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ Dateline $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ Dateline Detectors Injectors 1 mutex = 200 ps = ~2cm = 1 cycle @ 5 GHz 2 cm waveguides waveguide mutex 4 64 4      # Mutex 1024  Latency To: • seize free mutex : ≤ 4 cycles • tune ring resonator: < 1 cycle
  • 35. 35 Static: Dynamic: (random tester) * Atomic Coherence reduces complexity Atomic Coherence: Complexity
  • 36. Performance (128 in-order cores, optical data interconnect, MOEFSI directory) Slowdown relative to non-atomic MOEFSI What is causing the slowdown? coherence agnostic
  • 37. Optimizing coherence 37 O.wned and F.orward State: • Responsible for satisfying on-chip read misses Opportunity: • Try to keep O/F alive • If O (or F) block evicted: While mutex is held, ‘shift’ O/F state to sharer Observation: Holding Block B’s mutex gives holder free reign over coherence activity related to block B (or hand-off responsibility)
  • 38. Optimizing coherence 38 • If O (or F) block evicted: ‘Shift’ O/F state to sharer # L2 transitions (b/c less variety in sharing possibilities) Speedup relative to atomic MOEFSI Complexity : Performance:
  • 39. Atomic Coherence Summary  Nanophotonics as enabler – Very fast chip-wide consensus  Atomic Protocols are simpler protocols – And can have minimal cost to performance (w/ nanophotonics) – Opportunity for straightforward protocol enhancements: ShiftF  More details in HPCA-11 paper – Push protocol (update-like) 39 races coherence
  • 40. © Hill, Lipasti 40 Nanophotonics  Nanophotonics overview  Sharing the nanophotonic channel – Light-speed arbitration [MICRO 09]  Utilizing the nanophotonic channel – Atomic coherence [HPCA 11]