Analysis and Implementation of
    Optoelectronic Network Routers

                Ph.D. Defense
  1. Analysis and Implementation of Optoelectronic Network Routers
     Ph.D. Defense by Mongkol Raksapatcharawong
     SMART Interconnects Group, Electrical Engineering - Systems Department, University of Southern California - LA
     http://www.usc.edu/dept/ceng/pinkston/SMART.html
     Date: September 25, 1998. Time: 12:00pm, EEB-108
  2. The Big Picture
     The Problem: Network bandwidth is becoming a bottleneck. Interconnection networks must deliver sufficient bandwidth to keep pace with the microprocessor.
     Potential Solution: Optoelectronic network routers. Optoelectronic technology increases physical bandwidth. Advanced router architectures improve bandwidth utilization.
     The Unknowns: Performance issues, design issues, and technology issues.
  3. Outline
     • Background and Motivation
     • Research Issues and Approach
     • Modeling Free-Space Optical k-ary n-cube Wormhole Networks
     • Design Issues of Optoelectronic Network Routers
     • Implementing Optoelectronic Network Routers
     • Conclusions and Future Work
  4. Problem
     Starvation for off-chip bandwidth. On-chip clock rates are doubling compared to off-chip clock rates. Processor-memory bandwidth is doubling.
     [Figure: clock rate (MHz, log scale from 1 to 10000) versus year (1980 to 2010); series: Intel proc, Intel bus, SIA proc, SIA bus; labeled processors: 8088, 80286, 80386, 80486, Pentium, Pentium Pro, Pentium II, Merced (IA-64).]
     Possible solution: integrate processor and memory onto one chip, IRAM (Patterson 1995).
     Problem: this shifts the bandwidth problem to the network in multiprocessor systems.
  5. Problem (cont’d)
     • Processor performance has increased much faster than memory.
     • Memory latency hiding/tolerating techniques are required.
     • Prefetching and multithreading pipeline the memory accesses and thread executions.
     • Both schemes generate more off-chip traffic.
     [Figure: sustained bandwidth (GB/s) versus year (1998 to 2007); series: multithreaded processor, single-threaded processor, available off-chip bandwidth.]
     A high-bandwidth network is required.
  6. Demand on Network Bandwidth
     Multiprocessor systems require a high-performance network. The network router must be fast and provide high bandwidth. An optoelectronic router can help mitigate the bandwidth problem.
  7. State-of-the-Art Network Routers
     Router            Reference, year              On-chip/off-chip clock rates (MHz)   Internal/external channel width (bits)
     SGI Spider        [Galles, 1996]               100/200 (double-edge)                80/20
     Intel Teraflop    [Carbonaro/Verhoorn, 1996]   200/200                              16/16
     Cray T3E          [Scott/Thorson, 1996]        75/375                               70/14
     Reliable Router   [Dally et al., 1994]         100/100 (double-edge)                32/23
     • All routers employ sophisticated architectural techniques, e.g., adaptive routing, pipelined functions, etc.
     • Network routers are mostly designed according to the available off-chip bandwidth and barely take advantage of state-of-the-art semiconductor technology.
     Limited off-chip bandwidth limits the performance of network routers.
  8. State-of-the-Art Interconnects
     Interconnect              Reference, year            Transmission rate (GHz)   Channel width (bits)
     Equalized serial line     [Dally, 1996]              4                         1
     Bidirectional signaling   [Haycock/Mooney, 1997]     2.5                       8
     POLO                      [USC/HP, 1996]             1                         10
     Optobus II                [Motorola, 1995]           0.8                       10
     ChEEtah                   [USC/Honeywell, 1997]      1.0                       12
     • High-performance electrical interconnects suffer more from signal skew and jitter, and usually operate in serial mode.
     • Optical interconnects suffer less from the same effects and operate with wider channels.
     • In exchange for higher performance, electrical interconnects require large and sophisticated transceiver circuits.
     Optical interconnects show potentially better price/performance.
  9. Transceiver Sizes Comparison
     Circuit                                    Size (relative area)       Speed
     Optoelectronic Transmitter [Lucent 97]     17µm x 11µm (2.1%)         2.48GHz
     Optoelectronic Receiver [Lucent 97]        17µm x 13µm (2.5%)         2.48GHz
     I/O pad driver [Tanner Research 97]        80µm x 112µm (100%)        ~200MHz
     Equalized Line Transmitter [Dally 96]      550µm x 900µm (5525%)      4GHz
     Sizes are based on 0.5µm (CMOS-HP 14B) technology.
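
The percentages on this slide appear to be footprint areas relative to the I/O pad driver (the 100% entry). The following Python sketch checks that arithmetic under that assumption; the dictionary and function names are illustrative and not part of the original slides.

```python
# Footprints in micrometres (width, height), as listed on the slide.
FOOTPRINTS_UM = {
    "Optoelectronic transmitter": (17, 11),
    "Optoelectronic receiver": (17, 13),
    "I/O pad driver": (80, 112),
    "Equalized line transmitter": (550, 900),
}


def relative_area_percent(name, reference="I/O pad driver"):
    """Area of `name` as a percentage of the reference footprint.

    Assumption: the slide's percentages are area ratios against the
    I/O pad driver, which is the entry marked 100%.
    """
    width, height = FOOTPRINTS_UM[name]
    ref_width, ref_height = FOOTPRINTS_UM[reference]
    return 100.0 * (width * height) / (ref_width * ref_height)


if __name__ == "__main__":
    for name in FOOTPRINTS_UM:
        print(f"{name}: {relative_area_percent(name):.1f}%")
```

Under this reading, the optoelectronic transmitter and receiver each occupy under 3% of the pad driver's area, while the equalized line transmitter occupies roughly 55 times that area.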
  10. Previous Work in Complex Optoelectronic Chips
      • the AMOEBA switch chip by Krishnamoorthy et al., 1996
      • a 64-bit microprocessor core by Kiamilev et al., 1996
      • the Optical Multiprocessor Network Interface (OMNI) chip by Pinkston and Seelan, 1996
      • a 1kbit photonic page buffer by Krishnamoorthy et al., 1996
      • a 16kbit photonic page buffer by Kiamilev et al., 1997
      • a multiply-accumulate DSP core by Rozier et al., 1998
      Previous work focused on design and implementation, not on performance evaluation of complex optoelectronic chips in general.
  11. CMOS and SEED Technology Trends [SIA 97 and Krishnamoorthy 96]
      Year of first shipment            1999       2001       2003       2006        2009
      Technology (µm)                   0.18       0.15       0.13       0.10        0.07
      # Transistors (millions)          6.2        10         18         39          84
      On-chip/off-chip clocks (MHz)     1250/480   1500/785   2100/885   3500/1035   6000/1285
      # Pin-outs required (pins)        1507       2000       2400       3207        4400
      # BGA package pin-outs (pins)     1500       1800       2200       3000        4100
      # SEEDs (per chip)                8000       12000      20000      35000       47000
      Bonding pad size (µm)             9          8          7          5           4
      Optoelectronic SEED technology shows the potential to sustain the increasing bandwidth requirement.
  12. WARRP Router: Complexity and I/O Pin-out Requirement
      Electronic I/O (BGA packaging) is a limiting factor.
      [Figure: # transistors (millions) versus # pin-outs, both on log scales, for WARRP configurations with 1, 2, and 3 virtual channels (1D-4B-Uni, 1D-8B-Uni, 1D-16B-Uni, 2D-8B-Bi, 8D-16B-Bi, 8D-64B-Bi, 8D-256B-Bi), WARRP II, and commercial routers (Mosaic (1987), Mosaic C (1992), Intel Teraflop, SGI Spider, ServerNet II, Cray T3E); electronic (BGA packaging) and CMOS/SEED-based pin-out limits are marked by year (1995, 2003, 2009).]
      Network routers can benefit from the large number of I/O pin-outs provided by CMOS/SEED.
  13. Proposed Solution
      Optoelectronic Network Router based on the WARRP (Wormhole Adaptive Recovery-based Routing via Preemption) Architecture:
      • dense optoelectronic I/O devices: provide design flexibility
      • high-speed signaling: enables the design of high-performance network routers
      • increased bandwidth: allows advanced network router architectures
      The proposed solution is potentially advantageous in the development of next-generation network routers.
  14. Research Issues and Approach
      Optoelectronic network routers:
      • How do they benefit the multiprocessor network? Use an analytical model based on the widely employed k-ary n-cube class of networks.
      • What are the issues pertinent to the development of such routers? Use CAD tools and a semi-empirical model based on the WARRP router to identify the problems and evaluate the chips’ performance.
      • Can they be implemented? Implement the WARRP router through various optoelectronic integrated technologies.
  15. Implementation Cost Model: Connection Capacity
      • Bisection Width [Dally 90] is the number of connections crossing an imaginary plane that divides the system into two equal halves. It is useful for electrically interconnected systems.
      • Connection Capacity [Mongkol and Pinkston 96] is introduced as the number of connections that can be established for a given imaging system. It is useful for 3-D free-space optical interconnects.
  16. Bisection Width and Connection Capacity of k-ary n-cube Networks
      Bisection width: $B = 2Wnk^{n-1}$
      Connection capacity: $C = Wnk^{n}$
      where $n$ is the network dimension, $k$ is the network radix, and $W$ is the channel width.
  17. Bisection Width and Connection Capacity Comparison
      [Figure: Diffractive-Reflective Optical Interconnect (DROI) layouts for both systems, showing the bisection plane, mirror plane, microlens-hologram plane, and optical signal paths; only one row of the torus is shown.]
                              System A          System B
      Topology                16-node torus     8-node hypercube
      Bisection width         8                 8
      Connection capacity     32                24
      • A system with a connection capacity of 24 can implement only System B, even though both systems have the same bisection width.
      Connection capacity is a more accurate implementation cost measure.
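
The comparison above can be reproduced with a short Python sketch. It takes the connection capacity of a k-ary n-cube as C = W·n·k^n and assumes W = 1, which matches the slide's values of 32 and 24; the mapping of the 16-node torus to a 4-ary 2-cube and of the 8-node hypercube to a 2-ary 3-cube is the standard one. Function and variable names are illustrative only.

```python
def connection_capacity(k, n, w=1):
    """Connection capacity C = W * n * k**n of a k-ary n-cube.

    Assumption: W = 1 by default, which reproduces the slide's numbers.
    """
    return w * n * k ** n


def implementable(systems, budget):
    """Names of the systems whose connection capacity fits the budget."""
    return [name for name, (k, n) in systems.items()
            if connection_capacity(k, n) <= budget]


SYSTEMS = {
    "System A (16-node torus)": (4, 2),     # 4-ary 2-cube
    "System B (8-node hypercube)": (2, 3),  # 2-ary 3-cube
}

if __name__ == "__main__":
    for name, (k, n) in SYSTEMS.items():
        print(name, connection_capacity(k, n))
    print("Fits a capacity of 24:", implementable(SYSTEMS, 24))
```

With an imaging system that supports 24 connections, only System B passes the check, which is the point the slide makes about connection capacity being the tighter cost measure.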
  18. Network Latency for Wormhole Switched Networks
      $T_{net} = D(t_r + t_s + t_w) + \max(t_s, t_w)\,L/W$
      where $T_{net}$ is the low-load network latency, $D$ is the number of network hops from source to destination, $L$ is the data message length, $W$ is the channel width, $t_r$ is the routing time, $t_s$ is the data-thru time, and $t_w$ is the wire delay time.
      • Effects of optoelectronic technology on network latency:
      • Dense I/O pin-outs affect network topology ($D$) and channel width ($W$); and
      • High-speed signaling reduces propagation delay ($t_w$).
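
As a worked illustration of the latency expression above, the following Python sketch evaluates it for two channel widths. All parameter values are hypothetical and chosen only to show how a wider channel shortens the second term; they are not taken from the presentation.

```python
def network_latency(d, t_r, t_s, t_w, length, width):
    """Low-load latency T_net = D*(t_r + t_s + t_w) + max(t_s, t_w) * L / W."""
    header = d * (t_r + t_s + t_w)            # per-hop delays over D hops
    payload = max(t_s, t_w) * length / width  # message streamed W bits at a time
    return header + payload


if __name__ == "__main__":
    # Hypothetical values: 4 hops, times in ns, a 128-bit message.
    for width in (16, 32):
        latency = network_latency(d=4, t_r=2, t_s=1, t_w=3,
                                  length=128, width=width)
        print(f"W = {width} bits: T_net = {latency} ns")
```

Doubling W halves only the L/W term, so the benefit of a wider channel grows with message length and shrinks as the hop count D dominates.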
  19. Other Important Equations for Performance Evaluation
      Interconnection distance:
      $R_{max} = \frac{p \cdot 2^{\lceil n/2 \rceil - 1}}{\sin\theta}$ for $k = 2$,
      $R_{max} = \frac{p \cdot 2^{n-1}}{\sin\theta}$ for $k = 4$,
      $R_{max} = \frac{2 \cdot p \cdot k^{\lceil n/2 \rceil - 1}}{\sin\theta}$ for any other $k$.
      Connection capacity: $C = \frac{A_{system}}{2M_D^2}$, with $A_{system} = F(\theta, \eta, p, n, k)$ and $\theta = F(\lambda, n_x, L_b, w_f)$.
      Channel width: $W_{optics}(k, n) = \left\lfloor \frac{A}{2M_D^2 \cdot N \log N} \cdot \log k \right\rfloor$; $W_{elec}(k, n)$ is expressed in terms of $L$, $A$, $k$, $N_T$, and $w$.
  20. Channel Cycle Time (Tc)
      • Assuming Tc is determined by propagation delay.
      • Conversion time is not a killer!!
      • $T_c = T_{o/e} + T_{e/o} + T_{prop}$, where $T_{o/e} + T_{e/o}$ is the conversion delay (technology dependent) and $T_{prop}$ is the propagation delay (topology dependent).
      • It is very important to have an efficient imaging system.
      [Figure: two plots marking the topology-dependent and technology-dependent regions: delay (ns) and Rmax (cm) versus link efficiency η (received power, mW), and Tc (ns) versus interconnection distance (m), with the crosspoint at Tc-int = 4ns.]
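
The cycle-time relation above can be sketched in Python as follows. The sketch assumes the optical signal travels through free space at the vacuum speed of light, and the conversion delays in the example are hypothetical placeholders rather than values from the presentation.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (free-space assumption)


def propagation_delay_ns(distance_m):
    """Free-space propagation delay in nanoseconds over `distance_m` metres."""
    return distance_m / C_M_PER_S * 1e9


def channel_cycle_time_ns(t_oe_ns, t_eo_ns, distance_m):
    """T_c = T_o/e + T_e/o + T_prop, with every term in nanoseconds."""
    return t_oe_ns + t_eo_ns + propagation_delay_ns(distance_m)


if __name__ == "__main__":
    # Hypothetical 0.5 ns conversions and a 0.3 m interconnection distance.
    print(f"{channel_cycle_time_ns(0.5, 0.5, 0.3):.3f} ns")
```

In this example the propagation term (about 1 ns for 0.3 m) is as large as both conversions combined, which illustrates why the interconnection distance set by the topology matters as much as the conversion technology.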
  21. Performance Evaluation: Optics vs Electronics (64-node system)
      Parameters for the ELECTRICAL system:
      • chip area: 1in²
      • PCB size: 12x12in²
      • # of layers: 20
      • min. connection length (p): 1.5in
      Parameters for the OPTICAL system:
      • laser wavelength (λ): 850nm
      • VCSEL beam radius: 5µm
      • VCSEL output power: 1mW
      • P-I-N detector size: 15x15µm²
      • microlens diameter: 125µm
      • link efficiency (η): ~63%
      • chip area: 1cm²
      • interconnection area: 12x12cm²
      • usable microlens area (A): 64cm²
      • min. connection path (p): 1.5cm
      • max. deflection angle (θmax): ~24°
  22. Channel Width and Network Latency
      • Optics could provide about an order of magnitude higher connectivity than electronics.
      • Optics still yields about twice the channel width of electronics. Hence, network latency is lower!
      • Even if channel cycle time is determined by internal router delay, a wider channel still greatly benefits the network latency (shown as optics (200MHz)).
      [Figure: two plots versus dimension n (2 to 6): channel width (bits/channel) for elec and optics; network latency (ns) for elec, optics, and optics (200MHz).]
  23. Packaging Issues: Power Dissipation
      [Figure: two plots versus dimension n (2 to 6): channel width (bits/channel) and latency (ns), each for elec, optics, and laser (low).]
      • Limited cooling capability reduces the achievable I/O pin-outs in optics.
      • Optics still yields lower network latency due to faster achievable cycle time.
  24. Packaging and Device Tolerances
      [Figure: transmitter (TX) and receiver (RX) pair illustrating angular, longitudinal, and lateral misalignment.]
      • Lateral misalignment: ∆Lat = 102µm
      • Longitudinal misalignment: ∆Long = 230µm
      • Angular misalignment: ∆θ = 0.044°
      • Wavelength variation: ∆λ = 0.8nm
  25. Optoelectronic Network Routers: How beneficial?
      Multiprocessor networks can benefit from optoelectronic routers in two ways:
      • A large number of I/Os allows more design flexibility, i.e., a wide range of topologies is efficiently supported.
      • High-speed optical signaling unleashes the power of high-performance network routers by fully utilizing advanced semiconductor technology.
      Given that:
      • Better packaging technology (including cooling techniques, micro-optic alignment techniques, etc.) and optoelectronic devices with more uniform characteristics are available.
      • The bottom line: optoelectronics and its related technologies are progressing at an impressive rate and, hence, the above conclusions are becoming a near-term reality.
  26. Pixel-based vs. Core-based CMOS/SEED Designs
      [Figure: the TRANSPAR chip (courtesy A. Sawchuk, USC), with a pixel marked, and the WARRP II chip (SMART group, USC), with the core marked.]
      Pixel-based designs:
      • small (self-contained) circuitry
      • implements simpler functions
      • connections are local and regular
      Core-based designs:
      • large (non-self-contained) circuitry
      • implements complex functions
      • connections are global and less regular
      Design issues exist in implementing core-based designs!
  27. Core-based CMOS/SEED Design Issues
      • A large number of SEED transceivers must be integrated with the CMOS core.
      • CMOS I/O ports are not perfectly aligned with the SEED array.
      • At least the top metal layer is reserved exclusively for SEED wiring to simplify CMOS/SEED integration.
      • A space-invariant imaging system requires structured I/Os on the chip.
      Consequences:
      • Connections from transceivers to CMOS I/O ports and/or bonding pads are longer.
      • Fewer wiring and area resources are left for CMOS circuitry, reducing transistor density.
      • Critical paths may lengthen, reducing achievable on-chip clock rates.
      Wiring is a problem in core-based designs.
  28. Solutions for the Wiring Problem
      Manual integration (simpler, more primitive method)
      The CMOS core and SEED array are separately designed and do not fully overlap (e.g., WARRP II and the 64-bit processor core [Kiamilev et al., MPPOI 96]).
      (+) Compatible with CMOS CAD tools.
      (−) Chip resources are hardly optimized, which seriously degrades chip performance.
      (−) Impractical for large core-based designs.
  29. Core-based Designs using Manual Integration
      The 1kbit Photonic Page-buffer chip [Krishnamoorthy et al., AO 96]
      [Figure labels: SEED transceivers are located on the periphery; SRAM cells and datapath circuits with the SEED array on top.]
      (+) Simplifies the wiring problem
      (+) Compatible with existing CAD tools
      (−) Very long connections
      (−) May increase critical paths
      (−) Low chip area utilization
  30. Core-based Designs using Manual Integration (cont’d)
      The 16kbit Photonic Page-buffer chip [Kiamilev et al., IJO 97]
      [Figure labels: SEED and receiver array; SRAM cells; datapath; SEED and transmitter array. CMOS circuits are placed on the periphery of the SEED array and corresponding transceivers.]
      (+) Simplifies the wiring problem
      (+) Compatible with existing CAD tools
      (+) Reduces connection length
      (+) Improves signal integrity
      (−) Low chip area utilization
  31. Solutions for the Wiring Problem (cont’d)
      Automatic integration (under development)
      The CMOS core and SEED array are simultaneously optimized by the CAD tool.
      (+) Higher chip performance can be achieved.
      (+) Practical for large core-based designs.
      (−) Requires optoelectronic-compatible CAD tools.
      (−) Effects of long connections and lower transistor density still exist.
      Automatic integration is the more efficient and preferred method.
  32. Core-based Design using Automated CAD Tools
      The Multiply-accumulate chip [Rozier et al., LEOS 98], designed using EPOCH and EGGO CAD tools
      [Figure label: CMOS circuits, SEED array, and SEED transceivers are fully overlapped.]
      (+) Directly tackles the wiring problem
      (+) Improves chip resource utilization
      (+) Mitigates the effects of longer connections and lower transistor density
      (−) Requires optoelectronic-compatible CAD tools
  33. Cost Estimation of Core-based CMOS/SEED Designs
      Inputs:
      • Wiring parameters: number of available metal layers, signal types (single or dual-rail), routing style, wiring utilization, and metal pitches
      • System-level parameter: interconnection pattern (optical imaging system constraints)
      • SEED parameters: bonding pad size, number of SEEDs, and SEED pitches
      Models: wiring capacity model and wiring cost model
      Output: number of metal layers required by SEED wiring, which is used to:
      • Estimate transistor density
      • Estimate critical path length
      • Estimate aggregate off-chip bandwidth
      Wiring utilization is determined by synthesis of the WARRP router using the EPOCH tool.
  34. SEED and Wiring Parameters
      [Figure: SEED array layout with bonding pads, annotated with Xpitch, Ypitch, P, MX-pitch, and MY-pitch.]
      Where:
      • D is the total number of SEED diodes,
      • Dx is the number of SEEDs in the x-direction,
      • Dy is the number of SEEDs in the y-direction,
      • P is the bonding pad size,
      • Xpitch is the pitch of the diodes in the x-direction,
      • Ypitch is the pitch of the diodes in the y-direction,
      • MX-pitch is the pitch of the metal layer in the x-direction,
      • MY-pitch is the pitch of the metal layer in the y-direction.
      We need to find the wiring capacity provided by the space in the SEED array and the wiring cost required to connect all SEEDs.
  35. Wiring Capacity and Wiring Cost Models
      Assumptions:
      • Signals are dual-rail.
      • Wiring is X-Y style and requires at least 2 metal layers.
      • SEEDs and CMOS I/O ports are placed randomly (worst case).
      Wiring capacity in the x- and y-directions:
      $X_C = K_i \cdot \left\lfloor \frac{Y_{pitch}}{M_{X\text{-}pitch}} \right\rfloor \cdot D$
      $Y_C = K_j \cdot \left\lfloor \frac{X_{pitch} - P}{M_{Y\text{-}pitch}} \right\rfloor \cdot D$
      Wiring cost in the x- and y-directions:
      $X_R = \frac{D}{2} \cdot \frac{D_x}{2}$
      $Y_R = \frac{D}{2} \cdot \frac{D_y}{2}$
      $K_i$ and $K_j$ are the wiring utilizations of metal layers $i$ and $j$; typical values are 65% to 75%.
  36. Performance comparison between CMOS/SEED and CMOS chips
      Year of first shipment               1999    2001    2003    2006    2009
      Technology (µm)                      0.18    0.15    0.13    0.10    0.07
      # of metal layers required (x, y)    1, 1    1, 1    2, 1    2, 2    2, 2
      Normalized transistor density        0.778   0.778   0.645   0.592   0.675
      Normalized on-chip clock             0.768   0.768   0.737   0.706   0.706
      Normalized aggregate bandwidth       2.131   2.740   4.392   7.210   9.716
      [Figure: two plots versus technology (µm), 0.18 to 0.07; series: CMOS/SEED transistor density, # metal layers required for SEED routing, max BW (SEED), # I/Os (SEED), # I/Os (BGA); axes: TX density (%) available to CMOS/SEED chips, # metal layers, bandwidth (GB/s), # I/Os.]
  37. Design Cost Estimation
      Given that the design information is available:
      • Chip area can be estimated.
      • If the cost of design is fixed, what configurations can be implemented?
      • To conclude, the model gives relevant information, not known before, regarding optoelectronic implementations of complex chip designs.
      The results can be used to validate that, even with the wiring problem, complex optoelectronic network routers can still be effectively implemented!
  38. Core-based CMOS/SEED Chips: Are They Effective?
      Compared to pure-CMOS chips, CMOS/SEED chips:
      • sacrifice at most 40% of transistor density and 30% of on-chip clock rate in exchange for an order of magnitude more I/O pin-outs.
      Given that:
      • optoelectronic-compatible CAD tools are available.
      • The bottom line: as transistors become cheaper over time, complex CMOS/SEED chips provide the valuable bandwidth critically needed by current and next-generation computer systems, at a cost that is a reasonable compromise.
  39. Fully adaptive wormhole network router*
      [Figure: router block diagram with external flow control at the input and output physical channels (optical; X+, X−, Y+, Y−, plus Proc In and Proc Out to the processing node), internal flow control, a 5x6 crossbar switch, a normal routing section (normal router), and a deadlock routing section (deadlock router and deadlock buffer).]
      Legend: DM: Demultiplexer; MX: Multiplexer; FC: Flow Controller; IB: Input VC Buffers; OB: Output VC Buffers; DB: Deadlock Buffer; OEI: Opto-Electronic Interface; EOI: Electro-Optic Interface
      *Shown is a 2-D torus-connected, fully adaptive, deadlock-recovery network router with 1 virtual channel.
  40. The WARRP Core: A Monolithic GaAs Network Router Core
      [Figure: die photograph of the Wormhole Adaptive Recovery-based Routing via Preemption (WARRP) core, with the OPFET detector and LED marked.]
      • NCIPT (ARPA) / MIT Optochip Project
      • Implements the core circuit of the deadlock handling mechanisms (deadlock buffer, input/output buffers, arbitration logic, flow control logic).
      • Uses monolithic GaAs-based technology to implement both logic functions and optical I/O (LED and OPFET detector).
      • 1-bit wide, 4-flit deep buffers; ring topology. This is sufficient to demonstrate progressive deadlock recovery.
      • 5-state complex FSM controller with preemption prediction logic.
      • Operates at 50 MHz (under SPICE).
      • Status: electrical and optical versions are available.
      Technology: 0.6µm Vitesse H-GaAs III process (ECL compatible logic)
      Die size: 2mm x 1mm
      Complexity: ~1,400 transistors
      # electrical I/Os: 27 signals
      # optical I/Os: 12 single-ended signals
  41. The WARRP II Router Chip
      [Figure: chip layout showing the SEED modulator and driver circuits, the WARRP II core circuitry, and the SEED receiver and driver circuits.]
      • Ring topology
      • 4-bit-wide unidirectional channels
      • 1 virtual channel with 2-flit deep buffers
      • Fully adaptive, deadlock recovery routing
      • The core circuitry requires ~10,000 transistors.
      • Die size (core) is 0.836x0.822mm² (0.687mm²)
      • 40 electrical I/Os and 20 dual-rail SEED I/Os
      • CMOS HP14B process with 3 metal layers
      • Operates at ~30MHz (using IRSIM)
  42. Contributions
      • Explain the network bandwidth problem, which is becoming more and more critical in multiprocessor systems.
      • Introduce the connection capacity concept and establish a cost and performance model based on it to analyze the performance of 3-D optical networks.
      • Identify the wiring problem in complex CMOS/SEED chip designs and model the performance of such chips, incorporating the wiring problem, by using a semi-empirical model.
      • Implement optoelectronic network router chips based on the WARRP router architecture using monolithic and hybrid optoelectronic/VLSI technologies.
      • Suggest some advanced architectural techniques to improve the network performance and network bandwidth utilization.
  43. Conclusions
      • Optoelectronic network routers not only increase the network bandwidth but also facilitate the development of high-performance network routers.
      • Optoelectronic network routers are feasible given that packaging, device, and optoelectronic-compatible CAD tool technologies are effectively addressed.
      • An optoelectronic network router shows the potential to outperform its electronic counterpart in terms of available bandwidth and number of I/O pin-outs.
      • Internal network router architectures must also be reevaluated to maximize the on-chip bandwidth so that they do not become a bottleneck in a high-bandwidth interconnect environment.
  44. Future Work
      Network performance and bandwidth utilization can be improved by incorporating advanced architectural techniques such as:
      • Efficient channel configurations.
      • Asynchronous token-based channel arbitration.
      • Flit-bundling transfer technique.
      • Delayed-buffer technique.
