Living with "Moore" & Designing the Ultimate SoC

Jack Browne,
Sonics

  • Notes (virtual channels slide):
      - Maintain logical independence over the same resource.
      - This will ensure that you are not building a 10-lane highway that is only used at rush hour and then sits idle most of the time.
      - On the left, if one path is blocked the other path can proceed.
      - On the right is the same logical situation with half the wires.
      - The router is still the arbitration point, but it will not stop if one path is blocked.

    1. Living with “Moore” & Designing the Ultimate SoC
       Jack Browne, Senior VP Sales & Marketing, Sonics, Inc.
       2 May 2012
    2. Evolution of Consumer SoCs: Driving SoC Complexity
       • Relentless push for higher quality user experience – at minimum system cost!
       • Feature convergence – Video, Voice, Data, Audio (in every consumer device!)
       • Critical demand for 1GHz and beyond
    3. Mobile is now!
       Unprecedented market impact:
       • 9 years ago: 3G introduced in Europe
       • First iPhone: 5 years old
         – Game changing, with Apple & Samsung capturing 90% of wireless device profits
       • iPad 1: 2 years old, #1 in a market that will ship 119M units in 2012
       By 2014, top 4 semi market segments:
       • Smart phones
         – #1 semi market in 2011
         – 2011 unit volumes > Mobile PCs
       • Mobile PCs
       • Office PCs
       • Tablets
       Source: Gartner, IHS Supply
    4. Market Drivers
       • Smart phone shipments > PCs
       • Consumers transitioning from Personal Computer to Personal Computing (Intel)
       • Sensing/Control = IoT = 7B devices in 2012 → 15B in 2015 (Broadcom)
       • Cloud = Bandwidth, Connectivity, Services, Commerce, Content, …
    5. Mobile SoC Design Challenges
       • Products shipping today (Smart phones, Tablets, Netbooks):
         – Single and dual core processors at 800MHz – 1.4GHz
         – 40nm process node
         – LP DDR2
         – 1080p video encode/decode
         – Integrated and discrete baseband
         – 60-80 unique IP cores
         – Silicon area = 122mm2
       • Products for 2013:
         – Next generation dual and quad core processors at 1 - 3GHz, e.g. big.LITTLE
         – 28nm process node
         – LP DDR2 and Wide I/O memory
         – Multi-channel memory
         – Integrated and discrete baseband
         – 80 – 120 unique IP cores
         – Silicon area = 163mm2
       • 3D TSV packaging coming
       Source: http://www.anandtech.com
    6. Our Market Challenges Give SoC Performance, Bandwidth at Right Power
       Figure labels:
       • CPU/GPU/Media, Process, Connectivity
       • Complexity, Differentiation, TTM
       • Performance, All Day Use
       • Roadmap: Cortex A9/A5; Cortex A15/A7 big.LITTLE; Cortex 64-bit big.LITTLE
       Sources: ARM, 2011; Morgan Stanley, 2011
    7. Are We Ready?
       • TSMC is ready… 28nm HPM, with full ecosystem enablement
       • ARM is ready…
         – Cortex™-A15 CPU: 1-2.5 GHz, 1-4 cores/cluster
         – Mali™-T658 GPU: 350 GFLOPs, 1-8 cores/cluster
       • DRAM vendors are ready…
         – DDR3/4: 1600-3200 Mb/sec/pin, 6-50 GB/sec, 1-4 channels
         – LPDDR2/3: 800-1600 Mb/sec/pin, 3-25 GB/sec, 1-4 channels
         – Wide IO: 200-266 Mb/sec/pin, 13-17 GB/sec, 4 channels
       • But what about the middle?
       Figure: Tablet SoC with IP blocks (Cortex-A15, Mali-658, Video, Audio, Camera, Display, …, USB) and four DDR channels, with a "?" in the middle.
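The GB/sec ranges on this slide follow from the per-pin rates once a bus width is chosen. The sketch below is illustrative only: the slide does not state channel widths, so the 32-bit (DDR, LPDDR) and 128-bit (Wide IO) figures and all names in the code are assumptions, picked because they land close to the slide's numbers.

```python
# Assumed data-bus widths per channel. These are NOT stated on the slide.
ASSUMED_BITS_PER_CHANNEL = {"DDR3/4": 32, "LPDDR2/3": 32, "Wide IO": 128}


def peak_bandwidth_gb_s(mbit_per_sec_per_pin, bits_per_channel, channels):
    """Peak bandwidth in GB/s: per-pin rate x data pins x channels, bits to bytes."""
    return mbit_per_sec_per_pin * bits_per_channel * channels / 8 / 1000


# (family, min pin rate, max pin rate, min channels, max channels) from the slide
SLIDE_RANGES = [
    ("DDR3/4", 1600, 3200, 1, 4),
    ("LPDDR2/3", 800, 1600, 1, 4),
    ("Wide IO", 200, 266, 4, 4),
]

for family, lo_rate, hi_rate, lo_ch, hi_ch in SLIDE_RANGES:
    width = ASSUMED_BITS_PER_CHANNEL[family]
    low = peak_bandwidth_gb_s(lo_rate, width, lo_ch)
    high = peak_bandwidth_gb_s(hi_rate, width, hi_ch)
    print(f"{family}: {low:.1f} to {high:.1f} GB/s")
```

Under these assumed widths the low end is one channel at the slowest pin rate and the high end is the maximum channel count at the fastest pin rate.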
    8. Why So Fast?
       • We’re fully converged!
         – Computing
         – Graphics
         – Video/Audio
       • Everything runs user applications
       • Apps need Giga’s
         – 1-2 GHz multicore CPUs
         – 100+ GFLOP multicore GPUs
         – 15-50 GB/sec DRAM
       • At consumer pricing
       • … and something to integrate it all!

       Consumer Electronics: “Wish List” 2011 (the slide also has a "User Apps" check-mark column; its marks are not recoverable here)

       Product                     Rank, Ages 6-12   Rank, Ages 13+
       iPad                               1                 1
       Computer                           4                 2
       iPhone                             3                 7
       Tablet (non-iPad)                  5                 5
       TV                                 9                 4
       iPod Touch                         2                12
       Kinect for Xbox 360                7                 9
       E-Reader                          13                 3
       Smartphone (non-iPhone)           10                 8
       Blu-Ray Player                    12                 6
       Nintendo 3DS                       6                16
       PlayStation 3                     11                11
       Nintendo DS*                       8                15
       Nintendo Wii                      16                10
       Xbox 360                          14                13
       PlayStation Move                  17                14
       Other Mobile Phone                15                17
       PlayStation Portable              18                18

       Source: Nielsen, November 2011
    9. But What About Power?
       • Convergence drives massive SoC integration
         – Thin is in!
       • All these Giga’s cost power
         – But most devices run from batteries
       • Result: cannot afford to power entire SoC at once
         – “Dark silicon”
         – Power only those subsystems needed for current apps
         – And only as long as needed
       Table: the Nielsen Consumer Electronics “Wish List” 2011 from slide 8 (same products and ranks), here with a "Battery Powered" column.
       Source: Nielsen, November 2011
    10. Managing Dark Silicon…
        • General techniques
          – Stop/start subsystem clocks
          – Dynamic clock frequency
          – On/off voltage domains
          – Dynamic voltage/frequency domains (DVFS)
        • IP-specific techniques
          – ARM big.LITTLE™ (use optimum IP for loading)
        • Power managers implement the techniques
          – Software: flexible, but slow
          – Hardware: very responsive, but less flexible
        • Moving towards subsystem blocks normally ‘off’
        Figure: SoC block diagrams labeled CPU, GFX, Video, Other, Per and CPU, GFX, Video, BB, BB, plus a "difficulty" label.
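Why these techniques save power can be seen from the usual first-order model of dynamic CMOS power, which scales with the square of supply voltage times clock frequency. The sketch below is a generic textbook approximation, not something taken from the slides; it ignores leakage, and the function name and scale factors are hypothetical.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to nominal, using P ~ C * V^2 * f.

    Both arguments are fractions of the nominal value (1.0 = unchanged).
    Leakage power is deliberately not modeled.
    """
    return voltage_scale ** 2 * frequency_scale


# Stopping a subsystem clock removes its dynamic power in this model.
print(relative_dynamic_power(1.0, 0.0))

# Dynamic clock frequency alone: halving the clock halves dynamic power.
print(relative_dynamic_power(1.0, 0.5))

# DVFS: lowering voltage together with frequency saves more than frequency alone.
print(relative_dynamic_power(0.8, 0.5))
```

In this model only switching off the voltage domain would also remove leakage, which is one reason the slide lists on/off voltage domains separately from clock control.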
    11. System Design Challenges: How do Semiconductor Companies Keep Pace?
        • System analysis
          – Evaluate Performance, Power, and Area early in the design
        • SoC Architecture Choices
          – Processor speed
          – Bus speed
          – Clock domains
          – Power and voltage domains
          – Critical data flow paths
          – Memory subsystem
          – Physical design
          – IP selection/development
        Figure: SoC block diagram with CPU, Security, SRAM, LCD Controller, ROM, HDMI, Camera, Secure ROM, DMA, On-chip Network, Audio, Ethernet, PCIe, two Memory Schedulers and two DRAM Controllers.
    12. On-Chip Network Speed
        Processor and memory selection drive on-chip network speeds
        • Memory with a 2:1 or 4:1 controller
          – Example:
            • DDR3 2133MHz with a 2:1 controller requires a network speed of 1066MHz
            • With a 4:1 controller it requires a bus speed of 533MHz
        • Processors with cache: 1-2GHz
          – On-chip network typically runs at 2:1 or 4:1 ratio
          – Option 1: Run “wide and slow”: eases timing closure
          – Option 2: Run “fast and narrow”: saves area
        • Memory speed typically paces the system
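The arithmetic on this slide can be sketched in a few lines. The function names and the 128-bit / 64-bit link widths below are hypothetical, chosen only to show that “wide and slow” and “fast and narrow” can carry the same bandwidth; the slide itself gives only the clock ratios.

```python
def network_clock_mhz(memory_rate_mhz, controller_ratio):
    """Network clock implied by a memory data rate and an N:1 controller.

    Integer division mirrors the slide's rounding (2133 / 2 is quoted as 1066).
    """
    return memory_rate_mhz // controller_ratio


def link_bandwidth_gb_s(clock_mhz, width_bits):
    """Peak bandwidth of one link that moves width_bits per clock cycle."""
    return clock_mhz * width_bits / 8 / 1000


fast = network_clock_mhz(2133, 2)  # 2:1 controller
slow = network_clock_mhz(2133, 4)  # 4:1 controller
print(fast, slow)

# Hypothetical widths: halving the clock while doubling the width keeps bandwidth.
print(link_bandwidth_gb_s(fast, 64))    # "fast and narrow"
print(link_bandwidth_gb_s(slow, 128))   # "wide and slow"
```

The trade is the one the slide names: the wide link spends wires and area to relax timing, while the narrow link saves area but must close timing at twice the clock.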
    13. Introducing SonicsGN (SGN): On-Chip Network IP for Complex SoC Design
        ■ High-speed network > GHz
        ■ Low system power
          – Clock gated
          – Power signals
        ■ Highly optimized area
          – Virtual channels
          – Fully configurable IP
        ■ Ideal for Tablet/Smart Phone SoCs
          – Supports advanced processor speeds: 1-3GHz
          – Scalable design: supports many heterogeneous IP cores
          – Supports multiple power domains
        • Targeted for 28nm process node and below
    14. Key Feature: Performance Efficiency
        Maximize SoC Performance for Concurrent Applications
        • Virtual Channels
          – Share system resources
        • Non-blocking
          – Always allow progress in the system
          – Advanced “knowledge” if the resource is utilized
        • Quality of Service algorithm
        No over-provisioning of the network
        Figure: plot of MB/s over time, marking "Peak rate", "Flow blocked" and "Sustainable rate".
    15. Virtual Channels Concurrency – How it works
        VCs allow system resources to be maximized for greater network efficiency
        • Spatially Concurrent (left)
          – More peak performance
          – Potential for “over-provisioning”
        • Shared Resources (right)
          – Uses virtual channels (shared input buffers)
          – Independent flow control on each channel
          – Saves wires and area
        Figure: two panels, "Spatially Concurrent" and "Share Resources", with labels "Same area", "Shared Link", "Fewer wires", "Less area".
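A toy model can illustrate why independent flow control per channel matters on a shared link. This is a minimal sketch under stated assumptions, not SonicsGN's actual arbitration: one packet per cycle crosses the link, packets are identified only by their destination name, and the rotating-priority arbiter and all function names are hypothetical.

```python
from collections import deque


def run_shared_fifo(packets, blocked, cycles):
    """One FIFO in front of the link: a blocked head packet stalls all behind it."""
    fifo = deque(packets)
    delivered = []
    for t in range(cycles):
        if fifo and fifo[0] not in blocked(t):
            delivered.append(fifo.popleft())
    return delivered


def run_virtual_channels(packets, blocked, cycles):
    """Same single link, but one queue per destination with its own flow control."""
    vcs = {}
    for dest in packets:
        vcs.setdefault(dest, deque()).append(dest)
    order = list(vcs)
    delivered = []
    for t in range(cycles):
        stalled = blocked(t)
        # Rotating priority: send from the first channel that has data and is not stalled.
        for i in range(len(order)):
            dest = order[(t + i) % len(order)]
            if vcs[dest] and dest not in stalled:
                delivered.append(vcs[dest].popleft())
                break
    return delivered


def a_blocked_early(t):
    """Destination "A" cannot accept data during the first three cycles."""
    return {"A"} if t < 3 else set()


traffic = ["A", "B", "A", "B"]
print(run_shared_fifo(traffic, a_blocked_early, 4))
print(run_virtual_channels(traffic, a_blocked_early, 4))
```

With the single FIFO, the packets for "B" wait behind the blocked "A" packet; with per-destination queues they cross the same link while "A" is stalled, which is the non-blocking behavior the slide describes.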
    16. Power Management
        Network Power Consumption
        ■ Minimize ACTIVE power
          • Fine grained clock gating
        ■ Minimize IDLE power
          • Coarse grained auto-gating w/ combo wakeup
        ■ Minimum LEAKAGE power
          • Efficient network - minimum gate count
        System Power Management
        • Power Management signals
        • Fast wake-up and shut down
        • Reliably enter and exit low-power state
        • Simplify System Power Manager design
        • Advanced System Partitioning
          – Identify power intent
          – CPF and UPF support
        Figure: System Power Mgr connected to the On-Chip Network by the signals Power Down Req, Power Down Ack, Auto Wake Enable, Auto Wake Req and Active.
        On-Chip Network can efficiently monitor activity
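The signal names in the figure suggest a request/acknowledge handshake between the system power manager and the network. The slide does not define the protocol, so the sketch below is one plausible reading and entirely an assumption: the class, its methods, and the rule that a power-down is acknowledged only when no transactions are in flight are all hypothetical.

```python
class NetworkPowerDomain:
    """Toy model of one network power domain (hypothetical semantics)."""

    def __init__(self):
        self.in_flight = 0             # transactions currently inside the network
        self.powered = True
        self.auto_wake_enable = False  # set by the system power manager

    @property
    def active(self):
        """Models the Active signal: powered and carrying traffic."""
        return self.powered and self.in_flight > 0

    def power_down_req(self):
        """Models Power Down Req; the return value is Power Down Ack."""
        if self.in_flight > 0:
            return False               # refuse: shutting down now would lose traffic
        self.powered = False
        return True

    def traffic_arrives(self):
        """New transaction at the domain edge; the return value is Auto Wake Req."""
        if self.powered:
            self.in_flight += 1
            return False
        return self.auto_wake_enable   # ask the manager to restore power

    def transaction_done(self):
        if self.in_flight > 0:
            self.in_flight -= 1

    def wake(self):
        self.powered = True


domain = NetworkPowerDomain()
domain.traffic_arrives()
print(domain.power_down_req())   # refused while a transaction is in flight
domain.transaction_done()
print(domain.power_down_req())   # acknowledged once idle
```

Letting the network decide when it is safe to acknowledge is what would allow the system power manager to stay simple, as the slide claims.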
    17. Tablet Processor – Design Example
        Figure: Tablet SoC Functional Blocks
    18. Tablet Processor
        • Power domains
        • Multi-threaded, multiple queues
        • Multiple QoS levels
        Figure: tablet SoC block diagram with per-block clock labels.
        – Two quad core CPUs (Core1-Core4 and L2 Cache each) at 1333MHz
        – Graphics Sub-System (four GPUs) at 533MHz
        – Coherency Fabric at 533MHz
        – ROM, Security, SRAM, LCD Controller, Cam 1, Secure ROM, DMA, HDMI, MFV Codec, Cam 2, with clock labels between 133MHz and 267MHz
        – SonicsGN On-chip Network
        – Audio, SATA, Ethernet, PCIe
        – Two Sonics MemMax Memory Schedulers, each in front of a DRAM Controller at 2133MHz
        – Sonics3220 Peripheral Network with USB and APB Peripherals
    19. On-Chip Network – Under the Hood
        • Routed network
        • Virtual channels
        • Clock crossing
        • Multiple power domains
        • Bit conversion
    20. Managing Power with SonicsGN
        • Flexible power domain support
          – Asynch/mesochronous
          – Isolation/level shifters
        • HW-controlled safe shutdown
        • Automatic wakeup
        • Benefits:
          – More domains
          – Quicker shutdown
          – Faster wakeup
        • Keep more dark, more of the time
        50% SoC Power Reduction!
        Figure: SonicsGN Request Network with power domain boundaries, connecting a 1333 MHz Cortex-A15 Cluster, a 1066 MHz Cortex-A7 Cluster, a 533 MHz Mali-T658 Cluster, CCI-400, Display Ctrl., Video Engine, Video Encode, HDMI, Cam 1, Cam 2, Audio, USB 1-3, USB OTG, PCIe, E-net, Security Engine, DMA, SATA, UFS, SD/CF/MMC, HSI, On-die SRAM, On-die ROM, Control IP, Peripherals, and two DRAM channels (DDR3 2133).
    21. SGN Results
        On-Chip Network selection critical to SoC performance
        Results
        • SGN met the tablet performance requirement with fabric frequency of 1066MHz
        • Efficient gate count: 508K Gates
        • Advanced system partitioning
        • Support for System Concurrency:
          – Virtual Channels: non-blocking network
          – Quality of Service
        • Advanced Power Management
          – Simplifies unit power manager
          – >1% free running flops
        • Support for Memory Subsystem
          – QoS to increase DRAM efficiency
          – Load balancing for multi-channel DRAM
        Design Goals
        • Process: TSMC 28nm HPM
        • Base clocks: 1.2 GHz, 1GHz
        • Area: 508K Gates
        • Cost to add…
          – Master core: 7K Gates
          – Slave core: 5K Gates
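The "cost to add" figures invite a quick sizing estimate. The sketch below assumes the cost is linear in the number of added cores, which the slide does not state; the function and constant names are hypothetical, while the three gate counts are the slide's.

```python
BASE_GATES_K = 508       # reported network size, in thousands of gates
MASTER_COST_K = 7        # reported cost to add one master core
SLAVE_COST_K = 5         # reported cost to add one slave core


def estimated_gates_k(extra_masters=0, extra_slaves=0):
    """Estimated network size in K gates, assuming a purely linear per-core cost."""
    return BASE_GATES_K + extra_masters * MASTER_COST_K + extra_slaves * SLAVE_COST_K


# Example: two more masters and one more slave on top of the reported design.
print(estimated_gates_k(extra_masters=2, extra_slaves=1))
```

A real configuration would likely deviate from this straight line, since added cores can also change routing, buffering and clock-crossing logic.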
    22. Summary
        • GHz, GFLOPs and GB/sec are consumer design points
          – And your next SoC will need them!
        • SoC integration must exploit that performance
          – GHz on-chip networks: SonicsGN
          – Multichannel DRAM optimization: Sonics IMT
          – High efficiency DRAM scheduling: Sonics MemMax
        • … while improving battery life
          – Automatic hardware power management, with software policies
        • SonicsGN
          – Twice the frequency
          – One half the SoC power
    23. Thank You! Questions?
        For more information: www.sonicsinc.com
        Contact: jbrowne@sonicsinc.com
