May 1, 2013
Multicores & Network On Chip Architectures
Trends & Design Considerations
ChipEx 2013
ALL Rights Reserved
Oren Hollander
FPGA & ARM Expert
What is an SoC?
• On-chip integration of a variety of functional
hardware blocks to suit a specific product
application
– CPU/CPUs + Accelerators (GPU, VPU, IPU, etc.)
– Small form factor
– High volume of peripherals
• Blocks can operate at lower frequencies while
delivering higher system-level performance and
consuming much lower system-level power
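The frequency/power point above can be illustrated with the usual first-order CMOS dynamic power estimate, P = a * C * V^2 * f. The sketch below is only an illustration: the capacitance, voltage and frequency values are made-up operating points, not figures from any datasheet, and `dynamic_power` is a hypothetical helper name.

```python
def dynamic_power(c_eff, vdd, freq_hz, activity=1.0):
    """First-order CMOS dynamic power estimate: P = a * C * V^2 * f."""
    return activity * c_eff * vdd ** 2 * freq_hz


# Assumed, illustrative operating points (not taken from any real device).
# One block running flat out at 1 GHz and 1.2 V.
single = dynamic_power(c_eff=1e-9, vdd=1.2, freq_hz=1.0e9)

# Two parallel blocks, each at half the clock, which allows a lower supply.
dual = 2 * dynamic_power(c_eff=1e-9, vdd=0.9, freq_hz=0.5e9)

# Same nominal throughput (2 x 0.5 GHz), lower dynamic power.
ratio = dual / single
print(f"dual/single dynamic power ratio: {ratio:.4f}")
```

Under these assumed numbers the two slower blocks draw roughly 56% of the dynamic power of the single fast block. Leakage and the cost of the extra silicon are ignored here.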
Enable rich features at reasonable computing
speed and reasonable price points
SoC Trends
• Apple acquired PA-Semi
– Enabling it to design its own application processors
• Qualcomm acquired Atheros and Summit Technology
– Atheros to strengthen its wireless connectivity suite, Summit
Technology for enhanced power management capability
• Nvidia acquired Icera
– Strengthen its connectivity offering
• Intel acquired Infineon Wireless
– Gain entry into the baseband connectivity market
In just five years, SoC technology has
catapulted from enabling basic
computation/connectivity on a feature phone
to being at the heart of all smartphones and
early-stage ultrabooks, capable of a wide
range of functions including audio/video,
gaming, communication and productivity
ARM Connected Community – 800+
SoC Examples
i.MX 6Quad/6Dual block diagram
• CPU Platform
– Dual / Quad Cortex-A9
– 32KB I-cache per core
– 32KB D-cache per core
– NEON per core
– PTM per core
– 1MB L2-cache + VFPv3
• System Control
– Secure JTAG
– PLL, Osc
– Clock & Reset
– Smart DMA
– IOMUX
– Timer x3
– PWM x4
– Watch Dog x2
• Power Mgmt
– Power Supplies
– Temp Monitor
• Internal Memory
– ROM
– RAM
• Security
– RNG
– TrustZone
– Security Ctrl
– Secure RTC
– Ciphers
– eFuses
• Multimedia
– Graphics: OpenGL/ES 2.x, OpenCL/EP, OpenVG 1.x
– Video Codecs: 1080p30
– Audio: ASRC
– 2x Imaging Processing Unit: Resizing & Blending, Inversion / Rotation, Image Enhancement
– LCD & Camera Interface: HDMI & PHY, MIPI DSI, MIPI CSI2, 24-bit RGB, LVDS (x3-8), 20-bit CSI
• Connectivity
– LP-DDR2, DDR3 / LV-DDR3, x32/64, 533 MHz
– MMC 4.4 / SD 3.0 x3
– MMC 4.4 / SDXC
– UART x5, 5Mbps
– I2C x3, SPI x5
– ESAI, I2S/SSI x3
– 3.3V GPIO
– Keypad
– USB2 OTG & PHY
– USB2 Host & PHY
– USB2 HSIC Host x2
– MIPI HSI
– S/PDIF Tx/Rx
– PCIe 2.0 (1-lane)
– 1Gb Ethernet + IEEE1588
– NAND Ctrl (BCH40)
– S-ATA & PHY 3Gbps
– FlexCAN x2
– MLB150 + DTCP
What is a NoC?
• A NoC is a network of computational, storage and I/O
resources, interconnected by a network of switches
– Connects processing cores and subsystems in
Multiprocessor Systems-on-Chip
• One of the main components of a NoC is the router, which
is attached to a processing core (CPU or hardware
accelerator) and transfers messages from one NoC
processing core to another
– Resources communicate with each other using addressed
data packets routed to their destination by the switch
fabric
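As a minimal sketch of how addressed packets can be steered through a switch fabric, the code below uses dimension-ordered (XY) routing on a 2D mesh. XY routing and the mesh topology are assumptions chosen for illustration, since the slides do not name a routing algorithm or topology; `Packet` and `xy_route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: tuple      # (x, y) coordinate of the destination router
    payload: bytes   # data carried to the attached core


def xy_route(src, dst):
    """Return the router coordinates visited from src to dst.

    Dimension-ordered routing: travel along X until the column matches,
    then along Y. Each step is one hop between neighbouring routers.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


pkt = Packet(dest=(2, 1), payload=b"hello")
route = xy_route((0, 0), pkt.dest)
print(route)  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The hop count equals the Manhattan distance between the two routers, which is one reason latency in a mesh grows gradually with core count.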
Why do we need NoC?
• State-of-the-art SoC communication architectures are starting
to face scalability and modularity limitations
– More advanced bus specifications are emerging to deal with
these issues at the expense of silicon area and complexity
• Communication architecture evolution mainly concerns bus
protocols (to better exploit available bandwidth) and bus
topologies (to increase bandwidth)
– More aggressive solutions are needed to overcome the
scalability limitation
• NoCs are currently viewed as a ‘revolutionary’ approach to
provide a scalable, high performance and robust
infrastructure for on-chip communication
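The scalability argument can be made concrete with a back-of-the-envelope comparison. This is an idealized sketch, not a performance model: it assumes a single shared bus whose bandwidth is split evenly between cores, and a k x k mesh NoC where every link has the same bandwidth. The bandwidth figures and function names are hypothetical.

```python
def bus_share(total_bw, n_cores):
    """Bandwidth per core when n_cores split one shared bus evenly."""
    return total_bw / n_cores


def mesh_link_count(k):
    """Number of router-to-router links in a k x k mesh."""
    return 2 * k * (k - 1)


def mesh_bisection_bw(k, link_bw):
    """Bandwidth across the cut that splits a k x k mesh (even k) in half.

    The cut severs k links, so bisection bandwidth grows with k.
    """
    return k * link_bw


# Illustrative figure only: 8 GB/s for the bus and for each mesh link.
for k in (2, 4, 8):
    cores = k * k
    print(cores, bus_share(8.0, cores), mesh_bisection_bw(k, 8.0))
```

With the bus, the per-core share shrinks as cores are added. With the mesh, the number of links and the bisection bandwidth grow with the array size, at the cost of router area and multi-hop latency.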
NoC Example
Multicore Challenges
• Coherency between Multi-Cores
• Coherency between Multi-Clusters
• Homogeneous and Heterogeneous MP
• Cluster booting
• System interrupts
• Tools issues (compiler & debugger)
• Energy
The ARM big.LITTLE Subsystem
• High-performance Cortex-A15 cluster
• Energy-efficient Cortex-A7 cluster
• CCI-400 provides cache coherency between clusters
• Shared GIC-400 interrupt controller
• Note: C-A7 is not required to have an L2 cache for coherency management
[Figure: big.LITTLE block diagram. A Cortex-A15 cluster and a Cortex-A7 cluster, each with CPU 0 and CPU 1, per-CPU I$ and D$, and an L2 Cache + SCU, connect to the CCI-400 cache coherent interconnect. The GIC-400 distributor interface routes interrupts to the CPU 0 to CPU 3 interfaces.]
CCI-400 and System Coherency
• CCI-400 2+3 (x3)
– 2 full AMBA 4 ACE slave interfaces
– +3 ACE-Lite I/O-coherent slave interfaces
– +3 ACE-Lite master interfaces
• CCI interfaces:
– AMBA 4 ACE and ACE-Lite manage all coherency and barriers
– Distributed Virtual Memory signaling for System MMU
Heterogeneous Multi-Processing
• SMP OS runs across all CPUs, all clusters
• Some CPUs may be taken offline to save power
– Possibly even all CPUs in a cluster
• OS may support heterogeneous cluster configurations
– Scheduler potentially limits resource-sensitive threads to a specific cluster
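From user space, restricting a thread to one cluster can be approximated with CPU affinity. The sketch below uses Python's `os.sched_setaffinity`, which exists on Linux only. The mapping of CPU numbers to clusters in `CLUSTERS` is a hypothetical example, since real numbering is platform-specific, and the helper names are invented for illustration.

```python
import os

# Hypothetical CPU numbering: real platforms may order the clusters differently.
CLUSTERS = {
    "little": {0, 1, 2, 3},  # assumed Cortex-A7 cores
    "big": {4, 5, 6, 7},     # assumed Cortex-A15 cores
}


def cpus_for(cluster, online):
    """CPUs of the named cluster that are currently usable."""
    return CLUSTERS[cluster] & set(online)


def pin_current_thread(cluster):
    """Restrict the caller to one cluster.

    Returns the CPU set that was applied, or None if affinity is not
    supported here or the whole cluster is offline.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not a Linux system
    target = cpus_for(cluster, os.sched_getaffinity(0))
    if not target:
        return None  # e.g. every CPU in that cluster was taken offline
    os.sched_setaffinity(0, target)
    return target


# With CPUs 6 and 7 offline, only part of the big cluster remains.
print(cpus_for("big", range(6)))  # {4, 5}
```

An in-kernel scheduler would make this decision itself, so affinity is only the simplest way to observe the effect from an application.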
[Figure: one SMP operating system scheduling threads across Cluster 0 (4x C-A7) and Cluster 1 (4x C-A15).]
Principles of Task Migration
• System running on Cluster 0; Virtualizer decides more computational power is needed
• Cluster 1 powered up
• Threads migrated to Cluster 1 but Cluster 0 caches kept powered so they can still be
snooped
• When the Cluster 0 caches have gone cold, remaining system state cleaned from Cluster 0,
Cluster 0 powered down
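The migration sequence above can be sketched as a small simulation. This is a toy model under stated assumptions: `Cluster`, `warm_lines` and `migrate` are hypothetical names, and a cache "going cold" is modelled simply as a counter of still-useful lines draining to zero while the other cluster snoops them.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    powered: bool = False
    warm_lines: int = 0              # toy stand-in for useful cache contents
    threads: list = field(default_factory=list)


def migrate(src, dst):
    """Move all threads from src to dst, then power src down."""
    events = []
    dst.powered = True
    events.append(f"{dst.name} powered up")

    dst.threads.extend(src.threads)
    src.threads.clear()
    events.append(f"threads migrated to {dst.name}, {src.name} caches still snoopable")

    # src stays powered while its cached data is pulled across by snoops.
    while src.warm_lines > 0:
        src.warm_lines -= 1
        dst.warm_lines += 1
    events.append(f"{src.name} caches cold, remaining state cleaned")

    src.powered = False
    events.append(f"{src.name} powered down")
    return events


little = Cluster("Cluster 0", powered=True, warm_lines=3, threads=["t0", "t1"])
big = Cluster("Cluster 1")
events = migrate(little, big)
for line in events:
    print(line)
```

The ordering is the point of the sketch: the source cluster is powered down last, only after its caches no longer hold data the destination needs.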
[Figure: the SMP operating system and its threads, shown running on Cluster 0 (4x C-A7) and on Cluster 1 (4x C-A15), with the Virtualizer layer.]
Coherent multi-core
• In MPCore systems a resource may be shared between threads
running on different CPUs within the cluster
– The coherency logic connects Local Monitors in each of the CPUs in the cluster
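The monitors exist to support exclusive load/store pairs (LDREX/STREX on ARM), which are the building block for locks on shared resources. The sketch below is a deliberately simplified software model, not a description of the hardware: it merges local and global monitors into one `ExclusiveMonitor` class, tracks a single tagged address per CPU, and returns a boolean where the real store-exclusive instruction returns a status value. All names are hypothetical.

```python
class ExclusiveMonitor:
    """Toy model of exclusive-access tracking over one shared memory."""

    def __init__(self):
        self.memory = {}
        self.tagged = {}  # cpu id -> address tagged by its last exclusive load

    def load_exclusive(self, cpu, addr):
        """Read addr and record a reservation for this CPU."""
        self.tagged[cpu] = addr
        return self.memory.get(addr, 0)

    def store_exclusive(self, cpu, addr, value):
        """Write only if this CPU's reservation on addr is still intact."""
        if self.tagged.get(cpu) != addr:
            return False  # reservation lost, caller must retry
        self.memory[addr] = value
        # A successful store clears every reservation on this address.
        self.tagged = {c: a for c, a in self.tagged.items() if a != addr}
        return True


def try_lock(mon, cpu, addr):
    """One attempt to take a lock stored at addr (0 = free, 1 = held)."""
    if mon.load_exclusive(cpu, addr) != 0:
        return False
    return mon.store_exclusive(cpu, addr, 1)


LOCK = 0x1000  # arbitrary address for the example
mon = ExclusiveMonitor()
mon.load_exclusive(0, LOCK)                # CPU 0 reads the lock as free
mon.load_exclusive(1, LOCK)                # CPU 1 reads it as free too
won = mon.store_exclusive(1, LOCK, 1)      # CPU 1 gets there first
lost = mon.store_exclusive(0, LOCK, 1)     # CPU 0's reservation was cleared
print(won, lost)  # True False
```

Both CPUs observe the lock as free, but only one store succeeds, so the two threads cannot both believe they own the resource.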
[Figure: Cortex-A MPCore. Thread 0 and Thread 1 run on two Cortex-A CPUs, each with a Local Monitor, linked by the Coherency Logic. An AXI Interconnect with a Global Monitor connects the cluster to Memory.]
Summary
• Multicore, multiprocessing, SoC and NoC are
today's mainstream technologies
• Designing and programming an MP system involves
many challenges and considerations
• You need to master the architecture, the tools and the
programming know-how to get the best
performance/power trade-off

TRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
