TRACK B: Multicores & Network On Chip Architectures/ Oren Hollander


Published in: Technology

  1. Multicores & Network On Chip Architectures: Trends & Design Considerations
     ChipEx 2013, May 1, 2013
     Oren Hollander, FPGA & ARM Expert
     ALL Rights Reserved
  2. What is SoC?
     • On-chip integration of a variety of functional hardware blocks to suit a specific product application
       – CPU/CPUs + Accelerators (GPU, VPU, IPU, etc.)
       – Small form factor
       – High volume of peripherals
     • Blocks can operate at lower frequencies while delivering higher system-level performance and consuming much lower system-level power
     Enable rich features at reasonable computing speed and reasonable price points
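The power claim in slide 2 can be illustrated with a small worked calculation. This is only a sketch, assuming the standard first-order CMOS dynamic power approximation P ≈ C · V² · f and ignoring leakage and activity factor. The capacitance, voltage, and frequency values are hypothetical and are not taken from the slides.

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """First-order CMOS dynamic power estimate: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz

# Hypothetical single block clocked at 1 GHz and 1.0 V.
single = dynamic_power(1e-9, 1.0, 1.0e9)

# Two hypothetical blocks at half the frequency; the lower clock is assumed
# to allow a lower supply voltage (0.8 V). Aggregate clock rate is unchanged.
parallel = 2 * dynamic_power(1e-9, 0.8, 0.5e9)

print(f"single block: {single:.2f} W, two slower blocks: {parallel:.2f} W")
```

Under these assumed numbers the two slower blocks deliver the same aggregate clock rate at roughly two thirds of the power, because power scales with the square of the supply voltage.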
  3. SoC Trends
     • Apple acquired PA-Semi
       – Enabling it to design its own application processors
     • Qualcomm acquired Atheros
       – Strengthen its wireless connectivity suite and Summit Technology for enhanced power management capability
     • Nvidia acquired Icera
       – Strengthen its connectivity offering
     • Intel acquired Infineon Wireless
       – Gain entry into the baseband connectivity market
     In just five years, SoC technology has catapulted from enabling basic computation/connectivity on a feature phone to being at the heart of all smartphones and early-stage ultrabooks, capable of a wide range of functions including audio/video, gaming, communication and productivity
  4. ARM Connected Community – 800+
  5. SoC Examples
     [Block diagram: i.MX 6Quad/6Dual. Group labels: CPU Platform, System Control, Security, Internal Memory, Multimedia, Connectivity, Power Mgmt.]
     Blocks shown, in the order extracted from the diagram:
     • Dual / Quad Cortex-A9; NEON per core; 32KB I-cache per core; 32KB D-cache per core; PTM per core; 1MB L2-cache + VFPv3
     • Secure JTAG; PLL, Osc; Clock & Reset; Watch Dog x2; Timer x3; PWM x4; Smart DMA; IOMUX
     • ROM; RAM
     • RNG; TrustZone; Security Ctrl; Secure RTC; eFuses; Ciphers
     • Graphics: OpenGL/ES 2.x, OpenCL/EP, OpenVG 1.x; Video Codecs: 1080p30; Audio: ASRC
     • 2x Imaging Processing Unit: Resizing & Blending, Inversion / Rotation, Image Enhancement
     • LCD & Camera Interface: HDMI & PHY; MIPI DSI; 24-bit RGB, LVDS (x3-8); 20-bit CSI; MIPI CSI2
     • LP-DDR2, DDR3 / LV-DDR3 x32/64, 533 MHz; NAND Ctrl (BCH40)
     • MMC 4.4 / SD 3.0 x3; MMC 4.4 / SDXC; UART x5, 5Mbps; I2C x3, SPI x5; ESAI, I2S/SSI x3; 3.3V GPIO; Keypad
     • USB2 OTG & PHY; USB2 Host & PHY; USB2 HSIC Host x2; MIPI HSI; S/PDIF Tx/Rx; PCIe 2.0 (1-lane); 1Gb Ethernet + IEEE1588; S-ATA & PHY 3Gbps; FlexCAN x2; MLB150 + DTCP
     • Power Supplies; Temp Monitor
  6. What is NoC?
     • A NoC is a network of computational, storage and I/O resources, interconnected by a network of switches
       – Connects processing cores and subsystems in Multiprocessor System-on-Chips
     • One of the main components of a NoC is the router, which is attached to a processing core (CPU or hardware accelerator) and transfers messages from one NoC processing core to another
       – Resources communicate with each other using addressed data packets routed to their destination by the switch fabric
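The idea in slide 6 of addressed packets being forwarded router by router can be sketched as below. This is a minimal illustration under assumptions the slides do not state: a 2D mesh topology and dimension-ordered (XY) routing. The names `Packet`, `next_hop`, and `xy_route` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: tuple      # (x, y) address of the destination router (assumed mesh)
    payload: bytes


def next_hop(current, dest):
    """XY routing: correct the X coordinate first, then the Y coordinate."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return current  # already at the destination router


def xy_route(src, packet):
    """Return the list of routers a packet visits from src to its destination."""
    path = [src]
    while path[-1] != packet.dest:
        path.append(next_hop(path[-1], packet.dest))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), Packet(dest=(2, 1), payload=b"hello")))
```

Each router only needs the packet's destination address to pick an output port, which is what lets the fabric scale without a shared bus.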
  7. Why do we need NoC?
     • State-of-the-art SoC communication architectures are starting to face scalability as well as modularity limitations
       – More advanced bus specifications are emerging to deal with these issues at the expense of silicon area and complexity
     • Communication architecture evolutions mainly concern bus protocols (to better exploit available bandwidth) and bus topologies (to increase bandwidth)
       – More aggressive solutions are needed to overcome the scalability limitation
     • NoCs are currently viewed as a ‘revolutionary’ approach to provide a scalable, high performance and robust infrastructure for on-chip communication
  8. NoC Example
  9. Multicore Challenges
     • Coherency between Multi-Cores
     • Coherency between Multi-Clusters
     • Homogeneous and Heterogeneous MP
     • Cluster booting
     • System interrupts
     • Tools issues (compiler & debugger)
     • Energy
  10. The ARM big.LITTLE Subsystem
     • High performance Cortex-A15 cluster
     • Energy efficient Cortex-A7 cluster
     • CCI-400 provides cache coherency between clusters
     • Shared GIC-400 interrupt controller
     • Note: C-A7 is not required to have an L2 cache for coherency management
     [Diagram: a Cortex-A15 cluster and a Cortex-A7 cluster, each with CPU 0 and CPU 1 (I$ and D$ per CPU) and an L2 Cache + SCU, joined by the CCI-400 cache coherent interconnect; the GIC-400 has a distributor interface and CPU 0 to CPU 3 interfaces delivering interrupts]
  11. CCI-400 and System Coherency
     • CCI-400 2+3 (x3)
       – 2 full AMBA 4 ACE slave interfaces
       – +3 ACE-Lite I/O Coherent Slave interfaces
       – +3 ACE-Lite master interfaces
     • CCI interfaces:
       – AMBA 4 ACE and ACE-Lite manage all coherency and barriers
       – Distributed Virtual Memory signaling for System MMU
  12. Heterogeneous Multi-Processing
     • SMP OS runs across all CPUs, all clusters
     • Some CPUs may be taken offline to save power
       – Possibly even all CPUs in a cluster
     • OS may support heterogeneous cluster configurations
       – Scheduler potentially limits resource-sensitive threads to a specific cluster
     [Diagram: an SMP operating system scheduling threads across Cluster 0 (four C-A7 CPUs) and Cluster 1 (four C-A15 CPUs)]
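The scheduler behaviour hinted at in slide 12 might look like the sketch below. It is an illustration only: the policy of preferring the Cortex-A15 cluster for demanding threads and falling back to whichever cluster is online is an assumption, and `place_threads` is a hypothetical helper, not an OS API.

```python
def place_threads(threads, online):
    """Assign each thread to a cluster.

    threads: dict mapping thread name -> True if the thread is demanding
    online:  dict mapping cluster name -> True if the cluster is powered
    Assumed policy: demanding threads prefer "Cluster 1" (Cortex-A15),
    light threads prefer "Cluster 0" (Cortex-A7); if the preferred cluster
    is offline, fall back to the other one.
    """
    placement = {}
    for name, demanding in threads.items():
        preferred = "Cluster 1" if demanding else "Cluster 0"
        fallback = "Cluster 0" if demanding else "Cluster 1"
        if online.get(preferred):
            placement[name] = preferred
        elif online.get(fallback):
            placement[name] = fallback
        else:
            raise RuntimeError("no cluster is online")
    return placement


if __name__ == "__main__":
    threads = {"video_decode": True, "ui_idle": False}
    print(place_threads(threads, {"Cluster 0": True, "Cluster 1": True}))
    # With the A15 cluster taken offline to save power, everything runs on A7.
    print(place_threads(threads, {"Cluster 0": True, "Cluster 1": False}))
```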
  13. Principles of Task Migration
     • System running on Cluster 0; Virtualizer decides more computational power is needed
     • Cluster 1 powered up
     • Threads migrated to Cluster 1 but Cluster 0 caches kept powered so they can still be snooped
     • When the Cluster 0 caches have gone cold, remaining system state cleaned from Cluster 0, Cluster 0 powered down
     [Diagram: before and after migration; the SMP operating system and its threads run on top of the Virtualizer, moving from Cluster 0 (four C-A7 CPUs) to Cluster 1 (four C-A15 CPUs)]
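The migration sequence in slide 13 can be sketched as a two-phase state change. This is a simplified model, not ARM's switcher software: the `Cluster` class and the `start_migration` / `finish_migration` functions are hypothetical, and "caches gone cold" is reduced to the caller deciding when to invoke the second phase.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    powered: bool = False
    caches_powered: bool = False
    threads: list = field(default_factory=list)


def start_migration(src, dst):
    """Phase 1: power up dst and move threads; src caches stay snoopable."""
    dst.powered = True
    dst.caches_powered = True
    dst.threads.extend(src.threads)
    src.threads.clear()
    # src remains powered with its caches on, so dst can still snoop them.


def finish_migration(src):
    """Phase 2: once src caches are cold, clean remaining state and power down."""
    if src.threads:
        raise RuntimeError("threads still running on source cluster")
    src.caches_powered = False
    src.powered = False


if __name__ == "__main__":
    c0 = Cluster("Cluster 0", powered=True, caches_powered=True, threads=["t0", "t1"])
    c1 = Cluster("Cluster 1")
    start_migration(c0, c1)
    print(c0, c1)
    finish_migration(c0)
    print(c0, c1)
```

Splitting the sequence in two reflects the point made on the slide: the source cluster cannot be powered down at the moment of migration, because its caches may still hold data the destination needs.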
  14. Coherent multi-core
     • In MPCore systems a resource may be shared between threads running on different CPUs within the cluster
       – The coherency logic connects Local Monitors in each of the CPUs in the cluster
     [Diagram: a Cortex-A MPCore running Thread 0 and Thread 1 on two Cortex-A CPUs, each with a Local Monitor, linked by Coherency Logic; a Global Monitor and memory sit behind the AXI interconnect]
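The role of the monitors in slide 14 can be illustrated with a toy model. It assumes the monitors are the exclusive-access monitors used by ARM load-exclusive/store-exclusive instructions, which the slide does not state explicitly. The model is heavily simplified (one tagged address per CPU, no granule size, no ordinary stores), and the class and method names are hypothetical.

```python
class ExclusiveMonitors:
    """Toy model: one monitor per CPU, kept consistent by shared logic."""

    def __init__(self, num_cpus):
        self.memory = {}
        self.tagged = [None] * num_cpus  # address tagged by each CPU's monitor

    def load_exclusive(self, cpu, addr):
        """Read a value and tag the address for this CPU."""
        self.tagged[cpu] = addr
        return self.memory.get(addr, 0)

    def store_exclusive(self, cpu, addr, value):
        """Write only if this CPU's tag is still intact; report success."""
        if self.tagged[cpu] != addr:
            return False
        self.memory[addr] = value
        # A successful store clears every monitor that tagged this address,
        # which is the cross-CPU effect the coherency logic provides.
        for i, tag in enumerate(self.tagged):
            if tag == addr:
                self.tagged[i] = None
        return True


def atomic_increment(monitors, cpu, addr):
    """Retry loop in the style of a load-exclusive/store-exclusive pair."""
    while True:
        value = monitors.load_exclusive(cpu, addr)
        if monitors.store_exclusive(cpu, addr, value + 1):
            return value + 1
```

If two CPUs both tag the same address, only the first store-exclusive succeeds; the other CPU sees a failure and retries, which is how a shared counter or lock stays consistent.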
  15. Summary
     • Multicore, Multiprocessing, SoC and NoC are the current technologies
     • There are many challenges and considerations when designing and programming an MP system
     • You have to acquire architecture, tools and programming know-how in order to get the best trade-off between performance and power