©ARM 2017
Scaling ARM from One to One Trillion Cores
Eric Van Hensbergen
<eric.vanhensbergen@arm.com>
ISC HPC/IoT BoF
Distinguished Engineer – Director, HPC Software & Large Scale Systems Research
September 21, 2019
©ARM 2017 Research - Software & Large Scale Systems
Slide 2: From 100 Billion to 1 Trillion
“It's not the number of new devices that is relevant but what you make out of it in terms of analytical capabilities.”
- Masayoshi Son, MWC 2017
Slide 3: Big data starts with little data
Slide 4: Transforming infrastructure, servers, and storage
Slide 5: Intelligent Flexible Cloud
Figure: Gateway, Cellular virtualization, and Workload-optimized data center, linked by packet flows; each combines Compute (C), Acceleration (A), and Storage (S) in a different mix.
Scale down power consumption and form factor
Decrease latency
Scale up from Little Data to Big Data
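The placement idea behind this slide (run work as close to the data as its resource needs allow, to decrease latency) can be expressed as a simple rule. This is a minimal sketch under stated assumptions: latency only grows from edge to core, and each node advertises abstract compute units and whether it offers the accelerator an app needs. `Node`, `App`, `place`, and every number below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    latency_ms: float  # latency from the data source to this node (hypothetical)
    compute: int       # abstract compute units available (hypothetical)
    has_accel: bool    # whether the node offers the accelerator the app needs


@dataclass
class App:
    name: str
    max_latency_ms: float
    compute: int
    needs_accel: bool


def place(app: App, path: List[Node]) -> Optional[Node]:
    """Return the edge-most node on the path that meets the app's needs, or None."""
    for node in path:  # path is ordered from the edge toward the core
        if node.latency_ms > app.max_latency_ms:
            # Assumes latency only grows toward the core, so no later node can fit.
            return None
        if node.compute >= app.compute and (node.has_accel or not app.needs_accel):
            return node
    return None


path = [
    Node("gateway", latency_ms=1, compute=2, has_accel=True),
    Node("cellular virtualization", latency_ms=5, compute=16, has_accel=True),
    Node("workload-optimized data center", latency_ms=40, compute=256, has_accel=False),
]
for app in [
    App("control loop", max_latency_ms=2, compute=1, needs_accel=False),
    App("video analytics", max_latency_ms=10, compute=8, needs_accel=True),
    App("batch training", max_latency_ms=1000, compute=128, needs_accel=False),
]:
    node = place(app, path)
    print(app.name, "->", node.name if node else "no fit")
```

With these illustrative numbers the control loop lands on the gateway, video analytics on the cellular virtualization node, and batch training in the data center. A real placement policy would also weigh power and cost, which this sketch ignores.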
Slide 6: Events from Sensors to Server/Supercomputer
Figure: a Lambda Engine, a Cortex-A Network SoC (Hypervisor, uKernel, Lambda UniK, NFV Stack, Linux), and a Cortex-A Server SoC (Hypervisor, uKernel, Lambda UniK, Applications, Linux), with Lambdas linked by EVENTs.
Slide 7: Research areas for Serverless hardware
• Hardware-driven scheduling
• No preemption, run-to-completion model
• Strict priority
• Generic event-passing framework
• Architected queues
• Cache stashing of data
• Cache-coherent “smart” accelerators that can act as masters and slaves
• Handling bad actors, QoS, reliability
• Where does traditional system architecture have to change?
Figure: Control Cores (VM creation, system setup, event core provisioning, app placement, …) and Event Cores (Event Dispatch running App A, App B, App C, …, App X), connected through an Event Queue Interconnect and Global Scheduler to accelerators (Buffer Mgmt, Timer Mgmt, Packetizer, Classifier, Encrypt/Decrypt, Comp/Decomp, DMA), a Smart NIC, Smart Storage, and side ARMs, …
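The scheduling ideas on this slide (strict priority, no preemption, run to completion, architected queues feeding event cores) can be modelled in software to reason about their behaviour before any hardware exists. This is a minimal sketch, assuming one FIFO per priority level stands in for the architected queues and a Python callable stands in for each app's event handler; `Event`, `EventCore`, and the app names are hypothetical and not an ARM interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Event:
    app: str        # hypothetical target app, e.g. "App A"
    payload: bytes


class EventCore:
    """Software model of one event core (hypothetical, not an ARM interface)."""

    def __init__(self, num_priorities: int,
                 handlers: Dict[str, Callable[[Event], None]]) -> None:
        self.handlers = handlers
        # One FIFO per priority level stands in for the architected queues;
        # index 0 is the highest priority.
        self.queues: List[Deque[Event]] = [deque() for _ in range(num_priorities)]

    def enqueue(self, event: Event, priority: int) -> None:
        self.queues[priority].append(event)

    def dispatch_one(self) -> bool:
        """Run the next event to completion; return False when all queues are empty."""
        # Strict priority: a lower level is served only when every higher level is empty.
        for queue in self.queues:
            if queue:
                event = queue.popleft()
                # No preemption: the handler runs to completion before the
                # scheduler looks at the queues again.
                self.handlers[event.app](event)
                return True
        return False

    def run(self) -> None:
        while self.dispatch_one():
            pass


log: List[str] = []
core = EventCore(num_priorities=3, handlers={
    "App A": lambda e: log.append("A:" + e.payload.decode()),
    "App B": lambda e: log.append("B:" + e.payload.decode()),
})
core.enqueue(Event("App B", b"bulk"), priority=2)
core.enqueue(Event("App A", b"urgent"), priority=0)
core.run()
print(log)  # ['A:urgent', 'B:bulk']
```

Because handlers are never preempted, a long-running handler delays everything behind it, which is one reason the slide lists handling bad actors and QoS as open research areas.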
Slide 8: Research areas for Serverless software
• Abstract events as a communication method for serverless applications
• Programming methodology to use and pipeline “smart” accelerators
• Split system software into data-plane and control-plane zones
• How to handle bad software and enforce QoS?
• What role does system software play with intelligent hardware gaining responsibility?
Figure: Control Cores run the Linux Operating System with legacy libraries, legacy apps, event core management daemons, and event core setup. Event Cores run Event App A and Event App B, each built from an Accel HAL, a Reactive Programming Runtime, service libs, and system support code, inside a lightweight sandbox per event app.
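The first three research areas on this slide (events as the communication method, pipelining “smart” accelerators, and splitting system software into control-plane and data-plane zones) can be illustrated with a small software model. This is a minimal sketch under stated assumptions: topics route events between event apps, the control plane only builds the routing table, the data plane only moves events, and `zlib` compression stands in for a hardware compression accelerator. `ControlPlane`, `DataPlane`, `classify`, `compress`, and the topic names are hypothetical.

```python
import zlib
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# An event app is a reactive handler: it consumes one payload and returns
# the (topic, payload) events it wants to emit next.
Handler = Callable[[bytes], List[Tuple[str, bytes]]]


class ControlPlane:
    """Hypothetical control-plane zone: sets up routes, never sees payloads."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.routes.setdefault(topic, []).append(handler)


class DataPlane:
    """Hypothetical data-plane zone: only moves events between event apps."""

    def __init__(self, routes: Dict[str, List[Handler]]) -> None:
        # Shares the routing table the control plane built; never modifies it.
        self.routes = routes
        self.queue: Deque[Tuple[str, bytes]] = deque()

    def publish(self, topic: str, payload: bytes) -> None:
        self.queue.append((topic, payload))

    def drain(self) -> List[Tuple[str, bytes]]:
        """Deliver events until the queue is empty; return events nobody subscribes to."""
        egress: List[Tuple[str, bytes]] = []
        while self.queue:
            topic, payload = self.queue.popleft()
            handlers = self.routes.get(topic)
            if not handlers:
                egress.append((topic, payload))  # end of the pipeline
                continue
            for handler in handlers:
                self.queue.extend(handler(payload))
        return egress


def classify(payload: bytes) -> List[Tuple[str, bytes]]:
    # Hypothetical classifier stage: only large payloads are worth compressing.
    return [("compress" if len(payload) > 64 else "tx", payload)]


def compress(payload: bytes) -> List[Tuple[str, bytes]]:
    # zlib stands in for a hardware compression accelerator.
    return [("tx", zlib.compress(payload))]


control = ControlPlane()
control.subscribe("rx", classify)
control.subscribe("compress", compress)

data = DataPlane(control.routes)
data.publish("rx", b"short")
data.publish("rx", b"x" * 1000)
out = data.drain()
print([(topic, len(p)) for topic, p in out])
```

The small payload leaves on `tx` unchanged, while the large one passes through the compression stage first. Apps never call each other directly; adding or reordering pipeline stages is purely a control-plane change.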
Slide 9: Big Data Starts With Little Data
Slide 10: ISC events where you can learn more about ARM
• BoF 13: Designing, Porting & Optimizing HPC Workloads for ARM Based Systems
• 2:45 pm - 3:45 pm, Tuesday
• Speakers: James Ang (Sandia), Ross Miller (ORNL), Neil Morgan (Hartree), Larry Wikelius (Cavium)
• Workshop – Going ARM
• 9am-1pm, Thursday morning
• www.goingarm.com for more details
• Also talks at the ExaComm Workshop (Pavel Shamis) and the Post Moore’s Law Scaling Workshop (Jonathan Beard)
• Vendor showdowns and exhibitor forums
• Also look out on the exhibition floor for booths from ARM partners
• To schedule a private meeting, e-mail hpc@arm.com
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
Copyright © 2017 ARM Limited
www.arm.com/research
Hiring globally, with research centers in Austin, Boston, Cambridge (UK), San Jose, & Shanghai
developer.arm.com/research/careers

Editor's Notes

  • #4 Big data starts with little data. The IoT spans sensor to server. Only the ARM partner ecosystem has the breadth and diversity to meet the needs of the rapidly evolving interconnectivity of individuals and devices. Looking at the full impact of IoT and the business drivers, not only do mobile and wearables drive the new infrastructure and server requirements, but IoT from the industrial, M2M, and B2B perspectives is doing as much or more to add tens of billions of devices to this infrastructure. The little data provided by these nodes contributes as much to big data as anything.
    - IoT will transform every society and the business model of every industry.
    - Thanks to ARM’s commitment and the commitment of our Partners, mbed is growing to become the development platform for IoT, for innovators and incumbents alike.
    - Intelligence and energy efficiency down to the tiniest of sensor nodes are vital.
    - Standards are essential for scalable, secure IoT proliferation.
    Soundbites:
    - ARM’s partners have shipped over 20bn Cortex-M chips to date.
    - ARM’s partners shipped 6bn Cortex-M chips in 2015.
    - Over 350 Cortex-M licenses signed to date.
    - Gartner forecasts ~30 billion IoT devices by 2020.
    - By 2017, Gartner estimates that half of IoT solutions will come from companies less than three years old.
    - Gartner estimates that two thirds of IoT solutions will be in diverse applications with volumes of 100 million units or less.
    - By 2020, the average (first world) family of four will have 50 internet-connected devices in the home (up from 10 in 2012), e.g. alarms, cameras, meters, toys, lighting, health monitoring.
    - Tens of millions of wearables were already shipping in 2013; we think >200m units in 2018.
    - Example wearables markets: Smart Watches, Health Monitoring Patches, Smart Glasses, Baby Monitors, Fitness Clips, Smart Jewelry, Posture Monitors.
    Partnership and collaboration will be absolutely necessary to deliver end-to-end solutions. It takes a broad range of devices, companies, technologies, and knowledge. No one company can do everything; it takes an ecosystem. Bringing together the best of breed from a diverse set of industries enables all of us to pursue the opportunities before us, not only in consumer goods and services, but also in energy, health care, communications, and education.
  • #5 Hyperconnectivity is reshaping the requirements for our infrastructure and our data centers: the networks that get data to and from us, how that data is managed, and how it is stored and shared by others. ARM enables smarter, optimized carrier and enterprise platforms that are radically expanding the realm of what is possible for mobile network and data center deployments.
    - The ARM business model enables server-class compute to be combined with networking and storage acceleration. This allows ecosystem partners to develop cost- and application-targeted server platforms for the lowest possible TCO.
    - Combined with a standardized, open software and tool ecosystem, ARM-based systems allow infrastructure providers to break free of legacy hardware and software boundaries for scalable, flexible, cost-optimized deployments that meet the needs of today’s rapidly evolving traffic demands.
    - ARM technology enables optimized heterogeneous processing through a range of processor core, interconnect, and peripheral IP for significantly higher data rates while minimizing end-to-end latency.
    - ARM is a cross-platform standard; partners can choose anything from full COT through merchant silicon/ASSP to FPGA. This flexibility enables ARM-based designs to improve efficiency at any point in the network or data center.
    People are re-analyzing the workloads in infrastructure and servers and are now realizing there may be more efficient (cost, power, flexibility) ways of doing business. Software-defined networking allows flexible use of installed infrastructure. Consider the pace of innovation today: this is the only way to future-proof the network. Open Compute Project: all about more efficiency in business, greater openness, and collaboration. Flexible server architectures (HP Moonshot??) to address the changing cloud.
    Soundbites:
    - Subscriber data demands mean that data center growth cannot continue with the existing architecture and approach, and that network operators cannot continue to add capacity and operate using the existing network topology and programming.
      - There will be more bytes of mobile data traffic per month (10 exabytes, from Cisco VNI) than there are grains of sand (about 7.5 × 10^18, as estimated by the University of Hawaii).
      - Enterprise networking and energy-efficient servers will represent a $20 billion silicon market by 2018, compared to $13 billion in –
      - End-user cloud spending represents about $150 billion in 2015, per Gartner.
    - We believe the ARM architecture will address about 30% of server workloads in 2017.
    - 16 licenses signed for server applications.
    - The ARM ‘Server Base System Architecture’ (SBSA) server standard is the start of a revolution in the server market, and the trigger for growth in ARM-based servers.
      - Includes Canonical, Citrix, Linaro, Microsoft, Red Hat, and SUSE, and OEMs including Dell and HP, along with a broad set of silicon partners.
    - Ecosystem partners including Marvell, TI, AMD, AppliedMicro, Broadcom, and Cavium are developing solutions for the data center space.
    - Commitment from Dell, HP, and ODMs including MiTAC and others to sell ARM-based servers.
    - 13 silicon partners addressing networking applications.
    - Shipments of the first ARM-based enterprise networking solutions in Q3/13, with shipments up ~150% (in a market that is down 15%).
    - Market share for Base Stations utilizing
    - Global data traffic reached 885 petabytes per month at the end of 2012, up from 520 petabytes per month in 2011.
    - 2012 mobile data was nearly 12x the entire global internet in 2000.
    - Monthly global traffic will surpass 10 exabytes in 2017.
    - According to Facebook, its photo stores alone are growing by 7 PB/month today.
  • #6 Key messages:
    - At each point in the network, from sensor to server, different combinations of compute, storage, and acceleration will be required.
    - At the edge there will typically be more acceleration, for example to address the need for reduced latency.
    - This combination will vary through the network as the required workloads vary from packetization to aggregation.
    - At the core, high-performance compute and storage will be key.
    Additional notes: Fog architecture is hierarchical, with some amount of compute, storage, and networking at each node. The reasons for pushing capabilities (often complex ones) to the edge include requirements such as real-time operation, latency, safety, mobility (e.g., robots, vehicles), and autonomous operation (many application solutions cannot afford to depend on an internet connection to perform their function, e.g., factory automation, vehicles, building automation). Applications run where the data is, independent of the network node. Heterogeneous compute, storage, and acceleration are distributed into the network rather than centered at either end, so that power, latency, cost, and performance targets are met for each use case. Networks and compute resources are both managed and configured using standard IT technologies.
  • #7 This is Eric’s
  • #8 Side ARMs would give accelerators “smarts” if they don’t already have them: they could provide TLB services and the like, and also allow for extensible event protocols. To do: add an animation of a potential flow, perhaps for Ceph; take out the split-stack architecture for now (it may be something to look at during research); make this more detailed and show queues, etc.
  • #9 A lightweight sandbox is envisioned as some sort of fast-pathed virtualization, primarily to enforce limits on memory usage, bandwidth, cache usage, and device resources. This may be some form of unikernel para-virtualization, assisted by ioARMs, but we really don’t know until we try to build something with these ideas. *HAND WAVE* We would, however, need full control of a core, to disable interrupts and take it over from Linux. Accelerators are incorporated by using a HAL like ODP (OpenDataPlane) and by using system introspection on startup of an event-handling app.
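The lightweight sandbox is meant to enforce memory usage, bandwidth, cache usage, and device resources per event app. This is a minimal sketch of the first two only, assuming a per-app memory cap and a token bucket for bandwidth, with the admission check done before an event is dispatched. `SandboxBudget`, `Sandbox`, and the limits are hypothetical; cache and device limits are left out.

```python
from dataclasses import dataclass


@dataclass
class SandboxBudget:
    """Hypothetical per-event-app limits (cache and device limits omitted)."""
    max_memory: int   # bytes the app may hold at once
    bandwidth: float  # sustained bytes per second the app may consume
    burst: float      # token-bucket depth in bytes


class Sandbox:
    """Admission check run before an event is dispatched to its app."""

    def __init__(self, budget: SandboxBudget, now: float = 0.0) -> None:
        self.budget = budget
        self.memory_in_use = 0
        self.tokens = budget.burst  # start with a full bucket
        self.last = now

    def _refill(self, now: float) -> None:
        # Token bucket: credit bandwidth for elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.budget.burst, self.tokens + elapsed * self.budget.bandwidth)
        self.last = now

    def admit(self, event_size: int, memory_needed: int, now: float) -> bool:
        """Charge the app for one event, or refuse it if a limit would be exceeded."""
        self._refill(now)
        if self.memory_in_use + memory_needed > self.budget.max_memory:
            return False
        if event_size > self.tokens:
            return False
        self.tokens -= event_size
        self.memory_in_use += memory_needed
        return True

    def release(self, memory: int) -> None:
        """Return memory once the event has run to completion."""
        self.memory_in_use = max(0, self.memory_in_use - memory)


sandbox = Sandbox(SandboxBudget(max_memory=4096, bandwidth=500.0, burst=1000.0))
results = [
    sandbox.admit(event_size=800, memory_needed=1024, now=0.0),  # within budget
    sandbox.admit(event_size=800, memory_needed=1024, now=1.0),  # bucket has only 700 bytes
    sandbox.admit(event_size=800, memory_needed=1024, now=2.0),  # bucket refilled to its 1000-byte cap
    sandbox.admit(event_size=10, memory_needed=4096, now=2.0),   # would exceed the memory cap
]
print(results)  # [True, False, True, False]
```

Refusing over-budget events keeps the check cheap enough for a fast path; deferring them instead is an equally plausible policy and is part of the open QoS question the slides raise.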