Cerebras Systems © 2020
Supercomputer-Scale AI with Cerebras Systems
A Hock
RIKEN R-CCS 2nd International Symposium
18 February 2020
AI has massive potential,
but is compute-limited today.
AI has massive potential
From advertising to autonomy,
Commercial applications to cancer research,
Manufacturing to modeling and simulation for basic science.
AI has massive potential to change the way we work and live.
...for Society 5.0 and beyond.
...but AI is compute-limited today
Researchers continue to make great progress with deeper models and more data.
But model training often takes days, weeks, or more.
This is expensive; it constrains research and limits innovation and time to market.
We need 1,000x more compute, and the challenge is growing.
AI has massive potential,
but is compute-limited today.
We need a new compute solution
to accelerate deep learning.
Enter Cerebras Systems
The right solution for AI compute
Many cores optimized for sparse linear algebra
Memory tightly coupled to compute
High bandwidth communication
Programmable with today’s ML frameworks
The Cerebras Wafer-Scale Engine
The world’s largest chip and the most powerful AI engine.
Designed from the ground up to deliver orders-of-magnitude performance gains for deep learning.
- 215 mm × 215 mm, 1.2 trillion transistors
- 400,000 cores
- 18 GB on-chip SRAM
- 100 Pb/s interconnect
Flexible cores optimized for tensor operations
Fully programmable compute core
Full array of general instructions with ML extensions
Flexible general ops for control processing
- e.g. arithmetic, logical, load/store, branch
Optimized tensor ops for data processing
- fmac [z] = [z], [w], a  (operand shapes: 3D = 3D, 2D, scalar)
Sparse compute engine for neural networks
- Dataflow-triggered computation
- Filters out zero data → skips unnecessary processing
- Higher performance and efficiency for sparse NN
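A minimal sketch, in plain NumPy rather than anything Cerebras-specific, of what the fmac tensor op and dataflow zero-skipping amount to: a 3D accumulator [z] is updated by a 2D weight tile [w] scaled by a scalar a, and zero operands are filtered out so no work is issued for them. Names and shapes here are illustrative only.

import numpy as np

def fmac_zero_skip(z, w, a):
    """Illustrative model of 'fmac [z] = [z], [w], a' with zero filtering.

    z -- 3D accumulator tensor held in the core's local memory
    w -- 2D weight tile, applied across the leading dimension of z
    a -- scalar operand arriving from the fabric (dataflow trigger)
    """
    if a == 0.0:
        return z, 0                      # zero data never triggers compute
    nonzero = w != 0.0                   # filter zero weights
    z[:, nonzero] += a * w[nonzero]      # MACs issued only for nonzero entries
    return z, z.shape[0] * int(nonzero.sum())

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)) * (rng.random((64, 64)) > 0.9)  # ~90% zeros
z = np.zeros((8, 64, 64))
z, macs = fmac_zero_skip(z, w, 1.5)
print(f"{macs} MACs issued vs {z.size} for a dense engine")

The point of the sketch is the filtering step: work scales with the nonzero operands, which is where the performance and efficiency gains for sparse networks come from.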
AI-optimized memory architecture
Traditional memory designs not optimal
- Central shared memory is slow & far away
- Requires large batches to drive utilization
The right answer is high-performance, on-chip memory
- All memory is fully distributed along with compute datapaths
- Datapath has full performance from memory
- Full utilization down to batch 1
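A back-of-the-envelope sketch of the batch-size point above: when weights are streamed from central memory, the arithmetic intensity of a dense layer grows with batch size, so a batch-1 step is bandwidth-bound and the datapath idles. The numbers below are illustrative, not measurements of any particular device.

def arithmetic_intensity(batch, d_in, d_out, bytes_per_weight=2):
    """FLOPs per byte of weight traffic for one dense layer y = x @ W."""
    flops = 2 * batch * d_in * d_out                 # multiply + accumulate
    weight_bytes = d_in * d_out * bytes_per_weight   # W streamed from memory
    return flops / weight_bytes

for batch in (1, 32, 1024):
    print(f"batch {batch:5d}: {arithmetic_intensity(batch, 4096, 4096):7.1f} FLOPs/byte")

# Batch 1 yields ~1 FLOP per weight byte, so an off-chip-memory design is
# starved for bandwidth. With weights resident in distributed on-chip SRAM
# next to each datapath, that traffic term vanishes and utilization holds
# even at batch 1.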
The Cerebras Wafer-Scale Engine
Massive compute array with a configurable, high-bandwidth 2D mesh → compile and compute all layers simultaneously.
- Model parallelism on a single chip
- Cluster-scale performance, programmable as a single node
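To make the "all layers simultaneously" point concrete, here is a toy software model of layer-pipelined model parallelism: each layer owns its own region of the compute array, and on every step each region works on a different sample while activations flow to the neighbouring region. This is a conceptual simulation, not Cerebras's actual placement or runtime.

from collections import deque

LAYERS = ["conv1", "conv2", "fc"]   # each mapped to its own region of the array

def pipelined_schedule(num_samples):
    """Yield, per step, which sample each layer-region is processing."""
    in_flight = deque([None] * len(LAYERS), maxlen=len(LAYERS))
    next_sample, emitted, step = 0, 0, 0
    while emitted < num_samples:
        incoming = next_sample if next_sample < num_samples else None
        next_sample += 1
        in_flight.appendleft(incoming)   # activations advance one region
        if in_flight[-1] is not None:
            emitted += 1                 # a sample finished the last layer
        yield step, dict(zip(LAYERS, in_flight))
        step += 1

for step, occupancy in pipelined_schedule(4):
    print(step, occupancy)

# Once the pipeline fills, every region is busy with a different sample each
# step: model parallelism on a single chip, with cluster-like throughput but
# programmed as one device.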
The challenge: how do we put this in your
hands and make it easy to use?
The Cerebras Software Platform
Our software stack makes the Wafer-Scale Engine easy to use:
→ Programmable with today’s ML frameworks
→ Library of high performance DL ops
→ Customizable and extensible for other applications with flexible lower-level APIs
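As a feel for the programming model, the network itself can stay as ordinary TensorFlow; the Cerebras stack is what compiles and places it onto the WSE. The TensorFlow calls below are standard; the final hand-off step is a placeholder for illustration, not the actual Cerebras API.

import tensorflow as tf

def build_model():
    """Plain TensorFlow/Keras model definition -- nothing Cerebras-specific."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(4096, activation="relu"),
        tf.keras.layers.Dense(4096, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

model = build_model()
model.compile(
    optimizer="sgd",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Hypothetical hand-off: in the Cerebras flow the same graph is lowered by the
# vendor toolchain onto the wafer instead of running on the local device.
# `compile_for_wse` is a placeholder name, not a real API.
# wse_executable = compile_for_wse(model)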
The Cerebras CS-1
The world’s most powerful AI computer
A full solution in a single system:
- Powered by the WSE
- Programmable via TensorFlow and other frameworks
- Installs and deploys easily into a standard rack
15 RU standard rack-compliant server
1.2 Tbps I/O via 12x100GbE
20 kW power, air-cooled
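As a rough sense of scale for the 1.2 Tb/s of I/O, a back-of-the-envelope ingest estimate; the image size and encoding below are illustrative assumptions, not a spec.

# How many raw training samples/s 12 x 100GbE can feed, assuming
# (illustratively) 224x224x3 images stored as one byte per channel.
io_bits_per_s = 12 * 100e9
sample_bytes = 224 * 224 * 3
samples_per_s = io_bits_per_s / (8 * sample_bytes)
print(f"~{samples_per_s:,.0f} images/s of raw ingest")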
The Cerebras CS-1
Packing the performance of a cluster into a 15 RU chassis wasn’t easy.
It requires systems-level thinking and new invention and engineering in areas such as:
- Packaging
- Power
- Cooling
- I/O
Let’s take a peek under the hood.
CS-1 System View
Concluding remarks
Proud to introduce RIKEN R-CCS to the Cerebras CS-1, the world’s most powerful deep learning solution.
Built from the ground up to accelerate deep learning by orders of magnitude and to empower AI and HPC researchers to do more, faster.
The WSE, CS-1, and software stack are up and running real customer workloads today, all the way from TensorFlow.
- Already accelerating AI at supercomputer scale for science and health at Argonne National Laboratory (ANL)!
- And more soon...
Call to action: bring us your big HPC and AI problems, your system and partnership interests. We can’t wait to work together.
Thank you!