Socionext ARM Server Solution
9 March, 2017
High Performance SoC Business Unit
Socionext Inc.

Product and Target System
Copyright 2017
Product (Chip)
• Power Efficient Processor SC2A11
• Scalable SoC Switch SC2A20
System Integration as Proof-Of-Concept
• Low Power Scalable Server
Target Application System [Data Servers]
• Light Weight Parallel Server (E-Commerce/Web Service)
• Storage Server
• Index Server
Legend: Already available / Under evaluation

Overview
1. Product Introduction
• Power Efficient Multi-core Processor SC2A11
• Scalable SoC Switch SC2A20
2. System Integration as Proof-Of-Concept (POC)
3. Target Application System Consideration
4. Software Development Status

1. Product Introduction
• Power Efficient Multi-core Processor SC2A11
• Scalable SoC Switch SC2A20

Consideration of Green Computing
• The huge cost and energy consumed by data centers is a major issue as IoT deployment accelerates.
• Socionext will solve this with its own Scalable Small Core Technology.
[Figure: "Absolutely Huge and Fixed Power Consumption" versus "Relatively Less and Optimized Power Consumption"]

Return to Physical Multi-Core
• Virtualization sounds like a good way to utilize the overwhelming computing power of current SoCs.
• But this implies that current server SoCs have much more computing power than required, at the expense of power consumption.
• Socionext addresses this dilemma by introducing "many power-efficient CPU cores".
[Figure: "Traditional servers" (Virtualization by CPU; Virtualization by OS) versus "Socionext proposal" (Distribution by OS across many cores). Block labels: User, OS, Application, Scheduler, CPU, Core.]

Design Target
Socionext designed the SC2A11/20 based on the following key concepts.
• High Power Efficiency
  • Moderate Frequency Target: 1GHz
  • Small and Multi-core Cache Coherent: Cortex-A53 x 24
• Scalable Performance
  • Low Latency Interconnect: Socionext DDT Technology
  • Wide Bandwidth Local Memory: 2ch DDR4@2133Mbps
• Server class RAS Feature (*RAS: Reliability, Availability and Serviceability)
  • Independent Monitor/Control: System Management Block
  • System Level Operation: Maintenance Network

Key components - Processor and Switch

SC2A11 (Small Multicore Processor)
  Processor    Cortex-A53 MPCore @1GHz x 24 cores, L1 I/D=32KB/32KB, L2=256KB, L3=4MB
  Memory I/F   2ch 64bit DDR4@2133Mbps with ECC support (up to 64GB)
  PCIe         2ch 4-lane PCI Express Gen2 I/F for processor interconnect and/or peripheral extension
  LAN          2ch Gigabit Ethernet MAC with IPSec Network Offload Engine
  Storage I/F  SPI, eMMC
  Serial I/F   UART, I2C, GPIO

SC2A20 (SoC Switch)
  Memory I/F   32bit DDR3@1333Mbps with ECC support
  PCIe         9ch 4-lane PCI Express Gen2 I/F for processor interconnect
  LAN          2ch Gigabit Ethernet MAC with IPSec Network Offload Engine
  Storage I/F  SPI, eMMC, SATA, USB3
  Serial I/F   UART, I2C, GPIO
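
The SC2A11 interface figures above can be turned into theoretical peak rates with a little arithmetic. The sketch below assumes that "2133Mbps" means 2133 mega-transfers per second per data pin and that PCI Express Gen2 runs at 5 GT/s per lane; the function names are illustrative only. Under those assumptions the raw PCIe rate works out to 5 GB/s, which appears consistent with the chip interconnect figure quoted in the Performance Comparison slide, although the slides do not say how that figure was derived.

```python
def ddr_peak_gb_per_s(channels: int, bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def pcie_raw_gb_per_s(channels: int, lanes: int, giga_transfers_per_s: float) -> float:
    """Raw PCIe line rate in GB/s, one direction, before 8b/10b encoding overhead."""
    return channels * lanes * giga_transfers_per_s / 8


# SC2A11 figures from the slide; 5.0 GT/s is the PCIe Gen2 per-lane rate (assumption stated above).
sc2a11_ddr = ddr_peak_gb_per_s(channels=2, bus_width_bits=64, mega_transfers_per_s=2133)
sc2a11_pcie = pcie_raw_gb_per_s(channels=2, lanes=4, giga_transfers_per_s=5.0)

print(f"SC2A11 DDR4 theoretical peak: {sc2a11_ddr:.1f} GB/s")
print(f"SC2A11 PCIe raw line rate:    {sc2a11_pcie:.1f} GB/s")
```

These are upper bounds only; sustained bandwidth depends on the memory controller, access pattern, and protocol overhead.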

SC2A11: Design Goal
[Figure: comparison of three processor design approaches]

Big Core Processor (High Performance Processor)
• Frequency: higher frequency by high-drivability transistors (big leakage)
• Power consumption and cache memory bandwidth requirement: big and fast
• Additional features (for linear performance uplift): out-of-order execution, speculative execution, multiple cores/pipelines (large and complicated logic)
• High-speed interconnect

Small Core Processor (Optimized Performance Processor)
• Frequency: optimized frequency by normal/small transistors (less leakage)
• Power consumption and cache memory bandwidth requirement: less
• Additional features: not required / not needed

Socionext SC2A11 (Scalable Small Core Processor)
• Frequency: optimized frequency by normal/small transistors (less leakage)
• Power consumption and cache memory bandwidth requirement: optimized
• Additional features: high-speed interconnect, multiple cores (simple logic)

Performance Comparison

Comparison Factor             Intel        Socionext    Socionext (Intel = 100%)
CPU Performance               440          71           16%
Chip Interconnect Bandwidth   38.4GB/s     5GB/s        7.7%
Memory Bandwidth              4x2133Mbps   2x2133Mbps   50%
Power Consumption             105W         4.7W         4.5%

Our Advantage
• x3.6 Power Efficient
• x3.1 Wider Bandwidth per Performance
• Less than 5W: Absolutely Low Power Consumption

Customer Benefits
• Reduce Total System Cost: minimizes efforts for power and cooling
• Optimized Performance / Better Software Performance: wide areas for any purpose; relatively low-latency DDR from software/CPU core
• Less Integration Restriction: small power supply, fewer cooling devices
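
The "x3.6" and "x3.1" advantage figures can be reproduced from the comparison values above. The slide does not state its formulas, so the sketch below is one plausible derivation: performance per watt for the first, and memory channels per unit of CPU performance for the second. The dictionary keys and helper names are illustrative.

```python
# Values taken from the comparison on the slide.
intel = {"cpu_perf": 440, "mem_channels": 4, "power_w": 105.0}
socionext = {"cpu_perf": 71, "mem_channels": 2, "power_w": 4.7}


def perf_per_watt(chip: dict) -> float:
    """CPU performance score divided by power consumption."""
    return chip["cpu_perf"] / chip["power_w"]


def mem_channels_per_perf(chip: dict) -> float:
    """DDR4-2133 channels available per unit of CPU performance."""
    return chip["mem_channels"] / chip["cpu_perf"]


power_efficiency_gain = perf_per_watt(socionext) / perf_per_watt(intel)
bandwidth_per_perf_gain = mem_channels_per_perf(socionext) / mem_channels_per_perf(intel)

print(f"Power efficiency gain:      x{power_efficiency_gain:.1f}")
print(f"Bandwidth per performance:  x{bandwidth_per_perf_gain:.1f}")
```

Both channel counts use the same DDR4-2133 speed grade, so comparing channel counts is equivalent to comparing peak memory bandwidth.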

2. System Integration as Proof-Of-Concept (POC)

Low Power Scalable Server
• 24 x CA53 cores on PEC(*1)
• 8 PECs connected via SBB(*2)
• 8 SBBs connected via SBB-R(*3)
Total 1536 cores (24 x 8 x 8)
(*1) PEC: Processor Element Card (1xSC2A11)
(*2) SBB: System Bridge Board (1xSC2A20)
(*3) SBB-R: Root System Bridge Board (1xSC2A20)
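
The core count follows directly from the three-level hierarchy (cores per PEC, PECs per SBB, SBBs per SBB-R). A minimal sketch of that arithmetic, with illustrative constant and function names:

```python
CORES_PER_PEC = 24   # one SC2A11 per PEC
PECS_PER_SBB = 8     # one SC2A20 on each SBB
SBBS_PER_SBB_R = 8   # one SC2A20 on the root board


def cluster_totals(cores_per_pec: int, pecs_per_sbb: int, sbbs: int) -> dict:
    """Return the number of PECs and CPU cores in a cluster of the given shape."""
    pecs = pecs_per_sbb * sbbs
    return {"pecs": pecs, "cores": pecs * cores_per_pec}


full_cluster = cluster_totals(CORES_PER_PEC, PECS_PER_SBB, SBBS_PER_SBB_R)
single_sbb = cluster_totals(CORES_PER_PEC, PECS_PER_SBB, sbbs=1)

print(full_cluster)  # full SBB-R configuration
print(single_sbb)    # the "8xPECs + SBB" configuration
```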

PEC: Processor Element Card
• SC2A11-based minimum compute element
• Operates as an independent Linux server node
Board components:
• SC2A11
• 4 x 16GB DDR4-2133 RDIMM = 64GB(*1) DRAM per node
• 64GB eMMC(*2) as local storage
• Composite external interface to SBB (PCIe interconnect included)
(*1) can be optimized from 4GB to 64GB depending on target application system
(*2) can be upgraded to SATA HDD/SSD

Socionext Direct Data Transaction (DDT) Technology
• PCI Express based wideband low-latency interconnect
• Connects up to 64 PECs to form a single cluster system
System Bridge Board (SBB): Switch SoC SC2A20 connects 8 x PECs via PCI Express
Root System Bridge Board (SBB-R): Switch SoC SC2A20 connects 8 x SBBs via PCI Express

System Level Performance Comparison ("8xPECs + SBB" case)

Comparison Factor             Intel        Socionext     Socionext (Intel = 100%)
CPU Performance               440          568           130%
Chip Interconnect Bandwidth   38.4GB/s     40GB/s        104%
Memory Bandwidth              4x2133Mbps   16x2133Mbps   400%
Power Consumption             105W         37W           35%

Summary: +30% performance, competing bandwidth, and x8 wider memory bandwidth, with -65% power consumption.
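
The system-level figures look like the single-chip SC2A11 values scaled by the eight PECs on one SBB. The sketch below checks that reading; it is an assumption, since the slides do not describe how the system-level numbers were obtained. Simple scaling gives 37.6W rather than the 37W on the slide, and the slides do not explain the difference.

```python
PECS = 8  # the "8xPECs + SBB" case

# Single-chip SC2A11 values from the earlier Performance Comparison slide.
single_chip = {"cpu_perf": 71, "interconnect_gb_s": 5, "mem_channels": 2, "power_w": 4.7}
intel = {"cpu_perf": 440, "power_w": 105}

# Naive linear scaling: multiply every per-chip figure by the number of PECs.
scaled = {name: value * PECS for name, value in single_chip.items()}

# Relative figures computed from the values printed on the system-level slide.
slide_system = {"cpu_perf": 568, "power_w": 37}
perf_vs_intel_pct = 100 * slide_system["cpu_perf"] / intel["cpu_perf"]
power_saving_pct = 100 * (1 - slide_system["power_w"] / intel["power_w"])

print(scaled)
print(f"Performance vs. Intel: {perf_vs_intel_pct:.0f}%")
print(f"Power saving vs. Intel: {power_saving_pct:.0f}%")
```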

Maintenance & System Management Feature
• SC2A11 and SC2A20 are both equipped with a System Management Block (SMB)
• SMBs communicate autonomously via dedicated Gigabit Ethernet
• SMB provides
  • secure boot/execution environment
  • system monitoring/RAS
  • system maintenance operation (e.g. hot-swap)

3. Target Application System Consideration

SC2A11/SC2A20-based scalable solution covers a wide range of application fields
Server Types: Data Servers, Media Servers, Edge Servers

Apps Style                 RACK (17-64 PECs)              UNIT (2-16 PECs)                    Single PEC
Enterprise/Local service   E-Commerce                     Web/Mail/Office Apps/Cache Server   N.A.
Public Cloud               Search and Index/Database      Hosting and Web Service             N.A.
Storage                    Big NAS server                 Small NAS server                    Single NAS
Network/Security           -                              Secure Router (Packet Inspection)   Home Router (Wired or Wireless)
Streaming/Transcode        4k64ch Server                  4k8ch Unit                          UTM (Universal Transcode Module)
Surveillance and AI        2k256ch Server                 2k32ch Unit                         UTM (Universal Transcode Module)
Edge and Application       Gaming and Edge Server         Gaming and Edge Offload Unit        N.A.
Industry4.0                System Gateway/Master Server   Area Gateway/Local Server           Node Compute Engine

[Cell annotations in the original figure: TYPE-A, TYPE-B, TYPE-C]

Storage/DRAM size consideration

TYPE  Typical Application                                          DRAM per PEC  Total system DRAM  System Density
A     Light Weight Parallel Application (E-Commerce/Web Hosting)   16-32GB       up to 2TB          32PEC/2U or 64PEC/4U
B     Storage Server                                               4GB           up to 256GB        32PEC/1U or 64PEC/2U
C     Index Server (Hadoop/Spark)                                  32-64GB       up to 4TB          16PEC/2U or 64PEC/8U

Storage: any combination of
(1) per-PEC eMMC
(2) per-SBB SATA SSD
(3) per-PEC external storage (via PCIe)
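
The "total system DRAM" limits match the largest per-PEC DRAM size multiplied by the 64-PEC cluster maximum. A short sketch of that calculation, assuming 1TB = 1024GB and using illustrative names:

```python
MAX_PECS_PER_CLUSTER = 64  # DDT connects up to 64 PECs in a single cluster

# (minimum, maximum) DRAM per PEC in GB for each system type on the slide.
dram_per_pec_gb = {
    "A": (16, 32),  # Light Weight Parallel Application
    "B": (4, 4),    # Storage Server
    "C": (32, 64),  # Index Server
}


def max_total_dram_gb(system_type: str, pecs: int = MAX_PECS_PER_CLUSTER) -> int:
    """Largest total DRAM for a cluster of the given type and PEC count."""
    _, max_per_pec = dram_per_pec_gb[system_type]
    return max_per_pec * pecs


for system_type in dram_per_pec_gb:
    total = max_total_dram_gb(system_type)
    print(f"TYPE-{system_type}: up to {total} GB ({total / 1024:g} TB)")
```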

4. Software Development Status

Software Development Status [1/2]

CA53 Firmware
  Description: ARM Trusted Firmware based
UEFI (edk2)
  Description: Based on linaro-edk2-baseline-2016.03, adding:
  • Ethernet driver
  • eMMC driver
  • Variable storage driver for eMMC
Linux
  Description: Based on kernel v4.5, adding:
  • Ethernet driver
  • PCIe based interconnect driver (*1)
  • eMMC driver
  • SPI NOR flash driver
  • PCIe root complex driver
  • driver for SATA host on SBB (*1)
  • driver for USB3 host on SBB (*1)
  Notes: ACPI handling is temporarily disabled (acpi=off) because we are not sure that ACPI can describe driver attributes in as much detail as device tree, so we still use device tree for hardware description.
(*1) requires SBB/SBB-R boards for it to operate.
(*1) works only for PEC on slot #0
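
Since the kernel is booted with acpi=off, one way to confirm the setting on a running node is to inspect the kernel command line, which Linux exposes at /proc/cmdline. The sketch below is illustrative only: the helper names and the example command line are hypothetical and not taken from the SC2A11 software.

```python
from pathlib import Path


def acpi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the acpi=off parameter."""
    return "acpi=off" in cmdline.split()


def running_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running Linux kernel."""
    return Path(path).read_text()


# Hypothetical command line for illustration; real boards will differ.
example_cmdline = "console=ttyS0,115200 root=/dev/mmcblk0p2 acpi=off"
print(acpi_disabled(example_cmdline))
```

On a booted node, `acpi_disabled(running_kernel_cmdline())` would perform the same check against the live system.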

Software Development Status [2/2]

Root FS
  Description: CentOS 7 for aarch64 successfully boots.
Application
  Description:
  • Hadoop is already operational (*2).
  • Baremetal OpenStack client is under development.
(*2) requires SBB/SBB-R boards for it to operate effectively.

BUD17 Socionext SC2A11 ARM Server SoC
