SlideShare a Scribd company logo
1 of 20
Download to read offline
Qualcomm Centriq™ 2400 Processor
Barry Wolford, Senior Director, Engineering
Thomas Speier, Senior Director, Engineering
Dileep Bhandarkar, Vice President, Technology
Qualcomm Datacenter Technologies, Inc.
August 22, 2017
@qualcomm Qualcomm Centriq 2400 Processor is a product of Qualcomm Datacenter Technologies, Inc..
2
•Qualcomm Datacenter Technologies Introduction
•Qualcomm® Falkor™ CPU Overview
•Qualcomm CentriqTM 2400 Server SoC Overview
•Summary
Agenda
Qualcomm Falkor CPU is a product of Qualcomm Datacenter Technologies, Inc.
3
QDT Well Positioned to Address Cloud Datacenter Opportunity
Bringing decade of experience delivering high-performance, power-
efficient ARM CPU architectures
Focus on true server class features and performance with
aggressive power management techniques
Partnering with cloud market leaders for product definition
Uniquely positioned to leverage process leadership driven by mobile
industry growth to deliver industry first 10 nm server processor
Unique High Performance, Low Power ARM Based CPUs
4
• QDT-designed custom core powering Qualcomm Centriq
2400 Processor
• 5th generation custom core design
• Designed from the ground up to meet the needs of cloud
service providers
• Fully ARMv8-compliant
• AArch64 only
• Supports EL3 (TrustZone) and EL2 (hypervisor)
• Includes optional cryptography acceleration instructions
◦ AES, SHA1, SHA2-256
• Designed for performance, optimized for power
Qualcomm Falkor™ CPU Designed for the Cloud
5
• Falkor core duplex is building block for SoC
• Two Custom ARM V8 CPUs
• Shared L2 Cache
• Nominal Operating Voltage ~1V
• Shared bus interface to Qualcomm® System Bus
(QSB) ring interconnect
 Qualcomm Proprietary Protocol
 Custom Bi-Directional Segmented Ring Bus
 Fully Coherent (Cache & IO)
 Shortest Path Routing
 Multicast on Read
 > 250 GB/s aggregate bandwidth
Falkor Core configuration
Falkor
ARMv8
Core
Falkor
ARMv8
Core
L2 cache
Ring bus interface
Power Control
Falkor duplex
Qualcomm System Bus is a product of Qualcomm Datacenter Technologies, Inc.
6
• 128-byte lines, 8-way
• Unified between I-side and D-side
• Shared between two CPUs in duplex
• 128-byte interleaved for improved throughput
• SEC-DED ECC protected
• 15-cycle minimum latency for L2 hit
• Inclusive of L1 D-caches
• 32-bytes per direction per interleave per cycle
Falkor L2 Cache
Falkor
ARMv8
Core
Falkor
ARMv8
Core
L2 cache
Ring bus interface
Power Control
Falkor duplex
7
• Heterogeneous pipeline providing optimal
performance per unit power
◦ Variable-length pipelines tuned per function
◦ Minimizes idle hardware
• 4-issue
◦ 3 instructions + 1 direct branch
• 8-dispatch
Falkor CPU
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Falkor
ARMv8
Core
Falkor
ARMv8
Core
L2 cache
Ring bus interface
Power Control
Falkor duplex
8
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Branch Prediction
• 0-1 cycle penalty for almost all predicted taken
branches
• 16-entry BTIC (branch target instruction cache)
◦ Supports 0-cycle branch penalty
• Multi-level BTAC (branch target address cache)
for indirect branches
◦ 16-entry level-0 BTAC
◦ 256-entry level-1 BTAC
◦ PC-relative branches utilize I-cache as BTAC
• 16-entry link stack
• Multi-level BHT (branch history table)
◦ Multi-faceted scheme involving staged
predictors
9
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Instruction Fetch
• Two-level I-cache topology
◦ Key element in performance and
performance/power efficiency advantage
◦ L0 and L1 caches are exclusive
• L0 I-cache
◦ 24KB, 64-byte lines, 3-way
◦ Way-predicted
◦ Parity with auto-correct
◦ 0-cycle penalty for L0 hit
• L1 I-cache
◦ 64KB, 64-byte lines, 8-way
◦ Parity with auto-correct
◦ 4-cycle penalty for L0 miss / L1 hit
◦ Hardware prefetch on L1 miss
• Fetches up to 4 instructions per cycle
◦ Fetch group can span cache lines
• Instructions are decoded and expanded into
micro-ops
◦ Most instructions map to a single micro-op
10
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Rename (REN), Register Access (RACC), and Reserve (RSV)
• 256-entry rename/completion buffer
• 76-instruction dispatch window
• Up to 128 uncommitted instructions in flight
◦ Additional committed instructions may still
be waiting on retirement
• Out-of-order dispatch of branches, ALU
operations, loads, stores
• Up to 4 instructions retired per cycle
11
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Integer and Branch Execution
• Heterogeneous execution units for
integer ALU operations and branches
• Pipeline length sized based on
operation
Operation B-pipe X-pipe Y-pipe Z-pipe
Direct branch Y
Indirect branch Y
Simple ALU Y Y Y
Multiplies Y
12
F1
F2
F3
IQ
EXPAND
REN-0
RACC-0
VXBOOK
VXRSV
VX1
VX2
VX3
VX4
VX5
VX6
VYBOOK
VYRSV
VY1
VY2
VY3
VY4
VY5
VY6
BBOOK
BRSV
B1
ZBOOK
ZRSV
Z1
Z2
YBOOK
YRSV
Y1
Y2
XBOOK
XRSV
X1
X2
X3
X4
LSBOOK
LSRSV
ST1
ST2
ST3
ST4
LD1
LD2
LD3
LD4
REN-1
RACC-1
REN-2
RACC-2
REN-BR
RACC-BR
Branch
Predictor
L0 I-cacheL1 I-cache
L1 D-cache
Load/Store Execution
• 128 bits load and 128 bits store per cycle
• L1 data cache
◦ 32KB, 64-byte lines, 8-way
◦ 3-cycle latency for L1 hit
◦ Write-through, read-allocate, write-no-
allocate
◦ Split virtual and physical tags
◦ Parity with auto-correct
• Hardware data prefetch engine
◦ Prefetches for L1, L2, and L3 caches
◦ Detects stride patterns
• TLBs
◦ 64-entry L1DTLB
◦ 512-entry "final" L2TLB
◦ 64-entry "non-final" L2TLB
◦ 64-entry Stage-2 TLB
13
• Independent power states for each of CPUs and L2
• Each CPU is powered by a block head switch (BHS) or low-
dropout regulator (LDO) from shared supply rail
◦ Light sleep: gate off CPU clock
◦ Voltage retention: registers and caches retain state
◦ Register retention: register state retained using chip power
rail
• Caches and logic are switched off
◦ Collapse: register and L1 cache state not retained
• L2 controller
◦ Low-power states similar to CPU
◦ L2 may auto-clock gate even when CPUs are active
◦ L2 may enter retention or collapse state if both CPUs are in
low-power states
• Entry/exit to/from low-power states controlled by hardware
state machines
◦ Minimizes entry/exit latency
Power Management
Falkor
ARMv8
Core
Falkor
ARMv8
Core
L2 cache
Ring bus interface
Power Control
Falkor duplex
14
Qualcomm Centriq 2400 SoC Overview
Coherent Ring
CPU
L1
CPU
L1
L2
Falkor
CPU
L1
CPU
L1
L2
Falkor
L3 Cache
DDR4
Memory
Controllers
DMA
IMC
Low-
speed IO
QDF2400
PCIe Gen3
L3 Cache
Large distributed
unified L3 w/ECC
CPU Subsystem
Falkor cores based on ARMv8
48 cores (24 duplexes)
Unified L2 cache w/ECC
SoC
Integrated “south bridge” features
DMA, SATA, USB, I2C, UART, SPI, GPIO
SBSA Level 3 Compliant
PCIe Gen3
32 Lanes
DDR4 Memory
6 Channels w/ECC
Bandwidth Compression
2667 MT/s
RDIMM, LRDIMM
1 or 2 DIMMs per Channel
Package
55mm x 55mm LGA
Socketed
SATA
158/14/2017
L3
L3 Quality of Service (QoS) Extensions
QoS Extensions:
• Hardware Abstracted QoS Domain Identifier
• Per Client (Core/Virtual Machine, IO/Virtual Function)
• Per-Resource Monitoring and Way-based Allocation
• Monitor Utilization per QoSID per L3
• Policy Enforcement per QoSID per L3
• Instruction/Data Granularity
• Fine-Tune Cache Allocation per Thread or Class of Threads
Shared Resource Contention Impacts QoS
- Distributed L3 Cache
- Limited/No Allocation Policy Enforcement for Data
VM/Thread 0 VM/Thread 1 IO/VF 0
L3
CPU 0 CPU 1 Device 0
VM/Thread 0 VM/Thread 1 IO/VF 0
CPU 0 CPU 1 Device 0
No L3 QoS L3 QoS
Improved cache utilization and per-
workload performance (lower application
latency) for critical workloads…..
168/14/2017
Memory Bandwidth Compression
Uncompressed Memory
(128B Lines)
0a 0b 1a 1b
2a 2b 3a 3b
4a 4b 5a 5b
6a 6b 7a 7b
8a 8b 9a 9b
Aa Ab Ba Bb
Bandwidth Compression:
• Proprietary algorithm
• Inline compression w/in Memory Controllers
• Fully transparent to software
• Compress 128B line to 64B when possible
• ECC is encoded with compression bit
• Very low latency decompression
• 2 – 4 cycles
• Effective on compressible bandwidth intensive workloads
• Performance improvement varies with workload characteristics
Compressed Memory
Increased effective memory bandwidth
and reduced power for compressible
workloads…..
Constrained Memory Bandwidth
- Channel limited peak Bandwidth
- Limited number of DDR Channels
0 1
2a 2b 3
4 5a 5b
6a 6b 7a 7b
8 9a 9b
A Ba Bb
0a 0b 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b 7a 7b
0 1 2a 2b 3 4 5a 5b 6a 6b 7a 7b
8a 8b
8
Memory Access Stream – w/o Bandwidth Compression
Memory Access Stream – w/ Bandwidth Compression
9a 9b A Ba Bb
178/14/2017
Secure Boot
 Immutable Boot ROM
 Primary Boot Loader code resident in on-chip ROM
 Contains code to authenticate external Firmware/Software
 Establishes Root of Trust
 Security Controller / Fuse Block
 Selection of public key
 Qualcomm public key (from Boot ROM)
 OEM public key
 Customer public key (hash)
 Authentication of secondary and tertiary Boot Loaders
 Integrated Management Controller
 Dedicated processor for boot sequencing
 Authenticates and anti-rollback checks Boot Loaders
 Accelerates SHA portion of digital signature algorithm
 Firmware performs RSA public key operations
18
• Qualcomm Centriq™ 2400 Processor is the industry’s first 10 nm server CPU
• 5th-generation custom core design
◦ Specifically optimized for server applications
• ARMv8-compliant AArch64 only
• Targeting leading-edge Performance with Performance per Watt leadership
• Motherboard specification submitted to Open Compute Project
◦ based on the latest version of Microsoft’s Project Olympus
• Running Windows Server and multiple versions of Linux
• Chip is being sampled at multiple datacenters
• On track for production by end of 2017
Summary & Status
19
Follow us on:
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries, Qualcomm Centriq and Falkor are
trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective
owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or
business units within the Qualcomm corporate structure, as applicable.
Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio.
Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of
Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its
semiconductor business, QCT.
Thank you
20
• SoC - System-on-Chip
• SBSA - Server Base System Architecture
• LGA – Line Grid Array
• SATA - Serial Advanced Technology Attachment
• USB - Universal Serial Bus
• I2C - Inter-Integrated Circuit
• UART - Universal Asynchronous Receiver/Transmitter
• SPI - Shared Peripheral Interrupt
• GPIO - General Purpose Input Output
• RDIMM - Registered (Buffered) Dual Inline Memory Module
• LRDIMM - Load Reduced Dual Inline Memory Module
• DDR – Double Data Rate
Glossary

More Related Content

What's hot

ODSA Workshop: Development Effort Summary
ODSA Workshop: Development Effort SummaryODSA Workshop: Development Effort Summary
ODSA Workshop: Development Effort SummaryODSA Workgroup
 
CommScope RUCKUS ICX Switching Configuration
CommScope RUCKUS ICX Switching ConfigurationCommScope RUCKUS ICX Switching Configuration
CommScope RUCKUS ICX Switching ConfigurationCarla Nadin
 
IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014Angel Villar Garea
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Workgroup
 
AMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD
 
XPDDS18: Xen Testing at Intel - Xudong Hao, Intel
XPDDS18: Xen Testing at Intel - Xudong Hao, IntelXPDDS18: Xen Testing at Intel - Xudong Hao, Intel
XPDDS18: Xen Testing at Intel - Xudong Hao, IntelThe Linux Foundation
 
SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture Abdullaziz Tagawy
 
DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Meltonharryvanhaaren
 
Implementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGAImplementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGADeepak Kumar
 
LF_OVS_17_IPSEC and OVS DPDK
LF_OVS_17_IPSEC and OVS DPDKLF_OVS_17_IPSEC and OVS DPDK
LF_OVS_17_IPSEC and OVS DPDKLF_OpenvSwitch
 
Installing Oracle Database on LDOM
Installing Oracle Database on LDOMInstalling Oracle Database on LDOM
Installing Oracle Database on LDOMPhilippe Fierens
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationLinaro
 
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisVyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisRouter Analysis, Inc.
 
HKG15-400: Next steps in KVM enablement on ARM
HKG15-400: Next steps in KVM enablement on ARMHKG15-400: Next steps in KVM enablement on ARM
HKG15-400: Next steps in KVM enablement on ARMLinaro
 
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...The Linux Foundation
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingAMD
 
Project ACRN I2C mediator introduction
Project ACRN I2C mediator introductionProject ACRN I2C mediator introduction
Project ACRN I2C mediator introductionProject ACRN
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 

What's hot (20)

ODSA Workshop: Development Effort Summary
ODSA Workshop: Development Effort SummaryODSA Workshop: Development Effort Summary
ODSA Workshop: Development Effort Summary
 
CommScope RUCKUS ICX Switching Configuration
CommScope RUCKUS ICX Switching ConfigurationCommScope RUCKUS ICX Switching Configuration
CommScope RUCKUS ICX Switching Configuration
 
IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & Feeds
 
AMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat Presentation
 
XPDDS18: Xen Testing at Intel - Xudong Hao, Intel
XPDDS18: Xen Testing at Intel - Xudong Hao, IntelXPDDS18: Xen Testing at Intel - Xudong Hao, Intel
XPDDS18: Xen Testing at Intel - Xudong Hao, Intel
 
SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture
 
DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Melton
 
Implementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGAImplementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGA
 
LF_OVS_17_IPSEC and OVS DPDK
LF_OVS_17_IPSEC and OVS DPDKLF_OVS_17_IPSEC and OVS DPDK
LF_OVS_17_IPSEC and OVS DPDK
 
Installing Oracle Database on LDOM
Installing Oracle Database on LDOMInstalling Oracle Database on LDOM
Installing Oracle Database on LDOM
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM Virtualization
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisVyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
 
HKG15-400: Next steps in KVM enablement on ARM
HKG15-400: Next steps in KVM enablement on ARMHKG15-400: Next steps in KVM enablement on ARM
HKG15-400: Next steps in KVM enablement on ARM
 
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...
XPDDS18: The Evolution of Virtualization in the Arm Architecture - Julien Gra...
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance Computing
 
Intel dpdk Tutorial
Intel dpdk TutorialIntel dpdk Tutorial
Intel dpdk Tutorial
 
Project ACRN I2C mediator introduction
Project ACRN I2C mediator introductionProject ACRN I2C mediator introduction
Project ACRN I2C mediator introduction
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 

Similar to Qualcomm centriq 2400 hot chips final submission corrected

Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)byteLAKE
 
HP Blades Presentation
HP Blades PresentationHP Blades Presentation
HP Blades PresentationBhavin Vyas
 
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:Tony Antony
 
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...SkillFactory
 
Reference design for v mware nsx
Reference design for v mware nsxReference design for v mware nsx
Reference design for v mware nsxsolarisyougood
 
Tech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product UpdateTech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product UpdateAdaCore
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
POLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewPOLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewAlexander Grudanov
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfssuser30e7d2
 
Navigating dc architectures tech&sales
Navigating dc architectures tech&salesNavigating dc architectures tech&sales
Navigating dc architectures tech&salesEric Zhaohui Ji
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureCeph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitecturePatrick McGarry
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane Michelle Holley
 
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz Jantas
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz JantasPLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz Jantas
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz JantasPROIDEA
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Intel IT Center
 

Similar to Qualcomm centriq 2400 hot chips final submission corrected (20)

Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
 
HP Blades Presentation
HP Blades PresentationHP Blades Presentation
HP Blades Presentation
 
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
 
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...
Конференция Brocade. 4. Развитие технологии Brocade VCS, новое поколение комм...
 
Reference design for v mware nsx
Reference design for v mware nsxReference design for v mware nsx
Reference design for v mware nsx
 
Tech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product UpdateTech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product Update
 
Processors selection
Processors selectionProcessors selection
Processors selection
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
WAN - trends and use cases
WAN - trends and use casesWAN - trends and use cases
WAN - trends and use cases
 
Решения NFV в контексте операторов связи
Решения NFV в контексте операторов связиРешения NFV в контексте операторов связи
Решения NFV в контексте операторов связи
 
POLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewPOLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overview
 
Juniper for Enterprise
Juniper for EnterpriseJuniper for Enterprise
Juniper for Enterprise
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdf
 
Navigating dc architectures tech&sales
Navigating dc architectures tech&salesNavigating dc architectures tech&sales
Navigating dc architectures tech&sales
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
 
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz Jantas
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz JantasPLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz Jantas
PLNOG14: Konwergentność, Wydajność, Szybkość w Data Center - Kazimierz Jantas
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
 

More from Dileep Bhandarkar

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Dileep Bhandarkar
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Dileep Bhandarkar
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersDileep Bhandarkar
 
Data center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperData center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperDileep Bhandarkar
 
Moscow conference keynote in 2012
Moscow conference keynote in 2012Moscow conference keynote in 2012
Moscow conference keynote in 2012Dileep Bhandarkar
 
New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11Dileep Bhandarkar
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorDileep Bhandarkar
 
Innovation lecture for hong kong
Innovation lecture for hong kongInnovation lecture for hong kong
Innovation lecture for hong kongDileep Bhandarkar
 
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Dileep Bhandarkar
 
Innovation lecture for shanghai final
Innovation lecture for shanghai finalInnovation lecture for shanghai final
Innovation lecture for shanghai finalDileep Bhandarkar
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedDileep Bhandarkar
 
Dileep Random Access Talk at salishan 2016
Dileep Random Access Talk at salishan 2016Dileep Random Access Talk at salishan 2016
Dileep Random Access Talk at salishan 2016Dileep Bhandarkar
 
Server design summit keynote handout
Server design summit keynote handoutServer design summit keynote handout
Server design summit keynote handoutDileep Bhandarkar
 

More from Dileep Bhandarkar (20)

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large Datacenters
 
Samsung cio-forum-2012
Samsung cio-forum-2012 Samsung cio-forum-2012
Samsung cio-forum-2012
 
Data center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperData center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paper
 
Moscow conference keynote in 2012
Moscow conference keynote in 2012Moscow conference keynote in 2012
Moscow conference keynote in 2012
 
New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro Processor
 
Innovation lecture for hong kong
Innovation lecture for hong kongInnovation lecture for hong kong
Innovation lecture for hong kong
 
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
 
Innovation lecture for shanghai final
Innovation lecture for shanghai finalInnovation lecture for shanghai final
Innovation lecture for shanghai final
 
Semicon2018 dileepb
Semicon2018 dileepbSemicon2018 dileepb
Semicon2018 dileepb
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updated
 
Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
 
Alpha memo july 1992
Alpha memo july 1992Alpha memo july 1992
Alpha memo july 1992
 
China AI Summit talk 2017
China AI Summit talk 2017China AI Summit talk 2017
China AI Summit talk 2017
 
Dileep Random Access Talk at salishan 2016
Dileep Random Access Talk at salishan 2016Dileep Random Access Talk at salishan 2016
Dileep Random Access Talk at salishan 2016
 
Risc vs cisc
Risc vs ciscRisc vs cisc
Risc vs cisc
 
Moscow conference keynote
Moscow conference keynoteMoscow conference keynote
Moscow conference keynote
 
Server design summit keynote handout
Server design summit keynote handoutServer design summit keynote handout
Server design summit keynote handout
 

Recently uploaded

Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Qualcomm centriq 2400 hot chips final submission corrected

  • 1. Qualcomm Centriq™ 2400 Processor Barry Wolford, Senior Director, Engineering Thomas Speier, Senior Director, Engineering Dileep Bhandarkar, Vice President, Technology Qualcomm Datacenter Technologies, Inc. August 22, 2017 @qualcomm Qualcomm Centriq 2400 Processor is a product of Qualcomm Datacenter Technologies, Inc..
  • 2. 2 •Qualcomm Datacenter Technologies Introduction •Qualcomm® Falkor™ CPU Overview •Qualcomm CentriqTM 2400 Server SoC Overview •Summary Agenda Qualcomm Falkor CPU is a product of Qualcomm Datacenter Technologies, Inc.
  • 3. 3 QDT Well Positioned to Address Cloud Datacenter Opportunity Bringing decade of experience delivering high-performance, power- efficient ARM CPU architectures Focus on true server class features and performance with aggressive power management techniques Partnering with cloud market leaders for product definition Uniquely positioned to leverage process leadership driven by mobile industry growth to deliver industry first 10 nm server processor Unique High Performance, Low Power ARM Based CPUs
  • 4. 4 • QDT-designed custom core powering Qualcomm Centriq 2400 Processor • 5th generation custom core design • Designed from the ground up to meet the needs of cloud service providers • Fully ARMv8-compliant • AArch64 only • Supports EL3 (TrustZone) and EL2 (hypervisor) • Includes optional cryptography acceleration instructions ◦ AES, SHA1, SHA2-256 • Designed for performance, optimized for power Qualcomm Falkor™ CPU Designed for the Cloud
  • 5. 5 • Falkor core duplex is building block for SoC • Two Custom ARM V8 CPUs • Shared L2 Cache • Nominal Operating Voltage ~1V • Shared bus interface to Qualcomm® System Bus (QSB) ring interconnect  Qualcomm Proprietary Protocol  Custom Bi-Directional Segmented Ring Bus  Fully Coherent (Cache & IO)  Shortest Path Routing  Multicast on Read  > 250 GB/s aggregate bandwidth Falkor Core configuration Falkor ARMv8 Core Falkor ARMv8 Core L2 cache Ring bus interface Power Control Falkor duplex Qualcomm System Bus is a product of Qualcomm Datacenter Technologies, Inc.
  • 6. 6 • 128-byte lines, 8-way • Unified between I-side and D-side • Shared between two CPUs in duplex • 128-byte interleaved for improved throughput • SEC-DED ECC protected • 15-cycle minimum latency for L2 hit • Inclusive of L1 D-caches • 32-bytes per direction per interleave per cycle Falkor L2 Cache Falkor ARMv8 Core Falkor ARMv8 Core L2 cache Ring bus interface Power Control Falkor duplex
  • 7. 7 • Heterogeneous pipeline providing optimal performance per unit power ◦ Variable-length pipelines tuned per function ◦ Minimizes idle hardware • 4-issue ◦ 3 instructions + 1 direct branch • 8-dispatch Falkor CPU F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Falkor ARMv8 Core Falkor ARMv8 Core L2 cache Ring bus interface Power Control Falkor duplex
  • 8. 8 F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Branch Prediction • 0-1 cycle penalty for almost all predicted taken branches • 16-entry BTIC (branch target instruction cache) ◦ Supports 0-cycle branch penalty • Multi-level BTAC (branch target address cache) for indirect branches ◦ 16-entry level-0 BTAC ◦ 256-entry level-1 BTAC ◦ PC-relative branches utilize I-cache as BTAC • 16-entry link stack • Multi-level BHT (branch history table) ◦ Multi-faceted scheme involving staged predictors
  • 9. 9 F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Instruction Fetch • Two-level I-cache topology ◦ Key element in performance and performance/power efficiency advantage ◦ L0 and L1 caches are exclusive • L0 I-cache ◦ 24KB, 64-byte lines, 3-way ◦ Way-predicted ◦ Parity with auto-correct ◦ 0-cycle penalty for L0 hit • L1 I-cache ◦ 64KB, 64-byte lines, 8-way ◦ Parity with auto-correct ◦ 4-cycle penalty for L0 miss / L1 hit ◦ Hardware prefetch on L1 miss • Fetches up to 4 instructions per cycle ◦ Fetch group can span cache lines • Instructions are decoded and expanded into micro-ops ◦ Most instructions map to a single micro-op
  • 10. 10 F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Rename (REN), Register Access (RACC), and Reserve (RSV) • 256-entry rename/completion buffer • 76-instruction dispatch window • Up to 128 uncommitted instructions in flight ◦ Additional committed instructions may still be waiting on retirement • Out-of-order dispatch of branches, ALU operations, loads, stores • Up to 4 instructions retired per cycle
  • 11. 11 F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Integer and Branch Execution • Heterogeneous execution units for integer ALU operations and branches • Pipeline length sized based on operation Operation B-pipe X-pipe Y-pipe Z-pipe Direct branch Y Indirect branch Y Simple ALU Y Y Y Multiplies Y
  • 12. 12 F1 F2 F3 IQ EXPAND REN-0 RACC-0 VXBOOK VXRSV VX1 VX2 VX3 VX4 VX5 VX6 VYBOOK VYRSV VY1 VY2 VY3 VY4 VY5 VY6 BBOOK BRSV B1 ZBOOK ZRSV Z1 Z2 YBOOK YRSV Y1 Y2 XBOOK XRSV X1 X2 X3 X4 LSBOOK LSRSV ST1 ST2 ST3 ST4 LD1 LD2 LD3 LD4 REN-1 RACC-1 REN-2 RACC-2 REN-BR RACC-BR Branch Predictor L0 I-cacheL1 I-cache L1 D-cache Load/Store Execution • 128 bits load and 128 bits store per cycle • L1 data cache ◦ 32KB, 64-byte lines, 8-way ◦ 3-cycle latency for L1 hit ◦ Write-through, read-allocate, write-no- allocate ◦ Split virtual and physical tags ◦ Parity with auto-correct • Hardware data prefetch engine ◦ Prefetches for L1, L2, and L3 caches ◦ Detects stride patterns • TLBs ◦ 64-entry L1DTLB ◦ 512-entry "final" L2TLB ◦ 64-entry "non-final" L2TLB ◦ 64-entry Stage-2 TLB
  • 13. 13 • Independent power states for each of CPUs and L2 • Each CPU is powered by a block head switch (BHS) or low- dropout regulator (LDO) from shared supply rail ◦ Light sleep: gate off CPU clock ◦ Voltage retention: registers and caches retain state ◦ Register retention: register state retained using chip power rail • Caches and logic are switched off ◦ Collapse: register and L1 cache state not retained • L2 controller ◦ Low-power states similar to CPU ◦ L2 may auto-clock gate even when CPUs are active ◦ L2 may enter retention or collapse state if both CPUs are in low-power states • Entry/exit to/from low-power states controlled by hardware state machines ◦ Minimizes entry/exit latency Power Management Falkor ARMv8 Core Falkor ARMv8 Core L2 cache Ring bus interface Power Control Falkor duplex
  • 14. 14 Qualcomm Centriq 2400 SoC Overview Coherent Ring CPU L1 CPU L1 L2 Falkor CPU L1 CPU L1 L2 Falkor L3 Cache DDR4 Memory Controllers DMA IMC Low- speed IO QDF2400 PCIe Gen3 L3 Cache Large distributed unified L3 w/ECC CPU Subsystem Falkor cores based on ARMv8 48 cores (24 duplexes) Unified L2 cache w/ECC SoC Integrated “south bridge” features DMA, SATA, USB, I2C, UART, SPI, GPIO SBSA Level 3 Compliant PCIe Gen3 32 Lanes DDR4 Memory 6 Channels w/ECC Bandwidth Compression 2667 MT/s RDIMM, LRDIMM 1 or 2 DIMMs per Channel Package 55mm x 55mm LGA Socketed SATA
  • 15. 158/14/2017 L3 L3 Quality of Service (QoS) Extensions QoS Extensions: • Hardware Abstracted QoS Domain Identifier • Per Client (Core/Virtual Machine, IO/Virtual Function) • Per-Resource Monitoring and Way-based Allocation • Monitor Utilization per QoSID per L3 • Policy Enforcement per QoSID per L3 • Instruction/Data Granularity • Fine-Tune Cache Allocation per Thread or Class of Threads Shared Resource Contention Impacts QoS - Distributed L3 Cache - Limited/No Allocation Policy Enforcement for Data VM/Thread 0 VM/Thread 1 IO/VF 0 L3 CPU 0 CPU 1 Device 0 VM/Thread 0 VM/Thread 1 IO/VF 0 CPU 0 CPU 1 Device 0 No L3 QoS L3 QoS Improved cache utilization and per- workload performance (lower application latency) for critical workloads…..
  • 16. 168/14/2017 Memory Bandwidth Compression Uncompressed Memory (128B Lines) 0a 0b 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b 7a 7b 8a 8b 9a 9b Aa Ab Ba Bb Bandwidth Compression: • Proprietary algorithm • Inline compression w/in Memory Controllers • Fully transparent to software • Compress 128B line to 64B when possible • ECC is encoded with compression bit • Very low latency decompression • 2 – 4 cycles • Effective on compressible bandwidth intensive workloads • Performance improvement varies with workload characteristics Compressed Memory Increased effective memory bandwidth and reduced power for compressible workloads….. Constrained Memory Bandwidth - Channel limited peak Bandwidth - Limited number of DDR Channels 0 1 2a 2b 3 4 5a 5b 6a 6b 7a 7b 8 9a 9b A Ba Bb 0a 0b 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b 7a 7b 0 1 2a 2b 3 4 5a 5b 6a 6b 7a 7b 8a 8b 8 Memory Access Stream – w/o Bandwidth Compression Memory Access Stream – w/ Bandwidth Compression 9a 9b A Ba Bb
  • 17. 178/14/2017 Secure Boot  Immutable Boot ROM  Primary Boot Loader code resident in on-chip ROM  Contains code to authenticate external Firmware/Software  Establishes Root of Trust  Security Controller / Fuse Block  Selection of public key  Qualcomm public key (from Boot ROM)  OEM public key  Customer public key (hash)  Authentication of secondary and tertiary Boot Loaders  Integrated Management Controller  Dedicated processor for boot sequencing  Authenticates and anti-rollback checks Boot Loaders  Accelerates SHA portion of digital signature algorithm  Firmware performs RSA public key operations
  • 18. 18 • Qualcomm Centriq™ 2400 Processor is the industry’s first 10 nm server CPU • 5th-generation custom core design ◦ Specifically optimized for server applications • ARMv8-compliant AArch64 only • Targeting leading-edge Performance with Performance per Watt leadership • Motherboard specification submitted to Open Compute Project ◦ based on the latest version of Microsoft’s Project Olympus • Running Windows Server and multiple versions of Linux • Chip is being sampled at multiple datacenters • On track for production by end of 2017 Summary & Status
  • 19. 19 Follow us on: For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries, Qualcomm Centriq and Falkor are trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT. Thank you
  • 20. 20 • SoC - System-on-Chip • SBSA - Server Base System Architecture • LGA – Line Grid Array • SATA - Serial Advanced Technology Attachment • USB - Universal Serial Bus • I2C - Inter-Integrated Circuit • UART - Universal Asynchronous Receiver/Transmitter • SPI - Shared Peripheral Interrupt • GPIO - General Purpose Input Output • RDIMM - Registered (Buffered) Dual Inline Memory Module • LRDIMM - Load Reduced Dual Inline Memory Module • DDR – Double Data Rate Glossary