Feeding the Multicore Beast: It’s All About the Data!
1.
IBM Research © 2008
Feeding the Multicore Beast: It’s All About the Data!
Michael Perrone
IBM Master Inventor
Mgr, Cell Solutions Dept.
2.
mpp@us.ibm.com
Outline
History: Data challenge
Motivation for multicore
Implications for programmers
How Cell addresses these implications
Examples
• 2D/3D FFT – Medical Imaging, Petroleum, general HPC…
• Green’s Functions – Seismic Imaging (Petroleum)
• String Matching – Network Processing: DPI & Intrusion Detection
• Neural Networks – Finance
3.
Chapter 1: The Beast is Hungry!
4.
The Hungry Beast
[Diagram: Processor (“beast”) fed Data (“food”) through a Data Pipe]
Pipe too small = starved beast
Pipe big enough = well-fed beast
Pipe too big = wasted resources
5.
The Hungry Beast
[Diagram: Processor (“beast”) fed Data (“food”) through a Data Pipe]
Pipe too small = starved beast
Pipe big enough = well-fed beast
Pipe too big = wasted resources
If flops grow faster than pipe capacity…
… the beast gets hungrier!
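The pipe analogy can be made quantitative with a roofline-style bound (a standard model that the slides do not use by name): attainable throughput is the smaller of the compute peak and bandwidth × arithmetic intensity (flops per byte moved). The function names and the numbers in the demo are hypothetical, chosen only to show the effect.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: capped by the compute peak or by what the pipe delivers."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def is_starved(peak_gflops, bandwidth_gbs, flops_per_byte):
    """True when the data pipe, not the processor, limits performance."""
    return bandwidth_gbs * flops_per_byte < peak_gflops


if __name__ == "__main__":
    # Hypothetical numbers: double the flops, keep the pipe the same.
    for peak in (100.0, 200.0):
        got = attainable_gflops(peak, bandwidth_gbs=10.0, flops_per_byte=4.0)
        print(f"peak {peak:.0f} GFLOPS -> {got:.0f} GFLOPS ({got / peak:.0%} of peak)")
```

With the pipe fixed at a hypothetical 10 GB/s and 4 flops per byte, doubling the peak from 100 to 200 GFLOPS leaves attainable performance at 40 GFLOPS, so the fraction of peak falls from 40% to 20%: the beast gets hungrier.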
6.
Move the food closer
Example: Intel Tulsa
– Xeon MP 7100 series
– 65 nm, 349 mm², 2 cores
– 3.4 GHz @ 150 W
– ~54.4 SP GFLOPS
– http://www.intel.com/products/processor/xeon/index.htm
Large cache on chip
– ~50% of area
– Keeps data close for efficient access
If the data is local, the beast is happy!
– True for many algorithms
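The quoted peak is consistent with 8 single-precision flops per cycle per core. That per-cycle figure is inferred from the slide's own numbers, not stated on it:

$$2\ \text{cores} \times 3.4\ \text{GHz} \times 8\ \tfrac{\text{SP flops}}{\text{cycle}\cdot\text{core}} = 54.4\ \text{SP GFLOPS}$$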
7.
What happens if the beast is still hungry?
If the data set doesn’t fit in cache
– Cache misses
– Memory latency exposed
– Performance degraded
Several important application classes don’t fit
– Graph searching algorithms
– Network security
– Natural language processing
– Bioinformatics
– Many HPC workloads
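A toy simulation shows why a working set that overflows the cache hurts so much. This sketch assumes a fully associative cache with LRU replacement and a loop that sweeps the same working set repeatedly; both are simplifications (real caches are set-associative and access patterns vary). Under those assumptions, a working set just one line larger than the cache misses on every access.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, working_set, passes):
    """Hit rate of a fully associative LRU cache (capacity in lines) when a
    loop sweeps cache lines 0..working_set-1 in order, `passes` times."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True             # miss: fetch from memory
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses


if __name__ == "__main__":
    for ws in (512, 1024, 1025, 2048):
        print(ws, lru_hit_rate(capacity=1024, working_set=ws, passes=10))
```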
8.
Make the food bowl larger
Cache size steadily increasing
Implications
– Chip real estate reserved for cache
– Less space on chip for compute
– More power required for fewer FLOPS
9.
Make the food bowl larger
Cache size steadily increasing
Implications
– Chip real estate reserved for cache
– Less space on chip for compute
– More power required for fewer FLOPS
But…
– Important application working sets are growing faster
– Multicore is even more demanding on cache than uni-core
10.
Chapter 2: The Beast Has Babies
11.
Power Density – The fundamental problem
[Chart: power density (W/cm², log scale, 1 to 1000) vs. process technology (1.5 µm down to 0.07 µm) for the i386, i486, Pentium®, Pentium Pro®, Pentium II® and Pentium III®, with “Hot Plate” and “Nuclear Reactor” reference levels]
Source: Fred Pollack, Intel. New Microprocessor Challenges in the Coming Generations of CMOS Technologies, Micro32
12.
What’s causing the problem?
[Figure: gate stack cross-section (Tox = 11 Å) and a chart of power density (W/cm², log scale, 0.001 to 1000) vs. gate length (microns, 1 to 0.01), with 65 nm marked]
Gate dielectric approaching a fundamental limit (a few atomic layers)
Power, signal jitter, etc...
13.
Diminishing Returns on Frequency
[Chart: clock speed (MHz, log scale, 10² to 10⁴) vs. year, 1990 to 2010, highlighting frequency-driven design points]
In a power-constrained environment, chip clock speed yields diminishing returns. The industry has moved to lower-frequency multicore architectures.
14.
Power vs Performance Trade-Offs
[Chart: relative power vs. relative performance (0 to 5); labeled values include 1, 1.45, 1.3, 0.85 and 1.7]
We need to adapt our algorithms to get performance out of multicore
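A common back-of-envelope model behind trade-off charts like this one (the slide does not state its model, so this is an assumption): dynamic power scales as C·V²·f, and if supply voltage has to scale roughly with frequency, power per core grows roughly as f³. The sketch below applies that cube law and assumes throughput scales with clock and, ideally, with core count.

```python
def relative_power(cores, freq_scale):
    """Cube-law model: P ~ C * V^2 * f with V proportional to f, so P ~ f^3 per core."""
    return cores * freq_scale ** 3


def relative_performance(cores, freq_scale, parallel_efficiency=1.0):
    """Throughput scales with clock; each extra core adds `parallel_efficiency`."""
    return freq_scale * (1 + (cores - 1) * parallel_efficiency)


if __name__ == "__main__":
    designs = {
        "1 core @ 1.0x": (1, 1.0),
        "1 core @ 1.2x": (1, 1.2),
        "2 cores @ 0.8x": (2, 0.8),
    }
    for name, (cores, f) in designs.items():
        print(f"{name}: perf {relative_performance(cores, f):.2f}, "
              f"power {relative_power(cores, f):.2f}")
```

Under this model the overclocked single core buys 1.2x performance for about 1.73x power, while two cores at 80% clock buy 1.6x for about 1.02x, but only if the algorithm actually parallelizes (the `parallel_efficiency` knob).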
15.
Implications of Multicore
There are more mouths to feed
– Data movement will take center stage
Complexity of cores will stop increasing … and has started to decrease in some cases
Complexity increases will center around communication
Assumption
– Achieving a significant % of peak performance is important
16.
Chapter 3: The Proper Care and Feeding of Hungry Beasts
17.
Cell/B.E. Processor: 200 GFLOPS (SP) @ ~70 W
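The 200 GFLOPS figure matches the SPEs' combined peak under the usual assumptions of a 3.2 GHz clock and one 4-wide single-precision fused multiply-add (8 flops) per SPE per cycle. Neither number is stated on the slide:

$$8\ \text{SPEs} \times 3.2\ \text{GHz} \times 8\ \tfrac{\text{SP flops}}{\text{cycle}\cdot\text{SPE}} = 204.8\ \text{GFLOPS} \approx 200\ \text{GFLOPS}$$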
18.
Feeding the Cell Processor
8 SPEs, each with
– LS (local store)
– MFC (memory flow controller)
– SXU (synergistic execution unit)
PPE
– OS functions
– Disk IO
– Network IO
[Diagram: eight SPEs (each with SPU, SXU, LS and MFC) and the PPE (64-bit Power Architecture with VMX; PPU with PXU, L1 and L2) connected by the EIB (up to 96B/cycle); the MIC links to dual XDR™ memory and the BIC to FlexIO™; link labels include 16B/cycle, (2x) 16B/cycle and 32B/cycle]
19.
Cell Approach: Feed the beast more efficiently
Explicitly “orchestrate” the data flow between main memory and each SPE’s local store
– Use the SPE’s DMA engine to gather & scatter data between main memory and local store
– Enables detailed programmer control of data flow
• Get/Put data when & where you want it
• Hides latency: simultaneous reads, writes & computes
– Avoids restrictive HW cache management
• Unlikely to determine optimal data flow
• Potentially very inefficient
– Allows more efficient use of the existing bandwidth
20.
Cell Approach: Feed the beast more efficiently
Explicitly “orchestrate” the data flow between main memory and each SPE’s local store
– Use the SPE’s DMA engine to gather & scatter data between main memory and local store
– Enables detailed programmer control of data flow
• Get/Put data when & where you want it
• Hides latency: simultaneous reads, writes & computes
– Avoids restrictive HW cache management
• Unlikely to determine optimal data flow
• Potentially very inefficient
– Allows more efficient use of the existing bandwidth
BOTTOM LINE: It’s all about the data!
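The double-buffering pattern behind “hides latency: simultaneous reads, writes & computes” can be sketched in plain Python. This is a structural sketch, not SPE code: `dma_get` is a hypothetical stand-in that copies a slice, and a one-worker thread pool plays the role of the DMA engine. On an actual SPE the transfer would be issued with the Cell SDK's `mfc_get` and completed with a tag-group wait. The shape of the loop is the point: start fetching chunk i+1, then compute on chunk i.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(memory, start, size):
    """Hypothetical stand-in for a DMA 'get': copy one chunk into a local buffer."""
    return list(memory[start:start + size])


def double_buffered_map(memory, chunk, compute):
    """Apply `compute` to each chunk of `memory`, fetching chunk i+1 while chunk i
    is being processed. Results are collected in order (the 'put' side)."""
    starts = list(range(0, len(memory), chunk))
    results = []
    if not starts:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = dma_engine.submit(dma_get, memory, starts[0], chunk)
        for i in range(len(starts)):
            local = pending.result()  # wait only for the buffer needed now
            if i + 1 < len(starts):   # issue the next transfer before computing
                pending = dma_engine.submit(dma_get, memory, starts[i + 1], chunk)
            results.append(compute(local))  # overlaps with the in-flight transfer
    return results


if __name__ == "__main__":
    data = list(range(1_000))
    print(double_buffered_map(data, 256, sum))
```

Only two input chunks are live at a time (the one being computed on and the one in flight), which keeps the local buffer footprint fixed regardless of the data set size. Because of Python's GIL, the overlap here illustrates the structure rather than delivering a real speedup.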
21.
Cell Comparison: ~4x the FLOPS @ ~½ the power
[Die photos: both chips in 65 nm technology, shown to scale]
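The headline ratios can be checked from figures quoted earlier in the deck, assuming (the slide does not say so) that the comparison chip is the 65 nm Intel Tulsa from the “Move the food closer” slide: 200 GFLOPS at ~70 W for Cell versus ~54.4 GFLOPS at 150 W.

```python
def ratios(flops_a, watts_a, flops_b, watts_b):
    """Return (FLOPS ratio, power ratio, FLOPS-per-watt ratio) of chip A over chip B."""
    return flops_a / flops_b, watts_a / watts_b, (flops_a / watts_a) / (flops_b / watts_b)


if __name__ == "__main__":
    flops, power, efficiency = ratios(200.0, 70.0, 54.4, 150.0)
    print(f"FLOPS x{flops:.2f}, power x{power:.2f}, FLOPS/W x{efficiency:.1f}")
```

That gives about 3.7x the FLOPS at about 0.47x the power, or roughly 7.9x the FLOPS per watt, in line with the slide's “~4x @ ~½”.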
22.
Memory Managing Processor vs. Traditional General Purpose Processor
[Die photos: IBM, AMD, Intel, Cell BE]
23.
Examples of Feeding Cell
2D and 3D FFTs
Seismic Imaging
String Matching
Neural Networks (function approximation)
24.
Feeding FFTs to Cell
[Diagram: input image, buffer, tile, transposed tile, transposed buffer and transposed image; SIMDized data; DMAs double buffered]
Pass 1: For each buffer
• DMA Get buffer
• Do four 1D FFTs in SIMD
• Transpose tiles
• DMA Put buffer
Pass 2: For each buffer
• DMA Get buffer
• Do four 1D FFTs in SIMD
• Transpose tiles
• DMA Put buffer
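The two-pass structure (1D FFTs along rows, transpose, repeat) can be checked in plain Python. This sketch keeps only the arithmetic: it uses a textbook recursive radix-2 FFT, transposes whole arrays rather than tiles, and does no SIMD or DMA, so each dimension must be a power of two. Two row-FFT-plus-transpose passes yield the 2D DFT in the original orientation.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(a):
    return [list(row) for row in zip(*a)]


def fft_pass(a):
    """One pass: a 1D FFT on every row, then a transpose."""
    return transpose([fft(row) for row in a])


def fft2d(a):
    """Pass 1 transforms the rows; after the transpose, pass 2 transforms the
    original columns, and the second transpose restores the orientation."""
    return fft_pass(fft_pass(a))


if __name__ == "__main__":
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    for row in fft2d(image):
        print([round(v.real, 3) + 1j * round(v.imag, 3) for v in row])
```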
25.
3D FFTs
Long stride thrashes the cache
Cell DMA allows prefetch
[Diagram: N × N × N data set showing a single element, its data envelope, and strides of 1, N and N²]
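Why the long stride hurts: in an N×N×N array stored in row-major order (an assumed layout; the slide does not specify one), consecutive elements along the three axes are 1, N and N² elements apart. The hypothetical `gather_line` below performs the kind of strided gather a DMA list would do, packing a line along any axis into a contiguous local buffer ready for a 1D FFT.

```python
def strides(n):
    """Element strides of a row-major n*n*n array indexed as [x][y][z]."""
    return (n * n, n, 1)


def gather_line(flat, n, start, axis):
    """Copy the n elements along `axis` that pass through `start` = (x, y, z)
    into a contiguous list; the `axis` coordinate of `start` is ignored."""
    s = strides(n)
    base = sum(coord * st for i, (coord, st) in enumerate(zip(start, s)) if i != axis)
    return [flat[base + k * s[axis]] for k in range(n)]


if __name__ == "__main__":
    n = 4
    flat = list(range(n ** 3))
    print(gather_line(flat, n, (0, 1, 2), axis=0))  # stride n*n
    print(gather_line(flat, n, (1, 1, 0), axis=2))  # stride 1
```

With a hardware cache, walking the stride-N² axis can pull in a whole cache line for each element actually used; an explicit gather moves only the needed elements into local store, and can do so ahead of time.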
26.
Feeding Seismic Imaging to Cell
[Diagram: data and Green’s function around the output point (X,Y)]
New G at each (x,y)
Radial symmetry of G reduces BW requirements
$\sum_{i,j} D(x+i,\, y+j)\, G(x, y, i, j)$
27.
Feeding Seismic Imaging to Cell
[Diagram: data partitioned across SPE 0 through SPE 7]
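One plausible reading of this diagram is a block decomposition of the output domain across the eight SPEs. The sketch below assumes a split by output column, with a halo of extra input columns on each side for the convolution window; both the column-wise split and the `partition` helper are hypothetical.

```python
def partition(n_cols, n_spes=8, halo=0):
    """Split n_cols output columns into n_spes contiguous, near-equal blocks.

    Returns one (out_start, out_end, in_start, in_end) tuple per SPE, end-exclusive.
    The input range is widened by `halo` columns on each side (clipped to the
    array) so each SPE holds every data column its convolution window touches.
    """
    base, extra = divmod(n_cols, n_spes)
    parts, start = [], 0
    for s in range(n_spes):
        end = start + base + (1 if s < extra else 0)  # first `extra` SPEs take one more
        parts.append((start, end, max(0, start - halo), min(n_cols, end + halo)))
        start = end
    return parts


if __name__ == "__main__":
    for spe, part in enumerate(partition(1000, n_spes=8, halo=16)):
        print(f"SPE {spe}: {part}")
```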
29.
Feeding Seismic Imaging to Cell
For each X
– Load next column of data
– Load next column of indices
– For each Y
• Load Green’s functions
• SIMDize Green’s functions
• Compute convolution at (X,Y)
– Cycle buffers
[Diagram: data buffer (height H, width 2R+1) and Green’s index buffer around (X,Y), radius R]
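A scalar sketch of this loop, under stated assumptions: the sum is taken as $\sum_{i,j} D(x+i, y+j)\,G(x,y,i,j)$ over a (2R+1)×(2R+1) window; `g_index[x][y]` selects one Green's function from a small library; and radial symmetry is exploited by tabulating each Green's function by squared radius i²+j², so 2R²+1 values per function are stored instead of (2R+1)². All names are hypothetical, and the SIMDization step is omitted.

```python
from collections import deque


def seismic_image(data, g_index, greens, R):
    """Sum D(x+i, y+j) * G(x, y, i, j) over a (2R+1) x (2R+1) window at each
    interior (x, y).

    data[x][y]    : input samples, stored column by column (x = column)
    g_index[x][y] : which Green's function applies at (x, y)
    greens[k][d2] : radially symmetric Green's function k, tabulated by d2 = i*i + j*j
    """
    nx = len(data)
    ny = len(data[0]) if data else 0
    if nx < 2 * R + 1 or ny < 2 * R + 1:
        return []
    window = deque(maxlen=2 * R + 1)  # column buffer that cycles as X advances
    for x in range(2 * R):            # prime the buffer with the first 2R columns
        window.append(data[x])
    image = []
    for x in range(R, nx - R):
        window.append(data[x + R])    # load next column of data; oldest column drops out
        idx_col = g_index[x]          # load next column of indices
        out_col = []
        for y in range(R, ny - R):
            g = greens[idx_col[y]]    # this point's Green's function
            acc = 0.0
            for i in range(-R, R + 1):
                col = window[i + R]   # data column x + i
                for j in range(-R, R + 1):
                    acc += col[y + j] * g[i * i + j * j]
            out_col.append(acc)       # convolution result at (x, y)
        image.append(out_col)
    return image
```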
30.
Feeding String Matching to Cell
Find (lots of) substrings in (long) string
Build graph of words & represent as DFA
Problem: Graph doesn’t fit in LS
Sample Word List: “the”, “that”, “math”
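One standard way to build a graph of words and represent it as a DFA is the Aho-Corasick construction; the slide does not name its algorithm, so treat this as an illustrative substitute. The sketch builds a trie of the sample words, adds failure links breadth-first, and fills in a full transition table, so scanning costs one table lookup per input character. That table is exactly the structure that outgrows the local store once the word list gets large.

```python
from collections import deque


def build_dfa(words):
    """Aho-Corasick: trie + failure links, flattened into a full DFA table.
    Returns (delta, out): delta[state][char] -> next state, out[state] -> words ending there."""
    goto, out = [{}], [set()]
    for w in words:                      # 1. build the trie ("graph of words")
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(w)
    alphabet = {ch for w in words for ch in w}
    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in alphabet:                  # 2. root transitions; depth-1 states fail to the root
        delta[0][ch] = goto[0].get(ch, 0)
        if delta[0][ch]:
            queue.append(delta[0][ch])
    while queue:                         # 3. breadth-first: fill every transition
        s = queue.popleft()
        out[s] |= out[fail[s]]           # inherit words that end in a suffix of this state
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, out


def find_all(text, delta, out):
    """Scan `text` once, one table lookup per character; return (start, word) pairs."""
    s, hits = 0, []
    for pos, ch in enumerate(text):
        s = delta[s].get(ch, 0)          # characters outside the alphabet reset to the root
        for w in sorted(out[s]):
            hits.append((pos - len(w) + 1, w))
    return hits


if __name__ == "__main__":
    delta, out = build_dfa(["the", "that", "math"])
    print(find_all("mathematics: that is the math", delta, out))
```

Note that “the” is found inside “mathematics”: after reading “math”, the failure link to the “th” state lets the next “e” complete a second word without rescanning.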
31.
Feeding String Matching to Cell
32.
Hiding Main Memory Latency
33.
Software Multithreading
34.
Feeding Neural Networks to Cell
Neural net function F(X)
– RBF, MLP, KNN, etc.
If too big for LS, BW bound
[Diagram: input X with D input dimensions feeds N basis functions (dot product + nonlinearity) parameterized by a D×N matrix, producing output F]
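Whether the parameters fit is simple arithmetic. The sketch below assumes 4-byte single-precision parameters, the 256 KB local store of each Cell/B.E. SPE, and a hypothetical 64 KB set aside for code, stack and I/O buffers; the matrix sizes in the example are made up.

```python
LS_BYTES = 256 * 1024  # local store per SPE on Cell/B.E.


def params_bytes(d, n, bytes_per_param=4):
    """Size in bytes of a D x N parameter matrix."""
    return d * n * bytes_per_param


def fits_in_ls(d, n, n_spes=1, reserved=64 * 1024, bytes_per_param=4):
    """True if each SPE's share of the N basis functions fits in its local store,
    after setting aside `reserved` bytes (an assumed budget for code, stack, buffers)."""
    per_spe = -(-n // n_spes)  # ceiling division: the most heavily loaded SPE
    return params_bytes(d, per_spe, bytes_per_param) <= LS_BYTES - reserved


if __name__ == "__main__":
    print(fits_in_ls(256, 1024), fits_in_ls(256, 1024, n_spes=8))
```

For example, with D = 256 and N = 1024 the matrix is 1 MiB and cannot stay resident on one SPE, so it would be re-streamed from main memory for every input (bandwidth bound); split over eight SPEs, each 128 KiB slice fits.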
35.
Convert BW Bound to Compute Bound
Split function over multiple SPEs
– Avoids unnecessary memory traffic
– Reduces compute time per SPE
– Minimal merge overhead
[Diagram: per-SPE partial results combined in a merge step]
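The split can be sketched as follows, assuming (the slide does not specify) a single-output network F(x) = Σ_n w_n tanh(p_n · x), where p_n is column n of the D×N parameter matrix and w_n is an output weight. Each “SPE” evaluates its own slice of basis functions against the same input and returns one partial sum, so the merge is a handful of scalar additions. The helper names are hypothetical.

```python
import math


def basis_outputs(x, P, cols):
    """Basis functions `cols`: dot product of x with column n of the D x N matrix P, then tanh."""
    return [math.tanh(sum(x[d] * P[d][n] for d in range(len(x)))) for n in cols]


def f_full(x, P, w):
    """Unsplit evaluation: F(x) = sum_n w[n] * tanh(P[:, n] . x)."""
    return sum(wn * h for wn, h in zip(w, basis_outputs(x, P, range(len(w)))))


def f_split(x, P, w, n_spes):
    """Give each SPE a contiguous slice of basis functions; merge the partial sums."""
    n = len(w)
    slices = [range(s * n // n_spes, (s + 1) * n // n_spes) for s in range(n_spes)]
    partials = [sum(w[k] * h for k, h in zip(sl, basis_outputs(x, P, sl))) for sl in slices]
    return sum(partials)  # merge: one scalar per SPE


if __name__ == "__main__":
    P = [[0.1 * (d + 1) * (n - 3) for n in range(8)] for d in range(3)]
    x, w = [0.5, -1.0, 2.0], [1.0] * 8
    print(f_full(x, P, w), f_split(x, P, w, n_spes=8))
```

In the real setting each SPE would keep only its slice of the parameter matrix resident in local store, which is what turns the bandwidth-bound problem into a compute-bound one.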
36.
Moral of the Story: It’s All About the Data!
The data problem is growing: multicore
Intelligent software prefetching
– Use DMA engines
– Don’t rely on HW prefetching
Efficient data management
– Multibuffering: Hide the latency!
– BW utilization: Make every byte count!
– SIMDization: Make every vector count!
– Problem/data partitioning: Make every core work!
– Software multithreading: Keep every core busy!
37.
Backup
38.
Abstract
Technological obstacles have prevented the microprocessor industry from achieving increased performance through increased chip clock speeds. In reaction to these restrictions, the industry has chosen the multicore processor path. Multicore processors promise tremendous GFLOPS performance but raise the challenge of how to program them. In this talk, I will discuss the motivation for multicore, the implications for programmers, and how the Cell/B.E. processor’s design addresses these challenges. As an example, I will review one or two applications that highlight the strengths of Cell.