SlideShare a Scribd company logo
Multicore Computers
Hardware Performance Issues
• Microprocessors have seen an exponential increase in performance
• Improved organization
• Increased clock frequency
• Increase in Parallelism
• Pipelining
• Superscalar
• Simultaneous multithreading (SMT)
• Diminishing returns
• More complexity requires more logic
• Increasing chip area for coordinating and signal transfer logic
• Harder to design, make and debug
Alternative Chip
Organizations
Intel Hardware
Trends
Increased Complexity
• Power requirements grow exponentially with chip density
and clock frequency
• Can use more chip area for cache
• Smaller
• Order of magnitude lower power requirements
• By 2015
• 100 billion transistors on 300mm2
die
• Cache of 100MB
• 1 billion transistors for logic
• Pollack’s rule:
• Performance is roughly proportional to square root of increase in
complexity
• Double complexity gives 40% more performance
• Multicore has potential for near-linear improvement
• Unlikely that one core can use all cache effectively
Power and Memory Considerations
Chip Utilization of Transistors
Software Performance Issues
• Performance benefits dependent on effective exploitation of parallel
resources
• Even small amounts of serial code impact performance
• 10% inherently serial on 8 processor system gives only 4.7 times
performance
• Communication, distribution of work and cache coherence
overheads
• Some applications effectively exploit multicore processors
Effective Applications for Multicore Processors
• Database
• Servers handling independent transactions
• Multi-threaded native applications
• Lotus Domino, Siebel CRM
• Multi-process applications
• Oracle, SAP, PeopleSoft
• Java applications
• Java VM is multi-thread with scheduling and memory management
• Sun’s Java Application Server, BEA’s Weblogic, IBM Websphere, Tomcat
• Multi-instance applications
• One application running multiple times
• E.g. Value Game Software
Multicore Organization• Number of core processors on chip
• Number of levels of cache on chip
• Amount of shared cache
• Next slide examples of each organization:
• (a) ARM11 MPCore
• (b) AMD Opteron
• (c) Intel Core Duo
• (d) Intel Core i7
Multicore Organization Alternatives
Advantages of shared L2 Cache
• Constructive interference reduces overall miss rate
• Data shared by multiple cores not replicated at cache level
• With proper frame replacement algorithms mean amount of
shared cache dedicated to each core is dynamic
• Threads with less locality can have more cache
• Easy inter-process communication through shared memory
• Cache coherency confined to L1
• Dedicated L2 cache gives each core more rapid access
• Good for threads with strong locality
• Shared L3 cache may also improve performance
Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multi-threading (SMT)
• Scales up number of threads supported
• 4 SMT cores, each supporting 4 threads appears as 16 core
Intel x86 Multicore Organization -
Core Duo (1)
• 2006
• Two x86 superscalar, shared L2 cache
• Dedicated L1 cache per core
• 32KB instruction and 32KB data
• Thermal control unit per core
• Manages chip heat dissipation
• Maximize performance within constraints
• Improved ergonomics
• Advanced Programmable Interrupt Controlled (APIC)
• Inter-process interrupts between cores
• Routes interrupts to appropriate core
• Includes timer so OS can interrupt core
Intel x86 Multicore Organization -
Core Duo (2)
• Power Management Logic
• Monitors thermal conditions and CPU activity
• Adjusts voltage and power consumption
• Can switch individual logic subsystems
• 2MB shared L2 cache
• Dynamic allocation
• MESI support for L1 caches
• Extended to support multiple Core Duo in SMP
• L2 data shared between local cores or external
• Bus interface
Intel x86 Multicore Organization -
Core i7
• November 2008
• Four x86 SMT processors
• Dedicated L2, shared L3 cache
• Speculative pre-fetch for caches
• On chip DDR3 memory controller
• Three 8 byte channels (192 bits) giving 32GB/s
• No front side bus
• QuickPath Interconnection
• Cache coherent point-to-point link
• High speed communications between processor chips
• 6.4G transfers per second, 16 bits per transfer
• Dedicated bi-directional pairs
• Total bandwidth 25.6GB/s
ARM11 MPCore
• Up to 4 processors each with own L1 instruction and data cache
• Distributed interrupt controller
• Timer per CPU
• Watchdog
• Warning alerts for software failures
• Counts down from predetermined values
• Issues warning at zero
• CPU interface
• Interrupt acknowledgement, masking and completion acknowledgement
• CPU
• Single ARM11 called MP11
• Vector floating-point unit
• FP co-processor
• L1 cache
• Snoop control unit
• L1 cache coherency
ARM11
MPCore
Block
Diagram
ARM11 MPCore Interrupt Handling
• Distributed Interrupt Controller (DIC) collates from many sources
• Masking
• Prioritization
• Distribution to target MP11 CPUs
• Status tracking
• Software interrupt generation
• Number of interrupts independent of MP11 CPU design
• Memory mapped
• Accessed by CPUs via private interface through SCU
• Can route interrupts to single or multiple CPUs
• Provides inter-process communication
• Thread on one CPU can cause activity by thread on another CPU
DIC Routing
• Direct to specific CPU
• To defined group of CPUs
• To all CPUs
• OS can generate interrupt to:
• All but self
• Self
• Other specific CPU
• Typically combined with shared memory for inter-process
communication
• 16 interrupt ids available for inter-process communication
Interrupt States
• Inactive
• Non-asserted
• Completed by that CPU but pending or active in others
• Pending
• Asserted
• Processing not started on that CPU
• Active
• Started on that CPU but not complete
• Can be pre-empted by higher priority interrupt
Interrupt Sources
• Inter-process Interrupts (IPI)
• Private to CPU
• ID0-ID15
• Software triggered
• Priority depends on target CPU not source
• Private timer and/or watchdog interrupt
• ID29 and ID30
• Legacy FIQ line
• Legacy FIQ pin, per CPU, bypasses interrupt distributor
• Directly drives interrupts to CPU
• Hardware
• Triggered by programmable events on associated interrupt lines
• Up to 224 lines
• Start at ID32
ARM11 MPCore Interrupt Distributor
Cache Coherency
• Snoop Control Unit (SCU) resolves most shared data bottleneck
issues
• L1 cache coherency based on MESI
• Direct data Intervention
• Copying clean entries between L1 caches without accessing external
memory
• Reduces read after write from L1 to L2
• Can resolve local L1 miss from rmote L1 rather than L2
• Duplicated tag RAMs
• Cache tags implemented as separate block of RAM
• Same length as number of lines in cache
• Duplicates used by SCU to check data availability before sending
coherency commands
• Only send to CPUs that must update coherent data cache
• Migratory lines
• Allows moving dirty data between CPUs without writing to L2 and
reading back from external memory
Recommended Reading
• Stallings chapter 18
• ARM web site
Intel Core i& Block Diagram
Intel Core Duo Block Diagram
Performance Effect of Multiple Cores
Recommended Reading
• Multicore Association web site
• ARM web site

More Related Content

What's hot

Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecturePiyush Mittal
 
Parallel computing
Parallel computingParallel computing
Parallel computing
Vinay Gupta
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
Smruti Sarangi
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
Dr Shashikant Athawale
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
Burhan Ahmed
 
Intel Core i7 Processors
Intel Core i7 ProcessorsIntel Core i7 Processors
Intel Core i7 Processors
Anagh Vijayvargia
 
Computer architecture cache memory
Computer architecture cache memoryComputer architecture cache memory
Computer architecture cache memory
Mazin Alwaaly
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
Ummiya Mohammedi
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architecturesnextlib
 
Architecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPUArchitecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPU
GlobalLogic Ukraine
 
Multiprocessor architecture
Multiprocessor architectureMultiprocessor architecture
Multiprocessor architecture
Arpan Baishya
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
VIKAS SINGH BHADOURIA
 
Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)
Sudip Roy
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantages
Nitesh Tudu
 
AMD Processor
AMD ProcessorAMD Processor
AMD Processor
Reber Novanta
 
Intel core i5
Intel core i5Intel core i5
Intel core i5
Abdul-Fattah Mahran
 
Single &Multi Core processor
Single &Multi Core processorSingle &Multi Core processor
Single &Multi Core processor
Justify Shadap
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 
Parallel computing in india
Parallel computing in indiaParallel computing in india
Parallel computing in india
Preeti Chauhan
 

What's hot (20)

Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecture
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
 
Intel Core i7 Processors
Intel Core i7 ProcessorsIntel Core i7 Processors
Intel Core i7 Processors
 
NUMA overview
NUMA overviewNUMA overview
NUMA overview
 
Computer architecture cache memory
Computer architecture cache memoryComputer architecture cache memory
Computer architecture cache memory
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Architecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPUArchitecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPU
 
Multiprocessor architecture
Multiprocessor architectureMultiprocessor architecture
Multiprocessor architecture
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantages
 
AMD Processor
AMD ProcessorAMD Processor
AMD Processor
 
Intel core i5
Intel core i5Intel core i5
Intel core i5
 
Single &Multi Core processor
Single &Multi Core processorSingle &Multi Core processor
Single &Multi Core processor
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
Parallel computing in india
Parallel computing in indiaParallel computing in india
Parallel computing in india
 

Similar to Multicore computers

Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
Amirali Sharifian
 
Architecture of high end processors
Architecture of high end processorsArchitecture of high end processors
Architecture of high end processors
University of Gujrat, Pakistan
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
shinolajla
 
trends of microprocessor field
trends of microprocessor fieldtrends of microprocessor field
trends of microprocessor field
Ramya SK
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdf
aliamjd
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
Young Alista
 
Challenges in Embedded Computing
Challenges in Embedded ComputingChallenges in Embedded Computing
Challenges in Embedded Computing
Pradeep Kumar TS
 
Computer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary MemoryComputer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary Memory
Budditha Hettige
 
Central Processing Unit
Central Processing Unit Central Processing Unit
Central Processing Unit
Alaka Acharya
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and Microcontroller
AmrutaMehata
 
Exaflop In 2018 Hardware
Exaflop In 2018   HardwareExaflop In 2018   Hardware
Exaflop In 2018 HardwareJacob Wu
 
Microprocessor
MicroprocessorMicroprocessor
Microprocessor
Anand Tiwari
 
Sparc t3 2 technical presentation
Sparc t3 2 technical presentationSparc t3 2 technical presentation
Sparc t3 2 technical presentation
solarisyougood
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
Siddhartha Anand
 
Computer Evolution.ppt
Computer Evolution.pptComputer Evolution.ppt
Computer Evolution.ppt
VivekTrial
 
MK Sistem Operasi.pdf
MK Sistem Operasi.pdfMK Sistem Operasi.pdf
MK Sistem Operasi.pdf
wisard1
 
Microprocessor.ppt
Microprocessor.pptMicroprocessor.ppt
Microprocessor.pptsafia kalwar
 
Chapter4 Data Processing
Chapter4 Data ProcessingChapter4 Data Processing
Chapter4 Data Processing
Muhammad Waqas
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptx
RadhaC10
 
Sparc t4 2 system technical overview
Sparc t4 2 system technical overviewSparc t4 2 system technical overview
Sparc t4 2 system technical overview
solarisyougood
 

Similar to Multicore computers (20)

Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 
Architecture of high end processors
Architecture of high end processorsArchitecture of high end processors
Architecture of high end processors
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
trends of microprocessor field
trends of microprocessor fieldtrends of microprocessor field
trends of microprocessor field
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdf
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
Challenges in Embedded Computing
Challenges in Embedded ComputingChallenges in Embedded Computing
Challenges in Embedded Computing
 
Computer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary MemoryComputer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary Memory
 
Central Processing Unit
Central Processing Unit Central Processing Unit
Central Processing Unit
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and Microcontroller
 
Exaflop In 2018 Hardware
Exaflop In 2018   HardwareExaflop In 2018   Hardware
Exaflop In 2018 Hardware
 
Microprocessor
MicroprocessorMicroprocessor
Microprocessor
 
Sparc t3 2 technical presentation
Sparc t3 2 technical presentationSparc t3 2 technical presentation
Sparc t3 2 technical presentation
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Computer Evolution.ppt
Computer Evolution.pptComputer Evolution.ppt
Computer Evolution.ppt
 
MK Sistem Operasi.pdf
MK Sistem Operasi.pdfMK Sistem Operasi.pdf
MK Sistem Operasi.pdf
 
Microprocessor.ppt
Microprocessor.pptMicroprocessor.ppt
Microprocessor.ppt
 
Chapter4 Data Processing
Chapter4 Data ProcessingChapter4 Data Processing
Chapter4 Data Processing
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptx
 
Sparc t4 2 system technical overview
Sparc t4 2 system technical overviewSparc t4 2 system technical overview
Sparc t4 2 system technical overview
 

More from Syed Zaid Irshad

Operating System.pdf
Operating System.pdfOperating System.pdf
Operating System.pdf
Syed Zaid Irshad
 
DBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_SolutionDBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_Solution
Syed Zaid Irshad
 
Data Structure and Algorithms.pptx
Data Structure and Algorithms.pptxData Structure and Algorithms.pptx
Data Structure and Algorithms.pptx
Syed Zaid Irshad
 
Design and Analysis of Algorithms.pptx
Design and Analysis of Algorithms.pptxDesign and Analysis of Algorithms.pptx
Design and Analysis of Algorithms.pptx
Syed Zaid Irshad
 
Professional Issues in Computing
Professional Issues in ComputingProfessional Issues in Computing
Professional Issues in Computing
Syed Zaid Irshad
 
Reduce course notes class xi
Reduce course notes class xiReduce course notes class xi
Reduce course notes class xi
Syed Zaid Irshad
 
Reduce course notes class xii
Reduce course notes class xiiReduce course notes class xii
Reduce course notes class xii
Syed Zaid Irshad
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
Syed Zaid Irshad
 
C Language
C LanguageC Language
C Language
Syed Zaid Irshad
 
Flowchart
FlowchartFlowchart
Flowchart
Syed Zaid Irshad
 
Algorithm Pseudo
Algorithm PseudoAlgorithm Pseudo
Algorithm Pseudo
Syed Zaid Irshad
 
Computer Programming
Computer ProgrammingComputer Programming
Computer Programming
Syed Zaid Irshad
 
ICS 2nd Year Book Introduction
ICS 2nd Year Book IntroductionICS 2nd Year Book Introduction
ICS 2nd Year Book Introduction
Syed Zaid Irshad
 
Security, Copyright and the Law
Security, Copyright and the LawSecurity, Copyright and the Law
Security, Copyright and the Law
Syed Zaid Irshad
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer Architecture
Syed Zaid Irshad
 
Data Communication
Data CommunicationData Communication
Data Communication
Syed Zaid Irshad
 
Information Networks
Information NetworksInformation Networks
Information Networks
Syed Zaid Irshad
 
Basic Concept of Information Technology
Basic Concept of Information TechnologyBasic Concept of Information Technology
Basic Concept of Information Technology
Syed Zaid Irshad
 
Introduction to ICS 1st Year Book
Introduction to ICS 1st Year BookIntroduction to ICS 1st Year Book
Introduction to ICS 1st Year Book
Syed Zaid Irshad
 
Using the set operators
Using the set operatorsUsing the set operators
Using the set operators
Syed Zaid Irshad
 

More from Syed Zaid Irshad (20)

Operating System.pdf
Operating System.pdfOperating System.pdf
Operating System.pdf
 
DBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_SolutionDBMS_Lab_Manual_&_Solution
DBMS_Lab_Manual_&_Solution
 
Data Structure and Algorithms.pptx
Data Structure and Algorithms.pptxData Structure and Algorithms.pptx
Data Structure and Algorithms.pptx
 
Design and Analysis of Algorithms.pptx
Design and Analysis of Algorithms.pptxDesign and Analysis of Algorithms.pptx
Design and Analysis of Algorithms.pptx
 
Professional Issues in Computing
Professional Issues in ComputingProfessional Issues in Computing
Professional Issues in Computing
 
Reduce course notes class xi
Reduce course notes class xiReduce course notes class xi
Reduce course notes class xi
 
Reduce course notes class xii
Reduce course notes class xiiReduce course notes class xii
Reduce course notes class xii
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
C Language
C LanguageC Language
C Language
 
Flowchart
FlowchartFlowchart
Flowchart
 
Algorithm Pseudo
Algorithm PseudoAlgorithm Pseudo
Algorithm Pseudo
 
Computer Programming
Computer ProgrammingComputer Programming
Computer Programming
 
ICS 2nd Year Book Introduction
ICS 2nd Year Book IntroductionICS 2nd Year Book Introduction
ICS 2nd Year Book Introduction
 
Security, Copyright and the Law
Security, Copyright and the LawSecurity, Copyright and the Law
Security, Copyright and the Law
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer Architecture
 
Data Communication
Data CommunicationData Communication
Data Communication
 
Information Networks
Information NetworksInformation Networks
Information Networks
 
Basic Concept of Information Technology
Basic Concept of Information TechnologyBasic Concept of Information Technology
Basic Concept of Information Technology
 
Introduction to ICS 1st Year Book
Introduction to ICS 1st Year BookIntroduction to ICS 1st Year Book
Introduction to ICS 1st Year Book
 
Using the set operators
Using the set operatorsUsing the set operators
Using the set operators
 

Recently uploaded

Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 

Recently uploaded (20)

Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 

Multicore computers

  • 2. Hardware Performance Issues • Microprocessors have seen an exponential increase in performance • Improved organization • Increased clock frequency • Increase in Parallelism • Pipelining • Superscalar • Simultaneous multithreading (SMT) • Diminishing returns • More complexity requires more logic • Increasing chip area for coordinating and signal transfer logic • Harder to design, make and debug
  • 5. Increased Complexity • Power requirements grow exponentially with chip density and clock frequency • Can use more chip area for cache • Smaller • Order of magnitude lower power requirements • By 2015 • 100 billion transistors on 300mm2 die • Cache of 100MB • 1 billion transistors for logic • Pollack’s rule: • Performance is roughly proportional to square root of increase in complexity • Double complexity gives 40% more performance • Multicore has potential for near-linear improvement • Unlikely that one core can use all cache effectively
  • 6. Power and Memory Considerations
  • 7. Chip Utilization of Transistors
  • 8. Software Performance Issues • Performance benefits dependent on effective exploitation of parallel resources • Even small amounts of serial code impact performance • 10% inherently serial on 8 processor system gives only 4.7 times performance • Communication, distribution of work and cache coherence overheads • Some applications effectively exploit multicore processors
  • 9. Effective Applications for Multicore Processors • Database • Servers handling independent transactions • Multi-threaded native applications • Lotus Domino, Siebel CRM • Multi-process applications • Oracle, SAP, PeopleSoft • Java applications • Java VM is multi-thread with scheduling and memory management • Sun’s Java Application Server, BEA’s Weblogic, IBM Websphere, Tomcat • Multi-instance applications • One application running multiple times • E.g. Value Game Software
  • 10. Multicore Organization• Number of core processors on chip • Number of levels of cache on chip • Amount of shared cache • Next slide examples of each organization: • (a) ARM11 MPCore • (b) AMD Opteron • (c) Intel Core Duo • (d) Intel Core i7
  • 12. Advantages of shared L2 Cache • Constructive interference reduces overall miss rate • Data shared by multiple cores not replicated at cache level • With proper frame replacement algorithms mean amount of shared cache dedicated to each core is dynamic • Threads with less locality can have more cache • Easy inter-process communication through shared memory • Cache coherency confined to L1 • Dedicated L2 cache gives each core more rapid access • Good for threads with strong locality • Shared L3 cache may also improve performance
  • 13. Individual Core Architecture • Intel Core Duo uses superscalar cores • Intel Core i7 uses simultaneous multi-threading (SMT) • Scales up number of threads supported • 4 SMT cores, each supporting 4 threads appears as 16 core
  • 14. Intel x86 Multicore Organization - Core Duo (1) • 2006 • Two x86 superscalar, shared L2 cache • Dedicated L1 cache per core • 32KB instruction and 32KB data • Thermal control unit per core • Manages chip heat dissipation • Maximize performance within constraints • Improved ergonomics • Advanced Programmable Interrupt Controlled (APIC) • Inter-process interrupts between cores • Routes interrupts to appropriate core • Includes timer so OS can interrupt core
  • 15. Intel x86 Multicore Organization - Core Duo (2) • Power Management Logic • Monitors thermal conditions and CPU activity • Adjusts voltage and power consumption • Can switch individual logic subsystems • 2MB shared L2 cache • Dynamic allocation • MESI support for L1 caches • Extended to support multiple Core Duo in SMP • L2 data shared between local cores or external • Bus interface
  • 16. Intel x86 Multicore Organization - Core i7 • November 2008 • Four x86 SMT processors • Dedicated L2, shared L3 cache • Speculative pre-fetch for caches • On chip DDR3 memory controller • Three 8 byte channels (192 bits) giving 32GB/s • No front side bus • QuickPath Interconnection • Cache coherent point-to-point link • High speed communications between processor chips • 6.4G transfers per second, 16 bits per transfer • Dedicated bi-directional pairs • Total bandwidth 25.6GB/s
  • 17. ARM11 MPCore • Up to 4 processors each with own L1 instruction and data cache • Distributed interrupt controller • Timer per CPU • Watchdog • Warning alerts for software failures • Counts down from predetermined values • Issues warning at zero • CPU interface • Interrupt acknowledgement, masking and completion acknowledgement • CPU • Single ARM11 called MP11 • Vector floating-point unit • FP co-processor • L1 cache • Snoop control unit • L1 cache coherency
  • 19. ARM11 MPCore Interrupt Handling • Distributed Interrupt Controller (DIC) collates from many sources • Masking • Prioritization • Distribution to target MP11 CPUs • Status tracking • Software interrupt generation • Number of interrupts independent of MP11 CPU design • Memory mapped • Accessed by CPUs via private interface through SCU • Can route interrupts to single or multiple CPUs • Provides inter-process communication • Thread on one CPU can cause activity by thread on another CPU
  • 20. DIC Routing • Direct to specific CPU • To defined group of CPUs • To all CPUs • OS can generate interrupt to: • All but self • Self • Other specific CPU • Typically combined with shared memory for inter-process communication • 16 interrupt ids available for inter-process communication
  • 21. Interrupt States • Inactive • Non-asserted • Completed by that CPU but pending or active in others • Pending • Asserted • Processing not started on that CPU • Active • Started on that CPU but not complete • Can be pre-empted by higher priority interrupt
  • 22. Interrupt Sources • Inter-process Interrupts (IPI) • Private to CPU • ID0-ID15 • Software triggered • Priority depends on target CPU not source • Private timer and/or watchdog interrupt • ID29 and ID30 • Legacy FIQ line • Legacy FIQ pin, per CPU, bypasses interrupt distributor • Directly drives interrupts to CPU • Hardware • Triggered by programmable events on associated interrupt lines • Up to 224 lines • Start at ID32
  • 23. ARM11 MPCore Interrupt Distributor
  • 24. Cache Coherency • Snoop Control Unit (SCU) resolves most shared data bottleneck issues • L1 cache coherency based on MESI • Direct data Intervention • Copying clean entries between L1 caches without accessing external memory • Reduces read after write from L1 to L2 • Can resolve local L1 miss from rmote L1 rather than L2 • Duplicated tag RAMs • Cache tags implemented as separate block of RAM • Same length as number of lines in cache • Duplicates used by SCU to check data availability before sending coherency commands • Only send to CPUs that must update coherent data cache • Migratory lines • Allows moving dirty data between CPUs without writing to L2 and reading back from external memory
  • 25. Recommended Reading • Stallings chapter 18 • ARM web site
  • 26. Intel Core i& Block Diagram
  • 27. Intel Core Duo Block Diagram
  • 28. Performance Effect of Multiple Cores
  • 29. Recommended Reading • Multicore Association web site • ARM web site