SlideShare a Scribd company logo
EE382N-20: Intel Xeon Phi
Mattan Erez
The University of Texas at Austin
Intel’s version of throughput computing
• Similar goals as GPUs
• Ignore latency
• Focus on parallelism and efficiency
• But, start with traditional x86 cores
– More general purpose
– Reasonable caches
• Add more parallelism and more latency hiding
Knights Corner
• Pentium (P54) scalar cores
– Efficient small core, but not best performing
• 61 cores (22nm process)
• 512KB shared L2 per core
Knights Landing
• Upgrade to Atom cores
• More cores (72) (14nm process)
• On-package “near” DRAM
• Socket rather than PCIe
– Can be “self hosted”
Parallelism
• Forget OOO, explicit parallelism
• 512-bit vector units in each core
• “many integrated cores” (MIC)
– Currently, 61 cores and more soon
• A lot of the innovations are in the SIMD units
– New instructions
– Better arithmetic design
– Scatter/gather (to cache)
– Help with mask generation
– Better permutations
Latency hiding
• 4 HW threads per core
• Many cores
• Big-ish caches
Simplicity first
• Ring interconnect
• Extended MESI (kind of like MOESI, but not quite)
– MESI + GOLS (globally owned locally shared)
• Simple global Tag Directory
• No real multi-threading support, just coherence
– Currently, synchronization is painful
– ~250 cycles to get through tag directory to find stuff
Programming
• Currently, accelerator as off-load mode
• Program with OpenMP/OpenACC or MPI for the most
part
• Can share virtual addresses with host and compiler can
insert copies (at offload boundaries)
• Still a bit rough around the edges
– Performance bugs
– Memory leaks
– Inter-node comm not ideal
• Significant investment though, will get better
xeon phi_mattan erez.pptx

More Related Content

Similar to xeon phi_mattan erez.pptx

In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningRenaldas Zioma
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersSangjin Han
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataHitoshi Sato
 
Amd Barcelona Presentation Slideshare
Amd Barcelona Presentation SlideshareAmd Barcelona Presentation Slideshare
Amd Barcelona Presentation SlideshareDon Scansen
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiAnkit Raj
 
SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4UniFabric
 
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdf
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdffinaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdf
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdfNazarAhmadAlkhidir
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computingrinnocente
 
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_heRecon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_heLiang Chen
 

Similar to xeon phi_mattan erez.pptx (20)

In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine Learning
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacenters
 
Lec04 gpu architecture
Lec04 gpu architectureLec04 gpu architecture
Lec04 gpu architecture
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
Amd Barcelona Presentation Slideshare
Amd Barcelona Presentation SlideshareAmd Barcelona Presentation Slideshare
Amd Barcelona Presentation Slideshare
 
27 multicore
27 multicore27 multicore
27 multicore
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
 
SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4
 
Hpc 4 5
Hpc 4 5Hpc 4 5
Hpc 4 5
 
The Motherboard
The MotherboardThe Motherboard
The Motherboard
 
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdf
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdffinaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdf
finaldraft-intelcorei5processorsarchitecture-130207093535-phpapp01.pdf
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computing
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_heRecon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 

More from ssuser30e7d2

PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxssuser30e7d2
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxssuser30e7d2
 
xeon phi_mattan erez.pptx
xeon phi_mattan erez.pptxxeon phi_mattan erez.pptx
xeon phi_mattan erez.pptxssuser30e7d2
 
FOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdfFOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdfssuser30e7d2
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfssuser30e7d2
 
shift register.ppt
shift register.pptshift register.ppt
shift register.pptssuser30e7d2
 

More from ssuser30e7d2 (7)

PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 
xeon phi_mattan erez.pptx
xeon phi_mattan erez.pptxxeon phi_mattan erez.pptx
xeon phi_mattan erez.pptx
 
FOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdfFOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdf
 
Project3.ppt
Project3.pptProject3.ppt
Project3.ppt
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdf
 
shift register.ppt
shift register.pptshift register.ppt
shift register.ppt
 

Recently uploaded

İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopEmre Günaydın
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationRobbie Edward Sayers
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfKamal Acharya
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringDr. Radhey Shyam
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerapareshmondalnita
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfKamal Acharya
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageRCC Institute of Information Technology
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industriesMuhammadTufail242431
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringC Sai Kiran
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfAyahmorsy
 
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxCenterEnamel
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxMd. Shahidul Islam Prodhan
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectRased Khan
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdfKamal Acharya
 
fundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionfundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionjeevanprasad8
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdfAhmedHussein950959
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwoodseandesed
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Krakówbim.edu.pl
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdfKamal Acharya
 

Recently uploaded (20)

İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answer
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdf
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltage
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
fundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionfundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projection
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 

xeon phi_mattan erez.pptx

  • 1. EE382N-20: Intel Xeon Phi Mattan Erez The University of Texas at Austin
  • 2. Intel’s version of throughput computing • Similar goals as GPUs • Ignore latency • Focus on parallelism and efficiency • But, start with traditional x86 cores – More general purpose – Reasonable caches • Add more parallelism and more latency hiding
  • 3. Knights Corner • Pentium (P54) scalar cores – Efficient small core, but not best performing • 61 cores (22nm process) • 512KB shared L2 per core
  • 4. Knights Landing • Upgrade to Atom cores • More cores (72) (14nm process) • On-package “near” DRAM • Socket rather than PCIe – Can be “self hosted”
  • 5. Parallelism • Forget OOO, explicit parallelism • 512-bit vector units in each core • “many integrated cores” (MIC) – Currently, 61 cores and more soon • A lot of the innovations are in the SIMD units – New instructions – Better arithmetic design – Scatter/gather (to cache) – Help with mask generation – Better permutations
  • 6. Latency hiding • 4 HW threads per core • Many cores • Big-ish caches
  • 7. Simplicity first • Ring interconnect • Extended MESI (kind of like MOESI, but not quite) – MESI + GOLS (globally owned locally shared) • Simple global Tag Directory • No real multi-threading support, just coherence – Currently, synchronization is painful – ~250 cycles to get through tag directory to find stuff
  • 8. Programming • Currently, accelerator as off-load mode • Program with OpenMP/OpenACC or MPI for the most part • Can share virtual addresses with host and compiler can insert copies (at offload boundaries) • Still a bit rough around the edges – Performance bugs – Memory leaks – Inter-node comm not ideal • Significant investment though, will get better