Trends of SW Platforms for Heterogeneous Multi-core systems
and
Open Source Community Activities
Seung-hwa Song
Trends of SW technologies for heterogeneous multi-core systems
Research and efforts to overcome issues with heterogeneous systems
Open source community activities
Why has the Multi-core Era Arrived?
Limitations of Moore’s Law (performance issue)
Performance-oriented design is no longer the main market need.
Performance per watt is the main requirement (energy issue).
Lower power consumption in computers is becoming more and more important.
Why has the Multi-core Era Arrived?
Performance per watt and power consumption of three kinds of processors
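The performance-per-watt argument can be made concrete with the standard approximation for CMOS dynamic power, P = C * V^2 * f. The sketch below is illustrative only: the capacitance, voltages, and frequencies are hypothetical numbers, not measurements from the slides, and throughput is idealized as cores times frequency for a perfectly parallel workload.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Approximate CMOS dynamic power of one core: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def perf_per_watt(cores, capacitance, voltage, frequency):
    """Throughput per watt, assuming a perfectly parallel workload.

    Throughput is modeled as cores * frequency, which is an idealization.
    """
    throughput = cores * frequency
    power = cores * dynamic_power(capacitance, voltage, frequency)
    return throughput / power


# Hypothetical values, chosen only for illustration.
C = 1e-9  # effective switched capacitance in farads

# One fast core versus two slower cores running at a lower voltage.
single = perf_per_watt(cores=1, capacitance=C, voltage=1.2, frequency=2.0e9)
dual = perf_per_watt(cores=2, capacitance=C, voltage=0.9, frequency=1.0e9)

print(f"dual-core / single-core performance per watt: {dual / single:.2f}")
```

Under these assumptions both configurations deliver the same idealized throughput, but the two slower cores can run at a lower voltage, so they draw less power for the same work.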
Classification of Multiprocessors
SMP (Symmetric Multiprocessor)
- Homogeneous architecture
ASMP (Asymmetric Multiprocessor, also abbreviated AMP)
- Heterogeneous architecture
SMP (homogeneous architecture)
- Each processor can access a common memory map
- Any task can be allocated to any given processor
ASMP system (heterogeneous architecture)
- Each processor has a different structure, specialized for its purpose.
- Each system consists of a master processor and slave processors. The
slave processors communicate with the master processor.
- Some slave processors cannot access the common memory map, and
some are assigned only a single specialized role.
- Programmers must understand the unique tasks of each processor
and design an efficient communication mechanism.
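The master/slave arrangement described above can be sketched with message queues. This is a minimal model, not a description of any real ASMP chip: threads stand in for slave processors, and the names `slave`, `run_master`, and the "dsp" and "codec" job kinds are hypothetical.

```python
import queue
import threading


def slave(handler, inbox, outbox):
    """A slave processor: runs its one specialized handler on each message."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel from the master: shut down
            break
        job_id, payload = message
        outbox.put((job_id, handler(payload)))


def run_master(jobs, handlers):
    """Master: route each (kind, payload) job to the matching slave.

    `handlers` maps a job kind to the function its slave runs.
    Results are returned in the original job order.
    """
    outbox = queue.Queue()
    inboxes = {kind: queue.Queue() for kind in handlers}
    threads = [
        threading.Thread(target=slave, args=(handlers[kind], inboxes[kind], outbox))
        for kind in handlers
    ]
    for t in threads:
        t.start()
    # The master is the only party that talks to every slave.
    for job_id, (kind, payload) in enumerate(jobs):
        inboxes[kind].put((job_id, payload))
    for inbox in inboxes.values():
        inbox.put(None)
    for t in threads:
        t.join()
    results = [None] * len(jobs)
    while not outbox.empty():
        job_id, value = outbox.get()
        results[job_id] = value
    return results


# Hypothetical specialized slaves: a "dsp" that squares numbers and a
# "codec" that upper-cases text.
demo_handlers = {"dsp": lambda x: x * x, "codec": str.upper}
demo_results = run_master([("dsp", 3), ("codec", "ab"), ("dsp", 4)], demo_handlers)
print(demo_results)
```

Each slave sees only its own inbox, which mirrors the point that a slave may have a single role and no view of the rest of the system.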
Comparisons between Homogeneous and Heterogeneous Computing
Homogeneous computing
- Symmetric, same cores (usually CPUs)
- An operation is guaranteed to behave the same on every core
- Easy to offload tasks
- Good compatibility
Heterogeneous computing
- Asymmetric, different cores (CPUs, GPUs, DSPs and accelerators)
- An operation cannot be assumed to behave the same on every core
- More complicated to offload tasks
- Less compatibility
- Specialized for specific tasks
Overview of MPSoC solutions
Lucent Daytona (2000)
- First MPSoC
- Application: wireless communication router
- Symmetric
- A common memory map
C-5 Network Processor (2001)
- Application: network packet processor
- Asymmetric
Overview of MPSoC solutions
Texas Instruments OMAP architecture (2004)
- Application: cell phone processor
- ARM9 (master) and TMS320C55x DSP (slave)
- Asymmetric
Texas Instruments DaVinci
- Application: multimedia processor
- ARM Cortex-A8, ARM M3, DSP, codec accelerator
ARM MPCore
- Main applications (networking, file I/O, UI)
- Controls slave cores
Video codec accelerators
- Video compression
Data bus
DSP
- Image processing
Overview of MPSoC solutions
Today, most embedded system processors are heterogeneous.
Even though an ASMP design is specialized for the designer’s goal, higher
performance is always in demand.
Recent MPSoC architectures integrate both SMP and ASMP structures.
Overview of MPSoC solutions
AMD’s APU (Accelerated Processing Unit) Llano (2011)
- Among the first CPU-GPU fused processors
Intel’s Sandy Bridge processor
- CPU-GPU fused processing unit
Overview of MPSoC solutions
Various mobile application processors
Software Issues with Heterogeneous Systems
Offloading
- Task offloading is a main goal of a multi-core system
- In a heterogeneous system, task offloading is not easy
Data sharing
- The overhead of transferring data over the memory bus is an important issue
- The results from each processing unit must be integrated
- The number of memory copies should be minimized
Programmability
- Software development productivity is important
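The trade-off between offloading and data-transfer overhead can be sketched as a simple cost model. This is an assumption-laden illustration, not a method from the slides: the function names, the single-bandwidth transfer model, and the example numbers are all hypothetical.

```python
def offload_time(cpu_time_s, speedup, bytes_moved, bandwidth_bytes_per_s,
                 launch_overhead_s=0.0):
    """Estimated time when a task is offloaded to an accelerator.

    Total = fixed launch cost + data transfer + accelerated compute.
    `bytes_moved` counts every copy in both directions, so fewer memory
    copies directly shrink the estimate.
    """
    transfer = bytes_moved / bandwidth_bytes_per_s
    return launch_overhead_s + transfer + cpu_time_s / speedup


def should_offload(cpu_time_s, speedup, bytes_moved, bandwidth_bytes_per_s,
                   launch_overhead_s=0.0):
    """Offload only if the estimate beats running the task on the CPU."""
    estimate = offload_time(cpu_time_s, speedup, bytes_moved,
                            bandwidth_bytes_per_s, launch_overhead_s)
    return estimate < cpu_time_s


# Hypothetical workloads: 100 MB moved over a 5 GB/s bus, 10x accelerator.
heavy = should_offload(cpu_time_s=1.0, speedup=10, bytes_moved=100e6,
                       bandwidth_bytes_per_s=5e9)
light = should_offload(cpu_time_s=0.001, speedup=10, bytes_moved=100e6,
                       bandwidth_bytes_per_s=5e9)
print(heavy, light)
```

In this model a compute-heavy task benefits from offloading, while a short task that moves the same amount of data does not, because the transfer alone costs more than running it on the CPU.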
Software Issues with Heterogeneous Systems, Continued
How can programmers easily develop software for each
different processor? (Usability)
How can code be moved from one system to
another? (Portability)
HSA Foundation
HSA (Heterogeneous System Architecture) creates an improved processor design that exposes the benefits
and capabilities of mainstream programmable compute elements, with each part working together seamlessly.
Commercial Solutions for Parallel Computing
AMD - Accelerated Parallel Processing SDK (AMD cores)
Intel - Parallel Studio (Intel cores)
Nvidia - CUDA (Nvidia GPUs)
Open projects for parallel computing
OpenMP (traditionally CPU only; version 4.0 added device offloading)
OpenACC (CPU, GPU)
OpenCL (various processors)
Introduction to OpenCL
Open Computing Language (OpenCL) is a framework for writing programs
that execute across heterogeneous platforms consisting of CPUs, GPUs,
DSPs, FPGAs and other processors. (Source: Wikipedia)
OpenCL is an open standard maintained by the Khronos Group.
Programming model executable across various types of processors
Introduction to OpenCL
OpenCL deliberately gives up some of the hardware-hiding abstraction of
modern high-level programming languages.
Instead, OpenCL provides an abstract programming model of heterogeneous
hardware, so that programmers can control processor resources
more flexibly.
While Nvidia’s CUDA is a solution that targets only GPUs, the goal of
OpenCL is to use any available processor resource.
In practice, however, OpenCL is currently used mainly on GPUs.
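In the OpenCL programming model, a kernel function is executed once per work-item, and each work-item finds its data through its global ID. The sketch below emulates that idea in plain Python so it runs anywhere; it does not call the OpenCL API, and `vector_add_kernel` and `enqueue_nd_range` are hypothetical names chosen for illustration.

```python
def vector_add_kernel(global_id, a, b, out):
    """Kernel body: each work-item handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def enqueue_nd_range(kernel, global_size, *args):
    """Sequential stand-in for a device launching `global_size` work-items.

    A real device would run many of these work-items in parallel.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1, 2, 3]
b = [10, 20, 30]
out = [0] * len(a)
enqueue_nd_range(vector_add_kernel, len(a), a, b, out)
print(out)
```

Because the kernel only states what one work-item does, the same kernel can in principle be mapped onto a CPU, a GPU, or another processor, which is the portability goal stated above.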
Introduction to OpenCL
(Figure: OpenCL platform model showing the host, an OpenCL device, its compute units, and their processing elements)
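The hierarchy named in the figure labels (host, OpenCL device, compute unit, processing element) can be written down as a small data model. This is only a sketch of the platform model's structure; the class names and the device sizes in the example are hypothetical and do not describe any real product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    """A compute unit groups a number of processing elements."""
    processing_elements: int


@dataclass
class Device:
    """An OpenCL device is divided into one or more compute units."""
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Host:
    """The host is connected to one or more OpenCL devices."""
    devices: List[Device] = field(default_factory=list)


# Hypothetical sizes, for illustration only.
gpu = Device("example-gpu", [ComputeUnit(64) for _ in range(4)])
cpu = Device("example-cpu", [ComputeUnit(1) for _ in range(4)])
host = Host([cpu, gpu])

for device in host.devices:
    print(device.name, device.total_processing_elements())
```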
OpenCL compliant processors
Vendors supporting OpenCL: AMD, Intel, Apple, Qualcomm, Imagination Technologies,
STMicroelectronics, IBM, Samsung, NVIDIA
http://www.khronos.org/conformance/adopters/conformant-products#opencl
Other projects using OpenCL
Activities of ETRI
Industrial software platform technology for heterogeneous systems
has been weak.
The OS, platforms and software libraries for heterogeneous multi-core
systems are becoming more and more important.
Activities of ETRI
R&D road map
- Advanced OS kernel
- CPU-GPU load balancing enhancement
- IDE tools supporting software development for heterogeneous multi-core systems
- Power consumption measurement of multi-core processors
Research by ETRI
Advanced OS kernel with a highly efficient load-balancing scheduler
CPU-GPU load balancing enhancement
Energy-efficient OS and energy consumption monitoring technology
The Role of OpenSEED
Distribution of open source software developed by research institutes
http://opensw-seed.org
Role of OpenSEED
OpenCV ocl (OpenCL) module test
Conclusion
The heterogeneous system era has already arrived.
Open projects and organizations are supporting software platform standards for
heterogeneous systems.
Software platforms and their advancement are essential because heterogeneous
systems are complex.
New technologies should be distributed so that they contribute to industries
built on heterogeneous systems.
Thank you
You can download this presentation file at
http://sshlab.blogspot.com
References
OpenSEED: http://opensw-seed.org/
http://www.slideshare.net/manglamjaiswal1/multicore-processor-technology
http://www.slideshare.net/AMD/amd-isscc-keynote
"The world’s first combination of low-power CPU and advanced GPU integrated into a single embedded device."
http://www.amd.com/Documents/49282_G-Series_platform_brief.pdf
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6031577&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6031577
AMD HSA: http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/
HSA Foundation: http://hsafoundation.com/
Khronos OpenCL: http://www.khronos.org/opencl/
Khronos WebCL: http://www.khronos.org/webcl/

Trends of SW Platforms for Heterogeneous Systems

  • 1. Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source Community Activities Seung-hwa Song
  • 2. Trends of SW technologies for Heterogeneous Multi-core systems Research and Efforts to overcome Issues about H.S Open source community activities
  • 3. Why has the Multi-core Era Arrived? Limitations of Moore’s Law (Performance Issue) Performance-oriented design is not the main market needs today. Performance per Watt is the main requirement. (Energy Issue) Lower power consumption of computers is becoming more and more important
  • 4. Why has the Multi-core Era Arrived? Performance per watt and power consumption of three kinds of processors
  • 5. Classification of Multi Processor SMP(Symmetric Multi Processor) - Homogeneous architecture AMP(Asymmetric Multi Processor) - Heterogeneous architecture
  • 6. - Each processor can access a common memory map - Any task can be allocated to any given processor SMP (homogeneous architecture)
  • 7. ASMP system (heterogeneous architecture) - Each processor has different structures for specialized purposes. -Each system consists of a master processor and slave processors. The slave processors communicate with the master processor. - Some (slave) processors cannot access the common memory map and some are designated only a special role instead. - Programmers should understand (the unique) tasks on/of/ each processor and consider(try to create an) efficient communication mechanism.
  • 8. Comparisons between Homogeneous and Heterogeneous Computing. Homogeneous: symmetric, same cores (usually CPUs); operation is guaranteed to be the same on each core; easy to offload tasks; good compatibility. Heterogeneous: asymmetric, different cores (CPUs, GPUs, DSPs and accelerators); operation cannot be assumed to be the same on each core; more complicated to offload tasks; less compatibility; specialized for specific tasks.
  • 9. Overview of MPSoC solutions Lucent Daytona(2000) - First MPSoC - Application : wireless communication router - Symmetric - A common memory map C-5 network Processor(2001) - Application : Network packet processor - Asymmetric
  • 10. Overview of MPSoC solutions Texas Instruments OMAP architecture (2004) - Application : cell phone processor - ARM9 (master) and TMS320C55x DSP (slave) - Asymmetric Texas Instruments’ Davinci - Application : multimedia processor - ARM Cortex-A8, ARM M3, DSP, codec accelerator ARM MPcore - main applications (networking, file I/O, UI) - control slave cores Video codec accelerators - Video compression Data bus DSP - Image processing
  • 11. Overview of MPSoC solutions Today, most embedded system processors are heterogeneous. Even though ASMP is specialized for the designer’s goal, higher performance is always required. Recent MPSoC architectures integrate both SMP and ASMP structures.
  • 12. Overview of MPSoC solutions AMD’s APU (Accelerated Processing Unit) Llano (2011) First CPU-GPU fused processor Intel’s Sandy Bridge processor CPU-GPU fused processing unit
  • 13. Overview of MPSoC solutions Various mobile application processors
  • 14. Software Issues with Heterogeneous Systems Offloading - Task offloading is a main goal of multi-core systems - In heterogeneous systems, task offloading is not easy Data sharing - The overhead of transferring data via the memory bus is an important issue - The results from each processing unit should be integrated - The number of memory copies should be minimized. Programmability - S/W development productivity is important
  • 15. Software Issues with Heterogeneous Systems, Continued How can programmers easily develop S/W for each different processor? (Usability) How can we move code from one system to another? (Portability)
  • 16. HSA Foundations HSA creates an improved processor design that exposes the benefits and capabilities of mainstream programmable computer elements. Each part works together seamlessly.
  • 17.
  • 18. Commercial Solutions for parallel computing AMD - Accelerated Parallel Processing SDK (AMD cores) Intel - Parallel Studio (Intel cores) Nvidia - CUDA (Nvidia GPU)
  • 19. Open projects for parallel computing OpenMP(Only CPU) OpenACC(CPU, GPU) OpenCL(Various processors)
  • 20. Introduction to OpenCL Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, DSPs, FPGAs and other processors. (Source: Wikipedia) OpenCL is an open standard maintained by the Khronos Group. Programming model executable across various types of processors
  • 21. Introduction to OpenCL The abstract concept of the modern high-level programming language is abandoned in OpenCL. OpenCL provides an abstract programming model for heterogeneous hardware so that programmers are able to control processor resources more flexibly. While Nvidia’s CUDA is a solution to maximize use of only GPUs, the goal of OpenCL is to utilize any available processor resources. But the main use of OpenCL is currently focused on GPUs.
  • 23. OpenCL compliant processors Vendors supporting OpenCL : AMD, Intel, Apple, Qualcomm, Imagination Technologies, STMicroelectronics, IBM, Samsung, NVIDIA http://www.khronos.org/conformance/adopters/conformant-products#opencl
  • 25. Activities of ETRI The Industrial S/W platform technology for heterogeneous systems was weak. OS, platforms and software libraries for heterogeneous multi core systems are becoming more and more important.
  • 28. Research by ETRI - Advanced OS kernel - CPU-GPU load balancing enhancement - IDE tool supporting S/W development based on heterogeneous multi-core - Power consumption measurement of multi-core processor
  • 29. Advanced OS kernel with highly efficient load balancing scheduler
  • 30. CPU-GPU load balancing enhancement
  • 31. Energy-efficient OS and energy consumption monitoring technology
  • 32. The Role of OpenSEED Distribution of open source developed by research institutes http://opensw-seed.org
  • 33. Role of OpenSEED OpenCV ocl(OpenCL) module test
  • 34. Conclusion The heterogeneous system era has already arrived. Open projects and organizations are supporting the SW platform standard for heterogeneous systems. Software platforms and their advances are essential because heterogeneous systems are sophisticated. New technologies should be distributed to contribute to the industry based on heterogeneous systems.
  • 35. Thank you You can download this presentation file at http://sshlab.blogspot.com
  • 36. References OpenSEED http://opensw-seed.org/ http://www.slideshare.net/manglamjaiswal1/multicore-processor-technology http://www.slideshare.net/AMD/amd-isscc-keynote The world’s first combination of low-power CPU and advanced GPU integrated into a single embedded device. http://www.amd.com/Documents/49282_G-Series_platform_brief.pdf
  • 37. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6031577&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6031577 AMD HSA : http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ HSA Foundation : http://hsafoundation.com/ Khronos OpenCL : http://www.khronos.org/opencl/ Khronos WebCL : http://www.khronos.org/webcl/
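
Slide 14 above points out that offloading a task also means paying for data transfer over the memory bus. The trade-off can be sketched roughly in Python as below. This is only an illustration under stated assumptions: the `OffloadEstimate` class, the function names, and every number are hypothetical and do not come from the presentation, and the model ignores overlap between copying and computing.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    """Hypothetical timing inputs for one task (not measured values)."""
    cpu_compute_ms: float    # time to run the task on the CPU alone
    accel_compute_ms: float  # time to run the same task on the accelerator
    bytes_moved: int         # total bytes copied to and from the accelerator
    bus_bytes_per_ms: float  # assumed effective memory-bus throughput


def transfer_ms(est: OffloadEstimate) -> float:
    """Time spent copying data over the memory bus."""
    return est.bytes_moved / est.bus_bytes_per_ms


def offload_total_ms(est: OffloadEstimate) -> float:
    """Total cost of offloading: copy time plus accelerator compute time."""
    return transfer_ms(est) + est.accel_compute_ms


def should_offload(est: OffloadEstimate) -> bool:
    """Offloading pays off only if it beats running on the CPU."""
    return offload_total_ms(est) < est.cpu_compute_ms


# Same task, two different amounts of data to copy (made-up numbers).
small_copy = OffloadEstimate(10.0, 2.0, 4_000_000, 1_000_000.0)
large_copy = OffloadEstimate(10.0, 2.0, 16_000_000, 1_000_000.0)

if __name__ == "__main__":
    print(should_offload(small_copy), should_offload(large_copy))
```

With these made-up inputs, the accelerator is five times faster at the computation in both cases, yet the second case loses because the copy alone takes longer than running on the CPU. That is the reason the slide asks for the number of memory copies to be minimized.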

Editor's Notes

  1. Good morning, my name is Seung-hwa Song. Today, I am going to talk about a really interesting emerging trend in the computer industry. The computer industry has developed greatly alongside semiconductor technology. Processors became faster and faster over the past decades. However, the multi-core era has now arrived because boosting single-core speed is no longer sufficient.
  2. First, I will talk about multi-core systems and their software technology trends. Then I will present some interesting research and efforts to solve the software issues of heterogeneous systems. Finally, I will discuss open platforms for heterogeneous systems and the role of open source organizations.
  3. The most important change in market trends is that raw performance is no longer the main requirement for computers; performance per watt is. With the rapid proliferation of mobile devices, from laptops to the smartphone boom, people want to use their devices for a long time without recharging the battery. The growth of small-scale SoC technology has helped drive that change.
  4. It is common knowledge that we cannot easily boost the clock speed of a processing unit with modern semiconductor technology, so people started to consider increasing the number of processors instead. As we boost clock speed, power consumption rises much faster than linearly. Too much power consumption causes overheating, which may damage semiconductors. But faster computers are a necessity, and that is why the multi-core system is becoming increasingly popular. This picture shows the performance per watt of each processor. We can see that many-core processors use much less power than GPUs or quad processors.
  5. There are two kinds of multiprocessor design: Symmetric and Asymmetric Multi Processor.
  6. In SMP design, every processor is identical. All processors can access a common main memory and share data through a memory bus.
  7. ASMP is heterogeneous. Each processor has a different architecture. Usually, one core works as a master processor and communicates with the other slave processors. In some designs, some slave processors are not connected to the main memory and work separately.
  8. Because homogeneous systems are symmetric, each core is expected to operate similarly, so we can easily offload tasks from one core to another without additional effort. Heterogeneous systems, on the other hand, consist of various cores specialized for specific tasks, such as GPUs or codec accelerators. These special cores are less compatible but consume less power than symmetric CPUs.
  9. The multi-core system has developed thanks to advances in SoC technology. Let’s take a look at the history of the multiprocessor system-on-chip (MPSoC). Lucent Daytona, the first MPSoC, was designed for wireless communication routers in 2000. Early MPSoC designs were symmetric and had a common memory map, as expected.
  10. OMAP is a famous architecture for cell phones developed by Texas Instruments. It includes an ARM9 processor as the master processor and a DSP as a slave processor. DaVinci is another chipset model, optimized for multimedia processing. I used it a few years ago. I ported the main application to the ARM processor, where I could easily use a legacy Linux OS and software libraries, while graphics processing jobs were offloaded to the DSP and video codec accelerators.
  11. Today, most embedded system processors are heterogeneous, because heterogeneous systems are better at saving energy. Even though ASMP is specialized for the designer’s goal, higher performance is always required. This is why many recent MPSoC architectures integrate both SMP and AMP structures.
  12. The first CPU-GPU fused design was launched in the personal computer industry. AMD announced the first APU, in which a low-power multi-core CPU and an advanced GPU are fused on one die. Interestingly, it seems that PC users not only want high-performance computers but also worry about electricity costs. This architecture addresses both the performance and the power consumption problems, and Intel is following the trend too.
  13. The most pressing market issue is mobile devices. Many processor vendors have launched various cores to satisfy market needs. These processors are all heterogeneous multiprocessors that include a GPU and symmetric CPUs.
  14. There are some important software issues in heterogeneous systems. The main goal of a multi-core system is task offloading. This was not a big problem in homogeneous multi-core systems: since all processors were the same, tasks could run on any core without any software changes. In a heterogeneous system, however, we cannot simply port legacy software libraries that run on CPUs to GPUs or DSPs. The second issue is data sharing. Task offloading adds the overhead of transferring data over the memory bus. The results from each processing unit have to be integrated, so the final computation is significantly limited by memory bus speed. This is why software engineers should design software to minimize the number of memory copies between cores. What I really want to talk about today is programmability, because programmability defines productivity.
  15. When we program heterogeneous systems, we have to learn the characteristics of each core and its software development environment. Even though processor vendors provide support packages for programmers, it is difficult to learn every programming environment. How can we develop software easily for each different environment? Many legacy software platforms, OSs and libraries support only major CPUs such as Intel, ARM, and PowerPC. We have to rewrite code for minor processors if we want to offload tasks to them.
  16. To overcome these issues, the Heterogeneous System Architecture (HSA) Foundation was created. This organization proposes a software platform architecture for heterogeneous systems.
  17. This is the HSA Solution Stack. While the legacy OS and applications sit on the CPU and GPU hardware, the HSA Runtime Infrastructure covers the GPU, ACC and legacy OS. This low-level software stack is an abstraction layer over heterogeneous processors. HSA-accelerated applications are layered over the infrastructure layer and can be written in programming languages such as OpenCL, C++ AMP, Python, and JavaScript.
  18. These are commercial solutions for parallel computing. AMD’s APP SDK supports AMD processors. Since AMD is a more open-source-oriented company than Intel, its SDK is based on OpenCL. Intel provides Parallel Studio for programming on Intel processors. Nvidia provides CUDA, which supports Nvidia’s GPUs. Its language grammar is similar to that of OpenCL, but it is designed only for GPU utilization.
  19. There are also many open source solutions for parallel computing. I cannot cover them all in this presentation, but I will go into more detail about OpenCL.
  20. OpenCL is a programming framework first proposed by Apple. Its standard is now maintained by the Khronos Group. OpenCL is designed for writing programs that are executable across various types of processors.
  21. This is the OpenCL platform model. An OpenCL platform has only one host, which is connected to one or more OpenCL devices. The host usually runs on a master CPU. Each OpenCL device includes one or more compute units, and each compute unit consists of one or more processing elements. Actual computation on a device occurs within the processing elements.
  22. There are open projects supporting OpenCL. WebCL, WebGL and OpenGL are also standardized by the Khronos Group. Some projects related to image processing, such as OpenCV and FFmpeg, also support OpenCL.
  23. Because of these movements, trends, and market needs, people at ETRI became busy. (I think they are always busy.) There was no foundational technology, infrastructure, or educational program in Korea.
  24. Thus, the leader of a research team at ETRI, Dr. Jung, and his brilliant coworkers planned some projects for the growth and development of the future software industry. Their research covers a variety of software areas such as OS, service libraries, development tools, management tools, and applications.
  25. This is ETRI's R&D roadmap. In the first year, some prototype technologies and products were developed. Some of the work was related to OpenCL and Linux kernels. In the next year, the research progressed enough to produce substantial results: an OpenCL IDE, web engines based on heterogeneous multi-core systems, an advanced OS kernel, and so on. Finally, this year, we are about to complete those projects and open them to the public.
  26. Now, let me introduce some interesting research carried out by ETRI.
  27. As I mentioned before, there are some notable issues in heterogeneous systems, such as task offloading, load balancing, and power efficiency. After tasks are offloaded to each core, efficient load balancing largely determines a multi-core system’s performance. The advanced OS kernel includes an improved task scheduler called Distributed Weighted Round-Robin. This research shows how to utilize processors with high efficiency.
  28. GPUs have parallel vector processing capabilities that let them process large data sets, so data transfer between the CPU and GPU becomes a considerable bottleneck. This is why ETRI also worked on improving load balancing between the CPU and GPU.
  29. Another main research topic at ETRI is a power consumption measurement algorithm. The new algorithm showed better accuracy than Google Power Tutor.
  30. There is one more really important mission for the ETRI team. A few years ago, some passionate engineers, including myself, were invited to a small group meeting. It was a tiny but significant kick-off meeting of an open source project community called OpenSEED. The main goal of this community is to test and evaluate recent techniques developed by ETRI. All of the research work is open to the public and available for download at the OpenSEED site.
  31. My team members and I have evaluated the advanced Linux kernel since the first year of the project. We tested some image processing applications on that Linux kernel. Now we are testing the ocl module in the OpenCV library, which adds OpenCL support to OpenCV.
  32. (Combine all references onto one slide)
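
The OpenCL platform model described in note 21 (one host, one or more devices, each device made of compute units, each compute unit made of processing elements) can be pictured as a small containment hierarchy. The Python sketch below is only a plain data model for illustration: it does not call any OpenCL API, and the class names, device names, and counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    """One compute unit; computation happens in its processing elements."""
    processing_elements: int


@dataclass
class Device:
    """One OpenCL-style device, holding one or more compute units."""
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Platform:
    """A single host connected to one or more devices."""
    host: str
    devices: List[Device] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(d.total_processing_elements() for d in self.devices)


def example_platform() -> Platform:
    # Made-up configuration: a GPU with 4 compute units of 16 processing
    # elements each, and a DSP with a single 2-element compute unit.
    gpu = Device("gpu0", [ComputeUnit(16) for _ in range(4)])
    dsp = Device("dsp0", [ComputeUnit(2)])
    return Platform(host="cpu-host", devices=[gpu, dsp])


if __name__ == "__main__":
    platform = example_platform()
    for device in platform.devices:
        print(device.name, device.total_processing_elements())
```

In a real OpenCL program the host would query this hierarchy from the runtime rather than build it by hand; the sketch only mirrors the nesting that the note describes.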