Predicting the Time of Oblivious BSP
González J.A. (1), León C. (1), Piccoli F. (2), Printista M. (2), Roda J.L. (1), Rodríguez C. (1), Sande F. (1)
(1) Dpto. de Estadística, Investigación Operativa y Computación, Universidad de La Laguna, Tenerife, Canary Islands, Spain
(2) Universidad Nacional de San Luis, Ejército de los Andes 950, San Luis, Argentina
Outline
Bulk Synchronous Parallel Model (BSP)
[Figure: processor nodes, each with a microprocessor, cache memory, DRAM memory and a network interface, connected through an interconnection network]
Oblivious BSP Model (OBSP)
– $h_{PS}$: OBSP packet size
$T(h) = g \cdot h + L_b$ for $h \ge h_{PS}$
$T(h) = g_0 \cdot h + L_{b0}$ for $h < h_{PS}$
[Figure: communication time versus h, piecewise linear with slope $g_0$ and intercept $L_{b0}$ below $h_{PS}$, and slope $g$ and intercept $L_b$ from $h_{PS}$ upwards]
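The two-branch communication cost on this slide can be written as a small function. This is only a sketch: the function name `obsp_comm_time` and the numeric values in the usage lines are illustrative assumptions, not measurements from the CRAY T3E.

```python
def obsp_comm_time(h, g, L_b, g0, L_b0, h_ps):
    """Piecewise-linear OBSP communication time for a message of size h.

    For h >= h_ps the cost is g*h + L_b; below the OBSP packet size
    h_ps the cost is g0*h + L_b0 (both branches as given on the slide).
    """
    if h >= h_ps:
        return g * h + L_b
    return g0 * h + L_b0


# Illustrative parameter values (assumed, not taken from the slides).
params = dict(g=2.0, L_b=10.0, g0=3.0, L_b0=1.0, h_ps=8)
large = obsp_comm_time(8, **params)   # uses the g, L_b branch
small = obsp_comm_time(4, **params)   # uses the g0, L_b0 branch
print(large, small)
```

Keeping the threshold `h_ps` as an explicit parameter makes it easy to see which branch a given message size falls into.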
Paderborn University BSP Library
Reference: Olaf Bonorden, Ben Juurlink, Ingo von Otte, Ingo Rieping. The Paderborn University BSP (PUB) Library - Design, Implementation and Performance. 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP), San Juan, Puerto Rico, April 12 - April 16, 1999.
OBSP Cost Analysis
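BSP Model vs OBSP Model
BSP: $T_{BSP} = 4w + 2(g \cdot h + L)$, $h \ge h_{PS}$
OBSP: [symbol lost in extraction]$_{2,i} = 3w + 2(g \cdot h + L_b)$, $h \ge h_{PS}$
[Figure: two timelines, one for BSP and one for OBSP, each showing processors P0 and P1 with computation segments w and 2w and communication segments g*h; the BSP timeline uses latency L, the OBSP timeline uses latency $L_b$]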
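The two cost expressions on this slide can be compared numerically. The sketch below simply evaluates both formulas as stated; the function names and the sample values of w, g, h, L and L_b are assumptions chosen for illustration.

```python
def bsp_example_time(w, g, h, L):
    """BSP cost of the two-superstep example: 4*w + 2*(g*h + L)."""
    return 4 * w + 2 * (g * h + L)


def obsp_example_time(w, g, h, L_b):
    """OBSP cost of the same example: 3*w + 2*(g*h + L_b)."""
    return 3 * w + 2 * (g * h + L_b)


# Illustrative values (assumed, not measured on any machine).
w, g, h, L, L_b = 10, 2, 5, 8, 3
t_bsp = bsp_example_time(w, g, h, L)
t_obsp = obsp_example_time(w, g, h, L_b)

# Subtracting the two formulas gives w + 2*(L - L_b).
print(t_bsp, t_obsp, t_bsp - t_obsp)
```

Under these formulas the gap between the two predictions is one unit of work w plus twice the difference between the two latency parameters.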
FFT Analysis using the OBSP Model
[Figure: execution diagram for processors P0 to P3 with the phases seq_fft, Division, bsp_partition, Combination and bsp_done. Processor sets: $X^{(0)} = \{0,1,2,3\}$, $X_0^{(1)} = \{0,1\}$, $X_1^{(1)} = \{2,3\}$, $X_k^{(2)} = \{k\}$ for $k = 0, \ldots, 3$. Segments are labelled with the costs $w_{1,i}$, $g \cdot h_{1,i} + L_b$, $w_{2,i}$, $w_{1,i}^{(1)}$, $g \cdot h_{1,i}^{(1)} + L_b$, $w_{2,i}^{(1)}$ and $w_{1,i}^{(2)}$; the symbols for the per-level time functions were lost in extraction]
OBSP Prediction Accuracy
[Figure: real and OBSP predicted time for the FFT algorithm on the CRAY T3E]
[Figure: real and OBSP predicted time for the RAP algorithm on the CRAY T3E]
[Panel labels: N=1000, M=1000; N=2048]
[Table: OBSP parameter values on the CRAY T3E, p=16; g is in bytes per second]
PBS 209152 Items. CRAY T3E
Conclusions & Future Work
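OBSP Cost Analysis Example
[Figure: timeline for processors P0 and P1 with computation segments w and 2w, communication segments g*h and latency $L_b$]
BSP Cost Analysis Example
[Figure: timeline for processors P0 and P1 with computation segments w and 2w, communication segments g*h and latency L]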

PREDICTING THE TIME OF OBLIVIOUS PROGRAMS. Euromicro 2001

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 7. BSP Model vs OBSP Model. BSP: $T_{BSP} = 4w + 2(g h + L)$ for $h \ge h_{PS}$. OBSP: $\Phi_{2,i} = 3w + 2(g h + L_b)$ for $h \ge h_{PS}$. [Figure: timelines of processors P0 and P1 under BSP and under OBSP, with computation blocks w and 2w, communication g*h, and latency L (BSP) or L_b (OBSP).]
  • 8. FFT Analysis using the OBSP Model. [Figure: FFT execution on processors P0 to P3, with the phases seq_fft, Division, bsp_partition, Combination and bsp_done. Processor sets: $X^{(0)} = \{0,1,2,3\}$, $X_0^{(1)} = \{0,1\}$, $X_1^{(1)} = \{2,3\}$, $X_k^{(2)} = \{k\}$ for $k = 0, \dots, 3$. The supersteps of each machine level (0), (1) and (2) are annotated with their computation costs $w_{1,i}$, $w_{2,i}$ and communication costs $g h_{1,i} + L_b$.]
  • 9. OBSP Prediction Accuracy. [Table: OBSP parameter values on the CRAY T3E; g is in bytes per second.] [Table: real and OBSP-predicted time for the FFT algorithm on the CRAY T3E.] [Table: real and OBSP-predicted time for the RAP algorithm on the CRAY T3E.] Labels on the slide: N=1000, M=1000; N=2048; p=16.
  • 10. PBS 209152 Items. CRAY T3E
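  • 12. OBSP Cost Analysis Example. [Figure: timeline of processors P0 and P1 with computation blocks w and 2w, communication g*h, and oblivious latency L_b.]
  • 13. BSP Cost Analysis Example. [Figure: timeline of processors P0 and P1 with computation blocks w and 2w, communication g*h, and synchronisation latency L.]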

Editor's Notes

  1. Oblivious BSP Model, 21/08/11, EuroPar-2000. Good afternoon, ladies and gentlemen. In this paper, we propose a parallel computing model that extends the well-known Bulk Synchronous Parallel model to work with algorithms that don't require global barrier synchronisation, and that deals with new programming features such as processor-partition operations and oblivious synchronisation. This last feature gives the model its name: the Oblivious BSP.
  2. The presentation starts with a brief introduction to the concepts of the BSP model, and then I will present the Oblivious BSP model. A methodology for predicting the execution time is shown using a trivial example. After that, I will show the preliminary results obtained using the OBSP model to predict the execution time of two algorithms: FFT, which is an example of data parallelism, and RAP, which is solved by a communication-intensive pipeline algorithm. To conclude the presentation, I will mention current and future work along this line.
  3. The Bulk Synchronous Parallel model was proposed by Prof. Valiant in 1990. It considers a parallel machine made of a set of p processors with private memory, interconnected through a global communication network, and a mechanism for synchronising the processors. The BSP model is characterised by the following parameters: the communication gap g, defined as the transmission time of a unit packet, which reflects the per-processor bandwidth; and the latency L, which corresponds to the time needed to synchronise all processors. These values depend on the number of processors p. A BSP computation is organised into supersteps, each of which consists of local computation, inter-process communication, and a global synchronisation. The execution time of a superstep s is given by the largest amount of work performed by any processor during the superstep, w_s, plus the gap g times the largest number of packets sent or received by any processor during the superstep, h_s, plus the time L required by the global synchronisation: $T_s = w_s + g h_s + L$.
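The superstep cost described in the note for slide 3 can be sketched in a few lines of Python. This is only an illustration: the function names and the numeric values in the usage note are made up for the example and do not come from the paper.

```python
def bsp_superstep_time(work, packets, g, L):
    """Cost of one BSP superstep.

    work[i]    -- local computation time of processor i
    packets[i] -- packets sent or received by processor i (whichever is larger)
    g, L       -- BSP gap and synchronisation latency
    """
    w_s = max(work)      # slowest processor bounds the computation phase
    h_s = max(packets)   # busiest processor bounds the communication phase
    return w_s + g * h_s + L


def bsp_program_time(supersteps, g, L):
    """Total BSP time: the sum of the costs of all supersteps.

    supersteps is a list of (work, packets) pairs, one per superstep.
    """
    return sum(bsp_superstep_time(work, packets, g, L)
               for work, packets in supersteps)
```

For the two-processor example of slide 7, with the hypothetical values w = 1, h = 10, g = 0.5 and L = 3, calling `bsp_program_time([([1.0, 2.0], [10, 10]), ([2.0, 1.0], [10, 10])], 0.5, 3.0)` evaluates 4w + 2(g*h + L).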
  4. The OBSP model extends the BSP model to deal with oblivious synchronisation and processor-partition operations. When the number of messages a processor is due to receive in a superstep is known, a zero-cost synchronisation mechanism can be used to reduce the synchronisation overhead. An oblivious synchronisation blocks a processor until the expected number of messages has been received. A partition operation splits the current set of processors into several subsets, each of which acts as an autonomous BSP machine with its own processor numbering and synchronisation points. The communication capabilities of an OBSP machine are characterised by the following parameters: the gap g, the synchronising latency L, the oblivious latency L_b, and the special values for small packet sizes, g_0 and L_b0.
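Slide 4 gives the OBSP communication time as a piecewise function of the message size h, switching to the small-packet parameters below the OBSP packet size h_PS. A minimal Python sketch of that function follows; the class and function names are hypothetical, and the parameter values in the usage note are invented for illustration, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObspParams:
    g: float      # gap for messages of at least h_ps
    L_b: float    # oblivious latency for messages of at least h_ps
    g0: float     # gap for small messages
    L_b0: float   # oblivious latency for small messages
    h_ps: int     # OBSP packet size (threshold between the two regimes)


def obsp_comm_time(h, params):
    """T(h) = g*h + L_b if h >= h_PS, otherwise g_0*h + L_b0."""
    if h >= params.h_ps:
        return params.g * h + params.L_b
    return params.g0 * h + params.L_b0
```

With the made-up parameters `ObspParams(g=0.5, L_b=4.0, g0=1.0, L_b0=1.0, h_ps=8)`, a message of size 8 is costed with g and L_b, while a message of size 4 is costed with g_0 and L_b0.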
  5. The Paderborn University BSP library (PUB) is a parallel C library based on the BSP model. In addition to the most common BSP features, PUB provides routines for oblivious synchronisation, partition operations, and collective communications.
  6. In an OBSP prediction analysis, we assume that: 1) supersteps are numbered starting at 1; 2) all processors perform the same number of supersteps, R; and 3) because processors can be in different supersteps at the same time, a processor in its superstep s can send a message to another processor that is still in a previous superstep. The system ensures that the communication does not take effect until the receiving processor finishes its superstep s. Instead of using a global barrier, the OBSP model defines the incoming partners of each processor, Ω, as the set of processors that send a message to this processor, together with the processor itself. h_{s,i} denotes the maximum number of packets communicated by a processor. Φ_{s,i} denotes the time spent by processor i in superstep s, and is given by these recursive formulas. When a partition operation is performed, this scheme is applied recursively within each submachine.
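The recursive formulas themselves are on the slide and are not reproduced in these notes. The Python sketch below therefore assumes one plausible form of the recurrence: a processor can finish superstep s only after every incoming partner j has finished superstep s-1 and done its own work for s, and then pays g*h + L_b for the communication. This assumed form treats Φ as a cumulative finishing time, and it reproduces the value 3w + 2(g*h + L_b) shown on slide 7, but it should be checked against the paper. All names are hypothetical, and supersteps are indexed from 0 in the lists.

```python
def obsp_times(work, h, omega, g, L_b):
    """Assumed OBSP recurrence (not taken verbatim from the source).

    work[s][i]  -- computation time of processor i in superstep s
    h[s][i]     -- packets communicated, as seen by processor i in superstep s
    omega[s][i] -- incoming partners of processor i in superstep s,
                   including i itself
    Returns phi, where phi[s][i] is the time at which processor i
    finishes superstep s.
    """
    phi = []
    for s in range(len(work)):
        row = []
        for i in range(len(work[s])):
            # Wait for the slowest incoming partner to reach its send point.
            ready = max(
                (phi[s - 1][j] if s > 0 else 0.0) + work[s][j]
                for j in omega[s][i]
            )
            row.append(ready + g * h[s][i] + L_b)
        phi.append(row)
    return phi
```

For the two-processor example of slide 7, with the made-up values w = 1, h = 10, g = 0.5 and L_b = 1, the call `obsp_times([[1, 2], [2, 1]], [[10, 10], [10, 10]], [[{0}, {0, 1}], [{0, 1}, {1}]], 0.5, 1.0)` gives both processors the same finishing time in the second superstep.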
  7. In this slide I compare both execution models using a trivial example. In the first superstep, one processor performs local computation and sends a message to the other processor, which has to do double the amount of work. Then they synchronise, and the second superstep is symmetrical. Using the BSP model, the maximum amount of local computation in each superstep is 2w, so the total time is $T_{BSP} = 4w + 2(g h + L)$. Using the OBSP model, the first processor can start the second superstep while the second processor remains in the first superstep. The system buffers the message until the receiving processor is ready to receive it. This overlap reduces the total execution time to $\Phi_{2,i} = 3w + 2(g h + L_b)$.
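Subtracting the two totals given on slide 7 makes the size of the saving explicit. The only assumption in the closing remark is that the oblivious latency L_b does not exceed the barrier latency L, which is what one would expect but is not stated in these notes.

```latex
\begin{align*}
T_{BSP} - \Phi_{2,i}
  &= \bigl(4w + 2(g h + L)\bigr) - \bigl(3w + 2(g h + L_b)\bigr) \\
  &= w + 2\,(L - L_b)
\end{align*}
```

The communication terms cancel, so in this example the gain comes from one unit of overlapped work w plus twice the difference between the two latencies, and it is positive whenever L_b ≤ L.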
  8. This figure represents the FFT execution under the OBSP model. Coloured blocks correspond to local computation, and black blocks denote inter-processor communication. Blue lines on the right denote the supersteps performed by a machine X^(j), while the black lines mark the computing and communication parts of every superstep. In the original set of processors, each processor performs some local computation that includes a partition into two subsets to solve the transformations of the odd and even components. This partition process continues until only one processor remains in each submachine. Each of these innermost submachines performs only one superstep, to compute a sequential transformation, and then rejoins the outer machine. Local computation in the first superstep includes the work performed by the inner submachine. The superstep finishes with a data exchange, and the second superstep consists of combining the odd and even transformed signals.
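The divide, transform and combine structure described in the note for slide 8 can be mimicked sequentially. The Python sketch below is not the PUB implementation: it simulates the processor partition with a `procs` counter that is halved at each division, and all function names are illustrative. It assumes that `procs` and the signal length are powers of two.

```python
import cmath


def combine(even, odd):
    """Butterfly step: merge the transforms of the even and odd components."""
    half = len(even)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def seq_fft(x):
    """Sequential radix-2 FFT, run once a submachine has a single processor."""
    if len(x) == 1:
        return list(x)
    return combine(seq_fft(x[0::2]), seq_fft(x[1::2]))


def partitioned_fft(x, procs):
    """Simulate the partition scheme: halve the processor set until one remains."""
    if procs == 1:
        return seq_fft(x)
    if len(x) < 2:
        raise ValueError("signal too short for the requested number of processors")
    # Division: one submachine takes the even components, the other the odd ones.
    even = partitioned_fft(x[0::2], procs // 2)
    odd = partitioned_fft(x[1::2], procs // 2)
    # Combination: corresponds to the second superstep of the outer machine.
    return combine(even, odd)
```

In the real parallel program the two recursive calls run concurrently on disjoint processor subsets and the combination is preceded by a data exchange; here they simply run one after the other.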
  9. Preliminary results have been obtained on a CRAY T3E. The first table shows the values of the model parameters for this machine. Note that the values for small packet sizes are not available. In the second table, we can see the measured time and the OBSP-predicted time for the FFT algorithm with an input vector of 2 million elements. The prediction accuracy is quite good: percentage errors are below 3% for the overall algorithm. After the acceptance of this paper, some experiments were carried out with a fine-grain, communication-intensive pipeline algorithm that solves the RAP. Percentage errors are larger than in the previous example, but we point out that this algorithm uses small message sizes and that the model parameters used are g and L_b rather than the small-packet values g_0 and L_b0.
  11. In conclusion: we have proposed a new parallel computing model that extends the BSP model to work with oblivious synchronisation and partition operations. Preliminary results show that its prediction accuracy is as good as that of the BSP model. In future work, we want to obtain the parameter values for small message sizes, and to extend the analysis to other algorithms and parallel platforms.
  12. In the first superstep, processor 1 has to do twice as much work as processor 0. Processor 1 receives a message from processor 0, so its Ω set includes both processors. If h is the amount of communicated data, the Φ values for each processor are ... Processor 0 starts its second superstep while processor 1 is still in the previous one. The system buffers the message to ensure that it is delivered when the receiving processor requests it. Processor 1 has less work to do in the second superstep, so it sends the message back and finishes.