Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing
SC09 Doctoral Symposium, Portland, 11/18/2009
Student: Jaliya Ekanayake
Advisor: Prof. Geoffrey Fox
Community Grids Laboratory, Digital Science Center, Pervasive Technology Institute, Indiana University
Cloud Runtimes for Data/Compute Intensive Applications
Cloud runtimes: MapReduce, Dryad/DryadLINQ, Sector/Sphere
Moving computation to data
Simple communication topologies: MapReduce, directed acyclic graphs (DAGs)
Distributed file systems
Fault tolerance
Represented as filter pipelines
Parallelizable filters
Applications using Hadoop and DryadLINQ (2)
PhyloD [1] project from Microsoft Research
Derive associations between HLA alleles and HIV codons, and between codons themselves
DryadLINQ implementation
[1] Microsoft Computational Biology Web Tools, http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/
Applications using Hadoop and DryadLINQ (3)
Calculate Pairwise Distances (Smith-Waterman-Gotoh)
Calculate pairwise distances for a collection of genes (used for clustering, MDS)
Fine-grained tasks in MPI
Coarse-grained tasks in DryadLINQ
125 million distances in 4 hours & 46 minutes, performed on 768 cores (Tempest Cluster)
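The coarse-grained decomposition can be sketched as follows, under stated assumptions: the N×N distance matrix is split into b×b blocks, and because the distance is symmetric only the b(b+1)/2 blocks on or above the diagonal become tasks. `toy_distance` is a hypothetical placeholder, not Smith-Waterman-Gotoh, and all function names are illustrative. The tasks run sequentially here, whereas a runtime such as DryadLINQ would schedule each block as an independent task.

```python
from itertools import combinations_with_replacement


def block_ranges(n, num_blocks):
    """Split indices 0..n-1 into num_blocks contiguous ranges of near-equal size."""
    base, extra = divmod(n, num_blocks)
    ranges, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def upper_triangle_tasks(n, num_blocks):
    """One coarse-grained task per block pair (i, j) with i <= j; symmetry covers i > j."""
    ranges = block_ranges(n, num_blocks)
    return [(ranges[i], ranges[j])
            for i, j in combinations_with_replacement(range(num_blocks), 2)]


def toy_distance(a, b):
    """Hypothetical placeholder for Smith-Waterman-Gotoh: mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def compute_block(seqs, rows, cols):
    """Compute one block; on diagonal blocks only pairs with i < j are evaluated."""
    return {(i, j): toy_distance(seqs[i], seqs[j])
            for i in rows for j in cols if i < j}


def pairwise_distances(seqs, num_blocks):
    """Run every block task (sequentially here) and assemble the symmetric matrix."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for rows, cols in upper_triangle_tasks(n, num_blocks):
        for (i, j), d in compute_block(seqs, rows, cols).items():
            dist[i][j] = dist[j][i] = d  # mirror into the lower triangle
    return dist
```

With b blocks there are b(b+1)/2 tasks, so 4 blocks give 10 tasks. Fewer, larger blocks mean coarser tasks with less scheduling overhead, which is the trade-off the slide contrasts with fine-grained MPI tasks.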
Applications using Hadoop and DryadLINQ (4)
K-Means Clustering
Matrix Multiplication
Multi-Dimensional Scaling (MDS)
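K-Means clustering fits an iterative MapReduce: map tasks assign the points of their partition to the nearest current centroid, and reduce tasks average each cluster. The sketch below is a minimal single-process model under these assumptions: each map emits per-centroid partial sums instead of individual points (a local pre-aggregation chosen here to shrink intermediate data), and the shuffle is simulated with a dictionary. All function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans_map(points, centroids):
    """Map: emit (centroid index, (coordinate sum, count)) for one data partition."""
    partial = {}
    for p in points:
        k = nearest(p, centroids)
        s, c = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(s, p)], c + 1)
    return list(partial.items())


def kmeans_reduce(key, values):
    """Reduce: merge partial sums for one centroid and return its new position."""
    total = [0.0] * len(values[0][0])
    count = 0
    for s, c in values:
        total = [a + b for a, b in zip(total, s)]
        count += c
    return key, [t / count for t in total]


def kmeans(partitions, centroids, max_iters=100, tol=1e-9):
    """Driver: one MapReduce pass per iteration until the centroids stop moving."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iters):
        groups = defaultdict(list)  # simulated shuffle: group map outputs by key
        for part in partitions:
            for k, v in kmeans_map(part, centroids):
                groups[k].append(v)
        new = [list(c) for c in centroids]  # empty clusters keep their old centroid
        for k, values in groups.items():
            _, new[k] = kmeans_reduce(k, values)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids
```

The data partitions never change between iterations; only the small list of centroids does, which is exactly the pattern a runtime with cached, long-running map tasks can exploit.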
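Applications & Different Interconnection Patterns
[Figure: interconnection patterns from input → map → output, through map → reduce, and iterative map → reduce, to MPI-style communication (Pij); the map/reduce patterns form the domain of MapReduce and its iterative extensions.]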
i-MapReduce
Distinction between static data and variable data (data flow vs. δ flow)
Cacheable map/reduce tasks (long running tasks)
Combine operation
Support for fast intermediate data transfers
[Figure: the user program calls Configure() with the static data, iterates Map(Key, Value), Reduce(Key, List<Value>), and Combine(Key, List<Value>) while only the δ flow changes between iterations, and finishes with Close().]
Different synchronization and intercommunication mechanisms used by the parallel runtimes
i-MapReduce Programming Model
In the user program's process space:
    configureMaps()
    configureReduce()
    while(condition){
        runMapReduce()    // Map() and Reduce() on worker nodes, then Combine()
        updateCondition()
    } //end while
    close()
Worker nodes host cacheable map/reduce tasks that persist across iterations.
Map tasks can send <Key,Value> pairs directly to reduce tasks.
Combine() operation returns the reduce outputs to the user program.
Communications/data transfers go via the pub-sub broker network.
Two configuration options: using local disks (only for maps) or using the pub-sub bus.
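The programming model on this slide can be modeled in a single process to show why separating static data from the δ flow matters: static partitions are handed to the map tasks once, and only the small variable data travels on each iteration. `IterativeMapReduce`, `configure_maps`, and `run_map_reduce` are hypothetical names that loosely mirror configureMaps() and runMapReduce(); this is not the i-MapReduce API, which ran long-lived tasks on worker nodes over a broker network. The example job, power iteration, is assumed purely for illustration: matrix rows are the static data and the current vector is the δ flow.

```python
import math
from collections import defaultdict


class IterativeMapReduce:
    """Toy single-process model of an iterative MapReduce job (hypothetical API)."""

    def __init__(self, map_fn, reduce_fn, combine_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.combine_fn = combine_fn
        self.static_partitions = []

    def configure_maps(self, partitions):
        # Static data is loaded once and reused by the long-running map tasks.
        self.static_partitions = list(partitions)

    def run_iteration(self, variable):
        # Only the variable data (the delta flow) is sent to the map tasks each time.
        groups = defaultdict(list)
        for part in self.static_partitions:
            for key, value in self.map_fn(part, variable):
                groups[key].append(value)
        reduced = [self.reduce_fn(k, vs) for k, vs in sorted(groups.items())]
        # Combine gathers every reduce output into one value for the user program.
        return self.combine_fn(reduced)


def run_map_reduce(job, variable, condition, max_iters=1000):
    """User program loop: iterate while condition(old, new) is true."""
    for _ in range(max_iters):
        new_variable = job.run_iteration(variable)
        keep_going = condition(variable, new_variable)
        variable = new_variable
        if not keep_going:
            break
    return variable


# Example job (assumed for illustration): power iteration on a static matrix.
def pi_map(rows, x):
    for i, row in rows:  # each static row is (row index, row values)
        yield i, sum(a * b for a, b in zip(row, x))


def pi_reduce(key, values):
    return key, sum(values)


def pi_combine(pairs):
    y = [v for _, v in sorted(pairs)]
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]


def not_converged(old, new, tol=1e-10):
    return max(abs(a - b) for a, b in zip(old, new)) > tol
```

For example, configuring the job with the row splits `[[(0, [2.0, 0.0])], [(1, [0.0, 1.0])]]` and starting from `[1.0, 1.0]` should converge toward the dominant eigenvector `[1, 0]`; the matrix is never re-sent after `configure_maps`.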
i-MapReduce Architecture
[Figure: the user program and MR Driver communicate through a pub/sub broker network with the worker nodes; each worker node runs an MRDaemon (D) hosting map workers (M) and reduce workers (R), which read and write data splits on the file system.]
Eliminates file based communication
Cacheable map/reduce tasks
Static data remains in memory

User program is the composer of MapReduce computations
Extends the MapReduce model to iterative computations
Assumes that static data fits into distributed memory
12/6/2009 Jaliya Ekanayake
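File-free intermediate transfer can be illustrated with a toy, in-process stand-in for the broker network: map workers publish (key, value) pairs to a per-reducer topic, and reduce workers consume them directly from memory, so nothing is written to the file system between phases. `Broker`, the topic naming, the end-of-stream marker, and the word-count job are assumptions made for this sketch; the real system used a distributed pub/sub broker (NaradaBrokering) with its own API.

```python
import queue
import threading
from collections import defaultdict

END = object()  # end-of-stream marker each map worker sends to every reducer


class Broker:
    """Hypothetical in-memory pub/sub stand-in: one FIFO queue per topic."""

    def __init__(self):
        self._topics = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _queue(self, topic):
        with self._lock:  # avoid two threads creating the same topic at once
            return self._topics[topic]

    def publish(self, topic, message):
        self._queue(topic).put(message)

    def receive(self, topic):
        return self._queue(topic).get()  # blocks until a message arrives


def map_worker(broker, words, num_reducers):
    # Partition by key so every occurrence of a word reaches the same reducer.
    for w in words:
        broker.publish(f"reduce-{hash(w) % num_reducers}", (w, 1))
    for r in range(num_reducers):
        broker.publish(f"reduce-{r}", END)


def reduce_worker(broker, r, num_mappers, results):
    counts = defaultdict(int)
    finished = 0
    while finished < num_mappers:  # stop after every mapper has signalled END
        msg = broker.receive(f"reduce-{r}")
        if msg is END:
            finished += 1
        else:
            key, value = msg
            counts[key] += value
    results[r] = dict(counts)


def word_count(splits, num_reducers=2):
    broker, results = Broker(), {}
    reducers = [threading.Thread(target=reduce_worker,
                                 args=(broker, r, len(splits), results))
                for r in range(num_reducers)]
    mappers = [threading.Thread(target=map_worker, args=(broker, s, num_reducers))
               for s in splits]
    for t in reducers + mappers:
        t.start()
    for t in reducers + mappers:
        t.join()
    merged = {}
    for part in results.values():  # reducers own disjoint key sets
        merged.update(part)
    return merged
```

The end-of-stream markers stand in for the synchronization a real runtime needs: a reducer can only finish once every map task has reported completion.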
Applications – Pleasingly Parallel
CAP3: Expressed Sequence Tagging
High Energy Physics (HEP) Data Analysis
[Figure: input files (FASTA) are processed independently by CAP3 instances, each producing output files.]
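Pleasingly parallel jobs such as CAP3 need only a map phase: each input file is processed independently by an external program, and there is no reduce. A minimal local sketch, assuming the executable takes one input file as a command-line argument: `map_only` runs one subprocess per file on a thread pool. The `{input}` placeholder convention and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command_template, input_path):
    """Map task: run the external program on one input file, return its stdout."""
    cmd = [arg.replace("{input}", input_path) for arg in command_template]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return input_path, done.stdout


def map_only(command_template, input_paths, workers=4):
    """No reduce phase: every file is processed independently; results keyed by path."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda p: run_one(command_template, p), input_paths))
```

For instance, `map_only(["cap3", "{input}"], ["reads1.fsa", "reads2.fsa"])` would start one CAP3 process per FASTA file, assuming a `cap3` binary on the PATH (file names hypothetical). Threads suffice here because the heavy work happens in the child processes, not in Python.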
Applications – Iterative
[Figures: Performance of K-Means Clustering; Parallel Overhead of Matrix Multiplication]
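The matrix-multiplication plot reports parallel overhead, which the slide does not define. One common definition, assumed here, is f(P) = (P·T(P) − T(1)) / T(1), where T(P) is the running time on P processes; f = 0 means perfect linear speedup, and parallel efficiency equals 1 / (1 + f). A small helper under that assumption:

```python
def parallel_overhead(t1: float, tp: float, p: int) -> float:
    """f(P) = (P * T(P) - T(1)) / T(1); 0.0 means perfect linear speedup."""
    if t1 <= 0 or tp <= 0 or p < 1:
        raise ValueError("times must be positive and p must be >= 1")
    return (p * tp - t1) / t1


def speedup(t1: float, tp: float) -> float:
    """S(P) = T(1) / T(P)."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Speedup per process; algebraically equal to 1 / (1 + f(P))."""
    return speedup(t1, tp) / p
```

With hypothetical timings T(1) = 100 s and T(4) = 30 s, f = (4 × 30 − 100) / 100 = 0.2 and efficiency = 100 / 120 ≈ 0.83.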
Current Research
Virtualization overhead
Applications more susceptible to latencies (higher communication/computation ratio) => higher overheads under virtualization
Hadoop shows 15% performance degradation on a private cloud
Latency effect on i-MapReduce is lower compared to MPI due to the coarse-grained tasks?
Fault tolerance for i-MapReduce
Replicated data
Saving state after n iterations
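Saving state after n iterations can be sketched as periodic checkpointing of the driver's variable data: after every n-th iteration the state is written atomically (temporary file, then rename), and on restart the loop resumes from the last checkpoint instead of iteration 0. This is a local illustration under stated assumptions: the state is JSON-serializable, `step` is a deterministic per-iteration function, and the function names and file format are hypothetical, not i-MapReduce's actual fault-tolerance mechanism.

```python
import json
import os
import tempfile


def save_checkpoint(path, iteration, state):
    """Write atomically: dump to a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
    os.replace(tmp, path)  # readers never see a half-written checkpoint


def load_checkpoint(path, initial_state):
    """Return (next iteration, state); start from scratch if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, initial_state
    with open(path) as f:
        data = json.load(f)
    return data["iteration"], data["state"]


def run_with_checkpoints(step, initial_state, total_iters, every_n, path):
    """Apply step() total_iters times, saving the state after every every_n iterations."""
    start, state = load_checkpoint(path, initial_state)
    for it in range(start, total_iters):
        state = step(state)
        if (it + 1) % every_n == 0:
            save_checkpoint(path, it + 1, state)
    return state
```

If a failure interrupts the run, calling `run_with_checkpoints` again with the same arguments repeats at most n − 1 completed iterations; a smaller n shortens recovery at the cost of more frequent writes.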
Related Work
General MapReduce references: Google MapReduce, Apache Hadoop, Microsoft DryadLINQ
Pregel: Large-scale graph computing at Google
Sector/Sphere
All-Pairs
SAGA: MapReduce
Disco
Contributions
Programming model for iterative MapReduce computations
i-MapReduce implementation
MapReduce algorithms/implementations for a series of scientific applications
Applicability of cloud runtimes to different classes of data/compute intensive applications
Comparison of cloud runtimes with MPI
Virtualization overhead of HPC applications and cloud runtimes
Publications
Jaliya Ekanayake (Advisor: Geoffrey Fox), Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing, accepted for the Doctoral Showcase, SuperComputing 2009.
Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thilina Gunarathne, Geoffrey Fox, Roger Barga, Dennis Gannon, Cloud Technologies for Bioinformatics Applications, accepted for publication in the 2nd ACM Workshop on Many-Task Computing on Grids and Supercomputers, SuperComputing 2009.
Jaliya Ekanayake, Atilla Soner Balkir, Thilina Gunarathne, Geoffrey Fox, Christophe Poulain, Nelson Araujo, Roger Barga, DryadLINQ for Scientific Analyses, accepted for publication in the Fifth IEEE International Conference on e-Science (eScience 2009), Oxford, UK.
Jaliya Ekanayake and Geoffrey Fox, High Performance Parallel Computing with Clouds and Cloud Technologies, First International Conference on Cloud Computing (CloudComp 2009), Munich, Germany. An extended version of this paper will appear as a book chapter.
Geoffrey Fox, Seung-Hee Bae, Jaliya Ekanayake, Xiaohong Qiu, and Huapeng Yuan, Parallel Data Mining from Multicore to Cloudy Grids, High Performance Computing and Grids workshop, 2008. An extended version of this paper will appear as a book chapter.
Jaliya Ekanayake, Shrideep Pallickara, Geoffrey Fox, MapReduce for Data Intensive Scientific Analyses, Fourth IEEE International Conference on eScience, 2008, pp. 277-284.
Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox, A collaborative framework for scientific data analysis and visualization, Collaborative Technologies and Systems (CTS08), 2008, pp. 339-346.
Shrideep Pallickara, Jaliya Ekanayake, and Geoffrey Fox, A Scalable Approach for the Secure and Authorized Tracking of the Availability of Entities in Distributed Systems, 21st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2007).
Acknowledgements
My Ph.D. Committee: Prof. Geoffrey Fox, Prof. Andrew Lumsdaine, Prof. Dennis Gannon, Prof. David Leake
SALSA Team @ IU, especially: Judy Qiu, Scott Beason, Thilina Gunarathne, Hui Li
Microsoft Research: Roger Barga, Christophe Poulain
Parallel Runtimes – DryadLINQ vs. Hadoop
[Table: cluster configurations for DryadLINQ, Hadoop / MPI, and DryadLINQ / MPI]

Editor's Notes

  1. Currently uses NaradaBrokering, but it is easily extensible to use any other pub/sub message infrastructure such as Apache ActiveMQ.