DS
RC

Data Science
Research Center

High Performance Distributed
Computing
Henri Bal
Vrije Universiteit Amsterdam

Outline

1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions

Developments

• Multiple types of data explosions:
– Big data: huge processing/transportation demands
– Complex heterogeneous data

[Figure captions: "10-100× global internet traffic per year, exascale processing"; "Complex data"]

Developments

• Infrastructure explosion
– High complexity: heterogeneous systems with
diversity of processors, systems, networks

VU HPDC GROUP

• Bridge the gap between demanding
applications and complex infrastructure
• Distributed programming systems for
– Clusters, grids, clouds
– Heterogeneous systems (“Jungles”)
– Accelerators (GPUs)
– Clouds & mobile devices

• Applications: multimedia, semantic web,
model checking, games, astronomy,
astrophysics, climate modeling, …

Highlights VU-HPDC group

• Solved Awari (2002): 889 billion game states
• AAAI-VC 2007
• 3rd Prize: ISWC 2008
• DACH 2008 - BS
• DACH 2008 - FT
• 1st Prize: SCALE 2008
• 1st Prize: SCALE 2010
• EYR 2011: Sustainability award

[Figure labels: multimedia data, semantic web, astronomy data]

Links to data science cycle

[Figure: the data science cycle with three stages (Store and process; Analyze and model; Understand and decide) and the contributing disciplines: Large Scale Databases, Distributed Processing, Software Eng., System / Network Eng., Information Retrieval, Machine Learning, Multimedia Retrieval, Modeling and simulation, Reasoning, Knowledge representation, Visual Analytics, Perception, Cognition, Decision Theory. Additional label: Distributed reasoning.]

Reasoning – Semantic Web

• Make the Web smarter by injecting meaning
so that machines can “understand” it.
o initial idea by Tim Berners-Lee in 2001

• Has now attracted the interest of big IT
companies

Google Example

Distributed Reasoning

• WebPIE: web-scale distributed reasoner
doing full materialization
• QueryPIE: distributed reasoning with
backward-chaining + pre-materialization of
schema-triples
• DynamiTE: maintains materialization after
updates (additions & removals)
→ Challenge: real-time incremental
reasoning on web scale, combining new
(streaming) data & existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen

COMMIT/
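
The difference between full materialization (forward chaining) and backward chaining can be sketched on a toy scale. The following is a minimal single-machine illustration, not WebPIE or QueryPIE code: the two RDFS-style rules, the function names `materialize` and `types_backward`, and the `ex:` example triples are all assumptions chosen for the sketch. The backward variant indexes the schema (subclass) triples up front, which is only loosely analogous to pre-materializing schema triples.

```python
# Toy illustration only: a tiny in-memory RDFS-style reasoner.
# It is NOT WebPIE or QueryPIE code; all names below are hypothetical.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples):
    """Forward chaining: apply two rules until no new triple appears."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == SUBCLASS:
                    # Transitivity: (A sub B), (B sub C) => (A sub C)
                    derived.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # Inheritance: (x type B), (B sub C) => (x type C)
                    derived.add((s, TYPE, o2))
        if derived <= closure:
            return closure  # fixpoint reached: nothing new was derived
        closure |= derived


def types_backward(triples, instance):
    """Backward chaining: derive the types of one instance at query time."""
    # Index the schema triples once, so the query only walks class links.
    superclasses = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            superclasses.setdefault(s, set()).add(o)
    found = set()
    stack = [o for s, p, o in triples if s == instance and p == TYPE]
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(superclasses.get(cls, ()))
    return found


facts = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}

closure = materialize(facts)
forward_types = {o for s, p, o in closure if s == "ex:alice" and p == TYPE}
print(sorted(forward_types))
print(sorted(types_backward(facts, "ex:alice")))
```

Both approaches return the same types for `ex:alice`. Materialization pays the derivation cost once and stores every inferred triple, while the backward query stores nothing extra and does the work per query.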

Distributed Computing

• Jungle computing with Ibis
– Distributed, heterogeneous, hierarchical systems

• Programming accelerators

With: NLeSC (Frank Seinstra, Rob van Nieuwpoort et al.)

Ibis

• Computational Astrophysics (Leiden)
– AMUSE: gravitational dynamics, stellar evolution, radiative transport, hydrodynamics
• Climate Modeling (Utrecht)
• Multimedia Content Analysis (UvA)

Accelerators (GPUs)

[Figure: GPU architecture block diagram; labels: Host Interface, GigaThread Engine, GPC, SM, Polymorph Engine, Raster Engine, L2 Cache, Memory Controller]

• Methodology for efficient GPU programming
– Stepwise refinement, different levels of hardware abstraction
– Compiler feedback at each level
• Use cases
– Multimedia content analysis
– Climate modeling
– LOFAR (pulsar pipelines)
→ Challenge: getting a grip on performance

Glasswing: MapReduce
on Accelerators

• Use accelerators (OpenCL) as mainstream
feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Maintain MapReduce abstraction

With: Ismail El Helw, Rutger Hofman, UvA-SNE
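
The MapReduce abstraction that Glasswing maintains can be illustrated with a minimal sequential sketch. This is plain Python, not Glasswing's API and not OpenCL: the helper `map_reduce` and the word-count functions are hypothetical names used only to show the programming model (a user-supplied map function emits key/value pairs, the framework groups them by key, and a user-supplied reduce function combines each group).

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Sequential stand-in for a MapReduce framework (illustration only)."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # the "shuffle": group values by key
    # Reduce phase: combine all values that share a key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    """Sum the partial counts for one word."""
    return sum(counts)


lines = ["big data", "complex data", "big compute"]
counts = map_reduce(lines, word_count_map, word_count_reduce)
print(counts)
```

The user code consists only of the two small functions; where and how they run (CPU, accelerator, one node or many) is left to the framework, which is the property the slide refers to as maintaining the abstraction.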

Glasswing Pipeline

• Overlaps computation, communication &
disk access
• Supports multiple buffering levels
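
The idea of overlapping stages through buffering can be sketched with threads and bounded queues. This is a structural illustration under stated assumptions, not Glasswing's implementation: `run_pipeline`, `stage`, and the three example stages (`load`, `compute`, `send`) are hypothetical, and the `buffers` parameter stands in for the buffering level (2 corresponds to double buffering between adjacent stages). Python threads show the structure only; they do not demonstrate the performance benefit.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and pass results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(chunks, stages, buffers=2):
    """Push chunks through the stages; each stage runs in its own thread."""
    # One bounded queue between each pair of neighbours: a full queue makes
    # the producer wait, an empty one makes the consumer wait.
    queues = [queue.Queue(maxsize=buffers) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    results = []

    def collect():
        while True:
            item = queues[-1].get()
            if item is _DONE:
                return
            results.append(item)

    threads.append(threading.Thread(target=collect))
    for t in threads:
        t.start()
    for chunk in chunks:
        queues[0].put(chunk)  # blocks while the first buffer is full
    queues[0].put(_DONE)
    for t in threads:
        t.join()
    return results


def load(n):
    """Stand-in for a disk read: produce a chunk of n numbers."""
    return list(range(n))


def compute(chunk):
    """Stand-in for the compute kernel."""
    return sum(chunk)


def send(value):
    """Stand-in for communication: format the value for output."""
    return f"result={value}"


outputs = run_pipeline([3, 4, 5], [load, compute, send])
print(outputs)
```

While one chunk is being computed, the next can already be loading and the previous one can be sent, because each stage only waits on its own input and output buffers. Results keep their input order here because every stage is a single thread reading from a FIFO queue.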

Evaluation (DAS-4, EC2)

• Compute-bound applications benefit
dramatically from GPUs (up to 107×)
• Better scalability than Hadoop
• Runs on a variety of accelerators & clouds

→ Challenge: real-world (compute-intensive) applications

Conclusions

• Strong links with Big data & Complex data

[Figure: the data science cycle with three stages (Store and process; Analyze and model; Understand and decide) and the contributing disciplines, repeated from the "Links to data science cycle" slide]
