The High Performance Computing (HPC) community is facing a technology shift that will boost performance by three orders of magnitude within the next five years. This increase will come mainly from higher levels of concurrency, to the point where users of these systems must adapt to billion-way parallelism. The main problems to solve are programmability, portability, energy efficiency and resiliency. The author believes that leveraging modern C++ offers a possible solution to these problems from a software perspective.
This talk will discuss the use of C++ in such a massively parallel environment, using the HPX parallel runtime system - a future-based API - as a lightweight and efficient mechanism for expressing massive, multi-way parallelism.
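HPX itself is a C++ library, but the future-based model it champions - launch work asynchronously, get a future back, and synchronize only when the value is needed - can be sketched in a language-neutral way. The following illustrative snippet (workload and names invented here) mirrors that pattern with Python's standard-library concurrent.futures:

```python
# Illustrative sketch of the future-based model that HPX exposes in C++
# (hpx::async returning an hpx::future), rendered here with Python's
# stdlib concurrent.futures purely for demonstration.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Launch independent tasks; each submit() returns immediately with a future.
    futures = [pool.submit(square, n) for n in range(8)]
    # Synchronize only when the values are actually needed.
    total = sum(f.result() for f in futures)

print(total)  # sum of squares 0..7 = 140
```

In HPX the analogous calls are hpx::async and hpx::future, with the added ability to chain continuations and distribute tasks across nodes.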
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...) - EUDAT
Stefano will give an introduction to the most common and widely used programming models for performing parallel I/O on supercomputers. He will first give a broad overview of parallel APIs for programming I/O on supercomputers. He will then introduce MPI I/O, one of the most widely used programming interfaces for parallel I/O, presenting its basic concepts and providing programming examples and guidelines for achieving high-performance I/O on supercomputers.
Visit: https://www.eudat.eu/eudat-summer-school
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache... (Accumulo Summit)
Talk Abstract
Collecting and analyzing large amounts of data is a growing challenge in the scientific community. The growing gap between data and users calls for innovative tools that address the challenges of big data: volume, velocity and variety.
This tutorial aims to provide researchers and practitioners with a range of tools and techniques that they can use in conjunction with Apache Accumulo to close this gap. The tutorial will focus on building solid fundamentals using a rapid prototyping tool – the Dynamic Distributed Dimensional Data Model (D4M) – to quickly prototype new algorithms that can be tested with Apache Accumulo. The tutorial is suitable for participants at all levels of experience with Apache Accumulo. It will begin with a general introduction to the big data landscape in order to align terminology and provide a unified view of the system regardless of participant background. The tutorial will then discuss systems engineering and how it applies to big data systems. We will then introduce D4M and provide examples of D4M being used for analytics such as dimensional analysis and background model fitting. We will then discuss current areas of research on security and privacy as well as graph algorithms. Tutorial slides will be distributed to participants and brief demonstrations will be used to reinforce concepts.
The goals of the tutorial are 1) to provide participants with a theoretical foundation of big data; 2) to demonstrate how Accumulo can be used to solve real problems from diverse domains; and 3) to describe future avenues of research. This tutorial provides a deep dive into the topics presented at the 2014 Accumulo Summit in the presentation entitled “Addressing Big Data Challenges through Innovative Architecture, Databases and Software”.
Speakers
Vijay Gadepally
Technical Staff, Lincoln Laboratory, MIT
Lauren Edwards
Associate Technical Staff, Lincoln Laboratory, MIT
Jeremy Kepner
Senior Technical Staff, Lincoln Laboratory, MIT
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis (Jonas Traub)
This is our presentation for the German paper "Die Apache Flink Plattform zur parallelen Analyse von Datenströmen und Stapeldaten" ("The Apache Flink Platform for Parallel Analysis of Data Streams and Batch Data"), published in the Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, Trier, Germany, 7-9 October 2015. Link: http://ceur-ws.org/Vol-1458/H02_CRC79_Traub.pdf
A relatively short introduction to R, as presented at the Belgian Software Craftsmanship meetup group.
The goal of this presentation is to give you an introduction to:
• The style of the language
• Its ecosystem
• How common things like data manipulation and visualization work
• How to use it for machine learning
• Web development and report generation in R
• Integrating R in your system
License:
Introduction To R by Samuel Bosch
To the extent possible under law, the person who associated CC0 with Introduction To R has waived all copyright and related or neighboring rights
to Introduction To R.
http://creativecommons.org/publicdomain/zero/1.0/
What's new in 1.9.0 blink planner - Kurt Young, Alibaba (Flink Forward)
Flink 1.9.0 added the ability to support multiple SQL planners under the same API. With this capability, we merged many features from Blink, Alibaba's internal Flink version. In this talk, I will introduce the architecture of the Blink planner and share the functionality and performance enhancements we added.
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application (Apache Apex)
This webinar will be a hands-on demonstration of how to clone and build the Apache Apex source code repositories, how to run the Maven archetype to create a new Apex project, how to enhance it to build a word-counting application and, finally, how to run it and view the results. We will also do a brief code walkthrough.
Bio:
Dr. Munagala V. Ramanath is a Committer for Apache Apex and a Software Engineer at DataTorrent. He has many years of experience working for a variety of companies in California and holds a Ph.D. in Computer Science from the University of Wisconsin-Madison.
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data) - Apache Apex
Presenter:
Priyanka Gugale, Committer for Apache Apex and Software Engineer at DataTorrent.
In this session we will give an introduction to YARN, look at the YARN architecture, and walk through the YARN application lifecycle. We will also learn how Apache Apex runs as a YARN application on Hadoop.
The 5 People in your Organization that grow Legacy Code (Roberto Cortez)
Have you ever looked at a random piece of code and wanted to rewrite it so badly? It’s natural to have legacy code in your application at some point. It’s something that you need to accept and learn to live with. So is this a lost cause? Should we just throw in the towel and give up? Hell no! Over the years, I learned to identify 5 main creators/enablers of legacy code on the engineering side, which I’m sharing here with you using real development stories (with a little humour in the mix). Learn to keep them in line and your code will live longer!
Introduction to Apache Apex and writing a big data streaming application (Apache Apex)
Introduction to Apache Apex - The next generation native Hadoop platform, and writing a native Hadoop big data Apache Apex streaming application.
This talk will cover details about how Apex can be used as a powerful and versatile platform for big data. Apache Apex is being used in production by customers for both streaming and batch use cases. Common uses of Apache Apex include big data ingestion, streaming analytics, ETL, fast batch, alerts, real-time actions, threat detection, etc.
Presenter: Pramod Immaneni, Apache Apex PPMC member and senior architect at DataTorrent Inc., where he works on Apex and specializes in big data applications. Prior to DataTorrent he was a co-founder and CTO of Leaf Networks LLC, eventually acquired by Netgear Inc., where he built products in the core networking space and was granted patents in peer-to-peer VPNs. Before that he was a technical co-founder of a mobile startup, where he was an architect of a dynamic content rendering engine for mobile devices.
This is a video of the webcast of an Apache Apex meetup event organized by Guru Virtues at 267 Boston Rd no. 9, North Billerica, MA, on May 7th, 2016, and broadcast from San Jose, CA. If you are interested in helping organize the Apache Apex community (i.e., hosting, presenting, or community leadership), please email apex-meetup@datatorrent.com
Apache Hadoop: design and implementation. Lecture in the Big data computing course (http://twiki.di.uniroma1.it/twiki/view/BDC/WebHome), Department of Computer Science, Sapienza University of Rome.
Slides from the Introduction to UNIX Command-Lines class from the BTI Plant Bioinformatics course 2014. This is a course taught by the Sol Genomics Network researchers at the Boyce Thompson Institute.
Measuring the time spent on small individual fractions of program code is a common technique for analysing performance behaviour and detecting performance bottlenecks. The benefits of the approach include a detailed individual attribution of performance and understandable feedback loops when experimenting with different code versions. There are, however, severe pitfalls in this approach that can lead to vastly misleading results. Modern dynamic compilers use complex optimisation techniques that take a large part of the program into account. There can therefore be unexpected side-effects when combining different code snippets, or even when running a presumably unrelated part of the code. This talk will present performance paradoxes with examples from the domain of dynamic compilation of Java programs. Furthermore, it will discuss an alternative approach to modelling code performance characteristics that takes the challenges of complex optimising compilers into account.
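As a small illustration of the measurement discipline the abstract alludes to - repeated measurements rather than one-shot timing - here is a generic Python sketch (the talk's setting is Java and JIT compilation; this snippet only demonstrates the repetition idea, not JIT effects):

```python
# One-shot timing of a tiny code fragment is dominated by noise (and, under
# a dynamic compiler, by compilation effects). Repeating the measurement and
# taking the best of several runs gives a far more stable per-call number.
import timeit

stmt = "sum(i * i for i in range(1000))"

one_shot = timeit.timeit(stmt, number=1)                           # single noisy sample
per_call = min(timeit.repeat(stmt, number=1000, repeat=5)) / 1000  # best-of-5, per call

print(f"one-shot: {one_shot:.2e}s  best-of-5 per call: {per_call:.2e}s")
```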
The Download: Tech Talks by the HPCC Systems Community, Episode 11 (HPCC Systems)
Join us as we continue this series of webinars designed for the community by the community, with the goal of sharing knowledge, sparking innovation, and further building and linking the relationships within our HPCC Systems community.
Episode 11 includes Tech Talks featuring speakers from our community on topics covering Big Data solutions, Spark Integration and other ECL Tips leveraging the HPCC Systems platform.
1) Raj Chandrasekaran, CTO & Co-Founder, ClearFunnel - Scaling Data Science capabilities: Leveraging a homogeneous Big Data ecosystem
2) James McMullan, Software Engineer III, LexisNexis Risk Solutions - HDFS Connector Preview
3) Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions - Building a RELATIONal Dataset - A Valentine’s Day Special!
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH... (HPCC Systems)
HPCC (High Performance Computing Cluster) Systems from LexisNexis is an open source massive parallel-processing computing platform that solves Big Data problems. In this talk, attendees will be given an overview of HPCC Systems and see a demonstration of its use to parse data from free-form and semi-structured text. This represents a combined text extraction task with human intervention. The code elements and massively parallel processing principles involved in accomplishing these tasks will be thoroughly discussed.
Blending Supersonic, Subatomic Java with deep learning to perform object detection. Sounds interesting? Because it is! Then watch this session to learn how to create a microservice combining TensorFlow and Quarkus together into one executable using GraalVM native image, JNI, and Protobuf. With this, we detect objects in photos by returning labels, bounding boxes, and confidence scores. Additionally, we will touch on Open Data Hub, an AI/ML solution for OpenShift.
HDR Defence - Software Abstractions for Parallel Architectures (Joel Falcou)
Performing large, intensive or non-trivial computing on array-like data structures is one of the most common tasks in scientific computing, video game development and other fields, as evidenced by the large number of tools, languages and libraries for such tasks. If we restrict ourselves to C++-based solutions, more than a dozen such libraries exist, from BLAS/LAPACK C++ bindings to the template metaprogramming-based Blitz++ or Eigen.
While all of these libraries provide good performance or good abstraction, none of them seems to fit the needs of so many different user types. Moreover, as parallel system complexity grows, maintaining all those components quickly becomes unwieldy. This thesis explores various software design techniques - such as generative programming, metaprogramming and generic programming - and their application to the implementation of parallel computing libraries in such a way that abstraction and expressiveness are maximized while efficiency overhead is minimized.
This workshop will provide a hands-on introduction to Apache Spark and Apache Zeppelin in the cloud.
Format: A short introductory lecture on Apache Spark covering core modules (SQL, Streaming, MLlib, GraphX), followed by a demo and a Q&A session, then lab time to work through the lab exercises and ask questions.
Objective: To provide a quick, hands-on introduction to Apache Spark. This lab will use the following Spark and Apache Hadoop components: Spark, Spark SQL, Apache Hadoop HDFS, Apache Hadoop YARN, Apache ORC, Apache Ambari, and Apache Zeppelin. You will learn how to move data into HDFS using Spark APIs, create Apache Hive tables, explore the data with Spark and Spark SQL, transform the data and then issue some SQL queries.
Lab pre-requisites: Registrants must bring a laptop with a Chrome or Firefox web browser installed (with proxies disabled). Alternatively, they may download and install an HDP Sandbox as long as they have at least 16GB of RAM available (Note that the sandbox is over 10GB in size so we recommend downloading it before the crash course).
Speakers: Robert Hryniewicz
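The lab's create-table / explore / transform flow can be sketched in miniature. Since Spark itself is not assumed to be installed here, the snippet below stands in Python's built-in sqlite3 for Spark SQL (an explicit substitution; in the lab the same kind of SQL would be issued through spark.sql(...) against Hive tables):

```python
# The lab's SQL workflow -- load data, create a table, transform, query --
# sketched with Python's built-in sqlite3 standing in for Spark SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, miles REAL)")  # cf. a Hive table
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("SF", 3), ("SF", 1), ("NY", 7)])         # cf. loading into HDFS

# Transform + query: aggregate miles per city, as the lab does with Spark SQL.
rows = conn.execute(
    "SELECT city, SUM(miles) AS total FROM trips GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('NY', 7.0), ('SF', 4.0)]
```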
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: Options and Trade-offs" tutorial at the May 2018 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to rapidly evolve. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT. Inferencing engines are being fed via neural network file formats such as NNEF and ONNX. Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project?
In this presentation, Trevett presents the current landscape of APIs, file formats and SDKs for inferencing and vision acceleration, explaining where each one fits in the development flow. Trevett also highlights where these APIs overlap and where they complement each other, and previews some of the latest developments in these APIs.
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages (Akihiro Hayashi)
With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems meet the requirement of exascale systems, where a large number of asynchronous tasks are executed.
While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR- and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.
Slides from the talk:
Aleš Zamuda. EuroHPC AI in DAPHNE. Severo Ochoa Research Seminars. 12/Sep/2023, 1-3-2 Room, BSC Main Building and Zoom. Barcelona Supercomputing Center, Barcelona, Spain.
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds (Alluxio, Inc.)
Alluxio foresaw the need for agility when accessing data across silos separated from compute engines like Spark, Presto, Tensorflow and PyTorch. Embracing the separation of storage from compute, the Alluxio data orchestration platform simplifies adoption of the data lake and data mesh paradigm for analytics and AI/ML. In this talk, Bin Fan will share observations to help identify ways to use the platform to meet the needs of your data environment and workloads.
More and more enterprise architectures have shifted to hybrid- and multi-cloud environments. While this shift brings greater flexibility and agility, it also means compute must be separated from storage, which raises new challenges for managing and orchestrating data across frameworks, clouds, and storage systems. This talk gives the audience an in-depth look at how the Alluxio data-orchestration concept decouples storage and compute in the data platform, and at the innovative architecture data orchestration proposes for separated storage and compute, illustrated with typical application scenarios from the finance, telecom, and internet industries showing how Alluxio truly accelerates big data computing and how data-orchestration technology can be used for AI model training.
At the technology meeting of the Association of Independent Research Centers (http://airi.org): An overview of recent Scientific Computing activities at Fred Hutch, Seattle
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/08/khronos-group-standards-powering-the-future-of-embedded-vision-a-presentation-from-the-khronos-group/
Neil Trevett, Vice President of Developer Ecosystems at NVIDIA and President of the Khronos Group, presents the “Khronos Group Standards: Powering the Future of Embedded Vision” tutorial at the May 2021 Embedded Vision Summit.
Open standards play an important role in enabling interoperability for faster, easier deployment of vision-based systems. With advances in machine learning, the number of accelerators, processors, libraries and compilers in the market is rapidly increasing. Proprietary APIs and formats create a complex industry landscape that can hinder overall market growth.
The Khronos Group’s open standards for accelerating parallel programming play a major role in deploying inferencing and embedded vision applications and include SYCL, OpenVX, NNEF, Vulkan, SPIR, and OpenCL. Trevett provides an up-to-the-minute overview and update on the Khronos embedded vision ecosystem, highlighting the capabilities and benefits of each API, giving viewers insight into which standards may be relevant to their own embedded vision projects, and discussing the future directions of these key industry initiatives.
Similar to "C++ on its way to exascale and beyond -- The HPX Parallel Runtime System"
Hand Rolled Applicative User Validation Code Kata (Philip Schwarz)
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather to provide a small, rough-and-ready exercise to reinforce your muscle memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
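For readers without the deck at hand, the idea behind those operators can also be rendered in Python (a loose, illustrative translation; the encoding and names below are mine, not the deck's): a validated value is either a success or a list of errors, and applying a wrapped function to a wrapped argument accumulates all failures, which is the essence of <*> in the Scala original.

```python
# A stripped-down Python rendition of applicative validation: a value is
# either ("ok", v) or ("err", [messages]); `ap` applies a wrapped function
# to a wrapped argument, accumulating *all* errors along the way.
def ok(v):  return ("ok", v)
def err(m): return ("err", [m])

def ap(vf, va):
    """Apply validated function vf to validated argument va."""
    if vf[0] == "err" and va[0] == "err":
        return ("err", vf[1] + va[1])          # accumulate both error lists
    if vf[0] == "err": return vf
    if va[0] == "err": return va
    return ("ok", vf[1](va[1]))

def validate_name(s):
    return ok(s) if s.isalpha() else err(f"bad name: {s!r}")

def validate_age(n):
    return ok(n) if 0 <= n <= 150 else err(f"bad age: {n}")

# Curried constructor, applied applicatively to two validated fields.
make_user = ok(lambda name: lambda age: {"name": name, "age": age})

good = ap(ap(make_user, validate_name("Ada")), validate_age(36))
bad  = ap(ap(make_user, validate_name("4da")), validate_age(-1))
print(good)  # ('ok', {'name': 'Ada', 'age': 36})
print(bad)   # ('err', ["bad name: '4da'", 'bad age: -1'])
```

Note how the failing case reports both errors at once, rather than stopping at the first - the behaviour that distinguishes applicative from monadic validation.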
Graspan: A Big Data System for Big Code Analysis (Aftab Hussain)
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
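Graspan's engine is out-of-core and grammar-guided, but the core edge-pair idea - join every edge (a, b) with edges (b, c) to derive (a, c), repeating until a fixpoint - can be sketched in a few lines of in-memory Python (a toy illustration, not the Graspan implementation):

```python
# A toy, in-memory sketch of the computation Graspan performs at scale on
# disk: repeatedly joining edge pairs (a->b, b->c) to add derived edges
# a->c until no new edges appear -- i.e., the transitive closure of a graph.
from collections import defaultdict

def transitive_closure(edges):
    succ = defaultdict(set)                  # successor sets per vertex
    for a, b in edges:
        succ[a].add(b)
    worklist = list(edges)
    closure = set(edges)
    while worklist:
        a, b = worklist.pop()
        for c in list(succ[b]):              # join (a->b) with every (b->c)
            if (a, c) not in closure:        # newly derived edge a->c
                closure.add((a, c))
                succ[a].add(c)
                worklist.append((a, c))      # derived edges join again later
    return closure

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The worklist re-enqueues derived edges so each one is joined against the successor sets in turn; Graspan's contribution is making this join scale to billions of edges by processing edge partitions from disk.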
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Atelier - Innover avec l’IA Générative et les graphes de connaissancesNeo4j
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Allez au-delà du battage médiatique autour de l’IA et découvrez des techniques pratiques pour utiliser l’IA de manière responsable à travers les données de votre organisation. Explorez comment utiliser les graphes de connaissances pour augmenter la précision, la transparence et la capacité d’explication dans les systèmes d’IA générative. Vous partirez avec une expérience pratique combinant les relations entre les données et les LLM pour apporter du contexte spécifique à votre domaine et améliorer votre raisonnement.
Amenez votre ordinateur portable et nous vous guiderons sur la mise en place de votre propre pile d’IA générative, en vous fournissant des exemples pratiques et codés pour démarrer en quelques minutes.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
C++ on its way to exascale and beyond -- The HPX Parallel Runtime System
1. C++ on its way to exascale and beyond – The HPX Parallel Runtime System
Thomas Heller (thomas.heller@cs.fau.de)
January 21, 2016
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 671603
2. What is Exascale anyway?
3. Exascale in numbers
• An Exascale computer is supposed to execute 10^18 floating point operations in a second
• Exa: 10^18 = 1,000,000,000,000,000,000
• People on Earth: 7.3 billion = 7.3 × 10^9
• Imagine each person is able to compute one operation per second. It takes:
⇒ 136,986,301 seconds
⇒ 2,283,105 minutes
⇒ 38,051 hours
⇒ 1,585 days
⇒ about 4 years
C++ on its way to exascale and beyond – The HPX Parallel Runtime System
21.01.2016 | Thomas Heller |
3/ 51
4. Why do we need that many calculations?
12. Challenges
• How do we program those beasts?
⇒ Massively parallel processors
⇒ Massive numbers of compute nodes
⇒ Deep memory hierarchies
• How can we design the architecture to be affordable?
⇒ The biggest operational cost is energy
⇒ Power envelope of 20 MW
⇒ Current fastest computer (Tianhe-2): 17 MW
13. Current Development
Current #1 system:
• Tianhe-2: 33.9 PFLOPS
• ≈ 3.4% of an Exaflop
14. Hardware Trends
• ARM: Low-power ARM64 cores (maybe adding embedded GPU accelerators)
• IBM: POWER + NVIDIA accelerators
• Intel: Knights Landing (Xeon Phi) many-core processor
15. How will C++ deal with all that?!?
16. Challenges
• Programmability
• Expressing Parallelism
• Expressing Data Locality
17. The 4 Horsemen of the Apocalypse: SLOW
• Starvation
• Latency
• Overhead
• Waiting for contention
18. State of the Art
• Modern architectures impose massive challenges on programmability in the context of performance portability
  • Massive increase in on-node parallelism
  • Deep memory hierarchies
• The only portable parallelization solutions for C++ programmers (today): OpenMP and MPI
  • Hugely successful for years
  • Widely used and supported
  • Simple use for simple use cases
  • Very portable
  • Highly optimized
19. State of the Art – Parallelism in C++
• C++11 introduced lower-level abstractions
  • std::thread, std::mutex, std::future, etc.
  • Fairly limited, more is needed
• C++ needs stronger support for higher-level parallelism
• Several proposals to the Standardization Committee are accepted or under consideration
  • Technical Specification: Concurrency (P0159, note: misnomer)
  • Technical Specification: Parallelism (P0024)
  • Other smaller proposals: resumable functions, task regions, executors
• Currently there is no overarching vision related to higher-level parallelism
  • Goal is to standardize a 'big story' by 2020
  • No need for OpenMP, OpenACC, OpenCL, etc.
20. Stepping Aside – Introducing HPX
21. HPX – A general purpose parallel Runtime System
• Solidly based on a theoretical foundation – a well defined, new execution model (ParalleX)
• Exposes a coherent and uniform, standards-oriented API for ease of programming parallel and distributed applications
• Enables writing fully asynchronous code using hundreds of millions of threads
• Provides unified syntax and semantics for local and remote operations
• Open source: published under the Boost Software License
22. HPX – A general purpose parallel Runtime System
HPX represents an innovative mixture of
• A global system-wide address space (AGAS – Active Global Address Space)
• Fine-grain parallelism and lightweight synchronization
• Combined with implicit, work-queue-based, message-driven computation
• Full semantic equivalence of local and remote execution, and
• Explicit support for hardware accelerators (through percolation)
23. HPX 101 – The programming model
[Diagram, built up across slides 23–27: localities 0 … N−1, each with its own memory, are joined into a single Global Address Space; a parcelport and the Active Global Address Space (AGAS) service connect the localities, and each locality runs a thread scheduler managing a large number of lightweight threads.]
Remote objects are created and used through futures:

future<id_type> id = new_<Component>(locality, ...);
future<R> result = async(id.get(), action, ...);
28. HPX 101 – Overview
[Diagram: HPX layered on top of the C++ Standard Library and C++.]
For a function R f(p...), with actions declared via HPX_ACTION(f, a):

                      Synchronous          Asynchronous                 Fire & Forget
                      (returns R)          (returns future<R>)          (returns void)
Functions (direct)    f(p...)              async(f, p...)               apply(f, p...)
Functions (lazy)      bind(f, p...)(...)   async(bind(f, p...), ...)    apply(bind(f, p...), ...)
Actions (direct)      a()(id, p...)        async(a(), id, p...)         apply(a(), id, p...)
Actions (lazy)        bind(a(), id, p...)(...)   async(bind(a(), id, p...), ...)   apply(bind(a(), id, p...), ...)

In addition: dataflow(func, f1, f2);
29. The Future, an example

int universal_answer() { return 42; }

void deep_thought() {
    future<int> promised_answer = async(&universal_answer);
    // do other things for 7.5 million years
    cout << promised_answer.get() << endl;   // prints 42, eventually
}
30. Compositional facilities
• Sequential composition of futures

future<string> make_string() {
    future<int> f1 = async([]() -> int { return 123; });
    future<string> f2 = f1.then(
        [](future<int> f) -> string
        {
            // here .get() won't block
            return to_string(f.get());
        });
    return f2;
}
31. Compositional facilities
• Parallel composition of futures

future<int> test_when_all() {
    future<int> future1 = async([]() -> int { return 125; });
    future<string> future2 = async([]() -> string { return string("hi"); });
    auto all_f = when_all(future1, future2);
    future<int> result = all_f.then(
        [](auto f) -> int {
            return do_work(f.get());
        });
    return result;
}
32. Dataflow – The new 'async' (HPX)
• What if one or more arguments to 'async' are futures themselves?
• Normal behavior: pass futures through to the function
• Extended behavior: wait for the futures to become ready before invoking the function:

template <typename F, typename... Args>
future<result_of_t<F(Args...)>>
// requires(is_callable<F(Args...)>)
dataflow(F&& f, Args&&... args);

• If ArgN is a future, then the invocation of F will be delayed
• Non-future arguments are passed through
33. Parallel Algorithms
34. Concepts of Parallelism – Parallel Execution Properties
• The execution restrictions applicable for the work items
• In what sequence the work items have to be executed
• Where the work items should be executed
• The parameters of the execution environment
35. Concepts and Types of Parallelism
[Diagram, built up across slides 35–40: the Application expresses parallelism through Concepts (Futures, Async, Dataflow; Parallel Algorithms; Fork-Join, etc.), which map onto Execution Policies (restrictions), Executors (sequence, where) and Executor Parameters (grain size).]
41. Execution Policies (std)
• Specify execution guarantees (in terms of thread safety) for executed parallel tasks:
  • sequential_execution_policy: seq
  • parallel_execution_policy: par
  • parallel_vector_execution_policy: par_vec
• In the Parallelism TS used for parallel algorithms only
42. Execution Policies (Extensions)
• Asynchronous execution policies:
  • sequential_task_execution_policy: seq(task)
  • parallel_task_execution_policy: par(task)
• In both cases the formerly synchronous functions return a future<>
• Instruct the parallel construct to be executed asynchronously
• Allows integration with asynchronous control flow
43. Executors
• Executors are objects responsible for
  • Creating execution agents on which work is performed (P0058)
  • In P0058 this is limited to parallel algorithms, here much broader use
• Abstraction of the (potentially platform-specific) mechanisms for launching work
• Responsible for defining the Where and How of the execution of tasks
44. Execution Parameters
Allow control over the grain size of work
• i.e. the number of iterations of a parallel for_each run on the same thread
• Similar to OpenMP scheduling policies: static, guided, dynamic
• Much finer control
45. Putting it all together – SAXPY routine with data locality
• a[i] = b[i] * x + c[i], for i from 0 to N − 1
• Using parallel algorithms
• Explicit control over data locality
• No raw loops
46. Putting it all together – SAXPY routine with data locality
Complete serial version:

std::vector<double> a = ...;
std::vector<double> b = ...;
std::vector<double> c = ...;
double x = ...;
std::transform(b.begin(), b.end(), c.begin(), a.begin(),
    [x](double bb, double cc)
    {
        return bb * x + cc;
    });
47. Putting it all together – SAXPY routine with data locality
Parallel version, no data locality:

std::vector<double> a = ...;
std::vector<double> b = ...;
std::vector<double> c = ...;
double x = ...;
parallel::transform(parallel::par,
    b.begin(), b.end(), c.begin(), a.begin(),
    [x](double bb, double cc)
    {
        return bb * x + cc;
    });
48. Putting it all together – SAXPY routine with data locality
Parallel version, with data locality:

std::vector<double, numa_allocator> a = ...;
std::vector<double, numa_allocator> b = ...;
std::vector<double, numa_allocator> c = ...;
double x = ...;
for (auto& numa_executor : numa_executors) {
    parallel::transform(
        parallel::par.on(numa_executor),
        b.begin() + ..., b.begin() + ...,
        c.begin() + ..., a.begin() + ...,
        [x](double bb, double cc)
        { return bb * x + cc; });
}
49. Case Studies
50. LibGeoDecomp
• Auto-parallelizing C++ framework
• Open source
• High scalability
• Wide range of platform support
• http://www.libgeodecomp.org
51. LibGeoDecomp
Futurizing the Simulation Flow
Basic simulation flow:

for (Region r : innerRegion) {
    update(r, oldGrid, newGrid, step);
}
swap(oldGrid, newGrid);
++step;
for (Region r : outerGhostZoneRegion) {
    notifyPatchProviders(r, oldGrid);
}
for (Region r : outerGhostZoneRegion) {
    update(r, oldGrid, newGrid, step);
}
for (Region r : innerGhostZoneRegion) {
    notifyPatchAccepters(r, oldGrid);
}
52. LibGeoDecomp
Futurizing the Simulation Flow
Futurized simulation flow (pseudocode; the stages are linked by continuations):

parallel for (Region r : innerRegion) {
    update(r, oldGrid, newGrid, step);
}
swap(oldGrid, newGrid); ++step;
parallel for (Region r : outerGhostZoneRegion) {
    notifyPatchProviders(r, oldGrid);
}
parallel for (Region r : outerGhostZoneRegion) {
    update(r, oldGrid, newGrid, step);
}
parallel for (Region r : innerGhostZoneRegion) {
    notifyPatchAccepters(r, oldGrid);
}
53. HPXCL – Extending the Global Address Space
• All GPU devices are addressable globally
• GPU memory can be allocated and referenced remotely
• Events are extensions of the shared state
⇒ API embedded into the already existing future facilities
54. From async to GPUs
Spawning single tasks is not feasible
⇒ offload a work group (think of parallel::for_each)

auto devices = hpx::opencl::find_devices(
    hpx::find_here(), CL_DEVICE_TYPE_GPU).get();
// create buffers, programs and kernels ...
hpx::opencl::buffer buf = devices[0].create_buffer(
    CL_MEM_READ_WRITE, 4711);
auto write_future = buf.enqueue_write(
    some_vec.begin(), some_vec.end());
auto kernel_future = kernel.enqueue(dim, write_future);
55. From async to GPUs
Spawning single tasks is not feasible
⇒ offload a work group (think of parallel::for_each)
• Proof of concept
• Future directions:
  • Embed OpenCL devices behind execution policies and executors
  • Hide OpenCL specifics behind parallel algorithms
  • Hide OpenCL buffer management behind "distributed data structures"
57. Mandelbrot example
Acknowledgements to Martin Stumpf
59. LibGeoDecomp
Performance Results
[Plot: execution times of HPX and MPI N-Body codes (SMP, weak scaling); time in seconds vs. number of cores on one node; series: Sim HPX, Sim MPI, Comm HPX, Comm MPI.]
61. LibGeoDecomp
Performance Results
[Plot: weak scaling results for the HPX N-Body code (single Xeon Phi, futurized); performance in GFLOPS vs. number of cores; series: 1, 2, 3 and 4 threads per core.]
62. LibGeoDecomp
Performance Results
[Plot: weak scaling results for HPX N-Body codes (host cores and Xeon Phi accelerator); performance in TFLOPS vs. number of nodes (16 cores on host, full Xeon Phi); series: HPX, Peak.]
63. STREAM Benchmark
[Plot: TRIAD STREAM results (50 million data points); bandwidth in GB/s vs. number of cores per NUMA domain; series: HPX and OpenMP on 1 and 2 NUMA domains.]
64. Matrix Transpose
[Plot: Matrix Transpose (SMP, 24k×24k matrices); data transfer rate in GB/s vs. number of cores per NUMA domain; series: HPX and OMP on 1 and 2 NUMA domains.]
65. Matrix Transpose
[Plot: Matrix Transpose (SMP, 24k×24k matrices); data transfer rate in GB/s vs. number of cores per NUMA domain; series: HPX (2 NUMA domains), MPI (1 NUMA domain, 12 ranks), MPI (2 NUMA domains, 24 ranks), MPI+OMP (2 NUMA domains).]
66. Matrix Transpose
[Plot: Matrix Transpose (Xeon Phi, 24k×24k matrices); data transfer rate in GB/s vs. number of cores; series: HPX and OMP with 1, 2 and 4 PUs per core.]
67. Matrix Transpose
[Plot: Matrix Transpose (distributed, 18k×18k elements per node); data transfer rate in GB/s vs. number of nodes (16 cores each); series: HPX, MPI.]
68. What’s beyond Exascale?
69. Conclusions
Higher-level parallelization abstractions in C++:
• Uniform, versatile, and generic
• All of this is enabled by the use of modern C++ facilities
• Runtime system (fine-grain, task-based schedulers)
• Performant, portable implementation
70. Parallelism is here to stay!
• Massively parallel hardware is already part of our daily lives!
• Parallelism is observable everywhere:
⇒ IoT: massive numbers of devices existing in parallel
⇒ Embedded: massively parallel, energy-aware systems (Epiphany, DSPs, FPGAs)
⇒ Automotive: massive amounts of parallel sensor data to process
• We all need solutions for how to deal with this, efficiently and pragmatically
71. More Information
• https://github.com/STEllAR-GROUP/hpx
• http://stellar-group.org
• hpx-users@stellar.cct.lsu.edu
• #STE||AR @ irc.freenode.org
Collaborations:
• FET-HPC (H2020): AllScale (https://allscale.eu)
• NSF: STORM (http://storm.stellar-group.org)
• DOE: Part of X-Stack