SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures
Simon Delamare Gilles Fedak Derrick Kondo Oleg Lodygensky
High-Performance Parallel and Distributed Computing, 2012
With the HPC Cloud facility, SURFsara offers self-service, dynamically scalable and fully configurable HPC systems to the Dutch academic community. Users have, for example, a free choice of operating system and software.
The HPC Cloud offers full control over an HPC cluster, with fast CPUs and high-memory nodes, and it is possible to attach terabytes of local storage to a compute node. Because of this flexibility, users can fully tailor the system to a particular application. Long-running and small compute jobs are equally welcome. Additionally, the system facilitates collaboration: users can share control over their virtual private HPC cluster with other users and share processing time, data and results. A portal with a wiki, forums, repositories, an issue system, etc. is offered for collaboration projects as well.
A High-Performance Campus-Scale Cyberinfrastructure for Effectively Bridging ... (Larry Smarr)
10.04.07
Presentation by Larry Smarr to the NSF Campus Bridging Workshop
University Place Conference Center
Title: A High-Performance Campus-Scale Cyberinfrastructure for Effectively Bridging End-User Laboratories to Data-Intensive Sources
Indianapolis, IN
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility (inside-BigData.com)
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int... (Larry Smarr)
11.12.12
Seminar Presentation
Princeton Institute for Computational Science and Engineering (PICSciE)
Princeton University
Title: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
Princeton, NJ
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom... (Larry Smarr)
11.04.06
Joint Presentation
UCSD School of Medicine Research Council
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biomedical Sciences
HDFLook is a software tool developed jointly by NASA GSFC and LOA USTL (France) that allows users to view and analyze Earth science datasets. It provides capabilities to access, visualize, remap, reproject, subset, mosaic and convert files for MODIS, AIRS, and CERES products. HDFLook can be run in interactive, operational and batch modes and supports accessing geophysical and ancillary data for scientific analysis and operational uses. Over 3000 copies of HDFLook have been distributed worldwide to support the user community.
This document describes a performance-aware power capping orchestrator called XeMPUPiL for the Xen hypervisor. It monitors workloads using instrumentation-free techniques and manages power through both hardware and software approaches. The proposed solution uses Xen's hypervisor and hardware events/counters to observe workloads, then decides how to allocate resources and acts by interfacing with RAPL to enforce power caps while maintaining performance. Experimental results show XeMPUPiL outperforms a pure RAPL baseline for I/O, memory, and mixed workloads but suffers on CPU-intensive workloads due to Xen optimizations.
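The observe-decide-act loop described above is easy to picture with a short sketch. The following is a minimal, hypothetical illustration assuming the standard Linux powercap sysfs interface to RAPL (/sys/class/powercap/intel-rapl:0); it is not XeMPUPiL's code, and the decision heuristic is a toy stand-in.

```python
# Minimal observe-decide-act power-capping loop, in the spirit of XeMPUPiL.
# Illustrative sketch only: it assumes the standard Linux powercap sysfs
# layout for Intel RAPL, and the "decide" heuristic is invented.
import time

RAPL = "/sys/class/powercap/intel-rapl:0"

def read_energy_uj():
    # Observe: cumulative package energy in microjoules (powercap attribute).
    with open(f"{RAPL}/energy_uj") as f:
        return int(f.read())

def set_power_limit_uw(limit_uw):
    # Act: enforce the cap through RAPL's long-term power constraint.
    with open(f"{RAPL}/constraint_0_power_limit_uw", "w") as f:
        f.write(str(limit_uw))

def oda_loop(cap_uw, periods=10, period_s=1.0):
    set_power_limit_uw(cap_uw)          # requires root on a RAPL machine
    prev = read_energy_uj()
    for _ in range(periods):
        time.sleep(period_s)
        cur = read_energy_uj()
        watts = (cur - prev) / 1e6 / period_s   # average power this period
        prev = cur
        # Decide (toy heuristic): if we run near the cap, a real orchestrator
        # would shrink the software-side resource allocation (e.g. vCPUs).
        if watts > 0.95 * cap_uw / 1e6:
            print(f"{watts:.1f} W: near cap, tighten resource allocation")
        else:
            print(f"{watts:.1f} W: headroom available, relax allocation")

if __name__ == "__main__":
    oda_loop(cap_uw=50_000_000)  # 50 W package cap
```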
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over... (Ilham Amezzane)
Support Vector Machines (SVMs) have proven to yield high accuracy and have seen widespread use in recent years. However, the standard versions of the SVM algorithm are very time-consuming and computationally intensive, which challenges engineers to explore hardware architectures other than the CPU that are capable of performing real-time training and classification while maintaining low power consumption in embedded systems. This paper presents an overview of works based on the two most popular parallel processing devices, the GPU and the FPGA, with a focus on the multiclass training process. Since the different techniques have been evaluated on different experimentation platforms and methodologies, we focus only on the improvements realized in each study.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/introduction-to-the-tvm-open-source-deep-learning-compiler-stack-a-presentation-from-octoml/
Luis Ceze, Co-founder and CEO of OctoML, a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and Venture Partner at Madrona Venture Group, presents the “Introduction to the TVM Open Source Deep Learning Compiler Stack” tutorial at the September 2020 Embedded Vision Summit.
There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms — such as mobile phones, embedded devices, and accelerators — requires significant manual effort.
In this talk, Ceze presents his work on the TVM stack, which exposes graph- and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of optimizations.
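To make the operator-fusion idea concrete, here is a small NumPy sketch of the concept (mine, not TVM code or its API): fusing a chain of elementwise operators avoids materializing the intermediate tensors between them, which is one of the graph-level optimizations TVM automates.

```python
# Illustration of high-level operator fusion (not TVM's API): computing
# relu(x * scale + bias) as separate "operators" materializes temporaries,
# while the fused version reuses one buffer in a single sweep over the data.
import numpy as np

def unfused(x, scale, bias):
    t1 = x * scale              # operator 1: materializes a temporary
    t2 = t1 + bias              # operator 2: another temporary
    return np.maximum(t2, 0)    # operator 3: relu

def fused(x, scale, bias):
    # One traversal, no intermediate tensors kept alive between "ops".
    out = np.empty_like(x)
    np.multiply(x, scale, out=out)
    np.add(out, bias, out=out)
    np.maximum(out, 0, out=out)
    return out

x = np.random.rand(1 << 20).astype(np.float32)
assert np.allclose(unfused(x, 2.0, -1.0), fused(x, 2.0, -1.0))
```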
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines (Intel® Software)
Orbital representations based on B-splines are widely used in quantum Monte Carlo (QMC) simulations of solids and historically take as much as 50 percent of the total runtime. Random access to a large four-dimensional array makes it challenging to use caches and the wide vector units of modern CPUs efficiently. We therefore present node-level optimizations of B-spline evaluations on multicore and manycore shared-memory processors.
To increase single instruction multiple data (SIMD) efficiency and bandwidth utilization, we first apply a data layout transformation from an array of structures (AoS) to a structure of arrays (SoA). Then, by blocking SoA objects, we optimize cache reuse and obtain sustained throughput for a range of problem sizes. We implement efficient nested threading in the B-spline orbital evaluation kernels, paving the way towards strong scaling of QMC simulations. Finally, we employ roofline performance analysis to model the impact of our optimizations. The layout change is illustrated in the sketch below.
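The AoS-to-SoA transformation can be sketched in a few lines of NumPy; this is an illustration of the layout idea on toy data, not the QMCPACK B-spline kernels themselves.

```python
# Data layout transformation from Array of Structures (AoS) to Structure of
# Arrays (SoA): the optimization described above, shown on toy data rather
# than real B-spline coefficient tables.
import numpy as np

N = 1_000_000

# AoS: the fields of each record are interleaved in memory.
aos = np.zeros(N, dtype=[("x", np.float32), ("y", np.float32),
                         ("z", np.float32), ("w", np.float32)])

# SoA: each field becomes its own contiguous, unit-stride array.
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}

# A sweep over one field now reads contiguous memory, which is what lets
# SIMD units and hardware prefetchers run at full bandwidth.
total_soa = soa["x"].sum()   # unit-stride access
total_aos = aos["x"].sum()   # strided access through interleaved records
assert total_soa == total_aos
```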
State of Containers and the Convergence of HPC and BigData (inside-BigData.com)
In this deck from 2018 Swiss HPC Conference, Christian Kniep from Docker Inc. presents: State of Containers and the Convergence of HPC and BigData.
"This talk will recap the history of and what constitutes Linux Containers, before laying out how the technology is employed by various engines and what problems these engines have to solve. Afterward Christian will elaborate on why the advent of standards for images and runtimes moved the discussion from building and distributing containers to orchestrating containerized applications at scale. In conclusion attendees will get an update on how containers foster the convergence of Big Data and HPC workloads and the state of native HPC containers."
Learn more: http://docker.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Riding the Light: How Dedicated Optical Circuits are Enabling New Science (Larry Smarr)
The document discusses how dedicated optical circuits are enabling new science through high-bandwidth networks. It provides examples of several projects using dedicated optical networks, such as the OptIPuter project, to enable interactive analysis of large datasets through terabit network connections between supercomputing centers. The document concludes by discussing future ocean observatory networks that will use undersea fiber optics to enable remote interactive imaging and sensing.
Big Data, Beyond the Data Center
Increasingly, the next scientific discoveries and the next industrial breakthroughs will depend on the capacity to extract knowledge and sense from gigantic amounts of information. Examples range from processing data produced by scientific instruments such as CERN’s LHC; collecting data from large-scale sensor networks; grabbing, indexing and nearly instantaneously mining and searching the Web; building and traversing billion-edge social network graphs; to anticipating market and customer trends through multiple channels of information. Collecting information from various sources, recognizing patterns and distilling insights constitutes what is called the Big Data challenge. However, as the volume of data grows exponentially, managing these data becomes proportionally more complex. A key challenge is to handle the complexity of data management on hybrid distributed infrastructures, i.e., assemblages of Clouds, Grids or Desktop Grids. In this talk, I will give an overview of our work in this research area, starting with BitDew, a middleware for large-scale data management on Clouds and Desktop Grids. Then I will present our approach to enabling MapReduce on Desktop Grids. Finally, I will present our latest results around Active Data, a programming model for managing the data life cycle on heterogeneous systems and infrastructures.
Active Data is a data-centric approach to data life-cycle management that uses a Petri net-based model to represent data states and transitions between systems. It exposes distributed data sets and allows clients to react to life cycle events in a scalable way. A prototype implemented the publish-subscribe model and demonstrated handling over 30,000 transitions per second. Active Data provides advantages like formal verification and fault tolerance but requires more work to standardize and represent complex data operations.
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc... (Gilles Fedak)
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastructures
The Big Data challenge consists in managing, storing, analyzing and visualizing these huge and ever growing data sets to extract sense and knowledge. As the volume of data grows exponentially, the management of these data becomes more complex in proportion.
A key point is to handle the complexity of the 'Data Life Cycle', i.e. the various operations performed on data: transfer, archiving, replication, deletion, etc. Indeed, data-intensive applications span over a large variety of devices and e-infrastructures which implies that many systems are involved in data management and processing.
''Active Data'' is a new approach to automate and improve the expressiveness of data management applications. It consists of (a minimal sketch of both parts follows below):
* a 'formal model' for the data life cycle, based on Petri nets, that makes it possible to describe and expose the data life cycle across heterogeneous systems and infrastructures.
* a 'programming model' that allows code execution at each stage of the data life cycle: routines provided by programmers are executed when a set of events (creation, replication, transfer, deletion) happens to any data.
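Here is a minimal Python sketch of the two parts: a transition model over life-cycle states plus user callbacks fired on transitions. All class and method names are invented for illustration and do not reflect the actual Active Data API.

```python
# Minimal sketch of a Petri-net-style data life cycle with programmable
# transition handlers, in the spirit of Active Data. Names are invented;
# this is not the Active Data API.
from collections import defaultdict

class LifeCycleModel:
    def __init__(self, places, transitions):
        self.places = set(places)             # life-cycle states
        self.transitions = dict(transitions)  # name -> (source, destination)
        self.handlers = defaultdict(list)     # name -> user callbacks

    def on(self, transition, callback):
        # Programming model: run user code when a transition fires.
        self.handlers[transition].append(callback)

class DataItem:
    def __init__(self, model, ident):
        self.model, self.ident, self.place = model, ident, "created"

    def fire(self, transition):
        src, dst = self.model.transitions[transition]
        assert self.place == src, f"{self.ident}: not in state {src}"
        self.place = dst
        for cb in self.model.handlers[transition]:
            cb(self)

model = LifeCycleModel(
    places=["created", "transferred", "replicated", "deleted"],
    transitions={"transfer": ("created", "transferred"),
                 "replicate": ("transferred", "replicated"),
                 "delete": ("replicated", "deleted")})

model.on("replicate", lambda d: print(f"{d.ident}: checksum after replication"))

item = DataItem(model, "dataset-42")
item.fire("transfer")
item.fire("replicate")   # triggers the user callback
```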
The document discusses MapReduce runtime environments, including their design, performance optimizations, and applications. It provides an overview of MapReduce, describing the programming model and key-value data processing. It also discusses the design of MapReduce execution runtimes, including their use of distributed file systems and handling of parallelization, load balancing, and failures. Finally, it outlines areas of ongoing research to improve MapReduce performance and applicability.
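As a reminder of the key-value programming model these runtimes execute, here is the canonical word-count example as self-contained Python, with the shuffle done in-process rather than over a distributed file system.

```python
# The MapReduce programming model in miniature: map emits key-value pairs,
# the shuffle groups them by key, and reduce folds each group. Real runtimes
# do the same across a distributed file system, with load balancing and
# fault tolerance handled by the execution engine.
from collections import defaultdict

def map_phase(doc_id, text):
    for word in text.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    return word, sum(counts)

def run_mapreduce(documents):
    groups = defaultdict(list)
    for doc_id, text in documents.items():        # map + shuffle
        for key, value in map_phase(doc_id, text):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

docs = {1: "the quick brown fox", 2: "the lazy dog the end"}
print(run_mapreduce(docs))  # {'the': 3, 'quick': 1, ...}
```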
The iEx.ec Distributed Cloud: Latest Developments and Perspectives (Gilles Fedak)
The document discusses the iEx.ec Distributed Cloud, which allows blockchain applications to access off-chain computing resources through a market network built on the Ethereum blockchain. Key points include:
- iEx.ec creates a decentralized marketplace where computing resources like servers, apps, and data can be advertised and provisioned directly through smart contracts.
- This provides transparency, security, and no single point of failure compared to traditional clouds.
- The technology builds on decades of research in desktop grid computing and volunteer computing to execute tasks in a highly secure and scalable way.
- An initial proof-of-concept allows generation of custom Bitcoin addresses through parallel processing of tasks.
Talk at the Ethereum Developer Conference. Presents our approach to building a fully decentralized Cloud infrastructure based on the Ethereum blockchain and Desktop Grid middleware.
How Blockchain and Smart Buildings can Reshape the Internet (Gilles Fedak)
This document discusses how blockchain and smart buildings can reshape distributed cloud computing and the internet. It describes how blockchain technologies like Ethereum allow for distributed applications running on smart contracts. The iEx.ec project aims to provide a blockchain-based distributed cloud computing platform that gives applications access to computing resources like services, data, and infrastructure in a low-cost, secure, on-demand and fully distributed manner. This builds upon prior work in desktop grid computing and could make cloud computing more efficient and greener by better utilizing idle computing resources.
This document discusses various research topics related to proactive networking and edge computing. It begins with an outline of topics including edge caching, mobile edge computing (MEC), 5G vehicle-to-everything (V2X), virtual reality (VR), unmanned aerial vehicles (UAVs), and ultra-reliable low-latency communications (URLLC). It then discusses the need to move from reactive to proactive networking approaches to meet new requirements from applications like VR and industry 4.0. Key challenges discussed include time-varying content popularity, hierarchical caching, fog/edge computing with mobility, and ultra-reliable low-latency networking.
DSD-INT 2015 - Addressing high resolution modelling over different computing ... (Deltares)
This document discusses using high performance computing resources to model water quality and hydrodynamics in reservoirs. It summarizes work using the Delft3D model on the Cuerda del Pozo reservoir, which experiences eutrophication issues. The modeling aims to reproduce conditions like algae blooms and alert authorities before water quality deteriorates. While hydrodynamics modeling was successful, water quality modeling had issues to be addressed. The document also discusses using cloud resources through the EGI FedCloud to run the high resolution models, as well as potential applications of this use case for biodiversity infrastructure projects like LifeWatch and INDIGO-DataCloud.
DSD-INT 2015 - Addressing high resolution modelling over different computing ... (Deltares)
This document discusses using high performance computing resources to model water quality and hydrodynamics in reservoirs. It summarizes work using the Delft3D model on the Cuerda del Pozo reservoir, which experiences eutrophication issues. Testing was conducted on supercomputers and cloud infrastructures to determine appropriate resources for high resolution models. The work provides a useful demonstration case for infrastructure projects seeking water quality modeling applications. Addressing eutrophication through modeling can help alert authorities to water quality issues before they occur.
This document discusses using cloud computing technologies for data analysis applications. It presents different cloud runtimes like Hadoop, DryadLINQ, and CGL-MapReduce and compares their features to MPI. Applications like Cap3 and HEP are well-suited for cloud runtimes while iterative applications show higher overhead. Results show that as the number of VMs per node increases, MPI performance decreases by up to 50% compared to bare metal nodes. Integration of MapReduce and MPI could help improve performance of some applications on clouds.
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl... (Wolfgang Gentzsch)
The UberCloud online marketplace for engineers and scientists to discover, try, and buy compute power on demand, in the cloud. Starting with free experiments in the cloud, including application software, cloud hardware, and expertise. Learning by doing how to use your application in the cloud.
info.theubercloud.com/case-studies-and-resources
Bridging the gap to facilitate selection and image analysis activities for la... (Phidias)
PHIDIAS organised its third and final webinar of the series, dedicated to Use Case 2: Big Data Earth Observations (EO), on 18 February 2021 at 15:00 CET, showcasing how PHIDIAS takes advantage of HPC architecture to facilitate selection and image analysis activities for land surface monitoring.
Presentation of Eco-efficient Cloud Computing Framework for Higher Learning I... (rodrickmero)
Tanzanian Higher Learning Institutions (HLIs) face challenges in providing the necessary Information Technology (IT) support for education, research and development activities. Currently, HLIs use traditional computing (TC), which has proven to be uneconomical in terms of maintenance, software purchase costs, huge power consumption and staffing.
Cloud computing (CC) is the way forward for HLIs in solving these computing challenges. However, HLI policies regarding the security of critical data in a CC environment prevent the adoption of CC services from existing vendors. The reliable and secure way is to establish and operate CC data centers dedicated to HLI critical data and services. Owning and operating traditional data centers is a challenge for HLIs because they consume huge amounts of power. Tanzania, like other developing countries, has a low level of electrification, while the demand for electric power is increasing year after year. Considering energy-efficient approaches in data center operation is therefore very important for reducing both operating costs and the environmental carbon footprint.
Therefore, this thesis presents an eco-efficient cloud computing framework that integrates renewable and non-renewable power sources and free cooling to reduce carbon emissions and power consumption in HLI cloud data centers.
To develop the framework, we conducted a study in Tanzanian HLIs to explore the current situation and cloud computing requirements. Interviews, observation and document review were the data collection methods used by the study. After analyzing the results, we defined guidelines for developing the CC building blocks. We used the CloudSim toolkit and the NetBeans IDE to develop and simulate the eco-efficient framework.
In the end, the eco-efficient framework showed improvements in power consumption, efficiency and carbon emissions. Eco-efficient approaches therefore give the HLIs of Tanzania a sustainable solution to their computing needs by significantly reducing operating costs. Moreover, they ensure environmental protection for the benefit of current and future generations.
The document discusses energy consumption monitoring and management in Grid and Cloud computing infrastructures like Grid5000. It describes the energy sensor infrastructure deployed on Grid5000 sites, including Omegawatt boxes to measure power. The infrastructure is used to profile energy usage of applications and evaluate policies to reduce energy consumption and increase awareness among users. Logs of energy data are stored and made available in an online repository for analyzing consumption patterns.
DuraMat CO1 Central Data Resource: How it started, how it’s going … (Anubhav Jain)
The document summarizes several projects developed as part of the DuraMat CO1 Central Data Resource initiative to analyze photovoltaic performance and degradation data. A secure data portal was developed that currently hosts data from 239 users and 271 datasets. Software tools were also created, such as pvAnalytics for data cleaning and filtering, pvOps for operational and maintenance data analysis, and pv-vision for electroluminescence image analysis. These open source tools are publicly available and have helped advance the analysis of PV degradation through access to larger datasets. Overall, the projects have established a foundation for ongoing collaborative research on PV performance and lifetime under DuraMat 2.0.
Bob Jones, CERN & HNSciCloud Coordinator gives an update on the HNSciCloud Pre-Commercial Procurement which is now in its Solution Prototyping phase. The presentation includes also an overview of the prototypes under development.
Enabling Application Integrated Proactive Fault Tolerance (Dai Yang)
Exascale computing is the next major milestone for the HPC community. Due to a steadily increasing probability of failures, current applications must be made malleable so that they can cope with dynamic resource changes. In this paper, we show first results with LAIK, a lightweight library for dynamically re-distributable application data, which makes it possible to free compute nodes of their workload before a predicted failure. For a real-world application, we show that LAIK adds negligible overhead. In addition, we show the effect of different re-distribution strategies.
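The core idea, moving a node's data away before a predicted failure, can be illustrated with a short sketch. The function and variable names below are invented for illustration; LAIK's actual C API is quite different.

```python
# Sketch of proactive data re-distribution: when a failure is predicted for
# a node, its partitions are reassigned before the node is lost. Names are
# hypothetical; this is not LAIK code.
def drain_node(partitions, failing, nodes):
    # partitions: dict mapping node -> list of data chunks it owns
    survivors = [n for n in nodes if n != failing]
    for i, chunk in enumerate(partitions.pop(failing, [])):
        target = survivors[i % len(survivors)]   # round-robin spread
        partitions.setdefault(target, []).append(chunk)
    return partitions

parts = {"n0": ["a", "b"], "n1": ["c"], "n2": ["d", "e"]}
print(drain_node(parts, failing="n2", nodes=["n0", "n1", "n2"]))
# {'n0': ['a', 'b', 'd'], 'n1': ['c', 'e']}
```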
This document provides an overview of predictive churn modeling using H2O and Sparkling Water. It discusses what predictive churn is and key performance measures like lift. It also introduces H2O as a machine learning platform, Apache Spark, and H2O Sparkling Water which integrates H2O with Spark. The document demonstrates building a predictive churn model on telco customer data using different approaches in H2O Flow, Spark Scala, and R. It discusses deploying a model via REST API, Docker, and H2O Steam.
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights cover the first remote GPU Hackathons, a complete schedule of upcoming events, using OpenACC for a biophysics problem, the NVIDIA HPC SDK, GCC 10, new resources and more!
We will also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
Architecture and Performance of Runtime Environments for Data Intensive Scala... (jaliyae)
This document summarizes a student's doctoral research on runtime environments for data-intensive scalable computing. The key points are:
1) The student is investigating cloud runtimes like MapReduce, DryadLINQ, and i-MapReduce for data and compute-intensive applications represented as filter pipelines.
2) The student has applied these runtimes to applications in domains like genomics, phylogenetics, and high energy physics, demonstrating their ability to parallelize tasks.
3) The student has developed i-MapReduce to support iterative MapReduce computations more efficiently than traditional MapReduce systems by caching static data in memory between iterations.
4) Current research directions include evaluating the
Session 46 - Principles of workflow management and execution (ISSGC Summer School)
Here are the steps to refer to an input or output file in a P-GRADE Portal workflow:
1. For an input file, specify its location on the client side (your desktop). This tells the portal where to get the file from.
2. For an output file, specify where to store it on the client side. This tells the portal where to send the file to after job completion.
3. Additionally for input/output files located on storage elements (SEs), specify the LFC path which is how the file is referenced on the grid storage.
4. For each file, also specify an "internal file name" which is the name the job executable will use to access the file, such
SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures
1. SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures
Simon Delamare (LIP/CNRS, Univ. Lyon, France)
Gilles Fedak (LIP/INRIA, Univ. Lyon, France)
Derrick Kondo (LIG/INRIA, Univ. Grenoble, France)
Oleg Lodygensky (LAL/CNRS, Univ. Paris XI, France)
High-Performance Parallel and Distributed Computing, 2012
2. Introduction
BE-DCI = “Best-Effort” Distributed Computing Infrastructure
→ Large computing power at low cost; avoids wasting idle resources
→ No availability guarantee
Desktop Grids
→ BOINC projects: PetaFLOPS for free
Grids used in Best-Effort mode
→ ≈ 40% utilization in Grid5000@Lyon
Cloud “Spot” Instances
→ c1.large instance price: $0.12/h (spot) vs. $0.32/h (regular)
Relevant for BoT execution ...
Bag of Tasks (BoT): a set of independent tasks to compute
→ ... but low QoS level, especially compared to regular infrastructures
3. Performance Problem Addressed
The BoT completion rate drops near the end of execution
→ the Tail Effect
[Figure: BoT completion ratio over time. The continuation of the completion curve is performed at 90% of completion; the gap between the Ideal Time and the Actual Completion Time, caused by the tail part of the BoT, is the Tail Duration.]
Measured by the Slowdown:
S = Actual Completion Time / Ideal Completion Time = (Ideal Time + Tail Duration) / Ideal Time
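To make the metric concrete, here is a minimal sketch in Python (not SpeQuloS code) computing the slowdown from the two completion times:

    # A minimal sketch of the metric: S >= 1, and S = 1 means the BoT
    # finished with no tail at all.
    def tail_slowdown(ideal_time, actual_time):
        tail_duration = actual_time - ideal_time
        return (ideal_time + tail_duration) / ideal_time

    print(tail_slowdown(10.0, 25.0))  # ideal 10 h, actual 25 h -> S = 2.5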
4. Slowdown by Tail Effect
Slowdown measured over BoT executions:
[Figure: CDF of the tail slowdown S, i.e. the fraction of executions where the tail slowdown < S, for BOINC and XWHEP; S = completion time observed divided by ideal completion time.]
Best 50% ⇒ S < 1.3
25% to 33% ⇒ S > 2
Worst 5% ⇒ S > 4 to 10
BE-DCI Trace       | Avg. % of BoT in tail (BOINC / XWHEP) | Avg. % of time in tail (BOINC / XWHEP)
Desktop Grids      | 4.65 / 5.11                           | 51.8 / 45.2
Best Effort Grids  | 3.74 / 6.40                           | 27.4 / 16.5
Spot Instances     | 2.94 / 5.19                           | 22.7 / 21.6
→ Caused by no more than the last 7% of the BoT
5. How to improve the situation?
Better scheduling
QoS in Grid scheduling ([12], [20], [38])
→ Requires heavy modifications of the middleware
→ No satisfactory solution for unreliable infrastructures ([7])
Addressing the tail effect
→ e.g. in MapReduce ([3], [39]), but requires precise information from compute nodes, which is hard to obtain in large DCIs
Building hybrid DCIs
Grid & Desktop Grid ([35], [36])
→ Mostly to offload Grid usage
Using Cloud computing ([10], [28], [37])
→ To address peak demands
6. SpeQuloS Service
→ Improving the QoS perceived by BE-DCI users
Speeding up BoT execution
Providing information on expected BoT execution time
By dynamically provisioning Cloud resources
→ Monitor BoT execution
→ Execute the tail on the Cloud
Features:
1. Our context: existing BE-DCIs and Clouds, of which we are not administrators: black boxes
2. Interface with users: QoS requests, state of completion, prediction of remaining time
3. Careful utilization of Cloud resources, with billing & accounting of usage
7. Framework
SpeQuloS modules:
Information: collects QoS-related information from the DGs
Oracle: strategies to appropriately use Cloud resources; QoS prediction for users
Scheduler: starts/stops Cloud resources, accounts for their usage
Credit System: bills Cloud usage to users, who spend “credits” to buy Cloud resource cpu.h
Implementation:
Independent modules using Python & MySQL
Supported Clouds: EC2, OpenNebula, etc.
Supported DG middleware: BOINC & XtremWeb-HEP
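As a rough illustration of how these modules could fit together (hypothetical interfaces; the slide does not detail the actual SpeQuloS API), a Scheduler step might look like:

    # Minimal sketch with hypothetical module interfaces, not SpeQuloS code.
    class Scheduler:
        def __init__(self, info, oracle, credit_system, cloud):
            self.info = info                    # Information module
            self.oracle = oracle                # provisioning strategy + prediction
            self.credit_system = credit_system  # credit accounting
            self.cloud = cloud                  # EC2, OpenNebula, ...

        def tick(self, bot_id):
            status = self.info.bot_status(bot_id)
            wanted = self.oracle.workers_to_start(status)
            affordable = self.credit_system.affordable_workers(bot_id)
            for _ in range(min(wanted, affordable)):
                self.cloud.start_worker(bot_id)
                self.credit_system.bill(bot_id, cpu_hours=1)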
8. Cloud Provisioning Strategies
When to start Cloud resources?
At 90% of BoT completion (9C)
At 90% of BoT assignment (9A)
When the tail appears, detected by monitoring the execution time variance (V)
How many Cloud resources to start (for a given amount of credits)?
Greedy: as many as possible, for 1 hour of Cloud usage (G)
Conservative: ensure there will be enough credits to keep Cloud workers running up to an estimated completion time (C)
How to use Cloud resources?
Flat: Cloud workers are not differentiated from BE-DCI workers (F)
Reschedule: the scheduler reschedules tasks executing on the BE-DCI to the Cloud (R)
Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
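Combining the three codes gives a strategy name such as 9C-G-F. A minimal sketch of the 9C trigger combined with the Greedy sizing policy (hypothetical names and units, assuming one credit buys one Cloud worker for one hour):

    # Sketch only: 9C trigger + Greedy sizing, under the assumption
    # that 1 credit = 1 Cloud worker for 1 hour.
    def workers_to_start(completion_ratio, credits_left):
        if completion_ratio < 0.90:   # 9C: wait until 90% of the BoT is completed
            return 0
        return int(credits_left)      # Greedy: spend everything on one hour of workers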
11. Experimentation Setup (1)
Simulations using real BE-DCI availability traces, various BoT workloads, and the BOINC and XWHEP middleware
BE-DCI availability traces:
Desktop Grids: seti, nd (SETI@Home & NotreDame traces from the FTA)
Best Effort Grids: g5klyo, g5kgre (available resources in the Grid5000 Lyon & Grenoble clusters in December 2010)
Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of 10 or 100 $ per hour, fluctuating with the market price)
Trace   | Length (days) | Nodes: mean / std. dev. / min / max | Avail. quartiles (s) | Unavail. quartiles (s) | Avg. power (nops/s) | Power std. dev.
seti    | 120           | 24391 / 6793 / 15868 / 31092        | 61 / 531 / 5407      | 174 / 501 / 3078       | 1000                | 250
nd      | 413.87        | 180 / 4.129 / 77 / 501              | 952 / 3840 / 26562   | 640 / 960 / 1920       | 1000                | 250
g5klyo  | 31            | 90.573 / 105.4 / 6 / 226            | 21 / 51 / 63         | 191 / 236 / 480        | 3000                | 0
g5kgre  | 31            | 474.69 / 178.7 / 184 / 591          | 5 / 182 / 11268      | 23 / 547 / 6891        | 3000                | 0
spot10  | 90            | 82.186 / 3.814 / 29 / 87            | 4415 / 5432 / 17109  | 4162 / 5034 / 9976     | 3000                | 300
spot100 | 90            | 823.95 / 4.945 / 196 / 877          | 1063 / 5566 / 22490  | 383 / 1906 / 10274     | 3000                | 300
12. Experimentation Setup (2)
BoT workloads:

BoT    | Size (# tasks)           | nops / task                 | Arrival time
SMALL  | 1000                     | 3600000                     | 0
BIG    | 10000                    | 60000                       | 0
RANDOM | norm(µ = 1000, σ² = 200) | norm(µ = 60000, σ² = 10000) | weib(λ = 91.98, k = 0.57)
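For instance, the RANDOM workload can be drawn as follows (a sketch using NumPy; the slide gives the distributions but not the sampling code):

    import numpy as np

    rng = np.random.default_rng(42)
    # BoT size ~ norm(mu=1000, sigma^2=200); note sigma = sqrt(variance)
    n_tasks = max(1, int(rng.normal(1000, np.sqrt(200))))
    # Per-task cost ~ norm(mu=60000, sigma^2=10000), in nops
    nops_per_task = rng.normal(60000, np.sqrt(10000), size=n_tasks)
    # Arrival times ~ Weibull(k=0.57) scaled by lambda=91.98
    arrivals = 91.98 * rng.weibull(0.57, size=n_tasks)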
Simulation methodology:
Reproducible executions without & with SpeQuloS
SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)
→ 25000 BoT execution traces
13. Strategies Comparison
Tail Removal Efficiency
→ Tail duration with SpeQuloS vs. tail duration without SpeQuloS
[Figure: three CDFs of the fraction of BoTs where the tail removal efficiency > P (percentage P), one panel per deployment strategy (Flat, Reschedule, Cloud Duplication), each showing the six trigger/sizing combinations 9C/9A/V × G/C.]
The best strategies are able to:
Suppress the tail for 50% of executions
Halve the tail for 80% of executions
Flat (F) < Reschedule (R) & Cloud Duplication (D)
Tail detection (V) triggers the Cloud too late
15. Cloud Resources Consumption
Percentage of credits spent vs. credits provisioned (= 10% of the BoT workload).
10% to 25% of what has been provisioned is actually used by Cloud resources
[Figure: bar chart of the percentage of credits used for each combination of SpeQuloS strategies (9C/9A/V × G/C × F/R/D).]
→ ≈ 2.5% of the BoT workload is executed on the Cloud
17. Completion Time
Combination of strategies used: 9C-C-R
[Figure: completion times (s), with and without SpeQuloS, on each BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100); one panel per middleware and workload: BOINC & SMALL, BOINC & BIG, BOINC & RANDOM, XWHEP & SMALL, XWHEP & BIG, XWHEP & RANDOM.]
→ Up to 9x speedup
→ Depends on the middleware used and on BE-DCI volatility
18. Completion Time Prediction
→ Users can ask for a prediction at any moment of the BoT execution
Predicted completion time: t_p = α × t(r) / r
Current completion ratio: r
Time elapsed since submission: t(r)
α: adjustment factor, depending on the execution environment:
DG server & middleware
Application & BoT size
→ Adjusted after each BoT execution to minimize the difference with the observed completion time
Statistical uncertainty (±x%): success rate of predictions measured against previous executions
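A minimal sketch of this predictor (the α update below is an assumption for illustration; the slide only states that α is adjusted to minimize the prediction error):

    def predict_completion_time(elapsed, completion_ratio, alpha):
        # t_p = alpha * t(r) / r
        return alpha * elapsed / completion_ratio

    def update_alpha(alpha, observed_time, elapsed, completion_ratio, weight=0.1):
        # Assumed scheme: nudge alpha toward the value that would have
        # predicted observed_time exactly for this execution.
        ideal_alpha = observed_time * completion_ratio / elapsed
        return (1 - weight) * alpha + weight * ideal_alpha

    # Example: at r = 0.5 with 2 h elapsed and alpha = 1.2, t_p = 4.8 h
    print(predict_completion_time(2.0, 0.5, 1.2))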
19. Prediction Results
Completion time prediction:
Made at 50% of the BoT execution
Uncertainty: ±20%
α adjusted after 30 executions with the same BE-DCI, middleware and BoT workload
Success rate of predictions (%), per BoT category & middleware:

BE-DCI  | SMALL BOINC | SMALL XWHEP | BIG BOINC | BIG XWHEP | RANDOM BOINC | RANDOM XWHEP | Mixed
seti    | 100         | 100         | 100       | 82.8      | 100          | 87.0         | 94.1
nd      | 100         | 100         | 100       | 100       | 100          | 96.0         | 99.4
g5klyo  | 88.0        | 89.3        | 96.0      | 87.5      | 75           | 75           | 85.6
g5kgre  | 96.3        | 88.5        | 100       | 92.9      | 83.3         | 34.8         | 83.3
spot10  | 100         | 100         | 100       | 100       | 100          | 100          | 100
spot100 | 100         | 100         | 100       | 100       | 76           | 3.6          | 78.3
Mixed   | 97.6        | 96.1        | 99.2      | 93.5      | 89.6         | 65.3         | 90.2
→ Successful prediction in 9 cases out of 10
→ Lower results with heterogeneous BoTs
→ Needs a learning phase with the same BoT (at least the same application), executed on the same BE-DCI
20. SpeQuloS Deployment in European Desktop Grid Initiative
EDGI project: bringing European Desktop Grid computing resources to scientific communities.
21. Conclusion
BE-DCIs: a “low-cost” solution, but with poor QoS (the tail effect)
SpeQuloS: uses Cloud resources to improve the QoS delivered to BE-DCI users
Efficiently removes the tail problem
→ Speeds up BoT execution
→ Only requires a few % of the workload to be executed on the Cloud
Enables completion time prediction for users
→ A step towards BE-DCI usability in the computing landscape?
Future work:
Better strategies to anticipate problems (the tail effect)
Analysis of user feedback from SpeQuloS deployments