Distributed Interactive Engineering Toolbox (DIET) is a middleware for distributed computing that provides a simple interface for solving computationally intensive problems across heterogeneous platforms. It uses a client-agent-server model and plug-in schedulers to optimize resource usage and performance. DIET has been deployed on large supercomputing platforms like Grid'5000 and has been used for applications in fields like cosmology, climatology, robotics, and bioinformatics.
2. Outline
Context
From DIET…
… to SysFera-DS
Conclusion
3. Why Large Scale systems?
First need: supercomputing at a national or international scale
Large-size problems (grand challenges) need a collaboration between several codes/supercomputing centers
There is always a need for more computing power, memory capacity, and disk storage
The power of any single resource is always small compared to the aggregation of several resources
Network connectivity increased quickly!
• Many available resources
– Many clusters
– Supercomputers
– Millions of PCs and workstations connected
• Increasing complexity of applications
– Multi-scale
– Multi-disciplinary
– Huge data sets produced
– Heterogeneity
– Sharing or renting resources
4. Centralized or Decentralized?
1946 ENIAC: Centralized!
• 18,000 tubes, 30 tons, 170 m²
• 2,000 tubes replaced every month by 6 technicians
1997 Google Cluster: Decentralized!
2001 TeraGrid / 2003 Grid’5000: (De)Centralized!
• Grid Computing (clusters of clusters)
2002 Earth Simulator: Centralized!
• First computer to reach the Teraflops (40 TF)
• Homogeneous, centralized, expensive
2008 IBM Roadrunner
• First computer to reach the Petaflops
Cloud Computing
• Amazon, Google, Microsoft, …
Sky Computing: Decentralized!
5. Research driven by applications
Data-centric applications
• Very large data management (input, output, temporary): >30 TB of data per night
Computer-centric applications
• GigaFlops: Predicting Impacts of Massive Earthquakes (SDSC)
Community-centric applications
• Data sharing (acquisition, results, …) and resources: Large Hadron Collider (LHC)
The user: "I just need my simulation result." But without an optimal scheduling? Without minimizing resource consumption? Without any optimisation? …
Grid user point of view
• Single sign-on
• Single compute space
• Single data space
• Single development environment
6. Which framework?
Holy Grail: transparency and simplicity (maybe even before performance)!
Scheduling tunability
Many incarnations of the Grid
• Grid computing
• Cluster computing
• Peer-to-peer systems
• Global computing
• Web Services
• Clouds, …
Many programming models
• Shared-state models
• Message-passing models
• Hybrid models
• RPC and RMI models
• Peer-to-peer models
• Web Services models
• Coordination models, …
Do not forget good ol’ research on scheduling and distributed systems!
Most scheduling problems are very difficult to solve even in their simplistic form …
… but simple solutions often lead to better performance results in real life
7. Outline
Context
From DIET…
… to SysFera-DS
Conclusion
8. DIET’s Goals http://graal.ens-lyon.fr/DIET/
Our goals
• To develop a toolbox for the deployment of environments using the Application Service Provider / Software as a Service (ASP/SaaS) paradigm with different applications
• Use public-domain and standard software as much as possible
• To obtain a high-performance and scalable environment
• Implement and validate our more theoretical results
– Scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithmics for heterogeneous and distributed platforms, …
Based on CORBA and our own software developments
• FAST for performance evaluation
• LogService for monitoring
• VizDIET for visualization
• GoDIET for deployment
• DAGDA for data management
Several applications in different fields (simulation, bioinformatics, …)
Release 2.8 available on the web since November
Funding: ACI Grid ASP, RNTL GASP, ANR LEGO CIGC-05-11, ANR Gwendia, Celtic-plus project SEED4C
9. RPC and Grid Computing: Grid-RPC
• One simple idea
– Implementing the RPC programming model over the grid
– Using resources accessible through the network
– Mixed parallelism model (data-parallel model at the server level and task parallelism between the servers)
• Features needed
– Load balancing (resource localization, performance evaluation, scheduling)
– IDL
– Data and replica management
– Security
– Fault tolerance
– Interoperability with other systems
– …
• Design of a standard interface
– Within the OGF (Grid-RPC and SAGA WGs)
– Existing implementations: NetSolve/GridSolve, Ninf, DIET, OmniRPC
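To make the programming model concrete, here is a minimal client sketch against the C-style API standardized by the OGF GridRPC WG. This is a sketch under assumptions: the "mmul" service name, its argument convention, and reading the config file from argv[1] are illustrative, and the header name varies per implementation.

#include "grpc.h"   /* OGF GridRPC API header; exact name varies per implementation */

#define N 100

int main(int argc, char* argv[]) {
  grpc_function_handle_t handle;
  static double A[N * N], B[N * N], C[N * N];      /* matrices, zero-initialized for brevity */

  grpc_initialize(argv[1]);                        /* read the middleware configuration file */
  grpc_function_handle_default(&handle, "mmul");   /* bind to a remote "mmul" service; the
                                                      agent hierarchy selects a server */
  grpc_call(&handle, N, A, B, C);                  /* synchronous remote call: C = A x B */
  grpc_function_handle_destruct(&handle);
  grpc_finalize();
  return 0;
}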
10. RPC and Grid Computing: Grid-RPC
[Figure: a client sends a request Op(C, A, B) to the agent hierarchy, which selects server S2 among servers S1 to S4.]
11. Client and server interface
Client side
• Straightforward to use
• Multi-interface (C, C++, Fortran, Java, Python, Scilab, Web Services, etc.)
• Grid-RPC compliant
Server side
• Install and submit a new server to an agent (LA)
• Problem and parameter description
• Client IDL transfer from the server
• Dynamic services
– New service
– New version
– Security update
– Outdated service
– Etc.
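On the server side, registration looks roughly like the sketch below, modeled on the C API documented in the DIET user manual. A minimal sketch: the "vecsum" service is an illustrative name, exact signatures may differ between DIET releases, and the per-parameter type descriptors are elided.

#include "DIET_server.h"   /* DIET SeD-side API */

/* Solve callback, invoked by the SeD when a "vecsum" request arrives. */
static int solve_vecsum(diet_profile_t* pb) {
  /* Unpack the IN arguments, compute, fill the OUT parameter (elided). */
  return 0;
}

int main(int argc, char* argv[]) {
  diet_service_table_init(1);                              /* this SeD offers one service  */
  /* One IN and one OUT parameter: last_in = 0, last_inout = 0, last_out = 1. */
  diet_profile_desc_t* profile = diet_profile_desc_alloc("vecsum", 0, 0, 1);
  diet_service_table_add(profile, NULL, solve_vecsum);     /* register service + solver    */
  diet_profile_desc_free(profile);
  return diet_SeD("./SeD.cfg", argc, argv);                /* join the hierarchy and serve */
}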
12. Architecture overview
[Figure: DIET’s hierarchical architecture. Clients connect to a Master Agent, which routes requests through Local Agents down to Server Daemons.]
MA: Master Agent
LA: Local Agent
SeD: Server Daemon
13. Workflow Management
Workflow representation
• Directed Acyclic Graph (DAG)
– Each vertex is a task
– Each directed edge represents a communication between tasks
• Functional workflows
– Loops, if statements, automatic parallelism, fault tolerance
Goals
• Build and execute workflows
• Use different heuristics to solve scheduling problems
• Extensibility to address multi-workflow submission and large grid platforms
• Manage the heterogeneity and variability of the environment
Language definition (MOTEUR & MADAG), from the ANR Gwendia project
Comparison on Grid’5000 (DIET) vs. EGI (gLite):
• EGI (gLite): idle 132.143 s, data transfer 32.857 s, execution 274.643 s
• Grid’5000 (DIET): idle 0.214 s, data transfer 3.371 s, execution 540.614 s
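As a generic illustration of how such a DAG is executed (this is not MADAG’s actual data structure, just the underlying idea): the engine tracks, for each task, the number of unfinished predecessors and releases a task as soon as that count reaches zero.

#include <cstdio>
#include <queue>
#include <vector>

int main() {
  /* Vertices are tasks; directed edges are communications: 0->1, 0->2, 1->3, 2->3. */
  std::vector<std::vector<int>> successors = {{1, 2}, {3}, {3}, {}};
  std::vector<int> pending(successors.size(), 0);  /* unfinished-predecessor counts */
  for (const auto& succ : successors)
    for (int t : succ) ++pending[t];

  std::queue<int> ready;
  for (int t = 0; t < (int)successors.size(); ++t)
    if (pending[t] == 0) ready.push(t);            /* source tasks are runnable at once */

  while (!ready.empty()) {                         /* Kahn-style topological dispatch */
    int t = ready.front(); ready.pop();
    std::printf("run task %d\n", t);               /* here: a GridRPC call to some SeD */
    for (int s : successors[t])
      if (--pending[s] == 0) ready.push(s);        /* all inputs ready: release the task */
  }
  return 0;
}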
14. DIET Scheduling: Plug-in Schedulers
SeD level
• Performance estimation function
– Estimation metric vector: a dynamic collection of performance estimation values
– Performance measures available through DIET:
FAST-NWS performance metrics
Time elapsed since the last execution
CoRI (Collector of Resource Information)
Developer-defined values
Aggregation methods
• Define the mechanism used to sort SeD responses; associated with the service and defined at SeD level
• Tunable comparison/aggregation routines for scheduling
• Priority scheduler
– Performs pairwise server estimation comparisons, returning a sorted list of server responses
– Can minimize or maximize based on SeD estimations, taking into consideration the order in which the performance estimations were specified at SeD level
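A plug-in scheduler thus boils down to two pieces: a per-request estimation function on the SeD, and an aggregation rule for the agents. The sketch below follows the plug-in scheduler API as described in the DIET documentation, but the exact names and signatures are recalled from memory and should be checked against the release you use; the queue-length metric and its helper are illustrative assumptions.

#include "DIET_server.h"   /* SeD-side API, including the plug-in scheduler hooks */

static int get_pending_jobs(void) { return 3; }    /* hypothetical local-load probe */

/* Estimation function: publish the SeD's queue length as user-defined metric 0. */
static void queue_length_metric(diet_profile_t* pb, estVector_t values) {
  diet_est_set(values, 0, (double)get_pending_jobs());
}

int main(int argc, char* argv[]) {
  diet_service_table_init(1);
  diet_profile_desc_t* profile = diet_profile_desc_alloc("my_service", 0, 0, 1);

  /* Priority aggregator: rank SeD responses by ascending user metric 0,
     so the least-loaded server is chosen first. */
  diet_aggregator_desc_t* agg = diet_profile_desc_aggregator(profile);
  diet_aggregator_set_type(agg, DIET_AGG_PRIORITY);
  diet_aggregator_priority_minuser(agg, 0);

  diet_service_use_perfmetric(queue_length_metric);
  diet_service_table_add(profile, NULL, NULL /* solve function elided */);
  diet_profile_desc_free(profile);
  return diet_SeD("./SeD.cfg", argc, argv);
}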
15. DIET Scheduling: Performance estimation
Collector of Resource Information (CoRI)
• Interface to gather performance information
• Currently 2 modules available
– CoRI Manager
– CoRI Easy
• Collectors: CoRI-Easy, FAST (Martin Quinson’s PhD), Ganglia; others (Sigar, GPU, etc.) to come
Extension for parallel programs
• Code analysis / FAST calls combination
• Allows the estimation of parallel regular routines (ScaLAPACK-like)
• Max. error: 14.7 %; avg. error: 3.8 %
[Charts: measured vs. estimated execution times.]
16. Data Management
Three approaches for DIET
• DTM (LIFC, Besançon)
– Hierarchical and distributed data manager
– Redistribution between servers
• JuxMem (Paris, Rennes)
– P2P data cache
• DAGDA (IN2P3, Clermont-Ferrand, and LIP): Data Arrangement for Grid and Distributed Applications
– Joins task scheduling and data management
– Explicit data replication, using the API
– Implicit data replication
– Data replacement algorithms: LRU, LFU, and FIFO
– Transfer optimization by selecting the most convenient source
– Storage resource usage management
– Data status backup/restoration
Data management is being standardized through the GridRPC OGF WG
17. Parallel and batch submissions
• Parallel & sequential jobs, transparent for the user
• System-dependent submission through a batch SeD (SeDBatch)
• Many batch systems supported: OAR, SLURM, PBS, LSF, OGE, LoadLeveler
• Batch scheduler behaviour
– Internal scheduling process
– Monitoring & performance prediction
– Simulation (Simbatch)
[Figure: the MA routes requests through an LA to parallel SeDs (SeD//) and batch SeDs (SeDBatch) sharing an NFS mount.]
18. DIET Cloud
Inside the Cloud
• The DIET platform is virtualized inside the cloud (as a Xen image, for example)
• Very flexible and scalable, as DIET nodes can be launched on demand
• Scheduling is more complex
DIET as a Cloud manager
• Eucalyptus interface: Eucalyptus is treated as a new batch system
• Provides a new implementation of the BatchSystem abstract class
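The design choice here is that provisioning cloud instances is exposed through the same abstraction as submitting to a batch queue. The sketch below is purely hypothetical (every name is illustrative; this is not DIET’s real BatchSystem interface), but it shows the shape of the idea:

#include <string>

/* Hypothetical back-end abstraction (illustrative, not DIET's actual class). */
class BatchSystem {
public:
  virtual ~BatchSystem() = default;
  virtual int submit(const std::string& script) = 0;   /* returns a job/instance id */
  virtual bool isFinished(int id) = 0;
};

/* A cloud back-end fulfils the same contract: "submit a job" becomes
   "provision VM instances, then run the job inside them". */
class EucalyptusBatch : public BatchSystem {
public:
  int submit(const std::string& script) override {
    /* 1. Request instances from the Eucalyptus front-end (EC2-compatible API).
       2. Wait for the VMs to boot, then launch `script` on them.
       3. Return an identifier the SeD can poll. */
    return 42;                                         /* placeholder id */
  }
  bool isFinished(int id) override {
    /* Poll the job, and terminate the VMs on completion. */
    return true;                                       /* placeholder */
  }
};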
19. Grid’5000
Grid’5000
• Building a nation-wide experimental platform for Grid & P2P research (like a particle accelerator for computer scientists)
– 9 geographically distributed sites hosting clusters with 256 to 1K CPUs
– All sites are connected by RENATER (the French research and education network)
• Design and develop a system/middleware environment for safely testing and repeating experiments
• Use the platform for Grid experiments in real-life conditions
• 4 main features:
– Strong security for Grid’5000 and the Internet, despite the deep reconfiguration feature
– Single sign-on
– High-performance LRMS: OAR
– A user toolkit to reconfigure the nodes and monitor experiments: Kadeploy
DIET deployment over as many processors as possible
• 1 MA, 8 LAs, 540 SeDs
• 1,120 clients on 140 machines
• DGEMM requests (2000×2000 matrices)
• Simple round-robin scheduling
20. Applications: 4 of them
Cosmology application
• Dark matter halos
• Large-scale experiment on Grid’5000
• Plug-in scheduler
Climatology application
• Forecasting of the world’s environment and climate on regional to global scales
Robotic application
• Experiment between Italy and France
• An external application calls the DIET middleware through the DIET API (parameters in, results out)
Bioinformatics application
• BLAST: 40,000 requests over 5 databases of different sizes (from 1 to 5 GB)
• BLAST service declaration with a plug-in scheduler (request metrics vector)
• Optimized data management
21. Conclusions
Grid-RPC
• An interesting approach for several applications
• Simple, flexible, and efficient
• Many interesting research issues (scheduling, data management, resource discovery and reservation, deployment, fault tolerance, …)
DIET
• A scalable, open-source, multi-application platform
• Concentrates on several issues: resource discovery; scheduling (distributed scheduling and plug-in schedulers); deployment (GoDIET and GRUDU); performance evaluation (CoRI); monitoring (LogService and VizDIET); data management and replication (DTM, JuxMem, and DAGDA)
• Large-scale validation on the Grid’5000 platform
• A middleware designed for, and tunable to, different applications
http://www.grid5000.org/
22. Results
• A complete middleware for heterogeneous infrastructures
• DIET is light to use and non-intrusive
• Dedicated to many applications
• Designed for Grid and Cloud
• Efficient, even in comparison to commercial tools
• DIET is a highly tunable middleware
• Used in production
The DIET team and the SysFera company (14 people today)
http://www.sysfera.com
23. Future Prospects
• Do we need application-specific schedulers?
• Scheduling based on economic models for Cloud platforms
• DIET Green (collaboration with RESO)
• Increase DIET’s capacity to deal with heterogeneous resources
– Single System Image cluster OS (Kerrighed): a Kerrighed script generator deploys the image, and the new services are registered
– Batch schedulers (PBS, OAR, LoadLeveler, …): a batch script generator submits to the batch scheduler through a SeDBatch
– Cloud platforms (Eucalyptus, EC2, …): a cloud script generator deploys the image and registers the new services through a SeDCloud
– Also targeted: GPU architectures, SMP, multi-core, virtual machines, large-scale architectures, …
[Figure: the MA/LA hierarchy fans out to SeD, SeDBatch, and SeDCloud back-ends.]
24. Outline
Context
From DIET…
… to SysFera-DS
Conclusion
25. Who are we?
• 2001: Research project from the Graal team (Inria/ENS)
– DIET: grid middleware
• 2007: SysFera-DS used within the Décrypthon project
– Used in production
– Selected by IBM to replace Univa-UD
• 2010: Creation of SysFera, an INRIA spin-off
• 2012: A team of 14 (R&D: 4 engineers and 5 PhDs)
– Supported by two experts from INRIA and ENS
– SysFera-DS
26. Décrypthon
HPC management & mutualization
Before SysFera-DS:
• Local usage of resources
• No unique submission interface
• 5 sites (Bordeaux, Lille, Jussieu, Orsay, Lyon), 2 different batch schedulers (LoadLeveler, and OAR + storage)
27. Décrypthon
HPC management & mutualization
With SysFera-DS:
• Resource mutualization
• Web interface for submission
• Application-specific scheduling
• Data management
• Hardware failures hidden from the users (automatic re-submission)
[Figure: a web submission portal in front of the five sites (Bordeaux, Lille, Jussieu, Orsay, Lyon) running LoadLeveler and OAR + storage.]
28. Helping cure muscular dystrophy
"The Décrypthon Steering Committee chose SysFera-DS starting in June 2007 for its qualities of robustness and modularity. It has been progressively implemented on the Décrypthon grid's resources while ensuring a completely transparent and smooth transition for the users."
Thierry Toursel, Research Project Manager, AFM
31. Working with a leading international company
"Thanks to SysFera-DS, we can now provide our R&D engineers a stable, reliable, and performant solution to access our supercomputers and computing clusters."
David Bateman, ICCOS Group Manager, EDF
32. SysFera-DS does it all
• Simple access to complex infrastructures
• Advanced administration features
– User management and access control
– Monitoring and reporting
• Consistent platform for application development
• Integration with existing environments
• Compatibility with many different resources
• Non-intrusive, non-exclusive
• Flexible, stable, reliable, performant
33. Key benefits
• Heterogeneous applications management
• Efficient Big Data management
• Workflow & dataflow management & design
• Collaborative WebBoard
• Hybrid Cloud
34. Offers
• A software product to optimize your computations
• A licence to plug inside your own software
• Migration of your applications
• A WebBoard to manage your applications & infrastructures
• Skilled staff to support these tools
• Skilled staff to develop dedicated plugins
[Figure: our software sits between your applications and your infrastructure, or in front of pooled resources (CIMENT, cloud, …).]
35. Offers
• WebBoard: "to manage your applications"
• Vishnu: "a set of dedicated plugins for infrastructure management"
• DIET: "to optimize your computations & integrate your infrastructures"
[Figure: the stack, from your applications through the WebBoard and Vishnu down to DIET.]
36. Features overview
• Meta-scheduling (load balancing), workflow management, job management, data management
• Resources and communications management
• Launch and monitoring of jobs, file transfers, and hardware and software infrastructure through a scientific portal
• User management with single sign-on
• Works across network domains
• Advanced and fine-grained data management
• Automatic management of dynamic resources
• Maintenance management
• Easy deployment
• Usable in user space: no need to be root
• Cloud management
37. The WebBoard (Before SysFera)
• User and admin interface: one app, one page
• User rights management
• Statistics
39. Outline
• Context
• From DIET…
• … to SysFera-DS
• Conclusion
40. An open source solution
The core of SysFera-DS is open-source software...
...which means anyone can use it, share it, and contribute to it.
42. Conclusion
• An open source solution with two complementary kinds of support
DIET (LIP, Avalon Team)
– Proof of concept
– Simulations
– New features
– Grid’5000 experiments
– Scientific expertise
– etc.
SysFera-DS (SysFera)
– Application support with industrial quality
– Platform development
– New features
– Custom features
– From research Grid to production Grid
– Hotline
43. Acknowledgment
Abdelkader Amar, Adrian Muresan, Alan Su, Amine Bsila, Andréea Chis, Antoine Vernois, Barbara Walter, Benjamin Depardon, Benjamin Isnard, Bert Van Heukelom, Bruno DelFabro, Christophe Pera, Cyril Pontvieux, Cédric Tedeschi, Damien Reimert-Vasconcellos, Daouda Traore, David Loureiro, Eric Boix, Eugene Pamba Capochichi, Emmanuel Quémener, Florent Rochette, Frédéric Desprez, Frédéric Lombard, Frédéric Suter, Gaël Le Mahec, Georg Hoesch, Ghislain Charrier, Haïkel Guemar, Ibrahima Cissé, Jean-Marc Nicod, Jonathan Rouzaud-Cornabas, Kevin Coulomb, Laurent Philippe, Ludovic Bertsch, Luis Rodero-Merino, Marc Boury, Martin Quinson, Mathias Colin, Mathieu Jan, Maurice Djibril Faye, Nicolas Bard, Ousmane Thiare, Peter Frauenkron, Philippe Combes, Philippe Martinez, Philippe Vicens, Phuspinder Kaur Chouhan, Raphaël Bolze, Romain Lacroix, Stéphane Vialle, Sylvain Dahan, Vincent Pichon, Yves Caniou