Mirko Mariotti presented a project to harvest dispersed computational resources using OpenStack. The project involved installing a single OpenStack installation to manage resources across multiple geographic locations. Resources at different locations were organized into zones corresponding to their location. Software-defined networking was used to connect the zones and make the remote resources available via the centralized OpenStack installation. This allowed researchers to access computational resources located at different universities and research centers through the single OpenStack interface.
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...Nane Kratzke
Cloud-native applications are intentionally designed for the cloud in order to leverage cloud platform features like horizontal scaling and elasticity – benefits coming along with cloud platforms. In addition to classical (and very often static) multi-tier deployment scenarios, cloud-native applications are typically operated on much more complex but elastic infrastructures. Furthermore, there is a trend to use elastic container platforms like Kubernetes, Docker Swarm or Apache Mesos. However, especially multi-cloud use cases are astonishingly complex to handle. In consequence, cloud-native applications are prone to vendor lock-in. Very often TOSCA-based approaches are used to tackle this aspect. But, these application topology defining approaches are limited in supporting multi-cloud adaption of a cloud-native application at runtime. In this paper, we analyzed several approaches to define cloud-native applications being multi-cloud transferable at runtime. We have not found an approach that fully satisfies all of our requirements. Therefore we introduce a solution proposal that separates elastic platform definition from cloud application definition. We present first considerations for a domain specific language for application definition and demonstrate evaluation results on the platform level showing that a cloud-native application can be transfered between different cloud service providers like Azure and Google within minutes and without downtime. The evaluation covers public and private cloud service infrastructures provided by Amazon Web Services, Microsoft Azure, Google Compute Engine and OpenStack.
Lessons Learned from Porting HelenOS to RISC-VMartin Děcký
HelenOS is an open source operating system based on the microkernel multiserver design principles. One of its goals is to provide excellent target platform portability. From the time of its inception, HelenOS already supported 4 different hardware platforms and currently it supports platforms as diverse as x86, SPARCv9 and ARM. This talk presents practical experiences and lessons learned from porting HelenOS to RISC-V.
While the unprivileged (user space) instruction set architecture of RISC-V has been declared stable in 2014, the privileged instruction set architecture is technically still allowed to change in the future. Likewise, many major design features and building blocks of HelenOS are already in place, but no official commitment to ABI or API stability has been made yet. This gives an interesting perspective on the pros and cons of both HelenOS and RISC-V. The talk also points to some possible research directions with respect to hardware/software co-design.
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...Nane Kratzke
Cloud-native applications are intentionally designed for the cloud in order to leverage cloud platform features like horizontal scaling and elasticity – benefits coming along with cloud platforms. In addition to classical (and very often static) multi-tier deployment scenarios, cloud-native applications are typically operated on much more complex but elastic infrastructures. Furthermore, there is a trend to use elastic container platforms like Kubernetes, Docker Swarm or Apache Mesos. However, especially multi-cloud use cases are astonishingly complex to handle. In consequence, cloud-native applications are prone to vendor lock-in. Very often TOSCA-based approaches are used to tackle this aspect. But, these application topology defining approaches are limited in supporting multi-cloud adaption of a cloud-native application at runtime. In this paper, we analyzed several approaches to define cloud-native applications being multi-cloud transferable at runtime. We have not found an approach that fully satisfies all of our requirements. Therefore we introduce a solution proposal that separates elastic platform definition from cloud application definition. We present first considerations for a domain specific language for application definition and demonstrate evaluation results on the platform level showing that a cloud-native application can be transfered between different cloud service providers like Azure and Google within minutes and without downtime. The evaluation covers public and private cloud service infrastructures provided by Amazon Web Services, Microsoft Azure, Google Compute Engine and OpenStack.
Lessons Learned from Porting HelenOS to RISC-VMartin Děcký
HelenOS is an open source operating system based on the microkernel multiserver design principles. One of its goals is to provide excellent target platform portability. From the time of its inception, HelenOS already supported 4 different hardware platforms and currently it supports platforms as diverse as x86, SPARCv9 and ARM. This talk presents practical experiences and lessons learned from porting HelenOS to RISC-V.
While the unprivileged (user space) instruction set architecture of RISC-V has been declared stable in 2014, the privileged instruction set architecture is technically still allowed to change in the future. Likewise, many major design features and building blocks of HelenOS are already in place, but no official commitment to ABI or API stability has been made yet. This gives an interesting perspective on the pros and cons of both HelenOS and RISC-V. The talk also points to some possible research directions with respect to hardware/software co-design.
Overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.
Stateless load balancing - Research overviewAndrea Tino
Master Degree training program research project. The presentation introduces main objectives of the thesis and describes (without providing in-depth details) the most important aspects of the activity.
5G-Slicer: An emulator for mobile IoT applications deployed over 5G network s...MoysisSymeonides
"5G-Slicer An emulator for mobile IoT applications deployed over 5G network slices" presentation at 2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI)
A Comparative Analysis of Additional Overhead Imposed by Internet Protocol Se...ijceronline
IPSec, an Internet layer three (3)-security protocol suite is often characterising with introducing an additional space and processing overhead when implemented on a network for secured communication using either internet protocol version 4 or 6; IPv4 or IPv6. The use of Internet protocol security (IPSec) on IPv4 is an alternative that offers solutions and addresses the security vulnerabilities in network layer of the open system interconnect (OSI) and transmission control protocol/internet protocol (TCP/IP) protocol stack. In IPv6, IPSec is one among many other features added to the earlier Internet protocol to enhance efficiency and security. This paper, set as its objective to reports on the impact of processing and space overhead introduced by IPSec on both IPv4 and IPv6 in relation to packet end-to-end delay based on different IPSec transformations with different authentication and encryption algorithms deployed in different scenarios simulated using NS2. The experiment result reveals that the cost of IPSec added overhead is relatively small when smaller packet sizes are involved for both protocols in comparison with large packet sizes that were IPSec protected with the same configuration as the smaller packet, unless in the cases whereby the packet was very large which has to be fragmented. This paper can therefore, serve as a guide for network administrators to trade up between processing cost and larger address space specifically for transmission involving larger IP packets
Now more than ever, today’s businesses require reliable network connectivity and access to corporate resources. Connections to and from business units, vendors and SOHOs are all equally important to keep the continuity when needed. Business runs all day, every day and even in off hours. Most companies run operations around the clock, seven days a week so it’s important to realize that to keep a solid business continuity strategy, redundancy technologies should be considered and/or implemented.
So, we need to keep things up and available all the time. This is sometimes referred to five nines (99.999) uptime. The small percentage of downtime is accounted for unforeseen incidents, or ‘scheduled maintenance’ and usually set to take place during times of least impact, like in the middle of the night, or on holiday weekends if planned. If this is not a part of your systems and network architecture it should be considered if you want to keep a high level of availability. Because things break and unforeseen events do take place, we need to evaluate the need for creating an architecture that is ‘highly available’, or up as much as possible, with failures foreseen ahead of time and the only downtime, is to do planned maintenance.
Course: "Introductory course to HLS FPGA programming"Mirko Mariotti
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
Slides of the talk: "Machine Learning on FPGA: The BondMachine Project". Given at "IV International School on Open Science Cloud", 28 Nov - 2 Dec 2022 - Perugia - Italy
More Related Content
Similar to Harvesting dispersed computational resources with Openstack
Overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.
Stateless load balancing - Research overviewAndrea Tino
Master Degree training program research project. The presentation introduces main objectives of the thesis and describes (without providing in-depth details) the most important aspects of the activity.
5G-Slicer: An emulator for mobile IoT applications deployed over 5G network s...MoysisSymeonides
"5G-Slicer An emulator for mobile IoT applications deployed over 5G network slices" presentation at 2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI)
A Comparative Analysis of Additional Overhead Imposed by Internet Protocol Se...ijceronline
IPSec, an Internet layer three (3)-security protocol suite is often characterising with introducing an additional space and processing overhead when implemented on a network for secured communication using either internet protocol version 4 or 6; IPv4 or IPv6. The use of Internet protocol security (IPSec) on IPv4 is an alternative that offers solutions and addresses the security vulnerabilities in network layer of the open system interconnect (OSI) and transmission control protocol/internet protocol (TCP/IP) protocol stack. In IPv6, IPSec is one among many other features added to the earlier Internet protocol to enhance efficiency and security. This paper, set as its objective to reports on the impact of processing and space overhead introduced by IPSec on both IPv4 and IPv6 in relation to packet end-to-end delay based on different IPSec transformations with different authentication and encryption algorithms deployed in different scenarios simulated using NS2. The experiment result reveals that the cost of IPSec added overhead is relatively small when smaller packet sizes are involved for both protocols in comparison with large packet sizes that were IPSec protected with the same configuration as the smaller packet, unless in the cases whereby the packet was very large which has to be fragmented. This paper can therefore, serve as a guide for network administrators to trade up between processing cost and larger address space specifically for transmission involving larger IP packets
Now more than ever, today’s businesses require reliable network connectivity and access to corporate resources. Connections to and from business units, vendors and SOHOs are all equally important to keep the continuity when needed. Business runs all day, every day and even in off hours. Most companies run operations around the clock, seven days a week so it’s important to realize that to keep a solid business continuity strategy, redundancy technologies should be considered and/or implemented.
So, we need to keep things up and available all the time. This is sometimes referred to five nines (99.999) uptime. The small percentage of downtime is accounted for unforeseen incidents, or ‘scheduled maintenance’ and usually set to take place during times of least impact, like in the middle of the night, or on holiday weekends if planned. If this is not a part of your systems and network architecture it should be considered if you want to keep a high level of availability. Because things break and unforeseen events do take place, we need to evaluate the need for creating an architecture that is ‘highly available’, or up as much as possible, with failures foreseen ahead of time and the only downtime, is to do planned maintenance.
Similar to Harvesting dispersed computational resources with Openstack (20)
Course: "Introductory course to HLS FPGA programming"Mirko Mariotti
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
Slides of the talk: "Machine Learning on FPGA: The BondMachine Project". Given at "IV International School on Open Science Cloud", 28 Nov - 2 Dec 2022 - Perugia - Italy
Slides of the talk at: "Joint ICTP-IAEA School on FPGA-based SoC and its
Applications to Nuclear and Scientific Instrumentation". 14 November - 02 December 2022 ICTP, Trieste, Italy
BondMachine slides for the InnovateFPGA 2018 Grand FinalMirko Mariotti
The BondMachine Project reached the InnovateFPGA 2018 Grand Final. These are the slides of the presentation of the project given at the Intel Campus in San Jose on the 15th of August 2018
The BondMachine: a fully reconfigurable computing ecosystemMirko Mariotti
The aim of the BondMachine (BM) project is to implement a computing system to enable a real and full exploitation of the underlying hardware. This is a key to the success in the era of hybrid computing. In order to achieve this objective the BM has been designed to create a heterogeneous and flexible architecture on top of FPGAs. Moreover the overall vision is based on a reduction of the number of hardware/software layers which by product guarantees a simpler software development. As such, the BM project has been thought as a complete reconfigurable computing ecosystem that, starting from a high-level description, creates both the hardware and the software that runs on it.
The two architectural pillars are computing elements (processors) and non-computing elements (for example memories, channels, barriers). The latter are meant to be shared among processors. Finally, thanks to a custom network protocol, many BondMachines can be interconnected together, therefore building heterogeneous multi-core systems or even clusters of multi-cores.
The flexibility of the BM makes possible the implementation of any computing system ranging from networks of small agents, like IoT (Internet of Things), to high-performance devices for ML (Machine Learning) or real time data analysis, and even systems that mix all this different characteristics together. The BM can interact with standard Linux workstations both as a special purpose hardware accelerator or as part of a computer/BM hybrid clusters.
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'Mirko Mariotti
Slides presentate al Workshop CCR 2016 dell' Istituto Nazionale di Fisica Nucleare, riguardanti la progettazione e lo sviluppo del sistema di gestione dei servizi erogati dal Dipartimento di Fisica e Geologia dell'Universita' di Perugia e dalla locale sezione INFN.
Presentazione delle attivita' svolte utilizzando il sistema di virtualizzazione Xen nel dipartimento di fisica dell' universita' di Perugia fino al 2006.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Harvesting dispersed computational resources with Openstack
1. Harvesting dispersed computational resources with
Openstack
a Cloud infrastructure for the
Computational Science community
Mirko Mariotti
mirko.mariotti@unipg.it
ISGC 2018 - Academia Sinica - Taipei
2. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Agenda
● General overview of our problem.
● Some words on our OpenStack Installation.
● How we extend our system to remote resources.
● Use cases.
2
3. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
General overview
Harvesting dispersed computational resources is an important topic for
nowaday, in particular for a small center.
The main goal of the present work is to illustrate a real example on how to
build a geographically distributed cloud to share and manage computing
resources, owned by heterogeneous cooperating entities.
3
4. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Openstack @ Perugia (Italy)
● Small OpenStack installation (~600 cores)
● Computational resources for local researcher, students, labs, events.
● Not only services, base for our R&D on cloud technologies
4
5. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Openstack @ Perugia (Italy)
● AA federated with INFN-AAI and Unipg IDM
● Network virtualization via neutron and VLAN backend
● Storage: cinder, ceph
● Two installation, one production (Mitaka), one development (latest
available)
● OpenStack core machine also virtualized (outside OS)
5
6. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Dispersed resources
Some of our researchers have access to other computational centers
geographically distributed.
The centers are not cloud-based:
● Lack of local manpower.
● Not big enough to install a complete cloud system.
6
7. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Dispersed resources
Locations
7
Dept of Physics and
Geology/ INFN Perugia
Dept of Chemistry
ASI-SSDC Space
Science data center at
the Italian Space
Agency
Dept of Pharmacy
8. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
The technical objectives
● To include remote resources into our local OpenStack installation.
● To make sure that the included satellite resources are used efficiently by
the cloud framework.
● To give back to the owning research group in the form of cloud resources
(instances, storage, and recipes)
8
9. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
The Pillars
● A single OpenStack installation.
● Resource organized in different zones logically correspondent to different
geographical locations.
● SDN (software-defined networking) solution to connect the different
zones.
● All build with standard servers and Linux systems.
9
10. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
A single OpenStack Installation
A central OS installation control all the sites but ...
the sites have to be as much autonomous as possible especially regarding:
● Storage
● Outbound connectivity
Cross-site operations have to be possible (knowing the risks).
Ideally the traffic among sites would be only the OpenStack management one.
10
11. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
SDN (Software-Defined Networking)
Software-Defined Networking is a way to overlay multiple networks to a single
physical fabric and to control them via software.
Openvswitch is an open source project for SDN
Used for network virtualization in many cloud framework
We are using this approach and Openvswitch also to the physical
infrastructure.
11
12. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
SDN
How we use it
We use a Linux box for each site to “virtualize” the openstack LAN (Both management and projects) and
transport it to other sites.
12
14. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Connected nodes
14
The sites are connected Layer 2
15. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Network Security
The VXLAN tunnel has to be encrypted, we tried two solution:
● OpenVPN point to point
○ Routers friendly (standard UDP/TCP traffic)
○ Less performant
● IPSEC
○ Routers unfriendly
○ More fragmented traffic
○ More performant
15
17. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Per-zone customizations
In order to avoid cross-site interactions some consideration has to be taken
into account:
● Storage:
○ VMs in each zone has to use storage backend from the same zone.
● Network:
○ It is a nonsense to allow outbound traffic from satellite sites to go back to the main site.
○ Custom gateways for projects network on those zones. (more on the use cases)
17
18. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Possible Issues
Site security
All the sites are on the same Layer 2.
Errors, misconfigurations, problems can potentially impact on the whole
system.
Sites has to be trusted
18
19. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Possible Issues (cont.)
Poor performance
Encryption (OpenVPN/IPSEC) and encapsulation (VXLAN) are bandwidth
consuming (especially on commodity hardware).
This could be a problem for cross-site operation, but not a real problem for
OpenStack control traffic ( ~50 kBit/s each hardware node)
19
20. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Some measures
20
Traffic on the OS controller node
AVG in: ~ 50 kBit/s/hwnode
AVG out: ~ 25 kBit/s/hwnode
Traffic on a DB/rabbitmq node
AVG in: ~ 17 kBit/s/hwnode
AVG out: ~35 kbit/s/hwnode
21. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Still possible issues
Network problems
For any reason the network connection to a site is severed what happen to
VMs ?
OpenStack is resilient to this situation, it cannot contact anymore the
resources but VMs continue to work correctly (provided their storage is not
cross-site).
Other sites are not affected.
21
22. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Automation
The sites are L2 connected, every automatic installation/configuration
mechanism available on the master site work out of the box on the remote
sites (preseed, puppet etc).
22
No problem for Openvswitch but constraints on the switching hardware
23. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Next step
A pre-configured system that with
the prerequisites of:
● Openflow compliant switches.
● A standard way of cabling a
rack.
● A public IP.
Deploy and configure that rack as an
extension to our OS installation.
23
24. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Use cases
24
Use case 1:
AMS analysis with
DODAS
Use case 2:
Computational
Chemistry
25. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Use case 1: DODAS
Slide from the talk of
D.Spiga (Thursday
20/3/2018)
25
27. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Use case 2: Computational Chem
We may name several scenarios that can be easily adapted to a Cloud
architecture as the one deployed:
● Complex workflows : e.g. calculation of the ab-initio values of the
potential energy surface (PES), fitting of the points, integration of the
nuclei dynamics equations and the final statistical analysis and
visualization of the results
● Drug Design: need to build computational protocols made of many
different steps, e.g. Virtual Screening run an entire sequence of jobs to
screen a large collection of ligands against one or multiple targets.
L. Storchi. F. Tarantelli, A. Laganà,. "Computing molecular energy surfaces on a Grid." LNCS 3980, 675 (2006)
F. Milletti, L. Storchi, G. Sforna, S. Cross, G, Cruciani, "Tautomer Enumeration and Stability Prediction for Virtual Screening on Large Chemical Databases" ,
Journal of Chemical Information and Modeling, 49 (1), 68 (2009). 27
28. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Use case 2: Computational Chem
Quantum Chemistry : e.g. we deployed an approach to perform a geometry
optimization using the Dirac-Kohn-Sham module of BERTHA, a full
4-component DKS calculation (bond length of the AuOg+
molecular system).
L. Storchi , S. Rampino , L. Belpassi , F. Tarantelli , H. M. Quiney,"Efficient parallel all-electron four-component Dirac-Kohn-Sham program
using a distributed matrix approach.II" JCTC, 2013, 9 (12), pp 5356–5364 28
29. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Conclusions
We successfully built a cross-site OpenStack centrally managed, including
resources otherwise poorly used
The whole system is based on open source standard solutions and commodity
hardware.
Resource are isolated in zones for efficiency and to avoid cross-site
interactions.
We used successfully this infrastructure in real cases.
29
30. 2018-03-23 Mirko Mariotti ISGC 2018 - Academia Sinica, Taipei
Thanks
People that contributed to this project:
VITILLARO, Giuseppe (Istituto di Scienze e Tecnologie Molecolari, Consiglio Nazionale delle Ricerche)
FORMATO, Valerio (INFN sezione di Perugia)
DURANTI, Matteo (INFN sezione di Perugia)
SPIGA, Daniele (INFN sezione di Perugia)
CIANGOTTINI, Manuel (INFN sezione di Perugia)
STORCHI, Loriano (Dipartimento di Farmacia, Università degli Studi "G. D'Annunzio", Chieti.)
MERGÉ, Matteo (INFN sezione Roma 2);
D'ANGELI, Paolo (Space Science Data Center at the Italian Space Agency.)
GUERRA, Antonio (Space Science Data Center at the Italian Space Agency.)
PRIMAVERA, Roberto (Space Science Data Center at the Italian Space Agency.)
30