Invited Talk in ISC High Performance 2019 Focus Session "Containers for Acceleration and Accessibility in HPC and Cloud Ecosystems" https://2019.isc-program.com/presentation/?id=inv_sp183&sess=sess177
XPDS14 - Towards Massive Server Consolidation - Filipe Manco, NEC (The Linux Foundation)
In recent years Xen has seen the development of many minimalistic or specialized virtual machines (e.g., OSv, Mirage, ClickOS, Erlang on Xen). Thanks in part to their small CPU and memory footprints, thousands or more of these VMs can run on a single, inexpensive commodity server. Doing so could save cloud and network operators vast amounts of money.
Attempts to do so are already underway and have uncovered important bottlenecks in Xen. While some of these have already been addressed by the community (e.g., the limited number of event channels or memory grants), others remain. In this talk we describe our experience trying to run up to 10,000 MiniOS-based VMs, including bottlenecks in the XenStore, toolchain, and network pipe. We further report on prototype solutions, and on our implementation of suspend/resume for MiniOS that enables migrations in tens of milliseconds.
We compare Kubernetes with Kubernetes on OpenStack and look at how to build each environment.
1. Cloud trends
2. Kubernetes vs Kubernetes on OpenStack
3. How to build Kubernetes on OpenStack
4. How to operate Kubernetes on OpenStack
Homer - Workshop at Kamailio World 2017 - Giacomo Vacca
Homer is an open-source tool for real-time analysis and monitoring of VoIP and RTC platforms. It supports all the major OSS voice platforms; it's modular, easy to install, and scales to carrier-grade infrastructures. Homer goes beyond collecting and correlating signalling and logs: it can also capture RTCP reports, QoS reports, and other events. Through an ElasticSearch endpoint, Homer supports big-data analysis of traffic.
This workshop focuses on the deployment of a multi-node Homer framework with various approaches: bash installers, Docker containers, Puppet.
We'll see how to configure Kamailio, FreeSWITCH (including the ESL interface), RTPEngine, and the Janus gateway (Events API) to collect signalling, RTCP reports, and app-specific events, and have them correlated and presented in a user-friendly GUI.
For advanced users, we'll present the installation of captagent (the standalone capture agent), hepgen.js to generate test traffic, and a Wireshark dissector for full visibility of data flows.
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix (The Linux Foundation)
Storage systems continue to deliver better performance year after year. High performance solutions are now available off-the-shelf, allowing users to boost their servers with drives capable of achieving several GB/s worth of throughput per host. To fully utilise such devices, workloads with large queue depths are often necessary. In virtual environments, this translates into aggregate workloads coming from multiple virtual machines.
Having previously addressed the impact of low-latency devices in virtualised platforms, we are now aiming at optimising aggregate workloads. We will discuss the existing memory grant technologies available in Xen and compare the trade-offs and performance implications of each: grant mapping, persistent grants, and grant copy. For the first time, we will present grant copy as an alternative and show measurements of over 7 GB/s, maxing out a set of local SSDs.
Want to learn how Facebook scales its load balancing infrastructure to support more than 1.3 billion users? We will be revealing the technologies and methods we use to globally route and balance Facebook's traffic. The Traffic team at Facebook has built several systems for managing and balancing our site traffic, including both a DNS load balancer and a software load balancer capable of handling several protocols. This talk will focus on these technologies and how they have helped improve user performance, manage capacity, and increase reliability.
SaltConf14 - Matthew Williams, Flowroute - Salt Virt for Linux containers an... (SaltStack)
This SaltConf14 talk by Matthew Williams of Flowroute shows the power of Salt Virt and Runner for creating and managing VMs and Linux containers. A demonstration of the Salt lxc module shows the simplicity with which containers and VMs can be created and configured.
Experience Report: Cloud Foundry Open Source Operations | anynines GmbH
Cloud Foundry and OpenStack are the biggest open-source projects in their domains. As IaaS and PaaS walk hand in hand, the idea of combining both worlds is a natural one. anynines has been running its public Cloud Foundry offering on top of OpenStack for more than three years, with two years on a self-hosted OpenStack setup. As head of public PaaS operations, Julian Weber has gained a lot of knowledge to share about setting up and operating Cloud Foundry installations. This presentation leads the audience through the journey of adopting the Cloud Foundry open-source version and growing it into a highly available, production-ready Cloud Foundry setup. The listener is guided through the analysis of potential single points of failure in standard CF open-source setups, up to the changes required in the Cloud Foundry OS release to reach our goal. As this talk is about Cloud Foundry operations, we also discuss experiences with BOSH as a general-purpose tool for software lifecycle management of big distributed systems, and possible improvements to the BOSH tool set and workflows. The talk will enable advanced DevOps engineers to dive deeper into the technical details of setting up production-ready Cloud Foundry installations based on Cloud Foundry open source.
How Can OpenNebula Fit Your Needs: A European Project Feedback (NETWAYS)
BonFIRE is a European project which aims at providing a "multi-site cloud facility for applications, services and systems research and experimentation". Grouping different research cloud providers behind a common set of tools, APIs, and services, it enables users to run their experiments against a heterogeneous set of infrastructures, hypervisors, networks, etc.
BonFIRE, and thus the (OpenNebula) testbeds, provide a relatively small set of images used to boot VMs. However, the experimental nature of BonFIRE projects results in a high "turnover" of running VMs. Many VMs live for between a few hours and a few days, and an experiment startup can trigger the deployment of many VMs at the same time on a small set of OpenNebula workers, which does not correspond to the usual cloud workflow.
Default OpenNebula is not optimized for such a use case (a small number of worker nodes, high VM turnover). However, thanks to its ability to be easily modified at each level of the cloud deployment workflow, OpenNebula has been tuned to fit the BonFIRE deployment process better. This presentation will explain how to change the OpenNebula TM and VMM to improve the parallel deployment of many VMs in a short amount of time, minimizing the time needed to deploy an experiment without lots of expensive hardware.
Running Cloud Foundry for 12 months - An experience report | anynines GmbH
anynines ran a public PaaS based on Cloud Foundry, located in a German datacenter. In more than 12 months of running a Cloud Foundry PaaS, many lessons about security, high availability, OpenStack, and other exciting topics have been learned. See how BOSH can be used and how it shouldn't be used. Learn how to perform Cloud Foundry upgrades, and read how to harden Cloud Foundry by adding more fault tolerance with Pacemaker.
OpenStack and Ceph: the Winning Pair
By: Sebastien Han
Ceph has become increasingly popular and has seen several deployments inside and outside OpenStack. The community and Ceph itself have greatly matured. Ceph is a fully open-source distributed object store, network block device, and file system designed for reliability, performance, and scalability from terabytes to exabytes. Ceph utilizes a novel placement algorithm (CRUSH), active storage nodes, and peer-to-peer gossip protocols to avoid the scalability and reliability problems associated with centralized controllers and lookup tables. The main goal of the talk is to convince those of you who aren't already using Ceph as a storage backend for OpenStack to do so. I consider Ceph to be the de facto storage backend for OpenStack, for a lot of good reasons that I'll lay out during the talk. Since the Icehouse OpenStack summit, we have been working really hard to improve the Ceph integration; Icehouse is definitely THE big release for OpenStack and Ceph. In this session, Sebastien Han from eNovance will go through several subjects: a Ceph overview; building a Ceph cluster (general considerations); why Ceph is so good with OpenStack; OpenStack and Ceph, a 5-minute quick start for developers; typical architecture designs; the state of the integration with OpenStack (Icehouse's best additions); and the Juno roadmap and beyond.
Video Presentation: http://bit.ly/1iLwTNf
Kubernetes for HCL Connections Component Pack - Build or Buy? - Martin Schmidt
HCL Connections V7 will be based on Kubernetes only! A parallel WebSphere environment won't be necessary any longer. Martin and Christoph have collected the basics of building a Kubernetes environment of your choice and the differences between the options. They show a comparison of an on-premises deployment versus a hosted cloud environment (Amazon EKS). After this session you'll have the basics to size and build a Kubernetes cluster for Component Pack, so you can start learning the new technology, take off with Connections V7, and become a Kubernaut.
It is no accident that Xen software powers some of the largest Clouds in existence. From its outset, the Xen Project was intended to enable what we now call Cloud Computing. This session will explore how the Xen Architecture addresses the needs of the Cloud in ways which facilitate security, throughput, and agility. It will also cover some of the hot new developments of the Xen Project.
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In... (The Linux Foundation)
With the development of virtualization, there are ever more device-assignment requirements. Building on VT-d interrupt remapping, Intel introduces VT-d interrupt posting as an enhanced method of handling interrupts in virtualization environments. Posted Interrupts (PI) on the CPU side are already supported in Intel CPUs; with VT-d Posted Interrupts we gain additional advantages: external interrupts can be delivered directly to running vCPUs without hypervisor involvement, interrupt-migration complexity is decreased, urgent and non-urgent external interrupts can be differentiated, and consuming a host vector for each interrupt to a vCPU is avoided. In this presentation, Feng will talk about the mechanism of VT-d PI and its advantages, as well as some performance data for I/O-intensive workloads in Xen, showing the performance gain from using VT-d PI.
Ganeti Web Manager: Cluster Management Made Simple - OSCON (Byrum)
Looking for an easy, scalable way to manage your Ganeti-based clusters? Ganeti Web Manager provides admins an easy-to-deploy, Django-based GUI that effectively manages private clusters and works equally well for providing customer access. With a caching system designed to scale to thousands of virtual machines without decreasing performance, Ganeti Web Manager makes cluster management truly simple.
[KubeCon NA 2020] containerd: Rootless Containers 2020 - Akihiro Suda
Rootless Containers means running the container runtimes (e.g., runc, containerd, and kubelet) as well as the containers themselves without host root privileges. The most significant advantage of rootless containers is that they mitigate potential container-breakout vulnerabilities in the runtimes, but they are also useful for isolating multi-user environments on HPC hosts. This talk introduces rootless containers and dives into recent updates such as Seccomp User Notification. The main focus is on containerd (a CNCF Graduated Project) and its consumer projects, including Kubernetes and Docker/Moby, but other runtimes will be discussed as well.
https://sched.co/fGWc
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Alexey Rybakov, Senior Director at LUXOFT, presents the "Making Computer Vision Software Run Fast on Your Embedded Platform" tutorial at the May 2016 Embedded Vision Summit.
Many computer vision algorithms perform well on desktop-class systems but struggle on resource-constrained embedded platforms. This how-to talk provides a comprehensive overview of optimization methods that make vision software run fast on the low-power, small-footprint hardware widely used in automotive, surveillance, and mobile devices. The presentation explores practical aspects of deep algorithm and software optimization such as thinning of input data, using dynamic regions of interest, mastering data pipelines and memory access, overcoming compiler inefficiencies, and more.
Docker is going to change the way services are deployed by encapsulating them, thus providing a robust and complete continuous development/deployment workflow. But how is Docker impacting the compute part of HPC? Containerization uses kernel features (namespaces/cgroups) to encapsulate processes while allowing fine-grained customization of the compute stack, sitting on top of a stripped-down bare-metal OS that provides basic services.
This session aims to explore whether Docker already is the 'virtualization' technique the HPC community has been waiting for.
Could a distributed MPI job across multiple containers placed on different physical nodes beat a natively started job?
In this talk we will discuss how to build and run containers without root privileges. As part of the discussion, we will introduce new programs like fuse-overlayfs and slirp4netns and explain how this is possible using user namespaces. fuse-overlayfs makes it possible to use the same storage model as "root" containers and to use layered images. slirp4netns emulates a TCP/IP stack in userland, allowing a container's network namespace to access the outside world (with some limitations).
We will also introduce Usernetes and show how to run Kubernetes in an unprivileged user namespace.
https://sched.co/Jcgg
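To make the user-namespace mechanism behind rootless containers concrete, here is a minimal sketch using util-linux's unshare; this is an illustration of the idea, not a demo from the talk itself.

```bash
# Minimal user-namespace demo (illustrative; not from the talk).
# No host root privileges are involved at any point.
unshare --user --map-root-user sh -c 'id; cat /proc/self/uid_map'
# Inside the namespace, id reports uid=0 (root), but uid_map shows that
# this "root" is mapped back to your own unprivileged uid on the host.
```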
Latest (storage IO) patterns for cloud-native applications - OpenEBS
Applying microservice patterns to storage gives each workload its own Container Attached Storage (CAS) system. This puts the DevOps persona in full control of the storage requirements and brings data agility to Kubernetes persistent workloads. We will go over the concept and the implementation of CAS, as well as its orchestration.
An overview of how IceCube and LIGO make use of the PRP/TNRP Nautilus distributed Kubernetes cluster.
Presented at GRP'19 http://grp-workshop-2019.ucsd.edu
Network services on Kubernetes on premise - Hans Duedal
A deep dive into Kubernetes networking and a presentation of a use case: running network services like DNS on a bare-metal Kubernetes cluster for a major Danish e-sports event.
Sanger: upcoming OpenStack for bioinformaticians - Peter Clapham
Delivery of a new bioinformatics infrastructure at the Wellcome Trust Sanger Institute. We cover how to programmatically create, manage, and provide provenance for images used both at Sanger and elsewhere, using open-source tools and continuous integration.
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05) - Tibo Beijen
Slides of the presentation about Kubernetes practices and learnings at NU.nl. This presentation was the first of two at the Dutch Kubernetes meetup at the Sanoma Netherlands offices, which took place on Sept. 5th, 2019.
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage - MayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
Introducing Container Technology to TSUBAME3.0 Supercomputer
1. Akihiro Nomura, 2019-06-17, ISC’19, Frankfurt, Germany
Global Scientific Information and Computing Center
Introducing Container Technology to TSUBAME3.0 Supercomputer
Part of this work was supported by JST CREST Grant Number JPMJCR1501, Japan.
Part of this work was conducted as research activities of the AIST - Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL).
2. What is TSUBAME3.0?
• TSUBAME: Supercomputer series in Tokyo Tech
• TSUBAME1.0: ClearSpeed
• TSUBAME1.2: NVIDIA Tesla S1070
• The first supercomputer using GPUs as compute units (2008.11, T1.2)
• TSUBAME2.0: NVIDIA Tesla M2050 → TSUBAME2.5: K20X
• Operated for 7 years: 2010.11 – 2017.10
• TSUBAME-KFC(/DL): Oil-submerged supercomputer testbed
• #1 in Green500 (2013.11, 2014.06)
• TSUBAME3.0: NVIDIA Tesla P100
• #1 in Green500 (2017.06)
• Currently #25 in TOP500 (2019.06)
• Operation started on 2017.08
• ~1500 users from academia + industry, not only from Tokyo Tech
• Various application domains, expertise, and software (ISV, self-made, serial, parallel)
• Important to keep each user's research secret from other users
3. Experience from the 7-year operation of the TSUBAME2 supercomputer
• Our previous TSUBAME2 operated from 2010.11 to 2017.10
• We faced many problems during long-term operations
• Resource separation is required
• How to maintain software up-to-date
• How to stop users from hogging resources on shared nodes
• How to make the system energy efficient
• How to cope with decaying network cables (SIAM PP18)
• Other problems that I cannot disclose, or do not want to remember again
4. Why resource separation?
• TSUBAME2's 1408 compute nodes are too fat
• To utilize the 3 GPU + 2 CPU/node configuration, users need to program with CUDA/OpenACC for the GPUs, OpenMP for intra-node communication, and MPI for inter-node communication… just too hard for most users
• Three types of users (or workloads)
• Expert, guru: fully utilize the 3 GPU + 2 CPU/node configuration
• GPU user: uses 1~3 GPUs, but not many CPU threads
• CPU user: doesn't use GPUs at all
• Assigning a full node to all users is just a waste of resources
5. Resource separation accomplished in T2 (in 2010)
• VM (KVM)-based approach
• Run CPU VMs inside GPU-job nodes
• GPUs couldn't be virtualized
• Network performance is limited due to IPoIB
• Nice usability
• Users can SSH into both the GPU part and the CPU part for debugging/monitoring
• Many TSUBAME1 users did so during their jobs
• Good isolation
• A GPU user cannot see what's going on in the CPU part, and vice versa
• Bad flexibility
• We cannot dynamically change the number of nodes split into the two parts
[Diagram: a T2 node (2 CPUs, 3 GPUs, 2 IB HCAs) split into a bare-metal GPU part (8 cores, "G") and a VM-based CPU part (4 cores, "U/V") whose networking runs over IP over IB]
6. What happens to the SW environment if we operate one system for a long time
• Everything gets stale
• System software compatibility problem
• GPU and Lustre drivers won't support a 5-year-old OS distro
• OS support problem
• OS vendors drop support for 5-year-old distros
• ISV software compatibility problem
• Some newer versions won't work on an old OS
• Some stable versions aren't verified on a new OS
• Library version hell
• Upgrading to a newer OS version is painful
• Everything must be validated again, esp. ISV software
• We did it once (SLES11 → SLES11SP3, 2014.08), at a large cost
7. When I tried to install Caffe on T2.5 (2015.05)
• SLES11SP3, two years from release, <1 year from system update
• SP4 appeared just after verification and installation
• Got a request from a user on Friday evening; thought it would be easy
• Experienced library hell; it took 3 days to install
• Lots of missing libraries
• >20 Python packages, gflags, glog, leveldb, lmdb, protobuf, snappy, cuDNN, OpenCV
• GCC is too old, let's install it…
• Ah, I need to recompile everything with the new GCC…
• I also tried TensorFlow later, but abandoned it
• Some binary-shipped parts require a newer glibc
∴ Introducing bleeding-edge software to an old system is quite painful
8. Our expectations of container technology for the upcoming TSUBAME3 (as of late 2016)
• We just wanted something that lets us:
• Make the OS kernel version and the userland version independent
• Provide new system software and libraries at the least cost
• Provide an old userland if necessary
• Then we can skip validating all ISVs in the newer environment
• Also (partially) useful for replaying old experiments later
• Split resources (CPU, GPU, memory, network) without performance drawbacks
• Secure isolation between separated partitions
• Dynamic partitioning
• Allow users to do what they did on previous systems
• In our case, SSH to compute nodes while a job is running
9. Our choices for resource separation (again, as of late 2016)
• VMs and Docker were the available choices
• Other container technologies (Shifter, Singularity, …) were not mature
VM vs. Docker container, by metric:
• Performance: VM: the GPU is virtualized; for the interconnect, IB supports SR-IOV but OmniPath has no support. Docker: almost no overhead.
• Usability: VM: SSH is not a problem. Docker: SSH into a container requires some integration.
• Isolation: VM: isolated without problems. Docker: OK if cgroup works well.
• Userland virtualization: VM: hard to deploy an OS dynamically. Docker: the userland can be chosen.
• Flexibility: VM: turning VMs on and off is costly. Docker: the container itself won't be a problem.
We didn’t specify VM or Docker explicitly,
but requested functionality in procurement
The vendor choose Docker
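As a rough illustration of the cgroup-backed isolation that made Docker acceptable here, these are standard Docker flags; the image name, limit values, and device list are made up for the example.

```bash
# Illustrative only: Docker resource limits are implemented with cgroups.
# A partition of 7 cores, 60 GB of RAM, and a single GPU device node;
# the image name and values are hypothetical.
docker run --rm \
  --cpuset-cpus=0-6 \
  --memory=60g \
  --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
  myimage:latest ./app
```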
10. What a TSUBAME3 node looks like
• The node is larger than T2
• 28 CPU cores
• 4 GPUs
• 4 Omni-Path HFIs
• Too huge for most users
• Expert, Guru
• GPU user
• CPU user
• We expect most users to split the node
11. How we separate the node physically
• Separate the node hierarchically
• Inspired by the buddy system in the Linux kernel's memory allocator
• Less flexible because of the fixed memory/CPU and memory/GPU ratios
• Better scheduling to minimize scattered resources
[Diagram: hierarchical partitioning of a node (2 CPUs, 4 GPUs, 4 OPA HFIs) into virtual node types such as H (14 cores), Q (7 cores), G (4 cores), and C4 (2 cores)]
12. Resource Utilization in TSUBAME3 (2019.04)
• ~70% of jobs (based on vnode×time) run on separated nodes rather than full nodes
• The sum of the vnode×time product exceeded 540 × 30 days in busy months
• We couldn't have served the jobs without partitioning
13. How we separate the node logically
• Integration by HPE (primary vendor) and UNIVA (scheduler vendor)
• Just using cgroup (see the sketch below)
• To achieve the minimal goal of resource separation in a short development time
• Userland virtualization is not urgent; it should be implemented by the time the initial userland becomes obsolete
• SSH to (part of) compute nodes is desirable, but not a requisite
• Using Docker, integrated with the scheduler
• To achieve the full goal, including the goals triaged out of the cgroup implementation
• Multi-node Docker integration was challenging; there was no predecessor at that time
• It took almost two years to bring the Docker part into service
• The integration broke scheduling priorities etc. in specific situations
• Finally started the Docker-based service in 2019.04
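To make the cgroup-only approach concrete, here is a rough sketch of what a scheduler prolog could do with plain cgroup v1; the job name, paths, and values are illustrative, and this is not UNIVA's actual integration.

```bash
# A sketch of cgroup-v1 node partitioning, as a scheduler prolog might
# do it. Names and values are illustrative, not the real integration.
JOB=job123

# Pin the job to 7 cores on one NUMA node (a quarter-node partition).
mkdir /sys/fs/cgroup/cpuset/$JOB
echo 0-6 > /sys/fs/cgroup/cpuset/$JOB/cpuset.cpus
echo 0   > /sys/fs/cgroup/cpuset/$JOB/cpuset.mems

# Cap the job's share of node memory.
mkdir /sys/fs/cgroup/memory/$JOB
echo 60G > /sys/fs/cgroup/memory/$JOB/memory.limit_in_bytes

# Hide the GPUs from a CPU-only partition (NVIDIA devices use major 195).
mkdir /sys/fs/cgroup/devices/$JOB
echo 'c 195:* rwm' > /sys/fs/cgroup/devices/$JOB/devices.deny

# Finally, move the job's shepherd process into all three controllers.
for c in cpuset memory devices; do
  echo $$ > /sys/fs/cgroup/$c/$JOB/tasks
done
```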
14. Our requirements for container technology
• DO NOT PASS root TO USERS
• We use several filesystems in our network
• Cluster NFS for home storage
• Lustre for high-speed shared storage
• (local SSD + BeeOND)
• We MUST prevent users from accessing other users' data
→ We decided NOT to allow users to bring their own images
• In Docker, root in the container is (a sometimes restricted) root in the host OS
• We cannot filter malicious images that allow escaping from the jail
• Files with the setuid bit, local vulnerability exploits, …
• Drawback: users cannot bring their own images
• We initially thought that this was not a problem, or an inevitable compromise
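For contrast, Docker's user-namespace remapping (a real daemon option, though not something the talk says TSUBAME3 deployed) shows one way to blunt the container-root-is-host-root problem; note it still would not make setuid files inside untrusted images trustworthy, which was the concern above.

```bash
# Docker's userns-remap option (shown for contrast, not deployed here).
# Root inside containers maps to an unprivileged host uid range.
cat /etc/docker/daemon.json
# {
#   "userns-remap": "default"
# }

# With remapping active, uid 0 in the container corresponds to a
# subordinate uid from /etc/subuid, so escaping the container does not
# yield host root.
docker run --rm alpine id   # reports uid=0 only inside the container
```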
15. Time flies like an arrow, in just 2 years
• During introduction and preparation, container tech evolved rapidly and we fell out of sync
• What users expected of containers was not what we planned to do with containers
• Lots of application containers appeared, including HPC apps
Pics from http://www.projectcartoon.com
16. Other container choices: Singularity
• Docker was a general-purpose container
• Not designed to be used by untrusted users
• HPC-aware containers are being implemented
• Shifter
• Prevents users in the container image from getting root
• Singularity
• Runs containers without root (except for startup, cgroup, and FS mounts)
• There is a security document describing the setuid-related implementation!!
• Can we accept user-brought container images using Singularity?
17. Introducing Singularity to TSUBAME3.0 (2018.08-09)
• A request came from a user, with a pointer to the security consideration document
• Checked the source code of Singularity (the setuid-related parts) with multiple staff members
• Discussion in the research computer system audit board
• Not the usual path for ordinary software, but Singularity requires a setuid binary
• Finally installed Singularity 2.6
• Singularity 3.2.1 is also available, as of last week
• Did the same setuid-related code check, since the implementation changed
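From the user's side, the resulting workflow looks roughly like the following on Singularity 2.6; the image names are illustrative, not an official TSUBAME3 recipe.

```bash
# Hypothetical user workflow with the admin-installed Singularity 2.6;
# image names are illustrative.
singularity pull docker://ubuntu:18.04        # writes an .simg image
singularity exec ubuntu-18.04.simg cat /etc/os-release

# The user remains themselves inside the container: no root handover.
singularity exec ubuntu-18.04.simg id
```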
18. Pros and cons for Docker and Singularity
• Note: it’s just TSUBAME3’s case
Metrics should vary in different supercomputer sites
Docker in TSUBAME3 vs. Singularity in TSUBAME3, by metric:
• Usability: Docker: can SSH into the container; an IP address is assigned. Singularity: running daemons inside the container is not supported; no IP address is assigned.
• Isolation: Docker: already integrated. Singularity: must be done from outside, but possible.
• Userland virtualization: Docker: the userland can be chosen, but only by system admins. Singularity: users can bring arbitrary images.
• Service start: Docker: delayed to 2019.04. Singularity: 2018.09.
19. Yes, HPC containers started working with Singularity; that's all?
• Unfortunately NO for MPI apps
• They require integration of both kernel (host)-level drivers and userland libs
• Also, the process launch must be done on the host side, not from the container
• mpirun …… singularity exec …… path/to/mpiapp (expanded below)
• Many container implementations have mechanisms to fill the gap of NVIDIA GPU driver version differences
• NVIDIA-docker, the --nv option of Singularity…
• Yes, TSUBAME3 is NVIDIA GPU Cloud Ready
• TSUBAME3 uses OmniPath, while other HPC sites often use InfiniBand (or Tofu, Aries, …)
• Users (except for gurus) don't care what the underlying interconnect is
• Unlike with accelerators: users don't expect CUDA to work on an FPGA
• However, the system software required in the container is different
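Expanded, the host-side launch pattern above looks like this hedged sketch; the image name, rank count, and application path are illustrative.

```bash
# The host-side launch pattern from the slide, expanded; the image name,
# rank count, and application path are illustrative.
# --nv injects the host's NVIDIA driver libraries into the container.
mpirun -np 8 singularity exec --nv mpiapp.simg /opt/app/mpiapp

# This only works when the MPI inside the image is ABI-compatible with
# the host-side mpirun (e.g., the same Open MPI major version).
```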
20. What we expect from an MPI-implementation-independent container
• An MPI equivalent of the --nv option?
• Auto-introduces the MPI-related system software
• Requires MPI ABI compatibility at some level
• MPI ABI compatibility initiatives
• libfabric
• Recompile MPI apps with a specific MPI when the image is built for a specific system
• Fat container images that choose the MPI lib dynamically?
[Diagram: a container holding the app together with both an MPI build for InfiniBand and an MPI build for OmniPath]
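One pragmatic bridge, pending real ABI portability (our assumption, not a recommendation from the talk): bind-mount the host's MPI and interconnect libraries into the image so a single image can run over either fabric. All paths below are illustrative.

```bash
# Bind-mounting the host MPI stack into a container (all paths are
# illustrative). libpsm2 is the OmniPath userland library.
singularity exec --nv \
  -B /usr/lib64/libpsm2.so.2 \
  -B /opt/openmpi:/opt/openmpi \
  mpiapp.simg \
  env LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH /opt/app/mpiapp
```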
21. Wrap up
• We tried to introduce Docker to TSUBAME3.0 in order to implement resource separation and flexible userland updates, not targeting containers as the goal, but just as tools
• However, users' expectations of containers were different from what we had thought
• Obtaining the full goal at once with Docker was too adventurous and took a very long time to bring into service, but it is now working well
• It is sometimes important to change one's mind during system operation; opinions from users are important
• For system administrators, security documentation is very important
• To run massively parallel applications everywhere using containers, several problems remain to be solved
• I believe I did (and am doing) something stupid, due to historical reasons or simply not knowing the appropriate technology
• Your input is always welcome
22. Acknowledgements
• TSUBAME3 operation working group members
• ~15 faculty and other staff members
• HPE and UNIVA engineers, who finally realized the container-based TSUBAME3.0 system with lots of effort
• We expect to upgrade to SLES15 in 2020.03
• Many container vendors, for formal and informal discussions
• And the users, especially those who requested bleeding-edge software
• TSUBAME Computing Services: https://www.t3.gsic.titech.ac.jp/en/