elasticHPC supports the creation and management of cloud computing resources across multiple public cloud providers, including Amazon, Azure, Google, and clouds supporting OpenStack.
The Case for Docker in Multi-Cloud Enabled Bioinformatics Applications, by Ahmed Abdullah
We have introduced elasticHPC-Docker, based on container technology. Our package enables the creation of a computer cluster with containerized applications and workflows, in private and in different commercial clouds, using a single interface. It also includes options to manage the cluster, to deploy and run bioinformatics applications on large datasets, and to interface with image registries.
Java is finally elastic! OpenJDK improvements and new garbage collection features have enhanced Java's vertical scaling and resource consumption. The JVM can now promptly return unused memory, so its footprint can grow and shrink automatically. In this presentation, we cover the main achievements in vertical scaling, and share the peculiarities and tuning details of different GCs. Find out how to make your Java environments more elastic, so they follow the load and lower the total cost of ownership at scale.
This document discusses cloud-native deployment and Kubernetes. It describes how containers isolate applications and enable portable, consistent deployment across environments. Kubernetes provides a platform for automating deployment, scaling, and management of containerized applications. It schedules containers on hosts and provides services for load balancing and discovery. The document outlines how Kubernetes uses immutable deployments, secrets, and configuration maps to deploy applications in a cloud-native way without breaking production systems during upgrades.
In this video from the 2017 HPC Advisory Council Stanford Conference, Christian Kniep from Gaikai presents: Best Practices: State of Linux Containers.
"Linux containers are gaining more and more momentum across IT ecosystems. This talk provides an overview of what happened in the container landscape (in particular Docker) over the course of the last year and how it impacts datacenter operations, HPC, and high-performance big data. Furthermore, Christian will give an update on, and extend, the 'things to explore' list he presented at the last Lugano workshop, applying what he learned and came across during 2016."
Watch the video: http://wp.me/p3RLHQ-glP
Learn more: http://qnib.org
and
http://www.hpcadvisorycouncil.com/events/2017/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Containerizing GPU Applications with Docker for Scaling to the Cloud, by Subbu Rama
This document discusses containerizing GPU applications with Docker to enable scaling to the cloud. It describes how containers can solve problems of hardware and software portability by allowing applications to run consistently across different infrastructure. The document demonstrates how to build a GPU container using Dockerfiles and deploy it across multiple clouds. It also introduces Boost Containers which combine Bitfusion Boost technology with containers to build virtual GPU machines and clusters, enabling flexible scheduling of GPU workflows without code changes.
The document summarizes upgrades made to the SVG supercomputer in 2012, including:
- Upgrading to Sandy Bridge processors with 192 cores and 1.5TB memory on thin nodes and 512GB memory on fat nodes.
- Installing an InfiniBand FDR 56 Gb/s network with 4 Tb/s aggregate bandwidth and 1 µs MPI latency.
- Configuring queues to take advantage of the Infiniband network and turbo boost, allowing up to 112 cores and 1024GB memory per job.
- Benchmark results showed peak performance of 3788 GFlops on thin nodes and 563 GFlops on fat nodes.
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups the containers that make up an application into logical units for easy management and discovery. Kubernetes masters manage worker nodes, and pods, the basic building blocks, contain one or more containers. It provides self-healing, horizontal pod autoscaling, service discovery, load balancing, and configuration management.
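Several summaries above describe pods, labels, and replicas abstractly; a minimal sketch of how these pieces relate, built here as a plain Python dict in the shape of a Deployment manifest (the names `web` and `nginx:1.25` are illustrative, not taken from any of the decks):

```python
# Build a minimal Kubernetes Deployment manifest as a plain dict.
# Labels tie the Deployment's selector to the pod template it manages.
def make_deployment(name, image, replicas=2):
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

manifest = make_deployment("web", "nginx:1.25", replicas=3)
# The selector must match the pod template's labels, or no pods are managed.
print(manifest["spec"]["selector"]["matchLabels"] ==
      manifest["spec"]["template"]["metadata"]["labels"])  # True
```

Serialized to YAML, the same structure is what `kubectl apply` would consume.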
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS, by Jelastic Multi-Cloud PaaS
In this presentation, you'll find out which metrics should be tracked to meet an application's load requirements, how to fine-tune scaling triggers to handle different load levels efficiently, and how to automate vertical and horizontal scaling of Jakarta EE applications running in the cloud.
We also share how to integrate load-testing tools to adjust horizontal scaling and to make sure your application can cope with production workloads.
The practical side is demonstrated with Jelastic PaaS: https://jelastic.com/
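The scaling-trigger idea from this talk can be sketched generically: sample a load metric over a window and change the node count when it crosses a threshold. The thresholds and the `cpu_samples` input below are illustrative assumptions, not Jelastic's actual API:

```python
# Generic horizontal-scaling trigger: scale out when average CPU over a
# sampling window exceeds the high threshold, scale in when it drops
# below the low one; stay within [min_nodes, max_nodes].
def scaling_decision(cpu_samples, nodes, high=70.0, low=20.0,
                     min_nodes=1, max_nodes=10):
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high and nodes < max_nodes:
        return nodes + 1
    if avg < low and nodes > min_nodes:
        return nodes - 1
    return nodes

print(scaling_decision([85, 90, 80], nodes=2))  # high load -> 3
print(scaling_decision([10, 15, 5], nodes=2))   # low load -> 1
```

Separate high/low thresholds leave a dead band in between, which avoids flapping when the load hovers near a single cutoff.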
A brief introduction to Amazon ECS, Dockerizing a Spring Boot application, CI/CD, and notifications using Slack.
This PPT also explains how a CI/CD pipeline can be built using Jenkins.
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery called Pods. ReplicaSets ensure that a specified number of pod replicas are running at any given time. Key components include Pods, Services for enabling network access to applications, and Deployments to update Pods and manage releases.
This document provides an overview of Kubernetes basics. It introduces Kubernetes as an open source container orchestration tool developed by Google to manage the lifecycle of containers. It describes common Kubernetes concepts like pods, deployments, services, and how to install Kubernetes on local, on-premise and cloud environments. It also covers important topics for production use such as health checks, resource restrictions, logging, monitoring and alerts.
Kubernetes is an open-source platform for automating deployment, scaling, and management of containerized applications. It groups containerized applications into logical units called pods and uses labels to select pods and services for management at scale. Kubernetes masters manage the state of the cluster through the API server, scheduler and controller manager, while nodes run the pods and services and report back to the master.
Overview of Kubernetes and its use as a DevOps cluster management framework.
Problems with deployment via kube-up.sh, and improving Kubernetes on AWS via a custom CloudFormation template.
The keynote discussed how containers can provide robustness and improved utilization of resources. Containers isolate applications and enable sets of applications called pods to run together with shared resources. The key challenges discussed were unpredictable interference between containers, low resource utilization, and hard to enforce isolation. Solutions presented were using cgroups for isolation, allowing "slack" resources to be used for lower priority tasks, and moving enforcement directly into the kernel. Kubernetes was introduced as an open source project for orchestrating pods across multiple machines through replication and reconciliation of the actual vs desired state.
Kubernetes for Beginners: An Introductory Guide, by Bytemark
Kubernetes is an open-source tool for managing containerized workloads and services. It allows for deploying, maintaining, and scaling applications across clusters of servers. Kubernetes operates at the container level to automate tasks like deployment, availability, and load balancing. It uses a master-slave architecture with a master node controlling multiple worker nodes that host application pods, which are groups of containers that share resources. Kubernetes provides benefits like self-healing, high availability, simplified maintenance, and automatic scaling of containerized applications.
Federated Mesos Clusters for Global Data Center Designs, by Krishna-Kumar
The document describes Huawei's approach to federating Mesos clusters across multiple data centers. It proposes a multi-master federation approach where each data center runs its own Mesos master that coordinates with other masters. Gossiper modules in each data center gossip with each other to exchange framework and resource information. When one data center reaches a resource threshold, its gossiper will direct work to other data centers based on a simple policy engine. A demo visualization is shown to illustrate work load balancing during normal and failure scenarios.
Momentum around containers has gradually increased over the last two years. Containers virtualize an OS: applications running in each container believe they have full access to their very own copy of that OS. This is analogous to what VMs do when they virtualize at a lower level, the hardware. In the case of containers, it's the OS that does the virtualization and maintains the illusion.
In the recent past, many software companies have quickly adopted container technologies, including Docker, aware of both the threat and the advantage of the approach. Linux vendors, for example, have jumped in, seeing this as an opportunity to grow the Linux market; Microsoft is adding features to support containers, and VMware has worked on integrating Docker support into its virtual machine technology.
Kubernetes is a container orchestration platform that provides a mechanism to manage the resources of containers in the cluster. That mechanism is known as "Requests and Limits".
Requests and limits play a key role not only in resource management but also in application stability, capacity planning, and scheduling (i.e., which node a pod will run on).
In this session we will cover:
- A quick review of Containers, Docker, and Kubernetes.
- Container resource management in Kubernetes.
- Container resource types in Kubernetes.
- 3 different ways to set requests and limits.
- The difference between capacity and allocatable resources.
- Tips and recap.
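One item in the list above, the difference between capacity and allocatable resources, comes down to simple arithmetic: a node's allocatable resources are its capacity minus the kube-reserved and system-reserved amounts and the eviction threshold. A sketch in Python, with made-up quantities for an 8 GiB node:

```python
# Node allocatable = capacity minus reservations and eviction threshold.
# Pods can only be scheduled against the allocatable amount, not raw capacity.
def allocatable(capacity, kube_reserved, system_reserved, eviction_threshold):
    return capacity - kube_reserved - system_reserved - eviction_threshold

# Memory in MiB for an illustrative 8 GiB node.
mem = allocatable(capacity=8192, kube_reserved=512,
                  system_reserved=256, eviction_threshold=100)
print(mem)  # 7324
```

This is why summing pod requests against a node's advertised capacity overestimates what the scheduler will actually place there.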
Slides used for Orchestructure May 2018 workshop.
Labs:
https://github.com/mrbobbytables/k8s-intro-tutorials
Event Information:
https://www.meetup.com/orchestructure/events/250189685/
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It was originally developed by Google based on years of experience running production workloads at scale. Kubernetes groups containers into logical units called pods and handles tasks like scheduling, health checking, scaling and rollbacks. The main components include a master node that manages the cluster and worker nodes that run application containers scheduled by the master.
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery called pods. Its main components include a master node that manages the cluster and worker nodes that run the applications. It uses labels to identify pods and services and selectors to group related pods. Common concepts include deployments for updating apps, services for network access, persistent volumes for storage, and roles/bindings for access control. The deployment process involves the API server, controllers, scheduler and kubelet to reconcile the desired state and place pods on nodes from images while providing discovery and load balancing.
Federated Kubernetes as a Platform for Distributed Scientific Computing, by Bob Killen
A high-level overview of Kubernetes Federation and the challenges encountered when building out a platform for multi-institutional research and distributed scientific computing.
This document provides an overview of Kubernetes, a container orchestration system. It begins with background on Docker containers and orchestration tools prior to Kubernetes. It then covers key Kubernetes concepts including pods, labels, replication controllers, and services. Pods are the basic deployable unit in Kubernetes, while replication controllers ensure a specified number of pods are running. Services provide discovery and load balancing for pods. The document demonstrates how Kubernetes can be used to scale, upgrade, and rollback deployments through replication controllers and services.
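The replication-controller behavior described above, ensuring that a specified number of pods are running, is a reconciliation loop: compare the actual state with the desired state and act on the difference. A toy sketch, with pods modeled as a plain list of names:

```python
# Toy reconciliation: bring the actual set of pods to the desired count.
def reconcile(pods, desired):
    pods = list(pods)
    while len(pods) < desired:   # too few pods -> create one
        pods.append(f"pod-{len(pods)}")
    while len(pods) > desired:   # too many pods -> delete one
        pods.pop()
    return pods

print(reconcile(["pod-0"], desired=3))                    # scales up to 3
print(reconcile(["pod-0", "pod-1", "pod-2"], desired=1))  # scales down to 1
```

Real controllers run this comparison continuously against the API server's desired state, which is what makes the system self-healing: a crashed pod is simply a deviation to be reconciled away.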
This document provides an agenda and instructions for learning Kubernetes in 90 minutes. The agenda includes exercises on running a first web service in Kubernetes, revisiting pods, deployments and services, deploying with YAML files, and installing a microservices application called Guestbook. Key Kubernetes concepts covered include pods, deployments, services, YAML descriptors, and using deployments to scale applications. The document also provides background on containers, Docker, and the Kubernetes architecture.
This document provides an overview of Kubernetes including:
1) Kubernetes is an open-source platform for automating deployment, scaling, and operations of containerized applications. It provides container-centric infrastructure and allows for quickly deploying and scaling applications.
2) The main components of Kubernetes include Pods (groups of containers), Services (abstract access to pods), ReplicationControllers (maintain pod replicas), and a master node running key components like etcd, API server, scheduler, and controller manager.
3) The document demonstrates getting started with Kubernetes by enabling the master on one node and a worker on another node, then deploying and exposing a sample nginx application across the cluster.
Delivering Bioinformatics MapReduce Applications in the Cloud, by Lukas Forer
This document discusses delivering bioinformatics MapReduce applications in the cloud. It introduces Cloudgene, a graphical workflow system for executing MapReduce programs, and CloudMan, a platform for deploying computational tools and data analysis environments in the cloud. The authors propose implementing Cloudgene as a service within the CloudMan platform to provide bioinformaticians with an integrated environment for executing MapReduce workflows and analyses in the cloud without requiring expertise in cluster administration or computer science. This would allow researchers to leverage scalable cloud resources for processing large genomic datasets.
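The summary above mentions MapReduce workflows without showing the model itself. A self-contained word-count sketch of the map, shuffle, and reduce phases in plain Python (a generic illustration of the programming model, not Cloudgene's actual interface):

```python
from collections import defaultdict

# Map phase: emit (key, 1) pairs for every token in the input.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

# Shuffle + reduce phase: group pairs by key and sum the counts.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

reads = ["ACGT ACGT TTGA", "ttga ACGT"]
print(reduce_phase(map_phase(reads)))  # {'acgt': 3, 'ttga': 2}
```

In a real Hadoop job the map and reduce functions run in parallel across cluster nodes, with the framework handling the shuffle; the per-record logic is the same.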
The document discusses two software patterns used in developing Chipster, a bioinformatics application: graceful GUI blocking, which places an opaque layer over the GUI to indicate loading and prevent user interaction; and self-service distributed state management, which distributes application state management to clients to avoid single points of failure in a distributed system. The patterns were found useful for Chipster, which provides bioinformatics analysis tools through a graphical interface and supports distributed computing.
Workshop on Higher Education and Professional Responsibility in CBRN Applied Sciences and Technology across the Sub-Mediterranean Region
3-4 April 2012. Palazzo Zorzi, Venice
Session 3. Inspiring Initiatives and Scientific Cooperation
Contemporary Global Science, Technology, and Innovation (ST&I) Strategies, by Prof. Tafida Ghanem
Contemporary global science, technology, and innovation (ST&I) strategies are concerned with harnessing science and technology for development in the present era. They take the form of national policies, plans, and programs drawn up by the ministries responsible for science and technology in developed and developing countries. They aim to advance research, development, and scientific creativity in all fields at the national and global levels, and to support technology in serving society, solving environmental problems, and achieving sustainable development and long-term growth in all countries of the world.
The document outlines the process and requirements for applying for funding from Saudi Arabia's Strategic Technologies Program Grants. Applicants must submit a proposal following a specific format, including an application form, project details, objectives, methodology, budget, and team CVs. Proposals go through an initial review at the applicants' research institution, then a formal review at KACST. Technical reviews are conducted by external reviewers who evaluate innovation, impact, and feasibility. Funding is provided in installments contingent on semi-annual progress reports. Changes to projects must be approved by the CNPSTI committee. The process aims to fund strategic, innovative research through a rigorous peer-reviewed selection process.
1. The document discusses plans for developing Egypt's e-justice sector and improving access to justice through information technology and communications.
2. The goals are to dramatically improve Egypt's rule of law indicators by making judicial processes more integrated, transparent, and effective through online portals, dashboards, and other digital tools that provide timely access to decisions and court performance metrics.
3. A roadmap is presented that outlines services, systems, and integrations to be developed between 2017-2018, including case management systems, a unified justice portal, digital judicial libraries, and analytics to support transparency and access to justice for the public.
This document summarizes a presentation about using the Barcode of Life Database (BOLD) for data sharing, collaboration, and publication in barcoding projects. It discusses how BOLD allows collaborators from different institutions and countries to work together by giving different levels of access. It also describes how BOLD facilitates sending barcode sequences to GenBank for publication and submitting bibliographies. The presentation encourages making barcoding projects public on BOLD to share data with the research community.
This document provides an introduction to the field of bioinformatics including:
1. Key fundamentals such as the flow of genetic information and challenges of accumulating biological data.
2. Applications such as using tools to help biologists with research through tasks like data analysis, storage and retrieval.
3. Various career paths in bioinformatics which typically require backgrounds in both biology and computer science.
The document provides an introduction to the field of bioinformatics. It discusses how bioinformatics applies computer science to analyze large amounts of biological data from fields like molecular biology, medicine, and biotechnology. It also outlines some of the main topics that will be covered in the course, including biological databases, gene and protein analysis, phylogenetic analysis, and gene prediction.
الثقافة التقنية والمواطنة الالكترونية. محاضرة ضمن البرنامج الثقافي الرمضاني المقام في جامعة الحدود الشمالية لعام 1435هـ.
وتشرح العلاقة بين التقنية والمعرفة التقنية والثقافة التقنية والمواطنة الالكترونية.
تم التطرق لبعض المفاهيم مثل الهوية الرقمية والحكومة الالكترونية.
The document discusses the applications of bioinformatics in drug discovery. It describes how bioinformatics supports computer-aided drug design through computational methods to simulate drug-receptor interactions. It also discusses how virtual high-throughput screening can identify compounds that strongly bind to protein targets. The document outlines the key steps in drug design, including identifying the disease target, studying lead compounds, rational drug design techniques, and testing drugs. It emphasizes that bioinformatics can predict important drug characteristics like absorption and toxicity to save costs during development.
Phil Basford - machine learning at scale with aws sage makerAWSCOMSUM
The document discusses a machine learning endpoint architecture experiment conducted using Amazon SageMaker. Key aspects covered include:
- The reference architecture used Amazon SageMaker endpoints running Docker containers with inference engines like XGBoost and TensorFlow.
- An experiment tested endpoint scaling and performance under load using Artillery. It found endpoints automatically scaled to two instances and each could handle high request volumes, but starting a new instance took 7 minutes.
- Analysis of CloudWatch logs determined that instances handled load evenly and autoscaled as needed when an instance terminated.
We present applications of Azure Services such as Azure IaaS/PaaS and Azure RemoteApp in computational fluid dynamics and sparse linear algebra. We also present Microsoft Machine Learning Studio in prediction of the heating load in the buildings.
Machine learning at scale with aws sage makerPhilipBasford
The document discusses machine learning at scale using serverless architectures on AWS, including a reference architecture using Amazon SageMaker, AWS Lambda, and other services, and details of experiments conducted to test performance, scalability, and operational aspects of deploying machine learning models with a serverless approach. It also covers monitoring metrics, deployment strategies, and using AWS services like X-Ray, CloudWatch, and CodePipeline to enable continuous deployment of machine learning models.
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Mario Ishara Fernando
This document discusses microservices and containers. It provides an overview of microservices architecture compared to monolithic architecture, highlighting that microservices are composed of many small, independent services with separate deployments and databases. It then discusses containers and how Docker is used to package and run applications in isolated containers. Finally, it introduces Kubernetes as a container orchestration system to manage and scale multiple containerized applications across a cluster of machines.
Introductio to Docker and usage in HPC applicationsRichie Varghese
This is a basic introduction to Docker and breif comparison of docker and Virtual machines...
You can refer the base papers
1) An Introduction to Docker and Analysis of its Performance - Babak Bashari Rad, Harrison John Bhatti, Mohammad Ahmadi
2) Using Docker in High Performance Computing
Applications - Minh Thanh Chung, Nguyen Quang-Hung, Manh-Thin Nguyen, Nam Thoai
note: Its recommended that you download the file as ppt from https://drive.google.com/open?id=1UtR7q9nLu-uBh1uHtokSyFvCV34InyvR as some demonstration works in slide show only....
The World of Internet
History of cloud computing
What is Cloud Computing?
Types of Cloud Computing
i. Software as a Service(SaaS)
ii. Platform as aService(PaaS)
iii. Infrastructure as a Service(IaaS)
Characteristics of Cloud Computing
Deployment model of Cloud Computing
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019Alex Thissen
Conference: .NET Fest 2019
Location: Kyiv, Ukraine
Abstract: You must have noticed how Docker and containers is playing a more and more important part in .NET development. Docker support is everywhere, so it should be easy to build solutions based on container technology, right? But, it takes a bit more to architect and create a .NET solution that use Docker at its core. Many questions arise: How do you design a solution architecture that fits well with containers? Would I use .NET or .NET Core? What is a proper way to migrate to such an architecture? What changes in the .NET implementation from pre-Docker solutions with micro-services? Where do container orchestrators fit in and how do I build and deploy my solutions on a Docker container cluster, such as Azure Kubernetes Service?
These and many other questions will be answered in this session. You will learn how to design and architect your .NET solutions and get a flying start to create, build and run Docker-based containerized applications.
This document discusses task-based programming models for distributed computing. It defines tasks as distinct units of code that can be executed remotely. Task computing provides distribution by harnessing multiple computing nodes, unlike multithreaded computing within a single machine. The document categorizes task computing into high-performance, high-throughput, and many-task computing. It also describes popular task computing frameworks like Aneka, Condor, Globus Toolkit, and describes developing applications using the Aneka task programming model.
This document provides an overview of cloud computing and distributed systems. It discusses large scale distributed systems, cloud computing paradigms and models, MapReduce and Hadoop. MapReduce is introduced as a programming model for distributed computing problems that handles parallelization, load balancing and fault tolerance. Hadoop is presented as an open source implementation of MapReduce and its core components are HDFS for storage and the MapReduce framework. Example use cases and running a word count job on Hadoop are also outlined.
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...Dr. Thippeswamy S.
This document discusses task-based distributed computing and the Aneka framework. It defines tasks as distinct units of code that can be executed remotely. Aneka uses a task programming model where tasks implement an interface and are wrapped in AnekaTask objects. Developers create application classes to control task submission and monitoring. Aneka supports various task types including embarrassingly parallel, parameter sweep, and workflows. It integrates with cloud infrastructures and provides APIs for developing distributed applications.
This document provides an overview of cloud computing and the Eucalyptus platform. It defines cloud computing as a large-scale distributed computing paradigm that delivers dynamically scalable computing resources as a service over the Internet. It then describes Eucalyptus as an open-source software that implements cloud computing on computer clusters and is compatible with Amazon EC2. The document outlines the Eucalyptus cloud architecture including components like the Cloud Controller, Cluster Controller, Node Controller, Storage Controller, and Walrus storage. It provides examples of deploying data mining applications on Eucalyptus and Amazon EC2 clouds.
Cloud computing is the natural evolution of computing where resources are provided as a service over the internet. There are different deployment models and types of cloud services including infrastructure as a service, platform as a service, and software as a service. Popular cloud frameworks include Google AppEngine, PubNub, and Jclouds which provide development platforms and services for storage, databases, and notifications in the cloud.
This document discusses cloud computing concepts including definitions, architecture, service models, and simulation tools. It summarizes a student project presentation on cloud computing that examines key aspects like scalability, pay-per-use model, and virtualization. It also evaluates cloud simulators CloudSim, GreenCloud and iCanCloud, comparing their features, scenarios and performance graphs. The document proposes a novel load balancing approach and its implementation through a dynamic information system interface.
This is the presentation on clusters computing which includes information from other sources too including my own research and edition. I hope this will help everyone who required to know on this topic.
This document summarizes a presentation on CloudSim, a toolkit for modeling and simulating cloud computing environments. CloudSim allows modeling resources and services in cloud data centers and testing application services. It features discrete event-driven simulation of large cloud environments and supports modeling virtualized resources, data centers, and network connections. CloudSim has advantages for testing policies in a repeatable and controllable environment and tuning systems before real deployment. The presentation outlines CloudSim's architecture, modeling capabilities, simulation steps, and concludes with discussions of conclusions and future work, as well as green cloud computing.
High Performance Computing (HPC) and Engineering Simulations in the CloudThe UberCloud
UberCloud Customer Workshop for engineers and scientist and their software providers, discussing cloud challenges and their solution, based on novel UberCloud software container technology which allows access and use of cloud resources and engineering applications and data, on demand, at your fingertips.
info.theubercloud.com/case-studies-and-resources
High Performance Computing (HPC) and Engineering Simulations in the CloudWolfgang Gentzsch
UberCloud Customer Workshop for engineers and scientist and their software providers, discussing cloud challenges and their solution, based on novel UberCloud software container technology which allows access and use of cloud resources and engineering applications and data, on demand, at your fingertips.
Similar to Supporting bioinformatics applications with hybrid multi-cloud services (20)
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
4. Cloud Computing
Cloud deployment models
Public Cloud
Private Cloud
Hybrid Cloud
Community Cloud
5. Cloud Computing
Advantages and disadvantages of cloud computing
Advantages
Service automation and self-service models
Easy to deploy
A migration from CapEx to OpEx
Data recovery and backup
Disadvantages
Security issues
Users have no idea where their data resides
Incompatibility with legacy systems
Higher operational cost for long-term usage
6. Cloud Computing
Cloud Computing for Bioinformatics Applications
Some tools have already been developed for bioinformatics applications on the cloud:
Crossbow
Myrna
CloudBurst
CloudBlast
Cloud-RNA
etc.
These tools are demonstrated on cloud computing, but their techniques are not generic to other tools, and they support only Amazon Web Services.
7. Cloud Computing
Computer cluster middleware packages over the cloud
Middleware packages that support computer cluster management over the cloud:
StarCluster
Vappio
CloudMan
etc.
These middleware packages do not support running a computer cluster over multiple cloud providers.
9. Our contribution
[Diagram: Non-Federated Cloud Cluster (Clusters 1-3, each on a single provider) vs. Federated Cloud Cluster (one cluster spanning Providers 1 and 2)]
Our contribution is to extend bioinformatics applications to run over multiple clusters on different cloud service providers, supporting two types of compute cluster:
Non-Federated Cloud Cluster
Federated Cloud Cluster
10. Our contribution
ElasticHPC supports creation and management of computer clusters for bioinformatics solutions on:
– Amazon Web Services
– Microsoft Windows Azure
– Google Compute Engine
– OpenStack-based clouds
12. Use case scenarios
A simplified version of the variant analysis workflow based on NGS technology serves as the example for our use case scenarios.
The variant analysis workflow: the tools BWA, Picard, and GATK are typically used for the three steps of the workflow. The arrows are labeled with the file formats of the processed data.
13. Multiple clusters over multiple clouds
Multiple independent clusters over multiple clouds, where each cluster processes part of the input data.
14. Multiple clusters over multiple clouds
Using this scenario depends on:
Whether there is a time constraint or not
Reducing the cost within a specific time (Spot instances)
[Diagram: Input Files 1-3 dispatched to Clusters 1-3 across Cloud 1 and Cloud 2; output files stored on object storage or S3]
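The data-parallel scenario above splits one input dataset across independent clusters. A minimal sketch of that splitting step, written for illustration (this is not elasticHPC's actual code; real NGS inputs such as FASTQ must additionally be split at record boundaries):

```python
def split_into_blocks(total_size, n_clusters):
    """Divide total_size bytes into n_clusters roughly equal
    (offset, length) ranges, one range per cluster."""
    base, extra = divmod(total_size, n_clusters)
    blocks, offset = [], 0
    for i in range(n_clusters):
        # The first `extra` clusters absorb the remainder bytes.
        length = base + (1 if i < extra else 0)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 9 GB input (as in the experiments) split over 3 clusters:
blocks = split_into_blocks(9 * 1024**3, 3)
```

Each cluster would then fetch only its own byte range from object storage and run the workflow on it independently.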
15. Multiple clusters over multiple clouds
Each cluster is created in one cloud and solves one step of the workflow.
16. Multiple clusters over multiple clouds
This scenario suits cases of technical limitations: some technical specification prevents a step from running in one cloud, while the other steps can run in a cheaper cloud.
[Diagram: Read Mapping, Mark Duplicates, and Variant Calling steps assigned to Clusters 1-3 on Clouds 1-3; output files stored on object storage or S3]
17. One cluster of federated cloud machines
One cluster composed of machines from different clouds, with one master job queue that dispatches the jobs among the nodes in the different clouds.
18. One cluster of federated cloud machines
The master job queue dispatches jobs among the nodes in the different clouds; it works on the job level rather than on the whole (sub-)workflow level.
[Diagram: master node with a persistent process and a communication layer spanning Cloud 1 and Cloud 2]
19. One cluster of federated cloud machines
• Using this scenario depends on:
• Whether the processing time differs from one job to another
• The characteristics of the processed data
• The Internet connection among the cloud sites
• Good management of input data according to its characteristics
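The federated-cluster idea above, a single master job queue feeding nodes in different clouds, can be sketched with the Python standard library. This is only an illustrative model (not the elasticHPC implementation): worker threads stand in for cloud nodes, and the node names and job payloads are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()      # the master's job queue
results = []
lock = threading.Lock()

def node_worker(node_name):
    """A stand-in for a compute node: pull jobs until a None sentinel."""
    while True:
        job = jobs.get()
        if job is None:          # shutdown signal for this node
            jobs.task_done()
            return
        with lock:
            results.append((node_name, job))  # "run" the job
        jobs.task_done()

# Two nodes, imagined as living in two different clouds.
nodes = [threading.Thread(target=node_worker, args=(f"cloud{i}-node",))
         for i in (1, 2)]
for t in nodes:
    t.start()

# The master dispatches at the job level, not the workflow level.
for job in ["map_reads", "mark_duplicates", "call_variants"]:
    jobs.put(job)
for _ in nodes:
    jobs.put(None)               # one sentinel per node
jobs.join()
for t in nodes:
    t.join()
```

Whichever node is free takes the next job, which is exactly why this scenario is sensitive to heterogeneous job durations and inter-cloud bandwidth, as the bullet list notes.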
23. Implementation of multi-cloud elasticHPC
The three major commercial providers: Amazon, Azure, and Google
Amazon Web Services (AWS)
Execution Model:
• Highest-CPU virtual machine of type c3.8xlarge (32 cores, 108 GB RAM, $1.68/hr)
Storage Model:
• EBS "Elastic Block Store": hard disks and block devices
• S3 "Simple Storage Service": object storage
Pricing models:
• Pay as you go
• Reserved instances
• Spot instances
24. Implementation of multi-cloud elasticHPC
Microsoft Windows Azure
Execution Model:
• Highest-CPU virtual machine of type A9 (16 cores, 112 GB RAM, $4.47/hr)
Storage Models:
• Page Blobs: hard disks and block devices usable as a file system, with a maximum size of 1 TB
• Block Blobs: maximum size of 200 GB
Pricing models:
• Pay as you go ("pay per minute")
25. Implementation of multi-cloud elasticHPC
Google Compute Engine / Google Cloud
Execution Model:
• Highest-CPU virtual machine of type n1-highmem-16 (16 cores, 104 GB RAM, $1.18/hr)
• Google also provides hard disks, snapshots, and images within the execution model
Storage Models:
• Object storage
Pricing models:
• Pay as you go ("pay per minute")
• Sustained use discounts
30. Implementation of multi-cloud elasticHPC
Cluster Manager
Handles all functions related to the creation and management of clusters at a given cloud site, including security settings and storage devices.
31. Implementation of multi-cloud elasticHPC
Job and Data Manager
Handles job submission and data transfer management between the cluster's nodes and the different storage types (block/object storage).
33. Experiments
Variant Analysis Workflow
Input: an exome dataset of size ≈ 9 GB
Tools: BWA for read mapping, Picard for marking duplicates, and GATK for variant calling
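The three workflow steps above can be sketched as a command pipeline. The commands below are illustrative only, assembled but never executed here; the exact flags, file names, and required extra arguments depend on the BWA, Picard, and GATK versions actually used.

```python
def build_workflow(ref="ref.fa", reads="sample.fastq"):
    """Assemble (but do not run) the three-step variant-calling
    workflow from the slides. File names are hypothetical."""
    mapped, dedup, vcf = "mapped.bam", "dedup.bam", "variants.vcf"
    return [
        # Step 1: read mapping with BWA.
        ["bwa", "mem", ref, reads],
        # Step 2: mark duplicates with Picard.
        ["picard", "MarkDuplicates", f"I={mapped}", f"O={dedup}"],
        # Step 3: variant calling with GATK.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", dedup, "-O", vcf],
    ]

steps = build_workflow()
```

In the multi-cloud scenarios, each element of this list could be handed to a different cluster, with the intermediate BAM files passing through object storage between steps.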
34. Experiments
Experiment 1
The workflow was executed three times independently on:
Google: n1-highmem-8 (8 cores, 52 GB RAM, $0.452/hour)
AWS: m3.2xlarge (8 cores, 30 GB RAM, $0.56/hour)
Azure: Standard A7 (8 cores, 56 GB RAM, $1.00/hour)
The 9 GB input is divided into blocks to be processed in parallel over the cluster nodes.
35. Experiments
Experiment 1
Google and Amazon show comparable performance; Azure has the worst performance.
Running times are in minutes. "MarkD" stands for the mark-duplicates step. The numbers between brackets are the cost in USD.
36. Experiments
Experiment 1
Note that the mark-duplicates step shows no performance improvement when adding more nodes (increasing computing power), because Picard requires all reads as a single input set.
37. Experiments
Experiment 2
The same input dataset is used, but with a stronger machine for the mark-duplicates step on Amazon: c3.8xlarge (32 cores, 108 GB RAM, $1.68/hr).
[Diagram: read mapping and variant calling run on a Google cluster of n1-highmem-8 machines (8 cores, 52 GB RAM, $0.452/hr); the mapped BAM file is transferred to the Amazon c3.8xlarge for the mark-duplicates step; the VCF output file is uploaded to object storage (S3/Google objects)]
38. Experiments
Experiment 2
Google always yields a better cost when parallelization leads to fractions of an hour, since it charges per minute. The best cost with comparable performance for this three-step workflow is therefore achieved with a hybrid cloud of Amazon and Google.
Running times are in minutes, using a single provider and the multi-cloud scenario of two providers. The numbers between brackets are the cost in USD.
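The cost effect described above comes down to billing granularity. A simplified worked example (ignoring discounts and data-transfer fees) using the c3.8xlarge rate from the slides:

```python
import math

def cost_per_hour(minutes, hourly_rate):
    """AWS-style billing in this period: every started hour
    is charged in full."""
    return math.ceil(minutes / 60) * hourly_rate

def cost_per_minute(minutes, hourly_rate):
    """Google/Azure-style billing: charge proportionally
    to the minutes actually used."""
    return minutes / 60 * hourly_rate

# A 75-minute run on a $1.68/hr machine:
aws_like = cost_per_hour(75, 1.68)       # billed as 2 full hours
google_like = cost_per_minute(75, 1.68)  # billed as 1.25 hours
```

When parallelization shrinks each step to a fraction of an hour, the per-hour model keeps charging whole hours, which is why the hybrid Amazon-plus-Google placement wins on cost for this workflow.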
39. Conclusion
We introduce elasticHPC, which creates and manages computer clusters over multiple cloud platforms for bioinformatics applications.
Google and Azure offer a charge-per-minute pricing model.
Amazon charges per hour.
elasticHPC enables the data analyst to use the cloud with the best offer at the time of analysis.
elasticHPC opens the way for the development of more advanced layers for task scheduling and cost-time optimization.
As future work, we will explore different ideas for using shared storage from multiple clouds as a shared file system.
41. Availability and requirements
• Project name: elasticHPC.
• Project home page: http://www.elastichpc.org.
• Operating system(s): Linux.
• Programming languages: Python, C, JavaScript, HTML, shell script.
• Other requirements: compatible with the browsers Firefox, Chrome, Safari, and Opera. See the manual for more details.
• License: free for academics. An authorization license is needed for commercial usage (please contact the corresponding author for more details).
• Any restrictions to use by non-academics: no restrictions.
42. Configurations file
Configuration file sample: Google-specific configuration section
################## BASIC SETTING FOR CLOUD PLATFORMS ##############
[GCE]
# GOOGLE COMPUTE ENGINE CONFIGURATION
PROJECT_ID =
ZONE = us-central1-a
CLIENT_SECRET = config/client_secret.json
COMPUTE_SCOPE = https://www.googleapis.com/auth/compute
OAUTH_STORAGE = oauth2.dat
IMAGE_PROJECT =
SERVICE_EMAIL = default
NETWORK = default
SCOPES = https://www.googleapis.com/auth/devstorage.full_control
API_VERSION = v1
CLUSTER_CLIENT_KEY = keys/key
ROOT_DISK=disks
43. Configurations file
Configuration file sample: Azure-specific configuration section
################## BASIC SETTING FOR CLOUD PLATFORMS ##############
[AZURE]
# MICROSOFT WINDOWS AZURE CONFIGURATIONS
SUBSCRIPTION_ID =
THUMBPRINT =
STORAGE_ACCOUNT =
STORAGE_KEY =
CERTIFICATE_PATH = mycert.pem
PKFILE = mycert.cer
CERT_DATA_PATH = mycert.pfx
CERT_PASSWORD =
REGION = WUS
CONTAINER=newcontainer
44. Configurations file
Configuration file sample: Amazon-specific configuration section
########## BASIC SETTING FOR CLOUD PLATFORMS ########
[AWS]
# AMAZON WEB SERVICES CONFIGURATIONS
pkey= pk.pem
cert= cert.pem
accessKey=
secretKey=
keyPair= instance-key
securityGroup =
keyPairPath= instance-key.pem
INSTANCE_TYPE = m3.medium
MASTER_TYPE = m3.medium
REGION = USW1
ZONE = us-west-1c
45. Configurations file
The Clusters section defines multiple clusters; each cluster has multiple machine sets, and every machine set represents a sub-cluster on a different cloud service provider.
###### DEFINE CLUSTERS #######
[CLUSTERS]
CLUSTERS_LIST= CLUSTER1, CLUSTER2

[CLUSTER1]
### CLUSTER1 is a hybrid cluster over multiple clouds
# CLUSTER CONFIGURATION
CLUSTER_NAME= cluster1
CLUSTER_PREFIX = cluster1
MachineSets=MachineSet2,MachineSet3,MachineSet1
MASTER_NODE_LOCATION= MachineSet2
NFS = True
# NFS CONFIGURATION
NFS_MOUNTING_POINT=/home
NFS_DEVICE=/dev/xvdf
NFS_FSID=0
NFS_EBS_Mode=NEW_VOLUME
# attach a new volume
NFS_NEW_VOLUME_SIZE=10
# in case of attaching an existing volume
GLUSTER=False
GLUSTER_MOUNT_POINT = /gluster/WGA/
GLUSTER_VOLUME_NAME = gv0
GLUSTER_STRIPE = 1
GLUSTER_REPLICATE = 1
GLUSTER_FORMAT_DISK = False

[MachineSet1]
NODES = 2
PROVIDER = GCE
# IMAGE CONFIGURATION
IMAGE_ID = tavaxy2
……
FIREWALL=ehpc,http2,apache2
FW_PORTS=5000,8080,80
FW_PROTOCOLS=tcp,tcp,tcp

[MachineSet2]
NODES = 0
PROVIDER = AWS
IMAGE_ID = ami-077d9a43
……
FW_PORTS=5000,8080,80
FW_PROTOCOLS=tcp,tcp,tcp

[MachineSet3]
NODES=0
PROVIDER = AZURE
IMAGE_ID = ehpc-generic26
OS_URL =
……
FW_PORTS=5000,8080,80
FW_PROTOCOLS=tcp,tcp,tcp
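Since the configuration follows standard INI syntax, a cluster definition like the one above can be read with Python's standard-library configparser. This is only a sketch of how such a file could be consumed (elasticHPC's actual parsing code may differ); a trimmed copy of the sample is inlined here.

```python
import configparser

# Trimmed version of the cluster definition shown on this slide.
sample = """
[CLUSTERS]
CLUSTERS_LIST = CLUSTER1

[CLUSTER1]
CLUSTER_NAME = cluster1
MachineSets = MachineSet2,MachineSet3,MachineSet1
MASTER_NODE_LOCATION = MachineSet2
NFS = True

[MachineSet1]
NODES = 2
PROVIDER = GCE

[MachineSet2]
NODES = 0
PROVIDER = AWS

[MachineSet3]
NODES = 0
PROVIDER = AZURE
"""

config = configparser.ConfigParser()
config.read_string(sample)

cluster = config["CLUSTER1"]
# Each machine set names a sub-cluster on one provider.
machine_sets = [s.strip() for s in cluster["MachineSets"].split(",")]
providers = [config[ms]["PROVIDER"] for ms in machine_sets]
```

Note that configparser treats option names case-insensitively, so lookups like `cluster["MachineSets"]` match the mixed-case keys in the file, while section names such as `[MachineSet1]` must match exactly.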