When HPC Meet ML/DL
Machine learning and deep learning (ML/DL) are becoming important workloads for high-performance computing (HPC) as new algorithms are developed to solve business problems across many domains. Container technologies such as Docker can help meet the portability and scalability needs of ML/DL workloads on HPC systems. Kubernetes, an open-source system for automating the deployment, scaling, and management of containerized applications, can help run MPI jobs and ML/DL pipelines on HPC systems, though it currently lacks some features important for HPC, such as advanced job scheduling. Running an HPC-specific job scheduler like IBM Spectrum LSF on top of Kubernetes is one approach to address current gaps in
When HPC meet ML/DL: Manage HPC Data Center with Kubernetes
1. When HPC Meet ML/DL: Manage HPC Data Center with Kubernetes
Yong Feng (yongfeng@ca.ibm.com)
2. IBM Systems
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
3. IBM Systems
Who am I?
Senior Architect of IBM Spectrum (formerly Platform Computing)
• Work on resource managers and workload schedulers for 12+ years after Ph.D.
• Lead team on open source development, from OpenStack, YARN, Mesos, and Kubernetes to Spark, etc.
• Lead team on core platform development of IBM Cloud Private
4. IBM Systems
Agenda
• What does ML/DL mean for HPC?
• What does Container/Docker mean for HPC?
• Kubernetes Basics
• Run MPI job on Kubernetes
• Run ML/DL Pipeline on Kubernetes
• Gaps of Kubernetes for HPC DataCenter
• What about Now?
6. IBM Systems
ML/DL is HPC’s 1st Consumer Killer App?
• New business challenges, especially Big Data, bring new topics: HPDA, AI, and IoT.
• Algorithm scientists have to keep optimizing their code with new technology.
• ML/DL solves business problems across many domains.
• New hardware technology makes ML/DL possible.
7. IBM Systems
HPC Solution Workflow
[Diagram: Simulation, Visualization, Analytics, and Machine Learning applications on shared compute resources and network, with data exchange between them, a Scheduler, and Remote Users.]
• Scheduler controls job start and placement
• Applications exchange data as needed
  • Producers
  • Consumers
  • Both
• Remote users receive/provide feedback
8. IBM Systems
Infrastructure and Software Challenge
• HPC common requirements
  • Hardware: high-IOPS storage, low-latency networks, powerful CPUs, large memory, etc.
  • Software: parallel accelerators, job scheduler
• GPU becomes critical
• Various frameworks, more than just jobs, such as in-memory databases, long-running services, etc.
• MPI is still important
• Development pipeline
• Container does matter
10. IBM Systems
Values
• Portability to resolve the complexity
• Scalability to fit the nature of distributed/parallel computing
• Developer friendly, with a pipeline of develop, build, distribute, and deploy
• Improve resource utilization
• Less overhead
• Network and resource isolation
• Supported by existing HPC job schedulers
11. IBM Systems
Challenge
• Old Linux kernel
• Support for infrastructure devices/software: IB, parallel FS, GPU, FPGA, etc.
• Security
• Limits HPC-specific optimization
• Image control
• Troubleshooting
From: https://www.hpcwire.com/2017/05/04/singularity-hpc-container-technology-moves-lab/
From: http://www.hpctoday.com/viewpoints/containers-meet-hpc/
13. IBM Systems
Kubernetes Features
• Intelligent scheduling
• Self-healing
• Horizontal scaling
• Service discovery & load balancing
• Automated rollouts & rollbacks
• Management of secrets & configuration
• Storage orchestration
• Batch execution
14. IBM Systems
Kubernetes Concepts
• Pod: a group of co-located containers.
• Service: defines a set of pods and a means by which to access them, such as a single stable IP address and corresponding DNS name.
• Volume: a directory, possibly with some data in it, which is accessible to a container as part of its filesystem.
• Label: a key/value pair that is attached to a resource, such as a pod, to convey a user-defined identifying attribute.
• ReplicaSet: ensures that a specified number of pod replicas are running at any one time.
• StatefulSet: a controller that provides a unique identity to its pods. It provides guarantees about the ordering of deployment and scaling.
• Job (batch job): creates one or more pods and ensures that a specified number of them successfully terminate.
• Secret: an object that contains a small amount of sensitive data. Such information might otherwise be put in a pod specification or in an image.
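To make the Job concept concrete, here is a minimal sketch in Python that builds a `batch/v1` Job manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. The helper name `make_job_manifest`, the job name, and the image are hypothetical placeholders and do not come from the deck.

```python
import json


def make_job_manifest(name, image, command, completions=1, parallelism=1):
    """Build a Kubernetes batch/v1 Job manifest as a plain dict.

    `completions` is the number of pods that must terminate successfully;
    `parallelism` is how many pods may run at the same time.
    """
    labels = {"app": name}  # a label is just a key/value pair on the resource
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "completions": completions,
            "parallelism": parallelism,
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                },
            },
        },
    }


# Hypothetical example values, for illustration only.
manifest = make_job_manifest(
    "hello-job", "example/hello:latest", ["echo", "hello"], completions=3, parallelism=3
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would create the Job, assuming the image exists in a reachable registry.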
18. IBM Systems
Run MPI in Kubernetes
• Docker image of the MPI runtime environment
• Kubernetes batch Job to manage the MPI job lifecycle
• Kubernetes Secret for password-less SSH access among workers
• Bootstrap to integrate with MPI Process Lifecycle Management (PLM)
• Kubernetes platform to work with other services and resources
• Kubernetes platform as a general data center platform
[Diagram: three Job pods, each with a bootstrap process; one pod runs mpirun and the other two run sshd; kube-api is also shown.]
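The deck does not show what the bootstrap step actually does. As a rough sketch only, the launcher pod could turn the list of worker pods into an Open MPI hostfile and an `mpirun` command line once the workers' `sshd` processes are reachable. The function names, the worker addresses, and the hostfile path below are hypothetical; the `host slots=N` hostfile format and the `--hostfile` and `-np` options follow Open MPI conventions.

```python
def build_hostfile(workers, slots_per_worker):
    """Render an Open MPI style hostfile: one 'host slots=N' line per worker."""
    return "".join(f"{host} slots={slots_per_worker}\n" for host in workers)


def build_mpirun_command(hostfile_path, workers, slots_per_worker, program):
    """Build the mpirun argument list that the launcher pod would execute."""
    total_ranks = len(workers) * slots_per_worker
    return ["mpirun", "--hostfile", hostfile_path, "-np", str(total_ranks)] + list(program)


# Hypothetical worker pod addresses; in practice the bootstrap would have to
# discover them, for example by querying the Kubernetes API.
workers = ["10.0.0.11", "10.0.0.12"]
print(build_hostfile(workers, slots_per_worker=4), end="")
print(" ".join(build_mpirun_command("/tmp/hostfile", workers, 4, ["./my_mpi_app"])))
```

Password-less SSH between the pods is assumed to be set up already, for instance by mounting the same key pair from a Kubernetes Secret into every pod.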
19. IBM Systems
Run Tensorflow Pipeline In Kubernetes
• Docker image of the TensorFlow runtime environment
• Kubernetes batch Job to manage the TensorFlow training job lifecycle
• Kubernetes Volume to share the data
• Kubernetes Deployment/Service to provide the TensorFlow Serving service
• Kubernetes platform to work with other services and resources
• Kubernetes platform as a general data center platform
[Diagram: a Job with two ps tasks and three worker tasks; Volumes labelled input, log, and model; a dashboard Deployment/Service; two serving instances as a Deployment/Service; a test Job.]
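The slide does not say how the ps and worker tasks find each other. One common TensorFlow convention is the `TF_CONFIG` environment variable, a JSON document that describes the whole cluster plus the role of the current task. The sketch below builds that value for each pod; the helper name `make_tf_config` and the host names are hypothetical, and whether the original pipeline used `TF_CONFIG` is an assumption.

```python
import json


def make_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Return the TF_CONFIG JSON string for one task of a ps/worker cluster."""
    cluster = {"ps": list(ps_hosts), "worker": list(worker_hosts)}
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise IndexError(f"no {task_type} task with index {task_index}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


# Hypothetical in-cluster addresses for two ps tasks and three worker tasks,
# matching the task counts shown in the diagram.
ps_hosts = ["ps-0:2222", "ps-1:2222"]
worker_hosts = ["worker-0:2222", "worker-1:2222", "worker-2:2222"]

# Each pod would receive its own value through its container environment.
print(make_tf_config(ps_hosts, worker_hosts, "worker", 1))
```

Every pod gets the same `cluster` section and a different `task` section, so one container image can serve all roles.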
20. IBM Systems
Extend the Pipeline to Iterative Development
• Kubernetes Deployment/Service for rolling upgrade
• Integrate with CI/CD utilities
[Diagram: the same pipeline as on the previous slide, extended with a new algorithm delivered as a new image.]
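A rolling upgrade of the serving tier can be sketched as a Deployment with a `RollingUpdate` strategy, where a CI/CD step only changes the container image and Kubernetes replaces pods gradually. This is an illustrative sketch: the function names and image tags are hypothetical, and the `apps/v1` API version is an assumption (the deck predates it and may have used an earlier beta API).

```python
import copy


def make_serving_deployment(name, image, replicas=2):
    """Build an apps/v1 Deployment that replaces pods one at a time."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "strategy": {
                "type": "RollingUpdate",
                # Keep full capacity during the upgrade: add one new pod
                # before removing an old one.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def with_new_image(deployment, new_image):
    """Return a copy of the Deployment that points at a new image."""
    updated = copy.deepcopy(deployment)
    updated["spec"]["template"]["spec"]["containers"][0]["image"] = new_image
    return updated


current = make_serving_deployment("serving", "example/serving:v1")
upgraded = with_new_image(current, "example/serving:v2")
print(upgraded["spec"]["template"]["spec"]["containers"][0]["image"])
```

Applying the updated manifest changes the pod template, which is what triggers the rolling replacement of the serving pods.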
22. IBM Systems
Gaps of Kubernetes for HPC
• Lack of features for job scheduling
  • Job group: ps task and worker task
  • Job queue: priority, fair-sharing, pre-emption, etc.
  • MPI: gang-scheduling, PLM integration, placement policy
  • Advance reservation
• Lack of features for container support
  • MPI optimization: optimization based on placement topology, shared IPC, NUMA/CPU binding, job recovery
• Lack of features for security
  • Image control
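Gang-scheduling, one of the gaps listed above, means that all tasks of a job are placed together or not at all, so an MPI job never starts with only part of its ranks. The following toy sketch shows only that all-or-nothing rule; it is not how Kubernetes or any HPC scheduler implements it, and the function name and node names are hypothetical.

```python
def gang_schedule(free_slots, tasks_needed):
    """Place all tasks of a job at once, or place none.

    free_slots maps node name -> number of free slots.
    Returns a dict node name -> tasks placed there, or None if the whole
    gang does not fit (in which case nothing is reserved).
    """
    if sum(free_slots.values()) < tasks_needed:
        return None  # partial placement is not allowed

    placement = {}
    remaining = tasks_needed
    # Fill the emptiest nodes first so the job spans as few nodes as possible.
    for node, free in sorted(free_slots.items(), key=lambda item: -item[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            placement[node] = take
            remaining -= take
    return placement


cluster = {"node-a": 2, "node-b": 3}
print(gang_schedule(cluster, 4))  # the whole job fits
print(gang_schedule(cluster, 6))  # the job does not fit, so nothing is placed
```

A scheduler that places pods one at a time could instead start some ranks and leave the rest pending, holding resources without making progress.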
23. IBM Systems
Planned Project in Community
• Job queue (#36716): introduce a job queue concept and related resource sharing policy
HPDA = Data-Intensive Computing Using HPC
Domains
Manufacturing
Retail
Life science
Travel
Finance
Energy&Utility
Applications are different, and each serves a purpose in computing an overall actionable solution to a problem.
Not all applications need the same data, or any data at all; hence each application is classified as a data producer, a consumer, or both.
Remote users can be located on an intranet or the Internet.
There are a lot of point-to-point data transfers: every application needs to know which applications to send data to and which to receive data from. This is very cumbersome and potentially complicated if an application fails or a new application starts.
Complexity:
Dependencies: tools, compilers, libraries, etc.
Software stack: academic software is difficult to install, configure, and deploy
Heterogeneous platform/architecture: laptop to supercomputer, x86 to Power
http://www.hpctoday.com/viewpoints/containers-meet-hpc/
https://www.nextplatform.com/2016/09/13/will-containers-total-package-hpc/
Security:
Containers launched as root
Access to bare metal, filesystems & device drivers
Infrastructure devices: incompatibility of low-level kernel
Image control: vulnerabilities
Limits HPC-specific optimization: MPI local memory sharing, HDFS/GPFS data locality