5. Common pain points today
Regarding any software deployment on Cloud, HPC Clusters or local
environment
• Every environment is slightly different
• OS, Networking, Libraries, Compilers etc.
7. Docker Basics
• Container runtime on Linux
• “Standard” container image format
• Centralized repository of images:
• https://hub.docker.com/
• (like GitHub, but for container images)
8. CoreOS Basics
• Very minimal Linux distribution
• Containing basically the kernel and a few unix utilities
• Main tools:
• systemd for running services
• etcd for clustering hosts together
• fleet for distribution across cluster
• docker for running applications
9. “Everything CoreOS is building has been inspired by the way
Google runs its data center infrastructure”
- Alex Polvi, CEO CoreOS
Source: http://www.datacenterknowledge.com/archives/2014/07/16/etcd-secret-sauce-googles-kubernetes-pivotals-cloud-foundry/
10. Docker Advantages
• Simple Packaging
• Thousands of ready made images
• Low overhead (compared to VMs)
• Portability (Build once run anywhere)
• Composable
• (One step closer to the Lego block dream)
11. Docker Build:
Dockerfile
FROM ubuntu:14.04
MAINTAINER Tryggvi Larusson <tryggvi@greenqloud.com>
RUN apt-get update
RUN apt-get -y install mysql-server
EXPOSE 3306
CMD ["/usr/bin/mysqld_safe"]
docker build -t mysql .
24. Run anywhere
[Diagram: the same two containers, myapp1 and myapp2, running unchanged on a KVM VM, a VMware VM, a bare-metal machine, and a cloud instance, on top of Debian, CentOS, and CoreOS Linux]
25. Implications for HPC developers/admins
• Freedom of choice
• Have more options in selecting where your workloads actually run - easier to
experiment
• Speed of deployment
• Standard containers make it very easy to automate
• Collaboration
• Easily share your HPC application with others and get feedback
• Build on an HPC app/container somebody already built and improve it
Hi my name is Tryggvi Larusson, and I’m the co-founder and CTO of GreenQloud.
I’m going to talk about the future of the cloud: how emerging technologies like containers are changing the way applications are developed and packaged, and how this is leading toward the convergence of all kinds of workloads, including HPC-style workloads, with cloud computing.
Just to give a little context, I want to briefly explain what we do at GreenQloud.
GreenQloud is a company out of Iceland and our main product is called QStack: which is a cloud infrastructure solution for private, public or hybrid deployments.
Here we see a screenshot of QStack, showing how it can be used to manage computing clusters, both in the form of virtual machines and bare-metal deployments. I’m going to show a short demo later of how that works.
So I want to start by addressing the main topic of this talk, which is how to easily and conveniently package your applications in a standard way,
so that you can easily distribute and deploy them, and so that everybody who knows the package format can run the application in a standard way.
I suspect this is a very common problem among both system administrators as well as application developers.
Like I said, this is a problem many people are very familiar with when deploying applications in public cloud environments, local virtualised environments, and HPC clusters.
The key issue many face is that every environment they deploy their application on is slightly different. A very common problem is that environments have different operating systems, or even different variants of the same operating system, such as the Red Hat and Debian based variants of Linux, which have incompatible packaging systems (RPM and APT). You may have different networking topologies, so that for example you could have multiple network adapters, or private or public IP addresses, and so on. Very commonly you have slightly different versions of libraries, and especially in HPC environments you often have different compilers or runtime systems that your application depends on.
So how do we solve these problems?
One solution I present is Docker, and by extension CoreOS.
What Docker gives you is a very basic common ground to work on: its container-based framework and the basic Linux kernel that everything builds upon.
CoreOS in turn is just a very convenient way to run Docker containers, as it contains only the basic operating system and very little additional overhead.
Of course you can run Docker on any modern Linux distribution that has a recent enough kernel, approximately version 3.8 or later.
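As a quick sanity check on a given machine, you can compare the running kernel version against that rough 3.8 threshold; a small sketch (the threshold is the approximate minimum mentioned above, not an exact requirement):

```shell
#!/bin/sh
# Rough sanity check: is the running kernel new enough for Docker (~3.8 or later)?
release=$(uname -r)                                   # e.g. "3.13.0-24-generic"
major=$(echo "$release" | cut -d. -f1)
minor=$(echo "$release" | cut -d. -f2 | sed 's/[^0-9].*$//')
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }; then
  echo "kernel $release: recent enough for Docker"
else
  echo "kernel $release: probably too old for Docker"
fi
```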
So I don’t know how many of you know the docker platform in detail, but I’m just going to go over some of the fundamentals of it.
For the first part, it is a container runtime, which means it builds upon the same basic features of the Linux kernel that the popular LXC (Linux Containers) project builds on: control groups and namespaces. So in essence it is a bit like old-style Unix jails on steroids.
The second thing Docker introduces is a standard packaging format: a binary format that packs a container into a single image.
Thirdly, Docker has a central repository, a bit like what GitHub is for code, except this one holds ready-made, prebuilt images that can be brought up with one simple docker run command.
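For instance, a sketch of what that looks like in practice (assuming a working Docker installation and network access to the Hub):

```shell
# A sketch: pull a ready-made image from the Docker Hub and run it once.
# Skip gracefully on machines where Docker isn't installed or running.
command -v docker >/dev/null 2>&1 || { echo "docker not installed; skipping"; exit 0; }
docker info >/dev/null 2>&1 || { echo "docker daemon not reachable; skipping"; exit 0; }

docker pull ubuntu:14.04                                 # fetch the prebuilt image
docker run --rm ubuntu:14.04 /bin/echo "hello from a container"
```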
CoreOS is a very exciting project, in one way because it goes back to the basics of what an operating system should be, which is to provide very basic features to the running applications: it provides just the standard Linux kernel plus a few Unix utilities.
CoreOS forgoes many of the assumptions we have come to make about Linux distributions, because it is so minimal. For instance it does not have a package manager, but instead relies on Docker to install applications.
Besides Docker, CoreOS standardizes on systemd for running services and introduces etcd, a simple clustered key-value store with a built-in consensus algorithm, which makes it easy to build clustered applications through a REST protocol or simple command-line tools.
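As a sketch of how simple those two interfaces are (the key name /message is made up for illustration, and this uses the etcd v2 command set and API of the era):

```shell
# A sketch of etcd's two interfaces; skip gracefully where no etcd is running.
command -v etcdctl >/dev/null 2>&1 || { echo "etcdctl not installed; skipping"; exit 0; }

etcdctl set /message "Hello HPC"       # write a key (etcd v2 command set)
etcdctl get /message                   # read it back

# the same operations over plain HTTP, against the v2 REST API
# (4001 was the default client port on early etcd releases; later ones use 2379)
curl -L -X PUT http://127.0.0.1:4001/v2/keys/message -d value="Hello HPC"
curl -L http://127.0.0.1:4001/v2/keys/message
```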
CoreOS builds on etcd to create a sort of clustered systemd, which they call fleet, as well as an overlay network system they call flannel.
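A fleet unit looks just like an ordinary systemd unit; a minimal sketch (the service name and image are made up) that fleet would schedule onto some machine in the cluster:

```shell
# A hypothetical fleet unit, in plain systemd syntax
cat > myapp.service <<'EOF'
[Unit]
Description=Example containerized app
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name myapp myapp:latest
ExecStop=/usr/bin/docker stop myapp
EOF

# hand the unit to the cluster; fleet decides which host actually runs it
command -v fleetctl >/dev/null 2>&1 && fleetctl start myapp.service \
  || echo "fleetctl not available here; unit written to myapp.service"
```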
It’s interesting that a lot of these developments seem to be linked to, or inspired by, the way Google has been doing things internally: for at least the past ten years they have deployed their applications on a vast, globally distributed cloud, on a platform they call Borg or, more recently, Omega.
The people behind CoreOS have made no secret of the fact that the way they are designing their OS is highly inspired by how Google has architected its solutions, and they want to bring this kind of architecture to the masses through open source projects. So things like etcd and Kubernetes are either inspired by Google or directly donated by Google.
CoreOS and Docker are a perfect companion for implementing this kind of architecture, and even though these projects are really young, they show tremendous promise in being able to deliver it in a very simple and elegant manner.
In addition to Docker there are really exciting projects that show great promise, such as Kubernetes, which is only roughly half a year old. Mesos is also really interesting, as it has some of the same goals and aligns really well with a containerized and distributed solution like Docker.
The big cloud IaaS players are racing to support these technologies, but I suspect that this next phase of the cloud will benefit much more the smaller players because it has the possibility of leveling the field when these become commodity technologies.
And of course all of these technologies are open source.
One of the biggest advantages Docker brings is how easily applications can be packaged: just a very simple description file called a Dockerfile and a build command, and you’re set.
Through the Docker Hub it brings thousands of ready-made applications, prepackaged for you.
Docker also brings very low overhead, pretty much bare-metal native performance, which can sometimes make a huge difference compared with more traditional ways of packaging such as virtual machines.
The portability aspect of Docker is also very important, as the standard runtime makes it very easy to move an application from one environment to another; the only things the environments need in common are the Linux kernel and the Docker runtime.
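One concrete (sketched) way to move an image between environments is Docker’s built-in export/import: docker save writes the image to a tar archive that any other Docker host can docker load (hpc-node1 below is a hypothetical destination host):

```shell
# A sketch: export the mysql image built earlier and load it on another host.
# Skip gracefully where Docker or the image isn't available.
command -v docker >/dev/null 2>&1 || { echo "docker not installed; skipping"; exit 0; }
docker inspect mysql >/dev/null 2>&1 || { echo "no local mysql image; skipping"; exit 0; }

docker save mysql > mysql-image.tar              # serialize the image to a tar archive
ssh hpc-node1 'docker load' < mysql-image.tar    # load it on the remote Docker host
```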
Docker is also taking us one step closer to the pipe dream of building applications like Lego blocks out of ready-made components; the standardised runtime makes it much simpler than before to compose an application from prebuilt Docker containers.
So here is a very simple example of a Dockerfile, in this case for a MySQL server. We build upon the stock ubuntu:14.04 image from the Docker Hub, add one package, and define the command that should run.
To build this into an image you simply execute the docker build command illustrated below the Dockerfile itself, and it is as simple as that.
And to run the newly built image there is just a simple docker run command: you only have to decide whether you want to run it in the background, how you want the ports to be mapped, and which image to run.
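For the MySQL image built above, that run command might look like this (the container name mydb is an arbitrary choice; -d backgrounds the container and -p maps the exposed port to the host):

```shell
# A sketch of running the image built above; skip gracefully where Docker
# or the image isn't available.
command -v docker >/dev/null 2>&1 || { echo "docker not installed; skipping"; exit 0; }
docker inspect mysql >/dev/null 2>&1 || { echo "no local mysql image; skipping"; exit 0; }

# -d runs in the background, -p 3306:3306 maps the EXPOSEd MySQL port,
# --name gives the container a handle for later docker stop/rm commands
docker run -d -p 3306:3306 --name mydb mysql
```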
Now if we step a bit back where is this all leading us.
Most of the technology powering big data and high performance computing is still very young, hailing from an open source world, and largely developed (and utilized) in research environments - not unlike the beginnings of the UNIX/Linux history.
Fairly inexpensive IT resources, such as those offered in public clouds, made development cheap, but because the ultimate goal was to work with strictly regulated data (such as in the healthcare sector), the technology needed to be independent of the particularities of both public and private clouds.
In short: big data and HPC are looking for solutions they can design and develop once, and deploy anywhere.
I trust most of you here already have a good grasp on what cloud computing is, and how it transformed our way of thinking about and using IT resources.
Instead, I’d like to give an overview about how the evolving requirements of big data and high performance computing changed the direction of cloud computing.
If we look at the state of the cloud today we see that it is highly concentrated and centralised on a few key players. Here I’ve drawn a picture of the geographical locations of the biggest player on the market, which you might guess is AWS.
You can see that the locations are quite few, so close to half of the cloud’s internet traffic goes to these few locations, almost all of which are concentrated in Western countries.
This is the remnant of a historical progression in IT development that is now turning back toward decentralization.
During the last decade we’ve been seeing strong trends toward a centralization of applications, but recently we are heading back toward decentralization.
With this I’m really talking about the cutting edge of cloud deployments, that largely will be used by early adopters like those specializing in big data, but others will follow in due time.
If we look at the development of the cloud then really the precursor to the cloud was in-house virtualization, these are the kind of VMware style on premise datacenter solutions. Some people like to call this “cloud” still today - I don’t know why.
After the phase of cloud adoption in which the deployment options were public and private clouds, the current step we find ourselves taking is the hybrid cloud, where you start to decentralize your applications again.
The next phase is the development of a distributed cloud architecture that I like to call a cloud federation, where the distinction between private and public clouds gets increasingly fuzzy and workloads can move dynamically between several actual cloud platforms.
The cloud is heading down a radical new path toward decentralization, so that all cloud applications will increasingly become distributed systems.
This kind of approach is not new: think of SETI@home, for which you could download an application to turn your own “infrastructure” (your desktop computer) into a data processing node for the project.
Of course technology has come a long way since then.
We’re at the beginnings of establishing a foundation for a distribution of the cloud ecosystem.
The main driver behind these changes is the needs of big data and high-performance computing, which are in such demand that they have been able to disrupt the market and compel cloud vendors, such as GreenQloud, to change direction and re-evaluate their approach.
So standard packaging and containers are enabling smaller scale cloud deployments to be viable and some existing clusters could be modified to mix workloads between on premise and public cloud services.
All of this enables the cloud infrastructure to be a truly distributed and global phenomenon.
So I’ve already mentioned Docker and CoreOS, but some of the other interesting technologies I could mention are Mesos and Kubernetes, which actually also have links to Docker. I’ve included Hadoop here as well, as it is the de-facto standard in the big data space.
What will happen is that we will see a vast global interconnected cloud of clouds, and seamlessly deploying our applications, usually in the form of containers like Docker containers on this global grid.
Easy access to these standard commodity software components will make it just as easy to set up a cloud infrastructure as a regular white box linux machine.
Now I want to demo how easy it is to set up a Hadoop HDFS cluster, basically in one click, deployed through Docker.
Secondly I want to demo a cluster of Docker servers working together to seamlessly deploy applications anywhere in the cluster; this is a very new addition to the Docker platform called Swarm.
One of the big advantages of containers, and of Docker as a consequence, is the low overhead. This is especially notable when you compare containers against more traditional technologies like virtual machines.
Because containers don’t carry the overhead and burden of a hypervisor, and sit inside the operating system like any other process, they perform pretty much on par with anything that runs directly on bare metal.
You also have less memory overhead, because you can share memory with the host operating system; of course this comes at the cost of somewhat less isolation between containers, which can have security implications.
HPC developers and businesses will be able to create containerized, packaged applications that can run on any scale - maximizing the efficiency of available resources.
Some studies have been made comparing Docker with virtualisation solutions like KVM.
Here we see some numbers from an IBM study. You can note that on some benchmarks, like Linpack, there is considerable overhead from KVM virtualisation; in this case it is caused by bottlenecks in how NUMA works within a virtualized environment, something that has no effect in a container or Docker environment.
On some other benchmarks, like the STREAM memory-throughput benchmark, there is considerably less difference between KVM, Docker, and bare metal, so how much overhead virtualization costs may depend on the workload.
In addition, because of the commodity, industry-standard technologies that define this new ecosystem, our HPC applications crunching big data will be comfortable running in both private and public clouds.
Basically the whole of current cloud development has been built around big data and HPC projects, to process the incredible amount of unprocessed data amassed: more of it, in fact, in the 11 years since 2003 than from the dawn of time until 2003.
In the future this will be even more pervasive, especially as funding for these projects and their enabling support technologies increases, thanks to the ever-growing interest in the Internet of Things.
Because of what these developments require from vendors like GreenQloud, we’re tremendously excited about the future. Open source standards mean a much more levelled playing field and open market, and we are happy to have noticed these trends in time to position our product properly.
Here is an additional list of the most important components in the QStack software product. This is actually just a short list as we have a lot of smaller components or frameworks that we use that aren’t listed here.
For instance we have the hypervisor layer, which in our case is by default KVM, which is in turn part of the Linux kernel.
We also make quite extensive use of Chef as a DevOps-style tool to automate deployments and manage updates of the software.
We have started to use Docker a bit, and there are many additional projects besides.
So with all of this, I think we’re heading for a convergence of all kinds of workloads into a next-generation federated, or distributed, cloud.
Of course you will still have different kinds of architectures, so some applications will run better on a specific type of hardware or network topology, but you will have a more unified way of packaging and running applications, in a more standardised and much more flexible way than today.
This means you don’t have to think about the underlying stuff as much as before: whether you’re running your application on a cloud instance, in a KVM or VMware virtualised environment, or on a bare-metal machine, with a technology like Docker none of this really matters to your application.
Likewise, you are freer to be decoupled from the actual Linux distribution sitting underneath, as long as you are running on a fairly recent Linux kernel.
Now, what does all of this mean for the typical HPC application developer or sysadmin?
Firstly, it enables much more freedom in where you deploy your applications: you can very easily move between several deployment models and keep your application flexible for that.
The simplicity and standard form of the technology makes it really easy to get started, so HPC application deployments could be done in a matter of a few clicks like I just demonstrated.
Lastly, it enables much easier collaboration: if everybody packages their application in the same form, it will be much easier and faster for new people to try out the applications and possibly come up with improvements.
As well, just like in the old tradition of the scientific community, you can much more easily build upon the work of others, because of the standard and simple form of the container technology.
QStack was designed to perform the same way on any deployment, as a true hybrid solution. The features that are included enable and support HPC and big data applications.