COLLEGE OF COMPUTING, GEORGIA INSTITUTE OF TECHNOLOGY
Workshop 6/Systems Workshop 1:
Master Node in Map Reduce
In this module of the class, you are going to implement the base code for a
fault-tolerant Master in the MapReduce framework. Additionally, you are going to
create the handlers, interfaces, and scoreboard required for the Master. You will
be using Docker containers as nodes, C++ as your implementation language, and
Kubernetes to orchestrate the whole thing.
1 EXPECTED OUTCOME
The student will learn about:
• The data structures associated with the Master of the MapReduce framework.
• Implementing remote procedure calls (RPC) to execute code on remote computers (virtual machines), using the gRPC library.
• Leader election using etcd/ZooKeeper.
Specifically, you will:
1. Develop gRPC client and server applications in the C++ programming language.
2. Implement leader election in the applications using the distributed data store etcd or ZooKeeper.
3. Develop the applications to run on containers built with Docker and deploy the contain-
ers using Kubernetes.
2 ASSUMPTIONS
This workshop assumes that the student knows how to program in C++ and is using a computer with Ubuntu as the operating system (or in a virtual machine).
3 BACKGROUND INFORMATION
This section goes through some basic concepts in Kubernetes, Helm and Kind that would be
helpful for this module. If you are familiar with these technologies, feel free to skip to the next
section.
3.1 KUBERNETES
In the NFV workshop, you used Docker containers as nodes in a network, treating each container as a lightweight VM. While this is sufficient for running single containers or a simple system, a distributed system with multiple containers running at once, with replication, failures, and communication between them, needs something to coordinate and orchestrate it all. (A specific example of this is the orchestrator you built for the NFV project.)
This is where Kubernetes comes in. Kubernetes is a system that automates the deployment, scaling, and management of containerized applications across a distributed set of hosts. For those who are unfamiliar with Kubernetes, it is crucial to understand how Kubernetes models an application.
Figure 3.1: Kubernetes Abstraction
The figure above shows a rough diagram of how Kubernetes functions. The lowest level of granularity in Kubernetes is a pod. A pod can be thought of as a single "VM" that runs one or more Docker containers. The images for the containers run in pods are pulled from a public, private, or local container registry; you can think of a container registry as a repository of Docker images. Each physical node in a Kubernetes cluster can run multiple pods, which in turn can run multiple Docker containers. For simplicity, we recommend running a single Docker container per pod for this module. Developers can connect to a Kubernetes cluster using the kubectl command-line tool. Once connected, they can deploy their application on the cluster via the command line and a YAML configuration file.
Figure 3.2: Kubernetes Objects
While Figure 3.1 explained the basic abstraction of a Kubernetes cluster, Kubernetes also defines higher-level objects that wrap around the basic concept of pods and are used in the YAML configuration file to set up your application. Figure 3.2 illustrates the Service, Deployment, and ReplicaSet objects. A ReplicaSet defines a configuration in which a pod is replicated a specified number of times; if a pod in a ReplicaSet dies, the Kubernetes cluster automatically spawns a new one. A Deployment is a more general object that wraps around ReplicaSets and provides declarative updates to pods, along with many other useful features. In practice, ReplicaSets are rarely defined explicitly in a Kubernetes configuration file: a Deployment that specifies the number of pod replicas will set up a ReplicaSet automatically. Finally, a Kubernetes Service can connect to multiple Deployments. Since pods can fail and new replica pods can be added to a Deployment, it would be difficult to interact with your application through Deployments alone; a Service acts as a single point of access to your application.
For this module, you will define your own Kubernetes YAML file for MapReduce. More information on how to actually write the YAML file can be found in this document or this YouTube video. We recommend reading through the Workloads and Services sections of the Kubernetes documentation.
3.1.1 HELM AND KIND
Now that you have a basic understanding of Kubernetes, we will introduce two Kubernetes technologies that you will use in this module.
Helm is a package manager for Kubernetes. You can think of Helm as the "apt-get of Kubernetes". Using Helm, you can add public repositories of Kubernetes applications, which contain ready-built Kubernetes application configurations known as "charts". You can then deploy
one of these public charts directly onto your own Kubernetes cluster. We will use Helm to
deploy an etcd or ZooKeeper Kubernetes service onto our cluster.
Kind is a local implementation of a Kubernetes cluster. Since Kubernetes is designed to run on a cluster of multiple hosts, it is somewhat difficult to work with locally when you only have one host. Kubernetes in Docker (KIND) solves this by simulating a distributed environment using Docker containers as nodes. KIND will be used as your local Kubernetes cluster for testing your code.
Helm and Kind will be installed using the provided install script.
3.2 USEFUL REFERENCES
• C++ Tutorial
• Thinking in C++ 2nd Edition by Bruce Eckel
• Modern C++ by Scott Meyers
• Kubernetes Concepts
• Kubernetes YAML video
4 SPECIFICATION
Using Kubernetes, Docker, and etcd or ZooKeeper, you are going to implement a fault-tolerant Master node on your local machine.
5 DOWNLOAD REPO
$ sudo apt-get update
$ sudo apt-get install git
$ mkdir -p ~/src
$ cd ~/src
$ git clone https://github.gatech.edu/cs8803-SIC/workshop6-c.git
6 INSTALL DEPENDENCIES ON THE LOCAL DEVELOPMENT ENVIRONMENT
The Git repository contains a bash script that installs all the required dependencies. Run it as follows:
$ cd ~/src/workshop6-c
$ chmod +x install.sh
$ ./install.sh
7 IMPLEMENTATION
This workshop has six phases:
1. Set up the C++ applications.
2. Create the required data structures for the Master.
3. Use gRPC to create the RPC calls.
4. Implement leader election with etcd or ZooKeeper.
5. Build the containers.
6. Deploy the containers with Kubernetes.
7.1 SETTING UP A GRPC C++ APPLICATION
The first thing you are going to do is create a simple gRPC application with C++. Please follow the docs to guide your development.
Specifically, we would like you to create the gRPC server in the worker node and the gRPC client in the master node. The server in the worker node should receive a string and return the string plus " gatech". For example, if the input is hello, the server should return hello gatech. When the worker receives the call, it should log the received input using a logging library such as glog; there are several C++ logging libraries available, so choose one that you like.
Please test your binaries to make sure that they function correctly before moving on to the next section.
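To make the expected shape of these binaries concrete, here is a minimal sketch of the worker-side server and master-side client using the synchronous gRPC C++ API. The proto definition, service and message names (Echo, EchoRequest, EchoReply), and the port 50051 are illustrative assumptions, not part of the handout; adapt them to your own .proto file.

// echo.proto (hypothetical):
//   syntax = "proto3";
//   service Echo { rpc Append (EchoRequest) returns (EchoReply); }
//   message EchoRequest { string input = 1; }
//   message EchoReply   { string output = 1; }
#include <memory>
#include <string>

#include <glog/logging.h>
#include <grpcpp/grpcpp.h>

#include "echo.grpc.pb.h"  // generated by protoc from the proto above

// Worker side: synchronous gRPC server that appends " gatech" and logs the input.
class EchoServiceImpl final : public Echo::Service {
  grpc::Status Append(grpc::ServerContext* /*ctx*/, const EchoRequest* req,
                      EchoReply* reply) override {
    LOG(INFO) << "worker received: " << req->input();
    reply->set_output(req->input() + " gatech");
    return grpc::Status::OK;
  }
};

void RunWorker() {
  EchoServiceImpl service;
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);
  std::unique_ptr<grpc::Server> server(builder.BuildAndStart());
  LOG(INFO) << "worker listening on :50051";
  server->Wait();
}

// Master side: blocking client call against the worker's address.
std::string CallWorker(const std::string& worker_addr, const std::string& input) {
  auto channel = grpc::CreateChannel(worker_addr, grpc::InsecureChannelCredentials());
  std::unique_ptr<Echo::Stub> stub = Echo::NewStub(channel);

  EchoRequest req;
  req.set_input(input);
  EchoReply reply;
  grpc::ClientContext ctx;
  grpc::Status status = stub->Append(&ctx, req, &reply);
  CHECK(status.ok()) << "RPC failed: " << status.error_message();
  return reply.output();  // e.g. "hello gatech"
}

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);
  RunWorker();  // or CallWorker("worker:50051", "hello") in the master binary
  return 0;
}

In practice you would split the server and client into the separate master and worker binaries; they are shown together here only to keep the sketch self-contained.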
7.2 IMPLEMENTING LEADER ELECTION WITH ETCD
Next, once the gRPC server and client have been created and can successfully exchange
information, you are going to implement leader election with etcd, a distributed, reliable
key-value store.
If you are using etcd, we recommend that you read about the API in the etcd docs and follow blog posts on how to implement leader election. Understand what is happening under the hood; it will be discussed during the demo.
In addition to the master nodes, you should also think about how you register your worker nodes. You don't need to run an election for them, but saving some information in etcd might be a good idea. We recommend looking into etcd leases as a way to store worker information. Why could this be useful?
Unfortunately, at this point in time you will not be able to test your code unless you start a
local etcd cluster. If you would like to make sure your leader election works before proceeding
to the next section, be our guest! We are certain you will learn something by setting up etcd to
run locally on your machine.
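To make the election concrete, below is a minimal sketch of the campaign loop in C++. The EtcdClient interface is hypothetical: its method names (GrantLease, PutIfAbsent, KeepAlive, WaitForDelete) are illustrative stand-ins, not a real library API. An actual etcd v3 client (for example etcd-cpp-apiv3) exposes the underlying primitives these represent: lease grant, transactional put-if-absent, lease keep-alive, and watch.

#include <chrono>
#include <cstdint>
#include <string>
#include <thread>

#include <glog/logging.h>

// Hypothetical thin interface over an etcd v3 client; method names are
// illustrative stand-ins for lease grant, transactional put, keep-alive, watch.
class EtcdClient {
 public:
  virtual ~EtcdClient() = default;
  virtual int64_t GrantLease(int ttl_seconds) = 0;
  // Atomically creates `key` (attached to `lease`) only if it does not exist;
  // returns true if the write succeeded, i.e. we won the election.
  virtual bool PutIfAbsent(const std::string& key, const std::string& value,
                           int64_t lease) = 0;
  virtual void KeepAlive(int64_t lease) = 0;                // refresh the lease
  virtual void WaitForDelete(const std::string& key) = 0;   // block until key is gone
};

// Whoever creates /election/master-leader first is the leader. Followers wait
// for the key to disappear (the leader's lease expires when it dies), then
// campaign again.
void RunElection(EtcdClient& etcd, const std::string& my_address) {
  const std::string kLeaderKey = "/election/master-leader";
  while (true) {
    const int64_t lease = etcd.GrantLease(/*ttl_seconds=*/5);
    if (etcd.PutIfAbsent(kLeaderKey, my_address, lease)) {
      LOG(INFO) << "elected leader: " << my_address;
      while (true) {                        // leader duties would run here
        etcd.KeepAlive(lease);              // if we crash, the lease expires
        std::this_thread::sleep_for(std::chrono::seconds(2));
      }
    }
    LOG(INFO) << "follower; waiting for the current leader to fail";
    etcd.WaitForDelete(kLeaderKey);
  }
}

// Workers can register under a lease the same way, without an election:
//   etcd.PutIfAbsent("/workers/" + my_address, my_address, worker_lease);
// A dead worker stops refreshing its lease, so its entry disappears on its own,
// which is one reason leases are useful for worker registration.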
7.3 IMPLEMENTING LEADER ELECTION WITH ZOOKEEPER
Selecting a leader is a genuinely hard problem in distributed systems; luckily, there are now frameworks that let us implement it more easily for certain scenarios (like this one). We are going to use ZooKeeper to implement the leader election. First, read about how ZooKeeper works here. An explanation (recipe) for implementing leader election can be found here. The directory that will contain all the master nodes is called /master.
ZooKeeper runs as a standalone service. Your Master code should connect to it using the C binding; to facilitate this, we are going to use a C++ wrapper that was already installed by the previous script. The Git repository for the wrapper can be found here. There are examples of how to use it (and compile it) in its examples/ directory.
Once the leader is elected, it needs to replicate each local state change to all the followers using RPC (to be implemented in the following workshops). A state change cannot be committed until all the followers have responded.
We are also going to use ZooKeeper to keep a list of the available worker nodes, using ephemeral nodes in the directory /workers (what is an ephemeral node?). Additionally, to know which master replica to contact, we are going to use the directory /masters and pick the entry with the lowest sequence number (as explained in the recipe). If the wrong master is contacted, it should reply with an error and the address of the correct Master leader.
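As a rough illustration of the recipe, the sketch below uses the raw ZooKeeper C binding (the provided C++ wrapper exposes equivalent operations): each master creates an ephemeral sequential znode under /masters, the one holding the lowest sequence number is the leader, and the others watch their immediate predecessor. The connection string, znode prefix, and addresses are assumptions made for illustration only.

#include <algorithm>
#include <string>
#include <vector>

#include <glog/logging.h>
#include <zookeeper/zookeeper.h>   // ZooKeeper C binding

// Fired when a watched znode changes; a real implementation would re-run the
// leadership check from here.
static void OnPredecessorChange(zhandle_t*, int, int, const char*, void*) {}

// Session watcher; a real implementation waits here for ZOO_CONNECTED_STATE
// before creating znodes.
static void SessionWatcher(zhandle_t*, int, int, const char*, void*) {}

// Creates our ephemeral sequential znode under /masters (the parent znode is
// assumed to already exist) and returns its full path.
std::string JoinElection(zhandle_t* zh, const std::string& my_address) {
  char created[512];
  int rc = zoo_create(zh, "/masters/master-", my_address.c_str(),
                      static_cast<int>(my_address.size()), &ZOO_OPEN_ACL_UNSAFE,
                      ZOO_EPHEMERAL | ZOO_SEQUENCE, created, sizeof(created));
  CHECK_EQ(rc, ZOK) << "zoo_create failed";
  return created;  // e.g. "/masters/master-0000000003"
}

// Leader = lowest sequence number. Followers watch only their immediate
// predecessor, which avoids a thundering herd when the leader dies.
bool CheckLeadership(zhandle_t* zh, const std::string& my_znode) {
  struct String_vector children;
  CHECK_EQ(zoo_get_children(zh, "/masters", 0, &children), ZOK);
  std::vector<std::string> names(children.data, children.data + children.count);
  deallocate_String_vector(&children);
  std::sort(names.begin(), names.end());

  const std::string mine = my_znode.substr(my_znode.rfind('/') + 1);
  if (mine == names.front()) return true;  // lowest sequence -> we are leader

  auto it = std::lower_bound(names.begin(), names.end(), mine);
  const std::string predecessor = "/masters/" + *(it - 1);
  struct Stat stat;
  // Sets a watch on the predecessor; if it vanished in the meantime, a real
  // implementation would simply re-check leadership.
  zoo_wexists(zh, predecessor.c_str(), OnPredecessorChange, nullptr, &stat);
  return false;
}

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);
  // Connection string is illustrative; use your ZooKeeper service's address.
  zhandle_t* zh = zookeeper_init("zookeeper:2181", SessionWatcher, 30000,
                                 nullptr, nullptr, 0);
  CHECK(zh != nullptr);
  const std::string me = JoinElection(zh, "master-0.example:50051");
  LOG(INFO) << (CheckLeadership(zh, me) ? "I am the leader" : "I am a follower");
  // Workers register analogously with plain ephemeral znodes under /workers.
  zookeeper_close(zh);
  return 0;
}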
7.4 BUILDING CONTAINERS
You can create your containers however you like, but we will give you hints on how to do it
with Docker.
1. Create two Dockerfiles, Dockerfile.master and Dockerfile.worker. Dockerfile Reference.
2. Create a build script, ./build.sh, in the root directory.
3. What does your build script need to do? Here are some suggestions:
a) Generate your gRPC binaries
b) Generate your application binaries
c) Build your docker images
7.5 DEPLOYING CONTAINERS WITH KUBERNETES
Now that you have the docker images and binaries set up, it’s time to build your Kubernetes
application. As noted in section 3, you will be using KIND to set up a local Kubernetes cluster.
1. Get KIND set up.
2. Create a cluster.
3. Write a Kubernetes YAML deployment file that will deploy 2 master pods and 1 worker pod. (We recommend using one Service and two Deployments.)
4. Deploy your application to the local KIND cluster
5. Get logs for your pods.
Now is where the fun kicks in: it's time to start wrestling with Kubernetes. Useful commands can be found below:
1. Create a Kubernetes namespace to specify where things should be.
$ kubectl create ns <your-namespace>
2. Use Helm to install the etcd chart onto your cluster
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install -n <your-namespace> etcd bitnami/etcd --set auth.rbac.enabled=false
3. Use Helm to install the ZooKeeper chart onto your cluster
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install -n <your-namespace> zookeeper bitnami/zookeeper
4. Load Docker images to Kind
$ kind load docker-image <your-master-image>
$ kind load docker-image <your-worker-image>
5. Deploy your application
$ kubectl -n <your-namespace> apply -f <your-kubernetes-configuration>.yaml
6. Helpful kubectl commands:
$ kubectl get all -n <your-namespace>
$ kubectl -n <your-namespace> logs pod/<your-pod-id>
$ kubectl -n <your-namespace> delete pod/<your-pod-id>
Hints: You may need to pass dynamic values into your master and worker pod replicas (IP addresses, etcd/ZooKeeper endpoint information, etc.). You can do this by setting environment variables on the pods in your Kubernetes configuration files; pod status fields such as the pod IP can be exposed to the container as environment variables.
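For example, the binaries can read those values at startup from environment variables set in the pod spec. A minimal sketch follows; the variable names ETCD_ENDPOINT and POD_IP are illustrative, and POD_IP would typically be injected from the pod's status.podIP field via the Kubernetes Downward API.

#include <cstdlib>
#include <string>

#include <glog/logging.h>

// Reads a required setting injected through the pod spec's env block.
std::string RequireEnv(const char* name) {
  const char* value = std::getenv(name);
  CHECK(value != nullptr) << "missing environment variable " << name;
  return value;
}

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);
  // Variable names are illustrative; match whatever your YAML defines.
  const std::string etcd_endpoint = RequireEnv("ETCD_ENDPOINT");  // e.g. "etcd:2379"
  const std::string my_ip = RequireEnv("POD_IP");  // e.g. injected from status.podIP
  LOG(INFO) << "config: etcd=" << etcd_endpoint << " pod_ip=" << my_ip;
  return 0;
}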
8 USEFUL REFERENCES
• RPC Paper
9 DELIVERABLES
A Git repository with all the related code and a clear README file that explains how to compile and run the required scenarios. You will submit the repository and the commit ID for the workshop in the comment section of the upload. You will continue using the same repository for future workshops.
9.1 DEMO
The demo for this workshop is as follows (to discuss with other students):
You should be able to demo leader election using kubectl. This means:
1. Start two master nodes simultaneously. One of them should be elected leader and make an RPC call to the only worker with its own address as the input, e.g. it should receive back <address> gatech.
2. Kill the current master. The second master should become the new leader and make an RPC call to the worker with its address as the input.
3. The master that was killed should then rejoin.
4. Once that master has rejoined correctly, kill the current leader. A new leader should be elected, and it should make an additional RPC call to the worker with its address as input.
5. The log output should clearly show the killing and rejoining of the masters, and the worker should properly log the input strings.