Building instruqt, a scalable learning platform
DevOps Utrecht
15 February 2018
Me.
Start new things
Writing code
Entrepreneur
I love skiing!
DevOps
(Google) cloud
(pre-)sales
Xebia
Instruqt
Inspire Challenge
Connector
Consultant
Co-founder
Startups
Automation
Solving problems
Marketing
Drumming
What is instruqt?
Gamified learning platform for IT.
Build skills, by doing.
Think for yourself.
No more monkey see, monkey do.
Unlock challenges to complete tracks and
earn badges
Create tracks yourself using the open SDK
Other use cases are assessments, guided
demos and sandbox environments.
This is the 4th version of the platform
Iteration #1
Iteration #2
Iteration #3
Current version
Technologies we use.
Golang
React
Terraform
Google Cloud Functions
Bash
Docker
Firebase
Google Cloud Container Builder
Google Datastore
Google Storage
Google Kubernetes Engine
Google Container Registry
Google PubSub
Redux
Enough talking, show me some action!
DEMO
So, what happens behind the curtains?
DEMO
What is being created by the platform?
Kubernetes namespace
Shell pod
SSH service
Gotty service
SSH keys
Alpine container
User namespace
Exposed via ingress & proxy
Proxy
JWT token validation
Gotty service
play.instruqt.com
Firebase
Make it scale.
Google Cloud Platform does the heavy
lifting for us.
Everything runs in containers
Our containers are managed by Kubernetes
We don’t manage Kubernetes ourselves, we
use Google Kubernetes Engine (GKE).
Benefits of using GKE
Automatic scaling
Automatic upgrades
Separate node-pools
Natively supported by Terraform
Fast release cycle
Integration with GCP IAM
Google created Kubernetes
Google Cloud networking
Continuous Delivery the GCP way
Google Cloud Container Builder builds our
containers. No more build-servers!
Our build pipeline
Track directory
● config.yml
● track.yml
● challenges
Instruqt
Trackbuilder
Google Cloud
Storage bucket
Google Cloud
Container Builder
POST
instruqt track build
Google Container Registry
Image
DEMO
Automated deployment
Google Container Registry
Image
Google Cloud PubSub
Google Cloud
Container Builder
Google Cloud Function
Generate Kubernetes
YAML file
Kubernetes
New container Old container
POST
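The first step of the automated deployment, reacting to a build message from PubSub, could look roughly like the Go sketch below. It is an illustration under assumptions, not the platform's actual function: the message is assumed to be JSON with a `status` field and an `images` list, and `imageToDeploy` and the image name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildMessage holds only the two fields this sketch needs.
// Assumption: the build notification is JSON with "status" and "images".
type buildMessage struct {
	Status string   `json:"status"`
	Images []string `json:"images"`
}

// imageToDeploy (hypothetical helper) returns the first image of a
// successful build. The bool is false for builds that are still running
// or that failed, so the caller can ignore those messages.
func imageToDeploy(data []byte) (string, bool, error) {
	var msg buildMessage
	if err := json.Unmarshal(data, &msg); err != nil {
		return "", false, fmt.Errorf("decode build message: %w", err)
	}
	if msg.Status != "SUCCESS" || len(msg.Images) == 0 {
		return "", false, nil
	}
	return msg.Images[0], true, nil
}

func main() {
	// Example payload with a made-up image name.
	payload := []byte(`{"status":"SUCCESS","images":["gcr.io/example-project/track:abc123"]}`)
	image, ok, err := imageToDeploy(payload)
	if err != nil {
		panic(err)
	}
	if ok {
		fmt.Println("deploying", image)
	}
}
```

Only successful builds yield an image, so intermediate status messages do not trigger a deployment.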
DEMO
Benefits of using a pipeline like this
No central build server
No more Jenkins (yay!)
Deployment across projects
Integration with GCP IAM
No external services required, all Google
Cloud Functions are small and simple
Automatic build logs
Tips when creating a similar pipeline
Annotate pods with a timestamp, so it will
always update the old pod.
PubSub has a slow-start mechanism. Be
prepared for some waiting time!
PubSub cannot be used cross project.
Use a cloud function that publishes the
messages across projects.
PubSub Cloud Function Cloud Function PubSub
Project 1 Project 2
Terraform
infrastructure as code done right
Terraform creates, updates and destroys
our infrastructure
We use Terraform to create our own
infrastructure as well as infrastructure for
users
Track container
Google Cloud
Storage bucket
Terraform: create user infrastructure
config.yaml
Terraform
template
(Golang template)
Rendered template terraform apply Infrastructure
Terraform state
Track container Google Storage bucket
Terraform: destroy user infrastructure
terraform destroy Terraform state
Benefits of using Terraform
Immutable infrastructure
Support for all our resources
Integration with Google IAM
Declarative
Golang template + Terraform = powerful!
Stored state of infrastructure
Agentless (API + SSH)
Dry-run: terraform plan
How do we scale?
Open platform
Create and play content for free
The SDK makes creating content easy.
Curation will become a challenge!
Pain points we encountered
Custom built containers didn’t scale.
We inject our tools now, so you can use
unmodified containers.
GKE auto upgrades are great.
But make sure your critical services have at
least 3 replicas running :)
Our infrastructure is extremely dynamic.
Clean up is crucial to keep the bill low!
Some tools have a looong startup time, but
we need it to start instantly.
Our solution: VM pooling.
VM pool
VM pooling
Kubernetes VM
Kubernetes VM Kubernetes VM
Label: available=true
Kubernetes VM
Label: available=false
A new VM is being created by terraform
terraform apply
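The pooling idea can be sketched in Go as an in-memory model: claiming a VM flips its label from available=true to available=false, and the pool is replenished right away. All types and names are hypothetical, and `provision` only stands in for the real `terraform apply`, which would take minutes and run in the background.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// vm models a pooled VM; Labels mirrors the available=true/false label.
type vm struct {
	Name   string
	Labels map[string]string
}

// pool is a hypothetical in-memory model of the warm VM pool.
type pool struct {
	mu      sync.Mutex
	vms     []*vm
	created int
}

var errPoolEmpty = errors.New("no available VM in pool")

// provision stands in for "terraform apply" creating a new warm VM.
// Callers must hold p.mu (or own the pool exclusively).
func (p *pool) provision() {
	p.created++
	p.vms = append(p.vms, &vm{
		Name:   fmt.Sprintf("vm-%d", p.created),
		Labels: map[string]string{"available": "true"},
	})
}

// newPool creates a pool with size warm VMs.
func newPool(size int) *pool {
	p := &pool{}
	for i := 0; i < size; i++ {
		p.provision()
	}
	return p
}

// claim hands out the first available VM, marks it as taken and
// replenishes the pool with a new VM.
func (p *pool) claim() (*vm, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, v := range p.vms {
		if v.Labels["available"] == "true" {
			v.Labels["available"] = "false"
			p.provision()
			return v, nil
		}
	}
	return nil, errPoolEmpty
}

// available counts the VMs that can still be claimed.
func (p *pool) available() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	n := 0
	for _, v := range p.vms {
		if v.Labels["available"] == "true" {
			n++
		}
	}
	return n
}

func main() {
	p := newPool(2)
	v, err := p.claim()
	if err != nil {
		panic(err)
	}
	fmt.Println("claimed", v.Name, "still available:", p.available())
}
```

The user gets a VM instantly because the slow creation step happened before the claim; the cost is paying for a few idle VMs.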
Key takeaways
Standing on the shoulders of giants
Leverage cloud services where possible.
You probably won’t switch cloud providers
anyway.
Cloud providers are better at security
(Spectre, Meltdown) and scalability than us.
We do #YOLOPS.
Just make sure you can fix your mistakes
fast enough!
(or have a safety net in place!)
Our mission is to make learning IT more
hands-on and fun.
You can help us by creating tracks yourself!
Thank you.
Go to play.instruqt.com to create an account!
bas@instruqt.com
@bastichelaar
linkedin.com/in/bastichelaar

Editor's Notes

  • #3 Bas Tichelaar. Working for Xebia, 1 year for Instruqt. I get energy from starting new things. I have a technical background, but in my current role I try to do more marketing and sales.
  • #4 Over the last 2 years we’ve been working on Instruqt
  • #5 Gamified in the sense that you have to complete a challenge to unlock the next one. Let’s start with a question: can you all stand up? Who of you has experience with containers? Who of you has followed the official classroom-based Docker training? You can sit down. Who of you has followed an online course to learn Docker? You can sit down. Who of you has taught themselves by learning on the job? You can sit down. We have a few people standing, how did you learn Docker?
  • #6 Instead of watching videos or following tutorials, we present you with a real-life challenge, with actual infrastructure. The focus is on the challenge itself.
  • #7 A lot of tutorials just provide the commands. But you don’t actually learn anything when you copy-paste them. We provide you with a challenge, and verify the outcome. The route you took is less important.
  • #8 To progress, you need to complete topics and tracks. A topic is a collection of tracks, for example “Version control with Git”. A track is, for example, “Creating and cloning a repository”. A challenge is, for example, “Clone this remote repository”.
  • #9 One of our unique features is our open SDK. Anyone can create a track and publish it for the world to play. We created a command-line interface, that I will demonstrate later on.
  • #10 The platform can also be used for assessments, guided demos and product tours, and sandbox environments.
  • #11 In the past two years, we have learned a lot. It started as a competition platform, but people told us they want to use it to learn new technologies. So we’ve pivoted. Let’s take a look at the previous versions.
  • #12 This version didn’t have a browser-based console; you had to connect using your own SSH client. We built this for Hashicorp Europe, which we organised together with Hashicorp. Was anyone there?
  • #13 The second iteration introduced the browser-based console. No more SSH required. It already looks a lot like our current version.
  • #14 The third version was built for Google Cloud Next, where we launched for a bigger audience. The competition element was still the central element. But this version was way too complex to maintain and iterate on.
  • #15 This is a screenshot of the current version. We now have support for tabs, notes and GUI interfaces, and it is mobile friendly. I’ll show you a demo later on.
  • #16 We use quite a lot of technologies. As you can see, we leverage a lot of cloud services.
  • #17 The core is built in Golang, with a React frontend. For the challenges, we use a lot of Bash.
  • #18 Ok, let’s take a look at a demo.
  • #19 Login, join track
  • #20 A lot of magic, right? Let’s take a look at some logs in the terminal to see what is actually going on.
  • #21 Leave track, join track again. Show logs of join. Maybe kubectl.
  • #22 So as you can see, we create quite a lot, and it only takes a few seconds. Kubernetes does the heavy lifting for us: a namespace for isolation, services to expose the container, and SSH keys to connect to it from the checker and other containers. We used kubectl in the past to do the checks, but SSH is way faster.
  • #23 If you go to play.instruqt.com, and you play a challenge, it will first verify your token using Firebase. Then it will provide you with a shell, that’s being hosted by your container.
  • #24 So, this is all nice, but the question is: does it scale? The short answer: yes it does.
  • #25 As we use GCP, we don’t have to worry about traffic spikes. We have auto-scaling enabled, with a minimum and maximum number of nodes, so Google just spins up more machines.
  • #26 We have been using containers from the very beginning. Containers provide us with a nice way to isolate services, and to limit the resource consumption. Also for development, they are a breeze to use. You just build a container and publish it in a registry.
  • #27 As said, the containers are managed by Kubernetes. If one dies, Kubernetes spins up a new one. If the resource limits are reached, it will kill the pod and spin up a new one. Kubernetes makes the platform robust and resilient.
  • #28 You can install Kubernetes yourself, and it’s quite easy. We even have a track that explains how to do it. Not sure if it’s published yet, but the track exists. But when you run stuff in production, you don’t want to be involved in updating servers, installing patches, rejoining disconnected nodes and upgrading Kubernetes itself. That’s why we chose GKE. GKE offers us auto-upgrades out of the box, and they are quite fast in their release cycle.
  • #29 So when looking at GKE, we get a lot. Google created Kubernetes based on their experience with Borg. Another advantage is that it integrates really nicely with other GCP resources like IAM and networking. When using a small cluster, Google even runs the master for you. So you can get started with just one node.
  • #30 We also use GCP for our build and deployment pipeline. Let’s take a look.
  • #31 A core component is Google Cloud Container Builder. We don’t use Jenkins or Gitlab-CI, we don’t even have build servers. Everything is done by Google Cloud Container Builder.
  • #32 This is a simplified version of our build pipeline. The code is uploaded to the Instruqt trackbuilder using the instruqt track build command.
  • #33 Make a change. Run instruqt track build. Wait for the e-mail and Slack message.
  • #34 Explain the deployment. Cloud Container Builder publishes a message on PubSub, which triggers a Cloud Function. This function generates a Kubernetes YAML file with the new image ID, and triggers a deployment on Kubernetes. Kubernetes then fetches the image from the registry, and updates the pod with the new container. The old one is removed.
  • #35 Let’s see if our change is published?
  • #36 Benefits: no build servers to manage or scale, and integration with GCP for identity and access. Cloud Functions are ideal for this use case: a function just triggers a small script and then stops again. The first x invocations are free.
  • #38 If you don’t annotate, and the image name hasn’t changed, Kubernetes won’t do a thing. So always make a change in the Kubernetes YAML file to trigger the deployment.
  • #39 PubSub is great, but not if you send messages infrequently. It can take a while to warm up, which means that it can take a while for the image to be deployed. You can also keep it warm by sending messages frequently, though it’s not advised.
  • #40 We use multiple GCP projects, so we need to communicate across projects. This is not supported by default, so we created a cloud function that publishes in another pubsub queue.
  • #41 Terraform is a very important part of our platform
  • #42 For thos of you not familiar with Terraform, it is infrastructure as code. We use it to create our underlying infrastructure on GCP, including deployments on Kubernetes
  • #43 But we also use Terraform in our platform toe generate and apply the infrastructure plans for users.
  • #44 Let’s take a look at the workflow. I showed you the config YAML. This is merged with a Terraform template (show example), and when a user joins a track, terraform is triggered and creates the required infrastructure. The state is stored in a bucket on Google Cloud Storage
  • #45 This bucket is crucial for deletion of infrastructure: terraform checks the created infrastructure and deletes all created resources
  • #47 We talked about the platform, but let’s take a look at our own challenges. How do we scale?
  • #48 We have a bit of a chicken and egg problem. We don’t have a lot of content yet, so it’s hard to sell on content only. On the other hand, because it’s not being used widely, no content is being created. To solve that, we have opened up. Anyone can create tracks now. The SDK is open, but we don’t have it published yet. If you want to join, let us know and we’ll provide you with a link.
  • #49 Right now content is our scaling challenge. But once we have a lot of content creators, we need to curate the content. One way is to let people rate challenges. Another one is to have a moderation phase in place. But that’s a problem for later.
  • #51 In the past, we used custom containers created by ourselves. But a lot of people want to create their own container. So we had to come up with a solution. What we do now, is override the entrypoint, and inject our own dependencies. This makes running unmodified containers possible. Another hurdle removed.
  • #52 GKE auto upgrades are very nice, but not if you have one pod running for a specific service. Always make sure you have at least 3 running!
  • #53 Our infrastructure is very dynamic, but so is our cloud bill. We need to clean up old infrastructure properly to keep the bill low.
  • #54 Some tools take a few minutes to start up, but as a user you don’t want to wait. You want it to start instantly. So we had to come up with a solution: VM pooling.
  • #55 We have a few “warm” VMs running. Once you join the track, it claims a VM. It will then replenish the VM pool with a new VM.
  • #57 Google and Hashicorp allow us to focus on our own software without having to care about the infrastructure or updates of software. Benefits: time to market, security and scaling.
  • #58 A lot of people talk about lock-in. If you run containers and put your logic there, you can move pretty easily, especially if you also use Terraform. But don’t be too afraid of lock-in. You probably won’t move to another cloud provider anyway. And Google just rocks.
  • #59 Cloud providers were faster in rolling out updates than we could do ourselves. Same for scaling, we just can’t scale as quickly on our own hardware as we can do now with GCP.
  • #60 People talk about DevOps, NoOps. We do YOLOPS. We roll things out in production, and if things break or fail, we fix it quickly. For now this works, but at some point we might need some safety nets and testing. We just started with testing using Cypress.
  • #61 So, to conclude, we want to make learning more fun. I hope you have learned something today! If you want to join, please let me know, and I’ll provide you with access to the SDK.
  • #62 Thank you, and don’t forget to try it out yourself! You can just go to play.instruqt.com and create an account. You can contact me by email, or send me an invite on LinkedIn. I’m not very active on Twitter. Any questions?