13. How do we
Deploy, Monitor, & Debug?
Normal tools don’t work
out of the box
Cloud-first makes good local
DevEx hard to achieve
Standard DevOps
rules don’t apply
Born and raised in Montana, recently returned
Director of Engineering @ ClassPass
VP Engineering @ Doppler
ClassPass on servers, Doppler serverless, Palantir monolith
Servers -> Serverless -> Servers gives me insight
Not well-defined
No servers, but really that’s a misnomer. There are servers; you just don’t manage them.
A mindset, change the level at which you think
It's the attempt to free yourself from having to think as much about things like: Infrastructure, Scaling, Availability.
Focus on writing the most important software to delight users
Another lens
Glue building blocks together with compute: Functions-as-a-Service, e.g. AWS Lambda
Building blocks are created by others, your mindset is about focusing on the important software
Building blocks implies event-driven, because you don’t control them
Lambda is the compute part of it: running code in response to events, including HTTP requests, database changes, AND queues, among others
Offload everything. Lots of smart people have worked on problems for a long time; leverage their expertise
AWS is not the only one in this space. There are definitely lots of others
Constraints: money, people, other hard problems to solve (ML); we didn’t have the expertise to think about uptime OR scaling infrastructure
Really successful in a lot of ways. Had to pitch hard. Never went down
We spent ~1/100th of what we would have needed to with a standard microservices architecture
38:00
Still, pretty complex! Lots of moving parts
A central messaging queue
Everything glued together with Lambdas
Don’t build what you don’t have to (Cognito, DynamoDB, S3, AND Elasticsearch)
It wasn’t all perfect though. We ran into lots of issues along the way
At Doppler, we adopted Serverless really early, back in 2015. It was a wild west!
As we built our infrastructure out, we found limitations in things we had taken for granted when working with servers.
We had to build some of our own tools to fix them
For example:
You might ask: How do we deploy?
What does deploy mean? With servers, it’s obvious - is a new server running?
With a slew of lambdas, how do you version them across time?
Throw out the DevOps rule books; they don’t apply. No guideposts.
Our normal monitoring tools (Papertrail or Loggly, New Relic or Datadog) are all built for servers
Your servers might not go down, but stuff can still go wrong!
Because we’re outsourcing so much work, we have a cloud-first model. What does it even mean to have a dev environment on your local machine?
Can you work offline?
Can you set breakpoints?
Some of the issues we encountered
Build our own serverless service, using the lessons learned to ground us
Lets users text captions and then we’ll text back a meme with that caption
Our architecture
Users send to Twilio, Twilio sends to API Gateway
Invoke Lambda that puts request on queue and returns
Lambda that reads events off of queue, generates a meme and uploads it to S3
Sends that URL back to Twilio, which will text the user
Event-driven already!
First serverless function. logs, returns with a status code and a body
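A minimal sketch of what that first function might look like in Node; the handler name and response body are my illustration, not the talk’s exact code.

```javascript
// Hypothetical first Lambda handler: logs, then returns a status code
// and a body, as described above.
const helloServerless = async (event) => {
  console.log('helloServerless invoked'); // ends up in CloudWatch Logs
  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'Hello, Serverless!' }),
  };
};

module.exports = { helloServerless };
```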
Now what though? I upload a zip file of code to an S3 bucket, create a Lambda function that calls that code, hand-craft an API Gateway definition to call into it, and hope I didn’t make a mistake
How do I keep track of even this one function, let alone dozens?
There’s a better way
Now, frameworks. We’ll use the Serverless Framework (as we did @ Doppler)
Lets us define functions, events, infrastructure all in one place atomically.
Open source, actively maintained, supports different vendors, different languages
Lots of community plugins! Many problems solved
This is the Serverless Framework definition, serverless.yml
Define service
Provider (where/what/defaults)
Functions (the functions that we’ve defined and how they’re triggered)
Same first function
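A sketch of what that serverless.yml might contain; the service name, runtime, and region are my assumptions.

```yaml
# Illustrative serverless.yml: service, provider (where/what/defaults),
# and the functions we've defined.
service: meme-service

provider:
  name: aws
  runtime: nodejs12.x
  region: us-east-1

functions:
  helloServerless:
    handler: handler.helloServerless
```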
Let’s get this thing onto the cloud! (Next slide)
Going to deploy our function using the serverless framework CLI, sls
This packages up our code, creates an AWS stack and wires everything together
And we see that we’ve deployed one function, helloServerless to us-east
Let’s invoke that function
Return is what we expect
Let’s look at our logs. Has what we expect
See how long it took and how much memory it used.
Something fishy here. To test this, I had to deploy it to the cloud.
Something we hit at Doppler. Let’s fix that and build the first portion of our service
28:00
So, we now have a Lambda that’s invokable. Let’s invoke it via HTTP, which is what Twilio will eventually do
Add event to our function. Path meme, a GET
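In serverless.yml, that event might look like this (the function and handler names are my assumptions):

```yaml
# Adding an http event: path meme, method GET.
functions:
  helloServerless:
    handler: handler.helloServerless
    events:
      - http:
          path: meme
          method: get
```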
So let’s use the event variable passed in to us via Lambda.
What we want to do is log out the meme text that is passed in as a query param
I don’t totally remember how to access query parameters, so we’ll stub this out for now
Instead of looking at docs or deploying and looking at logs to figure this out, let’s run locally
we’re going to add a plugin (easy to do)
Adding serverless-offline gives you a local web server to invoke Lambdas with
We can even debug this! I’ve set up VSCode to debug using the Serverless CLI
Add breakpoint
Listening on localhost at 3000
See that our function is triggered by an HTTP event
Let’s execute that in the browser
Hit our breakpoint!
Let’s look at event. Ahh that’s right, it’s in event.queryStringParameters
Let’s now fix our function to use that information. We’ll read from the queryStringParameters when replying
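A sketch of the fixed handler; the query parameter name `text` is my assumption.

```javascript
// Read the meme caption out of event.queryStringParameters and echo it
// back, logging it along the way.
const memeHandler = async (event) => {
  const params = event.queryStringParameters || {};
  const text = params.text || '';
  console.log(`meme text: ${text}`); // visible locally and in CloudWatch
  return { statusCode: 200, body: JSON.stringify({ text }) };
};

module.exports = { memeHandler };
```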
We’re going to debug again and hit the local URL
Give it a query parameter
And we see it here. That’s really cool! We’re debugging the same code that will run in the cloud
Finally, let’s let the function run
What we see is that, sure enough, our query parameter shows up
So, we’ve solved one problem for the time being, which is debugging locally.
Big issue averted, because having to deploy without being able to step through code makes the test -> fix cycle very slow
Let’s tackle the next part of our service
Our createMeme lambda should put some data on a queue
And a lambda responsible for sending memes should pull off of that queue
I added some code to our createMeme function to send the message
We’re now using a POST instead of a GET, and we receive
the text of the message, who it’s from, and who it’s to
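A sketch of createMeme putting the request on the topic. In real code `publish` would wrap the aws-sdk SNS client (`(params) => new AWS.SNS().publish(params).promise()`); here it’s injected so the logic runs without AWS. The JSON field names (Body/From/To) and the env var name are my assumptions.

```javascript
// createMeme: parse the POST body, publish it to the topic, return.
const makeCreateMeme = (publish) => async (event) => {
  const { Body: text, From: from, To: to } = JSON.parse(event.body);
  await publish({
    TopicArn: process.env.MEME_TOPIC_ARN, // from the environment, not hardcoded
    Message: JSON.stringify({ text, from, to }),
  });
  return { statusCode: 200, body: JSON.stringify({ queued: true }) };
};

module.exports = { makeCreateMeme };
```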
We now need to define sendMeme
First, we’ll add the function in our serverless.yml
sns event
Using a pre-created SNS Topic
SNS is Simple Notification Service, and Lambdas can be woken by these notifications
Alright, let’s add some code to our sendMeme function
Code to read the messages off the queue
And just print out a TODO
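A sketch of sendMeme. An SNS-triggered Lambda receives each message as a JSON string under `event.Records[i].Sns.Message`; the text/from/to field names are my assumption.

```javascript
// sendMeme: unpack each SNS record and, for now, just print a TODO.
const sendMeme = async (event) => {
  const messages = event.Records.map((r) => JSON.parse(r.Sns.Message));
  for (const { text, from, to } of messages) {
    console.log(`TODO: generate a meme for "${text}" (${from} -> ${to})`);
  }
  return messages; // returned to make local testing easy
};

module.exports = { sendMeme };
```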
Alright, we’ll deploy again using sls deploy
Get an endpoint back from AWS
We’ll post some data to that endpoint using curl
My data.json has some fake data
Ran this verbosely, so we get a lot of cruft
But we see that we received a 200 back
What should have happened is that createMeme was invoked, added to the queue and returned
Then, sendMeme was invoked. Did this happen? Let’s check the logs to see
We’ll check the sendMeme logs
Shows up!
Now, I’ve tried to sneak something by you that you may have noticed. It’s a long string
What the heck is this?
I made the SQS queue and SNS topic ahead of time
Hardcoded, so no longer fully testable locally
Luckily, Localstack
A fully functional local AWS cloud stack
Didn’t exist at Doppler, had to cobble stuff together
Using localstack, we’ll regain the ability to debug offline again while working on the next part of the service
sendMeme is getting called, needs to generate the meme and upload to S3
Ideally without so much hardcoding!
Add 2 new plugins
serverless-plugin-deploy-environment, sls-offline-sns
1st lets us define environments and refer to variables at deploy time or run time
With the aid of the Serverless Framework, it lets us spin up new resources programmatically!
2nd lets a localstack SNS topic trigger offline lambdas
(Look at some new files added by the deploy-environment plugin)
These variables are around at deploy time
3 different environments
Bucket names
Also set up dummies for local, and resources that will be built for develop/production
Passed into the lambda at runtime
Defaults for accessing our resources
But also endpoints for Localstack
(there is a plugin for this so you don’t have to do it manually)
We’ll do two things now. 1st, add generateAndUpload with the text and S3 bucket
Log out the media URL
2nd replace our hard-coded strings with the ones passed in to the environment at runtime
We’ll also have to change the hardcoded topic in serverless.yml
Nothing hardcoded now
Let’s try debugging again.
Add a breakpoint to sendMeme
Post some data to our local listener
Our breakpoint gets hit!
And a meme is generated. Notice that the URL is local
Let’s go download it and get a preview of what our first meme looks like
Neat!
Even cooler: let’s now deploy and see the same thing running in our develop environment
Deploy as usual, but add stage, which for us is the environment name
Done! Notice how our URL now has the environment baked right in
Grab that and hit it again
Succeeds, now we’ve hit this function in the cloud
Now we’ll look at the cloud logs for sendMeme
And it read off the queue! We’ll grab this URL, which is an S3 URL
Paste the url into chrome to download
Download the file, and we have a second meme
Same code, no changes, deployed locally and in the cloud!
15:00
Clear CD pipeline
No Localstack at Doppler, but similar
integration tests run directly against staging
If all pass, deploy automatically to prod
Now that our meme is generated, need to send that URL and the number to Twilio, who will text it to the user
Also need to have Twilio call our API
Not much code necessary to make this happen
Also added a Twilio webhook that hits our Lambda endpoint (in their UI)
We’re done, so…
Before this, I deployed to the production environment. This is live. Text the number
Splits on period for top + bottom
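The caption-splitting rule could be sketched like this (the function name and trimming behavior are my assumptions):

```javascript
// Split a texted caption on the first period: text before it becomes
// the top line of the meme, text after it becomes the bottom line.
const splitCaption = (caption) => {
  const idx = caption.indexOf('.');
  if (idx === -1) return { top: caption.trim(), bottom: '' };
  return {
    top: caption.slice(0, idx).trim(),
    bottom: caption.slice(idx + 1).trim(),
  };
};

module.exports = { splitCaption };
```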
I’ll leave the number on the rest of the slides
While everyone is generating memes, let’s talk about cost
This is the cost formula
We have 2 lambdas, each of which take ~1000ms
Let’s say 20M requests/10M memes
At the current really small rate AWS charges per 1000ms
<$40. This isn’t all in, we’ll have a bit of spend on S3 and SQS, but not a ton
Really Twilio is what bites us
Even cooler. 100MM memes? $400. 1MM memes? $4.
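The cost formula on the slide can be sketched as a function. The rates here are assumptions based on AWS’s published Lambda pricing (about $0.20 per 1M requests and ~$0.0000166667 per GB-second), with 128 MB memory assumed; plug in your own numbers, but at the talk’s volumes it lands in the same tens-of-dollars ballpark.

```javascript
// Back-of-the-envelope Lambda cost: per-request charge plus GB-seconds
// of compute. Prices and memory size are assumptions (see above).
const lambdaCost = (invocations, durationMs, memoryMb) => {
  const requestCost = (invocations / 1e6) * 0.2;
  const gbSeconds = invocations * (durationMs / 1000) * (memoryMb / 1024);
  return requestCost + gbSeconds * 0.0000166667;
};

// 20M invocations (two ~1000ms Lambdas per meme, 10M memes) at 128 MB:
console.log(lambdaCost(20e6, 1000, 128).toFixed(2));
```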
Infinitely scalable
I hid a bit of library code away to keep the code we wrote simpler, but even added in, this entire service is <150 lines of code!
Security/Authz/Authn
Alerting/pagerduty integration (serverless-plugin-aws-alerts)
Secrets (deploy-environment supports credstash)
We’re missing some key components, but also have environments, local dev experience, effortless scaling
We get a B- right now, wouldn’t take much to get to an A+
3:30
Serverless is still really new, so lots of tools just don’t exist or aren’t as good yet, people don’t support it or don’t know it when you’re hiring, etc.
Often you become pretty locked in, because of how much you’re relying on
Moving target, things are changing and you have less control
Long-running services, services with lots of state, services that require specific memory, security, or execution-time guarantees. And cold starts, if you always need consistent response times
Final thoughts
We put together a pretty compelling service in less than 45 minutes. That’s awesome!
3 takeaways:
Serverless is a mindset
It's possible to build a serverless service starting today and there are huge benefits
...There are pitfalls too, and we've walked through some of the things to be careful with.
What about logging?
Tailing the logs is OK if we’re looking at just a single function. Not good in a whole system
What are logs? Well, they’re streams of data output by our lambda
Streams? Serverless! Lambda -> CloudWatch Logs -> triggered Lambda -> wherever
@ Doppler, we used Sumo Logic; there’s a plugin for that
Not going to write code for this, let’s look at the last part of our service