I’ll let you in on a secret. There are servers. Lots of servers. And networking gear. Storage too. We just don’t want to have to think about them anymore.
In this session we’ll learn how to build, from the ground up, a scalable, reliable app without once mentioning virtual machines, load balancers, volumes, auto-scaling groups (or even containers that much).
A Functions-as-a-Service (FaaS) platform is the beating heart of a serverless architecture.
A function is a simple piece of code that does one job, well. It takes some input (usually on standard in) and, optionally, writes some output, usually to standard out. We are not talking about functions in the ‘functional programming’ sense here as these functions can have side-effects – rather they are self-contained units of work.
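To make that concrete, here is a minimal sketch of a function in this sense, written against plain streams. The class name ShoutFunction and its upper-casing job are made up for illustration; nothing here is Fn-specific.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical example: one small job (upper-casing text), input in, output out.
class ShoutFunction {

    // Reads all of the input, upper-cases it, and writes the result.
    static void handle(InputStream in, OutputStream out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        String input = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        String output = input.trim().toUpperCase(Locale.ROOT);
        out.write(output.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Convenience wrapper so the function can be tried on a plain string.
    static String handle(String input) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            handle(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Run as a process, the entry point would simply call handle(System.in, System.out).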
Your functions are deployed as a single unit to a Functions-as-a-Service platform.
This platform then deals with the provisioning of the underlying infrastructure; deploying your function code; scaling up and down; resilience and reliability; billing; security (authentication, authorization, isolation). And it has to do all this blazingly fast; at huge scale and for any language / platform that you care to use.
But a FaaS on its own is not sufficient. You need somewhere to store and process state (check out the State Service talks this week for some help with that). You also need a way to stitch together your small, independent functions into a coherent, fault-tolerant whole. More on that later.
Your FaaS platform will enable your app to scale organically per-request without you having to write any special code to handle it. And not just the gentle seasonal changes in demand that a retail business might experience but also sudden, dramatic surges in demand caused by your app going viral.
In addition to scaling, your FaaS will take care of secure isolation of your functions too. Here again, keeping things small helps us write better software: each isolated execution context can be given only the secrets that it requires. This greatly reduces the blast radius of any software vulnerability.
The primary focus of a serverless platform is your code. The FaaS encourages you to break up your app into small, isolated parts. This is great for dev as small, isolated functions are easier to reason about and manage.
Serverless changes the economics of computing too.
The two primary principles are:
- you don’t pay while your code’s not running.
- when you do pay, you pay in sub-second increments for the resources consumed: typically 100ms billing increments, priced by the RAM allocated.
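The arithmetic behind that second principle can be sketched as below. This assumes a run is rounded up to whole 100ms increments and metered in GB-seconds of allocated RAM; the rounding rule and the unit are assumptions for illustration, and real platforms differ in the details.

```java
// Illustrative billing arithmetic; not any particular provider's pricing model.
class BillingSketch {

    // Rounds a run time up to whole 100 ms billing increments.
    static long billedIncrements(long durationMillis) {
        return (durationMillis + 99) / 100;
    }

    // Metered usage in GB-seconds: increments * 0.1 s * allocated memory in GB.
    static double gbSeconds(long durationMillis, int memoryMb) {
        return billedIncrements(durationMillis) * 0.1 * (memoryMb / 1024.0);
    }
}
```

A 230ms run with 512MB allocated is billed as three increments, roughly 0.15 GB-seconds. A function that is never called is billed nothing.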
There are secondary effects too:
- Developers can focus on writing code to improve the lives of their customers instead of managing infrastructure and software lifecycle. More productive developers = happier developers = more productive developers.
- Your FaaS can pack workloads much more densely than just using VMs alone. This enables better utilization of hardware and therefore better price/performance.
- Complexity is moved into the platform and handled for you. For example the auto-scaling nature of a FaaS removes the need for developers to write lots of complex monitoring, metrics and provisioning code.
The combination of sub-second resolution billing, seamless scaling and small, reusable functions will change the way we think about development. As tools and practices catch up to working at this fine-grained resolution, the financial impact of poorly written, badly behaved code will become immediately obvious and developers will stop using costly libraries and functions. This has the potential to have a massive positive impact on the maturity of software engineering as a discipline.
Fn is an open source Function-as-a-Service platform. It was announced by Oracle today/this week. You can download this now and use it for real.
Yes, really.
It’s an Apache 2.0 licensed project.
It’s not under the Oracle brand or Oracle github organisation.
We are actively seeking outside collaborators.
We have teams in the US and UK committed to working in the open on this project.
Not ‘open core’. We are not keeping anything back for ourselves – what you see in the github is what we’ll be running in our cloud service.
Why? - Being open allows us to provide a much better developer experience.
Not only can you get and read the source so you know exactly what’s going on under the hood (should you need to)…
…but you can contribute to shape the project to better meet your needs.
But mainly because it enables an awesome, frictionless local developer experience that is difficult or impossible with closed, proprietary solutions.
Best-in-class support for Java with more innovation to come.
A Java “function development kit” (FDK) which speeds up the development of Java functions.
Unit testing support in the form of a JUnit rule to accurately simulate the runtime environment.
Access to configuration and secrets provided by the fn platform.
Type coercions to make parsing and formatting your input and output as easy, and safe, as possible.
A polyglot platform with out-of-the-box support for >15 stacks. But you can always drop back to the Docker/container layer if you need to run something that we don’t support. This offers more flexibility than most other serverless offerings today.
Fn Flow is a service that sits alongside the Fn FaaS that addresses the workflow problem. It is open source too. More on this later.
1. We are a travel agent. We have partnered with 4 different 3rd parties. One for booking flights, one for booking hotels, another for booking car rentals and finally someone to manage sending emails on our behalf. We have selected the best-of-breed company in each sector. Unfortunately this means that we have a heterogeneous tech stack to deal with.
2. The flight partner provided us with a brilliant and easy to use Java SDK. The hotel partner insisted that they would only supply us with a Ruby SDK. The car rental firm releases updates and patches to its Node.js SDK before any others and the email provider would prefer we interact with their service through the provided Python client.
3. We are therefore going to write a function for each provider, to book and to cancel. This way we not only get to use the most appropriate tech stack but we get security benefits too as we can segregate runtime secrets required for authenticating to the different providers.
4. Finally we want to create a trip function that our app can call to reliably book a trip that consists of a flight, a hotel and a car rental booking.
1. In the first part of the demo we’re going to develop our first function, in Java, to book a flight.
[fn start] Easy to start a local dev server
[fn init] Easy to create a new function
[Show func.yaml] cmd tells Fn which Java method to call when there's work to do. path allows us to control the URI that this function binds to.
[Show boilerplate] It's a simple, plain Java program. No framework to learn and wrestle.
[Show test and run] JUnit support provides a comprehensive simulation of the environment that your function will run in when deployed. Thus you can get a high degree of confidence that your function is correct before it leaves your laptop.
[fn app create] apps are namespaces for functions. We need an app in order to...
[fn deploy] builds your function in a Docker container so that you have repeatable builds, then deploys your function to the Fn service
[fn call] calls the function at the URI that you just deployed to
[curl] fn includes a lightweight HTTP gateway. Fn invocations are just HTTP calls to the URL for your function.
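For anyone reading these notes without the demo, the boilerplate is roughly a class like the sketch below. The class and method names are illustrative, and the code that fn init actually generates may differ; the point is only that there is no framework type in sight.

```java
// Illustrative only: an ordinary class with an ordinary method.
// The platform is told (via the cmd entry in func.yaml) which method to call.
class HelloFunction {

    // Takes the request body as a String and returns the response body.
    public String handleRequest(String input) {
        String name = (input == null || input.isEmpty()) ? "world" : input;
        return "Hello, " + name + "!";
    }
}
```

Because it is plain Java, the method can be called directly from any test or main method without starting a server.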
There’s nothing more productivity-sapping to a developer than having to wait a long time between writing code and getting feedback on whether it works. That used to be compile time, then waiting for tests or CI to run but today it’s waiting for your code to upload and deploy to the cloud. For a complex app this can be very time-consuming and will cause you to lose focus and lose flow. By offering a superb local development experience we can get that feedback cycle down from minutes to seconds and keep you in the zone.
1. The Fn Java FDK (function development kit) provides a high-fidelity, JUnit-compatible testing environment so that you can get a high degree of confidence in your code before you deploy it (even locally).
2. Fn is open source so you can run the exact same server on your laptop that will be running in the cloud service that you deploy to.
3. Cloud deployment takes advantage of Docker image layers to minimize the amount of data transferred for new deploys.
Plain old Java. No complicated new framework to learn, just write Java as you have always done.
Create some POJOs for request, response
Show updated test
Show sample request JSON
fn deploy
cat request | fn call
Curl again??
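The request and response POJOs from this step might look like the sketch below. The field names (flightCode, departureTime, confirmation) and the placeholder booking logic are assumptions for illustration, not the demo's actual code.

```java
// Hypothetical shapes for the flight booking function's input and output.
class FlightBookingSketch {

    // Request POJO: public fields keep the sketch short; getters and setters work too.
    public static class FlightBookingRequest {
        public String flightCode;
        public String departureTime;
    }

    // Response POJO carrying whatever the caller needs to know afterwards.
    public static class FlightBookingResponse {
        public final String confirmation;

        public FlightBookingResponse(String confirmation) {
            this.confirmation = confirmation;
        }
    }

    // The function itself: typed input in, typed output out.
    // Placeholder logic only; a real implementation would call the airline here.
    public FlightBookingResponse book(FlightBookingRequest request) {
        return new FlightBookingResponse("PENDING-" + request.flightCode);
    }
}
```

The function signature now uses these types rather than raw strings, which is where the FDK's type coercion earns its keep.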
OK, so there is a helper library that we call the FDK. We aim to keep it as unobtrusive as possible but there will always be times when you want to customize its behaviour.
We use Jackson to do serialization to and from JSON which gives sensible defaults. You can also use the full power of Jackson either programmatically or via the annotations to customize the serialization as you need.
Or write your own input or output handler to take care of that exotic wire format that your client insists on keeping backwards compatibility for. Or add protobuf support.
[Import Airline SDK package]
[Create configuration method] Fn supports setting configuration at the function or app level. Can set in func.yaml or via the API/CLI
[instantiate Airline SDK]
[Change handleRequest method to call airline api]
[fn deploy]
[fn call] we have a confirmation number - where did that come from?
[Show fake API dashboard] Well the Airline SDK is not real. It's a fake. In fact it just informs this dashboard that it's been called and with what parameters so that we can see what's going on.
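The shape of those steps can be sketched without the FDK as below. A plain Map stands in for the configuration the platform supplies, AirlineClient is a hypothetical stand-in for the partner SDK, and AIRLINE_API_URL is a made-up key; the real FDK wiring looks different.

```java
import java.util.Map;

// Sketch only: configure once, then handle requests using the configured client.
class ConfiguredFlightFunction {

    // Hypothetical stand-in for the airline partner's SDK.
    interface AirlineClient {
        String book(String flightCode);
    }

    private final AirlineClient airline;

    // Reads what it needs from configuration and fails fast if it is missing.
    ConfiguredFlightFunction(Map<String, String> config) {
        String apiUrl = config.get("AIRLINE_API_URL"); // hypothetical key
        if (apiUrl == null) {
            throw new IllegalArgumentException("AIRLINE_API_URL is not configured");
        }
        // Fake client, in the spirit of the fake SDK used in the demo.
        this.airline = flightCode -> "CONF-" + flightCode + "@" + apiUrl;
    }

    // Delegates the booking to the configured client.
    String handleRequest(String flightCode) {
        return airline.book(flightCode);
    }
}
```

Keeping secrets and endpoints in configuration rather than in code is what lets each provider's function carry only its own credentials.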
2. In the second part of the demo we’re going to deploy the rest of the functions that make up our app so that we can start developing our trip function.
fn deploy --all --local
fn routes list
In the last part of the demo we’re going to develop the trip function that will reliably orchestrate work across the functions that we’ve just developed to book a trip for the user. It must do so reliably, confirming each reservation or cancelling them all. It must cope with downtime in our partner APIs as well as full flights, packed hotels and empty rental lots.
This workflow or composition problem is actually a general one that most serverless applications have once they reach some level of complexity. There are 2 naive approaches:
- Write a blocking master function that runs one function, waits for the result, then runs the next etc. This might be easy to understand but we lose a lot of the nice characteristics of a serverless app when we do this. This master function is long-running and consumes resources for the whole time that the individual functions are running. When the booking processes take minutes this can get expensive. It's also not very reliable: if this master function dies then we've got no easy way of recovering.
- Chain the next function by directly calling it. This ends up being a maintenance nightmare very quickly as all of a sudden each upstream function has to know about every downstream one so that it can gather and pass the right data down the chain. It's also very difficult to deal with errors. Fan-in / join is tricky.
So we get to the point of needing an external co-ordination service that you can configure to call your functions, pass data down the graph and deal with errors.
In the past this has meant BPEL or some other external workflow DSL. Today, the answer from other clouds amounts to the same thing, in JSON with a shiny visual programming front-end. This approach has several issues that prevented serious usage the first time round and still apply today.
<rapid click through next slides>
- Not every problem maps well to a state machine approach.
- Error handling becomes very tricky and causes an explosion of complexity.
- It’s difficult to test.
- It’s really difficult to debug.
- You have to manage and run the orchestration service.
- You end up creating and maintaining a lot of 'glue' code that extracts data from the response of one function and formats it for consumption by another.
- It's another language and associated tooling to learn.
Enter Fn Flow. It occurred to us that Java already has a mechanism for composing asynchronous functions. The CompletionStage API introduced in Java 8 and improved in Java 9 provides a really nice, type-safe promises-style API. This lets you compose a graph of asynchronous computations and specify concurrency, fan-out, fan-in and a bunch of other distributed programming primitives.
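As a reminder of what that looks like in the standard library, here is a small sketch using only java.util.concurrent. The priceOf helper and its made-up pricing rule are placeholders for real asynchronous work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Plain JDK CompletionStage composition; nothing here is Fn-specific.
class CompositionSketch {

    // Placeholder for asynchronous work; a real system would call a remote service.
    static CompletionStage<Integer> priceOf(String item) {
        return CompletableFuture.supplyAsync(() -> item.length() * 10);
    }

    // Fan-out: start two independent computations. Fan-in: combine their results.
    static CompletionStage<Integer> total(String a, String b) {
        CompletionStage<Integer> first = priceOf(a);
        CompletionStage<Integer> second = priceOf(b);
        return first.thenCombine(second, Integer::sum);
    }

    // Sequencing: the next computation depends on the previous one's result.
    static CompletionStage<String> describeTotal(String a, String b) {
        return total(a, b).thenCompose(sum ->
                CompletableFuture.supplyAsync(() -> "total=" + sum));
    }
}
```

The graph is described up front and the types line up at compile time, which is exactly the property we wanted for composing functions.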
What if we could use this API, or something very like it, in our favorite language (Java of course) to compose serverless functions?
So we wrote a service (that we call the completer) that can store and trigger the execution of these computation graphs.
This is part of the Fn platform and shares the same scalability and reliability properties. We open sourced this along with the rest of the Fn platform. This service backs an API that's almost exactly like CompletionStage that you can use in your Java function to reliably compose other functions. Let's see what it looks like…
Each function executes in a particular flow. You get access to that flow from Flows.currentFlow().
This object has a bunch of methods for adding work to your current execution graph. Here we’re going to use invokeFunction to call another function on the Fn platform. This can be in Java or any other language.
invokeFunction returns a FlowFuture which represents the potential future value that that function call returns.
FlowFuture itself then has some methods for chaining work onto the result of that computation. Now we’re just going to call the flight booking function.
Similarly to the flight booking function we’ve created a POJO that represents the input for a trip. It includes info about the flight, hotel and car. It corresponds to JSON that looks like this.
So when we deploy and call this function, and pass it that JSON, we can see a call to the flight booking fake.
So far so boring.
Next we’re going to use the thenCompose method to chain the hotel and car rental calls to the end of the flight call. thenCompose takes a lambda that takes the result of the previous call, in this case a flight booking confirmation, and returns a new FlowFuture that represents the remaining work to be done.
Then we will use the whenComplete method to send an email. whenComplete takes a lambda that takes two parameters, a result and an error. One will be null. We can use this to check the result of the computation, handle errors if necessary and take appropriate action. For now we’ll just send the user an email.
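The shape of that chain can be sketched with the JDK's CompletableFuture standing in for FlowFuture. The invoke helper, the function names and the outbox list are stand-ins invented for this sketch; in the demo the calls go through the flow API to real functions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// CompletableFuture stands in for FlowFuture; invoke() stands in for invokeFunction.
class TripSketch {

    // Pretends to call another function and returns its confirmation.
    static CompletableFuture<String> invoke(String functionName) {
        return CompletableFuture.supplyAsync(() -> functionName + "-confirmation");
    }

    // Books flight, then hotel, then car, then "sends" one email with the outcome.
    static CompletableFuture<String> bookTrip(List<String> outbox) {
        return invoke("flight")
                // Nested lambdas keep the earlier results in scope for later stages.
                .thenCompose(flight -> invoke("hotel")
                        .thenCompose(hotel -> invoke("car")
                                .thenApply(car -> flight + ", " + hotel + ", " + car)))
                // In this sketch, one of summary and error is null.
                .whenComplete((summary, error) -> {
                    if (error == null) {
                        outbox.add("Booked: " + summary);
                    } else {
                        outbox.add("Booking failed: " + error.getMessage());
                    }
                });
    }
}
```

The nesting is deliberate: the innermost stage can still see the flight and hotel confirmations without any glue code to pass them along.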
An important thing to note about this code is that because the results from the flight, hotel and car booking are still in scope when we send the email we can use these values to compose the email. In fact each stage of this function, which we see as a whole here, is run as several different invocations, possibly in different JVMs on different hosts. The platform deals with making sure that the right values are available to each call at the appropriate time. This preserves the ‘don’t pay for idle’ and ‘automatic scaling’ serverless properties.
When we deploy and call this function we can see that we get a call to the flight, hotel and car providers and we then send an email with the results of each of those calls. This demonstrates fan out and fan in.
OK so what if one of our providers is unable to handle one of our booking requests? Let’s say, for example, that our car provider has run out of cars in between our user searching for a car and submitting the booking request. But by this time we’ve already booked a flight and a hotel! We need to introduce a ‘compensating transaction’ that will cancel the hotel and flight in the case that we get an error from the car provider. In fact we need to cancel the car too as we don’t know if the error happened before or after the car booking was committed.
Introducing the exceptionallyCompose method on the FlowFuture. This lets us add new work to the computation graph if an error occurred in a preceding stage. You could use this to return a default value, retry the call, alert an operator or whatever else makes sense for your app. In this case, by inserting the exceptionallyCompose in the right place in the graph we can use it to implement the compensating transactions. This is an implementation of the saga pattern.
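Here is a simplified sketch of the idea using JDK futures. recoverWith is a hand-rolled stand-in for exceptionallyCompose, and invoke and the function names are invented. Unlike the demo, the sketch hangs one recovery stage off the end of the chain, so it cancels everything whichever step failed, and it assumes a cancel is harmless for a booking that never happened.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Saga sketch: book in sequence; on any failure run the compensating cancellations.
class SagaSketch {
    final List<String> calls = new CopyOnWriteArrayList<>();
    final boolean carFails;

    SagaSketch(boolean carFails) {
        this.carFails = carFails;
    }

    // Stand-in for calling another function; records the call for inspection.
    CompletableFuture<String> invoke(String name) {
        return CompletableFuture.supplyAsync(() -> {
            calls.add(name);
            if (carFails && name.equals("car/book")) {
                throw new IllegalStateException("no cars available");
            }
            return name + ":ok";
        });
    }

    // Stand-in for exceptionallyCompose: on failure, continue with recovery work.
    static <T> CompletableFuture<T> recoverWith(
            CompletableFuture<T> stage,
            Function<Throwable, CompletableFuture<T>> recovery) {
        CompletableFuture<CompletableFuture<T>> nested = stage.handle((value, error) -> {
            if (error == null) {
                return CompletableFuture.completedFuture(value);
            }
            return recovery.apply(error);
        });
        return nested.thenCompose(f -> f);
    }

    CompletableFuture<String> bookTrip() {
        CompletableFuture<String> booking = invoke("flight/book")
                .thenCompose(flight -> invoke("hotel/book"))
                .thenCompose(hotel -> invoke("car/book"))
                .thenApply(car -> "booked");
        // Compensating transactions: undo in reverse order of booking.
        return recoverWith(booking, error -> invoke("car/cancel")
                .thenCompose(x -> invoke("hotel/cancel"))
                .thenCompose(x -> invoke("flight/cancel"))
                .thenApply(x -> "cancelled"));
    }
}
```

In the happy case the recovery stage is never run, so the flow behaves exactly as it did before the error handling was added.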
So when we deploy and call this function, in the happy case, we can see that it behaves the same as before.
We can now configure the fake SDK to have the car provider return an error. Now when we run the function we can see that we successfully book the flight and hotel, and fail to book the car. Then we will cancel the car, hotel and flight and finally send an email to the user telling them that we’ve failed.
At this point I’d like to introduce the flow UI. This is an experimental UI that we can use to explore what is happening during a particular flow invocation. The x-axis is time from left to right and the y axis is roughly dependency order. In this flow you can see the main function in the top left and the flight and hotel bookings here. Here is the failed car booking. Clicking on this node highlights the ancestors of this node, that is all the stages that caused this one to run. And below we can see logs from all of those stages including the error. This is super useful for understanding what’s going on during a flow execution.
Finally, what would happen if, instead of failing to book a car, the car provider was down entirely? We can simulate this in our fake API dashboard by setting the car cancel function to return an error too.
Ideally we’d like to retry the cancellation until either we succeed or we exceed the maximum retry count.
So we have this retry helper that we can wrap our cancel function calls in. This is interesting because we’re just using primitives supplied by Fn Flow, in this case delay and thenCompose to implement the retry. But we’ve encapsulated this in a way that keeps our main flow understandable and enables us to reuse this pattern elsewhere.
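A retry helper in that style can be sketched with JDK futures as below. The delay method is built on a scheduler here purely so the sketch runs anywhere; in the demo the delay is a primitive supplied by the platform. The names and the fixed delay between attempts are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Retry built from two primitives only: delay and thenCompose.
class RetrySketch {

    // Daemon scheduler standing in for the platform's delay primitive.
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "retry-delay");
                t.setDaemon(true);
                return t;
            });

    // Completes after the given delay; no thread is blocked while waiting.
    static CompletableFuture<Void> delay(long millis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        SCHEDULER.schedule(() -> { done.complete(null); }, millis, TimeUnit.MILLISECONDS);
        return done;
    }

    // Runs the action; on failure waits and tries again, up to maxAttempts in total.
    static <T> CompletableFuture<T> retry(Supplier<CompletableFuture<T>> action,
                                          int maxAttempts, long delayMillis) {
        CompletableFuture<CompletableFuture<T>> attempt =
                action.get().handle((value, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(value);
                    }
                    if (maxAttempts <= 1) {
                        // Out of attempts: pass the last failure on to the caller.
                        CompletableFuture<T> failed = new CompletableFuture<>();
                        failed.completeExceptionally(error);
                        return failed;
                    }
                    return delay(delayMillis).thenCompose(
                            ignored -> retry(action, maxAttempts - 1, delayMillis));
                });
        return attempt.thenCompose(f -> f);
    }
}
```

The main flow then just wraps each cancel call in retry(...) and stays readable.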
When we run this function now we can see the flow retrying the cancellation of the car rental.
In the flow UI we can see that we’re not incurring any cost when there’s no work happening, for example in between the retries.
Out of time to show you the unit testing support.
OK, so let’s just recap what we’ve announced today and what it enables you to do.
Develop small, reusable functions in Java and have the platform deal with scale, deployment and all the infrastructure concerns
Develop functions in your favourite language or use custom or pre-existing Docker containers if none of the out-of-the box runtimes meet your needs.
Open and extensible is better for your productivity.
Use Fn Flow’s distributed programming primitives to build scalable, reliable serverless apps. This enables you to reason about, and test, your whole app in one place. Do this by using a familiar API in your favorite language (still Java) to orchestrate work. No more ‘programming’ in large globs of JSON. Or XML <shudder>
You can then use the first-class unit testing support to check that it will do the right thing. And you can use the local dev server to get that extra confidence.
Built-in error-handling constructs help you write apps that are fault-tolerant. And per-request scaling from the Fn platform takes the pain out of scaling.