A presentation given by Andrew Carlson, Principal Field Architect at Apollo, at our 2024 Austin API Summit, March 12-13.
Session Description: GraphQL is more than just a way to let client teams ship features faster or for backend teams to reuse their existing services efficiently. When used as a layer to aggregate and orchestrate existing APIs, it’s an ideal location in our architecture to centralize access control and authorization down to the field level, providing field-level observability into which clients request what data. Learn different ways of measuring the tradeoffs between authorization at each layer in the stack, and how to get column-level observability into who is requesting what data.
18. Applying policies closest to the data
Aggregation
API Gateway Services
❌ Flexible
❌ Granular
Persistence
19. Applying policies closest to the data
Aggregation
API Gateway Services
❌ Flexible
❌ Granular
Persistence
Trade-offs:
1. General roles
2. Broad rules, eg: table-level
3. Rarely the granularity we need for consumer-facing apps
20. Applying policies closest to the user
✔️ Flexible
❌ Granular
Aggregation
API Gateway Services Persistence
21. Applying policies closest to the user
✔️ Flexible
❌ Granular
Aggregation
API Gateway Services Persistence
Trade-offs:
1. Broad policies
2. Allow or reject entire services
22. Applying policies in individual services
❌ Flexible
✔️ Granular
Aggregation
API Gateway Services Persistence
23. Applying policies in individual services
❌ Flexible
✔️ Granular
Aggregation
API Gateway Services Persistence
Trade-offs:
1. Bespoke logic per service
2. Auth “sprawl”
3. New code, tests, and deployments
24. Only securing the Persistence boundary
Flexibility
(how easy is it to update?)
Granularity
(how specific can we be?)
Persistence
25. Only securing the API Gateway boundary
Flexibility
(how easy is it to update?)
Granularity
(how specific can we be?)
Persistence API Gateway
26. Only securing the Service-level boundary
Flexibility
(how easy is it to update?)
Granularity
(how specific can we be?)
Persistence API Gateway
Services
45. A GraphQL boundary is flexible and granular
Flexibility
(how easy is it to update?)
Granularity
(how specific can we be?)
Persistence API Gateway
GraphQL
Services
46. 1000+ API architects & engineers building better together
champions@apollographql.com
Editor's Notes
Hey everyone, it’s nice to be here today! My name is Andrew Carlson. I’m the Principal Field Architect at Apollo, and prior to that I spent over a decade helping companies like Ford, Sherwin-Williams, and Unilever design and execute on digital transformations. I care a lot about intentional design. Everything we ship has an architecture. Everything is designed. All we get is the choice about whether we want to be intentional about those designs, or not. I recommend that we are.
Today we’re going to chat about a very important topic, securing the data in our applications. But we can’t talk about security without talking about:
[next slide]
Boundaries.
The boundaries we set in our personal lives help us make decisions about what we spend our time on and keep us connected to our core values. Without them, we can become lost or waste our most limited resource: time.
The boundaries that we have in our architectures help us make intentional decisions about the best places to do things like protect our organization’s most valuable digital resource: data.
[next slide]
Data is the lifeblood of our organizations, and protecting that data is expensive. So expensive, and so valuable, that by 2027, companies are forecast to spend 12.3 billion dollars securing and protecting that data. That’s a lot of BBQ. In fact, I did the math. That’s over 350 million pounds of brisket from Terry Black’s (which you should definitely try while you’re in Austin, by the way).
The data security market is so big in part because our applications are complex, and that can make them harder to secure.
[next slide]
And the complexity in our tech stacks can be a result of any number of things. New clients that we need to support, mergers and acquisitions, and new technologies that are gradually adopted anywhere in the stack. Say you get a mandate from your C-Suite – you need to explore vector databases and figure out how to use them in your stack. Well, that’s a new technology for your persistence layer.
But even though we can neatly label different tiers here, it’s not really that neat in reality.
[next slide]
I mean look at this. As soon as you start hooking those clients up to different aggregation services and pulling data across your organization it can turn into a mess pretty quickly. This lack of clean boundaries, especially between the presentation and application tier, can make it challenging to secure the access to our data.
And when we need to go in and manually secure APIs, databases, and even BFFs, it can be incredibly difficult to evolve and audit what is secured, where, and from whom.
[next slide]
So some very smart people across the industry came together and asked a simple question: rather than implementing one-off access control and security in various services and databases, what if you could apply the declarative style that we know and love from infrastructure as code to security policies at different layers of our stack?
[next slide]
And the rest is history. From that point forward we’ve seen a rise in the governance-as-code toolchain, with excellent tooling like Casbin, Open Policy Agent (now a graduated CNCF project), Sentinel, Cedar, Zanzibar, and more.
All great options for applying declarative security to our stacks.
[next slide]
And these tools, this pattern, are so powerful because instead of going in and manually changing a string here and a credential there, we can manage and audit our security policies just like we do code. And we all know what the alternative is: hoping you have the right credentials to the right database and don’t fat-finger something along the way.
That sounds an awful lot like cowboy coding to me, which doesn’t sound very intentional. I think we can do better!
[next slide]
So declarative policies are great. We can make edits easily, audit them, and ensure they get the same rigor as other areas of our code base. But where should we apply these policies? The persistence layer, where our data rests?
[next slide]
The services tier, where we apply the bulk of our business logic?
[next slide]
Maybe the aggregation tier?
Well…
[next slide]
It depends.
[next slide]
Just kidding, you didn’t come all this way for a consultant’s answer.
The real answer is, almost everywhere.
[next slide]
This is also sometimes called “Defense in Depth.” Most orgs will benefit from this defense-in-depth motion, securing each layer of the stack, left to right.
But we’re here to talk about intentional design, right? After all, software architecture is about making principled decisions about things that are hard to change later. And this is one of those things. We want to be sure we’re applying policies in the right place.
So with policy-as-code specifically, how can we reason about the different boundaries that we can attach these policies to?
[next slide]
Well, when in doubt, I like to draw a matrix. In this case we should measure at least two things: flexibility on the x-axis (how easy is it to update?) and granularity on the y-axis (how specific can we be?). And we can call this the authorization versatility by location.
And our goal here is to evaluate the value and trade-offs that we can get from securing different layers in our stack. We can then layer on the topology of our architecture against this matrix to find the benefit of applying authorization policies, whether declaratively through a tool like Open Policy Agent, or even imperatively by hardcoding.
[next slide]
So let’s take a step back and simplify what we were looking at before, and just talk about the topology of our architecture. We can drop the Presentation layer, push it right off to the left of the screen, because we’re talking APIs and, well, we should never trust the client.
So that leaves us, left to right, with the API Gateway, Aggregation, Services or Application, and Data.
[next slide]
Let’s go from the outside in. We’ll start by evaluating applying policies closest to the data. We’ll consider a database for brevity, but it can be any type of persistent storage, whether S3, PostgreSQL, or a data warehouse wrapping a database, like Databricks wrapping MySQL.
As the system of record for the data, applying data access policies at the storage layer is a logical place to start.
[next slide]
However, the trade-off of only using security policies at the data storage layer is that they trend towards general roles, whether that’s through a username and password, certificate, LDAP, or other authentication protocol. These are enforced directly at the storage layer, such as through PostgreSQL Authentication, authorizing table-level access at best.
This type of low-level access rarely represents the permissions and scopes we want and need for a consumer-facing application.
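To make the table-level limitation concrete, here is a minimal Python sketch of role-based grants at a storage layer. The role names, table names, and the `can_read` helper are all hypothetical and not from the talk; a real database would enforce this through its own grant system.

```python
# Hypothetical table-level grants: each role maps to the tables it may read.
TABLE_GRANTS = {
    "reporting_role": {"orders", "locations"},
    "app_role": {"orders", "locations", "users"},
}


def can_read(role: str, table: str) -> bool:
    """Return True if the role has been granted read access to the whole table."""
    return table in TABLE_GRANTS.get(role, set())


# In this sketch the decision is all-or-nothing per table. Nothing here can say
# "this particular end user may see users.name but not users.email".
print(can_read("app_role", "users"))        # True
print(can_read("reporting_role", "users"))  # False
```

The roles describe systems connecting to the store, not the end users of a consumer-facing app, which is the gap the notes above point at.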
[next slide]
So let’s move to the other side of the spectrum. We just talked about the layer closest to the data, now let’s talk closest to the user.
We can do this either service-by-service, or more commonly, through an API gateway like Kong or AWS API Gateway. Using policies at the API Gateway level can be a boon because we can fine-tune them to business requirements more comprehensively than at the database level.
[next slide]
However, policies and authorization implemented at the gateway level using a tool like Kong are still broadly enforced an entire service at a time, rejecting an entire request if the policy agent identifies an unmatched rule.
So for example if you have a User service you can block access to the entire request, but can’t guard individual attributes within the request body as easily.
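The all-or-nothing behavior described here can be sketched in a few lines of Python. This is not how Kong or any specific gateway is configured; the route prefixes, scope names, and `gateway_decision` function are assumptions made up for illustration.

```python
# Hypothetical gateway rules: each route prefix maps to the scope a caller needs.
ROUTE_SCOPES = {
    "/users": "users:read",
    "/locations": "locations:read",
}


def gateway_decision(path: str, caller_scopes: set[str]) -> int:
    """Return an HTTP status: the whole request is either forwarded or rejected."""
    for prefix, required in ROUTE_SCOPES.items():
        if path.startswith(prefix):
            return 200 if required in caller_scopes else 403
    return 404  # no matching route


print(gateway_decision("/users/42", {"users:read"}))      # 200
print(gateway_decision("/users/42", {"locations:read"}))  # 403
```

The function only ever sees the route, so it has no way to let the request through while hiding a single attribute in the response body.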
[next slide]
So next up, if we apply policies on a service-by-service basis, we can increase the granularity of the data we protect.
[next slide]
However, we must write unique and bespoke logic in each service, decreasing the flexibility of adjustments because every change will require new code, tests, and deployments, and we end up with policies all over the place.
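As a rough illustration of that sprawl, here are two hypothetical services, each hard-coding its own rule in its own style. The function names, roles, and record shapes are invented for this sketch.

```python
def get_user(requester_role: str, user: dict) -> dict:
    """Users service: hides the email field unless the requester is an admin."""
    if requester_role != "admin":
        return {key: value for key, value in user.items() if key != "email"}
    return user


def get_order(requester_id: str, order: dict) -> dict:
    """Orders service: a different rule, keyed on ownership rather than role."""
    if order["owner_id"] != requester_id:
        raise PermissionError("not the order owner")
    return order


print(get_user("viewer", {"name": "Alice", "email": "alice@example.com"}))
# {'name': 'Alice'}
```

Both checks are granular, but changing either rule means editing, testing, and redeploying that service, and an auditor has to read every service to learn what is protected.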
[next slide]
So let’s go back to our matrix and take a look at these three options.
If we only secure the persistence boundary, it’s not particularly granular, and really not all that flexible.
[next slide]
Only securing the API Gateway is fairly flexible. You can secure big swaths of fields that may cross different tables, but it’s not very granular.
[next slide]
Securing individual services can be granular, but not super flexible. You need to get in the weeds and manually adjust policies within each individual service.
And you all know where I’m going with this. One layer left to explore… but before I get there, let’s talk about GraphQL. This is a part of the GraphQL track after all.
[next slide]
Now you may be thinking, as I used to, that GraphQL is just another option for our APIs… It’s REST vs. GraphQL or GraphQL vs. RPC… In 2012 that may have been the case, but over the last decade there’s been a lot of evolution in the space.
After GraphQL was open-sourced in 2015, it saw rapid adoption, and we started running into the side effects of running GraphQL at scale. Schema stitching hit the scene in 2017, and then new architectural patterns like federated GraphQL emerged and have taken GraphQL by storm. Just a few short years later, companies like Netflix, MLB, and others are deploying highly available GraphQL at scale using Apollo Router.
[next slide]
The reason for that is that federated architectures enable teams to deliver GraphQL’s benefits at a greater scale, transforming it from just another API to a layer in a stack that sits on top of existing services.
This graph of graphs provides access to any number of services with a single endpoint. It also enables teams to share entities and domain models across those subgraphs. Rather than exposing a sprawl of backends-for-frontends (BFFs) or experience APIs, service teams gain a central, unified boundary. Remember, securing our applications is all about ensuring we have adequate boundaries…
[next slide]
GraphQL, especially when implemented as a federated architecture, offers a unique opportunity to apply query and even column-level authorization and access policies within a single request that can span any service and any database.
In a federated GraphQL architecture, teams can maintain their own individual APIs. This pattern provides the simplicity of a GraphQL monolith for client teams but the modularity of a more decoupled approach for service teams.
This supergraph — a graph of graphs — orchestrates these services to provide a central access point for data while retaining field-level granularity.
Of course, we can apply policies in GraphQL, even without federation, but federation provides a boundary in an architecture that amplifies the benefits of declarative authorization.
[next slide]
Ok, we’ve been talking a lot about why it’s such a helpful boundary, but let’s dig into the how a little bit.
What are the ways we can apply these declarative authorization policies in our GraphQL layer?
Applying declarative policies in GraphQL is a nascent space, but it has tremendous potential upside thanks to the flexibility and granularity we can gain in our security posture.
There are generally two ways this can be done today:
[next slide]
Manually in each resolver, or centralized in the schema with a custom directive. Each has benefits and tradeoffs.
Applying policies one by one in resolvers (for example, creating policy bundles that allow fine-grained, context-aware policies with OPA) is simpler: just update the resolvers! But it’s not very flexible. Honestly, it’s fairly similar to the service-by-service data-access process.
[next slide]
Another option for applying policies is by customizing our schemas directly. Declaring policies in our schemas requires custom directives.
Custom directives are an advanced GraphQL feature, but they are the most declarative and clean way of applying these rules. Some emerging products on the market offer pre-built custom directives that reduce some of the complexity of building and maintaining them.
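One way to picture schema-declared policies is as metadata attached to each field, which a single piece of code then enforces. The Python sketch below models that metadata as a plain dictionary rather than real GraphQL directives; the field names, policy names, and `allowed_fields` helper are hypothetical.

```python
# Hypothetical schema annotations: "Type.field" -> policy required to read it.
# In a real schema these would be directives attached to the field definitions.
FIELD_POLICIES = {
    "Location.name": None,  # public field, no policy declared
    "Location.reviews": "reviews:read",
    "User.email": "pii:read",
}


def allowed_fields(requested: list[str], granted: set[str]) -> list[str]:
    """Keep only the requested fields whose declared policy the caller satisfies.

    Fields with no annotation are treated as unrestricted, the same way an
    undecorated schema field would be.
    """
    result = []
    for field in requested:
        policy = FIELD_POLICIES.get(field)
        if policy is None or policy in granted:
            result.append(field)
    return result


print(allowed_fields(["Location.name", "User.email"], {"reviews:read"}))
# ['Location.name']
```

Because the rules live in one declarative table, tightening access to a field is a one-line change that can be reviewed and audited like any other code.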
[next slide]
A GraphQL layer can be a Goldilocks zone in our architecture because it is possible to apply broad rules (for example, to entire services or groups of services) as well as granular rules inside a query.
By applying these policies declaratively at this level, we can define granular and flexible authorization and even design for more complex patterns like returning partial responses (returning data that a user can access, and an error for requested data they don’t have permission to retrieve).
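The partial-response pattern can be sketched as follows. The response shape follows the GraphQL convention of a `data` object plus an `errors` list with `message` and `path` entries; the `resolve_partial` function, policy names, and sample record are assumptions for illustration only.

```python
def resolve_partial(requested: dict, granted: set[str], policies: dict) -> dict:
    """Build a GraphQL-style response: permitted fields keep their values,
    denied fields come back as null with a matching entry in `errors`."""
    data, errors = {}, []
    for field, value in requested.items():
        policy = policies.get(field)
        if policy is None or policy in granted:
            data[field] = value
        else:
            data[field] = None
            errors.append({"message": f"Not authorized for {field}", "path": [field]})
    response = {"data": data}
    if errors:
        response["errors"] = errors
    return response


policies = {"email": "pii:read"}  # hypothetical field-level policy
record = {"name": "Alice", "email": "alice@example.com"}
print(resolve_partial(record, set(), policies))
```

A caller without the `pii:read` policy still gets the name back, along with an error explaining why the email is missing, instead of having the whole request rejected.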
[next slide]
Ok, enough talking. Let’s make this real. Let’s take a look at a demo of applying declarative authorization policies directly within your GraphQL schemas.
[next slide]
Now, as you can imagine, demoing authorization takes a few tools and I only have a couple minutes left so I’ve trimmed this down as much as I could, and I’ve tried to use common tools wherever possible.
We’ll use Postman to issue our GraphQL queries, Open Policy Agent to evaluate our policies and allow or reject requests, VS Code to add policies to our schemas, JWTs (JSON Web Tokens) to contain authorization details about our request, and Apollo Router to compose and federate our GraphQL subgraphs.
And we’ll look at three examples:
[next slide]
Finally, before I click play on the demo, JWTs are really hard to demo. They’re a base64-encoded string that is impossible to read by eye. This is the JWT I’m using for this demo. If the username is “Alice,” they have permission to query the “Locations” subgraph. If they don’t have that username, then they should be blocked.
Again, simplified for the purposes of this demo, but you can imagine a JWT with a much larger or more complex payload than this.
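Although a JWT is unreadable by eye, its payload is just base64url-encoded JSON and can be inspected with the Python standard library. The sketch below builds an unsigned demo token and decodes it; the claim name `username` and the helper names are assumptions, and no signature is verified, so this is for inspection only and not for making authorization decisions.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT WITHOUT verifying the signature."""
    payload_segment = token.split(".")[1]
    padding = "=" * (-len(payload_segment) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


# Build an unsigned demo token; the "username" claim is a stand-in for the demo payload.
header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"username": "Alice"}).encode())
token = f"{header}.{payload}.not-a-real-signature"

print(decode_payload(token))  # {'username': 'Alice'}
```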
[next slide]
The first is a fully unauthorized flow. In this example the client will pass a bad JWT, or even no JWT at all, in the GraphQL request. The Apollo Router will coordinate with OPA and reject all requests to the subgraphs.
[next slide]
The next example is an authorized flow. In this instance, the client will issue a valid JWT in the bearer token and the Open Policy Agent will return a successful response to the router, allowing the subgraph requests to complete.
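The decision behind these first two flows can be approximated in plain Python. This is a stand-in for the policy, not OPA’s or Apollo Router’s actual API; the `authorize` function, the `username` claim name, and the default-deny behavior for other subgraphs are assumptions.

```python
from typing import Optional


def authorize(claims: Optional[dict], subgraph: str) -> bool:
    """Stand-in policy decision: only Alice may query the Locations subgraph."""
    if claims is None:  # bad or missing JWT: the unauthorized flow
        return False
    if subgraph == "Locations":
        return claims.get("username") == "Alice"
    return False  # assumed default: deny anything the policy does not mention


print(authorize(None, "Locations"))                   # False (unauthorized flow)
print(authorize({"username": "Alice"}, "Locations"))  # True  (authorized flow)
print(authorize({"username": "Bob"}, "Locations"))    # False
```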
[next slide]
And because I don’t trust the demo gods, I created this quick recording to walk through those three scenarios.
Unauthorized
Authorized
Partially authorized
And the third example is one that I personally find most compelling. Partial authorization. This one is very interesting because we can declaratively authorize _part_ of a request and still return the data the user has access to!
So in this example, the client will pass a JWT that satisfies part of a request. The router will continue brokering with Open Policy Agent and then will intelligently issue requests to the subgraphs and fields that are satisfied by the declarative policies we’ve set.
I think this is super cool.
[next slide]
A strong security strategy requires a plan for every layer of our stack, and applying policies in GraphQL can give us flexibility and granularity that we haven’t seen before.
By building on the rules we’ve already applied at our persistence layers and API gateways to include authorization policies in GraphQL, we can use it as a centralized boundary for implementing nuanced, field-level access control and authorization.
If you’d like to learn more, we’d love to have you!
We have a community of more than 1,000 API engineers and architects from around the world sharing their experience transforming their stacks, finding ways to manage API sprawl, and establishing useful boundaries in their architectures.
Come join us!