Containing the Database
Nick Stott
Platform Engineer, Compose
Nick Stott
Platform Engineer @
Compose
Specialties: Containers and the databases
that go in them.
Compose – Database as a
Service
• We already host MongoDB, PostgreSQL, Redis,
Elasticsearch, RethinkDB, etcd and RabbitMQ
• Today, we’re announcing ScyllaDB on Compose
§ On Compose Enterprise today
§ On Compose generally very soon
How Compose
Got There
Compose – Phase 1
• Founded as MongoHQ in 2009
§ The first hosted MongoDB-as-a-service company
§ Y Combinator class of 2011
• Early adopters of containers
§ Containers helped build the infrastructure to offer DBaaS
Early problems…
• Static deployments, lost a lot of the magic of containers
• Pre-provisioned deployments
• Ran a full system inside the container: cron, sshd,
everything
• Everything ran on the public internet
• This only worked for Mongo
This sucked….
… but we already
knew what we had
to do to fix it.
Elastic
Deployments
Compose – Phase 2
• Elastic deployments
§ More dynamic containers
§ Deploy & destroy containers on demand
• Bonus:
§ Orchestration becomes the authority for business/DB logic
But…
• Everything was still facing the public internet
• Orchestration layer became bloated
• This infrastructure limited us
• Still running the entire OS inside the container
• And, of course…
Still Only MongoDB
WE
WANT
MORE
Into a post-
MongoDB world
What needed to change…
• At Compose, each database needs:
§ To be on a private network
§ Not accessible to the internet
§ Deploy & destroy containers on demand
§ Logic-less orchestration
§ Lightweight dynamic containers
The Bad News
• From the outside, each database looks different.
§ Different query languages
§ Different set of drivers
§ Configured differently
§ Needs a different environment to run in
• Finding common ground is a daunting problem.
The Good News
• From the outside, all databases need the same things:
§ Scaling
§ Clustering and Failover
§ Backup and restore data
§ Quiescing nicely
§ Private networks
§ Operational health checks
Working for the future
• The next platform we built would need all those things
• So we refined the concepts
• And we took inspiration from the ideas in the Twelve-
Factor App - https://12factor.net/
• So we created…
The Twelve Factors
of
Stateful Apps
What is a stateful app?
• A database is an application with state – the data
• A database without data is just a base
12 Factor Stateful Apps
Configuration                 Scaling
Deployments & Processes       Logs & Metrics
Disposability                 Database Orchestration
Affinity & Storage            Recipes
Network & Portal Access       Codebase
Fixed Network Identity        Tools & Versions
Configuration
• Store configurable parameters in the container’s
environment
§ This encourages reuse of container images
• DBs need to be run with one or more configuration files
§ Configuration files can be built quickly & repeatedly during the
pre-start `Configure` process by putting environmental data into
templates
• For Scylla we write the scylla.yaml file and configure the
listen addresses and cluster names.
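A minimal sketch of the pre-start `Configure` idea, assuming the environment carries variables named CLUSTER_NAME, LISTEN_ADDRESS and RPC_ADDRESS (those names, the template, and the example values are hypothetical; the three keys are standard scylla.yaml settings). It is an illustration, not Compose's actual tooling.

```python
from string import Template

# Hypothetical template: a real scylla.yaml holds many more settings.
SCYLLA_YAML_TEMPLATE = Template(
    "cluster_name: '$CLUSTER_NAME'\n"
    "listen_address: $LISTEN_ADDRESS\n"
    "rpc_address: $RPC_ADDRESS\n"
)


def render_config(env):
    """Fill the template from an environment-like mapping.

    Template.substitute raises KeyError when a variable is missing,
    so a misconfigured container fails before the database starts.
    """
    return SCYLLA_YAML_TEMPLATE.substitute(env)


# Example values only; inside a container this mapping would be os.environ.
example_env = {
    "CLUSTER_NAME": "demo",
    "LISTEN_ADDRESS": "10.0.0.11",
    "RPC_ADDRESS": "10.0.0.11",
}
rendered = render_config(example_env)
print(rendered)
```

Because the image only contains the template, the same image can be reused for every node; only the environment differs.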
Deployments and Processes
• A usefully redundant database contains more than one
moving piece.
§ Decompose the database into a collection of useful and distinct
processes
§ Run these processes as possibly stateful services in their own
environment/container
• Scylla is 3 data nodes and 3 proxy nodes
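One way to picture this decomposition is as a plain list of services. The sketch below is an assumption-laden illustration: the `Service` type, the naming scheme, and the choice to mark proxies as stateless are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    role: str       # "data" or "proxy"
    stateful: bool  # assumption: only data nodes hold state


def scylla_deployment(pairs=3):
    """Describe a deployment as distinct processes, one per container."""
    services = []
    for i in range(1, pairs + 1):
        services.append(Service(f"data-{i}", "data", True))
        services.append(Service(f"proxy-{i}", "proxy", False))
    return services


deployment = scylla_deployment()
for service in deployment:
    print(service)
```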
Disposability
• Database containers should be entirely ephemeral
§ Easily created and destroyed
• Destroying a container should not destroy the data
§ The database and the data have different lifecycles
§ Whatever happens to the database instance, the data has to live
on
Affinity and Storage
• There needs to be affinity between a database and the
storage
§ Database nodes can have one or more attached volumes that are
persistent on container restart
§ These volumes have a different lifecycle than the container
§ Data volumes should be accessible from the host
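The separation of lifecycles can be shown with a toy model. The `Volume` and `Container` classes below are hypothetical stand-ins, not a real container API; the point is only that destroying the container detaches the volume rather than deleting it.

```python
class Volume:
    """Persistent storage that outlives any single container."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class Container:
    """Ephemeral database process attached to a persistent volume."""

    def __init__(self, image, volume):
        self.image = image
        self.volume = volume
        self.running = True

    def write(self, key, value):
        self.volume.files[key] = value

    def read(self, key):
        return self.volume.files[key]

    def destroy(self):
        # Detach the volume; the data itself is left untouched.
        self.running = False
        self.volume = None


volume = Volume("data-1-vol")
first = Container("scylla:1.2", volume)
first.write("row", "value")
first.destroy()

# A replacement container (here a newer image) picks up the same data.
second = Container("scylla:1.3", volume)
print(second.read("row"))
```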
Network and Portal Access
• Databases should live on their own private network
• Access should only happen through specialized “portal”
containers
§ Do not unnecessarily expose things on public networks
§ Only expose specific, hardened entry points on portal containers
via port binding
§ Portal containers should terminate SSL
• For Scylla, each data node is matched with a portal
controlling outside access to it
Fixed Network Identity - 1
• The naming and addressing of the entire system
should be fixed before creation
• Container addresses should be static across container
restarts
• This includes all the elements that make up the network's
configuration
Fixed Network Identity - 2
• Discovering the network after the fact is problematic.
• Scylla, because of client driver auto discovery, needs the
same number of portals as there are database nodes.
• Each portal needs to know the address of its partner
database node.
• Each database node also needs to know the address of its
portal.
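Fixing names and addresses before creation can be as simple as computing the whole plan up front. The sketch below assumes a private subnet and a `data-N` / `portal-N` naming scheme, both hypothetical; it uses Python's standard `ipaddress` module to hand out addresses deterministically.

```python
import ipaddress


def address_plan(cidr, node_count):
    """Assign fixed addresses to each data node and its partner portal.

    The same inputs always give the same plan, so every container can be
    told its own address and its partner's address before it starts.
    """
    hosts = iter(ipaddress.ip_network(cidr).hosts())
    plan = []
    for i in range(1, node_count + 1):
        plan.append({
            "data": f"data-{i}",
            "data_ip": str(next(hosts)),
            "portal": f"portal-{i}",
            "portal_ip": str(next(hosts)),
        })
    return plan


plan = address_plan("10.0.0.0/28", 3)
for pair in plan:
    print(pair)
```

Since the plan is known in advance, it can be fed straight into each container's environment, which ties this factor back to Configuration.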
Scaling
• First scale up the container
§ In a hosting environment, there should be plenty of leeway to
expand a container’s resources.
§ Databases prefer to stay up so just add resources
• Then scale deployments horizontally
§ Scylla lets us do this easily
§ PostgreSQL, Redis, MySQL are difficult to scale horizontally
§ Scaling horizontally and moving storage can be costly
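The "up first, then out" ordering can be written as a small decision function. This is a sketch under simple assumptions: memory is the only resource considered, and the function name, parameters, and return labels are hypothetical.

```python
def next_scaling_step(current_mb, needed_mb, host_headroom_mb):
    """Prefer growing the container in place; scale out only when the
    host cannot supply the extra memory."""
    if needed_mb <= current_mb:
        return ("none", current_mb)
    if needed_mb - current_mb <= host_headroom_mb:
        # The database stays up; it is simply given more resources.
        return ("scale_up", needed_mb)
    # Not enough room on this host: add nodes instead.
    return ("scale_out", current_mb)


print(next_scaling_step(1024, 2048, 4096))
print(next_scaling_step(1024, 8192, 4096))
```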
Logs & Metrics
• Database-specific metrics and log collection should be
done within the deployment from an extra container
• Collect logs and metrics from all nodes
• Each node should provide a stream of logs to stdout
§ Easier to collect on container hosts
§ Standardized practice across all nodes
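For a process written in Python, streaming logs to stdout needs only the standard `logging` module. The helper name, logger name, and line format below are hypothetical choices for illustration.

```python
import logging
import sys


def make_stdout_logger(node_name):
    """Return a logger that writes one line per event to stdout,
    where the container host can collect it."""
    logger = logging.getLogger(node_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = make_stdout_logger("data-1")
log.info("node started")
```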
Database Orchestration
• Encapsulate administration functions within the container
§ Push the logic out of the orchestration and into the containers
• Manage the deployment through the use of recipes
§ Sequence of ordered operations
§ Recipes don’t know about the internal state of the database
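A recipe in this sense can be sketched as an ordered list of (container, command) pairs handed to a runner. Everything named here is hypothetical: the recipe contents, the command names, and the runner, which stands in for whatever mechanism actually executes a command inside a container.

```python
def run_recipe(steps, run_in_container):
    """Run steps in order and stop at the first failure.

    The orchestrator only sequences operations; the database-specific
    logic lives behind each command, inside the container.
    """
    completed = []
    for container, command in steps:
        if not run_in_container(container, command):
            break
        completed.append((container, command))
    return completed


# Hypothetical recipe for adding a fourth node and its portal.
ADD_NODE_RECIPE = [
    ("data-4", "configure"),
    ("data-4", "start"),
    ("portal-4", "configure"),
    ("portal-4", "start"),
]


def fake_runner(container, command):
    """Stand-in runner that just reports what it would do."""
    print(f"{container}: {command}")
    return True


completed = run_recipe(ADD_NODE_RECIPE, fake_runner)
```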
Codebase
• Use one image
§ with a deterministic build process
§ can be used to create many running instances
• Different images should be provided for different database
versions
§ For example, Scylla 1.0, 1.2 and 1.3 would be three separate
images
Tools and Versions
• DB administration tools
§ Should be versioned
§ Kept in lock-step with the version of the database
- Avoids exposure to changes in how admin tools function.
§ Thrift support in Scylla was released in 1.3
- The tools to administer it were added only in that version
• No restarts on upgrades to administration tooling
§ Consider overlaying your tooling on top of running databases
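Keeping tools in lock-step with the database can be enforced with a simple check before any tool runs. The sketch assumes dotted numeric version strings and treats "same major.minor" as a match; both the rule and the function names are hypothetical.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def tools_match(db_version, tools_version):
    """Assumed rule: tools and database must share major and minor version."""
    return parse_version(db_version)[:2] == parse_version(tools_version)[:2]


print(tools_match("1.3.0", "1.3.2"))
print(tools_match("1.2.0", "1.3.0"))
```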
That’s the twelve factors….
Configuration                 Scaling
Deployments & Processes       Logs & Metrics
Disposability                 Database Orchestration
Affinity & Storage            Recipes
Network & Portal Access       Codebase
Fixed Network Identity        Tools & Versions
In practice at Compose today
• We apply these factors throughout our platform
• It works for a wide range of database technologies
• It gives us reliable, repeatable, resilient systems
• And now Scylla is coming to that platform
Thank You!
Contact: hello@compose.com | @composeio

Scylla Summit 2016: Compose on Containing the Database
