Introducing Project Alternator - Scylla’s Open-Source
DynamoDB-compatible API
WEBINAR
Dor Laor is the CEO of ScyllaDB. Previously, Dor was part of the founding team of the KVM
hypervisor at Qumranet, which was acquired by Red Hat. At Red Hat, Dor managed KVM
and Xen development for several years. Dor holds an MSc from the Technion and a PhD in
snowboarding.
Avi Kivity, CTO of ScyllaDB, is known mostly for starting the Kernel-based Virtual Machine (KVM)
project, the hypervisor underlying many production clouds. He worked at Qumranet and Red
Hat as KVM maintainer until December 2012. ScyllaDB seeks to bring the same kind of
innovation to the public cloud space.
Nadav Har’El has had a diverse 20-year career in computer programming and computer science. In
the past he worked on scientific computing, networking software, and information retrieval, but in
recent years his focus has been on virtualization and operating systems, and among other things he
has worked on nested virtualization and exit-less I/O in KVM, and today he maintains the OSv kernel
and also works on Seastar and ScyllaDB.
+ The Real-Time Big Data Database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ New: Scylla Cloud, DBaaS
+ Open source and enterprise editions
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
What is Alternator
+ Why a DynamoDB-compatible API?
Live demos
+ Get started in 5 minutes in docker
+ CLI access to the DynamoDB API
+ Play a game on Alternator
+ Monitoring Alternator
Alternator implementation
+ How Scylla differs from DynamoDB
+ Current compatibility and limitations
Migrating from DynamoDB to Alternator
What is Alternator?
+ Scylla is an efficient NoSQL data store, announced September 2015.
+ It is open source, with enterprise support and cloud (SaaS) options.
+ Compatible with Cassandra and its APIs (CQL, Thrift).
+ Alternator: Adding a DynamoDB API to Scylla.
+ Efficient implementation for modern hardware
+ Throughput 10x higher than Cassandra
+ Linear scalability to many-core machines
+ Focused on modern fast SSDs
+ Low tail latency
+ Reliability
+ Autonomous database (minimal configuration)
We can apply these advantages to more than just Cassandra compatibility!
DynamoDB is similar in design and data model to Scylla
More details on the similarities, and differences, later.
Amazon Dynamo
(2007 paper)
Google Bigtable
(2006 paper)
DynamoDB is SaaS
+ SaaS is easy to get started with
+ SaaS trend in industry
+ DynamoDB popularity growing
[Figure: chart comparing Cassandra and DynamoDB]
Better price/performance
https://www.scylladb.com/product/benchmarks/dynamodb-benchmark/
+ Managed Scylla (“Scylla Cloud”) is 5 times cheaper than DynamoDB.
Vendor lock-in
+ Users want to move their DynamoDB application to
+ a different cloud provider,
+ a private datacenter,
+ or a hybrid of several clouds or datacenters.
+ Scylla can be run on any cloud or datacenter.
Live 1-node
Alternator Demo
+ Running Alternator is simply running Scylla with the parameter
“alternator-port” set, so that it listens for the DynamoDB API on that port.
+ You can get it running on your local machine in 5 minutes using docker:
docker run --name scylla -d -p 8000:8000 \
    scylladb/scylla-nightly:alternator --alternator-port=8000
+ We can then run Amazon’s DynamoDB CLI tools against port 8000:
aws --endpoint-url 'http://172.17.0.1:8000' dynamodb create-table \
    --table-name mytab \
    --attribute-definitions AttributeName=key,AttributeType=S \
    --key-schema AttributeName=key,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
+ First attempt works, second fails (as expected)
https://github.com/awsdocs/amazon-dynamodb-developer-guide/blob/master/doc_source/TicTacToe.Phase1.md
+ An open-source Python application using DynamoDB
+ Written by Amazon, demonstrates many DynamoDB features
+ Uses Amazon’s AWS client library for Python (boto3)
+ python application.py --mode local --port 8000
Implements a multiplayer Tic-Tac-Toe game server.
Many users can connect, invite each other to games, and play against each other,
and keep score.
Alternator
monitoring
+ Observability
+ Live workload
+ Cluster of three 30-core nodes in AWS, each in separate AZ.
+ 1.1 TB data - 1 billion items, 1.1KB each.
+ YCSB workload, 50% read 50% write, Zipfian distribution.
Alternator
implementation
Let’s survey some of the similarities and differences of Scylla and DynamoDB
+ How did we handle the differences?
+ What still needs to be done?
A much more detailed survey can be found in this document.
+ Fast scalable NoSQL databases with real-time response and huge volume
+ Key/Value and Wide-row stores
+ Eventual and configurable consistencies
+ Hashed partition keys - and also sort key for items inside a partition
+ DynamoDB is Cloud Native
+ Only available as a service, part of a huge deployment
+ Scylla has different options: OSS, Enterprise, as-a-service
+ More flexible - runs everywhere
+ Many configuration options - number of replicas, different consistency levels, etc.
+ Scylla has node operations, CLI, etc.
+ Scylla integrates with many OSS projects: Prometheus, Kafka, Spark (as first-class citizens)
+ Scylla’s units are servers (CPU shards). DynamoDB’s units are IOPS/Tablets
+ DynamoDB uses HTTP(S)/JSON, Scylla uses CQL
+ Data is divided into tables.
+ Data composed of Items in partitions.
Same as Scylla’s rows in partitions.
+ Item’s key has hash and range parts - like Scylla’s partition and clustering key.
(in DynamoDB API - only one of each)
+ Type of key columns defined in table’s schema (string, number, bytes)
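The key correspondence above can be sketched in a few lines of Python. This is only an illustration of the mapping described on the slide, not Alternator's actual code; the function name `to_scylla_keys` is hypothetical.

```python
def to_scylla_keys(key_schema):
    """Map a DynamoDB KeySchema list to (partition_key, clustering_key).

    HASH corresponds to Scylla's partition key and RANGE to its clustering
    key. The DynamoDB API allows at most one of each.
    """
    partition_key = None
    clustering_key = None
    for element in key_schema:
        if element["KeyType"] == "HASH":
            partition_key = element["AttributeName"]
        elif element["KeyType"] == "RANGE":
            clustering_key = element["AttributeName"]
        else:
            raise ValueError("unknown KeyType: " + element["KeyType"])
    if partition_key is None:
        raise ValueError("a HASH key is required")
    return partition_key, clustering_key


# The "mytab" table from the demo has only a hash key.
print(to_scylla_keys([{"AttributeName": "key", "KeyType": "HASH"}]))
```

A table with both key parts would return both names, for example a hypothetical schema with HASH "user" and RANGE "time".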
+ But DynamoDB items can have additional attributes - not defined in schema
+ Attributes may be scalars (string, number, etc.), lists, sets, documents.
+ Similar to a JSON document.
+ Needs to be emulated in Scylla
+ Put in table one map for top-level attributes
(mapping attribute name to JSON value).
+ Map instead of single JSON allows concurrent updates to
different top-level attributes of same item.
+ TODO: support updates to deep attributes.
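To see why a map of top-level attributes helps, here is a minimal Python sketch. It assumes each map cell carries its own write timestamp and that replicas reconcile cell by cell with last-write-wins; the helper names (`write_attribute`, `merge`, `read_item`) are hypothetical and this is a simplified model, not Scylla's storage code.

```python
import json


def write_attribute(item, name, value, timestamp):
    """Write one top-level attribute as its own map cell (no read of the item needed)."""
    current = item.get(name)
    if current is None or timestamp >= current[0]:
        item[name] = (timestamp, json.dumps(value))


def merge(replica_a, replica_b):
    """Reconcile two replicas cell by cell, keeping the newer timestamp."""
    merged = dict(replica_a)
    for name, cell in replica_b.items():
        if name not in merged or cell[0] > merged[name][0]:
            merged[name] = cell
    return merged


def read_item(item):
    """Decode the stored JSON values back into an attribute dictionary."""
    return {name: json.loads(cell[1]) for name, cell in item.items()}


# Two concurrent updates touch different top-level attributes of the same item.
replica_a, replica_b = {}, {}
write_attribute(replica_a, "score", 10, timestamp=1)
write_attribute(replica_b, "tags", ["x", "y"], timestamp=2)
print(read_item(merge(replica_a, replica_b)))  # both attributes survive
```

Had the whole item been stored as a single JSON value, the later write would have replaced the earlier one. The same limitation still applies inside one attribute, which is what the TODO about deep attributes refers to.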
● DynamoDB natively supports Read-Modify-Write (RMW) updates:
○ Conditional updates (set a = 2 if a == 1)
○ Counters (set a = a + 1)
○ Attribute copy (set a = b)
● Scylla natively supports independent writes to different columns
(CRDT):
○ Efficient updates to different columns - not requiring a read.
We are adding support for Read-modify-write operations - LWT.
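The risk with unserialized read-modify-write can be shown with a small deterministic simulation in Python. The `Store` class and both helper functions are hypothetical, and the compare-and-set loop only stands in for the kind of guarantee a serialized conditional write gives; it is not how LWT is implemented.

```python
class Store:
    """A toy single-value-per-key store used only for this illustration."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key, 0)

    def write(self, key, value):
        self.data[key] = value

    def compare_and_set(self, key, expected, new):
        """Write only if the current value still equals `expected`."""
        if self.data.get(key, 0) != expected:
            return False
        self.data[key] = new
        return True


def unsafe_increments(store, key, clients):
    # Every client reads before any client writes: the worst interleaving.
    observed = [store.read(key) for _ in range(clients)]
    for value in observed:
        store.write(key, value + 1)


def safe_increments(store, key, clients):
    # Same interleaving, but each write is conditional and retried on conflict.
    observed = [store.read(key) for _ in range(clients)]
    for value in observed:
        while not store.compare_and_set(key, value, value + 1):
            value = store.read(key)


unsafe, safe = Store(), Store()
unsafe_increments(unsafe, "a", clients=3)
safe_increments(safe, "a", clients=3)
print(unsafe.read("a"), safe.read("a"))  # prints "1 3"
```

Three plain increments collapse into one because each client overwrites the others' result, while the conditional version counts all three.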
Scylla:
Log Structured Merge (LSM)
■ Efficient writes - without prior read
Quorum-based consistency
■ Writes done on several replicas independently
■ Concurrent read-modify-write operations are not serialized.
DynamoDB:
BTree
■ Writes include a free read, but are slow.
Leader model
■ One replica (“leader”) is responsible for a write operation,
so it can serialize read-modify-write operations.
+ Each node in Scylla cluster also answers DynamoDB API requests on this port.
+ So no need for separate sizing of an API translation cluster.
+ Same nodes can do both CQL and DynamoDB API
+ As needed, forwards the request to other nodes holding the requested data.
+ Uses internal Scylla function calls and RPC - no translation to CQL.
+ Client needs to send requests to the different Scylla nodes. Can be done via
DNS or HTTP load balancer.
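Besides DNS or an HTTP load balancer, a client could also spread requests itself. The following Python sketch shows simple client-side round-robin over node addresses; the class name and the node addresses are hypothetical, and port 8000 is taken from the demo.

```python
import itertools


class RoundRobinEndpoints:
    """Hand out node endpoint URLs in rotation."""

    def __init__(self, nodes, port=8000):
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(
            ["http://{}:{}".format(node, port) for node in nodes]
        )

    def next_url(self):
        return next(self._cycle)


# Hypothetical addresses of a three-node cluster.
endpoints = RoundRobinEndpoints(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(4):
    print(endpoints.next_url())
```

Each returned URL could then be passed as the endpoint for one request. Since any node accepts DynamoDB API requests and forwards them as needed, the rotation does not need to know where the data lives.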
DynamoDB API is: JSON-formatted requests and responses over HTTP/HTTPS.
Request:
POST / HTTP/1.1
...
X-Amz-Target: DynamoDB_20120810.CreateTable
{ "TableName": "mytab",
  "KeySchema":
    [{"AttributeName": "key", "KeyType": "HASH"}],
  "AttributeDefinitions":
    [{"AttributeName": "key", "AttributeType": "S"}],
  "BillingMode": "PAY_PER_REQUEST"
}
Response:
{ "TableDescription":
  { "AttributeDefinitions":
      [{"AttributeName": "key", "AttributeType": "S"}],
    "TableName": "mytab",
    "KeySchema":
      [{"AttributeName": "key", "KeyType": "HASH"}],
    "TableStatus": "ACTIVE",
    "CreationDateTime": 1569242964,
    "TableId":
      "91347050-de00-11e9-a100-000000000000"
  }
}
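The request shown above can be assembled with nothing but the Python standard library. This is a rough sketch: the function names are hypothetical, and it omits request signing, which AWS itself requires. Whether a given Alternator build accepts unsigned requests is an assumption here.

```python
import http.client
import json


def build_request(operation, payload):
    """Return (method, path, headers, body) for one DynamoDB API call."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "DynamoDB_20120810." + operation,
    }
    return "POST", "/", headers, body


def send(host, port, operation, payload):
    """Send the request and return (status, decoded JSON response)."""
    method, path, headers, body = build_request(operation, payload)
    connection = http.client.HTTPConnection(host, port)
    try:
        connection.request(method, path, body=body, headers=headers)
        response = connection.getresponse()
        return response.status, json.loads(response.read())
    finally:
        connection.close()


create_table = {
    "TableName": "mytab",
    "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}
method, path, headers, body = build_request("CreateTable", create_table)
print(method, path, headers["X-Amz-Target"])
```

With a node listening as in the docker demo, `send("172.17.0.1", 8000, "CreateTable", create_table)` would perform the call; it is not invoked here so the sketch runs without a server.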
+ See detailed current status in alternator.md, and issues in bug tracker:
+ https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md
+ https://github.com/scylladb/scylla/issues
+ Several DynamoDB applications already work unmodified.
+ Some of the issues we will address for the GA:
+ Safe concurrent read-modify-write operations
+ A few operations and subcases of operations not yet supported
+ Authentication
+ SaaS on Scylla Cloud
+ DynamoDB streams (CDC)
Migrating from
DynamoDB to Scylla
+ Install Scylla and load balancer (or wait for Scylla Cloud SaaS availability)
+ Point your application, written to use DynamoDB, at Scylla’s endpoint address
+ This is a preview release. Watch out for unsupported features and unsafe
concurrent RMW operations.
+ Migrate existing data from DynamoDB to Scylla using DynamoDB API
+ E.g. Spark migrator:
https://www.scylladb.com/2019/09/12/migrating-from-dynamodb-to-scylla
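For small data sets, copying data through the DynamoDB API can be sketched without Spark. The Python below only builds BatchWriteItem payloads from items already read from the source table (for example by a Scan); the function name is hypothetical, and this is not how the Spark migrator works. The limit of 25 put requests per BatchWriteItem call comes from the DynamoDB API.

```python
def batch_write_payloads(table_name, items, batch_size=25):
    """Yield BatchWriteItem payloads, each holding at most `batch_size` items."""
    if not 1 <= batch_size <= 25:
        raise ValueError("BatchWriteItem accepts between 1 and 25 requests")
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        yield {
            "RequestItems": {
                table_name: [{"PutRequest": {"Item": item}} for item in chunk]
            }
        }


# 60 items in DynamoDB's typed JSON form, as a Scan would return them.
items = [{"key": {"S": "k{}".format(i)}} for i in range(60)]
payloads = list(batch_write_payloads("mytab", items))
print([len(p["RequestItems"]["mytab"]) for p in payloads])  # [25, 25, 10]
```

Each payload would be sent to the Scylla endpoint as a BatchWriteItem request. A real migration also has to handle unprocessed items and writes that arrive during the copy, which this sketch ignores.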
+ Scylla is a very efficient, reliable, low latency NoSQL data store,
that began with Cassandra compatibility.
+ The Alternator Project adds DynamoDB API compatibility to Scylla.
+ Can run existing applications designed for DynamoDB,
+ On any cloud or data center, not just on AWS.
+ Open source.
+ Currently a preview release, with some limitations, but GA expected soon.
United States Israel www.scylladb.com
@scylladb