2011 july-gtug-high-replication-datastore

High Replication Datastore
Ikai Lan
plus.ikailan.com
NYC GTUG
July 27, 2011


About the speaker

• Ikai Lan
• Developer Relations at Google, based out of San Francisco, CA
• Twitter: @ikai
• Google+: plus.ikailan.com


Agenda

• What is App Engine?
• What is High Replication datastore?
• Underneath the hood


What is App Engine?


Software

Platform

Infrastructure

Source: Gartner AADI Summit Dec 2009


SDK & “The Cloud”

Hardware

Networking

Operating system

Application runtime

Java, Python, Go

Static file serving


Scales dynamically

App Server


Scales dynamically

App Server

App Server

App Server


Customer: WebFilings

Disruptive multi-tenant App Engine application adopted by
Fortune 500 companies.


Customer: The Royal Wedding

Peaked at 32,000 requests per second with no disruption!


>100K Developers

>200K Apps

>1.5B daily pageviews


App Engine Datastore

Schemaless, non-relational datastore built on top of Google’s Bigtable technology

Enables rapid development and scalability


High Replication
• Strongly consistent
• Multi-datacenter
• High reliability
• Consistent performance
• No data loss

How do I use HR?

• Create a new application! Just remember the rules:
• Fetch by key and ancestor queries exhibit strongly consistent behavior
• Queries without an ancestor exhibit eventually consistent behavior


Strong vs. Eventual
• Strong consistency means that immediately after the datastore tells us the data has been committed, a subsequent read will return the data written
• Eventual consistency means that some time after the datastore tells us the data has been committed, a read will return the written data; an immediate read may or may not return it


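This is strongly consistent
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

DatastoreService datastore = DatastoreServiceFactory
    .getDatastoreService();

Entity item = new Entity("Item");
item.setProperty("data", 123);

Key key = datastore.put(item);

// This exhibits strong consistency.
// It should return the item we just saved.
// get() throws EntityNotFoundException if no entity exists for the key.
Entity result = datastore.get(key);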


This is strongly consistent
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import java.util.List;

// Save the entity root
Entity root = new Entity("Root");
Key rootKey = datastore.put(root);

// Save the child, using the root key as its parent
Entity childItem = new Entity("Item", rootKey);
childItem.setProperty("data", 123);
datastore.put(childItem);

// Restrict the query to the root's entity group
Query strongConsistencyQuery = new Query("Item");
strongConsistencyQuery.setAncestor(rootKey);
strongConsistencyQuery.addFilter("data", FilterOperator.EQUAL, 123);

FetchOptions opts = FetchOptions.Builder.withDefaults();

// This query exhibits strong consistency.
// It will return the item we just saved.
List<Entity> results = datastore.prepare(strongConsistencyQuery)
    .asList(opts);


This is eventually consistent
Entity item = new Entity("Item");
item.setProperty("data", 123);
datastore.put(item);

// Not an ancestor query
Query eventuallyConsistentQuery = new Query("Item");
eventuallyConsistentQuery.addFilter("data", FilterOperator.EQUAL, 123);

FetchOptions opts = FetchOptions.Builder.withDefaults();

// This query exhibits eventual consistency.
// It will likely return an empty list.
List<Entity> results = datastore.prepare(eventuallyConsistentQuery)
    .asList(opts);


Why?

• Reads are transactional
• On a read, we try to determine if we have the latest version of some data
• If not, we catch up the data on the node to the latest version


To understand this ...

• We need some understanding of Paxos ...
• ... which necessitates some understanding of transactions
• ... which necessitates some understanding of entity groups


Entity Groups

User (entity group root)
    Blog, Blog
        Entry, Entry, Entry
            Comment, Comment, Comment
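
The hierarchy on this slide can be sketched with a toy key model. This is only an illustration: ToyKey, its fields, and the numeric ids are hypothetical and are not the App Engine Key class. It shows that every entity's key carries a path back to a root, and that two entities belong to the same entity group when those paths start at the same root.

```java
// Toy model of hierarchical keys. ToyKey is a hypothetical class for
// illustration only; it is not the App Engine Key class.
final class ToyKey {
    final String kind;
    final long id;
    final ToyKey parent; // null for an entity group root

    ToyKey(String kind, long id, ToyKey parent) {
        this.kind = kind;
        this.id = id;
        this.parent = parent;
    }

    // Walk up the ancestor path until we reach the key with no parent.
    ToyKey root() {
        ToyKey k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    // Render the full ancestor path, root first.
    String path() {
        String self = kind + "(" + id + ")";
        return parent == null ? self : parent.path() + "/" + self;
    }

    // Two keys share an entity group when their roots have the same path.
    boolean sameEntityGroup(ToyKey other) {
        return root().path().equals(other.root().path());
    }
}

class EntityGroupDemo {
    public static void main(String[] args) {
        ToyKey user = new ToyKey("User", 1, null);
        ToyKey blog = new ToyKey("Blog", 10, user);
        ToyKey entry = new ToyKey("Entry", 100, blog);
        ToyKey comment = new ToyKey("Comment", 1000, entry);
        ToyKey otherUser = new ToyKey("User", 2, null);

        System.out.println(comment.path());
        System.out.println("same group as blog: " + comment.sameEntityGroup(blog));
        System.out.println("same group as other user: " + comment.sameEntityGroup(otherUser));
    }
}
```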


Entity groups

// Save the child
// ...
.asList(opts);


Optimistic locking

Client A reads data. Its current version is 11.
Client B reads data. Its current version is 11.

Client A: Modify data. Increment version to 12.
Client B: Modify data. Increment version to 12.

Client B tries to save data to the datastore. Success!
Client A tries to save data. Datastore version is higher than or equal to my version - FAIL
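
The version check on this slide can be sketched in plain Java. This is a minimal in-memory model, not the datastore's implementation: VersionedCell and tryCommit are hypothetical names. A write is accepted only if the stored version is still lower than the version the client is trying to write.

```java
// Toy model of the optimistic version check. VersionedCell is a
// hypothetical class, not part of the App Engine API.
final class VersionedCell {
    private long version;
    private String data;

    VersionedCell(long version, String data) {
        this.version = version;
        this.data = data;
    }

    synchronized long readVersion() {
        return version;
    }

    synchronized String readData() {
        return data;
    }

    // Reject the write if the stored version is already higher than or
    // equal to the version the client wants to write.
    synchronized boolean tryCommit(long newVersion, String newData) {
        if (version >= newVersion) {
            return false;
        }
        version = newVersion;
        data = newData;
        return true;
    }
}

class OptimisticLockingDemo {
    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(11, "original");

        // Both clients read version 11 before either one writes.
        long seenByA = cell.readVersion();
        long seenByB = cell.readVersion();

        // Client B saves first, then Client A tries with a stale version.
        boolean bCommitted = cell.tryCommit(seenByB + 1, "from B");
        boolean aCommitted = cell.tryCommit(seenByA + 1, "from A");
        System.out.println("B committed: " + bCommitted);
        System.out.println("A committed: " + aCommitted);

        // Client A re-reads the current version and retries.
        boolean aRetry = cell.tryCommit(cell.readVersion() + 1, "from A, retried");
        System.out.println("A retry committed: " + aRetry);
    }
}
```

In this sketch the losing client recovers by re-reading and retrying, which is the usual way to handle a failed optimistic write.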


Transactional reads

// Save the child
// ...
.asList(opts);


Transactional reads

Blog Entry: Version 11
Comment (Parent: Entry): Version 11
Comment (Parent: Entry): Version 12, still being committed

Client A is writing data to the datastore
Client B reads data transactionally

Version 12 has not finished committing - Datastore returns version 11
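
The behavior described here can be sketched as a toy snapshot read. This is an illustration under simplifying assumptions, not how the datastore is built: SnapshotGroup, beginWrite, and finishCommit are hypothetical names, and only one write is in flight at a time. A read sees the last fully committed version until the pending write finishes committing.

```java
// Toy model of a transactional (snapshot) read. SnapshotGroup is a
// hypothetical class; it allows a single in-flight write for simplicity.
final class SnapshotGroup {
    private long committedVersion;
    private String committedValue;
    private boolean writePending;
    private String pendingValue;

    SnapshotGroup(long version, String value) {
        this.committedVersion = version;
        this.committedValue = value;
    }

    // Stage a new version. It is not visible to readers yet.
    synchronized void beginWrite(String value) {
        writePending = true;
        pendingValue = value;
    }

    // Make the staged version the committed one.
    synchronized void finishCommit() {
        if (!writePending) {
            return;
        }
        committedVersion = committedVersion + 1;
        committedValue = pendingValue;
        writePending = false;
        pendingValue = null;
    }

    // Readers only ever see the last fully committed version.
    synchronized long readVersion() {
        return committedVersion;
    }

    synchronized String read() {
        return committedValue;
    }
}

class TransactionalReadDemo {
    public static void main(String[] args) {
        SnapshotGroup group = new SnapshotGroup(11, "comment v11");

        group.beginWrite("comment v12"); // Client A is still committing
        System.out.println("during commit: version " + group.readVersion());

        group.finishCommit();            // Client A's commit completes
        System.out.println("after commit: version " + group.readVersion());
    }
}
```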


Paxos simplified

Client to Datastore: "Give me the newest data"
Datastore nodes: Node A, Node B, Node C, Node D
Node: "Is my data up to date?"

1. If the data is up to date, return it

2. If the data is NOT up to date, "catch up" the data by applying the jobs in the journal and return the latest data
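
The two steps above can be sketched as a toy read path. This is not Paxos itself and not the datastore's code: Journal and Replica are hypothetical classes, and a single shared in-memory journal stands in for the replicated log that the nodes would agree on. The sketch shows only the "am I up to date, and if not, catch up" check on read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the agreed-upon log of writes.
final class Journal {
    private final List<String[]> entries = new ArrayList<String[]>();

    synchronized void append(String key, String value) {
        entries.add(new String[] {key, value});
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized String[] get(int index) {
        return entries.get(index);
    }
}

// Hypothetical node that applies journal entries to its local state.
final class Replica {
    private final Journal journal;
    private final Map<String, String> state = new HashMap<String, String>();
    private int applied = 0; // number of journal entries applied so far

    Replica(Journal journal) {
        this.journal = journal;
    }

    boolean isUpToDate() {
        return applied == journal.size();
    }

    // Apply every journal entry this node has not applied yet.
    void catchUp() {
        while (applied < journal.size()) {
            String[] entry = journal.get(applied);
            state.put(entry[0], entry[1]);
            applied++;
        }
    }

    // Step 1: if up to date, return. Step 2: otherwise catch up first.
    String readLatest(String key) {
        if (!isUpToDate()) {
            catchUp();
        }
        return state.get(key);
    }

    // A read that skips the check and may return stale data.
    String readLocal(String key) {
        return state.get(key);
    }
}

class CatchUpDemo {
    public static void main(String[] args) {
        Journal journal = new Journal();
        Replica node = new Replica(journal);

        journal.append("item", "v11");
        node.catchUp();
        journal.append("item", "v12"); // node has not applied this yet

        System.out.println("local read: " + node.readLocal("item"));
        System.out.println("latest read: " + node.readLatest("item"));
    }
}
```

The readLocal method is included only for contrast: a read that skips the check behaves like the eventually consistent case.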


More reading

• My example was grossly oversimplified
• More details can be found here: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Contradictory advice

• Entity groups must be as big as possible to cover as much related data as you can
• Entity groups must be small enough such that your write rate per entity group never goes above one write/second
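
One way to reason about the second rule is to check a candidate grouping against a sample of write timestamps. The sketch below is a rough planning aid under stated assumptions: WriteRateCheck, its method names, the group names, and the sliding one-second window are all hypothetical choices, and the slide states only the one write/second guideline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for comparing entity group designs.
final class WriteRateCheck {
    // Largest number of writes that fall inside any one-second window.
    static int peakWritesPerSecond(List<Long> timestampsMillis) {
        List<Long> sorted = new ArrayList<Long>(timestampsMillis);
        Collections.sort(sorted);
        int peak = 0;
        int start = 0;
        for (int end = 0; end < sorted.size(); end++) {
            // Shrink the window until it spans less than 1000 ms.
            while (sorted.get(end) - sorted.get(start) >= 1000) {
                start++;
            }
            peak = Math.max(peak, end - start + 1);
        }
        return peak;
    }

    // Names of groups whose peak exceeds one write per second, sorted.
    static List<String> groupsOverLimit(Map<String, List<Long>> writesByGroup) {
        List<String> over = new ArrayList<String>();
        Map<String, List<Long>> ordered = new TreeMap<String, List<Long>>(writesByGroup);
        for (Map.Entry<String, List<Long>> e : ordered.entrySet()) {
            if (peakWritesPerSecond(e.getValue()) > 1) {
                over.add(e.getKey());
            }
        }
        return over;
    }
}

class WriteRateDemo {
    public static void main(String[] args) {
        Map<String, List<Long>> writes = new HashMap<String, List<Long>>();
        // One group per user: writes are spread out.
        writes.put("user-1", Arrays.asList(0L, 2000L, 4000L));
        // One big shared group: writes from everyone land together.
        writes.put("all-users", Arrays.asList(0L, 400L, 900L, 1500L));

        System.out.println("over limit: " + WriteRateCheck.groupsOverLimit(writes));
    }
}
```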


Summary

• Remember the rules of strong consistency and eventual consistency
• Group your data into entity groups when possible and use ancestor queries


Questions?

• Twitter: @ikai
• Google+: plus.ikailan.com

