Introducing the App Engine datastore

Hands on with the App Engine Datastore

Ikai Lan
May 9th, 2011


Thursday, May 26, 2011

About the speaker
• Ikai Lan - Developer Programs Engineer, Developer Relations
• Twitter: @ikai
• Google Profile: http://profiles.google.com/ikai.lan



Lab prerequisites
• JDK 1.5+
• Apache Ant
• Codelab package: http://code.google.com/p/2011-datastore-bootcamp-codelab/downloads/detail?name=2011-datastore-bootcamp-codelab.zip

Shortlink: http://tinyurl.com/datastore-bootcamp



Goals of this talk
• Understand a bit of how the datastore works under the hood
• Have a conceptual background for the persistence codelab



Understanding the datastore
• The underlying Bigtable
• Indexing and queries
• Complex queries
• Entity groups
• Underlying infrastructure



Datastore layers

                Complex   Entity Group   Queries on   Key range   Get and set
                queries   Transactions   properties   scan        by key
Datastore       ✓         ✓              ✓            ✓           ✓
Megastore                 ✓              ✓            ✓           ✓
Bigtable                                              ✓           ✓



What does a Bigtable row look like?

Source: http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf



Bigtable API
• “Give me the column ‘name’ at key 123”
• “Set the column ‘name’ at key 123 to ‘ikai’”
• “Give me all columns where the key is greater than 100 and less
than 200”
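The three requests above behave much like operations on a sorted map. Here is a minimal sketch, assuming a single 'name' column held in memory in a java.util.TreeMap. The class BigtableSketch and its method names are hypothetical and are not part of any Bigtable client.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one Bigtable column: rows sorted by key.
class BigtableSketch {
    private final NavigableMap<Integer, String> nameColumn = new TreeMap<>();

    // "Set the column 'name' at key 123 to 'ikai'"
    void set(int key, String name) {
        nameColumn.put(key, name);
    }

    // "Give me the column 'name' at key 123"
    String get(int key) {
        return nameColumn.get(key);
    }

    // "Give me all columns where the key is greater than 100 and less than 200"
    NavigableMap<Integer, String> scan(int lowExclusive, int highExclusive) {
        return nameColumn.subMap(lowExclusive, false, highExclusive, false);
    }

    public static void main(String[] args) {
        BigtableSketch table = new BigtableSketch();
        table.set(100, "alfred");
        table.set(123, "ikai");
        table.set(150, "max");
        table.set(200, "zed");
        System.out.println(table.get(123));
        System.out.println(table.scan(100, 200));
    }
}
```

Because the rows stay sorted by key, the range scan is a contiguous slice rather than a search, which is the property the rest of the talk builds on.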



Megastore API
• “Give me all rows where the column ‘name’ equals ‘ikai’”
• “Transactionally write an update to this group of entities”
• “Do a cross datacenter write of this data such that reads will be
strongly consistent” (High Replication Datastore)
• Megastore paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf



App Engine Datastore API
• “Give me all Users for my app where the name equals ‘ikai’,
company equals ‘Google’, and sort them by the ‘awesome’
column, descending”



Queries


Let’s save an Entity with the low-level Java API
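import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

DatastoreService datastore = DatastoreServiceFactory
    .getDatastoreService();

// Kind "User", key name "ikai@google.com"
Entity ikai = new Entity("User", "ikai@google.com");

ikai.setProperty("firstName", "ikai");
ikai.setProperty("company", "google");

ikai.setUnindexedProperty("biography",
    "Ikai is a great man, a great, great man.");

datastore.put(ikai);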



Get an instance of the DatastoreService
• Fetch a client instance

Instantiate a new Entity
• The first argument sets the Entity Kind
• The second argument sets the unique key

Set indexed properties
• First argument is the property name
• Second argument is the property value

Set unindexed properties
• This property will be saved, but we will not run queries against it

Commit the entity to the datastore
• Save the thing!


What happens when we save?

1. Make the write RPC
2. Write the entity and write the indexes
3. Success!



What actually gets written?

Entities table

Bigtable key                   Value
AppId:User:ikai@google.com     ( Protobuf serialized entity - includes
                               firstName, company and biography values )

Indexes table

Bigtable key                                   Value
AppId:User:firstName:ikai:ikai@google.com      ( Empty )
AppId:User:company:google:ikai@google.com      ( Empty )

Read more: http://code.google.com/appengine/articles/storage_breakdown.html
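The two tables above can be mimicked with a pair of sorted maps. What follows is a rough sketch, assuming the literal prefix "AppId" and a plain string in place of the protobuf value. WriteSketch, put, and both map fields are hypothetical names chosen for illustration, not the datastore's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the entities table and the indexes table.
class WriteSketch {
    final TreeMap<String, String> entities = new TreeMap<>();
    final TreeMap<String, String> indexes = new TreeMap<>();

    // Writes one entity row plus one index row per indexed property.
    void put(String kind, String keyName,
             Map<String, String> indexed, Map<String, String> unindexed) {
        Map<String, String> all = new LinkedHashMap<>(indexed);
        all.putAll(unindexed);
        // Stand-in for the protobuf: every property lives in the entity row.
        entities.put("AppId:" + kind + ":" + keyName, all.toString());
        for (Map.Entry<String, String> p : indexed.entrySet()) {
            // The index row's value is empty; all the information is in its key.
            indexes.put("AppId:" + kind + ":" + p.getKey() + ":" + p.getValue()
                    + ":" + keyName, "");
        }
    }

    public static void main(String[] args) {
        Map<String, String> indexed = new LinkedHashMap<>();
        indexed.put("firstName", "ikai");
        indexed.put("company", "google");
        Map<String, String> unindexed = new LinkedHashMap<>();
        unindexed.put("biography", "Ikai is a great man, a great, great man.");

        WriteSketch store = new WriteSketch();
        store.put("User", "ikai@google.com", indexed, unindexed);
        System.out.println(store.entities.keySet());
        System.out.println(store.indexes.keySet());
    }
}
```

Note that the unindexed biography ends up in the entity row only, so it costs no extra index writes.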



Now let’s run a query
• If we have the key, we can fetch it right away by key
• What if we don’t? We need indexes.



Let’s run a query

Query queryByName = new Query("User");

queryByName.addFilter("firstName",
FilterOperator.EQUAL, "ikai");

List<Entity> results = datastore.prepare(
queryByName).asList(
FetchOptions.Builder.withDefaults());

// Roughly equivalent to:
// SELECT * FROM User WHERE firstName = 'ikai';



Step 1: Query the indexes table

Scan the indexes table for values >=
AppId:User:firstName:



Step 2: Start extracting keys

That gets us the row AppId:User:firstName:ikai:ikai@google.com - extract the key
ikai@google.com



Step 3: Batch get the entities themselves

Now let’s go back to the entities table and fetch that key. Success!
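Steps 1 to 3 can be sketched with sorted maps. This is a simplified illustration, assuming the key layout shown earlier and string stand-ins for the serialized entities. QuerySketch and queryByProperty are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a single-property equality query.
class QuerySketch {
    final TreeMap<String, String> entities = new TreeMap<>();
    final TreeMap<String, String> indexes = new TreeMap<>();

    List<String> queryByProperty(String kind, String property, String value) {
        String prefix = "AppId:" + kind + ":" + property + ":" + value + ":";
        List<String> keyNames = new ArrayList<>();
        // Step 1: scan the indexes table for keys >= the prefix.
        for (String indexKey : indexes.tailMap(prefix, true).keySet()) {
            if (!indexKey.startsWith(prefix)) {
                break; // walked past the last matching index row
            }
            // Step 2: the entity key is the suffix of the index key.
            keyNames.add(indexKey.substring(prefix.length()));
        }
        // Step 3: batch get the entities by key.
        List<String> results = new ArrayList<>();
        for (String keyName : keyNames) {
            results.add(entities.get("AppId:" + kind + ":" + keyName));
        }
        return results;
    }

    public static void main(String[] args) {
        QuerySketch store = new QuerySketch();
        store.entities.put("AppId:User:ikai@google.com",
                "{firstName=ikai, company=google}");
        store.indexes.put("AppId:User:firstName:ikai:ikai@google.com", "");
        store.indexes.put("AppId:User:company:google:ikai@google.com", "");
        System.out.println(store.queryByProperty("User", "firstName", "ikai"));
    }
}
```

The query never touches the entities table until it already knows which keys it wants, which is why a property without an index row cannot be queried.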





Key takeaways
• This isn’t a relational database
– There are no full table scans
– Indexes MUST exist for every property we want to query
– Natively, we can only query on exact matches or prefixes (startsWith)
– Don’t index what we never need to query on
• Get by key = one step. Query on property value = 2 steps



Let’s run a more complex query!

Query queryByName = new Query("User");

queryByName.addFilter("firstName",
FilterOperator.EQUAL, "ikai");

queryByName.addFilter("company",
FilterOperator.EQUAL, "google");

List<Entity> results = datastore.prepare(
queryByName).asList(
FetchOptions.Builder.withDefaults());

// Roughly equivalent to:
// SELECT * FROM User WHERE firstName = 'ikai'
// AND company = 'google';



Query resolution strategies
• This query can be resolved using built-in indexes
– Zig zag merge join - we’ll cover this example

• It can also be optimized using composite indexes



Zig zag across multiple indexes
Begin by scanning indexes >= AppId:User:company:google

Bigtable key
AppId:User:company:acme:alfred@acme.com
AppId:User:company:google:david@google.com
AppId:User:company:google:ikai@google.com
AppId:User:company:google:max@google.com
AppId:User:company:megacorp:zed@megacorp.com

Bigtable key
AppId:User:firstName:alfred:alfred@acme.com
AppId:User:firstName:ikai:ikai@acme.com
AppId:User:firstName:ikai:ikai@google.com
AppId:User:firstName:ikai:ikai@megacorp.com
AppId:User:firstName:zed:zed@megacorp.com


There’s at least a partial match, so we “jump” to the next index.

Move to the next index. Start a scan for keys >=
AppId:User:firstName:ikai:david@google.com

Okay, so that’s a twist. The first value that matches has key ikai@google.com! Does this value exist in the first index?

Let’s advance the original cursor to >= that key.

Alright! We found a match. Let’s add the key to our in-memory list and go back to the first index.

Let’s move on to see if there are any more matches. Let’s start at max@google.com

Are there any keys >=
AppId:User:firstName:ikai:max@google.com?

No. We’re at the end of our index scans. Let’s do a batch get by key of our list of keys:
[ ‘ikai@google.com’ ]




Batch get the entities themselves

Now let’s go back to the entities table and fetch that key. Success!
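The back-and-forth between the two indexes can be sketched as a merge over two sorted sets. This is only an illustration of the idea, assuming each set already holds the entity keys found under one index prefix (AppId:User:company:google: and AppId:User:firstName:ikai:). ZigZagSketch and zigZag are hypothetical names, and the real datastore implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical zig zag merge join over two sorted index scans.
class ZigZagSketch {
    static List<String> zigZag(TreeSet<String> first, TreeSet<String> second) {
        List<String> matches = new ArrayList<>();
        String candidate = first.isEmpty() ? null : first.first();
        while (candidate != null) {
            // Jump to the second index: first key >= the candidate.
            String inSecond = second.ceiling(candidate);
            if (inSecond == null) {
                break; // second index exhausted, no more matches possible
            }
            if (inSecond.equals(candidate)) {
                matches.add(candidate);               // both indexes agree
                candidate = first.higher(candidate);  // move the cursor down
            } else {
                // Jump back: advance the first index to >= what we just saw.
                candidate = first.ceiling(inSecond);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> company = new TreeSet<>(Arrays.asList(
                "david@google.com", "ikai@google.com", "max@google.com"));
        TreeSet<String> firstName = new TreeSet<>(Arrays.asList(
                "ikai@acme.com", "ikai@google.com", "ikai@megacorp.com"));
        System.out.println(zigZag(company, firstName));
    }
}
```

Each jump skips every key between the two cursors, so the cost depends on how often the indexes disagree rather than on their total size.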




Let’s change the shape of the data
• Zig zag performance is HIGHLY dependent on the shape of the
data
• Let’s go ahead and muck with the data a bit



Same query, sparsely distributed matches

The firstName index now holds ikai@google.com under a different name:

Bigtable key
AppId:User:firstName:igor:ikai@google.com

Begin by scanning indexes >= AppId:User:company:google

Move to the next index and scan for keys >=
AppId:User:firstName:ikai:david@google.com

Oh ... no matches. Let’s move back to the first index and move the cursor down.

Okay, we’ve got another Googler.

Oh ... no matches here either. Let’s go back to the first index.

... if these indexes were huge, we could be here for a while!




What happens in this case?
• If we traverse too many indexes, the datastore throws a
NeedIndexException
• We’ll want to build a composite index



Composite index

Bigtable key
AppId:User:company:acme:firstName:alfred:alfred@acme.com
AppId:User:company:google:firstName:david:david@google.com
AppId:User:company:google:firstName:ikai:ikai@google.com
AppId:User:company:google:firstName:max:max@google.com
AppId:User:company:megacorp:firstName:zed:zed@megacorp.com


Search for all keys >=
AppId:User:company:google:firstName:ikai

Well, that was much faster, wasn’t it?
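With a composite index the same query becomes a single prefix scan. Below is a minimal sketch, assuming the composite key layout shown on the slide. CompositeIndexSketch, index, and query are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical composite index on (company, firstName) for kind User.
class CompositeIndexSketch {
    final TreeSet<String> compositeIndex = new TreeSet<>();

    // Called at save time: one extra index row per entity.
    void index(String company, String firstName, String keyName) {
        compositeIndex.add("AppId:User:company:" + company
                + ":firstName:" + firstName + ":" + keyName);
    }

    // One contiguous scan answers "company = X AND firstName = Y".
    List<String> query(String company, String firstName) {
        String prefix = "AppId:User:company:" + company
                + ":firstName:" + firstName + ":";
        List<String> keyNames = new ArrayList<>();
        for (String row : compositeIndex.tailSet(prefix, true)) {
            if (!row.startsWith(prefix)) {
                break; // past the last matching row
            }
            keyNames.add(row.substring(prefix.length()));
        }
        return keyNames;
    }

    public static void main(String[] args) {
        CompositeIndexSketch store = new CompositeIndexSketch();
        store.index("acme", "alfred", "alfred@acme.com");
        store.index("google", "david", "david@google.com");
        store.index("google", "ikai", "ikai@google.com");
        store.index("google", "max", "max@google.com");
        store.index("megacorp", "zed", "zed@megacorp.com");
        System.out.println(store.query("google", "ikai"));
    }
}
```

The scan cost no longer depends on how sparse the matches are, at the price of writing the extra row on every save.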




Composite index tradeoffs
• Created at entity save time - incurs additional datastore CPU
and storage quota
• You can only create 200 composite indexes
• You need to know the possible queries ahead of time!



Complex Queries takeaways
• This isn’t a relational database
– There are no full table scans
– Indexes MUST exist for every property we want to query
• Performance depends on the shape of the data
• Worst-case scenario: your query matches are highly sparse
• Build composite indexes when you need them



Entity Groups


Why entity groups?
• We can perform transactions within this group - but not outside
• Data locality - data are stored “near” each other
• Strongly consistent queries when using High Replication
datastore within this entity group



Entity groups and transactions
• A hierarchical structuring of your data into Megastore’s unit of
atomicity
• Allows for transactional behavior - but only within a single entity
group
• Key unit of consistency when using High Replication datastore



Example: Data for a blog hosting service

User
  has many Blog
    has many Entry
      has many Comment

This can be structured as an entity group (tree structure)!



Structure this data as an entity group

User (entity group root)
  Blog, Blog
    Entry, Entry, Entry
      Comment, Comment, Comment



How are entity groups stored?

Entities table

Bigtable key                                                 Value
AppId:User:ikai@google.com                                   ( Protobuf serialized User )
AppId:User:ikai@google.com/Blog:123                          ( Protobuf serialized Blog )
AppId:User:ikai@google.com/Blog:123/Entry:456                ( Protobuf serialized Entry )
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:111    ( Protobuf serialized Comment )
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:222    ( Protobuf serialized Comment )
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:333    ( Protobuf serialized Comment )

Read more: http://code.google.com/appengine/docs/python/datastore/entities.html

• Entity groups have a single root entity
• Child entities embed the entire ancestry in their keys
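Because a child's key starts with its parent's full key, an entity group occupies one contiguous range of the sorted entities table. Here is a small sketch of that property, assuming the key layout from the table above. EntityGroupSketch, childKey, and descendants are hypothetical names.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of entity group keys in a sorted entities table.
class EntityGroupSketch {
    final TreeMap<String, String> entities = new TreeMap<>();

    // A child key is the parent's full key path plus "/Kind:id".
    static String childKey(String parentKey, String kind, String id) {
        return parentKey + "/" + kind + ":" + id;
    }

    // Everything under an ancestor shares its key as a prefix, so the
    // descendants form a single range of the sorted map.
    SortedMap<String, String> descendants(String ancestorKey) {
        String from = ancestorKey + "/";
        String to = ancestorKey + "0"; // '0' is the character right after '/'
        return entities.subMap(from, to);
    }

    public static void main(String[] args) {
        EntityGroupSketch store = new EntityGroupSketch();
        String user = "AppId:User:ikai@google.com";
        String blog = childKey(user, "Blog", "123");
        String entry = childKey(blog, "Entry", "456");
        store.entities.put(user, "User");
        store.entities.put(blog, "Blog");
        store.entities.put(entry, "Entry");
        store.entities.put(childKey(entry, "Comment", "111"), "Comment");
        store.entities.put("AppId:User:max@google.com", "User");
        System.out.println(store.descendants(blog).keySet());
    }
}
```

This is one way to picture the data locality mentioned earlier: reading a whole group is a range scan, not a set of scattered lookups.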




Let’s write an entity group transactionally

Entity blog = new Entity("Blog", "ikaisays.com",
ikai.getKey());
Entity entry = new Entity("Entry", "datastore-intro",
blog.getKey());

// Auto assign an ID
Entity comment = new Entity("Comment", entry.getKey());

Transaction tx = datastore.beginTransaction();

// Arrays.asList is a helper for clarity; the batch put writes the entities in parallel
datastore.put(Arrays.asList(ikai, blog, entry, comment));

tx.commit();

• Create the root entity
• This is the first child entity - notice the third argument, which specifies the parent entity key
• The next deeper entity sets the blog as the parent
• We can also opt to not provide a key name and just use a parent key for a new entity
• Start a new transaction
• Put the entities in parallel
• Actually commit the changes



Step 1: Commit

Commit → Changes to entities visible → Changes to entities and indexes visible

Roll the timestamp forward on the root entity

Step 2: Entity visible

On read, check for the most recent timestamp on the root entity

This is the version we want since it represents a complete write

Step 3: Indexes updated

Indexes are written - now we
can query for this entity with
the new properties



Entity group and transactions takeaways
• Structure data into hierarchical trees
– Large enough to be useful, small enough to maximize
transactional throughput
• Transactions need an entity group root - roughly 1 transaction/second per entity group
– If you write N entities that are all part of 1 entity group, it counts as 1 write
• Optimistic locking used - can be expensive with a lot of
contention
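The cost of contention under optimistic locking can be seen in a toy model. This sketch assumes a single counter standing in for the timestamp on the root entity; it is not Megastore's actual commit protocol, and OptimisticRootSketch and its methods are hypothetical names.

```java
import java.util.ConcurrentModificationException;

// Toy model: one entity group guarded by the timestamp on its root entity.
class OptimisticRootSketch {
    private long timestamp = 0;

    // A transaction remembers the root timestamp it saw when it began.
    synchronized long begin() {
        return timestamp;
    }

    // Commit succeeds only if nobody else committed to the group in between.
    synchronized void commit(long timestampAtBegin) {
        if (timestampAtBegin != timestamp) {
            throw new ConcurrentModificationException(
                    "entity group changed since the transaction began; retry");
        }
        timestamp++; // roll the timestamp forward on the root entity
    }

    synchronized long currentTimestamp() {
        return timestamp;
    }

    public static void main(String[] args) {
        OptimisticRootSketch root = new OptimisticRootSketch();
        long txA = root.begin();
        long txB = root.begin();
        root.commit(txA);
        try {
            root.commit(txB); // started before txA committed, so it loses
        } catch (ConcurrentModificationException e) {
            System.out.println("txB must retry: " + e.getMessage());
        }
    }
}
```

No lock is held while a transaction runs, so uncontended writes are cheap, but every overlapping writer to the same group has to redo its work.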



General datastore tips
• Denormalize as much as possible
– As much as possible, treat datastore as a key-value store
(Dictionary or Map like structure)
– Move large reporting to offline processing. This lets you avoid
unnecessary indexes
• Use entity groups for your data
• Build composite indexes where you need them - “need” depends
on shape of your data



Questions?

