Role of data models in the Virtual Observatory, and an overview of IVOA data models. Part of the Virtual Observatory course by Juan de Dios Santander Vela, as taught in the MTAF (Métodos y Técnicas Avanzadas en Física, Advanced Methods and Techniques in Physics) Master at the University of Granada (UGR).
The talk presents a new Data Aggregation System for the CMS experiment at CERN. We use a MongoDB database as a caching layer to query multiple data providers (backed by RDBMSs) and aggregate data across them.
The talk was presented at the ICCS 2010 conference.
Balancing Replication and Partitioning in a Distributed Java Database (Ben Stopford)
This talk, presented at JavaOne 2011, describes the ODC, a distributed, in-memory database built in Java that holds objects in a normalized form in a way that alleviates the traditional degradation in performance associated with joins in shared-nothing architectures. The presentation describes the two patterns that lie at the core of this model. The first is an adaptation of the Star Schema model, used to hold data either replicated or partitioned, depending on whether the data is a fact or a dimension. In the second pattern, the data store tracks arcs on the object graph to ensure that only the minimum amount of data is replicated. Through these mechanisms, almost any join can be performed across the various entities stored in the grid, without the need for key shipping or iterative wire calls.
Mindtree is one of the first IT service providers to invest in emerging technologies and has developed various technology assets. Customers in product engineering services benefit heavily from our domain expertise.
Some of the technology assets developed include short-range wireless connectivity technologies such as Bluetooth and UWB, Video Analytic Algorithms, Acoustic Echo Cancellation, Audio Codecs, VoIP Stacks, etc.
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," keynote given at IEEE Meta-Data, Bethesda, MD, April 6, 1999.
Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and... (JAX London)
2011-11-02 | 02:25 PM - 03:15 PM
In 2009, RBS set out to build a single store of trade and risk data that all applications in the bank could access simultaneously. This talk discusses a number of novel techniques that were developed as part of this work. Based on Oracle Coherence, the ODC departs from the trend set by most caching solutions by holding its data in a normalised form, making it both memory efficient and easy to change. However, it does this in a novel way that supports most arbitrary queries without the usual problems associated with distributed joins. We'll be discussing these patterns as well as others that allow linear scalability, fault tolerance, and millisecond latencies.
Detecting and Recognising Highly Arbitrary Shaped Texts from Product Images (Databricks)
Extracting texts of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts: e-commerce, augmented-reality assistance in natural scenes, content moderation on social media platforms, etc. At the scale at which Walmart operates, the text in a product image can be a richer and more accurate source of data than human input, and can be used in several applications such as attribute extraction, offensive-text classification, and compliance use cases. Accurately extracting text from product images is a challenge, given that product images come with a lot of variation, including small, highly oriented, arbitrarily shaped texts with fancy fonts. Typical word-level text detectors fail to capture these variations, and even when such texts are detected, recognition models without transformation layers fail to recognize and accurately extract highly oriented or arbitrarily shaped texts.
Embracing Observability in CI/CD with OpenTelemetry (Cyrille Le Clerc)
Discover how observability and OpenTelemetry offer new solutions for both CI/CD administrators and dev teams to troubleshoot CI platforms and solve many more problems, thanks to a vibrant community and a growing ecosystem. Using real-life CI/CD pipelines built with Jenkins, Maven, and Ansible, we will see how OpenTelemetry helps troubleshoot software delivery pipelines, and how its open-source, standards-based nature enables a vibrant ecosystem of OpenTelemetry-aware CI/CD tools that observe the entire software supply chain and help DevOps teams solve problems that go well beyond the usual observability use cases.
https://community.cncf.io/events/details/cncf-cloud-native-canada-presents-november-2021-eastern-canadian-cncf-meetup-kubernetes-123-release-update-and-cicd-observability/
Open Source Monitoring for Java with JMX and Graphite (GeeCON 2013) (Cyrille Le Clerc)
Fast feedback from monitoring is key to Continuous Delivery. JMX is the right Java API for it, but it has unfortunately stayed underused and underappreciated because it was difficult to connect to monitoring and graphing systems.
Throw the poor solutions based on log files and weakly secured web interfaces into the sin bin! A new generation of open-source tooling makes it easy to graph Java application metrics and integrate them into traditional monitoring systems like Nagios.
Following the logic of DevOps, we will look at how best to integrate the monitoring dimension into a project: from design to development, to QA, and finally to production, both in traditional deployments and in the Cloud.
Come and discover how the JmxTrans-Graphite duo can make your life easier.
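As a taste of what JMX exposes, here is a minimal, self-contained sketch (standard JDK only; the class and method names are mine) that reads the heap-usage attribute a tool like JmxTrans polls and forwards to Graphite:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxHeapProbe {
    // Reads the JVM's heap usage through the platform MBeanServer --
    // the same JMX attribute a JmxTrans query would poll periodically.
    static long usedHeapBytes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData heap =
                (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) heap.get("used");
    }

    public static void main(String[] args) throws Exception {
        // A metric shipper would send this value to Graphite on a schedule.
        System.out.println("heap.used=" + usedHeapBytes());
    }
}
```

The value is a plain number, which is exactly why graphing tools integrate so easily once the JMX plumbing is in place.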
Similar to GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective
Open Source Monitoring for Java with JmxTrans, Graphite and Nagios - DevoxxFR... (Cyrille Le Clerc)
The fast feedback offered by monitoring is an essential element of good Continuous Delivery practice. The Java ecosystem has a robust component dedicated to this: JMX.
However, the difficulty of connecting JMX to supervision and graphing tools has long hindered its adoption.
Throw away the shaky solutions based on application logs or poorly protected web interfaces, and come discover an open path. A new generation of open-source tools makes it simple to graph your applications' metrics and feed them to a supervision and alerting system.
In a DevOps spirit, we will look together at how to integrate the monitoring dimension into a project: from the design of metrics by developers, to the integration of the needs of Ops and QA teams, in traditional deployments or in the Cloud. JmxTrans, Graphite, and Nagios: this trio can make your life easier; come discover how.
Demo application: http://demo-cocktail.jmxtrans.cloudbees.net
Demo application source code: https://github.com/jmxtrans/embedded-jmxtrans-samples/tree/master/embedded-jmxtrans-webapp-coktail
Embedded JmxTrans: https://github.com/jmxtrans/embedded-jmxtrans
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...) (Cyrille Le Clerc)
In Memory Data Grids in Action with Oracle Coherence, presented to NoSQL users.
The "transactions" chapter is missing, as it was rescheduled to another session.
Production-Ready Java Applications: Best Practices (Cyrille Le Clerc)
Best practices for Java applications that are ready for production.
The stakes:
* Improve application availability
* Shorten the project life cycle
* Improve the platforms
* Reduce operating costs
The key areas:
* Deployment
* Supervision and monitoring
* Log management
* Robustness
* Organization
Cyrille Le Clerc (Xebia), Erwan Alliaume (Xebia), and Jean Michel Bea (FastConnect) presented the principles of the Data Grid to the Paris Java User Group.
Distributed cache, Network Attached Memory, Data Grid, and Cloud Computing are fashionable terms that all belong to the same trend.
During this evening we will present the path that led us from a simple Ehcache to grids of hundreds of gigabytes of data spread across data centers.
DISTRIBUTED CACHES
Distributed caches have become commonplace with the open-source frameworks JBoss Cache and distributed Ehcache. Where do we stand today?
- What are the use cases for a distributed cache? What gains can we expect from it?
- How do we migrate from a local cache to a distributed cache? Are our frameworks suited to these distributed caches?
- How does a distributed cache work?
NETWORK ATTACHED MEMORY
The Network Attached Memory concept took off in the Java world with Terracotta, offering our applications a memory space that was unimaginable not long ago. What lies behind it?
- What are the use cases for Network Attached Memory technologies?
- Doesn't this virtually infinite memory introduce constraints?
- If memory is shared, what about processing?
- What are the prospects for Network Attached Memory technologies?
DATA GRID
The data grid concept was popularized by services such as Google BigTable and Amazon S3, but also by sites like eBay that announce gigantic data centers. Will it reach mainstream IT?
- What is a data grid? How does it work?
- Who needs a data grid? Is it reserved for hyper-scalable sites like eBay or Facebook? How did we manage before? Do I need one?
- How should an application be structured to use a data grid? Does it change the way we program?
- Is MapReduce a pattern usable with a data grid? Is it the only one?
- Will data grids replace traditional databases? How can they coexist?
DATA GRID, CLOUD, AND THE OTHERS
Data Grid, Grid Computing, Cloud Computing, and eXtreme Transaction Processing (XTP) are frequently associated.
How does the Data Grid relate to these technologies?
How are the players in this space positioned? Amazon S3 & EC2? Coherence? GigaSpaces? Google App Engine & BigTable? GridGain? Terracotta? WebSphere eXtreme Scale?
And where do mainframes fit into all this?
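The MapReduce question above can be illustrated with a tiny sketch (all names are mine; plain Java streams stand in for a real grid): each partition computes a local result, and only the small partials cross the network.

```java
import java.util.Arrays;
import java.util.List;

public class GridMapReduce {
    // Hypothetical illustration of the map-reduce pattern on a data grid:
    // each "partition" computes a local sum (map), then the client combines
    // the partial results (reduce), instead of shipping all the data around.
    static int totalInventory(List<List<Integer>> partitions) {
        return partitions.parallelStream()                                   // one task per partition
                .mapToInt(p -> p.stream().mapToInt(Integer::intValue).sum()) // local map + sum
                .sum();                                                      // global reduce
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(3, 5), Arrays.asList(7), Arrays.asList(2, 8));
        System.out.println(totalInventory(partitions)); // prints 25
    }
}
```

Real grids (Coherence, GigaSpaces, eXtreme Scale) run the map step on the nodes that own the data, which is the whole point of co-locating processing and storage.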
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
8. On the Web side
Similar needs for Web giants:
• Huge amount of data
• High availability
• Fault tolerance
• Scalability on commodity hardware
- Amazon (< 40 min of unavailability per year): created Dynamo
- Google (stores every webpage of the Internet): created BigTable & MapReduce
9. Amazon: the birth of Dynamo
Order pipeline: Fill cart → Checkout → Payment → Process order → Prepare → Send
- Filling the cart requires high availability; a key-value store is enough
- The later steps require complex requests; temporary unavailability is acceptable
10. On the Financial side
Needs within financial markets:
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency
- Coherence, released in 2001: started as a distributed cache
- GigaSpaces XAP, released in 2001: routes the request to where the data lives
17. Partitioned Data Modeling
[Diagram: Booking, Passenger (name, reduction), Seat (number, price), Train (code, type), TrainStop (date), TrainStation (code, name)]
Typical relational data model
18. Partitioned Data Modeling
Partitioning-ready entity tree
[Diagram: the same entities arranged as a tree under a root entity (Train), with the sub-entities duplicated in each partition and TrainStation kept as referenced data]
Find the root entity and denormalize
19. Partitioned Data Modeling
Remove unused data
[Diagram: the entity tree again; Booking and Passenger (name, reduction) are dropped, and Seat gains a booked flag alongside number and price]
20. Partitioned Data Modeling
Sharding-ready data structure
[Diagram: Train (code, type) as the root, with Seat (number, price, booked), TrainStop (date), and TrainStation (code, name)]
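The sharding-ready structure of slide 20 can be sketched as plain Java classes. Field names follow the diagram; the class layout and the main method are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Train is the root entity: the whole aggregate (seats, stops) is keyed by
// the train code, so it lives in a single partition and joins stay local.
class Seat {
    int number;
    double price;
    boolean booked;
}

class TrainStop {
    String stationCode; // reference to a TrainStation, duplicated per partition
    String date;
}

public class Train {
    String code; // partition key: all data for one train is co-located
    String type;
    List<Seat> seats = new ArrayList<>();
    List<TrainStop> stops = new ArrayList<>();

    public static void main(String[] args) {
        Train t = new Train();
        t.code = "TGV-1"; // hypothetical train code
        Seat s = new Seat();
        s.number = 12;
        s.price = 45.0;
        t.seats.add(s);
        System.out.println(t.code + " seats=" + t.seats.size());
    }
}
```

Because every query for a given train hits exactly one partition, no cross-node join is ever needed.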
33. Request Driven Data Modeling
• Relational data modeling is business driven
Adaptation to requests comes with tuning
• With partitioning, data modeling has to be adapted to the requests
Because network latency matters
• NoSQL & data grid data modeling is request driven
Two requests may require storing the data twice
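"Storing the data twice" for two request paths can be sketched as follows (all names are illustrative; two in-memory maps stand in for a key-value store):

```java
import java.util.HashMap;
import java.util.Map;

public class RequestDrivenStore {
    // In a request-driven model the same profile is written twice -- once per
    // query path -- because a key-value store can only fetch by key, and a
    // join or secondary-index lookup would cost extra network hops.
    final Map<String, String> byId = new HashMap<>();
    final Map<String, String> byEmail = new HashMap<>();

    void saveProfile(String id, String email, String profileJson) {
        byId.put(id, profileJson);        // serves "GET profile by id"
        byEmail.put(email, profileJson);  // serves "GET profile by email"
    }

    public static void main(String[] args) {
        RequestDrivenStore store = new RequestDrivenStore();
        store.saveProfile("42", "john@doe.com", "{\"name\":\"John\"}");
        System.out.println(store.byEmail.get("john@doe.com"));
    }
}
```

The trade-off is classic denormalization: writes get heavier so that each read stays a single key lookup.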
34. Key-Value Store
Persistence options: in memory; in memory with async persistence; persistent
35. Example with a user profile
johndoe → user profile as byte[]
Similar to a Java HashMap
36. Write Example with Riak
RiakClient riak = new RiakClient("http://server1:8098/riak");
RiakObject userProfileObj =
    new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));
riak.store(userProfileObj);
Inserts a user profile into Riak
37. Read Example with Riak
FetchResponse response = riak.fetch("bucket", "johndoe");
if (response.hasObject()) {
userProfileObj = response.getObject();
}
Fetches a user profile by its key from Riak
39. Column Families Store
For each row ID we have a list of key-value pairs
Key-value pairs are sorted by keys
[Diagram: relational DB vs. column families DB]
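An in-memory analogy of a column family (illustrative only; a TreeMap per row gives the sorted-columns property the slide describes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ColumnFamilySketch {
    // Each row key maps to a column map kept sorted by column name --
    // this sorting is what makes slice queries (ranges of columns) cheap.
    final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns the columns in [from, to) for a row, like a Cassandra slice.
    SortedMap<String, String> slice(String rowKey, String from, String to) {
        return rows.getOrDefault(rowKey, new TreeMap<>()).subMap(from, to);
    }

    public static void main(String[] args) {
        ColumnFamilySketch cf = new ColumnFamilySketch();
        cf.put("johndoe", "17:21", "Iphone");
        cf.put("johndoe", "17:32", "DVD Player");
        cf.put("johndoe", "17:44", "MacBook");
        System.out.println(cf.slice("johndoe", "17:00", "17:40"));
    }
}
```

This mirrors the shopping-cart example on the next slide: the row key is the user, the column names are timestamps, and a slice fetches a time range.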
40. Example with a shopping cart
johndoe:   17:21 → Iphone, 17:32 → DVD Player, 17:44 → MacBook
willsmith: 6:10 → Camera, 8:29 → Ipad
pitdavis:  14:45 → PlayStation, 15:01 → Asus EEE, 15:03 → Iphone
41. Write Example with Cassandra
Cluster cluster =
HFactory.getOrCreateCluster("cluster", new CassandraHostConfigurator("server1:9160"));
Keyspace keyspace = HFactory.createKeyspace("EcommerceKeyspace", cluster);
Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
mutator.insert("johndoe", "ShoppingCartColumnFamily",
HFactory.createStringColumn("14:21", "Iphone"));
Inserts a column into the ShoppingCartColumnFamily
42. Read Example with Cassandra
SliceQuery<String, String, String> query =
HFactory.createSliceQuery(keyspace,
stringSerializer, stringSerializer, stringSerializer);
query.setColumnFamily("ShoppingCartColumnFamily")
.setKey("johndoe")
.setRange("", "", false, 10);
QueryResult<ColumnSlice<String, String>> result = query.execute();
Reads a slice of 10 columns from the ShoppingCartColumnFamily
44. Example with an item of a catalog
item_1 →
{
  "name": "Iphone",
  "price": 559.0,
  "vendor": "Apple",
  "rating": 4.6,
  "tags": [ "phone", "touch" ]
}
The database is aware of the document's fields and can offer complex queries
45. Write Example with MongoDB
Mongo mongo = new Mongo("mongos_1", 27017);
DB db = mongo.getDB("Ecommerce");
DBCollection catalog = db.getCollection("Catalog");
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Iphone");
doc.put("price", 559.0);
catalog.insert(doc);
Inserts an item document into MongoDB
46. Read Example with MongoDB
BasicDBObject query = new BasicDBObject();
query.put("price", new BasicDBObject("$lt", 600));
DBCursor cursor = catalog.find(query);
while(cursor.hasNext()) {
System.out.println(cursor.next());
}
Queries for all items with a price lower than 600
48. Example with train booking with IBM eXtreme Scale
@Entity(schemaRoot=true)
public class Train {
    @Id
    String code;

    @Index
    @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;
    ...
}
[Diagram: Seat (number, price, booked), Train (code, type), TrainStop (date)]
With Data Grids, sub-entities can have cross relations
49. Write Example with IBM eXtreme Scale
eXtreme Scale provides a JPA-style API
void persist(Train train) {
    entityManager.persist(train);
}
Inserts a train into eXtreme Scale
50. Read Example with IBM eXtreme Scale
/** Find by key */
Train findById(String id) {
return (Train) entityManager.find(Train.class, id);
}
/** Query Language */
Train findByTrain(String code) {
Query q = entityManager.createQuery("select t from Train t where t.code=:code");
q.setParameter("code", code);
return (Train) q.getSingleResult();
}
Simple and complex queries with eXtreme Scale
51. More APIs
• Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data
• A unified API on top of relational, document, column, and key-value stores?
• An object-to-tuple projection API
64. Transactions with Manual Compensation
• Code "do" & "undo" & chain execution
• What about interrupted chain execution? Data corruption?
65. Transactions with Manual Compensation
• Code "do" & "undo" & chain execution
• What about interrupted chain execution? Data corruption?
• Data-store-managed transaction chain execution
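The do/undo chain of slide 64 can be sketched as follows (interface and class names are mine). Note that a crash between a completed step and the undo replay still corrupts data, which is exactly why slide 65 moves chain execution into the data store:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CompensationChain {
    // Each completed step pushes itself onto an undo stack; if a later step
    // fails, the compensating actions are replayed in reverse order.
    interface Step {
        void doIt();
        void undoIt();
    }

    static boolean run(Step... steps) {
        Deque<Step> done = new ArrayDeque<>();
        try {
            for (Step s : steps) {
                s.doIt();
                done.push(s);
            }
            return true; // whole chain succeeded
        } catch (RuntimeException e) {
            // Compensate completed steps in reverse order.
            while (!done.isEmpty()) {
                done.pop().undoIt();
            }
            return false;
        }
    }
}
```

If the process dies mid-chain, the in-memory undo stack is lost; a store-managed chain survives because the pending compensations are persisted alongside the data.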
67. Key-Value Store
• Get and set by key
Simple, but enough for a lot of use cases
• Riak and Voldemort provide great scalability
Great for persisting continuously growing datasets
• Memcached and Redis offer low overhead and latency
Great for caches and live data
68. Column Families Store
• Get and set by key of a list of columns
Makes it possible to fetch and update partial data
• Queries are simple, but column slice fetching is possible
Great for pagination
• The data model is too low level for much complex data modeling
Should typically be used for the largest scalability needs
69. Document Store
• Schemaless
Great for continuously evolving schemas
• Complex queries are available
Necessary for filtering and search
• Scalability may be limited when not querying by partition key
Can be handled using multiple stores and limited queries
70. In Memory Data Grid
• Very low latency & eXtreme Transaction Processing (XTP)
Investment banking, booking & inventory systems
• In memory, no persistence
Most of the time backed by a database
• High budget and developer skills required
Some open-source alternatives are appearing
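"Backed by a database" is commonly done as a read-through cache. A minimal sketch (illustrative names; a `Function` stands in for the backing database):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReadThroughCache {
    // Misses fall through to a loader (the database); hits stay in memory.
    final Map<String, String> memory = new HashMap<>();
    final Function<String, String> database; // stands in for the backing store
    int databaseReads = 0;

    ReadThroughCache(Function<String, String> database) {
        this.database = database;
    }

    String get(String key) {
        return memory.computeIfAbsent(key, k -> {
            databaseReads++;               // only misses touch the database
            return database.apply(k);
        });
    }

    public static void main(String[] args) {
        ReadThroughCache cache = new ReadThroughCache(k -> "profile-of-" + k);
        cache.get("user:1"); // miss: loaded from the "database"
        cache.get("user:1"); // hit: served from memory
        System.out.println("database reads: " + cache.databaseReads);
    }
}
```

Grid products pair this with write-through or write-behind so the database stays the durable copy while the grid absorbs the read load.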
71. Polyglot storage for eCommerce
One application, one store per need:
- Product search: Solr
- Product catalog: MongoDB
- User account and shopping cart: Cassandra
- Warehouse inventory: Coherence
72. Why NoSQL & Data Grids matter
• Polyglot storage: databases that fit the needs of every type of data
• Linear scalability: being able to handle any further business requirements
• High availability: multi-server and multi-datacenter
• Elasticity: natural integration with the Cloud Computing philosophy
• Some new use cases are now within reach