DEVELOPING POLYGLOT
PERSISTENCE APPLICATIONS
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
@crichardson
Presentation goal
The benefits and drawbacks of
polyglot persistence
and
How to design applications that
use this approach
@crichardson
About Chris
@crichardson
(About Chris)
@crichardson
About Chris()
@crichardson
About Chris
@crichardson
About Chris
http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
@crichardson
About Chris
@crichardson
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
@crichardson
Food to Go
• Take-out food delivery
service
• “Launched” in 2006
@crichardson
FoodTo Go Architecture
Order
taking
Restaurant
Management
MySQL
Database
CONSUMER
RESTAURANT
OWNER
@crichardson
Success Growth challenges
• Increasing traffic
• Increasing data volume
• Distribute across a few data centers
• Increasing domain model complexity
@crichardson
Limitations of relational
databases
• Scalability
• Distribution
• Schema updates
• O/R impedance mismatch
• Handling semi-structured data
@crichardson
Solution: Spend $$$ - high-end
databases and servers
http://www.powerandmotoryacht.com/megayachts/megayacht-musashi
@crichardson
Solution: Spend $$$ - open-
source stack + DevOps people
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
@crichardson
Solution: Use NoSQL
Benefits
• Higher performance
• Higher scalability
• Richer data-model
• Schema-less
Drawbacks
• Limited transactions
• Limited querying
• Relaxed consistency
• Unconstrained data
@crichardson
Example NoSQL Databases
Database Key features
Cassandra
Extensible column store,
very scalable, distributed
MongoDB
Document-oriented, fast,
scalable
Redis Key-value store, very fast
http://nosql-database.org/ lists 122+ NoSQL
databases
@crichardson
Redis: Fast and Scalable
Redis
master
Redis slave Redis slave
File
system
Client
In-memory
= FAST
Durable
Redis
master
Redis slave Redis slave
File
system
Consistent Hashing
@crichardson
Redis: advanced key/value store
K1 V1
K2 V2
... ...
String SET, GET
List LPUSH, LPOP, ...
Set SADD, SMEMBERS, ..
Sorted Set
Hashes
ZADD, ZRANGE
HSET, HGET
@crichardson
Sorted sets
myset
a
5.0
b
10.
Key
Value
ScoreMembers are
sorted by score
@crichardson
Adding members to a sorted set
Redis Server
zadd myset 5.0 a myset
a
5.0
Key Score Value
@crichardson
Adding members to a sorted set
Redis Server
zadd myset 10.0 b myset
a
5.0
b
10.
@crichardson
Adding members to a sorted set
Redis Server
zadd myset 1.0 c myset
a
5.0
b
10.
c
1.0
@crichardson
Retrieving members by index range
Redis Server
zrange myset 0 1
myset
a
5.0
b
10.
c
1.0
ac
Key Start
Index
End
Index
@crichardson
Retrieving members by score
Redis Server
zrangebyscore myset 1 6
myset
a
5.0
b
10.
c
1.0
ac
Key
Min
value
Max
value
@crichardson
Redis use cases
• Replacement for Memcached
• Session state
• Cache of data retrieved from
system of record (SOR)
• Replica of SOR for queries
needing high-performance
• Handling tasks that overload an RDBMS
• Hit counts - INCR
• Most recent N items - LPUSH and
LTRIM
• Randomly selecting an item –
SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets and
ZINCRBY
• …
@crichardson
The future is polyglot
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
e.g. Netflix
• RDBMS
• SimpleDB
• Cassandra
• Hadoop/Hbase
@crichardson
Benefits
RDBMS
ACID transactions
Rich queries
...
+
NoSQL
Performance
Scalability
...
@crichardson
Drawbacks
Complexity
Development
Operational
@crichardson
Polyglot Pattern: Cache
Order
taking
Restaurant
Management
MySQL
Database
CONSUMER RESTAURANT
OWNER
Redis
Cache
First
Second
@crichardson
Redis:Timeline
Polyglot Pattern: write to SQL and NoSQL
Follower 1
Follower 2
Follower 3 ...
LPUSHX
LTRIM
MySQL
Tweets
Followers
INSERT
NEW
TWEET
Follows
Avoids
expensive
MySQL joins
@crichardson
Polyglot Pattern: replicate from
MySQL to NoSQL
Query
Service
Update
Service
MySQL
Database
Reader Writer
Redis
System
of
RecordMaterialized
view query() update()
replicate
and
denormalize
@crichardson
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
@crichardson
Food to Go – Domain model (partial)
class Restaurant {
long id;
String name;
Set<String> serviceArea;
Set<TimeRange> openingHours;
List<MenuItem> menuItems;
}
class MenuItem {
String name;
double price;
}
class TimeRange {
long id;
int dayOfWeek;
int openTime;
int closeTime;
}
@crichardson
Database schema
ID Name …
1 Ajanta
2 Montclair Eggshop
Restaurant_id zipcode
1 94707
1 94619
2 94611
2 94619
Restaurant_id dayOfWeek openTime closeTime
1 Monday 1130 1430
1 Monday 1730 2130
2 Tuesday 1130 …
RESTAURANT table
RESTAURANT_ZIPCODE table
RESTAURANT_TIME_RANGE table
@crichardson
RestaurantRepository
public interface RestaurantRepository {
void addRestaurant(Restaurant restaurant);
Restaurant findById(long id);
...
}
Food To Go will have scaling
eventually issues
@crichardson
Increase scalability by caching
Order
taking
Restaurant
Management
MySQL
Database
CONSUMER
RESTAURANT
OWNER
Cache
@crichardson
Caching Options
• Where:
• Hibernate 2nd level cache
• Explicit calls from application code
• Caching aspect
• Cache technologies: Ehcache, Memcached, Infinispan, ...
Redis is also an option
@crichardson
Using Redis as a cache
• Spring 3.1 cache abstraction
• Annotations specify which methods to cache
• CacheManager - pluggable back-end cache
• Spring Data for Redis
• Simplifies the development of Redis applications
• Provides RedisTemplate (analogous to JdbcTemplate)
• Provides RedisCacheManager
@crichardson
Using Spring 3.1 Caching
@Service
public class RestaurantManagementServiceImpl implements RestaurantManagementService {
private final RestaurantRepository restaurantRepository;
@Autowired
public RestaurantManagementServiceImpl(RestaurantRepository restaurantRepository) {
this.restaurantRepository = restaurantRepository;
}
@Override
public void add(Restaurant restaurant) {
restaurantRepository.add(restaurant);
}
@Override
@Cacheable(value = "Restaurant")
public Restaurant findById(int id) {
return restaurantRepository.findRestaurant(id);
}
@Override
@CacheEvict(value = "Restaurant", key="#restaurant.id")
public void update(Restaurant restaurant) {
restaurantRepository.update(restaurant);
}
Cache result
Evict from
cache
@crichardson
Configuring the Redis Cache
Manager
	 <cache:annotation-driven />
	 <bean id="cacheManager"
class="org.springframework.data.redis.cache.RedisCacheManager" >
	 	 <constructor-arg ref="restaurantTemplate"/>
	 </bean>
Enables caching
Specifies CacheManager
implementation
The RedisTemplate used
to access Redis
@crichardson
TimeRange MenuItem
Domain object to key-value
mapping?
Restaurant
TimeRange MenuItem
ServiceArea
K1 V1
K2 V2
... ...
@crichardson
RedisTemplate handles the
mapping
• Principal API provided by Spring Data to Redis
• Analogous to JdbcTemplate
• Encapsulates boilerplate code, e.g. connection management
• Maps Java objects Redis byte[]’s
@crichardson
Serializers: object byte[]
• RedisTemplate has multiple serializers
• DefaultSerializer - defaults to JdkSerializationRedisSerializer
• KeySerializer
• ValueSerializer
• ...
@crichardson
Serializing a Restaurant as JSON
@Configuration
public class RestaurantManagementRedisConfiguration {
@Autowired
private RestaurantObjectMapperFactory restaurantObjectMapperFactory;
private JacksonJsonRedisSerializer<Restaurant> makeRestaurantJsonSerializer() {
JacksonJsonRedisSerializer<Restaurant> serializer =
new JacksonJsonRedisSerializer<Restaurant>(Restaurant.class);
...
return serializer;
}
@Bean
@Qualifier("Restaurant")
public RedisTemplate<String, Restaurant> restaurantTemplate(RedisConnectionFactory factory) {
RedisTemplate<String, Restaurant> template = new RedisTemplate<String, Restaurant>();
template.setConnectionFactory(factory);
JacksonJsonRedisSerializer<Restaurant> jsonSerializer = makeRestaurantJsonSerializer();
template.setValueSerializer(jsonSerializer);
return template;
}
}
Serialize restaurants using Jackson
JSON
@crichardson
Caching with Redis
Order
taking
Restaurant
Management
MySQL
Database
CONSUMER
RESTAURANT
OWNER
Redis
Cache
First Second
@crichardson
Storing restaurants in Redis
...JSON...restaurant:1
@crichardson
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
@crichardson
Finding available restaurants
Available restaurants =
Serve the zip code of the delivery address
AND
Are open at the delivery time
public interface AvailableRestaurantRepository {
List<AvailableRestaurant>
	

 findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
...
}
@crichardson
Finding available restaurants on Monday, 6.15pm
for 94619 zipcode
Straightforward three-way join
select r.*
from restaurant r
inner join restaurant_time_range tr
on r.id =tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
where ’94619’ = sa.zip_code
and tr.day_of_week=’monday’
and tr.openingtime <= 1815
and 1815 <= tr.closingtime
@crichardson
How to scale queries?
@crichardson
Option #1: Query caching
• [ZipCode, DeliveryTime] ⇨ list of available restaurants
BUT
• Long tail queries
• Update restaurant ⇨ Flush entire cache
Ineffective
@crichardson
Option #2: Master/Slave replication
MySQL
Master
MySQL
Slave 1
MySQL
Slave 2
MySQL
Slave N
Writes Consistent reads
Queries
(Inconsistent reads)
@crichardson
Master/Slave replication
• Mostly straightforward
BUT
• Assumes that SQL query is efficient
• Complexity of administration of slaves
• Doesn’t scale writes
@crichardson
Option #3: Redis materialized
views
Order
taking
Restaurant
Management
MySQL
Database
CONSUMER
RESTAURANT
OWNER
CacheRedis
System
of
RecordCopy
findAvailable()
update()
@crichardson
BUT how to implement findAvailableRestaurants()
with Redis?!
?
select r.*
from restaurant r
inner join restaurant_time_range tr
on r.id =tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
where ’94619’ = sa.zip_code
and tr.day_of_week=’monday’
and tr.openingtime <= 1815
and 1815 <= tr.closingtime
K1 V1
K2 V2
... ...
@crichardson
Solution: Build an index using sorted
sets and ZRANGEBYSCORE
ZRANGEBYSCORE myset 1 6
select value
from sorted_set
where key = ‘myset’
and score >= 1
and score <= 6
=
key value score
sorted_set
@crichardson
select r.*
from restaurant r
inner join restaurant_time_range tr
on r.id =tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
where ’94619’ = sa.zip_code
and tr.day_of_week=’monday’
and tr.openingtime <= 1815
and 1815 <= tr.closingtime
select value
from sorted_set
where key = ?
and score >= ?
and score <= ?
?
How to transform the SELECT
statement?
@crichardson
We need to denormalize
Think materialized view
@crichardson
Simplification #1:
Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’
AND zip_code = 94619
AND 1815 < close_time
AND open_time < 1815
Simpler query:
§ No joins
§ Two = and two <
@crichardson
Simplification #2:Application
filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’
AND zip_code = 94619
AND 1815 < close_time
AND open_time < 1815
Even simpler query
• No joins
• Two = and one <
@crichardson
Simplification #3: Eliminate multiple =’s with
concatenation
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE zip_code_day_of_week = ‘94619:Monday’
AND 1815 < close_time
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
key
range
@crichardson
Simplification #4: Eliminate multiple RETURN
VALUES with concatenation
SELECT open_time_restaurant_id,
FROM time_range_zip_code
WHERE zip_code_day_of_week = ‘94619:Monday’
AND 1815 < close_time
zip_dow open_time_restaurant_id close_time
94707:Monday 1130_1 1430
94619:Monday 1130_1 1430
94707:Monday 1730_1 2130
94619:Monday 1730_1 2130
94619:Monday 0700_2 1430
...
✔
@crichardson
zip_dow open_time_restaurant_id close_time
94707:Monday 1130_1 1430
94619:Monday 1130_1 1430
94707:Monday 1730_1 2130
94619:Monday 1730_1 2130
94619:Monday 0700_2 1430
...
Sorted Set [ Entry:Score, …]
[1130_1:1430, 1730_1:2130]
[0700_2:1430, 1130_1:1430, 1730_1:2130]
Using a Redis sorted set as an index
Key
94619:Monday
94707:Monday
94619:Monday 0700_2 1430
@crichardson
Querying with ZRANGEBYSCORE
ZRANGEBYSCORE 94619:Monday 1815 2359
è
{1730_1}
1730 is before 1815 è Ajanta is open
Delivery timeDelivery zip and day
Sorted Set [ Entry:Score, …]
[1130_1:1430, 1730_1:2130]
[0700_2:1430, 1130_1:1430, 1730_1:2130]
Key
94619:Monday
94707:Monday
@crichardson
Adding a Restaurant
Text
@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {
@Override
public void add(Restaurant restaurant) {
addRestaurantDetails(restaurant);
addAvailabilityIndexEntries(restaurant);
}
private void addRestaurantDetails(Restaurant restaurant) {
restaurantTemplate.opsForValue().set(keyFormatter.key(restaurant.getId()), restaurant);
}
private void addAvailabilityIndexEntries(Restaurant restaurant) {
for (TimeRange tr : restaurant.getOpeningHours()) {
String indexValue = formatTrId(restaurant, tr);
int dayOfWeek = tr.getDayOfWeek();
int closingTime = tr.getClosingTime();
for (String zipCode : restaurant.getServiceArea()) {
redisTemplate.opsForZSet().add(closingTimesKey(zipCode, dayOfWeek), indexValue,
closingTime);
}
}
}
key member
score
Store as
JSON
@crichardson
Finding available Restaurants
@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {
@Override
public List<AvailableRestaurant>
findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
String zipCode = deliveryAddress.getZip();
int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
String closingTimesKey = closingTimesKey(zipCode, dayOfWeek);
Set<String> trsClosingAfter =
redisTemplate.opsForZSet().rangeByScore(closingTimesKey, timeOfDay, 2359);
Set<String> restaurantIds = new HashSet<String>();
for (String tr : trsClosingAfter) {
String[] values = tr.split("_");
if (Integer.parseInt(values[0]) <= timeOfDay)
restaurantIds.add(values[1]);
}
Collection<String> keys = keyFormatter.keys(restaurantIds);
return availableRestaurantTemplate.opsForValue().multiGet(keys);
}
Find those that
close after
Filter out those that
open after
Retrieve open
restaurants
@crichardson
Representing restaurants in Redis
...JSON...restaurant:1
94619:Monday [0700_2:1430, 1130_1:1430, ...]
94619:Tuesday [0700_2:1430, 1130_1:1430, ...]
@crichardson
NoSQL Denormalized
representation for each query
@crichardson
SorryTed!
http://en.wikipedia.org/wiki/Edgar_F._Codd
@crichardson
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
@crichardson
MySQL & Redis
need to be consistent
@crichardson
Two-Phase commit is not an
option
• Redis does not support it
• Even if it did, 2PC is best avoided http://www.infoq.com/articles/ebay-scalability-best-practices
@crichardson
Atomic
Consistent
Isolated
Durable
Basically Available
Soft state
Eventually consistent
BASE:An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128
@crichardson
Updating Redis #FAIL
begin MySQL transaction
update MySQL
update Redis
rollback MySQL transaction
Redis has update
MySQL does not
begin MySQL transaction
update MySQL
commit MySQL transaction
<<system crashes>>
update Redis
MySQL has update
Redis does not
@crichardson
Updating Redis reliably
Step 1 of 2
begin MySQL transaction
update MySQL
queue CRUD event in MySQL
commit transaction
ACID
Event Id
Operation: Create, Update, Delete
New entity state, e.g. JSON
@crichardson
Updating Redis reliably
Step 2 of 2
for each CRUD event in MySQL queue
get next CRUD event from MySQL queue
If CRUD event is not duplicate then
Update Redis (incl. eventId)
end if
begin MySQL transaction
mark CRUD event as processed
commit transaction
@crichardson
ID JSON processed?
ENTITY_CRUD_EVENT
EntityCrudEvent
Processor
Redis
Updater
apply(event)
Step 1 Step 2
EntityCrudEvent
Repository
INSERT INTO ...
Timer
SELECT ... FROM ...
Redis
@crichardson
Updating Redis
WATCH restaurant:lastSeenEventId:≪restaurantId≫
Optimistic
locking
Duplicate
detection
lastSeenEventId = GET restaurant:lastSeenEventId:≪restaurantId≫
if (lastSeenEventId >= eventId) return;
Transaction
MULTI
SET restaurant:lastSeenEventId:≪restaurantId≫ eventId
... update the restaurant data...
EXEC
@crichardson
Representing restaurants in Redis
restaurant:lastSeenEventId:1 55 duplicate
detection
...JSON...restaurant:1
94619:Monday [0700_2:1430, 1130_1:1430, ...]
94619:Tuesday [0700_2:1430, 1130_1:1430, ...]
@crichardson
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
@crichardson
How do we generate CRUD
events?
@crichardson
Change tracking options
• Explicit code
• Hibernate event listener
• Service-layer aspect
• CQRS/Event-sourcing
@crichardson
ID JSON processed?
ENTITY_CRUD_EVENT
EntityCrudEvent
Repository
HibernateEvent
Listener
@crichardson
Hibernate event listener
public class ChangeTrackingListener
implements PostInsertEventListener, PostDeleteEventListener, PostUpdateEventListener {
@Autowired
private EntityCrudEventRepository entityCrudEventRepository;
private void maybeTrackChange(Object entity, EntityCrudEventType eventType) {
if (isTrackedEntity(entity)) {
entityCrudEventRepository.add(new EntityCrudEvent(eventType, entity));
}
}
@Override
public void onPostInsert(PostInsertEvent event) {
Object entity = event.getEntity();
maybeTrackChange(entity, EntityCrudEventType.CREATE);
}
@Override
public void onPostUpdate(PostUpdateEvent event) {
Object entity = event.getEntity();
maybeTrackChange(entity, EntityCrudEventType.UPDATE);
}
@Override
public void onPostDelete(PostDeleteEvent event) {
Object entity = event.getEntity();
maybeTrackChange(entity, EntityCrudEventType.DELETE);
}
@crichardson
Summary
• Each SQL/NoSQL database = set of tradeoffs
• Polyglot persistence: leverage the strengths of SQL and
NoSQL databases
• Use Redis as a distributed cache
• Store denormalized data in Redis for fast querying
• Reliable database synchronization required
@crichardson
Questions?
@crichardson chris@chrisrichardson.net
http://plainoldobjects.com

Developing polyglot persistence applications (svcc, svcc2013)