NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together.
In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.
1. DEVELOPING POLYGLOT PERSISTENCE APPLICATIONS
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson chris.richardson@springsource.com
http://plainoldobjects.com
2. Presentation goal
The benefits and drawbacks of polyglot persistence
and
how to design applications that use this approach
@crichardson
8. vmc push About-Chris
Developer Advocate for CloudFoundry.com
Sign up at http://cloudfoundry.com
9. Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
10. Food to Go
• Take-out food delivery service
• “Launched” in 2006
11. Food To Go Architecture
[Diagram: Consumer → Order taking; Restaurant Owner → Restaurant Management; both modules use a single MySQL database]
12. Success brings growth challenges
• Increasing traffic
• Increasing data volume
• Distribution across a few data centers
• Increasing domain model complexity
13. Limitations of relational databases
• Scalability
• Distribution
• Schema updates
• O/R impedance mismatch
• Handling semi-structured data
16. Example NoSQL Databases
Database    Key features
Cassandra   Extensible column store, very scalable, distributed
Neo4j       Graph database
MongoDB     Document-oriented, fast, scalable
Redis       Key-value store, very fast
http://nosql-database.org/ lists 122+ NoSQL databases
17. Redis
• Advanced key-value store
• C-based server
• Very fast, e.g. 100K reqs/sec
• Optional persistence
• Transactions with optimistic locking
• Master-slave replication
• Sharding using client-side consistent hashing
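Client-side consistent hashing can be pictured as a hash ring: each Redis server is placed at several points on the ring, and a key belongs to the first server point at or after the key's hash, wrapping around at the end. The sketch below is a minimal illustration under stated assumptions, not the algorithm of any particular Redis client: the class name, the server names, the use of CRC32 and the number of virtual nodes are all hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of client-side consistent hashing (not a real Redis client API).
public final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(List<String> servers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String server : servers) {
            addServer(server);
        }
    }

    // CRC32 is an assumption; any well-distributed hash function would do.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each server occupies several points ("virtual nodes") to spread keys more evenly.
    public void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    public void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            // Remove the point only if it still belongs to this server.
            ring.remove(hash(server + "#" + i), server);
        }
    }

    // A key belongs to the first server point at or after its hash, wrapping around.
    public String serverFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(List.of("redis-a", "redis-b", "redis-c"), 100);
        for (String key : List.of("restaurant:1", "restaurant:2")) {
            System.out.println(key + " -> " + ring.serverFor(key));
        }
    }
}
```

The point of the ring, compared with `hash(key) % serverCount`, is that removing a server only remaps the keys that lived on that server; every other key keeps its server.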
18. Sorted sets
Key myset ⇨ members a (score 5.0) and b (score 10.0)
Each member has a score; members are sorted by score
19. Adding members to a sorted set
zadd myset 5.0 a ⇨ myset: [a:5.0]
(arguments: key, score, member)
20. Adding members to a sorted set
zadd myset 10.0 b ⇨ myset: [a:5.0, b:10.0]
21. Adding members to a sorted set
zadd myset 1.0 c ⇨ myset: [c:1.0, a:5.0, b:10.0]
22. Retrieving members by index range
zrange myset 0 1 ⇨ c, a
(arguments: key, start index, end index; myset is [c:1.0, a:5.0, b:10.0])
23. Retrieving members by score
zrangebyscore myset 1 6 ⇨ c, a
(arguments: key, min score, max score; myset is [c:1.0, a:5.0, b:10.0])
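To make the semantics of ZADD, ZRANGE and ZRANGEBYSCORE concrete, here is a toy in-memory model of a single sorted set. It is an illustration under simplifying assumptions (non-negative indexes only, inclusive score bounds, ties ordered by member name as Redis does), not how Redis implements sorted sets; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy model of one Redis sorted set, for illustration only.
public class ToySortedSet {
    private final Map<String, Double> scores = new HashMap<>();
    // Ordered by score, then by member name (Redis breaks score ties lexicographically).
    private final TreeSet<String> ordered = new TreeSet<>(
            Comparator.comparingDouble((String m) -> scores.get(m))
                      .thenComparing(Comparator.<String>naturalOrder()));

    // ZADD key score member: adds the member, or updates the score of an existing member.
    public void zadd(double score, String member) {
        if (scores.containsKey(member)) {
            ordered.remove(member); // remove while the old score still determines its position
        }
        scores.put(member, score);
        ordered.add(member);
    }

    // ZRANGE key start stop: members at indexes start..stop (inclusive), in score order.
    public List<String> zrange(int start, int stop) {
        List<String> all = new ArrayList<>(ordered);
        int end = Math.min(stop, all.size() - 1);
        return start > end ? new ArrayList<>() : new ArrayList<>(all.subList(start, end + 1));
    }

    // ZRANGEBYSCORE key min max: members with min <= score <= max, in score order.
    public List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (String m : ordered) {
            double s = scores.get(m);
            if (s >= min && s <= max) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToySortedSet myset = new ToySortedSet();
        myset.zadd(5.0, "a");
        myset.zadd(10.0, "b");
        myset.zadd(1.0, "c");
        System.out.println(myset.zrange(0, 1));        // [c, a]
        System.out.println(myset.zrangeByScore(1, 6)); // [c, a]
    }
}
```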
24. Redis use cases
• Replacement for Memcached
• Session state
• Cache of data retrieved from the system of record (SOR)
• Replica of SOR for queries needing high performance
• Handling tasks that overload an RDBMS
• Hit counts – INCR
• Most recent N items – LPUSH and LTRIM
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets and ZINCRBY
• …
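The "most recent N items" idiom pushes each new item onto the head of a list and then trims the list to its first N elements (LPUSH key item, then LTRIM key 0 N-1). The toy model below mimics those two commands with a Java `ArrayDeque`; it is a hypothetical illustration of the semantics, not Redis client code, and the class name and sample items are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory model of the LPUSH + LTRIM "most recent N items" idiom.
public class RecentItems {
    private final Deque<String> list = new ArrayDeque<>();
    private final int maxItems;

    public RecentItems(int maxItems) {
        this.maxItems = maxItems;
    }

    // LPUSH key item, then LTRIM key 0 (maxItems - 1)
    public void add(String item) {
        list.addFirst(item);             // LPUSH: the newest item is at index 0
        while (list.size() > maxItems) { // LTRIM: keep only indexes 0..maxItems-1
            list.removeLast();
        }
    }

    // LRANGE key 0 -1: newest first
    public List<String> items() {
        return new ArrayList<>(list);
    }

    public static void main(String[] args) {
        RecentItems recent = new RecentItems(3);
        for (String order : new String[] {"order-1", "order-2", "order-3", "order-4"}) {
            recent.add(order);
        }
        System.out.println(recent.items()); // [order-4, order-3, order-2]
    }
}
```

Trimming after every push keeps the list bounded, so memory use stays constant, which matters because Redis data must fit in memory.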
25. Redis is great but there are tradeoffs
• Low-level query language: PK-based access only
• Limited transaction model:
  • Read first and then execute updates as a batch
  • Difficult to compose code
• Data must fit in memory
• Single-threaded server: run multiple servers with client-side sharding
• Missing features such as access control, ...
27. The future is polyglot
e.g. Netflix
• RDBMS
• SimpleDB
• Cassandra
• Hadoop/HBase
IEEE Software, September/October 2010 – Debasish Ghosh (Twitter @debasishg)
28. Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
29. Food to Go – Domain model (partial)
import java.util.List;
import java.util.Set;
class Restaurant {
  long id;
  String name;
  Set<String> serviceArea;
  Set<TimeRange> openingHours;
  List<MenuItem> menuItems;
}
class TimeRange {
  long id;
  int dayOfWeek;
  int openTime;
  int closeTime;
}
class MenuItem {
  String name;
  double price;
}
31. RestaurantRepository
public interface RestaurantRepository {
  void addRestaurant(Restaurant restaurant);
  Restaurant findById(long id);
  ...
}
Food To Go will eventually have scaling issues
32. Increase scalability by caching
[Diagram: the Food To Go architecture with a cache added in front of the MySQL database]
33. Caching Options
• Where:
  • Hibernate 2nd level cache
  • Explicit calls from application code
  • Caching aspect
• Cache technologies: Ehcache, Memcached, Infinispan, ...
Redis is also an option
34. Using Redis as a cache
• Spring 3.1 cache abstraction
  • Annotations specify which methods to cache
  • CacheManager: pluggable back-end cache
• Spring Data for Redis
  • Simplifies the development of Redis applications
  • Provides RedisTemplate (analogous to JdbcTemplate)
  • Provides RedisCacheManager
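35. Using Spring 3.1 Caching
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class RestaurantManagementServiceImpl implements RestaurantManagementService {

  private final RestaurantRepository restaurantRepository;

  @Autowired
  public RestaurantManagementServiceImpl(RestaurantRepository restaurantRepository) {
    this.restaurantRepository = restaurantRepository;
  }

  @Override
  public void add(Restaurant restaurant) {
    restaurantRepository.addRestaurant(restaurant);
  }

  // Cache the result; the cache key defaults to the id argument.
  // id is a long so that the key has the same type (Long) as #restaurant.id below.
  @Override
  @Cacheable(value = "Restaurant")
  public Restaurant findById(long id) {
    return restaurantRepository.findById(id);
  }

  // Evict the stale entry from the cache when the restaurant is updated
  @Override
  @CacheEvict(value = "Restaurant", key = "#restaurant.id")
  public void update(Restaurant restaurant) {
    restaurantRepository.update(restaurant);
  }
}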
36. Configuring the Redis Cache Manager
<!-- Enables caching -->
<cache:annotation-driven />

<!-- Specifies the CacheManager implementation -->
<bean id="cacheManager"
      class="org.springframework.data.redis.cache.RedisCacheManager">
  <!-- The RedisTemplate used to access Redis -->
  <constructor-arg ref="restaurantTemplate"/>
</bean>
38. RedisTemplate handles the mapping
• Principal API provided by Spring Data for Redis
• Analogous to JdbcTemplate
• Encapsulates boilerplate code, e.g. connection management
• Maps Java objects to and from Redis byte[]s
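40. Serializing a Restaurant as JSON
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.JacksonJsonRedisSerializer;

@Configuration
public class RestaurantManagementRedisConfiguration {

  @Autowired
  private RestaurantObjectMapperFactory restaurantObjectMapperFactory;

  private JacksonJsonRedisSerializer<Restaurant> makeRestaurantJsonSerializer() {
    JacksonJsonRedisSerializer<Restaurant> serializer =
        new JacksonJsonRedisSerializer<Restaurant>(Restaurant.class);
    // ...
    return serializer;
  }

  // Serialize restaurants using Jackson JSON
  @Bean
  @Qualifier("Restaurant")
  public RedisTemplate<String, Restaurant> restaurantTemplate(RedisConnectionFactory factory) {
    RedisTemplate<String, Restaurant> template = new RedisTemplate<String, Restaurant>();
    template.setConnectionFactory(factory);
    JacksonJsonRedisSerializer<Restaurant> jsonSerializer = makeRestaurantJsonSerializer();
    template.setValueSerializer(jsonSerializer);
    return template;
  }
}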
41. Caching with Redis
[Diagram: Order taking and Restaurant Management first check the Redis cache, then the MySQL database]
42. Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
43. Finding available restaurants
Available restaurants = restaurants that
  serve the zip code of the delivery address
  AND
  are open at the delivery time
public interface AvailableRestaurantRepository {
  List<AvailableRestaurant>
      findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
  ...
}
44. Finding available restaurants on Monday at 6:15pm for zip code 94619
Straightforward three-way join
select r.*
from restaurant r
inner join restaurant_time_range tr
  on r.id = tr.restaurant_id
inner join restaurant_zipcode sa
  on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
46. Option #1: Query caching
• [ZipCode, DeliveryTime] ⇨ list of available restaurants
BUT
• Long-tail queries
• Updating a restaurant ⇨ flush the entire cache
⇨ Ineffective
47. Option #2: Master/Slave replication
[Diagram: writes and consistent reads go to the MySQL master; queries (inconsistent reads) go to MySQL slaves 1 … N]
48. Master/Slave replication
• Mostly straightforward
BUT
• Assumes that the SQL query is efficient
• Administering the slaves adds complexity
• Doesn’t scale writes
49. Option #3: Redis materialized views
[Diagram: Restaurant Management calls update() on the MySQL database, which remains the system of record; the data is copied to Redis, which serves the findAvailable() queries from Order taking]
50. BUT how to implement findAvailableRestaurants() with Redis?!
[Diagram: the three-way join query from slide 44, a question mark, and a Redis key-value store (K1 → V1, K2 → V2, …)]
51. Solution: Build an index using sorted sets and ZRANGEBYSCORE
ZRANGEBYSCORE myset 1 6
=
select value
from sorted_set
where key = 'myset'
  and score >= 1
  and score <= 6
(sorted_set is a table with columns key, value and score)
52. How to transform the SELECT statement?
select r.*
from restaurant r
inner join restaurant_time_range tr
  on r.id = tr.restaurant_id
inner join restaurant_zipcode sa
  on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
?
select value
from sorted_set
where key = ?
  and score >= ?
  and score <= ?
53. We need to denormalize
Think materialized view
54. Simplification #1: Denormalization
Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
1              Monday       1130       1430        94707
1              Monday       1130       1430        94619
1              Monday       1730       2130        94707
1              Monday       1730       2130        94619
2              Monday       0700       1430        94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = '94619'
  AND 1815 < close_time
  AND open_time < 1815
Simpler query: no joins; two = and two <
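The denormalized time_range_zip_code rows are simply the cross product of a restaurant's opening hours and its service area. A minimal sketch, assuming Java 16+ records and simplified, hypothetical stand-ins (TimeRange, Restaurant, Row) for the domain classes; the `available` method evaluates the same predicate as the single-table query above, in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class Denormalize {
    // Simplified, hypothetical stand-ins for the domain model classes.
    record TimeRange(String dayOfWeek, int openTime, int closeTime) {}
    record Restaurant(long id, Set<String> serviceArea, List<TimeRange> openingHours) {}
    // One row of the denormalized time_range_zip_code table.
    record Row(long restaurantId, String dayOfWeek, int openTime, int closeTime, String zipCode) {}

    // One row per (time range, zip code) pair, so no joins are needed at query time.
    static List<Row> rows(Restaurant r) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange tr : r.openingHours()) {
            for (String zip : r.serviceArea()) {
                rows.add(new Row(r.id(), tr.dayOfWeek(), tr.openTime(), tr.closeTime(), zip));
            }
        }
        return rows;
    }

    // Same predicate as the simpler query: two = and two <.
    static List<Long> available(List<Row> table, String day, String zip, int time) {
        List<Long> ids = new ArrayList<>();
        for (Row row : table) {
            if (row.dayOfWeek().equals(day) && row.zipCode().equals(zip)
                    && time < row.closeTime() && row.openTime() < time) {
                ids.add(row.restaurantId());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Restaurant ajanta = new Restaurant(1, Set.of("94707", "94619"),
                List.of(new TimeRange("Monday", 1130, 1430), new TimeRange("Monday", 1730, 2130)));
        List<Row> table = rows(ajanta);
        System.out.println(table.size());                              // 4
        System.out.println(available(table, "Monday", "94619", 1815)); // [1]
    }
}
```

The price of the simpler query is more rows: a restaurant with T time ranges serving Z zip codes produces T × Z rows, and every change to its hours or service area must rewrite them.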
55. Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = '94619'
  AND 1815 < close_time
Even simpler query: no joins; two = and one <
The application filters out the rows whose open_time is after 1815
56. Simplification #3: Eliminate multiple ='s with concatenation
Restaurant_id  Zip_dow       Open_time  Close_time
1              94707:Monday  1130       1430
1              94619:Monday  1130       1430
1              94707:Monday  1730       2130
1              94619:Monday  1730       2130
2              94619:Monday  0700       1430
…
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE zip_dow = '94619:Monday'
  AND 1815 < close_time
zip_dow is the key; close_time is the range
58. Using a Redis sorted set as an index
zip_dow       open_time_restaurant_id  close_time
94707:Monday  1130_1                   1430
94619:Monday  1130_1                   1430
94707:Monday  1730_1                   2130
94619:Monday  1730_1                   2130
94619:Monday  0700_2                   1430
...
Key           Sorted Set [Entry:Score, …]
94619:Monday  [0700_2:1430, 1130_1:1430, 1730_1:2130]
94707:Monday  [1130_1:1430, 1730_1:2130]
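Building this index can be modelled in memory: one sorted set per zip_dow key, whose members are "openTime_restaurantId" strings scored by closing time. The sketch below uses a `TreeSet` ordered by (score, member) as a stand-in for a Redis sorted set; the class and helper names are hypothetical, and the zero-padded HHmm member format is an assumption based on the "0700_2" entry in the table above. It assumes Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the Redis availability index.
public class AvailabilityIndex {
    record IndexEntry(String member, int score) {}

    // Same ordering as a Redis sorted set: by score, ties broken by member.
    private static final Comparator<IndexEntry> BY_SCORE_THEN_MEMBER =
            Comparator.comparingInt(IndexEntry::score).thenComparing(IndexEntry::member);

    // key (e.g. "94619:Monday") -> sorted set of (member, score)
    final Map<String, TreeSet<IndexEntry>> index = new TreeMap<>();

    static String key(String zipCode, String dayOfWeek) {
        return zipCode + ":" + dayOfWeek;
    }

    // Member = zero-padded opening time + "_" + restaurant id, e.g. "0700_2"
    static String member(int openTime, long restaurantId) {
        return String.format("%04d_%d", openTime, restaurantId);
    }

    // Equivalent of ZADD zip:day closeTime member, once per zip code in the service area.
    void addTimeRange(long restaurantId, String dayOfWeek, int openTime, int closeTime,
                      List<String> serviceArea) {
        for (String zip : serviceArea) {
            index.computeIfAbsent(key(zip, dayOfWeek), k -> new TreeSet<>(BY_SCORE_THEN_MEMBER))
                 .add(new IndexEntry(member(openTime, restaurantId), closeTime));
        }
    }

    // Render a sorted set as "member:score" strings, in order.
    List<String> members(String key) {
        List<String> result = new ArrayList<>();
        for (IndexEntry e : index.getOrDefault(key, new TreeSet<>(BY_SCORE_THEN_MEMBER))) {
            result.add(e.member() + ":" + e.score());
        }
        return result;
    }

    public static void main(String[] args) {
        AvailabilityIndex idx = new AvailabilityIndex();
        idx.addTimeRange(1, "Monday", 1130, 1430, List.of("94707", "94619"));
        idx.addTimeRange(1, "Monday", 1730, 2130, List.of("94707", "94619"));
        idx.addTimeRange(2, "Monday", 700, 1430, List.of("94619"));
        System.out.println(idx.members("94619:Monday")); // [0700_2:1430, 1130_1:1430, 1730_1:2130]
        System.out.println(idx.members("94707:Monday")); // [1130_1:1430, 1730_1:2130]
    }
}
```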
59. Querying with ZRANGEBYSCORE
Key           Sorted Set [Entry:Score, …]
94619:Monday  [0700_2:1430, 1130_1:1430, 1730_1:2130]
94707:Monday  [1130_1:1430, 1730_1:2130]
ZRANGEBYSCORE 94619:Monday 1815 2359 ⇨ {1730_1}
(key = delivery zip and day; min score = delivery time)
1730 is before 1815 ⇨ Ajanta is open
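The query side is a score range on closing time followed by an application-side filter on the opening time encoded in each member, the same logic as findAvailableRestaurants() on slide 61. The in-memory sketch below hard-codes the 94619:Monday sorted set from the table (an insertion-ordered map stands in for the score-ordered set); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class AvailabilityQuery {
    // The "94619:Monday" sorted set, inserted in score order: member -> score (closing time).
    static final Map<String, Integer> ZIP_94619_MONDAY = new LinkedHashMap<>();
    static {
        ZIP_94619_MONDAY.put("0700_2", 1430);
        ZIP_94619_MONDAY.put("1130_1", 1430);
        ZIP_94619_MONDAY.put("1730_1", 2130);
    }

    // Like ZRANGEBYSCORE key min max: members whose score lies in [min, max].
    static List<String> rangeByScore(Map<String, Integer> sortedSet, int min, int max) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sortedSet.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Time ranges closing after the delivery time, minus those opening after it.
    static TreeSet<String> availableRestaurantIds(Map<String, Integer> sortedSet, int timeOfDay) {
        TreeSet<String> ids = new TreeSet<>();
        for (String member : rangeByScore(sortedSet, timeOfDay, 2359)) {
            String[] parts = member.split("_"); // "1730_1" -> ["1730", "1"]
            if (Integer.parseInt(parts[0]) <= timeOfDay) {
                ids.add(parts[1]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(rangeByScore(ZIP_94619_MONDAY, 1815, 2359));     // [1730_1]
        System.out.println(availableRestaurantIds(ZIP_94619_MONDAY, 1815)); // [1]
        System.out.println(availableRestaurantIds(ZIP_94619_MONDAY, 1200)); // [1, 2]
    }
}
```

Only the closing time is a score, so Redis does the range part of the work; the opening-time check is cheap because it only runs over the few members that Redis returns.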
60. Adding a Restaurant
import org.springframework.stereotype.Component;

@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

  // Fields (restaurantTemplate, redisTemplate, keyFormatter) and helpers
  // (formatTrId, closingTimesKey) are not shown on the slide

  @Override
  public void add(Restaurant restaurant) {
    addRestaurantDetails(restaurant);
    addAvailabilityIndexEntries(restaurant);
  }

  // Store the restaurant details as JSON text
  private void addRestaurantDetails(Restaurant restaurant) {
    restaurantTemplate.opsForValue().set(keyFormatter.key(restaurant.getId()), restaurant);
  }

  // One sorted-set entry per (time range, zip code):
  // key = zip code + day of week, member = opening time + restaurant id, score = closing time
  private void addAvailabilityIndexEntries(Restaurant restaurant) {
    for (TimeRange tr : restaurant.getOpeningHours()) {
      String indexValue = formatTrId(restaurant, tr);
      int dayOfWeek = tr.getDayOfWeek();
      int closingTime = tr.getClosingTime();
      for (String zipCode : restaurant.getServiceArea()) {
        redisTemplate.opsForZSet().add(closingTimesKey(zipCode, dayOfWeek), indexValue,
            closingTime);
      }
    }
  }
}
61. Finding available Restaurants
import java.util.Collection;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.springframework.stereotype.Component;

@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

  @Override
  public List<AvailableRestaurant>
      findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
    String zipCode = deliveryAddress.getZip();
    int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
    String closingTimesKey = closingTimesKey(zipCode, dayOfWeek);

    // Find the time ranges that close after the delivery time
    Set<String> trsClosingAfter =
        redisTemplate.opsForZSet().rangeByScore(closingTimesKey, timeOfDay, 2359);

    // Filter out those that open after the delivery time (members are "openTime_restaurantId")
    Set<String> restaurantIds = new HashSet<String>();
    for (String tr : trsClosingAfter) {
      String[] values = tr.split("_");
      if (Integer.parseInt(values[0]) <= timeOfDay)
        restaurantIds.add(values[1]);
    }

    // Retrieve the open restaurants
    Collection<String> keys = keyFormatter.keys(restaurantIds);
    return availableRestaurantTemplate.opsForValue().multiGet(keys);
  }
}
63. Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
65. Two-phase commit is not an option
• Redis does not support it
• Even if it did, 2PC is best avoided: http://www.infoq.com/articles/ebay-scalability-best-practices
66. ACID vs. BASE
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically Available, Soft state, Eventually consistent
BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128
67. Updating Redis #FAIL
begin MySQL transaction
update MySQL
update Redis
rollback MySQL transaction
⇨ Redis has the update, MySQL does not
begin MySQL transaction
update MySQL
commit MySQL transaction
<<system crashes>>
update Redis
⇨ MySQL has the update, Redis does not
68. Updating Redis reliably
Step 1 of 2
begin MySQL transaction
update MySQL
queue CRUD event in MySQL
commit transaction
(both updates happen in one ACID transaction)
CRUD event: Event Id; Operation: Create, Update or Delete; New entity state, e.g. JSON
69. Updating Redis reliably
Step 2 of 2
for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  if CRUD event is not duplicate then
    update Redis (incl. eventId)
  end if
  begin MySQL transaction
  mark CRUD event as processed
  commit transaction
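The Step 2 loop can be modelled in memory to show why duplicate detection matters: if the processor crashes after updating Redis but before marking the event as processed, the next run handles the same event again. In this hypothetical sketch (Java 16+ for the record), a list stands in for the MySQL queue of CRUD events and maps stand in for Redis; like the restaurant:lastSeenEventId key on slide 71, it records the last event id applied to each entity. All class, field and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrudEventProcessor {
    // A CRUD event as queued in MySQL in Step 1 (simplified, hypothetical shape).
    record CrudEvent(long eventId, long entityId, String operation, String json) {}

    static class QueuedEvent {
        final CrudEvent event;
        boolean processed;
        QueuedEvent(CrudEvent event) { this.event = event; }
    }

    final List<QueuedEvent> mysqlQueue = new ArrayList<>();      // stand-in for the MySQL event queue
    final Map<Long, String> redisEntities = new HashMap<>();      // stand-in for entity data in Redis
    final Map<Long, Long> redisLastSeenEventId = new HashMap<>(); // stand-in for lastSeenEventId keys
    int redisWrites = 0;

    // Apply an event to "Redis" unless it has already been applied (duplicate detection).
    void applyToRedis(CrudEvent e) {
        long lastSeen = redisLastSeenEventId.getOrDefault(e.entityId(), 0L);
        if (lastSeen >= e.eventId()) return; // duplicate: already applied
        if (e.operation().equals("Delete")) redisEntities.remove(e.entityId());
        else redisEntities.put(e.entityId(), e.json());
        redisLastSeenEventId.put(e.entityId(), e.eventId());
        redisWrites++;
    }

    // One run of Step 2; crashBeforeMarking simulates a failure after the Redis update.
    void processPending(boolean crashBeforeMarking) {
        for (QueuedEvent q : mysqlQueue) {
            if (q.processed) continue;
            applyToRedis(q.event);
            if (crashBeforeMarking) return; // the event stays unprocessed in MySQL
            q.processed = true;             // "mark CRUD event as processed"
        }
    }

    public static void main(String[] args) {
        CrudEventProcessor p = new CrudEventProcessor();
        p.mysqlQueue.add(new QueuedEvent(new CrudEvent(1, 42, "Create", "{\"version\":1}")));
        p.mysqlQueue.add(new QueuedEvent(new CrudEvent(2, 42, "Update", "{\"version\":2}")));
        p.processPending(true);  // applies event 1, then "crashes"
        p.processPending(false); // skips event 1 as a duplicate, then applies event 2
        System.out.println(p.redisEntities.get(42L) + " writes=" + p.redisWrites); // {"version":2} writes=2
    }
}
```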
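70. Step 1 and Step 2
[Diagram: in Step 1, the EntityCrudEventRepository writes events (INSERT INTO ... SELECT ... FROM ...) to the ENTITY_CRUD_EVENT table (columns: ID, JSON, processed?); in Step 2, a Timer triggers the EntityCrudEventProcessor, which passes each event to the RedisUpdater via apply(event) to update Redis]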
71. Updating Redis
WATCH restaurant:lastSeenEventId:≪restaurantId≫                   ← optimistic locking
lastSeenEventId = GET restaurant:lastSeenEventId:≪restaurantId≫
if (lastSeenEventId >= eventId) { UNWATCH; return; }              ← duplicate detection
MULTI                                                              ← transaction
SET restaurant:lastSeenEventId:≪restaurantId≫ eventId
... update the restaurant data...
EXEC    (returns nil if the watched key changed; retry from WATCH)
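WATCH makes the later EXEC conditional: if another client modifies a watched key in between, EXEC aborts and the client retries from WATCH. The hypothetical Java model below imitates that check with a version counter per key, so the retry loop and the duplicate check can be run without a Redis server. It illustrates the optimistic-locking pattern only; it is not a Redis client API, and the "restaurant:≪id≫" data key is an assumed name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of WATCH/MULTI/EXEC optimistic locking using per-key versions.
public class OptimisticStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized String get(String key) { return values.get(key); }

    synchronized long version(String key) { return versions.getOrDefault(key, 0L); }

    // Every write bumps the key's version, which invalidates earlier WATCHes of that key.
    synchronized void set(String key, String value) {
        values.put(key, value);
        versions.put(key, version(key) + 1);
    }

    // Like EXEC: apply the queued writes atomically, but only if the watched key is unchanged.
    synchronized boolean execIfUnchanged(String watchedKey, long watchedVersion, Map<String, String> writes) {
        if (version(watchedKey) != watchedVersion) return false; // EXEC aborted
        writes.forEach(this::set);
        return true;
    }

    // The slide's logic: skip duplicates, otherwise retry until EXEC succeeds.
    static boolean applyEvent(OptimisticStore store, long restaurantId, long eventId, String restaurantJson) {
        String lastSeenKey = "restaurant:lastSeenEventId:" + restaurantId;
        while (true) {
            long watched = store.version(lastSeenKey);                   // WATCH
            String lastSeen = store.get(lastSeenKey);                    // GET
            if (lastSeen != null && Long.parseLong(lastSeen) >= eventId) {
                return false;                                            // duplicate (UNWATCH)
            }
            Map<String, String> writes = new HashMap<>();                // MULTI: queue commands
            writes.put(lastSeenKey, Long.toString(eventId));
            writes.put("restaurant:" + restaurantId, restaurantJson);
            if (store.execIfUnchanged(lastSeenKey, watched, writes)) {   // EXEC
                return true;
            }
            // Another client changed the watched key: loop back to WATCH.
        }
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        System.out.println(applyEvent(store, 1, 10, "{\"v\":10}")); // true
        System.out.println(applyEvent(store, 1, 10, "{\"v\":10}")); // false (duplicate)
        System.out.println(applyEvent(store, 1, 9, "{\"v\":9}"));   // false (older event)
        System.out.println(store.get("restaurant:1"));              // {"v":10}
    }
}
```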
72. Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
79. Drawbacks of this monolithic architecture
[Diagram: a single WAR containing Restaurant Management, …]
• Obstacle to frequent deployments
• Overloads IDE and web container
• Obstacle to scaling development
• Technology lock-in
81. Using a message broker
Asynchronous is preferred
JSON is fashionable, but a binary format is more efficient
82. Modular architecture
[Diagram: Consumer → Order taking; Restaurant Owner → Restaurant Management; a Timer-driven Event Publisher; components connected through RabbitMQ; data stores: MySQL database, Redis and a Redis cache]
83. Benefits of a modular asynchronous architecture
• Scales development: develop, deploy and scale each service independently
• Redeploy UI frequently/independently
• Improves fault isolation
• Eliminates long-term commitment to a single technology stack
• Message broker decouples producers and consumers
84. Step 2 of 2
for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  publish persistent message to RabbitMQ
  begin MySQL transaction
  mark CRUD event as processed
  commit transaction
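Because the event is marked as processed only after the message is published, a crash between those two steps causes the same event to be published again, so delivery is at-least-once and consumers should ignore duplicates. The hypothetical in-memory sketch below makes that concrete: a list stands in for the persistent RabbitMQ messages, and the consumer skips event ids it has already seen. All names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of Step 2 with a message broker: publish first, then mark as processed.
public class EventPublisherSketch {
    static class QueuedEvent {
        final long eventId;
        boolean processed;
        QueuedEvent(long eventId) { this.eventId = eventId; }
    }

    final List<QueuedEvent> mysqlQueue = new ArrayList<>();
    final List<Long> broker = new ArrayList<>(); // stand-in for persistent RabbitMQ messages

    // Publish each unprocessed event; crashAfterPublish simulates a failure before marking.
    void publishPending(boolean crashAfterPublish) {
        for (QueuedEvent q : mysqlQueue) {
            if (q.processed) continue;
            broker.add(q.eventId);          // publish persistent message
            if (crashAfterPublish) return;  // the event is NOT marked as processed
            q.processed = true;             // mark CRUD event as processed
        }
    }

    // An idempotent consumer handles each event id at most once.
    static List<Long> consume(List<Long> messages) {
        Set<Long> seen = new HashSet<>();
        List<Long> handled = new ArrayList<>();
        for (long id : messages) {
            if (seen.add(id)) handled.add(id);
        }
        return handled;
    }

    public static void main(String[] args) {
        EventPublisherSketch p = new EventPublisherSketch();
        p.mysqlQueue.add(new QueuedEvent(1));
        p.mysqlQueue.add(new QueuedEvent(2));
        p.publishPending(true);  // publishes event 1, then "crashes"
        p.publishPending(false); // publishes event 1 again, then event 2
        System.out.println(p.broker);          // [1, 1, 2]
        System.out.println(consume(p.broker)); // [1, 2]
    }
}
```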
88. Summary
• Each SQL/NoSQL database = a set of tradeoffs
• Polyglot persistence: leverage the strengths of SQL and NoSQL databases
• Use Redis as a distributed cache
• Store denormalized data in Redis for fast querying
• Reliable database synchronization is required