Developing polyglot persistence applications (devnexus 2013)

NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together.

In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.

  1. DEVELOPING POLYGLOT PERSISTENCE APPLICATIONS
      Chris Richardson
      Author of POJOs in Action
      Founder of the original CloudFoundry.com
      @crichardson
      chris.richardson@springsource.com
      http://plainoldobjects.com
  2. Presentation goal
      The benefits and drawbacks of polyglot persistence and how to design applications that use this approach
  3. About Chris
  4. (About Chris)
  5. About Chris()
  6. About Chris
  7. About Chris
      http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
  8. vmc push About-Chris
      Developer Advocate for CloudFoundry.com
      Sign up at http://cloudfoundry.com
  9. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  10. Food to Go
      • Take-out food delivery service
      • “Launched” in 2006
  11. Food To Go Architecture
      [Diagram: CONSUMER uses Order taking; RESTAURANT OWNER uses Restaurant Management; both modules use the MySQL Database]
  12. Success ⇨ Growth challenges
      • Increasing traffic
      • Increasing data volume
      • Distribute across a few data centers
      • Increasing domain model complexity
  13. Limitations of relational databases
      • Scalability
      • Distribution
      • Schema updates
      • O/R impedance mismatch
      • Handling semi-structured data
  14. Solution: Spend Money
      http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
      OR
      http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
  15. Solution: Use NoSQL
      Benefits                  Drawbacks
      • Higher performance      • Limited transactions
      • Higher scalability      • Limited querying
      • Richer data-model       • Relaxed consistency
      • Schema-less             • Unconstrained data
  16. Example NoSQL Databases
      Database     Key features
      Cassandra    Extensible column store, very scalable, distributed
      Neo4j        Graph database
      MongoDB      Document-oriented, fast, scalable
      Redis        Key-value store, very fast
      http://nosql-database.org/ lists 122+ NoSQL databases
  17. Redis
      • Advanced key-value store (K1 ⇨ V1, K2 ⇨ V2, ...)
      • C-based server
      • Very fast, e.g. 100K reqs/sec
      • Optional persistence
      • Transactions with optimistic locking
      • Master-slave replication
      • Sharding using client-side consistent hashing
  18. Sorted sets
      Key: myset
      Value: members a (score 5.0), b (score 10.0)
      Members are sorted by score
  19. Adding members to a sorted set
      zadd myset 5.0 a        (key, score, value)
      Redis Server: myset = [a:5.0]
  20. Adding members to a sorted set
      zadd myset 10.0 b
      Redis Server: myset = [a:5.0, b:10.0]
  21. Adding members to a sorted set
      zadd myset 1.0 c
      Redis Server: myset = [c:1.0, a:5.0, b:10.0]
  22. Retrieving members by index range
      zrange myset 0 1        (key, start index, end index)
      Redis Server: myset = [c:1.0, a:5.0, b:10.0]
      Result: c, a
  23. Retrieving members by score
      zrangebyscore myset 1 6        (key, min value, max value)
      Redis Server: myset = [c:1.0, a:5.0, b:10.0]
      Result: c, a
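
The sorted set commands on slides 18 to 23 can be tried without a Redis server using a small in-memory model. This is only a sketch: the class `SortedSetSketch` and its method names are hypothetical stand-ins for illustration, not part of any Redis client, and the model ignores features such as negative indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a single Redis sorted set (hypothetical, not a Redis client). */
class SortedSetSketch {
    // member -> score; a member appears at most once, as in a Redis sorted set
    private final Map<String, Double> scores = new HashMap<>();

    /** Like ZADD: adds the member, or updates its score if it already exists. */
    void zadd(double score, String member) {
        scores.put(member, score);
    }

    /** Members ordered by score, ties broken by member name. */
    private List<String> sortedMembers() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort((a, b) -> {
            int byScore = Double.compare(scores.get(a), scores.get(b));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return members;
    }

    /** Like ZRANGE with non-negative indexes: positions start..end, inclusive. */
    List<String> zrange(int start, int end) {
        List<String> members = sortedMembers();
        int from = Math.min(start, members.size());
        int to = Math.min(end + 1, members.size());
        return from < to ? new ArrayList<>(members.subList(from, to)) : new ArrayList<>();
    }

    /** Like ZRANGEBYSCORE: members with min <= score <= max. */
    List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (String member : sortedMembers()) {
            double score = scores.get(member);
            if (score >= min && score <= max) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSetSketch myset = new SortedSetSketch();
        myset.zadd(5.0, "a");
        myset.zadd(10.0, "b");
        myset.zadd(1.0, "c");
        System.out.println(myset.zrange(0, 1));        // [c, a]
        System.out.println(myset.zrangeByScore(1, 6)); // [c, a]
    }
}
```

Both queries return `c` and `a`, matching the results shown on slides 22 and 23.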
  24. Redis use cases
      • Replacement for Memcached
        • Session state
        • Cache of data retrieved from system of record (SOR)
      • Replica of SOR for queries needing high-performance
      • Handling tasks that overload an RDBMS
        • Hit counts - INCR
        • Most recent N items - LPUSH and LTRIM
        • Randomly selecting an item – SRANDMEMBER
        • Queuing – Lists with LPOP, RPUSH, ….
        • High score tables – Sorted sets and ZINCRBY
        • …
  25. Redis is great but there are tradeoffs
      • Low-level query language: PK-based access only
      • Limited transaction model:
        • Read first and then execute updates as batch
        • Difficult to compose code
      • Data must fit in memory
      • Single-threaded server: run multiple with client-side sharding
      • Missing features such as access control, ...
  26. And don’t forget: An RDBMS is fine for many applications
  27. The future is polyglot
      e.g. Netflix
      • RDBMS
      • SimpleDB
      • Cassandra
      • Hadoop/Hbase
      IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
  28. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  29. Food to Go – Domain model (partial)
        class Restaurant {
          long id;
          String name;
          Set<String> serviceArea;
          Set<TimeRange> openingHours;
          List<MenuItem> menuItems;
        }

        class TimeRange {
          long id;
          int dayOfWeek;
          int openTime;
          int closeTime;
        }

        class MenuItem {
          String name;
          double price;
        }
  30. Database schema
      RESTAURANT table
        ID   Name               …
        1    Ajanta
        2    Montclair Eggshop
      RESTAURANT_ZIPCODE table
        Restaurant_id   zipcode
        1               94707
        1               94619
        2               94611
        2               94619
      RESTAURANT_TIME_RANGE table
        Restaurant_id   dayOfWeek   openTime   closeTime
        1               Monday      1130       1430
        1               Monday      1730       2130
        2               Tuesday     1130       …
  31. RestaurantRepository
        public interface RestaurantRepository {
          void addRestaurant(Restaurant restaurant);
          Restaurant findById(long id);
          ...
        }
      Food To Go will eventually have scaling issues
  32. Increase scalability by caching
      [Diagram: CONSUMER uses Order taking; RESTAURANT OWNER uses Restaurant Management; a Cache sits alongside the MySQL Database]
  33. Caching Options
      • Where:
        • Hibernate 2nd level cache
        • Explicit calls from application code
        • Caching aspect
      • Cache technologies: Ehcache, Memcached, Infinispan, ...
      Redis is also an option
  34. Using Redis as a cache
      • Spring 3.1 cache abstraction
        • Annotations specify which methods to cache
        • CacheManager - pluggable back-end cache
      • Spring Data for Redis
        • Simplifies the development of Redis applications
        • Provides RedisTemplate (analogous to JdbcTemplate)
        • Provides RedisCacheManager
  35. Using Spring 3.1 Caching
        import org.springframework.beans.factory.annotation.Autowired;
        import org.springframework.cache.annotation.CacheEvict;
        import org.springframework.cache.annotation.Cacheable;
        import org.springframework.stereotype.Service;

        @Service
        public class RestaurantManagementServiceImpl implements RestaurantManagementService {

          private final RestaurantRepository restaurantRepository;

          @Autowired
          public RestaurantManagementServiceImpl(RestaurantRepository restaurantRepository) {
            this.restaurantRepository = restaurantRepository;
          }

          @Override
          public void add(Restaurant restaurant) {
            restaurantRepository.add(restaurant);
          }

          // Cache result
          @Override
          @Cacheable(value = "Restaurant")
          public Restaurant findById(int id) {
            return restaurantRepository.findRestaurant(id);
          }

          // Evict from cache
          @Override
          @CacheEvict(value = "Restaurant", key="#restaurant.id")
          public void update(Restaurant restaurant) {
            restaurantRepository.update(restaurant);
          }
        }
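
To make the effect of the two annotations on slide 35 concrete, here is a plain-Java sketch of the same cache-aside behaviour written by hand. It is an illustration under assumptions, not how Spring implements its cache abstraction: `CacheAsideSketch` and `SlowRepository` are hypothetical names, a `ConcurrentHashMap` stands in for Redis, and a restaurant is reduced to its name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hand-written cache-aside logic, roughly what @Cacheable and @CacheEvict provide. */
class CacheAsideSketch {

    /** Hypothetical stand-in for the MySQL-backed repository; counts lookups. */
    static class SlowRepository {
        private final Map<Long, String> rows = new HashMap<>();
        int lookups = 0;

        String findById(long id) {
            lookups++;
            return rows.get(id);
        }

        void save(long id, String name) {
            rows.put(id, name);
        }
    }

    private final SlowRepository repository;
    // Stand-in for the Redis cache
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    CacheAsideSketch(SlowRepository repository) {
        this.repository = repository;
    }

    /** "Cache result": return the cached value if present, else load and cache it. */
    String findById(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = repository.findById(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    /** "Evict from cache": update the system of record, then drop the stale entry. */
    void update(long id, String name) {
        repository.save(id, name);
        cache.remove(id);
    }

    public static void main(String[] args) {
        SlowRepository repository = new SlowRepository();
        repository.save(1L, "Ajanta");
        CacheAsideSketch service = new CacheAsideSketch(repository);
        service.findById(1L);
        service.findById(1L);
        System.out.println(repository.lookups); // 1: the second call was served from the cache
    }
}
```

The second `findById` call never reaches the repository, and `update` forces the next read to reload from the system of record.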
  36. Configuring the Redis Cache Manager
        <!-- Enables caching -->
        <cache:annotation-driven />

        <!-- Specifies CacheManager implementation -->
        <bean id="cacheManager"
              class="org.springframework.data.redis.cache.RedisCacheManager" >
          <!-- The RedisTemplate used to access Redis -->
          <constructor-arg ref="restaurantTemplate"/>
        </bean>
  37. Domain object to key-value mapping?
      [Diagram: Restaurant with its TimeRanges, MenuItems and ServiceArea on one side; key-value pairs K1 ⇨ V1, K2 ⇨ V2, ... on the other]
  38. RedisTemplate handles the mapping
      • Principal API provided by Spring Data to Redis
      • Analogous to JdbcTemplate
      • Encapsulates boilerplate code, e.g. connection management
      • Maps Java objects ⇔ Redis byte[]’s
  39. Serializers: object ⇔ byte[]
      • RedisTemplate has multiple serializers
      • DefaultSerializer - defaults to JdkSerializationRedisSerializer
      • KeySerializer
      • ValueSerializer
      • HashKeySerializer
      • HashValueSerializer
  40. Serializing a Restaurant as JSON
        import org.springframework.beans.factory.annotation.Autowired;
        import org.springframework.beans.factory.annotation.Qualifier;
        import org.springframework.context.annotation.Bean;
        import org.springframework.context.annotation.Configuration;
        import org.springframework.data.redis.connection.RedisConnectionFactory;
        import org.springframework.data.redis.core.RedisTemplate;
        import org.springframework.data.redis.serializer.JacksonJsonRedisSerializer;

        @Configuration
        public class RestaurantManagementRedisConfiguration {

          @Autowired
          private RestaurantObjectMapperFactory restaurantObjectMapperFactory;

          private JacksonJsonRedisSerializer<Restaurant> makeRestaurantJsonSerializer() {
            JacksonJsonRedisSerializer<Restaurant> serializer =
                new JacksonJsonRedisSerializer<Restaurant>(Restaurant.class);
            ...
            return serializer;
          }

          @Bean
          @Qualifier("Restaurant")
          public RedisTemplate<String, Restaurant> restaurantTemplate(RedisConnectionFactory factory) {
            RedisTemplate<String, Restaurant> template = new RedisTemplate<String, Restaurant>();
            template.setConnectionFactory(factory);
            // Serialize restaurants using Jackson JSON
            JacksonJsonRedisSerializer<Restaurant> jsonSerializer = makeRestaurantJsonSerializer();
            template.setValueSerializer(jsonSerializer);
            return template;
          }
        }
  41. Caching with Redis
      [Diagram: CONSUMER uses Order taking; RESTAURANT OWNER uses Restaurant Management; the Redis Cache is consulted first, the MySQL Database second]
  42. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  43. Finding available restaurants
      Available restaurants =
        Serve the zip code of the delivery address
        AND
        Are open at the delivery time

        public interface AvailableRestaurantRepository {
          List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
          ...
        }
  44. Finding available restaurants on Monday, 6.15pm for 94619 zipcode
      Straightforward three-way join
        select r.*
        from restaurant r
          inner join restaurant_time_range tr
            on r.id = tr.restaurant_id
          inner join restaurant_zipcode sa
            on r.id = sa.restaurant_id
        where '94619' = sa.zip_code
          and tr.day_of_week = 'monday'
          and tr.openingtime <= 1815
          and 1815 <= tr.closingtime
  45. How to scale queries?
  46. Option #1: Query caching
      • [ZipCode, DeliveryTime] ⇨ list of available restaurants
      BUT
      • Long tail queries
      • Update restaurant ⇨ Flush entire cache
      Ineffective
  47. Option #2: Master/Slave replication
      [Diagram: writes and consistent reads go to the MySQL Master; queries (inconsistent reads) go to MySQL Slave 1, Slave 2, ... Slave N]
  48. Master/Slave replication
      • Mostly straightforward
      BUT
      • Assumes that SQL query is efficient
      • Complexity of administration of slaves
      • Doesn’t scale writes
  49. Option #3: Redis materialized views
      [Diagram: CONSUMER uses Order taking, which calls findAvailable() on the Redis Cache; RESTAURANT OWNER uses Restaurant Management, which calls update() on the MySQL Database (System of Record); updates are copied from MySQL to Redis]
  50. BUT how to implement findAvailableRestaurants() with Redis?!
        select r.*
        from restaurant r
          inner join restaurant_time_range tr
            on r.id = tr.restaurant_id
          inner join restaurant_zipcode sa
            on r.id = sa.restaurant_id
        where '94619' = sa.zip_code
          and tr.day_of_week = 'monday'
          and tr.openingtime <= 1815
          and 1815 <= tr.closingtime
      ⇨ ? ⇨ key-value pairs (K1 ⇨ V1, K2 ⇨ V2, ...)
  51. Solution: Build an index using sorted sets and ZRANGEBYSCORE
      ZRANGEBYSCORE myset 1 6
      =
        select value
        from sorted_set        (columns: key, value, score)
        where key = 'myset'
          and score >= 1
          and score <= 6
  52. How to transform the SELECT statement?
        select r.*
        from restaurant r
          inner join restaurant_time_range tr
            on r.id = tr.restaurant_id
          inner join restaurant_zipcode sa
            on r.id = sa.restaurant_id
        where '94619' = sa.zip_code
          and tr.day_of_week = 'monday'
          and tr.openingtime <= 1815
          and 1815 <= tr.closingtime
      ⇨ ? ⇨
        select value
        from sorted_set
        where key = ?
          and score >= ?
          and score <= ?
  53. We need to denormalize
      Think materialized view
  54. Simplification #1: Denormalization
        Restaurant_id   Day_of_week   Open_time   Close_time   Zip_code
        1               Monday        1130        1430         94707
        1               Monday        1130        1430         94619
        1               Monday        1730        2130         94707
        1               Monday        1730        2130         94619
        2               Monday        0700        1430         94619
        …

        SELECT restaurant_id
        FROM time_range_zip_code
        WHERE day_of_week = 'Monday'
          AND zip_code = 94619
          AND 1815 < close_time
          AND open_time < 1815
      Simpler query:
      • No joins
      • Two = and two <
  55. Simplification #2: Application filtering
        SELECT restaurant_id, open_time
        FROM time_range_zip_code
        WHERE day_of_week = 'Monday'
          AND zip_code = 94619
          AND 1815 < close_time
      The remaining condition, open_time < 1815, is checked by the application on the returned open_time values.
      Even simpler query:
      • No joins
      • Two = and one <
  56. Simplification #3: Eliminate multiple =’s with concatenation
        Restaurant_id   Zip_dow        Open_time   Close_time
        1               94707:Monday   1130        1430
        1               94619:Monday   1130        1430
        1               94707:Monday   1730        2130
        1               94619:Monday   1730        2130
        2               94619:Monday   0700        1430
        …

        SELECT restaurant_id, open_time
        FROM time_range_zip_code
        WHERE zip_code_day_of_week = '94619:Monday'   -- key
          AND 1815 < close_time                       -- range
  57. Simplification #4: Eliminate multiple RETURN VALUES with concatenation
        zip_dow        open_time_restaurant_id   close_time
        94707:Monday   1130_1                    1430
        94619:Monday   1130_1                    1430
        94707:Monday   1730_1                    2130
        94619:Monday   1730_1                    2130
        94619:Monday   0700_2                    1430
        ...

        SELECT open_time_restaurant_id
        FROM time_range_zip_code
        WHERE zip_code_day_of_week = '94619:Monday'
          AND 1815 < close_time
      ✔
  58. Using a Redis sorted set as an index
        zip_dow        open_time_restaurant_id   close_time
        94707:Monday   1130_1                    1430
        94619:Monday   1130_1                    1430
        94707:Monday   1730_1                    2130
        94619:Monday   1730_1                    2130
        94619:Monday   0700_2                    1430
        ...

        Key            Sorted Set [ Entry:Score, …]
        94619:Monday   [0700_2:1430, 1130_1:1430, 1730_1:2130]
        94707:Monday   [1130_1:1430, 1730_1:2130]
  59. Querying with ZRANGEBYSCORE
        Key            Sorted Set [ Entry:Score, …]
        94619:Monday   [0700_2:1430, 1130_1:1430, 1730_1:2130]
        94707:Monday   [1130_1:1430, 1730_1:2130]

      ZRANGEBYSCORE 94619:Monday 1815 2359 ⇨ {1730_1}
        (94619:Monday = delivery zip and day; 1815 = delivery time)
      1730 is before 1815 ⇨ Ajanta is open
  60. Adding a Restaurant
        @Component
        public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

          @Override
          public void add(Restaurant restaurant) {
            addRestaurantDetails(restaurant);
            addAvailabilityIndexEntries(restaurant);
          }

          // Store as JSON text
          private void addRestaurantDetails(Restaurant restaurant) {
            restaurantTemplate.opsForValue().set(keyFormatter.key(restaurant.getId()), restaurant);
          }

          private void addAvailabilityIndexEntries(Restaurant restaurant) {
            for (TimeRange tr : restaurant.getOpeningHours()) {
              String indexValue = formatTrId(restaurant, tr);
              int dayOfWeek = tr.getDayOfWeek();
              int closingTime = tr.getClosingTime();
              for (String zipCode : restaurant.getServiceArea()) {
                // arguments: key, member, score
                redisTemplate.opsForZSet().add(closingTimesKey(zipCode, dayOfWeek), indexValue, closingTime);
              }
            }
          }
        }
  61. Finding available Restaurants
        @Component
        public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

          @Override
          public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
            String zipCode = deliveryAddress.getZip();
            int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
            int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
            String closingTimesKey = closingTimesKey(zipCode, dayOfWeek);

            // Find those that close after the delivery time
            Set<String> trsClosingAfter =
                redisTemplate.opsForZSet().rangeByScore(closingTimesKey, timeOfDay, 2359);

            // Filter out those that open after the delivery time
            Set<String> restaurantIds = new HashSet<String>();
            for (String tr : trsClosingAfter) {
              String[] values = tr.split("_");
              if (Integer.parseInt(values[0]) <= timeOfDay)
                restaurantIds.add(values[1]);
            }

            // Retrieve open restaurants
            Collection<String> keys = keyFormatter.keys(restaurantIds);
            return availableRestaurantTemplate.opsForValue().multiGet(keys);
          }
        }
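
The index build on slide 60 and the query on slide 61 can be exercised end to end without Redis or Spring. The sketch below is an assumption-laden model, not the talk's code: `AvailabilityIndexSketch` is a hypothetical class, a `TreeMap` keyed by closing time stands in for each sorted set, days are plain strings, and the member format `openTime_restaurantId` follows slide 57.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of the "zip:day" sorted-set index (hypothetical, no Redis involved). */
class AvailabilityIndexSketch {
    // key "zip:day" -> (closing time, used as the score -> members "openTime_restaurantId")
    private final Map<String, TreeMap<Integer, TreeSet<String>>> index = new HashMap<>();

    /** Adds one index entry, like ZADD zip:day closeTime openTime_restaurantId. */
    void addOpeningHours(long restaurantId, String zip, String day, int openTime, int closeTime) {
        String member = String.format("%04d_%d", openTime, restaurantId);
        index.computeIfAbsent(zip + ":" + day, k -> new TreeMap<>())
             .computeIfAbsent(closeTime, k -> new TreeSet<>())
             .add(member);
    }

    /** Returns ids of restaurants that serve the zip and are open at the given time (HHMM). */
    Set<Long> findAvailable(String zip, String day, int time) {
        Set<Long> ids = new TreeSet<>();
        TreeMap<Integer, TreeSet<String>> byClosingTime = index.get(zip + ":" + day);
        if (byClosingTime == null) {
            return ids;
        }
        // Equivalent of ZRANGEBYSCORE zip:day time 2359: entries that close at or after `time`
        for (TreeSet<String> members : byClosingTime.subMap(time, true, 2359, true).values()) {
            for (String member : members) {
                String[] parts = member.split("_");
                // Application-side filter: keep entries that opened at or before `time`
                if (Integer.parseInt(parts[0]) <= time) {
                    ids.add(Long.parseLong(parts[1]));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AvailabilityIndexSketch index = new AvailabilityIndexSketch();
        // Rows from the denormalized table on slide 57
        index.addOpeningHours(1, "94707", "Monday", 1130, 1430);
        index.addOpeningHours(1, "94619", "Monday", 1130, 1430);
        index.addOpeningHours(1, "94707", "Monday", 1730, 2130);
        index.addOpeningHours(1, "94619", "Monday", 1730, 2130);
        index.addOpeningHours(2, "94619", "Monday", 700, 1430);
        System.out.println(index.findAvailable("94619", "Monday", 1815)); // [1]
    }
}
```

With the slide's data, the 6.15pm Monday query for 94619 returns only restaurant 1 (Ajanta), as on slide 59. One difference from Redis: a real sorted set holds each member once with a single score, while this model would allow the same member under two closing times.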
  62. Sorry Ted!
      http://en.wikipedia.org/wiki/Edgar_F._Codd
  63. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  64. MySQL & Redis need to be consistent
  65. Two-Phase commit is not an option
      • Redis does not support it
      • Even if it did, 2PC is best avoided
        http://www.infoq.com/articles/ebay-scalability-best-practices
  66. ACID versus BASE
      Atomic          Basically Available
      Consistent      Soft state
      Isolated        Eventually consistent
      Durable
      BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128
  67. Updating Redis #FAIL
      Failure case 1: Redis has update, MySQL does not
        begin MySQL transaction
          update MySQL
          update Redis
        rollback MySQL transaction
      Failure case 2: MySQL has update, Redis does not
        begin MySQL transaction
          update MySQL
        commit MySQL transaction
        <<system crashes>>
          update Redis
  68. Updating Redis reliably
      Step 1 of 2 (ACID)
        begin MySQL transaction
          update MySQL
          queue CRUD event in MySQL
        commit transaction
      CRUD event contents:
      • Event Id
      • Operation: Create, Update, Delete
      • New entity state, e.g. JSON
  69. Updating Redis reliably
      Step 2 of 2
        for each CRUD event in MySQL queue
          get next CRUD event from MySQL queue
          If CRUD event is not duplicate then
            Update Redis (incl. eventId)
          end if
          begin MySQL transaction
            mark CRUD event as processed
          commit transaction
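
The two steps on slides 68 and 69 can be simulated in memory to see why the duplicate check matters. This is a sketch under assumptions rather than the talk's implementation: `CrudEventQueueSketch` is a hypothetical class, a list stands in for the ENTITY_CRUD_EVENT table, two maps stand in for Redis, and the `crashBeforeMarking` flag simulates a crash after the Redis update but before the event is marked as processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory simulation of the MySQL event queue and the Redis updater (hypothetical). */
class CrudEventQueueSketch {

    static final class CrudEvent {
        final long eventId;
        final long entityId;
        final String newStateJson;
        boolean processed = false;

        CrudEvent(long eventId, long entityId, String newStateJson) {
            this.eventId = eventId;
            this.entityId = entityId;
            this.newStateJson = newStateJson;
        }
    }

    // Stand-in for the ENTITY_CRUD_EVENT table in MySQL
    final List<CrudEvent> eventTable = new ArrayList<>();
    // Stand-ins for Redis: entity state and the last event id applied per entity
    final Map<Long, String> redisEntities = new HashMap<>();
    final Map<Long, Long> redisLastSeenEventId = new HashMap<>();
    int redisWrites = 0;
    private long nextEventId = 1;

    /** Step 1: in the real system this insert shares a MySQL transaction with the entity update. */
    void recordChange(long entityId, String newStateJson) {
        eventTable.add(new CrudEvent(nextEventId++, entityId, newStateJson));
    }

    /** Step 2: apply unprocessed events to Redis, skipping events that were already applied. */
    void processPending(boolean crashBeforeMarking) {
        for (CrudEvent event : eventTable) {
            if (event.processed) {
                continue;
            }
            long lastSeen = redisLastSeenEventId.getOrDefault(event.entityId, 0L);
            if (event.eventId > lastSeen) { // not a duplicate
                redisEntities.put(event.entityId, event.newStateJson);
                redisLastSeenEventId.put(event.entityId, event.eventId);
                redisWrites++;
            }
            if (crashBeforeMarking) {
                return; // simulated crash: Redis is updated, the event is still unprocessed
            }
            event.processed = true;
        }
    }

    public static void main(String[] args) {
        CrudEventQueueSketch sketch = new CrudEventQueueSketch();
        sketch.recordChange(1L, "{\"name\":\"Ajanta\"}");
        sketch.processPending(true);  // crashes after updating Redis
        sketch.processPending(false); // retry: the event is detected as a duplicate
        System.out.println(sketch.redisWrites); // 1
    }
}
```

The retry after the simulated crash sees the event again, but because the event id was stored with the Redis update it is skipped, so Redis is written once.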
  70. [Diagram]
      Step 1: EntityCrudEventRepository ⇨ INSERT INTO ... ⇨ ENTITY_CRUD_EVENT table (ID, JSON, processed?)
      Step 2: Timer ⇨ EntityCrudEventProcessor ⇨ SELECT ... FROM ... the ENTITY_CRUD_EVENT table; apply(event) ⇨ RedisUpdater ⇨ Redis
  71. Updating Redis
        WATCH restaurant:lastSeenEventId:≪restaurantId≫        (optimistic locking)
        lastSeenEventId = GET restaurant:lastSeenEventId:≪restaurantId≫
        if (lastSeenEventId >= eventId) return;                (duplicate detection)
        MULTI                                                  (transaction)
          SET restaurant:lastSeenEventId:≪restaurantId≫ eventId
          ... update the restaurant data...
        EXEC
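
The check-then-write pattern on slide 71 can be illustrated without a Redis client by using `AtomicReference.compareAndSet` as an analogue. To be clear, this is a swapped-in technique, not Redis: reading the current state plays the role of WATCH plus GET, building the new state plays the role of the commands queued after MULTI, and a failed `compareAndSet` plays the role of EXEC aborting because the watched key changed. `OptimisticUpdateSketch` and `State` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Compare-and-set analogue of WATCH / MULTI / EXEC for one restaurant (hypothetical). */
class OptimisticUpdateSketch {

    /** Immutable snapshot: the last applied event id plus the restaurant data. */
    static final class State {
        final long lastSeenEventId;
        final String data;

        State(long lastSeenEventId, String data) {
            this.lastSeenEventId = lastSeenEventId;
            this.data = data;
        }
    }

    private final AtomicReference<State> restaurant = new AtomicReference<>(new State(0L, null));

    /** Returns true if the event was applied, false if it was a duplicate or stale. */
    boolean apply(long eventId, String newData) {
        while (true) {
            State current = restaurant.get();              // analogue of WATCH + GET
            if (current.lastSeenEventId >= eventId) {
                return false;                              // duplicate detection
            }
            State updated = new State(eventId, newData);   // analogue of the queued commands
            if (restaurant.compareAndSet(current, updated)) {
                return true;                               // analogue of a successful EXEC
            }
            // Another writer got in first: re-read and try again
        }
    }

    String data() {
        return restaurant.get().data;
    }

    public static void main(String[] args) {
        OptimisticUpdateSketch sketch = new OptimisticUpdateSketch();
        System.out.println(sketch.apply(5L, "v5")); // true
        System.out.println(sketch.apply(5L, "v5")); // false: same event delivered twice
    }
}
```

Storing the event id and the data in one atomically replaced value mirrors the slide's choice to set `lastSeenEventId` and update the restaurant data inside the same MULTI/EXEC block.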
  72. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  73. How do we generate CRUD events?
  74. Change tracking options
      • Explicit code
      • Hibernate event listener
      • Service-layer aspect
      • CQRS/Event-sourcing
  75. [Diagram: Hibernate Event Listener ⇨ EntityCrudEventRepository ⇨ ENTITY_CRUD_EVENT table (ID, JSON, processed?)]
  76. Hibernate event listener
        public class ChangeTrackingListener
            implements PostInsertEventListener, PostDeleteEventListener, PostUpdateEventListener {

          @Autowired
          private EntityCrudEventRepository entityCrudEventRepository;

          // Queue a CRUD event only for entity types that are replicated to Redis
          private void maybeTrackChange(Object entity, EntityCrudEventType eventType) {
            if (isTrackedEntity(entity)) {
              entityCrudEventRepository.add(new EntityCrudEvent(eventType, entity));
            }
          }

          @Override
          public void onPostInsert(PostInsertEvent event) {
            Object entity = event.getEntity();
            maybeTrackChange(entity, EntityCrudEventType.CREATE);
          }

          @Override
          public void onPostUpdate(PostUpdateEvent event) {
            Object entity = event.getEntity();
            maybeTrackChange(entity, EntityCrudEventType.UPDATE);
          }

          @Override
          public void onPostDelete(PostDeleteEvent event) {
            Object entity = event.getEntity();
            maybeTrackChange(entity, EntityCrudEventType.DELETE);
          }
        }
  77. Agenda
      • Why polyglot persistence?
      • Using Redis as a cache
      • Optimizing queries using Redis materialized views
      • Synchronizing MySQL and Redis
      • Tracking changes to entities
      • Using a modular asynchronous architecture
  78. Original architecture
      [Diagram: a single WAR containing Restaurant Management, ...]
  79. Drawbacks of this monolithic architecture
      [Diagram: a single WAR containing Restaurant Management, ...]
      • Obstacle to frequent deployments
      • Overloads IDE and web container
      • Obstacle to scaling development
      • Technology lock-in
  80. Need a more modular architecture
  81. Using a message broker
      Asynchronous is preferred
      JSON is fashionable but binary format is more efficient
  82. Modular architecture
      [Diagram components: CONSUMER, RESTAURANT OWNER, Order taking, Restaurant Management, Timer, Event Publisher, MySQL Database, Redis Cache, Redis, RabbitMQ]
  83. Benefits of a modular asynchronous architecture
      • Scales development: develop, deploy and scale each service independently
      • Redeploy UI frequently/independently
      • Improves fault isolation
      • Eliminates long-term commitment to a single technology stack
      • Message broker decouples producers and consumers
  84. Step 2 of 2
        for each CRUD event in MySQL queue
          get next CRUD event from MySQL queue
          Publish persistent message to RabbitMQ
          begin MySQL transaction
            mark CRUD event as processed
          commit transaction
  85. Message flow
      [Diagram: EntityCrudEventProcessor ⇨ RedisUpdater ⇨ Spring Integration glue code ⇨ RABBITMQ ⇨ AvailableRestaurantManagementService ⇨ REDIS]
  86. RedisUpdater ⇨ AMQP
        <beans>
          <!-- Creates proxy -->
          <int:gateway id="redisUpdaterGateway"
                       service-interface="net...RedisUpdater"
                       default-request-channel="eventChannel" />

          <int:channel id="eventChannel"/>

          <int:object-to-json-transformer input-channel="eventChannel"
                                          output-channel="amqpOut"/>

          <int:channel id="amqpOut"/>

          <amqp:outbound-channel-adapter channel="amqpOut"
                                         amqp-template="rabbitTemplate"
                                         routing-key="crudEvents"
                                         exchange-name="crudEvents" />
        </beans>
  87. AMQP ⇨ Available...Service
        <beans>
          <amqp:inbound-channel-adapter channel="inboundJsonEventsChannel"
                                        connection-factory="rabbitConnectionFactory"
                                        queue-names="crudEvents"/>

          <int:channel id="inboundJsonEventsChannel"/>

          <int:json-to-object-transformer input-channel="inboundJsonEventsChannel"
                                          type="net.chrisrichardson.foodToGo.common.JsonEntityCrudEvent"
                                          output-channel="inboundEventsChannel"/>

          <int:channel id="inboundEventsChannel"/>

          <!-- Invokes service -->
          <int:service-activator input-channel="inboundEventsChannel"
                                 ref="availableRestaurantManagementServiceImpl"
                                 method="processEvent"/>
        </beans>
  88. Summary
      • Each SQL/NoSQL database = set of tradeoffs
      • Polyglot persistence: leverage the strengths of SQL and NoSQL databases
      • Use Redis as a distributed cache
      • Store denormalized data in Redis for fast querying
      • Reliable database synchronization required
  89. @crichardson chris.richardson@springsource.com
      http://plainoldobjects.com - code and slides
      Sign up for CloudFoundry.com