The hardest part of microservices: your data

15,460 views

Microservices architecture is a very powerful way to build scalable systems optimized for speed of change. To do this, we need to build independent, autonomous services, which by definition tend to minimize dependencies on other systems. One of the tenets of microservices, and a way to minimize dependencies, is “a service should own its own database”. Unfortunately, this is a lot easier said than done. Why? Because: your data.

We’ve been dealing with data in information systems for five decades, so isn’t this a solved problem? Yes and no. A lot of the lessons learned are still very relevant. Traditionally, we application developers have accepted the practice of using relational databases and relying on all of their safety guarantees without question. But as we build services architectures that span more than one database (by design, as with microservices), things get harder. If data about a customer changes in one database, how do we reconcile that with other databases (especially where the data storage may be heterogeneous)?

For developers focused on the traditional enterprise, not only do we have to try to build fast-changing systems that are surrounded by legacy systems, but the domains (finance, insurance, retail, etc.) are also incredibly complicated. Just copying what Netflix does for microservices may or may not be useful. So how do we develop and reason about the boundaries in our system to reduce complexity in the domain?

In this talk, we’ll explore these problems and see how Domain-Driven Design helps grapple with the domain complexity. We’ll see how DDD concepts like Entities and Aggregates help reason about boundaries based on use cases and how transactions are affected. Once we can identify our transactional boundaries, we can more carefully adjust our needs from the CAP theorem to scale out and achieve truly autonomous systems with strictly ordered eventual consistency. We’ll see how technologies like Apache Kafka, Apache Camel and Debezium.io can help build the backbone for these types of systems. We’ll even explore the details of a working example that brings all of this together.
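
To make the idea of an aggregate as a transactional boundary a little more concrete, here is a minimal sketch in Python. It is not taken from the talk: the `Customer` and `Address` classes, the duplicate-address rule and the `max_addresses` limit are all hypothetical, chosen only to show invariants being enforced inside one aggregate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Value object: compared by value, with no identity of its own."""
    street: str
    city: str


@dataclass
class Customer:
    """Aggregate root: every change to its addresses goes through it."""
    customer_id: str
    addresses: list = field(default_factory=list)
    max_addresses: int = 3  # hypothetical business rule, for illustration only

    def add_new_address(self, address: Address) -> None:
        # The invariants are checked inside the aggregate, so one local
        # transaction that saves this Customer is enough to keep them intact.
        if address in self.addresses:
            raise ValueError("address already on file")
        if len(self.addresses) >= self.max_addresses:
            raise ValueError("too many addresses")
        self.addresses.append(address)
```

Anything outside the aggregate (another aggregate, another bounded context, another service) would learn about the new address later, under a relaxed consistency model, rather than inside the same transaction.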

Published in: Software

The hardest part of microservices: your data

  1. 1. Full slide deck here: http://bit.ly/ceposta-hardest-part
  2. 2. Twitter: @christianposta Blog: http://blog.christianposta.com Email: christian@redhat.com Christian Posta Principal Architect – Red Hat • Author “Microservices for Java Developers” • Committer/contributor Apache Camel, Apache ActiveMQ, Fabric8.io, Apache Kafka, Debezium.io, et al. • Worked with large Microservices, web-scale, unicorn company
  3. 3. Free download @ http://developers.redhat.com
  4. 4. People try to copy Netflix, but they can only copy what they see. They copy the results, not the process. Adrian Cockcroft, former Chief Cloud Architect, Netflix
  5. 5. “Microservices” is about optimizing… for speed.
  6. 6. • Maybe it doesn’t matter so much… What we really care about is speed, reduced time to value, and business outcomes. • Maybe a data-driven approach is a better way to answer this question... Are you doing microservices?
  7. 7. • Number of features accepted • % of features completed • User satisfaction • Feature Cycle time • defects discovered after deployment • customer lifetime value (future profit as a result of relationship with the customer) https://en.wikipedia.org/wiki/Customer_lifetime_value • revenue per feature • mean time to recovery • % improvement in SLA • number of changes • number of user complaints, recommendations, suggestions • % favorable rating in surveys • % of users using which features • % reduction in error rates • avg number of tx / user • MANY MORE! Are you doing microservices?
  8. 8. How does your company go fast?
  9. 9. Manage dependencies.
  10. 10. Data is a major dependency.
  11. 11. Wait. What is data?
  12. 12. What is one “thing”?
  13. 13. Book checkout / purchase Title Search Recommendations Weekly reporting
  14. 14. Focus on domain models, not data models • Break things into smaller, understandable models • Surround a model and its “context” with a boundary • Implement the model in code or get a new model • Explicitly map between different contexts • Model transactional boundaries as aggregates
  15. 15. Aggregates • Use the domain to lead you to invariant rules across your domain model • Model the invariants and their associated entities/value objects as “aggregates” • Aggregates focus on transactional boundaries (i.e., transactional in the “A” from ACID sense) • Individual aggregates are transactionally consistent • Aggregates use relaxed consistency models between aggregates (i.e., something like the Actor model?) • Bounded Contexts use relaxed consistency models between boundaries
  16. 16. Stick with these conveniences as long as you can. Seriously.
  17. 17. But ... • Load/size is too great to fit on one box • Modules/use cases have different read/write characteristics • Queries/joins are getting too complex • Security issues • Lots of conflicting changes to the model/schema • Need denormalized, optimized indexing engines • We can live with eventual consistency (whatever that really means)
  18. 18. From here on out, what we’re saying is “thank you old reliable, awesome database… we’ve got it from here”…
  19. 19. Kinda looks like a combinatorial mess….
  20. 20. “A microservice has its own database”
  21. 21. How do we deal with data in this world?
  22. 22. We need to understand something about the data inside our services and the data outside our services. https://msdn.microsoft.com/en-us/library/ms954587.aspx
  23. 23. Data inside a service
  24. 24. Data inside a service
  25. 25. Data outside a service
  26. 26. Data outside a service
  27. 27. Data outside a service
  28. 28. We’re now building a full-fledged distributed system. Some things to remember…
  29. 29. Plan for failures. Build concepts of time, delay, network, and failures into the design as a first-class citizen.
  30. 30. How do you “read” data and how do you “update” data?
  31. 31. tx.begin() c = retrieveCustomer() c.addNewAddress(address) tx.add(c) tx.commit() publishAddressChange(address, c.id)
  32. 32. tx.begin() c = retrieveCustomer() c.addNewAddress(address) tx.add(c) publishAddressChange(address, c.id) tx.commit()
  33. 33. Separate reads and writes (CQRS)
  34. 34. https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
  35. 35. https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/ getBulkHats() getBulkHatsForCatsExcept() wellReallyIJustWantCertainHats() justExecuteThisSqlForMe()
  36. 36. https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
  37. 37. For our reads and writes, we need some “consistency”.
  38. 38. What is consistency? The history of past operations we observe as a reader of the data
  39. 39. We need reads and writes. But we expect failures. This is starting to sound like a distributed-systems theorem I’ve heard…
  40. 40. CAP tells us to pick 2: Consistency, Availability, Partition Tolerance CAP is a bad way to think about this.
  41. 41. Linearizable (strict) consistency CAP - C
  42. 42. Sequential consistency
  43. 43. Monotonic reads consistency
  44. 44. Eventual consistency
  45. 45. Consistency models… https://en.wikipedia.org/wiki/Consistency_model • Strict consistency (Linearizability) • Sequential consistency • Causal consistency • Processor consistency • PRAM consistency (FIFO) • Bounded staleness consistency • Monotonic read consistency • Monotonic write consistency • Read your writes consistency • Eventual consistency
  46. 46. Can we really use relaxed consistency models?
  47. 47. Tradeoffs to make with read consistency and performance
  48. 48. Replicated Data Consistency Explained through Baseball (Doug Terry) https://www.microsoft.com/en-us/research/publication/ replicated-data-consistency-explained-through-baseball/ • What consistency model do you need, depending on what role you’re playing? • What consistency model are you willing to pay for? • Official score keeper? (Linearizability or RMW) • Umpire? (Linearizability) • Sports writer? (Bounded staleness, Eventual consistency) • Radio updates? (Monotonic read, Bounded staleness) • Statistician (Bounded staleness) • Friends in the pub (Eventual consistency)
  49. 49. Replicated Data Consistency Explained through Baseball (Doug Terry) https://www.microsoft.com/en-us/research/publication/ replicated-data-consistency-explained-through-baseball/
  50. 50. Maybe we can use a relaxed consistency model for some of those previously mentioned use cases…
  51. 51. Example relaxing consistency…
  52. 52. Internet companies created their own tools for helping with this. (some opensource!!) • Yelp – MySQL Streamer https://github.com/Yelp/mysql_streamer • LinkedIn – Databus https://github.com/linkedin/databus • Zendesk – Maxwell https://github.com/zendesk/maxwell
  53. 53. Meet debezium.io
  54. 54. Meet debezium.io
  55. 55. WePay uses Debezium https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka
  56. 56. Meet debezium.io
  57. 57. Twitter: @christianposta Blog: http://blog.christianposta.com Email: christian@redhat.com Thanks for listening! Time for demo?
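
Slides 31 and 32 show two orderings of a database commit and a message publish, and both can fail. Publishing after the commit can lose the event if the process dies in between. Publishing before the commit can announce a change that is then rolled back. The transcript does not spell out a fix in code, so the sketch below swaps in one common technique, a transactional outbox, purely as an illustration. The table names, the event shape and the `publish` callback are assumptions, and SQLite stands in for the real database. A log-based change data capture tool such as Debezium avoids the polling step by reading the database's own log instead.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_address (customer_id TEXT, address TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, sent INTEGER DEFAULT 0)"
    )
    return conn


def add_address(conn, customer_id, address):
    # `with conn` wraps both inserts in one transaction: it commits on
    # success and rolls back if either statement raises.
    with conn:
        conn.execute(
            "INSERT INTO customer_address VALUES (?, ?)", (customer_id, address)
        )
        event = {"type": "AddressChanged", "customer_id": customer_id, "address": address}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, publish):
    """Publish unsent events in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # may raise; the row then stays unsent
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A possible usage, with a plain list standing in for a message broker:

```python
conn = open_db()
add_address(conn, "c-1", "1 Main St")
published = []
relay_outbox(conn, published.append)
```

Because a crash between `publish` and the `UPDATE` would cause the same event to be sent again, this gives at-least-once delivery in commit order, so consumers would need to tolerate duplicates.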