CodeFest 2014. Christopher Bennage — CQRS Journey: scalable, available, and maintainable systems


Published in: Internet, Technology, Business

  • The requirements of today’s applications have changed. As a result, the way we build those applications needs to change as well. There are architectural practices that have become standard. They are our default approach to building systems. One such standard high-level pattern is the Three-Tier or N-Tier Architecture. This pattern worked well for a few decades, when our applications were mainly used in-house, hosted on our own servers, with a limited number of users. Today, we are all building applications that transcend these barriers. Whether you are an Enterprise Developer or an Independent Software Vendor, a Hobbyist, or a Consultant, the new reality is that you are building applications that need to scale with predictable cost, at unpredictable times, and minimize downtime.
  • Let’s think about these new requirements. (I think it is a bit misleading to call them “new”. In many ways, they are just the logical progression of older, well-established requirements. However, that’s off my topic.) It is not enough to simply scale out our systems to meet user demand; we have to ensure that we can do so with predictable cost, preferably a linear or better increase in cost. We’ve also learned that we cannot always predict when a surge in demand will occur. The beauty of the cloud is that we can quickly react to surges in user demand. I use the term “elasticity” to describe the ability to scale at unpredictable times. I believe that “elasticity” also implies the idea of reducing cost by scaling down the resources used to a responsible minimum. Users expect applications and services to be always available. Time lost corresponds directly to money lost. Likewise, our systems are increasingly global in scale, and that means that old ideas like “performing maintenance while the users are asleep” are no longer meaningful. We need to be able to achieve these new requirements while keeping the system maintainable. Moreover, the complexities that arise when pursuing high scalability and availability mean that we also need to build our systems to be resilient to failures, because there will be failures.
  • How does all of this lead to CQRS? CQRS is an approach, or an architectural pattern, that can be applied to address some of these new requirements. CQRS simply means that we separate the “commands” in our system from the “queries” in our system. “Command” means doing something that changes the state of the system. “Query” means reading the state of the system. I’ll sometimes use the terms “reads and writes” in place of “queries and commands”. This is similar to CQS (Command Query Separation). However, with CQRS the pattern is applied to a “bounded context” or subsystem, as opposed to simply objects. Why do queries and commands (reads and writes) need to be segregated? The motivation arises from the fact that many systems (certainly not all) tend to have a lot more of one than the other. For example, consider a massive ecommerce system: it will have an order of magnitude more reads (queries) than writes (commands). Hence, separating these responsibilities in the system will allow us to more easily meet our new requirements.
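As a minimal sketch of the separation described above, the Python below keeps the code that changes state (a command handler) apart from the code that only reads state (a query handler). The names `AddProductReview`, `GetAverageRating`, and both handler classes are hypothetical and chosen for illustration; they do not come from the CQRS Journey reference implementation. Both handlers share one in-memory store here for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AddProductReview:
    """A command: a request to change the state of the system."""
    product_id: str
    rating: int


@dataclass(frozen=True)
class GetAverageRating:
    """A query: a request to read state, with no side effects."""
    product_id: str


class ReviewCommandHandler:
    """Write side: validates and applies state changes, returns nothing."""

    def __init__(self, store: Dict[str, List[int]]) -> None:
        self._store = store

    def handle(self, command: AddProductReview) -> None:
        if not 1 <= command.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._store.setdefault(command.product_id, []).append(command.rating)


class ReviewQueryHandler:
    """Read side: answers questions and never mutates state."""

    def __init__(self, store: Dict[str, List[int]]) -> None:
        self._store = store

    def handle(self, query: GetAverageRating) -> Optional[float]:
        ratings = self._store.get(query.product_id, [])
        return sum(ratings) / len(ratings) if ratings else None


# Illustrative in-memory store shared by both sides (an assumption).
store: Dict[str, List[int]] = {}
commands = ReviewCommandHandler(store)
queries = ReviewQueryHandler(store)

commands.handle(AddProductReview("book-1", 4))
commands.handle(AddProductReview("book-1", 5))
average = queries.handle(GetAverageRating("book-1"))
```

Because the two handlers have no methods in common, each side could later be given its own model, its own data store, and its own deployment without changing the callers.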
  • Let’s examine the n-tier pattern so that we can contrast it with CQRS. In the traditional n-tier system we would have horizontal layers in our application. That is, we introduce segregation into our system based on the type of service provided. For example, we might have the following layers: Presentation (responsible for accepting user input and rendering data back to the user); Business Logic (responsible for enforcing business rules, coordinating between components, etc.); Persistence (responsible for storing data). The difficulty here is that these layers (we could also call them “separations” or “segregations”) do not necessarily correspond to components that can scale independently of one another. In our fictitious ecommerce example, when a user simply browses for products the system employs all of these horizontal layers. Likewise, when a user makes a purchase or submits a product review, all of the layers are involved. To complicate matters further, some operations in a system are naturally contentious. They can interfere with other operations. An example of this might be updating the inventory of our fictitious ecommerce system. Such an operation might cause locks on product tables that would interfere with customers browsing for products. Finally, we don’t have a natural “unit of scale” with the traditional n-tier model. We could add resources to any given layer, but that is likely to merely move the scale problem to another layer. This concept of “Scale Unit” is important.
  • If it is true that reads significantly outnumber writes in our system, then we have a natural “seam” in our system. We can introduce a separation along this “seam” that will allow us more flexibility when we need to scale. This means that we only need to increase the resources for the portion of the system that needs to handle the increased demand. A related concept is expressed in Robert C. Martin’s book Agile Software Development, Principles, Patterns, and Practices. However, in the book, the idea of identifying seams is primarily for supporting maintainability. Here I am suggesting it as a mechanism for improving scalability.
  • Now that we have identified a natural seam in our application between the reads and the writes (between the queries and the commands), we can explicitly model this in our system. Notice that we have not eliminated the horizontal layers in our architecture. They still exist; instead, we are introducing vertical partitions in addition to these traditional horizontal layers. In this example, we still have a common presentation layer. However, it delegates writes/commands to one subsystem and reads/queries to a different subsystem. We can scale the read/query side of the system independently from the write/command side. In addition, these two subsystems can have completely different models. We don’t need to mix the query-side concerns of projecting and aggregating data into the command-write subsystem, and vice versa. This can lead to code that is more focused and optimized. Overall, CQRS is about factoring your application into Commands and Queries so that you can scale the components independently. Now that you have different models, you can go to the next step and have separate data stores. Of course, these stores need some channels of communication. This could be replication, transformation, projection, etc. Having a separation in the data persistence means that we can also change the way we think about storing and modelling the data. That’s why we often talk about another pattern called Event Sourcing when we talk about CQRS.
  • It is common in our systems to store the current state of the system. Continuing with our naïve example of an ecommerce site, we might model the shopping cart as two tables in a relational database. Alternatively, we might treat it as a single document in a NoSQL store. Either way, what we store is a representation of the current state. When there is a change to the shopping cart, we modify or mutate the current state. If I add an additional instance of an item to my shopping cart, I first load the current quantity for that item, add one to it, and then persist it back to disk. With Event Sourcing, we don’t focus on the current state. Instead, we persist the transactions (or events) themselves. In this case, an “event” would be “adding 1 item to the shopping cart”. Events are never mutated or altered. They represent something that happened. They are “append only”. In Event Sourcing, the current state is an aggregation of events. In order to acquire the current state, you have to “replay” the events. This is actually the way that bank accounts work. Bank accounts maintain a ledger of deposits and withdrawals. In order to determine the current balance, you iterate through the events and sum up the balance. Much like bank accounts, you don’t want to waste time iterating through events constantly. It is common in systems using Event Sourcing to maintain periodic snapshots of the state. One benefit of storing a sequence of events is that it can provide a rich source for inferring user intent. Another downside of Event Sourcing, beyond the cost of replay, is that the data is difficult to query. Of course, this brings us back to the beauty of CQRS: the data model for the command side of the system can use Event Sourcing, and the data model for the query side can use something that is easier to query.
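The shopping cart example above can be sketched in a few lines of Python. This is an illustration only, under the assumption of two hypothetical event types (`ItemAdded`, `ItemRemoved`) and an in-memory list standing in for the event stream. The current cart is never stored directly: it is derived by replaying events, optionally starting from a snapshot.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class ItemAdded:
    item_id: str
    quantity: int = 1


@dataclass(frozen=True)
class ItemRemoved:
    item_id: str
    quantity: int = 1


CartEvent = Union[ItemAdded, ItemRemoved]


def apply(state: Counter, event: CartEvent) -> Counter:
    """Return the cart state that results from applying one event."""
    new_state = Counter(state)  # copy, so earlier states are never mutated
    if isinstance(event, ItemAdded):
        new_state[event.item_id] += event.quantity
    elif isinstance(event, ItemRemoved):
        new_state[event.item_id] -= event.quantity
        if new_state[event.item_id] <= 0:
            del new_state[event.item_id]
    return new_state


def replay(events: Iterable[CartEvent],
           snapshot: Optional[Counter] = None) -> Counter:
    """Fold events into a state, starting from a snapshot if one is given."""
    state = Counter(snapshot) if snapshot else Counter()
    for event in events:
        state = apply(state, event)
    return state


# The append-only event stream is the only thing that gets persisted.
event_stream = [
    ItemAdded("item-1"),
    ItemAdded("item-2"),
    ItemAdded("item-1"),
    ItemRemoved("item-2"),
]

current_cart = replay(event_stream)

# A snapshot taken after the first two events avoids replaying them again.
snapshot = replay(event_stream[:2])
cart_from_snapshot = replay(event_stream[2:], snapshot)
```

Replaying the full stream and replaying from the snapshot produce the same cart, which is what makes periodic snapshots a safe optimization.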
  • So how might these patterns work together? Any part of your system, or rather your bounded context, that changes the state of the system would be classified as the “Command” part of the system. This is often referred to as the Write Model. The Write Model persists data using the Event Sourcing pattern. This means that events are stored, but the current state is not explicitly stored. Since events are hard historical fact, they are never altered or updated. This means that the Write Model is only ever appending data. It is never updating existing data. This can yield some performance benefits on its own. As events are generated in the Write Model, they can be asynchronously broadcast to the Read Model of the system. The events can be processed, and aggregated data can be stored that is optimized for querying. Since the events are broadcast, distribution of the Read Model is easier to accommodate. Since the data is optimized for querying, it can also lead to simpler, more focused code for the Read Model. In systems like this, the Write Model, or rather the persisted set of events, is considered the ultimate source of truth. Another benefit is that events can be replayed in light of new business rules.
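A rough sketch of how the two models might fit together follows. The class names (`WriteModel`, `SalesReadModel`) and the `OrderPlaced` event are hypothetical, and delivery is a plain in-process callback rather than the asynchronous broadcast described above; that simplification is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    product_id: str
    quantity: int


class WriteModel:
    """Command side: appends events and never updates existing ones."""

    def __init__(self) -> None:
        self._events: List[OrderPlaced] = []
        self._subscribers: List[Callable[[OrderPlaced], None]] = []

    def subscribe(self, handler: Callable[[OrderPlaced], None]) -> None:
        self._subscribers.append(handler)

    def place_order(self, order_id: str, product_id: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        event = OrderPlaced(order_id, product_id, quantity)
        self._events.append(event)         # append-only persistence
        for handler in self._subscribers:  # broadcast to read models
            handler(event)

    @property
    def events(self) -> Tuple[OrderPlaced, ...]:
        return tuple(self._events)


class SalesReadModel:
    """Query side: keeps aggregated data shaped for fast reads."""

    def __init__(self) -> None:
        self.units_sold: Dict[str, int] = {}

    def on_event(self, event: OrderPlaced) -> None:
        current = self.units_sold.get(event.product_id, 0)
        self.units_sold[event.product_id] = current + event.quantity


write_model = WriteModel()
read_model = SalesReadModel()
write_model.subscribe(read_model.on_event)

write_model.place_order("o-1", "book", 2)
write_model.place_order("o-2", "book", 1)
write_model.place_order("o-3", "pen", 5)

# Because the event log is the source of truth, a read model can be
# rebuilt from scratch at any time by replaying the stored events.
rebuilt = SalesReadModel()
for stored_event in write_model.events:
    rebuilt.on_event(stored_event)
```

The rebuild loop at the end is the same mechanism that would allow events to be replayed against a read model that encodes new business rules.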
  • With every solution there is a trade-off. What is the trade-off that we will encounter here? The most noticeable “problem” is that there is a delay in propagating changes to the Read Model. Since the process of transmitting events to the Read Model is asynchronous, there is the possibility of stale results when you query. This is something that you need to be aware of from the business perspective. This is described as Eventual Consistency because, given sufficient time, the Write Model and Read Model will have consistent data. However, there is a window of inconsistency in between. In practice, this turns out to be unimportant for many applications. First, the window of inconsistency is generally very brief (on the order of milliseconds). Second, many business scenarios are not impacted by stale data. For example, when retrieving product information to display to a customer, is there significant impact if you do not display the product review submitted 2 seconds earlier?
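The window of inconsistency can be made visible with a small simulation. In this sketch a `deque` stands in for whatever asynchronous channel carries events to the Read Model; the function names and the `ReviewSubmitted` event label are hypothetical. A query issued before the pending events are processed sees stale data, and the same query afterwards sees the update.

```python
from collections import deque

event_log = []       # write side: append-only source of truth
pending = deque()    # events in transit to the read side
review_counts = {}   # read side: denormalized view used by queries


def submit_review(product_id: str) -> None:
    """Command: record the event and hand it to the channel."""
    event = ("ReviewSubmitted", product_id)
    event_log.append(event)
    pending.append(event)


def process_pending() -> None:
    """Read-side worker: apply every event that has arrived so far."""
    while pending:
        _, product_id = pending.popleft()
        review_counts[product_id] = review_counts.get(product_id, 0) + 1


def count_reviews(product_id: str) -> int:
    """Query: reads only the read-side view, never the event log."""
    return review_counts.get(product_id, 0)


submit_review("book")
stale = count_reviews("book")   # read side has not caught up yet
process_pending()
fresh = count_reviews("book")   # now consistent with the write side
```

Whether the stale answer matters is a business question: for a review count it is probably harmless, while for other data it may not be.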
  • This quote is from Nicomachean Ethics. This quote is difficult for me to parse in English. I wonder if it was easier in the original Greek. The patterns & practices team wanted to explore these patterns of CQRS and ES. However, we recognized that we could not talk authoritatively about these patterns without real-world experience. In order to gain this experience, we decided that we would take a journey. That is, we would go through the process of building an application with these patterns, including multiple cycles of development and release. Along the way, we would chronicle what we learned. The result is not so much prescriptive guidance, but rather the notes of a development team that encountered both success and failure.
  • We wanted to make sure that we were authentically connected to the CQRS community and harvesting as much of their experience as possible. In a very real way, we were simply trying to map a territory that they had already explored. Conceptually, there was a lot of new land to explore. The CQRS community has its roots in the Domain-Driven Design community. There is a lot of language and many concepts carried over from DDD. In some ways, CQRS can be seen as a continuation of the ideas and patterns that grew up in DDD. Eric Evans, the father of DDD, acknowledged this in his presentation “DDD at 10”, where he described the state of DDD after 10 years. We ran the project on GitHub, licensed under Apache 2.0, and we accepted community contributions for both the source code (our reference implementation) and the written guidance. We built a conference management system, albeit an incomplete one. We broke the application apart into multiple “bounded contexts” (or subsystems) and we chose to implement some of the subsystems using CQRS. We went through a cycle of development, we deployed the application, then we developed a second release and deployed that. Again, the results of these experiences are freely available online.
  • There is a saying in English, “When the only tool you have is a hammer, then everything looks like a nail.” It is important to recognize that CQRS is not the solution to every problem of scalability and availability. In fact, it is common to find systems that only use CQRS for specific subsystems. In our own journey, we divided our system into subsystems called “bounded contexts”. In our story, we treated these bounded contexts as vertical slices representing logical groups of features. We believe that CQRS only made sense for some of these bounded contexts and that other patterns might be a better fit for different bounded contexts. It is important to identify and understand the needs of your system. Multiple users competing for the same resource could benefit from CQRS. Changing business rules that need to be “replayed” against historical data could benefit from Event Sourcing. A complex business domain might benefit more from the core ideas and patterns in DDD itself. Overall, some possible benefits that CQRS/ES (and perhaps just DDD) might yield for you are: it helps to explicitly partition the system into independent and isolated units that can both scale more easily and compose more easily with other parts of the system; business rules and historical data are easier to version and evolve due to Event Sourcing; and team development is easier when “Bounded Contexts” or subsystems are explicitly called out.
  • We learned a lot along the way. It’s our assumptions that often get us in trouble. Most of us have been taught a certain way to build software (such as Object-Oriented Design) that works well for most things; however, we should never jump to the conclusion that it works well for all things. Remember Martin Fowler’s First Law of Distributed Object Design: Don’t distribute your objects! Don’t CQRS all the things. This follows from the last point: some parts of the system may work best as simple 2-tier CRUD, and that’s okay. When you do have to distribute, you will really need tracing. Otherwise, you’ll spend all your time trying to figure out what is happening in the system. I’ve already said to throw away your assumptions; this is especially important with performance. Test and keep testing. We made the mistake of trying to build everything from scratch. That’s a useful exercise for learning, but in our case we spent much more time than we expected getting that part right. We think we could have produced more by leveraging the work of others. There are more lessons learned in the published guidance. However, many of them will not make sense without additional context.
  • The CQRS Journey project was completed almost two years ago. The questions of scalability, availability, and maintainability are even more important today than they were then. What have we learned since then? These patterns are actually the result of applying the insights from earlier patterns. The core idea is Separation of Concerns, and it underlies N-Tier Architecture. It just did not go far enough to address the modern demands of scale and availability. CQRS tries to address these modern demands by applying Separation of Concerns more deeply, in this case to the parts of the system that read and write data. This leads us to the question: what can we separate next? Are there additional concerns that are tangled together that prevent us from achieving the scale and availability our systems need today? I believe that this is a question we need to be continuously asking ourselves. Many of our natural assumptions can mislead us, as is demonstrated by the idea of Eventual Consistency. I’ve mentioned “bounded context” several times now. It’s a confusing term, and one that we struggled with during our project. Nevertheless, I think that it is an important concept to emphasize. It is a way to describe a logically isolated and relatively independent part of your system. The key here is that the isolation and independence are with respect to actual features of the system, as opposed to infrastructure concerns. This can help us to identify vertical slices of our applications that can themselves be further separated by applying CQRS. This is a key insight that wasn’t truly clear to me at the time. Likewise, I also mentioned the term “unit of scale” or “Scale Unit”. I believe this is an idea that has recently emerged and that is a continuation of these ideas. Let’s start off with an analogy. Say that we are in the business of making toys, and we know that one worker can make 20 toys each day. If we need to produce 200 toys per day, then we need to have 10 workers. Often when we attempt to make our applications scale, we do so by trying to make our workers more efficient. We try to grow from 20 to 200 toys per day by giving the worker better tools, optimizing their workspace, or paying them a higher wage. However, there is a physical limit to how much they can produce. The cloud as a platform allows us to magically create workers on demand. In this analogy, a worker is a Scale Unit. We can scale out our production by hiring more workers. Likewise, in this analogy our system would be the factory where the workers make the toys. This distinction may sound subtle, but it is the secret to building truly scalable systems.
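The arithmetic behind the toy analogy is simple enough to write down. The helper below is a hypothetical illustration, not part of any published guidance: given a target output and the fixed capacity of one scale unit, it returns how many units are needed, rounding up because a fraction of a worker cannot be hired.

```python
def scale_units_needed(target_per_day: int, capacity_per_unit: int) -> int:
    """Number of identical scale units needed to meet the target output."""
    if capacity_per_unit <= 0:
        raise ValueError("capacity_per_unit must be positive")
    if target_per_day < 0:
        raise ValueError("target_per_day must not be negative")
    # Integer ceiling division: round up to the next whole unit.
    return (target_per_day + capacity_per_unit - 1) // capacity_per_unit


workers_for_200 = scale_units_needed(200, 20)  # the case from the analogy
workers_for_210 = scale_units_needed(210, 20)  # a partial unit rounds up
```

The point of the sketch is that capacity per unit is treated as a constant and only the number of units varies, which is the scale-out view of growth.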
  • Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services

    1. Exploring CQRS and Event Sourcing: A journey into high scalability, availability, and maintainability with Azure. Christopher Bennage, patterns & practices
    2. New Requirements: Architecting for Today • Scale with predictable cost • Scale at unpredictable times • Be continuously available
    3. What is CQRS? Command Query Responsibility Segregation. Separating Reads from Writes: an architectural pattern that separates Commands (that change state) from Queries (that only read state).
    4. N-Tier: 20th Century Architecture. Domain Logic, Presentation, Persistence, Commands, Queries
    5. Applying CQRS: Separate Data Stores
    6. What is Event Sourcing? An Alternate Way to Represent the Data. Relational Model. Event Stream: Cart Created, Item 1 Added, Item 2 Added, Item 1 Removed, Shipping Information Added
    7. CQRS / ES. Based on Rob Ashton’s
    8. Eventual Consistency: The Trade-Off • Data is sent from Write model to Read model • Possibility of stale data • Does it have a business impact?
    9. “For the things we have to learn before we can do them, we learn by doing them.” ~Aristotle
    10. When to Use CQRS? • Is there a natural seam between reads and writes? • Are the business rules ever changing? • Is scalability one of the challenges? • Are the benefits that CQRS brings clear?
    11. Some Lessons Learned • Throw away your assumptions • Only distribute when necessary • CQRS is not a top-level architecture • Choose the right approach for each part of the problem • In a message-based system, tracing is very important • Test for performance early and frequently • Existing libraries, frameworks, and infrastructure can help
    12. Insights After CQRS: Scalability results from Independence. Units of Scale
    13. Resources • @bennage
    14. Questions? Christopher Bennage, patterns & practices, Microsoft