This is a presentation on how to introduce the CQRS pattern to an existing application, step by step, without breaking changes or holding up development.
8. Disadvantages of this approach
• Enormous controllers / services / helpers
• Classes are not following SRP
• Refactoring is really difficult
• Complexity in writing unit tests
9. CQRS IS TWO OBJECTS
WHERE THERE WAS ONCE ONE
Jimmy Bogard - https://lostechies.com/jimmybogard/2012/08/22/busting-some-cqrs-myths/
11. Step 1 – Splitting into objects
[Diagram: ProductsController sends requests through a Mediator, which dispatches them to a GetProducts QueryHandler and an AddProduct CommandHandler]
Jimmy Bogard - https://lostechies.com/jimmybogard/2015/05/05/cqrs-with-mediatr-and-automapper/
12. Benefits of this step
• Single responsibility
• Clearly defined purpose and dependencies
• Separation between changing and reading data
• Simplicity of writing unit tests
There is a lot of noise around CQRS. When you search the Internet for information about this pattern and its implementations, you will find an enormous number of articles. And it's really hard to start with CQRS when you read that without X, Y, Z you shouldn't start at all, or that without A, B, C it's not CQRS but only some silly attempt.
I deeply disagree with these ideas. It's great to talk about these additions, but at the beginning of a project it's really hard to plan for all those technologies. Because of this, a lot of people lose interest in CQRS, and in the end we keep making the same mistakes over and over.
For me, CQRS is: “Segregate operations that read data from operations that update data by using separate interfaces. This can maximize performance, scalability, and security.” – from Microsoft's patterns documentation.
To begin with CQRS you don't need any of this:
- you can start with a single data source and add more later
- your code can be a simple procedure without domain side effects
- event sourcing is a great pattern, but it is not required for CQRS
- asynchronous message buses are not required either
But to show you the power of CQRS, let's start with an example application that everybody knows.
Let's assume that you build an ecommerce app with your favorite ORM – e.g. Entity Framework. You add a layer of services that handles requests from controllers. There you put validation, database access, business logic, and so on. Week after week, month after month, your application grows. And it starts to look like this:
Maybe you are familiar with the best-known open-source .NET ecommerce solution. It's a really great piece of software that you can use to build your own store, with plenty of functionality. But unfortunately it is built in a typical layered way, which is really hard to maintain in such long-term and complicated projects.
Here are two examples:
OrderProcessingService – changed 187 times, 41 injected dependencies
ProductService – changed 173 times, 27 injected dependencies
Everything that is somehow connected with orders or products is moved into these services. With such an architecture you end up with god objects responsible for handling almost every request from a particular domain.
So this approach does not fit well in more complex applications:
- god objects with dependencies that are difficult to understand
- classes responsible for multiple actions
- it's really hard to split such objects or change their dependencies
- with the need to mock multiple objects and handle their behaviour, writing unit tests is almost impossible, or such tests add no value to your application
So what can we do now?
This quote comes from Bogard's post, which was a big inspiration for me. So you don't really need to do anything difficult to implement CQRS in your projects – the only thing needed is to split the object that both changes and reads data into two different ones.
Moreover, Martin Fowler wrote about CQRS, showing that to introduce this pattern to your app you need to split your model into commands and queries:
So step by step, method by method, you divide your ProductService, which handles requests from ProductController, into two or more objects that handle particular actions. An action that changes data is called a command, and an action that reads data is called a query. The controller uses an implementation of the dispatcher pattern (check out the MediatR library created by Jimmy Bogard) to publish these commands and queries. They are handled in the appropriate connected handlers.
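This split can be sketched in a few lines of C#. It is a minimal, self-contained illustration – the handler interfaces here are hand-rolled stand-ins for MediatR's, and all class names (Product, GetProducts, AddProduct) are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-rolled stand-ins for MediatR-style handler interfaces
public interface IQueryHandler<TQuery, TResult> { TResult Handle(TQuery query); }
public interface ICommandHandler<TCommand> { void Handle(TCommand command); }

public class Product { public int Id; public string Name; }

// Query + handler: reads data, changes nothing
public class GetProducts { }
public class GetProductsQueryHandler : IQueryHandler<GetProducts, List<Product>>
{
    private readonly List<Product> _db;   // stand-in for the real database
    public GetProductsQueryHandler(List<Product> db) { _db = db; }
    public List<Product> Handle(GetProducts query) => _db.ToList();
}

// Command + handler: changes data, returns nothing
public class AddProduct { public string Name; }
public class AddProductCommandHandler : ICommandHandler<AddProduct>
{
    private readonly List<Product> _db;
    public AddProductCommandHandler(List<Product> db) { _db = db; }
    public void Handle(AddProduct command) =>
        _db.Add(new Product { Id = _db.Count + 1, Name = command.Name });
}
```

In a real project the controller would not construct handlers directly; the mediator resolves the right handler for each command or query from the DI container.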
And that's it – you have made the first step and implemented elements of CQRS in your project. You can even stop at this point and still call it CQRS – your actions are clearly defined, and by definition you know which object changes data and which one only reads it.
It may seem like a simple and straightforward step, but it delivers great value to your application:
- your objects are now separated and you know what their responsibilities are
- you know why each object was created and how it is connected to other parts of your application
- you can easily find the elements that change data and focus on them when something is modifying your database
- writing unit tests, with clearly defined outputs and a narrowed number of dependencies, is much easier and gives more value
To sum it up, it is much easier to develop and maintain your application if it is divided into small, separate parts.
Of course, this step does not improve the performance of your application. So at some point you reach the moment when querying your data is too time-consuming and not acceptable to the end user. And of course, your beloved ORM is responsible for it. So what can you do at that point?
You focus on improving the performance of your queries, taking advantage of the first step towards CQRS – the split into commands and queries.
So at this point you have controllers handling requests through separate handlers. And you see that some of your handlers struggle when using Entity Framework.
But not all handlers need our attention. A lot of handlers work really well with our ORM – the data queried there is relatively small and the query is generated in a moment. In that case you do not need to change anything.
But some handlers, with difficult queries, are too slow for Entity Framework, and you have to do something about it. And because you have strictly separated queries and commands, it's really easy to focus only on those that require attention. So you change the way the application connects to the database and queries data to a better-fitted one – with better performance, quicker SQL query generation, and less memory needed.
There are plenty of options to achieve such an improvement:
- you can use the no-tracking option in EF so that data changes are not tracked
- raw SQL queries and SQL views are a more fine-grained option to query data, the fastest but also the hardest to maintain
- you can use one of many micro-ORMs that are not as heavyweight as the one you use now
- in .NET there is an option to use the AutoMapper library and its ProjectTo extension to query data based on the configured maps; it's really fast and easy to introduce into your project
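To illustrate the projection idea, here is a hand-rolled equivalent of the Select statement that ProjectTo generates from a mapping configuration (ProductEntity and ProductListDto are illustrative names, not from a real codebase):

```csharp
using System.Collections.Generic;
using System.Linq;

public class ProductEntity { public int Id; public string Name; public decimal Price; }
public class ProductListDto { public int Id; public string Name; }

public static class ProductQueries
{
    // Reads only the columns the list screen needs. With EF Core you would
    // start the chain with context.Products.AsNoTracking() to skip the change
    // tracker; with AutoMapper, ProjectTo<ProductListDto>() generates a Select
    // like this one from your mapping configuration.
    public static List<ProductListDto> GetProductList(IQueryable<ProductEntity> products) =>
        products
            .Select(p => new ProductListDto { Id = p.Id, Name = p.Name })
            .ToList();
}
```

Because the projection happens inside the IQueryable, EF translates it to SQL that selects only two columns, instead of materializing full entities.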
By that point you have gained a big performance improvement without impacting other parts of the system.
Your system grows and you reach the point where, no matter how hard you try to improve the performance of querying your data, you cannot do it. In some situations data is structured in a way that prevents you from querying it effectively – you need multiple unions and joins that make querying slow on large amounts of data. So what can you do in this situation?
In that case you can create an additional model which stores your data in a way that can be queried fast and without overhead.
First you need to publish domain events – you can reuse the previously created dispatcher. At the end of your command handler you add logic that raises an event about the action that was done.
Your events are synchronously handled by an event handler which knows how to flatten your data so it can be quickly queried by your application, and which puts this data into the same database you are currently using. What is really important is that the command handler and the event handler run in one transaction, so if adding data to ProductView fails, the application also rolls back the previous operation. You are still using your current database, but part of your data is duplicated to handle a different query scenario.
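A rough sketch of this synchronous flow, with the database transaction simulated by rolling back the write when the read-model update fails (all names here are assumptions for the example):

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the database: Products is the write model,
// ProductViews is the flattened read model living in the same store.
public class ProductDb
{
    public List<string> Products = new List<string>();
    public List<string> ProductViews = new List<string>();
}

public static class AddProductHandler
{
    public static void Handle(ProductDb db, string name, Action<ProductDb, string> onProductAdded)
    {
        db.Products.Add(name);                 // 1. command writes to the write model
        try
        {
            onProductAdded(db, name);          // 2. event handler fills the read model
        }
        catch
        {
            db.Products.Remove(name);          // 3. same "transaction": roll back on failure
            throw;
        }
    }
}

public static class ProductViewEventHandler
{
    // Flattens the product into a shape the UI can query directly
    public static void Handle(ProductDb db, string name) =>
        db.ProductViews.Add("view:" + name);
}
```

In a real application the rollback would be handled by the database transaction wrapping both handlers, not by hand-written compensation.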
This step solves a few problems:
- data which is hard to query is transferred, command by command, into a simple read model
- your model is optimized to be queried really fast
- thanks to synchronous handling and the database transaction, you don't lose your data
- at any point you can recreate your view and start handling new commands
Unfortunately, the synchronous model of handling events has one major disadvantage – it is synchronous. With many time-consuming event handlers, your application freezes until all actions are done.
So you need to introduce an asynchronous read model.
First you need to change the way your events are handled – instead of handling events immediately, you put them on a message bus. Then an asynchronous job, which is connected to the bus, is notified about each new event to be handled.
This async job runs the event handlers, which move data to the current or a new database. In our store we push product information to Elasticsearch, to search it efficiently, and to Redis, to cache data and use it in our queries.
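A minimal sketch of this asynchronous flow, with an in-memory queue standing in for a real message bus such as RabbitMQ, and a bag standing in for Elasticsearch or Redis (all names are illustrative):

```csharp
using System.Collections.Concurrent;

public static class AsyncReadModel
{
    // Bus plays the role of the message bus; SearchIndex the role of the
    // external read store (Elasticsearch, Redis, ...)
    public static readonly ConcurrentQueue<string> Bus = new ConcurrentQueue<string>();
    public static readonly ConcurrentBag<string> SearchIndex = new ConcurrentBag<string>();

    // The command handler finishes fast: it only enqueues the event
    public static void HandleAddProduct(string name) => Bus.Enqueue("ProductAdded:" + name);

    // The background job drains the bus and updates the read stores
    public static void RunWorkerOnce()
    {
        while (Bus.TryDequeue(out var evt))
            SearchIndex.Add(evt);   // e.g. index the document in Elasticsearch
    }
}
```

Note that between Enqueue and the worker run the read model is stale – this is exactly the eventual consistency trade-off discussed below.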
In many cases your end user will want to know when the handler pushing data to the asynchronous data store has finished its work. In that case you can use the widely described Correlation Identifier pattern together with the great SignalR library. Every request from the end user contains a unique CorrelationId. It is transported through every layer, as a value in commands and events, so that you can find out which particular user caused a given change. You then notify that user with a push notification, right after pushing data into the new model, so he can query the data from it.
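A sketch of the Correlation Identifier idea, with a plain delegate standing in for a SignalR hub pushing to the one connected user (all names are assumptions for the example):

```csharp
using System;
using System.Collections.Generic;

public class ProductAddedEvent
{
    public Guid CorrelationId;   // flows unchanged from the original request
    public string Name;
}

public class ReadModelUpdater
{
    // Stand-in for e.g. hub.Clients.User(...).SendAsync(...) in SignalR
    private readonly Action<Guid, string> _notify;
    public readonly List<string> View = new List<string>();

    public ReadModelUpdater(Action<Guid, string> notify) { _notify = notify; }

    public void Handle(ProductAddedEvent evt)
    {
        View.Add(evt.Name);                              // update the async read model
        _notify(evt.CorrelationId, "read model ready");  // tell the originating user
    }
}
```

The key point is that the CorrelationId is never transformed along the way, so the last handler in the chain can still address the user who started the request.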
The advantages of this step are pretty clear:
- you can create any read model you need in a given situation
- because events are handled asynchronously, your command handling is very fast
- you end up with many different but tiny objects which are easy to test and maintain
Of course this step does not come without disadvantages; the biggest ones are:
- you have implemented the eventual consistency pattern, with all its consequences
- you need to handle all scenarios where a command succeeds but its event fails
- monitoring and debugging your application is now much more complex than before
OK, so to sum it all up:
The path to CQRS, in my case, consists of four steps:
- splitting objects into commands and queries
- optimizing the places which require such a change
- creating a synchronous read model
- creating an asynchronous read model
What is really important: in different bounded contexts you can implement these steps fully or only partially, depending on the needs and the use cases. Even more, nothing prevents you from having two different models for theoretically similar queries or commands – it all depends on how complex your application is and what the needs of your end user are.