CQRS + Event Sourcing – Step by Step

A common issue I see is understanding the flow of commands, events and queries within a typical CQRS ES based system. The following post is designed to clear up what happens at each step. Hopefully, this will help you to reason about your code and what each part does.

[Figure: CQRS and Event Sourcing architecture diagram]

What is CQRS and what does it stand for?

CQRS stands for Command Query Responsibility Segregation. It is an architectural programming pattern. It is based on the idea that there are significant benefits resulting from separating code for the ‘write’ and the ‘read’ parts of an application. 

This makes more sense when you consider the number of reads to a database in order to display information in a typical business application, such as a blog CMS or customer relationship manager, compared to the number of times you write to the database. Clearly the reads vastly outnumber the writes. And yet typical databases and code are optimised for writes.

This leads to the idea that separating the command and query parts of the application brings some interesting benefits. Not surprisingly, it also throws up some interesting challenges, such as eventual consistency. Incidentally, if you are looking for a way to handle eventual consistency, check out this post: 4 Ways to Handle Eventual Consistency on the UI

When Should You Use CQRS and Event Sourcing?

In my experience, the major problem with using the CQRS pattern is the learning curve of CQRS itself. A big factor in deciding whether it’s right for part or all of your next project should be how familiar your team is with it. It’s a bit of a catch-22.

Using CQRS can and will offer the right project some considerable advantages. CQRS and ES grew out of the Domain-Driven Design movement, a movement designed to handle, reduce and control the growing complexity of software development. As a result, it’s ideally suited to more complex domains. The structure allows you to handle business logic, different models and potentially complicated business rules in a simpler way.

It’s suited to problems that have a clear domain model. And given the value attained through event sourcing, it is best suited to business-critical applications. There is an argument to develop less critical software using these ideas but more as a training ground for an under-experienced team.

As with many other top-level architectural patterns, elements can be plucked out for use independently. For example, separating out the query responsibility can bring benefits to other styles of application. Or the way the domain is isolated from external dependencies can really help model business logic more efficiently.

Given that premise, the following is typically how you would structure the various elements of the architecture.

1. A Command is generated from the UI

The UI displays data to the user. The user generates a command by requesting some form of change. This implies a different style of user interface sometimes known as a Task-Based UI. 

UIs today often reflect the CRUD idea, where data is laid out and you make changes. In contrast, a task-based UI leads a user through a task. To make this difference clearer, take the following example:

Changing an address.

In a CRUD based system, the address would be displayed and the address changed. But the system would not know why the address was changed. Was it due to a spelling mistake or because the customer has moved house?

In contrast, a task-based approach would allow the user to request to change the address because the customer is moving house. This can be of great importance to a business as it may trigger new offers or change the perceived risk etc.
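To make the difference concrete, here is a minimal Python sketch of the two styles of message. The class and field names (UpdateCustomerRecord, ChangeCustomerAddress, AddressChangeReason) are hypothetical and chosen only for illustration. The point is that the task-based command is named after the task and carries the reason, so the intent is not lost.

```python
from dataclasses import dataclass
from enum import Enum


class AddressChangeReason(Enum):
    CORRECTION = "correction"    # e.g. fixing a spelling mistake
    MOVED_HOUSE = "moved_house"  # the customer has relocated


@dataclass(frozen=True)
class UpdateCustomerRecord:
    """CRUD style: carries the new data but not the user's intent."""
    customer_id: str
    address: str


@dataclass(frozen=True)
class ChangeCustomerAddress:
    """Task-based style: named after the task and carries the reason."""
    customer_id: str
    new_address: str
    reason: AddressChangeReason


command = ChangeCustomerAddress(
    customer_id="customer-1",
    new_address="1 New Street, London",
    reason=AddressChangeReason.MOVED_HOUSE,
)
```

With the reason on the command, the write side can later decide to record a different event for a house move than for a spelling correction.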

This approach does require a small re-think of how to do validation. I won’t go into detail here, but I offer some insights into smart ways to validate your CQRS commands here: How To Validate Commands in a CQRS Application

2. The Message Router or Bus receives the command

The message bus is responsible for routing the command to its handler. Each command by definition is handled by a single handler.

This concept of routing the command allows your system to remain flexible and easy to maintain. You can create a pipeline for every command to follow, which could do things like checking that a user has permission, performing superficial validation, or recording application performance data. The options are endless.
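A minimal in-process sketch of such a bus might look like the following. It is an illustration under assumptions, not a production design: CommandBus, use, register and send are hypothetical names, and the two pipeline steps (a permission check and a call log) are stand-ins for whatever your own pipeline needs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Type

Handler = Callable[[Any], None]
Middleware = Callable[[Any, Handler], None]


class CommandBus:
    def __init__(self) -> None:
        self._handlers: Dict[Type, Handler] = {}
        self._pipeline: List[Middleware] = []

    def register(self, command_type: Type, handler: Handler) -> None:
        # A command is handled by exactly one handler.
        if command_type in self._handlers:
            raise ValueError(f"{command_type.__name__} already has a handler")
        self._handlers[command_type] = handler

    def use(self, middleware: Middleware) -> None:
        self._pipeline.append(middleware)

    def send(self, command: Any) -> None:
        handler = self._handlers.get(type(command))
        if handler is None:
            raise LookupError(f"No handler for {type(command).__name__}")
        # Wrap the handler so the first registered middleware runs first.
        call = handler
        for middleware in reversed(self._pipeline):
            call = (lambda cmd, mw=middleware, nxt=call: mw(cmd, nxt))
        call(command)


@dataclass(frozen=True)
class ChangeCustomerAddress:
    customer_id: str
    new_address: str
    user: str


log: List[str] = []


def require_user(command: Any, next_handler: Handler) -> None:
    if not command.user:
        raise PermissionError("anonymous users cannot send commands")
    next_handler(command)


def record_call(command: Any, next_handler: Handler) -> None:
    log.append(f"start {type(command).__name__}")
    next_handler(command)
    log.append("done")


bus = CommandBus()
bus.use(require_user)
bus.use(record_call)
bus.register(ChangeCustomerAddress, lambda c: log.append(f"handled {c.customer_id}"))
bus.send(ChangeCustomerAddress("customer-1", "1 New Street", user="daniel"))
```

Because every command passes through the same pipeline, cross-cutting concerns stay out of the individual handlers.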

3. The handler prepares the aggregate root

The aggregate root is the heart of the CQRS pattern. It is where the domain models are developed. Everything that happens here is happening on the ‘write side’ or with the ‘write model’. The key thing to note about the write model is that it is not designed for reading. That sounds obvious, yet I often see people asking how to query a domain model. If you’re new to this, the idea that you can’t query a domain model is a little odd. But it is precisely this command query segregation which brings a cleaner, less complex code solution to the business problem.

The handler news up an aggregate root and applies all previous events to it. This brings the AR (aggregate root) up to its latest state. This is typically very fast, even with thousands of events. However, if it becomes a performance bottleneck, a snapshotting process can be adopted to overcome the issue.

Aggregate roots are what are commonly known as ‘domain objects’ or the ‘write model’. They are distinct from typical domain layer objects in that they have no getters and setters (other than a getter for the ID). These objects are not designed to be queryable. They contain all the functionality, and any other classes, needed to perform some business role.
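The replay step can be sketched in a few lines of Python. Everything here is a hypothetical example: the Customer aggregate, the two event classes and the from_history method are names invented for illustration. Note that the only public read access is the ID, and that applying an event does nothing except change state.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: str
    address: str


@dataclass(frozen=True)
class CustomerMovedHouse:
    customer_id: str
    new_address: str


class Customer:
    """Write-model aggregate root: no setters, only the ID is readable."""

    def __init__(self) -> None:
        self._id: Optional[str] = None
        self._address: Optional[str] = None
        self._version = 0  # number of events applied so far

    @property
    def id(self) -> Optional[str]:
        return self._id

    @classmethod
    def from_history(cls, events: Iterable[object]) -> "Customer":
        customer = cls()      # "new up" an empty aggregate root
        for event in events:  # replay every previous event in order
            customer._apply(event)
        return customer

    def _apply(self, event: object) -> None:
        # State transitions only: no validation and no side effects.
        if isinstance(event, CustomerRegistered):
            self._id = event.customer_id
            self._address = event.address
        elif isinstance(event, CustomerMovedHouse):
            self._address = event.new_address
        self._version += 1


history = [
    CustomerRegistered("customer-1", "1 Old Road"),
    CustomerMovedHouse("customer-1", "1 New Street"),
]
customer = Customer.from_history(history)
```

A snapshot, if you ever need one, would simply be a saved copy of the private state at a given version, so that only the events after that version have to be replayed.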

4. The command is issued

Once the AR is up to date the command is issued. The AR then ensures it can run the command and works out what would need to change but DOESN’T at this point change anything. Instead, it builds up the event or events that need to be applied to actually change its state. It then applies them to itself. This part is crucial and is what allows ‘events’ to be re-run in the future. The command phase can be thought of as the behaviour and the apply phase is the state transition.
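The split between behaviour and state transition could look something like this in Python. This is a sketch with hypothetical names (Customer, move_house, CustomerMovedHouse), and for brevity the aggregate is constructed directly with its current state rather than being rebuilt from history.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CustomerMovedHouse:
    customer_id: str
    new_address: str


class Customer:
    def __init__(self, customer_id: str, address: str) -> None:
        # Simplification: a real aggregate would be rebuilt from its events.
        self._id = customer_id
        self._address = address
        self._uncommitted: List[object] = []

    # Behaviour: decide whether the command is allowed and which events result.
    def move_house(self, new_address: str) -> None:
        if not new_address.strip():
            raise ValueError("new address must not be empty")
        if new_address == self._address:
            raise ValueError("customer already lives at this address")
        self._raise(CustomerMovedHouse(self._id, new_address))

    # State transition: the only place where state is mutated.
    def _apply(self, event: object) -> None:
        if isinstance(event, CustomerMovedHouse):
            self._address = event.new_address

    def _raise(self, event: object) -> None:
        self._apply(event)                # change our own state
        self._uncommitted.append(event)   # remember it for persistence

    def uncommitted_events(self) -> List[object]:
        return list(self._uncommitted)


customer = Customer("customer-1", "1 Old Road")
customer.move_house("1 New Street")
```

Because _apply contains no decisions, the same method can safely be used to re-run old events without re-validating them.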

5. The command handler requests the changes

At this point, assuming no exceptions have been raised, the command handler requests the uncommitted changes. Note that no persistence has actually taken place yet. The domain classes have no dependencies on external services. This makes them much easier to write and ensures they are not polluted by persistence requirements (I’m looking at you Entity Framework).
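As a rough Python sketch of this step, the handler below loads the history, issues the command and then asks the aggregate for its uncommitted events. All names are hypothetical. The domain class imports nothing from storage code; the handler receives load_events and save_events as plain callables, and the dictionary used here is only a stand-in for a real event store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ChangeCustomerAddress:
    customer_id: str
    new_address: str


@dataclass(frozen=True)
class CustomerMovedHouse:
    customer_id: str
    new_address: str


class Customer:
    """Pure domain class: knows nothing about storage or messaging."""

    def __init__(self, customer_id: str) -> None:
        self.id = customer_id
        self._address: Optional[str] = None
        self._uncommitted: List[object] = []

    @classmethod
    def from_history(cls, customer_id: str, events: List[object]) -> "Customer":
        customer = cls(customer_id)
        for event in events:
            customer._apply(event)
        return customer

    def move_house(self, new_address: str) -> None:
        if new_address == self._address:
            raise ValueError("customer already lives at this address")
        event = CustomerMovedHouse(self.id, new_address)
        self._apply(event)
        self._uncommitted.append(event)

    def _apply(self, event: object) -> None:
        if isinstance(event, CustomerMovedHouse):
            self._address = event.new_address

    def uncommitted_events(self) -> List[object]:
        return list(self._uncommitted)


class ChangeCustomerAddressHandler:
    def __init__(
        self,
        load_events: Callable[[str], List[object]],
        save_events: Callable[[str, List[object], int], None],
    ) -> None:
        self._load_events = load_events
        self._save_events = save_events

    def handle(self, command: ChangeCustomerAddress) -> None:
        history = self._load_events(command.customer_id)
        customer = Customer.from_history(command.customer_id, history)
        customer.move_house(command.new_address)  # may raise; nothing is saved then
        changes = customer.uncommitted_events()   # still nothing persisted
        self._save_events(command.customer_id, changes, len(history))


store: Dict[str, List[object]] = {"customer-1": []}  # stand-in for an event store


def load_events(customer_id: str) -> List[object]:
    return list(store[customer_id])


def save_events(customer_id: str, events: List[object], expected_version: int) -> None:
    store[customer_id].extend(events)


handler = ChangeCustomerAddressHandler(load_events, save_events)
handler.handle(ChangeCustomerAddress("customer-1", "1 New Street"))
```

If the aggregate raises an exception, the handler never reaches the save call, so a rejected command leaves the stored events untouched.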

6. The command handler requests persistence of the uncommitted events

This is where an event storage service comes into play. Its responsibility is to persist the events and also to ensure that no concurrency conflicts occur. You can read up on how to do this in a previous post of mine: How to handle concurrency issues in a CQRS Event Sourced system.
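One common way to detect conflicts is optimistic concurrency: the caller states which version of the stream it loaded, and the store refuses the append if the stream has moved on. The Python sketch below assumes that technique and uses an in-memory dictionary. InMemoryEventStore, ConcurrencyError and the event dictionaries are hypothetical, and a real store would need the check and the write to be atomic.

```python
from typing import Dict, List


class ConcurrencyError(Exception):
    """Raised when another writer has appended to the stream first."""


class InMemoryEventStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def load(self, aggregate_id: str) -> List[dict]:
        return list(self._streams.get(aggregate_id, []))

    def append(self, aggregate_id: str, events: List[dict], expected_version: int) -> int:
        stream = self._streams.setdefault(aggregate_id, [])
        # The version is simply the number of events already stored.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {len(stream)}"
            )
        stream.extend(events)
        return len(stream)


store = InMemoryEventStore()
store.append("customer-1", [{"type": "CustomerRegistered"}], expected_version=0)

# Two users load the aggregate at the same version and both try to save.
version_seen_by_both = len(store.load("customer-1"))
store.append("customer-1", [{"type": "CustomerMovedHouse"}], version_seen_by_both)

try:
    store.append("customer-1", [{"type": "CustomerNameCorrected"}], version_seen_by_both)
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True
```

What to do after a conflict (retry, merge or report back to the user) is a separate decision that this sketch does not cover.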

7. The events are published onto the Bus or Message Router

Unlike commands, which only trigger one command handler, events can be routed to multiple de-normalisers. This enables you to build up very flexible, optimised read models.
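The fan-out can be sketched as a small publish/subscribe class in Python. The EventBus class and the two subscriber functions are hypothetical examples; in a real system the bus would more likely be a message broker than an in-process dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List, Type


@dataclass(frozen=True)
class CustomerMovedHouse:  # named in the past tense: it HAS happened
    customer_id: str
    new_address: str


class EventBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[Type, List[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event_type: Type, subscriber: Callable[[object], None]) -> None:
        # Unlike a command, an event may have any number of subscribers.
        self._subscribers[event_type].append(subscriber)

    def publish(self, event: object) -> None:
        for subscriber in self._subscribers.get(type(event), []):
            subscriber(event)


customer_details: Dict[str, dict] = {}  # read model for a customer details screen
recent_moves: List[str] = []            # read model for a "recent movers" report


def update_customer_details(event: CustomerMovedHouse) -> None:
    customer_details[event.customer_id] = {"address": event.new_address}


def update_recent_moves(event: CustomerMovedHouse) -> None:
    recent_moves.append(event.customer_id)


bus = EventBus()
bus.subscribe(CustomerMovedHouse, update_customer_details)
bus.subscribe(CustomerMovedHouse, update_recent_moves)
bus.publish(CustomerMovedHouse("customer-1", "1 New Street"))
```

A single event has updated two independent read models, and neither subscriber knows the other exists.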

It’s important to note that events represent something that HAS happened. They should be named accordingly. Using CQRS gives the developer the opportunity to name things like commands and events in such a way as to be understandable to a domain expert. If you want to know more about how to name events well then check out this post: 6 Code Smells with your CQRS Events – and How to Avoid Them

8. De-normalisers build up the Read Model

The concept of a de-normaliser can at first be a little tricky. The problem is that we are all trained to think in ‘entities’, ‘models’ or ‘tables’. Generally, these are derived from normalised data and glued together into the form required for the front end. This process often involves complex joins, views and other database query techniques. A de-normaliser, on the other hand, translates certain events into exactly the form required for the various screens in your system. No joins required at all, ever! This makes reads very fast and is the basis behind the claim that this style of architecture is almost linearly scalable.

Most people begin to get twitchy at this point when they realise that duplicate data may exist in the read model. The important thing to remember is that the ‘event stream’ is the only source of truth and there is no (or should be no) accidental duplication within it. This allows you to re-create the entire read model or just parts of it, at will.
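Here is a small Python sketch of a de-normaliser for a hypothetical order list screen. The events, the OrderListDenormaliser class and the row layout are all invented for illustration. The customer name is deliberately copied into every order row so that the screen needs no join, and rebuild shows how the read model can be recreated from the event stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: str
    name: str


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total: float


class OrderListDenormaliser:
    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}            # order_id -> row ready for the screen
        self._customer_names: Dict[str, str] = {}  # lookup data kept for later events

    def handle(self, event: object) -> None:
        if isinstance(event, CustomerRegistered):
            self._customer_names[event.customer_id] = event.name
        elif isinstance(event, OrderPlaced):
            # Duplicate the customer name into the row: no join at read time.
            self.rows[event.order_id] = {
                "order_id": event.order_id,
                "customer_name": self._customer_names[event.customer_id],
                "total": event.total,
            }

    def rebuild(self, event_stream: Iterable[object]) -> None:
        # The event stream is the source of truth, so the read model is disposable.
        self.rows.clear()
        self._customer_names.clear()
        for event in event_stream:
            self.handle(event)


event_stream = [
    CustomerRegistered("customer-1", "Ada"),
    OrderPlaced("order-1", "customer-1", 25.0),
    OrderPlaced("order-2", "customer-1", 40.0),
]

denormaliser = OrderListDenormaliser()
for event in event_stream:
    denormaliser.handle(event)

rows_before = dict(denormaliser.rows)
denormaliser.rebuild(event_stream)
```

The name "Ada" now appears in two rows, which is exactly the kind of duplication that is acceptable on the read side.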

9. Data Transfer Objects are persisted to the Read Model

The final phase of the de-normaliser is to persist the simple DTOs (data transfer objects) to the database. These objects are essentially property buckets and usually contain the ID of the aggregate they are associated with and a version number to aid in concurrency checking. These DTOs provide the information the user requires in order to form new commands and start the cycle over again.
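As an illustration, a DTO of this kind and a tiny read store might look like the Python sketch below. The names CustomerAddressDto and CustomerAddressReadStore are hypothetical. Using the version number to ignore stale updates is one possible use of it and is an assumption on my part; another is to send the version back with the next command so the write side can check for conflicts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CustomerAddressDto:
    """A property bucket: just data, no behaviour."""
    customer_id: str  # ID of the aggregate this row belongs to
    address: str
    version: int      # version of the aggregate the row was built from


class CustomerAddressReadStore:
    def __init__(self) -> None:
        self._rows: Dict[str, CustomerAddressDto] = {}

    def save(self, dto: CustomerAddressDto) -> bool:
        current = self._rows.get(dto.customer_id)
        # Assumption: an update built from an older version is ignored.
        if current is not None and current.version >= dto.version:
            return False
        self._rows[dto.customer_id] = dto
        return True

    def get(self, customer_id: str) -> Optional[CustomerAddressDto]:
        return self._rows.get(customer_id)


read_store = CustomerAddressReadStore()
read_store.save(CustomerAddressDto("customer-1", "1 Old Road", version=1))
read_store.save(CustomerAddressDto("customer-1", "1 New Street", version=2))
late_update_applied = read_store.save(CustomerAddressDto("customer-1", "1 Old Road", version=1))
```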

All this results in a Highly Optimised Read Side Model

The read/query side is entirely independent of the commands and events, hence CQRS (Command Query Responsibility Segregation). The query side of the application is designed to issue queries against the read model for DTOs. This process is made trivial by the de-normalisation of the read data. The models serve different, clearly defined purposes. This clear command query responsibility segregation creates compartments within your code. Each compartment is easier and simpler to work with for having been separated out. You just need to think in terms of the command query loop rather than DTOs and object-relational mapping.

A. User requests data

All the data is optimised for reading. This makes querying very simple. If you require values for a ‘type-ahead drop-down list’, just get the data from an optimised list designed especially for the task. No extra data need be supplied apart from that required to drive the dropdown. This helps keep the weight of the data payload light, which in turn helps the application remain responsive to the user.
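A query for such a type-ahead list can be very small. The Python sketch below assumes a hypothetical read model that holds only an ID and a display name per customer; CustomerLookupItem and CustomerLookupQuery are illustrative names, and the in-memory list stands in for whatever table or document your read database uses.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CustomerLookupItem:
    """Only what the dropdown needs: an ID and the text to display."""
    customer_id: str
    display_name: str


class CustomerLookupQuery:
    def __init__(self, items: Iterable[CustomerLookupItem]) -> None:
        # Kept pre-sorted so every query returns results in display order.
        self._items = sorted(items, key=lambda item: item.display_name.lower())

    def search(self, prefix: str, limit: int = 10) -> List[CustomerLookupItem]:
        wanted = prefix.lower()
        matches = [
            item for item in self._items
            if item.display_name.lower().startswith(wanted)
        ]
        return matches[:limit]


lookup = CustomerLookupQuery([
    CustomerLookupItem("customer-3", "Alan Turing"),
    CustomerLookupItem("customer-1", "Ada Lovelace"),
    CustomerLookupItem("customer-2", "Grace Hopper"),
])
results = lookup.search("a")
```

The query touches no aggregates and no events. It simply reads a list that a de-normaliser has already prepared.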

B. Simple Data Transfer Objects

The read model just returns simple and slim DTOs that are, as I said before, easy to work with on the front end. There is, however, a question over whether there should be a one-to-one read model for every screen. In my experience, this isn’t necessary and can create a lot of duplicate code. I found it easier to create some related models which contain the data commonly needed. I found a great deal of reuse, which saved time. Unfortunately, there isn’t a clear rule of thumb. This is something you will get better at with experience.

In Conclusion

CQRS’s biggest hurdle is its perceived complexity. Don’t be fooled by all the steps above. Unlike a ‘simple CRUD’ approach, which starts off simple but quickly gains in complexity over time, this approach remains relatively resistant to increased complexity as the scope of the application grows.

About the Author Daniel

I'm a professional software engineer of near on 15 years. Lucky enough to work for a small but rapidly growing company in London called Redington. They have given me the technical freedom to learn some cutting edge technologies like CQRS and Event Sourcing. Now I'm sharing what I learn here.

  • Juning Wang says:

    Great read in combination with the previous post on concurrency issue handling in CQRS.

  • Aman Verma says:

    Great article !!!
    I am working on CQRS+ES based project based on Microservices Architecture. This post helped a lot understanding CQRS. Would have been great with a coding example.

  • Denis says:

    Thank you very much for those articles. They really help to understand how things work. Lacking some code samples sometimes. For basic example of transformation to DTO. I couldn’t understand the idea behind persistence part. You persist only the diff of previous and new state of AR into events store? Or you save the actions which should be applied to previous state of AR and the actual received payload from the UI?

    • Daniel says:

      There are two different persistence concepts at play here. The source of truth is derived from the event store. This is a store of things that have happened to the domain. The other persistence mechanism is the read model, which is essentially a read-optimised snapshot of the current state.

  • Denis says:

    Another question is about domain logic. It should be implemented in command handlers or in AR?

    Last question is about read model regeneration. If read database is pretty big and downtime is not a solution, how would you regenerate read model in this case and avoid out-of-sync issue?

    By the way, would be nice to see explanation on failed situations, how to behave if such situations take place at any point of this process.

    How to detect that read models are out of sync?
    As I understand this is not the issue of the pattern, but infrastructure problem instead. Fault tolerance is very important in this pattern as I can see.

    Thank you.

    • Daniel says:

      Domain logic should be done within the AR. There are sometimes questions over what constitutes domain logic. For example, when validating input. I’ve written an article about this which you can find here: How To Validate Commands in a CQRS Application

      Regarding read model regeneration. The only time you may want to regenerate a read model is during some major upgrade or planned downtime. A read model is just like a normal database just much simpler. No need for joins or unions etc.

      Regarding detection of out-of-sync read models. Every read model update should be successful. If not, then the model is out of sync. You can also detect it via discrepancies between the version numbers of the domain models and the equivalent read models. This is trickier, though, as a read model may be out of sync simply because multiple users are updating the same data.

  • Silviu says:

    In some picture explaining eventstorming, beside the events which are trigger in the AR, because an user trigger a command for the UI, there’s also events that appears because time policies or external system. My question is where should those events be trigged? Still AR or something else?

    • Daniel says:

      Events triggered by time policies are an interesting use case. If they are from within your application, could or should they be commands and therefore handled as such? Or are they triggered from within an AR, in which case they would be handled as usual.

      External events are a different matter. I would be tempted to borrow from the idea of an anti-corruption layer. Usually used to translate from one bounded context into another but in this case to translate the external event into something which makes sense for your system. This would give you a degree of protection if the signature of the event changed at some point in the future.

  • Kris-I says:

    Hello. Nice explanation, the bonus should be a sample project.

  • Igor says:

    Very bad explained. I have very good knowledge of both, but when I’ve when through as reference, I don’t understand nothing.

  • Steven says:

    Hi, I’m interested on getting your thoughts on the number of read models per screen. I often hear “one model / view per screen”, and in simple cases (and most examples) that is the case.

    But there can be more complicated screens. Say a screen that displays an order, and also displays a list of delivery companies that can service the order. Would you:

    – Create two read models, say OrderDetailsView and DeliveryCompaniesView and call two queries when populating the UI: GetOrderDetailsQuery and GetAllDeliveryCompaniesQuery.

    – Or, create a single query and read model based on the screens purpose ie. GetAssignDeliveryCompanyToOrderQuery and AssignDeliveryToOrderView, that contains both the order details and delivery companies. I lean towards the first but would be interested in you thoughts.

  • Shreyas says:

    Hey Danial,
    Great post BTW.
    I just wanted to ask do you have any repo of demonstrating this in a more practical way? Because at last we need to write code, so I just wanted to know that is there any repo that I can refer to while reading the blog post and then I think it will be more understandable.

  • Tarun Arora says:

    How do you handle the effect of replay of events on the Read Model in case of rebuilding of Aggregate Root?

    • Daniel says:

      Good question. Only new events are published. When rebuilding an aggregate, its events are loaded from history and are not published again. When a command is then issued, new events are created as a reaction to that command. Only those events are then published.
