The nice thing about programming is that everything is black and white.
When designing new features, we all go through a process of discovering requirements. If you are a good developer, you will start to ask questions and probe deeper to find the edge cases. Why? Because you know the edge cases can often take up 80% of the development time. But is that where 80% of the value is?
Just the other day I came across a question on the DDD/CQRS Google group: https://groups.google.com/forum/#!topic/dddcqrs/0VGltTIa1lM. The author was building a voting system. Users could vote a comment up or down, could change their vote, but could only vote once per comment. At 5000 votes the voting would automatically close. His question centred on the problem of set-based, or inter-aggregate, validation. Specifically, at 4999 votes, how should he close out the system if two people voted at the same time?
If you are new to DDD you may have been surprised by some of the answers. Rather than advising the questioner to use some advanced locking/transaction strategy, the group focused in on what bad things would happen if voting closed at 5001. In other words, they were more interested in business value than in strict adherence to a requirement. All too often we see requirements as hard and fast rules that the system must conform to at all costs. In fact, if you step back and look at how business people operate, they often have to work around problems. While they specify 5000 as the max, it may be immaterial to them if it reaches 5001 or even 5010, so long as it does close.
I read another example of this way of thinking in a blog (I would link to it but I can no longer find it). An online retailer hired the author as a consultant. He led a team building a real-time product suggestion engine. They had seen Amazon's "Users who bought X also bought Y" and wanted it for their store. Realising the potential cost and complexity of running this kind of thing 'in real time', he dug a little deeper. He discovered that 'real time' could be data that was 24 hours old. The business valued relevant product suggestions rather than suggestions based on real-time data. Running a batch job overnight was going to be quicker to build and deploy, and would probably save a £ or five in the process. I think he coined the phrase "feature injection" to describe this idea: find the value behind the feature/story and offer options that deliver the value but are not necessarily the feature requested.
One of the great advantages of event sourcing and eventual consistency is that you can decide how strictly you need to adhere to the requirements. In other words, you focus on delivering value, rather than features in isolation of value. In the voting example above, maybe the 5000 limit is a soft limit, in which case you close off the voting system when the vote count reaches or exceeds 5000. Maybe the option of a high-performance system with a small chance of exceeding the limit is more valuable than a slower approach that guarantees never to exceed 5000. If using event sourcing, the business (or a process manager) could then issue compensating commands/events to repair the problem (just like they do in the real world every day). As Greg Young put it, "Consistency is over-rated."
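To make the soft-limit idea concrete, here is a minimal sketch of an event-sourced tally that closes at 5000 *or more*. All the names (VoteTally, VoteCast, VotingClosed) are illustrative, not from any particular framework:

```python
VOTE_LIMIT = 5000  # a soft limit: we close at >= 5000, not exactly 5000

class VoteTally:
    def __init__(self):
        self.count = 0
        self.closed = False
        self.uncommitted_events = []

    def cast_vote(self, user_id):
        if self.closed:
            raise RuntimeError("voting is closed")
        self._record({"type": "VoteCast", "user": user_id})
        # Close once the limit is reached or exceeded, rather than
        # trying to guarantee the count never passes 5000.
        if self.count >= VOTE_LIMIT:
            self._record({"type": "VotingClosed", "final_count": self.count})

    def _record(self, event):
        self._apply(event)
        self.uncommitted_events.append(event)

    def _apply(self, event):
        if event["type"] == "VoteCast":
            self.count += 1
        elif event["type"] == "VotingClosed":
            self.closed = True
```

If two processes each load the tally at 4999 and both append a vote, the count may land on 5001 before the close event is recorded; the point of the soft limit is that this is acceptable and can be repaired afterwards.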
But what if we do have to deal with consistency? Given this kind of scenario, what options are there to resolve this kind of set-based issue?
Locking, transactions and database constraints are old, tried and tested tools for maintaining data integrity, but they come at a cost. Often the code/system is difficult to scale and can be complex to write and maintain. On the other hand, they have the advantage of being well understood, with plenty of examples to learn from. By implication, this approach is generally done using CRUD-based operations. If you want to maintain the use of event sourcing, you can try a hybrid approach.
If you are using event sourcing and reliance on an eventually consistent read model is not an option, you can adopt a locking-field approach. In our voting example, before you issue the command you check a locking table/field (usually in a database) for the voting count. If it is under the max, increment it and carry that value forward with the command. If, when the operation is complete, the count still matches or is still valid, then the operation can complete. When checking things like email address uniqueness, you could use a lookup table: reserve the address before issuing the command. For these sorts of operations it is best to use a data store that isn't eventually consistent and can guarantee the constraint (uniqueness in this case). Additional complexity is a clear downside of this approach, but a less obvious one is the problem of knowing when the operation is complete. Read-side updates are often carried out on a different thread, process or even machine to the command, and there could be many different operations happening.
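The reservation idea can be sketched as follows. This is a hypothetical illustration: the in-memory set stands in for a strongly consistent lookup table with a unique constraint, and the class and method names are my own, not from any library:

```python
class EmailReservations:
    """Stand-in for a strongly consistent store with a unique constraint."""

    def __init__(self):
        self._claimed = set()

    def reserve(self, email):
        key = email.strip().lower()
        if key in self._claimed:
            return False          # someone else already holds this address
        self._claimed.add(key)    # a real store would rely on a unique index
        return True

    def release(self, email):
        # Compensate if the registration command later fails,
        # so the address is not locked forever.
        self._claimed.discard(email.strip().lower())
```

The flow is: reserve the address, then issue the registration command; if the command fails, release the reservation. The uniqueness check happens against a store that can actually guarantee it, rather than against an eventually consistent read model.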
Embracing inconsistency may sound like an oxymoron to some; however, it is a rather neat idea. Inconsistent things happen in systems all the time, and event sourcing allows you to handle these inconsistencies. Rather than throwing an exception and losing someone's work, all in the name of data consistency, record the event and fix it later.
As an aside, how do you know a consistent database is consistent? It keeps no record of the failed operations users have tried to carry out. If I try to update a row in a table that has been updated since I read from it, the chances are I'm going to lose that data. This gives the DBA an illusion of data consistency, but try explaining that to the exasperated user!
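The lost-update problem above can be shown with a tiny optimistic-concurrency sketch (the Row class and version field here are illustrative, not a real database API). Without the version check, the last writer silently wins and the first writer's change vanishes; with it, the conflict is at least surfaced so it can be recorded and repaired:

```python
class ConcurrencyConflict(Exception):
    pass

class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def update(self, new_value, read_version):
        # Without this check the later writer would silently overwrite
        # the earlier one -- the "illusion of consistency".
        if read_version != self.version:
            raise ConcurrencyConflict("row changed since you read it")
        self.value = new_value
        self.version += 1
```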
Accepting that these things happen, and allowing the business to recover, can bring real competitive advantage. First, you can make the deliberate assumption that these issues won't occur, allowing you to deliver the system more quickly and cheaply. Only if they do occur, and only if it is of business value, do you add features to compensate for the problem.
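Returning to the voting example, compensation after the fact might look something like this. A process manager replays the event stream and emits compensating commands for any votes accepted past the limit; the event and command shapes here are assumptions for illustration:

```python
def compensating_commands(events, limit=5000):
    """Scan the event stream; revoke any votes accepted beyond the limit."""
    count = 0
    commands = []
    for event in events:
        if event["type"] == "VoteCast":
            count += 1
            if count > limit:
                # Repair rather than prevent: issue a compensating command
                # for each vote that slipped past the soft limit.
                commands.append({"type": "RevokeVote", "user": event["user"]})
    return commands
```

This is only worth building if overshooting the limit actually matters to the business; otherwise the deliberate assumption that it won't happen (or won't matter) is the cheaper option.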
Let's take a simplistic example to illustrate how a change in perspective may be all you need to resolve the issue. Essentially, we have a problem checking for uniqueness or cardinality across aggregate roots because consistency is only enforced within the aggregate. An example could be a goalkeeper in a football team. A goalkeeper is a player, and you can only have one goalkeeper per team on the pitch at any one time. A data-driven approach may have an 'IsGoalKeeper' flag on the player. If the goalkeeper is sent off and an outfield player goes in goal, then you would need to remove the IsGoalKeeper flag from the goalkeeper and add it to one of the outfield players. You would need constraints in place to ensure that assistant managers didn't accidentally assign a different player, resulting in two goalkeepers. In this scenario, we could instead model the goalkeeper on the Team, OutFieldPlayers or Game aggregate. This way, maintaining the cardinality becomes trivial.
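A minimal sketch of the Team option, with hypothetical names: because the keeper is a single property on one aggregate, the "one goalkeeper at a time" rule is enforced by the shape of the model rather than by cross-row constraints:

```python
class Team:
    def __init__(self, player_ids):
        self.players = set(player_ids)
        self.goalkeeper = None   # one field can only ever hold one keeper

    def assign_goalkeeper(self, player_id):
        if player_id not in self.players:
            raise ValueError("player is not on this team")
        # Reassignment automatically displaces the previous keeper;
        # two simultaneous goalkeepers are unrepresentable.
        self.goalkeeper = player_id
```

Contrast this with the IsGoalKeeper flag: there, two rows could both carry the flag unless an external constraint prevents it; here, the invariant lives inside a single consistency boundary.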
It is only when you dig deeper into a domain and question the requirements and specs that you start to find the flex points. When you accept that sometimes business rules are not rules but guidelines, and when you strive to understand the value behind a rule, only then can you actually focus your time on delivering value over completeness. I'm guessing most businesses would rather you delivered a solution in 3 months that met 80% of their needs than one in 12 months that met 98%.
I'm a professional software engineer of nearly 15 years, lucky enough to work for a small but rapidly growing company in London called Redington. They have given me the technical freedom to learn some cutting-edge technologies like CQRS and Event Sourcing. Now I'm sharing what I learn here.