The scenario described here, including the pain points and the base architecture, is purely imaginary. It serves only to illustrate the solutions presented, which are a mixture of past experience and currently applicable architectures and patterns, so that we can reach a conclusion on an Event-driven Microservices Architecture with the Saga pattern.
It is always a good idea to understand the problem before coming up with a solution. We should not adopt a popular architectural approach and dive straight into implementation when we have no specific problem for it to solve.
Imagine you are maintaining a monolithic system that works and delivers exactly as expected. Do you need to change it? The answer is certainly NO, unless there are pain points that we want to solve. Now imagine you have the following pain points:
“The number of users is growing rapidly, which leads to frequent slowness and downtime of the whole system. It is too expensive to create multiple instances of this big application to mitigate the issue. It is also difficult to do operational work in the system during working hours, because it affects the user experience of our customers.”
What are the problems here?
Problem #1: Growing number of users leads to increased complexity
We might have the following current state of the system.
At first glance, the bottleneck is the database. Suppose further analysis shows that the database is handling a very large number of transactions; the solution is then to apply a caching strategy in front of the database.
There are different ways to cache the data. For example, if the system uses Hibernate, it already uses the first-level cache by default, but that cache is session-specific. What we can do for our problem is enable the second-level cache and the query cache. Several libraries provide second-level cache support for Hibernate. Let’s take the Ehcache library as an example. This is how it works.
Each instance of the application has its own second-level cache area, and these caches communicate with each other to propagate data updates. Below is a simple description of what they communicate:
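The idea of per-instance caches that notify each other can be sketched in a few lines. This is only an illustration in Python, not the Hibernate or Ehcache API: the `Database`, `CachedInstance` and `connect` names are hypothetical, and the sketch assumes the simplest propagation strategy, where an update on one instance invalidates the entry on its peers.

```python
class Database:
    """Stand-in for the shared database; counts reads to show the cache effect."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class CachedInstance:
    """One application instance with its own cache area (hypothetical)."""

    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.cache = {}
        self.peers = []

    def get(self, key):
        # Only go to the database on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        return self.cache[key]

    def put(self, key, value):
        # Write through to the database, then tell the peers their copy is stale.
        self.db.write(key, value)
        self.cache[key] = value
        for peer in self.peers:
            peer.invalidate(key)

    def invalidate(self, key):
        self.cache.pop(key, None)


def connect(*instances):
    """Make every instance aware of all the others."""
    for instance in instances:
        instance.peers = [p for p in instances if p is not instance]


db = Database()
a, b = CachedInstance("a", db), CachedInstance("b", db)
connect(a, b)
a.put("user:1", "Ann")
b.get("user:1")          # first read on b hits the database
b.get("user:1")          # second read is served from b's cache
a.put("user:1", "Bob")   # b's cached copy is invalidated
```

Invalidation keeps the messages small; a real replicated cache may instead copy the new value to the peers, which trades network traffic for fewer database reads.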
Great! We managed to minimize the database transactions, but it doesn’t end there. More users mean more requirements, which in turn mean more modifications to the system to cater to the clients’ needs. That adds complexity to the system.
Problem #2: Scaling up the application is expensive
Creating a new instance of the whole application is not ideal, because not all of its modules need to be scaled out. At this point, we need to look deeper into the application. Let’s say the application consists of the following modules or features:
We would probably also like to separate the frontend (UI) from the backend (services). Suppose, for example, that most of the processes that make the system slow or cause downtime are in the homepage module. Below is how the system might look once we have modularized it:
Okay, now you can scale only a specific module or feature, but the architecture looks more complex and we are actually building a distributed monolith. A distributed monolith combines the disadvantages of both a Microservices and a Monolithic architecture, such as:
Looking at the current state, you might want to go back to a single monolithic application.
Problem #3: Isolating failures
Customers and system operations share the same system. As a result, the operations team needs to schedule or minimize data processing in the system, because it might slow the system down or, at worst, make it unresponsive. Let’s say there is a need to register a large number of users (a bulk registration). Based on the last state of the system, the point of failure is the Backoffice service. The Homepage and Basket modules will become unresponsive because they need to call Backoffice for further processing while the Backoffice service is busy processing the bulk request.
There are several root causes of failure here:
To solve the unresponsive database, you might go for a dedicated database for each module. But this still causes problems when updating or retrieving information from other services. This is where we need the Event Driven Architecture.
Benefit of Event Driven Architecture
It allows a system to have loosely coupled services that use messages/events to communicate with each other. There are three key components in this architecture: event producers, event routers (or brokers), and event consumers.
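A minimal sketch of the idea, assuming the usual split into an event producer, an event router (broker) and event consumers. The `EventBroker` class and the `CustomerRegistered` event name are hypothetical, and the broker here delivers synchronously in memory, whereas a real message broker would deliver asynchronously over the network.

```python
from collections import defaultdict


class EventBroker:
    """In-memory stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # A consumer registers interest in one type of event.
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer does not know who consumes the event.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


broker = EventBroker()
received = []

# Consumers: Homepage and Basket react to the same event independently.
broker.subscribe("CustomerRegistered", lambda e: received.append(("homepage", e["id"])))
broker.subscribe("CustomerRegistered", lambda e: received.append(("basket", e["id"])))

# Producer: Backoffice publishes the event without calling the other services.
broker.publish("CustomerRegistered", {"id": "c1"})
```

The loose coupling shows in the last line: Backoffice only knows the broker and the event name, so adding another consumer requires no change to the producer.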
Let’s improve the system by applying this architectural approach:
We have now designed an Event-driven Microservice Architecture. But wait, it looks like we will face data consistency issues among the different services. This is where we need the Saga pattern.
What is Saga Pattern?
The Saga pattern is an approach to managing data consistency across loosely coupled services. Each service has its own local transaction, and a Saga is a sequence of such local transactions. It is usually beneficial for long-running transactions.
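As a rough sketch of "a sequence of local transactions", the function below runs each step in order and retries a step when it fails with a transient error. The `run_saga` function, the `TransientError` class and the step names are hypothetical, and the sketch assumes every step is retriable; it deliberately has no compensation logic.

```python
class TransientError(Exception):
    """A failure that is expected to succeed when retried (hypothetical)."""


def run_saga(steps, context, max_attempts=3):
    """Run local transactions in order, retrying each one on TransientError."""
    completed = []
    for name, local_transaction in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                local_transaction(context)
            except TransientError:
                if attempt == max_attempts:
                    # Out of retries: report how far the saga got.
                    return "FAILED", completed
            else:
                completed.append(name)
                break
    return "COMPLETED", completed


attempts = {"account": 0}


def create_customer(ctx):
    ctx["customer"] = "created"


def create_account(ctx):
    # Fails on the first call to simulate a temporarily busy service.
    attempts["account"] += 1
    if attempts["account"] < 2:
        raise TransientError("service busy")
    ctx["account"] = "created"


status, done = run_saga(
    [("create_customer", create_customer), ("create_account", create_account)],
    {},
)
```

In a real system each step would be a command sent to another service rather than a local function call, and a saga that ends in `FAILED` would need the compensating transactions that this sketch leaves out.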
There are two types of Saga coordination mechanisms: Choreography-based and Orchestration-based.
What are the differences between the two? Below is a simple comparison.
More of Saga Explanation…
If you want to know more about the Saga pattern, you can check Chris Richardson’s talk "Using sagas to maintain data consistency in a microservice architecture". Also, check out his trainings here: http://www.chrisrichardson.net/training.html
Moving on, there are key points to take note of when using the Saga pattern, especially the Orchestration-based variant:
Saga Orchestrator
Saga Participants
Messaging Channels
Messages
Saga Reliability
Looking back…
It reminds me of working on a system of a financial service company where the system communicates to different banks and payment networks (MasterCard, Visa, etc.).
For example: when a customer pays at a terminal that belongs to a different bank and accepts the payment network of the customer’s bank card, the payment goes through this system, which is connected to the different banks and payment networks. The whole transaction flow must complete within 3 seconds!
What I noticed with the system was that it has the following:
Let’s go back to our problem and try a simple use case. Let’s say a customer would like to register in the system. In the registration process, the customer opens the registration page and submits the information (Backoffice service), and then expects to be redirected to the homepage (Homepage service), where the user is logged in and the session is already associated with a basket (Basket service). Below is the sequence diagram of the transaction flow (assuming 3 retries on the UI for checking the state of a saga before redirecting to the homepage).
In the example above, there are 2 types of Saga Orchestrator:
CustomerSagaOrchestrator (in orange) → responsible for making sure that the information needed for customer registration is set up across the system
SessionSagaOrchestrator (in blue) → responsible for making sure that the session key is propagated across the system, especially to the publicly available services
The CustomerSagaOrchestrator belongs to the Backoffice service. When it receives a registration request, it triggers the CustomerSaga and then (1:CustomerSagaOrchestrator) creates the Customer data, which is a local transaction, and sends a command to the Homepage service to request an Account creation. Since the command is sent asynchronously via a point-to-point queue to the Homepage service, the Backoffice service can respond to the UI right away. The UI then only has to wait for the Saga to complete by polling the Backoffice service for the customer creation status.
The Homepage service, on the other hand, receives the command for the Account creation. It (2:CustomerSagaOrchestrator) creates an Account and then asynchronously sends a reply event back to the Backoffice service to inform it that the Account was successfully created.
Back in the Backoffice service, in the background, it receives the reply event that the Account creation was successful and can now proceed to the next step, where it sends another command to the Homepage service, this time requesting a Session creation.
Asynchronously, the Homepage service receives this command to create a Session. Since the Session creation is associated with another saga, it triggers the SessionSaga and then (3:CustomerSagaOrchestrator, 1:SessionSagaOrchestrator) creates the Session data, which is a local transaction of the SessionSaga. While this other saga is running, the CustomerSaga simply has to wait for the SessionSaga to complete.
To continue the SessionSaga, the Homepage service sends a command to the Basket service requesting a Basket creation. Still asynchronously, the Basket service receives this command and (2:SessionSagaOrchestrator) creates the Basket data. It then sends a reply event back to the Homepage service to inform it that the Basket was successfully created.
Back in the Homepage service, it receives the success reply event from the Basket service and proceeds to the next step, where it (3:SessionSagaOrchestrator) updates the status of the Session to indicate that the creation has completed. As the Session creation is complete and there is an associated Saga waiting for it, the Homepage service then sends a success reply event to the Backoffice service indicating that the Session creation was successful.
On the Backoffice side, it receives the reply event and can now proceed to the last step of its saga, which is to (4:CustomerSagaOrchestrator) update the Customer creation status to completed.
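From the UI’s point of view, the whole flow is hidden behind status polling with a limited number of retries. A minimal sketch of that polling loop, assuming a hypothetical `get_status` callable that asks the Backoffice service for the customer creation status and returns a string such as `"PENDING"` or `"COMPLETED"`:

```python
import time


def wait_for_saga(get_status, retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Poll the saga status up to `retries` times; True once it is completed."""
    for attempt in range(1, retries + 1):
        if get_status() == "COMPLETED":
            return True
        # Do not wait after the final attempt.
        if attempt < retries:
            sleep(delay_seconds)
    return False


# Simulated status endpoint: the saga completes by the third poll.
statuses = iter(["PENDING", "PENDING", "COMPLETED"])
ready = wait_for_saga(lambda: next(statuses), retries=3, delay_seconds=0)
```

When `wait_for_saga` returns `False`, the UI still has to decide what to show the customer, for example a "registration in progress" page instead of the redirect to the homepage.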
In summary, each saga has its own steps and destinations, which can be presented as below.
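The steps of both sagas can also be traced in code. The sketch below simulates the three services with an in-memory queue in place of the point-to-point queues. The `MessageBus` class and the message names (`CreateAccount`, `AccountCreated`, `CreateSession`, `CreateBasket`, `BasketCreated`, `SessionCreated`) are assumptions made for illustration; the article names the steps but not the messages.

```python
from collections import deque


class MessageBus:
    """In-memory stand-in for the point-to-point queues (hypothetical)."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}
        self.log = []

    def register(self, service, handler):
        self.handlers[service] = handler

    def send(self, to, message, customer_id):
        self.queue.append((to, message, customer_id))

    def drain(self):
        # Deliver messages one by one until both sagas have finished.
        while self.queue:
            to, message, customer_id = self.queue.popleft()
            self.log.append(f"{to}:{message}")
            self.handlers[to](message, customer_id)


class Backoffice:
    def __init__(self, bus):
        self.bus, self.customers = bus, {}
        bus.register("backoffice", self.handle)

    def register_customer(self, cid):
        self.customers[cid] = "PENDING"                   # CustomerSaga step 1
        self.bus.send("homepage", "CreateAccount", cid)
        return self.customers[cid]                        # respond to the UI at once

    def handle(self, message, cid):
        if message == "AccountCreated":
            self.bus.send("homepage", "CreateSession", cid)
        elif message == "SessionCreated":
            self.customers[cid] = "COMPLETED"             # CustomerSaga step 4


class Homepage:
    def __init__(self, bus):
        self.bus, self.accounts, self.sessions = bus, set(), {}
        bus.register("homepage", self.handle)

    def handle(self, message, cid):
        if message == "CreateAccount":                    # CustomerSaga step 2
            self.accounts.add(cid)
            self.bus.send("backoffice", "AccountCreated", cid)
        elif message == "CreateSession":                  # CustomerSaga 3, SessionSaga 1
            self.sessions[cid] = "PENDING"
            self.bus.send("basket", "CreateBasket", cid)
        elif message == "BasketCreated":                  # SessionSaga step 3
            self.sessions[cid] = "COMPLETED"
            self.bus.send("backoffice", "SessionCreated", cid)


class Basket:
    def __init__(self, bus):
        self.bus, self.baskets = bus, set()
        bus.register("basket", self.handle)

    def handle(self, message, cid):
        if message == "CreateBasket":                     # SessionSaga step 2
            self.baskets.add(cid)
            self.bus.send("homepage", "BasketCreated", cid)


bus = MessageBus()
backoffice, homepage, basket = Backoffice(bus), Homepage(bus), Basket(bus)
first_response = backoffice.register_customer("c1")      # "PENDING"
bus.drain()
```

After `bus.drain()`, `bus.log` lists the six messages in the order of the walkthrough above, and the customer status in Backoffice has moved from `PENDING` to `COMPLETED`.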
Moving Further with Saga
Taking our very detailed sequence diagram of the customer registration process above, we can simplify it by focusing only on the affected data.
The simplified diagram above makes it clear that we can extract the Saga Orchestrator into a dedicated service. It exposes an API that receives an “execute” command to start the Saga, with the exception that the first step has already been done by the source. It then follows each step of the Saga, sending commands to the participants one at a time. The last step sends a reply back to the originator of the “execute” command.
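A minimal sketch of such an extracted orchestrator, under stated assumptions: the `SagaOrchestratorApi` class, its `execute` and `on_reply` methods, the `SagaCompleted` reply and the command names are all hypothetical, and `send_command` stands in for whatever messaging channel is used.

```python
class SagaOrchestratorApi:
    """Hypothetical stand-alone orchestrator driven by 'execute' and reply events."""

    def __init__(self, definitions, send_command):
        # definitions: saga name -> ordered list of (participant, command).
        # The first step is not listed because the source has already done it.
        self.definitions = definitions
        self.send_command = send_command
        self.instances = {}

    def execute(self, saga_name, saga_id, reply_to):
        self.instances[saga_id] = {
            "saga": saga_name,
            "next": 0,
            "reply_to": reply_to,
            "status": "RUNNING",
        }
        self._advance(saga_id)

    def on_reply(self, saga_id):
        # A participant reported success, so move on to the next step.
        self._advance(saga_id)

    def _advance(self, saga_id):
        instance = self.instances[saga_id]
        steps = self.definitions[instance["saga"]]
        if instance["next"] < len(steps):
            participant, command = steps[instance["next"]]
            instance["next"] += 1
            self.send_command(participant, command, saga_id)
        else:
            # Last step: reply to the originator of the "execute" command.
            instance["status"] = "COMPLETED"
            self.send_command(instance["reply_to"], "SagaCompleted", saga_id)


sent = []
api = SagaOrchestratorApi(
    {"CustomerSaga": [("homepage", "CreateAccount"), ("homepage", "CreateSession")]},
    lambda to, command, saga_id: sent.append((to, command, saga_id)),
)
api.execute("CustomerSaga", "saga-1", reply_to="backoffice")
api.on_reply("saga-1")   # account created
api.on_reply("saga-1")   # session created
```

Because the steps live in the `definitions` mapping, adding a step for a new participant only changes the orchestrator, which is the maintenance benefit of extracting it.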
Let’s update the diagram by turning each piece of data above into its own dedicated API and applying the necessary commands and events to the communication between them.
In the diagram above, we have clearly identified the communication between the different APIs, grouped by domains and saga orchestrators, that ensures data consistency across the services. We have also identified which API we need to expose for the Saga Orchestrator API.
The benefit of extracting the Saga Orchestrator into a separate API is that the flow is easier to maintain and understand. For example, if we need to add more steps to our Saga Orchestrator, and the new saga participant already has an API and enough information is already available inside the Saga Orchestrator API, then we only have to make the change inside the Saga Orchestrator, without affecting the source or the other participants.
Our Final Architecture
Considering the Saga Orchestrator APIs that were identified above, we can now update the high-level architecture view of our system. Below is the result.
With this kind of architecture, where the APIs are modularized and loosely coupled, we can move forward by improving the organization and processes of the development team. If you follow an Agile development process using Scrum, the Scrum Teams themselves become loosely coupled, each responsible for a certain module or group of modules.
Going from a Monolithic Architecture to an Event-driven Microservice Architecture, we managed to design loosely coupled services while maintaining data consistency among them. It was not an easy journey, but at least we were able to conclude that the best solution for this problem is switching to a Microservices Architecture.
I hope you find this useful. Please note that the Saga Orchestration sample presented here focuses mainly on Retriable Transactions. There is more to consider in the Orchestration-based Saga pattern, such as dealing with Compensating and Pivot Transactions, which we can tackle next time.
Who writes here?
I'm working as a Software Developer with Mercateo. The team I belong to maintains the Legacy System on the procurement system side and at the same time works on the new project to improve the user interface in that system. I am passionate about EDA topics. Mercateo supports trainings to expand my knowledge in this area and this is what I like most about working here.
Feona May Samson