Three Data Models for Microservices
Monoliths are easy to develop but lack boundary enforcement, which makes it nearly impossible to keep code modular. As the number of components grows, module interdependencies snowball, the speed of development decreases, and the cost of building software skyrockets.
Organizations turn to Microservices to reduce cross-cutting concerns, gain modularity, and accelerate time to market. In theory, Microservices bring modularity to business logic and data. However, our research shows that organizations gain business logic separation but only partial data separation.
In practice, organizations use three data models in their transition to Microservices: the Shared Data Model, the Distributed Data Model, and the Event Driven Data Model.
Before we evaluate the strengths and weaknesses of these three models, we need to take a look at domains. A domain gives each service a bounded context: a definition of its purpose in the world and a data model that describes it. Eric Evans coined the term Domain Driven Design, and his work tackles this topic at length.
The core takeaway is that Microservices should be built around domain-driven bounded contexts.
A Shared Data Model describes Microservices that implement domain centric business logic, communicate over the network, and share the same database. This model is often used as a first step towards Microservices or as an intermediate step during a Monolith decomposition.
The Shared Data Model is easy to adopt as it has the same data access characteristics as its older brother, the Monolith. The main advantages of this model are:
- Encapsulation for domain specific business logic
- Service functionality definition through an API
- Same data handling code for transactions, joins, etc.
While not the end goal, it is a reasonable first step in the journey to Microservices.
Data is still Monolithic, which preserves data coupling and leads to the following challenges:
- No clear boundaries for data ownership
- Data format changes require cross team coordination
- Open to abuse: different teams can overwrite the same data
The Shared Data Model achieves some business logic separation but favors data access simplicity over data segregation. This model is often an intermediate step towards fully segregated Microservices. It may also be suitable for small Apps.
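The data coupling above can be made concrete with a minimal sketch. The two "services", the `orders` table, and the SQLite in-memory database are hypothetical stand-ins; the point is that with a shared database, nothing stops one team from overwriting data another team believes it owns.

```python
import sqlite3

# One shared database, two "services" with direct table access.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
db.execute("INSERT INTO orders VALUES (1, 'PENDING', 99.0)")

def order_service_ship(order_id):
    # The order team believes it owns the status field...
    db.execute("UPDATE orders SET status = 'SHIPPED' WHERE id = ?", (order_id,))

def billing_service_refund(order_id):
    # ...but nothing stops the billing team from overwriting it too.
    db.execute("UPDATE orders SET status = 'REFUNDED', total = 0 WHERE id = ?", (order_id,))

order_service_ship(1)
billing_service_refund(1)
status, total = db.execute("SELECT status, total FROM orders WHERE id = 1").fetchone()
print(status, total)  # the order service's update was silently overwritten
```

The database enforces no ownership boundary, so the only protection is cross-team coordination, which is exactly the challenge listed above.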
A Distributed Data Model describes Microservices that own both the business logic and the data for their bounded context. Data shared with other services is accessible exclusively through service APIs.
Microservices using a Distributed Data Model gain a clean bounded context, which has the following advantages:
- Same team is responsible for the business logic, data, and the API interface
- Data access is exposed through APIs
- Internal data changes are decoupled from the API
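A minimal sketch of these ownership rules, with hypothetical `InventoryService` and `OrderService` names: each service keeps a private store, and the only way other services reach that data is through the owning service's API.

```python
# Sketch only: method calls stand in for network API calls.

class InventoryService:
    def __init__(self):
        # Private store: only this service touches it directly, so its
        # internal layout can change without breaking the API below.
        self._stock = {"sku-1": 10}

    # Public API: the only way other services read or change stock.
    def get_stock(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        if self._stock.get(sku, 0) >= qty:
            self._stock[sku] -= qty
            return True
        return False

class OrderService:
    def __init__(self, inventory: InventoryService):
        self.inventory = inventory  # in production, an HTTP/gRPC client

    def place_order(self, sku: str, qty: int) -> str:
        # OrderService never queries the inventory database directly.
        return "ACCEPTED" if self.inventory.reserve(sku, qty) else "REJECTED"

orders = OrderService(InventoryService())
first = orders.place_order("sku-1", 3)    # ACCEPTED
second = orders.place_order("sku-1", 20)  # REJECTED
```

Because the store is private, the inventory team can rename columns or swap databases without coordinating with the order team, as long as the API contract holds.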
When Microservices share data over the network through service APIs they face distributed data challenges:
- Service availability impacts data availability
- Large data sets are difficult to optimize
- Distributed transactions require special handling
- Data consistency requires special handling
These challenges are described at length in the following blog “…”.
While the Distributed Data Model provides a clean bounded context, it exposes a series of distributed data challenges. The Event Driven Data Model addresses them as follows:
- Data is no longer shared through service APIs
- Data is communicated over an asynchronous Pub/Sub pattern
- Events are communicated to solve the data consistency problem
Microservices using an Event Driven Data Model use databases for internal state but communicate through events over publisher/subscriber streams. The Pub/Sub communication mechanism decouples data availability from service availability, while events express an activity at a moment in time. These two mechanisms combined are known as Event Streams.
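The mechanics can be sketched with a toy in-memory stream; a real deployment would use an event streaming platform, and the topic and event names here are hypothetical. Note that events land on an append-only log first, so they remain available even if a subscriber is down.

```python
from collections import defaultdict

class EventStream:
    """Toy Pub/Sub stream: an append-only log per topic plus subscribers."""

    def __init__(self):
        self.log = defaultdict(list)          # durable record of facts in time
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.log[topic].append(event)         # data availability lives here...
        for handler in self.subscribers[topic]:
            handler(event)                    # ...delivery is asynchronous in practice

stream = EventStream()
shipped = []
stream.subscribe("orders", lambda e: shipped.append(e["order_id"]))
stream.publish("orders", {"type": "OrderShipped", "order_id": 1})
print(shipped)  # [1]
```

The publisher never calls the consumer's API, so consumer downtime does not block the producer; a late subscriber can catch up from the log.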
Event Streaming is a powerful infrastructure layer that enables services to scale horizontally, exchange information in real time and solve many of the challenges in the Distributed Data Model.
For additional insight into the power of Event Streaming, check out our blog at "…".
Microservices that use an Event Driven Data Model have the following advantages:
- Decouples data availability from service availability
- Captures domain behavior and business intent
- Clean separation for internal and external data (see Pat Helland)
- Suitable for gradual migration of legacy systems (see Martin Fowler)
- Suitable for real-time services
- Built-in facility for auditing and playback
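The auditing and playback advantage follows directly from the append-only log: replaying the events rebuilds any derived state. A minimal sketch, with hypothetical event shapes:

```python
# Every state change is an event on the log, so the log doubles as an
# audit trail and as the source for rebuilding read models.
event_log = [
    {"type": "AccountOpened", "id": "a1", "balance": 0},
    {"type": "Deposited", "id": "a1", "amount": 50},
    {"type": "Withdrawn", "id": "a1", "amount": 20},
]

def replay(events):
    """Fold the event log into current state."""
    accounts = {}
    for e in events:
        if e["type"] == "AccountOpened":
            accounts[e["id"]] = e["balance"]
        elif e["type"] == "Deposited":
            accounts[e["id"]] += e["amount"]
        elif e["type"] == "Withdrawn":
            accounts[e["id"]] -= e["amount"]
    return accounts

state = replay(event_log)
print(state)  # {'a1': 30}
```

The same replay mechanism supports debugging (replay up to a point in time) and bootstrapping new services from history.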
An Event Driven Data Model is a powerful infrastructure that brings along many moving parts which can lead to accidental complexity if not properly addressed:
- Event format and versioning
- Event tracing across services
- Event integration with legacy systems
- Consistent behavior across all services and languages in the App
- Consistent transaction & compensation management
- Multi-service data correlation
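To make the event format and versioning challenge concrete, here is one common approach, sketched with hypothetical field names and versions: "upcast" old event versions to the latest schema before consumers see them, so consumers only ever handle the current shape.

```python
def upcast(event):
    """Migrate an event to the current schema (v2 in this sketch)."""
    if event.get("version", 1) == 1:
        # v1 carried a single "name" field; v2 splits it in two.
        first, _, last = event.pop("name").partition(" ")
        event.update(version=2, first_name=first, last_name=last)
    return event

# An old event read back from the log keeps its original shape on disk;
# it is upgraded in memory on the way to the consumer.
old = {"type": "UserRegistered", "version": 1, "name": "Ada Lovelace"}
print(upcast(old))
```

The trade-off is that the upcasting chain grows with every schema revision, which is exactly the kind of accidental complexity the list above warns about.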
Despite ample evidence that the Event Driven Data Model is a future-proof model for building scalable Microservices, adoption has been slow. There is a lack of off-the-shelf technologies to help organizations roll out and operate an Event Driven Data Model for Microservices.
This observation inspired us to build Fluvio, an off-the-shelf solution that simplifies the adoption of the Event Driven Data Model to deploy, manage, and monitor large-scale Microservices Apps.
Distributed Data Infrastructure (DDI) is an open source, standards-based, language agnostic software that glues services to distributed data.
For example, when a transaction is required, the Model Interpreter builds the state machines, the Data Flow Engine applies the SAGA, and the Event Streaming Engine sends the events. If one or more components fail, the engine calls the compensation state machine.
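The SAGA-with-compensation flow described above can be sketched independently of DDI's components. This is not DDI's implementation, just a minimal illustration with hypothetical step names and a simulated failure: run each step in order, and on failure run the compensation of every completed step in reverse.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # The compensation state machine: undo completed work in reverse.
            for undo in reversed(completed):
                undo()
            return "ROLLED_BACK"
    return "COMMITTED"

trace = []

def reserve_stock():
    trace.append("stock reserved")

def release_stock():
    trace.append("stock released")

def charge_card():
    raise RuntimeError("payment declined")  # simulated failure

def refund_card():
    trace.append("card refunded")  # never runs: the charge never completed

result = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
print(result, trace)  # ROLLED_BACK ['stock reserved', 'stock released']
```

Only steps that actually completed are compensated, which is what distinguishes a SAGA from a blind rollback.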
A detailed definition of all aspects of DDI can be found in the next chapter.
A Distributed Data Infrastructure allows Microservices to outsource data-related concerns, which has the following benefits:
- One consistent distributed data implementation for all services
- Improvements in the DDI layer benefit all services
- Compatible with services written in any programming language
- Easy to define, augment, or modify for multi-service App life cycle
- Centralized governance through the control plane
- Built-in transaction management
- Built-in tracing and monitoring
- Built-in versioning