Creating a Distributed Canonical Data Model
A canonical data model addresses the complexity of integrating a source (provider) system with multiple target (consumer) systems. This practice describes when a canonical data model is required, how to construct one, and which aspects to consider while designing it.
Purpose
A canonical data model (CDM) is a type of data model and design pattern that presents data entities and relationships in the simplest possible form, so that processes can be integrated across source and target systems and databases. Data exchanged between systems often relies on different languages, syntaxes, and protocols; the purpose of a CDM is to give those systems a shared format. It typically includes a common system-layer API or component that translates and transforms data passing from any source to any target system, and it is standardized through naming conventions.
Getting started
How to proceed with CDM?
A canonical (or common) data model defines a common architecture for messages exchanged between applications or components. The canonical data model (CDM) defines the business entities, attributes, associations, and semantics relevant to specific domains. It can be implemented as an independent application or as a middleware interface that maintains a common nomenclature across enterprises, channels, and technologies.
A CDM approach can and should include any technology the enterprise uses, including enterprise service bus (ESB) and business process management (BPM) platforms, service-oriented architecture (SOA), and a range of more specific tools and applications.
Global naming conventions introduce enterprise-wide standards that need to be used and enforced consistently. They are applied to CDM components as part of formal analysis and design processes. In its most extreme form, a canonical approach would mean having a single service contract for each of person, customer, order, product, and so on, with a set of IDs, attributes, and associations that the entire enterprise can agree upon.
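As an illustration of what such a shared contract might look like, here is a minimal sketch in Python. The class names, field names, and sample values are assumptions made for this example; a real CDM would take them from the enterprise's own analysis and naming conventions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CanonicalProduct:
    # Hypothetical canonical attributes for a product.
    product_id: str
    name: str


@dataclass(frozen=True)
class CanonicalCustomer:
    # One enterprise-wide identifier and attribute set for a customer.
    customer_id: str
    full_name: str
    email: str


@dataclass(frozen=True)
class CanonicalOrder:
    order_id: str
    customer: CanonicalCustomer                      # association: order -> customer
    products: tuple = field(default_factory=tuple)   # association: order -> products


# Illustrative instances only; the values are made up.
customer = CanonicalCustomer("C-001", "Ada Lovelace", "ada@example.com")
order = CanonicalOrder("O-100", customer, (CanonicalProduct("P-9", "Widget"),))
```

Frozen dataclasses are used here so that a canonical message cannot be altered in transit by an intermediate component; that is a design choice of the sketch, not a requirement of the pattern.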
Pre-CDM to CDM shift:
A point-to-point ecosystem comes under strain as it grows and develops clear pain points: bloated code, a large memory footprint, high integration cost, and slow delivery. A CDM, in turn, brings its own promises and challenges:
> A modest proposal
- Define a common model for all data communicated between systems
- A common recommendation, in one form or another, as an SOA pattern for enterprise integration analysis
- Analyzing data landscape, or service landscape
- Compliance, internal-audit, MDM, large-scale integration efforts
> API Design
- SOA governance: Enforce consistency with the canonical model (“canonical schema”)
- Tools: Compose message formats
> Integration
- Transform messages to or from canonical format
- Boundary translation to or from industry standards
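The integration step above, transforming messages to or from the canonical format, can be sketched as a pair of mapping functions. This is a minimal illustration, assuming messages are flat dictionaries; the CRM and billing field names are hypothetical.

```python
# Hypothetical field mappings: native field name -> canonical field name.
CRM_TO_CANONICAL = {"cust_no": "customer_id", "nm": "full_name"}
BILLING_TO_CANONICAL = {"AccountId": "customer_id", "AccountName": "full_name"}


def to_canonical(message: dict, mapping: dict) -> dict:
    """Rename native fields to their canonical names; unmapped fields are dropped."""
    return {mapping[key]: value for key, value in message.items() if key in mapping}


def from_canonical(message: dict, mapping: dict) -> dict:
    """Rename canonical fields back to one system's native names."""
    reverse = {canonical: native for native, canonical in mapping.items()}
    return {reverse[key]: value for key, value in message.items() if key in reverse}


# A CRM message reaches the billing system by way of the canonical format.
crm_message = {"cust_no": "C-001", "nm": "Ada Lovelace"}
canonical = to_canonical(crm_message, CRM_TO_CANONICAL)
billing_message = from_canonical(canonical, BILLING_TO_CANONICAL)
```

Each system only needs its own mapping to the canonical names; neither the CRM nor the billing side has to know the other's field names.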
Benefits of CDM shift:
- The number of connections drops to 2n, against n(n-1) for point-to-point integration. With a common translator taken into consideration, the count is reduced when n is 4 or above (at n = 3 both are 6)
- Increase reusability of common components
- Improve business communication through standardization
- Reduce integration transformation effort, time, and cost
- A decoupled middleware platform (such as Mule ESB) enables pluggable components or modules that make it easier to switch between source or target systems
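The connection count in the first benefit can be checked with a short calculation. The sketch below assumes one directed translator per ordered pair of systems in the point-to-point case, and one inbound plus one outbound translator per system in the CDM case; the function names are illustrative.

```python
def point_to_point_connections(n: int) -> int:
    """Directed translators when every system talks to every other: n(n-1)."""
    return n * (n - 1)


def canonical_connections(n: int) -> int:
    """Translators with a CDM: one to and one from the canonical format per system."""
    return 2 * n


# Compare both counts for a growing number of systems.
for n in range(2, 7):
    print(n, point_to_point_connections(n), canonical_connections(n))
```

For n = 3 both counts are 6; from n = 4 onward (12 against 8) the canonical approach needs fewer translators, and the gap widens as more systems are added.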
CDM with SOA and microservices:
The CDM design pattern is fully supported by the standardized service contract design principle, which advocates that every source-to-target (or target-to-source) communication be based on standardized data models. Because CDM essentially backs the SOA concept, using a standardized data model decouples applications by exposing reusable services.
Typically, the microservices architecture (MSA) style does not require a CDM, because each microservice is a self-contained executable unit that promotes an isolated data model. However, the data model owned by each microservice is part of a distributed canonical data model when a group of microservices serving one large business capability sits within one subdomain.
How to use this artifact in an engagement
For any engagement where middleware interfaces or integrations are identified for data transmission, this practice helps in understanding the functional (business scenario) and nonfunctional (solution) requirements for real-time use of a canonical, or common, data model (CDM).
A couple of scenarios for the use of CDM:
1) Legacy to modernization coexistence model, using the bounded-context concept from domain-driven design
Business scenario: The legacy system has an attribute called system ID, and its counterpart in the new application is called ID. During the temporary migration phase, data needs to be transported from the legacy system to the new application.
Solution: Each system is free to choose its own standard, but for the legacy migration an intermediate anti-corruption layer (ACL) is required to translate between the two data formats. The ACL only needs to map the two attributes and transform them in the middle so that communication is seamless.
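A minimal sketch of such an anti-corruption layer for this scenario follows. It assumes records are flat dictionaries and uses the hypothetical keys `system_id` and `id` to stand in for the legacy "system ID" and the new "ID" attributes; all other fields pass through unchanged.

```python
class LegacyToNewAcl:
    """Anti-corruption layer between the legacy and new bounded contexts."""

    # Hypothetical key names standing in for "system ID" (legacy) and "ID" (new).
    LEGACY_TO_NEW = {"system_id": "id"}

    def to_new(self, legacy_record: dict) -> dict:
        """Translate a legacy record into the new application's format."""
        return {self.LEGACY_TO_NEW.get(key, key): value
                for key, value in legacy_record.items()}

    def to_legacy(self, new_record: dict) -> dict:
        """Translate a new-format record back into the legacy format."""
        new_to_legacy = {new: old for old, new in self.LEGACY_TO_NEW.items()}
        return {new_to_legacy.get(key, key): value
                for key, value in new_record.items()}


acl = LegacyToNewAcl()
migrated = acl.to_new({"system_id": "42", "name": "Pump A"})
```

Because the mapping lives only in the ACL, it can be removed once the migration phase ends without touching either system's own model.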
2) The enterprise IT ecosystem must have a middleware tool to communicate and exchange data to or from the client-facing application(s) and to or from the backend application(s).
Business scenario: Due to critical business demands, two different systems need an intermediate system to mediate data translation, transformation, and communication. In this case, each attribute of both systems needs to be mapped one-to-one. Given the large volume of data exchanged, a CDM needs to be implemented.
Solution: An enterprise service bus (ESB) used for enterprise application integration (EAI) can handle the mapping of common data models. This way, any external domain application communicates with the CDM over the ESB, which then passes the data on to the target system, and vice versa.
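The routing idea in this solution can be sketched with a small in-memory stand-in for the bus. This is not how any particular ESB product is configured; the `CanonicalBus` class, the system names, and the field names are all assumptions made for illustration.

```python
from typing import Callable, Dict

Translator = Callable[[dict], dict]


class CanonicalBus:
    """In-memory stand-in for an ESB that mediates through a canonical format."""

    def __init__(self) -> None:
        self._inbound: Dict[str, Translator] = {}   # native -> canonical
        self._outbound: Dict[str, Translator] = {}  # canonical -> native

    def register(self, system: str, to_canonical: Translator,
                 from_canonical: Translator) -> None:
        """Plug a system into the bus with its pair of translators."""
        self._inbound[system] = to_canonical
        self._outbound[system] = from_canonical

    def route(self, source: str, target: str, message: dict) -> dict:
        """Translate source -> canonical -> target and return the delivered message."""
        canonical = self._inbound[source](message)
        return self._outbound[target](canonical)


bus = CanonicalBus()
# Hypothetical client-facing and backend systems with their own field names.
bus.register("web_portal",
             to_canonical=lambda m: {"customer_id": m["userId"]},
             from_canonical=lambda c: {"userId": c["customer_id"]})
bus.register("backend_erp",
             to_canonical=lambda m: {"customer_id": m["CUST_NO"]},
             from_canonical=lambda c: {"CUST_NO": c["customer_id"]})

delivered = bus.route("web_portal", "backend_erp", {"userId": "C-001"})
```

Adding another system means registering one more pair of translators; the systems already on the bus are left unchanged.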