DI Architecture & Components
High-Level DI Architecture
Components
There are several functional components which comprise the DI layer. All serve ongoing DI operations as part of the data integration layer of the Smart DIH platform.
The table below summarizes all the key DI layer components.
Name | Purpose | Details |
---|---|---|
CDC (IIIDR) Source Database Agent | Captures the changes from the source System of Records. For example Oracle, DB2, MSSQL. |
IIDR Source database agent is a java application that usually installed on a source database server. IIDR Agent captures changes from the source database transaction log files in real time IIDR Oracle agent Port: 11001 IIDR Db2 zos agent Port: 11801 IIDR Db2 AS-400 agent Port: 11111 IIDR MSSQL agent Port: 10501 |
IIDR Target Kafka Agent | Writes the changes captured by IIDR source agent to Kafka. |
IIDR Kafka Agent is a Java application that runs on the Linux machine and writes changes captured by the IIDR source agent to Kafka. Port:11710 |
IIDR Access Server | IIDR administration service and Metadata Manager |
IIDR Access Server is responsible for creating all logical IIDR entities and objects such as subscriptions and data stores. All metadata is stored in the internal IIDR database (Pointbase) IIDR AS Port: 10101 |
DI Manager | This is the primary interface which controls all DI components |
Web service, exposes REST APIs to: 1) Create pipeline and source db connection 2) Stop/ start pipeline 3) Other administration tasks Port: 6080 |
DI MDM (Metadata Manager) | Stores and retrieves metadata in Zookeeper |
Web service, expose REST API Communicates with DI Manager Stores and retrieves metadata that is essential for a DI operation: 1) Data dictionary about tables, columns and indexes 2) Pipeline configuration 3) Other important metadata records that are required for ongoing DI operations Port: 6081 |
DI Processor | Java library run by Flink as a job. It is responsible for writing changes to the space. | Java library , deployed to Flink and invoked as a Flink job. Main responsibility to read messages from Kafka , perform a transformation from a Kafka message into a space document and write this change into the space relevant object. |
Zookeeper (ZK) | Serves as a persistent data store for DI components. Serves as a ZK that is required by Kafka. |
ZK runs on 3 nodes for H/A purposes. ZK data is replicated between all nodes. Port: 2181 |
Kafka | Serves as a streaming processing platform. |
Kafka is deployed in a cluster of 3 nodes when it uses ZK is its dependency. IIDR publishes changes to the Kafka topic and theDI Processor (Flink job) consumes these messages and writes changes to Space. Kafka Port: 9092 |
High-Level Data Flow
DI Subscription Manager
DI Subscription Manager is a web service that exposes a set of APIs on a port. Its unified API has control over CDC components. Only CDC components are in direct contact with the SoR.
DI Subscription Manager is a micro-service that is responsible for providing the following functionality:
1. Unified API that controls various CDC engines to implement the GigaSpaces pluggable connector vision. It creates and updates IIDR entities.
-
Defines CDC flows and entities. Defines a new subscription.
-
Start / Stop subscription data flow via IIDR
-
Monitors the status of the IIDR components
2. Unified method to extract data dictionary from various sources, such as the CDC engine , source database , schema registry or enterprise data catalog and populate the DI data dictionary internal repository (MDM)
3. Data dictionary extraction from the IIDR.
-
Significantly simplifies DI operations
-
Only IIDR components connect to the source database
-
There is a unified data dictionary extraction, regardless of the source database type