Debezium

Background

Change Data Capture (CDC) systems can be complex and time-consuming to implement and deploy, yet they are useful in a wide variety of situations. To simplify the process, the open-source community has developed a number of CDC tools. Debezium is one such tool: it is written in Java and designed to work with a variety of database management systems (DBMSs).

What is Debezium?

Debezium is an open-source distributed platform for CDC. Point Debezium at your databases, and your applications can start responding to all of the inserts, updates, and deletes that other applications commit to those databases (Debezium Community 2021).

To understand exactly what Debezium does, let's consider an example. Suppose you have a database called customers that contains a table of the same name. One of your application's features is a daily report of the top 500 customers. To stay accurate, that report has to reflect every record added to, modified in, or deleted from the customers table.

As you can imagine, tracking these changes manually every day quickly becomes expensive, and the process is error-prone.

Debezium comes in handy here because it pushes changes out to your application as they happen. In other words, it streamlines monitoring your database and producing the report you want.
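To make the idea concrete, here is a minimal Python sketch of the consuming side: instead of re-reading the whole table every day, the application applies each change event to a local copy and derives the report from that copy. The event shape is a simplification assumed for illustration (a dict with an op code, the row id, and the row after the change; Debezium's real envelope is described under Connector Change Events), and the spend column used for ranking is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def apply_event(table: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one simplified change event to the local copy of the table."""
    op, key = event["op"], event["id"]
    if op in ("c", "u", "r"):   # create, update, or snapshot read: store the new row
        table[key] = event["after"]
    elif op == "d":             # row deleted upstream: drop it locally
        table.pop(key, None)
    else:
        raise ValueError(f"unsupported op: {op!r}")

def top_customers(table: Dict[int, Row], n: int) -> List[int]:
    """Ids of the n customers with the highest (hypothetical) spend."""
    ranked = sorted(table, key=lambda k: table[k]["spend"], reverse=True)
    return ranked[:n]

if __name__ == "__main__":
    customers: Dict[int, Row] = {}
    events = [
        {"op": "c", "id": 1, "after": {"fullname": "Ada", "spend": 120}},
        {"op": "c", "id": 2, "after": {"fullname": "Bo", "spend": 80}},
        {"op": "u", "id": 2, "after": {"fullname": "Bo", "spend": 300}},
        {"op": "d", "id": 1, "after": None},
    ]
    for event in events:
        apply_event(customers, event)
    print(top_customers(customers, 500))
```

Each event costs a single dictionary update, so the report stays current without rescanning the table.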

Connectors

Connector Basics

Debezium's architecture is built around the concept of connectors. In data engineering, a connector is a process that moves data from one data system to another. Along the way, a connector may filter the data, transform it into a desired structure, or otherwise prepare it for analysis, much as a CDC pipeline does.
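The filter-and-transform idea can be sketched as a small Python generator pipeline. This is a conceptual illustration only, not how Debezium is implemented; the function name run_connector and the sample rows are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # one row or event, modeled as a plain dict for this sketch

def run_connector(
    source: Iterable[Record],
    keep: Callable[[Record], bool],
    transform: Callable[[Record], Record],
) -> Iterator[Record]:
    """Lazily move records from a source to a consumer, filtering and reshaping them."""
    for record in source:
        if keep(record):              # drop records the destination does not need
            yield transform(record)   # reshape into the destination's structure

if __name__ == "__main__":
    rows = [
        {"id": 1, "fullname": "Ada Lovelace", "email": "ada@example.com"},
        {"id": 2, "fullname": "Bo Chen", "email": None},
    ]
    out = run_connector(
        rows,
        keep=lambda r: r["email"] is not None,
        transform=lambda r: {"id": r["id"], "name": r["fullname"]},
    )
    print(list(out))
```

Because the pipeline is a generator, records flow through one at a time, which mirrors how a streaming connector handles events rather than batches.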

Debezium is a library of connectors that capture changes from a variety of database management systems, making it easier for your applications to consume and respond to the events regardless of where the changes originated (Debezium Community 2021). Each connector is built for a specific DBMS and takes advantage of that database's own change-tracking feature, such as the binlog in MySQL.

Supported Connectors

Currently, Debezium provides connectors for the following databases:

Connector Capabilities

Most of the following notes on connector capabilities come from the Debezium documentation (Debezium Community 2021).

MySQL Connector

Debezium's MySQL connector works by reading MySQL's binary log (binlog), which records all changes to database schemas and tables (Debezium Community 2021). Because it reads this log instead of repeatedly querying the tables, Debezium can keep up with client applications at scale and stream change events in a timely manner.
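For the connector to read the binlog, MySQL must have binary logging enabled, log row-level changes, and write full row images. MySQL 8.0's defaults should already satisfy this (log_bin ON, binlog_format ROW, binlog_row_image FULL). The hypothetical Python helper below checks a dict of server variables, for example the rows returned by SHOW VARIABLES, against those three settings.

```python
from typing import Dict, List, Optional

# Settings the Debezium MySQL connector expects, as MySQL reports them.
REQUIRED_SETTINGS = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}

def binlog_problems(variables: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per required setting that is missing or has the wrong value."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.items():
        actual = variables.get(name)
        # MySQL accepts lowercase values such as "full", so compare case-insensitively
        if actual is None or actual.upper() != expected:
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    return problems

if __name__ == "__main__":
    # e.g. values collected with: SHOW VARIABLES WHERE Variable_name IN (...)
    print(binlog_problems({"log_bin": "ON", "binlog_format": "STATEMENT", "binlog_row_image": "FULL"}))
```

An empty list means the server is configured the way the connector needs.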

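Every event Debezium sends to the client application is a message made up of a payload and, often, the schema that describes it. When a DDL statement changes a captured table, the MySQL connector emits a schema change event whose payload carries the following four components:

Component | Description
DDL | The DDL statement that was applied (for example CREATE TABLE, ALTER TABLE, or DROP TABLE)
Database name | The name of the database the statement was applied to
POS | The position in the MySQL binlog at which the statement was recorded
Table changes | A structured description of each affected table's schema after the statement is applied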
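As an illustration, the Python sketch below pulls those four components out of a schema change event serialized as JSON. The field names (databaseName, ddl, tableChanges, and the binlog file and pos inside source) follow the MySQL connector documentation; the sample message is hand-written and trimmed, and the function name is hypothetical.

```python
import json
from typing import Any, Dict

def summarize_schema_change(message: str) -> Dict[str, Any]:
    """Extract the DDL, database name, binlog position, and changed table ids."""
    doc = json.loads(message)
    # With schemas enabled, the JSON converter wraps the data as {"schema": ..., "payload": ...}
    payload = doc["payload"] if "schema" in doc and "payload" in doc else doc
    source = payload.get("source", {})
    return {
        "ddl": payload["ddl"],
        "database": payload["databaseName"],
        "binlog_position": (source.get("file"), source.get("pos")),
        "tables": [change["id"] for change in payload.get("tableChanges", [])],
    }

# Hand-written, trimmed example of a schema change event (illustrative values).
SAMPLE = json.dumps({
    "schema": {},
    "payload": {
        "source": {"file": "binlog.000002", "pos": 1234},
        "databaseName": "customerdb",
        "ddl": "ALTER TABLE customer ADD COLUMN phone VARCHAR(32)",
        "tableChanges": [{"type": "ALTER", "id": '"customerdb"."customer"'}],
    },
})

if __name__ == "__main__":
    print(summarize_schema_change(SAMPLE))
```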

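For captured tables, Debezium also records the history of schema changes in an internal schema history store, so that after a restart the connector can interpret older binlog events against the table structure that was in effect when they were written.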

Connector Change Events

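A data change event is emitted for each row-level INSERT, UPDATE, or DELETE, and every such message has a key and a value. The key identifies the changed row: its schema describes the table's primary key, and its payload holds the primary key values of that row. The value's payload is an envelope whose fields include before and after (the row state before and after the change), op (the operation), source (metadata such as the binlog position), and ts_ms (the time the connector processed the event).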
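A consumer typically starts by unpacking that key and envelope. The Python sketch below assumes events arrive as JSON strings (as produced by Kafka's JSON converter, with or without the schema wrapper) and handles the row-level op codes c, u, d, and r. By default a delete is followed by a tombstone message with the same key and a null value, which the sketch maps to None. The ChangeEvent class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Row-level operation codes used in the "op" field of the change event envelope.
OPS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}

@dataclass
class ChangeEvent:
    key: Dict[str, Any]               # primary key values of the changed row
    op: str                           # one of the codes in OPS
    before: Optional[Dict[str, Any]]  # row state before the change (None for creates)
    after: Optional[Dict[str, Any]]   # row state after the change (None for deletes)

def _unwrap(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Strip the {"schema": ..., "payload": ...} wrapper if the converter added one."""
    if "schema" in doc and "payload" in doc:
        return doc["payload"]
    return doc

def parse_change_event(key_json: str, value_json: Optional[str]) -> Optional[ChangeEvent]:
    """Turn a JSON key/value pair into a ChangeEvent, or None for a tombstone."""
    if value_json is None:
        return None  # tombstone after a delete: nothing to apply
    key = _unwrap(json.loads(key_json))
    envelope = _unwrap(json.loads(value_json))
    op = envelope["op"]
    if op not in OPS:
        raise ValueError(f"unexpected op code: {op!r}")
    return ChangeEvent(key=key, op=op, before=envelope.get("before"), after=envelope.get("after"))
```

For the customer table defined later on this page, the key payload would be a single id field.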

Connector Further Reading

Networking for Debezium

Databases are usually containerized, clustered, or both, so to let them communicate with the Debezium connector you will likely need to set up container networking. For Docker, see the section on networking.
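On a user-defined Docker network, containers can reach each other by container name. A quick way to confirm that the connector's container can reach the database is to open a plain TCP connection to it. The helper below is a hypothetical, standard-library-only sketch of that check.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection also resolves the name, e.g. a container name on a Docker network
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, run from a container on the net-label network used in the setup below, can_connect("mysql-label", 3306) should return True once MySQL is listening.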

Practical Debezium Setup

Overview

In this example, we'll create a customer database ready for Debezium. This involves writing a MySQL Dockerfile, building it into an image, and running that image as a container. A script, customer.sql, is copied into the image by the Dockerfile and initializes the database when the container first starts. The project structure should look something like this:

tree
.
├── Dockerfile
└── customer.sql

Initialize Database

CREATE DATABASE IF NOT EXISTS customerdb;
USE customerdb;
DROP TABLE IF EXISTS `customer`;
CREATE TABLE `customer` (
  `id` int NOT NULL,
  `fullname` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

Setup Spring Application

Spring Boot is a Java framework, so a Spring Boot application can use Debezium's Java libraries directly. This lets us customize the functionality of our CDC application in the same language Debezium itself is written in.

TODO Insert any new info about how to configure a spring app for debezium.

Create Docker Network

docker network create net-label

Create Dockerfile

First, create a Dockerfile that sets up the database, in this case MySQL.

FROM mysql:8.0
ENV MYSQL_DATABASE=customerdb \
    MYSQL_ROOT_PASSWORD=myNewPass

# Scripts in this directory run when the container first initializes its data directory
COPY customer.sql /docker-entrypoint-initdb.d/
EXPOSE 3306

Create Docker Image

Next, build the image so it's ready to be deployed as a container, tagging it mysql-label.

docker build -t mysql-label .

Create Docker Container

Now run the image as a container attached to the net-label network, publishing MySQL's port 3306 to the host.

docker run --rm \
  --name mysql-label \
  --network net-label \
  -p 3306:3306 \
  -d mysql-label
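To point a Debezium MySQL connector at this container, the connector needs the container's network name, port, and credentials. The Python sketch below assembles those settings into the JSON body that Kafka Connect's REST API accepts at POST /connectors; the same property keys are used when the connector runs embedded in a Java application. Property names follow the Debezium 2.x MySQL connector documentation (1.x releases use database.server.name where 2.x uses topic.prefix). The connector name, server id, and topic prefix are hypothetical, and a schema history store (the schema.history.internal.* properties) still has to be added for whichever runtime you use.

```python
import json
from typing import Any, Dict

def mysql_connector_config(
    name: str = "customer-connector",  # hypothetical connector name
    hostname: str = "mysql-label",     # container name on the net-label network
    port: int = 3306,
    user: str = "root",
    password: str = "myNewPass",       # MYSQL_ROOT_PASSWORD from the Dockerfile
    server_id: int = 184054,           # hypothetical; must be unique among MySQL clients
    topic_prefix: str = "customer",    # hypothetical logical name used in topic names
) -> Dict[str, Any]:
    """Build a connector registration body with the connection-related properties."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": hostname,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.server.id": str(server_id),
            "topic.prefix": topic_prefix,
            "database.include.list": "customerdb",
            "table.include.list": "customerdb.customer",
        },
    }

if __name__ == "__main__":
    print(json.dumps(mysql_connector_config(), indent=2))
```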

Run the Application within the Container

TODO Insert any new info about how to configure a spring app for debezium.

References

Web Links

Note Links

Footnotes