It’s no secret that the ultra-low latency, highly scalable, distributed Apache Kafka data streaming platform has ushered in a new era of real-time data integration, processing and analytics. With Kafka, enterprises can address new advanced analytics use cases and extract more value from more data. They are implementing Kafka, often based on the Confluent platform, along with streaming alternatives such as Amazon Kinesis and Azure Event Hubs, to enable streaming ingestion into data lakes, complex message queues with many data endpoints, data sharing between microservices, and pre-processing for machine learning.
As with any new technology, the devil of complexity lies in the detail of implementation, particularly when it comes to database sources. Architects and DBAs struggle with the scripting and complexity of publishing database transactions to Kafka and streaming environments. Talented programmers must individually and manually configure data producers and data type conversions. They often cannot easily integrate source metadata and schema changes.
Here at Attunity, we’re partnering closely with some of the world’s largest organizations to apply solutions to these challenges. Attunity Replicate provides a simple, real-time and universal solution for converting production databases into live data streams. Architects and DBAs can easily configure any major database to publish to Kafka and other streaming systems, flexibly supporting one-to-many scenarios, automated data type mapping and comprehensive metadata integration.
First let’s step back a bit and consider the motivations to implement Kafka and streaming in the first place.
Drivers for Modern Data Streaming
Lines of business are exerting significant pressure on enterprise IT organizations to address critical requirements on the following dimensions.
- Real-time event response: Business events, ranging from customer purchases to supply chain checks to IoT equipment breakdowns, increasingly demand immediate attention from a revenue and cost management perspective. Latency SLAs are approaching zero.
- Data distribution: Both analytics users and use cases are rising, which creates a compounding need to process multiple, often overlapping data sets on distinct and specialized platforms. Data volumes are climbing in tandem.
- Parallel consumption: In a similar vein, multiple parties often need to process copies of the same data for different uses.
- Asynchronous communication: Transactional data needs to be captured and readily available upon creation, but applications need to be able to consume it at their own pace.
- Service modularity: A key principle of increasingly common microservices architectures is that one service does not depend on another. This means that Service A must be able to use data created by Service B without imposing workload or uptime requirements on B.
These pain points underscore the value of a highly distributed, highly scalable streaming platform that asynchronously collects, buffers, and disseminates data records (also known as messages). The producers and consumers of these records should be independent, working with an intermediate broker rather than one another. And multiple consumers should each process their own record streams as they need them.
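The decoupling described above can be made concrete with a toy sketch. The class below is not Kafka; it is a minimal in-memory stand-in that shows the key idea: producers append to a shared log, and each consumer tracks its own read offset, so consumers proceed independently and at their own pace.

```python
class MiniBroker:
    """Toy broker illustrating producer/consumer decoupling.

    Producers append records to a shared, append-only log; each consumer
    keeps its own offset, so one slow consumer never blocks another.
    (Real Kafka adds topics, partitions, replication, and persistence.)
    """

    def __init__(self):
        self.log = []      # append-only record log (one topic, one partition)
        self.offsets = {}  # consumer id -> next offset to read

    def produce(self, record):
        self.log.append(record)

    def consume(self, consumer_id, max_records=10):
        start = self.offsets.get(consumer_id, 0)
        batch = self.log[start:start + max_records]
        self.offsets[consumer_id] = start + len(batch)
        return batch


broker = MiniBroker()
for i in range(3):
    broker.produce({"id": i})

# Two consumers read the same stream independently:
analytics_batch = broker.consume("analytics")               # gets all 3 records
lake_batch = broker.consume("data-lake", max_records=1)     # reads one at a time
```

Because each consumer's offset is separate state, the "analytics" consumer draining the log has no effect on how far the "data-lake" consumer has read.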
The Kafka “publish-subscribe” system enables “producers,” such as operational databases, social media platforms and IoT sensor-based systems, to send data records to brokers. The brokers, usually grouped into clusters for redundancy, persist these records to disk-based file systems and provide them to “consumers” such as Spark-based analytics engines upon request. Messages are grouped into topics (also known as streams) for targeted consumption. Topics can be partitioned to improve throughput via parallel reading and writing. With database producers, one transaction typically becomes one Kafka record.
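To make the "one transaction becomes one Kafka record" mapping tangible, here is a hedged sketch of how a captured database transaction might be shaped into a record with a topic, key, and partition. The topic-naming scheme and the hash function are illustrative only: Kafka's default partitioner actually uses murmur2 hashing on the key, and a simpler deterministic hash stands in for it here.

```python
import json

NUM_PARTITIONS = 4  # illustrative partition count for the topic


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the record key so that records with the same
    key always land on the same partition (preserving per-key ordering).
    Kafka's default partitioner uses murmur2; this simple hash is a stand-in."""
    return sum(key) % num_partitions


def transaction_to_record(txn: dict) -> dict:
    """Map one database transaction to one Kafka record (topic, key, value).
    Field names here are hypothetical, not a specific product's format."""
    topic = f"{txn['schema']}.{txn['table']}"       # e.g., one topic per table
    key = str(txn['primary_key']).encode("utf-8")   # same key -> same partition
    value = json.dumps(txn["row"]).encode("utf-8")
    return {
        "topic": topic,
        "partition": choose_partition(key, NUM_PARTITIONS),
        "key": key,
        "value": value,
    }


record = transaction_to_record({
    "schema": "sales", "table": "orders",
    "primary_key": 1001,
    "row": {"order_id": 1001, "amount": 250.0, "status": "shipped"},
})
```

Keying records by primary key is a common choice because it guarantees that all changes to the same row are read in order by a consumer, while still allowing parallel reads across partitions.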
High-level Kafka Architecture
The most common Kafka use cases for transactional database streaming are message queueing and streaming ingestion. Enterprise data teams often use Kafka as a message queue because it can create and manage many granularly defined topics between many sources and targets. This improves their ability to support rapidly proliferating business requests for targeted views of the business. In the closely related streaming ingestion use case, database transaction topics land in real time in consumers such as data lakes for incremental processing.
Attunity Replicate for Transactional Database Streaming
Now we come to the role of Attunity.
DBAs and architects use Attunity Replicate to convert databases – a treasure trove of potential data insights – into streams. What does this mean? You can automate the process of configuring databases to publish to Kafka, Amazon Kinesis or Azure Event Hubs, and thereby address use cases such as real-time analytics, message queues and streaming ingestion. You can use the drag-and-drop Attunity Replicate interface to create a new target endpoint such as the Confluent Kafka-based platform, define the broker server, then browse the Confluent environment to select one or more topics. You can design, execute and monitor this task along with hundreds of other data flows through the enterprise-wide Attunity Enterprise Manager. Attunity Replicate also flexibly provides the ability to rename schemas or tables, add or drop columns from the producer definition, and filter records that are published to the topic stream.
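The renaming, column-dropping, and record-filtering rules described above are configured declaratively in Attunity Replicate's interface, but their effect on a single change record can be sketched in code. Everything below is illustrative: the function, rule parameters, and record field names are hypothetical, not Replicate's actual API or envelope.

```python
def transform(change, schema_renames=None, drop_columns=(), keep=lambda c: True):
    """Apply illustrative producer-side rules to one change record:
    rename the schema, drop selected columns, and filter out records
    that should never reach the topic stream. Hypothetical sketch only."""
    if not keep(change):
        return None  # filtered: this change is not published
    renames = schema_renames or {}
    out = dict(change)
    out["schema"] = renames.get(out["schema"], out["schema"])
    out["row"] = {k: v for k, v in out["row"].items() if k not in drop_columns}
    return out


change = {"schema": "SALESDB", "table": "customers", "op": "INSERT",
          "row": {"id": 7, "name": "Ada", "ssn": "123-45-6789"}}

# Rename the schema, strip a sensitive column, and suppress deletes:
published = transform(change,
                      schema_renames={"SALESDB": "sales"},
                      drop_columns=("ssn",),
                      keep=lambda c: c["op"] != "DELETE")
```

Applying such rules at the producer side means every downstream consumer of the topic sees the cleaned, renamed view without duplicating the logic.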
And this happens with minimal impact on production workloads. Attunity Replicate change data capture technology remotely scans transaction logs to identify and replicate source updates while placing minimal load on source production databases. Row inserts, updates and deletes, as well as schema changes, all become records in the live transaction stream to the Kafka broker. By capturing only incremental changes, Attunity Replicate CDC reduces the bandwidth requirements of data transfer, which is especially useful for publication to cloud-based streaming systems.
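A CDC stream of the kind described above is typically a sequence of small change records rather than full table snapshots, which is what keeps bandwidth low. The envelope below is a hedged sketch of that idea; the field names (`op`, `before`, `after`, `lsn`) are illustrative conventions, not Attunity Replicate's actual message format.

```python
import json


def change_record(op, table, before=None, after=None, lsn=0):
    """One CDC record: the operation type, the row images before and after
    the change, and a log sequence number (lsn) from the source transaction
    log for ordering. Field names are illustrative, not a product format."""
    return {"op": op, "table": table, "before": before, "after": after, "lsn": lsn}


# A row's lifecycle produces three incremental records -- only the changes
# travel over the wire, never a full snapshot of the table:
stream = [
    change_record("INSERT", "orders", after={"id": 1, "amount": 99.0}, lsn=101),
    change_record("UPDATE", "orders", before={"id": 1, "amount": 99.0},
                  after={"id": 1, "amount": 120.0}, lsn=102),
    change_record("DELETE", "orders", before={"id": 1, "amount": 120.0}, lsn=103),
]
payload = json.dumps(stream)  # what a broker-bound message body might carry
```

Carrying both before and after images lets consumers reconstruct state, audit changes, or apply updates idempotently, while the sequence number preserves transaction-log ordering.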
Attunity Replicate for the Streaming Ecosystem
Check out our whitepaper, Streaming Operational Data to Cloud Data Lakes, to learn more!