Data is creating massive waves of change and giving rise to a new data-driven economy that is only beginning. Organizations in all industries are changing their business models to monetize data, understanding that doing so is critical to competition and even survival. There is tremendous opportunity as applications, instrumented devices, and web traffic are throwing off reams of 1s and 0s, rich in analytics potential.
All this requires modern data integration. A foundational technology for modernizing your environment is change data capture (CDC) software, which enables continuous incremental replication by identifying and copying data updates as they take place. When designed and implemented effectively, CDC can meet today’s scalability, efficiency, real-time, and zero-impact requirements.
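To make "identifying and copying data updates as they take place" concrete, here is a minimal Python sketch of the apply side of incremental replication. It is an illustration only, not how any particular CDC product works: the `ChangeEvent` type, the `apply_changes` function, and the in-memory dictionary standing in for a target table are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One captured source change (hypothetical shape, for illustration)."""
    op: str                                # "insert", "update", or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Apply captured changes to the target in order, touching only changed rows."""
    for event in events:
        if event.op in ("insert", "update"):
            # Copy the row image so the target does not share state with the event.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# The target starts as a copy of the source taken during an initial full load.
target = {1: {"name": "Ada", "balance": 100}}

# Changes captured on the source since that load.
events = [
    ChangeEvent("update", 1, {"name": "Ada", "balance": 80}),
    ChangeEvent("insert", 2, {"name": "Grace", "balance": 50}),
    ChangeEvent("delete", 1),
]

apply_changes(target, events)
print(target)
```

Only the three changed rows are processed, rather than the whole table being reloaded, which is the property that lets incremental replication run continuously instead of in batch windows.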
Without CDC, IT teams can fail to meet business analytics requirements. They must stop or slow production activities for batch runs, which hurts efficiency. They cannot integrate enough data, fast enough, to meet analytics objectives. As a result, they lose business opportunities, lose customers, and break operational budgets.
But not all CDC technologies are created equal. To guide your approach to architectural planning and implementation, we have built the Replication Maturity Model. This model summarizes the replication and CDC technologies available to the IT team and their impact on data management processes. Organizations and the technologies they use generally fall into the following four maturity levels: Basic, Opportunistic, Systematic, and Transformational. Although each level has advantages, IT can deliver the greatest advantage to the business at the Transformational level. What follows is a framework, adapted from the Gartner Maturity Model for Data and Analytics (ITScore for Data and Analytics, October 2017), which we believe can help you steadily advance to higher levels.
DATA REPLICATION MATURITY MODEL
Level 1: Basic
At the Basic maturity level, organizations have not yet implemented CDC, and a significant portion of organizations are still in this phase. Instead, they use traditional, manual extract, transform, and load (ETL) tools and scripts, or open source Sqoop software, to replicate production data to analytics platforms via disruptive batch loads. These processes can vary by end point and require skilled ETL programmers to learn multiple processes and spend extra time configuring and reconfiguring replication tasks. Data silos may persist because most of these organizations lack the resources needed to integrate all of their data manually.
Such practices often are symptoms of larger issues that leave much analytics value unrealized, because the cost and effort of data integration limit both the number and the scope of analytics projects. Siloed teams often run ad hoc analytics initiatives that lack a single source of truth and strategic guidance from executives. To move from the Basic to Opportunistic level, IT department leaders need to recognize these limitations and commit the budget, training, and resources needed to use CDC replication software.
Level 2: Opportunistic
At the Opportunistic maturity level, enterprise IT departments have begun to implement basic CDC technologies. These often are manually configured tools that require software agents to be installed on production systems and capture source updates with unnecessary, disruptive triggers or queries. Because such tools still require resource-intensive and inflexible ETL programming that varies by platform type, efficiency suffers.
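As a rough illustration of the query-based capture mentioned above, the following Python sketch polls a source table for rows changed since the last run. It assumes a hypothetical `orders` table with an `updated_at` column that the application maintains, and uses an in-memory SQLite database purely so the example is self-contained; it is not drawn from any specific CDC tool.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Query the source table for rows modified after the last watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the watermark to the newest change seen, if there was one.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


# Hypothetical source table; updated_at is a simple integer clock here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, 100), (2, 25.0, 101)],
)

# First poll picks up both existing rows.
first_batch, watermark = poll_changes(conn, 0)

# The source then changes: one row is updated, one is deleted.
conn.execute("UPDATE orders SET amount = 12.5, updated_at = 102 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

# Second poll sees the update but has no way to see the delete.
second_batch, watermark = poll_changes(conn, watermark)
print(first_batch)
print(second_batch)
```

Each poll is an extra query against the production table, and the deleted row never appears in any batch. Both effects are typical limitations of this style of capture.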
From a broader perspective, Level 2 IT departments often are also beginning to formalize their data management requirements. Moving to Level 3 requires a clear executive mandate to overcome cultural and motivational barriers.
Level 3: Systematic
Systematic organizations are getting their data house in order. IT departments in this phase implement automated CDC solutions such as Attunity Replicate that require no disruptive agents on source systems. These solutions enable uniform data integration procedures across more platforms, breaking silos while minimizing skill and labor requirements with a “self-service” approach. Data architects rather than specialized ETL programmers can efficiently perform high-scale data integration, ideally through a consolidated enterprise console and with no manual scripting. In many cases, they also can integrate full-load replication and CDC processes into larger IT management frameworks using REST or other APIs. For example, administrators can invoke and execute Attunity Replicate tasks from workload automation solutions.
IT teams at this level often have clear executive guidance and sponsorship in the form of a crisp corporate data strategy. Leadership is beginning to use data analytics as a competitive differentiator. Examples from Chapter 4 of the Streaming Change Data Capture ebook include the case studies for Suppertime and USave, which have taken systematic, data-driven approaches to improving operational efficiency. StartupBackers (case study 3) is similarly systematic in its data consolidation efforts to enable new analytics insights. Another example is illustrated in case study 4, Nest Egg, whose ambitious campaign to run all transactional records through a coordinated Amazon Web Services (AWS) cloud data flow is enabling an efficient, high-scale microservices environment.
Level 4: Transformational
Organizations reaching the Transformational level are automating additional segments of data pipelines to accelerate data readiness for analytics. For example, they might use data warehouse automation software to streamline the creation, management, and updates of data warehouse and data mart environments. They also might be automating the creation, structuring, and continuous updates of data stores within data lakes. Attunity Compose for Hive provides these capabilities for Hive data stores so that datasets compliant with ACID (atomicity, consistency, isolation, durability) can be structured rapidly in what are effectively SQL-like data warehouses on top of Hadoop.
We find that leaders within Transformational organizations are often devising creative strategies to reinvent their businesses with analytics. They seek to become truly data-driven. GetWell (case study 1 in Chapter 4) is an example of a transformational organization. By applying the very latest technologies, such as machine learning, to large data volumes, it is reinventing its offerings to greatly improve the quality of care for millions of patients.
So why not deploy Level 3 or Level 4 solutions and call it a day? Applying a consistent, nondisruptive, and fully automated CDC process to various end points certainly improves efficiency, enables real-time analytics, and yields other benefits. However, the technology will take you only so far. We find that the most effective IT teams achieve the greatest efficiency, scalability, and analytics value when they are aligned with a C-level strategy to eliminate data silos and to guide, and even transform, their business with data-driven decisions.
This blog was adapted from the book Streaming Change Data Capture: A Foundation for Modern Data Architectures (O’Reilly, 2018).