ATTUNITY FOR DATA LAKES
Making Transactional Data Available for Analytics at the Speed of Change
Automate your data pipeline and increase the value of your Data Lake by delivering timely, high-quality and well-governed transactional data to the business.
Data Lake Problems
Data lake projects often fail to produce a return on investment because they are complex to implement, need specialized domain expertise and take months or even years to roll out.
As a result, data engineers waste time on ad-hoc data set generation, and data scientists not only lack confidence in the provenance of the data but also struggle to derive insights from outdated information.
Automate Your Analytics-Ready Data Pipeline
The Attunity solution for Data Lakes automates the creation and deployment of pipelines that help data engineers successfully deliver a return on their existing data lake investments.
With Attunity’s no-code approach, data professionals implement pipelines in days, not months, ensuring the fastest time to insight for accurate and governed transactional data.
Multi-zone Data Lake
Failed data lake projects often use a single zone for data ingest, query and analysis. Attunity for Data Lakes mitigates this problem by promoting a multi-zone, best-practice approach.
- Landing Zone – Raw data is continually ingested into the data lake from a variety of data sources.
- Assemble Zone – Data is standardized, repartitioned and merged into a transformation-ready store.
- Provision Zone – Data engineers create enriched data subsets for consumption by data analysts or scientists.
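The flow through the three zones can be sketched in a few lines of Python. This is only an illustration of the multi-zone idea, not Attunity's implementation: the `ChangeRecord` type, the `assemble` and `provision` functions, and the sample rows are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """One raw change as it might arrive in the landing zone (hypothetical shape)."""
    key: int
    op: str    # "insert", "update", or "delete"
    row: dict  # full row image; empty for deletes


def assemble(landing):
    """Standardize raw records and merge them, in arrival order, into one table."""
    table = {}
    for rec in landing:
        if rec.op == "delete":
            table.pop(rec.key, None)
        else:
            # Standardize column names to lowercase before merging.
            table[rec.key] = {name.lower(): value for name, value in rec.row.items()}
    return table


def provision(assembled, columns):
    """Project the assembled table down to the columns an analyst needs."""
    return [{c: row[c] for c in columns} for row in assembled.values()]


# Landing zone: raw changes exactly as ingested.
landing_zone = [
    ChangeRecord(1, "insert", {"ID": 1, "Region": "EMEA", "Amount": 100}),
    ChangeRecord(2, "insert", {"ID": 2, "Region": "APAC", "Amount": 250}),
    ChangeRecord(1, "update", {"ID": 1, "Region": "EMEA", "Amount": 120}),
    ChangeRecord(2, "delete", {}),
]

# Assemble zone: standardized, merged, transformation-ready.
assemble_zone = assemble(landing_zone)

# Provision zone: a subset prepared for consumption.
provision_zone = provision(assemble_zone, ["region", "amount"])
print(provision_zone)
```

Keeping the raw records untouched in the landing zone means the assemble and provision steps can be rerun at any time without going back to the source systems.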
Automated Data Pipeline
Realize faster value by automating data ingest, target schema creation, and continuous data updates to zones.
- Data Pipeline Designer – The point-and-click designer automatically generates transformation logic and pushes it to the task engine for execution.
- Hive or Spark Task Engines – Run transformation tasks as a single, end-to-end process on either Hive or Spark engines.
- Historical Data Store - Standardizes and combines multiple change streams into a single historical data store ready for downstream processing.
- Data Set Provisioning – Easily create analytics-ready data subsets for analysts or further downstream processing.
- Multiple Export Formats – Data sets can be exported in several formats including ORC, AVRO and Parquet.
Continuous Streaming Data
Change data capture (CDC) technology delivers only committed changes made to your enterprise data sources to your data lake without imposing additional overhead on the source system or data lake infrastructure.
- Universal Connectivity - Supports all major data sources including relational databases, mainframes, SAP, streaming solutions, enterprise data warehouses, Big Data technologies, and cloud infrastructures such as Amazon Web Services, Microsoft Azure and the Google Cloud Platform.
- No Coding, Simple GUI - Use an intuitive interface to quickly and easily configure data feeds.
- High Performance and Scalable - Ingest data at high speeds with near-linear scalability from hundreds to thousands of data sources.
- Agentless Architecture - Log-based, agentless CDC reduces the burden of administration and eliminates the source system processing penalty.
- Real-time Data Updates - Continually ingest data with enterprise-class change data capture (CDC) that delivers changes with virtually no latency.
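To make "only committed changes" concrete, the following Python sketch shows the general idea behind log-based CDC: changes are buffered per transaction and released only when that transaction commits. It is a simplified illustration under assumed names, not Attunity's implementation; the log entry layout (`txn`, `kind`, `data`) and the `committed_changes` function are hypothetical.

```python
def committed_changes(log):
    """Yield change payloads from a transaction log, but only for committed transactions.

    Changes are buffered per transaction and released in order when the
    matching commit entry is read. Rolled-back or still-open transactions
    never reach the consumer.
    """
    pending = {}
    for entry in log:
        txn, kind = entry["txn"], entry["kind"]
        if kind == "change":
            pending.setdefault(txn, []).append(entry["data"])
        elif kind == "commit":
            yield from pending.pop(txn, [])
        elif kind == "rollback":
            pending.pop(txn, None)


# Hypothetical log: t1 commits, t2 rolls back, t3 is still open.
log = [
    {"txn": "t1", "kind": "change", "data": "insert order 1"},
    {"txn": "t2", "kind": "change", "data": "insert order 2"},
    {"txn": "t2", "kind": "rollback"},
    {"txn": "t1", "kind": "change", "data": "update order 1"},
    {"txn": "t1", "kind": "commit"},
    {"txn": "t3", "kind": "change", "data": "insert order 3"},
]

delivered = list(committed_changes(log))
print(delivered)
```

Because the sketch reads an existing log rather than querying source tables, it also hints at why a log-based approach adds little processing load to the source system.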
Centralized Metadata Integration
A central metadata repository helps data engineers understand, utilize and trust data flows.
- Data Catalog - Automatically collects metadata from source and target systems.
- Data Profiling – Provides a detailed summary and report of the data attributes in the data lake and pipeline.
- Data Lineage – Highlights data provenance and the downstream impact of data changes.
- Metadata Directory Interoperability – Synchronize metadata with leading metadata repositories such as Apache Atlas.
Enterprise-grade Administration and Management
The central command center helps you configure, execute and monitor data pipelines across the enterprise.
- Configurable Dashboard Views - Group tasks by server, data source or target, application or physical location.
- KPI Alerting - Monitor hundreds of dataflow tasks in real time through KPIs and alerts.
- Search and Filter – Gain insights by searching and filtering tasks by data replication status and system operation.
- Granular Access Control - Leverage role-based access control for policy-based management of user views and actions.
- Programmable Integration - Integrate with enterprise dashboards via REST and .NET interfaces.