Tech-WhitePrints™

Data Movement - Architecture Essentials

Modified on Wed, 03 Oct 2018 12:45 by Biswajit Dash Categorized as blueprint, classical architecture, data architecture, whiteprint

Problem Statement

What is data-movement, and when does a system require it? What architectural solutions are available for such a system? How can the most appropriate data-movement solution be determined?


Solution Abstract

What? When? How?

What is data-movement? The act of moving a copy of data from its current physical location(s) to different location(s), reliably and repeatedly.

When to use data-movement? A data-movement from source-system to target-system is needed if :-
  • the need is to provide a redundant copy of the data to the target-system;
  • the target-system is restricted from accessing the source-system data;
  • the systems have different data structure requirements;
  • the target-system requires read-only access, or does not require the updates made to it to persist;
  • the availability of data from the source-system is no longer adequate, e.g., an 8x5 availability requirement becomes 24x7, leaving no window for offline processing;
  • the network or platform stability of the source-system is not reliable; and/or
  • the network bandwidth is inadequate to support the real-time data access and performance needs.

How to determine the solution? This is a multi-step process, i.e., :-
  • identify the driving forces which warrant the need of a data-movement solution;
  • determine the high-level architectural approach that can be applied; and
  • finally, determine the design approach at a more precise level.

The Inner-workings

Any data-movement system is composed of three building blocks, AMW, i.e., Acquire, Manipulate, and Write. Each block has the responsibility listed below.

  • (A) Acquire : The extraction of data from the source-system.
  • (M) Manipulate : The enrichment of data acquired from source-system.
  • (W) Write: Writing the acquired and manipulated data to the target-system.
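
The three building blocks above can be sketched as a small pipeline. This is a minimal illustration only: the in-memory lists standing in for the source-system and target-system, the `enrich` callback, and all function names are assumptions made for the example, not part of any specific product.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def acquire(source: Iterable[Record]) -> Iterator[Record]:
    """(A) Acquire: extract records from the source-system."""
    for record in source:
        # Hand out a copy so the source data is never modified.
        yield dict(record)


def manipulate(records: Iterable[Record],
               enrich: Callable[[Record], Record]) -> Iterator[Record]:
    """(M) Manipulate: enrich each acquired record."""
    for record in records:
        yield enrich(record)


def write(records: Iterable[Record], target: List[Record]) -> int:
    """(W) Write: store the manipulated records in the target-system."""
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


def move_data(source: Iterable[Record], target: List[Record],
              enrich: Callable[[Record], Record]) -> int:
    """Chain the blocks: Acquire -> Manipulate -> Write."""
    return write(manipulate(acquire(source), enrich), target)


# Hypothetical source and target, modelled as plain lists.
source_rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
target_rows: List[Record] = []
moved = move_data(source_rows, target_rows,
                  lambda r: {**r, "name": str(r["name"]).title()})
```

Keeping each block as a separate function means the responsibility of one block can change (for example, a heavier Manipulate step) without touching the other two.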

Image

The exact nature or responsibility of each of these building blocks can lead to two architectural solutions, i.e.,
  • Replication, and
  • ETL or Extract-Transform-Load.
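
As a rough illustration of how the responsibility of the Manipulate block separates the two solutions, the sketch below assumes that replication keeps the source structure as-is while ETL reshapes the data for the target. That split, the `orders` sample data, and the function names are assumptions made for this example; the pipe-line diagrams remain the authoritative description.

```python
from typing import Dict, List

Row = Dict[str, object]


def replicate(source_rows: List[Row]) -> List[Row]:
    """Replication (assumed here): the target receives the same structure."""
    return [dict(row) for row in source_rows]


def etl(source_rows: List[Row]) -> List[Row]:
    """ETL (assumed here): the data is transformed to a different target structure.

    The transform in this sketch sums the order amounts per customer.
    """
    totals: Dict[str, int] = {}
    for row in source_rows:
        customer = str(row["customer"])
        totals[customer] = totals.get(customer, 0) + int(row["amount"])
    return [{"customer": c, "total_amount": t} for c, t in sorted(totals.items())]


# Hypothetical source data.
orders = [
    {"order_id": 1, "customer": "acme", "amount": 40},
    {"order_id": 2, "customer": "acme", "amount": 60},
    {"order_id": 3, "customer": "zenith", "amount": 25},
]

replica = replicate(orders)   # same rows, same columns, separate copy
warehouse = etl(orders)       # fewer rows, different columns
```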

Replication Pipe-line

Image

ETL Pipe-line

Image

Decision Matrix

With the basic understanding of data-movement above, the two bus matrices below can be used to derive the most applicable architecture and design approach. In each matrix, the higher the number of crosses 'X' for a solution, the better aligned that solution is to the problem statement.

Image

Image
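
The counting rule for the matrices can be sketched as follows. The rows and crosses in `EXAMPLE_MATRIX` are placeholders invented for this illustration and do not reproduce the actual matrices; only the rule itself (count the crosses per solution for the driving forces that apply, then prefer the highest count) comes from the text above.

```python
from typing import Dict, Iterable

Matrix = Dict[str, Dict[str, str]]

# Placeholder rows and crosses, for illustration only.
EXAMPLE_MATRIX: Matrix = {
    "redundant copy needed":        {"Replication": "X", "ETL": ""},
    "different data structures":    {"Replication": "",  "ETL": "X"},
    "read-only access on target":   {"Replication": "X", "ETL": "X"},
}


def count_crosses(matrix: Matrix, drivers: Iterable[str]) -> Dict[str, int]:
    """Count the 'X' marks per solution over the applicable driving forces."""
    counts: Dict[str, int] = {}
    for driver in drivers:
        for solution, mark in matrix[driver].items():
            counts[solution] = counts.get(solution, 0) + (1 if mark == "X" else 0)
    return counts


def best_aligned(counts: Dict[str, int]) -> str:
    """Return the solution with the most crosses (first one wins a tie)."""
    return max(counts, key=lambda solution: counts[solution])


applicable = ["redundant copy needed", "read-only access on target"]
scores = count_crosses(EXAMPLE_MATRIX, applicable)
choice = best_aligned(scores)
```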

Glossary

Master: The source data store that is to be copied. It is considered the original source of the data.
Slave: The target data store.
Master-Master Replication: A bi-directional replication between two systems, each acting as both source and target. The master and slave roles switch depending on the replication direction.
Master-Slave Replication: A uni-directional replication from source (master) to target (slave).
Master-Master Row Synch: A special Master-Master replication in which conflict resolution is done at the row level.
Master-Slave-Snapshot Replication: A special Master-Slave replication in which the complete data from the source is copied to the target as of a point in time.
Master-Slave-Cascade: A Master-Slave replication topology in which replication from source to target is achieved through a cascade of intermediate masters/slaves.
Batch ETL: An ETL in which data from the source systems is copied to the target schema in a batch process, primarily for Data Warehouse based reporting.
Real-time/Streaming ETL: An ETL in which a continuous stream of data is processed and copied incrementally to the target schema, primarily for real-time analytics.
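
Two of the replication terms above, snapshot and cascade, can be sketched with in-memory stores. The `Store` class, the store names, and the choice to implement each cascade hop as a snapshot copy are assumptions made for this illustration; real replication products work at the database or log level.

```python
import copy
from typing import Dict, List, Optional


class Store:
    """A hypothetical data store holding rows keyed by primary key."""

    def __init__(self, name: str, rows: Optional[Dict[int, dict]] = None) -> None:
        self.name = name
        self.rows: Dict[int, dict] = dict(rows or {})


def snapshot_replicate(master: Store, slave: Store) -> None:
    """Master-Slave-Snapshot: copy the complete data as of this point in time."""
    slave.rows = copy.deepcopy(master.rows)


def cascade_replicate(chain: List[Store]) -> None:
    """Master-Slave-Cascade: each intermediate store is the slave of its
    predecessor and the master of its successor."""
    for upstream, downstream in zip(chain, chain[1:]):
        snapshot_replicate(upstream, downstream)


master = Store("master", {1: {"name": "ada"}})
relay = Store("relay")
edge = Store("edge")

cascade_replicate([master, relay, edge])

# A change made on the master after the snapshot is not visible downstream
# until the next replication run.
master.rows[2] = {"name": "grace"}
```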

Paper Code: TWP_1001.10, Version: 1.0, Author: Biswajit Dash, License: CC-BY-ND, Published: Jan-2016

This work by Tech-WhitePrints™ is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.