Tech-WhitePrints™

Batch System - Design Essentials

Modified on Sun, 26 Feb 2017 17:34 by Biswajit Dash. Categorized as: classical architecture, design

Problem Statement

What is a batch system? What are the essential design elements of such a system?


Design Abstract

What? When?

What is a batch system? A batch process is the non-interactive execution of a series of steps or programs on a set of inputs. Its three key elements are :-
  • non-interactive execution : the execution involves no manual intervention at run-time;
  • set of inputs : a set of inputs, such as records from a database or lines from a file, is processed; and
  • series of programs : each input is passed through a series of steps or programs.

A batch system provides the containing infrastructure for such batch processing.

When to use a batch? A batch system is used for processing a large volume of inputs in offline mode. Some day-to-day uses include :-
  • online order - orders captured through an online system are processed in an offline batch;
  • message queue - requests queued in a queuing system are picked up and processed, mostly in sequence;
  • data integration - data received from a source-system is fed into a target-system;
  • data warehouse - data from multiple systems is unified and fed into a warehouse;
  • month end processing - a large volume of data is processed for report generation; and
  • many more.

The Inner-workings

Batch Flow - Basic

At a basic level, a batch system consists of three steps :-
  • read each input from the set of inputs;
  • process each input through one or more steps; and
  • repeat the above until all inputs are processed.
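
The three steps above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `run_batch` and the example steps are assumptions made for illustration and do not come from the original design.

```python
from typing import Callable, Iterable, List


def run_batch(inputs: Iterable[str], steps: List[Callable[[str], str]]) -> List[str]:
    """Read each input, pass it through every step in order, and collect the results."""
    results = []
    for item in inputs:          # read each input from the set of inputs
        for step in steps:       # process the input through one or more steps
            item = step(item)
        results.append(item)
    return results               # the loop ends once all inputs are processed


if __name__ == "__main__":
    lines = ["  alice ", " bob"]
    print(run_batch(lines, [str.strip, str.upper]))  # prints ['ALICE', 'BOB']
```

Here each step is a plain function from one string to another; a real batch would typically read from a file or database cursor instead of an in-memory list.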

[Figure: basic batch flow]

Batch Flow - Advanced

What are the key challenges? Beyond the basic steps above, a real-life batch system needs to answer some key questions, such as :-
  • What action should be taken if the batch is executed repeatedly with the same set of inputs?
  • Should the batch continue or stop if an error occurs while reading a specific input?
  • Should the batch continue or stop if an error occurs while processing a specific input?
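
As an illustration of the first question, one possible way to recognise a repeated execution is to fingerprint the input set and compare it with the fingerprints of earlier runs. The function names and the in-memory `seen` set below are assumptions for this example; a real system would need to persist the fingerprints, for instance in a database table.

```python
import hashlib
from typing import Iterable, Set


def fingerprint(inputs: Iterable[str]) -> str:
    """Return a SHA-256 digest identifying this exact sequence of inputs."""
    digest = hashlib.sha256()
    for item in inputs:
        data = item.encode("utf-8")
        # Prefix each item with its length so that item boundaries affect the digest.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def is_duplicate_run(inputs: Iterable[str], seen: Set[str]) -> bool:
    """Return True if this input set was processed before; otherwise record it."""
    key = fingerprint(inputs)
    if key in seen:
        return True
    seen.add(key)
    return False


if __name__ == "__main__":
    seen_runs: Set[str] = set()
    print(is_duplicate_run(["a", "b"], seen_runs))  # prints False
    print(is_duplicate_run(["a", "b"], seen_runs))  # prints True
```

Whether a detected duplicate should abort the batch, skip it silently, or reprocess it is a policy decision that this example leaves to the caller.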

The figure below depicts the detailed flow of a real-life (but simple and sequential) batch system.

[Figure: detailed flow of a simple, sequential batch system]

Step | Description
Config Settings | The set of configuration parameters used by the batch system for run-time decision making.
Init Batch Context | The step that initializes the batch execution context, on which the different run-time decisions are based.
Source: Database/Files | The source of inputs to the batch process.
Detect Duplicate Execution | The step that detects whether the batch is being executed again on the same set of inputs.
Connect Data Source | The step that connects to the input source and buffers/reads the inputs.
Read Input | The step that reads or picks a single input from the input set for processing.
Verify Input Format | The step that verifies the format compliance of the current input. This is mostly useful for file-based inputs, specifically to verify the length of fields, field count, data type etc.
Log Error | The step that logs a run-time error.
Log Format Error | The step that logs an input that does not comply with the format specifications. This log is used to take corrective action on the erroring inputs.
Process Input | The step that processes the current input. This step is functionality-specific, and may be a composition of one or more programs/steps.
Abort Batch | The step that aborts the current batch execution. It can perform clean-up tasks such as logging and closing connections.
Close Batch | The step that successfully completes the batch. It can perform clean-up tasks such as logging and closing connections.
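
Several of the steps listed above can be tied together in a short, hedged Python example. It assumes comma-separated text lines as inputs; the names `BatchConfig`, `BatchResult`, `BatchAborted` and the two abort flags are hypothetical and chosen only to show how configuration settings could drive the continue-or-abort decision. Duplicate detection and data-source connection are left out to keep the example short.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List

log = logging.getLogger("batch")


class BatchAborted(Exception):
    """Raised when the configured policy says the batch must stop."""


@dataclass
class BatchConfig:                       # Config Settings (hypothetical fields)
    field_count: int = 3                 # expected number of comma-separated fields
    abort_on_format_error: bool = False
    abort_on_process_error: bool = True


@dataclass
class BatchResult:
    processed: int = 0
    format_errors: int = 0
    process_errors: int = 0


def run_batch(lines: Iterable[str],
              process: Callable[[List[str]], None],
              config: BatchConfig) -> BatchResult:
    result = BatchResult()                               # Init Batch Context
    for line in lines:                                   # Read Input
        fields = line.rstrip("\n").split(",")
        if len(fields) != config.field_count:            # Verify Input Format
            log.error("format error: %r", line)          # Log Format Error
            result.format_errors += 1
            if config.abort_on_format_error:
                raise BatchAborted("bad input format")   # Abort Batch
            continue
        try:
            process(fields)                              # Process Input
            result.processed += 1
        except Exception as exc:
            log.exception("processing failed: %r", line)  # Log Error
            result.process_errors += 1
            if config.abort_on_process_error:
                raise BatchAborted("processing error") from exc  # Abort Batch
    log.info("batch closed: %s", result)                 # Close Batch
    return result
```

With both flags set to False the batch logs every bad input and carries on; setting either flag to True turns the corresponding error into an abort.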

Implementation Notes

Besides the above basic flow, the design and implementation of a real-life, high-performing batch system also needs to support :-
  • parallel processing of multiple inputs for faster processing/completion;
  • scale-out configuration where subsets of inputs can be processed on different nodes/hardware;
  • fail-over and recovery mechanism where the pending tasks on a failed node can be picked up by a different node;
  • real-time notification for faster corrective action; and
  • capturing of run-time stats such as batch duration, failed/passed counts, average processing time etc.
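
Two of the points above, parallel processing and run-time stats, can be illustrated on a single node with Python's standard `concurrent.futures` module. This is a minimal example under stated assumptions: the function name `run_parallel` and the keys of the returned dictionary are invented for illustration, and scale-out, fail-over and notification are not covered.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def run_parallel(inputs: Iterable[str],
                 process: Callable[[str], None],
                 workers: int = 4) -> Dict[str, float]:
    """Process inputs on a thread pool and return simple run-time stats."""

    def timed(item: str) -> Tuple[bool, float]:
        # Run one input and report whether it passed and how long it took.
        start = time.perf_counter()
        try:
            process(item)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    batch_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(timed, inputs))
    duration = time.perf_counter() - batch_start

    passed = sum(1 for ok, _ in outcomes if ok)
    total_time = sum(elapsed for _, elapsed in outcomes)
    return {
        "duration_s": duration,
        "passed": passed,
        "failed": len(outcomes) - passed,
        "avg_processing_s": total_time / len(outcomes) if outcomes else 0.0,
    }
```

Threads suit I/O-bound steps such as database or file access; for CPU-bound processing in Python, `ProcessPoolExecutor` from the same module is usually the better fit.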

Glossary

Input | A "single input element" over which processing is applied. It can be a record from a database record-set, or a line from a file.
Node | A machine hosting the batch system, capable of independently executing a batch process end-to-end.
Continue vs. Abort | The decision to either "continue processing" or "abort processing".


Paper Code: TWP_1003.10, Version: 1.0, Author: Biswajit Dash, License: CC-BY-ND, Published: Aug-2016

Tech-WhitePrints™ | e-Mail: biswajitdash@hotmail.com | LinkedIn: biswajit-dash-ind | Powered by screwturn wiki.

Creative Commons License This work by Tech-WhitePrints™ is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.