ETL and the staging area

Decision support systems are usually based on the development of Data Warehouse infrastructures. A data warehouse (hereafter DW) architecture has two major areas: the staging area and the presentation area. In this article we present the staging area.

The sources from which data shall be systematically extracted, in order to be loaded in the DW, are determined. The database schema documentation of these sources, is reviewed in order to design the data extraction logic. Documentation quality of the data structures of these sources, influences the degree of difficulty in designing the data extraction logic. Data extracted are loaded in the staging area, either as simple files or as updates in database tables. The staging area may have various stages. Extraction of data from sources, transformation of data into new structures and data loading in the DW, a process known as ETL, takes places in the staging area.

The extraction process requires the determination of source relational tables – fields, from which data shall be extracted (as mentioned above, documentation of these structures is crucial for design). The design of the extraction process determines: o the frequency of data extraction o the extraction method (e.g. changes only) and technology (database partial replication) o the database instance or the file in which data are initially loaded, in the staging area

Moreover, the volume of data to be extracted is estimated, in order to plan for computational & storage capacity. Estimation sheets known as ‘volumetric sheets’ are developed with the following information per source field: • extraction frequency • estimated volume • standardisation and transformation rules applied (if any) • DW database field to which data will be loaded

In many cases, data quality assessment and data cleansing steps also take place in the staging area.

Design and implementation of the automated ETL process, often represents a major part of the man effort to develop a DW (international statistics estimate that it exceeds 70% of total effort).

The DW staging area, is often implemented in a separate physical server (staging server), thus adding complexity and cost. However, this approach has certain advantages like: o isolation of raw data which are extracted from sources, from processed data which are accessible by business analysts o additional security and process quality, given that DW users have no access in this area o load sharing, given that ‘data preparation’ tasks and DW querying tasks are handled by separate systems. o development of a central metadata repository which maintains documentation for all involved systems: operational systems (data sources), ETL process, data warehouse, BI tools and predefined reports

Various types of raw data processing, take place at the staging area: o Data standardization: data transformation to a standard format, if needed o Sorting of records o Matching and merging records of the same entity, which are derived from different sources (e.g. order records of the same Customer from different order handling systems), after standardization o Processing of calculated facts (facts derived from detailed data e.g. total monetary value of an order) o Management of surrogate keys, which replace operational systems keys o Enrichment of records with default values, if required o Production of aggregate data, if needed o Data conversion according to the technological platform used by the DW (DBMS, operating system)

The ETL process is automated by software and executed periodically to update the DW.

Contact us

Contact us

 Customer Intelligence | RFM | CRM |  Recency for Customer retention on subscription based services | Frequency & Monetary analysis for subscription based services  | Customer Lifetime Value for value-based servicing, a realistic analysis | Business Intelligence | Business Intelligence in Taxation | Business Intelligence in Healthcare | Data Warehousing | Strategic_aspects of business intelligence |  Combined skills for Business Intelligence  | Information as a competitive advantage, Innovation | Information as a competitive advantage, Internet | Information as a competitive advantage, Information as a service to the Customer  | Information as a competitive advantage, Creation of Customer value through retention & loyalty  | Information as a competitive advantage, Creation of Customer value |  Information as a competitive advantage | DSS_performance_capture | DSS_evolution | DW Staging area | DW presentation area | Kimball vs Inmon | Information quality | Customer-centric information architecture for efficient Customer insight | Test campaign result accuracy – test group sizingTest campaign result accuracy | Disclaimer |  Copyright © Pleroforea.com, All rights reserved

Home page - Business Intelligence

 

Disclaimer - Copyright -