Overview On Datawarehousing - For A Recruiter
Sign in

Overview on Datawarehousing - For a Recruiter

What is a Data Warehouse?


According to Inmon, famous author for several data warehouse books, "A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process".





Example: In order to store data, over the years, many application designers in each branch have made their individual decisions as to how an application and database should be built. So source systems will be different in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has got several branches in several countries, has millions of customers and the lines of business of the enterprise are savings, and loans. The following example explains how the data is integrated from source systems to target systems.






What is OLAP?


OLAP, an acronym for 'Online Analytical Processing' is a technique by which the data sourced from a data warehouse or data mart is visualized and summarized to provide perspective multidimensional view across multiple dimensions. Generally OLAP refers to OLAP Tools(e.g Cognos, Business Objects etc.,) that help to accomplish these tasks.

Since data warehouse is designed using a dimensional data model, data is represented in the form of data cubes enabling us to aggregate facts, slice and dice across several dimensions. OLAP tools provide options to drill-down the data from one hierarchy to another hierarchy.

For example sales amount can be calculated for a particular year or it can be drilled down to its next hierarchies like month, week, day etc. In the same way, data can be rolled up for summarization from product to product group, product group to product sub-class then from product sub-class to product class.

Thus with this cube structure, data can be viewed from multiple points providing the data analysts, a greater insight into data.

There are many OLAP hybrids or variants like MOLAP(Multidimensional OLAP), HOLAP(Hybrid OLAP), ROLAP(Relational OLAP), DOLAP(Desktop OLAP or Database OLAP) available in the market and can be used depending on the needs and requirements of an organization.








OLAP


OLAP Database - Multidimensional



This is a type of database that is optimized for data warehouse, data mart and online analytical processing (OLAP) applications. The main advantage of this database is query performance.

Relational databases make it easy to work with individual records, whereas multidimensional databases are designed for analyzing large groups of records. Relational database is typically accessed using a Structured Query Language (SQL) query.

A multidimensional database allows a user to ask questions like "How many mortgages have been sold in New Jersey city" and "How many credit cards have been purchased in a particular county?".








Popular Multidimensional Databases

Database Name

Company Name

Crystal Holos

Business Objects

Hyperion Essbase

Hyperion

Oracle Express

Oracle Corporation

Oracle OLAP Option

Oracle Corporation

AWMicrosoft Analysis Services

Microsoft

PowerPlay Enterprise

Cognos







OLAP & its Hybrids

OLAP, an acronym for Online Analytical Processing is an approach that helps organization to take advantages of DATA. Popular OLAP tools are Cognos, Business Objects, Micro Strategy etc. OLAP cubes provide the insight into data and helps the topmost executives of an organization to take decisions in an efficient manner.

Technically, OLAP cube allows one to analyze data across multiple dimensions by providing multidimensional view of aggregated, grouped data. With OLAP reports, the major categories like fiscal periods, sales region, products, employee, promotion related to the product can be ANALYZED very efficiently, effectively and responsively. OLAP applications include sales and customer analysis, budgeting, marketing analysis, production analysis, profitability analysis and forecasting etc.


ROLAP

ROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database(RDBMS).

MOLAP

MOLAP(Multidimensional OLAP), provides the analysis of data stored in a multi-dimensional data cube.

HOLAP

HOLAP(Hybrid OLAP) a combination of both ROLAP and MOLAP can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database(RDBMS).

DOLAP

DOLAP(Desktop OLAP or Database OLAP)provide multidimensional analysis locally in the client machine on the data collected from relational or multidimensional database servers.




Database - RDBMS

Database - RDBMS

There are a number of relational databases to store data. A relational database contains normalized data stored in tables.

Tables contain records and columns. RDBMS makes it easy to work with individual records. Each row contains a unique instance of data for the categories defined by the columns.


RDBMS are used in OLTP applications(e.g. ATM cards) very frequently and sometimes datawarehouse may also use relational databases.

Please refer to Relational data modeling for details to know how data from a source system is normalized and stored in RDBMS databases.







Popular RDBMS Databases

RDBMS Name

Company Name


Oracle

Oracle Corporation


IBM DB2 UDB

IBM Corporation


IBM Informix

IBM Corporation


Microsoft SQL Server

Microsoft


Sybase

Sybase Corporation


Terradata

NCR







ETL Concepts

ETL Concepts

Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.

The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.







Glossary of ETL

Source System

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping

The definition of the relationship and data flow between source and target objects.

Metadata

Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area

A place where data is processed before entering the warehouse.

Cleansing

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation

The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation

The process of moving copied or transformed data from a source to a data warehouse.

Target System

A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.







Working with an ETL Tool:

» How to work with various options like designer, mapping, workflow, scheduling etc.,?

» How to work with sources like DBMS, relational source databases, files, ERPs etc., and
import the source definitions?

» How to import data from data modeling tools, applications etc.,?

» How to work with targets like DBMS, relational source databases, files, ERPs etc., and import the source definitions?

» How to create target definitions?

» How to create mappings between source definitions and target definitions?

» How to create transformations?

» How to cleanse the source data?

» How to create a dimension, slowly changing dimensions, cube etc.,?

» How to create and monitor workflows?

» How to configure, monitor and run debugger?

» How to view and generate metadata reports?







What are ETL Tools?

What are ETL Tools?



ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers.


This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers.

These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debuggging and loading into data warehouse when compared to the old method.

There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements. Following are some those.


start_blog_img