Sunday, 22 August 2021

DWH Concepts

 What is DWH? 

It is a relational database designed for analytical needs 

It functions on basis of OLAP 

It is a central location where consolidated data from multiple locations (databases) are stored 

It is also called as the process of transforming data into information 


OLAP: 

OLAP is a flexible way to make complicated analysis of multidimensional data

Opens up new views of looking at data 

Supports filtering/sorting of data 

Data can be refined 


Types of OLAP Cubes:

MOLAP - Stores data directly into multidimensional database. Excellent performance and can perform complex calculations. Only limited amount of data can be handled 

ROLAP - Performs dynamic multidimensional analysis of data stored in relational database rather than in multidimensional database. Greater amount of data can be processed. Requires more processing time/disk space. 

HOLAP - Hybrid OLAP, combination of MOLAP & ROLAP. It can drill-through from the cubes into underlying relational data


OLAP Operations:

ROLLUP: Performs aggregation on a data cube by either 

1. Climbing up a concept hierarchy for dimension 

2. Dimension reduction 

DRILL-DOWN: Reverse operation of ROLLUP 

1. Stepping down a concept hierarchy for a dimension 

2. Introducing new dimension 

SLICE: Provides a new sub-cube from one particular dimension in a given cube 

DICE:  Provides a new sub-cube from two or more dimension in a given cube

PIVOT: It is also known as rotation operation. It transposes X axis in order to provide an alternative presentation of data 


Dimensions:

  • The tables that describe the dimensions involved are called dimensions.
  • Dividing a data DWH project into dimensions provides structured information for analysis & reporting 
  • End users queries on these dimension tables which contain descriptive information 

Facts:

  • A fact is a measure that can be summed, averaged or manipulated 
  • A Fact table contains two types of data. A dimension key and a measure
  • Every dimension table is linked to a Fact table


Schemas:

  • It gives logical description of entire database
  • It gives about constraints, key values & how key values linked between tables 
  • A database uses relational data model, while data warehouse uses Star, Snowflake and Fact Constellation schema 


Star Schema:

  • Each dimension is represented in one dimension table 
  • Every dimension table linked to a Fact tables 
  • Fact table is at the center, which contains keys to every dimension table 

Snowflake Schema: 

  • Dimension tables are normalized. Dimension table is split into multiple tables 

Galaxy Schema: 

  • Also known as Constellation schema 
  • Contains more than one Fact table 
  • One dimension table can be shared by multiple Fact tables. Such dimensions are called as Conformed dimensions