What is DWH?
It is a relational database designed for analytical needs
It functions on basis of OLAP
It is a central location where consolidated data from multiple locations (databases) are stored
It is also called as the process of transforming data into information
OLAP:
OLAP is a flexible way to make complicated analysis of multidimensional data
Opens up new views of looking at data
Supports filtering/sorting of data
Data can be refined
Types of OLAP Cubes:
MOLAP - Stores data directly into multidimensional database. Excellent performance and can perform complex calculations. Only limited amount of data can be handled
ROLAP - Performs dynamic multidimensional analysis of data stored in relational database rather than in multidimensional database. Greater amount of data can be processed. Requires more processing time/disk space.
HOLAP - Hybrid OLAP, combination of MOLAP & ROLAP. It can drill-through from the cubes into underlying relational data
OLAP Operations:
ROLLUP: Performs aggregation on a data cube by either
1. Climbing up a concept hierarchy for dimension
2. Dimension reduction
DRILL-DOWN: Reverse operation of ROLLUP
1. Stepping down a concept hierarchy for a dimension
2. Introducing new dimension
SLICE: Provides a new sub-cube from one particular dimension in a given cube
DICE: Provides a new sub-cube from two or more dimension in a given cube
PIVOT: It is also known as rotation operation. It transposes X axis in order to provide an alternative presentation of data
Dimensions:
- The tables that describe the dimensions involved are called dimensions.
- Dividing a data DWH project into dimensions provides structured information for analysis & reporting
- End users queries on these dimension tables which contain descriptive information
Facts:
- A fact is a measure that can be summed, averaged or manipulated
- A Fact table contains two types of data. A dimension key and a measure
- Every dimension table is linked to a Fact table
Schemas:
- It gives logical description of entire database
- It gives about constraints, key values & how key values linked between tables
- A database uses relational data model, while data warehouse uses Star, Snowflake and Fact Constellation schema
Star Schema:
- Each dimension is represented in one dimension table
- Every dimension table linked to a Fact tables
- Fact table is at the center, which contains keys to every dimension table
Snowflake Schema:
- Dimension tables are normalized. Dimension table is split into multiple tables
Galaxy Schema:
- Also known as Constellation schema
- Contains more than one Fact table
- One dimension table can be shared by multiple Fact tables. Such dimensions are called as Conformed dimensions