Data Warehousing Basics

Data warehousing basics Comparing conceptual, logical and physical data models
OLAP and OLTP
OLTP stands for Online Transaction Processing. OLTP systems keep track of daily transactions and need fast inserts, updates and deletes at the database level, so they use highly normalized structures (3NF) built with the ER modelling technique. These operations are fast because each change is made in a single place, whereas in de-normalized structures an update may have to be applied to multiple records. Because there is less data redundancy, the database also needs less storage. The drawback is that the ER model generates very complex, interwoven entity diagrams across multiple business processes.
OLAP (Online Analytical Processing) is the data warehousing side, where we use dimensional modelling with facts and dimensions. Such a design is easy to understand and performs better for queries. Here the dimensions are not completely normalized but are kept in a de-normalized state. In dimensional modelling we pick one business process from the ER diagram, build the fact and dimensions for that business process, and then repeat the same for the other business processes.
Fact
A fact table contains two parts:
a) Foreign keys (references to dimensions)
b) Measures (additive or semi-additive fields, e.g. quantity sold and market value of a product)
Dimension
A dimension table contains the textual details of the records in the fact table.
Dimensional modelling can be performed in two formats:
Star Schema
Here the fact table sits in the centre and all the dimension tables surround it. The fact table holds a reference to each dimension.
Snowflake Schema
It has the same fundamental structure as the star schema, but the dimensions are further normalized into separate tables. The principle behind snowflaking is normalisation of the dimension tables by removing low cardinality attributes and forming separate tables from them.
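To make the fact/dimension split concrete, here is a minimal star schema sketch using Python's built-in sqlite3 module. The table names (sales_fact, product_dim, date_dim), the columns and the sample rows are all invented for illustration and do not come from any particular warehouse.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    brand        TEXT
);
CREATE TABLE date_dim (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month         TEXT
);
-- The fact table holds only foreign keys and measures.
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    date_key      INTEGER REFERENCES date_dim(date_key),
    quantity_sold INTEGER,
    market_value  REAL
);
INSERT INTO product_dim VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme');
INSERT INTO date_dim VALUES
    (20240101, '2024-01-01', '2024-01'),
    (20240102, '2024-01-02', '2024-01');
INSERT INTO sales_fact VALUES
    (1, 20240101, 3, 30.0),
    (1, 20240102, 2, 20.0),
    (2, 20240101, 5, 100.0);
""")

# A typical star query: join the fact to its dimensions, then aggregate
# an additive measure by descriptive dimension attributes.
rows = conn.execute("""
SELECT p.product_name, d.month, SUM(f.quantity_sold) AS total_quantity
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN date_dim d    ON d.date_key = f.date_key
GROUP BY p.product_name, d.month
ORDER BY p.product_name
""").fetchall()
print(rows)
```

Each dimension is exactly one join away from the fact table, which is what keeps star queries simple to write and to generate from reporting tools.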
Types of dimensions
Conformed Dimensions
These are dimensions that have the same meaning across multiple subject areas/business processes. They are the integration points within a data mart across multiple subject areas/business processes. Examples are the Time dimension and, in GMDM, the MDM_BATCH_DIM table.
Degenerate Dimensions
These are dimension attributes that are stored in the fact table itself and have no dimension table of their own, for example an order or invoice number.
Junk Dimensions
These are dimensions that collect low cardinality columns from fact tables, such as indicators and flags.
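As a rough illustration of the last two dimension types, the sketch below builds a small junk dimension from every combination of two low cardinality attributes and stores an order number directly in the fact table as a degenerate dimension. The names order_junk_dim, order_fact, is_gift and payment_type are assumptions made up for this example.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_junk_dim (
    junk_key     INTEGER PRIMARY KEY,
    is_gift      TEXT,
    payment_type TEXT
);
CREATE TABLE order_fact (
    order_number  TEXT,     -- degenerate dimension: no table of its own
    junk_key      INTEGER REFERENCES order_junk_dim(junk_key),
    quantity_sold INTEGER
);
""")

# Pre-populate the junk dimension with every flag combination.
combos = list(product(["Y", "N"], ["CARD", "CASH"]))
conn.executemany(
    "INSERT INTO order_junk_dim (is_gift, payment_type) VALUES (?, ?)", combos
)

# Lookup used at load time to swap the raw flags for a single surrogate key.
junk_lookup = {
    (gift, pay): key
    for key, gift, pay in conn.execute(
        "SELECT junk_key, is_gift, payment_type FROM order_junk_dim"
    )
}

conn.execute(
    "INSERT INTO order_fact VALUES (?, ?, ?)",
    ("ORD-1001", junk_lookup[("Y", "CASH")], 3),
)

fact_row = conn.execute("""
SELECT f.order_number, j.is_gift, j.payment_type, f.quantity_sold
FROM order_fact f
JOIN order_junk_dim j ON j.junk_key = f.junk_key
""").fetchone()
print(fact_row)
```

The fact table carries one junk_key instead of several flag columns, while order_number stays in the fact because it has no further attributes to describe.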
Fact-less fact
A fact table that does not contain any measure is called a fact-less fact. This table contains only keys from different dimension tables. It is often used to resolve a many-to-many cardinality issue.
As a rule of thumb, a de-normalized table returns query results faster than an equivalent set of normalized tables, because fewer joins are needed.
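A fact-less fact can be sketched as follows, assuming a hypothetical attendance example in which students and classes have a many-to-many relationship. The table has no measure column; the information is carried by the existence of a row, so analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, student_name TEXT);
CREATE TABLE class_dim   (class_key   INTEGER PRIMARY KEY, class_name   TEXT);
-- Fact-less fact: dimension keys only, no measures.
CREATE TABLE attendance_fact (
    student_key INTEGER REFERENCES student_dim(student_key),
    class_key   INTEGER REFERENCES class_dim(class_key),
    date_key    INTEGER
);
INSERT INTO student_dim VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO class_dim   VALUES (1, 'Math'), (2, 'Physics');
INSERT INTO attendance_fact VALUES
    (1, 1, 20240101),
    (2, 1, 20240101),
    (1, 2, 20240101);
""")

# Counting rows answers "how many attendances per class?"
attendance_counts = conn.execute("""
SELECT c.class_name, COUNT(*) AS attendances
FROM attendance_fact a
JOIN class_dim c ON c.class_key = a.class_key
GROUP BY c.class_name
ORDER BY c.class_name
""").fetchall()
print(attendance_counts)
```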
When star and when snowflake
First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a single table. For example, a star schema Geography dimension would have columns like country, state/province, city and postal code. In the source system, this hierarchy would probably be normalized into multiple tables with one-to-many relationships.
A snowflake schema does not flatten a hierarchy dimension into a single table. It would, instead, have two or more tables with a one-to-many relationship. This is a more normalized structure. For example, one table may have state/province and country columns and a second table would have city and postal code. The table with city and postal code would have a many-to-one relationship to the table with the state/province columns.
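The Geography example can be sketched both ways. The code below is only an illustration with invented table names (geography_dim_star, state_dim, city_dim) and sample rows; it shows that the snowflaked tables need one extra join to return what the flattened star dimension returns directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole hierarchy flattened into one table.
CREATE TABLE geography_dim_star (
    geography_key  INTEGER PRIMARY KEY,
    country        TEXT,
    state_province TEXT,
    city           TEXT,
    postal_code    TEXT
);
-- Snowflake: the same hierarchy split into two related tables.
CREATE TABLE state_dim (
    state_key      INTEGER PRIMARY KEY,
    state_province TEXT,
    country        TEXT
);
CREATE TABLE city_dim (
    city_key    INTEGER PRIMARY KEY,
    city        TEXT,
    postal_code TEXT,
    state_key   INTEGER REFERENCES state_dim(state_key)  -- many-to-one
);
INSERT INTO geography_dim_star VALUES
    (10, 'Canada', 'Ontario', 'Toronto', 'M5H'),
    (11, 'Canada', 'Ontario', 'Ottawa',  'K1A');
INSERT INTO state_dim VALUES (1, 'Ontario', 'Canada');
INSERT INTO city_dim VALUES
    (10, 'Toronto', 'M5H', 1),
    (11, 'Ottawa',  'K1A', 1);
""")

# Star: a single-table lookup.
star_rows = conn.execute("""
SELECT city, state_province, country
FROM geography_dim_star
ORDER BY city
""").fetchall()

# Snowflake: the same answer, but through a join up the hierarchy.
snowflake_rows = conn.execute("""
SELECT c.city, s.state_province, s.country
FROM city_dim c
JOIN state_dim s ON s.state_key = c.state_key
ORDER BY c.city
""").fetchall()

print(star_rows == snowflake_rows)
```

In the star version 'Ontario' and 'Canada' are repeated on every city row, while the snowflake version stores them once at the cost of the join.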
There are some good reasons for snowflaking dimension tables. One example is a company that has many types of products. Some products have only a few attributes, others have a great many, and the products are very different from each other. The thing to do here is to create a core Product dimension that holds the attributes common to all products, such as product type, manufacturer, brand, product group, etc. Then create a separate sub-dimension table for each distinct group of products, where each group shares common attributes. Each sub-product table must contain a foreign key to the core Product dimension table.
One of the criticisms of snowflake dimensions is that some multidimensional front-end presentation tools find it difficult to generate a query on a snowflake dimension. However, you can create a view for each combination of the core product/sub-product dimension tables and give the view a suitably descriptive name (Frozen Food Product, Hardware Product, etc.), and then these tools will have no problem.
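The core/sub-product arrangement and the view workaround might look like the sketch below. All names (product_dim, frozen_food_product_dim, the frozen_food_product view) and the frozen-food attributes are hypothetical and chosen only to mirror the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core dimension: attributes shared by every product.
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_type TEXT,
    manufacturer TEXT,
    brand        TEXT
);
-- Sub-dimension: attributes that only frozen food products have.
CREATE TABLE frozen_food_product_dim (
    product_key     INTEGER PRIMARY KEY REFERENCES product_dim(product_key),
    storage_temp_c  REAL,
    shelf_life_days INTEGER
);
INSERT INTO product_dim VALUES
    (1, 'Frozen Food', 'ColdCo', 'Frosty'),
    (2, 'Hardware',    'ToolCo', 'Grip');
INSERT INTO frozen_food_product_dim VALUES (1, -18.0, 365);

-- The view presents core + sub-dimension as one flat dimension,
-- so a front-end tool can treat it like a star schema table.
CREATE VIEW frozen_food_product AS
SELECT p.product_key, p.product_type, p.manufacturer, p.brand,
       f.storage_temp_c, f.shelf_life_days
FROM product_dim p
JOIN frozen_food_product_dim f ON f.product_key = p.product_key;
""")

view_rows = conn.execute("""
SELECT product_key, brand, shelf_life_days
FROM frozen_food_product
""").fetchall()
print(view_rows)
```

A matching view per product group (for instance a hardware one) would follow the same pattern, each joining the core table to its own sub-dimension.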
