A data model is a conceptual representation of the data structures (tables) required for a database, and it is a powerful way to express and communicate business requirements. A data model helps the functional and technical teams design the database.
Data modeling tools: Erwin, Oracle Designer, Power Designer.
The two types of data modeling are:
1) Logical modeling
2) Physical modeling
Logical Modeling
- Includes entities (tables), attributes (columns/fields) and relationships (keys).
- Uses business names for entities and attributes.
- Is independent of technology (platform, DBMS).
- Is normalized to fourth normal form (4NF).
Physical Modeling
- Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints.
- Uses more specific, less generic names for tables and columns, such as abbreviated column names, limited by the database management system (DBMS) and any company-defined standards.
- Includes primary keys and indices for fast data access.
Logical vs Physical
The logical model represents business information and defines business rules. The physical model represents the physical implementation of the model in a database. Logical constructs map to physical ones as follows:
Logical          Physical
Entity           Table
Attribute        Column
Primary Key      Primary Key Constraint
Alternate Key    Unique Constraint or Unique Index
Rule             Check Constraint, Default Value
Relationship     Foreign Key
Definition       Comment
What is ER Modeling?
Entity-relationship (ER) data modeling is used in OLTP systems, which are transaction oriented.
Focus of OLTP design:
- Individual data elements
- Data relationships
Design goals:
- Accurately model the business
- Remove redundancy (normalized)
ER Modeling Shortcomings:
- Complex
- Unfamiliar to business people
- Incomplete history
- Slow query performance
Dimensional Modeling Definition
- A logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
- Dimensional Model = Star Schema
- Can easily translate into a multi-dimensional database design if required
- Overcomes ER design shortcomings
DM Advantages:
- Understandable
- Systematically represents history
- Reliable join paths
- High-performance queries
- Enterprise scalability
ER vs DM
ER:
- Tables are units of storage.
- Data is normalized and used for OLTP.
- Several tables and chains of relationships among them.
- Detailed level of transactional data.
- Normal reports.
DM:
- Cubes are units of storage.
- Data is denormalized and used in data warehouses and data marts.
- Few tables; fact tables are connected to dimension tables.
- Summary of bulky transactional data (aggregates and measures) used in business decisions.
- User-friendly, interactive, drag-and-drop multidimensional OLAP reports.
Dimension tables
A dimension table describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables. Their content is mostly textual (character data).
Dimension table characteristics
- Hold the dimensional attributes.
- Usually have a large number of attributes (wide).
- Add flags and indicators that make it easy to perform specific types of reports.
- Have a small number of rows in comparison to fact tables (most of the time).
Surrogate Key
A surrogate key is a unique primary key generated by the RDBMS that is not derived from any data in the database and whose only significance is to act as the primary key. A surrogate key is frequently a sequential number. Each table is assigned a unique primary key, specifically generated for the data warehouse.
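The surrogate key idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the dealer_code natural key, the sample dealer names, and the helper names create_dealer_dim and add_dealer are hypothetical, and SQLite stands in for whatever RDBMS the warehouse actually uses.

```python
import sqlite3


def create_dealer_dim(conn):
    """Create a dealer dimension whose primary key is generated by the database."""
    conn.execute(
        """
        CREATE TABLE dealer_dim (
            dealer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            dealer_code TEXT NOT NULL UNIQUE,               -- hypothetical natural key
            dealer      TEXT,
            city        TEXT
        )
        """
    )


def add_dealer(conn, dealer_code, dealer, city):
    """Insert a dealer and return the surrogate key the database assigned."""
    cur = conn.execute(
        "INSERT INTO dealer_dim (dealer_code, dealer, city) VALUES (?, ?, ?)",
        (dealer_code, dealer, city),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_dealer_dim(conn)
first_key = add_dealer(conn, "DLR-0917", "Northside Motors", "Austin")
second_key = add_dealer(conn, "DLR-0042", "Lakeview Autos", "Chicago")
print(first_key, second_key)  # sequential keys, unrelated to the dealer codes
```

The keys carry no business meaning: they do not depend on dealer_code or any other source value, which is what lets the warehouse keep its own identity for each dimension row.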
Dimension tables (contd.)
Example of EMP dimension: [figure]
Example of dimension tables:
- Dealer: dealer_key, region, state, city, dealer
- Model: model_key, brand, category, line, model
- Time: time_key, year, quarter, month, date
Slowly Changing Dimensions
- Dimension source data may change over time.
- Relative to fact tables, dimension records change slowly.
- Slowly changing dimensions allow a dimension to have multiple 'profiles' over time to maintain history.
- Each profile is a separate record in a dimension table.
Slowly Changing Dimension Example
Example: a woman gets married.
Possible changes to the customer dimension:
1) Last name
2) Marital status
3) Address
4) Household income
Existing facts need to remain associated with her single profile. New facts need to be associated with her married profile.
Slowly Changing Dimension Types
There are three types of slowly changing dimensions:
Type 1:
- Updates the existing record with modifications.
- Does not maintain history.
Type 2:
- Adds a new record.
- Maintains the old record.
- Does maintain history.
Type 3:
- Keeps old and new values in the existing row.
- Requires a design change.
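The three types can be sketched in plain Python on an in-memory customer dimension. This is a minimal sketch, not a production pattern: the CustomerRow fields (customer_id as the source key, is_current, previous_last_name) and the function names are assumptions made for illustration, and a real Type 2 design would usually also carry effective dates.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_key: int                          # surrogate key
    customer_id: str                           # hypothetical key from the source system
    last_name: str
    is_current: bool = True                    # used by Type 2
    previous_last_name: Optional[str] = None   # extra column needed by Type 3


def scd_type1(rows: List[CustomerRow], customer_id: str, new_last_name: str) -> None:
    """Type 1: overwrite the existing record; history is lost."""
    for row in rows:
        if row.customer_id == customer_id:
            row.last_name = new_last_name


def scd_type2(rows: List[CustomerRow], customer_id: str, new_last_name: str) -> None:
    """Type 2: keep the old record and add a new one with a new surrogate key."""
    next_key = max(r.customer_key for r in rows) + 1
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            row.is_current = False  # old profile stays for existing facts
            rows.append(
                replace(row, customer_key=next_key,
                        last_name=new_last_name, is_current=True)
            )
            break  # stop before iterating over the row just appended


def scd_type3(rows: List[CustomerRow], customer_id: str, new_last_name: str) -> None:
    """Type 3: keep old and new values in the same row."""
    for row in rows:
        if row.customer_id == customer_id:
            row.previous_last_name = row.last_name
            row.last_name = new_last_name


dim = [CustomerRow(customer_key=1, customer_id="C100", last_name="Smith")]
scd_type2(dim, "C100", "Jones")
for row in dim:
    print(row)
```

With Type 2, facts recorded before the change keep pointing at customer_key 1 (the single profile), while new facts use customer_key 2 (the married profile). Type 3 is the one that needs the extra previous_last_name column, which is the design change the list above mentions.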
Degenerate Dimensions
A degenerate dimension is a dimension that is derived from the fact table and does not have its own dimension table.
- Stored in the fact table.
- Common examples include invoice numbers or order numbers.
- Use: degenerate dimensions are often based on the desire to provide a direct reference back to a transactional system without the overhead of maintaining a separate dimension table.
Conformed Dimensions
A conformed dimension is a dimension that has exactly the same meaning and content when referred to from different fact tables.
Example: Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4, where F1 and F2 are the facts and D1 to D4 are the dimensions. D1 and D2 are the conformed dimensions. E.g.: the time dimension.
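One practical consequence of a conformed dimension is that results from different fact tables can be lined up on it. The sketch below assumes two hypothetical fact sets (sales revenue and units shipped) that share one time dimension; all names and figures are invented for illustration.

```python
from collections import defaultdict

# Shared (conformed) time dimension: time_key -> attributes
time_dim = {
    1: {"year": 2023, "quarter": 4},
    2: {"year": 2024, "quarter": 1},
}

# Two hypothetical fact tables that both reference time_dim by time_key
sales_facts = [(1, 100.0), (2, 150.0), (2, 50.0)]   # (time_key, revenue)
shipment_facts = [(1, 10), (2, 7), (2, 3)]          # (time_key, units_shipped)


def total_by_year(facts, time_dim):
    """Sum a fact by the 'year' attribute of the shared time dimension."""
    totals = defaultdict(float)
    for time_key, value in facts:
        totals[time_dim[time_key]["year"]] += value
    return dict(totals)


def drill_across(sales, shipments, time_dim):
    """Combine both fact tables on the conformed 'year' attribute."""
    revenue = total_by_year(sales, time_dim)
    units = total_by_year(shipments, time_dim)
    years = sorted(set(revenue) | set(units))
    return {y: (revenue.get(y, 0.0), units.get(y, 0.0)) for y in years}


print(drill_across(sales_facts, shipment_facts, time_dim))
```

Because both fact sets use the same time_key values with the same meaning, the year totals are directly comparable. If each fact table had its own, differently defined time dimension, this merge would not be reliable.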
Fact table
A fact table consists of the measurements, metrics or facts of a business process. Fact tables are often defined by their grain.
Grain
- The level of detail represented by a row in the fact table.
- Must be identified early.
Example of Fact table
Sales Facts
- Keys: model_key, dealer_key, time_key
- Facts: revenue, quantity
Facts: Fully additive
- Can be summed across any and all dimensions.
- Stored in the fact table.
- Examples: revenue, quantity, Sales_amount
Facts: Semi-additive
Semi-additive facts can be summed across some of the dimensions in the fact table, but not the others.
Facts: Non-additive
Non-additive facts cannot be summed across any of the dimensions present in the fact table.
- All ratios are non-additive.
- Examples: age, weather
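The difference between the three kinds of facts can be shown with a few lines of Python. The data here is hypothetical: daily account snapshots (deposits are fully additive, balances are semi-additive because they should not be summed across time) and sales rows with a margin ratio (non-additive, so it is recomputed from additive components).

```python
from collections import defaultdict

# Hypothetical daily account snapshots
snapshots = [
    {"date": "2024-01-01", "account": "A", "deposits": 100.0, "balance": 100.0},
    {"date": "2024-01-01", "account": "B", "deposits": 50.0, "balance": 50.0},
    {"date": "2024-01-02", "account": "A", "deposits": 20.0, "balance": 120.0},
    {"date": "2024-01-02", "account": "B", "deposits": 0.0, "balance": 50.0},
]

# Hypothetical sales rows used for the ratio example
sales = [
    {"revenue": 100.0, "cost": 80.0},
    {"revenue": 300.0, "cost": 150.0},
]


def sum_deposits(rows):
    """Fully additive: deposits can be summed across accounts and dates."""
    return sum(r["deposits"] for r in rows)


def balance_by_date(rows):
    """Semi-additive: balances can be summed across accounts for one date."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["date"]] += r["balance"]
    return dict(totals)


def closing_balance(rows):
    """Across time, take the latest date's total instead of summing all dates."""
    per_date = balance_by_date(rows)
    return per_date[max(per_date)]  # ISO date strings sort chronologically


def overall_margin(rows):
    """Non-additive: recompute the ratio from summed revenue and cost."""
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue


print(sum_deposits(snapshots))
print(balance_by_date(snapshots))
print(closing_balance(snapshots))
print(overall_margin(sales))
```

Summing the balance column over all four snapshot rows would give 320.0, which corresponds to no real amount of money. Adding the per-row margins of the two sales rows is equally meaningless, which is why the ratio is derived from the additive revenue and cost totals.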
Schemas in Data Warehouses
A schema is a collection of database objects, including tables, views, indexes, and synonyms. There is a variety of ways of arranging schema objects in the schema models designed for data warehousing:
- STAR Schema
- Snowflake Schema
STAR Schema
The star schema (also called star-join schema or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The main advantages of star schemas are that they:
- Provide highly optimized performance for typical star queries.
- Are widely supported by a large number of business intelligence tools.
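A star schema and a typical star query can be sketched with Python's sqlite3 module. The table and column names follow the Dealer, Model, Time and Sales Facts example from earlier in the deck. The sample rows and the function names build_star_schema and revenue_by_region_and_year are assumptions for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by three denormalized dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY,
                                 region TEXT, state TEXT, city TEXT, dealer TEXT);
        CREATE TABLE model_dim  (model_key INTEGER PRIMARY KEY,
                                 brand TEXT, category TEXT, line TEXT, model TEXT);
        CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY,
                                 year INTEGER, quarter INTEGER, month INTEGER, date TEXT);
        CREATE TABLE sales_facts (
            model_key  INTEGER REFERENCES model_dim(model_key),
            dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
            time_key   INTEGER REFERENCES time_dim(time_key),
            revenue    REAL,
            quantity   INTEGER
        );
        -- Hypothetical sample rows
        INSERT INTO dealer_dim VALUES (1, 'West', 'CA', 'Fresno', 'Dealer A'),
                                      (2, 'East', 'NY', 'Albany', 'Dealer B');
        INSERT INTO model_dim  VALUES (1, 'BrandX', 'Sedan', 'LineS', 'S1');
        INSERT INTO time_dim   VALUES (1, 2023, 4, 12, '2023-12-31'),
                                      (2, 2024, 1, 1, '2024-01-01');
        INSERT INTO sales_facts VALUES (1, 1, 1, 20000.0, 1),
                                       (1, 1, 2, 42000.0, 2),
                                       (1, 2, 2, 21000.0, 1);
        """
    )


def revenue_by_region_and_year(conn):
    """Star query: join the fact table to the dimensions it needs, then aggregate."""
    return conn.execute(
        """
        SELECT d.region, t.year, SUM(f.revenue), SUM(f.quantity)
        FROM sales_facts AS f
        JOIN dealer_dim AS d ON d.dealer_key = f.dealer_key
        JOIN time_dim   AS t ON t.time_key = f.time_key
        GROUP BY d.region, t.year
        ORDER BY d.region, t.year
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for row in revenue_by_region_and_year(conn):
    print(row)
```

Every dimension is exactly one join away from the fact table, so the query shape stays the same whichever attributes are used for grouping.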
Snowflake Schema
The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension represented by a single table.
Advantages of using the snowflake schema:
- Easier to maintain.
- Increases flexibility.
Disadvantages of using the snowflake schema:
- Increases the number of tables an end user must work with.
- Makes queries much more difficult to create because more tables need to be joined.
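The extra join that snowflaking introduces can be seen in a small sqlite3 sketch. Here the dealer dimension is split so that region lives in its own table. The region_dim table, the sample rows, and the function names are assumptions for illustration.

```python
import sqlite3


def build_snowflake(conn):
    """Dealer dimension normalized: region is moved into its own table."""
    conn.executescript(
        """
        CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dealer_dim (
            dealer_key INTEGER PRIMARY KEY,
            dealer     TEXT,
            city       TEXT,
            region_key INTEGER REFERENCES region_dim(region_key)
        );
        CREATE TABLE sales_facts (
            dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
            revenue    REAL,
            quantity   INTEGER
        );
        -- Hypothetical sample rows
        INSERT INTO region_dim VALUES (1, 'East'), (2, 'West');
        INSERT INTO dealer_dim VALUES (1, 'Dealer A', 'Fresno', 2),
                                      (2, 'Dealer B', 'Albany', 1),
                                      (3, 'Dealer C', 'Reno', 2);
        INSERT INTO sales_facts VALUES (1, 20000.0, 1), (2, 21000.0, 1),
                                       (3, 5000.0, 1), (1, 42000.0, 2);
        """
    )


def revenue_by_region(conn):
    """Reaching 'region' now takes two joins: fact -> dealer -> region."""
    return conn.execute(
        """
        SELECT r.region, SUM(f.revenue)
        FROM sales_facts AS f
        JOIN dealer_dim AS d ON d.dealer_key = f.dealer_key
        JOIN region_dim AS r ON r.region_key = d.region_key
        GROUP BY r.region
        ORDER BY r.region
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_snowflake(conn)
print(revenue_by_region(conn))
```

Each region name is stored once, which is the maintenance benefit, but a query that a star schema would answer with one join to the dealer dimension now needs two.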
Designing a Star Schema
- Five initial design steps
- Based on Kimball's six steps
- Start designing in order
- Re-visit and adjust over project life
Step One: Identify fact table
Start by naming the fact table with the name of the business subject area.
Step Two: Identify fact table grain
Describe what a row in the fact table represents, in business terms.
Step Three: Identify dimensions
Step Four: Select facts
Step Five: Identify dimensional attributes