Sunteți pe pagina 1din 4

Data warehouse for reporting purpose.

Analysis of the data --- comparative study


Decision making - report -- easy understand --comparative study
Report is collection of data.
Analysis of the data is for better understanding
Analysis --looking data in different perspectives.
build better business.
Data warehouse -- multidimensional analysis--better decision making
evolution of reporting
Main frame -->client server ---->multi tire enterprising
MAIN FRAME
Is not user specific, more control specific ex; INDIAN BANKING
CLient server--- It was empowered, localized, ex: ATM
Multi tier ---> Internet Banking, easy to analyse
Associations, Sequential patterns.
Data does not having any meaning.
when you create relation to the data --- meaningful information.
each piece of information is data
Knowledge ---> Analysing information.
To do analysis we generate reports from the relational database.---> Data wareho
use.
DAY2-Capturing data without any relation is called raw data.
OLTP---> Relational database.
OLTP or Operational systems.
Transaction by transaction --- Insert, update , delete.
these are called as a OLTP. ONLINE TRANSACTIONAL PROCESSING OR OPERATING SYSTEMS
.
Normalisation -- 3NF--- Repetition of data is not allowed
Multidimensional Analysis.
We can generate reports by joining the all the tables in OLTP.
But it will a compute intense and may take long time to execute.
OLTP will be a challenge to generate the report for analysis.
Normalisation -- repetition of data is not acceptable..
OLTP is a normalised optimization.
the information in OLTP is detailed.
In OLTP the data is captured in transaction by transaction.
Orientation of data is record by record in OLTP.
Business Intelligence-- Online Analytical Processing.
View reporting and Analysis--- MultiDImensionall
OLAP:
OLTP:
deals with historical data
day to day information
data stored at the transac level
Data needs to be integrated.
database design is a denormalized
Database is noramlised.
OLTP is a product.
A new system for data analysis process i.e DATA WAREHOUSE.
DAY 3:-Using Data warehouse -- OLAP.
Architect design the data warehouse
data modeller design data tables.
data comes from etl to data warehouse to olap for reporting.
data warehouse is a process and is unique to an organisation.
subject area;;hr;;finance;;
when you put the data in data warehouse it cannot be modified by end user.

OLTP is having a volatile data.


OLTP and data warehouse both are databases.
OLTP is running business and data warehouse is managing business.
designing an oltp system is entity relationship modelling.
designing data warehouse is dimensional modelling.
the table in data warehouse contains the descriptive in nature.
Dimension is textual in nature, relatively static(ocasionally), highly hierarchi
cal in nature.
fact table contains -- measures, quantifiable, keys of dimensions.
Fact table contains the keys of the dimension table + fact table measures + date
s.
dimension tables are master tables and fact table is a detailed table.
Fact tables are joined to the dimension tables with the foreign keys.
this helps to grow quickly and holds large volumes of data.
First we load data into dimension table and then we load data into facts table.
dimension may/may not contain history but fact table contains history.
Denormalized data is the repetition of the data is acceptable.
Dimension tables are denormalized.
fact tables are neither normalized or denormalized.
This fact and dimension table comprise of star schema.
Snow flake schema --fact table surrounded by denormalised dimension table and one of the dimension t
able is sub dimensioned.
Architects decide the type of the schema needed.
The level at which you capture the data in fact table is-- grain level
DAY 5-String to String comparison usually takes long time.
In Order to overcome this we add keys to the dimension table while loading the d
ata into it.
OLTP keys are different from OLAP keys which are primary and unique.
With the introduction of keys we compare the data in string by string.
These keys which are sequentially generated numbers called surrogate keys.
If having a single subject area in a data modelling is called Data Mart.
Multiple subject area then we call it as data warehouse.
Both are similar having star schemas etc.
Data mart is not a part of the data warehouse.
Inmon methodology-- first data warehouse and then data mart --- called as--top d
own approach-- enterprise approach.
Kimball methodology---first data mart first and then data warehouse ---bottom up
approach.
Time taken to construct a data mart is lesser than data warehouse.
DAY 6-SCD Type 1 --- when you don't want the history value
you only want current value.
SCD Type 2 --- when you need both current value and also with history.
SCD Type 3 --- when you want both previous(not history just one step previous va
lue) and current value.
There will be more indexes in OLAP and less in OLTP.
If a dimension shared by more than one fact table then we call it as confirmed d
imension.
If a dimension table shared by more than one data mart is called confirmed dimen
sion.
Degenerated dimension --- field in fact table which does not have any attributes
in its dimension table.
Types of fact tables --- additive,non additive, semi additive
Additive --a measure which can be summed up with all the dimensions.
Fact less fact table -- If it contains only keys without any measures.

DAY 7-Three phases in a project -- DEV,TEST, PROD


dimensions and types of dimensions.
Different Types of Dimensions and Facts in Data Warehouse
Dimension A dimension table typically has two types of columns, primary keys to fact table
s and textual\descreptive data.
Fact A fact table typically has two types of columns, foreign keys to dimension table
s and measures those that contain numeric facts. A fact table can contain fact s dat
a on detail or aggregated level.
Types of Dimensions Slowly Changing Dimensions:
Attributes of a dimension that would undergo changes over time. It depends on th
e business requirement whether particular attribute history of changes should be
preserved in the data warehouse. This is called a Slowly Changing Attribute and
a dimension containing such an attribute is called a Slowly Changing Dimension.
Rapidly Changing Dimensions:
A dimension attribute that changes frequently is a Rapidly Changing Attribute. I
f you don t need to track the changes, the Rapidly Changing Attribute is no problem,
but if you do need to track the changes, using a standard Slowly Changing Dimen
sion technique can result in a huge inflation of the size of the dimension. One
solution is to move the attribute to its own dimension, with a separate foreign
key in the fact table. This new dimension is called a Rapidly Changing Dimension
.
Junk Dimensions:
A junk dimension is a single table with a combination of different and unrelated
attributes to avoid having a large number of foreign keys in the fact table. Ju
nk dimensions are often created to manage the foreign keys created by Rapidly Ch
anging Dimensions.
Inferred Dimensions:
While loading fact records, a dimension record may not yet be ready. One solutio
n is to generate an surrogate key with Null for all the other attributes. This s
hould technically be called an inferred member, but is often called an inferred
dimension.
Conformed Dimensions:
A Dimension that is used in multiple locations is called a conformed dimension.
A conformed dimension may be used with multiple fact tables in a single database
, or across multiple data marts or data warehouses.
Degenerate Dimensions:
A degenerate dimension is when the dimension attribute is stored as part of fact
table, and not in a separate dimension table. These are essentially dimension k
eys for which there are no other attributes. In a data warehouse, these are ofte
n used as the result of a drill through query to analyze the source of an aggreg
ated number in a report. You can use these values to trace back to transactions
in the OLTP system.
Role Playing Dimensions:
A role-playing dimension is one where the same dimension key
along with its associ
ated attributes
can be joined to more than one foreign key in the fact table. For
example, a fact table may include foreign keys for both Ship Date and Delivery D
ate. But the same date dimension attributes apply to each foreign key, so you ca
n join the same dimension table to both foreign keys. Here the date dimension is
taking multiple roles to map ship date as well as delivery date, and hence the
name of Role Playing dimension.
Shrunken Dimensions:
A shrunken dimension is a subset of another dimension. For example, the Orders f
act table may include a foreign key for Product, but the Target fact table may i

nclude a foreign key only for ProductCategory, which is in the Product table, bu
t much less granular. Creating a smaller dimension table, with ProductCategory a
s its primary key, is one way of dealing with this situation of heterogeneous gr
ain. If the Product dimension is snowflaked, there is probably already a separat
e table for ProductCategory, which can serve as the Shrunken Dimension.
Static Dimensions:
Static dimensions are not extracted from the original data source, but are creat
ed within the context of the data warehouse. A static dimension can be loaded ma
nually
for example with Status codes
or it can be generated by a procedure, such as
a Date or Time dimension.
Types of Facts Additive:
Additive facts are facts that can be summed up through all of the dimensions in
the fact table. A sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions i
n the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not
through the time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.
Eg: Facts which have percentages, ratios calculated.
Factless Fact Table:
In the real world, it is possible to have a fact table that contains no measures
or facts. These tables are called Factless Fact tables .
Eg: A fact table which has only product key and date key is a factless fact. The
re are no measures in this table. But still you can get the number products sold
over a period of time.
Based on the above classifications, fact tables are categorized into two:
Cumulative:
This type of fact table describes what has happened over a period of time. For e
xample, this fact table may describe the total sales by product by store by day.
The facts for this type of fact tables are mostly additive facts. The first exa
mple presented here is a cumulative fact table.
Snapshot:
This type of fact table describes the state of things in a particular instance o
f time, and usually includes more semi-additive and non-additive facts. The seco
nd example presented here is a snapshot fact table

S-ar putea să vă placă și