Dollar value is an additive fact. To find the total amount for a
particular place over a particular period of time, we can simply add
the dollar amounts together.
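The sketch below shows what "additive" means in practice. The store names, months and dollar values are hypothetical. Because dollars can be summed across any dimension, a total for one place and one period is just a filtered sum.

```python
# Hypothetical sales fact rows: (store, month, dollars). Values are illustrative.
sales_fact = [
    ("Boston", "2002-01", 120.0),
    ("Boston", "2002-01", 80.0),
    ("Boston", "2002-02", 50.0),
    ("Denver", "2002-01", 200.0),
]

def total_dollars(rows, store, month):
    """Dollars are additive, so the total for a store and period is a plain sum."""
    return sum(d for s, m, d in rows if s == store and m == month)

print(total_dollars(sales_fact, "Boston", "2002-01"))  # 200.0
```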
Q. What is a snapshot?
A. Business Intelligence:
BI is a broad category of applications and technologies for gathering,
integrating, storing, analyzing and providing access to data to help
enterprise users make better business decisions. BI applications
include the activities of decision support systems, querying and
reporting, online analytical processing (OLAP), statistical analysis,
forecasting and data mining.
Data warehousing:
Data warehousing is the process of extracting, cleaning, conforming
and delivering data, using dimensional modeling, to build data
warehouses that are subject-oriented, time-variant and non-volatile.
Dimensional:
- focused on how data can be retrieved efficiently
(for example, by reporting and analysis tools)
- contains a lot of data redundancy
- consists of fact and dimension tables
A. The best way is to move data from the ODS to the data warehouse,
and from the data warehouse to the data marts.
Q: What is ETL?
A: ETL is the data warehouse acquisition process of Extracting (E),
Transforming or Transporting (T) and Loading (L) data from source
systems into the data warehouse.
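A minimal ETL sketch, assuming a tiny in-memory source and a hypothetical country-code table used to conform values. Each stage is a separate function so the E, T and L steps stay visible; a real pipeline would read from and write to databases.

```python
# Hypothetical source rows, field names and cleaning rules, for illustration only.
source_rows = [
    {"cust": " alice ", "country": "us", "amount": "10.50"},
    {"cust": "Bob", "country": "USA", "amount": "4"},
    {"cust": "", "country": "DE", "amount": "7"},  # invalid: no customer name
]

COUNTRY_CODES = {"us": "US", "usa": "US", "de": "DE"}  # hypothetical conforming table

def extract(rows):
    # Extract: read raw records from the source system.
    return list(rows)

def transform(rows):
    # Transform: clean and conform values; drop records that fail validation.
    out = []
    for r in rows:
        name = r["cust"].strip().title()
        if not name:
            continue
        out.append({
            "customer": name,
            "country": COUNTRY_CODES.get(r["country"].strip().lower(), "UNKNOWN"),
            "amount": float(r["amount"]),
        })
    return out

def load(rows, warehouse):
    # Load: append the conformed rows to the target table.
    warehouse.extend(rows)
    return len(rows)

warehouse_table = []
loaded = load(transform(extract(source_rows)), warehouse_table)
print(loaded, warehouse_table[0])  # 2 {'customer': 'Alice', 'country': 'US', 'amount': 10.5}
```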
Q: What is Metadata?
A: Metadata is data about data: information about the domain and
structure of the data warehouse.
In star schema, all your dimensions will be linked directly with your fact
table. In a snowflake schema, on the other hand, dimensions may be
interlinked or may have one-to-many relationships with other tables.
This is not a desirable situation, but you can make the best choice
once you have gathered all the requirements.
A snowflake schema is designed like a star schema, but its dimension
tables are connected to further tables, so there can be a relationship
between two dimensions.
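To make the structural difference concrete, here is a sketch using Python's built-in sqlite3 with a hypothetical sales fact and product dimension. In the star version the category lives in the product dimension; in the snowflake version it is normalized into a separate table, which costs one extra join for the same answer.

```python
import sqlite3

# Hypothetical schema for illustration: a sales fact with a product dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category is stored directly in the product dimension.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Snowflake: category is normalized into its own table, linked from the product dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, name TEXT,
                               category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales (product_key INTEGER, dollars REAL);

INSERT INTO dim_product_star VALUES (1, 'Cola', 'Drinks'), (2, 'Chips', 'Snacks');
INSERT INTO dim_category VALUES (10, 'Drinks'), (20, 'Snacks');
INSERT INTO dim_product_snow VALUES (1, 'Cola', 10), (2, 'Chips', 20);
INSERT INTO fact_sales VALUES (1, 3.0), (1, 2.0), (2, 4.0);
""")

# Star: a single join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category, SUM(f.dollars) FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category""").fetchall()

# Snowflake: an extra join from one dimension table to another.
snow = conn.execute("""
    SELECT c.category, SUM(f.dollars) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category""").fetchall()

print(star)          # [('Drinks', 5.0), ('Snacks', 4.0)]
print(star == snow)  # True
```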
3. Q: Which is better, Star or Snowflake?
A: Strict data warehousing rules would have you use a Star schema but
in reality most designs tend to become Snowflakes. They each have
their pros and cons, but both are far better than trying to use a
transactional third-normal-form design.
The star schema and OLAP cube are intimately related. Star schemas
are most appropriate for very large data sets. OLAP cubes are most
appropriate for smaller data sets where analytic tools can perform
complex data comparisons and calculations. In almost all OLAP cube
environments, it’s recommended that you initially source data into a
star schema structure, and then use wizards to transform the data into
the OLAP cube.
Dimensional modeling divides the world of data into two major types:
measurements and descriptions of the context surrounding those
measurements. The measurements, which are typically numeric, are
stored in fact tables, and the descriptions of the context, which are
typically textual, are stored in the dimension tables.
Every foreign key in the fact table has a match to a unique primary key
in the respective dimension (referential integrity). The reverse is not
required: the dimension table may contain primary keys that aren’t found
in the fact table. For example, a product dimension table might be paired
with a sales fact table in which some of the products are never sold.
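A small sketch of the rule in one direction only, with hypothetical keys: every fact foreign key must exist in the dimension, while dimension keys with no matching fact rows are perfectly valid.

```python
# Hypothetical keys for illustration.
product_dim = {1: "Cola", 2: "Chips", 3: "Gum"}  # product_key -> description
sales_fact = [(1, 3.0), (1, 2.0), (2, 4.0)]     # (product_key, dollars)

fact_keys = {pk for pk, _ in sales_fact}
dim_keys = set(product_dim)

# Referential integrity: every foreign key in the fact table must exist in the dimension.
orphans = fact_keys - dim_keys
# The reverse is allowed: dimension keys never referenced by a fact row (products never sold).
never_sold = dim_keys - fact_keys

print(orphans, never_sold)  # set() {3}
```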
The main difference between second and third normal form is that
repeated entries are removed from a second normal form table and
placed in their own “snowflake”. Thus the act of removing the context
from a fact record and creating dimension tables places the fact table
in third normal form.
Fact tables are usually very large, and queries almost never fetch a
single record into the answer set. Instead, we fetch a very large number
of records and then add, count, average, or take the min or max of
them. The most common operation is adding. Applications are simpler if
they store facts in an additive format as often as possible. Thus, in the
grocery example, we don’t need to store the unit price. We compute
the unit price by dividing the dollar sales by the unit sales whenever
necessary.
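The grocery argument can be checked with a few hypothetical rows. Only the additive facts (dollar sales, unit sales) are stored. Unit price is derived after summing, as a ratio of sums, which differs from naively averaging per-row prices.

```python
# Hypothetical grocery fact rows: (product, dollar_sales, unit_sales). Only additive facts are stored.
grocery_fact = [
    ("Milk", 6.0, 3),
    ("Milk", 10.0, 4),
]

dollars = sum(d for _, d, _ in grocery_fact)
units = sum(u for _, _, u in grocery_fact)

# Unit price is computed when needed: total dollars divided by total units.
unit_price = dollars / units
print(unit_price)  # 16.0 / 7, about 2.2857

# Averaging the per-row prices gives a different, misleading answer.
naive = sum(d / u for _, d, u in grocery_fact) / len(grocery_fact)
print(naive)  # 2.25
```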
The new fact table called for in the drill-across operation must share
certain dimensions with the fact table in the original query. All fact
tables in a drill-across query must use conformed dimensions.
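A sketch of a drill-across, assuming two hypothetical fact tables (sales and returns) that share a conformed month dimension. Each fact table is queried separately and summarized to the shared dimension, and the two answer sets are then merged on its values. The merge only makes sense because the month values mean exactly the same thing in both tables.

```python
from collections import defaultdict

# Two hypothetical fact tables sharing a conformed "month" dimension: (month, dollars).
sales_fact = [("2002-01", 100.0), ("2002-01", 50.0), ("2002-02", 80.0)]
returns_fact = [("2002-01", 10.0), ("2002-02", 5.0), ("2002-02", 15.0)]

def aggregate(rows):
    # Query one fact table, summarizing it to the conformed dimension.
    totals = defaultdict(float)
    for month, amount in rows:
        totals[month] += amount
    return totals

def drill_across(left, right):
    # Merge the two answer sets on the shared dimension value.
    months = sorted(set(left) | set(right))
    return [(m, left.get(m, 0.0), right.get(m, 0.0)) for m in months]

report = drill_across(aggregate(sales_fact), aggregate(returns_fact))
print(report)  # [('2002-01', 150.0, 10.0), ('2002-02', 80.0, 20.0)]
```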
Q. Why have more than one fact table instead of a single fact
table?
A Type 3 SCD adds a new field in the dimension record but does not
create a new record. We might change the designation of the
customer’s sales territory because we redraw the sales territory map,
or we arbitrarily change the category of the product from confectionary
to candy. In both cases, we augment the original dimension attribute
with an “old” attribute so we can switch between these alternate
realities.
- Overwriting
- Creating another dimension record
- Creating a current value field
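The three techniques can be sketched on a hypothetical product dimension row (field names and values are assumptions), reusing the confectionary-to-candy change from the Type 3 example. Overwriting loses history, a new record preserves it under a new surrogate key, and an "old" field keeps both values on a single record.

```python
import copy

# A hypothetical product dimension row; field names are illustrative.
row = {"product_key": 1, "sku": "P-100", "category": "Confectionary"}

def scd_type1(dim, sku, new_category):
    # Overwrite in place: history is lost.
    for r in dim:
        if r["sku"] == sku:
            r["category"] = new_category
    return dim

def scd_type2(dim, sku, new_category):
    # Create another dimension record with a new surrogate key; old rows are kept.
    current = [r for r in dim if r["sku"] == sku][-1]
    new = dict(current,
               product_key=max(r["product_key"] for r in dim) + 1,
               category=new_category)
    return dim + [new]

def scd_type3(dim, sku, new_category):
    # Keep one record and add an "old" field so both values stay available.
    for r in dim:
        if r["sku"] == sku:
            r["old_category"] = r["category"]
            r["category"] = new_category
    return dim

t1 = scd_type1([copy.deepcopy(row)], "P-100", "Candy")
t2 = scd_type2([copy.deepcopy(row)], "P-100", "Candy")
t3 = scd_type3([copy.deepcopy(row)], "P-100", "Candy")
```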
Another benefit you can get from surrogate keys (SIDs) is in tracking
slowly changing dimensions (SCDs).
A classical example:
On the 1st of January 2002, employee 'E1' belongs to Business
Unit 'BU1' (that's what would be in your Employee Dimension), and the
turnover allocated to this employee is booked on Business Unit 'BU1'.
On the 2nd of June, employee 'E1' is transferred from Business Unit
'BU1' to Business Unit 'BU2'. All new turnover has to belong to the
new Business Unit 'BU2', but the old turnover should still belong to
Business Unit 'BU1'.
If you used the natural business key 'E1' for your employee within your
data warehouse, everything would be allocated to Business Unit 'BU2',
even what actually belongs to 'BU1'.
If you use surrogate keys, you could create a new record for employee
'E1' in your Employee Dimension on the 2nd of June, with a new
surrogate key.
This way, in your fact table, your old data (before the 2nd of June)
carries the SID of employee 'E1' + 'BU1', and all new data (from the
2nd of June onwards) carries the SID of employee 'E1' + 'BU2'.
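The E1/BU1/BU2 scenario can be sketched as follows. The SID values (101, 102), the valid_from/valid_to columns and the turnover amounts are assumptions for illustration. Each transaction is stamped with the surrogate key of the dimension version in effect on its date, so turnover stays with the business unit it was earned in.

```python
from datetime import date

# Employee dimension with surrogate keys; SIDs and validity columns are hypothetical.
employee_dim = [
    {"sid": 101, "emp_id": "E1", "bu": "BU1",
     "valid_from": date(2002, 1, 1), "valid_to": date(2002, 6, 1)},
    {"sid": 102, "emp_id": "E1", "bu": "BU2",
     "valid_from": date(2002, 6, 2), "valid_to": None},
]

def sid_for(emp_id, on):
    # Find the dimension version that was in effect on the transaction date.
    for r in employee_dim:
        if (r["emp_id"] == emp_id and r["valid_from"] <= on
                and (r["valid_to"] is None or on <= r["valid_to"])):
            return r["sid"]
    raise KeyError((emp_id, on))

# Hypothetical turnover transactions: (emp_id, date, amount).
turnover = [("E1", date(2002, 3, 15), 500.0), ("E1", date(2002, 7, 1), 300.0)]
fact = [(sid_for(e, d), amt) for e, d, amt in turnover]

# Roll the fact rows up by business unit through the surrogate key.
bu_of = {r["sid"]: r["bu"] for r in employee_dim}
by_bu = {}
for sid, amt in fact:
    by_bu[bu_of[sid]] = by_bu.get(bu_of[sid], 0.0) + amt
print(by_bu)  # {'BU1': 500.0, 'BU2': 300.0}
```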
- Production may reuse keys that it has purged but that you are
still maintaining.
- Production might legitimately overwrite some part of a product
description or a customer description with new values but not
change the product key or the customer key to a new value. We might
then wonder what to do about the revised attribute values (the slowly
changing dimension crisis).
- Production may generalize its key format to handle some new
situation in the transaction system, e.g. changing the production
keys from integers to alphanumeric values, or the 12-byte keys you
are used to becoming 20-byte keys.
- Acquisition of companies.
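One common way to insulate the warehouse from these production key problems is a key-mapping table from (source system, natural key) to a warehouse-assigned surrogate key. The class below is a hypothetical sketch of that idea, not a specific product's API. Keys from an acquired company get their own surrogates, a reused production key gets a fresh surrogate once the old one is retired, and normalizing keys to strings tolerates an integer-to-alphanumeric format change.

```python
import itertools

class SurrogateKeyMap:
    """Hypothetical key-mapping table: (source_system, natural_key) -> surrogate key."""

    def __init__(self):
        self._next = itertools.count(1)
        self._map = {}

    def get(self, source_system, natural_key):
        # Normalize to str so integer and alphanumeric formats of a key map the same way.
        k = (source_system, str(natural_key))
        if k not in self._map:
            self._map[k] = next(self._next)
        return self._map[k]

    def retire(self, source_system, natural_key):
        # Production purged this key and may reuse it; forget the mapping so a reused
        # key gets a fresh surrogate instead of colliding with the old member.
        self._map.pop((source_system, str(natural_key)), None)

keys = SurrogateKeyMap()
a = keys.get("ERP", 42)
b = keys.get("ACQUIRED_CO", 42)  # same natural key from an acquired company
keys.retire("ERP", 42)
c = keys.get("ERP", 42)          # reused production key
print(a, b, c)  # 1 2 3
```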
Fact tables which do not have any facts are called factless fact tables.
They may consist of nothing but keys.
There are two kinds of fact tables that do not have any facts at all.
The first type of factless fact table is a table that records an event.
Many event-tracking tables in dimensional data warehouses turn out to
be factless.
For example, a student tracking system that records each student
attendance event each day.
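A sketch of an event-tracking factless fact table for the attendance example. The date, student and class keys are hypothetical. The rows contain nothing but keys, and questions such as "how many attended on this day?" are answered by counting rows.

```python
from collections import Counter

# Factless fact table: each row is an attendance event made only of keys (hypothetical values).
attendance_fact = [
    # (date_key, student_key, class_key)
    (20020107, 1, 10),
    (20020107, 2, 10),
    (20020108, 1, 10),
    (20020108, 1, 11),
]

# With no numeric facts, queries are answered by counting events.
per_day = Counter(date_key for date_key, _, _ in attendance_fact)
per_student = Counter(student for _, student, _ in attendance_fact)
print(per_day[20020107], per_student[1])  # 2 3
```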
GENERAL
Q. How many dimension tables did you have in your project? Name
some dimensions (columns). (Mascot)
Q. How many Facts & Dimension Tables are there in your Project?
(Mascot)
Data warehouses can have many different types of life cycles with
independent data marts. The following is an example of a data
warehouse life cycle.
In the life cycle of this example, four important steps are involved.
A data warehouse is for very large databases (VLDBs) and a data mart
is for smaller databases. The difference lies in the scope of the things
with which they deal.
A data mart is an implementation of a data warehouse with a small
and more tightly restricted scope of data and data warehouse
functions. A data mart serves a single department or part of an
organization. In other words, the scope of a data mart is smaller than
the data warehouse. It is a data warehouse for a smaller group of end
users.
A data warehouse system (DWS) comprises the data warehouse and all
components used for building, accessing and maintaining the DWH
(illustrated in Figure 1). The center of a data warehouse system is the
data warehouse itself. The data import and preparation component is
responsible for data acquisition. It includes all programs, applications
and legacy system interfaces that are responsible for extracting data
from operational sources, preparing it and loading it into the warehouse.
The access component includes all the applications (such as OLAP or data
mining applications) that make use of the information stored in the
warehouse.
After the initial load (the first load of the DWH according to the DWH
configuration), during the DWS operation phase, warehouse data must
be regularly refreshed, i.e., modifications of operational data since the
last DWH refreshment must be propagated into the warehouse such
that data stored in the DWH reflect the state of the underlying
operational systems. Besides DWH refreshment, DWS operation
includes further tasks like archiving and purging of DWH data or DWH
monitoring.
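A refresh that propagates only what changed since the last DWH refreshment can be sketched with a timestamp watermark. This assumes the operational rows carry a reliable last-modified timestamp, which is one technique among several (others include change logs or snapshot comparison). The table, its columns and the simple upsert are hypothetical simplifications.

```python
from datetime import datetime

# Hypothetical operational table with a last-modified timestamp per row.
operational = [
    {"id": 1, "status": "open",   "modified": datetime(2002, 6, 1, 9, 0)},
    {"id": 2, "status": "closed", "modified": datetime(2002, 6, 2, 14, 0)},
    {"id": 3, "status": "open",   "modified": datetime(2002, 6, 3, 8, 30)},
]

def refresh(warehouse, source, last_refresh):
    """Propagate only rows modified since the last refresh (assumes reliable timestamps)."""
    changed = [r for r in source if r["modified"] > last_refresh]
    for r in changed:
        warehouse[r["id"]] = r["status"]  # simple upsert by business key
    new_watermark = max((r["modified"] for r in changed), default=last_refresh)
    return len(changed), new_watermark

dwh = {1: "open"}  # state after the initial load
n, watermark = refresh(dwh, operational, datetime(2002, 6, 1, 12, 0))
print(n, dwh)  # 2 {1: 'open', 2: 'closed', 3: 'open'}
```

The returned watermark would be stored and passed in as last_refresh on the next run.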
It all depends on the needs of the users, how fast data changes and
the volume of information that is to be loaded into the data warehouse.
It is common to schedule daily, weekly or monthly dumps from
operational data stores during periods of low activity (for example, at
night or on weekends). The longer the gap between loads, the longer
the processing times for the load when it does run. A technical IS/IT
staffer should make some calculations and consult with potential users
to develop a schedule to load new data.
Q. What are the data modeling tools you have used? (Polaris)
During the physical design process, you convert the data gathered
during the logical design phase into a description of the physical
database, including tables and constraints.
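As a small illustration of the physical design step, the sketch below turns a hypothetical logical design (a customer entity related to orders) into SQLite tables with primary key, foreign key, NOT NULL and CHECK constraints, using Python's built-in sqlite3. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical logical design: a Customer entity related to Order entities.
# The physical design turns it into tables, keys and constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL CHECK (amount >= 0)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# The foreign key constraint rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```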
Entity-Relationship.
Q. How do you extract data from different data sources explain with an
example? (Polaris)
Q. What are the reporting tools you have used? What is the difference
between them? (Polaris)
Q. What are the Different types of OLAP's? What are their differences?
(Mascot)
ROLAP stands for Relational OLAP. Users see their data organized in
cubes with dimensions, but the data is really stored in a relational
database (RDBMS) like Oracle. Because the RDBMS stores data at a
fine-grained level, response times are usually slow.
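A minimal sketch of the ROLAP idea, using sqlite3 as a stand-in for the relational store: the user asks for cube cells by dimension with a slice filter, and the engine answers with a GROUP BY over fine-grained rows. The table, columns and helper function are hypothetical, and dimension names are interpolated into SQL, so they must come from a trusted list.

```python
import sqlite3

# Fine-grained relational rows behind a "cube" view; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, dollars REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("East", 2001, "Cola", 10.0),
    ("East", 2002, "Cola", 12.0),
    ("East", 2002, "Chips", 5.0),
    ("West", 2002, "Cola", 7.0),
])

def cube_cells(dims, filters):
    """Aggregate dollars by the requested dimensions, applying a slice filter.

    Dimension and filter column names are assumed to be trusted identifiers.
    """
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    cols = ", ".join(dims)
    sql = (f"SELECT {cols}, SUM(dollars) FROM sales WHERE {where} "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql, list(filters.values())).fetchall()

print(cube_cells(["region"], {"year": 2002}))  # [('East', 17.0), ('West', 7.0)]
```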
DOLAP
The terms data warehousing and OLAP are often used interchangeably.
As the definitions suggest, warehousing refers to the organization and
storage of data from a variety of sources so that it can be analyzed
and retrieved easily. OLAP deals with the software and the process of
analyzing data, managing aggregations, and partitioning information
into cubes for in-depth analysis, retrieval and visualization. Some
vendors are replacing the term OLAP with the terms analytical
software and business intelligence.
Q. Aggregate navigation
Q. How do I set the log level higher for more detailed information
within Data Warehouse Center 7.2?
Within DWC, the log level can be set from 0 to 4. There is also a log
level 5, but it cannot be turned on using the GUI and must be turned
on manually. A command line trace can be used for any trace level,
and this is the only way to turn on a level 5 trace:
Be sure to reset the trace level to 0 using the command line when you
are done:
db2 => update iwh.configuration set value_int = 0 where name = 'TRACELVL' and (component = '<component name>')
When you run a trace, the Data Warehouse Center writes information
to text files. Data Warehouse Center programs that are called from
steps also write any trace information to this directory. These files are
located in the directory specified by the VWS_LOGGING environment
variable.
1. What was the original business problem that led you to do this
project?