SSG515 Compre Paper

Comprehensive Examination SS G515 – Data Warehousing
NAME: IDNO:
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
II SEMESTER 2004-2005
SS G515 DATA WAREHOUSING
Comprehensive Examination
Date: 06th May 2005
Time: 3 Hours
Weightage: 35% [Part A (closed book) – 19 & Part B (open book) – 16]
Part A – Closed Book
Points to note:
• Answer multiple-choice questions in the question paper itself
• Some questions may have more than one correct option. You will get credit only if you
mark all the correct options
• There is NO NEGATIVE MARKING
• PUT A TICK on the correct option(s)
• Short answer questions are to be solved in the supplementary answer sheet provided
Multiple-Choice Questions (20*0.5=10)
1. Pick the correct statement(s):

(a) Snowflaking affects the fact table
(b) Outriggers affect the fact table
(c) Mini-dimensions affect the fact table
(d) Role-playing dimensions affect the fact table
2. In a data warehouse, if D1 and D2 are two conformed dimensions, then:
(a) D1 may be an exact replica of D2
(b) D1 may be at a rolled up level of granularity compared to D2
(c) Columns of D1 may be a subset of D2 and vice versa
(d) Rows of D1 may be a subset of D2 and vice versa
3. Pick the odd dimension out:
(a) Product
(b) Time
(c) Customer
(d) Store
4. The number of summary tables (assume separate fact table approach to store
summary data), if we have seven dimension tables and one level of hierarchy
along all dimensions:
(a) 43
(b) 63
(c) 49
(d) 243
5. Role-playing dimensions are:
(a) Separate physical copies of the dimensions
(b) Separate logical copies of the dimensions
(c) Implemented as SYNONYMS in Oracle
(d) Dimensions with different granularity in a dimensional DW
6. A real-time data warehouse contains:
(a) Both historical & current data with latency of 24 hours or more
(b) Both historical & current data with latency of a few hours or less
(c) Only current data with latency of a few hours or less
(d) Only historical data
7. Real-time data warehouses are used for:
(a) Tactical decisions
(b) Strategic decisions
(c) Both
(d) None
8. Pick the odd one out:
(a) Class I ODS
(b) Class II ODS
(c) Class III ODS
(d) Class IV ODS
(a) Meaningless keys
(b) Non-intelligent keys
(c) Smart keys
(d) Integer keys
10. If a cube has 10 dimensions and each dimension has 5 levels of hierarchy, the
total number of cuboids that can be generated is:
(a) 5^10
(b) 6^10
(c) 10^5
(d) 10^6
11. In a Grocery Store sales data mart, the cost of a product is stored in:
(a) Product dimension table
(b) Sales fact table
(c) Both in product dimension & sales fact table
(d) Cost is not maintained in the data warehouse
12. Market-basket analysis can be done on:
(a) Daily grain fact table
(b) Monthly grain fact table
(c) Receipt line grain fact table
(d) All
(a) Drill-through
(b) Drill-down
(c) Drill-across
(d) Roll-up
14. Core facts are:
(a) Stored only in the core fact table
(b) Stored in core fact table but not in custom fact table
(c) Stored in custom fact table but not in core fact table
(d) Stored in both core as well as custom fact table
15. Dimensional modeling is:
(a) Logical modeling
(b) Conceptual modeling
(c) Physical modeling
(d) None of the above
16. A dimension can be added to an existing star schema when it is at:
(a) A finer granularity than the fact table
(b) A coarser granularity than the fact table
(c) The same granularity as that of the fact table
(d) Granularity has nothing to do with adding a dimension table
17. During the load process:

(a) Dimension tables are populated first
(b) Fact tables are populated first
(c) Conformed dimensions are loaded first
(d) Order does not matter
(a) Partitioning
(b) Aggregation
(c) Indexes
(d) View materialization
19. In HOLAP systems:
(a) Detailed data is stored in relational tables
(b) Summarized data is stored in relational tables
(c) Summarized data is stored in MDDBs
(d) User access via MOLAP tools
20. Deciding which views to materialize is a:
(a) NP-complete problem
(b) NP-hard problem
(c) P-problem
(d) None
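The cuboid-counting questions in this part rest on a standard result: if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the cube yields the product of (L_i + 1) over all dimensions. A minimal sketch of that arithmetic follows; the function name `count_cuboids` is hypothetical and chosen only for illustration.

```python
from math import prod


def count_cuboids(levels_per_dimension):
    """Count the cuboids of a cube.

    levels_per_dimension: one integer per dimension, giving the number of
    hierarchy levels of that dimension, excluding the virtual level "all".
    Each dimension contributes (levels + 1) choices, the extra one being "all".
    """
    return prod(levels + 1 for levels in levels_per_dimension)


# Ten dimensions with five levels each, as in the multiple-choice question.
ten_by_five = count_cuboids([5] * 10)

# With a single level per dimension the formula reduces to 2**n.
three_flat = count_cuboids([1] * 3)
```

With no hierarchies beyond a single level per dimension, the count is the familiar 2^n lattice size.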
Short Answer Questions (6*1.5=9)
1. What kind of optimization techniques are used in ROLAP cube computation?

2. Consider a Grocery store data warehouse with Store, Time, Product, and
Customer as dimensions. There are two fact tables corresponding to sales and
shelf inventory. What are fact-focused and dimension-focused queries? Give an
example of each type.
3. Can we make the transaction number degenerate dimension a dimension table in
the grocery store sales data mart? If so, what attributes would you have in the
transaction dimension, and what kind of analysis can you do using those
attributes?
4. What is the main disadvantage of having normalized dimensions, mini-
dimensions and outriggers in your data warehouse design? Suggest a way of
overcoming it.
5. What is trickle-feeding the data warehouse? Why do we need to do it? Do we
trickle-feed both the facts and the dimensions, or is it sufficient to trickle-feed just the facts?
6. Discuss the role of the aggregate navigator. Give an architecture for it and
explain how it redirects queries to the appropriate aggregates.
Part B – Open Book
Problem 1
A consortium of banks wants to develop a data warehouse for effective decision-
making about their loan schemes. The banks provide loans to customers for
various purposes like House Building Loan, Car Loan, Educational Loan,
Personal Loan, etc. The whole country is categorized into a number of regions,
namely, North, South, East and West. Each region consists of a set of states.
Loan is disbursed to customers at interest rates that change from time to time.
Also, at any given point of time, the different types of loans have different rates.
The data warehouse should record an entry for each disbursement of loan to
customer. With respect to the above business scenario, answer the following
questions. Clearly state any reasonable assumptions you make.
1. Design a star schema for the data warehouse clearly identifying the fact
table(s), dimensional table(s), their attributes and measures along with the
primary key and foreign key relationships.
2. Write an SQL query by which you can display region-wise, bank-wise, year-
wise total amount of loans disbursed from your schema.
3. Draw a cuboid that would display the result of the query specified in Q. 2
above.
4. From the cuboid of Q. 3 above, if we want to see the amount of loan disbursed
during the year 2000 for the state of Maharashtra, which sequence of OLAP
operations would you need to perform?
5. Show the lattice of cuboids for the multi-dimensional data considering all the
dimensions in your schema using a single level of hierarchy for each dimension.
6. Draw possible schema hierarchies for each dimension.
7. Based on the schema hierarchies drawn in Q. 6 above, determine the total
number of cuboids, considering all the aggregation levels.
8. Draw a set of aggregated fact tables and their corresponding shrunken
dimensions for all the levels of hierarchies along the branch dimension. What are
the implications of doing this on the ETL process?
9. Once your data warehouse is ready and operational, there is a new
requirement to maintain the amount of loan repaid at the same level of
granularity. Extend your star schema to a fact constellation schema to take care
of the new requirement.
10. What is the additivity of the fact(s) in your fact table(s)?
[2+1+1+1+1+1+1+1+2+1]
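One possible shape for the schema and query asked for in Q. 1 and Q. 2 can be sketched with Python's built-in sqlite3 module. Everything named here is an assumption made for illustration: the table and column names, the choice to keep bank, state and region as attributes of a single branch dimension, and the sample rows. Other designs (for example a separate bank dimension) are equally valid.

```python
import sqlite3

# Hypothetical star schema: one fact row per loan disbursement.
SCHEMA = """
CREATE TABLE dim_time (
    time_key INTEGER PRIMARY KEY, full_date TEXT,
    month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE dim_branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT,
    bank_name TEXT, state TEXT, region TEXT);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_loan_type (
    loan_type_key INTEGER PRIMARY KEY, loan_type TEXT);
CREATE TABLE fact_disbursement (
    time_key INTEGER REFERENCES dim_time(time_key),
    branch_key INTEGER REFERENCES dim_branch(branch_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    loan_type_key INTEGER REFERENCES dim_loan_type(loan_type_key),
    interest_rate REAL,      -- non-additive measure
    amount_disbursed REAL);  -- additive measure
"""

# Region-wise, bank-wise, year-wise total amount disbursed.
QUERY = """
SELECT b.region, b.bank_name, t.year, SUM(f.amount_disbursed)
FROM fact_disbursement AS f
JOIN dim_branch AS b ON b.branch_key = f.branch_key
JOIN dim_time AS t ON t.time_key = f.time_key
GROUP BY b.region, b.bank_name, t.year
ORDER BY b.region, b.bank_name, t.year;
"""


def build_demo():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?,?,?,?,?)", [
        (1, "2000-03-15", 3, 1, 2000), (2, "2001-07-01", 7, 3, 2001)])
    conn.executemany("INSERT INTO dim_branch VALUES (?,?,?,?,?)", [
        (1, "Branch X", "Bank A", "Maharashtra", "West"),
        (2, "Branch Y", "Bank B", "Tamil Nadu", "South")])
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Customer 1')")
    conn.executemany("INSERT INTO dim_loan_type VALUES (?,?)", [
        (1, "Car Loan"), (2, "Educational Loan")])
    conn.executemany("INSERT INTO fact_disbursement VALUES (?,?,?,?,?,?)", [
        (1, 1, 1, 1, 9.5, 100000.0), (1, 1, 1, 2, 8.0, 50000.0),
        (2, 1, 1, 1, 9.0, 20000.0), (2, 2, 1, 1, 9.0, 70000.0)])
    return conn


def totals_by_region_bank_year(conn):
    """Run the aggregate query and return (region, bank, year, total) rows."""
    return conn.execute(QUERY).fetchall()
```

Calling `totals_by_region_bank_year(build_demo())` returns one row per (region, bank, year) combination present in the sample data. Keeping state and region on the branch dimension gives the hierarchy branch, state, region that the later roll-up and shrunken-dimension questions can build on.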
Problem 2
Consider the attendance fact table in the BITS data warehouse. The dimensions
for this fact table are student, course, faculty, time, room, and campus and there
is a dummy fact (4 bytes). Assume finest granularity. The warehouse contains
data for the last 5 academic years for all campuses. It is found that the
attendance is 70%. If there are 10000 students in the student dimension,
estimate the size of the fact table (in GB) given that there are 200 courses and
each course has 40 lectures per semester.
Create an aggregated schema, which gives the course-wise, total attendance for
whole semester. (Draw the schema clearly)
If we need to calculate the percentage attendance for any course, do we need
information from any other fact table?
Did sparsity failure occur? Justify your answer.
[2+1+1+1]
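The size estimate in Problem 2 depends on figures the problem leaves open, so the sketch below makes them explicit parameters. The assumptions are: 4-byte surrogate keys for each of the six dimensions, two semesters per academic year, one fact row per student per attended lecture, and a hypothetical load of 5 courses per student per semester. The function name and all defaults other than the figures quoted in the problem are illustrative only.

```python
def attendance_fact_size_gb(
    students=10_000,          # given in the problem
    courses_per_student=5,    # ASSUMPTION: not stated in the problem
    lectures_per_course=40,   # given: lectures per course per semester
    semesters_per_year=2,     # ASSUMPTION
    years=5,                  # given
    attendance_rate=0.70,     # given: only attended lectures produce a row
    n_dimensions=6,           # student, course, faculty, time, room, campus
    key_bytes=4,              # ASSUMPTION: 4-byte surrogate keys
    fact_bytes=4,             # given: dummy fact of 4 bytes
):
    """Estimate the attendance fact table size in GB (1 GB = 10**9 bytes)."""
    row_bytes = n_dimensions * key_bytes + fact_bytes
    possible_rows = (students * courses_per_student * lectures_per_course
                     * semesters_per_year * years)
    rows = possible_rows * attendance_rate
    return rows * row_bytes / 1e9


# Estimate under the default assumptions above.
typical_gb = attendance_fact_size_gb()

# Upper bound if every student were enrolled in all 200 courses.
upper_bound_gb = attendance_fact_size_gb(courses_per_student=200)
```

Under these defaults each row is 28 bytes and the estimate is roughly 0.39 GB; the upper-bound case is roughly 15.7 GB. Changing any assumed parameter scales the result linearly.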