FAQ
1. What is the disadvantage of File Management System over DBMS?
Ans:
Some of the disadvantages of a file management system over a database management system are:
- Data redundancy and inconsistency
- Difficulty in accessing data
- Difficulty in integrating data into new enterprise-level applications because of varying formats
- Lack of support for concurrent updates by multiple users
- Lack of inherent security
2. Are relational databases the only possible type of database models?
Ans:
No. Apart from relational, other models include network and hierarchical models. However, these two models are obsolete.
Nowadays, relational and object-oriented models are preferred.
3. What is referential integrity and how is it achieved in a relational database?
Ans:
Referential integrity is a feature of DBMS that prevents the user from entering inconsistent data. This is mainly achieved by having a
foreign key constraint on a table.
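As an illustration, here is a minimal sketch using Python's sqlite3 module (the table and column names are invented for the example): the foreign key constraint rejects an order that refers to a non-existent customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                    order_id INTEGER PRIMARY KEY,
                    cust_id  INTEGER REFERENCES customer(cust_id))""")

conn.execute("INSERT INTO customer VALUES (1, 'A')")
conn.execute("INSERT INTO orders VALUES (100, 1)")        # consistent: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")   # inconsistent: no customer 99
except sqlite3.IntegrityError as e:
    print("rejected by referential integrity:", e)
```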
4. What are the higher normal forms?
Ans:
A normal form considered higher than 3NF is the Boyce-Codd Normal Form (BCNF). BCNF differs from 3NF only when there is more than one composite, overlapping candidate key.
-----------------------------------------------------------------------------------------------------------
5. On which layer of application architecture does a data warehouse operate?
Ans:
A data warehouse operates at the data (database) layer of the application architecture; it is a server-side repository that stores data.
FAQ
13. What are the benefits of OLTP?
Ans:
Online Transaction Processing (OLTP) assists in storing current business transactional data. It also supports a large number of concurrent users accessing the data at the same time.
14. Why can OLTP not provide historical data for analysis?
Ans:
Data in a data warehouse comes from an OLTP system only. However, OLTP data cannot be used directly for analysis, because the data in OLTP systems is not organized to give results quickly from billions of records. In a data warehouse, data is classified into various categories, so it is possible to return results quickly.
15. Why is the data in the data warehouse not stored in a normalized form as in OLTP?
Ans:
The objective of storing data in a normalized form in OLTP is to reduce redundancy and minimize disk storage. The key objective in a data warehouse is to enhance the query response time. The easier the access to the data, the better the query response time. Hence, the normalization rules do not matter in a data warehouse.
16. An integral part of OLTP is its support for hundreds of concurrent users. The number of concurrent users supported
by a data warehouse is comparable to OLTP. Is this statement true or false? Justify your answer.
Ans:
The statement is false, because the number of people involved in data analysis is very low compared to the front-end users who work with transactional data. Moreover, the CPU usage per user is much higher in a data warehouse than for OLTP users.
17. Explain why a data warehouse does not use current or OLTP data for analysis.
Ans:
The main purpose of a data warehouse is to provide historical data to analyze business trends. Therefore, the data needs to be a snapshot of events over time, not only the current data.
18. What is the advantage of MOLAP as a storage model?
Ans:
MOLAP dimensions provide better query performance. Here the contents of the dimension are stored and processed on the Analysis Server and not on a relational server.
--------------------------------------------------------------------------------------------------------
FAQ
22. Name the 2 important parameters that decide the granularity of partitions.
Ans:
Two important factors that decide the granularity of partitions are the overall size and the manageability of the system. Both parameters have to be balanced against each other while deciding on a partitioning strategy. Suppose data containing information about the population is partitioned on the basis of state; two maintenance-related issues that could be faced by the administrator are:
- When a query needs information about all the states, such as the particular languages spoken in the states, all the partitions have to be scanned.
- If the definition of a state changes (a state is redefined), the entire fact table needs to be built again.
23. Are there any disadvantages of data partitioning?
Ans:
Data partitioning is by and large an advantageous technique for improving performance. However, it increases the implementation
complexity and imposes constraints in query design.
24. Can partitions be indexed?
Ans:
Yes, partitions can be indexed if the platform supports it. For example, in Oracle 9i you can create various types of partitioned indexes.
25. If you have a huge amount of historical data, which is too old to be useful often but cannot be discarded, then can
partitioning help?
Ans:
Essentially, the answer to this question depends on various factors such as the availability of resources and design strategies. However, you can partition data on the basis of the date it was last accessed and keep the historical data on a separate partition. In fact, you can use striping to keep it on a separate disk to improve access speed to the more useful data.
-----------------------------------------------------------------------------------------------------------
FAQ
26. Are there any risks associated with aggregation?
Ans:
The main risk associated with aggregates is the increase in disk storage space they require.
27. Once created, is an aggregate permanent?
Ans:
No, aggregates keep changing as per the need of the business. In fact, they can be taken offline or put online anytime by the
administrator. Aggregates, which have become obsolete, can also be deleted to free up disk space.
28. Can values such as MIN and MAX be determined once a summary table has been created?
Ans:
Values such as MIN and MAX cannot be determined correctly once the summary table has been created. To be available, they must be calculated and stored at the time the summary table is derived from the base table.
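For instance, a minimal sketch with Python's sqlite3 module (table and column names are invented for illustration): the summary table keeps SUM and COUNT per region, and MIN/MAX are stored at derivation time because they cannot be recovered from the summed rows afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('north', 10), ('north', 40), ('south', 25);

-- summary (aggregate) table derived from the base table;
-- MIN/MAX are computed now, while the detail rows are still visible
CREATE TABLE sales_summary AS
SELECT region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS n_rows,
       MIN(amount) AS min_amount,
       MAX(amount) AS max_amount
FROM sales
GROUP BY region;
""")

for row in conn.execute("SELECT * FROM sales_summary ORDER BY region"):
    print(row)   # e.g. ('north', 50.0, 2, 10.0, 40.0)
```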
29. How much storage increase might be required in the data warehouse system when using aggregates?
Ans:
The storage needs typically increase by a factor of 1 or sometimes even 2 for aggregates.
--------------------------------------------------------------------------------------------------------
FAQ
30. What are conformed dimensions?
Ans:
A conformed dimension is one whose meaning is independent of the fact table from which it is referred.
31. What are virtual data marts?
Ans:
Virtual data marts are logical views of multiple physical data marts based on user requirement.
32. Which tool supports data mart based data warehouse architectures?
Ans:
Informatica is commonly used for implementing data mart based data warehouse architectures.
33. Is the data in data marts also historical like in data warehouses?
Ans:
The data in data marts is historical only to some extent. In fact, it is not the same as the data in a data warehouse because of the difference in the purpose and approaches of the two.
-----------------------------------------------------------------------------------------------------------
FAQ
34. How can you classify metadata?
Ans:
53. What are some other Data Mining Languages and standardization of primitives apart from DMQL?
Ans:
Some other Data Mining Languages and standardizations of primitives apart from DMQL include:
- MSQL
- Mine Rule
- Query flocks based on Datalog syntax
- OLE DB for DM
- CRISP-DM
54. Which Data Mining tools are used commercially?
Ans:
Some Data Mining tools used commercially are:
- Clementine
- Darwin
- Enterprise Miner
- Intelligent Miner
- MineSet
55. How can noisy data be smoothened?
Ans:
Noisy data can be smoothened using the following techniques (a short sketch of binning follows this list):
- Binning
- Clustering
- Computer/human inspection
- Regression
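As a concrete illustration of the first technique, here is a minimal pure-Python sketch of smoothing by equal-depth binning and bin means (the values and bin size are invented for the example).

```python
# Smoothing noisy data by binning: each value is replaced by its bin mean.
values = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # toy data, already sorted
bin_size = 3

smoothed = []
for start in range(0, len(values), bin_size):
    bin_vals = values[start:start + bin_size]
    mean = sum(bin_vals) / len(bin_vals)
    smoothed.extend([mean] * len(bin_vals))    # replace every value in the bin by the mean

print(smoothed)   # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```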
--------------------------------------------------------------------------------------------------------
FAQ
56. What are the variations of the Apriori algorithm?
Ans:
Following are some of the variations of the Apriori algorithm that improve the efficiency of the original algorithm:
- Transaction reduction: reducing the number of transactions scanned in future iterations (see the sketch after this list)
- Partitioning: partitioning the data to find candidate itemsets
- Sampling: mining on a subset of the given data
- Dynamic itemset counting: adding candidate itemsets at different points during the scan
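To make the first variation concrete, here is a minimal pure-Python sketch of frequent-itemset counting with transaction reduction (the toy transactions and the support threshold are invented for the example): a transaction that contains no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it is dropped from later scans.

```python
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "beer"},
                {"milk", "bread", "beer"}, {"bread"}]   # toy data
min_support = 2      # minimum number of transactions an itemset must appear in

# seed with frequent 1-itemsets
counts = {}
for t in transactions:
    for item in t:
        counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
current = {s for s, c in counts.items() if c >= min_support}
frequent = {1: current}
k = 1

while current:
    k += 1
    # candidate k-itemsets whose (k-1)-subsets are all frequent
    items = set().union(*current)
    candidates = {frozenset(c) for c in combinations(sorted(items), k)
                  if all(frozenset(s) in current for s in combinations(c, k - 1))}
    counts = dict.fromkeys(candidates, 0)
    reduced = []
    for t in transactions:
        hits = [c for c in candidates if c <= t]
        for c in hits:
            counts[c] += 1
        if hits:                 # transaction reduction: keep only transactions
            reduced.append(t)    # that can still support larger itemsets
    transactions = reduced
    current = {c for c, n in counts.items() if n >= min_support}
    if current:
        frequent[k] = current

print(frequent)
```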
57. Which is the best approach when we are interested in finding all possible interactions among a set of attributes?
Ans:
Before looking into the details of each of the managers, we can get a broad idea of their functionality by mapping the processes studied in the previous chapter to the managers. The extracting and loading processes are taken care of by the load manager. Cleanup and transformation of data, as well as backup and archiving, are the duties of the warehouse manager, while the query manager, as the name implies, takes care of query management.
74. Indicate the important functions of a Load Manager and a Warehouse Manager.
Important functions of the Load Manager:
i) To extract data from the source(s).
ii) To load the data into a temporary storage device.
iii) To perform simple transformations to map it to the structures of the data warehouse.
Important functions of the Warehouse Manager:
i) Analyze the data to confirm data consistency and data integrity.
ii) Transform and merge the source data from the temporary data storage into the warehouse.
iii) Create indexes, cross references, partition views, etc.
iv) Check for normalizations.
v) Generate new aggregations, if needed.
vi) Update all existing aggregations.
vii) Create backups of data.
viii) Archive the data that needs to be archived.
75. Differentiate between vertical partitioning and horizontal partitioning.
In horizontal partitioning, we simply place the first few thousand entries in one partition, the second few thousand in the next, and so on. This can also be done by partitioning by time, where all data pertaining to the first month or first year is put in the first partition, the data of the second in the second partition, and so on. Other alternatives are based on different-sized dimensions, partitioning on other dimensions, partitioning on the size of the table, and round-robin partitions. Each of these has certain advantages as well as disadvantages.
In vertical partitioning, some columns are stored in one partition and certain other columns of the same row in a different partition. This can be achieved either by normalization or by row splitting, each with its own trade-offs. (A small sketch of both schemes follows.)
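Here is a minimal pure-Python sketch of the two schemes on a toy fact table (the row layout is invented for the example); a real warehouse does this at the storage level, but the principle is the same.

```python
rows = [{"id": i, "month": (i % 12) + 1, "amount": i * 10.0, "notes": f"row {i}"}
        for i in range(10)]   # toy fact table

# Horizontal partitioning: whole rows are split into groups, here by month range.
partition_h1 = [r for r in rows if r["month"] <= 6]    # first half of the year
partition_h2 = [r for r in rows if r["month"] > 6]     # second half of the year

# Vertical partitioning (row splitting): each row is split by columns;
# the two pieces share the same key.
frequently_used = [{"id": r["id"], "month": r["month"], "amount": r["amount"]} for r in rows]
rarely_used     = [{"id": r["id"], "notes": r["notes"]} for r in rows]

print(len(partition_h1), len(partition_h2))
print(frequently_used[0], rarely_used[0])
```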
76. What is a schema? Distinguish between facts and dimensions.
A schema, by definition, is a logical arrangement of facts that facilitates ease of storage and retrieval, as described by the end users.
The end user is not bothered about the overall arrangement of the data or the fields in it. For example, a sales executive trying to project the sales of a particular item is interested only in the sales details of that item, whereas a tax practitioner looking at the same data will be interested only in the amounts received by the company and the profits made.
The star schema looks like a good solution to the problem of warehousing. It simply states that one should identify the facts and store them in the read-only area, with the dimensions surrounding this area. Whereas the dimensions are liable to change, the facts are not. But given a set of raw data from the sources, how does one identify the facts and the dimensions? It is not always easy, but the following steps can help in that direction.
i) Look for the fundamental transactions in the entire business process. These basic entities are the facts.
ii) Find out the important dimensions that apply to each of these facts. They are the candidates for dimension tables.
iii) Ensure that facts do not include those candidates that are actually dimensions, with a set of facts attached to them.
iv) Ensure that dimensions do not include those candidates that are actually facts.
77. What is an event in data warehousing? List any five events.
An event is defined as a measurable, observable occurrence of a defined action. If this definition seems quite vague, it is because it encompasses a very large set of operations. The event manager is software that continuously monitors the system for the occurrence of events and then takes whatever action is suitable (note that an event is a measurable and observable occurrence). The action to be taken is also normally specific to the event.
A partial list of the common events that need to be monitored is as follows:
i) Running out of memory space
ii) A process dying
iii) A process using excessive resources
iv) I/O errors
v) Hardware failure
78. What is a summary table? Describe the aspects to be looked into while designing a summary table.
The main purpose of using summary tables is to cut down the time taken to execute a specific query. The main methodology involves minimizing the volume of data being scanned each time the query is to be answered. In other words, partial answers to the query are already made available. For example, in the above-cited example of the mobile market, if one expects that
i) citizens above 18 years of age,
ii) with salaries greater than 15,000, and
iii) with professions that involve travelling
are the potential customers, then every time the query is to be processed (maybe every month or every quarter), one would have to look at the entire database to compute these values and then combine them suitably to get the relevant answers. The other method is to prepare summary tables, which hold the values pertaining to each of these sub-queries beforehand, and then combine them as and when the query is raised.
Summary tables are designed by following the steps given below:
i) Decide the dimensions along which aggregation is to be done.
ii) Determine the aggregation of multiple facts.
iii) Aggregate multiple facts into the summary table.
iv) Determine the level of aggregation and the extent of embedding.
v) Design time into the table.
vi) Index the summary table.
79. List the significant issues in automatic cluster detection.
Most of the issues related to automatic cluster detection are connected to the kinds of questions we want answered in the data mining project, or to data preparation for their successful application.
i) Distance measure
Most clustering techniques use the Euclidean distance formula as the distance measure (the square root of the sum of the squares of distances along each attribute axis). Non-numeric variables must be transformed and scaled before the clustering can take place. Depending on these transformations, the categorical variables may dominate the clustering results or they may even be completely ignored.
ii) Choice of the right number of clusters
If the number of clusters k in the K-means method is not chosen so as to match the natural structure of the data, the results will not be good. The proper way to alleviate this is to experiment with different values of k. In principle, the best k value will exhibit the smallest intra-cluster distances and the largest inter-cluster distances (a small sketch follows this answer).
iii) Cluster interpretation
Once the clusters are discovered they have to be interpreted in order to have some value for the data mining project.
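As an illustration of points (i) and (ii), here is a minimal pure-Python sketch (the toy points and the iteration count are invented for the example): it runs a basic K-means for several values of k and reports the mean intra-cluster distance, which should drop sharply once k matches the natural structure of the data.

```python
import math, random

points = [(1, 1), (1, 2), (2, 1),        # toy data: two natural clusters
          (8, 8), (8, 9), (9, 8)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        # recompute each centroid as the mean of its cluster (keep old one if empty)
        centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

for k in (1, 2, 3):
    centroids, clusters = kmeans(points, k)
    intra = sum(euclidean(p, centroids[i]) for i, cl in enumerate(clusters) for p in cl)
    print(f"k={k}: mean intra-cluster distance = {intra / len(points):.2f}")
```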
81. Define data marting. List the reasons for data marting.
A data mart stores a subset of the data available in the warehouse, so that one need not always scan through the entire content of the warehouse. It is similar to a retail outlet. A data mart speeds up queries, since the volume of data to be scanned is much less. It also helps to have tailor-made processes for different access tools, to impose control strategies, etc.
Following are the reasons for which data marts are created:
i) Since the volume of data scanned is small, they speed up query processing.
ii) Data can be structured in a form suitable for a user access tool.
iii) Data can be segmented or partitioned so that they can be used on different platforms and also different control strategies become applicable.
82. Explain how to categorize data mining systems.
There are many data mining systems available or being developed. Some are specialized systems dedicated to a given data source or confined to limited data mining functionalities; others are more versatile and comprehensive. Data mining systems can be categorized according to various criteria; among others, the classifications are the following:
a) Classification according to the type of data source mined: this categorizes data mining systems according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, the World Wide Web, etc.
b) Classification according to the data model drawn on: this categorizes data mining systems based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional, etc.
c) Classification according to the kind of knowledge discovered: this categorizes data mining systems based on the kind of knowledge discovered or the data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.
d) Classification according to the mining techniques used: data mining systems employ and provide different techniques. This classification categorizes data mining systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc.
83. List and explain the different kinds of data that can be mined.
The different kinds of data that can be mined are listed below:
i) Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level.
ii) Relational Databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships.
iii) Data Warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema.
iv) Multimedia Databases: Multimedia databases include video, image, audio and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system.
v) Spatial Databases: Spatial databases are databases that, in addition to usual data, store geographical information like maps and global or regional positioning.
vi) Time-Series Databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes creates the need for challenging real-time analysis.
vii) World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily.
84. Give the syntax for task-relevant data specification.
The first step in defining a data mining task is the specification of the task-relevant data, that is, the data on which mining is to be performed. This involves specifying the database and tables or data warehouse containing the relevant data, conditions for selecting the relevant data, the relevant attributes or dimensions for exploration, and instructions regarding the ordering or grouping of the data retrieved. DMQL provides clauses for the specification of such information, as follows:
i) use database (database_name) or use data warehouse (data_warehouse_name): The use clause directs the mining task to the database or data warehouse specified.
ii) from (relation(s)/cube(s)) [where (condition)]: The from and where clauses respectively specify the database tables or data cubes involved, and the conditions defining the data to be retrieved.
iii) in relevance to (attribute_or_dimension_list): This clause lists the attributes or dimensions for exploration.
iv) order by (order_list): The order by clause specifies the sorting order of the task-relevant data.
v) group by (grouping_list): The group by clause specifies criteria for grouping the data.
vi) having (conditions): The having clause specifies the conditions under which groups of data are considered relevant.
85. Explain the design of a GUI based on a data mining query language.
A data mining query language provides the necessary primitives that allow users to communicate with data mining systems. But novice users may find a data mining query language difficult to use and its syntax difficult to remember. Instead, users may prefer to communicate with data mining systems through a graphical user interface (GUI). In relational database technology, SQL serves as a standard core language for relational systems, on top of which GUIs can easily be designed. Similarly, a data mining query language may serve as a core language for data mining system implementations, providing a basis for the development of GUIs for effective data mining.
A data mining GUI may consist of the following functional components:
a) Data collection and data mining query composition - This component allows the user to specify task-relevant data sets and to compose data mining queries. It is similar to GUIs used for the specification of relational queries.
b) Presentation of discovered patterns - This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves and other visualization techniques.
c) Hierarchy specification and manipulation - This component allows for concept hierarchy specification, either manually by the user or automatically. In addition, this component should allow concept hierarchies to be modified by the user or adjusted automatically based on a given data set distribution.
d) Manipulation of data mining primitives - This component may allow the dynamic adjustment of data mining thresholds, as well as the selection, display and modification of concept hierarchies. It may also allow the modification of previous data mining queries or conditions.
e) Interactive multilevel mining - This component should allow roll-up or drill-down operations on discovered patterns.
f) Other miscellaneous information - This component may include on-line help manuals, indexed search, debugging and other interactive graphical facilities.
86. Explain how decision trees are useful in data mining.
Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that they are simple and that decision trees represent rules. Rules can readily be expressed so that humans can understand them, or in a database access language like SQL so that records falling into a particular category may be retrieved.
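As an illustration of how a fitted tree can be read back as human-readable rules, here is a minimal sketch assuming scikit-learn is installed (the toy features and labels are invented for the example).

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# toy training data: [age, income]; label 1 = likely buyer, 0 = not
X = [[22, 15000], [25, 48000], [30, 30000], [52, 110000], [46, 95000], [56, 42000]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# the fitted tree can be printed as IF/THEN style rules
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[50, 60000]]))   # classify a new record
```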
87 Identify an application and also explain the techniques that can be incorporated in solving the problem using
data mining techniques.
Write yourself...
88. Write short notes on:
i) Data Mining Query Language
ii) Schedule Manager
iii) Data Formatting
i) Data Mining Query Language
A data mining language helps in effective knowledge discovery from the data mining systems. Designing
a comprehensive data mining language is challenging because data mining covers a wide spectrum of
tasks from data characterization to mining association rules, data classification and evolution analysis.
Each task has different requirements. The design of an effective data mining query language requires a
deep understanding of the power, limitation and underlying mechanism of the various kinds of data mining
tasks.
ii) Schedule manager
Scheduling is the key to successful warehouse management. Almost all operations in the warehouse need some type of scheduling. Every operating system has its own scheduler and batch control mechanism, but these schedulers may not be capable of fully meeting the requirements of a data warehouse. Hence it is more desirable to have specially designed schedulers to manage the operations.
iii) Data formatting
Data formatting is the final data preparation step; it covers syntactic modifications to the data that do not change its meaning but are required by the particular modelling tool chosen for the DM task. These include (a small sketch follows this list):
a) Reordering of the attributes or records: some modelling tools require reordering of the attributes (or records) in the dataset, such as putting the target attribute at the beginning or at the end, or randomizing the order of records (required by neural networks, for example).
b) Changes related to the constraints of modelling tools: removing commas, tabs and special characters, trimming strings to the maximum allowed number of characters, or replacing special characters with an allowed set of characters.
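A minimal pure-Python sketch of both kinds of formatting (the field names, maximum length and character rules are invented for the example):

```python
import random
import re

records = [{"name": "O'Brien, Anne\t", "age": 34, "buys": "yes"},
           {"name": "Kumar;  Ravi",    "age": 29, "buys": "no"}]

# a) reorder: shuffle the record order and put the target attribute ("buys") last
random.seed(0)
random.shuffle(records)
ordered = [{"age": r["age"], "name": r["name"], "buys": r["buys"]} for r in records]

# b) syntactic clean-up required by a (hypothetical) modelling tool:
#    strip commas/tabs/special characters and trim strings to 10 characters
def clean(value, max_len=10):
    if isinstance(value, str):
        value = re.sub(r"[,\t;']", " ", value).strip()
        return value[:max_len]
    return value

formatted = [{k: clean(v) for k, v in r.items()} for r in ordered]
print(formatted)
```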
---------------------------------------------------------------------------------------------------------------
The Concept of a Database:
We have seen in the previous section how data can be stored in a computer. Such stored data becomes a database, a collection of data. For example, if all the marks scored by all the students of a class are stored in the computer memory, it can be called a database. From such a database, we can answer questions like: Who has scored the highest marks? In which subject has the maximum number of students failed? Which students are weak in more than one subject? Of course, appropriate programs have to be written to do these computations. Also, as the database becomes very large and more and more data keeps getting included at different periods of time, there are several other problems about maintaining the data, which will not be dealt with here.
Since handling of such databases has become one of the primary jobs of the computer in recent years, it becomes difficult for the average user to keep writing such programs. Hence, special languages, called database query languages, have been devised, which make such programming easy; these languages help in getting specific queries answered easily.
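For example, a minimal sketch with Python's sqlite3 module (the table, names and marks are invented for illustration) that answers the first two questions above with short queries instead of hand-written programs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (student TEXT, subject TEXT, score INTEGER);
INSERT INTO marks VALUES
  ('Asha', 'Maths', 91), ('Asha', 'Physics', 35),
  ('Ravi', 'Maths', 68), ('Ravi', 'Physics', 30);
""")

# Who has scored the highest marks?
print(conn.execute("SELECT student, MAX(score) FROM marks").fetchone())

# In which subject have the most students failed (score below 40)?
print(conn.execute("""SELECT subject, COUNT(*) AS failures
                      FROM marks WHERE score < 40
                      GROUP BY subject ORDER BY failures DESC LIMIT 1""").fetchone())
```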
93. With examples, explain the different views of data.
Data is normally stored in tabular form; unless storage in other formats becomes advantageous, we store data in what are technically called relations, or in simple terms, tables.
Views are mainly of 2 types (a small sketch follows this list):
i) Simple view
ii) Complex view
Simple view:
- It is created by selecting only one table.
- It does not contain functions.
- DML operations (SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, LOCK TABLE) can be performed through a simple view.
Complex view:
- It is created by selecting more than one table.
- It can contain functions.
- You cannot always perform DML operations through a complex view.
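A minimal sketch with Python's sqlite3 module (the tables and columns are invented for the example); note that whether DML is allowed through a simple view depends on the database engine, so only SELECTs are shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp  (emp_id INTEGER, name TEXT, dept_id INTEGER, salary REAL);
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
INSERT INTO emp  VALUES (1, 'Asha', 10, 50000), (2, 'Ravi', 20, 42000);
INSERT INTO dept VALUES (10, 'Sales'), (20, 'HR');

-- simple view: one table, no functions
CREATE VIEW v_emp AS SELECT emp_id, name, salary FROM emp;

-- complex view: joins more than one table and uses an aggregate function
CREATE VIEW v_dept_pay AS
SELECT d.dept_name, AVG(e.salary) AS avg_salary
FROM emp e JOIN dept d ON e.dept_id = d.dept_id
GROUP BY d.dept_name;
""")

print(conn.execute("SELECT * FROM v_emp").fetchall())
print(conn.execute("SELECT * FROM v_dept_pay").fetchall())
```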
94. Briefly explain the concept of normalization.
Normalization is dealt with in several chapters of any book on database management systems. Here, we take the simplest definition, which suffices for our purpose: no field should have subfields.
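A tiny sketch of that definition (the record layout is invented for the example): the unnormalized row packs several marks into one field, while the normalized form gives each value its own row with no subfields.

```python
# Unnormalized: the 'marks' field has subfields packed into one string.
unnormalized = {"student": "Asha", "marks": "Maths:91;Physics:35"}

# Normalized: one row per (student, subject, score) with no subfields.
normalized = []
for part in unnormalized["marks"].split(";"):
    subject, score = part.split(":")
    normalized.append({"student": unnormalized["student"],
                       "subject": subject, "score": int(score)})

print(normalized)
# [{'student': 'Asha', 'subject': 'Maths', 'score': 91},
#  {'student': 'Asha', 'subject': 'Physics', 'score': 35}]
```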
The warehouse manager can easily be termed the most complex of the warehouse components, and it performs a variety of tasks. A few of them are listed below:
i) Analyze the data to confirm data consistency and data integrity.
Let us elaborate a little on the example. Consider a customer A. If there is a situation where the warehouse is building profiles of customers, then A becomes a fact: against the name A, we can list his address, purchases, debts, etc. One can ask questions like how many purchases A has made in the last 3 months. Then A is a fact. On the other hand, if the data is likely to be used to answer questions like how many customers have made more than 10 purchases in the last 6 months, and one uses the data of A as well as of other customers to give the answer, then it becomes a fact table. The rule is, in such cases, avoid making A a candidate key.
This is a 2-dimensional table. On the other hand, if the company wants data on all items sold by its outlets, it can be obtained simply by superimposing the 2-dimensional tables for each of these items one behind the other. It then becomes a 3-dimensional view.
Then the query, instead of looking for a 2-dimensional rectangle of data, will look for a 3-dimensional cuboid of data. There is no reason why the dimensioning should stop at 3 dimensions. In fact, almost all queries can be thought of as approaching a multidimensional unit of data from a multidimensional volume of the schema.
108. Why is partitioning needed in a large data warehouse?
Partitioning is needed in any large data warehouse to ensure that performance and manageability are improved. It can help query redirection send queries to the appropriate partition, thereby reducing the overall time taken for query processing.
109. Explain the types of partitioning in detail.
i) Horizontal partitioning: This essentially means that the table is partitioned after the first few thousand entries, the next few thousand entries, and so on. This is because in most cases not all the information in the fact table is needed all the time. Thus horizontal partitioning helps to reduce the query access time by directly cutting down the amount of data to be scanned by the queries.
ii) Vertical partitioning: As the name suggests, a vertical partitioning scheme divides the table vertically, i.e. each row is divided into 2 or more partitions.
iii) Hardware partitioning: Needless to say, the data warehouse design process should try to maximize the performance of the system. One of the ways to ensure this is to optimize the database design with respect to a specific hardware architecture.
110. Explain the mechanism of row splitting.
Row splitting: The method involves identifying the not-so-frequently used fields and putting them into another table. This ensures that the frequently used fields can be accessed more often, in much less computation time.
It can be noted that row splitting may not reduce or increase the overall storage needed, whereas normalization may involve a change in the overall storage space needed. In row splitting, the mapping is 1 to 1, whereas normalization may produce one-to-many relationships.
Guidelines used for hardware partitioning: Needless to say, the data warehouse design process should try to maximize the performance of the system. One of the ways to ensure this is to optimize the database design with respect to the specific hardware architecture. Obviously, the exact details of the optimization depend on the hardware platform. Normally the following guidelines are useful:
i) Maximize the processing, disk and I/O operations.
ii) Reduce bottlenecks at the CPU and I/O.
112. What is aggregation? Explain the need for aggregation. Give an example.
Aggregation: Data aggregation is an essential component of any decision support data warehouse. It helps us to ensure cost-effective query performance, which in other words means that the costs incurred to get the answers to a query are more than offset by the benefits of the query answer. Data aggregation attempts to do this by reducing the processing power needed to process the queries. However, too much aggregation would only lead to unacceptable levels of operational costs, while too little aggregation may not improve performance to the required levels. A fine balancing of the two is essential to maintain the requirements stated above. One thumb rule that is often suggested is that about three out of every four queries will be optimized by the aggregation process, whereas the fourth will take its own time to get processed. The second, though minor, advantage of aggregation is that it allows us to see the overall trends in the data. While looking at individual data such overall trends may not be obvious, whereas aggregated data helps us draw certain conclusions easily.
113. Explain the different aspects to be considered when designing a summary table.
Summary tables are designed by following the steps given below:
i) Decide the dimensions along which aggregation is to be done.
ii) Determine the aggregation of multiple facts.
iii) Aggregate multiple facts into the summary table.
iv) Determine the level of aggregation and the extent of embedding.
v) Design time into the table.
vi) Index the summary table.
114. Give the reasons for creating a data mart.
The following are the reasons for which data marts are created:
i) Since the volume of data scanned is small, they speed up query processing.
ii) Data can be structured in a form suitable for a user access tool.
iii) Data can be segmented or partitioned so that they can be used on different platforms and also different control strategies become applicable.
115. Explain the two stages in setting up data marts.
There are two stages in setting up data marts:
i) Decide whether data marts are needed at all. The facts listed above may help you decide whether it is worthwhile to set up data marts or to operate from the warehouse itself. The problem is almost similar to that of a merchant deciding whether he wants to set up retail shops or not.
ii) If you decide that setting up data marts is desirable, then the following steps have to be gone through before you can freeze the actual strategy of data marting:
a) Identify the natural functional splits of the organization.
b) Identify the natural splits of the data.
c) Check whether the proposed access tools have any special database structures.
d) Identify the infrastructure issues, if any, that can help in identifying the data marts.
e) Look for restrictions on access control. They can serve to demarcate the warehouse details.
116. What are the disadvantages of data marts?
There are certain disadvantages:
Meta data should be able to describe data as it resides in the data warehouse. This will help the warehouse manager to control
data movements. The purpose of the metadata is to describe the objects in the database. Some of the descriptions are listed
here.
Tables
- Columns
* Names
* Types
Indexes
- Columns
* Name
* Type
Views
- Columns
* Name
* Type
Constraints
- Name
- Type
- Table
* Columns
120. How does the query manager use the metadata? Explain in detail.
Metadata is also required to generate queries. The query manager uses the metadata to build a history of all queries run and to generate a query profile for each user or group of users.
We simply list a few of the commonly used metadata items for queries; the names are self-explanatory.
o Query
o Table accessed
- Column accessed
* Name
* Reference identifier
o Restrictions applied
- Column name
- Table name
- Reference identifier
- Restrictions
o Join criteria applied
- Column name
- Table name
- Reference identifier
- Column name
- Table name
- Reference identifier
o Aggregate function used
- Column name
Warehouse Manager: The warehouse manager is responsible for maintaining the data of the warehouse. It should also create and maintain a layer of metadata. Some of the responsibilities of the warehouse manager are:
o Data movement
o Metadata management
o Performance monitoring
o Archiving
Data movement includes the transfer of data within the warehouse, aggregation, and the creation and maintenance of tables, indexes and other objects of importance. The warehouse manager should be able to create new aggregations as well as remove old ones. Creation of additional rows/columns, keeping track of the aggregation processes and creating metadata are also its functions.
124. What are the different system management tools used for a data warehouse?
The different system management tools used for a data warehouse are:
i) Configuration managers
ii) Schedule managers
iii) Event managers
iv) Database managers
v) Backup recovery managers
vi) Resource and performance monitors
-----------------------------------------------------------------------------------------------------