
REGULATION 2013 ACADEMIC YEAR: 2017-2018

IFET COLLEGE OF ENGINEERING


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
IT6702/DATA WAREHOUSING AND DATA MINING
UNIT-1 DATA WAREHOUSING
100 % Theory
YR/SEM: III/VI

PART – A (2 marks)

Data Warehousing Components


1. Define data warehouse.
2. Define operational databases.
3. How is a data warehouse different from a database? How are they similar?
(Apr/May 2017)
4. Write a brief outline of MDDBs.
5. List out the functions of SCT tools.
6. Suppose that a data warehouse contains 20 dimensions, each with about five levels of
granularity. Users are mainly interested in four particular dimensions, each having three
frequently accessed levels for rolling up and drilling down. How would we design a data
cube structure to efficiently support this preference?
7. Define Data and Database Heterogeneity.
Data Warehouse
8. List out the characteristics of Data Warehouse (Nov/Dec 2015)
9. What is data warehouse Metadata? (Nov/Dec 2014) (Apr/May 2015)
10. Suppose that a data warehouse contains 10 dimensions, each with about five levels of
granularity. At times, a user may want to drill through the cube, down to the raw data for
one or two particular dimensions. How would you support this feature?

11. Write a brief outline of Technical Metadata.


12. Write short notes on Business Metadata.
13. Suppose that we need to record three measures in a data cube: min, average, and median.
Design an efficient computation and storage method for each measure given that the cube
allows data to be deleted incrementally (i.e., in small portions at a time) from the cube.

14. Mention the characteristics of information directory or metadata.


15. List out the categories of access tools.
Query & Reporting Tools
16. What are the types of query and reporting tools? (Apr/May 2009)
17. A popular data warehouse implementation is to construct a multidimensional database,
known as a data cube. Unfortunately, this may often generate a huge, yet very sparse
multidimensional matrix. Present an example illustrating such a huge and sparse data cube.

IFETCE/CSE/III YR/ VI SEM/IT6702/DWDM/ALL UNIT/QB/VER 1.2



18. What are the business applications of reporting and query tools?
19. State why, for the integration of multiple heterogeneous information sources, many
companies in industry prefer the update-driven approach (which constructs and uses data
warehouses), rather than the query-driven approach (which applies wrappers and
integrators). Describe situations where the query-driven approach is preferable over the
update-driven approach.

20. Define data mining tools. (Nov/Dec 2014)


Data Mart
21. Define the term data mart and explain why an independent data mart is dangerous.
22. A flight data warehouse for a travel agent consists of six dimensions: traveler, departure
(city), departure time, arrival, arrival time, and flight; and two measures: count, and avg
fare, where avg fare stores the concrete fare at the lowest level but average fare at other
levels. Suppose the cube is fully materialized. Starting with the base cuboid [traveler,
departure, departure time, arrival, arrival time, flight], what specific OLAP operations (e.g.,
roll-up flight to airline) should one perform in order to list the average fare per month for
each business traveler who flies American Airlines (AA) from L.A. in the year 2014?

23. What are data marts? (May/Jun 2016)


24. In what situations do business drivers underlie a data mart?
25. A flight data warehouse for a travel agent consists of six dimensions: traveler, departure
(city), departure time, arrival, arrival time, and flight; and two measures: count, and avg
fare, where avg fare stores the concrete fare at the lowest level but average fare at other
levels. Suppose we want to compute a data cube where the condition is that the minimum
number of records is 10 and the average fare is over $500. Outline an efficient cube
computation method (based on common sense about flight data distribution).
26. Mention the problems of Data marts. (Apr/May 2009)
27. What are data cubes? (May/Jun 2016)
Building Data Warehouse
28. What are the factors to build and use data warehouse? (Apr/May 2014)
29. Consider the following multifeature cube query: Grouping by all subsets of {item, region, month},
find the minimum shelf life in 2014 for each group, and the fraction of the total sales due to tuples
whose price is less than $100, and whose shelf life is between 1.25 and 1.5 of the minimum shelf
life. Express the query in extended SQL.

30. When can we implement the top-down approach and mention its advantages?
31. Write a brief outline on bottom-up approach?
32. Consider the following multifeature cube query: Grouping by all subsets of {item, region, month},
find the minimum shelf life in 2014 for each group, and the fraction of the total sales due to tuples
whose price is less than $100, and whose shelf life is between 1.25 and 1.5 of the minimum shelf
life. Is this a distributive multifeature cube? Comment on it.

33. Mention the advantages and disadvantages of bottom-up approach.


34. What are the common characteristics of successful data warehouses?


35. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are
(in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35,
36, 40, 45, 46, 52, 70. What are the mean and median of the data?
36. Why is a data warehouse rather difficult to build?
37. What is the design approach of data warehouse? (Apr/May 2014)
38. What is meant by data content?
39. Suppose a group of 12 sales price records has been sorted as follows: 5; 10; 11; 13; 15; 35; 50; 55;
72; 92; 204; 215. Partition them into three bins using the equal-frequency partitioning method.
40. What is the use of tools in data warehouse?
41. Define OLTP?
42. Suppose a group of 12 sales price records has been sorted as follows: 5; 10; 11; 13; 15; 35; 50; 55;
72; 92; 204; 215. Partition them into three bins using the equal-width partitioning method.
43. What are the nine decisions in the design of data warehouse? (Nov/Dec 2016)
44. Which platform is best to build a successful Data warehouse? (Apr/May 2014)
45. What are the logical steps needed to implement a data warehouse?
46. Write a brief outline of the balanced approach.
47. Define vendor solutions.
48. Write Short notes on prism solutions.
49. What are the options to grow data warehouse for data placement?
50. What is the main idea of database gateways?
51. What are the characteristics of Directory Manager?
52. State why data partitioning is a key requirement for effective parallel execution of database
operations (Nov/Dec 2015)
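
Questions 39 and 42 above ask for equal-frequency and equal-width partitioning of the same twelve prices. A minimal Python sketch of both methods follows. The function names are illustrative, and placing a value that falls exactly on the top boundary into the last bin is an assumption, since boundary conventions differ between textbooks.

```python
def equal_frequency_bins(values, k):
    """Split the sorted values into k bins holding (almost) the same count."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), k)
    bins, start = [], 0
    for i in range(k):
        # The first `extra` bins take one additional value when len is not divisible by k.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def equal_width_bins(values, k):
    """Split the value range [min, max] into k intervals of the same width."""
    lo, hi = min(values), max(values)
    bins = [[] for _ in range(k)]
    if hi == lo:
        # Degenerate case: every value is identical, so everything goes in one bin.
        bins[0] = sorted(values)
        return bins
    width = (hi - lo) / k
    for v in sorted(values):
        # The maximum value would index bin k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_frequency_bins(prices, 3))
print(equal_width_bins(prices, 3))
```

With these prices the equal-width intervals are 70 units wide, so most values land in the first bin, which illustrates how skewed data affects that method.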
Data Warehouse Benefits
53. List out the benefits of Data Warehousing.
54. What are the examples of Tangible Benefits and intangible benefits?
55. Write about parallel database technology.
56. Does the wrong partitioning method create hot spots? Justify. (Nov/Dec 2009)
Data Warehouse Memory Architecture
57. What is shared memory architecture (SMA)? (Apr/May2009)
58. What are the advantages and disadvantages of shared memory systems?
59. Define shared-disk architecture and its characteristics?
60. Suppose a group of 12 sales price records has been sorted as follows: 5; 10; 11; 13; 15; 35;
50; 55; 72; 92; 204; 215. Partition them into three bins using the clustering method.
61. What are the components of DLM?
62. What are the advantages and disadvantages of shared-disk architecture?
63. What are the advantages and disadvantages of SNA?
64. Compare Shared-Nothing Architecture and Shared-Disk Architecture.
65. What are the requirements of Shared-nothing architecture?
Data Dimensional Modeling
66. What are the basic concepts of dimensional modeling? (Apr/May 2014)


67. What is a star schema? Explain with an example. (Or) Define star schema.
(Nov/Dec 2014) (May/Jun 2012) (Nov/Dec 2016)
68. Give an overview of SYBASE IQ.
69. What is bitmapped indexing? (Apr/May 2011)
70. How could we summarize the concept of Data cardinality?
71. What is Data Transformation? Give example. (Apr/May 2011)
72. Write short notes on Data replication tools.
73. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
and the two measures count and charge, where charge is the fee that a doctor charges a
patient for a visit. Write an SQL query assuming the data is stored in a relational database
with the schema fee (day, month, year, doctor, hospital, patient, count, charge).
74. List out the features of Metacenter?
75. List out the benefits of the Integrity tool.
Metadata
76. What is metadata dictionary?
77. Define Metadata with an example. (Apr/May 2015) (Nov/Dec 2014)
78. List the contents of meta data Repository (May/Jun 2016)
79. Draw the framework of Metadata Interchange Framework.
80. List the components of Metadata interchange frameworks.

PART- B (13 marks)


1. Give the steps for design and construction of Data warehouses and explain with three
tier architecture diagram. Or Explain Seven components of Data warehouse
architecture with neat diagram. (Apr 2015) (May 2016) (Nov 2016) (13)
2. Discuss DBMS Schemas for decision Support. Describe Performance problem with
star schema. (Nov 2016) (13)
3. What is Data warehouse? Diagrammatically illustrate and discuss the data warehousing
architecture? (Nov 2011) (Nov 2014) (Apr 2015) (13)
4. (i) List and discuss the steps involved in mapping the data warehouse to a
multiprocessor architecture. (Nov 2014) (Apr 2014,2017) (Nov 2011) (7)
(ii) Explain the evolution of database technology. (Nov 2014) (6)
5. (i) Explain with diagrammatic illustration the relationship between operational
data, a data warehouse and data marts. (5)
(ii) “A data warehouse can be modeled by either a star schema or a snowflake
schema”. With relevant examples discuss the two types of schema.
(Nov 2015) (8)
6. Discuss Data Extraction, Clean up and transformation tools with Meta data
management. (Apr 2017) (13)


7. (i) Explain access tools and data marts. (6)
(ii) What are the benefits of building a data warehouse? Explain. (7)
8. What is Meta Data? Classify Meta data and explain the same (May 2014) (13)
9. Briefly compare the following concepts. You may use an example to explain your points.
(a) Snowflake schema, fact constellation, starnet query model
(b) Data cleaning, data transformation, refresh
(c) Enterprise warehouse, data mart, virtual warehouse (13)
10. Suppose that a data warehouse for Big University consists of the following four dimensions:
student, course, semester, and instructor, and two measures count and avg_grade. When at
the lowest conceptual level (e.g., for a given student, course, semester, and instructor
combination), the avg_grade measure stores the actual course grade of the student. At higher
conceptual levels, avg_grade stores the average grade for the given combination.
(a) Draw a snowflake schema diagram for the data warehouse.

(b) Starting with the base cuboid [student; course; semester; instructor], what specific OLAP
operations (e.g., roll-up from semester to year) should one perform in order to list the
average grade of CS courses for each Big University student.

(c) If each dimension has five levels (including all), such as "student < major < status <
university < all", how many cuboids will this cube contain (including the base and apex
cuboids)? (13)

11. Design a data warehouse for a regional weather bureau. The weather bureau has about 1,000
probes, which are scattered throughout various land and ocean locations in the region to
collect basic weather data, including air pressure, temperature, and precipitation at each hour.
All data are sent to the central station, which has collected such data for over 10 years. Your
design should facilitate efficient querying and on-line analytical processing, and derive
general weather patterns in multidimensional space. (13)

12. Suppose that a data warehouse consists of the four dimensions, date, spectator, location, and
game, and the two measures, count and charge, where charge is the fare that a spectator pays
when watching a game on a given date. Spectators may be students, adults, or seniors, with
each category having its own charge rate.
(a) Draw a star schema diagram for the data warehouse.
(b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP
operations should one perform in order to list the total charge paid by student spectators
at GM Place in 2014?
(c) Bitmap indexing is useful in data warehousing. Taking this cube as an example,
briefly discuss advantages and problems of using a bitmap index structure. (13)
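
For question 10(c) above, the total number of cuboids in a cube with concept hierarchies is commonly computed as the product, over all dimensions, of the number of levels per dimension counted with the top level all. A minimal Python sketch under that assumption follows; the function name is illustrative, and the input assumes (as the question states) that the five levels already include all.

```python
from math import prod


def total_cuboids(levels_including_all):
    """Return the number of cuboids, given each dimension's level count
    (each count already includes the virtual top level 'all')."""
    return prod(levels_including_all)


# Four dimensions (student, course, semester, instructor), five levels each.
print(total_cuboids([5, 5, 5, 5]))  # 625
```

With no hierarchies, each dimension has two levels (the attribute itself and all), which reduces to the familiar 2^n count.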


PART-C (15 MARKS)

1. Describe the differences between the following approaches for the integration of a data
mining system with a database or data warehouse system: no coupling, loose coupling,
semitight coupling, and tight coupling. State which approach you think is the most
popular, and why. (15)
2. Briefly describe the following advanced database systems and applications: object-
relational databases, spatial databases, text databases, multimedia databases, the World
Wide Web. (15)
3. State why for the integration of multiple heterogeneous information sources many
companies in industry prefer the update-driven approach (which constructs and uses
data warehouses), rather than the query-driven approach (which applies wrappers and
integrators). Describe situations where the query-driven approach is preferable over the
update-driven approach. (15)

4. Recent applications pay special attention to spatiotemporal data streams. A
spatiotemporal data stream contains spatial information that changes over time, and is in
the form of stream data, i.e., the data flow in-and-out like possibly infinite streams.
(a) Present three application examples of spatiotemporal data streams.
(b) Discuss what kind of interesting knowledge can be mined from such data
streams, with limited time and resources.
(c) Identify and discuss the major challenges in spatiotemporal data mining.
(d) Using one application example, sketch a method to mine one kind of
knowledge from such stream data efficiently. (15)
5. Robust data loading poses a challenge in database systems because the input data are
often dirty. In many cases, an input record may have several missing values and some
records could be contaminated (i.e., with some data values out of range or of a different
data type than expected). Work out an automated data cleaning and loading algorithm so
that the erroneous data will be marked and contaminated data will not be mistakenly
inserted into the database during data loading. (15)


UNIT– 2 BUSINESS ANALYSIS


PART – A (2 marks)

Reporting and Query tools:


1. Write short notes on Leading Report writers.
2. How would you classify the types of reporting tools? Explain.
3. Define the categories of tools in business analysis. (Nov/Dec 2014)
4. What are the different types of tools for user and related activities?
Tool Categories:
5. Define Meta layer in managed query tools.
6. What is the use of EIS?
7. List out the tools of data mining?
8. What is the use of Point and click tool?
9. Mention the various access types used in data warehouse.
10. How can we identify the distinct types of reporting?
Cognos Impromptu:
11. What is the overview of impromptu?
12. Is Impromptu a database reporting tool? Why?
13. Write short notes on user acceptance of Impromptu.
14. Define Information catalog?
15. How to create a catalog?
16. List the different types of prompt in Cognos?
17. What is object oriented architecture?
18. What activities are controlled by governors?
19. How does Impromptu increase the value of distributed standard reports?
20. What types of variables can be used to get the possible Impromptu outcomes?
21. Write a brief outline of frames and mention their types.
22. What is the main idea of Impromptu Request server?
23. What are the supported databases for Impromptu Server?
24. List out the features of Impromptu.
OLAP and Multidimensional OLAP:
25. What is Multidimensional OLAP? (May/Jun 2015)
26. What is the multidimensional data model? Give example. (Apr/May 2017)
27. List out the needs and limitations of OLAP.
28. State the needs of a multidimensional data model (Nov/Dec 2015)
29. Where is a multidimensional data model typically used?
(May/Jun 2015) (Apr/May 2015)
30. Which server tier is responsible for transforming information to Cognos 8 Application
Server?
31. What are the operations in multidimensional data model?
OLAP Guidelines:


32. List OLAP Guidelines. (Nov/Dec 2016)


33. Write short notes on Transparency and Accessibility in OLAP Guidelines.
34. Define consistent reporting performance & multiuser support?
35. List the distinct features of OLTP with OLAP. (Apr/May 2017)
36. Write a brief outline of Multidimensional conceptual view and client/server architecture?
37. What is Generic dimensionality and Dynamic sparse matrix handling?
38. Write about unrestricted cross – dimensional operations and intuitive manipulation.
39. How would you summarize Flexible reporting and unlimited dimensions and aggregation
levels?
40. Write about comprehensive DBM tools and source record level.
Multidimensional versus Multi-relational OLAP:
41. Can you make a distinction between Multidimensional and Multirelational OLAP?
42. What is multi-relational OLAP? (Apr/May 2015)
43. Define Star Schema Approach?
44. Define ER Model.
45. Mention the schemas for multidimensional databases.
46. Write an example for Star Schema. (May/Jun 2012) (Nov/Dec 2014)
47. What are the main characteristics of star schema? (Nov/Dec 2014)
48. Explain the concept of snowflake schema and Fact Constellation schema with an
example?
49. How can we define a multidimensional schema for my data?
50. What is the syntax of cube definition and dimension definition statement?
51. What was the percent change in market share for a grouping of my top 20% of products
for the current three-month period versus same period year ago for accounts that grew
by more than 20 percent in revenue?
52. How can measures be categorized?
53. Write in your own words about distributive measure?
54. Write Short notes on algebraic and Holistic measure.
55. What is concept hierarchy? (Apr/May 2008)
56. How are concept hierarchies useful in OLAP?
57. Write a brief outline of the starnet model.
58. What are the classifications of tools for data mining? (Apr/May 2011)
Categories of Tools:
59. How OLAP tools assume data?
60. How would you describe MOLAP and draw its architecture.
61. How would you describe ROLAP and draw its architecture.
62. What can you say about Managed query environment (MQE)
63. List the features of web-enabled data access?
64. What is the difference between first-generation, second-generation, and third-generation websites?


65. Point-of-sale data and sales made via call center or the web are stored in different
locations and formats. It would be a time-consuming process for an executive to obtain OLAP
reports such as: What are the most popular products purchased by customers between
the ages of 15 and 30?
66. Mention the vendor approaches for deploying tools in web.
67. What was the main idea of HTML Publishing and helper application approach in web?
68. Write about Server-centric components, Java and ActiveX applications approaches in
web.
OLAP Tools and the Internet:
69. How would you summarize Arbor Essbase Web?
70. Define OLAP tool. (Apr/May 2010)
71. Comment on OLAP Tools on Internet. (Nov/Dec 2016)
72. What is Virtual warehouse? (Nov/Dec 2014)
73. Define Micro Strategy DSS Web?
74. What is Brio Technology?
75. Mention the advantages and disadvantages of MOLAP.
76. List out the advantages and disadvantage of ROLAP.
77. What are the reasons to build a query and reporting environment?
78. Define Power builder.
79. What do you think about application painter and Window painter?
80. What is meant by Data windows painter?
81. Write about Database painter and structure painter.
82. Write a note on function painter and user object painter.

PART- B (13 marks)


1. List and discuss the basic features that are provided by reporting and query tools
used for business analysis. (Or) Discuss different tool categories in data warehouse
business analysis. (Nov/Dec '16) 13
2. Highlight the features of the reporting and query tool COGNOS Impromptu.
(Nov/Dec '15 '16) (April/May '15 '17) 13
3. List and explain the typical OLAP operations for multidimensional data with
suitable examples and diagrammatic illustrations.
(May/Jun '16) (Apr/May '15) (Nov/Dec '14) 13
4. (A) Distinguish between Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP). (Nov/Dec '16) (Nov/Dec '14) 10
(B) What are the categories of aggregate functions? 3

5. (A) Perform a comparative study between MOLAP and ROLAP. 7


(B) Explain with diagrammatic illustration managed query environment (MQE)
architecture. 6


6. What are the different types of OLAP servers? (May/Jun '16) 13

7. Summarize multi-dimensional data model and illustrate the different schemas for
multidimensional model. (Apr/May '17 '15) (May/Jun '12) 13

8. Explain different categories of OLAP Tools with diagram. (Apr/May '17) 13

9. In data warehouse technology, a multiple dimensional view can be implemented by a
relational database technique (ROLAP), or by a multidimensional database technique
(MOLAP), or by a hybrid database technique (HOLAP).
(a) Briefly describe each implementation technique.
(b) For each technique, explain how each of the following functions may be
implemented:
i. The generation of a data warehouse (including aggregation)
ii. Roll-up
iii. Drill-down
iv. Incremental updating
(c) Which implementation techniques do you prefer, and why? (Nov/Dec '15) 13

10. (i) Design a multi-dimensional data model for a hospital data warehouse consisting of three
dimensions (time, doctor, and patient) and two measures (count and charge), where
charge is the fee that a doctor charges a patient for a visit. (3+3)
(1) Enumerate three classes of schema that are popularly used for modeling
data warehouses.
(2) Draw a schema diagram for the above data warehouse using all of the
schema classes listed in (1).

(ii) How to reduce the size of the fact table? Explain with an example.
(Nov/Dec '14) 7
11. Regarding the computation of measures in a data cube: 13
(a) Enumerate three categories of measures, based on the kind of aggregate functions
used in computing a data cube.
(b) For a data cube with the three dimensions time, location, and item, which category
does the function variance belong to? Describe how to compute it if the cube is
partitioned into many chunks.
Hint: The formula for computing variance is $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$, where $\bar{x}$ is the average
of the $N$ values $x_i$.
(c) Suppose the function is "top 10 sales." Discuss how to efficiently compute this
measure in a data cube.

12. With relevant examples discuss multidimensional online analytical processing and multi
relational online analytical processing. 13
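
Question 11(b) above asks how variance can be computed when the cube is partitioned into chunks. One common approach, sketched here in Python, treats variance as an algebraic measure: each chunk stores only its count, sum, and sum of squares, and these three distributive values are added across chunks before the final formula is applied. The function names and sample chunks are illustrative, the population form (dividing by N) is assumed, and the sum-of-squares form can lose precision on large or nearly equal values.

```python
from functools import reduce


def summarize(chunk):
    """Per-chunk summary: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def merge(a, b):
    """Combine two chunk summaries by component-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def variance(summary):
    """Population variance from a merged summary: E[x^2] - (E[x])^2."""
    n, total, total_sq = summary
    if n == 0:
        raise ValueError("variance is undefined for an empty cube")
    mean = total / n
    return total_sq / n - mean * mean


# Hypothetical measure values held in three separate chunks.
chunks = [[2, 4], [4, 4, 5], [5, 7, 9]]
combined = reduce(merge, (summarize(c) for c in chunks))
print(variance(combined))  # 4.0
```

Because the summaries merge by addition, chunks can be processed in any order or in parallel, and new chunks can be folded in incrementally.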


PART-C (15 MARKS)

1. Design a data warehouse for a regional weather bureau. The weather bureau has about
1,000 probes, which are scattered throughout various land and ocean locations in the
region to collect basic weather data, including air pressure, temperature, and precipitation
at each hour. All data are sent to the central station, which has collected such data for
over 10 years. Your design should facilitate efficient querying and on-line analytical
processing, and derive general weather patterns in multidimensional space. (15)
2. What are hypercubes? How do they apply in an OLAP system? (15)
3. A popular data warehouse implementation is to construct a multidimensional database,
known as a data cube. Unfortunately, this may often generate a huge, yet very sparse
multidimensional matrix. Present an example illustrating such a huge and sparse data
cube. (15)
4. Suppose that a data warehouse contains 20 dimensions, each with about five levels of
granularity.
(a) Users are mainly interested in four particular dimensions, each having three
frequently accessed levels for rolling up and drilling down. How would you design a data
cube structure to efficiently support this preference?
(b) At times, a user may want to drill through the cube, down to the raw data for one or
two particular dimensions. How would you support this feature? (15)
5. For class characterization, what are the major differences between a data cube based
implementation and a relational implementation such as attribute-oriented induction?
Discuss which method is most efficient and under what conditions this is so. (15)

UNIT– 3 – DATA MINING (100% THEORY)

PART- A (2 MARKS)
Introduction:
1. What are the evolutionary paths in the development of database system?
2. What motivated data mining? Why is it so important?
Data Mining:
3. Define and Draw the architecture of Data mining system.
4. Is data mining a simple transformation of technology developed from databases, statistics,
and machine learning?
5. Present an example where data mining is crucial to the success of a business. What data
mining functions does this business need? Can they be performed alternatively by data
query processing or simple statistical analysis?
6. What is KDD? What are the steps involved in KDD?
7. List some of the data mining techniques?


8. Suppose your task as a software engineer at Big-University is to design a data mining
system to examine their university course database, which contains the following
information: the name, address, and status (e.g., undergraduate or graduate) of each
student, the courses taken, and their cumulative grade point average (GPA). Describe the
architecture you would choose. What is the purpose of each component of this
architecture?
9. Write a brief outline of pattern evaluation.
10. Define user interface module?
11. Briefly describe the following advanced database systems and applications: object-
relational databases, spatial databases, text databases, multimedia databases, the World
Wide Web.
12. List out the applicable fields of the discovered knowledge.
Data and types of Data:
13. What are the different data repositories to perform mining?
14. What are the types of data? (Nov/Dec 2014)
15. Explain how the evolution of database technology led to data mining
16. Describe why concept hierarchies are useful in data mining.
17. What is the difference between a data warehouse and a data mart?
18. Write a brief outline of relational databases and Transactional Databases?
19. Outliers are often discarded as noise. However, one person’s garbage could be another’s
treasure. For example, exceptions in credit card transactions can help us detect the
fraudulent use of credit cards. Taking fraudulence detection as an example, propose two
methods that can be used to detect outliers and discuss which one is more reliable.
20. Draw the typical framework of a data warehouse for an electronics
21. List out the various kinds of advanced data information systems?
22. Define the following
1. Object-Relational Databases.
2. Text Databases and
3. Multimedia databases
23. How would you summarize the concept of temporal databases and sequence Database.
24. Provide an example of what you mean by spatial databases?
25. Data quality can be assessed in terms of accuracy, completeness, and consistency.
Propose two other dimensions of data quality.
26. Write short notes on heterogeneous and legacy databases.
27. In many applications, new data sets are incrementally added to the existing large data
sets. Thus an important consideration for computing descriptive data summary is whether
a measure can be computed efficiently in incremental manner. Use count, standard
deviation, and median as examples to show that a distributive or algebraic measure
facilitates efficient incremental computation, whereas a holistic measure does not.
28. List out the features of Data streams?


29. Illustrate the concept of Web usage mining or web log mining?
30. Use the two methods below to normalize the following group of data: 200, 300, 400, 600,
1000. (a) min-max normalization by setting min = 0 and max = 1 (b) z-score normalization.
31. Summarize the web search services of data mining?
32. List out the primitives that satisfy a data mining task.
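
Question 30 above applies min-max and z-score normalization to five values. A minimal Python sketch of both methods follows. The function names are illustrative, and the z-score uses the population standard deviation (`statistics.pstdev`), which is an assumption: some texts divide by N - 1 instead, which changes the scores slightly.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [min, max] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Express each value as its distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("z-score normalization is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


data = [200, 300, 400, 600, 1000]
print(min_max(data))
print([round(z, 3) for z in z_score(data)])
```

Min-max preserves the relative spacing of the values inside a fixed range, while z-score is less sensitive to a single extreme value setting the scale.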
Data Mining Functionalities:
33. List out the data mining functionalities. (April/May 2015)
34. How would you summarize class/concept descriptions and give an example?
35. How to derive the concept / class descriptions?
36. Define Data characterization and data discrimination?
37. Illustrate the concept of frequent patterns and structured pattern?
38. Use a flow chart to summarize the stepwise forward selection procedures for attribute
subset selection.
39. Write in your own words about Multidimensional association rules?
40. Use a flow chart to summarize the stepwise backward elimination procedures for
attribute subset selection .
41. Write Short notes on Classification and Prediction.
42. Define the concept of Decision tree and Neural Network.
43. Illustrate what you think about regression analysis and relevance analysis.
44. Use a flow chart to summarize a combination of forward selection and backward
elimination procedures for attribute subset selection.
45. Define the concept of cluster analysis?
46. Write two methods that can be used to detect outliers and discuss which one is more
reliable?
Interestingness of Patterns
47. Define the term interestingness of patterns. (Nov/Dec 2014)
48. Explain what is meant by subjective interestingness measures?
49. How are the buckets determined and the attribute values partitioned?
50. Can a data mining system generate only interesting patterns? Can it generate all of
the interesting patterns? Discuss.
Classification of Data Mining Systems
51. List out the categories of data mining systems?
52. Define a pattern. (Nov/Dec 2015)
53. Write in your own words about Meta learning?
54. What are the classifications of Data mining systems?
55. What is descriptive and predictive data mining?
56. What is data mining query?
Data mining task primitives:
57. List the primitives for specification of a data mining task. (Apr/May 2017)
58. What do you think about DMQL?


59. Mention the steps involved in the class comparison procedure.


60. How to integrate or couple the DM system with a DB system?
61. What are the possible integration schemes of Data mining systems?
Issues:
62. List out the issues in data mining.
Data preprocessing:
63. List out the techniques of data preprocessing?
64. Why is data preprocessing an important issue for data warehousing and data
mining? (Apr/May 2015) (Nov/Dec 2015)
65. State the need for data cleaning. (Or) Why are data cleaning routines needed?
(Apr/May 2015)
66. What are the different steps in Data Transformation? (Nov/Dec 2016)
67. Why we need data transformation? Mention the ways by which data can be
transformed. (Apr/May 2017)
68. What is data discretization? (Apr/May 2017)
69. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. What is the mean of the data? What is the
median?
70. List out the uses of descriptive data summarization?
71. List out the measures of central tendency and data dispersion.
72. Explain the concept of distributive measure and algebraic measure?
73. Illustrate the concept of holistic measure and Mid-range?
74. Write a note on Frequency histograms.
75. How can you go about filling in the missing values for Data cleaning?
76. How can we “smooth” out the data to remove the noise?
77. Can you explain the reasons for missing values in data cleaning?
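
Worked sketch for question 69 (not part of the original question bank): one possible way to check the mean and median of the age data, using only Python's standard `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import mean, median

# Age values copied from question 69 (already sorted in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

age_mean = mean(ages)      # sum of the values divided by their count
age_median = median(ages)  # middle value of the sorted list (odd count)

print(f"n = {len(ages)}, mean = {age_mean:.2f}, median = {age_median}")
```

With 27 values the median is simply the 14th value of the sorted list, while the mean is pulled upward by the few large ages at the end of the list.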
Integration of a Data Mining System with a Data Warehouse:
78. Define Data Integration and its issues
79. Write a note on min-max and z-score normalization.
80. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Use smoothing by bin means to smooth the
above data, using a bin depth of 3. Illustrate your steps. Comment on the effect of this
technique for the given data.
81. Define Support & confidence (May/Jun 2016)
82. Illustrate the concept of data reduction and its strategies
83. State why concept hierarchies are useful in data mining. (Nov/Dec 2012)
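
Worked sketch for question 80 (not part of the original question bank): a minimal illustration of smoothing by bin means with equal-depth bins of 3, assuming the sorted age list given in the question. The function name `smooth_by_bin_means` and the rounding to two decimals are assumptions made for readability.

```python
def smooth_by_bin_means(sorted_values, depth):
    """Equal-depth binning: replace every value by the mean of its bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every member of the bin is replaced by the (rounded) bin mean.
        smoothed.extend([round(bin_mean, 2)] * len(bin_values))
    return smoothed

# Age values from question 80, in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

smoothed = smooth_by_bin_means(ages, depth=3)
for i in range(0, len(ages), 3):
    print(ages[i:i + 3], "->", smoothed[i:i + 3])
```

Printing each bin next to its smoothed values makes the effect visible: local fluctuations inside a bin disappear, but a bin that contains an extreme value (such as 70) has its mean shifted toward that value.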

PART – B (13 MARKS)

1. (i) What is data preprocessing? Explain the various data reduction techniques.
Or
Why do we need to preprocess data? What are the different forms of preprocessing?
(Apr/May 2017) (9)
(ii) Explain the basic methods for data cleaning. (May/June 2016) (4)

2. a) Explain with diagrammatic illustration data mining as a step in the process of
knowledge discovery (Apr/May15) (Nov/Dec15) (8)
b) What is evolution analysis? Give an example. (5)
3. State and explain the various classifications of data mining systems with example.
(Nov/Dec14) (13)
4. What is interestingness of a pattern? Explain the integration of data mining system
with a data warehouse. (May/Jun16) (13)
5. Explain different strategies of Data Reduction. (Nov/Dec16) (13)
6. What is the use of data mining tasks? What are the basic types of data mining
tasks? Explain with examples. (or) List and discuss about the primitives involved in
specifying a data mining task. (Apr/May15) (Nov/Dec15) (13)
7. a) Describe Data Discretization and concept hierarchy generation. State why
concept hierarchies are useful in data mining (Nov/Dec 16) (7)
b) Explain with diagrammatic illustration how data mining acts as a confluence of
multiple disciplines. (Nov/Dec15) (6)
8. Explain the various data mining issues and functionalities in detail
(May/Jun16)(Nov/Dec14) (13)

9. Describe in detail data mining functionalities and the different kinds of patterns that
can be mined. (Apr/May17) (13)

10. Outline the major research challenges of data mining in one specific application domain,
such as stream/sensor data analysis, spatiotemporal data analysis, or bioinformatics.
(13)
11. Recent applications pay special attention to spatiotemporal data streams. A
spatiotemporal data stream contains spatial information that changes over time, and is
in the form of stream data, i.e., the data flow in and out like possibly infinite streams.
(a) Present three application examples of spatiotemporal data streams.
(b) Discuss what kind of interesting knowledge can be mined from such data streams,
with limited time and resources.
(c) Identify and discuss the major challenges in spatiotemporal data mining.
(d) Using one application example, sketch a method to mine one kind of knowledge from
such stream data efficiently. (13)

PART- C (15 MARKS)

6. Present an example where data mining is crucial to the success of a business. What data
mining functions does this business need? Can they be performed alternatively by data
query processing or simple statistical analysis? (15)
7. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and
machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge
discovery. (15)
8. Data quality can be assessed in terms of accuracy, completeness, and consistency.
Propose two other dimensions of data quality. (DATA PREPROCESSING) (15)
9. Give three additional commonly used statistical measures (i.e., not illustrated in this
chapter) for the characterization of data dispersion, and discuss how they can be
computed efficiently in large databases. (15)
10. Propose and outline a level-shared mining approach to mining multilevel association
rules in which each item is encoded by its level position, and an initial scan of the
database collects the count for each item at each concept level, identifying frequent and
subfrequent items. Comment on the processing cost of mining multilevel associations
with this method in comparison to mining single-level associations. (15)

UNIT-4 ASSOCIATION RULE MINING AND CLASSIFICATION
(100% THEORY)
PART – A (2 Marks)

FREQUENT PATTERNS:
1. What are frequent patterns? (Nov 2007)
2. Prove that all nonempty subsets of a frequent itemset must also be frequent.
3. What is market basket analysis? (May 2009)
4. Write the frequency notation of an itemset.
5. When is an itemset X said to be closed?
6. Mention the criteria for classifying frequent pattern mining.
7. Explain frequent pattern mining based on the completeness of patterns.
8. How would you mine frequent patterns based on the levels of abstraction?
9. Explain frequent pattern mining based on data dimensions?
10. Give an example of frequent pattern mining based on the kinds of rules?
11. Suppose that frequent itemsets are saved for a large transaction database, DB. Discuss how
to efficiently mine the (global) association rules under the same minimum support threshold
if a set of new transactions, denoted as ∆DB, is (incrementally) added in.
ASSOCIATION RULES:
12. Prove that the support of any nonempty subset s′ of itemset s must be at least as great as the
support of s.
13. What is the main idea of structural pattern matching?
14. What is rule-based classification? (Nov 2011)
15. How to mine association rules from large databases? (Nov 2007)
16. List the interesting measures for association rules.
(Apr/May 2008,2009) (Nov 2012)
APRIORI ALGORITHM:
17. State the Apriori property.
18. What is an antimonotone?
19. Give the properties of Apriori algorithm.
20. Write an outline of the prune step in the Apriori algorithm.
21. List the methods to improve Apriori's efficiency. (Nov 2016)
22. What is conditional probability?
23. Give a note on hash based techniques.
24. What is Partitioning? Give example.
25. Illustrate the main idea of local frequent itemset?
26. Write the definition of Sampling?
27. What is dynamic item set counting?
FP TREE:
28. What is frequent-pattern growth? (May 2010)
29. Generalize the special features of frequent pattern tree and closed frequent item sets?
30. The price of each item in a store is nonnegative. For each of the following cases, identify the
kinds of constraint they represent and briefly discuss how to mine such association rules
efficiently.
(a) Containing at least one Nintendo game
(b) Containing items the sum of whose prices is less than $150
31. State the need for merging data items.
32. Write about sub- Item pruning.
33. Write about Boolean association rules. (Nov 2009)
34. List the techniques used to improve the efficiency of Apriori algorithm. (May 2010)
35. Mention few approaches to mining Multilevel Association Rules. (Nov 2010)
36. Point out the importance of item merging in data pruning.
37. Summarize the steps for construction of an FP-tree.
38. Generalize the special feature for FP-growth tree algorithm.
39. Point out the interestingness measures of multilevel association rules.
40. What is group based support?
ASSOCIATION CLASSIFICATION:
41. Give example for intra-dimensional association and multidimensional association.
42. Summarize about inter-dimensional and hybrid-dimensional association rules.
43. Show the attributes used in categorical and quantitative attributes.
44. Association rule mining often generates a large number of rules. Discuss effective methods
that can be used to reduce the number of rules generated while still preserving most of the
interesting rules.
45. Point out the measures of two - dimensional quantitative association rules?
BINNING & ITS STRATEGIES:
46. Define binning. List the common three binning strategies.
47. Compare grid based technique and non-grid based technique.
CONSTRAINT BASED MINING:
48. What is constraint based mining?
49. Give the various constraints that are included in CBM.
50. Define the term interestingness of patterns. (Nov 2014)
51. What is Lift measure?
52. Give the formula for different pattern evaluation measures.
CORRELATION ANALYSIS:
53. Give an example of correlation analysis using χ².
54. What is correlation analysis? (May 2012, Nov 2011, May 2011)
55. How can metarules be used to guide the mining process?
56. List the classification of rule constraint.
57. Compare antimonotonic and monotonic.
58. Differentiate between succinct constraints and convertible constraints.
59. Why is association-based classification able to achieve higher classification accuracy than a
classical decision-tree method?
60. Write about classification accuracy.
PREDICTION:
61. How is data prediction different from classification?
62. What is boosting? State why it may improve the accuracy of decision tree induction.
(May2016)
63. Give a note on relevance analysis.
64. The price of each item in a store is nonnegative. The store manager is only interested in rules
of the form: "one free item may trigger $200 total purchases in the same transaction." State
how to mine such rules efficiently.
65. Write about decision tree induction. (Nov 2012, May 2015)
66. How do you choose best split while constructing a decision tree? (May 2014)
67. Elucidate the two phases involved in decision tree induction. (Nov 2016)
68. It is important to calculate the worst-case computational complexity of the decision tree
algorithm. Given data set D, the number of attributes n, and the number of training tuples
|D|, show that the computational cost of growing a tree is at most n × |D| × log(|D|).
69. List the conditions for terminating recursive partitioning.
SELECTION MEASURES:
70. Give the formula for gain ratio and gini index.
71. What is the use of pruning in decision tree construction? (May 2013 / May2016)
72. What is the drawback of using a separate set of tuples to evaluate pruning?
73. Differentiate between prepruning and postpruning.
74. What is a support vector machine? (May 2011)
75. Develop a scalable SVM algorithm for efficient SVM classification in large datasets. (S).
BAYESIAN CLASSIFICATION:
76. What is naïve Bayesian classification? How is it different from Bayesian classification?
(May 2012)
77. State Bayes' theorem. (May 2016)
78. What is a lazy learner? Give an example. (Nov 2014) (Apr 2017)
79. How do you evaluate accuracy of a classifier? (Apr 2017)
80. Design an efficient method that performs effective naive Bayesian classification over an
infinite data stream.
81. Compare the advantages and disadvantages of eager classification (e.g., decision tree,
Bayesian, neural network) versus lazy classification (e.g., k-nearest neighbor, case-based
reasoning).
82. Write about BOAT.
83. Define pessimistic pruning.
PART-B (13 Marks)

1. a. Distinguish classification and prediction. State the issues regarding classification and
prediction. (May 2016) (4)
b. Give the algorithm for Decision Tree Induction and explain with an example.
(Nov 2012)(Nov 2011)(May 2012)(May2016) (9)
2. a. Explain about classification by Backpropagation in detail. (7)
b. Discuss in detail about constraint-based association mining.
(Apr 2017) (May 2012) (6)
3. a. Write and explain algorithm for mining frequent itemsets without candidate
generation. (May 2014) (7)

b. A database has nine transactions. Let min_sup = 30%.
(Nov 2012, May 2014) (6)
TID List of Item_IDs
1 a, b, e
2 b, d
3 b, c
4 a, b, d
5 a, c
6 b, c
7 a, c
8 a, b, c, e
9 a, b, c

Find all the frequent itemsets using the above algorithm.

4. Find all frequent itemsets for the given training set using Apriori and FP-growth
respectively. Compare the efficiency of the two mining processes. (Nov 2016) (13)
TID Items_bought
T100 {M,O,N,K,E,Y}
T200 {D,O,N,K,E,Y}
T300 {M,A,K,E}
T400 {M,U,C,K,Y}
T500 {C,O,O,K,I,E}

5. Apply the Apriori algorithm for discovering frequent itemsets to the following dataset.
(Nov 2011, May 2013, May2015, Nov/Dec15) (13)

Trans_ID Items purchased
101 Kiwi, grapes, star fruit
102 Kiwi, Gooseberry
103 Gooseberry, pear
104 Kiwi, grapes, star fruit
105 Lemons, star fruit
106 Lemon
107 Lemon, gooseberry
108 Kiwi, grapes, mango, star fruit
109 Mango, pear
110 Kiwi, grapes, star fruit
Use 0.3 for the minimum support value. Illustrate each step of the Apriori algorithm.
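
Worked sketch for question 5 (not part of the original question bank): a compact, unoptimized Apriori-style level-wise search over the ten transactions above with a minimum support of 0.3. It assumes that item names are compared case-insensitively and that "Lemons" and "Lemon" denote the same item; the function and variable names are illustrative.

```python
from itertools import combinations

# Transactions from question 5. Assumption: item names are lower-cased and
# "Lemons" is treated as the same item as "Lemon".
transactions = {
    101: {"kiwi", "grapes", "star fruit"},
    102: {"kiwi", "gooseberry"},
    103: {"gooseberry", "pear"},
    104: {"kiwi", "grapes", "star fruit"},
    105: {"lemon", "star fruit"},
    106: {"lemon"},
    107: {"lemon", "gooseberry"},
    108: {"kiwi", "grapes", "mango", "star fruit"},
    109: {"mango", "pear"},
    110: {"kiwi", "grapes", "star fruit"},
}
MIN_SUPPORT = 0.3


def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    baskets = list(transactions.values())
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # L1: frequent 1-itemsets.
    items = {item for basket in baskets for item in basket}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        # Count step: keep the candidates that meet the minimum support.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


frequent = apriori(transactions, MIN_SUPPORT)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The join, prune and count steps are kept as three separate statements so that each stage of the algorithm can be traced by hand against the printed result.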

6. Discuss the Apriori algorithm for mining frequent itemset with an example in detail.
(Nov 2014, May2011, May 2010, May 2012) (May 2016) (13)

7. With an example, explain the various attribute selection measures in classification.
(Nov 2014) (13)

8. What is classification? Develop an algorithm for classification using Bayesian
classification. Explain with example. (May 2013, May 2012, May 2015) (13)

Or

State Bayes' theorem of posterior probability and explain the working of a
Bayesian classifier with an example. (Or) Explain Naïve Bayesian classification with
a sample example. (Nov 2016) (13)

9. Explain Rule based classification. Give an example and explain in detail.
(Nov 2014, May 2011, May 2012) (13)
10. Discuss the single dimensional Boolean association rule mining for transaction
database. (Apr 2017) (13)

11. a. What is classification? With an example explain how support vector machines can be
used for classification. (Nov 2011) (9)

b. What are the prediction techniques supported by a data mining system?
(Nov 2011) (4)

PART-C (15 MARKS)

1. Give a short example to show that items in a strong association rule may actually be
negatively correlated. (15)
2. Sequential patterns can be mined in methods similar to the mining of association rules.
Design an efficient algorithm to mine multilevel sequential patterns from a transaction
database. An example of such a pattern is the following: “A customer who buys a PC will
buy Microsoft software within three months”, on which one may drill down to find a
more refined version of the pattern, such as “A customer who buys a Pentium PC will
buy Microsoft Office within three months”. (15)
3. In many applications, new data sets are incrementally added to the existing large data
sets. Thus an important consideration for computing descriptive data summary is whether
a measure can be computed efficiently in incremental manner. Use count, standard
deviation, and median as examples to show that a distributive or algebraic measure
facilitates efficient incremental computation, whereas a holistic measure does not. (15)
4. Suppose that you are in the market to purchase a data mining system.
(a) Regarding the coupling of a data mining system with a database and/or data
warehouse system, what are the differences between no coupling, loose coupling,
semi-tight coupling, and tight coupling?
(b) What is the difference between row scalability and column scalability?
(c) Which feature(s) from those listed above would you look for when selecting a
data mining system? (15)
5. Write pseudocode for the automatic generation of a concept hierarchy for numeric data
based on the equal frequency partitioning rule. (15)
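
Illustrative sketch for question 3 above (not part of the original question bank): one way to see why count (distributive) and standard deviation (algebraic) can be maintained incrementally while the median (holistic) cannot. The two data batches are hypothetical, and the helper names `summarize`, `merge` and `std_dev` are assumptions.

```python
import math
from statistics import median


def summarize(chunk):
    """Distributive building blocks: count, sum and sum of squares."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def merge(a, b):
    """Combine two summaries without revisiting the raw data."""
    return tuple(x + y for x, y in zip(a, b))


def std_dev(summary):
    """Population standard deviation derived algebraically from a summary."""
    n, total, total_sq = summary
    return math.sqrt(total_sq / n - (total / n) ** 2)


old_data = [13, 15, 16, 16, 19, 20]  # hypothetical existing data
new_data = [20, 21, 22, 22, 25]      # hypothetical incremental batch

# Only the three stored numbers per batch are needed to update count and std.
merged = merge(summarize(old_data), summarize(new_data))
print("count:", merged[0], "std:", round(std_dev(merged), 4))

# The median is holistic: the medians of the two parts do not determine the
# median of the union, so the full data set has to be consulted.
print("median of union:", median(old_data + new_data))
```

The sum-of-squares formula is used here because it is easy to merge; for very large or ill-conditioned data a numerically more stable update would normally be preferred.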

UNIT-5 CLUSTERING AND APPLICATIONS AND TRENDS IN DATA MINING
100 % THEORY
Part-A (2 Marks)
CLUSTER ANALYSIS
1. What is clustering? Nov/Dec'11, May/Jun'16
2. What is classifier accuracy? Nov/Dec'14
3. What is cluster analysis? Mention its uses.
4. State the role of cluster analysis. Nov/Dec'16
5. Explain about cluster analysis tools?
6. Give the reason why clustering is needed in data mining. Nov/Dec'16
7. List out the typical requirements of clustering in data mining?
8. Write a brief outline of incremental clustering and insensitivity?
9. Give a note on high dimensionality.
10. Outline about constraint-based clustering?
TYPES OF DATA
11. How would you classify the type of data in cluster analysis?
12. Give a brief outline of interval-scaled variable?
13. What are the requirements of Euclidean distance and Manhattan distance function?
14. Let X1 = {1, 2} and X2 = {3, 5} represent two points. Calculate the Manhattan distance
between the two points. May/Jun'15
15. State the main idea of a binary variable.
16. How can we compute the dissimilarity between two binary variables?
17. Compare symmetric and asymmetric binary variables.
18. Point out the major difference between discrete and continuous ordinal variables.
19. Differentiate between ordinal variables and ratio-scaled variables.
20. How can you compute the dissimilarity between objects described by ratio-scaled variables?
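
Worked sketch for question 14 (not part of the original question bank): a minimal computation of the Manhattan distance between the two points, with the Euclidean distance shown alongside for comparison. The points are read as coordinate pairs (1, 2) and (3, 5); the function names are illustrative.

```python
import math


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


x1 = (1, 2)
x2 = (3, 5)

print("Manhattan:", manhattan(x1, x2))  # |1 - 3| + |2 - 5|
print("Euclidean:", round(euclidean(x1, x2), 3))
```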
CATEGORIZATION OF CLUSTERING METHODS
21. What are the categorizations of major clustering methods? May/June'12
22. How to achieve global optimality in partitioning-based clustering?
23. What are the approaches to improve the quality of hierarchical clustering?
K – MEANS PARTITIONING METHOD
24. How does the K-means algorithm work?
25. Apply the procedure or method for K-means partitioning algorithm.
26. What is meant by K-Nearest Neighbor algorithm? Apr/May'17
27. Define K-modes method.
28. Define EM algorithm.
29. How do you make the k-means algorithms more scalable?
30. How might the K-means algorithm be modified to diminish its sensitivity to outliers?
31. Mention the uses of frequent pattern-based clustering.
32. Write about PAM.
33. Apply four cases of the cost function for k-medoids clustering.
34. Write the algorithm for PAM, a k-medoids partition.
35. Which method is more robust: k-means or k-medoids?
36. How efficient is the k-medoids algorithm on large data sets?

37. Illustrate the effectiveness and scalability of CLARA.
HIERARCHICAL METHODS
38. Classify Hierarchical clustering methods. May/Jun'13
39. Define AGNES and DIANA.
40. Mention some of the difficulties of hierarchical clustering.
41. Give the diagrammatic representation of agglomerative hierarchical clustering and divisive
hierarchical clustering on data objects {a,b,c,d,e}.
42. Define dendrogram. Apply its representation of objects {a,b,c,d,e}.
43. Point out the measures of distance between the clusters.
DENSITY & GRID BASED METHODS
44. Define Divisive Hierarchical clustering. Nov/Dec'14
45. What is BIRCH?
46. What is objective of clustering feature (CF)?
47. List the primary phases of BIRCH.
48. Differentiate ROCK and Chameleon.
49. Differentiate relative interconnectivity and relative closeness.
50. Define DBSCAN.
51. What is OPTICS? How are the object values used in the OPTICS algorithm?
52. What is DENCLUE? Mention its basic ideas.
53. What major advantages does DENCLUE have in comparison with other clustering
algorithms?
54. Define STING. May/Jun'14
55. Mention the parameters of STING Clustering.
56. How is statistical information useful for query answering in STING?
57. Mention the advantages of STING over other clustering methods.
CLUSTERING HIGH DIMENSIONAL DATA
58. What is WaveCluster? May/Jun'14
59. What is wavelet transform?
60. Why is wavelet transformation useful for clustering?
61. Classify what are the services provided by model based clustering methods and mixture
density model?
62. What is conceptual clustering? Mention its process.
63. How does COBWEB decide where to incorporate a new object into the classification tree?
64. What are the operators used in COBWEB?
65. Summarize the needs of CLASSIT.
66. List out the properties of Neural Networks.
67. What is subspace clustering?
68. What are the basic ideas of CLIQUE clustering algorithm?
69. How does CLIQUE work?
70. How effective is CLIQUE?
71. What is PROCLUS? Generalize the phases of PROCLUS algorithm.
CONSTRAINT BASED CLUSTER ANALYSIS
72. Analyze categorization of constraint-based clustering.
73. How can we approach the problem of clustering with obstacles?
74. What are the methods of semi-supervised clustering?
75. Distinguish between classification and clustering. May/Jun'12, Nov/Dec'12

OUTLIER ANALYSIS
76. What is an outlier? Give an example for the library management system.
Nov'15, May'13, Nov'12, May'11, Nov'11, Nov'16
77. How may outliers be detected by clustering? May/Jun'15
78. Classify Outlier detection approaches. Mention the applications of outlier detection.
Nov/Dec'11
DATA MINING APPLICATIONS
79. List some applications of Data Mining. May/Jun'11, '17, Nov/Dec'16
80. Point out the role of Data mining in financial data analysis?
81. How is Data mining useful for the retail industry?
82. Explain the application of Data mining in telecommunication industry.
83. Why do we need Data mining in biological data analysis?
PART – B (13 Marks)

1. a. Explain the types of data used in Cluster analysis with example.
Nov 2014, May 2014, May 2016, Apr 2017 (7)
b. Explain the importance of Outlier analysis in clustering with example.
Nov 2014, May 2014, May 2012, May 2016 (6)
2. Discuss about categorization of Hierarchical clustering methods.
Nov 2014, May 2015, Nov 2015, Apr 2017 (13)
3. Illustrate the partitioning clustering algorithm using an example.
May 2016, Nov 2015, Nov 2016 (13)
4. a. Discuss about model-based clustering (7)
b. Describe density based methods in detail with examples. Nov 2014, May 2013 (6)
5. What is grid based clustering? Describe the grid based clustering approaches.
Nov 2011 (13)
6. Explain in detail about clustering high dimensional data (13)
7. With a relevant example, discuss constraint-based cluster analysis. May 2011 (13)
8. a. Write the difference between CLARA and CLARANS. May 2014 (5)
b. Classify Data mining applications. Explain how data mining is used for retail industry.
Or
Describe the applications and trends in data mining in detail.
May 2014, Nov 2016 (8)
9. Consider five points {X1, X2, X3, X4, X5} with the following coordinates as a 2-D sample for clustering:
X1 = {0, 2}, X2 = {0, 0}, X3 = {1.5, 0}, X4 = {5, 0}, X5 = { 5, 2}
Illustrate the K-means partitioning algorithm using the above data set.
Nov 2011, May 2015, May 2013, Nov 2012, May 2011 (13)
10. What is k-means algorithm? Suppose that the data mining task is to cluster the following
eight points into three clusters.
A1 (2, 10), A2 (2, 5), A3 (8, 4), B1 (5, 8), B2 (7, 5), B3 (6, 4), C1 (1, 2), C2 (4, 9).
The distance function is Euclidean distance. Suppose initially A1, B1 and C1 are assigned as the
center of each cluster respectively. Use the k-means algorithm to show only,
i) The three cluster centers after the first round execution and (7)
ii) The final three clusters. May 2015, May 2012 (6)
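
Worked sketch for question 10 (not part of the original question bank): a plain k-means loop over the eight points, assuming the initial centers are A1, B1 and C1 and that Euclidean distance is used. It records the centers after the first round and stops when the centers no longer change. The helper names are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def update(points, clusters):
    """Recompute each center as the mean of its members.

    No cluster becomes empty for this data set, so empty clusters are not handled.
    """
    centers = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centers


# Assumed initial centers: A1, B1 and C1.
centers = [points["A1"], points["B1"], points["C1"]]
first_round_centers = None
round_no = 0
while True:
    clusters = assign(points, centers)
    new_centers = update(points, clusters)
    round_no += 1
    if round_no == 1:
        first_round_centers = new_centers
    print(f"round {round_no}: clusters={clusters} centers={new_centers}")
    if new_centers == centers:  # converged: the centers did not move
        break
    centers = new_centers
```

Each printed round shows the assignment step followed by the center update, which mirrors the two sub-steps a hand-worked answer would tabulate.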

PART-C (15 MARKS)

1. Outliers are often discarded as noise. However, one person’s garbage could be another’s
treasure. For example, exceptions in credit card transactions can help us detect the
fraudulent use of credit cards. Taking fraudulence detection as an example, propose two
methods that can be used to detect outliers and discuss which one is more reliable. (15)

2. What are the differences between visual data mining and data visualization? Data
visualization may suffer from the data abundance problem. For example, it is not easy to
visually discover interesting properties of network connections if a social network is
huge, with complex and dense connections. Propose a data mining method that may help
people see through the network topology to the interesting features of the social network.
(15)
3. What is a collaborative recommender system? In what ways does it differ from a
customer- or product- based clustering system? How does it differ from a typical
classification or predictive modeling system? Outline one method of collaborative
filtering. Discuss why it works and what its limitations are in practice. (15)

4. What are the major challenges faced in bringing data mining research to market?
Illustrate one data mining research issue that, in your view, may have a strong impact on
the market and on society. Discuss how to approach such a research issue. (15)

5. Give an example of how specific clustering methods may be integrated, for example,
where one clustering algorithm is used as a preprocessing step for another. In addition,
provide reasoning on why the integration of two methods may sometimes lead to
improved clustering quality and efficiency. (15)

R1: “Data Warehousing, Data Mining, & OLAP”, Alex Berson, Tata McGraw-Hill edition.
R2:“DATA MINING: CONCEPTS AND TECHNIQUES”, HAN & KAMBER, 3rd EDITION
R3: “DATA MINING SOLUTIONS”, RAJENDRA AKERKAR.
R4: “DATA MINING: TUTORIAL EXERCISES - CLUSTERING – K-MEANS, NEAREST NEIGHBOR
AND HIERARCHICAL”, HAN & KAMBER
