
Model Question Paper

Subject Code: MIT401 Book ID: B1633


Subject Name: Data Warehousing and Data Mining
Credits: 4 Marks: 140

Section A (Compulsory)

Descriptive Questions (10 Marks each)

Answer any four Questions 4 x 10 = 40 Marks

1. Explain the Top-Down and Bottom-up Data Warehouse development Methodologies.


(Refer section 1.5 for details) [5+5 marks]

2. Differentiate between Data Warehouse database and OLTP database


(Refer table 2.1 for details) [5+5 marks]

3. Explain the Data Warehouse Architectures 1, 2 and 3 [3+4+3 marks]


(Refer section 3.3 for details)

4. Differentiate between E-R modelling and Dimensional Modelling


(Refer section 4.4 for details) [5+5 marks]

5. Define Data Cleaning and explain basic methods of data cleaning


(Refer section 10.3 for details) [1+3+3+3 marks]

6. What is divisive clustering? Write algorithmic steps for the divisive clustering.
(Refer section 12.4.4 for details) [4+6 marks]

Section B

Multiple Choice Questions

Part A (One mark questions) (50*1 = 50 marks)

1. In a transaction processing system, data can be retrieved without the user's presence using
_________________.
a. Batch processing
b. Bit processing
c. Bulk processing
d. None of above

2. Transaction processing systems are the backbone of an organization __________


a. Less data processing
b. Data hiding easy
c. Update data base constantly
d. Having bulk of data

3. OLTP stands for _________________


a. Online Technical Processing System
b. Online Transaction Processing System
c. One way Transaction Processing System
d. Outline Technical Processing System

4. Data Warehouse is defined as subject oriented, integrated, time variant and


___________.
a. Communication based
b. Volatile
c. Transaction oriented
d. Non Volatile

5. The first Data Warehouses were developed in the year _________


a. 1980
b. 1985
c. 1990
d. 1995

6. With ______________ approach, compatibility among the tools from different


vendors could become a serious problem.
a. Build or Buy
b. Single vendor
c. Best-of-breed
d. Both b & a
7. Data Warehouse contains data for ______________ purpose
a. Design
b. Reading
c. Analysis
d. Storing

8. OLTP systems are designed for __________________


a. Real-time business operations
b. Day-to-day transactions
c. Application design
d. Both a & c

9. SDLC stands for _____________________________


a. system development life cycle
b. system design life cycle
c. structure design life cycle
d. structure development life cycles

10. _____________ data is not a Component of Data Warehouse Architecture


a. Production
b. Application
c. Archived
d. External

11. Data Mart is a data _________ that may derive from a Data Warehouse that
emphasizes ease of access and usability for a particular designed purpose.
a. Department level
b. Limited in size
c. Read-only
d. Repository

12. Data about data is called ________________


a. Raw data
b. Metadata
c. Data design
d. Data process

13. Knowledge discovery is called ________________.


a. Data warehousing
b. Data duplication
c. Data mining
d. None of the above

14. ______________ Architecture is composed of multiple architectures.


a. ETL tool
b. Federated
c. Orthodoxed
d. None of the above

15. ERD stands for _____________________________.


a. Equality Relationship diagram
b. Equity Relationship Diagram
c. Entity Relationship Diagram
d. Entity Resources Diagram

16. ___________________ Data Model describes data from a high level.


a. Conceptual
b. Logical
c. Physical
d. Structural

17. For designing a database ________________ Method is used.


a. E-R model
b. GUI
c. Dimensional
d. None of the above

18. The E-R modeling supports ___________________ to reduce redundancy in the


database
a. Application program
b. Database design
c. De - normalization
d. Normalization

19. De-normalized model is also known as _______________________


a. E-R model
b. Dimensional model
c. Physical model
d. Logical model

20. Dimensional model can be implemented with which of the following databases?
a. MDDB
b. Relational data base
c. Flat files
d. Excel data files
21. Populating all the Data Warehouse tables for the very first time is called
_______________.
a. Data Load
b. Performance Load
c. Initial Load
d. Physical Load

22. Which of the following are open source ETL tools?


a. SAS Data Integrator
b. Cognos Decision Stream
c. Microsoft DTS
d. Clover

23. The process of turning redo log files into archived redo log files is called _________.
a. Archiving
b. Archiving log
c. Transaction
d. Archived redo log

24. Which one is not related to data transformation tasks?


a. Selection
b. Conversion
c. Enrichment
d. Format Revision

25. ____________________ is related to textual data in the Data Warehouse


a. Date/Time Conversion
b. Character Set Conversion
c. Splitting of Single Fields
d. Merging of Information

26. _______________ Allows data to be modeled and viewed in multiple dimensions


a. data cube
b. Singlecube
c. Multicube
d. Hypercube

27. ERP and CRM are _________________________ kind of systems


a. OLAP
b. OLTP
c. OLDP
d. OLNP

28. OLAP stands for ____________________________


a. Online analytical processing
b. Online Application processing
c. Organizational level application processing
d. None of the above

29. MDS stands for ___________________


a. Multi Design Structure
b. Modeling Design Structure
c. Multi-Dimensional Structures
d. Middleware Domain Structure

30. Which of the following are the intermediate servers that stand in between a relational
back-end server and client front-end tools
a. ROLAP
b. MOLAP
c. HOLAP
d. All the above

31. Query response time is _____________________ kind of metadata.


a. Operational metadata
b. Relational metadata
c. Business metadata
d. Technical metadata

32. Key hierarchies and key performance indicators are ________________ kind of
Metadata.
a. Business metadata
b. Operational metadata
c. Relational metadata
d. Technical metadata

33. Storing, data mapping and transformation from source systems to the data
warehouse fall into:
a. Technical metadata
b. Operational metadata
c. Business metadata
d. Relational metadata

34. According to Ralph Kimball, Back-room metadata guides:


a. Extraction
b. Cleaning
c. Loading processes
d. All the above

35. One tool that can allow data warehouse managers to deal with metadata is called
________________.
a. Data encapsulation
b. Data hierarchy
c. Repository
d. Data mining

36. Data Mining has its roots in Statistics, Artificial Intelligence (AI) and
______________
a. Low level Language
b. Machine Learning
c. High-level Language
d. None of the above

37. ______________ Optimization techniques are based on the concepts of genetic


combination, mutation, and natural selection
a. Fuzzy logic
b. Neural networks
c. Involve decision trees
d. Genetic algorithms

38. Sales, cost, inventory, payroll, and accounting come under ____________________
a. Operational
b. Transactional
c. Nonoperational
d. a or b

39. Data mining discovers hidden patterns in data. (True / False)


a. True
b. False

40. _________________ routines attempt to fill in missing values, smooth out noise
while identifying outliers, and correct inconsistencies in the data.
a. Data cleaning
b. Data passing
c. Data collecting
d. None of the above

41. Redundancies can be detected by _____________________.


a. Correlation analysis
b. Entity identification
c. Data Integration
d. Data Transformation

42. ________________ works to remove the noise from the data that includes
techniques like binning, clustering, and regression.
a. Aggregation
b. Generalization
c. Normalization
d. Smoothing

43. In which strategy of data reduction are redundant attributes detected?


a. Data cube aggregation
b. Numerosity reduction
c. Data compression
d. Dimension reduction

44. ____________ technique is used to detect relationships or associations between


specific values of categorical variables in large data sets.
a. Dimensions of data rule
b. Association rule
c. Levels of abstractions rule
d. Association mining rule

45. Neural networks are made up of _______________.


a. Artificial neurons
b. Clustering
c. Regression
d. Classification

46. Any superset of an infrequent set is an infrequent set. Which property is this?
a. Downward Closure Property
b. Middle Closure Property
c. Partition Closure Property
d. Upward Closure Property

47. Which of the following techniques is concerned with user navigation and access?
a. Web structural mining
b. Web usage mining
c. Web content mining
d. Web data definition mining

48. Web data is _________.


a. Structured data
b. Un-structured data
c. Only text data
d. Binary data

49. GDP stands for _______________________.


a. Gross domestic period
b. Gross demand product
c. Gross domestic product
d. Gross domain product

50. _____________ is proving to be a critical link between theory, simulation, and


experiment.
a. Data intensive computing
b. Data mining
c. Data warehousing
d. None of the above

Part B (Two marks questions) (25*2 = 50 marks)

51. Data Warehouse is a database that is designed for facilitating _________ and
__________.
a. Code and Design
b. URL and Path
c. Query and Analysis
d. None of the above

52. There are two widely used methods for deriving business requirements
________driven requirements gathering and ________ driven requirements
gathering
a. Source, User
b. Deliver, Store
c. Service, Operations
d. Storage, Management

53. Prior to loading data into the Data Warehouse inconsistencies are _____________
and _________________.
a. Data collection and sending
b. Formatting and retrieving
c. Data address and passing
d. Identified and resolved
54. The system's relational model is usually de-normalized into ______________ and
______________.
a. ERP system and data tables
b. Dimension and fact tables
c. CRM system and Master table
d. OLTP systems and database tables

55. ___________ and ____________ of data take place on a large scale in the data
staging area.
a. Sorting and Merging
b. Queuing and repeating
c. Scaling and ordering
d. Mapping and designing

56. Dimensional model consists of ___________ and ____________ tables


a. User creation tables and pre-defined tables
b. Master table and database tables
c. Fact Table and Dimension Tables
d. Customer table and student tables

57. Fact-tables usually consist of _________to_________ relationships.


a. One to one
b. One to many
c. Many to many
d. Many to one

58. Data extraction, ______________ and __________ encompass the areas of
data acquisition and data storage
a. Transformation and Loading
b. Data hiding and supporting
c. Retrying and mapping
d. Security and processing

59. ______________ and _____________ are the most time-consuming tasks in ETL
a. Data retrieving and handling data
b. Planning and altering
c. Design and creation
d. Application and managing

60. The ____________ and _____________ modes are applicable to full refresh.
a. Free and proceed
b. Load and append
c. Packet and subscript
d. Retrieve and source

61. Data cube contains ____________ and ______________.


a. Dimensions and Facts
b. Fragments and procedures
c. Data modeling and duplicating
d. Channels and sources

62. Operational Metadata follows _____________________ and _________________.


a. Rows and columns
b. Queues and stacks
c. Tables and sorting
d. Queries and aggregations

63. ____________ and _________________ are the key emerging Business Intelligence
technologies
a. Data collection and application
b. Data warehouse and data mining
c. Network and database
d. Data security and data analysis

64. ______________ and ___________ the internal and external data into a
comprehensive view, and mine the integrated data for information.
a. Capture and integrate
b. Relative and argument
c. Learning and duplicating
d. Presenting and focusing

65. ______________ and _______________ the information and knowledge in ways


that expedite complex decision making.
a. Accessing and transforming
b. Manipulating and transferring
c. Engaging and processing
d. Organize and present

66. Data pre-processing techniques are __________________ and _______________.


a. Data integration and data cleaning
b. Data retrieving and data manipulating
c. Data hiding and data collecting
d. Data organizing and meta data
67. Normalization may improve the ________________ and ____________ of mining
algorithms involving distance measurements
a. Data processing and resource
b. Accuracy and efficiency
c. Clustering and manipulating
d. Multi-tasking and multi-processing

68. Outliers may be identified through a combination of __________ and __________.


a. Coding and processing
b. Programs and algorithms
c. Computer and human inspection
d. Application and background

69. _______________ and __________________ typically have metadata


a. Data communication and data planning
b. Data mining and data structure
c. Data hiding and retrieving
d. Databases and data warehouses

70. Two fundamental goals of Data Mining are ___________ and ___________.
a. Prediction and Description
b. Gathering data and transferring data
c. Coding and processing
d. Accuracy and efficiency

71. Data mining techniques are ______________ and ________________.


a. Regression and Clustering
b. Multi-tasking and multi-processing
c. Data processing and resource
d. Data hiding and retrieving

72. _____________ and ______________ are the methods of clustering


a. Hierarchical and Agglomerative
b. Manipulating and transferring
c. Prediction and Description
d. Coding and processing

73. _________ identifies groups of houses according to their house type, value and
geographical location, and _________ is the classification of plants and animals given
their features.
a. Length and Science
b. City-planning and Biology
c. Insurance and libraries
d. www and zoology
74. Web mining can be broadly defined as the ___________ and ___________ of useful
information from the World Wide Web
a. Discovery and analysis
b. Web design and algorithm
c. Mailing and networking
d. Client and server

75. Multidimensional databases can present their data to an application using


_______________ and _______________.
a. Data warehouse and data mining
b. Data cubes and single cubes
c. Hyper cubes and multi cubes
d. Data collection and data retrieving

Answer Keys (Part A & Part B)
Subject Code: MIT401 Book ID: B1633

Part - A                                                                      Part - B
Q. No.  Ans. Key  Unit no. / Page no.  Q. No.  Ans. Key  Unit no. / Page no.  Q. No.  Ans. Key  Unit no. / Page no.
1       A         1/2                  26      A         6/90                 51      C         1/4
2       C         1/2                  27      B         6/87                 52      A         2/21
3       B         1/2                  28      A         6/85                 53      D         1/6
4       D         1/9                  29      C         6/94                 54      B         3/24
5       A         1/7                  30      D         6/91                 55      A         3/31
6       C         2/15                 31      A         7/101                56      C         4/44
7       C         2/22                 32      A         7/104                57      C         4/45
8       A         2/13                 33      A         7/100                58      A         5/69
9       A         2/17                 34      D         7/100                59      C         5/72
10      B         3/25                 35      C         7/105                60      B         5/81
11      D         3/35                 36      B         8/116                61      A         6/90
12      B         3/35                 37      D         8/116                62      D         7/101
13      C         3/34                 38      D         8/111                63      B         8/112
14      B         3/39                 39      A         8/115                64      A         8/113
15      C         4/42                 40      A         10/135               65      D         8/113
16      A         4/42                 41      A         10/140               66      A         10/113
17      A         4/43                 42      D         10/141               67      B         10/134
18      D         4/47                 43      D         10/143               68      C         10/138
19      B         4/47                 44      B         11/160               69      D         10/139
20      B         4/48                 45      A         11/171               70      A         11/159
21      C         5/80                 46      D         11/164               71      A         11/159
22      D         5/82                 47      B         13/198               72      A         12/177
23      A         5/75                 48      B         13/196               73      B         12/175
24      D         5/76                 49      C         14/222               74      A         13/195
25      B         5/78                 50      A         14/222               75      C         6/94
