
IT9223 ADVANCED DATABASE SYSTEMS


UNIT I - DISTRIBUTED DATABASES

TWO MARKS

1. What are the advantages of distributed databases over a conventional database?

Advantages:
- Mimics the organisational structure with its data
- Local access and autonomy without exclusion
- Cheaper to create and easier to expand
- Improved availability, reliability and performance by removing reliance on a central site
- Reduced communication overhead: most data access is local, which is less expensive and performs better
- Improved processing power: many machines handle the database rather than a single server

Disadvantages:
- More complex to implement
- More costly to maintain
- Security and integrity control is harder
- Standards and experience are lacking
- Design issues are more complex

2. Write about distributed database architecture.

The architecture defines the structure of the system:
- components are identified
- functions of each component are defined
- interrelationships and interactions between components are defined

3. What is a distributed database?

A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network. The distributed DBMS (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
- DDBS = DB + Communication
- Non-centralised
- DDBMS: motivated by the need to integrate operational data and to provide controlled access; it manages the distributed database and makes the distribution transparent to the user

4. What are the implicit assumptions in a DDBMS?

- Data is stored at a number of sites; each site logically consists of a single processor.
- Processors at different sites are interconnected by a computer network (no multiprocessors, as in parallel database systems).
- The distributed database is a database, not a collection of files: data is logically related, as exhibited in the users' access patterns (e.g. the relational data model).
- A D-DBMS is a full-fledged DBMS: not a remote file system, not a TP system.

5. Write about the different dimensions of the problem in DDBMS.

- Distribution: whether the components of the system are located on the same machine or not.
- Heterogeneity: can occur at various levels (hardware, communications, operating system); the DBMS level is the important one (data model, query language, transaction management algorithms).
- Autonomy: not well understood and most troublesome; it comes in various versions:
  - Design autonomy: ability of a component DBMS to decide on issues related to its own design.
  - Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
  - Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.

6. What are the issues in DDBMS?

- Data allocation: where to locate data and whether to replicate it
- Data fragmentation: how to partition the database
- Distributed catalogue management
- Distributed transactions
- Distributed queries
- Making all of the above transparent to the user is the key to DDBMSs

Replication:
- If a site (or network path) fails, the data held there is unavailable, so consider replication (duplication) of data to improve availability
- No replication: disjoint fragments
- Partial replication: site dependent
- Full replication: slows down updates (for consistency) and is expensive

7. What are the advantages of distributed databases?

- Capacity and incremental growth
- Increased reliability and availability
- Modularity
- Reduced communication overhead
- Protection of valuable data
- Efficiency and flexibility

8. What are the disadvantages of distributed databases?

- DDB design is more complex (fragmentation and replication); extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent
- Economics
- Concurrency control
- Inexperience
- Security
- Integrity is difficult to maintain

9. Write the applications of DDBMS.

- Manufacturing, especially multi-plant manufacturing
- Military command and control
- Electronic fund transfers and electronic trading
- Corporate MIS
- Airline reservations
- Hotel chains
- Any organization which has a decentralized organizational structure
10. What are relations and sub-relations (fragments)?

- Relation as the unit of distribution: views are subsets of relations, so there is locality of access, but shipping whole relations causes extra communication.
- Fragments of relations (sub-relations): permit concurrent execution of a number of transactions that access different portions of a relation; however, views that cannot be defined on a single fragment will require extra processing, and semantic data control (especially integrity enforcement) is more difficult.

11. What are the types of fragmentation?

- Horizontal Fragmentation (HF): splitting the database by rows, e.g. A-J in site 1, K-S in site 2 and T-Z in site 3
  - Primary Horizontal Fragmentation (PHF)
  - Derived Horizontal Fragmentation (DHF)
- Vertical Fragmentation (VF): splitting the database by columns/fields, e.g. columns/fields 1-3 in site A, 4-6 in site B; the primary key is taken to all sites
- Hybrid Fragmentation: horizontal and vertical fragmentation could even be combined
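
As an illustration, here is a minimal Python sketch (the employees rows are invented for this example, not from the syllabus) showing horizontal fragmentation by a predicate and vertical fragmentation by column groups, with the primary key replicated into every vertical fragment:

# Hypothetical relation: a list of row dicts keyed by the primary key "id".
employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales", "salary": 40000},
    {"id": 2, "name": "Ravi",  "dept": "HR",    "salary": 35000},
    {"id": 3, "name": "Meena", "dept": "Sales", "salary": 52000},
]

# Horizontal fragmentation: split by rows using a predicate per site.
site1 = [r for r in employees if r["dept"] == "Sales"]   # fragment at site 1
site2 = [r for r in employees if r["dept"] != "Sales"]   # fragment at site 2

# Vertical fragmentation: split by columns, repeating the key in each
# fragment so the relation can be rebuilt by joining on "id".
frag_a = [{"id": r["id"], "name": r["name"]} for r in employees]
frag_b = [{"id": r["id"], "dept": r["dept"], "salary": r["salary"]} for r in employees]

# Reconstruction checks: union of horizontal fragments, join of vertical ones.
assert sorted(site1 + site2, key=lambda r: r["id"]) == employees
rejoined = [{**a, **b} for a in frag_a for b in frag_b if a["id"] == b["id"]]
assert rejoined == employees
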
12. Define vertical fragmentation. What is the advantage of VF?

- Vertical fragmentation has been studied within the centralized context (design methodology, physical clustering).
- It is more difficult than horizontal fragmentation, because more alternatives exist.
- Two approaches:
  - Grouping: attributes to fragments; produces overlapping fragments
  - Splitting: relation to fragments; produces non-overlapping fragments
- We do not consider the replicated key attributes to be overlapping.
- Advantage: easier to enforce functional dependencies (for integrity checking etc.)
13. What are the information requirements in VF?

Application information:
- Attribute affinities: a measure that indicates how closely related the attributes are; obtained from more primitive usage data.
- Attribute usage values: given a set of queries Q = {q1, q2, ..., qq} that will run on the relation R[A1, A2, ..., An],

  use(qi, Aj) = 1 if attribute Aj is referenced by query qi, 0 otherwise
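
A small Python sketch (the queries and attributes are hypothetical, purely for illustration) that builds the use(qi, Aj) matrix and derives a crude attribute affinity count, i.e. for each attribute pair, the number of queries referencing both:

# Attributes of a hypothetical relation R and the attributes each query touches.
attributes = ["A1", "A2", "A3", "A4"]
query_refs = {
    "q1": {"A1", "A2"},
    "q2": {"A2", "A3"},
    "q3": {"A1", "A2", "A4"},
}

# use(qi, Aj) = 1 if attribute Aj is referenced by query qi, 0 otherwise.
use = {
    q: {a: int(a in refs) for a in attributes}
    for q, refs in query_refs.items()
}

# A crude affinity measure: how many queries reference both Ai and Aj.
# (Real methods also weight by query access frequencies per site.)
affinity = {
    (ai, aj): sum(use[q][ai] and use[q][aj] for q in query_refs)
    for ai in attributes for aj in attributes
}
print(affinity[("A1", "A2")])  # -> 2 (q1 and q3 reference both)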

14. What is the significance of query processing?

Using a client-server architecture:
- the user creates a query
- the client parses it and sends it to the server(s) (e.g. in SQL)
- the servers return the appropriate tables
- the client combines them into one table

The issue is the cost of data transfer over the network, so the query is optimised to transfer the least amount of data.
15. What are the query processing components?

- Query language that is used (SQL: "intergalactic data speak")
- Query execution methodology: the steps that one goes through in executing high-level (declarative) user queries
- Query optimization: how do we determine the "best" execution plan?
16. What are the objectives of query optimization?

Minimize a cost function: I/O cost + CPU cost + communication cost. These components may have different weights in different distributed environments:
- Wide area networks: communication cost will dominate (low bandwidth, low speed, high protocol overhead), so most algorithms ignore all other cost components
- Local area networks: communication cost is not that dominant, so the total cost function should be considered

One can also maximize throughput instead.
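
A toy Python sketch of such a weighted cost function; the weights below are invented for illustration, not prescribed values:

def query_cost(io_ops, cpu_units, msg_bytes, env="WAN"):
    """Total cost = weighted I/O + CPU + communication components."""
    # Hypothetical weights: WAN optimizers often let communication dominate,
    # while LAN optimizers weight all components.
    weights = {
        "WAN": {"io": 0.0, "cpu": 0.0, "comm": 1.0},   # ignore non-comm costs
        "LAN": {"io": 1.0, "cpu": 0.5, "comm": 1.0},   # total cost function
    }[env]
    return (weights["io"] * io_ops
            + weights["cpu"] * cpu_units
            + weights["comm"] * msg_bytes)

print(query_cost(100, 500, 2000, env="WAN"))  # 2000.0: only communication counts
print(query_cost(100, 500, 2000, env="LAN"))  # 2350.0: all components count
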
17. What are the types of optimizers?

- Exhaustive search:
  - cost-based
  - optimal
  - combinatorial complexity in the number of relations
- Heuristics:
  - not optimal
  - regroup common sub-expressions
  - perform selection and projection first
  - replace a join by a series of semijoins
  - reorder operations to reduce intermediate relation size
  - optimize individual operations

18. What is optimization granularity?

- Single query at a time: cannot use common intermediate results
- Multiple queries at a time: efficient if there are many similar queries, but the decision space is much larger

19. What are the types of optimization timing?

- Static:
  - optimize prior to execution, at compilation time
  - difficult to estimate the size of the intermediate results (error propagation)
  - can amortize the optimization cost over many executions
  - e.g. R*
- Dynamic:
  - run-time optimization
  - exact information on the intermediate relation sizes
  - have to reoptimize for multiple executions
  - e.g. Distributed INGRES
- Hybrid:
  - compile using a static algorithm
  - if the error in estimated sizes exceeds a threshold, reoptimize at run time
  - e.g. MERMAID

20. What statistics are used in query optimization?

- Relation: cardinality; size of a tuple; fraction of tuples participating in a join with another relation
- Attribute: cardinality of the domain; actual number of distinct values
- Common assumptions: independence between different attribute values; uniform distribution of attribute values within their domain

21. What decision sites are used in query optimization?

- Centralized: a single site determines the "best" schedule; simple, but needs knowledge about the entire distributed database
- Distributed: cooperation among sites to determine the schedule; needs only local information, but there is a cost of cooperation
- Hybrid: one site determines the global schedule; each site optimizes the local subqueries

22. What are the types of network topologies used in DDBMS?

- Wide area networks (WAN, point-to-point):
  - characteristics: low bandwidth, low speed, high protocol overhead
  - communication cost will dominate; ignore all other cost factors
  - global schedule to minimize communication cost
  - local schedules according to centralized query optimization
- Local area networks (LAN):
  - communication cost not that dominant
  - total cost function should be considered
  - broadcasting can be exploited (for joins)
  - special algorithms exist for star networks

23. What is query decomposition?

Input: a calculus query on global relations. Its steps are:
- Normalization: manipulate query quantifiers and qualification
- Analysis: detect and reject "incorrect" queries; possible for only a subset of relational calculus
- Simplification: eliminate redundant predicates
- Restructuring: translate the calculus query into an algebraic query; more than one translation is possible; use transformation rules

24. What is data localization?

Input: an algebraic query on distributed relations.
- Determine which fragments are involved
- Localization program: substitute for each global query its materialization program, then optimize it

25. What is global query optimization?

Input: a fragment query.
- Find the "best" (not necessarily optimal) global schedule
- Minimize a cost function
- Distributed join processing: bushy vs. linear trees; which relation to ship where; ship-whole vs. ship-as-needed
- Decide on the use of semijoins: a semijoin saves on communication at the expense of more local processing
- Join methods: nested loop vs. ordered joins (merge join or hash join)

26. What are the rules of DDBMS?

1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence

27. Define transaction.

A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency. It provides concurrency transparency and failure transparency.

28. What are the properties of a transaction?

- Atomicity: all or nothing
- Consistency: no violation of integrity constraints
- Isolation: concurrent changes are invisible and serializable
- Durability: committed updates persist

29. What is concurrency control?

The problem of synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, the maximum degree of concurrency is achieved.
Anomalies:
- Lost updates: the effects of some transactions are not reflected in the database.
- Inconsistent retrievals: a transaction, if it reads the same data item more than once, should always read the same value.

30. What are the modes of distributed locking?

Four modes of management are possible:
- Centralised 2PL: read any copy, update all copies for updates; a single site, so it is a bottleneck and a single point of failure
- Primary copy 2PL: distributes the locks; one copy is designated primary, the others slaves; only the primary copy is locked for updates, and the slaves are updated later
- Distributed 2PL: each site manages its own data locks; all copies are locked for an update, so there is a high communication cost
- Majority locking: an operation proceeds once locks are obtained on a majority of the copies
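
A one-function Python sketch of the majority test such a scheme relies on (illustrative only):

def has_majority(lock_grants, num_copies):
    """True once lock grants from strictly more than half the copies arrive."""
    return lock_grants > num_copies // 2

print(has_majority(2, 3))  # True: 2 of 3 copies granted the lock
print(has_majority(2, 4))  # False: 2 of 4 is not a strict majority
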
31. What are the distributed reliability protocols?

- Commit protocols
- Termination protocols
- Recovery protocols
- Independent recovery (implies non-blocking termination)


UNIT II - OBJECT ORIENTED DATABASES


TWO MARKS
1. What are object databases?

- Became commercially popular in the mid 1990s
- You can store the data in the same format as you use it: no paradigm shift
- Did not reach full potential until the classes they store were decoupled from the database schema
- Open source implementations are available, so a low-cost solution now exists

2. What is an Object Oriented Database (OODB)?

A database system that incorporates all the important object-oriented concepts, plus some additional features:
- Unique object identifiers
- Persistent object handling
It is the coupling of Object Oriented Programming (OOP) principles with Database Management System (DBMS) principles, and it provides access to persisted objects using the same OO programming language.

3. What are the advantages of OODBS?

Designer can specify the structure of objects and their behavior (methods)
Better interaction with object-oriented languages such as Java and C++
Definition of complex and user-defined types
Encapsulation of operations and user-defined methods

4. List out the object database vendors.

- Matisse Software Inc.
- Objectivity Inc.
- Poet's FastObjects
- Computer Associates
- eXcelon Corporation
- Db4o

5. Differentiate OODB and relational DB.

OODB:
- Uses an OO model: data is a collection of objects whose behavior, state, and relationships are stored as a physical entity.
- Language dependent (OO language specific).
- No impedance mismatch in applications using an OODB.

Relational DB:
- Uses a record-oriented model: data is a collection of record types (relations), each having a collection of records or tuples stored in a file.
- Language independent (via SQL).
- Impedance mismatch in OO applications: a mapping must be performed.

6. Write about modeling and design in OODBMS.

Basically, an OODBMS is an object database that provides DBMS capabilities to objects that have been created using an object-oriented programming language (OOPL). The basic principle is to add persistence to objects. Consequently, application programmers who use OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and the language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the OOPL.

7. What is object data modeling?

An object consists of three parts: structure (attributes, and relationships to other objects such as aggregation and association), behavior (a set of operations) and characteristics of type (generalization/specialization). An object is similar to an entity in the ER model; therefore we begin with an example to demonstrate the structure and relationships.

8. Define attributes.

Attributes are like the fields in a relational model. However, in the Book example we have, for the attributes publishedBy and writtenBy, complex types Publisher and Author, which are also objects. Attributes with complex objects, in an RDBMS, are usually other tables linked by keys to the main table.

9. Define relationships.

Relationships: publishedBy and writtenBy are associations with 1:N and 1:1 relationships; composedOf is an aggregation (a Book is composed of chapters). The 1:N relationship is usually realized as attributes through complex types and at the behavioral level.

10. What are generalization and specialization?

Generalization/specialization is a relationship which is supported in an OODB through the class hierarchy. An ArtBook is a Book; therefore the ArtBook class is a subclass of the Book class. A subclass inherits all the attributes and methods of its superclass.

11. Define message.

Message: the means by which objects communicate; it is a request from one object to another to execute one of its methods. For example: Publisher_object.insert("Rose", 123, ...) is a request to execute the insert method on a Publisher object.

12. Define method.

Method: defines the behavior of an object. Methods can be used to change the object's state by modifying its attribute values, or to query the values of selected attributes. An example of a method that responds to the message above is the insert method defined in the Publisher class.
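
A minimal Python sketch of the message/method distinction, using a hypothetical Publisher class modeled on the example above:

class Publisher:
    """A hypothetical persistent class; insert is one of its methods."""
    def __init__(self):
        self.rows = []          # the object's state (attribute values)

    def insert(self, name, pub_id):
        # The method defines behavior: it changes the object's state
        # by modifying its attribute values.
        self.rows.append({"name": name, "id": pub_id})

publisher_object = Publisher()
# Sending a message: a request from one object to another to execute
# one of its methods.
publisher_object.insert("Rose", 123)
print(publisher_object.rows)  # [{'name': 'Rose', 'id': 123}]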

13. What are the drawbacks of persistent programming languages?

- Due to the power of most programming languages, it is easy to make programming errors that damage the database.
- The complexity of such languages makes automatic high-level optimization more difficult.
- They do not support declarative querying as well as relational databases do.
14. What are the characteristics of the query languages used?

- Declarative query language: not computationally complete
- Syntax based on SQL (select, from, where)
- Additional flexibility (queries with user-defined operators and types)

15. Give an example of complex data.

A water resource management example: a database of state-wide water projects that includes a library of picture slides. Indexing the slides according to predefined concepts is prohibitively expensive.

16. What types of queries can be used in water resource management?

- Geographic locations
- Reservoir levels during droughts
- Recent flood conditions, etc.

17. What is multiversion concurrency control?

Multiversion concurrency control (abbreviated MCC or MVCC), in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.
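
A toy Python sketch of the core MVCC idea (a purely illustrative versioned store, not any particular system's design): writers append new versions stamped with a logical commit timestamp, and readers see the newest version no later than their snapshot, so readers never block writers:

class MVCCStore:
    def __init__(self):
        self.versions = {}     # key -> list of (commit_ts, value)
        self.clock = 0         # logical timestamp counter

    def write(self, key, value):
        # Each write appends a new version instead of overwriting.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        # A reader sees the latest version committed at or before its snapshot.
        candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                      if ts <= snapshot_ts]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
store.write("x", 10)           # commits at ts=1
snapshot = store.clock         # a reader takes its snapshot here
store.write("x", 20)           # commits at ts=2, after the snapshot
print(store.read("x", snapshot))      # 10: the snapshot ignores later writes
print(store.read("x", store.clock))   # 20: a newer snapshot sees the update
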
18. What are the types of failures?

- System crashes: due to hardware or software errors, resulting in loss of main memory
- Media failures: due to problems with the disk head or unreadable media
- Application errors: due to logical errors in the program, which may cause transactions to fail
- Natural disasters: physical loss of media (fire, flood, earthquakes, terrorism, etc.)
- Sabotage: intentional corruption or destruction of data
- Carelessness: unintentional destruction of data by users or operators

19. What are the general failures?

- Transaction failures: transaction aborts
- System failures: failure of processor, main memory, power supply
- Media failures: failure of secondary storage

20. What is meant by concurrency control?


Concurrency control ensures that correct results for concurrent operations are generated, while getting
those results as quickly as possible. Thus concurrency control is an essential element for correctness in
any system where two database transactions or more, executed with time overlap, can access the same
data, e.g., virtually in any general-purpose database system.

21. What is meant by media recovery?


Media recovery deals with failures of the storage media holding the permanent database, in particular
disk failures. The traditional database approach for media recovery uses archive copies (dumps) of the
database as well as archive logs. Archive copies represent snapshots of the database and are
periodically taken. The archive log contains the log records for all committed changes which are not yet
reflected in the archive copy.

22. Define logging and recovery.


Logging and recovery ensure that failures are masked to the users of transaction-based data
management systems by providing automatic treatment for different kinds of failures, such as transaction
failures, system failures (crashes), media failures and disasters. The main goal is to guarantee the
atomicity (A) and durability (D) properties of ACID transactions by providing undo recovery for failed
transactions and redo recovery for committed transactions. Logging is the task of collecting redundant
data needed for recovery.

23. What is meant by transaction?


A transaction comprises a unit of work performed within a database management system (or similar
system) against a database, and treated in a coherent and reliable way independent of other transactions.
24. What are the two main purposes of transactions?

1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

25. What is Crash recovery?
Crash recovery is needed when the whole database (transaction) system fails, e.g. due to a hardware or
software error. All transactions which were active and not yet committed at crash time have failed so that
their changes must be undone. The changes for transactions that have committed before the crash must
survive. A redo recovery is needed for all changes of committed transactions that have been lost by the
crash because the changed pages resided only in main memory but were not yet written out to the
permanent database.

26. What is meant by disaster recovery?

Disaster recovery can be achieved by maintaining a backup copy of the database at a geographically
remote location. By continuously transferring log data from the primary database to the backup and
applying the changes there, the backup can be kept (almost) up-to-date.

27. What is data persistence?

Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly. The life span of data needs to be determined, directly or indirectly, by the user and must not be dependent on system features. Additionally, data once stored in a database must not be lost. Changes of a database which are done by a transaction are persistent: when a transaction has finished, even a system crash cannot put the data in danger.

28. Define transaction recovery.

Transaction recovery (rollback) is performed when a transaction fails during normal processing, e.g. due to a program error or invalid input data. The log records in the log buffer and in the log file are used to undo the changes of the failed transaction in reverse order.

UNIT III - EMERGING SYSTEMS

TWO MARKS

1. What is the client/server model?

- Networked computing model
- Processes are distributed between clients and servers
- Client: a workstation (usually a PC) that requests and uses a service
- Server: a computer (PC/mini/mainframe) that provides a service
- For a DBMS, the server is a database server

2. Give notes on database server architectures.

2-tiered approach:
- The client is responsible for I/O processing logic and some business rules logic
- The server performs all data storage and access processing: the DBMS is only on the server

3. What are the advantages of a database server?

- Clients do not have to be as powerful
- Greatly reduces data traffic on the network
- Improved data integrity, since it is all processed centrally
- Stored procedures: some business rules done on the server

4. What are the advantages of three-tier architectures?

- Scalability
- Technological flexibility
- Long-term cost reduction
- Better match of systems to business needs
- Improved customer service
- Competitive advantage
- Reduced risk

5. What are the challenges of three-tier architectures?

- High short-term costs
- Tools and training
- Experience
- Incompatible standards
- Lack of compatible end-user tools

6. What client/server security levels are used?

The network environment raises complex security issues. Security levels:
- System-level password security: for allowing access to the system
- Database-level password security: for determining access privileges to tables (read/update/insert/delete privileges)
- Secure client/server communication: via encryption

7. Define data warehousing.

- Data sources often store only current data, not historical data
- Corporate decision making requires a unified view of all organizational data, including historical data
- A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site
  - Greatly simplifies querying and permits the study of historical trends
  - Shifts the decision support query load away from transaction processing systems

8. What are the design issues in data warehousing?

- When and how to gather data
  - Source-driven architecture: data sources transmit new information to the warehouse, either continuously or periodically (e.g. at night)
  - Destination-driven architecture: the warehouse periodically requests new information from the data sources
  - Keeping the warehouse exactly synchronized with the data sources (e.g. using two-phase commit) is too expensive
- What schema to use
  - Schema integration
- Data cleansing
  - E.g. correct mistakes in addresses (misspellings, zip code errors)
  - Merge address lists from different sources and purge duplicates
- How to propagate updates
  - The warehouse schema may be a (materialized) view of the schema from the data sources
- What data to summarize
  - Raw data may be too large to store on-line
  - Aggregate values (totals/subtotals) often suffice
  - Queries on raw data can often be transformed by the query optimizer to use aggregate values

9. What are the data warehouse schemas?

- Dimension values are usually encoded using small integers and mapped to full values via dimension tables
- The resultant schema is called a star schema
- More complicated schema structures:
  - Snowflake schema: multiple levels of dimension tables
  - Constellation: multiple fact tables
10. Give an example of a data warehouse schema.

A typical example is a star schema: a central sales fact table whose foreign keys reference dimension tables such as item, store, customer and date.

11. Define data mining.

Data mining is the process of semi-automatically analyzing large databases to find useful patterns.

12. Define prediction.

Prediction is based on past history. Examples:
- Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history
- Predict if a pattern of phone calling card usage is likely to be fraudulent

13. Give some examples of prediction mechanisms.

- Classification: given a new item whose class is unknown, predict to which class it belongs
- Regression formulae: given a set of mappings for an unknown function, predict the function result for a new parameter value

14. What are the descriptive patterns in data mining?

- Associations: find books that are often bought by "similar" customers; if a new such customer buys one such book, suggest the others too. Associations may be used as a first step in detecting causation, e.g. an association between exposure to chemical X and cancer.
- Clusters: e.g. typhoid cases were clustered in an area surrounding a contaminated well. Detection of clusters remains important in detecting epidemics.

15. What are classification rules? Give an example.

Classification rules help assign new objects to classes.
- E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?
- Classification rules for the above example could use a variety of data, such as educational level, salary, age, etc.:
  - for all persons P: P.degree = masters and P.income > 75,000 => P.credit = excellent
  - for all persons P: P.degree = bachelors and (25,000 <= P.income <= 75,000) => P.credit = good
- Rules are not necessarily exact: there may be some misclassifications
- Classification rules can be shown compactly as a decision tree

16. Define decision tree.

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

17. How do you construct decision trees?

- Training set: a data sample in which the classification is already known
- Greedy top-down generation of decision trees:
  - Each internal node of the tree partitions the data into groups based on a partitioning attribute and a partitioning condition for the node
  - Leaf node: all (or most) of the items at the node belong to the same class, or all attributes have been considered and no further partitioning is possible

18. How do you find the best splits?

- Categorical attributes (with no meaningful order):
  - Multi-way split: one child for each value
  - Binary split: try all possible breakups of the values into two sets, and pick the best
- Continuous-valued attributes (can be sorted in a meaningful order):
  - Binary split: sort the values, try each as a split point (e.g. if the values are 1, 10, 15, 25, split at 1, 10, 15), and pick the value that gives the best split
  - Multi-way split: a series of binary splits on the same attribute has roughly equivalent effect
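
A small Python sketch (toy data, simple Gini impurity) of the binary-split search for a continuous attribute: sort the values, try each as a split point, and keep the one with the lowest weighted impurity:

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Try each sorted value as a split point; return (split, impurity)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        split = pairs[i - 1][0]                 # items <= split go left
        left = [c for v, c in pairs if v <= split]
        right = [c for v, c in pairs if v > split]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (split, score)
    return best

# Toy example: income values with credit classes.
incomes = [1, 10, 15, 25]
classes = ["bad", "bad", "good", "good"]
print(best_binary_split(incomes, classes))  # (10, 0.0): a perfect split
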
19. What is meant by regression?

Regression deals with the prediction of a value, rather than a class.
- Given values for a set of variables X1, X2, ..., Xn, we wish to predict the value of a variable Y.
- One way is to infer coefficients a0, a1, ..., an such that Y = a0 + a1*X1 + a2*X2 + ... + an*Xn. Finding such a linear polynomial is called linear regression.
- In general, the process of finding a curve that fits the data is also called curve fitting. The fit may only be approximate, because of noise in the data or because the relationship is not exactly a polynomial. Regression aims to find coefficients that give the best possible fit.
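
A short Python sketch (synthetic data, numpy's least-squares solver) that infers a0 and a1 for the one-variable case Y = a0 + a1*X1:

import numpy as np

# Synthetic, slightly noisy observations of y = 2 + 3*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])

# Design matrix with a column of ones for the intercept a0.
A = np.column_stack([np.ones_like(x), x])

# Least squares finds the coefficients giving the best possible fit.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a1 = coeffs
print(f"Y = {a0:.2f} + {a1:.2f} * X1")   # prints roughly Y = 2.04 + 3.00 * X1
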
20. Give an example of association rules.

- Retail shops are often interested in associations between the different items that people buy:
  - Someone who buys bread is quite likely also to buy milk
  - A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts
- Association information can be used in several ways, e.g. when a customer buys a particular book, an online shop may suggest associated books
- Association rules:
  - bread => milk
  - DB-Concepts, OS-Concepts => Networks
  - Left hand side: antecedent; right hand side: consequent
- An association rule must have an associated population; the population consists of a set of instances
  - E.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the population
- Rules have an associated support, as well as an associated confidence

21. What is meant by support and confidence?

- Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
  - E.g. suppose only 0.001 percent of all purchases include milk and screwdrivers. The support for the rule milk => screwdrivers is low.
- Confidence is a measure of how often the consequent is true when the antecedent is true.
  - E.g. the rule bread => milk has a confidence of 80 percent if 80 percent of the purchases that include bread also include milk.
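
A minimal Python sketch (toy baskets) computing support and confidence for a one-item rule antecedent => consequent:

# Toy population: each market basket (transaction) is an instance.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support_confidence(antecedent, consequent):
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    antecedent_only = sum(1 for b in baskets if antecedent in b)
    support = both / len(baskets)          # fraction satisfying both sides
    confidence = both / antecedent_only    # how often consequent holds given antecedent
    return support, confidence

print(support_confidence("bread", "milk"))  # (0.6, 0.75): 3/5 baskets, 3/4 bread-buyers
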
22. Define clustering.

- Clustering: intuitively, finding clusters of points in the given data such that similar points lie in the same cluster
- Can be formalized using distance metrics in several ways:
  - Group points into k sets (for a given k) such that the average distance of points from the centroid of their assigned group is minimized (centroid: the point defined by taking the average of the coordinates in each dimension)
  - Another metric: minimize the average distance between every pair of points in a cluster
- Has been studied extensively in statistics, but on small data sets; data mining systems aim at clustering techniques that can handle very large data sets, e.g. the Birch clustering algorithm
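
A compact Python sketch of centroid-based clustering in the spirit of the first metric above (a basic k-means loop on 1-D toy data; illustrative, and not the Birch algorithm):

def kmeans_1d(points, centroids, iters=10):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Each new centroid is the average of its assigned points.
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
print(kmeans_1d(points, centroids=[0.0, 5.0]))  # ~[1.0, 9.53]: two clusters found
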
23. What is meant by hierarchical clustering?

- Example from biological classification (the word "classification" here does not mean a prediction mechanism): the phylum chordata divides into classes such as mammalia (e.g. leopards, humans) and reptilia (e.g. snakes, crocodiles)
- Other examples: Internet directory systems (e.g. Yahoo)
- Agglomerative clustering algorithms: build small clusters, then cluster the small clusters into bigger clusters, and so on
- Divisive clustering algorithms: start with all items in a single cluster, and repeatedly refine (break) clusters into smaller ones

24. Explain the Birch algorithm.

Clustering algorithms such as Birch have been designed to handle very large datasets:
- Main idea: use an in-memory R-tree to store points that are being clustered
- Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some distance away
- If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
- At the end of the first pass we get a large number of clusters at the leaves of the R-tree; then merge clusters to reduce their number

25. What is text mining?

- Text mining: the application of data mining to textual documents, e.g.:
  - cluster Web pages to find related pages
  - cluster pages a user has visited to organize their visit history
  - classify Web pages automatically into a Web directory
- Data visualization systems help users examine large volumes of data and detect patterns visually: they can visually encode large amounts of information on a single screen, and humans are very good at detecting visual patterns

26. What are web databases?

Semantic web architecture and applications are a dramatic departure from earlier database and application generations. Semantic processing includes the earlier statistical and natural language techniques, and enhances these with semantic processing tools. First, Semantic Web architecture is the automated conversion and storage of unstructured text sources in a semantic web database. Second, Semantic Web applications automatically extract and process the concepts and context in the database in a range of highly flexible tools.

27. What are structured and unstructured data?

Semantic Web architecture and applications handle both structured and unstructured data. Structured data is stored in relational databases with static classification systems, and also in discrete documents. These databases and documents can be processed and converted to Semantic Web databases, and then processed with unstructured data. Much of the data we read, produce and share is now unstructured: emails, reports, presentations, media content, web pages. And these documents are stored in many different formats: text, email files, Microsoft word processor, spreadsheet and presentation files, Lotus Notes, Adobe .pdf, and HTML. It is difficult, expensive, slow and inaccurate to attempt to classify and store these in a structured database. All of these sources can be automatically converted to a common Semantic Web database, and integrated into one common information source.

28. Compare and contrast synthetic vs artificial intelligence.

Semantic Web technology is NOT Artificial Intelligence. AI was a mythical marketing goal to create thinking machines. The Semantic Web supports a much more limited and realistic goal: Synthetic Intelligence. The concepts and relationships stored in the Semantic Web database are synthesized, or brought together and integrated, to automatically create a new summary, analysis, report, email or alert, or to launch another machine application. The goal of Synthetic Intelligence information systems is bringing together all information sources and user knowledge, and synthesizing these in global networks.

29. What are mobile databases?

The main advantage of using a mobile database in your application is offline access to data, in other words, the ability to read and update data without a network connection. This helps avoid problems such as dropped connections, low bandwidth, and high latency that are typical on wireless networks today.

A mobile database forms a fully connected information space:
- Each node of the information space has some communication capability
- Some nodes can process information
- Some nodes can communicate through a voice channel
- Some nodes can do both
It can be created and maintained by integrating legacy database systems, and wired and wireless systems (PCS, cellular systems, and GSM).

30. What is a Mobile Database System (MDS)?

A system with the following structural and functional properties:
- Distributed system with mobile connectivity
- Full database system capability
- Complete spatial mobility
- Built on a PCS/GSM platform
- Wireless and wired communication capability

UNIT IV - DATABASE DESIGN ISSUES

TWO MARKS

1. What are attributes? Give example.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.

Example: possible attributes of the customer entity are customer name, customer id, customer street, and customer city.

2. Define single valued and multi valued attributes.

- Single valued attributes: attributes with a single value for a particular entity are called single valued attributes.
- Multi valued attributes: attributes with a set of values for a particular entity are called multi valued attributes.

3. What are stored and derived attributes?

- Stored attributes: the attributes stored in a database are called stored attributes.
- Derived attributes: the attributes that are derived from the stored attributes are called derived attributes.

4. Define weak and strong entity sets.

- Weak entity set: entity sets that do not have a key attribute of their own are called weak entity sets.
- Strong entity set: an entity set that has a primary key is termed a strong entity set.

5. What is the use of integrity constraints?


Integrity constraints ensure that changes made to the database by authorized users do not result in a loss
of data consistency. Thus integrity constraints guard against accidental damage to the database.

6. Mention the various levels in security measures.


Database system, Operating system, Network, Physical, Human

7. What are the types of attributes?


Simple: Each entity has a single atomic value for the attribute.
Composite: The attribute may be composed of several components.
Multi-valued: An entity may have multiple values for that attribute.

8. Write the entity set corresponding to the entity type car.

Registration (RegistrationNumber, State), VehicleID, Make, Model, Year, (Color)
Example: ((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, (red, black))

9. What are weak entity types?

An entity type that does not have a key attribute. A weak entity type must participate in an identifying relationship type with an owner or identifying entity type.

10. Define normalization.

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

11. What is denormalization?

The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

12. What is the need for normalization?

- Mixing attributes of multiple entities may cause problems
- Information is stored redundantly, wasting storage
- Problems with update anomalies

13. What is integrity? What are the types of integrity?

Integrity here refers to the CORRECTNESS and CONSISTENCY of the data stored in the database.
Types: database integrity, entity integrity, referential integrity.

14. What is meant by consistency?
It ensures the truthfulness of the database. The consistency property ensures that any transaction the
database performs will take it from one consistent state to another. The consistency property does not say
how the DBMS should handle an inconsistency other than ensure the database is clean at the end of the
transaction.

15. What is an entity relationship model?


The entity relationship model is a collection of basic objects called entities and relationship among those
objects. An entity is a thing or object in the real world that is distinguishable from other objects.

16. Define query optimization.


Query optimization refers to the process of finding the lowest cost method of evaluating a given query.

17. What are the tunable parameters?

- Tuning of hardware
- Tuning of schema
- Tuning of indices
- Tuning of materialized views
- Tuning of transactions

18. What is hardware tuning?

- Even well-tuned transactions typically require a few I/O operations; a typical disk supports about 100 random I/O operations per second
- Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
- The number of I/O operations per transaction can be reduced by keeping more data in memory: if all data is in memory, I/O is needed only for writes; keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost
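
The n/50 figure follows directly from the numbers above; a one-function Python sketch of the same sizing arithmetic:

import math

def disks_needed(tps, ios_per_txn=2, ios_per_disk=100):
    """Disks required to sustain tps transactions/second (ignoring skew)."""
    total_ios = tps * ios_per_txn          # random I/Os demanded per second
    return math.ceil(total_ios / ios_per_disk)

print(disks_needed(1000))  # 20 disks, i.e. n/50 with n = 1000
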
20. What is meant by schema tuning?

- Vertically partition relations to isolate the data that is accessed most often, so that only the needed information is fetched. E.g., split account into two relations, (account-number, branch-name) and (account-number, balance).
- Improve performance by storing a denormalized relation. E.g., store the join of account and depositor: the branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
- Cluster together on the same disk page records that would match in a frequently required join, to compute the join very efficiently when required.

21. What is index tuning?

- Create appropriate indices to speed up slow queries/updates
- Speed up slow updates by removing excess indices (a tradeoff between queries and updates)
- Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries
- Choose which index to make clustered
- Index tuning wizards look at the past history of queries and updates (the workload) and recommend which indices would be best for the workload

22. What is the use of performance simulation?

Performance simulation using a queuing model is useful to predict bottlenecks, as well as the effects of tuning changes, even without access to the real system.
23. What are temporal databases?

Non-temporal databases:
- store only a single state of the real world, usually the most recent state
- are classified as snapshot databases
- application developers and database designers need to code for time-varying data requirements, e.g. history tables, forecast reports, etc.

Temporal databases:
- store up to two dimensions of time, i.e. VALID (stated) time and TRANSACTION (logged) time
- are classified as historical, rollback or bi-temporal
- there is no need for application developers or database designers to code for time-varying data requirements, i.e. time is inherently supported

24. Give examples of application domains dealing with time-varying data.

- Financial applications (e.g. history of stock market data)
- Insurance applications (e.g. when were the policies in effect)
- Reservation systems (e.g. when is which room in a hotel booked)
- Medical information management systems (e.g. patient records)
- Decision support systems (e.g. planning future contingencies)
- CRM applications (e.g. customer history / future)
- HR applications (e.g. date-tracked positions in hierarchies)

25. What is a SDBMS?

A SDBMS is a software module that:
- can work with an underlying DBMS
- supports spatial data models, spatial abstract data types (ADTs) and a query language from which these ADTs are callable
- supports spatial indexing, efficient algorithms for processing spatial operations, and domain-specific rules for query optimization
Examples: Oracle Spatial data cartridge, ESRI SDE

26. Give examples of spatial databases.

Consider a spatial dataset with:
- County boundary (dashed white line)
- Census block: name, area, population, boundary (dark line)
- Water bodies (dark polygons)
- Satellite imagery (gray-scale pixels)

Storage in a SDBMS table:

create table census_blocks (
    name       string,
    area       float,
    population number,
    boundary   polygon );

27. How is a SDBMS different from a GIS?

A GIS is software to visualize and analyze spatial data using spatial analysis functions such as:
- Search: thematic search, search by region, (re-)classification
- Location analysis: buffer, corridor, overlay
- Terrain analysis: slope/aspect, catchment, drainage network
- Flow analysis: connectivity, shortest path
- Distribution: change detection, proximity, nearest neighbor
- Spatial analysis/statistics: pattern, centrality, autocorrelation, indices of similarity, topology (hole description)
- Measurements: distance, perimeter, shape, adjacency, direction

A GIS uses a SDBMS to store, search, query and share large spatial data sets.

28. What are the components of a SDBMS?

Recall: a SDBMS is a software module that can work with an underlying DBMS; supports spatial data models, spatial ADTs and a query language from which these ADTs are callable; and supports spatial indexing, algorithms for processing spatial operations, and domain-specific rules for query optimization.
Its components include spatial data models, a query language, query processing, file organization and indices, query optimization, etc.

UNIT V - CURRENT ISSUES

TWO MARKS

1. Write any four rules defined in relational database design.

- All database management must take place using the relational database's innate functionality
- All information in the database must be stored as values in a table
- All database information must be accessible through the combination of a table name, primary key and column name
- The database must use NULL values to indicate missing or unknown information
- The database schema must be described using the relational database syntax
- The database may support multiple languages, but it must support at least one language that provides full database functionality (e.g. SQL)
- The system must be able to update all updatable views
- The database must provide single-operation insert, update and delete functionality
- Changes to the physical structure of the database must be transparent to applications and users
- Changes to the logical structure of the database must be transparent to applications and users
- The database must natively support integrity constraints
- Changes to the distribution of the database (centralized vs. distributed) must be transparent to applications and users
- Any languages supported by the database must not be able to subvert integrity controls

2. What are knowledge bases?

A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized.

3. What are the characteristics of knowledge bases?

Characteristics of KBSs:
- intelligent information processing systems
- representation of the domain of interest: symbolic representation
- problem solving by symbol manipulation: symbolic programs

4. What are the main components in Knowledge bases?
<!--[if !supportLists]--> <!--[endif]-->knowledge-base (KB)
<!--[if !supportLists]--> <!--[endif]-->inference engine
<!--[if !supportLists]--> <!--[endif]-->case-specific database
<!--[if !supportLists]--> <!--[endif]-->explanation subsystem
<!--[if !supportLists]--> <!--[endif]-->knowledge acquisition subsystem
<!--[if !supportLists]--> <!--[endif]-->user interface ( user)

<!--[if !supportLists]--> <!--[endif]-->specially intefaces


<!--[if !supportLists]--> <!--[endif]-->developer interface ( knowledge engineer, human expert)
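As a minimal sketch (all facts and rules are made up for illustration), the code below shows the two core components named above: a knowledge base of if-then rules and a forward-chaining inference engine that derives new facts from the case-specific database:

facts = {"has_fur", "gives_milk"}            # case-specific database
rules = [                                    # knowledge base: (premises, conclusion)
    ({"has_fur", "gives_milk"}, "mammal"),
    ({"mammal", "eats_meat"}, "carnivore"),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises hold until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))   # {'has_fur', 'gives_milk', 'mammal'}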
5. What are called active databases?
An active database is a database that includes an event-driven architecture (often in the form of ECA rules) which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Most modern relational databases include active database features in the form of SQL triggers, as sketched below.
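As a minimal sketch of an ECA-style trigger (using Python's built-in sqlite3; the table and trigger names are made up for illustration), the trigger below fires on an event (INSERT), checks a condition, and runs an action without any application code involved:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, note TEXT);

    -- Event: insert on account; Condition: negative balance; Action: log it.
    CREATE TRIGGER flag_negative AFTER INSERT ON account
    WHEN NEW.balance < 0
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'negative opening balance');
    END;
""")
conn.execute("INSERT INTO account VALUES (1, -500)")
print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 'negative opening balance')]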

6. Differentiate database and information system.
A database is an organized, persistent collection of related data managed by a DBMS. An information system is the broader combination of hardware, software, data, people and procedures that collects, processes, stores and disseminates information to support an organization's operations and decision making; a database is typically one component of an information system.
7. What are deductive databases?

A deductive database is a database system that can make deductions (i.e., conclude additional facts) based on rules and facts stored in the (deductive) database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases, to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems. Deductive databases have not found widespread adoption outside academia, but some of their concepts are used in today's relational databases to support the advanced features of more recent SQL standards. A small Datalog-style example follows.
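As a minimal sketch (the parent facts are made up for illustration), the code below captures the Datalog idea behind deductive databases - stored facts plus rules such as

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

evaluated bottom-up until no new facts can be derived (a fixpoint):

parent = {("amy", "bob"), ("bob", "carl"), ("carl", "dana")}

def ancestors(parent):
    anc = set(parent)                      # base rule: every parent is an ancestor
    while True:
        new = {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2} - anc
        if not new:                        # fixpoint reached: no new facts derivable
            return anc
        anc |= new

print(sorted(ancestors(parent)))  # includes ('amy', 'dana'), deduced transitively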
8. What are parallel databases?

- Parallel machines are becoming quite common and affordable.
o Prices of microprocessors, memory and disks have dropped sharply.
o Recent desktop computers feature multiple processors, and this trend is projected to accelerate.
- Databases are growing increasingly large.
o Large volumes of transaction data are collected and stored for later analysis.
o Multimedia objects like images are increasingly stored in databases.
- Large-scale parallel database systems are increasingly used for:
o storing large volumes of data
o processing time-consuming decision-support queries
o providing high throughput for transaction processing
9. What are the types of skew?

o Attribute-value skew
- Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
- Can occur with range-partitioning and hash-partitioning.
o Partition skew
- With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
- Less likely with hash-partitioning if a good hash function is chosen.
A sketch illustrating attribute-value skew follows.
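As a minimal sketch (the city values are synthetic), the code below shows why attribute-value skew defeats even a good hash function: every tuple with the hot value hashes to the same partition by definition:

from collections import Counter

tuples = ["chennai"] * 90 + ["delhi", "mumbai", "pune", "kolkata"] * 2
n_partitions = 4

counts = Counter(hash(v) % n_partitions for v in tuples)
print(dict(counts))   # one partition receives ~90 of the 98 tuples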

10. How do you handle skew using histograms?

- A balanced partitioning vector can be constructed from a histogram in a relatively straightforward fashion, assuming a uniform distribution within each range of the histogram.
- The histogram can be constructed by scanning the relation, or by sampling (blocks containing) tuples of the relation.
A sketch of building a partition vector from sampled values follows.
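As a minimal sketch (synthetic data; this uses a sorted sample as an equi-depth histogram, a simplification of what a real optimizer stores), the code below derives a balanced range-partition vector whose cut points give each partition roughly equal tuple counts despite the skew:

def partition_vector(sample, n_partitions):
    """Return n_partitions - 1 cut points from a sorted sample (equi-depth)."""
    sample = sorted(sample)
    step = len(sample) / n_partitions
    return [sample[int(i * step)] for i in range(1, n_partitions)]

values = [1] * 50 + list(range(2, 52))        # skewed: value 1 is very frequent
print(partition_vector(values, 4))            # cut points adapt to the skew: [1, 2, 27]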

11. How do you handle skew using virtual processor partitioning?

- Skew in range partitioning can be handled elegantly using virtual processor partitioning: create a large number of partitions (say 10 to 20 times the number of processors), then assign virtual processors to partitions either in round-robin fashion or based on the estimated cost of processing each virtual partition.
- Basic idea: if any normal partition would have been skewed, it is very likely the skew is spread over a number of virtual partitions. Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly. A round-robin sketch follows.
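As a minimal sketch (the partition sizes are made up), the code below shows round-robin assignment of virtual partitions to real processors: a hot range that would have overloaded one processor is split across several virtual partitions, which then land on different processors:

n_processors = 4
virtual_sizes = [5, 5, 90, 85, 80, 5, 5, 5]   # virtual partitions 2-4 hold a hot range

load = [0] * n_processors
for i, size in enumerate(virtual_sizes):
    load[i % n_processors] += size            # round-robin assignment

print(load)   # the three hot virtual partitions end up on three different processors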
12. What is interquery parallelism?

- Queries/transactions execute in parallel with one another.
- Increases transaction throughput; used primarily to scale up a transaction-processing system to support a larger number of transactions per second.
- Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
- More complicated to implement on shared-disk or shared-nothing architectures:
o Locking and logging must be coordinated by passing messages between processors.
o Data in a local buffer may have been updated at another processor; cache coherency has to be maintained - reads and writes of data in the buffer must find the latest version of the data.
A sketch of interquery parallelism follows.
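As a minimal sketch (using Python's built-in sqlite3; the sales table is made up for illustration), the code below runs several independent read-only queries concurrently, each on its own connection - raising throughput without parallelizing any single query:

import sqlite3, tempfile, os
from concurrent.futures import ThreadPoolExecutor

path = os.path.join(tempfile.mkdtemp(), "shop.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 10), ("south", 20), ("north", 5)])

def run_query(sql):
    with sqlite3.connect(path) as c:          # one connection per concurrent query
        return c.execute(sql).fetchall()

queries = ["SELECT SUM(amount) FROM sales",
           "SELECT COUNT(*) FROM sales WHERE region = 'north'"]
with ThreadPoolExecutor() as pool:
    print(list(pool.map(run_query, queries)))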
13. Define cache coherency protocol with example.

- Example of a cache coherency protocol for shared-disk systems:
o Before reading/writing a page, the page must be locked in shared/exclusive mode.
o On locking a page, the page must be read from disk.
o Before unlocking a page, the page must be written to disk if it was modified.
- More complex protocols with fewer disk reads/writes exist.
- Cache coherency protocols for shared-nothing systems are similar: each database page is assigned a home processor, and requests to fetch the page or write it to disk are sent to the home processor.
A sketch of the shared-disk protocol follows.
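As a minimal sketch (the Page class is illustrative, not a real DBMS buffer-manager API; lock modes are simplified to a single flag), the code below follows the three shared-disk rules above: read from disk on lock, write only while locked, flush to disk on unlock if dirty:

class Page:
    def __init__(self, pid, disk):
        self.pid, self.disk = pid, disk
        self.data, self.dirty, self.locked = None, False, False

    def lock(self):
        self.locked = True
        self.data = self.disk[self.pid]       # rule: read from disk on locking
        self.dirty = False

    def write(self, value):
        assert self.locked                    # rule: must hold the lock to write
        self.data, self.dirty = value, True

    def unlock(self):
        if self.dirty:
            self.disk[self.pid] = self.data   # rule: flush to disk if modified
        self.locked = False

disk = {0: "old"}
p = Page(0, disk)
p.lock(); p.write("new"); p.unlock()
print(disk[0])   # "new" - the latest version is visible to other processors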

14. What is meant by intraquery parallelism?

- Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
- Two complementary forms of intraquery parallelism:
o Intraoperation parallelism - parallelize the execution of each individual operation in the query.
o Interoperation parallelism - execute the different operations in a query expression in parallel.
- The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically larger than the number of operations in a query.
A sketch of intraoperation parallelism follows.
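As a minimal sketch (synthetic data), the code below shows intraoperation parallelism: a single aggregation is split across workers, each summing one partition of the relation, and the partial results are combined:

from multiprocessing import Pool

def partial_sum(partition):
    return sum(partition)                     # each worker handles one partition

if __name__ == "__main__":
    relation = list(range(1_000_000))
    n_workers = 4
    partitions = [relation[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        print(sum(pool.map(partial_sum, partitions)))   # same result, computed in parallel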

15. Write some design issues of parallel systems?

- Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.
- Resilience to failure of some processors or disks.
- On-line reorganization of data and schema changes must be supported.
- Also need support for on-line repartitioning and schema changes (executed concurrently with other processing).

16. What are the multimedia data types?

- Text
- Image
- Video
- Audio
- Mixed multimedia data

17. What is an image database system?

An image database is a searchable electronic catalog or database which allows you to organize and list images by topics, modules, or categories. The image database provides the student with important information such as image title, description, and a thumbnail picture. Additional information can be provided, such as the creator of the image, the filename, and keywords that help students search the database for specific images. Before you and your students can use an image database, you must add it to your course.

18. Give a note on image retrieval systems.


An image retrieval system is a computer system for browsing, searching and retrieving images from a
large database of digital images. Most traditional and common methods of image retrieval utilize some
method of adding metadata such as captioning, keywords, or descriptions to the images so that retrieval
can be performed over the annotation words. Manual image annotation is time-consuming, laborious and
expensive; to address this, there has been a large amount of research done on automatic image
annotation. Additionally, the increase in social web applications and the semantic web have inspired the
development of several web-based image annotation tools.

19. Give a note on image search and methods.

Image search is a specialized data search used to find images. To search for images, a user may provide query terms such as a keyword or an image file/link, or click on some image, and the system will return images "similar" to the query. The similarity used for the search criteria could be meta tags, color distribution in images, region/shape attributes, etc.
- Image meta search - search of images based on associated metadata such as keywords, text, etc.
- Content-based image retrieval (CBIR) - the application of computer vision to image retrieval. CBIR aims at avoiding the use of textual descriptions and instead retrieves images based on similarities in their contents (textures, colors, shapes, etc.) to a user-supplied query image or user-specified image features. A small CBIR sketch follows.
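As a minimal sketch (tiny color lists stand in for real images; histogram intersection is just one of several similarity measures used in practice), the code below shows the CBIR idea of ranking images by content similarity rather than by annotation:

from collections import Counter

def histogram(pixels):
    total = len(pixels)
    counts = Counter(pixels)
    return {color: counts[color] / total for color in counts}

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(h1.get(c, 0), h2.get(c, 0)) for c in set(h1) | set(h2))

query  = histogram(["red", "red", "blue", "green"])
sunset = histogram(["red", "red", "red", "blue"])
forest = histogram(["green", "green", "green", "blue"])
print(similarity(query, sunset), similarity(query, forest))  # sunset ranks higher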
20. How is image data classified?

- Archives - usually contain large volumes of structured or semi-structured homogeneous data pertaining to specific topics.
- Domain-specific collection - a homogeneous collection providing access to controlled users with very specific objectives. Examples of such collections are biomedical and satellite image databases.
- Enterprise collection - a heterogeneous collection of images that is accessible to users within an organization's intranet. Pictures may be stored in many different locations.
- Personal collection - usually consists of a largely homogeneous collection, is generally small in size, accessible primarily to its owner, and usually stored on local storage media.
- Web - World Wide Web images are accessible to everyone with an Internet connection. These image collections are semi-structured, non-homogeneous and massive in volume, and are usually stored in large disk arrays.
21. What are text databases?

Large collections of documents from various sources: news articles, research papers, books, digital libraries, e-mail messages, web pages, library databases, etc.
