
IT9223 ADVANCED DATABASE SYSTEMS


UNIT I - DISTRIBUTED DATABASES

TWO MARKS

1. What are the advantages of distributed databases over a conventional database?

Advantages:
- Mimics the organisational structure with its data
- Local access and autonomy without exclusion
- Cheaper to create and easier to expand
- Improved availability, reliability and performance by removing reliance on a central site
- Reduced communication overhead: most data access is local, which is less expensive and performs better
- Improved processing power: many machines handle the database rather than a single server

Disadvantages:
- More complex to implement
- More costly to maintain
- Security and integrity control is harder
- Standards and experience are lacking
- Design issues are more complex

2. Write about distributed database architecture.

The architecture defines the structure of the system:
- components are identified
- functions of each component are defined
- interrelationships and interactions between components are defined

3. What is a distributed database?

A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network. The distributed DBMS (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
- DDBS = DB + Communication
- Non-centralised
- DDBMS: motivated by the need to integrate operational data and to provide controlled access; it manages the distributed database and makes the distribution transparent to the user

4. What are the implicit assumptions in a DDBMS?

- Data is stored at a number of sites; each site logically consists of a single processor.
- Processors at different sites are interconnected by a computer network (no multiprocessors, as in parallel database systems).
- The distributed database is a database, not a collection of files: data is logically related, as exhibited in the users' access patterns (e.g. the relational data model).
- A D-DBMS is a full-fledged DBMS: not a remote file system, not a TP system.

5. Write about the different dimensions of the problem in DDBMS.

- Distribution: whether the components of the system are located on the same machine or not.
- Heterogeneity: can occur at various levels (hardware, communications, operating system); the DBMS level is the important one (data model, query language, transaction management algorithms).
- Autonomy: not well understood and most troublesome; it comes in various versions:
  - Design autonomy: ability of a component DBMS to decide on issues related to its own design.
  - Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
  - Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.

6. What are the issues in DDBMS?

- Data allocation: where to locate data and whether to replicate it
- Data fragmentation: how to partition the database
- Distributed catalogue management
- Distributed transactions
- Distributed queries
- Making all of the above transparent to the user is the key to DDBMSs

Replication:
- If a site (or network path) fails, the data held there is unavailable, so consider replication (duplication) of data to improve availability
- No replication: disjoint fragments
- Partial replication: site dependent
- Full replication: slows down updates (for consistency) and is expensive

7. What are the advantages of distributed databases?

- Capacity and incremental growth
- Increased reliability and availability
- Modularity
- Reduced communication overhead
- Protection of valuable data
- Efficiency and flexibility

8. What are the disadvantages of distributed databases?

- DDB design is more complex (fragmentation and replication); extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent
- Economics
- Concurrency control
- Inexperience
- Security
- Integrity is difficult to maintain

9. Write the applications of DDBMS.

- Manufacturing, especially multi-plant manufacturing
- Military command and control
- Electronic fund transfers and electronic trading
- Corporate MIS
- Airline reservations
- Hotel chains
- Any organization which has a decentralized organizational structure
10. What are relations and sub-relations (fragments)?

- Relation as the unit of distribution: views are subsets of relations, so there is locality of access, but shipping whole relations causes extra communication.
- Fragments of relations (sub-relations): permit concurrent execution of a number of transactions that access different portions of a relation; however, views that cannot be defined on a single fragment will require extra processing, and semantic data control (especially integrity enforcement) is more difficult.

11. What are the types of fragmentation?

- Horizontal Fragmentation (HF): splitting the database by rows, e.g. A-J in site 1, K-S in site 2 and T-Z in site 3
  - Primary Horizontal Fragmentation (PHF)
  - Derived Horizontal Fragmentation (DHF)
- Vertical Fragmentation (VF): splitting the database by columns/fields, e.g. columns/fields 1-3 in site A, 4-6 in site B; the primary key is taken to all sites
- Hybrid Fragmentation: horizontal and vertical fragmentation could even be combined
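
As an illustration, here is a minimal Python sketch (the employees rows are invented for this example, not from the syllabus) showing horizontal fragmentation by a predicate and vertical fragmentation by column groups, with the primary key replicated into every vertical fragment:

# Hypothetical relation: a list of row dicts keyed by the primary key "id".
employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales", "salary": 40000},
    {"id": 2, "name": "Ravi",  "dept": "HR",    "salary": 35000},
    {"id": 3, "name": "Meena", "dept": "Sales", "salary": 52000},
]

# Horizontal fragmentation: split by rows using a predicate per site.
site1 = [r for r in employees if r["dept"] == "Sales"]   # fragment at site 1
site2 = [r for r in employees if r["dept"] != "Sales"]   # fragment at site 2

# Vertical fragmentation: split by columns, repeating the key in each
# fragment so the relation can be rebuilt by joining on "id".
frag_a = [{"id": r["id"], "name": r["name"]} for r in employees]
frag_b = [{"id": r["id"], "dept": r["dept"], "salary": r["salary"]} for r in employees]

# Reconstruction checks: union of horizontal fragments, join of vertical ones.
assert sorted(site1 + site2, key=lambda r: r["id"]) == employees
rejoined = [{**a, **b} for a in frag_a for b in frag_b if a["id"] == b["id"]]
assert rejoined == employees
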
12. Define vertical fragmentation. What is the advantage of VF?

- Vertical fragmentation has been studied within the centralized context (design methodology, physical clustering).
- It is more difficult than horizontal fragmentation, because more alternatives exist.
- Two approaches:
  - Grouping: attributes to fragments; produces overlapping fragments
  - Splitting: relation to fragments; produces non-overlapping fragments
- We do not consider the replicated key attributes to be overlapping.
- Advantage: easier to enforce functional dependencies (for integrity checking etc.)
13. What are the information requirements in VF?

Application information:
- Attribute affinities: a measure that indicates how closely related the attributes are; obtained from more primitive usage data.
- Attribute usage values: given a set of queries Q = {q1, q2, ..., qq} that will run on the relation R[A1, A2, ..., An],

  use(qi, Aj) = 1 if attribute Aj is referenced by query qi, 0 otherwise
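
A small Python sketch (the queries and attributes are hypothetical, purely for illustration) that builds the use(qi, Aj) matrix and derives a crude attribute affinity count, i.e. for each attribute pair, the number of queries referencing both:

# Attributes of a hypothetical relation R and the attributes each query touches.
attributes = ["A1", "A2", "A3", "A4"]
query_refs = {
    "q1": {"A1", "A2"},
    "q2": {"A2", "A3"},
    "q3": {"A1", "A2", "A4"},
}

# use(qi, Aj) = 1 if attribute Aj is referenced by query qi, 0 otherwise.
use = {
    q: {a: int(a in refs) for a in attributes}
    for q, refs in query_refs.items()
}

# A crude affinity measure: how many queries reference both Ai and Aj.
# (Real methods also weight by query access frequencies per site.)
affinity = {
    (ai, aj): sum(use[q][ai] and use[q][aj] for q in query_refs)
    for ai in attributes for aj in attributes
}
print(affinity[("A1", "A2")])  # -> 2 (q1 and q3 reference both)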

14. What is the significance of query processing?

Using a client-server architecture:
- the user creates a query
- the client parses it and sends it to the server(s) (e.g. in SQL)
- the servers return the appropriate tables
- the client combines them into one table

The issue is the cost of data transfer over the network, so the query is optimised to transfer the least amount of data.
15. What are the query processing components?

- Query language that is used (SQL: "intergalactic data speak")
- Query execution methodology: the steps that one goes through in executing high-level (declarative) user queries
- Query optimization: how do we determine the "best" execution plan?
16. What are the objectives of query optimization?

Minimize a cost function: I/O cost + CPU cost + communication cost. These components may have different weights in different distributed environments:
- Wide area networks: communication cost will dominate (low bandwidth, low speed, high protocol overhead), so most algorithms ignore all other cost components
- Local area networks: communication cost is not that dominant, so the total cost function should be considered

One can also maximize throughput instead.
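
A toy Python sketch of such a weighted cost function; the weights below are invented for illustration, not prescribed values:

def query_cost(io_ops, cpu_units, msg_bytes, env="WAN"):
    """Total cost = weighted I/O + CPU + communication components."""
    # Hypothetical weights: WAN optimizers often let communication dominate,
    # while LAN optimizers weight all components.
    weights = {
        "WAN": {"io": 0.0, "cpu": 0.0, "comm": 1.0},   # ignore non-comm costs
        "LAN": {"io": 1.0, "cpu": 0.5, "comm": 1.0},   # total cost function
    }[env]
    return (weights["io"] * io_ops
            + weights["cpu"] * cpu_units
            + weights["comm"] * msg_bytes)

print(query_cost(100, 500, 2000, env="WAN"))  # 2000.0: only communication counts
print(query_cost(100, 500, 2000, env="LAN"))  # 2350.0: all components count
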
17. What are the types of optimizers?

- Exhaustive search:
  - cost-based
  - optimal
  - combinatorial complexity in the number of relations
- Heuristics:
  - not optimal
  - regroup common sub-expressions
  - perform selection and projection first
  - replace a join by a series of semijoins
  - reorder operations to reduce intermediate relation size
  - optimize individual operations

18. What is optimization granularity?

- Single query at a time: cannot use common intermediate results
- Multiple queries at a time: efficient if there are many similar queries, but the decision space is much larger

19. What are the types of optimization timing?

- Static:
  - optimize prior to execution, at compilation time
  - difficult to estimate the size of the intermediate results (error propagation)
  - can amortize the optimization cost over many executions
  - e.g. R*
- Dynamic:
  - run-time optimization
  - exact information on the intermediate relation sizes
  - have to reoptimize for multiple executions
  - e.g. Distributed INGRES
- Hybrid:
  - compile using a static algorithm
  - if the error in estimated sizes exceeds a threshold, reoptimize at run time
  - e.g. MERMAID

20. What statistics are used in query optimization?

- Relation: cardinality; size of a tuple; fraction of tuples participating in a join with another relation
- Attribute: cardinality of the domain; actual number of distinct values
- Common assumptions: independence between different attribute values; uniform distribution of attribute values within their domain

21. What decision sites are used in query optimization?

- Centralized: a single site determines the "best" schedule; simple, but needs knowledge about the entire distributed database
- Distributed: cooperation among sites to determine the schedule; needs only local information, but there is a cost of cooperation
- Hybrid: one site determines the global schedule; each site optimizes the local subqueries

22. What are the types of network topologies used in DDBMS?

- Wide area networks (WAN, point-to-point):
  - characteristics: low bandwidth, low speed, high protocol overhead
  - communication cost will dominate; ignore all other cost factors
  - global schedule to minimize communication cost
  - local schedules according to centralized query optimization
- Local area networks (LAN):
  - communication cost not that dominant
  - total cost function should be considered
  - broadcasting can be exploited (for joins)
  - special algorithms exist for star networks

23. What is query decomposition?

Input: a calculus query on global relations. Its steps are:
- Normalization: manipulate query quantifiers and qualification
- Analysis: detect and reject "incorrect" queries; possible for only a subset of relational calculus
- Simplification: eliminate redundant predicates
- Restructuring: translate the calculus query into an algebraic query; more than one translation is possible; use transformation rules

24. What is data localization?

Input: an algebraic query on distributed relations.
- Determine which fragments are involved
- Localization program: substitute for each global query its materialization program, then optimize it

25. What is global query optimization?

Input: a fragment query.
- Find the "best" (not necessarily optimal) global schedule
- Minimize a cost function
- Distributed join processing: bushy vs. linear trees; which relation to ship where; ship-whole vs. ship-as-needed
- Decide on the use of semijoins: a semijoin saves on communication at the expense of more local processing
- Join methods: nested loop vs. ordered joins (merge join or hash join)

26. What are the rules of DDBMS?

1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence

27. Define transaction.

A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency. It provides concurrency transparency and failure transparency.

28. What are the properties of a transaction?

- Atomicity: all or nothing
- Consistency: no violation of integrity constraints
- Isolation: concurrent changes are invisible and serializable
- Durability: committed updates persist

29. What is concurrency control?

The problem of synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, the maximum degree of concurrency is achieved.
Anomalies:
- Lost updates: the effects of some transactions are not reflected in the database.
- Inconsistent retrievals: a transaction, if it reads the same data item more than once, should always read the same value.

30. What are the modes of distributed locking?

Four modes of management are possible:
- Centralised 2PL: read any copy, update all copies for updates; a single site, so it is a bottleneck and a single point of failure
- Primary copy 2PL: distributes the locks; one copy is designated primary, the others slaves; only the primary copy is locked for updates, and the slaves are updated later
- Distributed 2PL: each site manages its own data locks; all copies are locked for an update, so there is a high communication cost
- Majority locking: an operation proceeds once locks are obtained on a majority of the copies
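
A one-function Python sketch of the majority test such a scheme relies on (illustrative only):

def has_majority(lock_grants, num_copies):
    """True once lock grants from strictly more than half the copies arrive."""
    return lock_grants > num_copies // 2

print(has_majority(2, 3))  # True: 2 of 3 copies granted the lock
print(has_majority(2, 4))  # False: 2 of 4 is not a strict majority
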
31. What are the distributed reliability protocols?

- Commit protocols
- Termination protocols
- Recovery protocols
- Independent recovery (implies non-blocking termination)


UNIT II - OBJECT ORIENTED DATABASES


TWO MARKS
1. What are object databases?

- Became commercially popular in the mid 1990s
- You can store the data in the same format as you use it: no paradigm shift
- Did not reach full potential until the classes they store were decoupled from the database schema
- Open source implementations are available, so a low-cost solution now exists

2. What is an Object Oriented Database (OODB)?

A database system that incorporates all the important object-oriented concepts, plus some additional features:
- Unique object identifiers
- Persistent object handling
It is the coupling of Object Oriented Programming (OOP) principles with Database Management System (DBMS) principles, and it provides access to persisted objects using the same OO programming language.

3. What are the advantages of OODBS?

Designer can specify the structure of objects and their behavior (methods)
Better interaction with object-oriented languages such as Java and C++
Definition of complex and user-defined types
Encapsulation of operations and user-defined methods

4. List out the object database vendors.

- Matisse Software Inc.
- Objectivity Inc.
- Poet's FastObjects
- Computer Associates
- eXcelon Corporation
- Db4o

5. Differentiate OODB and relational DB.

OODB:
- Uses an OO model: data is a collection of objects whose behavior, state, and relationships are stored as a physical entity.
- Language dependent (OO language specific).
- No impedance mismatch in applications using an OODB.

Relational DB:
- Uses a record-oriented model: data is a collection of record types (relations), each having a collection of records or tuples stored in a file.
- Language independent (via SQL).
- Impedance mismatch in OO applications: a mapping must be performed.

6. Write about modeling and design in OODBMS.

Basically, an OODBMS is an object database that provides DBMS capabilities to objects that have been created using an object-oriented programming language (OOPL). The basic principle is to add persistence to objects. Consequently, application programmers who use OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and the language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the OOPL.

7. What is object data modeling?

An object consists of three parts: structure (attributes, and relationships to other objects such as aggregation and association), behavior (a set of operations) and characteristics of type (generalization/specialization). An object is similar to an entity in the ER model; therefore we begin with an example to demonstrate the structure and relationships.

8. Define attributes.

Attributes are like the fields in a relational model. However, in the Book example we have, for the attributes publishedBy and writtenBy, complex types Publisher and Author, which are also objects. Attributes with complex objects, in an RDBMS, are usually other tables linked by keys to the main table.

9. Define relationships.

Relationships: publishedBy and writtenBy are associations with 1:N and 1:1 relationships; composedOf is an aggregation (a Book is composed of chapters). The 1:N relationship is usually realized as attributes through complex types and at the behavioral level.

10. What are generalization and specialization?

Generalization/specialization is a relationship which is supported in an OODB through the class hierarchy. An ArtBook is a Book; therefore the ArtBook class is a subclass of the Book class. A subclass inherits all the attributes and methods of its superclass.

11. Define message.

Message: the means by which objects communicate; it is a request from one object to another to execute one of its methods. For example: Publisher_object.insert("Rose", 123, ...) is a request to execute the insert method on a Publisher object.

12. Define method.

Method: defines the behavior of an object. Methods can be used to change the object's state by modifying its attribute values, or to query the values of selected attributes. An example of a method that responds to the message above is the insert method defined in the Publisher class.
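
A minimal Python sketch of the message/method distinction, using a hypothetical Publisher class modeled on the example above:

class Publisher:
    """A hypothetical persistent class; insert is one of its methods."""
    def __init__(self):
        self.rows = []          # the object's state (attribute values)

    def insert(self, name, pub_id):
        # The method defines behavior: it changes the object's state
        # by modifying its attribute values.
        self.rows.append({"name": name, "id": pub_id})

publisher_object = Publisher()
# Sending a message: a request from one object to another to execute
# one of its methods.
publisher_object.insert("Rose", 123)
print(publisher_object.rows)  # [{'name': 'Rose', 'id': 123}]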

13. What are the drawbacks of persistent programming languages?

- Due to the power of most programming languages, it is easy to make programming errors that damage the database.
- The complexity of such languages makes automatic high-level optimization more difficult.
- They do not support declarative querying as well as relational databases do.
14. What are the characteristics of the query languages used?

- Declarative query language: not computationally complete
- Syntax based on SQL (select, from, where)
- Additional flexibility (queries with user-defined operators and types)

15. Give an example of complex data.

A water resource management example: a database of state-wide water projects that includes a library of picture slides. Indexing the slides according to predefined concepts is prohibitively expensive.

16. What types of queries can be used in water resource management?

- Geographic locations
- Reservoir levels during droughts
- Recent flood conditions, etc.

17. What is multiversion concurrency control?

Multiversion concurrency control (abbreviated MCC or MVCC), in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.
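
A toy Python sketch of the core MVCC idea (a purely illustrative versioned store, not any particular system's design): writers append new versions stamped with a logical commit timestamp, and readers see the newest version no later than their snapshot, so readers never block writers:

class MVCCStore:
    def __init__(self):
        self.versions = {}     # key -> list of (commit_ts, value)
        self.clock = 0         # logical timestamp counter

    def write(self, key, value):
        # Each write appends a new version instead of overwriting.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        # A reader sees the latest version committed at or before its snapshot.
        candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                      if ts <= snapshot_ts]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
store.write("x", 10)           # commits at ts=1
snapshot = store.clock         # a reader takes its snapshot here
store.write("x", 20)           # commits at ts=2, after the snapshot
print(store.read("x", snapshot))      # 10: the snapshot ignores later writes
print(store.read("x", store.clock))   # 20: a newer snapshot sees the update
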
18. What are the types of failures?

- System crashes: due to hardware or software errors, resulting in loss of main memory
- Media failures: due to problems with the disk head or unreadable media
- Application errors: due to logical errors in the program, which may cause transactions to fail
- Natural disasters: physical loss of media (fire, flood, earthquakes, terrorism, etc.)
- Sabotage: intentional corruption or destruction of data
- Carelessness: unintentional destruction of data by users or operators

19. What are the general failures?

- Transaction failures: transaction aborts
- System failures: failure of processor, main memory, power supply
- Media failures: failure of secondary storage

20. What is meant by concurrency control?


Concurrency control ensures that correct results for concurrent operations are generated, while getting
those results as quickly as possible. Thus concurrency control is an essential element for correctness in
any system where two database transactions or more, executed with time overlap, can access the same
data, e.g., virtually in any general-purpose database system.

21. What is meant by media recovery?


Media recovery deals with failures of the storage media holding the permanent database, in particular
disk failures. The traditional database approach for media recovery uses archive copies (dumps) of the
database as well as archive logs. Archive copies represent snapshots of the database and are
periodically taken. The archive log contains the log records for all committed changes which are not yet
reflected in the archive copy.

22. Define logging and recovery.


Logging and recovery ensure that failures are masked to the users of transaction-based data
management systems by providing automatic treatment for different kinds of failures, such as transaction
failures, system failures (crashes), media failures and disasters. The main goal is to guarantee the
atomicity (A) and durability (D) properties of ACID transactions by providing undo recovery for failed
transactions and redo recovery for committed transactions. Logging is the task of collecting redundant
data needed for recovery.

23. What is meant by transaction?


A transaction comprises a unit of work performed within a database management system (or similar
system) against a database, and treated in a coherent and reliable way independent of other transactions.
24. What are the two main purposes of transactions?

1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

25. What is Crash recovery?
Crash recovery is needed when the whole database (transaction) system fails, e.g. due to a hardware or
software error. All transactions which were active and not yet committed at crash time have failed so that
their changes must be undone. The changes for transactions that have committed before the crash must
survive. A redo recovery is needed for all changes of committed transactions that have been lost by the
crash because the changed pages resided only in main memory but were not yet written out to the
permanent database.

26. What is meant by disaster recovery?

Disaster recovery can be achieved by maintaining a backup copy of the database at a geographically
remote location. By continuously transferring log data from the primary database to the backup and
applying the changes there, the backup can be kept (almost) up-to-date.

27. What is data persistence?

Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly. The life span of data needs to be determined, directly or indirectly, by the user and must not be dependent on system features. Additionally, data once stored in a database must not be lost. Changes of a database which are done by a transaction are persistent: when a transaction has finished, even a system crash cannot put the data in danger.

28. Define transaction recovery.

Transaction recovery (rollback) is performed when a transaction fails during normal processing, e.g. due to a program error or invalid input data. The log records in the log buffer and in the log file are used to undo the changes of the failed transaction in reverse order.

UNIT III - EMERGING SYSTEMS

TWO MARKS

1. What is the client/server model?

- Networked computing model
- Processes are distributed between clients and servers
- Client: a workstation (usually a PC) that requests and uses a service
- Server: a computer (PC/mini/mainframe) that provides a service
- For a DBMS, the server is a database server

2. Give notes on database server architectures.

2-tiered approach:
- The client is responsible for I/O processing logic and some business rules logic
- The server performs all data storage and access processing: the DBMS is only on the server

3. What are the advantages of a database server?

- Clients do not have to be as powerful
- Greatly reduces data traffic on the network
- Improved data integrity, since it is all processed centrally
- Stored procedures: some business rules done on the server

4. What are the advantages of three-tier architectures?

- Scalability
- Technological flexibility
- Long-term cost reduction
- Better match of systems to business needs
- Improved customer service
- Competitive advantage
- Reduced risk

5. What are the challenges of three-tier architectures?

- High short-term costs
- Tools and training
- Experience
- Incompatible standards
- Lack of compatible end-user tools

6. What client/server security levels are used?

The network environment raises complex security issues. Security levels:
- System-level password security: for allowing access to the system
- Database-level password security: for determining access privileges to tables (read/update/insert/delete privileges)
- Secure client/server communication: via encryption

7. Define data warehousing.

- Data sources often store only current data, not historical data
- Corporate decision making requires a unified view of all organizational data, including historical data
- A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site
  - Greatly simplifies querying and permits the study of historical trends
  - Shifts the decision support query load away from transaction processing systems

8. What are the design issues in data warehousing?

- When and how to gather data
  - Source-driven architecture: data sources transmit new information to the warehouse, either continuously or periodically (e.g. at night)
  - Destination-driven architecture: the warehouse periodically requests new information from the data sources
  - Keeping the warehouse exactly synchronized with the data sources (e.g. using two-phase commit) is too expensive
- What schema to use
  - Schema integration
- Data cleansing
  - E.g. correct mistakes in addresses (misspellings, zip code errors)
  - Merge address lists from different sources and purge duplicates
- How to propagate updates
  - The warehouse schema may be a (materialized) view of the schema from the data sources
- What data to summarize
  - Raw data may be too large to store on-line
  - Aggregate values (totals/subtotals) often suffice
  - Queries on raw data can often be transformed by the query optimizer to use aggregate values

9. What are the data warehouse schemas?

- Dimension values are usually encoded using small integers and mapped to full values via dimension tables
- The resultant schema is called a star schema
- More complicated schema structures:
  - Snowflake schema: multiple levels of dimension tables
  - Constellation: multiple fact tables
10. Give an example of a data warehouse schema.

A typical example is a star schema: a central sales fact table whose foreign keys reference dimension tables such as item, store, customer and date.

11. Define data mining.

Data mining is the process of semi-automatically analyzing large databases to find useful patterns.

12. Define prediction.

Prediction is based on past history. Examples:
- Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history
- Predict if a pattern of phone calling card usage is likely to be fraudulent

13. Give some examples of prediction mechanisms.

- Classification: given a new item whose class is unknown, predict to which class it belongs
- Regression formulae: given a set of mappings for an unknown function, predict the function result for a new parameter value

14. What are the descriptive patterns in data mining?

- Associations: find books that are often bought by "similar" customers; if a new such customer buys one such book, suggest the others too. Associations may be used as a first step in detecting causation, e.g. an association between exposure to chemical X and cancer.
- Clusters: e.g. typhoid cases were clustered in an area surrounding a contaminated well. Detection of clusters remains important in detecting epidemics.

15. What are classification rules? Give an example.

Classification rules help assign new objects to classes.
- E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?
- Classification rules for the above example could use a variety of data, such as educational level, salary, age, etc.:
  - for all persons P: P.degree = masters and P.income > 75,000 => P.credit = excellent
  - for all persons P: P.degree = bachelors and (25,000 <= P.income <= 75,000) => P.credit = good
- Rules are not necessarily exact: there may be some misclassifications
- Classification rules can be shown compactly as a decision tree

16. Define decision tree.

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

17. How do you construct decision trees?

- Training set: a data sample in which the classification is already known
- Greedy top-down generation of decision trees:
  - Each internal node of the tree partitions the data into groups based on a partitioning attribute and a partitioning condition for the node
  - Leaf node: all (or most) of the items at the node belong to the same class, or all attributes have been considered and no further partitioning is possible

18. How do you find the best splits?

- Categorical attributes (with no meaningful order):
  - Multi-way split: one child for each value
  - Binary split: try all possible breakups of the values into two sets, and pick the best
- Continuous-valued attributes (can be sorted in a meaningful order):
  - Binary split: sort the values, try each as a split point (e.g. if the values are 1, 10, 15, 25, split at 1, 10, 15), and pick the value that gives the best split
  - Multi-way split: a series of binary splits on the same attribute has roughly equivalent effect
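
A small Python sketch (toy data, simple Gini impurity) of the binary-split search for a continuous attribute: sort the values, try each as a split point, and keep the one with the lowest weighted impurity:

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Try each sorted value as a split point; return (split, impurity)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        split = pairs[i - 1][0]                 # items <= split go left
        left = [c for v, c in pairs if v <= split]
        right = [c for v, c in pairs if v > split]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (split, score)
    return best

# Toy example: income values with credit classes.
incomes = [1, 10, 15, 25]
classes = ["bad", "bad", "good", "good"]
print(best_binary_split(incomes, classes))  # (10, 0.0): a perfect split
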
19. What is meant by regression?

Regression deals with the prediction of a value, rather than a class.
- Given values for a set of variables X1, X2, ..., Xn, we wish to predict the value of a variable Y.
- One way is to infer coefficients a0, a1, ..., an such that Y = a0 + a1*X1 + a2*X2 + ... + an*Xn. Finding such a linear polynomial is called linear regression.
- In general, the process of finding a curve that fits the data is also called curve fitting. The fit may only be approximate, because of noise in the data or because the relationship is not exactly a polynomial. Regression aims to find coefficients that give the best possible fit.
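
A short Python sketch (synthetic data, numpy's least-squares solver) that infers a0 and a1 for the one-variable case Y = a0 + a1*X1:

import numpy as np

# Synthetic, slightly noisy observations of y = 2 + 3*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])

# Design matrix with a column of ones for the intercept a0.
A = np.column_stack([np.ones_like(x), x])

# Least squares finds the coefficients giving the best possible fit.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a1 = coeffs
print(f"Y = {a0:.2f} + {a1:.2f} * X1")   # prints roughly Y = 2.04 + 3.00 * X1
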
20. Give an example of association rules.

- Retail shops are often interested in associations between the different items that people buy:
  - Someone who buys bread is quite likely also to buy milk
  - A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts
- Association information can be used in several ways, e.g. when a customer buys a particular book, an online shop may suggest associated books
- Association rules:
  - bread => milk
  - DB-Concepts, OS-Concepts => Networks
  - Left hand side: antecedent; right hand side: consequent
- An association rule must have an associated population; the population consists of a set of instances
  - E.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the population
- Rules have an associated support, as well as an associated confidence

21. What is meant by support and confidence?

- Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
  - E.g. suppose only 0.001 percent of all purchases include milk and screwdrivers. The support for the rule milk => screwdrivers is low.
- Confidence is a measure of how often the consequent is true when the antecedent is true.
  - E.g. the rule bread => milk has a confidence of 80 percent if 80 percent of the purchases that include bread also include milk.
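
A minimal Python sketch (toy baskets) computing support and confidence for a one-item rule antecedent => consequent:

# Toy population: each market basket (transaction) is an instance.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support_confidence(antecedent, consequent):
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    antecedent_only = sum(1 for b in baskets if antecedent in b)
    support = both / len(baskets)          # fraction satisfying both sides
    confidence = both / antecedent_only    # how often consequent holds given antecedent
    return support, confidence

print(support_confidence("bread", "milk"))  # (0.6, 0.75): 3/5 baskets, 3/4 bread-buyers
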
22. Define clustering.

- Clustering: intuitively, finding clusters of points in the given data such that similar points lie in the same cluster
- Can be formalized using distance metrics in several ways:
  - Group points into k sets (for a given k) such that the average distance of points from the centroid of their assigned group is minimized (centroid: the point defined by taking the average of the coordinates in each dimension)
  - Another metric: minimize the average distance between every pair of points in a cluster
- Has been studied extensively in statistics, but on small data sets; data mining systems aim at clustering techniques that can handle very large data sets, e.g. the Birch clustering algorithm
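
A compact Python sketch of centroid-based clustering in the spirit of the first metric above (a basic k-means loop on 1-D toy data; illustrative, and not the Birch algorithm):

def kmeans_1d(points, centroids, iters=10):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Each new centroid is the average of its assigned points.
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
print(kmeans_1d(points, centroids=[0.0, 5.0]))  # ~[1.0, 9.53]: two clusters found
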
23. What is meant by hierarchical clustering?

- Example from biological classification (the word "classification" here does not mean a prediction mechanism): the phylum chordata divides into classes such as mammalia (e.g. leopards, humans) and reptilia (e.g. snakes, crocodiles)
- Other examples: Internet directory systems (e.g. Yahoo)
- Agglomerative clustering algorithms: build small clusters, then cluster the small clusters into bigger clusters, and so on
- Divisive clustering algorithms: start with all items in a single cluster, and repeatedly refine (break) clusters into smaller ones

24. Explain the Birch algorithm.

Clustering algorithms such as Birch have been designed to handle very large datasets:
- Main idea: use an in-memory R-tree to store points that are being clustered
- Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some distance away
- If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
- At the end of the first pass we get a large number of clusters at the leaves of the R-tree; then merge clusters to reduce their number

25. What is text mining?

- Text mining: the application of data mining to textual documents, e.g.:
  - cluster Web pages to find related pages
  - cluster pages a user has visited to organize their visit history
  - classify Web pages automatically into a Web directory
- Data visualization systems help users examine large volumes of data and detect patterns visually: they can visually encode large amounts of information on a single screen, and humans are very good at detecting visual patterns

26. What are web databases?

Semantic web architecture and applications are a dramatic departure from earlier database and application generations. Semantic processing includes the earlier statistical and natural language techniques, and enhances these with semantic processing tools. First, Semantic Web architecture is the automated conversion and storage of unstructured text sources in a semantic web database. Second, Semantic Web applications automatically extract and process the concepts and context in the database in a range of highly flexible tools.

27. What are structured and unstructured data?

Semantic Web architecture and applications handle both structured and unstructured data. Structured data is stored in relational databases with static classification systems, and also in discrete documents. These databases and documents can be processed and converted to Semantic Web databases, and then processed with unstructured data. Much of the data we read, produce and share is now unstructured: emails, reports, presentations, media content, web pages. And these documents are stored in many different formats: text, email files, Microsoft word processor, spreadsheet and presentation files, Lotus Notes, Adobe .pdf, and HTML. It is difficult, expensive, slow and inaccurate to attempt to classify and store these in a structured database. All of these sources can be automatically converted to a common Semantic Web database, and integrated into one common information source.

28. Compare and contrast synthetic vs artificial intelligence.

Semantic Web technology is NOT Artificial Intelligence. AI was a mythical marketing goal to create thinking machines. The Semantic Web supports a much more limited and realistic goal: Synthetic Intelligence. The concepts and relationships stored in the Semantic Web database are synthesized, or brought together and integrated, to automatically create a new summary, analysis, report, email or alert, or to launch another machine application. The goal of Synthetic Intelligence information systems is bringing together all information sources and user knowledge, and synthesizing these in global networks.

29. What are mobile databases?

The main advantage of using a mobile database in your application is offline access to data, in other words, the ability to read and update data without a network connection. This helps avoid problems such as dropped connections, low bandwidth, and high latency that are typical on wireless networks today.

A mobile database forms a fully connected information space:
- Each node of the information space has some communication capability
- Some nodes can process information
- Some nodes can communicate through a voice channel
- Some nodes can do both
It can be created and maintained by integrating legacy database systems, and wired and wireless systems (PCS, cellular systems, and GSM).

30. What is a Mobile Database System (MDS)?

A system with the following structural and functional properties:
- Distributed system with mobile connectivity
- Full database system capability
- Complete spatial mobility
- Built on a PCS/GSM platform
- Wireless and wired communication capability

UNIT IV - DATABASE DESIGN ISSUES

TWO MARKS

1. What are attributes? Give example.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.

Example: possible attributes of the customer entity are customer name, customer id, customer street, and customer city.

2. Define single valued and multi valued attributes.

- Single valued attributes: attributes with a single value for a particular entity are called single valued attributes.
- Multi valued attributes: attributes with a set of values for a particular entity are called multi valued attributes.

3. What are stored and derived attributes?

- Stored attributes: the attributes stored in a database are called stored attributes.
- Derived attributes: the attributes that are derived from the stored attributes are called derived attributes.

4. Define weak and strong entity sets.

- Weak entity set: entity sets that do not have a key attribute of their own are called weak entity sets.
- Strong entity set: an entity set that has a primary key is termed a strong entity set.

5. What is the use of integrity constraints?


Integrity constraints ensure that changes made to the database by authorized users do not result in a loss
of data consistency. Thus integrity constraints guard against accidental damage to the database.

6. Mention the various levels in security measures.


Database system, Operating system, Network, Physical, Human

7. What are the types of attributes?


Simple: Each entity has a single atomic value for the attribute.
Composite: The attribute may be composed of several components.
Multi-valued: An entity may have multiple values for that attribute.

8. Write the entity set corresponding to the entity type car.

Registration (RegistrationNumber, State), VehicleID, Make, Model, Year, (Color)
Example: ((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, (red, black))

9. What are weak entity types?

An entity type that does not have a key attribute. A weak entity type must participate in an identifying relationship type with an owner or identifying entity type.

10. Define normalization.

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

11. What is denormalization?

The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

12. What is the need for normalization?

- Mixing attributes of multiple entities may cause problems
- Information is stored redundantly, wasting storage
- Problems with update anomalies

13. What is integrity? What are the types of integrity?

Integrity here refers to the CORRECTNESS and CONSISTENCY of the data stored in the database.
Types: database integrity, entity integrity, referential integrity.

14. What is meant by consistency?
It ensures the truthfulness of the database. The consistency property ensures that any transaction the
database performs will take it from one consistent state to another. The consistency property does not say
how the DBMS should handle an inconsistency other than ensure the database is clean at the end of the
transaction.

15. What is an entity relationship model?


The entity relationship model is a collection of basic objects called entities and relationship among those
objects. An entity is a thing or object in the real world that is distinguishable from other objects.

16. Define query optimization.


Query optimization refers to the process of finding the lowest cost method of evaluating a given query.

17. What are the tunable parameters?

- Tuning of hardware
- Tuning of schema
- Tuning of indices
- Tuning of materialized views
- Tuning of transactions

18. What is hardware tuning?

- Even well-tuned transactions typically require a few I/O operations; a typical disk supports about 100 random I/O operations per second
- Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
- The number of I/O operations per transaction can be reduced by keeping more data in memory: if all data is in memory, I/O is needed only for writes; keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost
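
The n/50 figure follows directly from the numbers above; a one-function Python sketch of the same sizing arithmetic:

import math

def disks_needed(tps, ios_per_txn=2, ios_per_disk=100):
    """Disks required to sustain tps transactions/second (ignoring skew)."""
    total_ios = tps * ios_per_txn          # random I/Os demanded per second
    return math.ceil(total_ios / ios_per_disk)

print(disks_needed(1000))  # 20 disks, i.e. n/50 with n = 1000
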
20. What is meant by schema tuning?

- Vertically partition relations to isolate the data that is accessed most often, so that only the needed information is fetched. E.g., split account into two relations, (account-number, branch-name) and (account-number, balance).
- Improve performance by storing a denormalized relation. E.g., store the join of account and depositor: the branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
- Cluster together on the same disk page records that would match in a frequently required join, to compute the join very efficiently when required.

21. What is index tuning?

- Create appropriate indices to speed up slow queries/updates
- Speed up slow updates by removing excess indices (a tradeoff between queries and updates)
- Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries
- Choose which index to make clustered
- Index tuning wizards look at the past history of queries and updates (the workload) and recommend which indices would be best for the workload

22. What is the use of performance simulation?

Performance simulation using a queuing model is useful to predict bottlenecks, as well as the effects of tuning changes, even without access to the real system.
23. What are temporal databases?

Non-temporal databases:
- store only a single state of the real world, usually the most recent state
- are classified as snapshot databases
- application developers and database designers need to code for time-varying data requirements, e.g. history tables, forecast reports, etc.

Temporal databases:
- store up to two dimensions of time, i.e. VALID (stated) time and TRANSACTION (logged) time
- are classified as historical, rollback or bi-temporal
- there is no need for application developers or database designers to code for time-varying data requirements, i.e. time is inherently supported

24. Give examples of application domains dealing with time-varying data.

- Financial applications (e.g. history of stock market data)
- Insurance applications (e.g. when were the policies in effect)
- Reservation systems (e.g. when is which room in a hotel booked)
- Medical information management systems (e.g. patient records)
- Decision support systems (e.g. planning future contingencies)
- CRM applications (e.g. customer history / future)
- HR applications (e.g. date-tracked positions in hierarchies)

25. What is a SDBMS?

A SDBMS is a software module that:
- can work with an underlying DBMS
- supports spatial data models, spatial abstract data types (ADTs) and a query language from which these ADTs are callable
- supports spatial indexing, efficient algorithms for processing spatial operations, and domain-specific rules for query optimization
Examples: Oracle Spatial data cartridge, ESRI SDE

26. Give examples of spatial databases.

Consider a spatial dataset with:
- County boundary (dashed white line)
- Census block: name, area, population, boundary (dark line)
- Water bodies (dark polygons)
- Satellite imagery (gray-scale pixels)

Storage in a SDBMS table:

create table census_blocks (
    name       string,
    area       float,
    population number,
    boundary   polygon );

27. How is a SDBMS different from a GIS?

A GIS is software to visualize and analyze spatial data using spatial analysis functions such as:
- Search: thematic search, search by region, (re-)classification
- Location analysis: buffer, corridor, overlay
- Terrain analysis: slope/aspect, catchment, drainage network
- Flow analysis: connectivity, shortest path
- Distribution: change detection, proximity, nearest neighbor
- Spatial analysis/statistics: pattern, centrality, autocorrelation, indices of similarity, topology (hole description)
- Measurements: distance, perimeter, shape, adjacency, direction

A GIS uses a SDBMS to store, search, query and share large spatial data sets.

28. What are the components of a SDBMS?

Recall: a SDBMS is a software module that can work with an underlying DBMS; supports spatial data models, spatial ADTs and a query language from which these ADTs are callable; and supports spatial indexing, algorithms for processing spatial operations, and domain-specific rules for query optimization.
Its components include spatial data models, a query language, query processing, file organization and indices, query optimization, etc.

UNIT V - CURRENT ISSUES

TWO MARKS

1. Write any four rules defined in relational database design.

- All database management must take place using the relational database's innate functionality
- All information in the database must be stored as values in a table
- All database information must be accessible through the combination of a table name, primary key and column name
- The database must use NULL values to indicate missing or unknown information
- The database schema must be described using the relational database syntax
- The database may support multiple languages, but it must support at least one language that provides full database functionality (e.g. SQL)
- The system must be able to update all updatable views
- The database must provide single-operation insert, update and delete functionality
- Changes to the physical structure of the database must be transparent to applications and users
- Changes to the logical structure of the database must be transparent to applications and users
- The database must natively support integrity constraints
- Changes to the distribution of the database (centralized vs. distributed) must be transparent to applications and users
- Any languages supported by the database must not be able to subvert integrity controls

2. What are knowledge bases?

A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized.

3. What are the characteristics of knowledge bases?

Characteristics of KBSs:
- intelligent information processing systems
- representation of the domain of interest: symbolic representation
- problem solving by symbol manipulation: symbolic programs

4. What are the main components in Knowledge bases?
<!--[if !supportLists]--> <!--[endif]-->knowledge-base (KB)
<!--[if !supportLists]--> <!--[endif]-->inference engine
<!--[if !supportLists]--> <!--[endif]-->case-specific database
<!--[if !supportLists]--> <!--[endif]-->explanation subsystem
<!--[if !supportLists]--> <!--[endif]-->knowledge acquisition subsystem
<!--[if !supportLists]--> <!--[endif]-->user interface ( user)

<!--[if !supportLists]--> <!--[endif]-->specially intefaces


<!--[if !supportLists]--> <!--[endif]-->developer interface ( knowledge engineer, human expert)
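As a minimal sketch (all facts and rules are made up for illustration), the code below shows the two core components named above: a knowledge base of if-then rules and a forward-chaining inference engine that derives new facts from the case-specific database:

facts = {"has_fur", "gives_milk"}            # case-specific database
rules = [                                    # knowledge base: (premises, conclusion)
    ({"has_fur", "gives_milk"}, "mammal"),
    ({"mammal", "eats_meat"}, "carnivore"),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises hold until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))   # {'has_fur', 'gives_milk', 'mammal'}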
5. What are called active databases?
An active database is a database that includes an event-driven architecture (often in the form of ECA rules) which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Most modern relational databases include active database features in the form of SQL triggers, as sketched below.
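As a minimal sketch of an ECA-style trigger (using Python's built-in sqlite3; the table and trigger names are made up for illustration), the trigger below fires on an event (INSERT), checks a condition, and runs an action without any application code involved:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, note TEXT);

    -- Event: insert on account; Condition: negative balance; Action: log it.
    CREATE TRIGGER flag_negative AFTER INSERT ON account
    WHEN NEW.balance < 0
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'negative opening balance');
    END;
""")
conn.execute("INSERT INTO account VALUES (1, -500)")
print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 'negative opening balance')]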

6. Differentiate database and information system.
A database is an organized, persistent collection of related data managed by a DBMS. An information system is the broader combination of hardware, software, data, people and procedures that collects, processes, stores and disseminates information to support an organization's operations and decision making; a database is typically one component of an information system.
7. What are deductive databases?

A deductive database is a database system that can make deductions (i.e., conclude additional facts) based on rules and facts stored in the (deductive) database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases, to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems. Deductive databases have not found widespread adoption outside academia, but some of their concepts are used in today's relational databases to support the advanced features of more recent SQL standards. A small Datalog-style example follows.
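As a minimal sketch (the parent facts are made up for illustration), the code below captures the Datalog idea behind deductive databases - stored facts plus rules such as

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

evaluated bottom-up until no new facts can be derived (a fixpoint):

parent = {("amy", "bob"), ("bob", "carl"), ("carl", "dana")}

def ancestors(parent):
    anc = set(parent)                      # base rule: every parent is an ancestor
    while True:
        new = {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2} - anc
        if not new:                        # fixpoint reached: no new facts derivable
            return anc
        anc |= new

print(sorted(ancestors(parent)))  # includes ('amy', 'dana'), deduced transitively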
8. What are parallel databases?

- Parallel machines are becoming quite common and affordable.
o Prices of microprocessors, memory and disks have dropped sharply.
o Recent desktop computers feature multiple processors, and this trend is projected to accelerate.
- Databases are growing increasingly large.
o Large volumes of transaction data are collected and stored for later analysis.
o Multimedia objects like images are increasingly stored in databases.
- Large-scale parallel database systems are increasingly used for:
o storing large volumes of data
o processing time-consuming decision-support queries
o providing high throughput for transaction processing
9. What are the types of skew?

o Attribute-value skew
- Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
- Can occur with range-partitioning and hash-partitioning.
o Partition skew
- With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
- Less likely with hash-partitioning if a good hash function is chosen.
A sketch illustrating attribute-value skew follows.
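As a minimal sketch (the city values are synthetic), the code below shows why attribute-value skew defeats even a good hash function: every tuple with the hot value hashes to the same partition by definition:

from collections import Counter

tuples = ["chennai"] * 90 + ["delhi", "mumbai", "pune", "kolkata"] * 2
n_partitions = 4

counts = Counter(hash(v) % n_partitions for v in tuples)
print(dict(counts))   # one partition receives ~90 of the 98 tuples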

10. How do you handle skew using histograms?

- A balanced partitioning vector can be constructed from a histogram in a relatively straightforward fashion, assuming a uniform distribution within each range of the histogram.
- The histogram can be constructed by scanning the relation, or by sampling (blocks containing) tuples of the relation.
A sketch of building a partition vector from sampled values follows.
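As a minimal sketch (synthetic data; this uses a sorted sample as an equi-depth histogram, a simplification of what a real optimizer stores), the code below derives a balanced range-partition vector whose cut points give each partition roughly equal tuple counts despite the skew:

def partition_vector(sample, n_partitions):
    """Return n_partitions - 1 cut points from a sorted sample (equi-depth)."""
    sample = sorted(sample)
    step = len(sample) / n_partitions
    return [sample[int(i * step)] for i in range(1, n_partitions)]

values = [1] * 50 + list(range(2, 52))        # skewed: value 1 is very frequent
print(partition_vector(values, 4))            # cut points adapt to the skew: [1, 2, 27]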

11. How do you handle skew using virtual processor partitioning?

- Skew in range partitioning can be handled elegantly using virtual processor partitioning: create a large number of partitions (say 10 to 20 times the number of processors), then assign virtual processors to partitions either in round-robin fashion or based on the estimated cost of processing each virtual partition.
- Basic idea: if any normal partition would have been skewed, it is very likely the skew is spread over a number of virtual partitions. Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly. A round-robin sketch follows.
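As a minimal sketch (the partition sizes are made up), the code below shows round-robin assignment of virtual partitions to real processors: a hot range that would have overloaded one processor is split across several virtual partitions, which then land on different processors:

n_processors = 4
virtual_sizes = [5, 5, 90, 85, 80, 5, 5, 5]   # virtual partitions 2-4 hold a hot range

load = [0] * n_processors
for i, size in enumerate(virtual_sizes):
    load[i % n_processors] += size            # round-robin assignment

print(load)   # the three hot virtual partitions end up on three different processors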
12. What is interquery parallelism?

- Queries/transactions execute in parallel with one another.
- Increases transaction throughput; used primarily to scale up a transaction-processing system to support a larger number of transactions per second.
- Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
- More complicated to implement on shared-disk or shared-nothing architectures:
o Locking and logging must be coordinated by passing messages between processors.
o Data in a local buffer may have been updated at another processor; cache coherency has to be maintained - reads and writes of data in the buffer must find the latest version of the data.
A sketch of interquery parallelism follows.
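As a minimal sketch (using Python's built-in sqlite3; the sales table is made up for illustration), the code below runs several independent read-only queries concurrently, each on its own connection - raising throughput without parallelizing any single query:

import sqlite3, tempfile, os
from concurrent.futures import ThreadPoolExecutor

path = os.path.join(tempfile.mkdtemp(), "shop.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 10), ("south", 20), ("north", 5)])

def run_query(sql):
    with sqlite3.connect(path) as c:          # one connection per concurrent query
        return c.execute(sql).fetchall()

queries = ["SELECT SUM(amount) FROM sales",
           "SELECT COUNT(*) FROM sales WHERE region = 'north'"]
with ThreadPoolExecutor() as pool:
    print(list(pool.map(run_query, queries)))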
13. Define cache coherency protocol with example.

- Example of a cache coherency protocol for shared-disk systems:
o Before reading/writing a page, the page must be locked in shared/exclusive mode.
o On locking a page, the page must be read from disk.
o Before unlocking a page, the page must be written to disk if it was modified.
- More complex protocols with fewer disk reads/writes exist.
- Cache coherency protocols for shared-nothing systems are similar: each database page is assigned a home processor, and requests to fetch the page or write it to disk are sent to the home processor.
A sketch of the shared-disk protocol follows.
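As a minimal sketch (the Page class is illustrative, not a real DBMS buffer-manager API; lock modes are simplified to a single flag), the code below follows the three shared-disk rules above: read from disk on lock, write only while locked, flush to disk on unlock if dirty:

class Page:
    def __init__(self, pid, disk):
        self.pid, self.disk = pid, disk
        self.data, self.dirty, self.locked = None, False, False

    def lock(self):
        self.locked = True
        self.data = self.disk[self.pid]       # rule: read from disk on locking
        self.dirty = False

    def write(self, value):
        assert self.locked                    # rule: must hold the lock to write
        self.data, self.dirty = value, True

    def unlock(self):
        if self.dirty:
            self.disk[self.pid] = self.data   # rule: flush to disk if modified
        self.locked = False

disk = {0: "old"}
p = Page(0, disk)
p.lock(); p.write("new"); p.unlock()
print(disk[0])   # "new" - the latest version is visible to other processors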

14. What is meant by intraquery parallelism?

- Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
- Two complementary forms of intraquery parallelism:
o Intraoperation parallelism - parallelize the execution of each individual operation in the query.
o Interoperation parallelism - execute the different operations in a query expression in parallel.
- The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically larger than the number of operations in a query.
A sketch of intraoperation parallelism follows.
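As a minimal sketch (synthetic data), the code below shows intraoperation parallelism: a single aggregation is split across workers, each summing one partition of the relation, and the partial results are combined:

from multiprocessing import Pool

def partial_sum(partition):
    return sum(partition)                     # each worker handles one partition

if __name__ == "__main__":
    relation = list(range(1_000_000))
    n_workers = 4
    partitions = [relation[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        print(sum(pool.map(partial_sum, partitions)))   # same result, computed in parallel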

15. Write some design issues of parallel systems?

- Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.
- Resilience to failure of some processors or disks.
- On-line reorganization of data and schema changes must be supported.
- Also need support for on-line repartitioning and schema changes (executed concurrently with other processing).

16. What are the multimedia data types?

- Text
- Image
- Video
- Audio
- Mixed multimedia data

17. What is an image database system?

An image database is a searchable electronic catalog or database which allows you to organize and list images by topics, modules, or categories. The image database provides the student with important information such as image title, description, and a thumbnail picture. Additional information can be provided, such as the creator of the image, the filename, and keywords that help students search the database for specific images. Before you and your students can use an image database, you must add it to your course.

18. Give a note on image retrieval systems.


An image retrieval system is a computer system for browsing, searching and retrieving images from a
large database of digital images. Most traditional and common methods of image retrieval utilize some
method of adding metadata such as captioning, keywords, or descriptions to the images so that retrieval
can be performed over the annotation words. Manual image annotation is time-consuming, laborious and
expensive; to address this, there has been a large amount of research done on automatic image
annotation. Additionally, the increase in social web applications and the semantic web have inspired the
development of several web-based image annotation tools.

19. Give a note on image search and methods.

Image search is a specialized data search used to find images. To search for images, a user may provide query terms such as a keyword or an image file/link, or click on some image, and the system will return images "similar" to the query. The similarity used for the search criteria could be meta tags, color distribution in images, region/shape attributes, etc.
- Image meta search - search of images based on associated metadata such as keywords, text, etc.
- Content-based image retrieval (CBIR) - the application of computer vision to image retrieval. CBIR aims at avoiding the use of textual descriptions and instead retrieves images based on similarities in their contents (textures, colors, shapes, etc.) to a user-supplied query image or user-specified image features. A small CBIR sketch follows.
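As a minimal sketch (tiny color lists stand in for real images; histogram intersection is just one of several similarity measures used in practice), the code below shows the CBIR idea of ranking images by content similarity rather than by annotation:

from collections import Counter

def histogram(pixels):
    total = len(pixels)
    counts = Counter(pixels)
    return {color: counts[color] / total for color in counts}

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(h1.get(c, 0), h2.get(c, 0)) for c in set(h1) | set(h2))

query  = histogram(["red", "red", "blue", "green"])
sunset = histogram(["red", "red", "red", "blue"])
forest = histogram(["green", "green", "green", "blue"])
print(similarity(query, sunset), similarity(query, forest))  # sunset ranks higher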
20. How is image data classified?

- Archives - usually contain large volumes of structured or semi-structured homogeneous data pertaining to specific topics.
- Domain-specific collection - a homogeneous collection providing access to controlled users with very specific objectives. Examples of such collections are biomedical and satellite image databases.
- Enterprise collection - a heterogeneous collection of images that is accessible to users within an organization's intranet. Pictures may be stored in many different locations.
- Personal collection - usually consists of a largely homogeneous collection, is generally small in size, accessible primarily to its owner, and usually stored on local storage media.
- Web - World Wide Web images are accessible to everyone with an Internet connection. These image collections are semi-structured, non-homogeneous and massive in volume, and are usually stored in large disk arrays.
21. What are text databases?

Large collections of documents from various sources: news articles, research papers, books, digital libraries, e-mail messages, web pages, library databases, etc.
