Distribution
- Whether the components of the system are located on the same machine or not

Heterogeneity
- Occurs at various levels (hardware, communications, operating system)
- The DBMS is an important one: data model, query language, transaction management algorithms

Autonomy
- Not well understood and the most troublesome
- Comes in various versions:
  - Design autonomy: ability of a component DBMS to decide on issues related to its own design.
  - Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
  - Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.
Relation as the unit of distribution:
- views are subsets of relations, so locality is lost
- extra communication

Fragments of relations (sub-relations) as the unit:
- permit concurrent execution of a number of transactions that access different portions of a relation
- views that cannot be defined on a single fragment will require extra processing
- semantic data control (especially integrity enforcement) is more difficult
11. What are the types of Fragmentation?

Horizontal Fragmentation (HF)
- splitting the database by rows
- e.g. A-J in site 1, K-S in site 2 and T-Z in site 3
- Primary Horizontal Fragmentation (PHF)
- Derived Horizontal Fragmentation (DHF)

Vertical Fragmentation (VF)
- splitting the database by columns/fields
- e.g. columns/fields 1-3 in site A, 4-6 in site B
- the primary key is replicated to all sites

Hybrid Fragmentation
- horizontal and vertical fragmentation can even be combined
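The three fragmentation types above can be sketched on an in-memory relation. This is a minimal illustration, not a DBMS feature: the sample EMP relation, the attribute names, and the fragment choices are all invented for the example.

```python
# Sketch of horizontal and vertical fragmentation on an in-memory
# relation represented as a list of dicts. The EMP relation and its
# attributes are invented sample data.

EMP = [
    {"eno": 1, "ename": "Adams",  "title": "Engineer", "city": "Boston"},
    {"eno": 2, "ename": "Klein",  "title": "Analyst",  "city": "Paris"},
    {"eno": 3, "ename": "Torres", "title": "Engineer", "city": "Paris"},
]

def horizontal_fragment(relation, predicate):
    """Horizontal fragmentation: select whole tuples by a predicate (split by rows)."""
    return [t for t in relation if predicate(t)]

def vertical_fragment(relation, attrs, key="eno"):
    """Vertical fragmentation: project chosen attributes, replicating the key to every fragment."""
    keep = [key] + [a for a in attrs if a != key]
    return [{a: t[a] for a in keep} for t in relation]

# Rows assigned to a site (horizontal) vs. columns assigned to a site (vertical):
emp_paris = horizontal_fragment(EMP, lambda t: t["city"] == "Paris")
emp_names = vertical_fragment(EMP, ["ename"])
```

A hybrid fragmentation would simply apply one function to the output of the other.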
12. Define Vertical Fragmentation. And what is the advantage of VF?

Vertical fragmentation assigns attributes to fragments. There are two approaches:
- Grouping: attaches attributes to fragments; produces overlapping fragments.
- Splitting: partitions the relation into fragments; produces non-overlapping fragments. (We do not consider the replicated key attributes to be overlapping.)

Advantage: easier to enforce functional dependencies (for integrity checking etc.)
13. What are the Information Requirements in VF?

Application information:
- Attribute affinities: a measure that indicates how closely related the attributes are. This is obtained from more primitive usage data.
- Attribute usage values: given a set of queries Q = {q1, q2, ..., qq} that will run on the relation R[A1, A2, ..., An],
  use(qi, Aj) = 1 if attribute Aj is referenced by query qi, and 0 otherwise.
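The usage values and affinities above can be sketched directly. A minimal illustration, assuming invented queries and access frequencies; the affinity of two attributes is taken here as the summed frequency of the queries that reference both.

```python
# Sketch of the attribute usage matrix use(qi, Aj) and a simple
# attribute affinity measure. The queries and frequencies are
# made-up illustration data.

attributes = ["A1", "A2", "A3", "A4"]
queries = {                       # query -> attributes it references
    "q1": {"A1", "A2"},
    "q2": {"A2", "A3"},
    "q3": {"A1", "A2", "A4"},
}
freq = {"q1": 10, "q2": 5, "q3": 25}   # hypothetical access frequencies

def use(q, a):
    """use(qi, Aj) = 1 if query qi references attribute Aj, 0 otherwise."""
    return 1 if a in queries[q] else 0

def affinity(ai, aj):
    """Summed frequency of the queries that reference both ai and aj."""
    return sum(f for q, f in freq.items() if use(q, ai) and use(q, aj))

# A1 and A2 appear together in q1 and q3, so aff(A1, A2) = 10 + 25 = 35
```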
Exhaustive search
- cost-based
- optimal
- combinatorial complexity in the number of relations

Heuristics
- not optimal
- regroup common sub-expressions
- perform selection and projection first
- replace a join by a series of semijoins
- reorder operations to reduce intermediate relation size
- optimize individual operations
18. What is Optimization Timing?
Static
- optimize at compilation time, prior to execution
- difficult to estimate the size of the intermediate results, so errors propagate
- can amortize the cost over many executions
- example: R*

Dynamic
- run-time optimization
- exact information on the intermediate relation sizes
- have to reoptimize for multiple executions
- example: Distributed INGRES

Hybrid
- compile using a static algorithm
- if the error in estimated sizes > threshold, reoptimize at run time
- example: MERMAID
Relation
- cardinality
- size of a tuple
- fraction of tuples participating in a join with another relation

Attribute
- cardinality of domain
- actual number of distinct values

Common assumptions
- independence between different attribute values
- uniform distribution of attribute values within their domain
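The two assumptions above lead to the textbook cardinality estimates, which can be sketched as follows. The cardinalities and distinct-value counts are invented example numbers.

```python
# Sketch of cardinality estimation under the uniformity and
# independence assumptions listed above. All numbers are invented.

def eq_selectivity(distinct_values):
    """P(A = c) under uniformity: 1 / number of distinct values of A."""
    return 1.0 / distinct_values

def estimate_selection(card, *selectivities):
    """Estimated size of a selection: multiply selectivities,
    assuming the predicates are independent."""
    est = card
    for s in selectivities:
        est *= s
    return est

def estimate_join(card_r, card_s, distinct_r, distinct_s):
    """Classic equi-join estimate: |R| * |S| / max(V(A,R), V(A,S))."""
    return card_r * card_s / max(distinct_r, distinct_s)

# 10,000 tuples; attribute A has 50 distinct values, B has 20:
sel = estimate_selection(10_000, eq_selectivity(50), eq_selectivity(20))
# |R| = 1,000, |S| = 5,000, join attribute has 100 vs. 200 distinct values:
join = estimate_join(1_000, 5_000, 100, 200)
```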
21. What decision sites are used in Query Optimization?
Centralized
- a single site determines the best schedule
- simple
- needs knowledge about the entire distributed database

Distributed
- cooperation among sites to determine the schedule
- needs only local information
- cost of cooperation

Hybrid
- one site determines the global schedule
- each site optimizes the local subqueries
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Concurrency transparency
Failure transparency
- Commit protocols
- Termination protocols
- Recovery protocols
- Independent recovery protocols imply non-blocking termination
- Designer can specify the structure of objects and their behavior (methods)
- Better interaction with object-oriented languages such as Java and C++
- Definition of complex and user-defined types
- Encapsulation of operations and user-defined methods
OODB vs. Relational DB:
- An OODB uses an OO model: data is a collection of objects whose behavior, state, and relationships are stored as a physical entity.
- There is no impedance mismatch in an application using an OODB; in an OO application over a relational DB there is an impedance mismatch, and a mapping must be performed.
Basically, an OODBMS is an object database that provides DBMS capabilities to objects that have been created using an object-oriented programming language (OOPL). The basic principle is to make objects persistent. Consequently, application programmers who use OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and the language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the OOPL.
8. Define Attributes.
Attributes are like the fields in a relational model. However, in the Book example we have, for the attributes publishedBy and writtenBy, complex types Publisher and Author, which are also objects. Attributes with complex objects are, in an RDBMS, usually other tables linked by keys to the main table.
9. Define Relationships.
Relationships: publishedBy and writtenBy are associations with 1:N and 1:1 relationships; composedOf is an aggregation (a Book is composed of chapters). The 1:N relationship is usually realized as attributes through complex types and at the behavioral level.

Message: the means by which objects communicate; it is a request from one object to another to execute one of its methods. For example:
Publisher_object.insert("Rose", 123, ...), i.e. a request to execute the insert method on a Publisher object.
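The message-passing idea above can be mirrored as ordinary method invocation. A minimal sketch; the Publisher class, its insert method, and the sample data are invented to mirror the Publisher_object.insert(...) example, not taken from any real OODBMS API.

```python
# Minimal sketch of "message passing" between objects: sending a
# message = requesting the receiving object to execute one of its
# methods. The Publisher class and its data are hypothetical.

class Publisher:
    def __init__(self, name):
        self.name = name
        self.books = []          # state held inside the object

    def insert(self, title, book_id):
        """Method executed in response to an 'insert' message."""
        self.books.append((book_id, title))
        return len(self.books)

publisher_object = Publisher("Rose Press")
# Sending the message from the example = invoking the method:
count = publisher_object.insert("Rose", 123)
```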
1. Geographic locations
2. Reservoir levels during droughts
3. Recent flood conditions, etc.
1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes may be erroneous.
25. What is Crash recovery?
Crash recovery is needed when the whole database (transaction) system fails, e.g. due to a hardware or
software error. All transactions which were active and not yet committed at crash time have failed so that
their changes must be undone. The changes for transactions that have committed before the crash must
survive. A redo recovery is needed for all changes of committed transactions that have been lost by the
crash because the changed pages resided only in main memory but were not yet written out to the
permanent database.
Disaster recovery can be achieved by maintaining a backup copy of the database at a geographically
remote location. By continuously transferring log data from the primary database to the backup and
applying the changes there, the backup can be kept (almost) up-to-date.
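The undo/redo behaviour described for crash recovery can be sketched with a toy log. This is a simplified illustration only: the log format, transaction names, and page values are invented, and real systems use write-ahead logging with LSNs and checkpoints.

```python
# Toy sketch of log-based crash recovery: redo the changes of
# committed transactions, undo the changes of uncommitted ones.
# The log records (txn, op, page, old_value, new_value) are invented.

log = [
    ("T1", "write", "P1", 0, 10),
    ("T2", "write", "P2", 0, 20),
    ("T1", "commit", None, None, None),
    ("T2", "write", "P3", 0, 30),
    # crash here: T2 never committed
]

def recover(log):
    committed = {t for (t, op, *_rest) in log if op == "commit"}
    db = {}
    # Redo pass (forward): reapply changes of committed transactions.
    for t, op, page, old, new in log:
        if op == "write" and t in committed:
            db[page] = new
    # Undo pass (backward): roll back changes of uncommitted transactions.
    for t, op, page, old, new in reversed(log):
        if op == "write" and t not in committed:
            db[page] = old
    return db

state = recover(log)   # T1's write survives; T2's writes are undone
```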
UNIT III
EMERGING SYSTEMS
TWO MARKS
The DBMS runs only on the server.
3. What are the advantages of a database server?
- Clients do not have to be as powerful
- Greatly reduces data traffic on the network
- Improved data integrity since it is all processed centrally
- Stored procedures: some business rules are done on the server
4. What are the advantages of Three-Tier Architectures?
- Scalability
- Technological flexibility
- Long-term cost reduction
- Better match of systems to business needs
- Improved customer service
- Competitive advantage
- Reduced risk
- read/update/insert/delete privileges
- Secure client/server communication via encryption
7. Define data warehousing.
- Data sources often store only current data, not historical data
- Corporate decision making requires a unified view of all organizational data
- A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site
  - Greatly simplifies querying, permits study of historical trends
  - Shifts the decision support query load away from transaction processing systems
8. What are the Design Issues in data warehousing?
- When and how to gather data
  - Source-driven architecture: data sources transmit new information to the warehouse, either continuously or periodically (e.g. at night)
  - Destination-driven architecture: the warehouse periodically requests new information from data sources
  - Keeping the warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too expensive
- What schema to use
  - Schema integration
- Data cleansing
  - e.g. correct mistakes in addresses (misspellings, zip code errors)
  - merge address lists from different sources and purge duplicates
- How to propagate updates
  - The warehouse schema may be a (materialized) view of the schemas from the data sources
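The data-cleansing step above (correcting and merging address lists, purging duplicates) can be sketched as follows. The normalization rules and sample addresses are invented for illustration; real cleansing uses much richer matching.

```python
# Sketch of data cleansing: normalize addresses from two source
# lists and purge duplicates across them. The normalization rules
# and the sample records are invented.

def normalize(addr):
    """Crude canonical form: lowercase, collapse whitespace,
    expand one common abbreviation."""
    a = " ".join(addr.lower().split())
    return a.replace(" st.", " street").replace(" st ", " street ")

source_a = ["12 Main St. Springfield", "9 Oak Street Salem"]
source_b = ["12 main st. springfield", "7 Elm Street Dover"]

def merge_and_purge(*lists):
    seen, merged = set(), []
    for lst in lists:
        for addr in lst:
            key = normalize(addr)
            if key not in seen:      # purge duplicates across sources
                seen.add(key)
                merged.append(addr)
    return merged

clean = merge_and_purge(source_a, source_b)
```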
- e.g. predict whether a credit-card applicant is a good credit risk, based on some attributes (income, job type, age, ...) and past history
- Predict if a pattern of phone calling card usage is likely to be fraudulent
13. Give some examples of prediction mechanisms.
- Classification
  - Given a new item whose class is unknown, predict to which class it belongs
- Regression formulae
  - Given a set of mappings for an unknown function, predict the function result for a new parameter value
- Classification rules for the above example could use a variety of ... order)
- Binary split:
  - Sort values, try each as a split point
  - e.g. if the values are 1, 10, 15, 25, split at 1, 10, 15
  - Pick the value that gives the best split
- Multi-way split:
  - A series of binary splits on the same attribute has a roughly equivalent effect
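The binary-split procedure above (sort values, try each candidate, keep the best split) can be sketched with weighted Gini impurity as the quality measure. The impurity criterion and sample labels are assumptions for illustration; the candidate values 1, 10, 15, 25 match the example.

```python
# Sketch of binary split-point selection for decision-tree
# construction: try "value <= v" for each candidate v, keep the one
# with the lowest weighted Gini impurity. Sample labels are invented.

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    pairs = sorted(zip(values, labels))
    best_v, best_cost = None, float("inf")
    for v, _ in pairs[:-1]:                 # the last value cannot split
        left = [c for x, c in pairs if x <= v]
        right = [c for x, c in pairs if x > v]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v, best_cost

# Values 1, 10, 15, 25: the candidate split points are 1, 10 and 15.
split, cost = best_binary_split([1, 10, 15, 25], ["a", "a", "b", "b"])
```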
19. Regression aims to find coefficients that give the best possible fit.

20. Give an example of association rules.
- Retail shops are often interested in associations between different items that people buy (e.g. someone who buys bread is quite likely also to buy milk)
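An association rule such as bread => milk is judged by its support and confidence, which can be sketched directly. The basket data is invented for illustration.

```python
# Sketch of support and confidence for an association rule like
# bread => milk over retail baskets. The transactions are invented.

baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item of the itemset."""
    hits = sum(1 for b in baskets if itemset <= b)
    return hits / len(baskets)

def confidence(lhs, rhs, baskets):
    """P(rhs in basket | lhs in basket) = support(lhs U rhs) / support(lhs)."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)

s = support({"bread", "milk"}, baskets)       # 2 of the 4 baskets
c = confidence({"bread"}, {"milk"}, baskets)  # 2 of the 3 bread baskets
```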
- Example of a classification hierarchy:
  - chordata
    - mammalia (e.g. leopards, humans)
    - reptilia (e.g. snakes, crocodiles)
- Other examples: Internet directory systems (e.g. Yahoo)
- Agglomerative clustering algorithms
  - Build small clusters, then cluster small clusters into bigger clusters, and so on
- Divisive clustering algorithms
  - Start with all items in a single cluster, repeatedly refine (break) clusters into smaller ones
24. Explain the BIRCH Algorithm.
- Clustering algorithms have been designed to handle very large datasets, e.g. the BIRCH algorithm
- Main idea: use an in-memory R-tree to store the points that are being clustered
- Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some distance away
- If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
- At the end of the first pass we get a large number of clusters at the leaves of the R-tree
  - Merge clusters to reduce the number of clusters
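The agglomerative idea mentioned above (build small clusters, then merge them into bigger ones) can be sketched on 1-D points. This is a toy illustration, not BIRCH itself: centroid distance and the sample points are assumptions.

```python
# Tiny sketch of agglomerative clustering on 1-D points: start with
# singleton clusters and repeatedly merge the two closest clusters
# (by centroid distance) until k clusters remain. Data is invented.

def centroid(cluster):
    return sum(cluster) / len(cluster)

def agglomerate(points, k):
    clusters = [[p] for p in points]          # every point starts alone
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

clusters = agglomerate([1, 2, 10, 11, 12], 2)
```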
25. What is text mining?
- Text mining: the application of data mining to textual documents
  - cluster Web pages to find related pages
  - cluster the pages a user has visited to organize their visit history
  - classify Web pages automatically into a Web directory
- Data visualization systems help users examine large volumes of data and detect patterns visually
  - Can visually encode large amounts of information on a single screen
  - Humans are very good at detecting visual patterns
26. What are web databases?
Semantic web architecture and applications are a dramatic departure from earlier database and application generations. Semantic processing includes the earlier statistical and natural language techniques, and enhances them with semantic processing tools. First, Semantic Web architecture is the automated conversion and storage of unstructured text sources in a semantic web database. Second, Semantic Web applications automatically extract and process the concepts and context in the database in a range of highly flexible tools.

Most of the information we read, produce and share is now unstructured: emails, reports, presentations, media content, web pages. And these documents are stored in many different formats: text, email files, Microsoft word processor, spreadsheet, presentation files, Lotus Notes, Adobe PDF, and HTML. It is difficult, expensive, slow and inaccurate to attempt to classify and store these in a structured database. All of these sources can be automatically converted to a common Semantic Web database, and integrated into one common information source.
1. What are attributes? Give example.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.
Example: possible attributes of the customer entity are customer name, customer id, customer street and customer city.
A weak entity is an entity that does not have a key attribute. A weak entity must participate in an identifying relationship type with an owner or identifying entity type.
- Tuning of hardware
- Tuning of schema
- Tuning of indices
- Tuning of materialized views
- Tuning of transactions
- Instead of creating a denormalized relation, cluster together on the same disk page records that would match in a frequently required join.
24. Give examples of application domains dealing with time-varying data.
- The system must be able to update all updatable views
- The database must provide single-operation insert, update and delete functionality
- Changes to the physical structure of the database must be transparent to applications and users
- Changes to the logical structure of the database must be transparent to applications and users
- The database must natively support integrity constraints
- Changes to the distribution of the database (centralized vs. distributed) must be transparent to applications and users
- Any languages supported by the database must not be able to subvert integrity controls
2. What are knowledge bases?
A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized.
Content-based image retrieval (CBIR) is the application of computer vision to image retrieval. CBIR aims at avoiding the use of textual descriptions and instead retrieves images based on similarities in their contents (textures, colors, shapes etc.) to a user-supplied query image or user-specified image features.
20. How is image data classified?
- A domain-specific collection: a large, homogeneous collection providing access to controlled users with very specific objectives. Examples of such a collection are biomedical and satellite image databases.
- An enterprise collection: a heterogeneous collection of images that is accessible to users within an organization's intranet. Pictures may be stored in many different locations.
- A personal collection: a largely homogeneous collection that is generally small in size, accessible primarily to its owner, and usually stored on local storage media.
- Large collections of documents from various sources: news articles, research papers, books, digital libraries, e-mail messages, web pages, library databases, etc.