Definition of Terms
DOCUMENT
INDEX
INDEXING
TOOL
PATRON
ABSTRACT
Information Retrieval System
An information retrieval system is a mechanism
for carrying out the functions of the information
retrieval process.
Organization of information may take
different forms (manual, by computer,
or a combination of both).
Most challenging problem: providing for the
nearest possible response or coincidence
Modern information retrieval systems: data
retrieval, reference retrieval and text retrieval.
Functions involved:
1. The information is created and acquired for the
system.
2. Knowledge records are analyzed and tagged with a
set of index terms.
3. The knowledge records are stored physically
and the index terms are stored in a structured file.
4. The user's query is tagged with sets of index
terms and is then matched against the tagged
records.
5. Matched documents are retrieved for review.
6. Feedback may lead to several iterations of
the search.
Is the user satisfied? If yes, the search is completed (stop).
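The functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `tag` function, the tiny stopword list, the sample records and the ranking by number of shared terms are all hypothetical and not part of the original material.

```python
from collections import defaultdict

def tag(text, stopwords=frozenset({"the", "of", "a", "and", "in", "for"})):
    """Hypothetical tagger: lowercase words minus a tiny stopword list."""
    return {w.strip(".,;:?!").lower() for w in text.split()} - stopwords - {""}

def build_index(records):
    """Functions 2-3: tag each record and store the index terms in a structured file."""
    index = defaultdict(set)
    for doc_id, text in records.items():
        for term in tag(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Functions 4-5: tag the query and match it against the tagged records."""
    matches = defaultdict(int)
    for term in tag(query):
        for doc_id in index.get(term, ()):
            matches[doc_id] += 1
    # Rank by number of shared index terms, then by document id for stability.
    return sorted(matches, key=lambda d: (-matches[d], d))

# Function 1: records acquired for the system (made-up sample data).
records = {
    "d1": "Indexing of periodical articles",
    "d2": "Thesaurus construction for indexing",
    "d3": "History of the newspaper",
}
index = build_index(records)

# Function 6: feedback leads the user to reformulate and search again.
first_try = search(index, "newspaper indexing")
refined = search(index, "thesaurus indexing")
print(first_try)
print(refined)
```

The second search stands in for one feedback iteration: the user was not satisfied with the first result list and changed the query terms.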
Types of Indexes
by arrangement
a. Alphabetical index
Advantage:
More convenient to use and follows an
order that is familiar to users.
Drawbacks:
synonymy
scattering of entries
b. Classified index
Advantages:
Useful for generic searches.
Brings similar things together.
Drawbacks:
Most users find them difficult to use.
Needs a secondary file.
One cannot enter it directly as one can with
alphabetical sequences of names.
c. Concordance
Uses:
Locate a partly or completely remembered
passage
Drawback:
Searching is difficult since this type of index
spreads similar entries over many synonymous
terms, ignores misspellings, and confuses any
general-specific term relationships.
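A concordance can be pictured as a mapping from every word in a text to the places where it occurs. The sketch below is a minimal illustration, assuming line numbers as locators and a simple punctuation-stripping rule; the function name and sample passage are hypothetical.

```python
from collections import defaultdict

def build_concordance(lines):
    """Map every word to the (line number, line text) pairs where it occurs."""
    concordance = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        seen = set()
        for raw in line.split():
            word = raw.strip(".,;:?!\"'").lower()
            if word and word not in seen:
                seen.add(word)
                concordance[word].append((number, line))
    # Sort alphabetically, as a printed concordance would be arranged.
    return dict(sorted(concordance.items()))

passage = [
    "To be, or not to be, that is the question",
    "Whether 'tis nobler in the mind to suffer",
]
concordance = build_concordance(passage)

# A partly remembered passage can be located from a single word.
hits = [number for number, _ in concordance["question"]]
print(hits)
```

Because the entries are the words exactly as written, a search under a synonym such as "query" finds nothing, which is the drawback described above.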
2. Periodical index
consistency becomes the most
challenging part
open-ended projects
scope is broader
3. Newspaper index
vocabulary control becomes a
paramount challenge
4. Audiovisual materials index
textual labeling is needed along with
image matching
level of specificity may be lower than in the book index.
by physical form
card index
printed index
microform index
computerized index
automatic indexing
computer-assisted indexing
Indexing Languages
Purposes and Uses
a system for naming or identifying subjects contained
in a document.
as a tool for communication
Features/Characteristics
Vocabulary: refers to terms selected from the
indexing of concepts.
Syntactics: refers to the combination and
modification of terms to form headings and multilevel
headings or to form search statements.
Example: Employees, Training of; Training of employees
Semantics: the study of meaning as expressed in
communication, such as words.
Hierarchical relationship
Genus-species relationship (represents class
inclusion)
Example:
Foot Toes
Men Women
Education Teaching
Types of Indexing Languages
Redundancy is greater
Disadvantages of Controlled Vocabulary Language:
Incompatibility of different indexing
languages.
High input cost.
The possibility of inadequate
vocabulary.
Types of Controlled Vocabulary
1. Authority List / Subject Authority List
Examples:
2. Thesaurus
Poly-hierarchical
Examples:
Relationships of Terms
INTELLIGENCE
BT: Ability
NT: Comprehension
RT: Talent
Aptitude
Broader term (BT) reference shows hierarchical relationship
upward in the classification tree.
Narrower term (NT) reference is similar to the broader term
reference, except it goes down in the classification tree.
Related term (RT) reference refers to a descriptor that can be
used in addition to the basic term but is not in a hierarchical
relationship.
Use reference directs the user from a non-usable term to the
preferred descriptor.
Use for (UF) reference deals primarily with synonymous or variant
forms of the preferred descriptor. It is also used to lead the indexer
to more general terms.
Scope Note (SN) is used to inform users about a descriptor's
usage restrictions or to clarify ambiguity.
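The reference types above map naturally onto a small record structure. The following is a sketch only: the class names, the `resolve` helper and the variant term "Intellect" are assumptions made for illustration, while the BT, NT and RT values come from the INTELLIGENCE example.

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    term: str
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    use_for: list = field(default_factory=list)    # UF
    scope_note: str = ""                           # SN

class Thesaurus:
    def __init__(self):
        self.descriptors = {}
        self.use_refs = {}  # non-usable term -> preferred descriptor

    def add(self, descriptor):
        self.descriptors[descriptor.term] = descriptor
        # Every UF entry implies a reciprocal Use reference.
        for variant in descriptor.use_for:
            self.use_refs[variant] = descriptor.term

    def resolve(self, term):
        """Follow a Use reference; return None if the term is unknown."""
        if term in self.descriptors:
            return term
        return self.use_refs.get(term)

thesaurus = Thesaurus()
thesaurus.add(Descriptor(
    term="INTELLIGENCE",
    broader=["Ability"],
    narrower=["Comprehension"],
    related=["Talent", "Aptitude"],
    use_for=["Intellect"],  # hypothetical variant, not from the source
))
print(thesaurus.resolve("Intellect"))
```

Storing UF and Use as two views of the same link keeps the reciprocal references consistent when a descriptor is added.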
Indexing Systems
1. Coordinate indexes: an indexing
scheme that combines single index terms to
create composite subject concepts
Types:
post-coordinate indexing
pre-coordinate indexing
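The difference between the two types can be sketched as follows, assuming the usual reading that post-coordinate indexing combines single terms at search time while pre-coordinate indexing combines them when the heading is assigned. The index contents and function name are hypothetical.

```python
def post_coordinate_search(index, *terms):
    """Combine single index terms at search time with a Boolean AND."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical inverted file: single term -> document numbers.
single_term_index = {
    "libraries": {1, 2, 5},
    "automation": {2, 3, 5},
    "philippines": {5, 7},
}

# Pre-coordinate counterpart: the indexer has already combined the terms
# into one composite heading, so no combination happens at search time.
precoordinated_index = {
    "Libraries - Automation - Philippines": {5},
}

result = post_coordinate_search(single_term_index, "libraries", "automation")
print(sorted(result))
```

In the post-coordinate case the searcher decides which terms to combine, so the same file answers "libraries AND automation" and "automation AND philippines" without new headings.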
3. Chain indexes
Provide that every concept becomes
linked, or chained.
Introduced by S.R. Ranganathan as
part of his Colon Classification, the
system uses synthesis or number
building. The number that represents
some complex subject is arrived at by
joining the notational elements that
represent more elemental subjects.
Literature: 800
English (2): 820
Poetry: 821
Victorian period (.8): 821.8
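One way to picture how chained entries are derived from such a class number is sketched below. It assumes the chain Literature (800), English (820), Poetry (821), Victorian period (821.8) and a colon as the separator between links; both the function name and the punctuation are illustrative choices, not a prescribed format.

```python
def chain_index_entries(chain):
    """chain: list of (concept, notation) from most general to most specific."""
    entries = []
    for i in range(len(chain) - 1, -1, -1):
        # Lead with link i, then qualify it by each broader link above it.
        heading = ": ".join(concept for concept, _ in reversed(chain[: i + 1]))
        entries.append(f"{heading} {chain[i][1]}")
    return entries

chain = [
    ("Literature", "800"),
    ("English", "820"),
    ("Poetry", "821"),
    ("Victorian period", "821.8"),
]
entries = chain_index_entries(chain)
for entry in entries:
    print(entry)
```

Each entry starts one link higher in the chain than the previous one, so every concept in the class number becomes an access point.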
Example:
Topic: measures from information theory of the information
content of document surrogates
@MEASURES? OF<INFORMATION CONTENT?OF <DOCUMENT SURROGATES>>?FROM<INFORMATION THEORY>
Sample index strings that may be generated from the above
input string are:
DOCUMENT SURROGATES. INFORMATION CONTENT. MEASURES FROM INFORMATION THEORY
INFORMATION CONTENT OF DOCUMENT SURROGATES. MEASURES FROM INFORMATION THEORY
INFORMATION THEORY. MEASURES OF INFORMATION CONTENT OF DOCUMENT SURROGATES
Measures of Effectiveness of
the Indexing System
1. Recall measure is a simple quantitative
ratio of relevant documents retrieved to the
total number of relevant documents
potentially available. Recall depends on the
level of exhaustivity allowed by the
indexing policy.
Example:
If there are 100 documents in the
library that are relevant to the user's needs
and the indexing system retrieves 75 of them, then
the recall ratio is 75 out of 100 (75/100).
Recall for this search is 75 percent.
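The recall ratio in the example can be computed directly. This is a minimal sketch; the function name and the use of document numbers as identifiers are assumptions.

```python
def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents available."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)

relevant_docs = range(1, 101)   # 100 relevant documents in the library
retrieved_docs = range(1, 76)   # the system retrieves 75 of them
ratio = recall(retrieved_docs, relevant_docs)
print(f"{ratio:.0%}")
```

Only retrieved documents that are also relevant count toward the numerator, so retrieving extra irrelevant documents does not raise recall.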
Subject Indexing
Steps in subject indexing:
1. Recording bibliographic data
2. Subject determination
3. Conceptual analysis
4. Translation into standard terms
using controlled vocabulary
2. Subject determination
aboutness of the material
formulation of a concept list
most appropriate to the given community
of users
If necessary, modify both indexing tools
and procedures as a result of feedback
from inquiries
no arbitrary limit should be set to the
number of terms or descriptors
concepts should be identified as
specifically as possible.
3. Content analysis
a. Factors that may affect content analysis:
Environmental situation
Policy decisions
Decisions of the indexer
b. Parts of the documents that have to be
analyzed
Title
Abstract
List of contents
Text itself
Illustrations, diagrams, tables and captions.
Reference section
pottery making
or
weaving
wood carving
Punctuation
a. The inversion of a phrase used as the
heading in a main entry is punctuated by a
comma.
b. If the heading is followed immediately by
page references, a comma is used between the
heading and the first numeral and between
subsequent numerals.
c. If the heading is followed immediately by
run-in subentries, a colon precedes the first
subheading. All subsequent subentries are
preceded by semicolons. For example:
payments, balance of: definition of, 16;
importance of, 19
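The three punctuation rules can be expressed as a small formatter. This is a sketch under assumptions: the function names are hypothetical, and `invert` assumes the lead word is the first or last word of the phrase.

```python
def format_entry(heading, pages=(), subentries=()):
    """subentries: sequence of (subheading, pages) pairs for run-in style."""
    if subentries:
        # Rule c: colon before the first subheading, semicolons before the rest.
        parts = [", ".join([sub, *map(str, sub_pages)]) for sub, sub_pages in subentries]
        return f"{heading}: " + "; ".join(parts)
    # Rule b: commas between the heading and each page reference.
    return ", ".join([heading, *map(str, pages)])

def invert(phrase, lead_word):
    """Rule a: move lead_word to the front and mark the inversion with a comma."""
    rest = phrase.replace(lead_word, "", 1).strip()
    return f"{lead_word}, {rest}"

heading = invert("balance of payments", "payments")
run_in = format_entry(
    heading,
    subentries=[("definition of", [16]), ("importance of", [19])],
)
simple = format_entry("thesauri", pages=[12, 40])
print(run_in)
print(simple)
```

The first printed line reproduces the run-in example given under rule c.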
2. Corporate Bodies
Names of corporate bodies should normally be
indexed without transposition
e.g. British Museum
Transposition may, however, be used if it is considered
that this would help the users of the index.
e.g. Department of Agriculture see Agriculture,
Department of
J. Whitaker & Sons see Whitaker (J) & Sons
Choose the most recent or the most commonly used form
of corporate name as the main heading and add see
cross references from other forms
e.g. John Moores University see Liverpool John Moores
University
3. Geographic Names
should be as full as necessary for clarity, with additions
to avoid confusion with otherwise identical
names
e.g. Alaminos (Laguna)
Alaminos (Pangasinan)
An article or preposition should be retained in a
geographic name of which it forms an integral part
e.g. La Paz
Las Vegas
Where the article or preposition does not form an
integral part of a name, it should be omitted
e.g. New Forest rather than The New Forest
4. Titles of documents
should normally be italicized, underlined or otherwise
distinguished. If necessary for identification, names of creators,
places of publication, dates or other qualifiers may be added within
parentheses.
e.g. To the Lighthouse
Evaluation of Indexes
Guidelines/Criteria
1. Subject error
Errors in choosing subject descriptors
Omission errors
Use of a too broad or too narrow term
3. Terminology
4. Internal guidance
Cross-references
Printed instruction on how to use the index
5. Accuracy in referring
Bibliographic citation
Cross-references
6. Entry scattering
Example:
College libraries
School libraries
National libraries
Special libraries
7. Entry differentiation
Example:
Libraries, 1-2, 28-31, 42, 53-60, 82, 109-11, 131-40, 310, 342-50
10. Layout
12. Cost
13. Standards
Indexing Standards
International Organization for
Standardization
ISO 2788:1986 Documentation - Guidelines for
the establishment and development of monolingual
thesauri
ISO 5964:1985 Documentation - Guidelines for
the establishment and development of multilingual
thesauri
ISO 5963:1985 Documentation - Methods for
examining documents, determining their subjects,
and selecting indexing terms
ISO 999:1996 Information and documentation -
Guidelines for the content, organization and
presentation of indexes
ISO 4:1997 Information and documentation - Rules
for the abbreviation of title words and titles of
publications. The associated List of Serial Title Word
Abbreviations includes title word
abbreviations in over 50 languages.
Automatic Indexing
refers to indexing by machine, or the
analysis of text by means of computer
algorithms. The focus is on automatic
methods used behind the scenes with
little or no input from individual
searchers, with the exception of
relevance feedback.
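As a very simple illustration of indexing by machine, the sketch below picks the most frequent non-stopword terms in a text as candidate index terms. Frequency counting is only one of many possible algorithms and is chosen here as an assumption; the stopword list and function name are hypothetical.

```python
from collections import Counter
import re

STOPWORDS = frozenset({"the", "of", "and", "a", "an", "in", "to", "is", "by", "for", "or"})

def extract_index_terms(text, max_terms=3):
    """Return the most frequent non-stopword words as candidate index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]

text = (
    "Automatic indexing is the analysis of text by computer algorithms. "
    "Indexing by machine assigns index terms to the text."
)
terms = extract_index_terms(text, max_terms=2)
print(terms)
```

No searcher input is involved: the terms are derived from the text alone, which matches the "behind the scenes" character described above.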
GOOD LUCK!