
INDEXING

Definition of Terms

indexing - the process of providing in-depth access to information contained within a document or knowledge record.

index - a guide to the contents of a document or a collection of documents with the same format, arranged in a searchable order such as alphabetical, classified, chronological or numerical.

index entry - a single record in an index that may consist of four parts: main heading, subheading, locator and/or cross-reference(s).

descriptor - a term designated for use in a thesaurus to represent the aboutness of a topic in a document.

document - any item that contains information, either in print or non-print format, including digital forms.

identifier - a proper name of a person, object, institution/organization, process, etc.

indexing language - any vocabulary, controlled or uncontrolled, used for indexing, along with its rules of usage.

indexing system - a set of prescribed procedures (manual or machine-operated) intended for organizing the contents of documents or knowledge records for purposes of retrieval and dissemination.

keyword - a raw word taken from a document that is regarded as an indexable term.

qualifier - a term or phrase added to a heading to distinguish among homographs or clarify meaning.

translation - the process of converting concepts derived from the document into a particular set of index terms, usually taken from a controlled vocabulary.

vocabulary control - the process of organizing a list of terms for use in indexing, along with the rules of usage.

Development of Indexes and Indexing
The first systematic organization of written records occurred in Sumer around 3,000 B.C.
Around 2,000 B.C., record keeping became part of society in China and India.
Early civilizations proposed schemes of knowledge classification and document arrangement (e.g. the Greeks used some sort of alphabetical order).
In 900 A.D., an encyclopedia was arranged in alphabetical order.
During the 15th century, books were published with blank pages and quite wide margins.
The 17th century brought a new type of information tool, the periodical.
In the 1850s, W.F. Poole published an index that covered numerous issues of many periodicals.
During the 19th century also, Paul Otlet and Henri La Fontaine founded the International Institute of Bibliography to improve indexing approaches to scholarly literature. This led to modern keyword and free-text indexing.
In 1901, H.W. Wilson first published the Readers' Guide to Periodical Literature.
By the 1950s, computers had penetrated the indexing arena and efforts to evaluate indexing had begun.

Role of Indexing in Information Retrieval
[Figure: Relationship of Indexing, Abstracting and Searching (Cleveland and Cleveland, 2001, p. 31). Elements shown: document, indexing, tool, index, abstract, patron.]

Information Retrieval System
An information retrieval system is a mechanism for carrying out the functions of the information retrieval process.
Organization of information may take different forms (manual, computer-based, or a combination of both).
Most challenging problem: providing the nearest possible response to, or coincidence with, the user's request.
Types of modern information retrieval systems: data retrieval, reference retrieval and text retrieval.

Functions involved:
1. The information is created and acquired for the
system.
2. Knowledge records are analyzed and tagged with a set of index terms.
3. The knowledge records are stored physically, and the index terms are stored in a structured file.
4. The user's query is tagged with a set of index terms and then matched against the tagged records.
5. Matched documents are retrieved for review.
6. Feedback may lead to several reiterations of
the search.
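Steps 2 to 4 (tagging, storing in a structured file, matching) can be sketched as an inverted file that maps each index term to the records tagged with it. This is a minimal sketch that assumes derived-term (keyword) indexing, a small hypothetical stop list and simple AND matching; the record texts and the names `build_index` and `search` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records):
    """Tag each record with its index terms and store them in an inverted file."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            if term not in STOP_WORDS:
                index[term].add(record_id)
    return index


def search(index, query):
    """Tag the query with index terms and return the records carrying all of them."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


records = {
    1: "Indexing of periodical literature",
    2: "Book indexing for editors",
    3: "Periodical publishing",
}
index = build_index(records)
print(sorted(search(index, "periodical indexing")))  # [1]
```

A controlled-vocabulary system would replace `tokenize` with a step that assigns preferred terms, but the storing and matching would look much the same.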

Feedback may lead to several reiterations of the search:
1. The user expresses an information need.
2. The request is conceptually analyzed.
3. The request is translated into the system's index language.
4. A searching strategy is composed.
5. The search is carried out.
6. Is the user satisfied? If yes, the search is completed (stop).
7. If not, are all searching options depleted? If yes, stop; if not, reformulate the request and search again.

Purposes and Uses of Indexes
Save time and effort in finding information.
Identify potentially relevant information in the document or collection being indexed.
Analyze concepts treated in a document to produce appropriate index headings based on the indexing language assigned.

Group together related topics.
Direct the users seeking information
under terms not chosen as index
headings to headings that have been
chosen.
Suggest related topics.
Serve as a tool for current awareness services.

Types of Indexes
by arrangement
a. Alphabetical index
Advantage:
More convenient to use and follows an
order that is familiar to users.
Drawbacks:
synonymy
scattering of entries

b. Classified index
Advantages:
useful for generic searches.
Brings similar things together.
Drawbacks:
Most users find them difficult to use.
Needs a secondary file.
One cannot enter it directly as one can with
alphabetical sequences of names.

c. Concordance
Uses:
Locate a partly or completely remembered
passage
Drawback:
Searching is difficult since this type of index
spreads similar entries over many synonymous
terms, ignores misspellings, and confuses any
general-specific term relationships.

d. Numerical or serial order*
e.g. Numerical Patent Index of Chemical Abstracts; American Statistics Office

Nelson's Complete Concordance of the Revised Standard Version Bible
AARON
Is there not A., your brother, the      Ex. 4.14
The Lord said to A., Go into            4.27
And Moses told A. all the words         4.28
And A. spoke all the words which        4.30
Afterward Moses and A. went to          5.1
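A concordance of this kind can be generated mechanically: every occurrence of a word becomes an entry carrying a fragment of surrounding text and its reference, with the headword abbreviated to its initial in the style of the Nelson example above. A minimal sketch, assuming short passages and a fixed context width; it indexes every word, whereas a printed concordance would usually omit very common words. The passages are fragments quoted in the example; the function name is hypothetical.

```python
import re
from collections import defaultdict


def concordance(passages, width=30):
    """Map each word (upper-cased) to (context, reference) pairs.

    Inside the context the headword is shortened to its initial, e.g. "A.".
    """
    entries = defaultdict(list)
    for reference, text in passages:
        for match in re.finditer(r"[A-Za-z]+", text):
            word = match.group()
            before = text[max(0, match.start() - width):match.start()]
            after = text[match.end():match.end() + width]
            entries[word.upper()].append((before + word[0] + "." + after, reference))
    return entries


passages = [
    ("Ex. 4.27", "The Lord said to Aaron, Go into"),
    ("Ex. 4.28", "And Moses told Aaron all the words"),
]
for context, reference in concordance(passages)["AARON"]:
    print(f"{context:<40} {reference}")
```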

by type or form of material indexed
1. Book index*
Reasons for Preparing a Book Index
collects the different ways of wording
the same concept.
filters information for the reader.
pinpoints information

Components of a book index entry:
main heading
subheading
locator
cross references
World Wide Web (WWW)
browsers, 78
components, 89
development, 100-156
see also Internet

2. Periodical index*
consistency becomes the most
challenging part
open-ended projects
scope is broader
3. Newspaper index
vocabulary control becomes a
paramount challenge
4. Audiovisual materials index
textual labeling is needed along with
image matching

Difference between Book & Periodical Indexes

Book index: Compiled only once, within a relatively short time, and usually by a single person.
Periodical index: A continuous process, more often performed by a team of indexers and lasting for an extended period.

Book index: Deals with a more or less well-defined central topic.
Periodical index: Deals with a great variety of topics.

Book index: Indexing terms are almost always derived from the text.
Periodical index: Terminology must be consistent and derived from a controlled vocabulary.

Book index: Specificity is largely governed by the text itself.
Periodical index: Terms are prescribed by a controlled vocabulary, and their level of specificity may be lower than in a book index.

Book index: Every single page of a book must be read.
Periodical index: Articles are scanned for indexable items, and the indexer may rely on a compiled abstract or summary.

Book index: Virtually the entire text is subject to indexing.
Periodical index: A periodical index will depend on a number of policy decisions.

Book index: Always bound with the indexed text.
Periodical index: Compiled separately.

by physical form
card index
printed index
microform index
computerized index
automatic indexing
computer-assisted indexing

Principles and Concepts of Indexing
1. Exhaustivity refers to the extent to which concepts are made retrievable by means of index terms.
1.1 Summarization
1.2 Depth indexing
2. Specificity refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relationship.
Example:
An information resource about musicians should be entered under Musicians and not under Performing Artists.
3. Consistency refers to the extent to which agreement exists on the terms used to index documents.
Levels of consistency:
inter-indexer consistency (agreement between different indexers)
intra-indexer consistency (agreement of the same indexer at different times)
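Consistency is usually quantified by comparing the term sets assigned to the same document. The sketch below uses one commonly cited formulation, Hooper's measure C = A / (A + M + N), where A is the number of terms both indexers assigned and M and N are the numbers assigned by only one of them. The term sets are hypothetical; passing two sets from the same indexer at different times gives an intra-indexer figure instead.

```python
def hooper_consistency(terms_a, terms_b):
    """Hooper's measure: common terms / (common + unique to A + unique to B)."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    total = common + only_a + only_b
    # Two empty term sets are treated as perfectly consistent (a design choice).
    return common / total if total else 1.0


indexer_1 = {"Musicians", "Concerts", "Performing arts"}
indexer_2 = {"Musicians", "Concerts", "Orchestras", "Conductors"}
print(hooper_consistency(indexer_1, indexer_2))  # 0.4
```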

Indexing Languages
Purposes and Uses
a system for naming or identifying subjects contained
in a document.
as a tool for communication

Features/Characteristics
Vocabulary refers to the terms selected for the indexing of concepts.
Syntactics refers to the combination and
modification of terms to form headings and multilevel
headings or to form search statements.
Example: Employees, Training of; Training of employees
Semantics - the study of meaning as expressed in communication, such as words.

Semantic relationships are categorized into:
Equivalence relationship implies that
there will be more than one term denoting
the same concept.
Synonyms
Quasi-synonyms
Preferred spelling
Acronyms and abbreviations
Current and established terms
Translation

Hierarchical relationship
Genus-species relationship (represents class inclusion)
Example:
Agro industry > Food industry > Meat industry
Whole-part relationship
Example:
Foot > Toes
Affinitive relationship - displayed with the use of related terms
Example:
Men - Women
Education - Teaching

Types of Indexing Languages
1. Natural language (derived-term system)
Characteristics are:
Improves recall because it provides more access points, but reduces precision
Redundancy is greater
Uses more current terms
Tends to be favored by subject specialists or end-users
May also be called indexing by extraction (or the extractive indexing method).

2. Controlled vocabulary (assigned-term system)
Functions:
To control synonyms by choosing one form as the standard term
To make distinctions among homographs
To bring or link together terms that are closely related
To establish the scope of a term
To record hierarchical and affinitive/associative relations (usually)
To control variant spellings
Syndetic devices used by a controlled vocabulary:
USE and UF (use for) for synonyms
BT (broader term), NT (narrower term) and RT (related term) for differing levels of specificity and for certain near-synonyms and antonyms

Advantages of Controlled Vocabulary Language
Increases the probability that both indexer and
searcher will express a particular concept in
the same way.
Increases the probability that the same term
will be used by different indexers or by the
same indexer at different times.
Helps searchers to focus their thoughts when
they approach the information system without
a full and precise realization of what
information they need.

Disadvantages of Controlled
Vocabulary Language:
Incompatibility of different indexing
languages.
High input cost.
The possibility of inadequate
vocabulary.

Types of Controlled Vocabulary
1. Authority List / Subject Authority List
Examples:
Library of Congress Subject Headings
Sears List of Subject Headings
Dewey Decimal Classification
2. Thesaurus
From the Latin word for "treasure"
Poly-hierarchical
Examples:
The Art & Architecture Thesaurus*
ERIC (Education Resources Information Center) Thesaurus*

Similarities between Authority Lists and Thesauri
Both attempt to provide subject access to information resources by providing terminology that is consistent rather than uncontrolled and unpredictable.
Both choose preferred terms and make references from non-preferred terms.
Both provide hierarchies so that terms are presented in relation to their broader, narrower, and related terms.

Difference between Authority Lists and Thesauri
Thesauri are made up of single terms and bound terms representing single concepts. Subject heading lists have phrases and other pre-coordinated terms in addition to single terms.
Thesauri are more strictly hierarchical.
Thesauri are narrower in scope.
Thesauri are more likely to be multilingual.

Relationships of Terms

INTELLIGENCE
BT: Ability
NT: Comprehension
RT: Talent
Aptitude
Broader term (BT) reference shows a hierarchical relationship upward in the classification tree.
Narrower term (NT) reference is similar to the broader term reference, except that it goes down the classification tree.
Related term (RT) reference refers to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it.
USE reference leads from a non-preferred term to the preferred descriptor.
Use for (UF) reference deals primarily with synonymous or variant forms of the preferred descriptor. It is also used to lead the indexer to more general terms.

Scope Note (SN) is used to inform users about a descriptor's usage restrictions or to clarify ambiguity.
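The references above come in reciprocal pairs: every BT implies an NT, every USE implies a UF, and RT runs both ways. A thesaurus maintenance program usually enforces this automatically. A minimal in-memory sketch, assuming the INTELLIGENCE example above; the `Thesaurus` class and the non-preferred term Intellect are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus store that keeps reciprocal references in step."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> broader terms
        self.nt = defaultdict(set)  # term -> narrower terms
        self.rt = defaultdict(set)  # term -> related terms
        self.uf = defaultdict(set)  # preferred term -> non-preferred terms
        self.use = {}               # non-preferred term -> preferred term

    def add_broader(self, term, broader):
        # Recording a BT automatically records the reciprocal NT.
        self.bt[term].add(broader)
        self.nt[broader].add(term)

    def add_related(self, term, other):
        # RT is symmetric.
        self.rt[term].add(other)
        self.rt[other].add(term)

    def add_use(self, non_preferred, preferred):
        # USE and UF are reciprocal.
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def display(self, term):
        """Return the entry for a term, following a USE reference first."""
        term = self.use.get(term, term)
        lines = [term.upper()]
        for label, relation in (("UF", self.uf), ("BT", self.bt),
                                ("NT", self.nt), ("RT", self.rt)):
            if relation.get(term):
                lines.append(f"{label}: " + ", ".join(sorted(relation[term])))
        return lines


t = Thesaurus()
t.add_broader("Intelligence", "Ability")
t.add_broader("Comprehension", "Intelligence")
t.add_related("Intelligence", "Talent")
t.add_related("Intelligence", "Aptitude")
t.add_use("Intellect", "Intelligence")  # hypothetical non-preferred term
print("\n".join(t.display("Intellect")))
```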

Thesaurus Construction
1. Identify the subject field.
2. Identify the nature of the literature to be indexed.
3. Identify the users.
4. Identify the file structure: will this be a pre-coordinate or post-coordinate system?
5. Consult published indexes, glossaries, dictionaries, and other tools in the subject areas for the raw vocabulary.
6. Cluster the terms.
7. Establish term relationships.

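Indexing Systems
1. Coordinate indexes - an indexing scheme that combines single index terms to create composite subject concepts
Types:
post-coordinate indexing (terms are combined by the searcher at the time of searching)
pre-coordinate indexing (terms are combined by the indexer at the time of indexing)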

2. Classified indexes - contents are arranged systematically by classes or subject headings.
2.1 Enumerative indexes
DDC, LCC, and UDC are examples of enumerative classifications.
Enumerative classifications are top-down methods of analysis.

2.2 Faceted indexes
Often called analytico-synthetic systems. Facet analysis is a tightly controlled process by which simple concepts are organized into carefully defined categories; compound subjects are then expressed by connecting the class numbers of the basic concepts.
Bottom-up systems.
Pre-coordinated at the time of indexing and arranged in classification order rather than in straight alphabetical order.
Developed by Shiyali Ramamrita Ranganathan in the 1930s.
Example: When indexing a cookbook, some important facets
might be:
Holidays
Ingredients
Recipe Titles
Techniques

3. Chain indexes
Provide that every concept becomes
linked, or chained.
Introduced by S.R. Ranganathan as
part of his Colon Classification, the
system uses synthesis or number
building. The number that represents
some complex subject is arrived at by
joining the notational elements that
represent more elemental subjects.

Example of a Chain Index
Topic: Victorian period English Poetry (821.8)
Hierarchy:
8 (800) Literature
2 (820) English
1 (821) Poetry
.8 (821.8) Victorian period
Chain index entries that will be generated are the following:
Victorian period: Poetry: English: Literature 821.8
Poetry: English: Literature 821
English: Literature 820
Literature 800
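Once the chain of links is known, generating the entries is mechanical: each link, from the most specific upward, becomes the lead term, followed by all of its broader links. A minimal sketch, assuming the chain is supplied broadest-first as (class number, heading) pairs; it does not detect Ranganathan's false or unsought links, which an indexer would remove by hand. The function name is hypothetical.

```python
def chain_index(chain):
    """chain: (class_number, heading) pairs ordered from broadest to narrowest."""
    entries = []
    for i in range(len(chain) - 1, -1, -1):
        # Lead with link i, then qualify it by every broader link in reverse order.
        headings = [heading for _, heading in chain[: i + 1]]
        entries.append((": ".join(reversed(headings)), chain[i][0]))
    return entries


chain = [
    ("800", "Literature"),
    ("820", "English"),
    ("821", "Poetry"),
    ("821.8", "Victorian period"),
]
for entry, number in chain_index(chain):
    print(entry, number)
```

Run on the chain above, this reproduces the four entries of the example, from "Victorian period: Poetry: English: Literature 821.8" down to "Literature 800".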

4. Permuted title indexes
Advantages:
minimum cost
does not need the expertise of a professional indexer, because it is done entirely by computer
Disadvantages:
titles may not accurately reflect the content of the item
a limited number of terms restricts complete subject indication
most title indexes are unappealing to the eye
can increase the retrieval of irrelevant documents
usually employ stop-lists
Scattering of synonyms and generic terms usually causes user frustration and missed entries.

4.1 KWIC (keyword in context) was introduced by Hans Peter Luhn in 1959. It is a rotated index most commonly derived from the titles of documents. Each keyword appearing in a title becomes an entry point and is highlighted, typically by setting it off at the center of the page.
Principles of KWIC Indexing
Titles are generally informative.
Words extracted from the title can be used as an effective guide.
Although the meaning of an individual word viewed in isolation may be ambiguous or too general, the context surrounding the word helps to define and explain its meaning.
Example (the keywords Cataloging, classification and Croatians are aligned in one column, marked here by a bar):
                                  | Cataloging and classification for Croatians.
                   Cataloging and | classification for Croatians.
Cataloging and classification for | Croatians.
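The rotation behind a KWIC index fits in a few lines: every title word not on a stop list becomes an entry, its left context is right-justified so that the keywords line up in one column, and the entries are sorted by keyword. This is a sketch only; the stop list, the column width and the " | " separator are assumptions, not part of Luhn's specification.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed stop list


def kwic(titles, width=33):
    """Return KWIC lines with every keyword aligned after a fixed-width left context."""
    rows = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.strip(".,;:").lower()
            if keyword in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]  # keep the nearest context if too long
            right = " ".join(words[i:])
            rows.append((keyword, left.rjust(width) + " | " + right))
    rows.sort(key=lambda row: row[0])  # alphabetical by keyword
    return [line for _, line in rows]


for line in kwic(["Cataloging and classification for Croatians."]):
    print(line)
```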

4.2 KWOC (keyword out of context) - a variation on the keyword in context (KWIC) index, in which keywords, removed from the context of the titles that contain them, appear as headings on a separate line, flush with the left margin.
Example:
Cataloging
    Cataloging and classification for Croatians.
classification
    Cataloging and classification for Croatians.
Croatians
    Cataloging and classification for Croatians.
*In a KWOC index, the keyword used as the entry point is sometimes not repeated in the title but is replaced by an asterisk (*) or another symbol.
Example:
Blue-eyed
    * Cats in Texas ... 25
Cat
    The * and the Economy ... 12
Cats
    Blue-eyed * in Texas ... 13
Economy
    The Cat and the * ... 56
Texas
    Blue-eyed Cats in * ... 76
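The asterisk variant of KWOC can be sketched by emitting each keyword as a heading and masking it in the title beneath. A minimal sketch, assuming a small stop list; the titles reuse the example above, but the locators are hypothetical and each title is given a single locator.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed stop list


def kwoc(titles):
    """titles: (title, locator) pairs. Returns (heading, masked title, locator) sorted by heading."""
    entries = []
    for title, locator in titles:
        words = title.split()
        for i, word in enumerate(words):
            heading = word.strip(".,;:")
            if heading.lower() in STOP_WORDS:
                continue
            # Replace the keyword in the title by an asterisk, keeping any punctuation.
            masked = words[:i] + [word.replace(heading, "*", 1)] + words[i + 1:]
            entries.append((heading, " ".join(masked), locator))
    entries.sort(key=lambda entry: entry[0].lower())
    return entries


for heading, title, locator in kwoc([("Blue-eyed Cats in Texas", 25),
                                      ("The Cat and the Economy", 12)]):
    print(heading)
    print(f"    {title} ... {locator}")
```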

4.3 KWAC (keyword alongside context) - also produced by computer algorithm; KWAC indexes are designed to preserve word pairs and phrases in the alphabetical sequence of keywords while at the same time imitating the traditional format with the lead term on the left.
Example:
Cataloging and classification for Croatians.
classification for Croatians. Cataloging and
Croatians. Cataloging and classification for
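The example above is a cyclic rotation of the title in which each keyword in turn leads, so the words that follow it keep their original order. A minimal sketch that reproduces it, assuming a small stop list; production KWAC programs may treat phrases and word pairs in more elaborate ways.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed stop list


def kwac(title):
    """Rotate the title so that each keyword leads, then sort the rotations."""
    words = title.split()
    rotations = [
        " ".join(words[i:] + words[:i])
        for i, word in enumerate(words)
        if word.strip(".,;:").lower() not in STOP_WORDS
    ]
    return sorted(rotations, key=str.lower)


for line in kwac("Cataloging and classification for Croatians."):
    print(line)
```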

5. Citation indexes - lead users to papers by citations rather than by index terms.
6. String indexes - a word-based system in which the indexer analyzes the various aspects of the subject treated in a document and records the aspects as words, along with role operators. The computer program combines these words into a string of terms that represents a brief summary of the document's content.

6.1 PRECIS (Preserved Context Index System) - developed by Derek Austin for the British National Bibliography (1971-1973) in order to produce printed alphabetical subject entries. It is based on the principle of context-dependency.
It involves:
Determining the subject content of the document
Analyzing the subject statement to determine the role of each significant term (action, location, agent, or object of the action)
Determining the relationship of each term to other terms in the database and how all these terms should be linked.

Below is an illustration of how a string of terms is organized according to the principle of context-dependency.
Topic: Selection of personnel in paper industries in the Philippines. The input string is:
A > B > C > D
or
Philippines > Paper industries > Personnel > Selection
Coded with role operators, the input string is:
(0) Philippines
(1) paper industries
(p) personnel
(2) selection
where (0) represents location, (1) the key system (object of the transitive action), (p) a part or property of the key system, and (2) the transitive action. These operators show the role that a term plays in relation to other terms and can therefore be regarded as role indicators or role operators.

Entries provided are:
Philippines
    Paper industries. Personnel. Selection.
Paper industries. Philippines.
    Personnel. Selection.
Personnel. Paper industries. Philippines.
    Selection.
Selection. Personnel. Paper industries. Philippines.
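The entries above follow a regular pattern, often described as shunting: each term in turn becomes the lead, the terms before it follow as qualifiers in reverse order, and the terms after it form the display line. A simplified sketch, assuming every term in the string is a lead term and ignoring the role operators, which full PRECIS also uses to control how entries are generated. The function name is hypothetical.

```python
def precis_entries(terms):
    """terms: context-dependent input string, broadest term first.

    Returns (heading, display) pairs, where heading = lead term + qualifiers.
    """
    entries = []
    for i, lead in enumerate(terms):
        # Terms before the lead are shunted into the qualifier in reverse order.
        heading = ". ".join([lead] + terms[:i][::-1])
        # Terms after the lead form the display line.
        display = ". ".join(terms[i + 1:])
        entries.append((heading, display))
    return entries


string = ["Philippines", "Paper industries", "Personnel", "Selection"]
for heading, display in precis_entries(string):
    print(heading + ".")
    if display:
        print("    " + display + ".")
```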

6.2 POPSI (Postulate-based Permuted Subject Indexing)
developed at the Documentation Research and Training Centre (India)
based on the classification ideas of S.R. Ranganathan
the coding used by the index string generator is based on the indicator system of Colon Classification: a comma (,) precedes the entity segment; a semicolon (;) the property segment; a colon (:) the process segment; a hyphen (-) a qualifying sub-segment; and a greater-than sign (>) a narrower term.

Example: The topic "study, using rabbits, of heart stimulation by antibiotics" will be placed under the discipline of pharmacology and will generate the following input string:
PHARMACOLOGY, CHEMICAL>DRUG>ANTIBIOTICS; STIMULATION-CIRCULATORY SYSTEM>HEART: STUDY-ANIMAL>RABBIT
Index strings that may be generated from the input string above are:
ANIMAL,STUDY,STIMULATION
    PHARMACOLOGY,ANTIBIOTICS;STIMULATION-HEART:STUDY-RABBIT
ANTIBIOTICS,PHARMACOLOGY
    PHARMACOLOGY,ANTIBIOTICS;STIMULATION-HEART:STUDY-RABBIT

6.3 NEPHIS (Nested Phrase Indexing System)
developed by Timothy C. Craven. The input string is designed to be a phrase in ordinary language.
Four different coding symbols are used:
the left and right angle brackets (< and >) mark the beginning and end of a phrase embedded or nested within a larger phrase
the question mark (?) indicates that what follows is a connective to be included only in those index strings in which the connective has something to connect
the at sign (@) indicates that what follows is not an access term; this coding symbol is used at the beginning of the input string or at the beginning of a nested phrase.

Example:
Topic: measures from information theory of the information content of document surrogates
@MEASURES? OF<INFORMATION CONTENT?OF <DOCUMENT SURROGATES>>?FROM<INFORMATION THEORY>
Sample index strings that may be generated from the above input string are:
DOCUMENT SURROGATES. INFORMATION CONTENT. MEASURES FROM INFORMATION THEORY
INFORMATION CONTENT OF DOCUMENT SURROGATES. MEASURES FROM INFORMATION THEORY
INFORMATION THEORY. MEASURES OF INFORMATION CONTENT OF DOCUMENT SURROGATES

6.4 CIFT (Contextual Indexing and Faceted Taxonomic Access System)
developed for the Modern Language Association (MLA); alphabetical subject entries are created from strings provided by indexers, who assign facets derived from literature, linguistics and folklore.
Example:
HENDIADYS
    English literature. Tragedy. 1500-1599. Shakespeare, William. Hamlet. Use of HENDIADYS. Sources in Virgil. Linguistic approach.
LINGUISTIC APPROACH
    English literature. Tragedy. 1500-1599. Shakespeare, William. Hamlet. Use of Hendiadys. Sources in Virgil. LINGUISTIC APPROACH.

Measures of Effectiveness of the Indexing System
1. Recall measure - a simple quantitative ratio of relevant documents retrieved to the total number of relevant documents potentially available. Recall depends on the level of exhaustivity allowed by the indexing policy.
Example:
If the library holds 100 documents relevant to the user's needs and the indexing system retrieves 75 of them, the recall ratio is 75 out of 100 (75/100). Recall for this search is 75 percent.

2. Precision measure - the ratio of relevant documents retrieved to the total number of documents retrieved. Precision depends on the terminology of the text being indexed and the specificity of the indexing language used.
Example:
If 100 documents are retrieved and 50 of those items are relevant to the request, the precision ratio is 50 to 100 (50/100). Precision for this search is 50 percent.
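Both ratios can be computed from two sets of document identifiers for a single search. A minimal sketch; the identifiers are hypothetical and describe one search in which 150 documents are retrieved, 75 of them relevant, out of 100 relevant documents in the collection, so recall and precision differ.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search, given sets of document IDs."""
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


relevant = set(range(100))                          # 100 relevant documents
retrieved = set(range(75)) | set(range(100, 175))   # 75 relevant + 75 irrelevant
print(recall_precision(retrieved, relevant))  # (0.75, 0.5)
```

Retrieving more documents tends to raise recall while lowering precision, which is the trade-off noted earlier for natural-language indexing.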

Subject Indexing
Steps in subject indexing:
1. Recording bibliographic data
2. Subject determination
3. Content analysis
4. Translation into standard terms
using controlled vocabulary

1. Recording bibliographic data (author, title, publication data, etc.)
a. When indexing printed books, pamphlets, periodicals and other printed documents, use locators that refer to the page numbers, separating locators with a comma.
Example:
Livingstone, Ken 1/3, 1/97, 3/56
b. When indexing several issues or volumes of one periodical title, the indexer should take the locators from the numbering of the issues at the time of publication.
Examples:
54/3: 38 (volume/part: page)
53, April 1998: 38 (volume, date: page)
53: 38 (volume: page)
April 1998: 38 (date: page)

c. When indexing the contents of a collection of documents, locators should give complete information about each document (title of the article, the author(s), the title of the periodical, volume number and date, and the inclusive pagination of the article).
Example:
Automated Teller Machines
Competition spurs development of innovative bank technologies. Bus Journ. 45, Jan-Mar 2004: 13.
The new networks. Info Tech. Apr 2005: 76-89.
d. If a document treats a subject continuously over a consecutively numbered sequence, reference should be made to the first and last numbered elements only.
e. Exceptionally, where space constraints apply or where the locators are extremely long, e.g. 10002-10012, digits may be deleted so that only the changed digits of the second locator are given, e.g. 10002-12.
f. Conventionally, the digits 10-19 in each hundred are given in full, e.g. 412-18.
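Rules d to f can be combined into a small helper that writes a page range and, where elision is wanted, keeps only the changed digits of the second locator while giving 10-19 in each hundred in full. A sketch under those assumptions; the function name is hypothetical, and it elides only when both locators have the same number of digits.

```python
def format_range(first, last, elide=True):
    """Format a page range; optionally give only the changed digits of `last`."""
    a, b = str(first), str(last)
    if first == last:
        return a
    if not elide or len(a) != len(b):
        return f"{a}-{b}"
    i = 0
    while a[i] == b[i]:  # find the first digit that changes
        i += 1
    tail = b[i:]
    # Rule f: keep 10-19 of each hundred in full (412-18, not 412-8).
    if len(tail) == 1 and b[-2] == "1":
        tail = b[-2:]
    return f"{a}-{tail}"


print(format_range(10002, 10012))  # 10002-12
print(format_range(412, 418))      # 412-18
print(format_range(76, 89, elide=False))  # 76-89
```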

2. Subject determination
aboutness of the material
formulation of a concept list
most appropriate to the given community
of users
If necessary, modify both indexing tools
and procedures as a result of feedback
from inquiries
no arbitrary limit should be set to the
number of terms or descriptors
concepts should be identified as
specifically as possible.

3. Content analysis
a. Factors that may affect content analysis:
Environmental situation
Policy decisions
Decisions of the indexer
b. Parts of the documents that have to be
analyzed
Title
Abstract
List of contents
Text itself
Illustrations, diagrams, tables and captions.
Reference section

4. Translation into standard terms using controlled vocabulary
The following practices must be observed in the translation process.
Concepts that already exist in the indexing language should be translated into their preferred terms.
Terms that represent new concepts should be checked for accuracy and acceptability in reference tools.
If the concepts are not present in an existing thesaurus or classification scheme, they may be:
Expressed by terms or descriptors that are admitted into the indexing language
Represented temporarily by more general terms, with the new concept proposed as a candidate for later addition

Indexing Policies and Guidelines & Production of Indexes

Indexing Procedures for Books


1. Examine the text carefully.
2. Read the text several times, page by
page, to be able to analyze the
contents and determine the indexable
topics.
3. Select the topics to be indexed taking
into consideration their significance to
the central theme of the book.
4. Name the topics that were chosen to
be indexed and mark up page proofs.

5. Alphabetize the entries.
6. Edit the entries
Decide which entries should be the main headings and which should be the subheadings
Decide whether certain entries will be treated as main entries or subentries
Example:
handicrafts
    pottery making
    weaving
    wood carving
or
pottery making
weaving
wood carving

Main entries unmodified by subentries should not be followed by long rows of page numbers.
Subentries must be concise and informative.
Make a final choice among synonymous terms.
Provide adequate but not excessive cross-referencing.
Examples:
Cars
    Chevrolet, 224
    Mazda, 146
    Volkswagen
    See also trucks
Trucks
    Dodge Ram, 219
    GMC (Jimmy), 143
    Mercedes-Benz, 144
    See also cars

Punctuation
a. The inversion of a phrase used as the
heading in a main entry is punctuated by a
comma.
b. If the heading is followed immediately by
page references, a comma is used between the
heading and the first numeral and between
subsequent numerals.
c. If the heading is followed immediately by
run-in subentries, a colon precedes the first
subheading. All subsequent subentries are
preceded by semicolons. For example:
payments, balance of: definition of, 16;
importance of, 19
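The punctuation rules above, together with the indented style, can be sketched with a small data structure for an entry. A sketch only; the `Entry` and `Subentry` classes are hypothetical, headings are assumed to be already inverted (rule a), and cross-references are rendered only in the indented style.

```python
from dataclasses import dataclass, field


@dataclass
class Subentry:
    heading: str
    locators: list


@dataclass
class Entry:
    heading: str  # already inverted where needed, e.g. "payments, balance of"
    locators: list = field(default_factory=list)
    subentries: list = field(default_factory=list)
    see_also: list = field(default_factory=list)


def _with_locators(heading, locators):
    # Rule b: comma between the heading and the first locator and between locators.
    return ", ".join([heading] + locators)


def run_in(entry):
    """Rule c: colon before the first subentry, semicolons between the rest."""
    text = _with_locators(entry.heading, entry.locators)
    if entry.subentries:
        parts = [_with_locators(s.heading, s.locators) for s in entry.subentries]
        text += ": " + "; ".join(parts)
    return text


def indented(entry, indent="  "):
    """Indented style: one subentry per line, then any see also references."""
    lines = [_with_locators(entry.heading, entry.locators)]
    lines += [indent + _with_locators(s.heading, s.locators) for s in entry.subentries]
    lines += [indent + "see also " + target for target in entry.see_also]
    return "\n".join(lines)


payments = Entry("payments, balance of",
                 subentries=[Subentry("definition of", ["16"]),
                             Subentry("importance of", ["19"])])
www = Entry("World Wide Web (WWW)",
            subentries=[Subentry("browsers", ["78"]), Subentry("components", ["89"]),
                        Subentry("development", ["100-156"])],
            see_also=["Internet"])
print(run_in(payments))
print(indented(www))
```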

7. Determine the design of the index after the compilation of the entries
Decide whether subentries will follow an
indented or run-in style.
Typography should be used to
differentiate between types of headings
and to distinguish them from numerals
indicating volumes, parts and pages.
8. Typing, proofreading, and the final
review.

Indexing Techniques for Periodical Articles
1. Always index names of persons honored by awards or prizes and those eulogized in obituaries.
2. Every article that has permanent value should be indexed under all topics and issues dealt with.
3. Editorials should be indexed under their topics
as any other article but differentiated from the
others by the addition of (Ed.) or (E). The titles
of editorials may be indexed under a collective
heading Editorials.

4. Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author's response. For example:
Doe, John. Effect of magnetic fields 37-43
    Errors (H. Smith) 75; correction 185
[author's entry]
Smith, Henry. Effect of magnetic fields (John Doe pp. 37-43): errors 75
[letter writer's index entry]

5. Book reviews are indexed by the title of the book, followed by the name of the author, the locator, and the designation (R), unless all book reviews are listed under the class heading Book Reviews or in a separate index, e.g.
Guide to reference books, 10th ed. (Sheehy) 68 (R)
*The name of the reviewer should be included in the author name index, e.g.
Dixon, Geoffrey 68 (R), 92-96, 123

Choice and Forms of Headings (ISO 999)
1. Personal Names
as full a form as possible
should take the form used in the document, but if the text is not consistent, the indexer should adopt one form
choose the most recent or the most commonly used form of a personal name as the heading and add see cross-references from other forms,
e.g. Clemens, Samuel Langhorne see Twain, Mark
where surnames are in common use, the entry should be the surname followed by any given names or initials
where surnames are not used, the name that customarily comes first should be used as the entry word
e.g. Imran Khan

Persons identified only by a given name or forename should be indexed under that name, qualified if necessary by a title of office or other distinguishing epithet
e.g.
Leonardo da Vinci
Boudicca, Queen of the Iceni
Persons normally identified by a title of honor or nobility should be indexed under that title, expanded if necessary by their family name
e.g. Dalai Lama
First Duke of Marlborough, John Churchill
Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part
e.g. Layzell Ward, Patricia
Pérez de Cuéllar, Javier

2. Corporate Bodies
Names of the corporate bodies should normally be
indexed without transposition
e.g. British Museum
Transposition may, however, be used if it is considered
that this would help the users of the index.
e.g. Department of Agriculture see Agriculture,
Department of
J. Whitaker & Sons see Whitaker (J) & Sons
Choose the most recent or the most commonly used form
of corporate name as the main heading and add see
cross references from other forms
e.g. John Moores University see Liverpool John Moores
University

Liverpool John Moores University

3. Geographic Names
should be as full as necessary for clarity, with additions to avoid confusion with otherwise identical names
e.g. Alaminos (Laguna)
Alaminos (Pangasinan)
An article or preposition should be retained in a geographic name of which it forms an integral part
e.g. La Paz
Las Vegas
Where the article or preposition does not form an integral part of a name, it should be omitted
e.g. New Forest rather than The New Forest
Rheinfall rather than Der Rheinfall

4. Titles of documents
should normally be italicized, underlined or otherwise distinguished. If necessary for identification, names of creators, places of publication, dates or other qualifiers may be added within parentheses.
e.g.
Ave Maria (Gounod)
Ave Maria (Schubert)
Ave Maria (Verdi)
In an English index, articles in titles are conventionally transposed to the end of the heading so that filing order is explicit.
e.g.
Hunting of the Snark, The
Kapital, Das
A preposition at the beginning of the title should be retained
e.g.
To the Lighthouse

5. First lines of poems
Conventionally, in an index of first lines of poems, the article is retained without transposition and is taken into account for purposes of alphabetical arrangement
e.g.
A little thing in the snow
The modest Rose puts forth a thorn

Evaluation of Indexes
Guidelines/Criteria
1. Subject error
Errors in choosing subject descriptors
Omission errors
Use of a too broad or too narrow term

2. Generic searching - alphabetical indexes have always presented difficulties in promoting generic searching.

3. Terminology
4. Internal guidance
Cross-references
Printed instruction on how to use the index

5. Accuracy in referring
Bibliographic citation
Cross-references

6. Entry scattering
Example:
College libraries
School libraries
National libraries
Special libraries

7. Entry differentiation
Example of poor differentiation (a long string of locators with no subheadings):
Libraries, 1-2, 28-31, 42, 53-60, 82, 109-11, 131-40, 310, 342-50

8. Spelling and punctuation


9. Filing
Letter by letter (Air base, Airborne, Air brake)
Word by word (Air base, Air brake, Airborne)

10. Layout
Main headings are in bold type
Subheadings are in lighter type, in lowercase and indented
See references are italicized

11. Length and type
Index length should be about 3-5% of the page count of a typical nonfiction book, about 5-8% for a history or biography, and about 15-20% for a reference book

12. Cost
13. Standards
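The two filing orders under criterion 9 can be compared with a minimal Python sketch. It assumes that spaces are the only thing that matters; published filing rules also cover punctuation, numbers and symbols, which this sketch ignores. The function names are hypothetical.

```python
def letter_by_letter_key(heading: str) -> str:
    # Letter by letter: ignore spaces, so "Air brake" files as "airbrake".
    return heading.lower().replace(" ", "")


def word_by_word_key(heading: str) -> tuple[str, ...]:
    # Word by word: compare one word at a time, so "Air" followed by any
    # other word files before "Airborne" ("nothing before something").
    return tuple(heading.lower().split())


headings = ["Airborne", "Air brake", "Air base"]
print(sorted(headings, key=letter_by_letter_key))
# ['Air base', 'Airborne', 'Air brake']
print(sorted(headings, key=word_by_word_key))
# ['Air base', 'Air brake', 'Airborne']
```

Tuples compare element by element, and the shorter string "air" sorts before "airborne", which is what makes the word-by-word key put both "Air ..." headings ahead of "Airborne".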

Indexing Standards
International Organization for Standardization (ISO)
ISO 2788:1986 Documentation - Guidelines for the establishment and development of monolingual thesauri
ISO 5964:1985 Documentation - Guidelines for the establishment and development of multilingual thesauri
ISO 5963:1985 Documentation - Methods for examining documents, determining their subjects, and selecting indexing terms

ISO 999:1996 Information and documentation - Guidelines for the content, organization and presentation of indexes
ISO 4:1997 Information and documentation - Rules for the abbreviation of title words and titles of publications. The associated List of Serial Title Word Abbreviations includes title word abbreviations in over 50 languages.

British Standards Institution (BSI)
BS 1749:1985 - Recommendations for alphabetical arrangement and the filing order of numbers and symbols
BS 6478:1984 - Guide to filing bibliographic information in libraries and documentation
BS 6529:1984 - Recommendations for examining documents, determining their subjects and selecting indexing terms
BS 6723:1985 - Guide to establishment and development of multilingual thesauri
BS 5723:1987 - Guide to establishment and development of monolingual thesauri
BS ISO 999:1996 - Information and documentation - Guidelines for the content, organization and presentation of indexes

Automatic Indexing
refers to indexing by machine, or the
analysis of text by means of computer
algorithms. The focus is on automatic
methods used behind the scenes with
little or no input from individual
searchers, with the exception of
relevance feedback.

Four Types of Approaches
(Cleveland & Cleveland, 2001, p. 211)

Statistical - based on word counts, statistical associations, and collation techniques that assign weights and cluster similar words
Syntactical - stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases
Semantic - systems concerned with the context sensitivity of words in the text. What does "cat" mean in its context? A house cat? Heavy earthmoving equipment?
Knowledge-based - systems go beyond thesaurus or equivalence relationships to knowing the relationships between words, e.g. the tibia is part of the leg, so a document about tibia injuries can be indexed under leg injuries.
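Two of these approaches can be sketched in a few lines of Python. For the statistical approach the sketch uses TF-IDF weighting (term frequency times inverse document frequency), one common way of assigning weights from word counts, chosen here for illustration and not necessarily the method Cleveland & Cleveland describe. For the knowledge-based approach it follows hypothetical part-of links (`PART_OF`) from a part to its whole. The sample documents, relations and function names are all invented.

```python
import math
from collections import Counter

# Hypothetical part-of relations; a real knowledge-based system would take
# these from a knowledge base or ontology, not a hand-written dictionary.
PART_OF = {"tibia": "leg", "fibula": "leg"}


def tfidf_terms(docs: list[str], top_n: int = 3) -> list[list[str]]:
    """Statistical approach: rank each document's words by TF-IDF weight."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    selected = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Frequent in this document but rare in the collection -> high weight;
        # a word found in every document gets log(1) = 0.
        weights = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest weight first; ties broken alphabetically for stable output.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        selected.append(ranked[:top_n])
    return selected


def expand_with_part_of(terms: list[str]) -> list[str]:
    """Knowledge-based approach: also index each part under its whole."""
    expanded = list(terms)
    for term in terms:
        whole = PART_OF.get(term)
        if whole is not None and whole not in expanded:
            expanded.append(whole)
    return expanded


docs = [
    "tibia fracture tibia cast",
    "fracture repair steel beam",
    "steel beam load test",
]
terms = tfidf_terms(docs, top_n=2)
print(terms)
# [['tibia', 'cast'], ['repair', 'beam'], ['load', 'test']]
print(expand_with_part_of(terms[0]))
# ['tibia', 'cast', 'leg']
```

"fracture" drops out of the first document's terms because it also occurs in the engineering document, which lowers its inverse document frequency. The part-of step then adds "leg", the broader heading a knowledge-based system would use.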

Human/Manual Indexing vs. Automatic Indexing

Human/Manual Indexing
Needs more people
Costly
Prone to human error
Low in production
Quality can range from excellent to appalling

Automatic Indexing
Needs less human effort
Cheaper
Follows instructions automatically
Accurate
Fast in production
Promotes meticulous problem analysis
Dependent on human intelligence
Its power depends on how the computer is programmed

Automatic methods have trouble handling synonyms, homonyms, and semantic relations, and they are very poor at conceptualizing.
Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.
Computers can, and humans cannot, organize all the words in a text and in a given database and perform statistical operations on them.

Indexing and the Internet

Search Tools
Search engines - computer software that scans the Web and selects pages to be indexed for the searching system. Search engines are often referred to as Web indexes, since they examine the content of web pages. Examples: HotBot, InfoSeek, and Google.
Directory-based systems - usually indexed by humans, and thus tend to have a higher level of indexing quality. Indexing may be based on the full text or on the most frequently used words, and the material is organized for browsing in a way similar to traditional library browsing. Examples: Yahoo! Directory and Google Directory.
Metasearchers - allow the user to search across multiple search tools at once. They take the user's query and submit it to a number of other search tools. Examples: Metacrawler and Surfmax.
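How a search engine indexes pages, and how a metasearcher passes one query to several engines, can be sketched with a toy inverted index (a map from each word to the pages that contain it). Everything here is hypothetical: the example URLs, the function names, the Boolean AND matching, the alphabetical ordering that stands in for real relevance ranking, and the round-robin merge, which is only one of several ways a metasearcher might combine result lists.

```python
from collections import defaultdict


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Search engine: pages containing every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    # Alphabetical order stands in for a real relevance ranking.
    return sorted(hits)


def metasearch(indexes: list[dict[str, set[str]]], query: str) -> list[str]:
    """Metasearcher: send one query to several engines, merge round-robin."""
    result_lists = [search(index, query) for index in indexes]
    merged: list[str] = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            # Take each engine's hit at this rank, skipping duplicates.
            if rank < len(results) and results[rank] not in merged:
                merged.append(results[rank])
    return merged


engine_a = build_inverted_index({
    "a.example/1": "library indexing standards",
    "a.example/2": "automatic indexing methods",
})
engine_b = build_inverted_index({
    "b.example/1": "indexing of thesauri",
    "a.example/2": "automatic indexing methods",
})
print(search(engine_a, "automatic indexing"))
# ['a.example/2']
print(metasearch([engine_a, engine_b], "indexing"))
# ['a.example/1', 'a.example/2', 'b.example/1']
```

The page that both engines return ("a.example/2") appears only once in the merged list, which is the main extra work a metasearcher does beyond forwarding the query.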

GOOD LUCK!
