
This article was downloaded by: [DePaul University]

On: 14 November 2014, At: 20:39


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Cataloging & Classification Quarterly


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/wccq20

Quality of Korean Cataloging Records in Shared Databases

Hee-Sook Shin
Cataloging Department, Ohio State University Libraries, 1858 Neil Avenue, Columbus, OH
43210-1286. E-mail:
Published online: 24 Oct 2008.

To cite this article: Hee-Sook Shin (2003) Quality of Korean Cataloging Records in Shared Databases, Cataloging &
Classification Quarterly, 36:1, 55-90, DOI: 10.1300/J104v36n01_05

To link to this article: http://dx.doi.org/10.1300/J104v36n01_05



ABSTRACT. For decades, the issue of quality control in cataloging records has been
discussed, in particular in bibliographic control in the shared databases for various
languages, but no earlier studies assess the quality of Korean cataloging records in the
databases. This study examines the quality of Korean cataloging records in OCLC’s WorldCat
by evaluating records in terms of specific errors, error frequency, areas where errors
occur frequently, and errors that could inhibit record retrieval. The results of the
current study also are compared with the results of the study of the quality of
Chinese-language cataloging records in order to identify shared error patterns.
Similarities were found in the error rates and error types. Based on the results, the
author proposes some recommendations on how to maintain quality in cataloging
Korean-language records. [Article copies available for a fee from The Haworth Document
Delivery Service: 1-800-HAWORTH. E-mail address: <docdelivery@haworthpress.com> Website:
<http://www.HaworthPress.com> © 2003 by The Haworth Press, Inc. All rights reserved.]

KEYWORDS. Cataloging, bibliographic control, quality control, Chinese, Japanese and Korean
(CJK), Korean cataloging records, shared database

Hee-sook Shin, ML, GDIT, is Non-Roman Languages Coordinator, Cataloging Department, Ohio
State University Libraries, 1858 Neil Avenue, Columbus, OH 43210-1286 (E-mail:
shin.110@osu.edu).

1. INTRODUCTION

The growth of Chinese, Japanese, and Korean (CJK) library collections, along with the
development of CJK cataloging software, has increased the number of CJK records entered
into shared databases, particularly in the two largest online cataloging databases in the
United States, OCLC Online Computer Library Center and the Research Libraries Information
Network (RLIN). By the year 2000, the total number of CJK records in both OCLC and RLIN
had reached almost 2 million. With this influx of records into the shared cataloging
environment, the quality of the records requires attention. Although two noted studies
assessed Chinese records (one by Zeng and one by Simpson), no one has yet formally
scrutinized Japanese and Korean records. The purpose of this study is to examine the
quality of Korean-language cataloging records in OCLC’s WorldCat in order to identify
areas where errors occur frequently and to detect errors that inhibit record retrieval.
The study follows a methodology similar to that used by Lei Zeng in her 1993 thesis, An
Evaluation of the Quality of Chinese-Language Records in the OCLC OLUC Database and a
Study of a Rule-Based Data Validation System for Online Chinese Cataloging.1 The current
study offers comparisons to Zeng’s findings and identifies shared error patterns.
Based on the study’s results, the author makes recommendations for maintaining the quality
of Korean-language records and hopes to stimulate discussion among libraries regarding
quality in the cataloging process. The present findings and recommendations are directly
applicable to CJK catalogers, and perhaps to other catalogers as well, as they can be used
in the development of training manuals and workshops.

2. LITERATURE REVIEW

The large volume of literature on quality in cataloging and its impact on the user
reaffirms that this issue is a fundamental concern for scholars working in the information
field.2,3,4,5,6 Quality is well defined in the literature, as several authors have
proposed evaluative criteria and methodology.7,8,9 For instance, Graham’s article defines
the essential considerations in assuring quality cataloging, emphasizing extent and
accuracy, while Thomas establishes that quality is “dynamic and dependent on the values
and needs of cataloging users,” reviewing options such as copy cataloging, cooperative
cataloging, minimal level cataloging, and outsourcing. Reeb, in light of a case study
carried out by his university, discusses how to evaluate quality systematically,
concluding that record evaluation could potentially improve a cataloger’s skills and
understanding of the rules; however, he cautions that evaluation could be used merely as a
grading system.
Although no earlier studies assess the quality of Korean cataloging records in the shared
databases, several researchers have assessed general database
quality.10,11,12,13,14,15,16 For example, Intner examined the quality of Western-language
records. Inspecting 430 records, she identified 1,067 errors, an average of 2.48 errors
per record. Over a third of these errors involved the misapplication of standard
descriptive cataloging rules, leading her to suggest that such rules should be reassessed
and, if necessary, clarified, and that catalogers should be made aware of these errors and
the rules pertaining to them. Following this study, Romero compared Intner’s results to
data gathered from records created by her students, who were trained but not experienced,
and identified a significantly greater number of errors in her sample than Intner had
found. This study illustrates the relationship between inexperience and high error
frequency.


Zeng’s 1993 doctoral thesis analyzing Chinese-language records in the shared databases was
pivotal in the formation of the present study. Zeng identified common errors,
systematizing them into three classes: format, content, and editing and inputting errors.
Simpson conducted a similar study in 1998, analyzing 380 Chinese monographic records and
identifying 430 errors. Error distributions were similar in both studies; the majority of
errors occurred in the 245 and 260 fields and mis-romanization errors were the most
frequent (particularly incomplete or omitted transcriptions).

3. THE IMPORTANCE OF QUALITY IN CATALOGING

The primary aim of cataloging is to provide accurate information so that library materials
can be identified and located as efficiently as possible. Issues of cataloging quality are
not noticeable until a needed item is inaccessible because it is cataloged poorly or
incorrectly. Missing and incorrect information, along with other errors that affect record
retrieval, result in poor quality. In contrast, records that make retrieval easy reflect
high quality, because they are accurate and thorough. Cataloging quality depends on the
cataloger, who is challenged to make a large, unstructured body of information easily and
universally accessible in the most comprehensive and cost effective manner possible.
Although cataloging is flexible enough to adapt to most local library environments,
quality, accuracy, completeness, and consistency should be maintained. A record’s
adherence to guidelines can determine quality. In terms of description, records must
correspond with the materials that they represent. Also, records must contain appropriate
access points as defined in standard guides such as the Anglo-American Cataloguing Rules,
2nd ed. (AACR2), the Library of Congress Rule Interpretations (LCRI), the Library of
Congress descriptive and subject cataloging manuals (SCM), International Standard
Bibliographic Description (ISBD) conventions, and other appropriate reference guides. Such
standards ensure consistency, which is crucial to maintaining the effectiveness of the
databases.
Quality standards become increasingly important in shared online catalogs such as OCLC and
RLIN. Errors that in the past would have affected only the local library now broadly
affect other libraries using the databases. Thus, quality issues have not only a local
impact but a global impact as well. Furthermore, quality is particularly important in
cataloging electronic resources because the record itself is the only point of access to
the information.
Periodic scrutiny and correction of records in the online databases are necessary to
ensure accuracy and adherence to high standards. Errors encountered should be used to
guide the creation of more accurate future records. Such maintenance would ensure that the
database is efficient and that it increases in quality even as it increases in size. The
shared databases particularly should provide a pool of reliable records for
foreign-language materials that can be used by smaller libraries lacking language
resources.

4. QUALITY IN CHINESE, JAPANESE, AND KOREAN CATALOGING

Quality is becoming an increasingly important issue in the cataloging of CJK materials.
Spurred by the increased interest in East Asian Studies and the demand for materials to
support international studies, the number of CJK materials that academic libraries in the
United States acquire has grown rapidly since the 1980s, resulting in a corresponding
increase in the number of CJK records contributed to both OCLC and RLIN. Technological
developments have fostered this growth, allowing for the input and display of CJK
vernacular scripts and direct-script record access. RLIN developed a vernacular CJK
program in 1983, with OCLC following suit in 1986.17
Resource sharing has further increased the number of CJK records. In November of 1989,
OCLC and RLIN began exchanging CJK database information on a monthly basis via tape-load.
Later, OCLC and RLIN extended resource sharing internationally, downloading records from
individual libraries. Various retrospective conversion projects (that is, the converting
of traditional cards into online records) have also greatly increased the number of CJK
records downloaded into both databases.18
From 1994 to 2000, the number of Korean records added to OCLC increased anywhere from 6 to
21% annually. RLIN experienced a similar increase, ranging from 7 to 22% annually between
1992 and 2000 (see Appendix 1). As of July 31, 2000, the OCLC database consisted of
1,871,421 CJK records, 168,232 of which were Korean, with 139,506 containing vernacular
scripts, making Korean the 17th largest of the more than 200 languages represented in
OCLC. As of March 2000, the RLIN database contained 2,366,077 CJK titles, of which 185,536
corresponded to Korean materials, with 145,822 records containing vernacular
scripts.19,20
Because such a large number of CJK records have been added to the databases over such a
short period of time, and because records have been acquired from diverse sources and
formats, the quality of the shared databases could have been compromised. Even given
established quality mechanisms, the overall quality of the shared databases is
questionable, as many records are as yet unchecked. This study aims partially to remedy
this by evaluating the quality of Korean records, identifying those factors that might
diminish the effectiveness of the online cataloging system.

5. RESEARCH METHODOLOGY

The study’s methodology is based on a modified version of Zeng’s error-categorization
system. This methodology facilitates comparisons between the two studies and makes it
possible to offer recommendations aimed at improving cataloging quality.
This study involved a four-step process: (1) sample records were collected; (2) error
categories were determined and integrated into a data table; (3) records were evaluated
for error content and the collected data were integrated into a table; and (4) data were
analyzed using the Statistical Package for the Social Sciences (SPSS).

5.1. The Research Samples

In October 2000, 2,000 Korean language records were randomly collected by the Office of
Research at OCLC, based on the following criteria:

1. Only monograph records were selected.
2. All records contained the “kor” language code in the fixed field.
3. All records contained vernacular fields as well as romanized fields.

To maintain consistency, this study examined only monographic records because such records
tend to be standardized while other formats can vary according to local library procedure.
Of the 2,000 records randomly collected, the analysis excluded (1) records contributed by
the Library of Congress (LC) (because LC records are generally complete, consistently
based on cataloging rules, and more accurate21); (2) incomplete (K, 7, M-level) records
(because they generally contain only the minimum number of required fields, and thus it is
unclear whether or not missing fields are errors made by catalogers); (3) juvenile
literature records (because such records are generally simple); and (4) Pre-AACR2 records
(to ensure that all selected records followed the same set of cataloging rules). Thus,
this study examined only full-level (I, L-level) records contributed by OCLC or RLIN (see
Appendix 2).
These criteria resulted in a sample of 623 records out of a total of 2,000 records, as
indicated in Table 1 below. As OCLC and RLIN have different input requirements for CJK
records, the sample was divided into two categories: (1) records created by OCLC members
and (2) those created by RLIN members and tape-loaded into OCLC WorldCat. Although some
local libraries are members of both organizations, records were categorized according to
the encoding level in the fixed fields. For comparative purposes, the categories were
analyzed separately.
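
The selection logic described in this section can be sketched in code. The following is only an illustration under stated assumptions: the `Record` class and all of its attribute names are hypothetical and do not correspond to any OCLC data format, and the filter follows the text of this section (full-level I and L records only), not any procedure documented by the Office of Research.

```python
from dataclasses import dataclass, replace

# Full-level encoding levels named in Section 5.1 (assumed to be single-letter codes).
FULL_LEVELS = {"I", "L"}


@dataclass(frozen=True)
class Record:
    # Hypothetical attributes mirroring the criteria in the text.
    is_monograph: bool
    language_code: str       # fixed-field language code, e.g. "kor"
    has_vernacular: bool
    has_romanized: bool
    contributed_by_lc: bool
    encoding_level: str
    is_juvenile: bool
    is_pre_aacr2: bool


def meets_collection_criteria(rec: Record) -> bool:
    """Criteria 1-3 applied when the 2,000 records were collected."""
    return (rec.is_monograph
            and rec.language_code == "kor"
            and rec.has_vernacular
            and rec.has_romanized)


def in_analysis_sample(rec: Record) -> bool:
    """Apply the four exclusions, keeping only full-level records."""
    if not meets_collection_criteria(rec):
        return False
    if rec.contributed_by_lc or rec.is_juvenile or rec.is_pre_aacr2:
        return False
    return rec.encoding_level in FULL_LEVELS


# Illustrative records only; these are not drawn from the study's data.
base = Record(True, "kor", True, True, False, "I", False, False)
candidates = [
    base,
    replace(base, contributed_by_lc=True),
    replace(base, encoding_level="K"),
    replace(base, is_juvenile=True),
]
sample = [r for r in candidates if in_analysis_sample(r)]
print(len(sample))
```

Separating the collection criteria from the exclusions keeps the two stages of the sampling (2,000 collected, 623 analyzed) visible in the code.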

5.2. Error Categorization and Table Construction

An error classification was formulated for this study, as described in Table 2. To
facilitate comparison, the error categories applied in this study closely follow those
developed by Zeng, with some modifications. In Zeng’s study, romanization errors were
classified within the “Content errors” and the “Editing & Inputting errors” categories. To
better suit the parameters of this study, the author added a fourth category for
romanization errors because of the complexity inherent in romanizing Korean. While Chinese
romanization is character dependent and does not change as the characters are used to form
words, Korean and Japanese romanization is word dependent. Thus, the phonology and
diacritics for each character change as pronunciation changes with the character’s
placement within the structure of the word. For instance, if “국” starts a word, it is
romanized as “kuk,” but if it appears in the middle or at the end of the word it is
romanized as “guk” (국가: kukka, 한국: Han’guk). Also, complex rules for word division,
although not present in Chinese, are present in Korean.

TABLE 1. Number of Sample Records for the Analysis

Institution   Type of Records   Number of Records   % of Records
OCLC          Original (I)                    508           81.5
RLIN          Original (M)                    115           18.5
Total                                         623          100.0


Within the categories described in Table 2, specific errors were identified and arranged
in USMARC format order, including fixed as well as variable fields. Each error was encoded
based on the MARC field in which it occurred. Errors were grouped into 12 field
categories; for instance, 1xx includes main entry field errors, while 6xx includes subject
field errors. LC subject headings were checked against online authority files and against
the rules in the LC Subject Cataloging Manuals (LCSH, SCM, Free floating subdivisions).
Classification numbers were checked for accuracy; that is, for correlation with the
subjects as assigned in the records. Using MS Excel, three tables were constructed, one
for OCLC, one for RLIN, and one for the entire sample. Error codes were incorporated into
the tables as data categories, creating a structure into which errors were recorded.22

TABLE 2. Classification of Errors

Code   Description

Format errors
F1     Incorrect field tag
F2     Incorrect indicator
F3     Incorrect or missing subfield code, incorrect sequence of subfields
F4     Incorrect or missing punctuation and space (as required by ISBD) (e.g., incorrect
       ending punctuation)

Content errors
M1     Missing a whole field or containing an extra field
M2     Missing a part of the entry or containing an extra part for the entry (e.g.,
       subfield d in 100 field)
M3     Missing the corresponding vernacular field or containing an extra entry for a non
       one-to-one name
S1     Inconsistency between romanized and vernacular data (including number, subfield
       code, words, punctuation)
S2     Inconsistency between corresponding fields or values (mismatching fixed and
       variable fields)
R1     Incorrect content in a whole field (e.g., missing access point for a second author
       or using an entry different from the authority entry)
R2     Other content error (e.g., a series used a 440 entry which is coded nna (old
       heading) in the authority file)
R3     Incorrect content (e.g., incorrect subfield, or qualifier partly missing)
R4     Incorrect call number
R5     Incorrect subject heading

Editing & Inputting errors
E1     Incorrect upper and lower case
E2     Punctuation entered in CJK mode
E3     Space entered in CJK mode
E4     Other editing and inputting errors (e.g., misspelling of English words)

Romanization errors
RO1    Incorrect or misspelled romanization
RO2    Incorrect word division (including hyphen usage)
RO3    Incorrect or missing diacritics
RO4    Incorrect or missing vernacular character
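
The recording structure described here (error codes as columns, field groups as rows) can be approximated in a few lines of code. This is a minimal sketch and not the author's Excel workbook: the `ErrorTable` class and its method names are hypothetical, the codes are those of Table 2, and the twelve field groups are taken from the row labels of Table 5.

```python
from collections import Counter

# Error codes grouped by category, as listed in Table 2.
ERROR_CODES = {
    "Format": ("F1", "F2", "F3", "F4"),
    "Content": ("M1", "M2", "M3", "S1", "S2", "R1", "R2", "R3", "R4", "R5"),
    "Editing & Inputting": ("E1", "E2", "E3", "E4"),
    "Romanization": ("RO1", "RO2", "RO3", "RO4"),
}

# The 12 field groups, using the row labels of Table 5.
FIELD_GROUPS = (
    "Fixed & 0xx", "1xx", "240", "245", "246/740", "250",
    "260", "300", "4xx/8xx", "5xx", "6xx", "7xx",
)

CODE_TO_CATEGORY = {
    code: category for category, codes in ERROR_CODES.items() for code in codes
}


class ErrorTable:
    """Hypothetical tally keyed by (field group, error code)."""

    def __init__(self) -> None:
        self.cells: Counter = Counter()

    def record(self, field_group: str, code: str, count: int = 1) -> None:
        if field_group not in FIELD_GROUPS:
            raise ValueError(f"unknown field group: {field_group}")
        if code not in CODE_TO_CATEGORY:
            raise ValueError(f"unknown error code: {code}")
        self.cells[(field_group, code)] += count

    def total_by_field(self, field_group: str) -> int:
        return sum(n for (f, _), n in self.cells.items() if f == field_group)

    def total_by_category(self, category: str) -> int:
        return sum(n for (_, c), n in self.cells.items()
                   if CODE_TO_CATEGORY[c] == category)


# Illustrative entries only, not the study's data.
table = ErrorTable()
table.record("245", "F4")
table.record("245", "RO1", 2)
table.record("260", "F4")
print(table.total_by_field("245"), table.total_by_category("Format"))
```

Keeping one such table per source (OCLC, RLIN, whole sample) would mirror the three spreadsheets mentioned in the text.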

5.3. Collecting, Analyzing, and Integrating of Data

Each individual record was analyzed for (1) completeness and (2) errors in adhering to
standard cataloging rules as established in the LCRIs and the OCLC Bibliographic Formats
and Standards manual. The sample was then analyzed to identify the frequency of specific
errors. Each record was reviewed twice, first by the author and then by OCLC senior
cataloger Barbara Brownell. Because the author did not have direct access to most of the
books, both the author and Brownell checked only the fields that were present on the
bibliographic records according to the OCLC Bibliographic Formats and Standards
manual.23 Thus, the records themselves formed the primary basis for analysis; however, in
cases where they were unclear, the books were consulted if available (see Appendix 3). The
author also checked for duplicate records. Each error was assigned a code (see Table 2).
For each record, LC authority records were consulted, especially for subject headings and
for the 440 and 490 fields.24 If the field did not match the authority entry or follow the
correct LC tracing practice, this was counted as an error. In cases where more than one
romanization or word division for a specific word was found in the records, the word as
established by the McCune-Reischauer romanization system, accepted by LC as standard, and
by Korean dictionaries was considered to reflect the correct usage.25,26 Where the LC
rules were not clear, the author used the word as established in the majority of WorldCat
records.
This study excluded some elements. For example, since extra spaces in the vernacular
fields do not affect retrieval in the shared database and are ignored by the system,
titles containing extra spaces were not counted as errors. Also, because of variations in
local library practices, missing call numbers and subject headings were excluded (for
example, some public libraries add a call number in the 099 field that is not retained in
OCLC master records). The absence of romanized or vernacular fields in note fields was not
counted as an error because, in certain situations, libraries may create CJK script-only
fields without romanized equivalents and vice versa. The use of subfield ‘v’ was excluded
because it is a new practice, implemented in 1999.

In certain situations, multiple errors in a record were counted as a single error. For
example, when a format or content error occurs in a romanized field, the same error
usually occurs in the vernacular field; thus, such errors were counted only once.
Similarly, a missing field, such as an added entry, usually results in two missing fields;
such an error was also counted only once. Errors that could be assigned to more than one
category were counted only once. For example, when the vernacular character was
inconsistent with the romanized one, it was counted as an S1 error (inconsistency between
romanized and vernacular data) instead of as two errors, S1 and R2 (incorrect content).
However, such an error was considered an R2 error when content did not match that
established by LC or by the book itself.


On the other hand, a word incorrectly romanized and appearing in more
than one field was counted per occurrence. Different words incorrectly
romanized appearing in a single field were also counted per occurrence. For
example, “Hanguk kodae ui togi : huk, yesul, sam kwa chugum” romanized as
“Hanguk kodae ui toki : hulk, yesul, salmgwa chukum” was counted as four
romanization errors. Capitalization errors were also counted per occurrence,
even when they occurred in a single field or phrase. For instance, the corporate
name “Han’guk Hyondae Munhwa Yonguwon” entered as “Han’guk hyondae
munhwa yonguwon” was counted as three capitalization errors, for Korean
corporate body names are capitalized in the same manner as in English.
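
The per-occurrence counting rule for capitalization can be illustrated with a short sketch. It assumes, for simplicity, that the expected and entered strings contain the same number of space-separated words; the function name is hypothetical, and the strings are the corporate name example quoted above.

```python
def count_capitalization_errors(expected: str, entered: str) -> int:
    """Count words that differ from the expected form only in letter case."""
    expected_words = expected.split()
    entered_words = entered.split()
    if len(expected_words) != len(entered_words):
        # Word-division differences are a separate error type (RO2) and are
        # outside the scope of this sketch.
        raise ValueError("strings must contain the same number of words")
    return sum(
        1
        for exp, ent in zip(expected_words, entered_words)
        if exp != ent and exp.lower() == ent.lower()
    )


errors = count_capitalization_errors(
    "Han’guk Hyondae Munhwa Yonguwon",
    "Han’guk hyondae munhwa yonguwon",
)
print(errors)
```

Each lower-cased word is counted separately, which matches the rule that capitalization errors are tallied per occurrence even within a single phrase.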

5.4. Data Analysis

Data were first entered into MS Excel spreadsheets (see Appendix 4) and then transferred
into the SPSS software program. Errors were compared both by type and by frequency, and
they were analyzed within their respective fields and within the MARC format. Graphs and
tables were used to compare errors occurring in OCLC and RLIN, and errors in Korean
records were compared with errors in Chinese records. Comparisons were not made between
individual member libraries and their contributions, nor was any attempt made to identify
the errors of specific institutions.

6. RESEARCH FINDINGS

The sample records obtained from OCLC were contributed by 79 institutions and covered a
narrow range of subjects, mainly in the humanities. The analysis of the sample records
revealed error patterns similar to those established in Zeng’s work. Although this study
involved Korean language materials, the error patterns were comparable to those occurring
in Chinese and English-language records, indicating that most cataloging problems are to
be found in following and interpreting rules rather than understanding the language.
In general, the majority of errors revealed have little impact on database operation and
record usability, because they occur in punctuation and spacing. However, substantial
errors affecting the accessibility of the record did occur, and mainly these errors
involved incorrect or inadequate access points or romanization.
The MARC format arranges the information in a record into encoded, structured fields so
that the record can be accessed and identified in the database. Errors affected this
structure unevenly: they tended to cluster within certain field types, most notably the
title statement (245) field and the imprint (260) field.

6.1. Completeness of a Record

In this study, the completeness of a record is based on the completeness of the
descriptive cataloging and on whether or not the record includes a classification number
and subject headings. Of the 623 reviewed records, 456 records (73.19%) were complete.
Classification numbers were present in 457 records (73.3%), including Library of Congress
Classification (LCC) numbers and Dewey Decimal Classification (DDC) numbers. However, 166
records (26.7%) lacked classification numbers. Some libraries do not classify fiction, and
therefore these records do not contain call numbers. Also, some libraries enter
classification numbers in the 099 field, which is not retained in the master record. These
libraries should be encouraged to enter their numbers in the 090 or 092 fields as well. In
regard to subject headings, 485 records contained at least one subject heading. The
average number of headings per record, excluding literature, was 2.04, which is close to
the Program for Cooperative Cataloging (PCC) recommendation of 2.00.

6.2. Error-Free and Duplicate Records

The sample included 148 error-free records (23.8%), a much higher percentage than that
found in other studies (only 18% of records were error-free in Zeng’s study). Reasons for
this may include this study’s exclusion of incomplete records and the assumption that the
records accurately reflected the books they represented. Out of the 623 records, 58 were
duplicates, 41 (8%) in OCLC and 17 (14.8%) in RLIN. Most duplicate records occurred in the
OCLC system because words were romanized differently. The major cause of record
duplication was variation in word division, followed by access points that differed from
authority headings and missing matching fields.

6.3. Errors in the Korean Records

Table 3 indicates the number of error occurrences in both shared databases. Table 4
indicates that enhanced records contained fewer errors than did original records. It is
important to note that the data presented in Table 3 do not indicate that RLIN records are
of poorer quality than OCLC records, as the sample RLIN records were selected from OCLC
WorldCat. That is, punctuation and spacing errors were introduced into the RLIN records
when they were tape-loaded into WorldCat, and thus errors were the result of differences
between the two systems.
As illustrated in Figure 1, which displays the frequency distribution of error
occurrences, the maximum number of errors in an individual record was 16, while the
minimum was zero.
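
The per-record averages reported in Tables 3 and 4 follow directly from the record and error counts, and the arithmetic can be checked with a short sketch. The counts below are copied from those two tables; the variable and function names are illustrative only.

```python
# (number of records, number of errors), copied from Tables 3 and 4.
COUNTS = {
    "OCLC": (508, 1085),
    "RLIN": (115, 467),
    "TOTAL": (623, 1552),
    "Original": (425, 943),   # OCLC original records (Table 4)
    "Enhanced": (83, 142),    # OCLC enhanced records (Table 4)
}


def errors_per_record(records: int, errors: int) -> float:
    """Average number of errors per record, rounded to two decimals."""
    return round(errors / records, 2)


averages = {name: errors_per_record(*pair) for name, pair in COUNTS.items()}
for name, value in averages.items():
    print(f"{name:<9}{value:.2f}")
```

The same counts also confirm internal consistency: the OCLC and RLIN rows sum to the totals, and the original and enhanced rows sum to the OCLC row.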

6.3.1. Number of Error Occurrences by Field

As illustrated in Table 5, a significant number of errors occurred in the 245, Fixed &
0xx, and 260 fields. Incorrect punctuation and inadequate spacing according to ISBD
standards were the most significant errors occurring in the 245 field, followed by
incorrect romanization and incorrect word division. Errors in the Fixed & 0xx fields
mostly involved inconsistency between corresponding fields and values (mismatching of
fixed and variable fields), incorrect call numbers, and missing fields (for example,
omitting the 041 field). Finally, errors in the 260 field chiefly involved incorrect
punctuation and inadequate spacing according to ISBD standards, followed by incorrect
romanization and incorrect use of upper and lower case.

TABLE 3. Number of Total Error Occurrences

         No. of Records   No. of Errors   Ave. Errors per Record
OCLC                508            1085                     2.14
RLIN                115             467                     4.06
TOTAL               623            1552                     2.49

TABLE 4. Number of Error Occurrences in Original and Enhanced Records (OCLC)

           No. of Records   No. of Errors   Ave. Errors per Record
Original              425             943                     2.22
Enhanced               83             142                     1.71

FIGURE 1. Frequency of Distribution of Error Occurrences (vertical axis: Error;
horizontal axis: Reviewed Records)

For the most part, the error rate and category distribution occurring in the OCLC records
mirrors that of the sample as a whole (see Appendix 5), with one exception. In addition to
punctuation, spacing, romanization, and word division errors, improper use of upper and
lower case was also a significant occurrence in the 245 field. The OCLC sample also
differs from the total sample in that the third largest rate of errors occurred in the 5xx
(or notes) fields. These errors were generally related to incorrect punctuation and
content, missing fields, and incorrect use of upper and lower case.
With regard to the RLIN records, the 245 field similarly exhibited the
highest error frequency. Again, punctuation, spacing, romanization, and
word division errors were the most prevalent; however, inconsistencies be-
tween romanized and vernacular fields also occurred at a significant rate. The
TABLE 5. Number of Error Occurrences by Field and Category

RO RO RO RO
Downloaded by [DePaul University] at 20:39 14 November 2014

F1 F2 F3 F4 M1 M2 M3 S1 S2 R1 R2 R3 R4 R5 E1 E2 E3 E4 1 2 3 4 Total %
Fixed & 0xx 4 1 5 10 31 0 0 0 124 2 2 2 37 0 0 0 0 1 0 0 0 0 219 0.35
1xx 7 0 0 12 6 12 3 1 0 5 0 6 0 0 0 0 0 1 4 0 1 0 58 0.09
240 2 8 0 3 12 2 2 0 0 2 2 0 0 0 2 0 0 0 1 1 1 0 38 0.06
245 0 17 12 123 1 3 0 30 0 1 3 5 0 0 39 5 0 1 85 61 25 2 413 0.66
246/740 2 10 0 1 30 2 12 0 0 0 2 0 0 0 3 0 0 0 9 12 1 0 84 0.13
250 0 0 0 2 6 1 0 0 0 0 2 0 0 0 0 0 0 0 1 11 1 0 24 0.04
260 0 0 6 73 1 1 2 10 0 1 7 5 0 0 20 1 0 0 20 11 5 1 164 0.26
300 0 0 16 42 1 11 0 0 0 0 3 12 0 0 1 0 0 2 0 0 0 0 88 0.14
4xx/8xx 16 15 0 24 15 0 4 2 0 2 3 7 1 0 10 3 0 0 8 3 4 0 117 0.19
5xx 6 1 0 82 14 0 1 3 0 2 1 0 0 0 15 0 0 6 8 8 4 0 151 0.24
6xx 1 1 8 9 2 1 12 0 0 4 0 5 0 24 6 0 0 10 3 0 0 0 86 0.14
7xx 4 3 0 20 39 13 2 2 0 4 0 6 0 0 8 0 0 2 5 0 2 0 110 0.18
Total 42 56 47 401 158 46 38 48 124 23 25 48 38 24 104 9 0 23 144 107 44 3 1552 2.49

Error Ranking by Field


Rank MARC field
1 245 (Title Statement)
2 Fixed & 0xx (Fixed Fields and 0xx Fields)
3 260 (Imprint)
4 5xx (General Notes)
5 4xx/8xx (Series Statement)
6 7xx (Added Entry for Personal Name, Corporate Body, Uniform Title)
7 300 (Physical Description)
8 6xx (Subject Added Entries)
9 246/740 (Varying Form of Title and/or Title Added Entries)
10 1xx (Main Entries)
11 240 (Uniform Title)
12 250 (Edition Statement)


Error Ranking by Category


Rank Error type
1 F4 (Incorrect or missing punctuation and space according to ISBD standards)
2 M1 (Missing a whole field or containing an extra field)
3 RO1 (Incorrect or misspelling romanization)
4 S2 (Inconsistency between corresponding fields or values)
5 RO2 (Incorrect word division (including hyphen usage))
6 E1 (Incorrect upper and lower case)
7 F2 (Incorrect indicator)
8 S1 (Inconsistency between romanized and vernacular data)
R3 (Incorrect content)
9 F3 (Incorrect or missing subfield code, incorrect sequence of subfields)
10 M2 (Missing a part of the entry or containing an extra part for the entry)
11 RO3 (Incorrect or missing diacritics)
12 F1 (Incorrect field tag)
M3 (Missing the corresponding vernacular field or containing an extra entry for a non one-to-one name)
R4 (Incorrect call number)
13 R2 (Other content error)
14 R5 (Incorrect subject heading)
15 R1 (Incorrect content in a whole field)
E4 (Other editing and inputting errors (e.g., misspelling of English words))
16 E2 (Punctuation used CJK mode)
17 RO4 (Incorrect or missing vernacular character)
18 E3 (Space used CJK mode)

260 field proved to be the second most problematic area, mostly involving in-
correct punctuation and inadequate spacing, followed by incorrect use of up-
per and lower case and inconsistency between corresponding fields or values.
Errors occurring in the 300 field formed the third most frequent category and
were generally related to incorrect punctuation and inadequate spacing, omit-
ting or containing an extra part in the entry, and incorrect content (see Appen-
dix 5).
Normalization was used to account for variations in required MARC fields
and access points. For instance, although the 245, 260, and 300 fields are man-
datory in all records, the 1xx is required as applicable. Additionally, fields
such as the 5xx, 6xx, and 7xx fields may be displayed more than once in the rec-
ord. The method of normalization for the number of errors by field is the actual
number of errors in a field (y) divided by the total instances of this field occur-
ring in all sample records (x), giving the rate of error occurrence per entry. Ta-
ble 6 displays error occurrences by fields after normalization.
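
The normalization described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the study itself ran: the function and variable names are hypothetical, the counts are copied from Table 6, and the column z is assumed to be the rate r multiplied by the 623 sample records.

```python
# Illustrative sketch only; names are hypothetical, counts are copied from Table 6.
SAMPLE_RECORDS = 623  # records in the sample

# field -> (x, y): x = instances of the field in the sample, y = errors in that field
FIELD_COUNTS = {
    "Fixed & 0xx": (623, 219),
    "1xx": (409, 58),
    "240": (96, 38),
    "245": (623, 413),
    "246/740": (217, 84),
    "250": (311, 24),
    "260": (623, 164),
    "300": (623, 88),
    "440/490/8xx": (264, 117),
    "5xx": (657, 151),
    "6xx": (993, 86),
    "7xx": (407, 110),
}


def normalize(x, y, sample=SAMPLE_RECORDS):
    """Return (r, z): errors per field instance, and that rate scaled to the sample size."""
    r = y / x
    return r, r * sample


def rank_by_rate(counts):
    """List field names from the highest to the lowest normalized error rate."""
    return sorted(counts, key=lambda f: counts[f][1] / counts[f][0], reverse=True)


for field, (x, y) in FIELD_COUNTS.items():
    r, z = normalize(x, y)
    print(f"{field:<12} r = {r:.2f}  z = {z:.2f}")
```

Ranking by r rather than by the raw count y is what moves fields that occur in relatively few records, such as the 240 field, toward the top of the list.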
As Table 6 and Figure 2 illustrate, normalization changes the error rate dis-
tribution. Although the 245 field is still the most error prone, it is followed in
error frequency by the 4xx/8xx and 240 fields. Appendix 6 describes the error
rate distribution after normalization by record type (whether OCLC or RLIN).

6.3.2. Number of Error Occurrences by Category

As is shown in Table 5, the greatest number of errors were inadequate ISBD
punctuation and spacing errors, followed by missing and extra field errors and
spelling and romanization errors. Appendix 5 describes the error distribution
by category in terms of record type (whether OCLC or RLIN).
Table 5 also reveals that the greatest number of errors occurred in the con-
tent errors category, accounting for a total of 572 errors (including subject
heading and call number errors). The 48 errors identified specifically as “in-
correct content” involved incorrect application of authority entries, including
name headings that failed to match the authority file and/or partially missing
qualifiers.
Out of the 623 sample records, 457 call numbers were checked and com-
pared to designated subject headings. Inaccurate call numbers were assigned
in 38 records and mainly reflected a faulty use of tables and/or placement of
the Cutter decimal point. In some cases, although the first heading was not
specific enough to match the call number, the second heading was, and these
cases were not counted as errors. A subject heading error was counted only when
a heading was set up improperly according to the LC SCM (for example, a heading
did not match the authority file or subdivisions were used incorrectly). There were few
subject heading errors. At times, a broader term was used where a narrower

TABLE 6. Error Occurrences by Field After Normalization

MARC Field     x (Entry No.)   y (Error No.)   r (Rate = y/x)   z (r × Sample No.)
Fixed & 0xx    623             219             0.35             219.00
1xx            409             58              0.14             88.35
240            96              38              0.40             246.60
245            623             413             0.66             413.00
246/740        217             84              0.39             241.16
250            311             24              0.08             48.08
260            623             164             0.26             164.00
300            623             88              0.14             88.00
440/490/8xx    264             117             0.44             276.10
5xx            657             151             0.23             143.19
6xx            993             86              0.09             53.96
7xx            407             110             0.27             168.38

FIGURE 2. Error Occurrences by Field After Normalization

[Figure: normalized error occurrences plotted for the fields Fixed & 0xx, 1xx,
240, 245, 246/740, 250, 260, 300, 440/490/8xx, 5xx, 6xx, and 7xx.]

heading would have been more suitable, but these cases were not counted as
errors.
Table 5 reveals that there were 546 format errors in the sample, while
editing and inputting errors occurred 136 times, most significantly as errors
of incorrect capitalization resulting from misapplication of AACR2. All 23
errors in the “other editing and inputting” category (E4) involved the
misspelling of English words. With regard to the 9 errors involving the use of
CJK-mode punctuation, this error type was counted only in OCLC records because
CJK-mode punctuation potentially affects record retrieval there, as the system
reads punctuation as a character. In RLIN, CJK-mode punctuation errors do not
impede record retrieval, and thus were not counted.
In addition, because the OCLC CJK system was implemented into the PRISM
environment in 1994 and because this system ignores spacing errors, this study
excluded extra spacing as an error.27
As stated above, this study treated romanization errors as a separate cate-
gory; there were 298 errors in the romanization category (see Table 5). Failure
to adhere to the McCune-Reischauer romanization system accounted for the
greatest number of romanization errors, and these errors involved either the
incorrect romanization or misspelling of vernacular Korean words or the use of a
different romanization system altogether.
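
The category totals quoted in this section can be checked against the bottom row of Table 5. The short Python sketch below does that arithmetic. It assumes that the "content" group consists of the M, S, and R codes, which is an inference from the fact that those columns sum to 572; the names used are illustrative only.

```python
# Column totals from the last row of Table 5 (error code -> occurrences).
TOTALS = {
    "F1": 42, "F2": 56, "F3": 47, "F4": 401,
    "M1": 158, "M2": 46, "M3": 38,
    "S1": 48, "S2": 124,
    "R1": 23, "R2": 25, "R3": 48, "R4": 38, "R5": 24,
    "E1": 104, "E2": 9, "E3": 0, "E4": 23,
    "RO1": 144, "RO2": 107, "RO3": 44, "RO4": 3,
}

# Assumed grouping of code prefixes into categories (the M/S/R grouping is inferred).
GROUPS = {
    "format": ("F",),
    "content": ("M", "S", "R"),
    "editing": ("E",),
    "romanization": ("RO",),
}


def group_totals(totals, groups):
    """Sum error counts per category, matching on the letters of each code."""
    result = {name: 0 for name in groups}
    for code, count in totals.items():
        prefix = code.rstrip("0123456789")  # "RO1" -> "RO", "R3" -> "R"
        for name, prefixes in groups.items():
            if prefix in prefixes:
                result[name] += count
    return result


by_group = group_totals(TOTALS, GROUPS)
print(by_group)
print("all errors:", sum(by_group.values()))
print("errors per record:", round(sum(by_group.values()) / 623, 2))
```

Stripping the trailing digits before matching keeps the romanization codes (RO1 to RO4) from being counted with the R codes.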

6.4. An Analysis of Major Errors

Overall, the record quality in this study is comparable to that found in previ-
ous studies (e.g., Zeng’s and Intner’s). Although this study discovered errors
in the Korean records, they were for the most part minor and did not affect the
user’s ability to retrieve the record. Most were ISBD punctuation errors, and
although such errors do not typically block the record from use, they do affect
the way information is arranged and displayed in the database. On the other
hand, the error categories discussed below do pose some concern, for errors in
these categories potentially inhibit access to the record.

6.4.1. Romanization Errors

Similar to the findings of Zeng’s study, one of the major concerns in the
present study is incorrect romanization and word division. Such errors may af-
fect user retrieval and may lead to duplicate records. This decreases time effi-
ciency, for catalogers spend time both in evaluating duplicates and in creating
new records when they cannot find an incorrectly created record.
Errors in romanization are perhaps due to the current standard rules of Ko-
rean romanization and word division according to the LC romanization table,
which uses the McCune-Reischauer system. These rules are not clear and are
difficult for even a native speaker to follow. The complexities inherent in
romanizing Korean may also contribute to romanization errors. To further
complicate the issue, Korean dictionaries are not consistent in their use of
word division and Korean has an elaborate phonologic and orthographic struc-
ture, which leads to enormous problems in transcribing the language. Even
Korean catalogers can become confused when faced with variations in pronun-
ciation and word division. Furthermore, catalogers can transcribe the same
character in different ways (e.g., sam or salm; Hanguk hyondae munhak non or
Hanguk hyondae munhangnon).
Some Korean records contain Chinese and/or Japanese names or publishers,
and catalogers can become confused when inputting such records, using
different romanization systems instead of McCune-Reischauer, which in turn
causes problems in record retrieval. In one example, concerning a book
published in China but written in Korean, the 245 field was romanized using
McCune-Reischauer, while the 260 field was romanized using Wade-Giles (a
system for romanizing Chinese). According to rule 1.0E in AACR2, descriptive
cataloging must be consistent and use the same romanization scheme
throughout. Thus, when romanizing CJK characters, the 245, 250, 260, and
4xx fields in any single record should all adhere to the same system regardless
of language.
Although RLIN and OCLC CJK can display vernacular script, most Online
Public Access Catalog (OPAC) systems in institutions and libraries in the U.S. do not
support these scripts, and romanization provides the only access to these col-
lections. Therefore, romanization is a significant process and should be per-
formed consistently.

6.4.2. Access Point Errors

The sample in this study contained a significant number of content errors,
including errors in access points. Because the bibliographic record is listed
under its access point entries, these entries are important pathways to related
works, and errors in them may have a negative impact on information retrieval.
A typical error found in this study involved omitting access points for sec-
ondary authors and titles. This reflects a misunderstanding of AACR2 Chapter
21, especially rule 21.6 “Works of Shared Responsibility” and rule 21.7 “Col-
lections and Works Produced under Editorial Direction.” The sample for this
study also contained records of translated works, and uniform titles and 041
fields were frequently omitted. These errors reflect a lack of understanding of
AACR2 Chapter 25 and could also be due to carelessness on the part of catalog-
ers.
Incomplete or inaccurate heading entries that do not match established au-
thority forms have a profound impact on the user’s ability to retrieve informa-
tion from the database. This study found 96 occurrences (6%) where access
points did not match when compared with authority headings. Perhaps the
higher rate in this study (compared to 2.3% in Zeng’s study) indicates that
non-professionals created a larger percentage of the records or that catalogers
did not properly understand either the language or the rules. To maintain con-
sistency throughout the database, catalogers should confirm that all headings
used in each record match the form established in the appropriate authority
file.
Authority records that contain vernacular characters are another concern.
Since several characters share the same pronunciation, many Korean names
share the same romanization, and thus identical authority headings can refer to
several different people. Also, vernacular characters in the 245 field can con-
flict with the standard authority form of the author’s name as it appears in the
100 field, making it difficult to determine the appropriate authority form. For
example, Korean characters may be entered for an author’s name as it appears
in the item, in the 245 field. However, the authority file does not indicate
whether the author’s established name should be entered in the 100 field using
Chinese or Korean vernacular characters, and, as a result, the author’s name
may appear in both character sets throughout the database. Errors of this type
usually occur in Korean and Japanese materials, particularly when Chinese
characters are used to display bibliographic information. Having the appropri-
ate vernacular characters available in the authority file would alleviate both of
these problems.
Missing access points not only impair record retrieval but also potentially
result in duplicate records, and this problem should be stressed so that all ac-
cess points are present when applicable. As access points that conflict with es-
tablished authority forms also pose a serious problem, heading consistency is
also important.

6.4.3. Capitalization, CJK-Mode Punctuation, and Misspelling

Capitalization errors occurred more frequently in this study (7%) than in
Zeng’s study (1%). These errors were mainly found in names of corporate
bodies, personal titles (e.g., Sonsaeng and Paksa), and countries. Some errors
may have been caused by the use of Chinese capitalization rules, where only
the first word of a corporate name is capitalized. Korean capitalization rules
are similar to those of English in that all words are capitalized (e.g., Soul
Taehakkyo, rather than Soul taehakkyo).
Misapplication of CJK-mode punctuation was another error affecting rec-
ord retrieval. While the RLIN system ignores both CJK and keyboard-based
ASCII modes and, thus, punctuation does not affect record retrieval, the
OCLC system (which uses the ASCII mode) reads punctuation marks as char-
acters, which impairs the retrieval of records. OCLC is working on resolving
this glitch in order to prevent problems when sharing resources with the RLIN
system.
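
A pre-export check for this kind of punctuation could look roughly like the Python sketch below. It rests on assumptions that go beyond the study: that the field text is available as Unicode, and that "CJK-mode" punctuation corresponds to ideographic and fullwidth punctuation (code points U+3000 to U+303F and U+FF00 to U+FFEF). The cataloging systems discussed here used their own character encodings, and the function names are hypothetical.

```python
import unicodedata


def is_cjk_mode_punctuation(ch):
    """True for ideographic or fullwidth punctuation and the ideographic space.

    Assumption: these Unicode ranges stand in for punctuation typed in CJK mode.
    """
    cp = ord(ch)
    in_range = 0x3000 <= cp <= 0x303F or 0xFF00 <= cp <= 0xFFEF
    is_punct = unicodedata.category(ch).startswith("P")
    return in_range and (is_punct or ch == "\u3000")


def find_cjk_mode_punctuation(text):
    """Return (position, character) pairs for suspect punctuation in a field."""
    return [(i, ch) for i, ch in enumerate(text) if is_cjk_mode_punctuation(ch)]


# A romanized title in which the colon was typed as a fullwidth character.
print(find_cjk_mode_punctuation("Hanguk munhak \uff1a sosol"))
# The same title with an ASCII colon yields no hits.
print(find_cjk_mode_punctuation("Hanguk munhak : sosol"))
```

A check of this kind only flags candidates for review; whether a flagged character is actually an error still depends on the field and on the system receiving the record.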
There were 23 spelling errors in this study, affecting 1% of the sample,
similar to the 1% found in Zeng’s study. Although this figure is not high, it does
indicate a potential to impair the retrieval of bibliographic records. Carelessness
and/or a failure to review records before exporting them caused such errors.

6.4.4. Tagging Errors and Indicator Errors

The 42 tagging and 56 indicator errors found in this study included incorrect
choice of main entry, incorrect indicator used in the 246 field, and improper
use of the 440 or 490/8xx fields. The majority of tagging errors occurred in the
440 and 490/8xx fields (16 errors) and the 100 and 700 fields (11 errors). In the
440 and 490/8xx fields, the reviewers only checked the tags against the author-
ity file, determining whether the series was traced properly or not, using either
a 440 or a 490/8xx combination. Errors in the 100 field usually resulted from
using the added entry rather than the main entry or vice versa. These errors were
related to a lack of understanding of AACR2 Chapter 21 and/or a failure to
check the authority headings thoroughly. Tagging and indicator errors, though
more frequent in this study than in Zeng’s (tagging errors: 0.2%, indicator er-
rors: 1%), do not affect user retrieval of Korean records in the online environ-
ment. However, indicator errors should be considered for titles and data in other
languages, as indicators would affect the retrieval of these records.

6.4.5. Incorrect or Missing Punctuation and Space (as Required by ISBD)

As in Zeng’s study, where punctuation errors occurred in 17.5% of
the sample, this study also found that incorrect punctuation and inadequate
spacing according to ISBD standards was the most significant error category,
affecting 26% of the sample. Although ISBD punctuation does not affect
searching, catalogers should be careful to apply ISBD correctly so that users
have an easier time reading the bibliographic record. In addition, correct ISBD
application is important when arranging and indexing information into the da-
tabase as punctuation can tell the system when information is complete for
each field. It is also important when exchanging resources internationally be-
cause many countries use different MARC systems and ISBD is thus the only
way to arrange and index data in the system. Therefore, the correct use of
ISBD is necessary to ensure that all records are consistent. The correct use of
ISBD will also make it easier for those maintaining the database, as they will
not have to manually correct poorly formatted records.

6.4.6. Inconsistency Between Fixed and Variable Fields Errors

The number of fixed field errors found in this study is significant, involving
8% of the sample. Inconsistencies between fixed and variable fields occurred
more frequently in this sample than in Zeng’s, where only 3% of the records
were affected. However, as the values in “contents,” “illustration,” and “in-
dex” are optional and do not affect record retrieval, they cannot be considered
major errors.

6.5. A Summary of the Findings

This study, like previous studies, revealed many catalog record errors.
Moreover, some errors in this study occurred in patterns similar to those found
in other studies (for example, the average number of errors per record found in
the present study was 2.49, while Zeng’s study found 2.23 per record). En-
hanced records in this study had a much lower error rate than that found in
original records. As in Zeng’s study, most errors involved ISBD punctuation
and other minor errors that do not prevent users from locating the record. This
study also showed, however, that there were some major errors that did impact
the usability of the record, notably romanization and access point errors.
As described above, errors were found most often in the 245, Fixed and 0xx,
and 260 fields. This distribution is similar to that found by Zeng (see Table 7
and Figure 3). Since the 245 field is the primary source of information for the
user, the rules for transcribing this field must be emphasized. Although errors
involving the matching of fixed and variable fields do not affect retrieval, con-
sistency is important, and these fields may provide useful search keys in the fu-
ture. As the 260 field does provide a search key, errors in this field may affect
record retrieval in shared databases. This study also found many errors in the
1xx and 7xx fields, mostly involving confusion between the main and added
entry.
Again, the greatest number of errors in this study involved inadequate ISBD
punctuation, followed by missing an entire field or containing an extra field,
and missing fixed field values as compared to the variable fields. These find-
ings were similar to Zeng’s findings, where ISBD punctuation and spacing for-
mat errors were the most frequent, followed by CJK-mode punctuation errors
and missing or extra field errors. Because CJK-mode punctuation errors do not
impede record retrieval in RLIN, this study excluded such errors.
Although RLIN had a higher error rate than OCLC, overall, error patterns in
OCLC and RLIN records were similar. Because of differences between the
two systems, errors such as punctuation and spacing errors were introduced
into records as they were tape-loaded into the OCLC system. This problem
needs to be addressed at a technical level.
Although the majority of errors occurred in the content and format catego-
ries, romanization errors posed the greatest concern. Romanization and word
division errors occurred frequently, rendering such records difficult to retrieve

TABLE 7. Error Distributions Within Fields as Compared with Zeng’s Study

               Chinese Records      Korean Records
Field          Errors    %          Errors    %
Fixed & 0xx    499       0.12       219       0.14
1xx            149       0.04       58        0.04
240            16        0.00       38        0.02
245            1231      0.30       413       0.27
246/740        105       0.03       84        0.05
250            79        0.02       24        0.02
260            1002      0.25       164       0.11
300            0         0.00       88        0.06
4xx/8xx        320       0.08       117       0.08
5xx            269       0.07       151       0.10
6xx            117       0.03       86        0.06
7xx            273       0.07       110       0.07
Total          4060      1.00       1552      1.00

FIGURE 3. Error Distributions Within Fields as Compared with Zeng’s Study

[Figure: error shares for Chinese records and Korean records, on a vertical
scale from 0.00 to 0.35, for the fields Fixed & 0xx, 1xx, 240, 245, 246/740,
250, 260, 300, 4xx/8xx, 5xx, 6xx, and 7xx.]

from the system. It is thus clear that understanding Korean romanization and
emphasizing consistency between romanized and vernacular fields is crucial
to improving the quality of cataloging in Korean records. Missing and incor-
rect access points created a similar problem, as several entries failed to match
established and expected authority forms, complicating record retrieval. These
were serious errors impacting the efficiency and usability of the databases as
catalogers duplicated records already in the system and users would presum-
ably have to work to locate items based on substandard records.

7. RECOMMENDATIONS ON QUALITY ENHANCEMENT IN KOREAN CATALOGING

As a result of this study, the author offers the following recommendations
for enhancing quality in Korean cataloging:
1. As errors of mis-romanization are the most likely to affect record re-
trieval, it is clear that catalogers should take action to alleviate the confusion
inherent in romanizing the Korean language. The LC romanization table is the
official system for cataloging Korean materials in the United States. However,
it is difficult to understand and apply, even for a native Korean speaker. There-
fore, this system should be simplified and should include a more detailed ex-
planation of each rule. In addition, more examples of potentially ambiguous
romanizations should be included, especially for specific words that repeat-
edly cause problems. These changes would benefit both library patrons and
catalogers.
2. As Korean materials generally contain two scripts, Chinese and Korean,
cataloging such materials can become complicated, especially in terms of uni-
form titles, access points, and so on. Additionally, edition information is not
always clear, as publishers frequently identify a reprint as a new edition. To
lessen confusion, catalogers should have a thorough understanding of rules
contained in AACR2 and the LCRIs regarding both bibliographic description
and the choice of access points. The comprehensive AACR2 workbook for
cataloging East Asian publications, created by the Committee of East Asian
Library members and the Library of Congress, is also a vital tool as it incorpo-
rates rules and guidance particular to CJK catalogers. A revision of the current
edition is in process and one hopes that it will be available shortly. The updated
edition should address the findings of both this study and Zeng’s study and
also incorporate more Korean examples.
3. Authority records should include vernacular characters for all non-
western language entries. This policy would eliminate ambiguous identifica-
tions and provide more accurate access points. The name Kim, Yong-su, for
example, has twelve different authority entries, which likely describe different
individuals, and correspond to different vernacular characters. Including ver-
nacular characters would also identify which language characters (Chinese or
Korean) catalogers should use in the fields as access points.
4. A duplicate detection program is currently available to detect and elimi-
nate non-CJK duplicates in the OCLC database. The database would benefit
from a similar program to detect CJK duplicates. Future versions of OCLC and
RLIN could also provide an automatic romanization function for Korean ver-
nacular characters, modeled after that currently used in cataloging Arabic and
Chinese in OCLC. This function would be of particular help to new catalogers
and to people unfamiliar with the principles of romanization. Such a program
would help to eliminate romanization errors.
5. Typographical errors, although they greatly affect the accessibility of a
record, are easy to fix. To help reduce the number of typographical errors,
catalogers should use “cut and paste” functions when copying all access point
entries from authority files.
6. Administrators should emphasize record quality over record quantity,
giving catalogers time to perform a more thorough search before inputting new
records and more time to review records before exporting them. Also, as cata-
logers encounter errors, they should either enhance records or send error re-
ports to OCLC or RLIN. Statistical figures of new records created should not
be the sole determinant of staff efficiency.
7. Regular workshops and training sessions focusing on CJK cataloging
should be conducted, so that catalogers can keep current with the latest docu-
mentation and format guidelines.
8. Catalogers of Korean materials should foster information sharing and
discussion groups, so that they can make errors known and correct them
throughout the system. Korean catalogers should lead small discussion
sessions at the CEAL meetings. E-groups can also promote idea sharing, and
Korean catalogers can use the large, already established EastLib group. The
CEAL Korean committee web site could contribute to Korean-language record
quality by adding a Korean cataloging FAQ to the site.

8. CONCLUSION

Quality in cataloging is a vital issue in library services for both patrons and
staff. Cataloging is reemerging as a core service in libraries, fostered by the
ever increasing body of electronic resources in which quality becomes even
more important as the record serves as the only access to the material. The gen-
eral availability of all library resources is directly affected by the quality of the
library database.
The library staff is responsible for maintaining the quality of the local data-
base and should perform regular quality searches using various strategies.
Also, individual catalogers must always be on the alert for errors that they fre-
quently find (or make) and work to remedy them. Although persistence and at-
tention to detail are required, the resulting database will be helpful both to the
public and to technical services staff and will reflect positively on the library.
Shared databases should also be regularly maintained because errors have a
wide effect on the efficient use of the system. In foreign language cataloging,
regular maintenance is especially vital, as the shared databases should provide
a pool of reliable records to libraries lacking language resources. Regular qual-
ity control measures should be exercised to ensure that each record adheres to
high standards, is accurate, follows appropriate guidelines, and is consistent
with other records.
The purpose of this study was to alert catalogers to some major problems
in cataloging Korean materials and to offer suggestions for improvement.
Ideally, these suggestions will help catalogers avoid these errors altogether.
This study could also help in the training of new catalogers and may inspire
other researchers to investigate the quality control of other languages included
in shared databases. The author hopes that these findings will be integrated
into the quality control aspect of the cataloging process and in training manu-
als and workshops.

Received: January 2002
Revised: May 2002
Accepted: June 2002

NOTES
1. Lei Zeng, “An Evaluation of the Quality of Chinese-Language Records in the
OCLC OLUC Database and A Study of a Rule-based Data Validation System for On-
line Chinese Cataloging.” Ph.D Diss., University of Pittsburgh, 1992.
2. Janet Swan Hill, “The Elephant in the Catalog: Cataloging Animals You Can’t
See or Touch.” Cataloging & Classification Quarterly, 23, no. 1 (1996): 5-25.
3. Joseph C. Harmon, “The Death of Quality Cataloging: Does It Make a Differ-
ence for Library Users?” Journal of Academic Librarianship, 22 (July 1996): 306-307.
4. Lydia W. Wasylenko, “Building Quality That Counts into Your Cataloging Op-
eration.” Library Collections, Acquisitions, and Technical Services, 23, no. 1 (Spring
1999): 101-104.
5. Rahmatollah Fattahi, “Super Records: an Approach Towards the Description of
Works Appearing in Various Manifestations.” Library Review, 45, no. 4, (1996):
19-29.
6. Sarah E. Thomas, “Quality in Bibliographic Control.” Library Trends, 44
(Winter 1996): 491-505.
7. Peter S. Graham, “Quality in Cataloging: Making Distinctions.” The Journal of
Academic Librarianship, 16 (Sept. 1990): 213-218.
8. Richard Reeb, “A Quantitative Method for Evaluating the Quality of Cata-
loging.” Cataloging & Classification Quarterly, 5, (Winter 1984): 21-26.
9. Thomas, see note 6.
10. Ann D. Chapman, “Up to Standard? A Study of the Quality of Records in a
Shared Cataloguing Database.” Journal of Librarianship and Information Science, 26
(Dec. 1994): 201-210.
11. Edward T. O’Neill, “Characteristics of duplicate records in OCLC’s online
union catalog.” Library Resources & Technical Services, 37 (Jan. 1993): 59-71.
12. James R. Dwyer, “The Catalogers’ ‘Invisible College’ at Work: The Case of the
Dirty Database Test.” Cataloging & Classification Quarterly, 14, no. 1 (1991): 75-82.
13. Lisa Romero, “Original Cataloging Errors: A Comparison of Errors Found in
Entry-level Cataloging with Errors Found in OCLC and RLIN.” Technical Services
Quarterly, 12, no. 2 (1994): 13-27.
14. Robert N. Bland, “Quality Control in a Shared Online Catalog Database: The
Lambda Experience.” Technical Services Quarterly, 4 (Winter 1986): 43-58.
15. Sheila S. Intner, “Quality in Bibliographic Databases: An Analysis of Mem-
ber-contributed Cataloging in OCLC and RLIN.” Advances in Library Administration
and Organization, 8 (1989): 1-24.
16. Fung-yin K. Simpson, “Quality Control of Chinese Monographic Records: A
Case Study.” Journal of East Asian Libraries, no. 116 (Oct. 1998): 31-40.
17. Karen T. Wei and Sachie Noguchi, “RLIN CJK versus OCLC CJK: the Illinois
experience.” Library Resources & Technical Services, 33, no. 2 (1989): 140-151.
18. Hisako Kotaka, interview by author. OCLC, OH, 20 November 2000.
19. CJK Database Statistics. http://www.oclc.org/oclc/cjk/statmenu.htm.
20. Karen Smith-Yoshimura, Email to author, 18 January 2001.
21. Barbara G. Preece, “Preliminary LC records for monographs in OCLC.” Infor-
mation Technology and Libraries, 11 (Mar. 1992): 3-9.
22. Library of Congress. Cataloger’s Desktop Version 2001 issue 1, Library of
Congress, Washington D.C.
23. Online Computer Library Center, “OCLC Bibliographic Formats and Stan-
dards.” Ohio: OCLC, 1999.
24. Online Computer Library Center. OCLC Authority File. OCLC, Ohio.
25. G. M. McCune, and E. O. Reischauer. The Romanization of the Korean Lan-
guage: Based upon its Phonetic Structure. Korea : s.n., 1939?.
26. Samuel E. Martin, Yang Ha Lee and Sung-Un Chang. A Korean-English dictio-
nary. New Haven: Yale University Press, 1967.
27. Kotaka, see note 18.

APPENDIX 1. Growth of Korean Records in OCLC and RLIN

(1:1000)

        1994    1995    1996    1997    1998    1999    2000
OCLC    69      88      103     113     128     148     159
RLIN    82.9    106.9   120.5   132.9   150.9   172.9   185.5

[Figure: chart of the values above for OCLC and RLIN, with the years 1994 to
2000 on the horizontal axis and a vertical scale from 0 to 200.]

APPENDIX 2. Sample Development Based on All 2000 Records Obtained from OCLC,
Limited Both by Institution and by Encoding Level

Institutions          Encoding level                          Number of records   Proportion of records (%)
Library of Congress   Original records (blank, 4, 7)          674                 33.7
Library of Congress   Enhanced by LC (L, I)                   58                  2.9
OCLC member           Original records (I)                    425                 21.25
OCLC member           Enhanced records (I)                    83                  4.15
OCLC member           Minimum records (K)                     352                 17.6
OCLC member           Pre-AACR or Juvenile records (i & j)    197                 9.85
RLIN Member           Original records (L)                    115                 5.75
RLIN Member           Minimum records (M)                     89                  4.45
RLIN Member           Juvenile records (j)                    7                   0.35

1. Encoding level codes, indicated by a capital or numeric character, reflect the degree of completeness of machine-readable (MARC) records.

• I–full-level input by OCLC participants.
• 4–core-level indicates a record that is less-than-full, but greater-than-minimal-level cataloging and that meets core record standards for completeness.
• K–less-than-full input by OCLC participants, used also for core-level records that are not created by Cooperative Cataloging participants.
• L–full-level record tape-loaded from an institution other than LC or NLM.
• M–less-than-full level record tape-loaded from institutions other than LC or NLM.
• 7–minimal-level record that meets the National Level Bibliographic Record specifications.

2. Descriptive Cataloging codes according to the provisions of International Standard Bibliographic Description (ISBD).

• i–record has the descriptive cataloging and punctuation conventions of ISBD but is known to be a non-AACR2 record.

3. Target Audience is also reflected by a lower case character.

• j–juvenile work, indicates items intended for children (up to a 9th grade level or 15 years old).

APPENDIX 3. Bibliographic Record Showing Errors



APPENDIX 4. An Example of the Excel Sheet

Fixed & 0xx  F1 F2 F3 F4 M1 M2 M3 S1 S2 R1 R2 R3 R4 R5 E1 E2 E3 E4 RO1 RO2 RO3 RO4 Total
1 0
2 0
3 0
4 0
5 1 1
6 0

7 0
8 1 1
9 1 1
10 0
11 0
12 0
13 1 1
14 3 3
15 0
16 0
17 1 1 2
18 0
19 1 1
20 0
21 1 1 2
22 0
23 1 1
24 0
25 0
26 1 1
27 1 1
28 0
29 0
30 1 1 2
31 1 1
32 0
33 0
34 0
35 1 1
36 0
37 0
38 0

APPENDIX 5. Number of Error Occurrences on OCLC and RLIN Records

1) Number of Error Occurrences in OCLC

MARC field   F1  F2  F3  F4  M1  M2  M3  S1  S2  R1  R2  R3  R4  R5  E1  E2  E3  E4  RO1 RO2 RO3 RO4 Total  %
Fixed & 0xx  3   1   1   10  29  0   0   0   93  1   2   1   31  0   0   0   0   0   0   0   0   0   172    0.34
1xx          5   0   0   6   5   8   3   1   0   5   0   4   0   0   0   0   0   1   3   0   1   0   42     0.08
240          1   6   0   2   12  2   1   0   0   2   2   0   0   0   2   0   0   0   1   1   1   0   33     0.06
245          0   13  9   58  1   3   0   17  0   1   2   5   0   0   29  5   0   1   71  47  14  2   278    0.55
246/740      2   6   0   0   28  1   8   0   0   0   2   0   0   0   3   0   0   0   9   11  1   0   71     0.14
250          0   0   0   1   4   1   0   0   0   0   2   0   0   0   0   0   0   0   1   2   1   0   12     0.02
260          0   0   3   29  0   0   2   3   0   0   4   5   0   0   12  1   0   0   15  6   4   1   85     0.17
300          0   0   4   20  1   0   0   0   0   0   3   1   0   0   0   0   0   2   0   0   0   0   31     0.06
4xx/8xx      15  14  0   11  14  0   4   2   0   2   2   6   1   0   4   3   0   0   8   2   4   0   92     0.18
5xx          4   1   0   64  12  0   1   2   0   2   1   0   0   0   11  0   0   4   8   7   3   0   120    0.24
6xx          1   0   4   6   1   1   11  0   0   4   0   5   0   18  4   0   0   7   3   0   0   0   65     0.13
7xx          4   2   0   14  31  11  0   0   0   4   0   4   0   0   8   0   0   1   4   0   1   0   84     0.17
Total        35  43  21  221 138 27  30  25  93  21  20  31  32  18  73  9   0   16  123 76  30  3   1085   2.14
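
The table does not define the % column. The sketch below assumes it is each row total divided by the 508 OCLC member records examined (425 original plus 83 enhanced records in Appendix 2), that is, errors per record rather than a percentage. The variable names are illustrative and not taken from the study.

```python
# Assumption: "%" = row total / number of OCLC member records examined (508).
OCLC_RECORDS = 508

# Row totals copied from the OCLC table above.
row_totals = {
    "Fixed & 0xx": 172, "1xx": 42, "240": 33, "245": 278,
    "246/740": 71, "250": 12, "260": 85, "300": 31,
    "4xx/8xx": 92, "5xx": 120, "6xx": 65, "7xx": 84,
}

# Errors per record for each field group, rounded as in the table.
per_record = {field: round(total / OCLC_RECORDS, 2)
              for field, total in row_totals.items()}

# Overall errors per record across all field groups.
overall = round(sum(row_totals.values()) / OCLC_RECORDS, 2)

for field, value in per_record.items():
    print(f"{field:12s} {value:.2f}")
print(f"{'Total':12s} {overall:.2f}")
```

Under this reading the computed values agree with the printed column, including the overall figure of 2.14 errors per record.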

Error Ranking by Field


Rank MARC field
1 245 (title statement)
2 Fixed & 0xx (fixed fields and 0xx fields)
3 5xx (General notes)
4 4xx/8xx (Series statement)
5 260 (Imprint)
6 7xx (Added entry for personal name, corporate body, uniform title)
7 246/740 (Varying Form of Title and/or title added entries)
8 6xx (Subject added entries)
9 1xx (Main entries)
10 240 (Uniform title)
11 300 (Physical Description)
12 250 (Edition Statement)

Error Ranking by category


Rank Error type
1 F4 (Incorrect or missing punctuation and space according to ISBD standards)
2 M1 (Missing a whole field or containing an extra field)
3 RO1 (Incorrect or misspelling romanization)
4 S2 (Inconsistency between corresponding fields or values)
5 RO2 (Incorrect word division (including hyphen usage))
6 E1 (Incorrect upper and lower case)
7 F2 (Incorrect indicator)
8 F1 (Incorrect field tag)
9 R4 (Incorrect call number)
10 R3 (Incorrect content)
11 RO3 (Incorrect or missing diacritics)
   M3 (Missing the corresponding vernacular field or containing an extra entry for a non one-to-one name)
12 M2 (Missing a part of the entry or containing an extra part for the entry)
13 S1 (Inconsistency between romanized and vernacular data)
14 R1 (Incorrect content in a whole field)
   F3 (Incorrect or missing subfield code, incorrect sequence of subfields)
15 R2 (Other content error)
16 E4 (Other editing and inputting errors (e.g., misspelling of English words))
17 R5 (Incorrect subject heading)
18 E2 (Punctuation used CJK mode)
19 RO4 (Incorrect or missing vernacular character)
20 E3 (Space used CJK mode)

2) Number of Error Occurrences in RLIN

MARC field   F1  F2  F3  F4  M1  M2  M3  S1  S2  R1  R2  R3  R4  R5  E1  E2  E3  E4  RO1 RO2 RO3 RO4 Total  %
Fixed & 0xx  1   0   4   0   2   0   0   0   31  1   0   1   6   0   0   0   0   1   0   0   0   0   47     0.41
1xx          2   0   0   6   1   4   0   0   0   0   0   2   0   0   0   0   0   0   1   0   0   0   16     0.14
240          1   2   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   5      0.04
245          0   4   3   65  0   0   0   13  0   0   1   0   0   0   10  0   0   0   14  14  11  0   135    1.17
246/740      0   4   0   1   2   1   4   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   13     0.11
250          0   0   0   1   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   9   0   0   12     0.10
260          0   0   3   44  1   1   0   7   0   1   3   0   0   0   8   0   0   0   5   5   1   0   79     0.69
300          0   0   12  22  0   11  0   0   0   0   0   11  0   0   1   0   0   0   0   0   0   0   57     0.50
4xx/8xx      1   1   0   13  1   0   0   0   0   0   1   1   0   0   6   0   0   0   0   1   0   0   25     0.22
5xx          2   0   0   18  2   0   0   1   0   0   0   0   0   0   4   0   0   2   0   1   1   0   31     0.27
6xx          0   1   4   3   1   0   1   0   0   0   0   0   0   6   2   0   0   3   0   0   0   0   21     0.18
7xx          0   1   0   6   8   2   2   2   0   0   0   2   0   0   0   0   0   1   1   0   1   0   26     0.23
Total        7   13  26  180 20  19  8   23  31  2   5   17  6   6   31  0   0   7   21  31  14  0   467    4.06

Error Ranking by field


Rank MARC field
1 245 (title statement)
2 260 (Imprint)
3 300 (Physical Description)
4 Fixed & 0xx (fixed fields and 0xx fields)
5 5xx (General notes)
6 7xx (Added entry for personal name, corporate body, uniform title)
7 4xx/8xx (Series statement)
8 6xx (Subject added entries)
9 1xx (Main entries)
10 246/740 (Varying Form of Title and/or title added entries)
11 250 (Edition Statement)
12 240 (Uniform title)

Error Ranking by category


Rank Error type
1 F4 (Incorrect or missing punctuation and space according to ISBD standards)
2 RO2 (Incorrect word division (including hyphen usage))
  S2 (Inconsistency between corresponding fields or values)
  E1 (Incorrect upper and lower case)
3 F3 (Incorrect or missing subfield code, incorrect sequence of subfields)
4 S1 (Inconsistency between romanized and vernacular data)
5 RO1 (Incorrect or misspelling romanization)
6 M1 (Missing a whole field or containing an extra field)
7 M2 (Missing a part of the entry or containing an extra part for the entry)
8 R3 (Incorrect content)
9 RO3 (Incorrect or missing diacritics)
10 F2 (Incorrect indicator)
11 M3 (Missing the corresponding vernacular field or containing an extra entry for a non one-to-one name)
12 F1 (Incorrect field tag)
   E4 (Other editing and inputting errors (e.g., misspelling of English words))
13 R4 (Incorrect call number)
   R5 (Incorrect subject heading)
14 R2 (Other content error)
15 R1 (Incorrect content in a whole field)
16 E2 (Punctuation used CJK mode)
   E3 (Space used CJK mode)
   RO4 (Incorrect or missing vernacular character)
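
The RLIN ranking above appears to follow a dense ranking of the Total row of the RLIN table: tied categories share a rank and the next category takes the following number. A minimal sketch under that assumption follows; `dense_rank` is a hypothetical helper introduced here for illustration, not part of the study.

```python
def dense_rank(totals):
    """Map each code to its rank; equal totals share a rank (dense ranking)."""
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return {code: rank_of[value] for code, value in totals.items()}


# Category totals copied from the Total row of the RLIN table.
rlin_totals = {
    "F1": 7, "F2": 13, "F3": 26, "F4": 180,
    "M1": 20, "M2": 19, "M3": 8,
    "S1": 23, "S2": 31,
    "R1": 2, "R2": 5, "R3": 17, "R4": 6, "R5": 6,
    "E1": 31, "E2": 0, "E3": 0, "E4": 7,
    "RO1": 21, "RO2": 31, "RO3": 14, "RO4": 0,
}

ranks = dense_rank(rlin_totals)

# List categories from most to fewest errors, with their shared ranks.
for code in sorted(rlin_totals, key=lambda c: -rlin_totals[c]):
    print(ranks[code], code, rlin_totals[code])
```

With these totals the sketch puts F4 first, gives RO2, S2, and E1 a shared second place, and places the three categories with no errors (E2, E3, RO4) together at rank 16.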

APPENDIX 6. Error Occurrences by Fields After Normalization for OCLC and RLIN Records

1) Error Occurrence by Fields After Normalization for OCLC Records

MARC field    x (entry number)   y (error number)   r (rate = y/x)   z (r × sample number)
Fixed & 0xx   508                172                0.34             172.00
1xx           336                42                 0.13             63.50
240           90                 33                 0.37             186.27
245           508                278                0.55             278.00
246/740       175                71                 0.41             206.10
250           266                12                 0.05             22.92
260           508                85                 0.17             85.00
300           508                31                 0.06             31.00
440/490/8xx   223                92                 0.41             209.58
5xx           515                120                0.23             118.37
6xx           798                65                 0.08             41.38
7xx           322                84                 0.26             132.52
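
The normalization in this appendix can be sketched as follows: the error rate r = y/x is scaled by the sample number, so that fields occurring in only some records become comparable with fields present in every record. The sample numbers used here (508 for OCLC, 115 for RLIN) are inferred from the entry counts of the always-present fields such as 245. The function name `normalize` is illustrative only.

```python
def normalize(entries, errors, sample_number):
    """Return z = (errors / entries) * sample_number, with r computed unrounded."""
    rate = errors / entries
    return rate * sample_number


OCLC_SAMPLE = 508  # assumed: OCLC member records examined
RLIN_SAMPLE = 115  # assumed: RLIN member records examined

# (entry number x, error number y) taken from the tables in this appendix.
oclc_1xx = normalize(336, 42, OCLC_SAMPLE)
oclc_240 = normalize(90, 33, OCLC_SAMPLE)
rlin_240 = normalize(6, 5, RLIN_SAMPLE)

print(f"OCLC 1xx: {oclc_1xx:.2f}")
print(f"OCLC 240: {oclc_240:.2f}")
print(f"RLIN 240: {rlin_240:.2f}")
```

The z values in the tables are reproduced only when r is left unrounded: for OCLC 240, the rounded rate 0.37 × 508 would give 187.96 rather than the listed 186.27.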

[Figure: chart of normalized error occurrences by MARC field for OCLC records]

2) Error Occurrence by Fields After Normalization for RLIN Records

MARC field    x (entry number)   y (error number)   r (rate = y/x)   z (r × sample number)
Fixed & 0xx   115                47                 0.41             47.00
1xx           73                 16                 0.22             25.21
240           6                  5                  0.83             95.83
245           115                135                1.17             135.00
246/740       42                 13                 0.31             35.60
250           45                 12                 0.27             30.67
260           115                79                 0.69             79.00
300           115                57                 0.50             57.00
440/490/8xx   41                 25                 0.61             70.12
5xx           142                31                 0.22             25.11
6xx           195                21                 0.11             12.38
7xx           85                 26                 0.31             35.18

[Figure: chart of normalized error occurrences by MARC field for RLIN records]
