
This article was downloaded by: [Universitat Politècnica de València]

On: 29 October 2014, At: 02:21


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Cataloging & Classification Quarterly


Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/wccq20

Spaces in Korean Bibliographic Records: To Be, or Not to Be

Wooseob Jeong (a), Joy Kim (b) & Miree Ku (c)
(a) School of Information Studies, University of Wisconsin–Milwaukee,
Milwaukee, Wisconsin, USA
(b) Korean Heritage Library, University of Southern California, Los
Angeles, California, USA
(c) Duke University, Durham, North Carolina, USA
Published online: 23 Sep 2009.

To cite this article: Wooseob Jeong , Joy Kim & Miree Ku (2009) Spaces in Korean Bibliographic
Records: To Be, or Not to Be, Cataloging & Classification Quarterly, 47:8, 708-721, DOI:
10.1080/01639370903203382

To link to this article: http://dx.doi.org/10.1080/01639370903203382


Cataloging & Classification Quarterly, 47:708–721, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 0163-9374 print / 1544-4554 online
DOI: 10.1080/01639370903203382

Spaces in Korean Bibliographic Records: To Be, or Not to Be

WOOSEOB JEONG
School of Information Studies, University of Wisconsin–Milwaukee,
Milwaukee, Wisconsin, USA

JOY KIM
Korean Heritage Library, University of Southern California, Los Angeles, California, USA
MIREE KU
Duke University, Durham, North Carolina, USA

The purposes of this study are (1) to investigate how spacing in
the Korean script fields in bibliographic records affects retrieval
in various OPAC systems and (2) to propose ways to deal with
these problems. We conclude that, from the end-user perspective,
the systems using morphological indexing are harder to use than
the systems using single character indexing where spaces had no
impact on retrieval. We also recommend using both n-gram and
morphological indexing, adopting good ranking systems, developing
active education programs on this issue, and providing cross
references between Chinese and Korean characters.

KEYWORDS Korean language bibliographic records, word division,
Korean language OPAC, Korean language catalogs, romanization

Received May 2009; revised July 2009; accepted July 2009.
Address correspondence to Wooseob Jeong, Associate Professor, School of
Information Studies, University of Wisconsin–Milwaukee, P.O. Box 413, Milwaukee, WI 53201,
USA. E-mail: wjj8612@uwm.edu

INTRODUCTION

North American libraries began collecting Chinese, Japanese, and Korean
(CJK) materials in 1869.1 Currently the combined holdings of CJK books in
academic institutions in North America are approaching 21 million volumes.2
Up until the 1980s, materials in the CJK languages had been cataloged via
romanization, a process of converting the CJK scripts into roman alphabet
according to a predetermined set of rules. Romanization at that time was
basically a filing and indexing device so as to fit the CJK records into the
roman-alphabet based environment of North American libraries, be it either
card catalogs or bibliographic utilities. Then in 1983 the Research Libraries
Group (RLG) revolutionized CJK cataloging by introducing an automated
cataloging system called RLIN CJK that used CJK characters in cataloging. In
1986, OCLC followed suit with its own CJK cataloging system. Since then,
many East Asian Libraries have joined one of the two systems and have
eagerly begun automating the cataloging of their CJK collections. In CJK
cataloging, the romanized fields (such as author, title, imprint, series, etc.)

are accompanied by their corresponding CJK scripts, providing two different


means for description (and access if the system permits script searching) for
the same bibliographic information. As of April 13, 2009, there are 3,552,010
CJK records in OCLC (1,692,842 Chinese, 1,599,970 Japanese, and 259,198
Korean).3
When RLIN merged with OCLC in 2007, an unexpected issue arose with
regard to the handling of spaces in Korean script fields. Before the merger,
the two utilities had been using different spacing conventions in Korean
script fields. RLIN had separated each Hangul (the Korean alphabet) word
according to the ALA/LC Romanization Rules for Korean4 exactly paralleling
the romanized fields. OCLC, on the other hand, had used no spaces in script
fields. After the RLIN-OCLC merger, the Library of Congress (LC) and other
former RLIN users decided to maintain their practice of retaining spaces in
the parallel Korean script fields even in the new OCLC environment. This
resulted in the coexistence of two different kinds of Korean bibliographic
records in the current OCLC database: one with spaces (RLIN’s practice)
and the other without spaces (OCLC’s practice) between words in Hangul.
In 2008, the CEAL Committee on Korean Materials organized a panel at its
annual meeting to study the implications of these different spacing conven-
tions on retrieval from the end-user perspective. This article incorporates the
findings of the panel.5

LITERATURE REVIEW

Of the many approaches to indexing in information retrieval, we will consider


the two popular ones that are particularly relevant to the Korean language:
the n-gram approach and the morphological approach.
With n-gram indexing, character strings are used as the basic indexing
unit, and the number of characters in a unit has been the core of the
discussions. In the Korean language, Lee and others6 found that more than 80%
of Korean nouns were composed of one or two Hangul characters. Based
on this, Savoy used bi-gram indexing for his research on Asian languages
including Korean.7 Numerous other studies focused on the enhancement of
n-gram indexing for Korean.8
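
As a rough illustration of what "character strings as the basic indexing unit" means, the following Python sketch extracts overlapping character n-grams from a Hangul string. The function name `char_ngrams` is hypothetical, and two simplifying assumptions are made that the cited studies do not necessarily share: whitespace is discarded before segmentation, and each Hangul syllable is stored as a single precomposed character.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams of `text`.

    Assumption: whitespace is dropped first, so spacing in the
    input has no effect on the resulting index units.
    """
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


# Uni-grams (single character indexing) and bi-grams for "Korean grammar".
print(char_ngrams("국어 문법", 1))
print(char_ngrams("국어 문법", 2))
```

With n = 2 the string yields three units, one of which (어문) straddles the boundary between the two words; this is the kind of unit that can lead to false drops.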
Natural language processing on computational morphology—the struc-
ture of words in linguistics—has been actively studied.9 The morphological
analysis in information retrieval utilizes dictionaries and other techniques to
retrieve similar forms of a word together. Many different methodologies were
proposed for Korean morphological analyses,10 with recent efforts focusing
on expediting the process.11
Many studies pointed out problems with the ALA/LC (American Library
Association/Library of Congress) Romanization Table for Korean, the
standard used by North American libraries. This system is based on the
McCune-Reischauer (MR) Romanization rules, which were devised in 1939 by two
Americans in consultation with Korean linguists. Major complaints against
the McCune-Reischauer system have mostly to do with the fact that it is a
phonetic system. As such, the same character can be romanized in several
different ways due to euphonic changes in different contexts, a common
feature in the spoken Korean language. Koreans are not trained to distinguish
voiced from voiceless sounds, so, among other unfamiliar features of the MR
system, the diacritical marks it uses to distinguish voiceless from voiced
sounds look strange and useless to them. Jeong discussed these points with
ample examples.12 Park13 and Kim14 called for LC’s
adoption of the current Korean government standard, following the Pinyin
conversion model for Chinese.15 The major feature of the ALA/LC rule has
less to do with romanization per se than with word division,
which neither the MR system nor the Korean government system addresses at
all. Word division in Korean romanization is an extremely complex issue.
The ALA/LC word division rules require each lexical unit, including particles,
to be separated, which is in stark contrast to the word division rules used in
Korea. This causes confusion for the uninitiated users who are unaware of
the different practices. This problem has already been addressed in the pro-
fessional literature many times.16 It should be noted that the word division
issue is not unique to the Korean language. Huang and Haynes17 found that
in both OCLC and RLIN, recall and precision varied greatly depending on
whether or not the syllables of a keyword were aggregated in Chinese.
The real issue is not which romanization system is better than others,
but that Koreans are not familiar with any romanization system at all.
Romanization is never taught in Korean schools, and Koreans like to spell their
names in highly idiosyncratic ways. Kim fully illustrated this point,18 albeit
inadvertently. Twelve native Korean graduate students at a major American
university were asked to romanize a famous South Korean author’s name,
박경리. The result was beyond astonishing: the 12 students spelled
the name in 12 different ways, and none matched either the MR standard used by
North American libraries (Pak Kyong-ni) or the South Korean system (Bak
Gyeongni)!

PURPOSES OF THE STUDY

The two purposes of this study are to investigate how spacing in the Korean
script fields affects retrieval in various systems and to propose ways to deal
with the problems, both individually and as a community.

METHODOLOGY

In 2008 we conducted test searches in 15 online public access catalog
(OPAC) systems in North America and 4 OPAC systems in the Republic
of Korea (South Korea) for comparison purposes. The North American
OPAC systems we tested included: Aleph (4 institutions), Voyager (6
institutions), and others (5 institutions); the details are listed in Table 3.
The North American OPAC systems were chosen based on the fact that they
have relatively large Korean collections with at least a part-time Korean
Studies librarian. We selected 국어 문법 (Korean grammar) as our search
term. This phrase was chosen because we knew every academic or research
collection must have at least a few books containing this phrase.
We also conducted a brief survey via Eastlib, the primary listserv for the
East Asian librarianship community in North America. The survey question-
naire is appended.

LIMITATIONS

Each local database is unique, made up of records created over a long period
of time by different people, and serving different and varied clienteles. The
records at each institution reflect idiosyncratic local practices that may not
have been duplicated anywhere else. Also, each system is more or less
unique. One vendor may supply the same product to multiple institutions,
but each campus may customize its system. For this reason, it is difficult to
generalize.
For this study, we only used basic search methods such as Keyword
Search or Title Keyword Search, based on the assumption that library users
use the basic keyword search techniques most often.
The North American Korean Studies librarianship field is a small
community, involving fewer than 25 institutions overall. Accordingly, the overall
number of survey participants and the responses in each group were not large
enough for inferential statistics.

DATA ANALYSIS
Survey of Librarians on Korean Language Support in their OPAC System
In order to learn about the current state of Korean language support in the
OPAC systems of North American universities we conducted a brief survey
via Eastlib, a listserv. A total of 23 people responded to the online survey,
representing 22 institutions. These institutions use 7 or 8 different systems
(one respondent did not specify the system name). The respondents were
mostly expert Korean studies librarians but also included a few librarians
with limited Korean expertise whose responsibilities include working with

Korean collections and patrons. The majority of survey respondents were


from the institutions used for the previously mentioned OPAC searches.
Although CJK cataloging began in the mid-1980s, the CJK OPAC is a
relatively new phenomenon. Of the 15 who reported having a CJK OPAC
with Korean capabilities, 8 have had it for less than five years, and only 5
have had it more than five years. The other two did not answer. As seen in
Table 1, one-third of the North American universities surveyed still do not
support Korean script searching.
Spaces in Korean script fields may or may not affect retrieval in the
systems that support Korean script searching. Curiously, even
when libraries use the same system, the role of spaces in Korean scripts is
reported to be different. This could reflect (1) the treatment of spaces
being a local option, (2) respondents’ errors, or (3) both.
When the librarians were asked about their favorite ways to search
for Korean materials, eleven (11) answered “by romanization,” two (2) “by
Korean scripts” and one (1) “both equally.” Those who prefer romanization
said that romanization is easier to use and gives better results. Those who
prefer searching by Korean scripts said it is a more convenient way of
searching, especially when they are not sure about how to romanize the
query terms. Non-native speakers of Korean had a tendency to prefer this
option. One librarian who employs both said that different queries require
different strategies.

TABLE 1 OPAC Systems and Their CJK Capabilities

                    Support Korean       Spaces Affect Korean
                    Script Searching?    Script Searching?
OPAC System
(respondents)       Yes      No          Yes      No       n/a

Voyager (9)         8        1           7        1        1
SIRSI (4)           0        4           0        0        4
Innovative (4)      3        1           4        0        0
Aleph (2)           2        0           0        2        0
Horizon (2)         2        0           1        1        0
GEAC (1)            0        1           0        0        1
TLC (1)             0        1           0        0        1
Total               15       8           12       4        7

TABLE 2 Favorite Search Strategies for Korean Materials in OPAC

                         By              By Korean    Both       Don’t
                         Romanization    Scripts      Equally    Know

Librarians               11              2            1          0
Users (observed by       4               4            3          3
librarians)

Reasons
  By Romanization:     Better, more comprehensive, & more accurate
                       results; ease of searching
  By Korean Scripts:   Convenient, except for Chinese characters;
                       cannot romanize
  Both Equally:        Different searches require different strategies
However, when the librarians were asked about their users’ favorite
ways to search for Korean materials, the answers were evenly spread: four
(4) romanization, four (4) scripts, and three (3) both equally. While the
difference between the two groups is curious, it should be noted that since
the information on users’ search behavior was based on the casual perception
(not by systematic observation) of the librarians, it may not accurately reflect
the reality. A direct survey of the user group would have been more reliable.
Table 2 shows a summary of the survey on favorite search methods for
Korean materials.
Other interesting findings from the survey include that Chinese charac-
ters (called Hancha in Korea) are rarely used, at most incidentally, and that
most non-native Korean librarians who perform Korean studies librarianship
duties on a part time basis were either not aware of the role of spaces in
Korean script fields or had an incorrect perception about the issue.

OPAC Systems in North America


To examine the impact of spaces in Korean scripts searching, we tested 15
OPAC systems in North America with a simple query of 국어 문법 (romanized
“kugo munpop,” meaning Korean grammar) (Table 3).
Voyager and Innovative require spaces before and after the CJK charac-
ters that form “semantic units” to support word searching. Since these systems
support word searching, the users can get search results only if they input
the spaces exactly as in the catalog records. For example, in the University
of British Columbia (UBC) OPAC (Voyager), we retrieved 36 records from a
keyword search for “kugo munpop” using romanization. All the records are
indexed with spaces in the same way as they were created in RLIN. A keyword
search for 국어 문법 (Hangul with a space) yielded 20 records, while 국어문법
(without a space) yielded no result. The same applies to title keyword
searches. A user must key in the exact spacing between the words of the
title. Only when users match the spacing exactly as cataloged will they get
a search result. Without spaces, the records are not retrievable.

TABLE 3 Search for 국어 문법 in North American OPACs

                                                       Keyword Search
                                              국어문법       국어 문법       Kugo
     Institution                 System       (no space)    (with space)   munpop

 1   Duke                        Aleph        17            17             8
 2   Harvard                     Aleph        37            37             51
 3   Univ. of Michigan           Aleph        38            38             20
 4   Melvyl (UC)                 Aleph        90            90             5
 5   WorldCat                    OCLC         234           234            103
 6   Univ. of Chicago            Horizon      50            50             36
 7   Library of Congress         Voyager      no result     452            1015
 8   Univ. of British Columbia   Voyager      no result     20             36
 9   Univ. of Hawaii             Voyager      no result     5              24
10   UCLA                        Voyager      3             no result      43
11   Cornell                     Voyager      no result     48             133
12   Columbia                    Voyager      no result     26             56
13   Univ. of Washington         Innovative   3             no result      29
14   Yale                        Voyager      1             11             26
15   Princeton                   Voyager      2             no result      26
In systems that support “character by character searching,” such as Aleph
and the free OCLC WorldCat (http://www.worldcat.org/), the spaces do not
have any impact on retrieval. This means that users can get consistent search
results regardless of spacing in their queries. For example, we retrieved 20
records with a keyword search for “kugo munpop” in romanization from
the University of Michigan OPAC (Aleph). Keyword searches for 국어문법
(without space) and 국어 문법 (with space) returned exactly the same results:
38 records. The Hangul searches produced 18 more records than the search
by romanization. Some of the 18 excess records were presumably false drops,
that is, records that contain the query characters in other contexts that may
have little (if anything) to do with Korean grammar. We learned
that the system’s relevancy algorithm pushes irrelevant records to the end of
the list. While UBC and Michigan (former RLIN members) use spaces in their
Korean cataloging, Duke University (an OCLC member) catalogs with no
spaces in the Korean script fields. In Duke’s catalog, users can get the same
consistent search results whether or not they use spaces in their queries.
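
The contrast between word searching and character-by-character searching can be illustrated with a minimal Python sketch. The two matching functions and the sample title are hypothetical simplifications introduced here for illustration; they are not the actual matching logic of Voyager, Innovative, Aleph, or WorldCat, and only show why a word index is sensitive to spacing while a character index is not.

```python
def word_match(query: str, field: str) -> bool:
    """Word indexing: every space-delimited query token must equal a field token."""
    field_tokens = set(field.split())
    return all(token in field_tokens for token in query.split())


def char_match(query: str, field: str) -> bool:
    """Character indexing: every non-space query character must occur in the field."""
    field_chars = set(field) - {" "}
    return all(ch in field_chars for ch in query if not ch.isspace())


# Hypothetical title cataloged with spaces between words (the RLIN practice).
record = "국어 문법 연구"

for query in ("국어 문법", "국어문법"):
    print(f"{query!r}: word={word_match(query, record)}, "
          f"char={char_match(query, record)}")
```

Under these assumptions the spaced query matches in both modes, while the unspaced query matches only in character mode. Because `char_match` ignores the order and adjacency of characters, it also accepts records that merely contain the same characters in other contexts, which is how false drops arise.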
In order to understand why searching by Korean scripts and by
romanization produce significantly different results, we closely examined all
of Duke’s catalog records that contain the search terms. As seen in Table 4,
the records in the third column were not retrieved by our Hangul searches
because they contained the search terms in Chinese characters. This
illustrates the need for cross mapping between Korean and Chinese. Many Korean
scholarly vocabularies were derived from Chinese, and Korean scholarly works
commonly use Chinese characters. Those Korean words derived from
Chinese can directly replace their Chinese equivalents (called Hancha in Korea)
and vice versa. However, since Hangul and Hancha are not cross referenced,
queries in Hangul characters cannot retrieve the many works written in
Hancha, and vice versa. Another issue has to do with the phonetic changes
in romanization. Affixes surrounding the search words, such as 한 (as in
한국어) or 론 (as in 문법론), cause euphonic changes in the pronunciation of
these words, which are therefore romanized differently in the MR system.
So “kugo” becomes “Hangugo” and “munpop” becomes “munpomnon.” Obviously,
such records cannot be retrieved with the search terms “kugo
munpop.” Therefore, 10 highly relevant records were only retrievable by
Korean scripts because of the affixes in the words.

TABLE 4 Search Results of Korean Scripts versus Romanization from Duke University OPAC

Retrieved by Both (Romanization     Retrieved only by     Retrieved only by
and Hangul) Searching               Korean Scripts        Romanization

OPAC Systems in the Republic of Korea


In an attempt to be the most user-friendly, OPAC systems in Korea adopted
the n-gram segmentation (especially bi-gram) as their main indexing method,
supplemented by Korean morphological analysis for precision. The former
emphasizes good recall and the latter precision. The combination of indexing
techniques works as follows. The text string of a title is first divided into
parts. Index terms are then extracted from each part using bi-gram
segmentation. These are supplemented by any term found in the Korean
morphological analysis dictionary, and the whole string is indexed as well.
Users can search for the book by using any of these index terms. We note that
the bi-gram segmentation technique produces some irrelevant or meaningless
index terms, which would theoretically result in false drops.

TABLE 5 Search Results using Variations of Spacing in a Longer Query String

OPAC System                          (two spaces)   (one space)   (no space)

The National Library of Korea        14             5             5
National Assembly Library            1              6             0
Seoul National University Library    3              5             1
Yonsei University Library            7              6             6
Our test searches in Korean OPACs generated somewhat inconsistent
results. We speculate that the reason is that each system has a unique local
indexing setting. Tables 5 and 6 show the results of a comparative search
in four OPAC systems in Korea using spacing variations of a longer and a
shorter query string, respectively. As seen in Table 5, longer queries seemed
to generate more conflicting results, presumably because there are more
possible spacing variations in the query strings. Table 6 shows that shorter
search queries result in more consistent search results regardless of spaces.
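
The combination of bi-gram segmentation and dictionary lookup described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the title, the three-word `DICTIONARY`, and the function names are hypothetical, splitting the title at spaces is an assumed way of dividing it into parts, and the dictionary lookup is a plain substring test rather than real morphological analysis. None of this reproduces the configuration of any of the four Korean OPACs tested.

```python
from collections import defaultdict

# Hypothetical stand-in for a Korean morphological analysis dictionary.
DICTIONARY = {"국어", "문법론", "연구"}


def index_terms(title: str) -> set[str]:
    """Collect bi-grams per space-delimited part, dictionary words, and the whole string."""
    terms = {"".join(title.split())}  # whole string with spaces removed
    for part in title.split():
        # Overlapping bi-grams within this part (favors recall).
        terms.update(part[i:i + 2] for i in range(len(part) - 1))
        # Dictionary words found inside this part (favors precision).
        terms.update(word for word in DICTIONARY if word in part)
    return terms


def build_index(titles: list[str]) -> dict[str, set[int]]:
    """Map each index term to the positions of the titles that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for term in index_terms(title):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Look up a query as a single index term, ignoring spaces in the query."""
    return set(index.get("".join(query.split()), set()))


titles = ["국어문법론 연구"]  # hypothetical record
index = build_index(titles)
print(sorted(index_terms(titles[0])))
print(search(index, "문법론"), search(index, "어문"))
```

In this sketch the dictionary contributes the three-character term 문법론, which bi-grams alone would not produce, while the bi-gram 어문 straddles a word boundary and is retrievable even though it is not a meaningful unit of the title.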

WORD DIVISION OPTIONS

Based on our study so far, the role of spaces in Korean script searching
on OPAC systems in North America and Korea appears to depend largely on
individual libraries’ configuration of indexing mechanisms, which seems to
vary. For librarians who deal with Korean materials on a daily basis in
today’s networked global environment, this is a highly serious issue.
Lacking a standard or a common shared practice, their work of
bibliographic control and user service becomes unnecessarily complicated. It
is an area crying out for further study and improvement.
TABLE 6 Search Results using Variations of Spacing in a Shorter Query String

OPAC System

The National Library of Korea        614   400   114
National Assembly Library            102   102   102
Seoul National University Library    161   161   161
Yonsei University Library            218   141   6

In an attempt to form a consensus among Korean studies librarians
and make a recommendation to the Library of Congress, the Committee
on Korean Materials of the Council on East Asian Libraries organized a
discussion session on spacing issues in Korean script fields in 2008. At this
meeting, four options were considered and discussed. Table 7 summarizes
possible implications for each of these four options. Please note that this
discussion applies only to the systems where spaces affect searching.
The majority of the Korean Studies librarians present at the meeting
voted for the “As in Resource” option. If the major goal of the catalog records
is to help the information seekers to find the appropriate resource, and
bibliographic matching is fostered by predictability based on predetermined
rules, then this option is the most impractical one, in that users do not “see”
the resource when they search for known or unknown items in the catalog.

Therefore, they have no way of knowing the spacing variations used in the
resource. OCLC and the Library of Congress continue to maintain their own
practices, respectively, so the vote has had no impact as of this writing.

RECOMMENDATIONS

We have demonstrated the need for a common standard for Korean scripts
in North American catalogs. While achieving this goal may take time, we
propose the following temporary solutions that are possible to implement
locally.

1. Use both n-gram and morphological indexing in OPACs. In so far as it is
a local option, we recommend character-by-character (n-gram) indexing
for CJK scripts so as to increase recall, supplemented by morphological
indexing. As discussed earlier, the n-gram approach and the morphologi-
cal approach have their own advantages and disadvantages, respectively.
Implementing both approaches will mitigate the shortcomings of each and
ensure the best performance.
2. Adopt good ranking systems in OPACs. The Library of Congress’s OPAC
is a good example of a library system that incorporates effective ranking.
The Google search engine’s well-crafted ranking system works for Korean
Web documents as well. These examples show that the technology for
good ranking systems is available, and we only need to adopt this existing
technology for our local OPAC systems.
3. Implement active education/communication programs. As we pointed out,
the average Korean library user in North America is ignorant of issues
related to romanization, and needs to be enlightened through active library
instruction programs. The systems staff and the administrators within
individual libraries need to be aware of these issues as well, so that they
can make appropriate decisions and/or demands to the vendors. The vendors
need to be aware of these problems to improve their products.
4. Provide cross references between Chinese and Korean characters locally.
Even though it is beyond the scope of this article, we include this
recommendation because it is directly related to the retrieval of Korean records.
As shown in our test searches, the lack of cross reference between Chinese
and Korean characters makes a significant number of Korean scholarly
works that use some Chinese characters hard to find. While it is not
impossible for the most dedicated and conscientious library experts to find them,
in reality those works are as good as lost bibliographically. While
waiting for OCLC and local systems to implement cross references between
the two sets of characters, we recommend that catalogers manually create
added access with Korean Hangul to the significant fields that contain
Hancha.

TABLE 7 Advantages and Disadvantages of Spacing Options in Korean Bibliographic Records

Option: ALA/LC/RLIN
  Advantages
  • Easy for catalogers to apply when creating records: just follow the
    same spacing in the romanized fields
  • Predictable for both librarians and users
  Disadvantages
  • Not user friendly; the average user finds the rules unfamiliar and
    difficult to learn
  • Mediated search by a reference librarian often necessary
  • Look and feel unnatural to average users (i.e., native speakers)
  • Inconsistent with OCLC spacing practice for Chinese and Japanese, which
    may mislead the unsuspecting CJ colleagues. The survey responses
    confirm this concern.

Option: No Space/OCLC
  Advantages
  • Easy for catalogers to apply when creating records
  • Easy for everyone to search (only one “rule” to learn and to remember)
  • Predictable for everyone
  • Consistent with Chinese and Japanese practice
  Disadvantages
  • Look unnatural
  • Hard to read

Option: As in Resource
  Advantages
  • Easy for catalogers to apply when creating records: just copy the
    resource
  Disadvantages
  • Paying attention to spaces as well as words when creating records may
    become an added chore for catalogers
  • Unpredictable, since publishers often ignore word division rules when
    designing title page, cover, colophon, spine (descriptive sources)
  • Hard to search since the users do not see the resource when they
    search for it on library catalogs
  • Totally dependent on the whims of publishers and users; systematic
    recall cannot be expected
  • Requires superior indexing mechanisms for this method to work, over
    which we have little control
  • Inconsistent with OCLC spacing practice for Chinese and Japanese, which
    can mislead the unsuspecting CJ colleagues

Option: Korean Word Division Rules
  Advantages
  • Rules are well established based on scientific principles
  • Widely accepted
  • Look and feel natural
  • Many users are already familiar with the rules
  Disadvantages
  • For those unfamiliar with the rules, there is a learning curve
  • Inconsistent with OCLC spacing practice for Chinese and Japanese, which
    can mislead the unsuspecting CJ colleagues
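
Recommendation 4 can be pictured with a small Python sketch that derives a Hangul access point from a field containing Hancha. The four-entry reading table and the function name are hypothetical and cover only the test query used in this study; a working tool would need a complete Hancha reading dictionary and would have to handle characters that have more than one Korean reading, which a simple per-character table cannot do.

```python
# Hypothetical, hand-made reading table (assumption: one reading per character).
HANCHA_TO_HANGUL = {
    "國": "국",
    "語": "어",
    "文": "문",
    "法": "법",
}


def hangul_access_point(field: str) -> str:
    """Replace each known Hancha character with its Hangul reading.

    Characters not in the table (Hangul, spaces, punctuation) pass through unchanged.
    """
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in field)


# A title field written in Hancha gains a Hangul form that Hangul queries can match.
print(hangul_access_point("國語 文法"))
```

A cataloger, or a batch process run locally, could store the derived string as an added access field alongside the original Hancha field, so that a Hangul query retrieves both forms.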

CONCLUSION

We demonstrated two different ways of indexing Korean scripts, with or
without spaces, and the impact of spaces (or lack thereof) in various OPAC
systems in North America and Korea. From the end-user perspective, we
conclude that the systems that use single character indexing, where spaces
had no impact on retrieval, return far superior search results to those that
use morphological (i.e., word) indexing.
The average library user does not know the difference between “char-
acter by character searching” and “word by word searching.” The retrieval
process should be simple, easy, and accommodate all possibilities in order
to support a wide variety of users, needs, and materials. Library retrieval
systems should support not only the most basic searches but also the highly
sophisticated searches performed by experts. It is the job of catalogers and
system builders to create an environment where users do not have to be
concerned about these technical issues. We hope this article has provided
some practical information that can be used in working toward that goal.

NOTES

1. Eugene Wu, “CEAL at the Dawn of the 21st Century,” JEAL no. 121 (June 2000),
http://contentdm.lib.byu.edu/u?/EastAsianLibraries,152
2. Council on East Asian Libraries Statistics, CEAL Annual Statistics from All Institu-
tions, Year 1950 to 2008 (every 3 years), http://www.lib.ku.edu/ceal/quickview.asp?view=all yearly&
step=3&tblview=1&from=1950&to=2008
3. “WorldCat Records by Language,”
http://www.oclc.org/us/en/worldcat/statistics/charts/languagecloud.htm
4. http://www.loc.gov/catdir/cpso/romanization/korean.pdf
5. “2008 CKM Annual Meeting Agenda,” http://www.eastasianlib.org/ckm/meetings/p2008.html.
6. J. J. Lee, H. Y. Cho, and H. R. Park, “N-gram-based Indexing for Korean Text Retrieval,”
Information Processing & Management 35, no. 4 (1999): 427–441.

7. J. Savoy, “Comparative Study of Monolingual and Multilingual Search Methods for Use with
Asian Languages,” ACM Transactions on Asian Language Information Processing 4, no. 2 (2005): 163–
189.
8. J. J. Lee, H. Y. Cho, and H. R. Park, “N-gram-based Indexing for Korean Text Retrieval”; J. H.
Lee and J. S. Ahn, “Using n-grams for Korean Text Retrieval.” Proceedings of the 19th Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich: Switzerland (1996):
216–224.
9. R. W. Sproat, Morphology and Computation (Cambridge, Mass.: MIT Press, 1992); K. R. Beesley
and L. Karttunen, Finite State Morphology (Stanford, Calif.: CSLI Publications, 2003).
10. Seung-Shik Kang and Yung Taek Kim, “Syllable-based Model for the Korean Morphology.”
Proceedings of the 15th Conference on Computational Linguistics, Kyoto, Japan, August 5–9, 1994; Deok-
Bong Kim et al., “A Two-level Morphological Analysis of Korean.” Proceedings of the 15th Conference on
Computational Linguistics, Kyoto, Japan, August 5–9, 1994; Hyun S. Park, “Integrating Phrase Structure
Grammar Rules with Spelling Rules for Morphological Analysis of Korean.” Proceedings of the 18th
International Conference on Computer Processing of Oriental Languages (1999): 485–490.
11. K. S. Shim and J. H. Yang, “MACH: A Supersonic Korean Morphological Analyzer.” Proceedings
of the 19th International Conference on Computational Linguistics. Taipei: Taiwan (2002): 1–7.
12. Wooseob Jeong, “A Pilot Study of OCLC CJK Plus as OPAC.” Library & Information Science
Research 20, no. 3 (1998): 271–292.
13. J.-R. Park, “Information Retrieval of Korean Materials Using the CJK Bibliographic System:
Issues and Problems.” Proceedings of the Second KSAA Biennial Conference: Korean Studies at the Dawn
of the Millennium. Australasia: Korean Studies Association (2001): 245–255.
14. S. Kim, “Romanization in Cataloging of Korean Materials,” Cataloging & Classification Quarterly
43, no. 2 (2006): 53–76.
15. Changing a long-standing romanization standard such as the MR system is an extremely serious
proposition and must not be taken lightly. We are reminded that the transition from Wade-Giles to Pinyin
for Chinese took more than three decades for LC to prepare and complete, not to mention the great
burden the rest of the North American libraries had to bear in converting millions of their own records
combined. The current Korean government standard is still new and does not address the many issues
related to managing bibliographic records in North American libraries. In addition, since South Korea
has changed its romanization systems several times in the past decades, the overseas Korean studies
community is wary of the current system’s long-term viability. For these reasons, we are not in support
of changing romanization rules at this time.
16. Wooseob Jeong, “A Pilot Study of OCLC CJK Plus as OPAC,” Library & Information Science
Research 20, no. 3 (1998): 271–292; J.-R. Park, “Information Retrieval of Korean Materials Using the
CJK Bibliographic System: Issues and Problems.” Proceedings of the Second KSAA Biennial Conference:
Korean Studies at the Dawn of the Millennium, Australasia: Korean Studies Association (2001): 245–255;
Kim, “Romanization in Cataloging of Korean Materials.”
17. Huang Jie and Kathleen J. M. Haynes, “The Issue of Word Division in Cataloging Chinese
Language Titles,” Cataloging & Classification Quarterly 38, no. 1 (2004): 27–42.
18. Kim, “Romanization in Cataloging of Korean Materials.”

APPENDIX
Survey on Character Searching (Hangul, Hancha) in Local OPACs
1. Your name (optional):
2. Institution:
3. Your OPAC vendor
4. Does your OPAC support Hangul/Hancha searching?
No (Please skip to item 10 at the end of the survey)
Yes Since (month/year)
5. What is your favorite way of searching?
By romanization By characters No preference
Why?

6. Over the past three months, I have searched in my OPAC by:
Romanization: approximately % of the time
Hangul: approximately % of the time
Hancha: approximately % of the time
7. For the following question, just select one statement based on what you
know or think AS OF NOW. Please do NOT perform research to find the
right answer.
In my OPAC, spaces in the character fields make a difference in
retrieval
In my OPAC, spaces in the character fields do NOT make a difference
I don’t know if spaces make a difference or not: my guess is they
do
I don’t know if spaces make a difference or not: my guess is they
don’t
I don’t know if spaces make a difference or not: I can’t guess
8. If your response to number 7 above was any of the last three, please
perform a few Hangul searches, both with and without spaces between
words, in your OPAC, & answer this.
I learned that spaces in the character fields affect retrieval
I learned that spaces in the character fields do NOT make a difference
9. My library USERS search in the OPAC mostly by:
Romanization
Characters
Both equally
I don’t know
10. Thank you for your time! If you want me to share the result with you,
please give me your name and email below: