OIR
29,2
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1468-4527.htm
SAVVY SEARCHING
208
Peter Jacso
University of Hawaii, Manoa, Hawaii, USA
Abstract
The launch of Google Scholar not surprisingly drew much attention and praise,
although not necessarily for the right reasons, both from the popular and the
professional media. Google makes it very easy and free to find scholarly information
about any topic, an important service for those who do not have access to the most
appropriate fee-based indexing/abstracting databases, which traditionally have helped
in information discovery. Google Scholar goes beyond information discovery by
leading qualifying users at subscribing libraries to the primary digital documents, and
any users to the millions of open access (free) primary documents offered through
mega-databases of preprint and reprint servers, as well as to the full-text digital
collections of several government agencies and research organizations. Google also
deserves credit for introducing, albeit a bit belatedly, advanced options to refine the
search process. On the negative side, the most important problem is that the crawlers
of Google Scholar have not indexed millions of articles, even though they were let into
the digital archives of most of the largest academic publishers and preprint servers and
repositories. The stunning gaps give a false impression of the scholarly coverage of
topics and lead to the omission of highly relevant articles by those who need more than
just a few pertinent research documents. The rather enigmatic presentation of the
results befuddles many users, and the lack of any sort options frustrates savvy
searchers.
Pros
Google deserves credit for taking the first step in providing a tool for discovering
scholarly information, even though access may be limited to very short bibliographic
citations of articles and conference papers in a sizable segment of Google Scholar. Still,
many of the records have informative snippets from the full text for any users, and
many also offer the abstracts of the articles for anyone. This free service alone is
Cons
The underlying problem with Google Scholar is that Google is as secretive about its
coverage as the North Korean government about the famine in the country. There is no
information about the publishers whose archives Google is allowed to search, let alone
about the specific journals and the host sites covered by Google Scholar.
Google Scholar: the pros and the cons
Figure 1. Field-specific indexes for search criteria
Figure 2. The hit ratio between the native search engine and Google Scholar is 8:1
(NPG) yields a single record, while the native search engine finds eight items, all of
them relevant.
Searching for the words "tsunami warning" in the full text using the two alternative
tools shows an even bigger discrepancy: 17 items retrieved by the native search engine
and only one by Google Scholar (see Figure 2).
For verification, the 16 additional hits were searched using the allintitle option in
Google Scholar without any site limitations to see if the records may be available
through other sources. One was found in ADS, two in the archive of the National
Institutes of Health (NIH). For one article, only a minimalist record labeled
[CITATION] was found. This follow-up known-item search still resulted in an abysmal
hit rate in Google Scholar (see Figure 3).
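The two kinds of test query described above can be sketched in a few lines of Python. This is only an illustration: the helper names `scholar_query` and `scholar_url` are hypothetical, the base URL is an assumption about the public search endpoint, and the use of `site:nature.com` as the limiter is inferred from the host name given in the text rather than quoted from the original searches.

```python
from urllib.parse import urlencode

# Assumed public endpoint; not stated in the article.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_query(phrase, site=None, title_only=False):
    """Build a query string for an exact phrase.

    title_only=True prefixes the allintitle operator (known-item search);
    site adds a site limiter such as nature.com.
    """
    quoted = f'"{phrase}"'
    terms = f"allintitle: {quoted}" if title_only else quoted
    if site:
        terms += f" site:{site}"
    return terms


def scholar_url(query):
    """Return a search URL with the query percent-encoded."""
    return f"{SCHOLAR_BASE}?{urlencode({'q': query})}"


# Exact-phrase search limited to one publisher's site.
limited = scholar_query("tsunami warning", site="nature.com")
# Known-item follow-up search on a title, without any site limitation.
known_item = scholar_query("tsunami warning", title_only=True)

print(limited)
print(known_item)
print(scholar_url(limited))
```

Running the same phrase once with and once without the site limiter mirrors the verification step: a record missing from the publisher's site may still surface through another source such as ADS.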
Figure 3. The ratio is far worse for full-text searching
Apparently Google did not fully index the current eight-year collection, let alone the
archive of Nature (which includes all the issues from 1987 to 1996) and the other 64
journals of NPG, all of which are hosted on the nature.com site. The native search
engine at the NPG site finds nearly 87,000 records for items published in Nature alone
between January 1987 and December 2004; Google Scholar finds only 13,700 records
from the entire nature.com site. Using Google Scholar to search for the exact phrase
"tsunami warning" in Nature retrieves one hit for a 1993 article from the ADS
database of Harvard and none from Nature's archive (see Figure 4). The native search
engine finds seven matching records.
The ADS record provides a link to the Nature archive, but the otherwise excellent
ADS collection is not a substitute for Nature's 18 years of digital coverage at its home
site. In addition, the test searches revealed that Google Scholar also has puny
coverage of ADS itself. The native search engine of the ADS database finds 32 records
with the exact phrase "tsunami warning" in the abstract, while Google Scholar
retrieves a mere nine records for the same query from the ADS database. It is quite
telling about the shallowness of coverage that Google finds only 268,600 of the more
than 4.1 million indexing/abstracting records in ADS.
A similar pattern is found when searching the ten-year archive of Science magazine
(October 1995-December 2004) by the native search engine (nearly 40,000 records)
versus Google Scholar with the site:sciencemag.org parameter (11,800 records). For the
exact phrase "tsunami warning" Google Scholar retrieved a single record with this site
limiter parameter, while the native search engine found three articles. Although Google
Scholar does not always reproduce the same number of hits even when a search is repeated
within one-hour intervals, these hit figures did show up consistently (see Figure 5). This was
not the case when searching the site of the Proceedings of the National Academy of
Sciences (PNAS), which retrieved 12,900 hits in late November, but 300 fewer records
on 1 January 2005. I can only speculate that Google dynamically assigns the server to
answer the query and the query-servers may not mirror their content exactly. The
above three periodicals are among the most cited and most respected scholarly journals
in their respective fields. If Google Scholar finds only 10-30 per cent of the records
which are available through the sophisticated yet intuitive native search
engines, users will remain unaware of many potentially important articles.
Figure 4. A record for a Nature article retrieved only from the ADS database
Figure 5. Search results from ADS through its native search engine and Google Scholar
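The coverage figures quoted in this section can be checked with simple arithmetic. The sketch below takes the record counts given in the text and expresses the Google Scholar count as a percentage of the native search engine count. The native counts are the rounded values reported above ("nearly 87,000", "nearly 40,000", "more than 4.1 million"), so the results are approximate. The Nature figure is, if anything, generous to Google Scholar, because its 13,700 records come from the entire nature.com site while the native count is for Nature alone.

```python
# (Google Scholar records, native search engine records), as quoted in the text.
counts = {
    "Nature (nature.com)": (13_700, 87_000),
    "Science (sciencemag.org)": (11_800, 40_000),
    "ADS": (268_600, 4_100_000),
}


def coverage_percent(scholar_records, native_records):
    """Google Scholar records as a percentage of the native record count."""
    return 100.0 * scholar_records / native_records


coverage = {
    name: round(coverage_percent(scholar, native), 1)
    for name, (scholar, native) in counts.items()
}

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
```

On these inputs the two journals come out at roughly 16 and 30 per cent, in line with the 10-30 per cent range mentioned above, and ADS at under 7 per cent.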
These days, when scientists, administrators, politicians and financial experts need
to find comprehensive and high-quality scholarly information about the state of the art
in tsunami warning systems to implement a feasible solution for the devastated Indian
Ocean region, many will turn to Google Scholar and discover only a fragment of the
scholarly literature. They will also miss out on many scholarly papers which are open
access (sometimes after an embargo of 3-12 months), as is the case with the
poorly covered papers published in the top-cited PNAS. The fact that Google Scholar is
in beta version is not a good excuse for the massive content omissions. Google has used
the beta label as a shield for some of its services for over two years after their launch.
Hopefully, Google Scholar will come out of beta in a much shorter time, disclose the
sources covered and fill the gaps to provide an excellent free tool for scholarly
information discovery and retrieval.