
COMMENTARY

How many scientific papers are not original?


Michael Lesk¹
Department of Library and Information Science, Rutgers University, New Brunswick, NJ 08901

Is plagiarism afflicting science? In PNAS, Citron and Ginsparg (1) count the number of authors who are submitting articles containing text already appearing elsewhere. They report disturbing numbers of authors resorting to copying, particularly in some countries where 15% of submissions are detected as containing duplicated material. I am on the editorial board of an Institute of Electrical and Electronics Engineers (IEEE) magazine, which also finds it useful to run all of the submissions through a plagiarism filter. What can be done about this?

In 1830, Charles Babbage deplored unreliable science. He discussed hoaxes, forgeries, data trimming, and “cooking” (selecting data to match a theory) (2). Today, doubtful papers may be plagiarized, invented, or mistaken. Citron and Ginsparg's paper documents problems at one extreme: straightforward plagiarism within one publisher. More complex deceptions can be found at the site retractionwatch.com, which includes, among other examples, invented or fraudulent data. Mistaken research was highlighted in an important study by Begley and Ellis, who found that it was impossible to replicate 47 of 53 oncology studies that they attempted to repeat (3). At a time when important scientific questions are under attack, we need to improve confidence in our publications.

How can we increase our level of trust in the scientific literature? In 2012, more than 2 million papers were published (4). They appear in publications ranging from highly competitive and prestigious journals such as Nature, Science, Lancet, and this journal, down to the predatory publishers listed at scholarlyoa.com, which will print pretty much anything for a fee. University faculty, in particular, are encouraged to publish because the reward systems often depend on publication and citation counts as ways of evaluating merit. The h-index is the modern equivalent of the old saying “Deans can’t read, they can only count.” In some countries, having a paper accepted in a top journal can mean a cash bonus, with Zhejiang University offering a $30,000 payment to an author who publishes in Science or Nature (5, 6).

Given the incentives, it is hardly surprising that some authors are attempting to exploit the system. This can be surprisingly easy. Delgado et al. (7) explain how they created a half-dozen fake papers, with several hundred citations. One of the authors saw his citation count go up by a factor of 2 and his h-index increase from 10 to 15. Fans of bicycle racing may smile on reading that the fake papers were attributed to Alberto Pantani-Contador.

Refereeing, at least for some journals, is pretty shaky. As cited by Citron and Ginsparg, Bohannon (8) submitted a fake article to more than 300 open access journals, and more than half accepted it. Following up, he found that one of these journals had plagiarized its own description from a reputable journal in the same subject area. The scholarlyoa.com site attempts to catalog the doubtful publishers and their journals.

Much more common than completely fake papers is the boosting of publication count by dividing one’s reports into multiple short papers, an idea that has been called the “least publishable unit” since the 1970s. Some publishers or conference organizers join in the manipulation. Wilhite and Fong describe an editor who asked prospective authors to add citations to his journal to their articles to increase the impact factor of the journal (9).

Consequences

Deception and mistake can have real consequences outside of science. For decades, the UK educational system emphasized the “11-plus” examination, justified by a belief in the inheritability of intelligence that came from measurements by Sir Cyril Burt. Burt had studied what seemed to be a remarkable number of identical twins raised apart. His data were challenged soon after his death as too good to be true; the original notes were gone, and his coworkers could not be found. Although there has been argument back and forth, even his supporters have been defending him by saying he was careless rather than fraudulent and that other people studying genetics and intelligence have found about the same level of correlation (10).

More recently, two economists, Carmen Reinhart and Kenneth Rogoff, published a claim that economic growth slowed in countries whose national debt exceeded 90% of gross domestic product. After 2 years, they gave their spreadsheet to researchers at the University of Massachusetts, who found several errors; for example, the first few countries in alphabetical order had been left out of the calculation. A corrected spreadsheet did not show the same abrupt slowdown in growth, but the original paper had already been used to justify a change to budget-balancing policies in major economies (11).

Returning to the simpler problem of plagiarism, we find that it can extend beyond individual papers. In 2009, a conference in Hainan, China, called itself the “International Joint Conference on Artificial Intelligence.” That name is very familiar to artificial intelligence researchers as the title of a major conference held regularly since 1969. However, the conference with the long history met in Pasadena in 2009; the Hainan conference just borrowed the name. Perhaps it is not surprising that the Hainan conference included several papers that had come from the SCIgen chatterbot or some similar program. Here is a sentence from one abstract (since removed from IEEE Xplore): “Furthermore, it explored a pervasive tool for enabling pasteurization, which is used to show that context-free grammar and B-trees are largely compatible.” Chatterbot output can now be detected automatically (12), and publishers find themselves, regrettably, forced to use such software, as well as anti-copying utilities.

Plagiarism would matter less if counting articles were less significant than understanding them. ArXiv at least does not claim to referee submissions; anyone using it knows that they have to read and evaluate the content for themselves. This, of course, transfers the burden of judgment from a small number of referees to the much larger number of potential readers. In addition, many of those readers may be students, or in a different discipline, and be less able to evaluate a paper. This is why we have the current publication system, but it is being abused by researchers who know that for some purposes, the main question being asked of a candidate for hiring or promotion is “how many articles?”

Mere number of publications is not what is really important. When challenged as a “half-wit,” the Roman emperor Claudius, at least in the British Broadcasting Corporation version of his life, replied that it is quality rather than quantity of wits that matters (13). Similarly, the National Science Foundation asks those who submit proposals to list five important and relevant papers and not to attempt to drown the referees in dozens (or hundreds) of articles.

Fortunately, one bright spot in the Citron and Ginsparg paper is that plagiarism is concentrated: they note that a small number of authors produce a disproportionate share of the doubtful submissions. In addition, those articles are not the heavily cited ones, suggesting that they have less influence. Also, there are many important countries where the plagiarism rate is low. However, the methodology of the paper relies on exact text overlap; it will not detect, for example, an article translated from another language, nor one which paraphrases but adds nothing to its source.

Possible Actions

What can we do? Citron and Ginsparg's paper observes a strong cultural connection with plagiarism: there are some countries in which 15% of the submissions to arXiv are plagiarized, and others in which very few papers are copying from others. Can the scientific community, with some combination of carrots and sticks, encourage the institutions in all countries to enforce standards? There are very few individual scientists today, and approaching the institutions might be the best way to effect a change in attitude.

For example, recently I received a request from someone in Asia who wanted to be a postdoctoral researcher in our department in the United States. I took the first two paragraphs of his research statement and found them on a commercial website of a US company. Should I have told this to the head of his institution? Right now, we don’t do that, partly out of politeness and partly out of fear of lawsuits. However, when Citron and Ginsparg write that some of the people whose plagiarism is detected reply by asking to be told which parts were found to be copied, presumably to learn how to evade detection in the future, one despairs.

For experimental studies, the move to requiring data availability will be a step forward. If an author did not actually write the paper under discussion, presumably that author does not have the data behind it. The data can be copied as well, but that offers another chance for automated tools to spot the duplication, and one where paraphrasing is more complicated.

ArXiv is trying to motivate authors by flagging papers that contain overlap. Readers are then on notice that the paper has a problem; unfortunately, authors do not necessarily react with shame or withdrawal. Some ignore the flag, and some say that what they are doing is acceptable practice. These responses suggest that some additional response is needed (although Citron and Ginsparg do not say how many authors respond to the warning in which way).

Nature published a discussion on plagiarism 2 years ago, and in it, Zhang and McIntosh suggested keeping a blacklist of individuals (14). They note that this should be a multipublisher effort and that it is unclear who would run it or pay for it (14). I would suggest one further step: identify departments, and perhaps institutions, where the problems are arising. Publishers should suggest that they will blacklist the entire department (or, if need be, the institution). Intermediate forms of punishment are possible, such as delaying publication rather than denying it entirely.

In summary, Citron and Ginsparg's paper describes the scope of plagiarism within arXiv. The good news is that the tools used to detect plagiarism work effectively and efficiently, the copied papers are concentrated by author and by country, and the copied papers are less cited. The bad news is that the problem is real and in some countries severe. ArXiv is now identifying the papers that have substantial overlap and is waiting to see if that affects the submissions. Perhaps the publishing community as a whole should be preparing to see if stronger steps are needed.

Author contributions: M.L. wrote the paper.
The author declares no conflict of interest.
See companion article on page 25.
¹Email: lesk@acm.org.

1 Citron DT, Ginsparg P (2015) Patterns of text reuse in a scientific corpus. Proc Natl Acad Sci USA 112:25–30.
2 Babbage C (1830) Reflections on the Decline of Science in England, and on Some of Its Causes (B. Fellowes, London).
3 Begley CG, Ellis LM (2012) Drug development: Raise standards for preclinical cancer research. Nature 483(7391):531–533.
4 Reich ES (2013) Science publishing: The golden club. Nature 502(7471):291–293.
5 Davis P (2011) Paying for impact: Does the Chinese model make sense? Available at: scholarlykitchen.sspnet.org/2011/04/07/paying-for-impact-does-the-chinese-model-make-sense/. Accessed November 27, 2014.
6 Shao J, Shen H (2011) The outflow of scientific papers from China: Why is it happening and can it be stemmed? Learn Publ 24(2):95–97.
7 Lopez-Cozar E, Robinson-Garcia N, Torres-Salinas D (2013) Manipulating Google Scholar citations and Google Scholar metrics: Simple, easy and tempting. Available at: arxiv.org/abs/1212.0638. Accessed November 27, 2014.
8 Bohannon J (2013) Who’s afraid of peer review? Science 342(6154):60–65.
9 Wilhite AW, Fong EA (2012) Scientific publications. Coercive citation in academic publishing. Science 335(6068):542–543.
10 Plucker JA, Esping A, eds. (2014) The Cyril Burt affair. Human Intelligence: Historical Influences, Current Controversies, Teaching Resources. Available at: www.intelltheory.com. Accessed November 23, 2014.
11 Krugman P (2013) How the case for austerity has crumbled. The New York Review of Books. Available at: www.nybooks.com/articles/archives/2013/jun/06/how-case-austerity-has-crumbled. Accessed November 24, 2014.
12 Labbé C, Labbé D (2013) Duplicate and fake publications in the scientific literature: How many SCIgen papers in computer science? Scientometrics 94(1):379–396.
13 Pullman J (1976) I, Claudius [television production], director Wise H (British Broadcasting Corporation).
14 Zhang Y, McIntosh I (2012) How to stop plagiarism: Blacklist repeat offenders. Nature 481(7379):22.

6–7 | PNAS | January 6, 2015 | vol. 112 | no. 1 | www.pnas.org/cgi/doi/10.1073/pnas.1422282112

