

An Easy-to-Use Clinical Text De-identification Tool for Clinical Scientists: NLM Scrubber

Poster · November 2015


DOI: 10.13140/RG.2.2.13587.37921





Mehmet Kayaalp, MD, PhD, Allen C. Browne, MS,
Zeyno A. Dodd, PhD, Pamela Sagan, RN, Clement J. McDonald, MD
Lister Hill National Center for Biomedical Communications,
U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD

Abstract
The Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of
personally identifying information prior to their secondary use for clinical research. We have been studying clinical
text de-identification for more than a decade and developing NLM Scrubber, a tool for every clinical scientist
who conducts retrospective research using clinical reports. Although we continuously improve it and add new
functionality, it remains very simple to install and use.
1. Introduction
The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents
be stripped of personally identifying information before they can be released to researchers and others; however,
manual clinical text de-identification is an arduous task. Furthermore, human annotators alone are usually not as
accurate as automatic clinical text de-identification systems. Even though no automatic de-identifier is perfect, such
systems can quickly produce de-identified text, which the data providers can then easily review and verify for
de-identification accuracy. If and when the de-identified text needs revisions, the necessary editing is usually minimal.
2. NLM Scrubber
The major downside of commercial de-identifiers is their cost, which many clinical scientists may be unable
to afford. Most other de-identifiers found in the literature have been developed for research purposes only and are not
available. Besides costing the user nothing, the major advantage of the few freely available de-identifiers is that they
can be easily tested, evaluated, and verified by independent third parties. The freely available de-identifiers can be
further divided into two categories depending on whether they require training data. De-identifiers that require training
data impose a significant burden on their users, demanding a large set of clinical documents annotated in compliance
with their prerequisite format.
NLM Scrubber is a freely available automatic clinical text de-identification tool, fully supported by its developers.
Furthermore, it does not impose any annotation requirement on clinical scientists; i.e., no text needs to be annotated
to run the application and produce de-identified clinical reports. Although we continuously add sophisticated
functionality to NLM Scrubber, we strive to keep the user interface as simple as possible so that novice users can
operate it easily.
NLM Scrubber is a product of several years of studies on clinical text de-identification [1, 2]. We recently rewrote NLM
Scrubber, converting it from a pure research product to a consumer product. The user needs to fill out a short form
stating mainly where the text files are located on the computer. At present, it accepts only ASCII text
reports formatted with proper capitalization; i.e., it would not perform well on all-lowercase or all-uppercase text.
The system is available on three platforms: Windows, Linux, and Mac OS X. It can be downloaded from
http://scrubber.nlm.nih.gov.
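Because the input requirements above (ASCII text with proper capitalization) are easy to overlook, a user may wish to screen a folder of reports before running the tool. The following is a minimal sketch of such a pre-check, not part of NLM Scrubber itself; the function names `check_report` and `check_directory`, the `*.txt` file pattern, and the specific checks are our own illustrative assumptions.

```python
from __future__ import annotations

from pathlib import Path


def check_report(text: str) -> list[str]:
    """Return reasons a report may be unsuitable as input (hypothetical helper)."""
    problems = []
    # The tool is described as accepting only ASCII text.
    if not text.isascii():
        problems.append("contains non-ASCII characters")
    # It is also described as performing poorly on single-case text.
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        problems.append("all uppercase")
    elif letters and all(c.islower() for c in letters):
        problems.append("all lowercase")
    return problems


def check_directory(folder: str, pattern: str = "*.txt") -> dict[str, list[str]]:
    """Map each problematic file name in `folder` to its list of problems."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Undecodable bytes become U+FFFD, which is then flagged as non-ASCII.
        text = path.read_bytes().decode("utf-8", errors="replace")
        problems = check_report(text)
        if problems:
            results[path.name] = problems
    return results
```

Calling `check_directory` on the folder that holds the reports returns a dictionary of flagged files; an empty dictionary suggests that the reports meet the stated input requirements, although it says nothing about de-identification accuracy.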
Funding and Competing Interests
This work was supported by the Intramural Research Program of the National Institutes of Health, National Library
of Medicine. The first author receives royalties from the University of Pittsburgh for his contribution to a
de-identification project. NLM's Ethics Office reviewed and approved his appointment.
References
1. Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, et al. The Pattern of Name Tokens in
Narrative Clinical Text and a Comparison of Five Systems for Redacting them. J Am Med Inform Assn 2013.
2. Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric
Identifiers in Narrative Clinical Reports. Proceedings of the Annual American Medical Informatics Association
Fall Symposium 2014.


