Bedside NLP
clinical Text Analysis and Knowledge Extraction System (cTAKES)
Document Version: 1.0
1.6
Table of Contents
Document Management
1. Introduction
2. Design
   2.1 cTAKES GUI
      2.1.1 Input
      2.1.2 Output
      2.1.3 The NLP processing pipeline
3. Tables
   3.1 Table
4. Data Objects
5. Data Permission
6. Limitations
DOCUMENT MANAGEMENT
Revision Number   Date       Author        Description of change
1.0               04/12/12   Pei J. Chen   Initial Draft
1. INTRODUCTION
There is a wealth of information within plain-text clinical narratives. The purpose of
this cell is to harness that unstructured information by allowing i2b2 users to query it
and join it with existing i2b2 concepts. Currently, an entire note is commonly stored as a
single row, in the observation_blob field of the i2b2 observation_fact table.
One of the key features of cTAKES is its ability to read plain-text notes, extract concepts
from them, and transform them into structured, normalized information. This cell
integrates cTAKES with i2b2 by formatting the cTAKES output into the i2b2
observation_fact table format (facts, concepts, modifiers, and values), which existing
i2b2 interfaces can then query directly.
There will be two main components:
1. An administrative tool (cTAKES GUI) that allows users to specify the input
DataSource of the note(s), the output destination for the note(s), and the NLP
pipeline to be used. The cTAKES GUI is designed as a web interface, packaged as a
WAR file so that it can be easily deployed to standard servlet containers such as
Tomcat. The configuration information is stored and can be reused in future
experiments.
2. An interface for users to query the extracted data. We plan to reuse the existing
web client by adding an NLP ontology that contains all of the extracted concepts,
which can be used as filters and joined with other ontologies such as
demographics or codified data.
2. DESIGN
2.1 cTAKES GUI
2.1.1 Input
Users will be able to specify the source of the notes. The input is flexible enough that
users can also enter their own custom SQL.
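A minimal sketch of how a note source might be resolved, assuming the GUI stores either a custom query (ds_sql) or a table plus note column (ds_table_name, ds_col_name), as the CTAKES_CONFIG_DATASOURCE table in the Tables section suggests. The class name and the default query shape (the note column plus encounter, patient, provider and date keys) are hypothetical, not the GUI's actual behavior.

```java
/**
 * Hypothetical helper: chooses the SELECT statement used to read notes.
 * Parameter names mirror the ds_sql, ds_table_name and ds_col_name columns
 * of CTAKES_CONFIG_DATASOURCE; the fallback query is an assumption.
 */
class NoteSourceSql {

    /** Returns the user's custom SQL if one was entered, else a default query. */
    static String resolve(String dsSql, String dsTableName, String dsColName) {
        if (dsSql != null && dsSql.trim().length() > 0) {
            return dsSql.trim();  // custom SQL is used as entered (minus surrounding whitespace)
        }
        if (dsTableName == null || dsColName == null) {
            throw new IllegalArgumentException(
                "ds_table_name and ds_col_name are required when ds_sql is empty");
        }
        // Assumed default: the note column plus the keys needed later for observation_fact rows.
        return "SELECT encounter_num, patient_num, provider_id, start_date, "
            + dsColName + " FROM " + dsTableName;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "observation_fact", "observation_blob"));
        System.out.println(resolve("SELECT note_id, note_text FROM my_notes", null, null));
    }
}
```

Because table and column names are concatenated into the statement, this approach only suits trusted administrators, which matches the admin-only role described under Data Permission.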
2.1.2 Output
The output DataSource should also be a relational database (specifically designed to
be the i2b2 observation_fact table itself). However, the UI allows users to specify
exactly which database and table they would like to populate.
Example SQL template:
insert into i2b2_stg_db.dbo.Observation_Fact_NLP
(encounter_num, patient_num, concept_cd, provider_id, start_date, modifier_cd,
valtype_cd, tval_char, nval_num, observation_blob)
values (?,?,?,?,?,?,?,?,?,?)
Note: All of these fields are REQUIRED in order to populate the i2b2 output format
correctly.
Example output row:
Column            Value
patient_num       1189799
concept_cd        SNO:155673008
provider_id       2030
start_date        00:00.0
modifier_cd       @
valtype_cd        T
tval_char         -1
nval_num          NULL
observation_blob  reflux
Note: Currently, we use the tval_char value as the polarity (negation) indicator. In the
future, attributes of identified annotations may be stored as modifiers in separate rows.
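The template and example row above can be illustrated with a small JDBC sketch that turns one identified annotation into the ten parameter values, in column order. The class and method names are hypothetical; the conventions (modifier_cd "@", valtype_cd "T", polarity in tval_char, the annotated text in observation_blob) are taken from the example row, and the null-binding types are a simplifying assumption.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;

/**
 * Sketch: maps one identified annotation to the ten values of the example
 * insert template. Names are hypothetical; value conventions follow the
 * example output row.
 */
class NlpFactParams {

    static final String INSERT_SQL =
        "insert into i2b2_stg_db.dbo.Observation_Fact_NLP"
        + " (encounter_num,patient_num,concept_cd,provider_id,start_date,modifier_cd,"
        + "valtype_cd,tval_char,nval_num,observation_blob)"
        + " values (?,?,?,?,?,?,?,?,?,?)";

    static final int NVAL_NUM_INDEX = 8;  // zero-based position of nval_num

    /** Builds the parameters in the same order as the column list in INSERT_SQL. */
    static Object[] toParams(int encounterNum, int patientNum, String conceptCd,
                             String providerId, Timestamp startDate,
                             int polarity, String coveredText) {
        return new Object[] {
            encounterNum,
            patientNum,
            conceptCd,                // e.g. "SNO:155673008"
            providerId,
            startDate,
            "@",                      // modifier_cd: no modifier
            "T",                      // valtype_cd: text value
            String.valueOf(polarity), // tval_char carries polarity, e.g. "-1"
            null,                     // nval_num: unused
            coveredText               // observation_blob: annotated span only, e.g. "reflux"
        };
    }

    /** Binds the values to a statement prepared from INSERT_SQL (JDBC indexes are 1-based). */
    static void bind(PreparedStatement ps, Object[] params) throws SQLException {
        for (int i = 0; i < params.length; i++) {
            if (params[i] != null) {
                ps.setObject(i + 1, params[i]);
            } else {
                // DECIMAL for nval_num; VARCHAR is a simplification for any other null.
                ps.setNull(i + 1, i == NVAL_NUM_INDEX ? Types.DECIMAL : Types.VARCHAR);
            }
        }
    }
}
```

In use, a writer would call connection.prepareStatement(NlpFactParams.INSERT_SQL) once, then bind and addBatch() per annotation before executeBatch(), so that each note produces one row per identified concept.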
3. TABLES
The user metadata (data describing the user configuration) is stored in a self-contained
relational database (Hypersonic) embedded within the GUI.
3.1 Table
We use the Liquibase tool to manage the DDL of the cTAKES GUI
configuration/metadata tables. The latest version can be found with the source code
under src/main/resources/db/1.xml.
Note: The target i2b2 observation_fact table is not included here, but can be found in
the i2b2 core documentation.
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd">

  <changeSet author="dev" id="1">
    <createTable tableName="CTAKES_USER">
      <column autoIncrement="true" name="id" type="BIGINT">
        <constraints nullable="false" primaryKey="true" primaryKeyName="PK_User" />
      </column>
      <column name="userName" type="varchar(100)">
        <constraints nullable="false" unique="true" />
      </column>
      <column name="name" type="varchar(254)" />
      <column name="firstName" type="varchar(254)" />
      <column name="email" type="varchar(254)">
        <constraints nullable="false" />
      </column>
      <column name="passwordHash" type="varchar(80)" />
      <column name="locale" type="varchar(8)" />
      <column name="enabled" type="BOOLEAN" />
      <column name="createDate" type="DATETIME" />
    </createTable>
  </changeSet>

  <changeSet author="dev" id="2">
    <createTable tableName="CTAKES_ROLE">
      <column autoIncrement="true" name="id" type="BIGINT">
        <constraints nullable="false" primaryKey="true" primaryKeyName="PK_Role" />
      </column>
      <column name="name" type="varchar(50)">
        <constraints nullable="false" />
      </column>
    </createTable>
  </changeSet>

  <changeSet author="dev" id="3">
    <createTable tableName="CTAKES_USERROLES">
      <column name="userId" type="BIGINT">
        <constraints nullable="false" />
      </column>
      <column name="roleId" type="BIGINT">
        <constraints nullable="false" />
      </column>
    </createTable>
    <addPrimaryKey columnNames="userId,roleId"
        constraintName="PK_UserRoles" tableName="CTAKES_USERROLES" />
    <addForeignKeyConstraint baseColumnNames="userId"
        baseTableName="CTAKES_USERROLES" constraintName="FK_UserRoles_User"
        referencedColumnNames="id" referencedTableName="CTAKES_USER" />
    <addForeignKeyConstraint baseColumnNames="roleId"
        baseTableName="CTAKES_USERROLES" constraintName="FK_UserRoles_Role"
        referencedColumnNames="id" referencedTableName="CTAKES_ROLE" />
  </changeSet>

  <changeSet author="dev" id="4">
    <insert tableName="CTAKES_ROLE">
      <column name="name" value="ROLE_ADMIN" />
    </insert>
    <insert tableName="CTAKES_ROLE">
      <column name="name" value="ROLE_USER" />
    </insert>
  </changeSet>

  <changeSet author="dev" id="5">
    <createTable tableName="CTAKES_CONFIG_PARAM">
      <column name="param_name" type="varchar(254)">
        <constraints nullable="false" unique="true" />
      </column>
      <column name="param_value" type="varchar(254)" />
    </createTable>
  </changeSet>

  <changeSet author="dev" id="6">
    <createTable tableName="CTAKES_CONFIG_DATASOURCE">
      <column autoIncrement="true" name="id" type="BIGINT">
        <constraints nullable="false" primaryKey="true" primaryKeyName="PK_datasource_id" />
      </column>
      <column name="name" type="varchar(254)" />
      <column name="description" type="varchar(254)" />
      <column name="ds_type" type="varchar(254)" />
      <column name="ds_driverclass" type="varchar(254)" />
      <column name="ds_url" type="varchar(254)" />
      <column name="ds_col_name" type="varchar(254)" />
      <column name="ds_table_name" type="varchar(254)" />
      <column name="ds_sql" type="varchar(5000)" />
      <!-- Any remaining columns and changeSets are omitted here; see src/main/resources/db/1.xml for the full changelog. -->
    </createTable>
  </changeSet>
</databaseChangeLog>
4. DATA OBJECTS
Data objects in the cTAKES GUI are represented as Java Hibernate entities with DAO
(repository) classes, and are injected by the Spring Framework. These entities and
repositories can be found in the org.chboston.cnlp.ctakes.gui.entity and repository
packages.
The web interface is built on top of an existing JavaScript framework (ExtJS). These
objects are also represented in JavaScript following the MVC pattern. They are
exposed via the ExtDirect services library, which maps JavaScript calls directly to the
backing Java code/methods.
5. DATA PERMISSION
The GUI should not persist the full original note locally. Rather, it should read
through the note, extract the identified annotations, and store only those annotations
(either in its embedded database or back in the i2b2 database).
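A minimal sketch of the "store annotations, not notes" rule, assuming annotations arrive as character offsets plus a concept code. Span, StoredAnnotation and extract are hypothetical stand-ins for the cTAKES annotation types, not the cTAKES API; each stored record keeps only its covered text, which matches the example row where observation_blob holds "reflux" rather than the whole note.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: keeps only the annotated spans of a note. Span is a hypothetical
 * stand-in for a cTAKES annotation (begin/end offsets plus concept code).
 */
class AnnotationOnlyExtractor {

    static final class Span {
        final int begin;
        final int end;
        final String conceptCd;
        Span(int begin, int end, String conceptCd) {
            this.begin = begin;
            this.end = end;
            this.conceptCd = conceptCd;
        }
    }

    static final class StoredAnnotation {
        final String conceptCd;
        final String coveredText;
        StoredAnnotation(String conceptCd, String coveredText) {
            this.conceptCd = conceptCd;
            this.coveredText = coveredText;
        }
    }

    static List<StoredAnnotation> extract(String noteText, List<Span> spans) {
        List<StoredAnnotation> out = new ArrayList<StoredAnnotation>();
        for (Span s : spans) {
            // new String(...) copies just the span; on Java 6, substring() alone
            // shares the whole note's char array and would keep it reachable.
            String covered = new String(noteText.substring(s.begin, s.end));
            out.add(new StoredAnnotation(s.conceptCd, covered));
        }
        return out;  // no returned object references noteText
    }
}
```

For the note "Patient denies reflux today." and a span from offset 15 to 21, the only value kept is "reflux"; the rest of the note is discarded once the method returns.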
The cTAKES GUI is designed to be an administrative tool. The user should be an admin
with read/write rights to both the input and target DataSources.
The cTAKES GUI has built-in encryption support: if the original note stored in the i2b2
observation_blob field is encrypted, a configurable input field in the GUI lets users
specify the key required for decryption (the existing i2b2 Encryption Java APIs are
reused).
Web Client:
5. TECHNOLOGIES USED
Java 6
cTAKES
UIMA
ExtDirect
Spring
Jetty/Servlet Container
Liquibase
6. LIMITATIONS
Currently, the GUI and the NLP processing are bundled and run together, so each
container is limited to a single thread running a single instance of the pipeline. In the
future, these two components will be decoupled: the GUI will only save jobs, and will
offload the NLP pipeline processing to a separate process.
Currently, we populate the polarity (negation) attribute in the tval_char field.
In the future, these attributes may be stored as modifiers.
Currently, we support extracting the concepts defined in the full 2011AB UMLS
release of SNOMED CT and RxNorm (with thesauri from SNOMED CT, NCI
Thesaurus, Medical Subject Headings (MeSH), and RxNorm). There is a
placeholder that will allow users to enter their own dictionaries, but it has not been
implemented yet.
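The first limitation describes a planned split in which the GUI only saves jobs and a separate process runs the pipeline. A minimal in-JVM sketch of that hand-off, assuming a single worker: NlpJobQueue and its methods are hypothetical, and the ExecutorService only stands in for the separate process. A real split would presumably persist jobs somewhere an external worker can read them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the planned decoupling: the GUI submits jobs and returns at once,
 * while one worker thread runs the NLP pipeline work in submission order.
 */
class NlpJobQueue {
    // One worker thread, i.e. one pipeline instance at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues the pipeline work for a job; the Future yields the job id when done. */
    Future<String> submit(String jobId, Runnable pipelineWork) {
        return worker.submit(pipelineWork, jobId);
    }

    /** Stops accepting jobs and waits up to the timeout for queued ones to finish. */
    boolean shutdown(long timeoutSeconds) throws InterruptedException {
        worker.shutdown();
        return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

The executor's thread is non-daemon, so shutdown must be called (for example when the servlet context is destroyed) or the JVM will not exit.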
Note: The cTAKES GUI was designed to output data in the i2b2 format.
However, it can also be run as a stand-alone UI, in which case the data can be
written to another RDBMS or to its embedded database and queried via
standard SQL.