A Time-Dependent, Discrete Logic-Based Computer Model of Mitogenic Neutrophil Signaling Circuitry Derived Using NLP Information Extraction Software

Örebro University
School of Medicine
Degree project, 15 ECTS
June 2016
Author Sebastian Hansen

Supervisor Robert Kruse, PhD
Abstract
Background Molecules in cells operate in chains that facilitate communication.
These intracellular pathways are similar to the digital circuits in computers, and
they can be modeled in this fashion, as so-called logical circuits. Computers
have driven the influx of information in modern biology, via high-throughput
methodology that has been instrumental in characterizing protein pathways.
Computers are ideal tools for creating models of the data volumes they create.
Constructing a logic model of a subset of communication proteins is the topic of
this paper. The present study's aim was to construct a model using data from
protein databases, with the hypothesis that such a model could be verified using
experimental data collected in this study.
Results The model did not output dynamic states when subjected to various
input conditions, but rather exhibited static behavior.
Conclusion One possible reason for the model's stationary nature is a potential
poor performance of the underlying data set. A more careful selection of
proteins from a high-quality pool of proteins could feasibly improve results.
Given the improper model output, gauging the functioning of the model against
an experimental benchmark is not attainable.
Abbreviations
ANOVA: analysis of variance
DE: differential equation
DM: data mining
DMEM: Dulbecco's modified Eagle's medium
EMEM: Eagle's minimal essential medium
FCS: fetal calf serum
fMLP: N-formylmethionyl-leucyl-phenylalanine
Hi-FBS: heat-inactivated fetal bovine serum
IL-1b: interleukin 1 beta
IL-8: interleukin 8 (CXCL8)
IPA: Ingenuity Pathway Analysis
KEGG: Kyoto Encyclopedia of Genes and Genomes
LPS: lipopolysaccharide
MAPK: mitogen-activated protein kinase
MQ: Milli-Q
NaCl: sodium chloride
NLP: natural language processing
PBS: phosphate-buffered saline
PIN: protein interaction network
PIP3: phosphatidylinositol-3,4,5-triphosphate
PPI: protein-protein interaction
PS: phosphatidylserine
RPM: revolutions per minute
RPMI: Roswell Park Memorial Institute medium
Introduction
The cell has elaborate instruments for sensing its external and internal milieu. This allows the
cell to continuously evaluate its environment, and to assess queries such as "what is the
immediate environment's pH?" or "what is the present cytoplasmic calcium level?" [1,2]
Depending on ambient conditions, the cell can take various actions, ranging from simply
continuing its day-to-day-activities in an unchanging environment, to deciding to proliferate
in a nutritionally rich setting or executing a cell death
program under adverse conditions. The cell can also
produce its own signal molecules that enable it to
communicate with other cells [3].
The concepts of reading the environment, processing
environmental data and taking actions based upon
environmental conditions are fundamental principles
of cell signaling [1,2]. In many respects, cell signaling
works like a computer [4]. Computers take data, process information, and execute responses [5].
A computer is made possible in part by logic gates. A logic gate is a device which performs a
logical operation on one or more Boolean input signals to produce a Boolean output signal,
typically used in digital circuits [6]. "Boolean" signifies data that can only have two states:
true (1, "on") or false (0, "off").
Figure 1. The AND gate. (a) The symbol of an AND gate. A and B are the inputs of the gate, X is
the gate output. (b) The truth table for the AND gate. Given the inputs A and B, the output X
will return true (1) only if A and B are true. Any other combination of A and B returns a
false X (0).
In a computer, logic gates take input as voltage, and output voltage levels depending on a set
of rules. For example, the AND gate (fig 1a) will return true (i.e., 1) only if both its input
values are true; for all other combinations of inputs, it will return false (0). Combinations of
inputs and outputs are listed in truth tables (fig 1b). There are also other forms of gates, such
as OR and NOT gates (fig 3).
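As a minimal illustration (a sketch, not part of the study's own code), the AND truth table of fig 1b can be reproduced in Python. The function name and_gate is chosen here for illustration only.

```python
from itertools import product

def and_gate(a: bool, b: bool) -> bool:
    """Return True only when both inputs are True."""
    return a and b

# Enumerate every combination of the two Boolean inputs (A, B)
# and record the gate output X, using 0/1 as in a truth table.
truth_table = [(int(a), int(b), int(and_gate(a, b)))
               for a, b in product([False, True], repeat=2)]

for a, b, x in truth_table:
    print(a, b, x)
```

Only the row with A = 1 and B = 1 yields X = 1, which is the behavior described for the AND gate in the text.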
Logic gates are what allow computers to perform calculations and store information, among
other applications. The idea that cell signaling is comparable to a computer has led to logic
gates being used to model biological systems [7]. The central idea is that, instead of voltages,
gate inputs can be cellular parameters, such as protein-protein interactions, enzyme activity,
gene expression and more. Protein-protein interactions (PPIs) are often represented by graphs
(fig 2). Graphs are extensively used in the branch of molecular biology that seeks to
understand proteins on the basis of their interactions [8-10]. Graphs contain symbols (termed
"nodes") representing proteins, and interactions between these nodes ("edges") [11]. The
number of edges a node has (i.e., the number of connections) is called the node's degree (k)
[10]. Nodes with high degrees are sometimes called "hubs" [10]. The number of edges directed
away from a node is called its out-degree. The number of edges directed towards a node is its
in-degree.
Figure 2. An example protein graph. Nodes are named N0-N4. Edges are titled E1-E3. Nodes
represent proteins; edges represent protein-protein interactions. The number of connections
(edges) a node has is called its degree (k). N1 has k = 1, N2 has k = 3, and N0 has k = 0. N0 is
termed an isolated node.
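The degree definitions above can be made concrete with a short Python sketch. The edge list is hypothetical: it is chosen so that the resulting degrees agree with the values quoted for fig 2 (N1 has k = 1, N2 has k = 3, N0 has k = 0), but the actual edges and their directions in the figure are an assumption.

```python
from collections import Counter

# Hypothetical directed edges (source, target); illustrative only.
edges = [("N1", "N2"), ("N2", "N3"), ("N4", "N2")]
nodes = ["N0", "N1", "N2", "N3", "N4"]

# Out-degree: edges directed away from a node.
out_degree = Counter(src for src, _ in edges)
# In-degree: edges directed towards a node.
in_degree = Counter(dst for _, dst in edges)

# Degree k: total number of edges touching the node.
degree = {n: out_degree[n] + in_degree[n] for n in nodes}

# Isolated nodes are those with k = 0.
isolated = [n for n in nodes if degree[n] == 0]
print(degree, isolated)
```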
The first implementation of logic gate models in cell biology, by Kauffman, described gene
regulation as a series of logic gates [12]. Logic-based formalism has since been used to probe
cell signaling in both health and disease.

Zhang et al. designed a logic model of cytotoxic T lymphocyte (CTL) signaling in T cell large
granular lymphocyte leukemia (T-LGL). Their model was built by collecting data on
physiological CTL-signaling, combined with established signaling abnormalities found in
CTLs involved in T-LGL. The result was a model that, among other results, could predict
the activation of T-bet, a protein that is indeed activated in T-LGL. The authors conclude that
the T-LGL-model could be used for unveiling targets for leukemia therapy [13].
Saez-Rodriguez et al. collected cytokine- and growth factor-related protein data from
literature. The data was used to build a model that was trained (calibrated) to experimental
results. The model identified interactions that were not present in the model's underlying data
set, but were correctly supported in literature [14].
Aldridge et al. used a form of logic modeling known as fuzzy logic in constructing a
model of TNF, EGF and insulin signaling in human colon carcinoma. Fuzzy logic allowed the
group to predict node interactions that were previously unknown [15].
Sahin et al. scrutinized cell signaling in breast cancer. ERBB receptor regulation was studied;
ERBB is important for the regulation of cell cycle progression. From literature-derived
interactions they built a network of proteins important in cell cycle. The model was able to
characterize a putative target in breast cancer therapy [16].
Today, the perception that cellular signaling operates as a series of logic gates is
well-established enough for it to be mentioned in an undergraduate textbook: Alberts et al.
remark in Molecular biology of the cell that "[signal] integration depends in part on
intracellular coincidence detectors, which are equivalent to AND gates in the microprocessor
of a computer." [17]
A node that requires more than one protein present at the same time to be activated can be
modeled as an AND gate (fig 3). In addition to AND gates, there are several other types of
gates: OR gates, NOT gates, and buffers (also known as YES gates). The OR gate is activated
under any true input (1), including when all inputs are present (when all inputs equal 1) at the
same time. When a protein can be activated by any of several different proteins, including all
of them at once, it is modeled as an OR gate (fig 3).
Figure 3. Logic gate types. Several different logic gates. Logic gates take inputs and produce
outputs depending on the type of gate. AND and OR gates are binary (take two inputs each); YES
and NOT gates are unary (take one input each). Inputs are called A (and, for binary gates, also B).
Output is X. Descriptions are free-text explanations that are either from literature (AND gate [17],
OR gate [18,55]), or made up by the author (YES gate, NOT gate).
Proteins that are modeled as OR gates are highly connected (they have a large degree
compared to other nodes). Hubs with the highest degree, and with a lot of connectivity
between different functional modules, are arguably date hubs. Agarwal et al. [18] describe
date hubs as higher-level connectors between groups that perform varying functions, and
for this reason, OR gates are referred to here as group connectors (fig 3).
The basic units of signal transduction are conceivably activation and inhibition. To activate
something is defined as to initiate a process [6]. In the language of logic, this can be
implemented by modeling an activated node as a gate that simply passes along a signal
without changing it. It is posited here that such a signal conduction gate exists: the YES
gate (fig 3).
The inverse of activation is inhibition. To inhibit something is to restrain, check, hinder,
prevent, stop [6]. A node that inhibits another node further downstream thus blocks the
conduction of a signal. Logic-wise, the NOT gate could be implemented as a signal blocker
(fig 3).
It is worth mentioning that, in addition to the gates presented in fig 3, there are three
additional types of logic gates (namely, NOR, XNOR and XOR gates). These gates are
arguably more intricate than those in fig 3, and were not included in the present study since
the aim was to make an elementary model of a part of a signaling pathway, using only the
basic logic gates described in fig 3. The aim of the present study was to construct a
time-dependent, discrete, binary model: time-dependent in the sense that it should be time-variant
(i.e., dynamic), discrete since it should only incorporate time in steps (as opposed to
continuous time), and binary on account of only handling true and false values.
Logic models are one of several possible ways of imitating cell signaling in silico. Morris et
al. [19] define a continuum of cell signaling models, ranging from complex differential
equation (DE) models to the graphical representations of proteins and their relationships in
protein interaction networks (PINs). While the former are biochemically exact and complex,
the latter are simple models that communicate relationships well but do not take
spatiotemporal factors into consideration [19]. A drawback of DE models is also the limited
number of proteins that can be included: contemporary DE models cannot incorporate more
than a few dozen elements [19]. PINs can contain large numbers of proteins, though they do
not allow for the modeling of input and output in networks [19].
A middle ground between DE- and PIN-based modeling is the logic gate model. It is versatile
enough to include many elements, yet retains some input-output capability, similar to that
available with mathematical (DE) models [19].
It is not within the scope of this project to provide in-depth details of different molecular logic
models. For reviews of such models, see Morris et al. [19] and Watterson et al. [7].
Besides the use of logic gates in computer hardware, logic gates are also employed in
programming languages. As a consequence, programming is an ideal environment for
constructing a logic-dependent cell signaling model. A computer model implemented in the
programming language Python is a key piece of the present study.
High-throughput biologic analysis has enabled rapid simultaneous measuring on large
amounts of biomolecules. As a direct result of high-throughput technology, data accumulation
is brisk in contemporary biomedicine. Consequently, extracting, or "mining", data from
large databases efficiently has become possible. Data mining (DM) is, according to the
Oxford English Dictionary [6], "the process or practice of examining large collections of
data in order to generate new information, typically using specialized computer software".
Mining PPI data from biomedical literature databases is an important tool in the elucidation of
molecular pathways. Data mining is applied in such diverse fields as molecular biology [20],
pharmacology [21], genetics [22,23], epidemiology [24], and clinical medicine [20,25]. The
basis for this project's model is PPIs collected using database software. The software
employed utilizes natural language processing (NLP). NLP is defined as a form of
computational linguistics in which natural-language texts are processed by computer (for
automatic machine translation, literary text analysis, etc.) [6]. NLP is in essence (text) data
mining in which a computer can distinguish and extract protein relationships from biomedical
literature [26].
In the present study, several nodes of the RTK-Ras-MAPK-pathway (RRM) are included. Arguably,
the most important ones are Ras, MAPK and p53. Ras and MAPK are parts of the
RTK-Ras-MAPK-pathway, which is highly conserved across species [27]. Consequently, protein data
from different species should be applicable to RRM in human peripheral neutrophil
granulocytes, which is the experimental model system for this project.
Aim
The aim of the present study was to evaluate if a biologic system can be simplified and
correctly modelled with logic gates. There are three essential parts in this degree project: (1)
the collection of data, both from pathway databases and from a laboratory setting, (2) the
usage of pathway data to construct a logical model, and (3) the comparison of the model
output to the experimental output (fig 4). The logic model will hence be constructed using
data mined from databases, and compared to experimentally collected data.
Figure 4: outline of project workflow. Classification blocks represent important project parts.
Layout adapted from Tsafnat et al. [54].
To assess the accuracy of the logic model, the following hypotheses were composed:
H0: there exists no correlation between the logic model and an experimental system when
comparing the logic model to experimental data.
HA: there exists a correlation between model and experimental system.
The main outcome of interest was apoptosis, as detected via annexin V and measured by
fluorescence intensity; protein levels (phosphatidylserine) served as a marker of
apoptosis. To assay this, a control group of neutrophils was compared to test groups, which
were stimulated with LPS or IL-1b.
Hypotheses were also set up in regard to neutrophil experimental data:

H0: there is not a statistically significant apoptosis difference between control and test group
neutrophils.
HA: the difference in apoptosis between control and test group neutrophils is statistically
significant.
Materials and methods

Pathway identification. One set of nodes was predetermined for project inclusion, since the
project is aimed towards identifying nodes important in apoptosis and proliferation. These
nodes were Ras, MAPK and p53. Furthermore, a second set of nodes was identified in
literature regarding apoptosis in neutrophils: fMLP and phosphatidylserine. Neutrophils were
the model systems for experimental data collection. Another set of molecules was used in
experimental data collection, and thus included as nodes in the pathway: LPS, IL-1b, IL-8 and
annexin V. A final set of molecules was gathered from textbooks [28,29] and a biological
pathway database called Kyoto Encyclopedia of Genes and Genomes [30,31]. All these node
sets were combined to form a data set called prior knowledge network on which database
information retrieval could be carried out (see table 1).
Table 1. Criteria for choosing nodes to make up a foundation network. Prior to extracting nodes
from databases, some proteins were included in a fundamental, basic network, based on several
inclusion criteria. This included compiling nodes from textbooks and online protein repositories. The
table details inclusion criteria and nodes found using each criterion. In total, 70 nodes were found, and
included in a data set called "prior knowledge network".
Inclusion criteria | Nodes | Count
Important and well-known molecules involved in proliferation and apoptosis [32,33]. | Ras, MAPK, p53 | 3
Molecules relevant in neutrophil apoptosis and survival [34,35]. | N-formylmethionyl-leucyl-phenylalanine (fMLP), phosphatidylserine (PS) | 2
Molecules used in experimental data collection. | Lipopolysaccharide (LPS), IL-1b, IL-8 (CXCL8), annexin V | 4
Molecules from textbooks (Alberts, Weinberg) and database (KEGG) [28-31]. | 14-3-3, ATM, ATR, Akt, BAD, BCL2L1, CASP8, CCND1, CD14, CDK2, CDK4/6, Cyclin D1, CDKN1A, CDKN1B, CDKN2A, CHEK1, CHEK2, CHUK, Cdc2, Creb, Cyclin B, Cyclin D, Cyclin E, ERK, FADD, FASLG, FOXO4, GADD45, Gap, Gef, IL1R, IL1, IRAK1, IRAK4, Ikk, Il8r, JUN, MAP3K7, MDM2, MDM4, MYC, MYCN, MYD88, Mek, NF1, PTEN, RASSF1, RASSF5, RPS6KA, Raf, Sos, TAB1, TAB2, TIRAP, TLR4, TRAF6, c-Myc/N-Myc, phosphatidylinositol-3,4,5-triphosphate (PIP3), FPR1, MST1 | 61
Total count | | 70
Database information retrieval. Protein data was generated by using database software. The
software used was Qiagen's Ingenuity Pathway Analysis [36] (IPA application build
377306M, content version 27216297).
One IPA feature is a grow function, which takes a user-defined network of nodes and
expands it by adding data (nodes and edges) from available repositories. There are many
parameters for selecting what data should be included: data sources, confidence level(s),
species, tissue and cell lines, mutations, relationship types (activation and inhibition, among
other types), publication date range, node types, and whether the data should be specific to
some disease(s) or organ(s). These parameters are adjusted by the user. The prior knowledge
network data set was used as a basis for building networks in IPA.
Data cleaning. When data had been collected using IPA, the raw data was subjected to
several criteria to remove unwanted nodes: any isolated node (i.e., any node with k = 0) was
removed, and all edges that did not have the interaction type "activation" or "inhibition"
were removed, together with any nodes left unconnected as a result. The resultant data set
was titled "IPA output tweaked".
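The cleaning criteria can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually run on the IPA export: the tuple layout, the names raw_edges, raw_nodes and clean, and the order of the two steps (filter edges first, then drop nodes with k = 0) are all hypothetical.

```python
# Hypothetical raw export: (source, target, interaction type).
raw_edges = [
    ("LPS", "TLR4", "activation"),
    ("MDM2", "p53", "inhibition"),
    ("Akt", "BAD", "phosphorylation"),
]
raw_nodes = {"LPS", "TLR4", "MDM2", "p53", "Akt", "BAD", "FPR1"}

# Only these interaction types are kept.
KEPT_TYPES = {"activation", "inhibition"}

def clean(nodes, edges):
    """Drop edges of other interaction types, then drop isolated nodes."""
    kept_edges = [e for e in edges if e[2] in KEPT_TYPES]
    # A node survives only if at least one kept edge touches it (k > 0).
    connected = {n for src, dst, _ in kept_edges for n in (src, dst)}
    kept_nodes = nodes & connected
    return kept_nodes, kept_edges

clean_nodes, clean_edges = clean(raw_nodes, raw_edges)
print(sorted(clean_nodes), clean_edges)
```

In this toy input, FPR1 is removed because it is isolated from the start, while Akt and BAD are removed because their only edge has a different interaction type.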
Assumptions made about node pathway. One assumption was made when constructing the
node pathway; in addition to extracting phosphatidylserine (PS) protein interactions, we
inferred analytically the relation of PS to other nodes in the pathway. Phosphatidylserine is a
phospholipid normally located in the inner cell membrane. However, during apoptosis PS
displaces to the outer cell membrane leaflet [37,38]. We manually expanded the IPA output
tweaked data set with two relevant edges, so that pro-apoptotic nodes also activate PS, and
anti-apoptotic nodes inhibit PS.
Logic interpretation of data. A number of assumptions were made when modeling. As
mentioned, activation and inhibition were modeled as YES- and NOT-gates, respectively.
Another modeled mechanism was the feedback loop. Feedback entails "to return [a signal] to
an input of the same or a preceding stage of the circuit [...] that produced it" [6]. There are
both positive and negative feedback loops. While it would seem intuitive to implement a
positive feedback loop in a logical circuit by connecting a gate output to its input directly,
such a design is not without obstacles: it creates a so-called "one-time latch", which can
change value once but can't switch back to its previous state [39].
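The latching behavior can be demonstrated with a minimal Python sketch. It assumes the positive feedback is wired through an OR gate whose output is fed back as one of its own inputs; the names OR and run_latch are illustrative, and the source does not state which gate type was considered.

```python
def OR(a: bool, b: bool) -> bool:
    return a or b

def run_latch(inputs):
    """Feed an OR gate's output back into its own second input."""
    output = False
    history = []
    for signal in inputs:
        # The previous output re-enters the gate together with the new signal.
        output = OR(signal, output)
        history.append(int(output))
    return history

history = run_latch([False, True, False, False])
print(history)
```

Once the external signal has been true a single time, the fed-back output keeps the gate true, so removing the signal cannot switch it off again.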
Negative feedback loops take as their own input a signal that is the opposite of their output. An
important example of a negative feedback loop is the p53-Mdm2-circuit: p53 activates Mdm2,
which in turn inactivates p53 (fig 5a) [40]. DNA damage inhibits Mdm2's inhibition of p53
(fig 5b). Stated differently, it could be said that p53 is activated by DNA damage. The p53
regulation can be translated to a logic circuit as follows: p53 is activated by DNA damage or
by an absence of Mdm2; thus p53 must be an OR gate. When Mdm2 is active, it inhibits p53
(i.e., it doesn't send a signal to p53); therefore, Mdm2 must be a NOT gate. p53 feeds back to
Mdm2. Thus, we complete our circuit (fig 5c).
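The circuit described above can be simulated in a few lines of Python. This is a minimal sketch under the stated reading of the text (Mdm2 as a NOT gate fed by p53, p53 as an OR gate fed by Mdm2's output and a DNA damage input); the function name simulate_p53, the update order and the initial p53 state are assumptions made for illustration.

```python
def NOT(a: bool) -> bool:
    return not a

def OR(a: bool, b: bool) -> bool:
    return a or b

def simulate_p53(dna_damage: bool, steps: int, p53: bool = False):
    """Return the p53 state after each update of the two-gate loop."""
    history = []
    for _ in range(steps):
        mdm2_signal = NOT(p53)             # Mdm2 modeled as a NOT gate fed by p53
        p53 = OR(mdm2_signal, dna_damage)  # p53 modeled as an OR gate
        history.append(int(p53))
    return history

no_damage = simulate_p53(dna_damage=False, steps=6)
with_damage = simulate_p53(dna_damage=True, steps=6)
print(no_damage)
print(with_damage)
```

Without DNA damage the p53 state alternates between 1 and 0, whereas a constant DNA damage input holds p53 at 1, in line with the oscillation and stabilization described for the p53-Mdm2 relationship.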
Figure 5. The p53-Mdm2-relationship. (a) Physiologically, the proteins p53 and Mdm2 are part
of an oscillatory feedback loop: p53 activates Mdm2, which in turn deactivates p53. When no
other inputs are present, p53 will, when inhibited, stop activating Mdm2, and Mdm2 thus stops
inhibiting p53. This causes a cycle that, with no other input, loops indefinitely. (b) When DNA
damage is present in the cell, p53 is activated and suppresses Mdm2's inactivation of p53.
(c) Using logic gate notation, the above relationship was modeled so that Mdm2 is a NOT gate
that takes its input from p53 (and outputs the opposite of what p53's input was). p53 is
modeled as an OR gate that can take exactly two inputs: the output from Mdm2, and an input
called DNA damage. When DNA damage is present it breaks the oscillatory loop between p53 and
Mdm2, and stabilizes p53 in an active state.
Programming. The present study relies heavily on programming. The programming language used
was Python. Python is a high-level and general-purpose programming language, which means that
it is non-complex and easy to understand for the programmer, and that it can be used in many
domains [41,42]. The rationale behind programming a logic model as opposed to using existing
logic gate simulation software was twofold: programming presented a way of simulating a logic
model with fully adjustable input parameters, and it gave the possibility of writing all the
results to files with formats that could easily be handled with spreadsheet software.
One basic feature of programming is the function. The computer science idea of a function
is similar to the function concept in mathematics: a function takes a value (i.e., an input),
processes that input in some way, and returns a value (i.e., an output). In addition, functions
are ways of bundling instructions that are wanted repeatedly, for easy execution.
Programming fundamentally incorporates the means to simulate logic statements (AND, OR,
NOT, and so on). Several classes of logic gates could thus be set up, using Python's native
logic wrapped in functions named AND, NOT, and so on.
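A minimal sketch of what such wrapper functions might look like follows. The function names AND, OR, NOT and YES come from the text and fig 3; the signatures and the small usage example are assumptions, not the study's actual source code.

```python
def YES(a: bool) -> bool:
    """Buffer: pass the signal along unchanged (activation)."""
    return bool(a)

def NOT(a: bool) -> bool:
    """Inverter: block a present signal (inhibition)."""
    return not a

def AND(a: bool, b: bool) -> bool:
    """True only if both inputs are present at the same time."""
    return bool(a) and bool(b)

def OR(a: bool, b: bool) -> bool:
    """True if any input is present, including both at once."""
    return bool(a) or bool(b)

# Hypothetical usage: a downstream node that needs an activator
# to be present while an inhibitor is absent.
activator_on = True
inhibitor_on = False
downstream = AND(YES(activator_on), NOT(inhibitor_on))
print(downstream)
```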
Initially, we evaluated the entire "IPA output tweaked" data set and translated nodes and
interactions into logic gates. Second, we programmed the nodes into the model, so that each
node name corresponded to the gate it had been assigned. The model was set up such that all
the model's nodes (i.e., the gates) were collected into groups. Two key proteins, LPS and
IL-1b, were placed into the first group (called group 0, or $G_0$, because programming generally
starts numbering at 0, not 1). Nodes that were only one edge away from either LPS or IL-1b
were placed into $G_1$, nodes with two edges between them and the key proteins were placed into
$G_2$, and so on.
Whenever a protein had both an activating and inhibiting edge with another protein, these
edges were interpreted as a YES and NOT gate, respectively. This resulted in a larger number
of logic gates than there were proteins in the "IPA output tweaked" data set. In total, the
model had 13 groups and 253 gates.
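One way to obtain such a grouping is a breadth-first search from the two key proteins, sketched here in Python. The source does not state how the grouping was computed, so the algorithm, the function name group_by_distance, the toy edge list, and the choice to follow edges only in their stated direction are all assumptions.

```python
from collections import deque

def group_by_distance(edges, sources):
    """Assign each reachable node to a group by its edge distance from the sources."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, []).append(dst)

    # Breadth-first search: the sources sit at distance 0 (group 0).
    distance = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # groups[d] lists the nodes that are d edges away from a source.
    groups = [[] for _ in range(max(distance.values()) + 1)]
    for node, d in distance.items():
        groups[d].append(node)
    return groups

# Hypothetical toy network; not taken from the study's data set.
edges = [("LPS", "TLR4"), ("IL-1b", "IL1R"), ("TLR4", "MYD88"), ("IL1R", "MYD88")]
groups = group_by_distance(edges, ["LPS", "IL-1b"])
print(groups)
```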
Central to the construction of the model was the fashion in which it handled time (fig 6).
"Time" here implies the logic gates' progression through different states as determined by
their inputs, so that a gate could sense its present input and update its output based on the
current input state. The model treats time by executing all nodes in the first group ($G_0$), in
the order they have been arranged: the nodes are executed in a list that the model reads from
top to bottom. When $G_0$ has been executed, the model moves on to $G_1$, and continues in this
fashion until all groups have had their gate states checked and updated. The model execution
from $G_0$ through the final group (where the number of groups, $n$, is 13) constitutes one full
cycle. After the cycle is completed, the model begins again at $G_0$. In the model, this cycling
function is named the nodeIterator function; see fig 6. The main intention was to run a large
number of these cycles, to examine how the node states evolved over time under different
inputs. For proper model execution, the time the model was set to run had to be a multiple of
the number of groups (if it was not, the model would end prematurely). Model run time T thus
had to be set so that $T = Q \cdot n$, where $Q \in \mathbb{Z}$ (i.e., Q is an integer) and
$T \bmod n = 0$. The model is programmed so that each logic gate checks its input at time t,
calculates the new output based on the input, and writes the output to a file. This results in
a file that contains every node state at every time, which can be imported into Excel.
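The cycling scheme can be illustrated with a simplified Python sketch. It is not the study's nodeIterator: the data structures (a list of groups, a gate table mapping node names to a function and its input names, a state dictionary) are hypothetical, one group is evaluated per time step, and the state log is kept in memory rather than written to a file.

```python
def YES(a: bool) -> bool:
    return a

def NOT(a: bool) -> bool:
    return not a

def node_iterator(groups, gates, state, t_max):
    """Evaluate one group per time step, cycling through the groups in order."""
    n = len(groups)
    if t_max % n != 0:
        raise ValueError("t_max must be a multiple of the number of groups")
    log = []
    for t in range(t_max):
        for name in groups[t % n]:          # gates are read top to bottom
            if name in gates:               # input nodes have no gate
                func, inputs = gates[name]
                state[name] = func(*(state[i] for i in inputs))
        log.append((t, dict(state)))        # snapshot of every node state
    return log

# Hypothetical three-group network: LPS -> A (YES gate) -> B (NOT gate).
groups = [["LPS"], ["A"], ["B"]]
gates = {"A": (YES, ("LPS",)), "B": (NOT, ("A",))}
state = {"LPS": True, "A": False, "B": True}
log = node_iterator(groups, gates, state, t_max=6)
for t, snapshot in log:
    print(t, snapshot)
```

The run time of 6 steps is a multiple of the 3 groups, so two full cycles are completed; a run time of 5 would be rejected, mirroring the requirement that the run time be a multiple of the number of groups.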
Figure 6. The nodeIterator function. This image details the principal operation of a function
called nodeIterator, which iterates (steps) through the network's nodes (logic gates) in sequence.
Each grey square represents the state of a group of nodes. The x-axis (top row of squares)
marks time (t) in arbitrary units; the y-axis shows a number of groups (here labeled $G_0$, $G_1$,
$G_2$ and so on) that can each contain several gates. nodeIterator works as follows: starting at
time 0 (t=0), nodeIterator evaluates group $G_0$ by checking its input(s), processing the inputs
as the gates in $G_0$ specify, and outputting the result. Time is then stepped forward one step,
to t=1. $G_1$ is evaluated in the same fashion as $G_0$. Time steps forward to t=2. $G_2$ is
evaluated. (Time steps are represented by the thin, solid arrows.) When the final node of the
network has been computed, time t is set to (initial value + 1) (this reset is represented by
the bold arrow), and the process of evaluating nodes begins again, in the fashion specified
above (represented by the thin, dashed arrows). The difference is that time is now t=1 at $G_0$,
t=2 at $G_1$, and so on. The nodeIterator function runs through these steps until it reaches a
time $t_{max}$, which is the time interval during which the model is specified to run.
For the function to perform as intended, a prerequisite is that $t_{max} \geq n$, where $n$ is the
number of groups (should $t_{max} < n$, nodeIterator will not have executed all the groups before
the function terminates). To accurately reflect model dynamics, the relationship between
nodeIterator's run time and number of groups is in practice $t_{max} \gg n$.
The model incorporated the possibility of assigning values to gates before execution began.
Some nodes could hence be set to already be active (i.e., output a true value, 1) when model
execution commenced. This was added to adhere more closely to reality where, while a cell takes
input (signals), some of the proteins in its signaling networks can already be active due to
other inputs or processes.
Generating data using logic model. The logic model was used to generate data to compare
with the experimental data. It was arbitrarily chosen to run the model for 195 (13 × 15) time
steps, using several different input configurations: the inputs (on- and off-states) of LPS and
IL-1b were varied.
Interpreting logic model data. We chose to focus the model data analysis on the ten most
connected proteins in the "IPA output tweaked" data set, in addition to those nodes that were
experimentally studied. It was unwieldy to handle the large amounts of data generated per
logic gate, and we thus took the arithmetic mean of the analyzed nodes' states:
$$\bar{s} = \frac{\sum_{i=0}^{t} s_i}{t}$$
where $\sum_{i=0}^{t}$ is the summation of all states from time 0 to time t, t being how long the
model runs (that is, 195 time steps), and $s_i$ is the state (gate output) at time i. This meant
that gates which were active more often presented with higher means.
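As a small illustration of this summary statistic, the mean state of a gate can be computed from its recorded outputs. This sketch assumes one recorded 0/1 state per time step and divides by the number of recorded states; the function name mean_state and the example series are hypothetical.

```python
def mean_state(states):
    """Arithmetic mean of a gate's recorded 0/1 outputs over a run."""
    if not states:
        raise ValueError("at least one recorded state is required")
    return sum(states) / len(states)

# Hypothetical output series for two gates over eight time steps.
often_active = [1, 1, 0, 1, 1, 1, 0, 1]
rarely_active = [0, 0, 1, 0, 0, 0, 0, 0]

print(mean_state(often_active), mean_state(rarely_active))
```

A gate that is active in six of eight steps has a mean of 0.75, while one active in a single step has a mean of 0.125, so more frequently active gates present with higher means.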
Experimental data collection. Experimental data was collected from neutrophils that were
stimulated with either lipopolysaccharide (LPS) or IL-1b, or no stimulus at all (i.e., control).
Briefly, peripheral blood samples were collected from three healthy volunteers.
Neutrophils were isolated as follows: each freshly collected blood sample, kept at room
temperature, was agitated for 5 minutes. 5 ml polymorphprep (Axis-Shield Diagnostics Ltd,
Dundee, Scotland) was deposited in a 15 ml Falcon centrifuge tube together with 3 ml
lymphoprep (Axis-Shield Diagnostics Ltd, Dundee, Scotland). 6 ml of blood was then poured
along the Falcon tube wall, and the tube was centrifuged at 4 0
, for 40 minutes in room
temperature.
After centrifugation, the three top layers were removed from the Falcon vial and placed in a
50 ml Falcon tube. The Falcon tube's contents were then mixed with equal parts
room-temperature 0.45% NaCl solution. 20 ml room-temperature sterile PBS (phosphate-buffered
saline) (Sigma-Aldrich, St. Louis, MO, USA) was then added, and the tube turned over a few
times.
The 50 ml Falcon tube was centrifuged at 400 × g for 10 minutes at room temperature.
Supernatant was removed and the pellet resuspended with 4.5 ml sterile MQ water on ice, in
order to lyse remaining erythrocytes. After approximately 35 seconds, 1.5 ml cold PBS with
3.4% NaCl was added to the Falcon tube, and then 5 ml KRG without Ca2+ was added to the
tube, on ice.
The tube was centrifuged again, at 400 × g for 5 minutes at +4 °C. The supernatant was once
again removed, and the pellet resuspended with 4.5 ml sterile MQ water on ice. After
approximately 35 seconds, 1.5 ml cold PBS with 3.4% NaCl was added, followed by 5 ml KRG
without Ca2+, on ice. The tube was centrifuged at 1323 RPM (400 × g) for 5 minutes at +4 °C.
Supernatant was removed, and the pellet dissolved with the following solution: 1 ml Roswell
Park Memorial Institute medium (RPMI) 1640 (Thermo Fisher Scientific, Waltham, MA,
USA), 10% heat-inactivated fetal bovine serum (Hi-FBS) (Thermo Fisher Scientific,
Waltham, MA, USA) and 2 mM L-glutamine (Thermo Fisher Scientific, Waltham, MA,
USA). 10 µl of the cell suspension and 90 µl PBS were added to a tube, and then 100 µl 0.4%
trypan blue was added (this yields a 20-fold dilution). The tube was vortexed and then
incubated for 3 minutes at room temperature.
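The 20-fold figure follows from the volumes given: 10 µl of cell suspension ends up in a total of 10 + 90 + 100 = 200 µl. A short Python check of this arithmetic is sketched here; the function name dilution_factor is illustrative only.

```python
def dilution_factor(sample_volume_ul: float, total_volume_ul: float) -> float:
    """Fold dilution of a sample brought up to a given total volume."""
    return total_volume_ul / sample_volume_ul

cell_suspension_ul = 10
pbs_ul = 90
trypan_blue_ul = 100

total_ul = cell_suspension_ul + pbs_ul + trypan_blue_ul
factor = dilution_factor(cell_suspension_ul, total_ul)
print(total_ul, factor)
```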
Annexin V (Thermo Fisher Scientific, Waltham, MA, USA) was used to assay apoptosis.
Annexin V is a probe used for detecting apoptosis; annexin V detects cells that express
phosphatidylserine (PS). Phosphatidylserine is a phospholipid in the cell membrane that is
located to the inner leaflet of the membrane during physiological conditions. In an early
apoptotic setting PS translocates to the outer leaflet [37,38]. PS is therefore a marker of
apoptosis, which can be detected via annexin V's strong affinity for PS [37,38].
Neutrophils were plate incubated, using solutions in the well configuration described in table
2: each row represents a well that has neutrophil samples from one individual, and is
incubated with the solution described (LPS or IL-1b). The three-by-seven image in table 2 is
in reality part of a 48-well plate.
Table 2. Well plate with solutes. DMEM: Dulbecco's modified Eagle's medium; LPS:
lipopolysaccharide.
Subject | Solutes (one well per column)
1 | DMEM | DMEM | LPS 1 ml^-1 | LPS 10 ml^-1 | LPS 100 ml^-1 | IL-1b 10 ml^-1 | IL-1b 100 ml^-1
2 | DMEM | DMEM | LPS 1 ml^-1 | LPS 10 ml^-1 | LPS 100 ml^-1 | IL-1b 10 ml^-1 | IL-1b 100 ml^-1
3 | DMEM | DMEM | LPS 1 ml^-1 | LPS 10 ml^-1 | LPS 100 ml^-1 | IL-1b 10 ml^-1 | IL-1b 100 ml^-1
Isolated neutrophils were resuspended at 10 cells/ml in RPMI 1640-medium
supplemented with 5% FCS (fetal calf serum) (Sigma-Aldrich, St. Louis, MO, USA) and 2
mmol/l L-glutamine, and incubated at 37 °C with gentle agitation. Plate wells (see table 2)
were seeded at 1 0
10 cells/ml (so that 200 l/well = 2 0
10 cells/well). Neutrophils
were stimulated for 4, 6 and 18 h. After stimulation, plates were centrifuged at 400 × g at
room temperature for 5 minutes.

3 ml binding buffer was prepared. 100 µl binding buffer was added to each well, and the plate
was then centrifuged at 400 × g at room temperature for 5 minutes. Supernatant was carefully
removed. Annexin V solution (50 µl annexin V added to 450 µl binding buffer) was prepared.
50 µl annexin V was added per well. The plate was incubated for 10 minutes in the dark, at
room temperature. The plate was centrifuged at 400 × g at room temperature for 5 minutes.
Supernatant was carefully removed. 100 µl binding buffer was added and the plate was read using
a microplate photometer, with excitation at 485 nm and emission at 520 nm. Emission data
were expressed as fluorescence intensity values.
Experimental data analysis. Annexin V data were analyzed using a paired non-parametric
ANOVA (Friedman's test), followed by Dunn's post-hoc test.
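For orientation, the Friedman statistic for a paired design (one block per subject, one measurement per treatment) can be sketched as below. This is a minimal, dependency-free illustration without tie correction, not the analysis software used in the study; the intensity values are hypothetical, and neither the p-value lookup nor Dunn's post-hoc test is shown.

```python
def rank_within_block(values):
    """Return 1-based ranks of the values, using average ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = average_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: one list per subject, each holding one measurement per treatment.
    """
    n = len(blocks)      # number of subjects (blocks)
    k = len(blocks[0])   # number of treatments
    rank_sums = [0.0] * k
    for block in blocks:
        for j, rank in enumerate(rank_within_block(block)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical fluorescence intensities: 3 subjects x 3 treatments
hypothetical = [
    [100.0, 80.0, 90.0],
    [120.0, 95.0, 110.0],
    [105.0, 85.0, 100.0],
]
q = friedman_statistic(hypothetical)
print(q)
```

With only three subjects, the statistic can take few distinct values, which is one reason small samples limit the attainable significance level.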
Comparing experimental and model data. Because experimental data are expressed as
fluorescence intensity, the intensity variations between neutrophil experimental groups could
be stated as a percentage of the control group's fluorescence intensity. Equivalently, model
data could be compared among the data sets generated using inputs corresponding to the
molecules tested experimentally (i.e., LPS and IL-1b).
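The normalization described above can be sketched as follows. The group names mirror the experimental conditions, but the intensity values and the function name are hypothetical and serve only to illustrate the calculation.

```python
def percent_of_control(intensities, control_key="DMEM"):
    """Express each group's intensity as a percentage of the control group's."""
    control = intensities[control_key]
    return {group: 100.0 * value / control for group, value in intensities.items()}


# Hypothetical mean fluorescence intensities per group
hypothetical_intensities = {"DMEM": 2000.0, "LPS": 1500.0, "IL-1b": 1800.0}
relative = percent_of_control(hypothetical_intensities)
print(relative)
```

Values below 100 would then indicate less annexin V signal than in the control group.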
Ethics. There are several ethical ramifications to be considered in bioinformatics. The
collection and storage of large volumes of data, sometimes from humans, can raise concerns
about personal integrity. Furthermore, the highly specialized and complex (i.e. mathematical)
evolution of some branches of biology means that biologists may have to recognize ethical
implications that have traditionally been the duty and realm of physicians.
Results
IPA was used to expand the prior knowledge network, via the grow-tool. The grow-tool
is a function in IPA that can expand a given set of nodes using node and edge data from
several databases (see table 3).
IPA has a limit on the number of nodes that can be added to a pathway (here denoted
$n_{\max}$). Specifically, $n_{\max} = 1.0 \times 10^{\ldots}$ nodes. As long as the number of nodes
found via the grow tool was greater than $n_{\max}$, IPA would only add a number of nodes that is
within the limit (that is, the number of added nodes is $n_{add}$, where
$n_{add} = (n_{\max} - n_{0})$ and $n_{0}$ is the number of nodes in the prior knowledge network).
If the number of added nodes is $n_{add}$, there is a possibility that $n_{add}$ represents only a
portion of the number of nodes found via the grow tool, and for this reason adding a number of nodes
$n$ so that $n < n_{add}$ ensures that the selection of nodes is a controlled process, as opposed to
simply being the selection of the first available nodes that fit within the maximum limit.
Similarly, IPA has a limit on the number of edges that can be added (which is
$e_{\max} = 3.0 \times 10^{4}$ edges). The reasoning supporting the filter adjustments when adding
edges is the same as that for nodes. Accordingly, we adjusted the grow tool parameters to fit these
constraints (again, see table 3).
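The reasoning about the node limit can be illustrated with a short sketch. It assumes that the tool adds at most the remaining capacity (the limit minus the nodes already in the network) and that a selection counts as controlled only when the number of nodes found stays strictly below that capacity. The function names and all numbers are hypothetical; the limit used here is a placeholder, not IPA's documented value.

```python
def nodes_added(found, n_max, n_prior):
    """Return (added, truncated) for a grow step.

    found:   nodes matching the current filters
    n_max:   maximum number of nodes allowed in a pathway
    n_prior: nodes already in the network
    """
    capacity = n_max - n_prior  # largest number of nodes that can still be added
    return min(found, capacity), found > capacity


def is_controlled(found, n_max, n_prior):
    """True when all found nodes fit with room to spare, so none were cut off."""
    return found < n_max - n_prior


# Hypothetical numbers for illustration only
print(nodes_added(1200, 1000, 60))
print(is_controlled(500, 1000, 60))
```

When the second element of the returned pair is true, the added set is only a portion of what the filters matched, which is the situation the filter adjustments were meant to avoid.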
Table 3. The number of nodes and edges. Number of nodes and edges found, using Ingenuity
Pathway Analysis's (IPA's) grow tool to expand the prior knowledge network data set, after applying
different tool filters. The initial data set consisted of … nodes and … edges. IPA's node limit
($n_{\max}$) is $1.0 \times 10^{\ldots}$; its edge limit ($e_{\max}$) is $3.0 \times 10^{4}$. Either or
both of these maximum limit values can be exceeded (>), or the number of nodes or edges can be at or
below the limit value (≤). When the maximum limit is exceeded, the number of nodes that are added will
be $n_{add}$. Adding a number of nodes $n$ so that $n < n_{add}$ ensures that the
selection of nodes is a controlled process, as opposed to simply being the selection of the first
available nodes that fit within the maximum limit. For this reason, filters were carefully adjusted to
fulfill the condition $n < n_{add}$. The same mathematical argument was applied when adjusting the edge
filters as when adjusting node filters. See the explanation of mathematical terms below.
[Table body not recoverable from the source. Columns: Experimentally validated; High confidence
level; Medium confidence level; Node type; Mutation; Disease; Data source; Max limit; Nodes added;
Edges added; Notes.]
Colors: white means default settings; black means that the given filter is applied (inclusion); grey,
crossed-out cells mean that the given filter is not applied (exclusion). Text given on a black field
indicates the inclusion criteria for that filter (e.g., "WT only" on a black field means that no
genotypes other than wild type were considered).
Abbreviations and equations: WT: wild type, NP: non-pharmaceutical (not biologic or chemical drug),
ND: no disease association, DB1: used specific databases (only Ingenuity Expert Information
databases and protein-protein interaction databases), DB2: used specific databases (protein-protein
interaction databases).
$n_{add} = (n_{\max} - n_{0})$: the number of nodes added if the maximum limit is reached.
$n_{0}$: the number of nodes already in the network when new nodes are added.
$n_{\max} = 1.0 \times 10^{\ldots}$: the number of nodes that IPA allows.
* Every filter setting at default (white color) implies that all filters are present; nodes of all
confidence levels, molecular types, mutations, diseases and data sources are used.
Used for model construction; in this study termed the "IPA output" data set.
The "IPA output" data set was used for constructing the logic model. In the data set, a small
subset of nodes lacked interactions with the network. This subset was removed from the data
set entirely.
IPA generated several distinct types of interactions. Some of the interactions were
ambiguous, and were thus not compatible with the logic model: for example,
"ubiquitination" does not specify whether it signifies activation, inhibition, or some other
kind of interaction. Interactions of the type "activation" or "inhibition" were kept, and all
other interaction types were removed (table 4).
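The two exclusion criteria can be sketched as a small filtering step. The data layout (edges as source, target, type triples), the function name and the example network are assumptions made for illustration; the sketch also applies both criteria in a single pass, dropping edges first and then any node left without an edge, which is a simplification of the stepwise procedure described above.

```python
KEPT_TYPES = {"activation", "inhibition"}


def tweak_network(nodes, edges):
    """Keep activation/inhibition edges, then drop nodes without any kept edge."""
    kept_edges = [edge for edge in edges if edge[2] in KEPT_TYPES]
    connected = {name for source, target, _ in kept_edges for name in (source, target)}
    kept_nodes = [name for name in nodes if name in connected]
    return kept_nodes, kept_edges


# Hypothetical network: D is isolated, and B -> C has an ambiguous interaction type
hypothetical_nodes = ["A", "B", "C", "D"]
hypothetical_edges = [
    ("A", "B", "activation"),
    ("B", "C", "ubiquitination"),
    ("B", "A", "inhibition"),
]
nodes, edges = tweak_network(hypothetical_nodes, hypothetical_edges)
print(nodes, edges)
```

Note that removing an ambiguous edge can itself isolate a node (C in this example), so the order in which the criteria are applied affects the final node count.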
Table 4. The number of nodes and edges in the "IPA output" data set after applying criteria.
Criteria                                          Nodes   Edges   Notes
None.                                             …       …       "IPA output" data set (raw data)
Removed isolated nodes.                           …       …
Removed interactions that did not fit criteria.   …       426     "IPA output tweaked" data set
Of the 426 interactions in the "IPA output tweaked" data set, some connect a protein to
itself: these are called self-loops. 38 of the interactions in the
network are self-loops.
The "IPA output tweaked" data set is displayed in fig 7. This is the data set that served as a
basis for the logic model. Nodes are color coded according to their degree (k). The frequency
distribution of the degrees in the data set is graphed in fig 8.
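Degree, self-loop count and the degree frequency distribution can all be derived from an edge list, as in the sketch below. It assumes that a self-loop contributes one edge to its node's degree, which is a convention chosen for this illustration; the example edges and function names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Degree k per node; a self-loop counts as one edge for its node."""
    k = Counter()
    for source, target in edges:
        k[source] += 1
        if target != source:
            k[target] += 1
    return k


def count_self_loops(edges):
    """Number of edges whose source and target are the same node."""
    return sum(1 for source, target in edges if source == target)


# Hypothetical edge list with one self-loop on node A
hypothetical_edges = [("A", "B"), ("A", "C"), ("A", "A"), ("B", "C")]
k = degrees(hypothetical_edges)
frequency = Counter(k.values())  # how many nodes have each degree
print(dict(k), count_self_loops(hypothetical_edges), dict(frequency))
```

Other graph conventions count a self-loop twice toward the degree, so the convention should be stated whenever degree values are compared across tools.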
Figure 7. Picture of the "IPA output tweaked" data set. The colors in the image correspond to a node's
degree (k). Nodes with a high k are tinted red, those with a low k are blue, and nodes with a k that falls
in between the extremes are green. Lines between nodes are their edges (interactions). The red arrow
points to the node FADD. Image generated using Cytoscape (version 3.3.0) [43].
Figure 8. Frequency distribution of degrees in the "IPA output tweaked" data set. The x-axis marks
degree (k); the y-axis marks the frequency with which that degree occurs among nodes.
In the "IPA output tweaked" data set, the ten most connected nodes (those with the highest k)
are presented in table 5. In addition, the k of the nodes used for experimental
comparison is presented in table 5.
Table 5. Details of high-k nodes and one experimentally important node subset. * k expressed
as a percentage of the total number of edges in the "IPA output tweaked" data set, where the total
edge number is $4.26 \times 10^{2}$. PS: phosphatidylserine. LPS: lipopolysaccharide.
Subset                     Node     k    k, %*
Highest k                  TP53     67   15.7
                           MDM2     54   12.7
                           CDKN1A   32   7.5
                           TRAF6    30   7.0
                           CDKN1B   25   5.9
                           CDKN2A   24   5.6
                           CDK2     23   5.4
                           Akt      21   4.9
                           ATM      21   4.9
                           CASP8    20   4.7
Experimentally important   PS       10   2.3
                           LPS      5    1.2
                           IL-1b    2    0.5
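The percentage column in table 5 is each node's k divided by the total edge count of 426. A minimal sketch of that calculation, with an illustrative function name:

```python
TOTAL_EDGES = 426  # edges in the "IPA output tweaked" data set


def edge_share_percent(k, total_edges=TOTAL_EDGES):
    """Degree k as a percentage of all edges, rounded to one decimal."""
    return round(100.0 * k / total_edges, 1)


for node, k in [("TP53", 67), ("MDM2", 54), ("IL-1b", 2)]:
    print(node, edge_share_percent(k))
```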
Data was collected from the logic model by varying the input of LPS and IL-1b. This yielded
the data output in table 6.
Table 6. Logic model output when varying inputs. One column represents one instance of running the
model. The input for a given row is the user-defined input state for that model execution instance.
Values in the output rows are the arithmetic mean of the node states during the entire model execution,
so that 0 means a node was never active and 1 means the node was always active.
Input row colors: red means an input of 1, white means an input of 0.
Output row colors: red means an output of 1, white means an output of 0, colors in between signify different
levels of activation.
           Experimental comparisons                    Other comparison
Inputs
LPS        0          1          0          1          0
IL-1b      0          0          1          1          0
TP53       0          0          0          0          1
Outputs
TP53       0.357143   0.357143   0.357143   0.357143   0.397959
MDM2       0.448980   0.448980   0.448980   0.448980   0.443878
CDKN1A     0.357143   0.357143   0.357143   0.357143   0.357143
TRAF6      0          0          0          0          0
CDKN1B     0.357143   0.357143   0.357143   0.357143   0.357143
CDKN2A     0.448980   0.448980   0.448980   0.448980   0.443878
CDK2       0.357143   0.357143   0.357143   0.357143   0.357143
Akt        0.428571   0.428571   0.428571   0.428571   0.428571
ATM        0.428571   0.428571   0.428571   0.428571   0.428571
CASP8      0.464286   0.464286   0.464286   0.464286   0.459184
PS         0.357143   0.357143   0.357143   0.357143   0.357143
LPS        0          0          0          0          0
IL-1b      0          0          1          1          0
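The output values in table 6 are time-averaged node states. A minimal sketch of that averaging, assuming each model run is recorded as a list of per-step state dictionaries (an illustrative data layout, not necessarily the one used in the study); the trajectory is hypothetical.

```python
def mean_activity(trajectory):
    """Arithmetic mean of each node's binary state over all recorded time steps.

    trajectory: list of dicts mapping node name to 0 or 1, one dict per time step.
    """
    steps = len(trajectory)
    return {
        node: sum(state[node] for state in trajectory) / steps
        for node in trajectory[0]
    }


# Hypothetical four-step run of a three-node model
hypothetical_run = [
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 0, "C": 1},
]
print(mean_activity(hypothetical_run))
```

A mean of 0 or 1 says a node never changed state, whereas identical intermediate means across runs, as in table 6, indicate that the inputs did not alter the trajectory.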
Fig 9 details experimental data, which were collected from neutrophils and consisted of
measured fluorescence intensity. Analysis results of annexin V data are presented in fig 9.
Figure 9. Fluorescence intensity as detected in neutrophils stimulated for 18 h. The x-axis
describes the molecules used to stimulate neutrophils. The y-axis marks a fluorescence intensity which relates to
annexin V. The greater the intensity, the more annexin V has been detected. Higher values
accordingly imply a larger degree of apoptosis. CCM: cell culture medium; LPS: lipopolysaccharide.
Discussion
This study's aim has been to examine whether it is possible to (1) collect data about protein-protein interactions from a database, (2) reformulate these interactions as a model network of
logic gates, and (3) collect results from the network model to be compared against
experimental data. The model network and the experimental data related to cellular apoptosis
pathways. Experimental data were gathered by stimulating neutrophils with several different
molecules and measuring the frequency of apoptosis in differently exposed groups. Two
hypotheses were defined, the first being that this study's model would not conform to
experimental results, the second that experimental data would not show significant differences
in apoptosis frequency among neutrophil groups. The model was tested by feeding it values
that corresponded as closely as possible to experimental molecules. While the output of the
model was not satisfactory, the logic model was able to take input, perform computations
on it and transform it into output. The effect of neutrophil stimulation was
statistically significant for LPS, and near-significant for IL-1b. It was not
possible to compare the model with experimental data.
The study commenced with gathering information on apoptosis pathways. Inclusion criteria
(table 1) meant that molecules from several undergraduate textbooks [28,29] were admitted
to the initial pathway. This may be perceived as a non-rigorous approach, but the fact that
such textbooks often set out to describe very general cases could imply that the molecules
gathered from them are also accepted in the scientific community.
The initial data set (the prior knowledge network) was expanded using IPA (table 3). For
extending data sets, IPA offers tools with several adjustable parameters. Fastidious parameter
alterations were performed and documented in an attempt to emulate the workflow that is
common to benchside, experimental biology. Such documentation was crucial: it has been
suggested that documentation in bioinformatics is no less important than that of traditional
biology, and that bioinformatics-grounded research must also meet the exacting reproducibility
criteria that are the hallmark of proper science [44].
When expansion was completed, exclusion criteria were used to scale down the IPA-generated node data (table 4). These criteria shrank the node count by approximately 84%,
yielding the "IPA output tweaked" data set. This resulted in a network that exclusively had
edges of the type "activation" or "inhibition". The distribution of these edges (fig 8) is skewed:
57% of the nodes have only one edge, while one protein (p53) alone connects to
approximately 16% of all edges present in the data set. The ostensible prominence of p53, as
measured by its k, could be attributable to several factors. First, it is possibly explained by
p53's importance in cell signaling, and the large volume of oncological research devoted to
studying p53 [45]. Second, the k of p53 may also represent a bias in the database software
used. Finally, it could be an artifact of the exclusion criteria applied: the degree
distribution of the original "IPA output" data set might feasibly be different from that of "IPA output
tweaked"; the exclusion criteria may have effected an unfavorable removal of meaningful
edges. It is noteworthy, however, that allowing for the third factor does not remove the fact
that the logic model could not include non-binary interactions (i.e., interactions of a type other
than the dichotomous activation-inhibition variety). With regard to other nodes with a high
k ("highest k" nodes in table 5), MDM2 is second only to p53. A possible cause for Mdm2's
high k is its close association with p53, which it inhibits. For this reason, the same arguments
as postulated regarding p53's high k can be applied to Mdm2's k.
The concept of degree takes into account nodes with edges to themselves (self-loops):
FADD, for example (fig 7; marked with a red arrow, left of center in bottom row), has k = 4,
where one of the edges is a self-loop. Hence, such loops can and do influence the degree
distribution of a network.
Turning to the data output from the logic model (table 6) and focusing on the results of LPS and IL-1b input, it is apparent that there is no variation in the model, either among the gates
that represent the most connected proteins or among those proteins previously deemed
experimentally important. There are several possible reasons for the model not showing any
variation. One cause could be the "IPA output tweaked" data set itself. Because the logic
model is a direct function of the data set used to construct it, the quality of the data set
influences the model.
It is possible that the interaction data set contained too many interactions that were
contradictory: in several cases, one node would simultaneously activate and inhibit a target
node (i.e., an upstream gate would be connected to a downstream gate via both a YES and a
NOT gate). Physiologically this is certainly not an impossible situation, because a protein
might activate some other protein at a given time, and inhibit that same protein at other times
[46]. In this project's logic model, however, such dual interaction was not beneficial because
it continuously activated the target protein: it essentially made the node a one-time latch.
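The effect of such a contradictory pair of edges can be shown with a two-line truth table. The sketch assumes that a target node combines its incoming signals with a logical OR; that combination rule is an assumption made here for illustration, and the function names are hypothetical.

```python
def yes_gate(x):
    """Pass the source state through unchanged (activation)."""
    return x


def not_gate(x):
    """Invert the source state (inhibition)."""
    return 1 - x


def target_state(source):
    # Assumption: incoming signals are combined with OR.
    # The same source reaches the target through both a YES and a NOT gate.
    return yes_gate(source) | not_gate(source)


for source in (0, 1):
    print(source, target_state(source))
```

Under this assumption the target is active whether the source is on or off, so the upstream node carries no information to it.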
It is possible to circumvent the contradiction of having both positive and negative regulation
of the same node at the same time, but this arguably requires transforming a logic model into
a continuous one (i.e., a model that can take values between 0 and 1) where interactions can
be assigned strengths. While this paradigm requires differential equation models, it displays a
higher granularity and may thus be a better representation of reality [47]. Continuous
biochemical models have been assembled by Wittmann et al. [48] and MacNamara et al. [49],
among others.
Returning to the IPA data set and its implication for the logic model, the sheer quantity of
data generated via IPA could mean that quality was lost. Even if parameters were
carefully adjusted when accruing data with the software, it is imaginable that there exist
variations in the quality of PPIs, even if that data was experimentally verified. In support of
this view is the fact that Saez-Rodriguez et al., in building pathways using IPA, noted that the
software's description of IRS-1 biology is poor. In the paper, Saez-Rodriguez et al. decided
to supplement the IPA-generated data set using other sources [14].
The number of nodes included in this study's model is much larger than the number of nodes
in other studies (e.g., Zhang et al. used 58 nodes [13] and Aldridge et al. 15 nodes [15]). This
could indicate that this study's model may have performed better had the data collection
focused on establishing a small, but well-evidenced, network of proteins and their
interactions.
As mentioned, the "IPA output tweaked" data set contains self-loops (approximately 9% of
the edges are self-loops). When they consist of YES gates, such loops create one-time latches
that after activation put gates into permanent on-states. If such loops are present on nodes
with a high k, the protein in question will activate many downstream nodes, and thus
significantly affect the behavior of the network at large. It is important to note that self-loops
consisting of NOT gates do not exhibit the one-time latch behavior (such loops may, however,
oscillate). Removal of self-loops with YES gates could possibly mitigate the static behavior of
the model.
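The difference between the two kinds of self-loop can be reproduced with a single node under synchronous updating. The sketch assumes that the node's next state is its self-loop signal OR-ed with an optional external pulse; the update rule, function name and pulse sequence are illustrative assumptions rather than the study's implementation.

```python
def simulate_self_loop(gate, initial_state, external_pulse, steps):
    """Synchronously update one node whose only regulator is itself.

    gate:           "YES" (self-activation) or "NOT" (self-inhibition)
    external_pulse: list of 0/1 inputs, OR-ed into the node at each step
    """
    state = initial_state
    history = [state]
    for t in range(steps):
        pulse = external_pulse[t] if t < len(external_pulse) else 0
        feedback = state if gate == "YES" else 1 - state
        state = feedback | pulse
        history.append(state)
    return history


# A single pulse switches the YES loop on permanently (one-time latch)
print(simulate_self_loop("YES", 0, [1], 5))
# The NOT loop alternates between states without any external input
print(simulate_self_loop("NOT", 0, [], 5))
```

In the latched case the time-averaged state depends only on when the first activation arrives, which is consistent with the static output described above.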
The logic model data set did exhibit one small variation: when p53 was switched to an on-state and the model executed, p53 was active somewhat longer than was the norm in the
other model runs (specifically, p53's active duration was 11% greater than normal). It is
imaginable that this external activation perturbs the system, and p53, so that it maintains an
on-state for a longer time before falling back to its resting (off) state. Possibly, p53's logic gate
had a more generous window without inhibition during this period, during which it could
inhibit other gates.
Having thus far focused on the underlying data set, it is necessary to discuss the model itself.
This study's logic model was written in the programming language Python. While
software already exists for the construction of such models from biological pathways,
building our model from the ground up meant that full control was retained over model
formulation. In addition, programming allowed for peripheral functionality that created an
efficient pipeline between input, output, and data analysis.
The model was constructed on the assumption that a pathway could be simulated as a network
of simple logic gates. This simplicity could be a reason for the model's static behavior, since a
Boolean model of the kind constructed here has several drawbacks that stem from
oversimplification of reality: it does not take into account many of the spatiotemporal
characteristics of PPIs, such as the compartmentalization of signaling processes in cells and
complex, potentially non-linear, dose-response relationships [50-52].
It is natural that discrete logic models, which only incorporate activated and inactivated states,
do not capture the molecular reality of protein-protein interactions: instead of
phosphorylation, dimerization, ubiquitination and other interactions, logic models have on- and off-states that symbolize PPIs. This is a limitation because it reduces a complex system
to a simple one, and it is a possible cause of the model's undesired behavior.
Taken together, the arguments given thus far could indicate that the negative results generated
by the model are caused by the data set used to construct the model, rather than the
assumptions made about the model itself. A key argument for this is the fact that, as earlier
presented, it seems possible that the data set itself became inconsistent when subjected to the
exclusion criteria that were used to remove edges. Furthermore, this type of model has been
used previously and tenably with measurable and clinically significant results [16]. In one
case, a logic model found interactions that were not previously known [15],
indicating the value of this type of model.
Having discussed the logic model and the data used to build it, experimental data is now
addressed. The LPS-stimulated neutrophils displayed lowered levels of apoptosis that were
statistically significant. This conforms with other studies, in which LPS has been shown to
slow the rate of apoptosis in neutrophils [53]. IL-1b, with p = 0.0…1, did not meet the criterion
for statistical significance, but was close. This could be due to the small sample (n = 3) or
to natural variation: it is possible that the p-value would have been smaller had the experiment
been repeated under similar conditions. Phosphatidylserine is a well-documented
apoptosis predictor, which implies that these results, especially the LPS-induced reduction in
apoptosis, are valid. The intention of comparing model data to experimental data, together with
statistically significant experimental results, could still indicate the logic model-to-experiment
comparison as a potential method for future discoveries regarding biologic pathways.
Conclusion
Given the partly significant experimental results, the null hypothesis regarding experimental
data can be rejected. Because of the inconclusive logic model data, however, comparing the
model with experimental data becomes futile: it can only be safely ascertained that the logic
model does not perform as intended, and that our evidence does not indicate any
correlation between model and experimental data. The null hypothesis about model accuracy is
thus presumed true.
Expansions of this project could include larger pathways, automated conversion of mined
node data into logic gates, and non-discrete representations of protein-protein interactions. To
truly represent the real-world cell in a model, one could implement representations of natural
stochastic processes as a source of perturbation in modeled pathways.
Given the results presented here, we conclude that for a time-dependent, discrete logic model
to function properly, alternative procedures are requisite. A vital resource would be high-quality protein data sets with rigorously sourced nodes and edges. Improved characterization
of protein-protein relationships could inform an improved, yet data-driven, logic modeling
paradigm.
Acknowledgements
I have the most heartfelt gratitude towards Dr. Robert Kruse, for his above-and-beyond
dedication in guiding the completion of this study.
I would also like to thank Doug Lauffenburger (MIT) and Melody K. Morris, two researchers
whose work on logic models, unbeknownst to them, has been pivotal in this study's
execution.
References
1. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Mechanisms of cell
communication. Molecular biology of the cell. Fifth ed. New York, USA: Garland
Science, Taylor & Francis Group; 2008. p. 879.
2. Lodish H, Berk A, Zipursky S. Signal transduction and G protein-coupled receptors.
Molecular cell biology. Seventh ed. New York, USA: W. H. Freeman and company;
2012. p. 673.
3. Wu H. Higher-order assemblies in a new paradigm of signal transduction. Cell
2013;153(2):287-292.
4. Regev A, Shapiro E. Cells as computation. Nature 2002 09/26;419(6905):10.1038/419343a.
5. de Silva AP, Uchiyama S. Molecular logic and computing. Nat Nano 2007;2(7):399-410.
6. Oxford English Dictionary (OED). Available at: http://www.oed.com.
7. Watterson S, Marshall S, Ghazal P. Logic models of pathway biology. Drug Discov Today
2008 May;13(9-10):447-456.
8. Center for Cancer Systems Biology (CCSB). Network biology. Available at:
http://ccsb.dfci.harvard.edu/web/www/ccsb/research/networks.html. Accessed 11/01,
2016.
9. Gursoy A, Keskin O, Nussinov R. Topological properties of protein interaction networks
from a structural perspective. Biochem Soc Trans 2008 Dec;36(Pt 6):1398-1403.
10. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al.
Using graph theory to analyze biological networks. BioData Min 2011;4:10-0381-4-10.
11. Bonetta L. Protein-protein interactions: Interactome under construction. Nature 2010
12/09;468(7325):851-854.
12. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J
Theor Biol 1969 Mar;22(3):437-467.
13. Zhang R, Shah MV, Yang J, Nyland SB, Liu X, Yun JK, et al. Network model of survival
signaling in large granular lymphocyte leukemia. Proc Natl Acad Sci U S A 2008
07/05;105(42):16308-16313.
14. Saez-Rodriguez J, Alexopoulos LG, Epperlein J, Samaga R, Lauffenburger DA, Klamt S,
et al. Discrete logic modelling as a means to link protein signalling networks with
functional analysis of mammalian signal transduction. Mol Syst Biol 2009;5:331.
15. Aldridge BB, Saez-Rodriguez J, Muhlich JL, Sorger PK, Lauffenburger DA. Fuzzy Logic
Analysis of Kinase Pathway Crosstalk in TNF/EGF/Insulin-Induced Signaling. PLoS
Comput Biol 2009 04/03;5(4):e1000340.
16. Sahin O, Frohlich H, Lobke C, Korf U, Burmester S, Majety M, et al. Modeling ERBB
receptor-regulated G1/S transition to find novel targets for de novo trastuzumab
resistance. BMC Syst Biol 2009 Jan 1;3:1-0509-3-1.
Science, Taylor & Francis Group; 2008. p. 897.
18. Agarwal S, Deane CM, Porter MA, Jones NS. Revisiting date and party hubs: novel
approaches to role assignment in protein interaction networks. PLoS Comput Biol 2010
06/17;6(6):e1000817.
19. Morris MK, Saez-Rodriguez J, Sorger PK, Lauffenburger DA. Logic-based models for the
analysis of cell signaling networks. Biochemistry 2010 Apr 20;49(15):3216-3224.
20. Krallinger M, Erhardt RA, Valencia A. Text-mining approaches in molecular biology and
biomedicine. Drug Discov Today 2005 Mar 15;10(6):439-445.
21. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via
text mining. Pac Symp Biocomput 2012:410-421.
22. Perez-Iratxeta C, Bork P, Andrade MA. Association of genes to genetically inherited
diseases using data mining. Nat Genet 2002 Jul;31(3):316-319.
23. Kantety RV, La Rota M, Matthews DE, Sorrells ME. Data mining for simple sequence
repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant
Mol Biol 2002 Mar-Apr;48(5-6):501-510.
24. Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, Buckee C, et al. Digital
epidemiology. PLoS Comput Biol 2012 07/26;8(7):e1002616.
25. Shouval R, Bondi O, Mishan H, Shimoni A, Unger R, Nagler A. Application of machine
learning algorithms for clinical predictive modeling: a data-mining approach in SCT.
Bone Marrow Transplant 2014 Mar;49(3):332-337.
26. Zeng Z, Shi H, Wu Y, Hong Z. Survey of Natural Language Processing Techniques in
Bioinformatics. Comput Math Methods Med 2015;2015:674296.
27. Cox AD, Der CJ. Ras history: The saga continues. Small GTPases 2010 Jul-Aug;1(1):2-27.
Science, Taylor & Francis Group; 2008.
29. Weinberg RA. The biology of cancer. Second ed. New York, USA: Garland Science,
Taylor & Francis Group, LLC; 2014. p. 227.
30. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference
resource for gene and protein annotation. Nucleic Acids Res 2016 Jan 4;44(D1):D457-62.
31. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids
Res 2000 Jan 1;28(1):27-30.
32. Bonni A, Brunet A, West AE, Datta SR, Takasu MA, Greenberg ME. Cell survival
promoted by the Ras-MAPK signaling pathway by transcription-dependent and -independent mechanisms. Science 1999 Nov 12;286(5443):1358-1362.
33. Zuckerman V, Wolyniec K, Sionov RV, Haupt S, Haupt Y. Tumour suppression by p53:
the importance of apoptosis and cellular senescence. J Pathol 2009 Sep;219(1):3-15.
34. Worthen GS, Avdi N, Buhl AM, Suzuki N, Johnson GL. FMLP activates Ras and Raf in
human neutrophils. Potential role in activation of MAP kinase. J Clin Invest 1994
Aug;94(2):815-823.
35. Marino G, Kroemer G. Mechanisms of apoptotic phosphatidylserine exposure. Cell Res
2013 Nov;23(11):1247-1248.
36. IPA, QIAGEN Redwood City, www.qiagen.com/ingenuity.
37. Abcam plc. Annexin V detection protocol for apoptosis. Available at:
http://www.abcam.com/protocols/annexin-v-detection-protocol-for-apoptosis. Accessed
05/22, 2016.
38. Thermo Fisher Scientific Inc. Annexin V apoptosis detection kits. Available at:
http://www.ebioscience.com/knowledge-center/area-of-biology/apoptosis/annexin-v.htm.
Accessed 05/22, 2016.
39. Raymer MG. Digital memory and computers. The silicon web: physics for the internet
age: CRC Press; 2009. p. 377.
40. Moll UM, Petrenko O. The MDM2-p53 interaction. Mol Cancer Res 2003
Dec;1(14):1001-1008.
41. RD Glossary. ... "A high level language is a programming language which abstracts the
execution semantics of a computer architecture from the specification of the
program. This abstraction make the process of developing a program a much more
simple and understandable process." ... Available at:
https://web.archive.org/web/20070826224349/http://www.ittc.ku.edu/hybridthreads/glossary/index.php. Accessed 05/23, 2016.
42. The Free Dictionary. "General-purpose language: a computer programming language
whose use is not restricted to a particular type of computer or a specialized application.".
Available at: http://encyclopedia2.thefreedictionary.com/general-purpose+language. Accessed 05/23,
2016.
43. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a
software environment for integrated models of biomolecular interaction networks.
Genome Res 2003 Nov;13(11):2498-2504.
44. Baggerly KA, Coombes KR. Deriving chemosensitivity from cell lines: Forensic
bioinformatics and reproducible research in high-throughput biology. The Annals of
Applied Statistics 2009:1309-1334.
45. Prives C, Hall PA. The p53 pathway. J Pathol 1999 Jan;187(1):112-126.
46. Filippov AK, Fernandez-Fernandez JM, Marsh SJ, Simon J, Barnard EA, Brown DA.
Activation and inhibition of neuronal G protein-gated inwardly rectifying K(+) channels
by P2Y nucleotide receptors. Mol Pharmacol 2004 Sep;66(3):468-477.
47. Janes KA, Lauffenburger DA. A biological approach to computational models of
proteomic networks. Curr Opin Chem Biol 2006 Feb;10(1):73-80.
48. Wittmann DM, Krumsiek J, Saez-Rodriguez J, Lauffenburger DA, Klamt S, Theis FJ.
Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling. BMC Systems Biology 2009;3(1):1-21.
49. MacNamara A, Terfve C, Henriques D, Bernabe BP, Saez-Rodriguez J. State-time
spectrum of signal transduction logic models. Phys Biol 2012 Aug;9(4):045003-3975/9/4/045003. Epub 2012 Aug 7.
50. Scott JD, Pawson T. Cell signaling in space and time: where proteins come together and
when they're apart. Science 2009 Nov 27;326(5957):1220-1224.
51. Petersen OH. Calcium signal compartmentalization. Biol Res 2002;35(2):177-182.
52. Birtwistle MR, Rauch J, Kiyatkin A, Aksamitiene E, Dobrzyński M, Hoek JB, et al.
Emergence of bimodal cell population responses from the interplay between analog
single-cell signaling and protein expression noise. BMC Systems Biology 2012;6(1):112.
53. Francois S, El Benna J, Dang PM, Pedruzzi E, Gougerot-Pocidalo MA, Elbim C.
Inhibition of neutrophil apoptosis by TLR agonists in whole blood: involvement of the
phosphoinositide 3-kinase/Akt and NF-kappaB signaling pathways, leading to increased
levels of Mcl-1, A1, and phosphorylated Bad. J Immunol 2005 Mar 15;174(6):3633-3642.
54. Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review
automation technologies. Syst Rev 2014 Jul 9;3:74-4053-3-74.
55. Chang X, Xu T, Li Y, Wang K. Dynamic modular architecture of protein-protein
interaction networks beyond the dichotomy of "date" and "party" hubs. Scientific Reports
2013 04/22;3:1691.

A Time-Dependent, Discrete Logic-Based Computer Model of Mitogenic Neutrophil Signaling Circuitry Derived Using NLP Information Extraction Software

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

A Time-Dependent, Discrete Logic-Based Computer Model of Mitogenic Neutrophil Signaling Circuitry Derived Using NLP Information Extraction Software

Încărcat de

Drepturi de autor:

Formate disponibile

Analysis results of the annexin V data are presented in fig 9.