Informatics

EMBO reports
Informatics and hypothesis-driven research

The amassing of enormous data sets in genomics, proteomics and imaging has led a number of scientists to envision a future in which automated data-mining techniques, or ‘data-driven discovery’, will eventually rival the traditional hypothesis-driven research that has dominated biomedical science for at least the past century. It is no surprise that prominent scientists have expressed their scepticism, to say the least, about this point of view (Allen, 2001). However, I believe that framing the debate in terms of hypotheses versus informatics, with the subtext of man versus machines, misses an important point: currently available informatics techniques can greatly assist traditional hypothesis-driven research, but only if investigators slightly alter their practice to take advantage of this opportunity.

For example, informatics tools exist that can assist investigators in formulating, assessing and prioritising their hypotheses. Many hypotheses are, in fact, straightforward extrapolations from current findings: for example, knowing that apolipoprotein E4 is a risk factor for Alzheimer’s disease, it is almost an automatic process to ask whether E4 may also be a risk factor for other neurological diseases or whether it interacts with other known risk factors; if one knows that RNA interference occurs in plants and lower organisms, it is logical to wonder whether it may occur in mammals as well. Publicly available tools, such as Arrowsmith (http://arrowsmith.psych.uic.edu), do not attempt to bypass scientists, but rather help them to integrate knowledge that is retrievable from the scientific literature in order to formulate hypotheses quickly, systematically and comprehensively (Swanson and Smalheiser, 1997; Smalheiser and Swanson, 1998). These tools can be thought of as analogous to word processors: they do not write manuscripts, and they do not do anything that people cannot do by themselves, but they do promise a new standard of efficiency and productivity.

Likewise, data mining of research databases need not be thought of as bypassing the traditional hypothesis-driven analysis of data, but rather as providing significant ‘added value’. Consider a commercial database consisting of credit-card transactions: its purpose is to keep track of individual accounts, and most of the queries to the database are specific, focused and initiated individually. In contrast, automated data-mining techniques permit the same database to be characterised in terms of significant large-scale correlations that provide a rich array of market research data. More importantly, one can search on an ongoing basis for anomalous patterns of activity that raise the possibility of fraud; in fact, a commercial database that does not carry out such automated ‘data-driven discovery’ might even be considered negligent. I suggest that research databases that are populated and analysed according to specific hypotheses (Valencia, 2002) should also benefit from being monitored by computer programs that search for unanticipated correlations and anomalous patterns.

One of the basic concepts of informatics is the ‘future value of primary data’. It is envisioned that the primary data (and, if possible, the actual samples) collected by one investigator will be archived and made available to other investigators, who may re-analyse the data from a different point of view, employ part of the data set not relevant to the first investigator, pool data with other studies or conduct new measurements on the original samples (Koslow, 2000). This is entirely compatible with hypothesis-driven research. Indeed, a good hypothesis is not one that is likely to be correct, but one that opens up a new arena of investigation. Since this arena cannot be fully perceived in advance, one must be prepared to carry out new analyses not included in the original hypothesis. Yet, most current experimental design simply ignores this fact: the investigator collects only those data that are deemed relevant to the original hypothesis, and when new information causes the original hypothesis to change, the investigator must plan a new experiment from scratch.

Ultimately, informatics should be viewed neither as a bag of tools and programmes nor as inextricably linked to the idea of artificial intelligence, but rather as pointing to a new approach to experimental design that takes into account the future use of primary data. If investigators and funding agencies simply included archiving of samples and data into research projects together with the metadata needed to understand how the data were collected, the increased efficiency and productivity that would accrue via data recycling should allow them to recoup their investments many-fold. Admittedly, most fields within biomedical science still lack an effective infrastructure for data archiving, sharing and collaboration. But this only means that investigators need to become actively involved to make this a reality and not retreat in the belief that informatics represents a threat to hypothesis-driven research.

References

Allen, J.F. (2001) In silico veritas. Data-mining and automated discovery: the truth is in there. EMBO rep., 2, 542–544.

Koslow, S.H. (2000) Should the neuroscience community make a paradigm shift to sharing primary data? Nat. Neurosci., 3, 863–865.

Smalheiser, N.R. and Swanson, D.R. (1998) Using Arrowsmith: a computer-assisted approach to formulating and assessing scientific hypotheses. Comput. Methods Programs Biomed., 57, 149–153.

Swanson, D.R. and Smalheiser, N.R. (1997) An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif. Intell., 91, 183–203.

Valencia, A. (2002) Search and retrieve. Large-scale data generation is becoming increasingly important in biological research. But how good are the tools to make sense of the data? EMBO rep., 3, 396–400.

Neil R. Smalheiser

Neil R. Smalheiser is at the UIC Psychiatric Institute in Chicago, IL.
E-mail: smalheiser@psych.uic.edu
DOI: 10.1093/embo-reports/kvf164
702 EMBO Reports vol. 3 | no. 8 | 2002 © 2002 European Molecular Biology Organization
