Practice No 9

General information
The first objective of this practice is to create a view for entering data into files created by
Epi Info 2005, software that is particularly useful in epidemiology. (Note that the files created by
this software have the same type as those used by Access.)
The major advantage of this software is its price: it is free of charge, yet it covers most of the
data processing needed in medical research. Its strong point is the possibility of creating
questionnaires (views) that accept only valid data. Its major weakness is the low quality of the
diagrams it produces.
The second objective of the practice is to present how elementary statistical processing is done and
how diagrams are constructed with this software.
During this practice:
a) You will create database files, build questionnaires inside them, and then
insert records;
b) You will start the statistical processing of records, beginning with simple examples.
Subjects
35: creating questionnaires in Epi Info
36: inserting data in Epi Info
37: primary statistical analysis of data from files
Software used during practice: Epi Info 2005

Practice no 9/2014-2015__________________________ UMF Carol Davila Medical Informatics & Biostatistics

Subject 35: Creating questionnaires in Epi Info


Epi Info is software for processing data organized in questionnaire form and for presenting the
results in reports. Initially used in epidemiology, Epi Info is now also used successfully to process
other biomedical data; it offers data management and statistical processing similar to SAS or SPSS,
and it is freeware. The starting page is as follows:

The main components of Epi Info:


Make View, which is an editor used to define data fields on one or several pages of a
View.
Enter Data, which lists the questionnaires built with Make View, controls the process of entering
data and allows searching for records.
Analyze Data, used to analyze data stored in files created not only with Epi Info, but also with
dBase, FoxPro, Excel etc. Its output may contain lists, frequencies, tables and diagrams
typical of epidemiological studies.
Create Maps, which is an instrument used to create epidemiological maps.
Create Reports, used to generate reports.
Other components of the software are as follows:
NutStat, used to record and evaluate measurements of height, weight, head and thorax
circumference for young people.
StatCalc, which is used for computations on data stored in tables.
Data Compare, used to identify differences between two tables.
Table to View, used to generate a view on the basis of an existing data table.
VisData, used to read data tables and change their properties.
Epi Lock, which encrypts data to protect access and to facilitate transmission and the creation of
backups.
Compact, which compacts databases of MS Access type.

Epi Info contains also:


A help system, containing information about what is offered,
A user manual, and
An interactive program to create epidemiological files.
To create questionnaires, use Make View, more precisely the command sequence: File → New → File
name (name of database: name_EPI) → Open → Name the View (Chest1 as questionnaire
name)

On the left side there are three options for managing the pages of the questionnaire
(Add Page: inserts a new page after the existing ones; Insert Page: inserts a new
page between two existing ones; Delete Page: deletes the current page). The command Program
allows programming some checking operations, to avoid errors that may appear when entering
data.
Inserting a new field in the current page of the view (at right) is easy: right-click at the position
where the new field should appear (the grid helps to identify this position). Then, in the dialog
box Field Definition, specify the necessary characteristics of the field: name, type, dimensions,
limits of values, codes, legal values etc.
The dialog box Field Definition is presented below. Notice that the type of the field is, by default,
Text.

The questionnaire (view) you create will contain 15 fields:


1. The personal ID (SSN). In the edit box Question or Prompt insert the text Social Security
Number:, in the group Field or Variable choose as Type the value Number, and as Pattern
the value ############# (i.e. 13 digits); finally, in the edit box Field Name insert the text
SSN.
(Note that the sequence SSN is the name of the field, while the longer
sequence Social Security Number: is used as the label on the screen.)
2. Family name of patient will be of text type and will have at most 30 characters. This time, in
the edit box Question or Prompt insert Family name:, as Type choose Text, and fix Size
at 30. Keep the field name proposed in the edit box Field Name.
3. First name of patient will be treated similarly.
4. The gender of patient will have two possible values: F or M. This time, in the edit box
Question or Prompt insert Gender:, as Type choose Text again, but in the group Code Tables
press the button Legal Values, then the button Create New, and key in the legal values F, then
M. In this case too, keep the field name proposed in the edit box Field Name.

5. Birth date of patient will obviously be a calendar date. To collect such data correctly,
in the edit box Question or Prompt insert Birth date:, as Type choose Date,
and as Pattern choose DD-MM-YYYY. This time we have to insert the name of the field in the edit
box Field Name, for example BirthDate.
6. Admission date of patient will be treated similarly.
7. Edema will be a variable with two possible values (Yes/No). This time, in the edit box Question or
Prompt insert Edema?, and as Type choose Yes/No. In this case the name of the field, in the edit
box Field Name, will be modified into Oedema. Proceed similarly for the next three fields:
8. Pleurisy.
9. Palpitations.
10. Cough.
11. Temperature will be a variable of numerical type and will take as values numbers between 35
and 43. To fix these limits, tick the check box Range and choose as Lower and Upper the
values 35 and 43, respectively.
The last five fields (Edema, Pleurisy, Palpitations, Cough and Temperature) will be grouped into a
group named Symptoms. To create a group just select (by dragging the mouse over) the fields,
then select from the menu Insert the command Group.
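The checks configured above (a 13-digit pattern, a list of legal values, a numeric range) can be pictured with a small Python sketch. It is only an illustration of the idea and not how Epi Info implements its checks; the function name validate_record is hypothetical, and the range limits are assumed to be inclusive.

```python
import re

def validate_record(record):
    """Return a list of error messages; an empty list means the record is accepted."""
    errors = []
    # SSN: exactly 13 digits, mirroring the pattern #############.
    if not re.fullmatch(r"\d{13}", str(record.get("SSN", ""))):
        errors.append("SSN must have exactly 13 digits")
    # Gender: only the legal values F and M are accepted.
    if record.get("Gender") not in ("F", "M"):
        errors.append("Gender must be F or M")
    # Temperature: numeric, within 35..43 (limits assumed inclusive).
    temp = record.get("Temperature")
    if not isinstance(temp, (int, float)) or not 35 <= temp <= 43:
        errors.append("Temperature must be between 35 and 43")
    return errors
```

A record such as {"SSN": "1234567890123", "Gender": "F", "Temperature": 37.2} passes all three checks, while a record with a short SSN, an unknown gender code and a temperature of 50 is reported with three errors.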
The constructed page may look similar to the following:

Using the command Add Page (from the menu on the left side), add a new page and insert in it
the remaining fields:
12. Employed, of Yes/No type,
13. Number of children, of numeric type, values between 0 and 14,
14. Children, a list-table that will contain the name and the age of the children. In the edit box
Question or Prompt insert Children:, and in the group Code Tables press the button Grid.
Now, in the combo box Enter Column Name for Grid insert the text Name of child, then press
Save Column. Do the same for the Age of child.
15. Age of patient at admission, of numeric type.
Obviously, if we know the birth date and the admission date, the age of the patient should be
computed automatically! To do such operations, use the command Program from the left
side.
As a result of the Program command, the screen is organized in another way: the left side is
now entitled Check, the right side Check Commands. Choose Age as the field whose value is to
be computed, then the command Assign, and try to insert the computing expression (see the figure
below)
=YEARS(BirthDate, AdmissionDate)

Probably you won't be successful. The reason should be clear: the fields AdmissionDate and
BirthDate are placed on another page, so their values are not accessible for computing on this second
page! As a solution, try to move the field Age from page 2 to page 1. To do the move, use the
commands Cut/Paste from the menu Edit. After this move the problem is solved.
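To make clear what an age computed from two dates means, here is a minimal Python sketch that counts completed years between the birth date and the admission date. Whether the Epi Info function YEARS counts in exactly this way is an assumption; the function name years_between is hypothetical.

```python
from datetime import date

def years_between(birth_date, admission_date):
    """Completed years from birth_date to admission_date."""
    years = admission_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the admission year.
    if (admission_date.month, admission_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

For a patient born on 20 May 1980, an admission on 19 May 2015 gives 34 completed years, and an admission one day later gives 35.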
Subject 36: Inserting data in Epi Info
Data can be inserted directly from the menu File, using the command Enter Data. Other
possibilities: after leaving (closing) the module Make View, from the main page of Epi Info either
select the module Enter Data directly, or choose the command Enter Data from the menu Programs. In
this case the necessary view (and project) should be chosen. (The project named name_EPI.mdb is
the one created previously.)
Insert at least four records (this implies filling in the data fields for at least four persons, on both
pages!). Save the file name_EPI.mdb in your personal folder.
The following figure shows the insertion of the admission date, on the first page, for the third
record. Note that for all labels associated with field values a standard font (MS Sans
Serif) of size 14 pt was selected.

Subject 37: Primary statistical analysis of data from files


To obtain statistical results the module Analyze Data is used. Inside this module several
commands are available in the command window at left. The results of the execution are presented
in the upper right window (entitled Analysis Output). Below it, in the window entitled
Program Editor, the previously executed commands are shown; here new commands can be keyed in,
then executed.
The commands at left are grouped. We distinguish the data processing commands (grouped in
Data), the commands that operate on the variables (grouped in Variables), the
selection commands (grouped in Select/If), the elementary statistical analysis commands
(grouped in Statistics) etc.
Read (Import) is the command used at the beginning of every new work session in the
module Analysis. The (imported) data are available for processing until a new Read (Import)
command is given. The default data format is Epi 2000, but this can be changed. It is possible to
import data from other types of files, such as different versions of Excel, FoxPro, Paradox or
even hypertext documents.
Epi Info comes with several sample projects for illustration and self-study; the simplest is
Sample.mdb.
Execute the command:
Read (Import) → Data Formats: Epi 2000
Data Source: Sample.mdb
Show: Views
Views: viewBabyBloodPressure
You will see that the full command is:
READ 'C:\...\Epi_Info\Sample.mdb':viewBabyBloodPressure
List, from the group Statistics, is a command used to present, in tabular form (either Grid
or HTML), the values of some variables from the active data file. Remember, the star * means
all. Thus, in the list Variables a star * means that the values of all variables will be shown.
When only some variables are selected, only the values of these variables will be shown. This
command also allows some values in the active data file to be changed (Allow Updates).
As an example, let us show on screen only the values of variables (i.e. fields) Birthweight,
SystolicBlood, AgeInDays under a tabular form (Display Mode: Grid). Of course, we have to
select the fields in the list Variables.
The full command is:
LIST Birthweight SystolicBlood AgeInDays GRIDTABLE

Frequencies, also from the group Statistics, is the first command to use when beginning the analysis
of a new dataset; before further processing, we need some basic information about the
distribution of the data. This command applies to both qualitative and quantitative variables; the
result is a summary table containing all values of the variables specified in the list
Frequency of:, together with the absolute frequency (number of occurrences), the percentage and
the cumulative percentage for each value of the variable. Attached to the table, a sketch of a bar
diagram represents the percentages. The figure below shows the effect of the command FREQ
Birthweight:

Notice that the 95% confidence limits are presented for each value of the variable. Read them as
follows: we are 95% confident that the percentage of newborns that weigh 90 oz lies
somewhere between 0.2% and 30.2%. This result is based on 1 of 16 recorded cases!
When a stratifying variable is specified, several frequency tables, one for each stratum, are
obtained.
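The content of such a frequency table can be sketched in Python as follows. The confidence limits are computed here with the exact binomial (Clopper-Pearson) method; this choice is an assumption, made because it agrees with the limits quoted above for 1 case out of 16, and it is not a statement about how Epi Info computes them. The function names are hypothetical.

```python
from collections import Counter
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f):
    """Root on [0, 1] of a function f that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_limits(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence limits for the proportion x/n."""
    # Lower limit: the p for which P(X >= x) equals alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p for which P(X <= x) equals alpha/2 (1 when x == n).
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

def freq_table(values):
    """Rows of (value, count, percent, cumulative percent, lower %, upper %)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        low, high = exact_limits(counts[value], n)
        rows.append((value, counts[value], 100 * counts[value] / n,
                     100 * cumulative / n, 100 * low, 100 * high))
    return rows
```

Calling exact_limits(1, 16) returns limits of roughly 0.2% and 30.2%, which matches the interval read from the table above. A stratified analysis would simply call freq_table once for each stratum.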
The command Means computes several statistics of center and spread: the Mean, the Median,
the quartiles (25% and 75%), the Minimum and Maximum values, the Mode (i.e. the value with the
highest frequency), the Variance and the standard deviation (Std Dev).
Obs is the total number of values of the variable, and Total is the sum of all values. The figure
below shows the effect of the command MEANS AgeInDays:

The command Means may be used only for quantitative variables. For qualitative variables we
limit ourselves to the command Frequencies.
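The statistics listed for the command Means can be reproduced, for a small list of numbers, with the Python standard library. This is a sketch under assumptions: the variance is taken as the sample variance (divisor n - 1) and the quartiles use the default method of statistics.quantiles, so values may differ slightly from those printed by Epi Info. The function name describe is hypothetical.

```python
import statistics

def describe(values):
    """Center and spread statistics similar to those listed for the Means command."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "Obs": len(values),                        # number of values
        "Total": sum(values),                      # sum of all values
        "Mean": statistics.mean(values),
        "Variance": statistics.variance(values),   # sample variance (n - 1)
        "Std Dev": statistics.stdev(values),
        "Minimum": min(values),
        "25%": q1,
        "Median": median,
        "75%": q3,
        "Maximum": max(values),
        "Mode": statistics.mode(values),           # first of the most frequent values
    }
```

For the illustrative list [1, 2, 2, 3, 4, 5, 7] the sketch reports 7 observations, a total of 24, a median of 3 and a mode of 2.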
The command Select, from the group Select/If, is used to select those records that satisfy a
certain criterion. After a selection only these records are processed; the selection
remains active until it is cancelled (Cancel Select).
As an example, let us select the newborn children whose age (expressed in days) is greater than
3. In the dialog, under Select Criteria:, insert the expression AgeInDays>3. Then, after a List
command, the following result is obtained:
The last two columns, entitled UniqueKey and RecStatus, are special fields of tables created
with Epi Info. The field RecStatus keeps the status of each record: for records that are
marked as deleted the value is 0; for the others the value is 1. The field UniqueKey is used
to number the records automatically.
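The behavior of a selection that stays active until it is cancelled can be pictured with a short Python sketch. It is a simplified model and not Epi Info code: the class Dataset is hypothetical, it keeps a single criterion at a time, and the sample records in the usage note are invented for illustration.

```python
class Dataset:
    """A list of records with an optional active selection criterion."""

    def __init__(self, records):
        self.records = records
        self.criterion = None  # no selection is active

    def select(self, criterion):
        """Activate a selection; later listings see only matching records."""
        self.criterion = criterion

    def cancel_select(self):
        """Cancel the selection; all records are processed again."""
        self.criterion = None

    def list(self):
        """Return the records that satisfy the active criterion, if any."""
        return [r for r in self.records
                if self.criterion is None or self.criterion(r)]
```

With three invented records whose AgeInDays values are 2, 3 and 5, calling select(lambda r: r["AgeInDays"] > 3) leaves one record in the listing; after cancel_select() all three are listed again.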

The command Header, from the group Output, may be used to insert a text as a title for the results;
the rendering characteristics (font, size etc.) may also be specified. An example: HEADER 2
"Results for newborn children" (BOLD) TEXTFONT +4
The command Type, from the same group Output, is analogous to the previous one;
it is used to insert a string of characters or the content of a text file in the output stream (which, by
default, is the monitor, or is the one specified by the command RouteOut).
The command RouteOut redirects the results (the output stream) toward a file
specified by name; this process is ended by the command CloseOut. The results obtained by
commands such as Frequencies, List etc. will be inserted in the file whose name
was previously specified by the command RouteOut.
Open (from Sample.mdb), by using the command Read (Import), the table
viewEstriolAndBirthweight. Use the command RouteOut to redirect the results toward
the file named name_EBW (in the folder C:\Anul_2). Notice the extension of this file.
Insert the title "The estriol and the weight at birth" with the command Header, activating the
options Bold and Italic and choosing the font size 7. Then insert the text "Content of the file"
with the command Type, again activating the options Bold and Italic, but with the font size 5. Use
the command List to see the values of the two variables Birthweight and Estriol, choosing the
alternative Web (HTML). Insert a new text, "Statistical processing", keeping the same values of
the parameters as above. Using the command Means, compute the statistics for the variable
Birthweight, then for Estriol. Close the results file by using CloseOut.
Probably we all agree that information presented diagrammatically is easier to transmit and
to understand. The most used diagrams are those with rectangles (Bar or Rotated Bar), the pie
charts and the histograms. The first two are adequate for presenting information about variables that
have a small number of values (especially qualitative ones).
The last type is adequate for summarizing variables that have a large number of numeric values (as
is the case of weights in grams, or of heights in centimeters), after grouping the values
into several intervals.
The command Graph, from the group Statistics, is used to represent diagrammatically variables
from the active data file. As an example, open (from the source Sample.mdb), with the
command Read (Import), the table viewSmoke. Then, using the command Graph, present the
values of the variable Sex in a bar chart: in the dialog box of the command, select Bar in
the list Graph Type: and Sex in X-AXIS Main_Variable(s):.
In Y-AXIS Show values of: keep the default value Count. The title of the diagram will be:
Distribution of smokers by sex | created by ... (your name). After seeing the diagram on screen,
export it (File → Export...) in jpg format, then name the obtained file
name_DIAGSX.jpg with the help of the command Export Destination: File Browse.
Proceed similarly for the variable Race, but select the type Rotated Bar.
Then, for the variable Marital select the type Pie. Save these two diagrams, with adequate titles,
into the files named name_DIAGRC.jpg and name_DIAGMR.jpg, respectively. For the quantitative variable
Age the adequate type will be Histogram, for which the length of the grouping interval will be
fixed at 10, and the first value will be. Save the obtained diagram in the file
name_DIAGAGE.jpg. Which title is adequate?
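Grouping numeric values into intervals of equal length, as a histogram does, can be sketched in Python as below. The function name histogram_counts is hypothetical, and the starting value used in the usage note (10) is chosen only for illustration; it is not taken from the text.

```python
from collections import Counter

def histogram_counts(values, first_value, width):
    """Count the values falling in each interval [start, start + width)."""
    counts = Counter()
    for v in values:
        if v < first_value:
            continue  # values below the first interval are ignored in this sketch
        # Left end of the interval that contains v.
        start = first_value + ((v - first_value) // width) * width
        counts[start] += 1
    return dict(sorted(counts.items()))
```

For the illustrative ages 12, 18, 25, 31, 39 and 40, with intervals of length 10 starting at 10, the sketch counts 2 values in the interval starting at 10, 1 in the interval starting at 20, 2 in the interval starting at 30 and 1 in the interval starting at 40.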
As another example, open the table viewOswego from the project Sample.mdb. Redirect the
results toward the file name_OSW. Every command should be accompanied by appropriate
explanations. For the variable Age compute the average for the healthy persons (criterion
ill=No) and separately for the ill persons (ill=Yes).
Represent diagrammatically the variables Age, Sex and Ill, save all diagrams in JPG format and insert
them, accompanied by your comments about what is represented in the diagrams, in a document
file named name_DIAGOSWEGO.doc.
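Computing the average of one variable separately for each value of another, as requested above for Age by Ill, amounts to the following Python sketch. The function name mean_by_group is hypothetical and the records in the usage note are invented; they are not the data of viewOswego.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(records, value_field, group_field):
    """Average of value_field computed separately for each value of group_field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])
    return {group: mean(values) for group, values in groups.items()}
```

For three invented records with ages 20 and 40 for Ill = "Yes" and age 10 for Ill = "No", the sketch returns an average of 30 for the ill persons and 10 for the healthy one.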
Create a questionnaire intended for entering only a few data items. Use the database
name_MEDPL.mdb, in which you should create the view Quest2. Prepare this view for
entering the following data:
a) Code of patient (numeric, starting with 1);
b) Gender (legal values M or F);
c) Start date of the treatment procedure;
d) Type of treatment (legal values: genuine pill or placebo only);
e) Evaluation date;
f) Result (legal values: totally cured, partially cured, not cured).
Now insert (module Enter Data) at least 40 records, trying to balance the numbers according to the
type of treatment (around 20 with the value genuine pill and around 20 with the value placebo).
