
Medical Informatics Statistics Module Prac 1

1. Open the Excel file data1.xls. In the file you will find data from a study on bone mineral
density in pre-menopausal women who regularly run the Comrades marathon.
(For those of you who are not experienced in Excel, please use the Help option. You will learn
far more if you find the solutions for yourself, rather than asking a friend.)
2. Copy the block of data and paste it into cell H1.
3. Working on the copied data, draw a line along the bottom of the cells H21 to M21.
4. In cell G22 type in Average.
5. In cell I22 enter the formula to calculate the average for the data in cells I2 to I21.
6. Set the format of the cell containing the average to allow only 1 decimal place.
7. In the next row type in SD and enter the formula to calculate the standard deviation for the
data in cells I2 to I21.
8. Save your work.
9. In the next row type in max and enter the formula to find the maximum value for the data in
cells I2 to I21.
10. In the next row type in min and enter the formula to find the minimum value for the data in
cells I2 to I21.
11. In the next row type in median and enter the formula to find the median for the data in cells I2
to I21.
12. In the next row type in n = and enter the formula to find the number of data sets for the data in
cells I2 to I21.
13. Save your work.
14. In the next row type in SE and enter the formula to find the standard error of the mean for the
data in cells I2 to I21.
15. In the next row type in 95% of samples and enter the formula to find the range above and
below the mean, in which 95% of the observations will occur, for the data in cells I2 to I21.
16. Save your work.
17. In the next row type in 95% CI and enter the formula to find the 95% confidence intervals for
the data in cells I2 to I21.
18. In the next row type in first quartile and enter the formula to find the first quartile for the data
in cells I2 to I21.
19. In the next row type in second quartile and enter the formula to find the second quartile for
the data in cells I2 to I21.
20. Save your work.
21. In the next row type in 25th percentile and enter the formula to find the 25th percentile for the
data in cells I2 to I21.
22. In the next row type in 50th percentile and enter the formula to find the 50th percentile for the
data in cells I2 to I21.
23. In the next row type in 37 and enter the formula to count all the occurrences of 37 in the data
in cells I2 to I21.
24. Copy all the formulae that you have generated so that you get the results for weight, height
and body mass index (BMI) in columns J, K and L. (Note: You should not have to re-enter any
of the formulae.)
25. Save your work.
26. In Row 1, column M type in I + J.
27. In Row 2, column M enter the formula to calculate the sum of the subject's age and weight.
28. Copy this formula for all the subjects. Copy the formulae to calculate all the steps from 7 to
25.
29. Copy all the data in worksheet 1 into a new worksheet.
30. Save your work.
31. Rename the new worksheet solutions.
32. In the solutions worksheet, copy the original data from data and paste it into cell P1.
33. Sort the data for the Indian and White subjects so that all of the data for the Indian subjects is
in cells P2 to T4 using the sort command.
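
As a cross-check on the spreadsheet formulae requested above, the same descriptive statistics can be sketched in Python. This is only an illustration under stated assumptions: the `ages` list is made up and is not the data in data1.xls, the 95% confidence interval uses the normal approximation (1.96 standard errors) rather than a t value, and the "inclusive" quartile method is assumed to be the closest match to the spreadsheet's quartile function.

```python
import math
import statistics

# Hypothetical ages for illustration only; NOT the values in data1.xls.
ages = [31, 37, 42, 35, 37, 29, 44, 38, 33, 40]

n = len(ages)                        # number of observations
mean = statistics.mean(ages)
sd = statistics.stdev(ages)          # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)               # standard error of the mean
median = statistics.median(ages)

# Range expected to hold about 95% of observations, assuming a normal distribution.
range_95 = (mean - 1.96 * sd, mean + 1.96 * sd)

# Normal-approximation 95% confidence interval for the mean.
ci_95 = (mean - 1.96 * se, mean + 1.96 * se)

# Quartiles; "inclusive" interpolates between the minimum and maximum of the data.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

count_37 = ages.count(37)            # occurrences of the value 37

print(f"n={n} mean={mean:.1f} sd={sd:.2f} se={se:.2f} median={median}")
print(f"95% of samples: {range_95[0]:.1f} to {range_95[1]:.1f}")
print(f"95% CI: {ci_95[0]:.1f} to {ci_95[1]:.1f}")
print(f"Q1={q1} Q2={q2} Q3={q3} count(37)={count_37}")
```

Note that the second quartile, the 50th percentile and the median are the same quantity, and that the confidence interval is always narrower than the 95% range because it is built from the standard error rather than the standard deviation.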

34. Repeat all of the calculations that you have performed, for Indian (I) subjects as a group in
cells O37 to T50, and White (W) subjects as a separate group in cells O22 to T35. Copy the
headings into the appropriate places in column O.
35. In row 1, column U, type in Correct. Fact. and in row 1, column V type in 2.
36. Save your work.
37. In row 2, column U, type in the formula that will add cell V1 to each of the BMI values, i.e.
cell V1 is a constant in the addition process.
38. In row 1, column W, type in BMI 2.
39. In row 2, column W enter the formula to calculate the body mass index (BMI), that is the
weight divided by the height squared.
40. Copy the formula to get the BMI for all the subjects.
41. Save your work.
42. Reduce the number of decimal places for column V to 2.
43. In cell W22, type Average.
44. Calculate the average for BMI2 in cell X22.
45. In cell Y23 type SE.
46. Save your work.
47. In cell Y21, calculate the standard error of the mean for BMI2 based on the formula SE =
standard deviation divided by the square root of the number of samples (n=20).
48. Copy the original data from the data worksheet and paste it into worksheet 3.
49. Rename worksheet 3 Sorted.
50. Sort the data based on age, so that the ages are shown in ascending order. Ensure that the
other data remains correct for each subject.
51. Save your work.
52. Draw a bar graph of the weight of each subject.
53. Label the Y axis Weight (kg) and the X axis Subject.
54. Change the format of the legend HEIGHT so that it is placed at the bottom of the graph.
55. Remove the Title Weight.
56. Change the scale of the y axis to range from 0 kg to 80 kg.
57. Change the gap between the bars from 150 to 100.
58. Remove the grid lines.
59. Change the format of the plot area so that there is no fill.
60. Change the colour of the bars to grey with a black outline.
61. Change the font of the graph to Times New Roman and the font size to 10.
62. Remove the border that continues from the x and y axes.
63. Copy the graph and paste it next to the first graph.
64. Add the BMI of each subject to the graph by adding another series.
65. Remove the Y axis title Weight (kg).
66. Change the legend Weight to Weight (kg).
67. Save your work.
68. Copy and paste the chart.
69. Convert the copied chart from a bar graph to a line graph with each point showing on the line
graph.
70. Save your work.
71. Draw a scatterplot of the relationship of height on the x axis and weight on the y axis of each
subject.
72. Remove the title Height and the legend Height.
73. Remove the grid lines.
74. Reformat the plot area so that there is no border or fill.
75. Change the scale of the x axis to 1.5 to 1.75 and the y axis to 40 to 70.
76. Save your work.
77. E-mail your completed worksheets to me at esterhuizent@nu.ac.za giving your name and
student number in the body of the text.
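
The BMI, correction-factor and standard-error steps earlier in this practical can be sketched in Python as follows. The weights and heights are hypothetical values chosen for illustration, not the data in data1.xls, and the `CORRECTION` constant simply plays the role of the single fixed cell that every row's formula refers to.

```python
import math
import statistics

# Hypothetical (weight in kg, height in m) pairs; NOT taken from data1.xls.
subjects = [(58.0, 1.62), (63.5, 1.70), (52.0, 1.58), (60.0, 1.65)]


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2


bmis = [bmi(w, h) for w, h in subjects]

# One constant shared by every row, like a fixed cell holding the value 2.
CORRECTION = 2
corrected = [b + CORRECTION for b in bmis]

# SE = standard deviation divided by the square root of the number of samples.
se = statistics.stdev(bmis) / math.sqrt(len(bmis))

for (w, h), b, c in zip(subjects, bmis, corrected):
    print(f"weight={w} height={h} BMI={b:.2f} corrected={c:.2f}")
print(f"SE of BMI = {se:.3f}")
```

Adding the same constant to every value shifts the mean but leaves the standard deviation, and therefore the standard error, unchanged.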

Medical Informatics Statistics Module Prac 2


1. Use the data provided for practical 2.
2. Remember to save your work regularly because Excel is not set to do regular saves for you.
3. Copy the data worksheet to worksheet 2 and rename it results.
4. Underline the data provided, from B21 to F21.
5. In A22 is the heading Percentile. In cells B22 to B62 are increments of 2.5 from 0 to 100.
These will assist you in the next task, which is to calculate the percentiles in 2.5% increments from
0 to 100% for Age. Do this in cells C22 to C62. Think carefully about this. It is possible to use the
data in column B in your formula in column C, so that you only have to enter the formula once in
column C and then copy it to all the other cells.
6. Copy the age column B1:B21 and paste it in G22.
7. Sort the ages into ascending order.
8. In F30 type in Median.
9. From the rearranged ages, work out the median without using the median formula command.
10. Enter your answer in F31.
11. You need to work out the frequency of the ages in 2 year increments. If you copied the worksheet
correctly you should have the increments in cells K22 to K33. In K22 is 30-31, in K24 34-35 and in
K33, 52-53.
12. Work out the frequency of people within the age ranges. One way would be to work out the
cumulative frequency, which we can do using the COUNTIF command; use Help to get the
syntax correct. In L21 enter Cum Freq. In L22 enter the command to give you the count for
those less than 32 years of age, i.e. aged 30 and 31.
13. Repeat this for all the age ranges.
14. In M21, type in Frequency and, based on the cumulative frequencies, calculate the frequencies
for each of your ranges in cells M22 to M33.
15. In cell P21 type in Frequency.
16. Excel has a Function command that allows you to more easily calculate the Frequency. (You will
need to use the Help to understand the command, and you will probably have to do the example
that they provide.) As you will see in the Help, you have to define two arrays. The first array will
be Age, from G23 to G42, and the second array is the Bin. The Bin defines the ranges that will be
used, assigning the correct number of people to each interval. To assist you, you will find the bin array
in cells O22 to O33. What the Bin array is saying is that the formula will return a value for the
number of people aged 31 or less in the first row, the number of people aged 32 and 33 in the
second row, etc. Enter the command in cell P22 and press Enter. Then highlight cells P22 to P33, press
function key F2 and then the Ctrl, Shift and Enter keys together. If you have done it correctly,
you will get the same numbers under each of your Frequency columns, column M and P.
17. Draw a histogram of the frequencies for the various age categories, starting the chart in K34.
18. As this is a histogram and the data are continuous, there should not be a gap between the bars in the
histogram. Remove the gap between the bars. Right click on one of the bars, go to Format Data
Series and Options.
19. From the histogram determine the mode.
20. Change the colour of the mode bar to yellow.
21. There is another way of drawing a histogram in Excel using the Histogram function. If you can't
find the Histogram function under the Functions icon options, go to Tools, Data Analysis and you
will find it there. If you are working on your own computer and you don't get the option Data
Analysis under Tools, then go to Tools, Add-Ins and install the Data Analysis pack. Once again
the first array is the Age column, G23 to G42, and the Bins array is O22 to O33; for the output range
enter T21. Choose to have a chart and don't select Pareto or cumulative percentage.
22. Now generate a cumulative percentage based on the ranges for age. This time choose the
cumulative frequency and the chart option, and the output range as T40.
23. In F43 enter mean and calculate the mean age in G43.
24. In F44 enter median and calculate the median for age in G44. Is it the same as your answer
in F31? If not, why not?

25. In F45 enter mode and calculate the mode in G45. Why is the mode different to the category that
you highlighted in yellow in your histogram?
26. Now we will work out the standard deviation. In H22 enter x-mean and in I22 (x-mean) squared.
27. Calculate x-mean in cells H23-H42 for the ages in cells G23-G42. You should be able to do this
by only entering the formula once and then copying it to the remaining cells.
28. Similarly calculate (x-mean)squared in column I.
29. In H43 enter sum and in I43 calculate the sum of the square
30. In H44 enter n-1 and calculate n-1 in I44
31. In H45 enter Sum/n-1 and calculate this in I45
32. Enter SqRoot in H46 and calculate the square root of I45 in I46.
33. In F46 enter SD and calculate the standard deviation of the ages in G23-G42 using the standard
deviation function.
34. If your standard deviation in G46 does not equal the number in I46, you have made a mistake
somewhere. Go back and check all of your steps.
35. For normally distributed data, the mean plus and minus one standard deviation incorporates about
68.27% of the population. From the percentile table that you made, highlight in yellow the
percentiles above and below the mean which are nearest to one standard deviation.
36. In F47 type in SD*1.96 and in G47 calculate 1.96 times the standard deviation. This gives you
the number which you would add or subtract from the mean to determine the range within which
95% of the data lies if it is normally distributed.
37. Now highlight in grey the percentiles that denote the range in which 95% of the population falls.
38. So far you have entered the formula for a range of descriptive stats, including mean, standard
deviation, standard error of the mean, maximum, minimum, 95% confidence intervals. These are
descriptive statistics. The Data Analysis option will do many of these for you automatically. From
the Tools menu choose Data Analysis, Descriptive Statistics. Click on Help; it is self-explanatory. For the
input range highlight G22 to G42. The data are grouped by columns with the label in the first row.
Enter F50 as the output range, choose summary statistics and tick the confidence level for the
mean.
39. Make column F wider so that you can read all the headings for the stats that have been done
automatically for you. Compare the results that you have obtained.
40. E-mail your completed worksheets to me at esterhuizent@nu.ac.za giving your name and student
number in the body of the text.
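
The two hand calculations in this practical, frequencies derived from cumulative counts and the step-by-step standard deviation, can be sketched in Python. The 20 ages below are hypothetical and are not the practical's data; `bin_upper` is an assumed stand-in for the bin array (31, 33, ..., 53), and `bisect_right` counts the values less than or equal to each upper edge, mirroring a cumulative "less than 32" style count.

```python
import bisect
import math
import statistics

# Hypothetical sorted ages (n = 20) for illustration; NOT the practical's data.
ages = sorted([30, 31, 33, 34, 35, 35, 36, 37, 38, 38,
               39, 40, 41, 42, 43, 44, 46, 48, 50, 53])

# Upper edge of each 2-year bin: 31, 33, ..., 53 (12 bins).
bin_upper = list(range(31, 54, 2))

# Cumulative frequency: number of ages less than or equal to each upper edge.
cum_freq = [bisect.bisect_right(ages, upper) for upper in bin_upper]

# Frequency per bin = this cumulative count minus the previous one.
freq = [c - prev for c, prev in zip(cum_freq, [0] + cum_freq[:-1])]

# Median by hand for an even n: the average of the two middle sorted values.
mid = len(ages) // 2
median_by_hand = (ages[mid - 1] + ages[mid]) / 2

# Standard deviation step by step: deviations, squares, sum, n - 1, square root.
mean = sum(ages) / len(ages)
squared_dev = [(x - mean) ** 2 for x in ages]
sd_by_hand = math.sqrt(sum(squared_dev) / (len(ages) - 1))

print("cumulative:", cum_freq)
print("frequency: ", freq)
print("median:", median_by_hand, "sd:", round(sd_by_hand, 3))
```

The hand-computed standard deviation should agree with `statistics.stdev(ages)`, which is the same consistency check the practical asks for between the two cells holding the manual and built-in results.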
Practical 3
The data required for this practical session are in an Excel file called prac123.xls.
Exercise 1
The marks scored by males and females in a recent test are given in the sheet called marks1. Is there
a significant difference in marks based on gender? Using InStat to determine your answer:
1. State the goal and the data entry format that you specified on the first InStat screen. (2)
2. What appropriate tests did InStat offer for your particular problem? (1)
3. State the 95% confidence intervals for the measurements at the various sites given on the data
entry page. (1)
4. Based on these confidence intervals would you expect there to be a significant difference between
males and females? Explain how you reached your conclusion. (3)
5. What is a confidence interval? (1)
6. Are your data paired or unpaired? (1)
7. Did you assume your data to be Gaussian in distribution? Explain your answer. (3)
8. Did you choose a one tailed or two tailed test? Why? (3)
9. What tests did InStat offer you? (1)
10. Was there a significant difference in the marks of the males and the females? (1)
11. How were the degrees of freedom calculated? (1)
12. What is the F test? (1)

13. How is the P value for the F test ascertained? (1)

(20)
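
For the questions above on degrees of freedom and the F test, the arithmetic behind an unpaired comparison can be sketched in Python. This is a minimal sketch assuming Student's t test with equal variances; the marks are hypothetical, the function name `unpaired_t` is illustrative, and whether InStat reports exactly these quantities is an assumption. The variance ratio shown is the usual F statistic for checking the equal-variance assumption.

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2                              # degrees of freedom
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    # F statistic: larger sample variance divided by the smaller one.
    f_ratio = max(var_a, var_b) / min(var_a, var_b)
    return t, df, f_ratio


# Hypothetical marks; NOT the values in the marks1 sheet.
males = [62, 70, 58, 75, 66]
females = [68, 72, 64, 80, 71]

t, df, f_ratio = unpaired_t(males, females)
print(f"t = {t:.3f}, df = {df}, F = {f_ratio:.3f}")
```

The t value is then compared with the t distribution on `df` degrees of freedom, and the F ratio with the F distribution, to obtain the respective P values.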

Exercise 2
A class took a test before a lecture (pre-test) and the same test after the lecture (post-test). The marks
are given in the sheet Marks 2. Did the marks improve after the lecture, and was the improvement
significant?
1. State the goal and the data entry format that you specified on the first InStat screen. (2)
2. What appropriate tests did InStat offer for your particular problem? (1)
3. State the 95% confidence intervals. Is your data paired or unpaired? (2)
4. Did you assume your data to be Gaussian in distribution? Explain your answer. (3)
5. Did you choose a one tailed or two tailed test? (1)
6. What tests did InStat offer you? (1)
7. Was there a significant difference in the pre and post test marks? (1)
8. How were the degrees of freedom calculated? (1) (12)
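
A paired comparison such as the pre-test and post-test marks works on the per-student differences rather than on the two groups separately. The following is a minimal Python sketch of that arithmetic, assuming a paired t test; the marks are hypothetical and the function name `paired_t` is illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic computed from the per-subject differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    t = statistics.mean(diffs) / se
    return t, n - 1                               # degrees of freedom = n - 1


# Hypothetical marks for the same five students; NOT the Marks 2 sheet.
pre = [62, 70, 58, 75, 66]
post = [68, 72, 64, 80, 71]

t, df = paired_t(pre, post)
print(f"t = {t:.3f}, df = {df}")
```

Because each student acts as their own control, the between-student variation drops out of the standard error, which is why a paired analysis can detect a difference that an unpaired analysis of the same numbers would miss.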

Exercise 3
Many patients present at King Edward with poor circulation in their legs. One of the tests of
circulation is the measurement of transcutaneous oxygen pressure (TcpO2). The data in the sheet
called gangrene were obtained from these patients. Measurements were performed at the various
sites at which a subsequent amputation might be performed. These are the middle of the foot (Foot),
10 cm below the knee (Bka), 10 cm above the knee (Aka) and at a reference point on the chest. The
patients were categorised according to their presenting symptom, which was gangrene, ulcer,
pre-gangrene or claudication. The total summarised data for those patients who presented with gangrene
of the foot are presented below. The researcher wants the following question answered:
Is the fall in the TcpO2 measured from the chest to the foot significant?
Using InStat to determine your answer:
State the goal and the data entry format that you specified on the first InStat screen. (2)
1. What appropriate tests did InStat offer for your particular problem? (1)
2. Based on the 95% confidence intervals would you expect there to be a significant difference
between the values derived from the foot and the above knee sites? Explain how you reached your
conclusion. (3)
3. Which columns did you choose to test, and why? (3)
4. After you had chosen the columns, what test options did InStat offer? (1)
5. Are your data paired? Explain your answer. (2)
6. Are your data sampled from Gaussian distributions? Explain your answer. (3)
7. Are the data for the Foot possibly skewed? Explain your answer. (2)
8. If InStat did not offer you the option of choosing whether your data are Gaussian or not, explain
why this may have happened? (1)
9. Is there a significant difference between the sites at which the TcpO2 was measured? (2)
10. If you are offered the option of multiple comparison post hoc testing, choose the Tukey test. What
further information do you derive from performing these tests? (5) (25)
The total data were then further categorised on whether the patient was diabetic or not:
Is there any difference between diabetics and non-diabetics in the TcpO2 pressures at the different
levels?
1. Based on the 95% confidence limits do you expect there to be a difference? (1)
2. What test did you perform? (1)
3. Which data sets did you include in your analysis? (1)
4. Should you have performed a parametric or a non parametric test? (2)
5. Does diabetes influence the TcpO2 values at the different sites in patients who present with a
gangrenous foot? (2) (7)

Exercise 4
Some of the raw data for the TcpO2 values at the Foot and at the AKA level for patients presenting
with gangrene are given in sheet Raw data. Are the results different at the two levels?
1. State the goal and the data entry format that you specified on the first InStat screen. (2)
2. What appropriate tests did InStat offer for your particular problem? (1)
3. State the 95% confidence intervals for the measurements given on the data entry page. (1)
4. Based on these confidence intervals would you expect there to be a significant difference between
the values derived from the foot and the above knee sites? Explain how you reached your
conclusion. (1)
5. Is your data paired? Explain your answer. (2)
6. Are your data sampled from Gaussian distributions? Explain your answer. (3)
7. Did you choose to perform a parametric or non-parametric test? Explain your choice. (2)
8. What tests does InStat offer you? (1)
9. Is there a significant difference between the sites at which the TcpO2 was measured? (2)
10. Is the result you obtained different in any way to that obtained on the averaged data provided in
exercise 3? State the difference if any. (2)
11. Which method was more appropriate, that used in exercise 3 or exercise 4? Explain why. (2) (19)

Exercise 5
You will probably have noticed that the data sets in Exercises 1 and 2 were the same. Was the result the
same? If not, list the differences and suggest why there is a difference. (5)
Exercise 6
A lecturer keeps a class attendance register and compares the marks in the exams with the attendance
at lectures. The data are given in the worksheet lectures. Is there a relationship between the number
of lectures attended and the marks earned?
1. What options did you specify on the first InStat screen? (2)
2. Which data set did you enter under X and which under Y1? What was your reasoning in
making this decision? (2)
3. What was the regression equation that you derived? (1)
4. What was the correlation coefficient? (1)
5. What can you infer from r²? (1)
6. What do you understand the result of the statement Test: Is the slope significantly different from
zero? to mean? (1)
7. Compare the correlation coefficient obtained when determining the correlation
coefficient with that obtained when determining the regression equation. (1) (9)
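
The regression equation and correlation coefficient asked about above come from the same three sums of squares, which is why the two analyses report the same r. A minimal least-squares sketch in Python follows; the attendance and mark values are hypothetical, and `linear_fit` is an illustrative name rather than anything InStat exposes.

```python
import math


def linear_fit(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data; NOT the values in the lectures worksheet.
lectures_attended = [2, 5, 8, 10, 12, 15]   # X: the explanatory variable
exam_marks = [35, 48, 55, 62, 70, 81]       # Y: the response

slope, intercept, r = linear_fit(lectures_attended, exam_marks)
print(f"marks = {intercept:.2f} + {slope:.2f} * lectures")
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Here r squared is the fraction of the variation in marks that the fitted line accounts for, and a slope that differs significantly from zero corresponds to a correlation that differs significantly from zero.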
Exercise 7
The results of a drug trial Trial 1 are given in the worksheet drug trial. There were more survivors
on drug A than on Drug B. Is this difference significant?
1. Which goal and data entry did you choose on the first screen? (2)
2. Which test did you select? (1)
3. What was the result of the test? (1)
4. Can you say that Drug A is better than Drug B? (1) (5)

Exercise 8
The second set of data Trial 2 expresses the data from trial 1 as a percentage.
1. Is the difference between drug A and drug B, based on the percentage significant? (2)
2. Is drug A better than drug B? Which data should you report on, trial 1 or trial 2? (2)

3. If trial 1 had been extended to increase the sample size to 100 subjects treated with drug A and
100 treated with drug B, and the outcome of the trial is represented in trial 2, is drug A better than
drug B? (1) (5)
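
Exercises 7 and 8 turn on the fact that a test on a contingency table depends on the actual counts, not only on the proportions. The Python sketch below illustrates this with the uncorrected Pearson chi-square for a 2 x 2 table. The counts are hypothetical and are not the drug trial worksheet's values, and InStat may instead apply Fisher's exact test or a continuity correction, so treat this as an illustration of the principle only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_5_PERCENT = 3.841  # chi-square critical value for 1 degree of freedom

# Hypothetical counts: (survived, died) for drug A and drug B, 20 subjects each.
chi_counts = chi_square_2x2(14, 6, 9, 11)

# The same outcome expressed as percentages (70/30 versus 45/55),
# which behaves like a trial with 100 subjects per drug.
chi_percent = chi_square_2x2(70, 30, 45, 55)

print(f"counts:      chi-square = {chi_counts:.3f}")
print(f"percentages: chi-square = {chi_percent:.3f}")
```

With identical proportions, multiplying every count by five multiplies the chi-square statistic by five, so entering percentages as if they were counts overstates the evidence unless the trial really did include that many subjects.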
Practical 3
Using the data and your results from practical 1
1. Draw a bar graph of the weight of each subject.
2. Label the Y axis Weight (kg) and the X axis Subject.
3. Change the format of the legend HEIGHT so that it is placed at the bottom of the graph.
4. Remove the Title Weight.
5. Change the scale of the y axis to range from 0 kg to 80 kg.
6. Change the gap between the bars from 150 to 100.
7. Remove the grid lines.
8. Change the format of the plot area so that there is no fill.
9. Change the colour of the bars to grey with a black outline.
10. Change the font of the graph to Times New Roman and the font size to 10.
11. Remove the border that continues from the x and y axes.
12. Copy the graph and paste it next to the first graph.
13. Add the BMI of each subject to the graph by adding another series.
14. Remove the Y axis title Weight (kg).
15. Change the legend Weight to Weight (kg).
16. Save your work.
17. Copy and paste the chart.
18. Convert the copied chart from a bar graph to a line graph with each point showing on the line
graph.
19. Save your work.
20. Draw a scatterplot of the relationship of height on the x axis and weight on the y axis of each
subject.
21. Remove the title Height and the legend Height.
22. Remove the grid lines.
23. Reformat the plot area so that there is no border or fill.
24. Change the scale of the x axis to 1.5 to 1.75 and the y axis to 40 to 70.
25. Save your work.
26. E-mail your completed worksheets to me at esterhuizent@nu.ac.za giving your name and student
number in the body of the text.
