
BIOSTATISTICS QUESTION-ANSWERS FOR UNDERGRADUATE AND POSTGRADUATE NURSES

Mrs. Khanapurkar Usha M.Sc. Nursing

PREFACE
This book is designed to serve as a valid and reliable guide to biostatistics, with ready-to-use questions and answers. In preparing it, the M.Sc. Nursing syllabus of MUHS Nasik has been followed and the question patterns of other universities have been consulted. The book was prepared under the continuous motivation of Mrs. Sugathan Tressiumma, Principal of Bombay Hospital College of Nursing, Mumbai. I am deeply thankful to Mr. Khanapurkar Satish, my husband, philosopher and guide, for giving the idea and helping to prepare the manuscript. I would like to extend my gratitude to my colleagues and friends, who have been a source of inspiration to me in writing and completing this book. Without the help and encouragement of my son, Dr. Khanapurkar Aditya, and other family members, it would have been impossible to prepare it successfully. Nothing seems possible to man without the guiding light of the Almighty, and I am indebted to Him for the pathway He has shown me. Readers are requested to note the limitations of the book; suggestions are heartily welcome, looking forward to a better outcome next time.

Author
Mrs. Usha Khanapurkar, M.Sc. Nursing

INDEX
Sr. No.  Content
1  Multiple Choice Questions
2  Long and Short Answer Questions
3  Statistical Exercise
4  Important Statistical Formulas

MULTIPLE CHOICE QUESTIONS


(With correct answer underlined)

1 WHEN VARIABLES ARE NOT MEASURABLE, WHICH OF THE FOLLOWING SCALES IS USED TO REPRESENT THEM?
a) Nominal b) Ordinal c) Interval d) Ratio

2 WHICH OF THE FOLLOWING IS NOT A DISCRETE VARIABLE?
a) Skin color b) Blood pressure c) Weight d) Boys in the class

3 WHICH OF THE FOLLOWING IS NOT A CONTINUOUS VARIABLE?
a) Parity b) Height c) Weight d) Temperature

4 ALL OF THE FOLLOWING ARE EXAMPLES OF NOMINAL SCALE EXCEPT
a) Sex b) Socio-economic status c) Marital status d) Religion

5 WHEN VARIABLES ARE NOT MEASURABLE, WHICH OF THE FOLLOWING SCALES IS USED TO REPRESENT THEM?
a) Nominal b) Ordinal c) Interval d) Ratio

6 THE SCALE USED IN MEASURING PRESENCE OR ABSENCE OF A RISK FACTOR IS
a) Nominal b) Ordinal c) Interval d) Ratio

7 WHEN FREQUENCY OF A VARIABLE IS GIVEN IN THE FORM OF MILD, MODERATE & SEVERE, THE DATA SCALE USED IS
a) Nominal b) Ordinal c) Interval d) Ratio

8 THE SCALE USED IN MEASURING SERUM CREATININE (mg/dl) IS
a) Nominal b) Ordinal c) Interval d) Ratio

9 WHICH OF THE FOLLOWING IS A RATIO SCALE?
a) Age b) Rank in class c) Temperature d) Height

10 WHICH OF THE FOLLOWING TYPES OF DIAGRAM CAN BE USED TO FIND OUT THE RELATIONSHIP BETWEEN TWO VARIABLES?
a) Pictogram b) Bar diagram c) Histogram d) Scatter diagram

11 A PICTORIAL DIAGRAM OF FREQUENCY DISTRIBUTION IS DENOTED BY
a) Line chart b) Bar chart c) Histogram d) Pie chart

12 LOW BIRTH WEIGHT (LBW) STATISTICS OF A HOSPITAL ARE BEST SHOWN BY
a) Bar chart b) Histogram c) Frequency polygon d) Pie chart

13 LINE CHART SHOWS
a) Trend of an event with passage of time b) Arithmetic mean c) Most commonly occurring values d) Difference between highest & lowest values

14 THE PERCENTAGE IS DEPICTED BY
a) Bar diagram b) Histogram c) Line diagram d) Pie diagram

15 A GRAPH OF A CUMULATIVE FREQUENCY DISTRIBUTION IS CALLED
a) Line diagram b) Histogram c) Ogive d) None of the above
16 SEX COMPOSITION IS DEMONSTRATED IN WHICH OF THE FOLLOWING
a) Pie chart b) Component bar chart c) Age pyramid d) Multiple bar chart

17 30 BABIES WERE BORN IN A HOSPITAL. 10 WERE LESS THAN 2.5 KG AND 20 WERE GREATER THAN 2.5 KG. THE AVERAGE IS
a) Arithmetic mean b) Median c) Geometric mean d) Mode

18 VARIABLES ARE ARRANGED EITHER IN ASCENDING OR DESCENDING ORDER OF MAGNITUDE TO DETERMINE
a) Mean b) Mode c) Median d) Range

19 WHAT IS THE MODE IN STATISTICS?
a) Arithmetic average b) Difference between the highest and lowest value c) Value of middle observation d) Most commonly occurring value

20 THE DISTANCE BETWEEN THE LOWEST AND HIGHEST VALUES OBSERVED IS
a) Range b) Dispersion c) Standard error d) Mean deviation

21 MEDIAN IS ALMOST EQUIVALENT TO
a) 25th percentile b) 50th percentile c) 75th percentile d) 10th percentile

22 WHICH OF THE FOLLOWING IS NOT A MEASURE OF CENTRAL TENDENCY?
a) Mean deviation b) P50 c) Mode d) Geometric mean

23 IN MICROBIOLOGY, THE AVERAGE OF DILUTION TITERS IS COMPUTED BY
a) Arithmetic mean b) Geometric mean c) Harmonic mean d) Median

24 IF MEAN, MEDIAN & MODE ARE 10, 16 AND 22 RESPECTIVELY, THE DISTRIBUTION IS
a) Symmetric b) Normal c) Positively skewed d) Negatively skewed

25 MEDIAN IS PREFERRED TO MEAN WHEN
a) Population is large enough b) Low variance is seen c) Skewed distribution is seen d) None of the above

26 CENSUS IN INDIA IS DONE
a) Every year b) Every 10 years c) 6 monthly d) As and when needed

27 SAMPLE REGISTRATION SYSTEM IS DONE ONCE IN
a) 6 months b) 1 year c) 2 years d) 5 years

28 POPULATION COUNT IS TAKEN ON
a) 1st January b) 1st April c) 1st July d) 1st August

29 BEST MEASURE OF VARIABILITY IS
a) Mean b) Median c) Range d) Standard deviation

30 WHICH MEASURE IS NOT DEPENDENT ON THE VALUE OF EACH SCORE?
a) Mean b) Variance c) Range d) Standard deviation

31 ALL ARE MEASURES OF DISPERSION EXCEPT
a) Range b) Median c) Mean deviation d) Standard deviation

32 THE RATIO OF 3:1 IS ACHIEVED BY
a) Quartile b) Median c) 3rd quartile d) Range

33 THE PEAKEDNESS OF A FREQUENCY DISTRIBUTION CURVE IS KNOWN AS
a) Mode b) Kurtosis c) Skewness d) None of the above

34 THE SEMI-INTER-QUARTILE RANGE IS MOST CLOSELY RELATED TO
a) Mean b) Median c) Mode d) None of the above

35 MEAN OF 400 VARIABLES IS 100 & STANDARD DEVIATION IS 8. WHAT WILL BE THE STANDARD ERROR?
a) 0.4 b) 1.0 c) 2.0 d) 4.0

36 STANDARD ERROR IS A MEASURE OF
a) Conceptual error b) Sampling error c) Instrumental error d) Observer error

37 √(pq/n) INDICATES
a) Standard error of mean b) Difference between proportions c) Standard error of proportion d) Difference of two probabilities

38 AS SAMPLE SIZE INCREASES, STANDARD DEVIATION OF SAMPLE MEAN
a) Decreases b) Increases c) Remains the same d) Approaches to

39 THE EFFECTIVE SAMPLE SIZE DEPENDS ON
a) Power of test b) Standard deviation c) Level of significance d) All of the above

40 FOR QUANTITATIVE DATA, THE SAMPLE SIZE (N) IS COMPUTED BY THE FORMULA
a) 4pq/E² b) 2σ²/E c) 4σ²/E² d) 4pq/E

41 FOR QUALITATIVE DATA, THE SAMPLE SIZE (N) IS COMPUTED BY
a) 4pq/L² b) 4p²/n c) pq/L d) 4pq/ 2

42 IF PROBABILITY OF BEING RH NEGATIVE IS 1/10, THEN THAT OF BEING RH POSITIVE WILL BE
a) 1/10 b) 9/10 c) d) 1

43 PROBABILITY VARIES BETWEEN
a) 0 and 2 b) 0 and 100 c) -1 and +1 d) 0 and 1

44 IF THE PROBABILITY OF A NEW BORN BEING FEMALE IS 0.5, THEN THE PROBABILITY OF A NEW BORN BEING MALE IS
a) 0.5 b) 1 c) 0 d) None of the above

45 THE PROBABILITY OF A VALUE FALLING OUTSIDE THE 95% CONFIDENCE LIMITS IS
a) 1 in 5 b) 1 in 15 c) 1 in 20 d) 1 in 30

46 THE WEIGHT OF EACH OF THE 10 BABIES BORN IN A HOSPITAL WAS 2.5 KG. CONSIDERING THE DISTRIBUTION WAS A NORMAL CURVE, THE STANDARD DEVIATION IS
a) 1 b) 0 c) 2.5 d) 10

47 CONFIDENCE LIMITS CAN BE CALCULATED BY USING
a) Mean and range b) Mean and standard deviation c) Median and range d) Median and standard deviation

48 FOR 95% CONFIDENCE LIMITS, TRUE IS
a) Reduces 95% of values b) 1.96 of standard error of mean c) 2.58 of standard error of mean d) 5% of standard error of mean

49 TRUE ABOUT NORMAL DISTRIBUTION IS ALL, EXCEPT
a) Mean, median and mode all coincide b) Standard deviation is one c) Mean of the curve is 100 d) Total area of the curve is one

50 WHICH IS TRUE ABOUT NORMAL DISTRIBUTION CURVE?
a) Standard deviation 1, mean 0 b) Curve skews towards left c) Curve skews towards right d) Standard deviation 0, mean 1

51 IN A NORMAL CURVE, TRUE IS
a) Mean = Standard deviation b) Mean = Median c) Mean = 2 Standard deviation d) None of the above

52 AREA BETWEEN ONE STANDARD DEVIATION ON EITHER SIDE OF MEAN IN NORMAL DISTRIBUTION CURVE IS
a) 60% b) 68% c) 95% d) 99%

53 NORMAL DISTRIBUTION CURVE DEPENDS UPON
a) Mean and sample b) Mean and median c) Mean and standard deviation d) Median and standard error

54 IF THE MEAN IS 220 AND THE STANDARD ERROR IS 10, THE 95% CONFIDENCE LIMITS WOULD BE
a) 210 to 330 b) 200 to 240 c) 215 to 225 d) 220 to 0.2

55 NORMAL CURVE IS
a) Linear b) Symmetrical c) Curvilinear d) Parabolic

56 A NON-SYMMETRICAL FREQUENCY DISTRIBUTION IS KNOWN AS
a) Normal distribution b) Skewed distribution c) Cumulative frequency distribution d) None of the above

57 WHEN A NULL HYPOTHESIS IS ACCEPTED, IT IS POSSIBLE THAT
a) A correct decision has been made b) A type I error has been made c) Both (a) and (b) d) None of the above

58 A TYPE I ERROR IS COMMITTED WHEN A
a) False H0 is accepted b) True H0 is rejected c) True HA is accepted d) None of the above

59 A TYPE II ERROR IS COMMITTED WHEN A
a) False H0 is accepted b) True H0 is rejected c) True HA is accepted d) None of the above

60 WHICH IS A NON-DIRECTIONAL ALTERNATIVE HYPOTHESIS?
a) μ < 100 b) μ > 100 c) μ ≠ 100 d) None of the above

61 ANOTHER TERM FOR A TYPE II ERROR IS
a) error b) error c) error d) None of the above

62 P VALUE IS THE PROBABILITY OF
a) Rejecting H0 when it is true b) Accepting H0 when it is true c) Rejecting H0 when it is false d) None of the above

63 THE COEFFICIENT OF CORRELATION BETWEEN HEIGHT AND WEIGHT IS 2.2. WHAT DOES IT MEAN?
a) Positive correlation b) Negative correlation c) No association d) Calculation of coefficient of correlation is wrong

64 CORRELATION COEFFICIENT TENDS TO LIE BETWEEN
a) Zero to +1 b) -1 to Zero c) -1 to +1 d) -2 to +2

65 NOT TRUE ABOUT CORRELATION IS
a) Tells about the risk of the diseases b) -1 correlation shows linear relationship c) Tells association between two variables d) Does not tell about causation

66 THE SQUARE OF THE CORRELATION COEFFICIENT, R², REPRESENTS
a) The risk ratio between two variables b) The strength of the association between two variables c) The co-variance of two variables d) None of the above

67 IF 6ΣD²/n(n²-1) IS ZERO, THEN ρ (rho) IS
a) 1 b) 0 c) -1 d) None of the above

68 THE AMOUNT OF REGRESSION TOWARDS THE MEAN IS LEAST WHEN
a) Zero b) Negative c) Positive d) High

69 WHICH VALUE OF r PERMITS THE GREATEST ACCURACY OF PREDICTION?
a) -0.83 b) -0.25 c) 0.5 d) 0.75

70 RANDOMIZATION IS USEFUL IN ELIMINATION OF
a) Observer bias b) Patient bias c) Confounding factors d) Sampling bias

71 HEIGHT FOR WEIGHT OF BOYS IN THE CLASS IS
a) Association b) Index c) Proportion d) Correlation

72 IN RANDOM SAMPLE, CHANCE OF BEING SELECTED IS
a) Not same and not known b) Same and known c) Same and not known d) Not same but known

73 TRUE ABOUT SIMPLE RANDOM SAMPLING IS
a) Technique provides least number of possible samples b) Every fixed unit is taken for selection c) All units have equal chance to be selected d) Only selected units have right to be selected

74 FOR A SURVEY, A VILLAGE IS DIVIDED INTO 5 LANES. THEN EACH LANE IS SAMPLED RANDOMLY. IT IS AN EXAMPLE OF
a) Simple Random Sampling b) Stratified Random Sampling c) Systematic Random Sampling d) Multi-phase Random Sampling

75 WHICH IS TRUE FOR CLUSTER SAMPLING?
a) Every nth case is chosen for study b) Involves use of random member c) A natural group is taken as sampling unit d) Stratification of population is done

76 THE CLUSTER SAMPLING TECHNIQUE USED FOR EVALUATING UNIVERSAL IMMUNIZATION PROGRAMME COVERAGE IS
a) 30 clusters of 5 children b) 30 clusters of 10 children c) 30 clusters of 7 children d) 20 clusters of 5 children

77 SIGNIFICANT p VALUE IS
a) 0.05 b) 0.01 c) 0.1 d) 0.005

78 WHICH OF THE FOLLOWING IS UNRELATED TO THE CHI-SQUARE TEST OF SIGNIFICANCE?
a) Degree of freedom b) Life table c) Significance level d) Qualitative data

79 THE MEAN BP OF A GROUP OF PERSONS WAS DETERMINED AND AFTER AN INTERVENTION TRIAL, THE MEAN BP WAS ESTIMATED AGAIN. THE TEST TO BE APPLIED TO DETERMINE THE SIGNIFICANCE OF INTERVENTION IS
a) Chi-square test b) Paired t test c) Correlation coefficient d) Z-test

80 THE CHI-SQUARE TEST FOR A 2X2 CONTINGENCY TABLE IS NOT VALID UNLESS
a) Both variables are continuous b) All the expected frequencies are greater than 5 c) The sample is very large d) At least one variable is from a normal distribution

81 TRUE REGARDING CHI-SQUARE TEST
a) Null hypothesis is equal b) Does not test the significance c) Tests correlation and regression d) Measures the significance of difference between two proportions

82 THE STATISTICAL ANALYSIS OF TWO UNRELATED LARGE DATA SETS (n=200) IS BY
a) Paired t test b) Chi-square test c) Z-test d) Unpaired t test

83 FORMULA FOR CHI-SQUARE VALUE IS
a) (O-E)/E b) (O-E)/E² c) (O-E)²/N d) (O-E)²/E

84 DENOMINATOR OF MATERNAL MORTALITY IS
a) Per 1000 deaths b) Per 1000 mother deaths c) Per 1000 live births d) Per 1000 deaths of children

85 WHAT IS THE DENOMINATOR OF GENERAL FERTILITY RATE?
a) Married women b) Women in reproductive age group (15 to 49 years) c) Every married woman in age of 15-49 years d) All women

86 ANNUAL GROWTH RATE IS
a) Crude birth rate - crude death rate b) Crude birth rate + crude death rate c) Crude birth rate + crude death rate x 100 d) None of the above

87 WHICH OF THE FOLLOWING IS THE BEST INDICATOR OF HEALTH STATUS OF A COMMUNITY?
a) Birth rate b) Infant mortality rate c) Crude death rate d) None of the above

88 SAMPLE REGISTRATION SYSTEM (SRS) IS TO ACQUIRE INFORMATION ON WHICH OF THE FOLLOWING
a) Morbidity rate of various diseases b) Migration statistics c) Death rate from rural area d) Birth and death rates for the states and the country

89 INCIDENCE RATE IS DEFINED AS
a) Number of new cases occurring during a specified period in a given population b) Number of cases existing in a period c) Number of old cases present during a specified period in a given population d) Number of new cases found during a specified period

90 MORTALITY EXPERIENCE IS TAKEN INTO CONSIDERATION WHEN DEFINING
a) General fertility rate b) Total fertility rate c) Net reproduction rate d) Gross reproduction rate

91 DEMOGRAPHIC GAP MEANS
a) Difference in sex ratio b) Difference between age specific birth and death rates c) Difference between birth and death ratio d) Difference between child and human ratio

92 DENOMINATOR OF PERINATAL MORTALITY IS
a) Number of live births b) Number of deliveries (live and still births) c) Live birth still births d) Number of still births

93 NUMERATOR FOR NEONATAL MORTALITY IS
a) Number of deaths up to 28 days of life b) Number of deaths between 28 days and one year c) Number of deaths less than or equal to 7 days of life d) All infants under 1 year

94 TOTAL FERTILITY RATE (T.F.R.) REFERS TO
a) Number of births per 1000 women b) Number of women between 15-49 years c) Number of female children per woman d) Approximate completed family size

95 THE DENOMINATOR TO CALCULATE LITERACY RATE IS
a) Population above 14 years b) Total population c) Population above 7 years of age d) Population above 10 years of age

96 FERTILITY RATE CAN BE REDUCED BY
a) Family literacy b) Compulsory sterilization c) Early marriages d) Spacing of pregnancies

97 INFANT DEATHS ARE TAKEN ONLY BELOW
a) 1 month b) 7 days c) 1 year d) 9 months

98 IF TFR IN A POPULATION IS 5 PER WOMAN, THE GRR IS
a) 2.5 b) 5 c) 10 d) None of the above

99 PREVALENCE IS A
a) Ratio b) Rate c) Proportion d) Mode of disease

100 NET REPRODUCTION RATE (NRR=1) IMPLIES A COUPLE PROTECTION RATE (%) OF
a) 50 b) 60 c) 70 d) 80

LONG AND SHORT ANSWER QUESTIONS

1. DEFINE FOLLOWING TERMS USED IN STATISTICS. i) Variables ii) Data iii) Statistics iv) Discrete series v) Continuous series vi) Population vii) Sample viii) Sample size

Variables: Attributes or qualities which exhibit differences in magnitude and which vary along some dimension.
Data: Figures, ratings, checklists and other information collected in experiments, surveys and descriptive studies.
Statistics: A body of mathematical techniques or processes for gathering, analyzing and interpreting numerical data.
Discrete series: A series which exhibits real gaps between two consecutive numbers.
Continuous series: The scores lie along a continuum and are capable of any degree of subdivision. (Measurements in CGS or MKS units are continuous series.)
Population: Any group of individuals that have one or more characteristics in common that are of interest to the researcher.
Sample: A small proportion of a population selected for observation and analysis.
Sample size: There is no fixed number. The ideal sample is large enough to serve as an adequate representation of the population about which the researcher wishes to generalize, and small enough to be selected economically in terms of subject availability, expense in both time and money, and complexity of data analysis.

2. DESCRIBE TABULATION OF DATA IN NOMINAL, ORDINAL, INTERVAL & RATIO SCALES AND FREQUENCY DISTRIBUTION.

Nominal scale: Least precise method of quantification. Qualitative data is classified in terms of frequency in different groups. Order of the groups is not important.
Ordinal scale: Qualitative data is classified in terms of frequency, but the groups or classes differ in amount or degree. Order of groups or classes is important and observed.
Interval scale: Quantitative continuous series data. Length of class interval is equal for the particular tabulation. Length of class interval depends on size of sample, range and precision required. Usually the number of classes is from 3 to 21, but it is not a rule. There is no true zero. (Lower limit - 0.5 and upper limit + 0.5.)
Ratio scale: There is a true zero. Remaining attributes are the same as the interval scale.
Frequency distribution: In any of the above scales, each individual can be a member of only one class, group or set, and all the members of the same class have the same defined characteristics.

3. ENLIST SAMPLING TECHNIQUES.

There are two main sampling techniques: probability and non-probability sampling. They are further divided into sub-techniques as follows:

A) Probability sampling:
a. Simple random sampling
b. Systematic random sampling
c. Stratified random sampling
d. Multistage random sampling
e. Multi-phase random sampling
(Generalization is possible because each individual has an equal chance to get selected.)

B) Non-probability sampling:
a. Incidental sampling
b. Purposive sampling
c. Quota sampling
d. Sampling by snow-ball technique

4. WHAT ARE THE TYPES OF STATISTICS?

Statistics is a science of figures. Bio-statistics is the term used when tools of statistics are applied to data derived from biological sciences such as medicine.

There are two main types of statistics.
1) Descriptive statistics.
2) Inferential statistics.

Descriptive statistics:
1) Descriptive statistical techniques reduce data to manageable proportions by summarizing them.
2) They describe various characteristics of the data under study.

Descriptive techniques include:
1) Central tendency e.g. mean, median, mode.
2) Measures of variability e.g. SD, percentage, range.
3) Correlation techniques e.g. scatter plots.
4) Measurement scales: nominal, interval, rank, ordinal.

Inferential statistics: It combines mathematical processes and logic that allow researchers to test hypotheses about a population using data obtained from probability samples.

Purposes:
1) To estimate the probability that statistics found in the sample accurately reflect the population parameters.
2) To test hypotheses about a population.
3) To make objective decisions about the outcome of a study.
4) Inferential statistics use the null hypothesis for testing the validity of a scientific hypothesis in sample data.

5. FREQUENCY DISTRIBUTION.

A frequency distribution is a systematic way to list a series of observations of a variable.

Steps: This is done by
1) Listing the categories of the scale.
2) Tabulating the frequency of each occurrence.
3) Counting the number of times each event occurs, or grouping the data.
4) Reporting the frequency of each group.
5) Presenting the data in tabular or graphic form.

Use: Allows calculation and observation of
1) Skewness
2) Symmetry
3) Modality
4) Kurtosis
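The listing and tallying steps can be illustrated with a short Python sketch. The severity ratings below are hypothetical data invented for illustration, and the function name `frequency_distribution` is an assumption of this sketch, not something defined in the book.

```python
from collections import Counter


def frequency_distribution(observations, categories):
    """Tally how often each category of the scale occurs."""
    counts = Counter(observations)
    # List every category of the scale, even one with zero frequency.
    return [(category, counts.get(category, 0)) for category in categories]


# Hypothetical severity ratings for 10 patients (illustrative only).
ratings = ["mild", "severe", "mild", "moderate", "mild",
           "moderate", "severe", "mild", "moderate", "mild"]

table = frequency_distribution(ratings, ["mild", "moderate", "severe"])
for category, frequency in table:
    print(f"{category:<10}{frequency}")
```

The frequencies always add up to the number of observations, which is a quick check that each individual was counted in exactly one class.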

6. SCOPE OF STATISTICS.

Statistics is a science of figures. Bio-statistics is the term used when tools of statistics are applied to data derived from biological sciences such as medicine.

Scope of statistics:
1) In physiology and anatomy.
   To find limits of normality.
   To find difference between means and proportions.
   To find correlation.
2) In pharmacology.
   To find action of a drug.
   To find relative potency of a new drug.
   To compare the actions of two drugs.
3) In medicine.
   To compare effect of two treatments.
   To identify signs and symptoms.
   To find association between two attributes.
4) In public health.
   To test usefulness of sera or vaccine.
   For epidemiological studies.
It is also useful in zoology, botany, agriculture and other biological sciences.

7. WHAT IS DATA? WHAT ARE TYPES OF DATA AND THEIR SOURCES?

Data are figures, ratings, checklists and other information collected in experiments, surveys, descriptive studies and records. There are two types of data.
1) Qualitative (or discrete) data.
2) Quantitative (or continuous) data.

Qualitative data
They are classified by counting individuals having the same attribute and not by measurement of magnitude or size of the attribute. Qualitative data is discrete in nature i.e. it exhibits real gaps between two consecutive numbers. The results obtained are expressed as ratio, proportion, percentage or rate. The statistical methods commonly employed are standard error of proportion and chi-square tests.

Quantitative data
Quantitative data have a magnitude. The characteristic is measured either on an interval or on a ratio scale. Quantitative data is continuous in nature i.e. the scores are capable of any degree of subdivision, as in the CGS and MKS systems. Some of the statistical methods employed in analysis are mean, range, SD, coefficient of variation and correlation coefficient.

Main sources of data are
1) Experiments.
2) Surveys.
3) Records.

8. CHARACTERISTICS OF GOOD DATA PRESENTATION.

1) Data become concise without losing the details.
2) It arouses interest in the reader.
3) Data become simple and meaningful to form impressions.
4) It needs few words to explain.
5) It defines the problem and suggests the solution too.
6) It becomes helpful in further analysis.
7) It is fully labeled, simple and honest.

9. WHAT ARE METHODS OF DATA PRESENTATION? WHAT ARE THE SCALES USED FOR TABULATION OF DATA? DESCRIBE THEM.

There are two main methods of presenting data.
1) Tabulation.
2) Drawing.

Drawings for quantitative data:
Histogram
Frequency curve
Line chart or graph
Cumulative frequency diagram
Scatter or dot diagram

Drawings for qualitative data:
Bar diagram
Pie or sector diagram
Pictogram or picture diagram
Map diagram or spot diagram

Scales used in tabulation of data:
1) Nominal scale
2) Ordinal scale
3) Interval scale
4) Ratio scale

Nominal scale: Least precise method of quantification. Qualitative data is classified in terms of frequency in different groups. Order of groups is not important.
Ordinal scale: Qualitative data is classified in terms of frequency, but the groups or classes differ in amount or degree. Order of the groups or classes is important and observed.
Interval scale: Quantitative continuous series data. Length of the class interval is equal for the particular tabulation. Length of the class interval depends on size of the sample, range and the precision required. Usually the number of classes is from 3 to 21, but it is not a rule. There is no true zero (0). (Lower limit - 0.5 and upper limit + 0.5.)
Ratio scale: There is a true zero (0). Remaining attributes are the same as the interval scale.

10. WHAT ARE THE PURPOSES OF TABULATION? ENLIST THE COMPONENTS OF A TABLE. WRITE PRINCIPLES OF TABLE CONSTRUCTION.

Tabulation is the process of transcribing data; it is used to summarize and arrange data in a compact form for analysis.

Purposes of a table
1) Conserve space, by presenting the data in such a way that the narrative may be reduced.
2) Aid in the visualization of relations among the data and facilitate the process of data comparison.
3) Help forward the process of summation and detect errors and omissions in the categories.
4) Make the tabulated data easy to remember.
5) Put together statistical tables as a basis for computation.

Components of a table
A) Heading
   I) Table number
   II) Title of the table
   III) Designation of units
B) Body
   I) Sub head: heading of all rows / blocks of sub-items.
   II) Body head: heading of all columns, or main captions and their sub-captions.
   III) Field / body: the cells in rows / columns.
C) Notation
   I) Foot note: wherever applicable.
   II) Source: wherever applicable.

Principles of construction of tables:
A) Every table should have a title.
B) A number to facilitate easy reference should identify every table.
C) The captions (or column headings) should be clear and brief.
D) The units of measurement under each heading must always be indicated.
E) Any explanatory footnotes concerning the table itself are placed directly beneath the table, and reference symbols such as the asterisk (*), the dagger (†) and the like may be used.
F) If the data in a series of tables have been obtained from different sources, indicate the specific source in a place just below the table.
G) Columns may be numbered to facilitate reference.
H) All column figures should be properly aligned (e.g. decimal points, + and - signs).
I) Columns and rows that are to be compared with one another should be brought close together.
J) Totals of rows should be placed in the extreme right column and totals of columns at the bottom.
K) The arrangement of categories in a table may be chronological, geographical, alphabetical or according to magnitude in ascending or descending order.
L) Miscellaneous and exceptional items are generally placed in the last row of the table.
M) The table's length is more than its width.
N) Abbreviations and ditto marks should be avoided.
O) The table should be made as logical, clear, accurate and simple as possible.

11. WHAT ARE THE TYPES OF DIAGRAMS USED FOR QUALITATIVE DATA AND QUANTITATIVE DATA PRESENTATION? ENLIST THE GENERAL RULES FOR GRAPHIC REPRESENTATION.

Diagrams for quantitative data presentation
1) Histogram.
2) Frequency polygon.
3) Frequency curve.
4) Line chart or graph.
5) Cumulative frequency diagram.
6) Scatter or dot diagram.

Diagrams for qualitative data presentation
1) Bar diagram / graph.
2) Pie or sector diagram / graph.
3) Pictogram or picture diagram.
4) Map diagram or spot diagram.

General rules in graphic representation
A) The chart should have a title placed directly above the chart.
B) The title should be clear, concise and simple and should describe the nature of the data presented.
C) Numerical data upon which the chart is based should be presented in an accompanying table.
D) The horizontal line measures time or the independent variable, and the vertical line the measured variable.
E) Measurements proceed from left to right on the horizontal line and from bottom to top on the vertical line.
F) Each curve or bar on the chart should be labelled.
G) If there is more than one curve or bar, they should be clearly differentiated from one another by distinct patterns or colours.
H) The zero point should always be represented and the scale intervals should be equal.
I) Graphic forms should be used sparingly.

12. CALCULATION OF MEAN.

1) Ungrouped series
a. Small scores: X̄ = ΣX / n
   where ΣX = sum of all scores, n = number of scores, X̄ = mean score.
b. Large scores: X̄ = w + Σ(X - w) / n (w = working zero)

2) Grouped series
a. No range or interval: X̄ = Σfx / n
   where Σfx = sum of (frequency × score) and n = total of all frequencies.
b. Interval, small group: X̄ = Σf·xg / n (xg = midpoint of each group)
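The mean formulas above can be checked with a minimal Python sketch. The function names and the sample scores are hypothetical, chosen only for illustration; the working-zero version assumes that w is added back after averaging the deviations, and the grouped version assumes n is the total of the frequencies.

```python
def mean_ungrouped(scores):
    """Mean = sum of all scores / number of scores."""
    return sum(scores) / len(scores)


def mean_working_zero(scores, w):
    """Average the deviations from a working zero w, then add w back."""
    return w + sum(x - w for x in scores) / len(scores)


def mean_grouped(values, frequencies):
    """Mean = sum of (frequency x value) / total frequency.

    For class intervals, pass the midpoint of each group as the value.
    """
    n = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / n


# Hypothetical scores (illustrative only).
scores = [102, 105, 98, 101, 104]
print(mean_ungrouped(scores))            # sum 510 over 5 scores
print(mean_working_zero(scores, 100))    # deviations 2, 5, -2, 1, 4

# Hypothetical class midpoints 5, 15, 25 with frequencies 2, 5, 3.
print(mean_grouped([5, 15, 25], [2, 5, 3]))
```

The working zero only shortens the arithmetic; it gives the same mean as the direct method.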

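13. CALCULATING MEDIAN AND MODE.

Median
After arranging the observations in ascending or descending order, the median is the observation at position:
1) Ungrouped series: (n + 1) / 2
2) Grouped series: n / 2

Mode
The most frequently occurring observation in the series.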
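A short Python sketch for an ungrouped series follows. The function names and data are hypothetical; for an even number of observations the sketch assumes the usual convention of averaging the two middle values, and it returns every value tied for the highest frequency as a mode.

```python
from collections import Counter


def median_ungrouped(scores):
    """Middle observation of the ordered series, position (n + 1) / 2."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Even n: position (n + 1) / 2 falls between two observations,
    # so take the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(scores):
    """All values that occur with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical observations (illustrative only).
observations = [7, 3, 5, 3, 9]
print(median_ungrouped(observations))  # ordered: 3, 3, 5, 7, 9
print(modes(observations))
```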

14. WRITE APPLICATIONS AND USES OF PERCENTILE IN BIOSTATISTICS.

1) Location of a percentile that divides a frequency distribution into two parts.
2) Preparation of a standard percentile such as a quartile (Q) or the median (Q2) for particular ages, genders etc.
3) Comparison of one percentile value of a variable of one sample with that of another sample, drawn from the same population or a different population.
4) To study growth in children. (Growth chart to detect malnutrition level.)
5) As a measure of dispersion: inter-quartile and semi-inter-quartile ranges are used as measures of dispersion.

15. WHAT ARE VARIABLES AND THEIR TYPES?

Variables are the attributes or qualities which exhibit differences in magnitude and which vary along some dimension. There are three main types of variability.
1) Biological variability.
2) Real variability.
3) Experimental variability.

1) Biological variability: Individuals in similar environments differ when compared as regards sex, class and other attributes, but the difference noted may be small and is said to occur by chance. Such differences are called biological variability. Biological variability can be classified as
a. Individual variability e.g. height, weight.
b. Periodical variability e.g. temperature, pulse, B.P.
c. Class, group or category variability e.g. height and weight vary with age, sex, caste, nature of work.
d. Sampling variability (i.e. sampling error or statistical error).

2) Real variability: When the difference between two readings, observations or values of classes or samples is more than the defined limits in the universe, it is said to be real, as the cause lies in external factors e.g. cure rate due to a drug.

3) Experimental variability: Error or variation may be due to materials, methods or procedures employed in the study, or defects in the techniques of the experiment. They are of 3 types:
a. Observer error: subjective or objective.
b. Instrumental error e.g. defect in weighing machine.
c. Sampling error e.g. biased or too small a sample.

16. WHAT ARE THE MEASURES OF VARIABILITY?

Measures of variability are measures of dispersion or scatter about the measures of central tendency. They are of two types.

1) Measures of variability of individual observations.
i) Range.
ii) Inter-quartile range.
iii) Mean deviation.
iv) Standard deviation.
v) Coefficient of variation.

2) Measures of variability of samples.
i) Standard error of mean.
ii) Standard error of difference between two means.
iii) Standard error of proportion.
iv) Standard error of difference between two proportions.
v) Standard error of correlation coefficient.
vi) Standard error of regression coefficient.

17. WRITE THE FORMULAS TO CALCULATE i) Mean deviation ii) Variance iii) S.D. iv) Coefficient of variation v) Semi-quartile range.

i) Mean deviation = Σ|X - X̄| / n
ii) Variance = Σ(X - X̄)² / (n - 1)
iii) Standard deviation = √[Σ(X - X̄)² / (n - 1)]
iv) Coefficient of variation = (S.D. × 100) / Mean
v) Semi-quartile range = (Q3 - Q1) / 2
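These five formulas can be sketched in Python as below. The function names and the eight observations are hypothetical and used only to illustrate the arithmetic; the mean deviation is taken as the average of absolute deviations, and the variance uses the n - 1 denominator given above.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def mean_deviation(xs):
    """Average absolute deviation from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))


def coefficient_of_variation(xs):
    """S.D. expressed as a percentage of the mean."""
    return standard_deviation(xs) * 100 / mean(xs)


def semi_quartile_range(q1, q3):
    """Half the distance between the first and third quartiles."""
    return (q3 - q1) / 2


# Hypothetical observations (illustrative only); their mean is 5.
xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_deviation(xs))                 # 12 / 8
print(round(variance(xs), 3))             # 32 / 7
print(round(standard_deviation(xs), 3))
print(round(coefficient_of_variation(xs), 1))
```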

18. WRITE THE ARITHMETICAL CALCULATION OF NORMAL DISTRIBUTION OR NORMAL PROBABILITY CURVE.

1) Mean ± 1 SD = 68.27% of observations
2) Mean ± 2 SD = 95.45% of observations
3) Mean ± 1.96 SD = 95% of observations
4) Mean ± 3 SD = 99.73% of observations
5) Mean ± 2.58 SD = 99% of observations

E.g. The average weight of a baby at birth is 3.05 kg with an SD of 0.39 kg. Calculate the normal limits of weight.

Assumption: normal limits will include 95% of observations.
Normal limits = 3.05 ± (1.96 × 0.39) = 3.05 ± 0.76 = 2.29 to 3.81 kg.
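The birth-weight example can be reproduced with a few lines of Python. The function name `normal_limits` is a hypothetical helper for this sketch; the default multiplier of 1.96 corresponds to limits that include 95% of observations.

```python
def normal_limits(mean, sd, multiplier=1.96):
    """Return (lower, upper) limits as mean -/+ multiplier x SD."""
    half_width = multiplier * sd
    return mean - half_width, mean + half_width


# Birth-weight example from the text: mean 3.05 kg, SD 0.39 kg.
lower, upper = normal_limits(3.05, 0.39)
print(f"{lower:.2f} to {upper:.2f} kg")
```

Passing `multiplier=2.58` instead would give the limits that include 99% of observations.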

19. HOW WILL YOU USE Z SCORE TO CALCULATE AREA UNDER PROBABILITY CURVE?

E.g. If the mean menstrual cycle (MC) is 28 days with SD 2, how frequently would you expect an MC of less than 22 days?

Z = (X - X̄) / S.D. = (22 - 28) / 2 = -6 / 2 = -3

The corresponding area as per the table is 0.0013, i.e. 0.13% of women will have a cycle of less than 22 days.
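The table lookup can be cross-checked in Python. This sketch computes the area under the standard normal curve below Z from the error function in the standard library rather than from a printed table; the function names are hypothetical.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations by which x lies from the mean."""
    return (x - mean) / sd


def area_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Example from the text: mean 28 days, SD 2, cycle shorter than 22 days.
z = z_score(22, 28, 2)
p = area_below(z)
print(z, round(p, 4))
```

The computed area agrees with the tabulated value of 0.0013 to four decimal places.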

20. EXPLAIN RELATION BETWEEN PRECISION AND SAMPLE SIZE.

Precision = $\sqrt{n}$ / S.D.
Precision means accuracy or freedom from error.
Assume SD = 2 and n = 4: Precision = √4 / 2 = 2 / 2 = 1.
If SD stays the same, i.e. 2, and the sample size changes:
n = 16: Precision = √16 / 2 = 4 / 2 = 2.
n = 64: Precision = √64 / 2 = 8 / 2 = 4.
Thus we can say that as sample size increases, the precision or accuracy increases. Precision is proportional to the square root of the sample size.
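The worked figures (4 → 1, 16 → 2, 64 → 4 with SD = 2) follow from taking the square root of n before dividing by the SD. A minimal Python sketch under that assumption, with `precision` as a hypothetical helper name:

```python
import math

def precision(n, sd):
    """Precision taken as sqrt(n) / SD, as the worked figures imply."""
    return math.sqrt(n) / sd

# SD held at 2 while the sample size is quadrupled each time
values = [precision(n, 2) for n in (4, 16, 64)]
print(values)
```

Quadrupling the sample size only doubles the precision, which is why large gains in precision need much larger samples.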

21. EXPLAIN RELATIONSHIP BETWEEN SAMPLING ERROR AND SAMPLE SIZE?

In quantitative data:
Sampling error L = 2 S.D. / $\sqrt{n}$
Therefore, n = 4 (S.D.)² / L²
Calculate with SD held stable at 5:
n = (4 × 25) / 1² = 100 where L = 1
n = (4 × 25) / 2² = 25 where L = 2
n = (4 × 25) / 0.5² = 400 where L = 0.5
Thus we can say that to decrease the sampling error, the sample size has to be increased. Sampling error is inversely proportional to the square root of the sample size.
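The formula n = 4 (S.D.)² / L² can be tried for several allowable errors with a short Python sketch. The function name `sample_size` is an assumption made for this illustration. Note that L is squared in the denominator, so with SD = 5 the sizes for L = 1, 2 and 0.5 come to 100, 25 and 400.

```python
def sample_size(sd, allowable_error):
    """Sample size for quantitative data: n = 4 * SD^2 / L^2."""
    return 4 * sd ** 2 / allowable_error ** 2

# SD held at 5; allowable error L varied
sizes = {L: sample_size(5, L) for L in (1, 2, 0.5)}
for L, n in sizes.items():
    print(f"L = {L}: n = {n:.0f}")
```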

In qualitative data:
n = 4pq / L² (where p = positive character; q = 1 − p, or 100 − p when p is given as a percentage; L is allowable error, ---% of p)

n = (4 × 5 × 95) / (0.5 × 0.5) = 1900 / 0.25 = 7600 (where p = 5% and L = 0.5)
n = (4 × 5 × 95) / (1 × 1) = 1900 / 1 = 1900 (where p = 5% and L = 1)
Thus we can say that to decrease the sampling error, the sample size has to be increased. Here the sample size is inversely proportional to the square of the allowable error.

22. WHAT ARE THE SAMPLING TECHNIQUES USED IN RESEARCH?

Sampling is the act of selecting samples. There are two types of sampling.
1) Probability sampling.
2) Non probability sampling.

Techniques used for probability sampling are:
1) Simple random sampling: lottery method, table of random numbers.
2) Systematic random sampling: every Kth house, where K = total population / sample size desired.
3) Stratified sampling: when the population is non-homogeneous, it is divided into homogeneous groups (e.g. by sex) called strata, and then a sample is drawn from each stratum.
4) Multistage random sampling: e.g. in a district survey, 10% of villages in talukas are selected and then every 10th home in each village is examined.
5) Multiphase random sampling: e.g. in a T.B. survey, 1st phase physical examination, 2nd phase Mantoux test +ve, 3rd phase X-ray +ve, etc.
6) Cluster sampling: e.g. block, village, workshop.

Techniques used for non probability sampling are:
1) Incidental sampling.
2) Purposive sampling.
3) Quota sampling.
4) Snowball technique.
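Systematic random sampling (every Kth unit, with K = total population / sample size desired) can be sketched in Python as below. This is an illustration under assumptions: the function name `systematic_sample`, the list of 100 house numbers and the fixed random seed are all made up for the example, and K is rounded down when the division is not exact.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick a random start within the first K units, then every Kth unit."""
    k = len(population) // sample_size   # sampling interval K
    start = rng.randrange(k)             # random starting point, 0 .. K-1
    return [population[start + i * k] for i in range(sample_size)]

houses = list(range(1, 101))             # hypothetical house numbers 1..100
rng = random.Random(42)                  # fixed seed so the run is repeatable
sample = systematic_sample(houses, 10, rng)
print(sample)
```

Only the starting house is chosen at random; every later house follows from the interval K.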


23. WHAT ARE THE METHODS OF DRAWING CONCLUSION OR KNOWING THE SIGNIFICANCE OF THE RESULTS OBTAINED?

There are two basic methods of knowing the significance of the results obtained.
1) The estimation of a population parameter from a sample statistic.
2) The testing of hypothesis about population parameters.

1) Estimation of population parameters.
In most statistical studies, the levels of significance are set at 5% (P = 0.05), 1% (P = 0.01) and 0.5% (P = 0.005). This means that that percentage of extreme values will occur by chance, e.g. 5% level of significance means extreme values will occur by chance only 5 times in 100 experiments, or one time in 20 experiments.

2) Testing of statistical hypothesis.
A test of significance such as the Z-test is performed to accept the null hypothesis (H0), or to reject H0 and accept the alternative hypothesis (H1). To make the minimum error in rejection or acceptance of H0, we divide the sampling distribution, or area under the normal curve, into two regions.
1) Zone of acceptance: within mean ± 1.96 SD (or mean ± 1.96 SE).
2) Zone of rejection: beyond mean ± 1.96 SE.

There are two types of error: type I and type II.
1) Type I: H0 is true but we reject it. Type II: H0 is false but we accept it.
2) Type I error is fixed in advance by the choice of the level of significance employed in the test.
3) Type I error is more serious in medical studies than type II error.
4) Type I error can be made as small as desired by changing the level of significance.

24. WHAT IS Z TEST?

The value of the ratio between the observed difference and its SE is called Z. Prerequisites to apply the Z-test of means are:
1) Samples must be randomly selected.
2) Data must be quantitative.
3) The variable is assumed to follow normal distribution in the population.
4) The sample size must be larger than 30.

Z test for means has two applications:
1) To test the significance of difference between a sample mean ($\bar{X}$) and a known value of the population mean ($\mu$):
Z = $(\bar{X} - \mu)$ / S.E.
2) To test the significance of difference between two sample means, or between an experimental sample mean and a control sample mean:
Z = (Observed difference between two sample means) / (SE of difference between two sample means)
To determine the significance of the Z value, the probability (P) value is found from the table. The larger the Z value, the smaller the P value.
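The first application (a sample mean against a known population mean) can be sketched in Python. The figures below are hypothetical, the helper names are assumptions, and the standard error is taken as SD / √n. The two-sided P value is obtained from `math.erfc` instead of a printed table.

```python
import math

def z_one_sample(sample_mean, population_mean, sd, n):
    """Z = (sample mean - population mean) / SE, with SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return (sample_mean - population_mean) / se

def two_sided_p(z):
    """Probability of a value at least as extreme as z in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: 36 babies with mean weight 3.20 kg,
# compared with a population mean of 3.05 kg and SD of 0.39 kg
z = z_one_sample(3.20, 3.05, 0.39, 36)
p = two_sided_p(z)
print(f"Z = {z:.2f}, P = {p:.3f}")
```

Since Z here lies beyond 1.96, the difference would be called significant at the 5% level.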

25. WHAT ARE THE AIMS OF STUDY OF STATISTICS?

Statistics is a science of figures. A study of statistics helps:
1) To determine the extent of variation.
2) To determine whether the difference observed from the central value is due to some factors other than natural, i.e. whether it is by chance or really due to some factors interfering with nature, e.g. birth rates of two countries / cities.

26. How are computers useful for data analysis?

Computer assisted qualitative data analysis software (CAQDAS) can help to remove some of the work of cutting and pasting pages of narrative material. These programs permit the entire data file to be entered onto the computer, each portion of an interview or observational record to be coded, and then the portions of text corresponding to specified codes to be retrieved and printed for analysis. The software can also be used to examine relationships between codes. Software cannot, however, do the coding, and it cannot tell the researchers how to analyse the data. Researchers must continue their role as analysts and critical thinkers.

The main types of software packages that are available to handle and manage qualitative data include:
1) Text retrievers
2) Code and retrieve
3) Theory building
4) Concept maps and diagrams
5) Data conversion / collection

Text retrievers: locate text and terms in databases and documents.
Code and retrieve packages: chunks of the text are marked and linked to a name given by the researcher to indicate that piece of text's content. The software can then collect together all the parts of the text labeled with the same code.
Theory building software (e.g. NVivo7): helps researchers find patterns in their data and explore hunches, and enables them to display and analyze relationships in the data.
Software for concept mapping and diagramming: concept maps are a means for organizing and representing knowledge, which includes concepts and the relationships between them. This software constructs more sophisticated diagrams.
Data conversion / collection: converts audio into text.

Such software is popular and frequently used because it frees up researchers' time and permits them to pay greater attention to more important conceptual issues.

27. WHERE ARE THE RATIOS, RATES AND TRENDS USED?

Ratio: It expresses a relationship in size between two random quantities. The numerator is not a component of the denominator. Broadly, a ratio is the result of dividing one quantity by another; it is expressed in the form x : y or x / y, e.g. sex ratio, doctor-population ratio, child-woman ratio, etc.
Rates: A rate comprises the following elements:
1) Numerator
2) Denominator
3) Time specification, usually a calendar year
4) Multiplier, a round figure selected according to convenience or convention to avoid fractions.
Trends: Trend means fashion or custom, e.g. the contraceptive methods currently preferred.

28. What is Z score? What is Z scaling?

Z = $(X - \bar{X}) / \sigma$
Where X = score, $\bar{X}$ = mean score, $\sigma$ = standard deviation.
Here the mean of the Z scores is always 0 (the reference point) and the S.D. is always unity, or 1.00.

Uses:
1) It enables us to use and combine scores originally expressed in different units.
2) It is used to calculate area under the normal probability curve.
Z scaling is done for more precise measures.
Z scaling = 500 + 100 Z

29. What is standard score and T score?

In Z scores, half the scores are negative and half are positive. In addition, these scores are often small decimal fractions, which are awkward in computation. For these reasons Z scores are usually converted into a new distribution with M and S.D. so selected as to make all scores positive and relatively easy to handle. Such scores are called standard scores.
T scores are normalized standard scores converted into a distribution with a mean of 50 and S.D. of 10. T scaling was devised by McCall and first used by him in the construction of a series of reading tests designed for use in the elementary grades.
T score = 50 + 10 Z

30. WHAT IS RELIABILITY? WHAT ARE THE METHODS TO ESTABLISH RELIABILITY OF THE TEST SCORES?

Reliability is the degree of consistency or dependability with which an instrument measures an attribute.

Methods of establishing reliability:
1) Test-retest method.
2) Two parallel tests.
3) Inter-rater or inter-observer reliability.
4) Split half method.

Test-retest method: Establishes reliability of stability if the scores are nearly the same every time we test.
Two parallel tests: Two tests are prepared with items of the same difficulty value and discriminating index. This method is rarely used, as such tests are difficult to construct.
Inter-rater or inter-observer reliability: The term applied to the comparison of raters or observers using the same instrument. This type of reliability is determined by the degree to which two or more independent raters or observers are in agreement.
Split half method: The test items are divided into two sets, e.g. odd-numbered and even-numbered items grouped together, and administered to two groups.

31. WHAT ARE THE CONDITIONS FOR PARAMETRIC TEST?

1) Normal distribution of population.
2) Random sampling.
3) Sample size > 30.
4) Interval scale.
5) Variance nearly equal.
6) Unbiased selection.

32. DESCRIBE $\chi^2$ TEST.

Chi-square test is a non-parametric test, not based on any assumption about the distribution of any variable. It is most commonly used when data are in frequencies, such as the number of responses in two or more categories.

Application of chi-square:
It was developed by Karl Pearson and has the following three common but very important applications in medical statistics, as a test of:
1) Proportion
2) Association
3) Goodness of fit

$\chi^2 = \sum (f_o - f_e)^2 / f_e$, where $f_o$ = frequency observed and $f_e$ = frequency expected.
$\chi^2$ values are calculated and tested for probability of occurrence, e.g. of the opinions on a policy issue against an equal distribution hypothesis.
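The chi-square statistic is a sum over categories of (observed − expected)² / expected. A minimal Python sketch follows; the function name `chi_square` and the opinion counts are hypothetical, with 60 respondents assumed to be spread equally over three opinions under the equal distribution hypothesis.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical opinions on a policy issue: agree, undecided, disagree
observed = [30, 20, 10]
expected = [20, 20, 20]        # equal distribution hypothesis
chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
print(f"chi-square = {chi2}, df = {degrees_of_freedom}")
```

The calculated value is then compared with the chi-square table value for the given degrees of freedom.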

33. WHAT IS SIGN MEDIAN TEST?

A non-parametric test for comparing two paired groups based on the relative ranking of values between the pairs.
Criteria for sign median test:
1) Small group (2 × 2 bivariate table).
2) Measure is quantitative.
3) May not be continuous.

Calculation: Steps of sign median test
1) Calculate common median = [(N1 + N2) / 2] + 1
2) For each score more than the median give a −ve sign, and for each score less than the median give a +ve sign.
3) Draw a 2 × 2 table.
4) Calculate $\chi^2 = N (|AD - BC| - N/2)^2 / [(A+B)(C+D)(A+C)(B+D)]$
5) Calculate degrees of freedom = (C − 1) × (R − 1) = 1.
6) See Table E and calculate the probability of occurrence.

34. WHAT IS MANN-WHITNEY TEST?

Mann-Whitney U test is a non-parametric statistic used to test the difference between two independent groups, based on ranked scores.
Criteria:
1) Small group.
2) Two groups.
3) Quantitative measures.
4) May not be continuous.

Steps of calculation of Mann-Whitney test:
1) Give common ranks to the scores of both groups together.
2) Calculate R1 (sum of ranks) of N1, and R2 of N2.
3) Calculate either U1 or U2:
$U_1 = N_1 N_2 + [N_1 (N_1 + 1) / 2] - R_1$
$U_2 = N_1 N_2 + [N_2 (N_2 + 1) / 2] - R_2$
4) Use U1 or U2 for further calculation:
$Z = [U - (N_1 N_2 / 2)] / \sqrt{N_1 N_2 (N_1 + N_2 + 1) / 12}$
5) The table for Z is used for significance of the value.
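The steps above can be sketched in Python. This is an illustration with hypothetical scores; the helper names `average_ranks` and `mann_whitney` are assumptions, tied scores share the average of their ranks, and the smaller of U1 and U2 is the value carried into Z.

```python
import math

def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney(group1, group2):
    """Return U1, U2 and Z using common ranks across both groups."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, z

# Hypothetical scores of two independent groups
u1, u2, z = mann_whitney([3, 5, 8, 10], [6, 7, 12, 14, 15])
print(f"U1 = {u1}, U2 = {u2}, Z = {z:.2f}")
```

A useful check is that U1 + U2 always equals N1 × N2.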

35. WHAT IS t TEST?

It is a parametric statistical test for analyzing the difference between two means.
Application of t test: It is applied to find the significance of difference between two means as:
1) Unpaired t-test.
2) Paired t-test.

Criteria for applying t test:
1) Random samples.
2) Quantitative data.
3) Variable normally distributed.
4) Sample size less than 30.
(Criteria differ from Z test in condition number 4.)

Use: To find out the significance of the difference between the means of two groups.
Calculation: t = Mean D / SE of Mean D, where D is the difference.
By using the table values for the degrees of freedom (df), the significance level is tested. Generalization for the population is possible.
t > table value, reject H0; t < table value, accept H0.
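For the paired case, t = mean difference / SE of the mean difference can be sketched in Python. The before and after scores are hypothetical and `paired_t` is an assumed helper name; the SE is taken as the SD of the differences divided by √n, with n − 1 degrees of freedom.

```python
import math

def paired_t(before, after):
    """Return t and degrees of freedom for paired observations."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # SD of the differences, using n - 1 in the denominator
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    se_mean_d = sd_d / math.sqrt(n)
    return mean_d / se_mean_d, n - 1

# Hypothetical scores of 5 subjects before and after an intervention
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 15, 18]
t, df = paired_t(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The calculated t is then compared with the table value for the same degrees of freedom.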

36. WHAT IS ANOVA?

ANOVA means analysis of variance. It is a statistical procedure for testing mean differences among three or more groups by comparing variability between groups to variability within groups.
The statistic computed in ANOVA is the F ratio. If F < table value, there is no significant difference between groups. When three or more groups are to be compared for the significance of difference between the means, ANOVA is used when there is only one independent variable.

37. WHAT IS MULTIVARIATE ANALYSIS OF VARIANCE?

It is a statistical procedure used to test the significance of differences between the means of two or more groups on two or more dependent variables considered simultaneously. It is used when three or more groups with more than one dependent variable are to be compared for the significance of difference. Manual calculation is very complicated and difficult; therefore a computer programme is used.

38. WHAT IS ANCOVA?

ANCOVA means analysis of covariance. It is a statistical procedure used to test mean differences among groups on a dependent variable, while controlling for one or more covariates. For more than two groups which are not equated, i.e. pre-test and post-test, taking one group as the reference group, the scores of the other groups are adjusted and then the ANOVA technique is used to find the significance of difference between means.

39. WHAT IS EXPERIMENTAL DESIGN? GIVE THREE CATEGORIES OF EXPERIMENTAL DESIGNS.

Research design is the overall plan for addressing a research question, including specifications for enhancing the study's integrity. Research design is the architectural backbone of the study. There are three categories of experimental design.
1) True experimental designs: all three criteria are fulfilled.
   a) Manipulation of variables.
   b) Control group for comparison.
   c) Randomization (for rigor of study).
2) Quasi experimental designs.
   a) Manipulation of variable.
   b) No strict randomization.
   c) No control group.
3) Pre-experimental designs.
   a) Manipulation of variables.
   b) No randomization.
   c) No control group.
   d) No control for extraneous variables.

40. WHAT IS RANDOMIZED BLOCK DESIGN?

Randomized block design is an experimental design involving two or more independent variables, with only some experimentally manipulated. The variable which the researcher cannot manipulate is the blocking or stratifying variable. In theory, the number of blocking and manipulated variables is unlimited, but practicality dictates a relatively small number of each. As a general rule of thumb, a minimum of 20 subjects per cell is recommended to achieve stability within cells. This means that a minimum of 80 subjects would be needed for a 2 × 2 design, e.g. effect of tactile versus auditory stimulation for male versus female infants. Gender is a blocking variable which the researcher cannot manipulate. Genetic factors, age, birth weight, height at birth and sex are examples of blocking variables.

41. WHAT IS LATIN SQUARE DESIGN?

It is an experimental design with the sample divided into groups, where the procedure or intervention differs for each group. The number of interventions and the number of groups are the same. Each intervention is rotated through each group. This type of within-subjects design has the advantage of ensuring the highest possible equivalence among subjects exposed to different conditions, as they are composed of the same people.

Order of interventions:
G I      A, B, C
G II     B, C, A
G III    C, A, B

Here the F ratio is calculated to compare the effect of the interventions within groups, e.g. 4 groups of students undergo four different tests in a crossover pattern.
Test I: Spell the word in the picture.
Test II: Correcting the spelling of fished.
Test III: Selecting the correct word spelling, e.g. gilr, geerl, girl.
Test IV: Word making from given letters, e.g. G, r, i, l: girl; S, f, i, h: fish.
By using these methods it is studied which is better.

Tests:
G I      G II     G III    G IV
II       I        IV       III
IV       II       III      I
I        III      II       IV
III      IV       I        II

42. WHAT IS THE MEANING AND NEED OF CORRELATION?

Correlation is an association or bond between variables, with variation in one variable systematically related to variation in another. There are 5 types of correlation depending on its extent and direction.
1) Perfect positive correlation: r = +1.
2) Perfect negative correlation: r = −1.
3) Absolutely no correlation: r = 0.
4) Moderately positive correlation: 0 < r < +1.
5) Moderately negative correlation: −1 < r < 0.

So we can say that the correlation between two variables lies between −1 and +1. Correlation is needed in establishing the relationship between two variables. It helps to understand what will happen to another variable if one is changed, e.g. a rise in temperature increases the pulse by 8 beats (perfect correlation).

43. WHAT IS RANK ORDER CORRELATION?

Rank order correlation is used when the sample size is small and quick results are required, or when it is not possible to give scores to certain qualities, e.g. honesty, social adjustment. Ranks are given in ascending order, i.e. the minimum score is given the 1st rank, in both variables. If 2 samples have the same score, their ranks are added and the average rank is given to both. The correlation is then calculated by using Spearman's rank difference formula. This is an easy way of finding a relationship, but as the actual scores are not considered it is a less reliable measure.

44. WHAT IS SCATTER DIAGRAM? WHAT IS ITS USE?

When the two sets of scores of each individual on two variables are presented in a bivariate table or graph, the presentation is known as a scatter diagram.
It is used as follows:
1) It gives a rough idea about the coefficient of correlation.
2) It is used to calculate the product moment coefficient of correlation, e.g. Pearson's product moment coefficient of correlation.

45. WHAT IS SIMPLE LINEAR REGRESSION ANALYSIS AND PREDICTION?

Simple linear regression is an analysis for predicting the value of a dependent variable from an independent variable (predictor), by drawing a straight line fit to the data that minimizes deviations from the line.
The basic linear regression equation is
$Y' = a + bX$
Where
Y' = predicted value of variable Y
a = intercept constant
b = regression coefficient
X = actual value of independent variable X
$b = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2$
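The regression coefficient and a prediction can be worked out with a short Python sketch. The data are hypothetical and the names `fit_line` and `predicted` are assumptions. The intercept is obtained as a = mean of Y − b × mean of X, which is the usual companion to the formula for b but is not stated in the text above.

```python
def fit_line(xs, ys):
    """Return intercept a and regression coefficient b for Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of cross-products of deviations / sum of squared X deviations
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x            # assumed formula for the intercept
    return a, b

# Hypothetical paired observations lying exactly on a line
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
predicted = a + b * 6                  # predicted Y when X = 6
print(f"a = {a}, b = {b}, predicted Y at X = 6: {predicted}")
```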

46. WHAT IS PRODUCT MOMENT CORRELATION?

A correlation coefficient designating the magnitude of relationship between two variables measured on at least an interval scale is called the product moment correlation, or Pearson's r. This coefficient is computed with variables measured on either an interval or a ratio scale.

47. WHAT IS PROBABILITY?

Probability may be defined as the relative frequency or probable chance of occurrence with which an event is expected to occur on an average, e.g. giving birth to a boy in the 1st pregnancy. Probability is usually expressed by the symbol P. It ranges from 0 to 1.
When P = 0 (zero) it means no chance of the event happening, e.g. survival after rabies is zero.
When P = 1 (one) it means 100% chance of the event happening, e.g. survival after sand-fly fever is 100%.
If the probability of happening is denoted by P, the probability of not happening is denoted by q.

P + q = 1
q = 1 − P
Where P is the number of events occurring per number of trials.

48. WHAT ARE THE PROPERTIES OF A NORMAL PROBABILITY CURVE?

1) Bell shaped.
2) Asymptotic (never touches the X axis).
3) Values of mean, mode and median coincide.
4) The perpendicular on the X axis from the mean is the axis of symmetry.
5) Skewness is zero (0).
6) Kurtosis is 0.263.
7) Area between −1σ and +1σ is 68.26%.
8) Area between −2σ and +2σ is nearly 95%.
9) Area between −3σ and +3σ is nearly 99.7%.
10) The % of cases in a normal distribution can be calculated by using Table A for area under the curve.

49. WHAT IS SAMPLING ERROR?

When a sample is selected by using probability sampling methods, the sample is representative of the population; therefore the sample mean is considered as the population mean. But in practice this is not exactly true. There are frame error, chance error and response error, which added together are known as sampling error.
1) Sampling error for the mean is calculated as $SE_m = \sigma / \sqrt{N}$.
2) If the sample size is large the error is less, and if it is small the error is more.

Sampling error is the tendency for statistics to fluctuate from one sample to another. There are two types of errors in statistical inference.
1) Type I
2) Type II

Type I error occurs when the researcher rejects a null hypothesis that is actually true. Type II error occurs when the researcher accepts a null hypothesis that is actually false. The researcher controls the risk of making a type I error by setting the alpha level as the level of significance. Unfortunately, reducing the risk of type I error by reducing the level of significance increases the risk of making a type II error.

50. WHAT IS AN OGIVE? HOW IS IT USED IN FINDING PERCENTILE AND PERCENTILE RANKS?

Ogive is a cumulative percentage frequency curve.
The cumulative frequencies are turned into percentages of N. Scores are on the X axis and cumulative percentages on the Y axis. The ogive is then drawn by joining the points plotted on the graph with a smooth line.
To locate a percentile:
1) Select the percentage on the Y axis.
2) Draw a horizontal line, parallel to the X axis, till it meets the curve, and drop a perpendicular from that point to the X axis, which gives the percentile.
To locate the percentile rank of a score:
1) Select the score on the X axis.
2) Draw a line parallel to the Y axis till it meets the curve.
3) Draw a perpendicular from that point to the Y axis to locate the percentile rank.

51. WHAT IS CUMULATIVE FREQUENCY GRAPH?

The scores of the distribution are added serially, i.e. cumulative frequencies are calculated, and a diagram is plotted by taking the cumulative frequencies on the Y axis and the scores on the X axis.

53. WHAT IS RANGE? WHAT INFORMATION DO YOU GET FROM RANGE?

The difference between the maximum score and the minimum score is the range.
Range = Max. score − Min. score.
Range gives us an idea about the group, whether homogeneous or heterogeneous, which helps in further planning. E.g. the normal range of hemoglobin is 12 to 14 gm% for males.
Ordinarily, observations falling within a particular range are considered normal and those falling outside the normal range are considered abnormal.

54. WHAT DO YOU MEAN BY PERCENTILE? HOW DOES IT HELP IN UNDERSTANDING THE DISTRIBUTION OF SAMPLE?

Percentile means a position below or above which lie a certain percent of scores, e.g. P90 means a score below which are 90% and above which are 10% of scores. P50 is the median. Percentiles help to understand skewness.

55. HOW IS AVERAGE DEVIATION CALCULATED?

A.D. = $\sum |X - \bar{X}| / N$
Where,
X = score
$\bar{X}$ = mean score
N = number of frequencies
$\sum$ = sum of.

56. WHAT IS QUARTILE DEVIATION? HOW IS IT CALCULATED?

Q is one half the scale distance between the 75th and 25th percentiles in a frequency distribution. The 75th percentile is Q3 and the 25th percentile is Q1.
Q = (Q3 − Q1) / 2

57. WHAT IS STANDARD DEVIATION? WHY IS IT MORE RELIABLE? WHERE IS IT USED?

Standard deviation is the most frequently used statistic for measuring the degree of variability in a set of scores.
SD = $\sqrt{\sum x^2 / N}$, where x = deviation of a score from the mean.
It is the most stable measure of variability, as the deviations for all scores are taken from the mean, and to avoid the −ve sign the difference is squared and then used. S.D. is a useful variability index for describing a distribution and interpreting individual scores.

58. WHY IS MEAN THE RELIABLE MEASURE OF CENTRAL TENDENCY? WHAT IS THE ASSUMPTION WHILE CALCULATING MEAN FROM A FREQUENCY DISTRIBUTION? WHEN DOES MEAN BECOME LESS RELIABLE?

Mean implies the arithmetic average or arithmetic mean, which is obtained by summing up all the observations and dividing the total by the number of observations.
$\bar{X} = \sum fX / \sum f = \sum X / n$
Mean is the reliable measure of central tendency because:
1) It is based on all observations.
2) It is simple to understand and easy to calculate.
3) It is capable of further mathematical analysis.
4) It is least affected by sampling fluctuation.

Assumption: while calculating the mean from a frequency distribution, the mid point of the class interval is taken as the representative score for the frequencies in the class interval.
Mean becomes less reliable because:
1) It is affected by extreme observations.
2) It is only applicable for quantitative data.
3) If a class interval is an open class interval, then it is not possible to calculate the mean.
4) It cannot be determined graphically.

59. UNDER WHAT CONDITION DOES MEDIAN BECOME A PROPER MEASURE OF CENTRAL TENDENCY? WHAT IS THE ASSUMPTION IN CALCULATING MEDIAN?

Median is the value of the middle score. Median becomes a proper measure of central tendency when:
1) There are extreme scores, or
2) The length of the class intervals is not equal, or
3) End class intervals are not well defined, or
4) Data is ordinal qualitative data.
The assumption in calculating the median is that the scores (frequencies) are distributed equally within the class interval.

Merits of median:
1) Median can be applied even to ordinal qualitative data.
2) It is simple to understand and easy to calculate.
3) Median is not affected by extreme observations.
4) Median can be calculated for a distribution with open end class intervals.
5) It can be determined graphically.

Demerits of median:
1) It is not based on all observations.
2) It is not capable of further mathematical analysis.
3) It is not as rigidly defined as the arithmetic mean.
4) It is affected by sampling fluctuation and is less stable than the mean.
5) It is not a better average when the number of observations is small.

60. WHAT IS MODE? HOW TO FIND THE MODE FROM UNGROUPED DATA AND FROM FREQUENCY DISTRIBUTION TABLE?

Mode shows the tendency of the sample. The value with the maximum frequency in ungrouped data is the mode. For a frequency distribution, the midpoint of the class interval with the highest frequency is the value of the mode.
Merits:
1) Simple to understand and easy to calculate.
2) Used for both quantitative and qualitative data.
3) It is not affected by extreme values.
4) Mode can be calculated for a frequency distribution.
5) It can be determined graphically.
Demerits:
1) It is based on only frequency.
2) It is not rigidly defined.
3) It is affected by sampling fluctuation.
4) It is not capable of further mathematical analysis.
5) When data contain more than one mode, such modes are difficult to interpret or compare.

61. DIFFERENCE BETWEEN CORRELATION AND REGRESSION.

Definition
  Correlation: The relationship or association between two quantitatively measured or continuous variables is called correlation.
  Regression: The change in the measurement of a variable character, on the positive or negative side, beyond the mean is called regression.

Coefficient
  Correlation: The extent or degree of relationship between two sets of figures is measured in terms of another parameter called the correlation coefficient.
  Regression: A measure of the change in one dependent character with one unit change in the independent character is calculated in terms of another parameter called the regression coefficient.

Denoted
  Correlation: It is denoted by the letter r.
  Regression: It is denoted by the letter b.

Analysis
  Correlation: Correlation gives the degree and direction of relationship between the two variables.
  Regression: Regression analysis enables us to predict the value of one variable on the basis of another variable.

Cause-effect relationship
  Correlation: The cause-effect relationship between two variables is understood grossly.
  Regression: The cause-effect relationship between two variables is understood precisely.

62. EXPLAIN RELATIONSHIP BETWEEN REGRESSION AND CORRELATION WITH HELP OF GRAPH.

Steepness of the lines indicates the extent of correlation: the greater the steepness of the regression line of X on Y or Y on X, the greater the correlation.

In perfect correlation the two regression lines coincide (r = +1 or r = −1).

[Graphs: regression lines for r = +1 and r = −1]

When correlation is partial, the two regression lines separate and diverge, forming an acute angle at the meeting point of the perpendiculars drawn from the means of the two variables.

When correlation is 0 or nil, i.e. the variables are independent, the two lines intersect at a right angle.

63. WHAT ARE THE POINTS TO BE CONSIDERED WHILE SELECTING STATISTICAL TEST?

Following is the check list for choice of a statistical test:
1. Aims of the study
2. Variables to be studied
3. Data type (Numerical, Nominal, Ordinal, Binary)
4. Analysis type (Comparison, Association, Regression)
5. Number of groups: 2 or more
6. Distribution: Normal or non-normal
7. Design: Paired or unpaired data
8. Use the following table to choose the statistical test.

Type of data: Interval
  Unpaired or between group: t (unpaired), ANOVA (one way)
  Paired or within group: t (paired), ANOVA repeated measures

Type of data: Categoric (nominal, ordinal) or non-Gaussian data
  Unpaired or between group: Chi-square, Fisher's exact, Kruskal-Wallis, Mann-Whitney
  Paired or within group: McNemar, Cochran's Q, Friedman, Wilcoxon's MPSR


STATISTICAL EXERCISE
(With solutions)

1) Incubation periods of 7 polio cases are: 8, 6, 5, 8, 6, 4, 5; calculate the mean.
Mean = $\sum X / n$ = 42 / 7 = 6

2) Calculate the mean of the following data: 750, 710, 680, 660, 620, 580, 540.

X        X − 500
750      250
710      210
680      180
660      160
620      120
580      080
540      040
ΣX = 4540    Σ(X − 500) = 1040

Mean = [Σ(X − 500) / n] + 500 = (1040 / 7) + 500 = 148.6 + 500 = 648.6

3) Average ESR value of 10 patients is 60/hour. Average ESR value of 60 patients is 50/hour. Average ESR value of 30 patients is 40/hour. Calculate the average ESR value of all patients (i.e. 100 patients).

N        ESR value    Weight of each group (N × ESR value)
10       60           600
60       50           3000
30       40           1200
100                   4800

Weighted mean = 4800 / 100 = 48 / hour
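The weighted mean can be checked with a few lines of Python. This is a sketch only; the function name `weighted_mean` is an assumption made for this illustration.

```python
def weighted_mean(group_sizes, group_means):
    """Sum of (group size × group mean) divided by the total size."""
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)

# ESR example: 10, 60 and 30 patients with averages 60, 50 and 40 per hour
wm = weighted_mean([10, 60, 30], [60, 50, 40])
print(f"Weighted mean ESR = {wm} per hour")
```

A plain average of 60, 50 and 40 would give 50, which ignores the fact that the groups differ in size.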

4) Calculate the mean of the grouped data of 53 students' weights as tabulated below.

Weight in kg    Frequency
65-69           6
60-64           5
55-59           10
50-54           8
45-49           9
40-44           8
35-39           7
Total           N = 53

Weight in kg    Frequency (f)    Mid-point of each group (Xg)    Weight contributed by each group (fXg)
65-69           6                67                              402
60-64           5                62                              310
55-59           10               57                              570
50-54           8                52                              416
45-49           9                47                              423
40-44           8                42                              336
35-39           7                37                              259
Total           N = 53                                           ΣfXg = 2716

Formula: Mean = ΣfXg / N = 2716 / 53 = 51.245

5) Calculate the median for the following data: 7, 8, 4, 5, 6, 3, 10, 8, 5.
Arranging in ascending order: 3, 4, 5, 5, 6, 7, 8, 8, 10.
There are 9 values in total; the middle one is the 5th value, that is 6.

6) Calculate the median for the following data: 7, 8, 4, 5, 6, 3, 10, 8, 5, 11.
Arranging in ascending order: 3, 4, 5, 5, 6, 7, 8, 8, 10, 11.
There are 10 values in total; the middle ones are the 5th and 6th values, that is 6 and 7.
Median = (6 + 7) / 2 = 6.5

7) Calculate the median for the following grouped data.

Weight in kg    Frequency
80-84           1
75-79           2
70-74           4
65-69           4
60-64           5
55-59           10
50-54           8
45-49           8
Total           42

Median position = 42 / 2 = 21st value, which lies in the class 55-59 (frequency = 10).

Weight in kg        Lower limit     Frequency    Cumulative frequency
(class interval)    of class (L)                 (from the lowest class)
80-84               79.5            1            42
75-79               74.5            2            41
70-74               69.5            4            39
65-69               64.5            4            35
60-64               59.5            5            31
55-59               54.5            10           26
50-54               49.5            8            16
45-49               44.5            8            8
Total                               42

Median = $L + [(N/2 - c) / f] \times i$
Where
i = class interval
L = lower limit of the class in which the median lies
f = frequency of the class having the median
c = cumulative frequency of the next lower class
Median = 54.5 + [(21 − 16) / 10] × 5 = 54.5 + 2.5 = 57
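The grouped median formula, lower limit + [(N/2 − c) / f] × i, can be sketched in Python. The function name `grouped_median` is an assumption made for this example, the classes are entered from lowest to highest with their true lower limits, and the lowest class is taken to start at 44.5 in line with the 5 kg class width.

```python
def grouped_median(classes, width):
    """classes: (true lower limit, frequency) pairs in ascending order."""
    total = sum(freq for _, freq in classes)
    half = total / 2
    cumulative = 0                     # cumulative frequency below the class
    for lower, freq in classes:
        if cumulative + freq >= half:  # median falls in this class
            return lower + (half - cumulative) / freq * width
        cumulative += freq

# Weight data from the exercise, lowest class first
classes = [(44.5, 8), (49.5, 8), (54.5, 10), (59.5, 5),
           (64.5, 4), (69.5, 4), (74.5, 2), (79.5, 1)]
median = grouped_median(classes, 5)
print(f"Median = {median} kg")
```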

8) Calculate the mode: 6, 7, 8, 10, 12, 3, 2, 1, 8, 8.
Arrange in ascending order: 1, 2, 3, 6, 7, 8, 8, 8, 10, 12.
The most frequently occurring value is 8. So the mode is 8.

9) Calculate the mode of the following grouped data.

Weight in kg    Frequency
65-69           6
60-64           5
55-59           10
50-54           8
45-49           9
40-44           8
35-39           7
Total           53

Mode = $L_1 + [(f_1 - f_0) / (2 f_1 - f_0 - f_2)] \times i$
Where
L1 = lower limit of the modal class = 55
f0 = frequency of the class before the modal class = 8
f1 = frequency of the modal class = 10
f2 = frequency of the class after the modal class = 5
i = class interval = 5
(Modal class means the class interval having the highest frequency.)
Mode = 55 + [(10 − 8) / (20 − 8 − 5)] × 5 = 55 + 1.43 = 56.43

10) Diastolic blood pressure of 10 patients is given. Calculate the range: 86, 76, 80, 70, 90, 96, 94, 84, 98, 72.
Arrange in ascending order: 70, 72, 76, 80, 84, 86, 90, 94, 96, 98. The range is from 70 to 98, or 98 - 70 = 28.

11) Calculate the mean deviation for the following data: 10, 12, 8, 10, 6, 4, 8, 10.
Mean = $\bar{X} = \frac{\sum X}{N} = \frac{68}{8} = 8.5$

Observation (X)   Mean   Deviation (X - mean)
10                8.5     1.5
12                8.5     3.5
8                 8.5    -0.5
10                8.5     1.5
6                 8.5    -2.5
4                 8.5    -4.5
8                 8.5    -0.5
10                8.5     1.5
N = 8                    $\sum |X - \bar{X}| = 16$ (signs ignored)

M.D. = $\frac{\sum |X - \bar{X}|}{N} = \frac{16}{8} = 2$
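The mean deviation calculation can be reproduced with a few lines of Python. This is only a sketch; the name `mean_deviation` is chosen for illustration, and the deviations are summed without regard to sign.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [10, 12, 8, 10, 6, 4, 8, 10]
print(sum(data) / len(data))   # mean: 8.5
print(mean_deviation(data))    # mean deviation: 2.0
```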

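12) The incubation periods of 10 measles cases are given below. Calculate the standard deviation: 6, 7, 5, 4, 3, 4, 5, 6, 7, 8.
Here $\sum X = 55$, $\sum X^2 = 325$ and N = 10.
S.D. = $\sqrt{\frac{\sum X^2 - (\sum X)^2 / N}{N - 1}} = \sqrt{\frac{325 - (55)^2 / 10}{10 - 1}} = \sqrt{\frac{325 - 302.5}{9}} = \sqrt{2.5} = 1.58$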
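A short Python sketch of the same computational formula is given below; `sample_sd` is an illustrative name, and the standard library function `statistics.stdev` is used only as an independent check.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation from the sums of X and X squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

incubation = [6, 7, 5, 4, 3, 4, 5, 6, 7, 8]
print(round(sample_sd(incubation), 2))          # 1.58
print(round(statistics.stdev(incubation), 2))   # 1.58, same value from the standard library
```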

13) Given the following data set (age of patients): 18, 59, 24, 42, 21, 23, 24, 32. Find the inter-quartile range.
1. Sort the data from lowest to highest.
2. Find the bottom and the top quarters of the data.
3. Find the difference (inter-quartile range) between the two quartiles.

Sorted data: 18, 21, 23, 24, 24, 32, 42, 59.
1st quartile = the {(n+1)/4}th observation = (2.25)th observation = 21 + (23 - 21) x 0.25 = 21.5
3rd quartile = the {3(n+1)/4}th observation = (6.75)th observation = 32 + (42 - 32) x 0.75 = 39.5
Hence, IQR = 39.5 - 21.5 = 18

The interquartile range is often preferable to the range because it is less prone to distortion by a single very large or very small value; that is, outliers have little effect on the interquartile range. It can also be computed when the distribution has open-ended classes.
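The quartile positions used in problem 13 follow the (n + 1) rule with linear interpolation between neighbouring observations. A minimal Python sketch of that rule is shown below; the helper name `quantile_n_plus_1` is an assumption for this example, and other software may use slightly different quartile conventions.

```python
def quantile_n_plus_1(values, fraction):
    """Value at position fraction * (n + 1), counted from 1, with interpolation."""
    ordered = sorted(values)
    position = fraction * (len(ordered) + 1)
    lower_index = int(position)          # 1-based index of the observation just below
    remainder = position - lower_index
    if lower_index < 1:
        return ordered[0]
    if lower_index >= len(ordered):
        return ordered[-1]
    below = ordered[lower_index - 1]
    above = ordered[lower_index]
    return below + (above - below) * remainder

ages = [18, 59, 24, 42, 21, 23, 24, 32]
q1 = quantile_n_plus_1(ages, 0.25)
q3 = quantile_n_plus_1(ages, 0.75)
print(q1, q3, q3 - q1)  # 21.5 39.5 18.0
```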

14) The areas of surfaces sprayable with DDT in a sample of 15 houses are as follows (m²): 101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145. Find the variance and standard deviation of this distribution.
The mean of the sample is 125 m².
Variance (sample) = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(101-125)^2 + (105-125)^2 + \dots + (145-125)^2}{15 - 1} = \frac{2502}{14} = 178.71$ (square meters)²
Hence, the standard deviation = $\sqrt{178.71} = 13.37$ m².
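The intermediate figures of problem 14 (mean, sum of squared deviations, variance and standard deviation) can be verified with the following Python sketch, which applies the sample formula with n - 1 in the denominator.

```python
import math

areas = [101, 105, 110, 114, 115, 124, 125, 125,
         130, 133, 135, 136, 137, 140, 145]
n = len(areas)
mean = sum(areas) / n
sum_sq = sum((x - mean) ** 2 for x in areas)   # sum of squared deviations
variance = sum_sq / (n - 1)                    # sample variance
sd = math.sqrt(variance)

print(mean, sum_sq, round(variance, 2), round(sd, 2))  # 125.0 2502.0 178.71 13.37
```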

15) A sample of 50 students is to be drawn from a population of 500 students belonging to two institutions, A and B. Institution A has 200 students and institution B has 300. How will you draw the sample using proportional allocation?
There are two strata in this case, with sizes N1 = 200 and N2 = 300, so the total population is N = N1 + N2 = 500. The sample size is n = 50. If n1 and n2 are the sample sizes from the two strata, then

$n_1 = \frac{n}{N} \times N_1 = \frac{50}{500} \times 200 = 20$
$n_2 = \frac{n}{N} \times N_2 = \frac{50}{500} \times 300 = 30$

The sample sizes are 20 from A and 30 from B. The units from each institution are then selected by simple random sampling.

16) The mid-year population of a town was 210000. The following events occurred during the same year.

Total live births       5000
Total deaths            2000
Total maternal deaths   22
Infant deaths           300

CBR = (5000 / 210000) x 1000 = 23.81 per 1000 population
CDR = (2000 / 210000) x 1000 = 9.52 per 1000 population
IMR = (300 / 5000) x 1000 = 60 per 1000 live births
MMR = (22 / 5000) x 1000 = 4.4 per 1000 live births

17) The census population of a village in 2001 was 6000, of whom 3400 were males and 2600 were females. The following events occurred in the same year.

Total live births                   220
Total deaths                        100
Deaths among males                  60
Deaths in age group under 5 years   30
Deaths in the month of December     16
Infant deaths                       25
Number of TB cases present          120
Deaths due to TB                    8

Calculate the relevant indices.

CBR = (220 / 6000) x 1000 = 36.67 per 1000 population
CDR = (100 / 6000) x 1000 = 16.67 per 1000 population
IMR = (25 / 220) x 1000 = 113.64 per 1000 live births
Specific death rate for males = (60 / 3400) x 1000 = 17.65 per 1000 males
Specific death rate for females = (40 / 2600) x 1000 = 15.38 per 1000 females
Specific death rate due to TB = (8 / 6000) x 1000 = 1.33 per 1000 population
Proportional mortality rate due to TB = (8 / 100) x 100 = 8 per 100 deaths
Case fatality rate = (8 / 120) x 100 = 6.67 per 100 cases
Under 5 years mortality rate = (30 / 220) x 1000 = 136.36 per 1000 live births
Under 5 proportional mortality rate = (30 / 100) x 100 = 30 per 100 deaths
Death rate for December = [(16 x 12) / 6000] x 1000 = 32 per 1000 population
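All the indices in problems 16 and 17 share one form: events divided by the population at risk, times a multiplier. The Python sketch below illustrates this with the figures of problem 16; the helper name `rate` is an assumption made for the example.

```python
def rate(events, population_at_risk, per=1000):
    """Number of events per `per` units of the population at risk."""
    return events / population_at_risk * per

# Figures from problem 16
mid_year_population = 210000
live_births = 5000
deaths = 2000
maternal_deaths = 22
infant_deaths = 300

cbr = rate(live_births, mid_year_population)   # per 1000 population
cdr = rate(deaths, mid_year_population)        # per 1000 population
imr = rate(infant_deaths, live_births)         # per 1000 live births
mmr = rate(maternal_deaths, live_births)       # per 1000 live births

print(round(cbr, 2), round(cdr, 2), round(imr, 2), round(mmr, 2))  # 23.81 9.52 60.0 4.4
```

For a percentage-type index such as the case fatality rate, the same helper can be called with `per=100`.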

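18) The population of India was 884 million as per the March 1991 census and 1027 million as per the March 2001 census. Calculate the mid-year population for 2006.
$P_n = P_{c2} + \frac{a}{n}(P_{c2} - P_{c1}) = 1027 + \frac{16/3}{10}(1027 - 884)$ million = 1103.27 million
(Here a = time from the last census to mid-2006 = 5 years 4 months = 16/3 years, and n = interval between the two censuses = 10 years.)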
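The estimate in problem 18 can be sketched in Python, assuming the arithmetic (straight-line) method: the average yearly increase between the two censuses is carried forward from the last census. The function and parameter names are illustrative only.

```python
def project_population(p_first, p_second, years_between, years_after):
    """Arithmetic projection from two census counts."""
    annual_increase = (p_second - p_first) / years_between
    return p_second + annual_increase * years_after

# March 2001 to mid-2006 is 5 years 4 months
estimate = project_population(884, 1027, 10, 5 + 4 / 12)
print(round(estimate, 1))  # 1103.3 (million)
```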

19) Calculate the attributable risk and relative risk for the following data.

              Cases     Non-cases   Total
Exposed       A = 900   B = 400     1300
Non-exposed   C = 100   D = 600     700
Total         1000      1000        2000

Attributable risk = $\frac{A}{A+B} - \frac{C}{C+D}$ = (900/1300) - (100/700) = 0.692 - 0.143 = 0.55

Relative risk = $\frac{A/(A+B)}{C/(C+D)}$ = (900/1300) / (100/700) = 0.692 / 0.143 ≈ 4.85
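The two risk measures of problem 19 can be recomputed from the unrounded proportions with the Python sketch below. The function name `risk_measures` is hypothetical, and attributable risk is taken here as the simple difference between the two incidence proportions, as in the worked answer.

```python
def risk_measures(a, b, c, d):
    """a, b = exposed cases / non-cases; c, d = non-exposed cases / non-cases."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    attributable_risk = risk_exposed - risk_unexposed
    relative_risk = risk_exposed / risk_unexposed
    return attributable_risk, relative_risk

ar, rr = risk_measures(900, 400, 100, 600)
print(round(ar, 2), round(rr, 2))  # 0.55 4.85
```

Small differences in the last decimal place arise if the proportions are rounded before dividing.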

20) Calculate the incidence of disease in the population, in the exposed and in the non-exposed for the following data.

Lung cancer   Present   Absent      Total
Smokers       A = 100   B = 99900   100000
Non-smokers   C = 8     D = 99992   100000
Total         108       199892      200000

Incidence of disease in population (IP) = 108 / 200000 = 0.54 per 1000 population
Incidence of disease in exposed (IE) = 100 / 100000 = 1 per 1000 population
Incidence of disease in non-exposed (IO) = 8 / 100000 = 0.08 per 1000 population

21) Calculate the sensitivity, specificity, positive predictive value, negative predictive value, percentage of false negatives and percentage of false positives for the following screening test data for AIDS (Western blot test).

Test result   Disease present   Disease absent   Total
Positive      A = 85            B = 20           105
Negative      C = 25            D = 80           105
Total         110               100              210

Sensitivity = (85 / 110) x 100 = 77.27%
Specificity = (80 / 100) x 100 = 80%
Positive predictive value = (85 / 105) x 100 = 81%
Negative predictive value = (80 / 105) x 100 = 76%
% of false negatives = (25 / 110) x 100 = 22.73%
% of false positives = (20 / 100) x 100 = 20%
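The screening test measures in problem 21 all come from the four cells of the 2 x 2 table. A minimal Python sketch follows; the function name `screening_metrics` and the dictionary keys are assumptions for this example. The false-positive cell is taken as 20, the value used in the false-positive calculation and the one consistent with the row total of 105.

```python
def screening_metrics(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c) * 100,
        "specificity": d / (b + d) * 100,
        "ppv": a / (a + b) * 100,
        "npv": d / (c + d) * 100,
        "false_negative_pct": c / (a + c) * 100,
        "false_positive_pct": b / (b + d) * 100,
    }

results = screening_metrics(85, 20, 25, 80)
for name, value in results.items():
    print(name, round(value, 2))
```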


Important Statistics Formulas


Parameters

Population mean = $\mu = (\sum X_i) / N$
Population standard deviation = $\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
Population variance = $\sigma^2 = \sum (X_i - \mu)^2 / N$
Variance of population proportion = $\sigma_P^2 = PQ / n$
Standardized score = $Z = (X - \mu) / \sigma$
Population correlation coefficient = $\rho = \frac{1}{N} \sum \left[ \frac{X_i - \mu_X}{\sigma_x} \cdot \frac{Y_i - \mu_Y}{\sigma_y} \right]$

Statistics
Unless otherwise noted, these formulas assume simple random sampling.

Sample mean = $\bar{x} = (\sum x_i) / n$
Sample standard deviation = $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
Sample variance = $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
Variance of sample proportion = $s_p^2 = pq / (n - 1)$
Pooled sample proportion = $p = (p_1 n_1 + p_2 n_2) / (n_1 + n_2)$
Pooled sample standard deviation = $s_p = \sqrt{[(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2] / (n_1 + n_2 - 2)}$
Sample correlation coefficient = $r = \frac{1}{n - 1} \sum \left[ \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} \right]$

Correlation

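Pearson product-moment correlation = $r = \sum (xy) / \sqrt{(\sum x^2)(\sum y^2)}$, where $x = x_i - \bar{x}$ and $y = y_i - \bar{y}$ are deviations from the means
Linear correlation (sample data) = $r = \frac{1}{n - 1} \sum \left[ \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} \right]$
Linear correlation (population data) = $\rho = \frac{1}{N} \sum \left[ \frac{X_i - \mu_X}{\sigma_x} \cdot \frac{Y_i - \mu_Y}{\sigma_y} \right]$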

Simple Linear Regression


Simple linear regression line: $\hat{y} = b_0 + b_1 x$
Regression coefficient = $b_1 = \sum [(x_i - \bar{x})(y_i - \bar{y})] / \sum (x_i - \bar{x})^2$
Regression slope intercept = $b_0 = \bar{y} - b_1 \bar{x}$
Regression coefficient = $b_1 = r \times (s_y / s_x)$
Standard error of regression slope = $s_{b_1} = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)} / \sqrt{\sum (x_i - \bar{x})^2}$
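To see how the correlation and regression formulas fit together, the Python sketch below computes r, the slope and the intercept for a small made-up data set. The x and y values are hypothetical and serve only to show that both expressions for the slope give the same result.

```python
import math

# Hypothetical paired observations, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)       # Pearson correlation from deviations
b1 = sxy / sxx                       # regression slope
b0 = mean_y - b1 * mean_x            # intercept
s_x = math.sqrt(sxx / (n - 1))       # sample standard deviations
s_y = math.sqrt(syy / (n - 1))
b1_from_r = r * s_y / s_x            # slope from r, should equal b1

print(round(r, 4), round(b1, 4), round(b0, 4), round(b1_from_r, 4))
```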

Counting

n factorial: $n! = n \times (n-1) \times (n-2) \times \dots \times 3 \times 2 \times 1$. By convention, $0! = 1$.
Permutations of n things, taken r at a time: $_nP_r = n! / (n - r)!$
Combinations of n things, taken r at a time: $_nC_r = n! / [r! (n - r)!] = {_nP_r} / r!$
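The counting formulas can be tried out with Python's standard `math` module (`math.perm` and `math.comb` are available from Python 3.8). The values n = 5 and r = 2 are arbitrary illustration values.

```python
import math

n, r = 5, 2
print(math.factorial(n))   # 5! = 120
print(math.factorial(0))   # 0! = 1 by convention
print(math.perm(n, r))     # 5P2 = 5! / 3! = 20
print(math.comb(n, r))     # 5C2 = 5! / (2! * 3!) = 10

# nCr equals nPr divided by r!
print(math.perm(n, r) // math.factorial(r) == math.comb(n, r))  # True
```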

Probability

Rule of addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Rule of multiplication: $P(A \cap B) = P(A) \, P(B|A)$
Rule of subtraction: $P(A') = 1 - P(A)$

Random Variables
In the following formulas, X and Y are random variables, and a and b are constants.

Expected value of X = $E(X) = \mu_x = \sum [x_i \, P(x_i)]$
Variance of X = $Var(X) = \sigma^2 = \sum [x_i - E(x)]^2 \, P(x_i) = \sum [x_i - \mu_x]^2 \, P(x_i)$
Normal random variable = z-score = $z = (X - \mu) / \sigma$
Chi-square statistic = $\chi^2 = [(n - 1) s^2] / \sigma^2$
f statistic = $f = [s_1^2 / \sigma_1^2] / [s_2^2 / \sigma_2^2]$
Expected value of sum of random variables = $E(X + Y) = E(X) + E(Y)$
Expected value of difference between random variables = $E(X - Y) = E(X) - E(Y)$
Variance of the sum of independent random variables = $Var(X + Y) = Var(X) + Var(Y)$
Variance of the difference between independent random variables = $Var(X - Y) = Var(X) + Var(Y)$

Sampling Distributions

Mean of sampling distribution of the mean = $\mu_{\bar{x}} = \mu$
Mean of sampling distribution of the proportion = $\mu_p = P$
Standard deviation of proportion = $\sigma_p = \sqrt{P(1 - P) / n} = \sqrt{PQ / n}$
Standard deviation of the mean = $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Standard deviation of difference of sample means = $\sigma_d = \sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}$
Standard deviation of difference of sample proportions = $\sigma_d = \sqrt{[P_1 (1 - P_1) / n_1] + [P_2 (1 - P_2) / n_2]}$

Standard Error

Standard error of proportion = $SE_p = s_p = \sqrt{p(1 - p) / n} = \sqrt{pq / n}$
Standard error of difference for proportions = $SE_p = s_p = \sqrt{p(1 - p)[(1/n_1) + (1/n_2)]}$
Standard error of the mean = $SE_{\bar{x}} = s_{\bar{x}} = s / \sqrt{n}$
Standard error of difference of sample means = $SE_d = s_d = \sqrt{(s_1^2 / n_1) + (s_2^2 / n_2)}$
Standard error of difference of paired sample means = $SE_d = s_d = \sqrt{\sum (d_i - \bar{d})^2 / (n - 1)} / \sqrt{n}$
Pooled sample standard error = $s_{pooled} = \sqrt{[(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2] / (n_1 + n_2 - 2)}$
Standard error of difference of sample proportions = $s_d = \sqrt{[p_1 (1 - p_1) / n_1] + [p_2 (1 - p_2) / n_2]}$

Discrete Probability Distributions

Binomial formula: $P(X = x) = b(x; n, P) = {_nC_x} \, P^x (1 - P)^{n - x} = {_nC_x} \, P^x Q^{n - x}$
Mean of binomial distribution = $\mu_x = nP$
Variance of binomial distribution = $\sigma_x^2 = nP(1 - P)$
Negative binomial formula: $P(X = x) = b^*(x; r, P) = {_{x-1}C_{r-1}} \, P^r (1 - P)^{x - r}$
Mean of negative binomial distribution = $\mu_x = rQ / P$ (mean number of failures before the r-th success; the mean number of trials is $r / P$)
Variance of negative binomial distribution = $\sigma_x^2 = rQ / P^2$
Geometric formula: $P(X = x) = g(x; P) = P \, Q^{x - 1}$
Mean of geometric distribution = $\mu_x = Q / P$ (mean number of failures before the first success; the mean number of trials is $1 / P$)
Variance of geometric distribution = $\sigma_x^2 = Q / P^2$
Hypergeometric formula: $P(X = x) = h(x; N, n, k) = [{_kC_x}] [{_{N-k}C_{n-x}}] / [{_NC_n}]$
Mean of hypergeometric distribution = $\mu_x = nk / N$
Variance of hypergeometric distribution = $\sigma_x^2 = nk(N - k)(N - n) / [N^2 (N - 1)]$
Poisson formula: $P(x; \mu) = (e^{-\mu})(\mu^x) / x!$
Mean of Poisson distribution = $\mu_x = \mu$
Variance of Poisson distribution = $\sigma_x^2 = \mu$
Multinomial formula: $P = [n! / (n_1! \, n_2! \dots n_k!)] \times (p_1^{n_1} \, p_2^{n_2} \dots p_k^{n_k})$
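The binomial and Poisson formulas can be checked numerically. The Python sketch below uses hypothetical parameter values (n = 10, P = 0.3) and confirms that the probabilities sum to 1 and that the mean and variance agree with nP and nP(1 - P); the function names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

n, p = 10, 0.3   # hypothetical values
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))

print(round(sum(probs), 6), round(mean, 6), round(variance, 6))  # 1.0 3.0 2.1
print(round(poisson_pmf(0, 2), 4))                               # e^-2 = 0.1353
```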

Linear Transformations
For the following formulas, assume that Y is a linear transformation of the random variable X, defined by the equation: Y = aX + b.

Mean of a linear transformation = $E(Y) = \bar{Y} = a\bar{X} + b$
Variance of a linear transformation = $Var(Y) = a^2 \, Var(X)$
Standardized score = $z = (x - \mu_x) / \sigma_x$
t-score = $t = (\bar{x} - \mu_x) / [s / \sqrt{n}]$

Estimation

Confidence interval: Sample statistic ± Critical value x Standard error of statistic
Margin of error = (Critical value) x (Standard deviation of statistic)
Margin of error = (Critical value) x (Standard error of statistic)
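As an illustration of the confidence interval formula, the Python sketch below builds an interval for a mean. The sample figures (mean 120, standard deviation 15, n = 36) are hypothetical, and the critical value 1.96 assumes a 95% confidence level with a normal (z) critical value.

```python
import math

def z_confidence_interval(sample_mean, sample_sd, n, critical_value=1.96):
    """Sample statistic plus and minus critical value times standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin_of_error = critical_value * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

low, high = z_confidence_interval(120, 15, 36)
print(round(low, 1), round(high, 1))  # 115.1 124.9
```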

Hypothesis Testing

Standardized test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
One-sample z-test for proportions: z-score = $z = (p - P_0) / \sqrt{P_0 Q_0 / n}$, where $Q_0 = 1 - P_0$
Two-sample z-test for proportions: z-score = $z = [(p_1 - p_2) - d] / SE$
One-sample t-test for means: t-score = $t = (\bar{x} - \mu) / SE$
Two-sample t-test for means: t-score = $t = [(\bar{x}_1 - \bar{x}_2) - d] / SE$
Matched-sample t-test for means: t-score = $t = [(\bar{x}_1 - \bar{x}_2) - D] / SE = (\bar{d} - D) / SE$
Chi-square test statistic = $\chi^2 = \sum [(\text{Observed} - \text{Expected})^2 / \text{Expected}]$
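The chi-square test statistic can be sketched in Python for a contingency table, with expected counts taken from the row and column totals. The example reuses the exposure-by-disease counts of problem 19 purely as input data; the function name `chi_square_statistic` is an assumption, and no p-value is computed.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

chi2, df = chi_square_statistic([[900, 400], [100, 600]])
print(round(chi2, 2), df)  # 549.45 1
```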

Degrees of Freedom
The correct formula for degrees of freedom (DF) depends on the situation (the nature of the test statistic, the number of samples, underlying assumptions, etc.).

One-sample t-test: $DF = n - 1$
Two-sample t-test: $DF = (s_1^2 / n_1 + s_2^2 / n_2)^2 / \{ [(s_1^2 / n_1)^2 / (n_1 - 1)] + [(s_2^2 / n_2)^2 / (n_2 - 1)] \}$
Two-sample t-test, pooled standard error: $DF = n_1 + n_2 - 2$
Simple linear regression, test slope: $DF = n - 2$
Chi-square goodness of fit test: $DF = k - 1$
Chi-square test for homogeneity: $DF = (r - 1)(c - 1)$
Chi-square test for independence: $DF = (r - 1)(c - 1)$

Sample Size
Below, the first two formulas find the smallest sample sizes required to achieve a fixed margin of error, using simple random sampling. The third formula assigns sample to strata, based on a proportionate design. The fourth formula, Neyman allocation, uses stratified sampling to minimize variance, given a fixed sample size. And the last formula, optimum allocation, uses stratified sampling to minimize variance, given a fixed budget.

Mean (simple random sampling): $n = \{ z^2 \sigma^2 [N / (N - 1)] \} / \{ ME^2 + [z^2 \sigma^2 / (N - 1)] \}$
Proportion (simple random sampling): $n = [(z^2 p q) + ME^2] / [ME^2 + z^2 p q / N]$
Proportionate stratified sampling: $n_h = (N_h / N) \times n$
Neyman allocation (stratified sampling): $n_h = n \times (N_h \sigma_h) / [\sum (N_i \sigma_i)]$
Optimum allocation (stratified sampling): $n_h = n \times [(N_h \sigma_h) / \sqrt{c_h}] / [\sum (N_i \sigma_i) / \sqrt{c_i}]$

Vital statistics

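Crude birth rate (CBR) = $\frac{\text{Number of live births during the year}}{\text{Mid-year population}} \times 1000$

Crude death rate (CDR) = $\frac{\text{Number of deaths during the year}}{\text{Mid-year population}} \times 1000$

Infant mortality rate = $\frac{\text{Number of deaths under one year of age during the year}}{\text{Number of live births during the year}} \times 1000$

Maternal mortality rate = $\frac{\text{Number of maternal deaths during the year}}{\text{Number of live births during the year}} \times 1000$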

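Neonatal mortality rate = $\frac{\text{Number of deaths of infants under 28 days of age during the year}}{\text{Number of live births during the year}} \times 1000$

Early neonatal mortality rate = $\frac{\text{Number of deaths of infants under 7 days of age during the year}}{\text{Number of live births during the year}} \times 1000$

Post neonatal mortality rate = $\frac{\text{Number of deaths between 28 days and under one year of age during the year}}{\text{Number of live births during the year}} \times 1000$

Or … x 1000

Peri-natal mortality rate = $\frac{\text{Late fetal deaths (28 weeks of gestation or more) + early neonatal deaths during the year}}{\text{Live births + late fetal deaths during the year}} \times 1000$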

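Proportional mortality rate = $\frac{\text{Number of deaths due to a specific cause}}{\text{Total number of deaths}} \times 100$

Case fatality rate = $\frac{\text{Number of deaths due to the disease}}{\text{Number of cases of the disease}} \times 100$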


