Introduction
Any science needs precision for its development. For precision, facts, observations or measurements have to be expressed in figures.

"When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." - Lord Kelvin

Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement. E.g. you have to count the number of missing teeth or measure the vertical dimension and express it as a number so that it makes sense.

A statistic or datum is a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby. Statistics or data is the plural of the same. Statistics is the science of figures. Biostatistics is the term used when the tools of statistics are applied to data derived from biological sciences such as medicine.
In pharmacology
- To find the action of drugs
- To compare the action of two drugs or two successive dosages of the same drug
- To find the relative potency of a new drug with respect to a standard drug
In medicine
- To compare the efficiency of a particular drug, operation or line of treatment
- To find an association between two attributes such as cancer and smoking
- To identify signs and symptoms of disease
In research
- It helps in compilation of data, drawing conclusions and making recommendations.
For students
- By learning the methods in biostatistics a student learns to evaluate articles published in medical and dental journals or papers read in medical and dental conferences.
- He also understands the basic methods of observation in his clinical practice and research.
Variable
- A characteristic which takes different values for different persons, places or things, such as height, weight or blood pressure
Population
- Population includes all persons, events and objects under study. It may be finite or infinite.
Sample
- Defined as a part of a population, generally selected so as to be representative of the population whose variables are under study
Biostatistics, Dr Shilpi Gilra, Department of Prosthodontics
Parameter
- It is a constant that describes a population, e.g. in a college there are 40% girls. This describes the population, hence it is a parameter.
Statistic
- A statistic is a constant that describes the sample, e.g. out of 200 students of the same college, 45% are girls. This 45% is a statistic as it describes the sample.
Attribute
- A characteristic based on which the population can be divided into categories or classes, e.g. gender, caste, religion.
Source of data
The main sources for collection of data are
- Experiments
- Surveys
- Records
Experiments
- Experiments are performed to collect data for investigations and research by one or more workers.
Surveys
- Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.
Records
- Records are maintained as a routine in registers and books over a long period of time
- They provide readymade data.
Types of data
Data is of two types
Data presentation
Statistical data once collected should be systematically arranged and presented
- To arouse interest of readers
- For data reduction
- To bring out important points clearly and strikingly
- For easy grasp and meaningful conclusions
- To facilitate further analysis
- To facilitate communication
Two main types of data presentation are
- Tabulation
- Graphic representation with charts and diagrams
Tabulation
It is the most common method. Data is presented in the form of columns and rows. It can be of the following types
- Simple tables
- Frequency distribution tables
Simple Table
Number of patients at KIDS, Bgm
Jan 06: 2,800
Feb 06: 1,900
March 06: 1,750
- Bar chart
- Histogram
- Frequency polygon
- Frequency curve
- Line diagram
- Cumulative frequency diagram or ogive
- Scatter diagram
- Pie chart
- Pictogram
- Spot map or map diagram
Bar chart
The length of the bars, drawn vertical or horizontal, is proportional to the frequency of the variable. A suitable scale is chosen and the bars are usually equally spaced. They are of three types
- Simple bar chart
- Multiple bar chart: two or more variables are grouped together
- Component bar chart: bars are divided into two parts, each part representing a certain item and proportional to the magnitude of that item
[Figures: simple bar chart of the number of CD patients; bar chart by quarter (1st Qtr to 4th Qtr); component bar chart by quarter of patients to prostho and patients to other departments]
Histogram
- Pictorial presentation of a frequency distribution
- Consists of a series of rectangles
- Class intervals are given on the horizontal axis and frequencies on the vertical axis
- Area of each rectangle is proportional to the frequency
[Figure: histogram of the number of various lesions by class interval (0 to 3, 3 to 6, 6 to 9, 9 to 12, 12 to 15, 15 to 18, 18 to 21, 21 to 24, 24 to 27)]
Frequency polygon
Obtained by joining the midpoints of the histogram blocks at the height of the frequency by straight lines, usually forming a polygon
Frequency curve
When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations, becoming a smooth curve known as the frequency curve
Line diagram
Line diagrams are used to show the trends of events with the passage of time
[Figure: line diagram of patients with periodontitis]
A cumulative frequency diagram (ogive) is a graphical representation of cumulative frequency. The cumulative frequency is obtained by adding the frequency of each class to the frequencies of the previous classes.
[Figure: cumulative frequency diagram]
Pie chart
In this, the frequencies of the groups are shown as segments of a circle. The degree of the angle denotes the frequency. The angle is calculated by
- class frequency × 360 / total observations
[Figure: pie chart by department (Prostho, Conso, Perio, Ortho, Pedo) with segments of 70 (11%), 30 (5%), 200 (31%), 150 (24%) and 180 (29%)]
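The angle formula for a pie chart can be checked with a short script. This is only an illustrative sketch: it assumes the five segment frequencies that appear in the pie chart figure (70, 30, 200, 150, 180), and the function name `pie_angles` is made up for the example.

```python
# Segment frequencies read from the pie chart figure (assumed for illustration).
frequencies = [70, 30, 200, 150, 180]


def pie_angles(freqs):
    """Angle of each segment in degrees: class frequency x 360 / total observations."""
    total = sum(freqs)
    return [f * 360 / total for f in freqs]


angles = pie_angles(frequencies)
for f, a in zip(frequencies, angles):
    print(f"frequency {f:>3}: {a:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that no class has been left out.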
Pictogram
Popular method of presenting data to the common man
The average value in a distribution is the one central value around which all the other observations are concentrated. The average value helps
- to find the most characteristic value of a set of measurements
- to find which group is better off by comparing the average of one group with that of the other
The most commonly used averages are
- mean
- median
- mode
Mean
Refers to the arithmetic mean. It is the summation of all the observations divided by the total number of observations (n). It is denoted by $\bar{X}$ for a sample and $\mu$ for a population.
$\bar{X} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Advantages
- it is easy to calculate
Disadvantages
- influenced by extreme values
Median
When all the observations are arranged in either ascending or descending order, the middle observation is known as the median. In case of an even number of observations, the average of the two middle values is taken. The median is a better indicator of central value as it is not affected by extreme values.
Mode
The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics.
Example: number of decayed teeth in 10 children
2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35 / 10 = 3.5
Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5
Mode = 2 (3 times)
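The decayed-teeth example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen for illustration only. The ten counts add up to 35, so the mean works out to 3.5.

```python
import statistics

# Number of decayed teeth in the 10 children from the example.
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of observations / n
median = statistics.median(decayed_teeth)  # average of the two middle values (n is even)
mode = statistics.mode(decayed_teeth)      # most frequently occurring value

print(mean, median, mode)
```

Note how the single extreme value of 10 pulls the mean above the median, which is the disadvantage of the mean listed earlier.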
Types of variability
There are three types of variability
- Biological variability
- Real variability
- Experimental variability
Experimental variability is of three subtypes
- Observer error
- Instrumental error
- Sampling error
Biological variability
It is the natural difference which occurs between individuals due to age, gender and other attributes which are inherent. This difference is small, occurs by chance and is within certain accepted biological limits.
Real Variability
Such variability is more than the normal biological limits. The cause of the difference is not inherent or natural and is due to some external factors, e.g. the difference in incidence of cancer among smokers and non-smokers may be due to excessive smoking and not due to chance only.
Experimental Variability
It occurs due to the experimental study. It is of three types
- Observer error: the investigator may alter some information or not record the measurement correctly
- Instrumental error: this is due to defects in the measuring instrument. Both the observer and the instrumental error are called non-sampling errors
- Sampling error or error of bias: this is the error which occurs when the samples are not chosen at random from the population. Thus the sample does not truly represent the population
E.g. the BP of an individual can show variation even if taken by a standardized method and measured by the same person. Thus one should know what the normal variation is and how to measure it. The various measures of variation or dispersion are
- Range
- Mean or average deviation
- Standard deviation
- Coefficient of variation
Range
- It is the simplest measure
- Defined as the difference between the highest and the lowest figures in a sample
- Defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2-4 mm
- Not satisfactory as it is based on two extreme values only
Mean deviation
It is the average of the differences or deviations from the mean in any distribution, ignoring the + or - sign. It is denoted by MD.
$MD = \dfrac{\sum |x - \bar{x}|}{n}$
where $x$ = observation, $\bar{x}$ = mean, $n$ = number of observations
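As a quick illustration of the mean deviation, the sketch below reuses the decayed-teeth counts from the earlier example. The variable names are assumptions made for the example; the calculation sums the deviations from the mean without their sign and divides by the number of observations.

```python
# Decayed-teeth counts from the earlier example, reused for illustration.
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

n = len(decayed_teeth)
mean = sum(decayed_teeth) / n

# Deviations from the mean with the + or - sign ignored.
absolute_deviations = [abs(x - mean) for x in decayed_teeth]
mean_deviation = sum(absolute_deviations) / n

print(f"mean = {mean}, mean deviation = {mean_deviation}")
```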
Standard deviation
- Also called root mean square deviation
- It is an improvement over mean deviation and is the measure used most commonly in statistical analysis
- Denoted by SD or s for a sample and $\sigma$ for a population
- Given by the formula $SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, with $n - 1$ in place of $n$ when it is calculated from a sample
- The greater the standard deviation, the greater the magnitude of dispersion from the mean
- A small standard deviation means a high degree of uniformity of the observations
- Usually measurements beyond the range of ± 2 SD are considered rare or unusual in any distribution
- It summarizes the deviation of a large distribution from its mean
- It helps in finding the suitable size of sample, e.g. a greater deviation indicates the need for a larger sample to draw meaningful conclusions
- It helps in calculation of the standard error, which helps us to determine whether the difference between two samples is by chance or real
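The two denominators, n and n - 1, give slightly different standard deviations. The sketch below is illustrative only: it reuses the decayed-teeth counts from the earlier example, computes both versions by hand and compares them with the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Decayed-teeth counts from the earlier example, reused for illustration.
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

n = len(decayed_teeth)
mean = sum(decayed_teeth) / n
sum_of_squares = sum((x - mean) ** 2 for x in decayed_teeth)

sd_population = math.sqrt(sum_of_squares / n)    # denominator n
sd_sample = math.sqrt(sum_of_squares / (n - 1))  # denominator n - 1

print(f"SD with n:     {sd_population:.2f}")
print(f"SD with n - 1: {sd_sample:.2f}")

# Observations lying outside mean +/- 2 SD would be considered unusual.
lower, upper = mean - 2 * sd_sample, mean + 2 * sd_sample
unusual = [x for x in decayed_teeth if x < lower or x > upper]
print("beyond +/- 2 SD:", unusual)
```

With small samples the difference between the two denominators is noticeable; it shrinks as n grows.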
Coefficient of variation
It is used to compare attributes having two different units of measurement, e.g. height and weight. It is denoted by CV and is expressed as a percentage.
$CV = \dfrac{SD \times 100}{\text{Mean}}$
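Because the coefficient of variation is unit-free, it allows height and weight to be compared directly. The figures in the sketch below are hypothetical values invented for illustration, not data from the presentation.

```python
def coefficient_of_variation(sd, mean):
    """CV = SD x 100 / mean, expressed as a percentage."""
    return sd * 100 / mean


# Hypothetical example values (assumed for illustration only).
cv_height = coefficient_of_variation(sd=8, mean=160)  # height in cm
cv_weight = coefficient_of_variation(sd=6, mean=60)   # weight in kg

print(f"CV of height = {cv_height}%")
print(f"CV of weight = {cv_weight}%")
```

In this made-up case weight is relatively more variable than height, even though its SD is the smaller number.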
These limits on either side of the measurement are called confidence limits.
The look of the frequency distribution curve may vary depending on the mean and SD. Thus it becomes necessary to standardize it. E.g. one study has an SD of 3 and another has an SD of 2, thus it becomes difficult to compare them. The normal curve is therefore standardized by using the unit of standard deviation to place any measurement with reference to the mean. The curve that emerges through this procedure is called the standard normal curve.
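Standardizing means expressing a measurement as the number of standard deviations it lies from the mean. The sketch below assumes two hypothetical studies with SDs of 3 and 2, as in the example above; the means and the measurements are invented for illustration.

```python
def standard_normal_deviate(x, mean, sd):
    """Place a measurement relative to the mean in units of SD: z = (x - mean) / SD."""
    return (x - mean) / sd


# Hypothetical values: study 1 has SD 3, study 2 has SD 2.
z_study_1 = standard_normal_deviate(x=26, mean=20, sd=3)
z_study_2 = standard_normal_deviate(x=19, mean=15, sd=2)

print(z_study_1, z_study_2)
```

Both measurements lie 2 SD above their own mean, so on the standard normal curve they occupy the same position even though the raw differences (6 and 4) are not the same.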
Sampling
It is not possible to include each and every member of the population as it would be time consuming, costly and laborious. Therefore sampling is done.
Sampling is a process by which some units of a population or universe are selected for the study; by subjecting them to statistical computation, conclusions are drawn about the population from which these units are drawn.
The sample will be representative of the entire population only if
- It is sufficiently large
- It is unbiased
Such a sample will have its statistics almost equal to the parameters of the entire population.
Two main characteristics of a representative sample are
- Precision
- Unbiased character
Precision
Precision depends on the sample size. Ordinarily the sample size should not be less than 30.
$\text{Precision} = \dfrac{\sqrt{n}}{s}$
where $n$ = sample size and $s$ = standard deviation
Precision is directly proportional to the square root of the sample size: the greater the sample size, the greater the precision. Also, the greater the SD, the less the precision. Thus in such cases, to obtain precision, the sample size needs to be increased.
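The square-root relationship means that quadrupling the sample size only doubles the precision. The sketch below illustrates this with assumed values (n = 30 and 120, SD = 8 and 16); the function name is hypothetical.

```python
import math


def precision(n, s):
    """Precision = sqrt(n) / s, with n the sample size and s the standard deviation."""
    return math.sqrt(n) / s


base = precision(n=30, s=8)
larger_sample = precision(n=120, s=8)  # four times the sample size
larger_sd = precision(n=30, s=16)      # twice the standard deviation

print(f"n = 30,  s = 8:  {base:.3f}")
print(f"n = 120, s = 8:  {larger_sample:.3f}")
print(f"n = 30,  s = 16: {larger_sd:.3f}")
```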
Unbiased character
The sample should be unbiased, i.e. every individual should have an equal chance of being selected in the sample. Thus a standard random sampling method should be used.
Non-sampling errors can be taken care of by
- Using standardized instruments and criteria
- Single, double or triple blind trials
- Use of a control group
The investigator needs to decide how large an error due to sampling defect is allowable, i.e. the allowable error L. The investigator should either start with an assumed SD or do a pilot study to estimate the SD.
$\text{sample size } n = \dfrac{4 \, SD^2}{L^2}$
E.g. the mean pulse rate of a population is 70 beats per min with a standard deviation of 8 beats. What will be the sample size if the allowable error is ± 1?
$n = \dfrac{4 \times 8 \times 8}{1 \times 1} = 256$
If L is less, n will be more, i.e. the larger the sample size, the lesser the error.
E.g. the incidence rate in the last influenza epidemic was found to be 5% of the population exposed. What should be the size of the sample to find the incidence rate in the current epidemic if the allowable error is 10%?
p = 5%, q = 95%, L = 10% of p = 0.5%
$n = \dfrac{4pq}{L^2} = \dfrac{4 \times 5 \times 95}{0.5 \times 0.5} = 7600$
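Both sample size calculations can be reproduced in a few lines. This is a minimal sketch using the figures from the two examples above; the function names are assumptions made for illustration.

```python
def sample_size_for_mean(sd, allowable_error):
    """n = 4 x SD^2 / L^2 for a quantitative measurement."""
    return 4 * sd ** 2 / allowable_error ** 2


def sample_size_for_proportion(p_percent, allowable_error_percent):
    """n = 4 x p x q / L^2, with p, q and L all expressed in percent."""
    q_percent = 100 - p_percent
    return 4 * p_percent * q_percent / allowable_error_percent ** 2


# Pulse rate example: SD = 8 beats, allowable error = 1 beat.
n_pulse = sample_size_for_mean(sd=8, allowable_error=1)

# Influenza example: p = 5%, allowable error = 10% of p = 0.5%.
n_influenza = sample_size_for_proportion(p_percent=5, allowable_error_percent=0.5)

print(n_pulse, n_influenza)
```

Halving the allowable error quadruples the required sample size, because L appears squared in the denominator.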
Probability or p value
The concept of probability is very important in statistics. Probability is the chance of occurrence of any event, permutation or combination. It is denoted by p for a sample and P for a population.
In various tests of significance we are often interested to know whether the observed difference between 2 samples is real or is due to chance (sampling variation). Here the probability or p value is used.
P ranges from 0 to 1
- 0 = there is no chance that the observed difference could be due to sampling variation
- 1 = it is absolutely certain that the observed difference between 2 samples is due to sampling variation
However, such extreme values are rare.
P = 0.4 means the chances that the difference is due to sampling variation are 4 in 10. Obviously the chances that it is not due to sampling variation will be 6 in 10.
The essence of any test of significance is to find the p value and draw an inference.
If the p value is 0.05 or more, it is customary to accept that the difference is due to chance (sampling variation). The observed difference is said to be statistically not significant.
If the p value is less than 0.05, the observed difference is not due to chance but due to the role of some external factors. The observed difference here is said to be statistically significant.
The p value can be determined
- From the shape of the normal curve: we know that 95% of observations lie within mean ± 2 SD. Thus the probability of a value more or less than this range is 5%
- From probability tables: the p value is also determined from probability tables in the case of the student t test or chi square test
- By the area under the normal curve: here z, the standard normal deviate, is calculated. Corresponding to the z value, the area under the curve (A) is determined. The probability is given by 2(0.5 - A)
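The area method can be sketched without a printed table by using the error function from Python's `math` module. This is an illustrative sketch, assuming that A is the area under the standard normal curve between the mean and z; the function names are made up for the example.

```python
import math


def area_mean_to_z(z):
    """Area A under the standard normal curve between the mean (z = 0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def p_value_from_z(z):
    """Probability of a value at least this far from the mean: p = 2 x (0.5 - A)."""
    return 2 * (0.5 - area_mean_to_z(z))


for z in (1.0, 1.96, 2.0, 3.0):
    print(f"z = {z}: A = {area_mean_to_z(z):.4f}, p = {p_value_from_z(z):.4f}")
```

A z of about 2 corresponds to p of about 0.05, which matches the statement that 95% of observations lie within mean ± 2 SD.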
Tests of significance
Whatever the sampling procedure or the care taken while selecting the sample, the sample statistics will differ from the population parameters. Variations between 2 samples drawn from the same population may also occur, i.e. differences in the results between two research workers for the same investigation may be observed.
Thus it becomes important to find out the significance of this observed variation, i.e. whether it is due to
- chance or biological variation (statistically not significant), or
- the influence of some external factors (statistically significant)
To test whether the variation observed is of significance, the various tests of significance are done. The tests of significance can be broadly classified as
1. Parametric tests
2. Non parametric tests
Parametric tests
Parametric tests are those tests in which certain assumptions are made about the population
- The population from which the sample is drawn has a normal distribution
- The variances of the samples do not differ significantly
- The observations found are truly numerical, thus arithmetic procedures such as addition, division and multiplication can be used
Since these tests make assumptions about the population parameters, they are called parametric tests. These are usually used to test the difference. They are:
- Student t test (paired or unpaired)
- ANOVA
- Test of significance between two means
Tests of significance can also be divided into one tailed and 2 tailed tests
The P value is determined using any of the previously mentioned methods. If p > 0.05, the difference is due to chance and is not statistically significant; if p < 0.05, the difference is due to some external factor and is statistically significant.
Types of error
While drawing conclusions in a study we are likely to commit two types of error.
- Type I error
- Type II error
Type I error
This type of error occurs when we conclude that the difference is significant when in fact there is no real difference in the population, i.e. we reject the null hypothesis when it is true. It is denoted by α.
Type II error
This type of error occurs when we say that the difference is not significant when in fact there is a real difference between the populations, i.e. the null hypothesis is not rejected when it is actually false. It is denoted by β.
Thus the difference is not due to chance and may be due to the influence of some external factor, i.e. the difference is statistically significant.
It helps to know the significance of the difference obtained by 2 research workers for the same investigation.
$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}$
E.g. find the significance of the difference in mean heights of 50 girls and 50 boys with the following values
Girls: mean 147.4, SD 6.6
Boys: mean 151.6, SD 6.3
$SE = \sqrt{\dfrac{(6.6)^2}{50} + \dfrac{(6.3)^2}{50}} = 1.29$
$Z = \dfrac{\text{observed difference}}{SE} = \dfrac{151.6 - 147.4}{1.29} = 3.26$
Since the Z value is more than 2, p will be less than 0.05. Thus the difference is statistically significant and it can be concluded that boys are taller than girls.
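The height example can be re-run as a check on the arithmetic. The sketch below uses the means, SDs and sample sizes given above; the function name is an assumption. Because the script does not round the standard error to 1.29 before dividing, its Z comes out at about 3.25 rather than the hand-calculated 3.26, which does not change the conclusion.

```python
import math


def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


girls_mean, girls_sd, girls_n = 147.4, 6.6, 50
boys_mean, boys_sd, boys_n = 151.6, 6.3, 50

se = se_of_difference(girls_sd, girls_n, boys_sd, boys_n)
z = (boys_mean - girls_mean) / se

print(f"SE = {se:.2f}")
print(f"Z  = {z:.3f}")
print("significant at p < 0.05" if abs(z) > 2 else "not significant")
```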
Standard error of proportion is the unit which measures the variation in the proportion of a character from sample to population.
$SE \text{ of proportion} = \sqrt{\dfrac{p \times q}{n}}$
where p = proportion of positive character, q = proportion of negative character, n = sample size
Also, proportion of population = proportion of sample ± 2 SE. Thus one can determine whether the proportion of the sample is within the limits of the population proportion.
E.g. the proportion of blood group B among Indians is 30%. If in a sample of 100 individuals it is 25%, what is your conclusion about the group?
$SEP = \sqrt{\dfrac{p \times q}{n}} = \sqrt{\dfrac{25 \times 75}{100}} = 4.33$
$Z = \dfrac{\text{observed difference}}{SE} = \dfrac{30 - 25}{4.33} = 1.15$
Since Z is < 2, p will be more than 0.05, thus the difference is not significant.
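The blood group example follows the same pattern and can be checked the same way. This is a minimal sketch with the figures from the example; the function name is hypothetical and the proportions are kept in percent, as in the hand calculation.

```python
import math


def se_of_proportion(p_percent, n):
    """SE of proportion = sqrt(p x q / n), with p and q in percent."""
    q_percent = 100 - p_percent
    return math.sqrt(p_percent * q_percent / n)


population_percent = 30  # blood group B among Indians
sample_percent = 25      # blood group B in the sample
sample_size = 100

se = se_of_proportion(sample_percent, sample_size)
z = (population_percent - sample_percent) / se

# Limits within which the population proportion is expected to lie.
lower, upper = sample_percent - 2 * se, sample_percent + 2 * se

print(f"SE = {se:.2f}, Z = {z:.2f}")
print(f"sample proportion +/- 2 SE: {lower:.2f}% to {upper:.2f}%")
```

The population value of 30% falls inside the limits of about 16.3% to 33.7%, which agrees with the conclusion that the difference is not significant.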
Unpaired t test
Applied to unpaired data of observations made on individuals of 2 separate groups to find the significance of the difference between 2 means. The sample size is less than 30.
E.g. difference in accuracy of an impression using two different impression materials
Steps in the unpaired t test are
1. Calculate the means of the two samples
2. Calculate the combined standard deviation
3. Calculate the standard error of mean, which is given by $SEM = SD \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
4. Calculate the observed difference between the means, $\bar{X}_1 - \bar{X}_2$
5. Calculate the t value = observed difference / standard error of mean
6. Determine the degrees of freedom, which is one less than the number of observations in a sample (n - 1). Here the combined degrees of freedom will be $(n_1 - 1) + (n_2 - 1)$
7. Refer to the table and find the probability of the t value corresponding to the degrees of freedom
8. P < 0.05 states the difference is significant; P > 0.05 states the difference is not significant
E.g. in a nutritional study, 13 children in group A are given the usual diet along with vitamin A and vitamin D, while 12 children in group B take the usual diet. The gain in weight in pounds for both groups after 12 months is shown in the table. Are vitamins A and D responsible for the gain in weight?
Group A: 5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3
Group B: 1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3
Mean of group A = 4
Mean of group B = 2.5
Combined SD = 1.37
SE = 0.548
t = observed difference / SE = (4 - 2.5) / 0.548 = 2.74
Combined degrees of freedom = $n_1 + n_2 - 2$ = 13 + 12 - 2 = 23
The p value is checked corresponding to the t value at 23 d.f. from the t table. It is < 0.02. Thus the difference is statistically significant and is attributed to the role of vitamins A and D.
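The nutritional study can be worked through step by step in code. The sketch below is illustrative: the text does not spell out how the combined standard deviation is obtained, so it assumes the usual pooled form (sum of squared deviations of both groups divided by the combined degrees of freedom), which reproduces the quoted 1.37.

```python
import math

# Gain in weight (pounds) from the table.
group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]  # usual diet + vitamins A and D
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]     # usual diet only


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n2 = len(group_a), len(group_b)
df = (n1 - 1) + (n2 - 1)

# Assumed pooled formula for the combined SD.
combined_sd = math.sqrt((sum_sq_dev(group_a) + sum_sq_dev(group_b)) / df)
se = combined_sd * math.sqrt(1 / n1 + 1 / n2)
t = (mean(group_a) - mean(group_b)) / se

print(f"means: {mean(group_a)} and {mean(group_b)}")
print(f"combined SD = {combined_sd:.2f}, SE = {se:.3f}")
print(f"t = {t:.2f} at {df} d.f.")
```

The t value still has to be looked up in a t table at 23 d.f.; the script only automates the arithmetic.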
Paired t test
It is applied to paired data of observations from one sample only. It is used for samples of less than 30. Each individual gives a pair of observations, i.e. an observation before and after taking a drug.
The steps involved are
1. Calculate the difference in each paired observation, i.e. before and after: $x_1 - x_2 = y$
2. Calculate the mean of this difference, $\bar{y}$
3. Calculate the SD
4. Calculate $SE = SD / \sqrt{n}$
5. Determine $t = \bar{y} / SE$
6. Determine the degrees of freedom. Since there is one sample, df = n - 1
7. Refer to the table and find the probability of the t value corresponding to the degrees of freedom
8. P < 0.05 states the difference is significant; P > 0.05 states the difference is not significant
E.g. the systolic BP of normal individuals before and after injection of a hypotensive drug is given in the table. Does the drug lower the BP?
BP before giving drug (X1): 122, 121, 120, 115, 126, 130, 120, 125, 128
BP after giving drug (X2): 120, 118, 115, 110, 122, 130, 116, 124, 125
Difference X1 - X2 = y: 2, 3, 5, 5, 4, 0, 4, 1, 3
Mean of difference $\bar{y} = \sum y / n$ = 27 / 9 = 3
$SD = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}} = 1.73$
$SE = SD / \sqrt{n} = 1.73 / \sqrt{9} = 0.58$
$t = \bar{y} / SE$ = 3 / 0.58 = 5.17
Degrees of freedom = n - 1 = 9 - 1 = 8
The p value corresponding to t = 5.17 and d.f. 8 is < 0.001, thus highly significant. Thus the decrease in BP is due to the drug.
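The paired calculation is short enough to verify in a few lines. This sketch uses the before and after BP readings from the table; the variable names are assumptions. Since the script carries the standard error unrounded (about 0.577 rather than 0.58), its t is about 5.20 instead of the hand-calculated 5.17, with the same conclusion.

```python
import math

bp_before = [122, 121, 120, 115, 126, 130, 120, 125, 128]
bp_after = [120, 118, 115, 110, 122, 130, 116, 124, 125]

# Step 1: difference in each pair (before - after).
differences = [b - a for b, a in zip(bp_before, bp_after)]
n = len(differences)

# Steps 2 to 5: mean difference, SD with n - 1, SE = SD / sqrt(n), t = mean / SE.
mean_diff = sum(differences) / n
sd = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / (n - 1))
se = sd / math.sqrt(n)
t = mean_diff / se

# Step 6: one sample, so d.f. = n - 1.
df = n - 1

print(f"differences = {differences}")
print(f"mean = {mean_diff}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"t = {t:.2f} at {df} d.f.")
```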
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where $\chi^2$ denotes chi square, O = observed value, E = expected value
Vaccine B seems to be superior to vaccine A. We perform the chi square test to verify whether vaccine B is superior to vaccine A or whether the difference is merely due to chance.
Find the total attack and non attack rates
Total attack rate = 36 / 176 = 0.204
Total non attack rate = 140 / 176 = 0.795
Vaccine A (n = 90)
- Attacked: O = 22, E = 0.204 × 90 = 18.36, O - E = +3.64
- Not attacked: O = 68, E = 0.795 × 90 = 71.55, O - E = -3.55
Vaccine B (n = 86)
- Attacked: O = 14, E = 0.204 × 86 = 17.54, O - E = -3.54
- Not attacked: O = 72, E = 0.795 × 86 = 68.37, O - E = +3.63
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(3.64)^2}{18.36} + \dfrac{(3.55)^2}{71.55} + \dfrac{(3.54)^2}{17.54} + \dfrac{(3.63)^2}{68.37}$
= 0.72 + 0.17 + 0.71 + 0.19 = 1.79
Find the degrees of freedom = (c - 1)(r - 1), where c = number of columns and r = number of rows
d.f. = (2 - 1)(2 - 1) = 1
Find the p value. On referring to the chi square table with one degree of freedom, the p value was more than 0.05.
Hence the difference is not statistically significant and the null hypothesis of no difference between vaccines is accepted.
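The vaccine comparison can be recomputed from the four observed counts alone. The sketch below is illustrative: it derives each expected value as row total × column total / grand total, which is equivalent to applying the overall attack and non attack rates without rounding them first. For that reason its chi square is about 1.80 rather than the hand-calculated 1.79. The comparison value of 3.84 is the standard tabulated chi square for 1 d.f. at p = 0.05.

```python
# Observed counts: [attacked, not attacked] for each vaccine.
observed = [
    [22, 68],  # Vaccine A (n = 90)
    [14, 72],  # Vaccine B (n = 86)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if attack rates were the same for both vaccines.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# d.f. = (columns - 1) x (rows - 1)
df = (len(observed[0]) - 1) * (len(observed) - 1)

CRITICAL_VALUE_5_PERCENT_1_DF = 3.84  # tabulated chi square, 1 d.f., p = 0.05

print(f"chi square = {chi_square:.2f} at {df} d.f.")
if chi_square < CRITICAL_VALUE_5_PERCENT_1_DF:
    print("p > 0.05: difference not statistically significant")
else:
    print("p < 0.05: difference statistically significant")
```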
ANOVA
Analysis of variance
Investigations may not always be confined to the comparison of 2 samples only, e.g. we might like to compare the difference in vertical dimension obtained using 3 or more methods like phonetics, swallowing and Niswonger's method. In such cases, where more than 2 samples are used, ANOVA can be used.
Also, when measurements are influenced by several factors playing their role, e.g. factors affecting retention of a denture, ANOVA can be used. ANOVA helps to decide which factors are more important.
Requirements
- Data for each group are assumed to be independent and normally distributed
- Sampling should be at random
One way ANOVA
- Where only one factor will affect the result between the groups
Two way ANOVA
- Where we have 2 factors that affect the result or outcome
Multi way ANOVA
F test
F = Mean square between samples / Mean square within samples
F = variance ratio
The values of the mean squares are obtained from the analysis of variance table, given the values of the sum of squares and the degrees of freedom (which are calculated).
Mean square between samples
- It denotes the difference between the sample means of all groups involved in the study (A, B, C etc.) and the mean of the population
Mean square within samples
- It denotes the variation of the individual observations within each sample about their own sample mean
The greater the mean square between samples relative to the mean square within samples, the greater the F value and the greater the difference between the samples.
The F value observed from the study is compared with the theoretical F values obtained from the tables at the 1% and 5% levels of significance. The results are then interpreted.
- If the observed value is more than the theoretical value at 1%, the relation is highly significant
- If the observed value is less than the theoretical value at 5%, it is not significant
- If the observed value lies between the theoretical values at 5% and 1%, it is statistically significant
Presented by Dr Shilpi Gilra
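As a closing illustration, the variance ratio F for a one way ANOVA can be computed by hand in a few lines. The measurements below are hypothetical values invented purely for illustration (three methods of recording vertical dimension, three readings each); they are not data from the presentation, and the variable names are assumptions.

```python
# Hypothetical readings for three methods (values invented for illustration).
groups = {
    "phonetics": [2.0, 3.0, 4.0],
    "swallowing": [4.0, 5.0, 6.0],
    "Niswonger": [6.0, 7.0, 8.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Sum of squares between samples: group means compared with the overall mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Sum of squares within samples: observations compared with their own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

print(f"MS between = {ms_between}, MS within = {ms_within}")
print(f"F = {f_value} with ({df_between}, {df_within}) d.f.")
```

The observed F would then be compared with the tabulated F values at the 5% and 1% levels for the same degrees of freedom.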