
MaMaEuSch

Management Mathematics for European Schools http://www.mathematik.unikl.de/ mamaeusch

Population and sample. Sampling techniques
Paula Lagares Barreiro, Justo Puerto Albandoz
MaMaEuSch: Management Mathematics for European Schools, 94342-CP-1-2001-1-DE-COMENIUS-C21

University of Seville

This project has been carried out with the partial support of the European Community in the framework of the Sokrates programme. The content does not necessarily reflect the position of the European Community, nor does it involve any responsibility on the part of the European Community.

Contents

1 Population and sample. Sampling techniques
  1.1 Reasons to use sampling. Previous considerations
  1.2 Sampling techniques
  1.3 Random sampling with and without replacement
  1.4 Stratified sampling
  1.5 Cluster sampling
  1.6 Systematic sampling
  1.7 Other sampling techniques
2 An example of the application of sampling techniques

Chapter 1

Population and sample. Sampling techniques


In this chapter we extend what we presented at the beginning of Descriptive Statistics, adding the definitions of some sampling techniques and concepts, in order to be able to decide which sampling technique is appropriate for each situation. Let us imagine, for instance, that your class has been chosen as a sample of a population. The study that is going to be made can be about different themes, for example:

1. The opinion about the possibility of organizing alternative activities in your city, and a proposal of the activities that could be organized.
2. A poll about the opinion on the different political leaders.
3. The opinion about the possible choices for an end-of-year trip with the students of your class.

Do you think that your class would be a good sample for any of these situations? The answer is that, for the second situation, the students of a class are not an appropriate sample. For the first situation, we may think that the students of a class can give us interesting information, though the sample may be too small and we could lack information (boys and girls of other ages, living in different quarters, ...), while for the third situation the sample can be very useful. Therefore, it is very important to choose an appropriate sampling technique, one which assures us that we are choosing a good sample for the study we want to make.

1.1

Reasons to use sampling. Previous considerations

Let us imagine that we are going to carry out studies to get the following information:

- The percentage of the Spanish population that has access to the internet.
- The average lifetime of a particular brand of batteries.

For the first case, the population you would have to ask is bigger than 40 million people. It is obvious that interviewing more than 40 million people requires a big effort in many fields. First of all, a lot of time is needed, and second, a lot of money, because it is necessary to employ many people to make the interviews, pay for their trips to every village, etc. Moreover, there is an additional difficulty: it is complicated to reach each and every Spaniard, because when we make the interviews there will be people in hospitals, on a trip to a foreign country, etc. In this situation, for economic reasons, it will be convenient to interview a certain part of the population, a sample, chosen in an appropriate way so that we can later draw conclusions for the whole population.

In the second situation we have a different difficulty. If we want to know how long a battery lasts, we have to use it until it is exhausted. Therefore, we somehow destroy this element of the population. If we had to try each and every battery of the population, we would keep none of them. Thus, what we should do in this situation is also to choose an appropriate sample, from which we could draw the appropriate general conclusions.

For the reasons we have just mentioned, it is convenient in many instances to use samples. But if we want to get really good conclusions from them, we need to make sure that we choose the samples correctly. For instance, in the case of internet access in Spain, if we choose 10 people out of the 40 million inhabitants, this is clearly not enough, and it is not a representative sample. It will also not be representative if we choose 100 people from Madrid, or if we choose all your friends and your family.
There are some topics which should be clearly defined when we want to sample:

1. The selection method for the elements of the population (sampling method to be used).
2. Sample size.
3. Reliability degree of the conclusions that we can obtain, that is, an estimation of the error that we are going to have (in terms of probability).

As we have just said, an inappropriate selection of the elements of the sample can cause further errors when we want to estimate the corresponding parameters in the population. But we can find some other types of errors: the interviewer can be partial, that is, he can promote some answers more than others. It can also happen that the person we are going to interview does not want to answer certain questions (or cannot answer). We classify all these possible errors in the following way:

1. Selection error: if any of the elements of the population has a higher probability of being selected than the rest. Let us imagine that we want to measure how satisfied the clients of a gymnasium are, and for that we are going to interview some of them from 10 to 12 in the morning. This means that the people who go to the gymnasium in the afternoon will not be represented, and then the sample will not be representative of all the clients. A way to avoid this kind of error is to choose the sample so that all the clients have the same probability of being selected.
2. Non-answer error: it is also possible that some of the elements of the population do not want to, or cannot, answer certain questions. It can also happen, when we have a questionnaire including personal questions, that some of the members of the population do not answer honestly. These errors are generally very complicated to avoid, but in case we want to check honesty in the answers, we can include some questions (filter questions) to detect whether the answers are honest.

After what we have seen until now, we can say that we have a biased sample when it is not representative of the population.

1.2

Sampling techniques

We have already stressed the importance of a right choice of the elements of the sample so as to make it representative of our population, but how can we classify the different ways of choosing a sample? We can say that there are three types of sampling:

1. Probability sampling: it is the one in which each sample has the same probability of being chosen.
2. Purposive sampling: it is the one in which the person who is selecting the sample tries to make the sample representative, depending on his opinion or purpose, so that the representation is subjective.
3. No-rule sampling: we take a sample without any rule, the sample being representative if the population is homogeneous and we have no selection bias.

We will always use probability sampling because, if we choose the appropriate technique, it assures us that the sample is representative and we can estimate the errors of the sampling. There are different types of probability sampling:

- Random sampling with and without replacement.
- Stratified sampling.
- Cluster sampling.
- Systematic sampling.
- Other types of sampling techniques.

Let us imagine now that we have already selected a sample. From a high school with 560 students we have selected a sample of 28 students to know if they have an internet connection at home. But what does it mean to select 28 out of 560? Which proportion of the population are we selecting? And when we want to draw conclusions about the population, how many of the students of the population does each one of the sample elements represent?

To calculate the proportion of students that we are interviewing, we divide the sample size by the population size, that is: 28/560 = 0.05, and this means that we poll 5% of the population.

Now we are going to calculate how many students each one of the elements of the sample represents. We take the other quotient, dividing the number of elements of the population by the number of elements of the sample: 560/28 = 20, which means that each of the students of the sample represents 20 students of the high school. The two concepts that we have just presented have the following formal definitions:

1. Elevation factor: it is the quotient between the size of the population and the size of the sample, $N/n$. It represents the number of elements existing in the population for each element of the sample.
2. Sampling factor (or sampling fraction): it is the quotient between the size of the sample and the size of the population, $n/N$. If this quotient is multiplied by 100, we get the percentage of the population represented in the sample.
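The two definitions can be checked with a few lines of Python. This is only a minimal sketch; the function names `elevation_factor` and `sampling_fraction` are ours and do not come from the text.

```python
def elevation_factor(population_size: int, sample_size: int) -> float:
    """Population elements represented by each sampled element (N / n)."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return population_size / sample_size


def sampling_fraction(population_size: int, sample_size: int) -> float:
    """Share of the population included in the sample (n / N)."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return sample_size / population_size


# High school example from the text: 28 students sampled out of 560.
N, n = 560, 28
print(elevation_factor(N, n))         # 20.0
print(100 * sampling_fraction(N, n))  # percentage of the population sampled
```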

1.3

Random sampling with and without replacement

We have already mentioned that if we want to sample in such a way that the sample we get is representative, we should choose a probabilistic sampling technique. How would you select 28 students out of 560 in a high school so that all of them have the same probability of being in the sample? The easiest thing would be to hold a draw to choose 28 of them, that is, to choose them randomly, so that they all have the same possibility of belonging to the sample.

This selection process corresponds to random sampling. We will say that we are making random sampling when the process through which we choose the sample guarantees that all the possible samples that we can take from the population have the same probability of being chosen, that is, all the elements of the population have the same probability of being chosen to belong to the sample.

When a certain element is selected, we have measured the variables needed in the study, and the element can be selected again, we say that we make sampling with replacement. This sampling technique is usually called simple random sampling. In case the element cannot be selected again after being selected once, we say that we have obtained the sample through random sampling without replacement.

In our example, when we are going to select the sample out of the 560 students of the high school to ask whether they have an internet connection at home or not, it is not interesting for us to ask the same person twice, so once we choose an element of the population we do not want to choose it again. So we would make random sampling without replacement.

Though these two methods are different, when the size of the population is infinite, or so big that we can consider it infinite, both methods will lead us to similar conclusions.
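The difference between the two kinds of draw can be illustrated with Python's standard `random` module. This is a sketch under our own assumptions: students are identified by the numbers 1 to 560, and the helper names and the seed are illustrative only.

```python
import random


def draw_without_replacement(population, n, seed=None):
    """Random sample in which no element can appear twice."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def draw_with_replacement(population, n, seed=None):
    """Random sample in which an element may be drawn more than once."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)


students = list(range(1, 561))  # assumed numbering of the 560 students
without = draw_without_replacement(students, 28, seed=1)
with_rep = draw_with_replacement(students, 28, seed=1)

print(len(set(without)))   # 28: every selected student is different
print(len(set(with_rep)))  # at most 28: repetitions are possible
```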
Nevertheless, if the sampling fraction $n/N$ is greater than 0.1 (we sample more than 10% of the population), the difference between the conclusions we get may be important.

When we ask in our example whether the students have an internet connection at home or not, we are interested not only in the number of students having the connection but also in the proportion that it represents in the high school. These two values, and the average in some other cases (for instance, when we ask about the height of the students), are the parameters calculated most often and the ones we usually want to estimate. In the case of random sampling, with and without replacement, these estimators have the following expressions:

Total:
$$\hat{X} = N \sum_{i=1}^{n} \frac{X_i}{n}.$$

Average:
$$\hat{\bar{X}} = \sum_{i=1}^{n} \frac{X_i}{n}.$$

Proportion:
$$\hat{P} = \sum_{i=1}^{n} \frac{P_i}{n}.$$

The proportion is the average of a variable that can only be zero or one. In the expressions above:

- $X_i$ is the value of the variable we are studying.
- $N$ is the size of the population.
- $n$ is the size of the sample.
- $P_i$ is a variable that takes values 0 or 1.

The estimations of the error for these estimators, where $\hat{S}^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - n\,\hat{\bar{X}}^2\right]$ is the sample variance and $\hat{Q} = 1 - \hat{P}$, are:

Total:

For sampling with replacement:
$$\hat{V}(\hat{X}) = N^2 \frac{\hat{S}^2}{n}.$$
For sampling without replacement:
$$\hat{V}(\hat{X}) = N^2 \left(1 - \frac{n}{N}\right) \frac{\hat{S}^2}{n}.$$

Average:

For sampling with replacement:
$$\hat{V}(\hat{\bar{X}}) = \frac{\hat{S}^2}{n}.$$
For sampling without replacement:
$$\hat{V}(\hat{\bar{X}}) = \left(1 - \frac{n}{N}\right) \frac{\hat{S}^2}{n}.$$

Proportion:

For sampling with replacement:
$$\hat{V}(\hat{P}) = \frac{\hat{P}\hat{Q}}{n-1}.$$
For sampling without replacement:
$$\hat{V}(\hat{P}) = \left(1 - \frac{n}{N}\right) \frac{\hat{P}\hat{Q}}{n-1}.$$
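As an illustration, the estimators and their variance estimates can be written as two small Python functions. This is a sketch, not part of the original material: the function names are ours, and the 0/1 answers used at the end are hypothetical data invented for the example.

```python
from statistics import mean, variance


def srs_mean_and_total(sample, population_size, with_replacement=False):
    """Estimate the population average and total from a random sample.

    Returns (average, total, variance_of_average, variance_of_total).
    """
    n = len(sample)
    N = population_size
    average = mean(sample)
    s2 = variance(sample)  # sample variance with divisor n - 1
    # Without replacement the variance is reduced by the factor (1 - n/N).
    correction = 1.0 if with_replacement else 1.0 - n / N
    var_average = correction * s2 / n
    return average, N * average, var_average, N ** 2 * var_average


def srs_proportion(indicators, population_size, with_replacement=False):
    """Estimate a proportion from 0/1 indicators; returns (p, variance_of_p)."""
    n = len(indicators)
    p = sum(indicators) / n
    correction = 1.0 if with_replacement else 1.0 - n / population_size
    return p, correction * p * (1.0 - p) / (n - 1)


# Hypothetical answers: 21 of the 28 sampled students report a connection.
answers = [1] * 21 + [0] * 7
p_hat, var_p = srs_proportion(answers, population_size=560)
print(p_hat, var_p)
```

With a sampling fraction of 5%, the correction factor is 0.95, so the two variants of the variance differ only slightly here.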

1.4

Stratified sampling

Let us imagine now that we want to conduct a poll to know what people in your city do in their spare time. We all know that elderly people do not have the same activities as middle-aged people, such as your parents, for instance. We would like all the information that we already know to help us find a more representative sample. In fact, we are interested in having all these groups represented in our sample. These groups that have been defined (in our example, by ages) will be called strata. What we will do now is to divide our sample in such a way that we have elements of all the strata. Let us define the way we sample in this case.

Let us consider that we have our population of size $N$ divided into $k$ subpopulations of sizes $N_1, N_2, \dots, N_k$. These subpopulations are disjoint and verify that $N_1 + N_2 + \dots + N_k = N$. Each of the subpopulations is called a stratum. If we want to have a sample of $n$ elements of the initial population, we select from each stratum a sample of size $n_i$ so that $n_1 + n_2 + \dots + n_k = n$.

Which advantages and disadvantages does stratified sampling present? We present them now.

Advantages:

- We can have more precise information inside the subpopulations about the variables we are studying.
- We can raise the precision of the estimators of the variables of the whole population.

Disadvantages:

- The choice of the size of the samples inside each stratum so that the total sample size is $n$.
- It may be difficult in some populations to divide into strata.

As a general rule, stratified sampling provides better results than random sampling the more different the strata are among themselves and the more homogeneous they are internally.

We can consider 3 methods to distribute the size of the sample among the strata.

1. Proportionally to the size of each stratum, i.e., if we take the $j$-th stratum with size $N_j$, then the sample of this stratum will have size $n\,(N_j/N)$, $N$ being the size of the population and $n$ the size of the sample.
2. Proportionally to the variability of the parameter we are considering in each stratum. For instance, if we know that the variance of the height among the male students is 15 cm² and among the female students is 5 cm², the proportion of male students to female students is 3 to 1, and the sample should keep that proportion.
3. We assign the same size to each stratum. As a consequence, we favour the smaller strata in terms of precision, and the contrary happens with the bigger ones.

For the case of stratified sampling, the main estimators are the following:

Total:
$$\hat{X} = \sum_{h=1}^{k} N_h \bar{x}_h.$$

Average:
$$\hat{\bar{X}} = \sum_{h=1}^{k} w_h \bar{x}_h = \sum_{h=1}^{k} \frac{N_h}{N} \bar{x}_h.$$

Proportion:
$$\hat{P} = \sum_{h=1}^{k} w_h \hat{P}_h,$$

where

- $\bar{x}_h$ is the sample average of variable $X$ in stratum $h$.
- $N_h$ is the size of stratum $h$.
- $N$ is the size of the population.
- $w_h = N_h/N$ is the weight of stratum $h$.
- $n_h$ is the sample size in stratum $h$.
- $n$ is the sample size.
- $\hat{P}_h$ is the sample proportion of the variable in stratum $h$,

and the estimation of the error we make when we estimate the population parameters is:

Total:
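$$\hat{V}(\hat{X}) = \sum_{h=1}^{k} N_h^2 \,(1 - f_h)\, \frac{\hat{S}_h^2}{n_h},$$

with
$$f_h = \frac{n_h}{N_h}, \qquad \hat{S}_h^2 = \frac{1}{n_h - 1}\left[\sum_{i=1}^{n_h} X_{hi}^2 - n_h\, \bar{x}_h^2\right].$$

Average:
$$\hat{V}(\hat{\bar{X}}) = \sum_{h=1}^{k} w_h^2 \,(1 - f_h)\, \frac{\hat{S}_h^2}{n_h},$$

where $w_h$, $f_h$ and $\hat{S}_h^2$ are the same as before.

Proportion:
$$\hat{V}(\hat{P}) = \sum_{h=1}^{k} w_h^2 \,(1 - f_h)\, \frac{\hat{P}_h \hat{Q}_h}{n_h - 1},$$

where $\hat{Q}_h = 1 - \hat{P}_h$.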
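The proportional allocation and the estimator of the average, together with its variance, can be sketched in Python as follows. The function names are ours, and the two strata used in the example are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def proportional_allocation(stratum_sizes, n):
    """Unrounded sample size n * N_h / N for each stratum."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]


def stratified_mean(stratum_sizes, sample_sizes, sample_means, sample_variances):
    """Stratified estimate of the average and of its variance.

    Implements sum(w_h * mean_h) and sum(w_h^2 * (1 - f_h) * S_h^2 / n_h),
    with w_h = N_h / N and f_h = n_h / N_h.
    """
    N = sum(stratum_sizes)
    estimate = 0.0
    var_estimate = 0.0
    for N_h, n_h, mean_h, s2_h in zip(
        stratum_sizes, sample_sizes, sample_means, sample_variances
    ):
        w_h = N_h / N    # weight of the stratum
        f_h = n_h / N_h  # sampling fraction inside the stratum
        estimate += w_h * mean_h
        var_estimate += w_h ** 2 * (1.0 - f_h) * s2_h / n_h
    return estimate, var_estimate


# Hypothetical population with two strata of 300 and 100 elements.
sizes = [300, 100]
print(proportional_allocation(sizes, 40))  # [30.0, 10.0]
print(stratified_mean(sizes, [30, 10], [10.0, 20.0], [4.0, 9.0]))
```

The estimate of the total is obtained by multiplying the estimated average by $N$, and its variance by multiplying the variance of the average by $N^2$.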

1.5

Cluster sampling

Let us think now about conducting a poll to study the average height of the students of the high schools in your city. Instead of sampling among all the students of the city, we could consider the possibility of choosing some quarters because, as far as height is concerned, quarters are like small populations that we can compare to the city. In this case, can we simplify the choice of the sample so that we choose quarters without losing accuracy? The answer is that in this case we could choose quarters and analyze the height without losing accuracy. Let us present the sampling method which allows that.

In cluster sampling, the population is divided into units or groups, called clusters (usually they are units or areas into which the population has already been divided), which should be as representative as possible of the population, i.e., they should represent the heterogeneity of the population we are studying and they should be homogeneous among themselves.

The reason to use this kind of sampling is that sometimes it is too expensive to make a complete list of all the elements of the population that we want to study, or that by the time we finish making the list it may no longer make sense to carry out the study.

The main disadvantage that we may have is that if the clusters are not homogeneous among themselves, the final sample may not be representative of the population.

If we suppose that the clusters are as heterogeneous as the population with respect to the variable we are considering, and that the clusters are homogeneous among themselves, then to get a sample we only have to choose some clusters. We say that we make cluster sampling in one stage.

This sampling method has the advantage that it simplifies the collection of the sample information. Let us see now the expressions of the estimators for this sampling technique:

Total:
$$\hat{X} = M\, \frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} M_i}.$$

Average:
$$\hat{\bar{X}} = \frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} M_i}.$$

Proportion:
$$\hat{P} = \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} M_i},$$

where

- $X_i$ is the total of variable $X$ in cluster $i$.
- $\bar{X}_i$ is the sample average of variable $X$ in cluster $i$.
- $N$ is the number of clusters of the population.
- $M$ is the size of the population.
- $n$ is the number of clusters of the sample.
- $M_i$ is the size of cluster $i$.
- $A_i$ is the total of variable $A$, which takes values 0 or 1, in cluster $i$,

and the estimations of the errors we make when we estimate through these expressions are:

Total:
$$\hat{V}(\hat{X}) = \frac{N(N-n)}{n}\, \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \hat{\bar{X}} M_i\right)^2.$$

Average:
$$\hat{V}(\hat{\bar{X}}) = \frac{N(N-n)}{M^2\, n}\, \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \hat{\bar{X}} M_i\right)^2.$$

Proportion:
$$\hat{V}(\hat{P}) = \frac{N(N-n)}{M^2\, n}\, \frac{1}{n-1} \sum_{i=1}^{n} \left(A_i - \hat{P} M_i\right)^2.$$
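The cluster estimators can be sketched in Python in the same spirit. The function name and the three sampled clusters below are hypothetical; the code only mirrors the expressions for the total and the average (the proportion is the same computation with 0/1 totals $A_i$ in place of $X_i$).

```python
def cluster_estimates(cluster_totals, cluster_sizes, n_clusters_population,
                      population_size):
    """One-stage cluster sampling estimates.

    cluster_totals[i] is the total X_i of the variable in sampled cluster i,
    cluster_sizes[i] is its size M_i. Returns
    (average, total, variance_of_average, variance_of_total).
    """
    n = len(cluster_totals)
    N = n_clusters_population
    M = population_size
    average = sum(cluster_totals) / sum(cluster_sizes)
    total = M * average
    # Spread of the cluster totals around what the estimated average predicts.
    spread = sum(
        (x_i - average * m_i) ** 2
        for x_i, m_i in zip(cluster_totals, cluster_sizes)
    ) / (n - 1)
    var_total = N * (N - n) / n * spread
    var_average = var_total / M ** 2
    return average, total, var_average, var_total


# Hypothetical data: 3 clusters sampled out of 6, population of 120 elements.
print(cluster_estimates([100, 220, 280], [10, 20, 30], 6, 120))
```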

1.6

Systematic sampling

We can think about a different way of sampling. Let us imagine that we are in your high school and we have decided to choose 28 people. In this case, the elevation factor would be 560/28 = 20. We number the students from 1 to 560. We then choose a number $x$ randomly from 1 to 20, and this would be the first student selected. Then we select the numbers $x + 20$, $x + 2 \cdot 20$ and so on. It is not a random sampling, because not all the samples are equally probable. Let us define this sampling technique.

Let us suppose that we have a population of $N$ elements, ordered and numbered from 1 to $N$, and we want to get a sample of $n$ elements. This population can be divided into $n$ subsets, each of them with $v = N/n$ elements, i.e., each subset has as many elements as the elevation factor indicates.

We randomly choose a numbered element from 1 to $v$, which we call $x_0$, and then we take the following elements: $x_0 + v,\ x_0 + 2v,\ x_0 + 3v,\ x_0 + 4v, \dots$

In case $v$ is not a natural number, we round it to the closest lower one, so some samples may have size $n - 1$. This fact brings a small perturbation into the theory of systematic sampling, which we do not have to take into account if $n > 50$.

This type of sampling requires that we have previously checked that the ordered elements present no periodicity in the variables we want to study, because if there is periodicity and its period is close to the value $v$, the results that we obtain would have a big bias and would not be valid.

Systematic sampling is equivalent to random sampling if the elements are numbered in a random way.

Advantages of this method are:

1. It extends the sample to the whole population.
2. It is very easy to apply.

Disadvantages of the method are:

1. Increase of the variance if there is periodicity in the numbering of the elements, with a bias appearing due to selection.
2. Problems when we want to estimate the variance.

We can consider it an instance of cluster sampling, each cluster having the following elements, which we identify by their number in the list:

First cluster: $1,\ 1 + v,\ 1 + 2v,\ 1 + 3v,\ 1 + 4v, \dots$
Second cluster: $2,\ 2 + v,\ 2 + 2v,\ 2 + 3v,\ 2 + 4v, \dots$
...
$v$-th cluster: $v,\ 2v,\ 3v,\ 4v, \dots, nv$.

Selecting a systematic sample is equivalent to randomly selecting only one cluster. To do so, it is necessary that each of the clusters has a structure similar to that of the population.

We can also consider systematic sampling as a particular case of stratified sampling with $n$ strata, each of them with $v$ elements, so that we choose only one element of each stratum.

In stratified sampling the selected element is random, while in this technique we randomly choose the first element and the rest are determined by the factor $v$.

The estimators for this type of sampling are:

Total:
$$\hat{X} = v \sum_{i=1}^{n} X_i.$$

Average:
$$\hat{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Proportion:
$$\hat{P} = \frac{1}{n} \sum_{i=1}^{n} P_i,$$

where $P_i$ is a variable taking values 0 or 1.
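The selection rule of systematic sampling is short enough to write down directly. The sketch below assumes that the elements are numbered from 1 to N and that the step v is N/n rounded down; the function name and the seed are ours.

```python
import random


def systematic_sample(N, n, seed=None):
    """Positions selected by systematic sampling in a list numbered 1..N."""
    v = N // n                # step: elevation factor rounded down
    rng = random.Random(seed)
    x0 = rng.randint(1, v)    # random start between 1 and v, both included
    return [x0 + j * v for j in range(n)]


# High school example from the text: 28 students out of 560, so v = 20.
positions = systematic_sample(560, 28, seed=3)
print(positions[:3], "...", positions[-1])
```

Once the start has been drawn, the whole sample is fixed, which is why only v different samples are possible.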

1.7

Other sampling techniques

Two-stage sampling is a particular case of cluster sampling in which, in the second stage, we do not select all the elements of the cluster, but only some elements chosen in a random way. Clusters in the first stage are called primary units and the ones in the second stage are called secondary units.

Multistage sampling is a generalization of the previous technique, so that each cluster can be a group of clusters, and so on in each stage.

In general, complicated studies combine the concepts of stratification, clusters and random sampling. For instance, the population of a country can be divided into clusters (provinces, cities, quarters) that can be heterogeneous inside (for instance, with respect to consumption) but homogeneous among themselves. Afterwards it is necessary to divide these units into homogeneous strata (primary units, for instance, quarters). Each of these units is divided into new units (buildings) called secondary units, which are divided into flats (houses). We would choose our sample in the following way:

1. We select a stratified sample. We would take at least one stratum (one quarter).
2. We randomly choose some buildings in each of the selected quarters.
3. We randomly take one or several houses in each of the selected buildings.
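The last two steps above amount to a two-stage draw, which can be sketched as follows. Everything in this example is hypothetical: the building and flat labels, the function name and the seed are ours, and the quarters are assumed to have been chosen already.

```python
import random


def two_stage_sample(clusters, n_primary, n_secondary, seed=None):
    """Draw primary units at random, then secondary units inside each one.

    clusters maps each primary unit (e.g. a building) to the list of its
    secondary units (e.g. flats).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_primary)
    return {
        unit: rng.sample(clusters[unit], min(n_secondary, len(clusters[unit])))
        for unit in chosen
    }


# Hypothetical quarter with four buildings and their flats.
buildings = {
    "building A": ["A1", "A2", "A3", "A4"],
    "building B": ["B1", "B2"],
    "building C": ["C1", "C2", "C3"],
    "building D": ["D1", "D2", "D3", "D4", "D5"],
}
print(two_stage_sample(buildings, n_primary=2, n_secondary=2, seed=7))
```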


Chapter 2

An example of the application of sampling techniques
We have decided to make a study in a high school. We want to have data about the number of left-handed students, the number of students who have an internet connection at home, the height of the students and the pocket money they receive weekly.

The usefulness of knowing the number of left-handed students of a high school is easy to understand, because the high school should have appropriate equipment for them, for instance adapted chairs.

Internet connection at home is an important piece of information. It can be used not only to check whether it is possible to offer some material to the students through the internet, but also to know if they have access to other didactic information available on the web.

The study of height is classical. It is in any case interesting to know whether height is changing over the years and the population is getting taller.

Pocket money is a socially relevant datum. It is interesting to know how much money the students handle, and also how they spend it, in order to understand what they devote their time to.

Once we have fixed what we want to find out, we decide to sample in order to draw conclusions about all the students of the high school without asking each of them. The information available to us is the distribution of students by years and classes:

            A    B    C    D    E   Total
1st year   33   20    -    -    -     53
2nd year   20   15   30    -    -     65
3rd year   20   15   26   14    -     75
4th year   27   27   25    -    -     79
5th year   33   28   30   31   23    145
6th year   30   34   32   31    -    127

So we are working with a population of 544 students of a high school. We start by supposing that we are going to use a sample size of around 60 students, which is the maximum allowed and which we think may be enough for the study we are going to make.

We can then get the first piece of information. Our sampling fraction would be:
$$\frac{n}{N} = \frac{60}{544} = 0.1103,$$
that is, we are going to sample approximately 11% of the population. We can also calculate the elevation factor, which would be:
$$\frac{N}{n} = \frac{544}{60} = 9.1,$$
or equivalently, each student interviewed represents 9 colleagues.

Now we have to decide which method we want to use to sample the different characteristics we are going to study. Let us denote them in the following way:

- $X$ will represent the height.
- $Y$ will represent the pocket money.
- $Z$ will represent the variable being left-handed, which will take value 1 if a student is left-handed and 0 if he/she is not left-handed.
- $I$ will represent the variable having an internet connection at home, which will take value 1 in the affirmative case and 0 in the negative case.

We will split the 4 variables into 2 cases. First, we ask ourselves a question: we have our population divided into groups and levels; can we consider that this division has an influence on any of these variables? That is, can we consider, for instance, that the average height can change from level to level? The answer to this question is that it is logical to think that it will change. A priori, we can suppose that age has an important influence on height. And for the pocket money? Age is also important here, because we all tend to get more money from our parents as we get older. Does the same happen with being left-handed? The answer is no, because if you are left-handed, you are so from the day you were born, so age has no influence on this. And the same applies to having an internet connection at home. So we choose different sampling techniques for these two cases.
Case I: variables pocket money and height

We have already mentioned that we have the population divided into levels and groups. For us the division into levels is a division into strata, because the levels are internally homogeneous with respect to age (and we can also think that the same happens for the pocket money and the height). As we have said before, age has a big influence on these variables, and it makes sense that we are interested in having all these strata represented in our sample. So for these cases we choose stratified random sampling.

The next thing to be done is to decide the sample size inside each stratum. We have 6 strata with the following sizes:


Stratum                 Size
1st level (stratum 1)   N_1 = 53
2nd level (stratum 2)   N_2 = 65
3rd level (stratum 3)   N_3 = 75
4th level (stratum 4)   N_4 = 79
5th level (stratum 5)   N_5 = 145
6th level (stratum 6)   N_6 = 127

The usual thing in this situation is to use sample sizes in the strata proportional to their sizes, so that the sizes of the samples keep the same proportion as the sizes of the strata. We then calculate the size of the sample in each stratum through the following expression:
$$n_i = n\, \frac{N_i}{N},$$
and we get the following sample sizes:

$n_1 = 60 \cdot \frac{53}{544} = 5.85$, so we take $n_1 = 6$,
$n_2 = 60 \cdot \frac{65}{544} = 7.17$, so we take $n_2 = 8$,
$n_3 = 60 \cdot \frac{75}{544} = 8.27$, so we take $n_3 = 8$,
$n_4 = 60 \cdot \frac{79}{544} = 8.71$, so we take $n_4 = 8$,
$n_5 = 60 \cdot \frac{145}{544} = 15.99$, so we take $n_5 = 16$,
$n_6 = 60 \cdot \frac{127}{544} = 14.01$, so we take $n_6 = 14$,

where the rounding has been made so as to keep the sample size of 60 that we had supposed. So we have the sample sizes that we needed and we can make random sampling inside each stratum, to select the number of students that we have already decided. Our data are the following. For the height we got:

Stratum 1   153 165 161 153 150 151
Stratum 2   164 157 161 168 162 165 171 169
Stratum 3   165 168 165 175 175 165 163 165
Stratum 4   175 164 171 177 163 170 165 160
Stratum 5   185 175 173 161 158 175 164 170 158 161 158 171 175 170 187 168
Stratum 6   177 190 178 194 183 165 170 176 173 168 183 173 183 174

and for the pocket money:

Stratum 1   10 0 3.5 3 0 0
Stratum 2   0 5 0 0 15 0 3 2
Stratum 3   5 8 8 10 20 5 10 0
Stratum 4   12 6 5 0 12 12 6 0
Stratum 5   5 10 12 15 10 12 30 12 30 10 6 15 5 10 21 40
Stratum 6   12 10 9 0 6 8 9.4 15 0 20 10 15 10 0
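The proportional sample sizes used for the six strata can be recomputed with a short Python sketch. The function name is ours; the integer sizes in `chosen` are the ones adopted in the text, and the code only checks that they add up to 60 and stay within one unit of the proportional values.

```python
def proportional_sizes(stratum_sizes, n):
    """Unrounded proportional sample sizes n * N_i / N."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]


stratum_sizes = [53, 65, 75, 79, 145, 127]  # students per level
raw = proportional_sizes(stratum_sizes, 60)
chosen = [6, 8, 8, 8, 16, 14]               # integer sizes used in the text

for level, (r, c) in enumerate(zip(raw, chosen), start=1):
    print(f"level {level}: proportional size {r:.2f}, chosen size {c}")
print(sum(chosen))  # 60
```

Rounding each value separately does not always return exactly the intended total, so the final integer sizes involve a small judgment call.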

W e no w pr o ce ed to the esti m ations T h e rst thing to do is to c alculate the a v erage in th e strata, whic h giv es s inf ormationb out u a the b eha vior the v ariab les the s trata.Later on, w ewill of in calculate the a v e rage of t he heigh t and the p o c k et money of the studen ts of the high sc ho ol and w e

14

will gi v e it together with an estimation of the error w e get when w e mak e s uc h an e stimation . We mak e the pro c ess indep end e n tly for eac h of the v ariables: F or the h eigh t w e ha v e: Stratus Av erage 1 x1 = 155 .5 2 x2 = 164 .625 3 x3 = 167 .625 4 x4 = 168 .125 5 x5 = 1 69 .3125 6 x6 = 177 .642857 Std deviation 2 Sx1 = 36 .7 2 Sx2 = 21 .4107 2 Sx3 = 22 .5535 2 Sx4 = 36 .6964 2 Sx5 = 81 .6958 2 Sx6 = 67 .478

We can directly see something curious: the average increases as the level increases. This leads us to think that the choice of stratified sampling has been right in this case. We now calculate the same for the pocket money:

Stratum   Average                   Variance
1         $\bar{y}_1 = 2.75$        $S_{y1}^2 = 4.026$
2         $\bar{y}_2 = 3.125$       $S_{y2}^2 = 26.4107$
3         $\bar{y}_3 = 8.25$        $S_{y3}^2 = 33.3571$
4         $\bar{y}_4 = 6.625$       $S_{y4}^2 = 25.4107$
5         $\bar{y}_5 = 15.1875$     $S_{y5}^2 = 101.2291$
6         $\bar{y}_6 = 8.8857$      $S_{y6}^2 = 35.229$

Now we calculate the estimated average from the complete sample and the estimate of the error, in terms of the estimate of the variance, for the two variables we are studying. For the height:

$$\hat{\bar{X}} = \sum_{h=1}^{6} w_h \bar{x}_h = \sum_{h=1}^{6} \frac{N_h}{N} \bar{x}_h = \frac{53}{544} \cdot 155.5 + \frac{65}{544} \cdot 164.625 + \frac{75}{544} \cdot 167.625 + \frac{79}{544} \cdot 168.125 + \frac{145}{544} \cdot 169.3125 + \frac{127}{544} \cdot 177.642857 = 168.9463.$$

The expression for the variance is
$$\hat{V}(\hat{\bar{X}}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h},$$

and in our case we have:

Stratum   $w_h$                $w_h^2$    $f_h$               $1 - f_h$
1         53/544 = 0.095       0.009      6/53 = 0.1132       0.8868
2         65/544 = 0.1194      0.014      8/65 = 0.123        0.8769
3         75/544 = 0.1344      0.018      8/75 = 0.1066       0.8934
4         79/544 = 0.1415      0.02       8/79 = 0.1012       0.8988
5         145/544 = 0.2598     0.0675     16/145 = 0.1103     0.8897
6         127/544 = 0.2276     0.0518     14/127 = 0.1102     0.8898
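The stratified estimate of the mean and the variance expression above can be put together in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are the stratum sizes, sample sizes, averages and variances tabulated for the height, and the weights $N_h/N$ are carried at full precision, so the variance it returns need not agree to three decimals with a hand calculation made from rounded table entries.

```python
def stratified_mean(N_h, means):
    """Weighted mean of the stratum averages with weights w_h = N_h / N."""
    N = sum(N_h)
    return sum(Nh / N * m for Nh, m in zip(N_h, means))


def stratified_variance(N_h, n_h, variances):
    """Sum over strata of w_h^2 * (1 - f_h) * S_h^2 / n_h, with f_h = n_h / N_h."""
    N = sum(N_h)
    total = 0.0
    for Nh, nh, s2 in zip(N_h, n_h, variances):
        w = Nh / N          # stratum weight
        f = nh / Nh         # sampling fraction in the stratum
        total += w ** 2 * (1 - f) * s2 / nh
    return total


N_h = [53, 65, 75, 79, 145, 127]          # stratum sizes
n_h = [6, 8, 8, 8, 16, 14]                # sample sizes per stratum
height_means = [155.5, 164.625, 167.625, 168.125, 169.3125, 177.642857]
height_vars = [36.7, 21.4107, 22.5535, 36.6964, 81.6958, 67.478]

print(round(stratified_mean(N_h, height_means), 3))
print(round(stratified_variance(N_h, n_h, height_vars), 3))
```

Passing the averages and variances of the pocket money instead gives the corresponding estimates for the second variable, since the weights and sampling fractions do not change.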


Now we substitute these numbers into the previous expression and we get:

$$\hat{V}(\hat{\bar{X}}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h} = 0.009 \cdot 0.8868 \cdot \frac{36.7}{6} + 0.014 \cdot 0.8769 \cdot \frac{21.4107}{8} + 0.018 \cdot 0.8934 \cdot \frac{22.5535}{8} + 0.02 \cdot 0.8988 \cdot \frac{36.6964}{8} + 0.0675 \cdot 0.8897 \cdot \frac{81.6958}{16} + 0.0518 \cdot 0.8898 \cdot \frac{64.478}{14} = 0.728.$$

So in the case of the height we already have our estimates. The estimated average height is 168.9463, and the estimated variance of this estimate, which we use as our measure of error, is 0.728.

Now we make the same calculation for the pocket money. We start by calculating the estimated average:

$$\hat{\bar{Y}} = \sum_{h=1}^{6} w_h \bar{y}_h = \sum_{h=1}^{6} \frac{N_h}{N} \bar{y}_h = \frac{53}{544} \cdot 2.75 + \frac{65}{544} \cdot 3.125 + \frac{75}{544} \cdot 8.25 + \frac{79}{544} \cdot 6.625 + \frac{145}{544} \cdot 15.1875 + \frac{127}{544} \cdot 8.8857 = 8.8633.$$

The estimate of the variance can be calculated directly because we have the same values for $w_h$ and $f_h$:
$$\hat{V}(\hat{\bar{Y}}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h} = 0.009 \cdot 0.8868 \cdot \frac{4.026}{6} + 0.014 \cdot 0.8769 \cdot \frac{26.4107}{8} + 0.018 \cdot 0.8934 \cdot \frac{33.3571}{8} + 0.02 \cdot 0.8988 \cdot \frac{25.4107}{8} + 0.0675 \cdot 0.8897 \cdot \frac{101.2291}{16} + 0.0518 \cdot 0.8898 \cdot \frac{35.229}{14} = 0.666.$$

Case II: Variables being left handed and having internet connection at home

Now we want to study the variables being left handed and having internet connection at home. It is easy to see that the division into strata is not useful in this case, so we should think about using some other sampling technique. We still want to get a sample of around 60 students. We could think that, with respect to these variables, the groups into which the population is divided behave like small populations, i.e., we can consider that the groups behave like the whole high school. Moreover, the possibility of sampling some groups is interesting for us, because selecting a random sample of students, finding them and interviewing them is not an easy task.

But now, what are the groups for us? We have already said that, inside them, they behave like small populations with respect to our variables, while the groups are similar to each other. This means that we have the population divided into clusters, so we will apply cluster sampling to this situation.

The next thing to decide is the number of groups to be sampled. We know that the groups do not have the same size, but 2 or 3 groups would assure us a sample of around 60 students. To avoid the possibility of drawing 2 small groups, and so getting a sample too small for our purposes, we decide to select 3 groups from the high school.
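Selecting the groups is a simple random draw without replacement among the clusters. The sketch below illustrates the step; the function name `select_clusters`, the labelling of the groups as 1 to 21 and the seed are assumptions made only for this example.

```python
import random


def select_clusters(cluster_labels, k, seed=None):
    """Draw k distinct clusters by simple random sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(cluster_labels), k)


# Hypothetical sampling frame: the groups of the school labelled 1..21.
groups = range(1, 22)
chosen = select_clusters(groups, 3, seed=2001)
print(sorted(chosen))
```

Every student in each chosen group is then interviewed, which is what makes the fieldwork simpler than a random sample of individual students.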

So the data we have got are the following. For the variable being left handed:

Cluster 1: 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0,
Cluster 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
Cluster 3: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0,

where 1 means being left handed and 0 means not being left handed. Now, for the variable having internet connection at home, we got:

Cluster 1: 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 0 1 0 0,
Cluster 2: 1 1 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 0,
Cluster 3: 1 1 0 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1,

where now 1 means having internet connection at home and 0 means not having it.

We start now by estimating the total number of left-handed students and the proportion of left-handed students, as well as the total number of students having internet connection at home and the proportion that this represents in the whole high school. We calculate the total and the proportion for each group and variable:

            Left handed             Internet
Cluster     Total   Proportion      Total   Proportion
1           3       0.15            10      0.5
2           0       0               17      0.7391
3           2       0.08            20      0.8

Now we can calculate the estimates for the proportion and the total of the variables Z (being left handed) and I (having internet connection at home). We start with variable Z:

$$\hat{Z} = M \, \frac{\sum_{i=1}^{n} Z_i}{\sum_{i=1}^{n} M_i} = 544 \cdot \frac{\sum_{i=1}^{3} Z_i}{\sum_{i=1}^{3} M_i} = 544 \cdot \frac{3 + 0 + 2}{20 + 23 + 25} = 544 \cdot \frac{5}{68} = 40,$$

$$\hat{P}_Z = \frac{\sum_{i=1}^{n} Z_i}{\sum_{i=1}^{n} M_i} = \frac{3 + 0 + 2}{20 + 23 + 25} = \frac{5}{68} = 0.0735,$$

and we do the same for variable I:

$$\hat{I} = M \, \frac{\sum_{i=1}^{n} I_i}{\sum_{i=1}^{n} M_i} = 544 \cdot \frac{\sum_{i=1}^{3} I_i}{\sum_{i=1}^{3} M_i} = 544 \cdot \frac{10 + 17 + 20}{20 + 23 + 25} = 544 \cdot \frac{47}{68} = 376,$$

$$\hat{P}_I = \frac{\sum_{i=1}^{n} I_i}{\sum_{i=1}^{n} M_i} = \frac{10 + 17 + 20}{20 + 23 + 25} = \frac{47}{68} = 0.6911.$$
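The estimates of the proportion and of the total shown above only need the cluster totals and the cluster sizes. A minimal sketch, in which the function name `cluster_ratio_estimates` and the variable names are assumptions for illustration:

```python
def cluster_ratio_estimates(cluster_totals, cluster_sizes, population_size):
    """Return (estimated proportion, estimated population total).

    The proportion is the sum of the cluster totals divided by the sum of
    the cluster sizes; the total scales it by the population size M.
    """
    p_hat = sum(cluster_totals) / sum(cluster_sizes)
    return p_hat, population_size * p_hat


cluster_sizes = [20, 23, 25]        # M_i for the three sampled groups
left_handed = [3, 0, 2]             # Z_i
internet = [10, 17, 20]             # I_i
M = 544                             # students in the high school

for name, totals in (("left handed", left_handed), ("internet", internet)):
    p_hat, total_hat = cluster_ratio_estimates(totals, cluster_sizes, M)
    print(f"{name}: proportion = {p_hat:.4f}, total = {total_hat:.0f}")
```

Dividing by the sum of the sampled cluster sizes, rather than averaging the three cluster proportions, gives larger groups more weight, which matters here because the groups do not have the same size.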

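We continue now by estimating the error we have committed for the variable being left handed:

$$\hat{V}(\hat{Z}) = \frac{N (N - n)}{n} \, \frac{1}{n - 1} \sum_{i=1}^{n} \left( Z_i - \hat{P}_Z M_i \right)^2 = \frac{21 \, (21 - 3)}{3} \cdot \frac{1}{2} \left[ (3 - 0.0735 \cdot 20)^2 + (0 - 0.0735 \cdot 23)^2 + (2 - 0.0735 \cdot 25)^2 \right]$$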
