
ICRAF Research Support Unit

Technical Note

SAMPLING SIZE DETERMINATION IN FARMER SURVEYS

Ric Coe

Version January 1996

ICRAF World Agroforestry Centre PO Box 30677, Nairobi, Kenya Telephone : +254 2 524 000 or +1 650 833 6645 Fax : +254 2 524 001 or +1 650 833 6646 http://www.worldagroforestrycentre.org

Correct citation: Ric Coe, 1996. Sampling Size Determination in Farmer Surveys. ICRAF Research Support Unit Technical Note No 4. ICRAF World Agroforestry Centre Nairobi, Kenya. 11 pp.

Copyright 1996 ICRAF World Agroforestry Centre

This publication is the intellectual property of the International Centre for Research in Agroforestry. While use of the information it contains and its reproduction is encouraged, the content should not be republished in any way for commercial purposes without the permission of the publishers. The publisher and the author make no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made. All terms mentioned in this publication that are known to be trademarks or service marks have been appropriately capitalized. The publisher and the author cannot attest to the accuracy of this information. Use of a term in this publication should not be regarded as affecting the validity of any trademark or service mark.


Contents

1 Introduction
  1.1 Types of objective
    1.1.1 Informal/Exploratory
    1.1.2 Estimation of proportions
    1.1.3 Estimation of means and totals
    1.1.4 Comparison of groups (sub-populations)
    1.1.5 Estimating relationships
2 Sampling and Estimates
3 Factors to consider when choosing sample size
4 Calculations for a simple random sample
5 Factors increasing required sample size
6 Factors decreasing the required sample size
7 Steps in fixing the sample size


1 Introduction

Choosing the sample size is a problem faced by anyone doing a survey of any type. "What sample size do I need?" is one of the questions most frequently asked of statisticians. The response always starts "It depends on...". In this note I have summarised what it depends on, and the steps needed to reach a decision.

The sample size must depend on what you want to know about (hence the section on objectives) and how well you want to know about it (the section on sampling variation). Factors such as how the sample is selected then further modify the required sample size.

None of this material is new. It can be found in many text books. The books range from the practical to the mathematical. Good sources covering many practical points of farmer surveys are:

Casley D.J. and D.A. Lury (1987). Data Collection in Developing Countries, 2nd Ed. Oxford: Oxford University Press. 225pp.
Poate C.D. and P.F. Daplyn (1993). Data for Agrarian Development. Cambridge: Cambridge University Press. 387pp.

Details of the mathematics can be found in books such as:

Cochran W.G. (1977). Sampling Techniques, 3rd Ed. New York: Wiley. 428pp.

1.1 Types of objective


1.1.1 Informal/Exploratory

At early stages in many research programs surveys have general exploratory objectives such as "To understand constraints in the farming system..." or "To examine farmers' soil fertility maintenance strategies". Informal survey techniques are used and informal approaches to sample size are sufficient. The simple rule is: stop collecting data when you stop learning anything new. Such surveys are essential for developing understanding of issues and developing hypotheses, but will not be considered further here.

1.1.2 Estimation of proportions

Objectives of focused, formal surveys can often be reduced to estimation of proportions of a specified population. Examples are:
The proportion of farmers in Embu that plant beans in the long rains.
The proportion of individuals who spend at least 20% of work time working off farm.
The proportion of farmland occupied by permanent tree crops.


1.1.3 Estimation of means and totals

Examples are:
The average amount of tree fodder consumed per cow during the 4 month dry season.
The labour required to clear a 2 year old improved fallow plot.
The number of grevillea trees per 100m of farm boundary.

Note: Totals (for a population) are estimated as mean × population size. If the population size is known then the estimation of a total is the same problem as estimation of a mean. Sample size for estimation of other quantities such as the median or 25% point follows the same principles as for means.

1.1.4 Comparison of groups (sub-populations)

The objectives may require comparison of proportions or means (totals, medians, ...) between different groups. For example:
Do farmers with large farms have more livestock?
Is grevillea more common in agro-ecological zone UM2 than UM3?
Do female headed households use less hired labour?

1.1.5 Estimating relationships

Objectives may be reduced to confirming the existence of a hypothesized relationship or estimating parameters in a known relationship. For example: "Confirm that the number of fruit trees planted is inversely proportional to distance from the road".

2 Sampling and Estimates

For each of the above objectives there is a quantity M (or quantities) to be estimated. There is a true value of the quantity, which is the answer we would get if we could measure the whole population without error. This is (almost) always impossible, but the idea is useful.

A sample is taken and an estimate $\hat{M}$ of M is found (e.g. if M is the mean then $\hat{M}$ could be the sample mean). If $\hat{M}$ is close to M we have a good estimate; if it is very different from M it is a poor estimate.

If we took another sample we would get a different value of $\hat{M}$. The set of possible values of $\hat{M}$ is the sampling distribution of the estimate. In practice we take only one sample, but the distribution of possible values is again a useful idea. Your one sample could give any one of the possible estimates, so it is useful to know whether they are all close to the true value, or if there is a fair chance of getting an estimate which is very far from the true value.
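The idea of a sampling distribution can be illustrated by simulation. The sketch below is not part of the original note: the population of 1000 farms is invented, and the function name `sampling_distribution` is only illustrative. It draws many simple random samples and shows how the spread of the resulting estimates changes with the sample size.

```python
import random
import statistics

def sampling_distribution(population, n, draws=2000, seed=1):
    """Return the sample means of `draws` simple random samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population: trees per 100 m of boundary on 1000 farms.
_rng = random.Random(0)
population = [_rng.randint(0, 25) for _ in range(1000)]
true_mean = statistics.fmean(population)

for n in (10, 40, 160):
    estimates = sampling_distribution(population, n)
    # The standard deviation of the estimates measures the spread
    # of the sampling distribution for this sample size.
    spread = statistics.stdev(estimates)
    print(f"n={n:4d}  centre={statistics.fmean(estimates):6.2f}  "
          f"spread={spread:5.2f}  true mean={true_mean:.2f}")
```

In a real survey only one of these samples is ever taken; the simulation simply makes visible the range of estimates that one sample could have produced.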


3 Factors to consider when choosing sample size


Accuracy.

How close do you want your estimate to be to the true answer? It will never be equal to the true answer, but how much inaccuracy can you tolerate? Accuracy (or inaccuracy) depends on:

Sampling error, the deviation of $\hat{M}$ from M due to the fact that we only measure some (a sample) of the individuals in the population. Sampling error can be mathematically examined and the effect of sample size determined.

Non-sampling error, the deviation of $\hat{M}$ from M due to anything other than sampling error. For example, poorly phrased questions, inaccurate recording, refusal to respond, faking data (it happens!). Non-sampling error is very difficult to quantify and is usually ignored in sample size calculations, yet can be larger than sampling error. Sample size may or may not affect non-sampling errors. Many non-sampling errors increase with increasing sample size because of the less careful supervision and quality control possible, the larger number of enumerators, longer time for collecting and processing the data, and so on.

Bias. An estimate is biased if it is consistently too large or too small. The definition of bias is the difference between M and the mean of the sampling distribution of $\hat{M}$. Bias can arise for many reasons. For example: (i) sample selection favouring large farms, so that farm size and anything correlated with it will be overestimated; (ii) respondents rounding up values to please interviewers; (iii) interviewers rounding up answers because they are sure the farmer underestimated. Many sources of bias are unaffected by sample size.
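The point that bias is unaffected by sample size can be shown with a small simulation. This is a sketch under invented assumptions, not a method from the note: farm sizes are drawn from an arbitrary skewed distribution, and the "biased" selection picks farms with probability proportional to their size (with replacement, for simplicity) to mimic a procedure that favours large farms.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 5000 farm sizes (ha), mean about 2 ha.
farm_sizes = [rng.expovariate(1 / 2.0) for _ in range(5000)]
true_mean = statistics.fmean(farm_sizes)

def average_estimate(n, biased, draws=500):
    """Mean of the sampling distribution of the sample mean, by simulation."""
    estimates = []
    for _ in range(draws):
        if biased:
            # Selection favouring large farms: probability proportional to size.
            sample = rng.choices(farm_sizes, weights=farm_sizes, k=n)
        else:
            # Simple random sample without replacement.
            sample = rng.sample(farm_sizes, n)
        estimates.append(statistics.fmean(sample))
    return statistics.fmean(estimates)

for n in (20, 200):
    print(f"n={n:3d}  true={true_mean:.2f}  "
          f"random={average_estimate(n, False):.2f}  "
          f"favouring large farms={average_estimate(n, True):.2f}")
```

Under these assumptions the overestimate from the biased selection stays about the same when n is increased tenfold, while the simple random sample remains centred on the true mean.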

Precision. The precision of an estimate is the spread of its sampling distribution, usually measured by its standard deviation. The standard deviation of the sampling distribution is called the standard error of the estimate. For many sampling schemes the effect of sample size on standard error can be calculated.

Cost. Increasing sample size will increase costs in the field (transport, enumerators) and afterwards (coding, data entry).

Choosing sample size involves balancing all these factors.


4 Calculations for a simple random sample

A simple random sample is a sample of n selected from the population in such a way that every individual in the population has equal chance of being included in the sample.

Estimating a proportion

If you are trying to estimate a proportion P then the estimate will be $\hat{P}$, the sample proportion. The standard error of $\hat{P}$ is

$$\mathrm{se}(\hat{P}) = \sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$$

A confidence interval is

$$\hat{P} \pm t \cdot \mathrm{se}(\hat{P})$$

The value of t depends mainly on the level of confidence required. A common level is 95%, which gives a value of t of (about) 2, leading to simple calculations. A 95% confidence interval is a range of values which, roughly, we are 95% certain contains the true value P.

Note the width of the confidence interval is $C = 2t\,\mathrm{se}(\hat{P}) = 4\,\mathrm{se}(\hat{P})$ when t = 2.
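The standard error and confidence interval for a proportion can be computed in a few lines. The sketch below is illustrative only: the function name `proportion_ci` is my own, the counts (40 out of 50) are chosen as an illustration, and t = 2 is the approximate 95% value used in this note.

```python
import math

def proportion_ci(successes, n, t=2.0):
    """Sample proportion, its standard error and a confidence interval.

    Assumes a simple random sample; t = 2 gives roughly 95% confidence.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - t * se, p_hat + t * se)

p_hat, se, (low, high) = proportion_ci(40, 50)
width = high - low  # equals 2 * t * se, i.e. 4 * se when t = 2
print(f"estimate={p_hat:.2f}  se={se:.3f}  "
      f"95% c.i.=({low:.2f}, {high:.2f})  width={width:.2f}")
```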

Example: A simple random sample of 50 farmers is interviewed. 40 of them own cattle. If P = proportion owning cattle then

$\hat{P} = 40/50 = 0.8$
$\mathrm{se}(\hat{P}) = \sqrt{0.8 \times 0.2 / 50} = 0.057$
95% c.i. for P is $0.8 \pm 2 \times 0.057 = (0.69, 0.91)$.

P lies somewhere between 69% and 91%.

Estimating a mean

The estimate of a population mean M is the sample mean

$$\hat{M} = \frac{\sum_i m_i}{n},$$

where $m_i$ are the sample values. Its standard error is

$$\mathrm{se}(\hat{M}) = \sqrt{\frac{s^2}{n}},$$

where $s^2$ = sample variance = $\dfrac{\sum_i (m_i - \hat{M})^2}{n-1}$.


A 95% confidence interval is $\hat{M} \pm 2\,\mathrm{se}(\hat{M})$.

Example: Twenty five 100m lengths of farm boundary were selected from all boundaries. The mean number of grevillea trees was $\hat{M} = 11.2$. The variance was $s^2 = 37.2$. Then

$$\mathrm{se}(\hat{M}) = \sqrt{\frac{37.2}{25}} = 1.22$$

A 95% c.i. is $11.2 \pm 2 \times 1.22 = (8.8, 13.6)$. The mean number of trees per 100 m of boundary is between 8.8 and 13.6.

Estimating the difference between two proportions

$P_1$ is the proportion in one population and $P_2$ is the proportion in another. If interested in $d = P_1 - P_2$, then samples would be taken in each population and the estimated difference is $\hat{d} = \hat{P}_1 - \hat{P}_2$. The standard error is

$$\mathrm{se}(\hat{d}) = \sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}},$$

where $n_i$ is the size of the sample taken in population i.

Example: A simple random sample of 25 large (>2.5ha) farms was selected. The proportion owning cattle was 84%. Another sample of 25 small (<1.0ha) farms was selected. The proportion owning cattle was 53%.

The difference is $\hat{d} = 0.84 - 0.53 = 0.31$
$\mathrm{se}(\hat{d}) = \sqrt{(0.84 \times 0.16 + 0.53 \times 0.47)/25} = 0.12$
A 95% c.i. is $0.31 \pm 2 \times 0.12 = (0.07, 0.55)$.

The difference in rate of cattle ownership is somewhere between 7% and 55%.

Note that when comparing two populations, results can be presented in two ways: the significance of the difference can be calculated, or the size of the difference can be given together with its uncertainty (standard error or confidence interval). The latter is much more informative. Lack of a significant difference can be due either to the real difference being very small, or to the difference being quite large but poorly estimated. Quoting a confidence interval for the difference distinguishes these two cases.
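The calculation for a difference between two proportions can be sketched as follows. The function name `diff_proportions_ci` is illustrative, and t = 2 is again the approximate 95% value. Because the code carries the unrounded standard error, its interval for the cattle example comes out slightly wider than the hand calculation, which rounds the standard error to 0.12 first.

```python
import math

def diff_proportions_ci(p1, n1, p2, n2, t=2.0):
    """Difference of two sample proportions with standard error and c.i.

    Assumes independent simple random samples from the two populations.
    """
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, se, (d - t * se, d + t * se)

d, se, (low, high) = diff_proportions_ci(0.84, 25, 0.53, 25)
print(f"difference={d:.2f}  se={se:.3f}  95% c.i.=({low:.3f}, {high:.3f})")

# An interval that excludes zero corresponds to a significant difference,
# but the interval also shows how poorly or well the difference is estimated.
print("interval excludes zero:", low > 0 or high < 0)
```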


Estimating the difference between two means

$M_1$ and $M_2$ are the means of two populations and d is the difference. A simple random sample is taken in each population and the sample means found. Then

$$\hat{d} = \hat{M}_1 - \hat{M}_2 \quad\text{and}\quad \mathrm{se}(\hat{d}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $s_i^2$ is the variance and $n_i$ the sample size in population i.

Example: The number of Grevillea trees per 100m of farm boundary was measured separately for farms settled over 10 years ago and for those settled less than 10 years ago with the following results:

|       | < 10 years | > 10 years |
|-------|------------|------------|
| n     | 15         | 10         |
| mean  | 14.2       | 8.4        |
| $s^2$ | 23.2       | 12.8       |

The difference in mean number of Grevillea is $\hat{d} = 14.2 - 8.4 = 5.8$.
$\mathrm{se}(\hat{d}) = \sqrt{23.2/15 + 12.8/10} = 1.7$
A 95% c.i. is $5.8 \pm 2 \times 1.7 = (2.4, 9.2)$.

The difference in mean number of grevillea per 100m is somewhere between 2.4 and 9.2.
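The two calculations for means can be sketched from summary statistics alone. The function names `mean_ci` and `diff_means_ci` are illustrative; the inputs are the grevillea figures quoted in this section, and t = 2 is the approximate 95% value.

```python
import math

def mean_ci(mean, variance, n, t=2.0):
    """Standard error and confidence interval for a sample mean."""
    se = math.sqrt(variance / n)
    return se, (mean - t * se, mean + t * se)

def diff_means_ci(mean1, var1, n1, mean2, var2, n2, t=2.0):
    """Difference of two sample means with standard error and c.i.

    Assumes independent simple random samples from the two populations.
    """
    d = mean1 - mean2
    se = math.sqrt(var1 / n1 + var2 / n2)
    return d, se, (d - t * se, d + t * se)

# Single mean: 25 boundary lengths, mean 11.2, variance 37.2.
se_m, (lo_m, hi_m) = mean_ci(11.2, 37.2, 25)
print(f"se={se_m:.2f}  95% c.i.=({lo_m:.1f}, {hi_m:.1f})")

# Difference: farms settled < 10 years ago versus > 10 years ago.
d, se_d, (lo_d, hi_d) = diff_means_ci(14.2, 23.2, 15, 8.4, 12.8, 10)
print(f"difference={d:.1f}  se={se_d:.1f}  95% c.i.=({lo_d:.1f}, {hi_d:.1f})")
```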

The calculations in each of the above sections can be inverted to give n, the sample size, if the other quantities are known.
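As a worked step for the proportion case, taking t = 2 and replacing the estimate by an approximate planning value of P, the width formula can be inverted as follows:

$$
C = 4\,\mathrm{se}(\hat{P}) = 4\sqrt{\frac{P(1-P)}{n}}
\;\Longrightarrow\;
C^2 = \frac{16\,P(1-P)}{n}
\;\Longrightarrow\;
n = \frac{16\,P(1-P)}{C^2}
$$

For illustration only (these planning values are invented): if P is thought to be about 0.8 and an interval of total width C = 0.1 is wanted, then n = 16 × 0.8 × 0.2 / 0.1² = 256. The same algebra applied to the mean gives n = 16s²/C².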


| Estimating | Need to know | Sample size n |
|---|---|---|
| Proportion P | 1. Approximate value of P; 2. Width of confidence interval, C | $\dfrac{16P(1-P)}{C^2}$ |
| | or: 1. Approximate value of P; 2. $\mathrm{se}(\hat{P})$ | $\dfrac{P(1-P)}{\mathrm{se}(\hat{P})^2}$ |
| Mean M | 1. Variance in population, $s^2$; 2. Width of confidence interval, C | $\dfrac{16s^2}{C^2}$ |
| | or: 1. Variance in population, $s^2$; 2. $\mathrm{se}(\hat{M})$ | $\dfrac{s^2}{\mathrm{se}(\hat{M})^2}$ |
| Difference in proportions $P_1 - P_2$ | 1. Approximate $P_1$ and $P_2$; 2. Width of confidence interval of difference, C | $\dfrac{32\,[P_1(1-P_1) + P_2(1-P_2)]}{C^2}$ * |
| Difference in means $M_1 - M_2$ | 1. Population variances $s_i^2$; 2. Width of confidence interval of difference, C | $\dfrac{32\,(s_1^2 + s_2^2)}{C^2}$ * |

(* Total, assuming equal sample size in each population. All widths C are for 95% confidence intervals, t = 2.)

Sampling fraction

The sampling fraction, f, is the proportion of the population included in the sample. The sampling fraction does not enter the above calculations. Sample size should not be selected by choosing the sampling fraction. The only exception is when the population is very small and the sampling fraction becomes large (>10%). Then standard errors are reduced by a factor $\sqrt{1-f}$.

Note that when a census is done (i.e. every individual is measured and f = 100%), there is no sampling error. There may well still be substantial non-sampling errors.
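The sample size formulae can be collected into a small calculator. This is a sketch, not part of the original note: the function names are my own, results are rounded up to whole units, the two-group functions assume equal sample sizes in each population, and the finite population correction is applied in its usual square-root form, sqrt(1 - f). All planning values in the usage lines are invented.

```python
import math

def _round_up(x):
    # Guard against floating-point noise before rounding up.
    return math.ceil(round(x, 9))

def n_for_proportion(p, width, t=2.0):
    """n so that the c.i. for a proportion has total width `width`."""
    return _round_up((2 * t) ** 2 * p * (1 - p) / width ** 2)

def n_for_mean(variance, width, t=2.0):
    """n so that the c.i. for a mean has total width `width`."""
    return _round_up((2 * t) ** 2 * variance / width ** 2)

def n_total_for_diff_proportions(p1, p2, width, t=2.0):
    """Total n over both groups (equal group sizes assumed)."""
    per_group = (2 * t) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / width ** 2
    return 2 * _round_up(per_group)

def n_total_for_diff_means(var1, var2, width, t=2.0):
    """Total n over both groups (equal group sizes assumed)."""
    per_group = (2 * t) ** 2 * (var1 + var2) / width ** 2
    return 2 * _round_up(per_group)

def se_with_fpc(se, n, population_size):
    """Standard error after the finite population correction sqrt(1 - f)."""
    f = n / population_size
    return se * math.sqrt(1 - f)

# Illustrative planning values only.
print(n_for_proportion(0.8, 0.15))
print(n_for_mean(37.2, 4))
print(n_total_for_diff_proportions(0.84, 0.53, 0.3))
print(n_total_for_diff_means(23.2, 12.8, 5))
print(round(se_with_fpc(1.0, 36, 100), 3))
```

Halving the required width C multiplies every one of these sample sizes by four, which is usually the first thing to check when the answer looks unaffordable.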


5 Factors increasing required sample size


Cluster sampling.

Simple random sampling is rarely practical. Some form of multistage or cluster sampling is usually used. For example, 50 farmers might be selected by choosing 5 districts at random, 2 villages in each district and 5 farmers in each village. Nearly always observations in the same cluster will be positively correlated (i.e. more similar to each other than observations in different clusters). Hence each new observation gives less new information than if it were independent, so the standard error is inflated. A clustered or multistage sample of size n is less precise than a simple random sample of the same size n.

If cluster or multistage sampling is used then the required sample size will be larger than that predicted by the formulae above. How much larger depends on the size of the intracluster correlation.

Choosing a suitable sampling scheme (e.g. how the total sample size is to be distributed between districts and villages within districts) is a separate problem not dealt with here. The basic principle is to ensure the sample is spread out through the whole population as much as possible. Thus, for example, choosing 2 villages then 50 farmers per village is likely to give far less precise results than choosing 20 villages and 5 farmers per village, though both have the same total sample size.

Other non-independence

Other sources of non-independence in the observations have the same effect as clustering. Examples include interviewer effects (responses collected by the same interviewer tend to be similar) and communication between respondents (the extreme case being attempting to collect data from individuals at group meetings). These will increase the required sample size beyond that estimated above.

Non-random sampling

Non-random methods of sampling can have the same effect as clustering if the result is non-independent observations.

Non-sampling errors

Non-sampling errors will inflate true standard errors beyond those given by the formulae above. A larger sample size will thus be required. Note that increasing the sample size may not help if the problem is bias, for example that caused by the sample selection procedure.

Non-response, drop out, lost respondents.

The planned sample size is often not achieved as selected respondents refuse to take part, cannot be found or drop out of multi-round surveys. Some of these problems can be avoided by having a well-defined rule for replacing selected respondents (e.g. if no one is at home at the selected house go to the next but one house on the other side of the road). Beware that selected respondents who refuse to take part or cannot be found may introduce bias (but that is another problem). The planned sample size must be increased to allow for non-response.
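The note does not give a formula for the effect of clustering. A commonly used approximation, introduced here as an assumption rather than taken from the note, is the design effect 1 + (m - 1)ρ, where m is the number of respondents per cluster and ρ is the intracluster correlation. The sketch below treats the village as the only level of clustering, and the values ρ = 0.1 and a response rate of 85% are invented for illustration.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate variance inflation 1 + (m - 1) * rho, equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def planned_sample_size(n_srs, cluster_size, icc, response_rate):
    """Inflate a simple-random-sample size for clustering and non-response."""
    n_clustered = n_srs * design_effect(cluster_size, icc)
    # Round to suppress floating-point noise, then round up to a whole farmer.
    return math.ceil(round(n_clustered / response_rate, 9))

# Two ways of spending a total of 100 interviews (hypothetical rho = 0.1).
for villages, per_village in ((2, 50), (20, 5)):
    deff = design_effect(per_village, 0.1)
    effective = villages * per_village / deff
    print(f"{villages:2d} villages x {per_village:2d} farmers: "
          f"design effect={deff:.1f}, "
          f"equivalent simple random sample of about {effective:.0f}")

# Planned size when the simple-random-sample calculation gave n = 100.
print(planned_sample_size(100, 5, 0.1, 0.85))
```

Under these assumed values the 2 × 50 design is worth far fewer independent observations than the 20 × 5 design, which is the point made above about spreading the sample through the population.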


Sub grouping by cross tabulation

In the Grevillea boundary planting survey, the sample size was set to 25. At analysis it was decided that a tabulation of mean number of trees, classified by type of household and the length of time since starting the farm, would be useful.

| Household head | Time on-farm (yrs): <5 | 5-15 | >15 |
|---|---|---|---|
| Male | | | |
| Female | | | |

The total sample of 25 may be adequate but the sample size within each cell, or along the margins, of the table will probably be too small. Multiway tables and tables with more than 2 or 3 categories very quickly lead to very small sample sizes in many cells. If the interest is in comparing the subgroups formed by cells in the table (which it must be, or the table is not interesting) then sample size selection for comparison of means should be used.

Often one can choose the total sample size before starting the survey, but cannot choose the size of each subgroup. For example, the gender of the household head and length of time on the farm will not be known until the interview. The proportion of the sample falling into each subgroup will not be known until after data collection. However, for the purposes of sample size determination the proportions can be guessed or estimated sufficiently well from a pilot survey.

Interest in low or high quantiles.

If interest is in quantiles of a distribution (i.e. percentage points such as the 10% point or the lower quartile, the 25% point), not just the mean, then sample sizes will have to be larger. Generally the standard error of a quantile estimate will increase with its distance from the centre of the distribution.
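For the cross-tabulation problem, guessed subgroup proportions can be turned into expected cell sizes before the survey. The sketch below is illustrative: the proportions are invented, the function names are my own, and the two classifications are assumed to be independent, which a pilot survey may well contradict.

```python
import math

def expected_cell_sizes(n_total, row_props, col_props):
    """Expected cell counts, assuming rows and columns are independent."""
    return {
        (r, c): n_total * pr * pc
        for r, pr in row_props.items()
        for c, pc in col_props.items()
    }

def total_needed(n_per_cell, row_props, col_props):
    """Total n so that the smallest expected cell reaches n_per_cell."""
    smallest = min(row_props.values()) * min(col_props.values())
    return math.ceil(round(n_per_cell / smallest, 9))

# Guessed proportions (hypothetical, e.g. from a pilot survey).
head = {"Male": 0.7, "Female": 0.3}
years = {"<5": 0.2, "5-15": 0.5, ">15": 0.3}

for cell, size in expected_cell_sizes(25, head, years).items():
    print(cell, round(size, 1))

print("total needed for 10 per cell:", total_needed(10, head, years))
```

With a total of 25 the smallest cells are expected to hold only one or two farms, which makes the warning above concrete.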


6 Factors decreasing the required sample size


Stratification

Stratification means dividing the population into subgroups (strata) that can be identified before data collection, and taking a separate sample in each stratum. Two main reasons for stratification are:

Because results are required for each stratum. Sample size determination will have to be done for each stratum.

Reducing variability and increasing precision. If strata are selected so that the variation between individuals within each stratum is less than the overall variance, the gain in precision can be substantial, hence reducing the required sample size.

Effective stratification to increase precision requires knowledge of the population being studied, so that relevant, identifiable strata can be chosen. Once within and between stratum variances are known, sample sizes can be calculated.

The ideas of stratification are also used to design surveys for estimating relationships. Consider the problem of looking at the relationship between the number of fruit trees (y) and distance from the road (x). A simple random sample is likely to have many observations with x values close to the mean. The scatter diagram of y against x will then have most observations in a cloud near the middle and it will not be possible to get a clear picture (and good estimate) of the relationship. If the population is stratified by distance from the road (for example dividing it into 3 groups of <500m, 500 - 1000m and >1000m) a stratified sample can be taken. This ensures that there are sufficient observations at the low and high ends of the x range. Relationships involving several x variables will require stratification by each of those variables.

Non-random sampling

Some non-random sampling schemes can be much more efficient than simple random sampling, giving smaller standard errors, or reduced sample sizes for the same precision. The best example is grid sampling of spatially defined populations (e.g. land use, soil). The gain in precision that can be achieved by systematic sampling is difficult to quantify.
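For the stratification case, the gain in precision can be sketched with the standard formula for the standard error of a stratified mean, sqrt(Σ W_h² s_h² / n_h), where W_h is the share of the population in stratum h. This formula is not given in the note and is introduced here as an assumption; the three strata, their means and variances are invented, and the overall variance is approximated as within-stratum plus between-stratum variance for a large population.

```python
import math

def overall_variance(weights, means, variances):
    """Approximate population variance: within-stratum plus between-stratum."""
    grand_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return within + between

def se_srs(variance, n):
    """Standard error of the mean under simple random sampling."""
    return math.sqrt(variance / n)

def se_stratified(weights, variances, sizes):
    """Standard error of the stratified mean: sqrt(sum W_h^2 * s_h^2 / n_h)."""
    return math.sqrt(sum(w * w * v / n
                         for w, v, n in zip(weights, variances, sizes)))

# Hypothetical strata, e.g. three distance-from-road bands.
weights = [0.3, 0.4, 0.3]      # share of the population in each stratum
means = [5.0, 12.0, 20.0]      # stratum means
variances = [10.0, 12.0, 8.0]  # within-stratum variances
sizes = [18, 24, 18]           # proportional allocation of n = 60

print(round(se_srs(overall_variance(weights, means, variances), 60), 3))
print(round(se_stratified(weights, variances, sizes), 3))
```

In this invented case most of the variation lies between strata, so the stratified sample of 60 gives a much smaller standard error than a simple random sample of 60; where strata differ little, the gain would be correspondingly small.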


7 Steps in fixing the sample size

1. Refine the objectives of the survey. In order to make rational sample size choices, both the quantities to be estimated and the precision required must be specified. Note that objectives (or hypotheses) should never be stated in terms of lack of significance (for example "Confirm there is no significant difference in the density of grevillea planted on large and small farms"). Lack of significant effects can always be guaranteed by doing a poor survey!

2. Decide which are the key objectives. Most surveys have multiple objectives which might require different sample sizes. Choose 2 or 3 principal objectives.

3. Collect the information needed to make sample size calculations. Information on the variation expected ($s^2$, CV) and possibly useful stratification comes from:
Previous surveys in the area
Similar surveys in other areas
Censuses
Rough estimates from informal data collection
Pilot surveys.

4. Estimate the sample size for each of the main objectives, and select the largest.

5. Estimate the cost (in terms of any limiting resource: money, transport, time...).

If the cost of the required sample is too high go back to 1. The objectives will have to be made more modest! There is no point just reducing sample size to match available resources, as the objectives will not be met.
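Steps 4 and 5 can be sketched as a short check. Everything here is hypothetical: the objective names, the required sample sizes, the cost figures and the function name `choose_sample_size` are invented purely to show the logic of taking the largest requirement and comparing its cost with the resources available.

```python
def choose_sample_size(required_by_objective, cost_per_interview,
                       fixed_cost, budget):
    """Pick the largest required n and check its cost against the budget."""
    n = max(required_by_objective.values())
    cost = fixed_cost + cost_per_interview * n
    return n, cost, cost <= budget

# Hypothetical requirements for three key objectives.
required = {
    "proportion owning cattle": 114,
    "mean grevillea per 100 m": 38,
    "large vs small farm difference": 138,
}

n, cost, affordable = choose_sample_size(required, cost_per_interview=10,
                                         fixed_cost=500, budget=1500)
print(f"n={n}  cost={cost}  affordable={affordable}")
if not affordable:
    # Do not simply cut n: return to step 1 and make the objectives more modest.
    print("Revise the objectives or the required precision, then recalculate.")
```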
