
Techniques of doing Descriptive Research:

Sampling And Sampling Distributions

Learning Objectives

• Understand the basic concept of sampling.
• Distinguish between a sample and a census.
• Differentiate between a sampling error and a non-sampling error.
• Understand the meaning of sampling design.
• Explain the different types of probability and non-probability sampling techniques.
• Understand the sampling distribution of the sample proportion.

Sampling is the process of selecting observations (a sample) to provide an adequate description of, and inferences about, the population.

Sample
• A unit, or set of units, selected from the population
• Represents the whole population
• Purpose: to draw inferences about the population

Why Sample?
A researcher generally takes a small portion of the population for study, which is referred to as a sample. The process of selecting a sample from the population is called sampling.
[Figure: Population (what you want to talk about) -> Sampling Frame -> Sampling Process -> Sample (what you actually observe in the data) -> Inference back to the population]

Why is Sampling essential?

• Sampling saves cost and human resources during the research work.

• In ICT, sampling avoids constraints such as heavy use of tools and technology in predicting the research output.

• When the research process is destructive in nature, sampling minimizes the destruction.

• Sampling broadens the scope of the study in light of the scarcity of resources.

• Sampling can provide more accurate results than a census, because non-sampling errors can be controlled more easily in a sample.

• In most cases a census is not possible, so sampling is the only option left.

Disadvantages of Sampling

• A researcher may not find the true information about the population being studied, especially its characteristics. The research can only estimate or predict them. This means that there is a high possibility of error in the estimates made.

• The sampling process only enables a researcher to estimate the actual situation rather than find the real truth. If you take a piece of information from your sampling population, and if your reasoning is correct, your findings should also be accurate to a certain degree.

Census or complete enumeration is an examination of each
and every element of the population.

Uses of Sampling in Real Life

• Suppose you have a guest for dinner at your residence. Your mother prepares a number of dishes and, before the guest arrives, she may give you a tablespoon of each dish to taste and tell her whether all the ingredients are in the right proportion. Here, a sample is being taken from each dish to know how it tastes.

Difference Between Census and Sampling

Basis for comparison | Census | Sampling
Meaning | A systematic method that collects and records data about the members of the population. | A portion of the population selected to represent the entire group in all its characteristics.
Enumeration | Complete | Partial
Study of | Each and every unit of the population. | Only a handful of units of the population.
Time required | It is a time-consuming process. | It is a fast process.
Cost | Expensive method | Economical method
Results | Reliable and accurate | Less reliable and accurate, due to the margin of error in the data collected.
Sampling error | Not present. | Present; depends on the size of the sample.

Basis | Population | Sample
Definition | Collection of items being considered | Part or portion of the population chosen for study
Characteristics | "Parameters" | "Statistics"
Symbols | Population size = N | Sample size = n
Symbols | Population mean = µ | Sample mean = x̄
Symbols | Population standard deviation = σ | Sample standard deviation = s

SAMPLING TERMINOLOGIES

(a) The community of families living in the town with smart homes forms the population or study population; its size is usually denoted by the letter N.

(b) The group of elderly people (senior citizens) and disabled people in the vicinity of the smart home community that is selected for study is called the sample.

(c) The number of senior citizens and disabled people from whom you obtain information to find their average age is called the sample size and is usually denoted by the letter n.

(d) The way you select senior citizens and disabled people is called the sampling design or strategy.

(e) Each senior citizen or disabled person that becomes the basis for selecting your sample is called a sampling unit or sampling element.

(f) A list identifying each respondent in the study population is called the sampling frame. When the elements of a sampling population cannot all be individually identified, you cannot have a sampling frame for the study population.

(g) Finally, the findings obtained from the respondents' information are called sample statistics.

Sampling Error vs Non-Sampling Error

Basis for comparison | Sampling error | Non-sampling error
Meaning | An error that occurs because the selected sample does not perfectly represent the population of interest. | An error that arises from sources other than sampling while conducting survey activities.
Cause | Deviation between sample mean and population mean | Deficiencies in the data and their analysis
Type | Random | Random or non-random
Occurs | Only when a sample is selected. | Both in sample and census.
Sample size | The possibility of error reduces as the sample size increases. | It has nothing to do with the sample size.

Sampling Design Process

• Step 1: The target population must be defined.

The target population is the collection of objects that possess the information required by the researcher and about which the inference has to be made.

• Step 2: The sampling frame must be determined.

A researcher takes the sampling frame from a population list, directory, or any other source used to represent the population. This list possesses the information about the subjects and is called the sampling frame. Sampling is carried out from the sampling frame, not from the target population.

• Step 3: An appropriate sampling technique must be selected.

In sampling with replacement, an element is selected from the frame, the required information is obtained, and then the element is placed back in the frame. This way, the element may be selected again in the sample. In sampling without replacement, by contrast, an element selected from the frame is not returned to it, so it cannot be included in the sample again.

• Step 4: The sample size must be determined.

Sample size refers to the number of elements to be included in the study.

• Step 5: The sampling process must be executed.

Probability (Random) Sampling vs Non-Probability (Non-Random) Sampling

In probability sampling, each element of the population has a known, non-zero probability of selection; in the simplest designs this probability is the same for every element. This method of sampling lets us state how likely it is that the sample is representative of the population.

In non-probability sampling, by contrast, the elements of the population do not have a known or equal chance of being selected in the sample.

Simple Random Sampling:

The simplest kind of probability sampling is the simple random sampling method. A sample is drawn from the population unit by unit, in such a way that each unit of the population has an equal chance of being included in the sample. If a drawn unit is returned to the population before the next draw, the method is called simple random sampling with replacement; otherwise it is known as simple random sampling without replacement.

There are 2 possibilities in simple random sampling.

1. Simple Random Sampling With Replacement (SRSWR)

It is also known as equal-probability sampling. A sample of size n is selected from a population of size N by SRSWR if the draws observe the following rules:

a. At each draw, every unit in the population has the same probability of being selected.
b. A unit selected at a draw is returned to the population before the next unit is drawn.

Remark:

The same unit can be selected more than once.

Probability of getting a particular ordered sample $s = \{i_1, i_2, i_3, \dots, i_n\}$:

$P[s = \{i_1, i_2, i_3, \dots, i_n\}] = (1/N)^n = 1/N^n$

where N = population size, n = sample size, and $i_1, \dots, i_n$ = the sampled units.


Suppose there are 5 shops: A, B, C, D and E. Consider a sample of size 2. List all the samples that are possible under simple random sampling with replacement.

{A,A} {B,A} {C,A} {D,A} {E,A}
{A,B} {B,B} {C,B} {D,B} {E,B}
{A,C} {B,C} {C,C} {D,C} {E,C}
{A,D} {B,D} {C,D} {D,D} {E,D}
{A,E} {B,E} {C,E} {D,E} {E,E}

Total = 25
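
The count of 25 can be checked by enumeration. The sketch below is only an illustration: it treats each sample as an ordered pair of draws (which is how the 25 pairs above are counted) and, for contrast, also counts the ordered samples that remain when a shop may not be drawn twice.

```python
from itertools import product, permutations

shops = ["A", "B", "C", "D", "E"]
n = 2  # sample size

# With replacement: every ordered pair is allowed, including repeats such as (A, A).
with_replacement = list(product(shops, repeat=n))

# Without replacement: the same shop cannot appear twice in one sample.
without_replacement = list(permutations(shops, n))

print(len(with_replacement))     # N**n = 5**2 = 25
print(len(without_replacement))  # N*(N-1) = 5*4 = 20
```

Each of the 25 ordered samples has probability 1/25 under SRSWR, which matches the general expression 1/N^n with N = 5 and n = 2.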
2. Simple Random Sampling Without Replacement (SRSWOR)

It is the method of sampling in which n units are drawn one by one from a population of N units and a unit, once picked, is not returned to the population. Therefore, the probability of selecting a particular unit changes from draw to draw.

For example, if we draw 5 cards one by one from a pack of 52 playing cards, the probabilities of drawing a specific remaining card at the successive draws are 1/52, 1/51, 1/50, 1/49 and 1/48.


Theorem of SRSWR:

In SRSWR of n sample units from a population of N units, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$:

$E(\bar{y}) = \bar{Y}$
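
The unbiasedness statement can be verified exactly on a small case. The following sketch uses made-up population values (they are not from the slides) and relies on the fact that under SRSWR all N^n ordered samples are equally likely, so the expected sample mean is the plain average of the sample means over every possible sample.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population values (for example, daily sales of 5 shops).
population = [12, 15, 9, 20, 14]
N, n = len(population), 2

population_mean = Fraction(sum(population), N)

# Enumerate all N**n equally likely ordered samples and average their means.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
expected_sample_mean = sum(sample_means) / len(sample_means)

print(population_mean, expected_sample_mean)  # both equal 14
```

Exact fractions are used instead of floats so that the equality is checked without rounding error.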
There are 2 methods of drawing a simple random sample without replacement:

1. Lottery Method
In the lottery method, a chit is made for every unit of the population and chits are drawn one at a time at random. The units named on the drawn chits are then studied and analysed.

2. Random Number Table
Three published tables of random numbers are commonly used:
a. Tippett's table
b. Fisher and Yates' table
c. Kendall and Babington Smith's table

Tippett's table: contains 10,400 numbers of 4 digits each.

Fisher and Yates' table: comprises 15,000 digits arranged in numbers of 2 digits each.

Kendall and Babington Smith's table: consists of 100,000 digits grouped into 25,000 sets of 4-digit numbers.
Advantages of Simple Random Sampling

1. It needs only minimal advance knowledge of the study population.
2. It is free from errors in classification.
3. It is suitable for data analysis that uses inferential statistics.
4. A simple random sample tends to be representative of the population.
5. It is free from bias and prejudice on the part of the researcher.
6. The method is simple to use.
7. It is easy to assess the sampling error in this method.
Disadvantages of Simple Random Sampling Method

1. For the same sample size, this method carries larger errors than stratified sampling.
2. In simple random sampling, selecting the sample becomes impractical if the units or items are widely spread.
3. It is poorly suited to populations whose units are heterogeneous in nature.
4. It makes no use of available knowledge concerning the population.
5. Sometimes it is difficult to obtain a completely catalogued universe.
6. It may be impossible to contact cases that are very widely dispersed.
Non-Random Sampling

Advantages
1. Least expensive, least time-consuming and most convenient.
2. The sample can be controlled for certain characteristics.
3. Can estimate rare characteristics.

Demerits
1. Selection bias; the sample may not be representative; not recommended for descriptive or causal research.
2. Does not allow generalization; selection is subjective; no assurance of representativeness.
3. Time-consuming.
Stratified Sampling:

Stratified sampling is a probability sampling method and a form of random sampling in which the population is divided into two or more groups (strata) according to one or more common attributes.

Stratified random sampling intends to guarantee that the sample represents specific subgroups or strata. Accordingly, applying the stratified sampling method involves dividing the population into different subgroups (strata) and selecting subjects from each stratum in a proportionate manner.

The table below illustrates a simplistic example where a sample group of 10 respondents is selected by dividing the population into male and female strata in order to achieve equal representation of both genders in the sample group.

Stratified sampling can be divided into two types: proportionate and disproportionate. Proportionate stratified random sampling involves setting the sample size in each stratum in proportion to that stratum's share of the entire population.

In disproportionate stratified random sampling, on the contrary, the number of subjects recruited from each stratum does not have to be proportionate to the stratum's share of the population. Accordingly, proportionate stratified random sampling generates more accurate primary data than disproportionate sampling.
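
Proportionate allocation is simple arithmetic: stratum h receives n_h = n × N_h / N units. The sketch below is one possible way to compute it; the function name, the stratum sizes and the rule for distributing rounding leftovers (largest fractional remainder first) are illustrative assumptions, not part of the slides.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Return {stratum: n_h} with n_h proportional to stratum size, summing to sample_size."""
    total = sum(stratum_sizes.values())
    exact = {h: sample_size * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    # Hand out the units lost to rounding, largest fractional remainder first.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical stratum sizes, for illustration only.
strata = {"male": 600, "female": 400}
print(proportionate_allocation(strata, 10))  # {'male': 6, 'female': 4}
```

A disproportionate design would simply replace the computed allocation with stratum sample sizes chosen by the researcher.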
Application of Stratified Sampling: an Example

Suppose your dissertation aims to explore the leadership styles exercised by medium-level managers at Bayerische Motoren Werke Aktiengesellschaft (BMW AG). You have selected semi-structured in-depth interviews with managers as the most appropriate primary data collection method to achieve the research objectives.

Application of stratified random sampling contains the following three stages.

1. Identification of relevant strata and ensuring their actual representation in the population. Apart from gender, as illustrated in the example above, criteria that can be used to divide the population into strata include age, level of education, status, nationality, religion and others. The specific pattern of categorization into strata depends on the aims and objectives of the study.

In our case, BMW Group employees are employed across four business segments: automotive, motorcycles, financial services and other entities. Accordingly, each segment can be adopted as a stratum from which to draw sample group members.

2. Numbering each subject within each stratum with a unique identification number.

3. Selection of a sufficient number of subjects from each stratum. It is critically important for the sample from each stratum to be selected in a random manner so that bias can be minimized.
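
The three stages can be sketched in a few lines. This is a minimal illustration only: the stratum names follow the four business segments mentioned above, but the head counts, the ID format and the 10% sampling fraction are invented for the example.

```python
import random

def stratified_sample(frame, per_stratum, seed=None):
    """frame: {stratum: [subject IDs]}; per_stratum: {stratum: number to draw}."""
    rng = random.Random(seed)
    # Draw independently, without replacement, inside each stratum.
    return {h: rng.sample(ids, per_stratum[h]) for h, ids in frame.items()}

# Stage 1: strata. Stage 2: a unique ID for every subject (head counts are made up).
frame = {
    "automotive": [f"AUTO-{i:03d}" for i in range(1, 41)],
    "motorcycles": [f"MOTO-{i:03d}" for i in range(1, 11)],
    "financial services": [f"FIN-{i:03d}" for i in range(1, 21)],
    "other entities": [f"OTH-{i:03d}" for i in range(1, 11)],
}

# Stage 3: random selection within every stratum (10% of each stratum here).
sizes = {h: len(ids) // 10 for h, ids in frame.items()}
sample = stratified_sample(frame, sizes, seed=1)
for stratum, chosen in sample.items():
    print(stratum, chosen)
```

Fixing the seed only makes the illustration repeatable; a real study would document how the random draw was made.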
Advantages of Stratified Sampling

1. Stratified random sampling is superior to simple random sampling because the process of stratifying reduces sampling error and ensures a greater level of representation.

2. Adequate representation of all subgroups can be ensured.

3. When there is homogeneity within strata and heterogeneity between strata, the estimates can be as precise as (or even more precise than) those obtained with simple random sampling.
Disadvantages of Stratified Sampling:

1. The application of stratified random sampling requires knowledge of strata membership a priori. The requirement to be able to easily distinguish between strata in the sampling frame may create difficulties in practice.

2. The research process may take longer and prove to be more expensive due to the extra stage in the sampling procedure.

3. The choice of the stratified sampling method adds a certain complexity to the analysis plan.
Cluster Sampling

Cluster sampling is defined as a sampling method where multiple clusters of people are created from a population, such that the clusters show similar (homogeneous) characteristics and each has an equal chance of being a part of the sample. In this sampling method, a simple random sample of clusters is drawn from the clusters in the population.
Cluster Sampling: Steps and Tips

• Sample: Decide the target audience and also the size of the sample.

• Create and evaluate sampling frames: Create a sampling frame by using either an existing frame or creating a new one for the target audience. Evaluate frames on the basis of coverage and clustering and make adjustments accordingly. These groups will be varied considering the population which can be exclusive and comprehensive. Members of a sample are selected individually.

• Determine groups: Determine the number of groups so that each group has roughly the same number of members. Make sure each of these groups is distinct from the others.

• Geographic segmentation: Geographic segmentation is the most commonly used basis for a cluster sample.

• Sub-types: Cluster sampling is divided into one-stage and multi-stage subtypes on the basis of the number of steps followed by researchers to form clusters.
Cluster Sampling Methods with Examples

Single Stage Cluster Sampling: As the name suggests, sampling is done just once. An example of single-stage cluster sampling: an NGO wants to create a sample of girls across 5 neighboring towns to provide education. Using single-stage cluster sampling, the NGO can randomly select towns (clusters) to form a sample and extend help to the girls deprived of education in those towns.
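
A minimal sketch of the single-stage idea: clusters are drawn at random and every member of a chosen cluster enters the sample. The town names, member labels and the choice of k = 2 clusters are hypothetical.

```python
import random

def single_stage_cluster_sample(clusters, k, seed=None):
    """Pick k clusters at random and keep every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    members = [m for c in chosen for m in clusters[c]]
    return chosen, members

# Five hypothetical towns with three listed girls each.
towns = {f"Town{t}": [f"T{t}-girl{i}" for i in range(1, 4)] for t in range(1, 6)}

chosen_towns, sample = single_stage_cluster_sample(towns, k=2, seed=7)
print(chosen_towns)
print(sample)
```

The random draw happens only at the level of towns; no individual is selected or rejected within a chosen town.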
Multiple Stage Cluster Sampling: For effective research across multiple geographies, one needs to form complicated clusters, which can be achieved only by using the multiple-stage cluster sampling technique. Steps of listing and sampling are repeated in this sampling method.

An example of multiple-stage cluster sampling: geographic cluster sampling is one of the most extensively implemented cluster sampling techniques. If an organization intends to conduct a survey to analyze the performance of smartphones across Germany, it can divide the entire country's population into cities (clusters), further select the cities with the highest population, and then filter those using mobile devices.
Cluster Sampling Advantages

There are multiple advantages of using cluster sampling:

• Consumes less time and cost: Sampling of geographically divided groups requires less work, time and cost. It is a highly economical method to observe clusters instead of sampling randomly throughout a particular region, because a limited number of resources is allocated to the selected clusters.

• Convenient access: Large samples can be chosen with this sampling technique, and that increases accessibility to various clusters.

• Least loss in accuracy of data: Since there can be large samples in each cluster, loss of accuracy in information per individual can be compensated.

• Ease of implementation: Since cluster sampling facilitates information from various areas and groups, it can be easily implemented in practical situations in comparison to other probability sampling methods such as simple random sampling, systematic sampling and stratified sampling, or non-probability sampling methods such as convenience sampling.
Cluster Sampling vs Stratified Sampling

Cluster Sampling | Stratified Sampling
Elements of a population are randomly selected to be a part of groups (clusters). | The entire population is divided into even segments (strata).
Members from randomly selected clusters are a part of this sample. | Individual components of the strata are randomly considered to be a part of sampling units.
Homogeneity is maintained between clusters. | Homogeneity is maintained within the strata.
Heterogeneity is maintained within the clusters. | Heterogeneity is maintained between strata.
The clusters are divided naturally. | The strata division is primarily decided by the researchers or statisticians.
The key objective is to minimize the cost involved and enhance competence. | The key objective is to conduct accurate sampling along with a properly represented population.
Systematic Sampling:

Systematic sampling is a probability sampling method where the elements are chosen from a target population by selecting a random starting point and then selecting other members after a fixed "sampling interval". The sampling interval is calculated by dividing the entire population size by the desired sample size.

For instance, if a local NGO is seeking to form a systematic sample of 500 volunteers from a population of 5000, they can select every 10th person in the population to systematically form a sample.
Steps to form a sample using the Systematic Sampling technique:

• A defined target audience needs to be developed for the researcher to start working on the sampling aspect.

• The person in charge of the research must figure out the ideal size of the sample, i.e. how many people from the entire population to choose to be a part of the sample.

• The key to precise, reasonable and practical results is a bigger sample size.

• Once the sample size is decided, a number must be assigned to each and every member of the population.

The example mentioned above suggests that the sampling interval should be 10, which is the result of dividing 5000 (N = size of the population) by 500 (n = size of the sample).
Systematic sampling formula for the interval: i = N/n = 5000/500 = 10

• The researcher needs to select the members who fit the criteria, which in this case will be 1 in 10 individuals.

• A number between 1 and i will be randomly chosen as the starting member (r) of the sample, and the interval will be added repeatedly to this random number to keep adding members to the sample: r, r+i, r+2i, etc. will be the elements of the sample.
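
The selection rule r, r+i, r+2i, ... can be written as a short function. This is a sketch under the assumption that the population is a numbered list and that N is at least n; the function name and the seed are illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select n members: a random start r, then every i-th member, with i = N // n."""
    N = len(population)
    i = N // n                    # sampling interval
    rng = random.Random(seed)
    r = rng.randrange(i)          # random start as a 0-based index in [0, i)
    return [population[r + k * i] for k in range(n)]

volunteers = list(range(1, 5001))            # members numbered 1..5000
sample = systematic_sample(volunteers, 500, seed=42)
print(len(sample), sample[1] - sample[0])    # 500 members, spaced 10 apart
```

Only the starting point is random; once r is fixed, the rest of the sample is fully determined by the interval.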
Multistage Sampling

Multistage sampling is a probability sampling technique wherein the sampling is carried out in several stages such that the sample size gets reduced at each stage.

Multistage sampling is a complex form of cluster sampling. Cluster sampling is yet another random sampling technique wherein the population is divided into subgroups called clusters; then a few clusters are chosen randomly for the survey.

In the multistage sampling technique, the first level is similar to that of cluster sampling, where the clusters are formed out of the population, but these clusters are then sub-divided into smaller target groups, i.e. sub-clusters, and the subjects from each sub-cluster are chosen randomly. Further stages can be added depending on the nature of the research and the size of the population under study.

For example, suppose the government wants to take a sample of 10,000 households residing in Gujarat state. At the first stage, the state can be divided into a number of districts, and then a few districts can be selected randomly. At the second stage, the chosen districts can be further sub-divided into a number of villages, and then a sample of a few villages can be taken at random. At the third stage, the desired number of households can be selected from the villages chosen at the second stage. Thus, at each stage the size of the sample becomes smaller.
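
The three stages of the household example can be sketched as nested random draws. The frame below is a tiny made-up one (4 districts, 5 villages each, 20 households each), and the numbers drawn at each stage are arbitrary.

```python
import random

def multistage_sample(state, n_districts, n_villages, n_households, seed=None):
    """state: {district: {village: [households]}}; random selection at every stage."""
    rng = random.Random(seed)
    sample = []
    for d in rng.sample(sorted(state), n_districts):               # stage 1: districts
        for v in rng.sample(sorted(state[d]), n_villages):         # stage 2: villages
            sample.extend(rng.sample(state[d][v], n_households))   # stage 3: households
    return sample

# Hypothetical frame, for illustration only.
state = {
    f"D{d}": {f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(20)] for v in range(5)}
    for d in range(4)
}

households = multistage_sample(state, n_districts=2, n_villages=3, n_households=4, seed=0)
print(len(households))  # 2 * 3 * 4 = 24
```

Only the selected districts need a village list, and only the selected villages need a household list, which is the practical appeal of the design.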
Real Life Examples
• The Census Bureau uses multistage sampling for the U.S. National Center for Health Statistics' National Health Interview Survey (NHIS): a multistage probability sample of 42,000 households in 376 primary sampling units (PSUs are usually counties or groups of counties), chosen in groups of around four adjacent households.
• The Gallup poll uses multistage sampling. For example, they might randomly choose a certain number of area codes and then randomly sample a number of phone numbers from within each area code.
• Johnston et al.'s survey on drug use in high schools used three-stage sampling: geographic areas, followed by high schools within those areas, followed by senior students in those schools.
• The Australian Bureau of Statistics divides cities into "collection districts", then blocks, then households. Each stage uses random sampling.

Non-Probability Sampling

• The probability of each case being selected from the total population is not known.
• Units of the sample are chosen on the basis of personal judgment or convenience.
• There are no statistical techniques for measuring random sampling error in a non-probability sample.

Types of non-probability sampling:
A. Convenience sampling
B. Quota sampling
C. Judgmental (purposive) sampling
D. Snowball sampling
E. Self-selection sampling
Convenience Sampling

Convenience sampling (also called availability sampling) is a non-probability (non-random) sampling technique used to create a sample according to ease of access, readiness to be a part of the sample, availability at a given time slot or any other practical specification of a particular element. The researcher chooses members merely on the basis of proximity and does not consider whether they represent the entire population or not. Using this technique, they can observe habits, opinions and viewpoints in the easiest possible manner.
Convenience Sampling Examples:

• The most basic example of the convenience sampling method is when companies stop people at a mall or on a crowded street to distribute their promotional pamphlets and ask questions.

• Businesses use the convenience sampling method to gather information about critical issues that are to be addressed almost immediately, or when a brand is collecting feedback about a particular feature or newly launched product from the sample created using this method.
Convenience Sampling Advantages:
• Quick mode to collect data: The rules to gather elements for the sample are the least complicated in comparison to methods such as simple random sampling, stratified sampling or systematic sampling. Due to this simplicity, data collection takes minimal time.

• Inexpensive to create samples: The money and time invested in probability sampling methods are quite large compared with convenience sampling. This allows researchers to create more samples with little or no investment and in a brief period of time.

• Easily collectible samples: The name of this method gives a clear indication of how samples are formed. Elements are easily accessible by the
Quota Sampling:

In quota sampling the entire population is segmented into mutually exclusive groups or categories. The number of respondents (quota) to be drawn from each category is specified in advance, and the final selection of respondents is left to the interviewer, who proceeds until the quota for each category is filled. Quota sampling finds extensive use in commercial research, where the main objective is to ensure that the sample represents, in relative proportion, the people in various categories in the population such as

For example, if a researcher wants to segment the entire population based on gender, then he would have two categories of respondents, male and female. If he collects a sample of 30, he may allot a quota of 15 for male and 15 for female respondents (assuming that the population has an equal proportion of males and females). The researcher will therefore stop administering the questionnaire to females after interviewing the 15th female respondent, that is, when the quota of 15 females is filled.
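
The stopping rule in the gender example can be sketched as follows. The stream of passers-by is invented for illustration, and the function name is hypothetical; note that nothing here is random, since in quota sampling the choice of respondents is left to the interviewer.

```python
def fill_quotas(respondents, quotas):
    """respondents: (id, category) pairs in the order the interviewer meets them."""
    remaining = dict(quotas)
    selected = []
    for rid, category in respondents:
        if remaining.get(category, 0) > 0:   # quota still open for this category
            selected.append((rid, category))
            remaining[category] -= 1
        if not any(remaining.values()):      # every quota filled: stop interviewing
            break
    return selected

# Made-up stream in which two of every three passers-by are female.
stream = [(i, "female" if i % 3 else "male") for i in range(1, 101)]

sample = fill_quotas(stream, {"male": 15, "female": 15})
females = [rid for rid, cat in sample if cat == "female"]
print(len(sample), len(females), females[-1])
```

In this stream the female quota fills first, after which further female passers-by are skipped while the interviewer keeps looking for male respondents.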
Judgmental Sampling: In this sampling technique, population elements are selected on the basis of the judgment and expertise of the researcher. It may be useful in situations where broad inferences are required.

When to execute Judgmental Sampling?

Judgmental sampling is most effective in situations where there are only a restricted number of people in a population who have the qualities that a researcher expects from the target population. Researchers prefer to implement judgmental sampling when they feel that other sampling techniques will consume more time and when they have confidence in their knowledge to select a sample for conducting research.
Examples of Judgmental Sampling

Consider a scenario where a panel decides to understand the factors that lead a person to select ethical hacking as a profession. Ethical hacking is a skill that has recently been attracting young people, and more and more people are selecting it as a profession. The researchers who understand what ethical hacking is will be able to decide who should form the sample to learn about it as a profession. That is when judgmental sampling is implemented. Researchers can easily filter out those participants who are eligible to be a part of the research sample.
Judgmental Sampling Advantages

• Consumes minimum time for execution: In this sampling approach, researcher expertise is important and there are no other barriers involved, so selecting a sample becomes extremely convenient.

• Allows researchers to approach their target market directly: There are no criteria involved in selecting a sample except for the researcher's preferences. Due to this, he/she can communicate directly with the target audience of their choice and produce desired results.

• Almost real-time results: A quick poll or survey can be conducted with the sample using judgmental sampling, since the members of the sample will possess appropriate knowledge and understanding of the subject.
Snowball Sampling:

Snowball sampling, or chain-referral sampling, is a non-probability sampling technique used when the subjects have traits that are rare to find. In this technique, existing subjects provide referrals to recruit the further subjects required for a research study.

For example, if you are studying the level of customer satisfaction among the members of an elite country club, you will find it extremely difficult to collect primary data unless a member of the club agrees to have a direct conversation with you and provides the contact details of the other members of the club.
Types of Snowball Sampling:

Linear Snowball Sampling: The formation of a sample group starts with one individual subject providing information about just one other subject, and then the chain continues with only one referral from each subject. This pattern is continued until enough subjects are available for the sample.

Exponential Non-Discriminative Snowball Sampling: In this type, the first subject is recruited and then he/she provides multiple referrals. Each new referral then provides more referrals, and so on, until there are enough subjects for the sample.

Exponential Discriminative Snowball Sampling: In this technique, each subject gives multiple referrals; however, only one subject is recruited from each set of referrals. The choice of a new subject depends on the nature of the research study.
Snowball Sampling Examples

For some populations, snowball sampling is the only way of collecting data and meaningful information. The following are instances where snowball sampling can be used:

No official list of names of the members: This sampling technique can be used for a population where there is no easily available data such as demographic information. Examples are homeless people, or the members of an elite club whose personal details cannot be obtained easily.

Difficulty to locate people: People with rare diseases are quite difficult to locate. If a researcher is carrying out a research study of this nature, finding the primary data source can be a challenge. Once he/she is identified, they usually have information about

People who are not willing to be identified: If a researcher is carrying out a study which involves collecting information or data from sex workers, victims of sexual assault or individuals who do not want to disclose their sexual orientation, these individuals will fall under this category.
Advantages of Snowball Sampling

1.It’s quicker to find samples: Referrals make it easy


and quick to find subjects as they come from reliable
sources. An additional task is saved for a researcher, this
time can be used in conducting the study.

2.Cost effective: This method is cost effective as the


referrals are obtained from a primary data source. It’s is
convenient and not so expensive as compared to other
methods.
Sample hesitant subjects:

Some people do not want to come forward and


participate in research studies, because they don’t
want their identity to be exposed. Snowball sampling
helps for this situation as they ask for a reference from
people known to each other.

There are some sections of the target population which are hard to contact. For example, if a researcher intends to understand the difficulties faced by HIV patients, other sampling methods will not be able to provide these sensitive samples.

In snowball sampling, researchers can closely examine and filter members of a population infected by HIV and conduct research by talking to them, making them understand the objective of the research and eventually,
Disadvantages of Snowball Sampling
1. Sampling bias and margin of error: Since people refer those whom they know and who have similar traits, this sampling method can introduce sampling bias and margin of error. A researcher might only be able to reach a small group of people and may not be able to complete the study with conclusive results.
2. Lack of cooperation: Even after referrals, there is a fair chance that people will not cooperate and will refuse to participate in the research study.
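The referral mechanism and the bias it can cause may be easier to see in a small sketch. The following Python example is only an illustration under stated assumptions: the referral network, the names in it, and the function `snowball_sample` are all hypothetical and not part of the original material.

```python
import random
from collections import deque

def snowball_sample(referrals, seeds, max_size, per_person=2, rng=None):
    """Collect up to max_size subjects by following referral chains from the seeds."""
    rng = rng or random.Random(0)
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Each recruited subject refers a few acquaintances not yet contacted.
        contacts = [c for c in referrals.get(person, []) if c not in seen]
        rng.shuffle(contacts)
        for contact in contacts[:per_person]:
            seen.add(contact)
            queue.append(contact)
    return sampled

# Hypothetical referral network: group A and group B do not know each other.
referrals = {
    "A1": ["A2", "A3"], "A2": ["A1", "A4"], "A3": ["A1"], "A4": ["A2"],
    "B1": ["B2"], "B2": ["B1", "B3"], "B3": ["B2"],
}

sample = snowball_sample(referrals, seeds=["A1"], max_size=5)
print(sample)
```

Because the only seed belongs to group A, nobody from group B can ever be referred, which is the kind of sampling bias described above. The sample also stops at four people although five were requested, since the referral chain runs out.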
Some important sampling distributions, which are commonly used, are:

1. sampling distribution of mean;
2. sampling distribution of proportion;
3. Student’s ‘t’ distribution;
4. F distribution; and
5. Chi-square distribution.
Sampling Distribution of Mean

The sampling distribution of the mean is the distribution of the sample means obtained from repeated samples of the same size. Its mean equals the mean of the population from which the items are sampled. If the population distribution is normal, then the sampling distribution of the mean is also normal for samples of all sizes.
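A short simulation can make this concrete. The sketch below assumes a normal population with a made-up mean of 50 and standard deviation of 10. It draws many samples of size 25 and looks at how the sample means are distributed. All constants are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(42)
MU, SIGMA = 50.0, 10.0   # assumed population mean and standard deviation
N = 25                   # size of each sample
REPEATS = 5000           # number of repeated samples

# One sample mean per repeated sample.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPEATS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:  ", round(sd_of_means, 3))
print("sigma / sqrt(n):     ", round(SIGMA / math.sqrt(N), 3))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to sigma divided by the square root of n.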
The Sampling Distribution of the Sample Proportion

If repeated random samples of a given size n are taken from a population of values for a categorical variable, where the proportion in the category of interest is p, then the mean of all sample proportions (p-hat) is the population proportion (p).

As for the spread of all sample proportions, theory dictates the behavior much more precisely than saying that there is less spread for larger samples. In fact, the standard deviation of all sample proportions is directly related to the sample size n, as indicated below.

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
Since the sample size n appears in the denominator of the
square root, the standard deviation does decrease as
sample size increases. Finally, the shape of the
distribution of p-hat will be approximately normal as long
as the sample size n is large enough. The convention is to
require both np and n(1 – p) to be at least 10.
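The three facts about p-hat (mean, spread, and shape) can be put into a small helper. This is a minimal sketch: the function name `proportion_sampling_distribution` and the value p = 0.3 are assumptions chosen for illustration, and the spread uses the standard result sqrt(p(1 - p)/n).

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, normal_ok) for the sample proportion."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    # Convention from the text: both np and n(1 - p) should be at least 10.
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok

for n in (25, 100, 400):
    mean, sd, normal_ok = proportion_sampling_distribution(0.3, n)
    print(f"n={n:4d}  mean={mean}  sd={sd:.4f}  approx. normal: {normal_ok}")
```

Quadrupling the sample size halves the standard deviation, and with p = 0.3 a sample of 25 is too small for the normal approximation under the "at least 10" convention.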
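We can summarize all of the above by the following: if np and n(1 – p) are both at least 10, p-hat is approximately normally distributed with mean p and standard deviation

$$\sqrt{\frac{p(1-p)}{n}}$$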
Student’s t-distribution:

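When the population standard deviation (σ) is not known and the sample is of a small size (i.e., n < 30), we use the t distribution for the sampling distribution of the mean and work out the t variable as:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

with n − 1 degrees of freedom, where

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

i.e., the sample standard deviation.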
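As a worked illustration, the t variable can be computed from a small sample using the standard one-sample form t = (x-bar − μ) / (s / √n). The data values and the hypothesized mean of 12.0 below are made up for the example, and `t_statistic` is a hypothetical helper name.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a sample and a hypothesized mean mu0."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Made-up measurements, n = 8 (a small sample, so the t distribution applies).
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
t, df = t_statistic(sample, mu0=12.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

The resulting t value would then be compared with a t table at n − 1 degrees of freedom.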


The F-distribution is also called the Variance Ratio Distribution, as it usually describes the ratio of the variances of two normally distributed populations. The F-distribution is named after R.A. Fisher, who first studied it in 1924.
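Symbolically, the quantity F is distributed as an F-distribution with ν₁ = n₁ − 1 and ν₂ = n₂ − 1 degrees of freedom and is represented as:

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$

where S₁² is the unbiased estimator of σ₁² and is calculated as:

$$S_1^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2}{n_1 - 1}$$

and S₂² is the unbiased estimator of σ₂² and is calculated as:

$$S_2^2 = \frac{\sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_2 - 1}$$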
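The variance ratio can be sketched in a few lines. The example assumes the usual form F = (S₁²/σ₁²) / (S₂²/σ₂²), which reduces to S₁²/S₂² when the two population variances are taken to be equal. The two samples and the helper name `variance_ratio` are illustrative assumptions.

```python
import statistics

def variance_ratio(sample1, sample2, sigma1_sq=1.0, sigma2_sq=1.0):
    """Return (F, df1, df2); the default sigmas correspond to equal population variances."""
    s1_sq = statistics.variance(sample1)  # unbiased estimator, divides by n1 - 1
    s2_sq = statistics.variance(sample2)  # unbiased estimator, divides by n2 - 1
    f = (s1_sq / sigma1_sq) / (s2_sq / sigma2_sq)
    return f, len(sample1) - 1, len(sample2) - 1

# Made-up samples from two populations.
sample1 = [2.0, 4.0, 6.0, 8.0, 10.0]
sample2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

f, df1, df2 = variance_ratio(sample1, sample2)
print(f"F = {f:.4f} with ({df1}, {df2}) degrees of freedom")
```

Here the sample variances are 10 and 3.5, so F is about 2.86 with 4 and 5 degrees of freedom.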
The Chi Square distribution

A standard normal deviate is a random draw from the standard normal distribution. The Chi Square distribution is the distribution of the sum of squared standard normal deviates. The degrees of freedom of the distribution is equal to the number of standard normal deviates being summed. Therefore, Chi Square with one degree of freedom, written as χ²(1), is simply the distribution of a single normal deviate squared. The area of a Chi Square distribution with one degree of freedom below 4 is the same as the area of a standard normal distribution between −2 and 2, since 4 is 2².
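This definition can be checked by simulation. The sketch below builds Chi Square values by summing squared standard normal deviates. It relies on the fact that a squared deviate is below 4 exactly when the deviate itself lies between −2 and 2. The seed, the number of draws, and the helper name `chi_square_draw` are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)

def chi_square_draw(df):
    """One Chi Square value: the sum of df squared standard normal deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

# One degree of freedom: share of squared deviates below 4.
draws_1 = [chi_square_draw(1) for _ in range(20000)]
share_below_4 = sum(d < 4 for d in draws_1) / len(draws_1)

# Area of the standard normal distribution between -2 and 2.
z = NormalDist()
area_between = z.cdf(2) - z.cdf(-2)

# Five degrees of freedom: the average should be close to 5.
draws_5 = [chi_square_draw(5) for _ in range(20000)]
mean_5 = fmean(draws_5)

print(round(share_below_4, 4), round(area_between, 4), round(mean_5, 3))
```

The simulated share and the normal area should agree closely, and the mean of the five-degree draws should be near the degrees of freedom.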
TYPES OF ESTIMATE

A point estimate is a single numerical value used to estimate the corresponding population parameter.

An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, most likely includes the parameter being estimated.
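The difference between the two kinds of estimate can be shown with a sample mean. This sketch assumes a large-sample normal approximation for the interval, with the sample standard deviation standing in for the unknown population value. The data are simulated, and the helper name `mean_confidence_interval` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean."""
    n = len(sample)
    point = fmean(sample)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(n)
    return point, (point - margin, point + margin)

# Simulated sample of 40 observations (illustrative values only).
rng = random.Random(3)
sample = [rng.gauss(170.0, 8.0) for _ in range(40)]

point, (lower, upper) = mean_confidence_interval(sample)
print(f"point estimate: {point:.2f}")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level, for example to 0.99, widens the interval around the same point estimate.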
Sampling Distribution of Mean

Sampling distributions serve two purposes:

(1) they allow us to answer probability questions about sample statistics, and

(2) they provide the necessary theory for making statistical inference procedures valid.

A sample statistic is a descriptive measure, such as the mean, median, variance, or standard deviation, that is computed from the data of a sample.
Central Limit Theorem

The Central Limit Theorem states that the distribution of the sum of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution. If X1, X2, X3, ..., Xn are n random variables which are independent and have the same distribution with mean µ and standard deviation σ, then as n tends to infinity (n → ∞), the limiting distribution of the standardized mean, (X̄ − µ) / (σ / √n), is the standard normal distribution.
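A simulation with a clearly non-normal population illustrates the theorem. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), which is strongly skewed. It standardizes the means of many samples of size 50. The seed and sizes are arbitrary.

```python
import math
import random
from statistics import fmean

rng = random.Random(2024)
MU = SIGMA = 1.0   # mean and standard deviation of an exponential population with rate 1
N = 50             # observations per sample
REPEATS = 5000     # number of samples

def standardized_mean(n):
    """Draw one sample of size n and return (x_bar - mu) / (sigma / sqrt(n))."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return (fmean(sample) - MU) / (SIGMA / math.sqrt(n))

z_values = [standardized_mean(N) for _ in range(REPEATS)]

# For a standard normal variable, about 95% of values lie within +/- 1.96.
share_within = sum(abs(z) <= 1.96 for z in z_values) / REPEATS
print("share of standardized means within +/-1.96:", round(share_within, 3))
```

Even though individual exponential values are far from normal, the share of standardized means within ±1.96 should come out close to 0.95.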
The normal distribution

The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.

The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.
Example of Normally Distributed Data: Heights

Height data are normally distributed. The distribution in this example fits real data that I collected from 14-year-old girls during a study.
As you can see, the distribution of heights follows the typical
pattern for all normal distributions. Most girls are close to the
average (1.512 meters). Small differences between an
individual’s height and the mean occur more frequently than
substantial deviations from the mean. The standard deviation is
0.0741m, which indicates the typical distance that individual girls
tend to fall from mean height.

The distribution is symmetric. The number of girls shorter than average equals the number of girls taller than average. In both tails of the distribution, extremely short girls occur as infrequently as extremely tall girls.
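Using the mean (1.512 m) and standard deviation (0.0741 m) quoted above, the fitted normal distribution can be queried directly. The sketch uses Python's `statistics.NormalDist`. The 1.60 m cut-off is an arbitrary value chosen for illustration.

```python
from statistics import NormalDist

# Normal distribution fitted to the height data quoted in the text.
heights = NormalDist(mu=1.512, sigma=0.0741)

# Share of girls within one standard deviation of the mean.
share_within_one_sd = heights.cdf(1.512 + 0.0741) - heights.cdf(1.512 - 0.0741)

# Share of girls taller than an arbitrary cut-off of 1.60 m.
share_taller_than_160 = 1 - heights.cdf(1.60)

# Symmetry: half of the girls are shorter than the mean.
share_shorter = heights.cdf(1.512)

print(round(share_within_one_sd, 4))
print(round(share_taller_than_160, 4))
print(round(share_shorter, 4))
```

The symmetry described above shows up as exactly half of the distribution lying below the mean.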
Parameters of the Normal Distribution

The normal distribution has two parameters, the mean and standard
deviation.

Mean
The mean is the central tendency of the distribution. It defines the location of the peak for normal distributions. Most values cluster around the mean. On a graph, changing the mean shifts the entire curve left or right on the X-axis.
Standard deviation

The standard deviation is a measure of variability. It defines the width of the normal distribution. The standard deviation determines how far away from the mean the values tend to fall. It represents the typical distance between the observations and the average.
On a graph, changing the standard deviation either tightens or
spreads out the width of the distribution along the X-axis. Larger
standard deviations produce distributions that are more spread
out.
When you have narrow distributions, the probabilities are higher
that values won’t fall far from the mean. As you increase the
spread of the distribution, the likelihood that observations will be
further away from the mean also increases.
Population parameters versus sample estimates

The mean and standard deviation are parameter values that apply to entire populations. For the normal distribution, statisticians signify the parameters by using the Greek symbol μ (mu) for the population mean and σ (sigma) for the population standard deviation.

Unfortunately, population parameters are usually unknown because it’s generally impossible to measure an entire population. However, you can use random samples to calculate estimates of these parameters. Statisticians represent sample estimates of these parameters using x̄ for the sample mean and s for the sample standard deviation.
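The relationship between parameters and estimates can be illustrated by pretending the population values are known and then estimating them from a random sample. The values μ = 100 and σ = 15 below are hypothetical, as are the seed and sample size.

```python
import random
from statistics import fmean, stdev

rng = random.Random(7)
MU, SIGMA = 100.0, 15.0   # hypothetical population parameters (normally unknown)

# A random sample of 200 observations from the population.
sample = [rng.gauss(MU, SIGMA) for _ in range(200)]

x_bar = fmean(sample)   # sample mean, the estimate of mu
s = stdev(sample)       # sample standard deviation (n - 1 in the denominator), the estimate of sigma

print(f"x-bar = {x_bar:.2f} (mu = {MU})")
print(f"s     = {s:.2f} (sigma = {SIGMA})")
```

The estimates will not match the parameters exactly, but they tend to get closer as the sample grows.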
Properties of a Normal Distribution

A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve, which has all of the following properties:

1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and is symmetric about the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis.
5. Between µ − σ and µ + σ the graph is concave down and elsewhere the graph is concave up. The points at which the graph changes concavity are called inflection points.

Most of the area under the curve (99.7%) lies within 3σ of the average, that is, between µ − 3σ and µ + 3σ.
PROBABILITY DENSITY FUNCTION

The probability density function for the normal distribution is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where x is some value between −∞ and +∞, μ is the average and σ is the standard deviation. The standard deviation is an indication of how wide the normal distribution is. The average gives the location of the normal distribution.
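The density can be written out as a small function and cross-checked against Python's built-in `statistics.NormalDist`. The function name `normal_pdf` is an assumption for this sketch. The average of 100 and the standard deviations of 5, 10, and 20 match the distributions discussed in the text.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Height of the curve at the mean for three standard deviations.
for sigma in (5, 10, 20):
    peak = normal_pdf(100, 100, sigma)
    print(f"sigma={sigma:2d}  height at the mean = {peak:.5f}")

# Cross-check one value against the standard library implementation.
print(normal_pdf(105, 100, 10), NormalDist(100, 10).pdf(105))
```

A larger standard deviation gives a lower peak, because the same total area of one is spread over a wider curve.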
The distributions below show how the normal distribution changes
as the standard deviation changes. The average is 100 and there
are three different distributions with standard deviations of 5, 10,
and 20. Note that the larger the standard deviation, the wider the
distribution. When you are making a control chart, the range chart
is actually monitoring the "width" of the distribution. The range
chart answers the following question: is the spread in my data
staying the same over time (in control) or is the spread getting
smaller or larger (out of control)?
A normal distribution has the following properties:

About 68% of the data is within +/- 1 standard deviation of the average
About 95% of the data is within +/- 2 standard deviations of the average
About 99.7% of the data is within +/- 3 standard deviations of the average
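These three percentages are rounded values and can be recomputed from the standard normal distribution. The following is a minimal check using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the average, for k = 1, 2, 3.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within +/- {k} standard deviation(s): {share:.2%}")
```

Because any normal distribution can be standardized, the same shares hold whatever the average and standard deviation are.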
If the sample size is sufficiently large, we need not know the population distribution, because the central limit theorem assures us that the sample mean (x-bar) can be approximated by a normal distribution. A sample size of more than 30 is generally considered large enough for this purpose.

Many practical samples are larger than 30. In all these cases, the sampling distribution of the mean can be approximated by a normal distribution with an expected value equal to the population mean and a variance equal to the population variance divided by the sample size n.
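As a worked example of this approximation, assume a hypothetical population with standard deviation σ = 15 and a sample of size n = 36. These numbers are chosen for illustration and are not taken from the original material.

```latex
\begin{aligned}
\bar{X} &\approx N\!\left(\mu,\ \frac{\sigma^2}{n}\right),
\qquad
\sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{36}} = 2.5 \\[4pt]
P\left(\lvert \bar{X} - \mu \rvert \le 5\right)
  &= P\left(\lvert Z \rvert \le \frac{5}{2.5}\right)
   = P\left(\lvert Z \rvert \le 2\right) \approx 0.9545
\end{aligned}
```

So under these assumptions the sample mean falls within 5 units of the population mean in roughly 95% of samples, whatever the shape of the population.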
