Lecture 1:

OVERVIEW OF THE RESEARCH PROCESS


Ophelia M. Mendoza, DrPH
Lecturer
Characteristics of a Research Process
It is a problem solving activity
It involves the application of the scientific method in investigating a problem
Conclusions made from the research process are based on empirical evidence or observed facts
DEFINITION OF RESEARCH
It is the systematic and rigorous investigation of a situation or problem in order to generate new knowledge
or validate existing knowledge
DEFINITION OF HEALTH RESEARCH
Health research is the generation of new knowledge using the scientific method to identify and deal with
health problems (COHRED, 1991)
Health research can take on many forms, from clinical trials of drugs to qualitative studies
EXAMPLES OF DIFFERENT TYPES OF HEALTH RESEARCH: CLINICAL TRIALS
Experimental designs used by clinicians and epidemiologists to evaluate drugs, medical devices and clinical
or healthcare procedures
The most common form of a clinical trial is the randomized, controlled, double blind clinical trial
PRE-CLINICAL STUDIES
Experiments done prior to testing drugs in humans for purposes of:
Isolating and characterizing active compounds
Testing of absorption, distribution, metabolism, excretion and toxicological properties (ADME/Tox)
Pharmacology and toxicology in animals
Establishing no observable adverse effect levels to determine the dosage to be used for the initial Phase 1 clinical
trial of the drug
PHASES OF CLINICAL TRIALS
PHASE 1
Perform initial human testing in a small group of healthy volunteers (about 20-100)
Major goal is to determine if the drug is safe in humans
PHASE 2
Test in a small group of patients (about 100-500)
Objective is to determine possible short-term side effects and risks associated with the drug, and whether it works
according to the expected mechanism
PHASE 3
Test in a large group of patients (about 1000-5000) to show safety and efficacy
PHASE 4
Post-marketing surveillance of the drug to determine long-term safety and reassess effectiveness, acceptability and
continued use under normal field settings

EXAMPLES OF DIFFERENT TYPES OF HEALTH RESEARCH: HEALTH SERVICES RESEARCH


Health services research is the multidisciplinary field of scientific investigation that studies how social factors,
financing systems, organizational structures and processes, health technologies, and personal behaviors affect
access to healthcare, the quality and cost of health care, and ultimately our health and well-being. Its research
domains are individuals, families, organizations, institutions, communities, and populations. (Ad Hoc
Committee, AHSA)
Steps in Conducting Research
1. Identify and define the research problem
1.1.1 Selecting a research topic
1.1.2 Formulating research objectives

Criteria for Selecting a Research Topic


a. Relevance
How large or widespread is the problem?
Who is affected?
How severe is the problem?
Is my topic in line with the priority areas of the funding agency to which my proposal will be submitted?
b. Avoidance of Duplication
Has the topic been investigated before?
Are there major questions which deserve further investigation?
c. Feasibility
Can the study be done given the existing resources?
Can data from the required number of samples be collected within the time frame of the study, given the
inclusion and exclusion criteria?
d. Political acceptability
Does the topic have the interest and support of the authorities?
Can policy makers be involved at an early stage?
e. Applicability of possible results and recommendations
What is the chance of the recommendation from the study being applied?
f. Urgency of the data needed
How urgently are data needed for making a decision?
g. Ethical Acceptability
How acceptable is the research to those who will be studied?
(Note: Cultural sensitivity must be considered)
Can informed consent be obtained from the subjects?
Will the condition of the subjects be taken into account?
RESEARCH OBJECTIVES
Directly related to the research problem
Reflect the questions the investigator wishes to answer at the end of the study
Provide the general direction in the conduct of the research project
2. Review the literature related to the problem identified
Uses of the review of the related literature:
a. To know more about previous studies done
Who has done previous work in the research area considered?
What research methods (design, variable definition, instrumentation, etc.) were utilized?
What problems were met and how were they resolved?
b. To establish the theoretical or conceptual framework for the research
Important things to remember about the review of related literature:
a. The literature to be reviewed must be related not only to the topic of the research but, more importantly,
to the specific objectives actually covered
b. The results of the literature review can be used to generate hypotheses, methods and comparative
data which are useful in the interpretation and discussion of results
c. The studies reviewed should be evaluated in terms of assumptions, sources, techniques,
conclusions, applications and unsolved problems
d. Related literature should be summarized by topic rather than as a running bibliography. This
means that the conclusions of authors dealing with a particular topic should be compared and
synthesized
e. If the research being proposed is pioneering, and no previous studies have been done in the
area, this has to be mentioned in the review of related literature to provide additional
basis/justification for the conduct of the proposed research

3. Revisit the research objectives and redefine the actual problem for investigation in clearer and more specific
terms
Refers to the process of reviewing, refining or fine-tuning the first draft of the general and specific
objectives based on new knowledge derived from the review of related literature.
It may involve delimiting the scope of the study without dealing with a trivial problem
4. Formulate testable hypotheses and define basic concepts and variables
Identifying attributes of the variables to be tested in the research project
* Estimating magnitudes
* Determining differences
* Looking at relationships
Formulating conceptual and operational definitions of variables
5. Construct the research design
Areas of concern include:
Study design
Methods of subject selection
Sample size
Strategies for control and manipulation of relevant variables
Establishment of criteria to evaluate outcomes
Instrumentation
Major considerations in formulating the research design:
Internal Validity
Does the study measure what it intends to measure?
Refers to the extent to which various types of biases are controlled in the study like comparability of
subjects, measurement bias and others
External Validity
Refers to the extent to which the study results can be generalized to a larger population
Covers issues related to sample selection and sample size
6. Design the tools for data collection

7. Design the plan for data analysis


Identification of statistical techniques to be applied in order to achieve the research objectives
8. Collect the data
9. Process the collected data
10. Analyze the data
11. Write the research report
12. Disseminate the results
13. Utilize the results
Comparison between the research process and food preparation
Research process                           Food preparation
1-7. Planning for the research activity    Meal planning
8. Data collection                         Marketing
9. Data processing                         Washing, paring, slicing
10. Data analysis                          Cooking
11. Report writing                         Garnishing; plating
12. Data dissemination                     Serving
13. Data utilization                       Eating
The success of a research project depends on:
How well thought out the research project is; and on
How potential problems have been identified and resolved BEFORE DATA COLLECTION BEGINS

LECTURE 2a: JUSTIFYING THE SIGNIFICANCE OF THE RESEARCH
LECTURE 2b: DEVELOPING THE CONCEPTUAL FRAMEWORK
Ophelia M. Mendoza, DrPH
Lecturer
1. JUSTIFYING THE SIGNIFICANCE OF THE RESEARCH
Convincing others that the problem is important
Explaining what is not known about the problem, hence the need for the proposed research
Providing documentation that this is actually a problem
* related literature
* available reports, statistics, documents
What is the contribution of my research to existing knowledge in this area?
How will my research results improve:
current practices?
existing policies?
Is the problem to be studied current or timely? Does it exist now?
How widespread is the problem in terms of the number of areas or people affected?
Does the problem affect important populations of special interest (ex., mothers and children, elderly, youth, etc.)?
Does the problem relate to on-going programs, projects, activities, or initiatives?

Does the problem relate to broader social, economic or health issues (ex., poverty; climate change; status
of women and children, etc.)?
Who else is concerned about the problem (ex., government; civil society; church, etc.)?
1.1 JUSTIFYING THE SIGNIFICANCE OF THE RESEARCH: HOW TO WRITE-UP THIS SECTION
a. Review your answers to the questions listed earlier.
b. Sort your answers into 2 categories according to whether they address broad or specific issues related to your
research problem
c. Arrange your answers in 1 or 2 paragraphs which justify the importance of the research problem. The
suggested flow of the discussion is one which follows an inverted triangle, starting with broad issues, then
focusing on specific issues related to particular groups or settings to be studied in the proposed research
1.2 JUSTIFYING THE SIGNIFICANCE OF THE RESEARCH: FLOW OF DISCUSSION
BROAD ISSUES

SPECIFIC ISSUES

1.3 IDENTIFICATION OF END-USERS AND TARGET BENEFICIARIES


Who can use, apply or benefit from the results of my research?
These can be specific persons, groups, agencies or institutions
Each end-user/target beneficiary may have a different use or can benefit from the research results in a
different way
The proponent must describe concisely and specifically how each end-user/target beneficiary can apply
or benefit from the research results
1.3.1 IDENTIFICATION OF END-USERS AND TARGET BENEFICIARIES: EXAMPLE
Title of Research:
Capacities and Needs Assessment for Health Emergency Management among conflict-affected and disaster-prone LGUs in the Ligawasan Wetlands Biodiversity Reserve (LWBR)
1.3.2 IDENTIFICATION OF END-USERS AND TARGET BENEFICIARIES: BAD EXAMPLE
The following are the end-users and target beneficiaries of this research:
LGUs of disaster-prone areas
Legislators at the regional and local levels
Academicians/researchers
Residents in disaster-prone communities
1.3.3 IDENTIFICATION OF END-USERS AND TARGET BENEFICIARIES: GOOD EXAMPLE (actual write-up
presented in the proposal)
This study has immense use not only for the health services providers networks and government health
functionaries and personnel in the four LGUs, but also for Local Government Units, in harnessing and
mobilizing local resources toward an integrated and harmonized health emergency planning for preparedness
and resilience.
On the policy side, legislators, both at the local and regional levels, use the results of this study to push for
more integrative approaches in capacitating local health and health-related functionaries and other personnel
down to the barangay level.

The tools for gathering data can be integrated in various social science courses, especially in the Sociology of
Disaster, and in the graduate program in Public Administration, especially in Public Policy (Health and
Emergencies in LGUs). These tools are not yet included in the catalogue of traditional methods of gathering
data in most institutions of higher learning in the region.
More importantly, communities that continue to suffer from inordinate and heavy damage to life and property
after armed conflicts and natural disasters can also learn to appreciate their pro-active role in mitigating
disasters and in lessening their vulnerabilities to health and life risks resulting from disasters.
2. DEVELOPMENT OF THE CONCEPTUAL FRAMEWORK
The conceptual framework is a written or visual presentation which explains, either graphically or in
narrative form, the main variables being studied in the proposed research and how they are related to each
other
Inputs needed in developing a conceptual framework include:
Experiential knowledge of the researcher
Technical knowledge
Research background
Personal experience
Literature review:
Prior related theory - concepts and relationships that are used to represent the world, what is happening
and why
Prior related research - how people have tackled similar problems and what they have learned
Other theory and research - approaches, lines of investigation and theory that are not obviously
relevant or previously used
In the research process, the development of the conceptual framework is done after the review of related
literature and before the formulation of the research objectives
There must be consistency between the conceptual framework presented and the research objectives to be
investigated
2.1 DEVELOPMENT OF THE CONCEPTUAL FRAMEWORK: CONVENTIONS/USUAL PRACTICES
In building the framework:
Start with the dependent/outcome variable or endpoint for intervention
Identify potential independent variables deemed to affect the dependent/outcome variable based on
empirical or theoretical evidence
Identify intervening, confounding, antecedent or mediating variables whose effects may alter the
relationship between the dependent and independent variables
Variables are presented in boxes while relationships are represented by arrows
Logical presentation of concepts is from left-to-right or top-to-bottom
Concepts are labelled briefly and concisely


EXAMPLE 2: CONCEPTUAL FRAMEWORK FOR A RESEARCH ON NUTRITION EDUCATION FOR
MOTHERS OF PRESCHOOLERS
INPUT
IEC Materials on child feeding
Trained health workers

OUTPUT
Number of nutrition education classes on child feeding conducted for mothers
Number of mothers trained on proper child feeding
OUTCOME
Change in mothers' knowledge, attitudes and practices on child feeding
IMPACT
Change in the prevalence of malnutrition

Lecture 3:
OBJECTIVE FORMULATION
Ophelia M. Mendoza, DrPH
Lecturer
Steps in the Formulation of Research Objectives

Questions asked:
What is the problem?
Why should it be studied?
What contribution can I make to the existing knowledge by studying this problem?
Step to be taken: SELECTION, ANALYSIS AND STATEMENT OF THE RESEARCH PROBLEM
Important elements of the step:
* Problem identification
* Problem prioritization
* Justification/significance of the research problem

Questions asked:
What information is already available?
Step to be taken: LITERATURE REVIEW
Important elements of the step:
* Literature and other available information
* Synthesis of previous studies done

Questions asked:
What do we hope to achieve in the research?
What questions do we want to be answered?
Step to be taken: FORMULATION OF OBJECTIVES
Important elements of the step:
* General and specific research objectives

What are Research Objectives?


They summarize what is/are to be achieved by the study
They reflect the questions the study wishes to answer
They serve as the steering wheel of the whole research process, by providing direction regarding the rest
of the steps of the research process
They are derived from the statement of the problem
They can be stated either in the form of a statement or a question
* To determine if there is a relationship between smoking and lung cancer
* Is there a relationship between smoking and lung cancer?

PROJECT vs RESEARCH OBJECTIVES


PROJECT OBJECTIVE
Describes what the project proponent wants to happen in the course of, or at the end of the project
Reflects the activities or the desired outputs of the intervention or project being considered
Example:
To develop operational guidelines for disaster management and mitigation in Zamboanga City.
RESEARCH OBJECTIVES
In the context of a project or an intervention, the research objectives reflect the questions or problems which
need to be answered about it
They can cover different phases of the project, from needs assessment to project evaluation
Example:
To assess the availability and adequacy of financial, human and technical resources for disaster management
and mitigation in Zamboanga City
GENERAL vs SPECIFIC OBJECTIVES
GENERAL OBJECTIVE
It reflects the overall purpose of the project
It states what is expected to be achieved by the study in general terms
SPECIFIC OBJECTIVES
They are statements regarding the specific questions expected to be answered in the study
They break up the general objective into smaller, logically connected parts
They systematically address the various aspects of the problem as defined in the problem statement
Example of General and Specific Objectives
GENERAL OBJECTIVE
To determine the extent and nature to which the staff of DOH Regional Office IX are exposed to occupational
stress
SPECIFIC OBJECTIVES
1. To determine the prevalence of occupational stress among the staff of DOH Regional Health Office IX;
2. To determine whether the following factors are associated with occupational stress among the staff of DOH
Regional Office IX
a. Sex of the personnel
b. Length of employment
c. Type of position (contractual vs regular; managerial vs non-managerial positions, etc.)
REASONS FOR SPECIFYING RESEARCH OBJECTIVES
To help define the focus of the study
To identify the specific variables to be measured, and avoid the collection of data which are not essential to
the problem identified
To organize the study into clearly defined parts and phases

To guide the researcher in the development of the research methodology, and orient the collection, analysis
and interpretation of the data
CHARACTERISTICS OF RESEARCH OBJECTIVES
They are phrased in such a way that they focus on what the study is attempting to solve, and cover the different
parts of the problem in a logical way.
They are clearly phrased in measurable and operational terms, specifying exactly what the researcher
wishes to do.
They are realistic and feasible, considering the constraints of local conditions.
They use action verbs which are specific enough to be measured.
SPECIFIC ACTION VERBS      NON-SPECIFIC ACTION VERBS
Determine                  Appreciate
Compare                    Understand
Compute                    Explore
Describe
SOME DOWN TO EARTH REMINDERS WHEN SPECIFYING RESEARCH OBJECTIVES
1. We were taught basic grammar in grade school for a reason: to write readable and understandable
research objectives when we get older
To describe the psychiatric needs of Hospital X through physicians assessment
2. KISS: Keep It Short and Simple
To determine the efficacy of indoctrinating
a superannuated canine with innovative maneuvers
3. Say what you mean and mean what you say.
Consider the following research objectives:
1. To determine the mean birth weight of babies born to mothers in the following age-groups: <18, 18-35, and
36-49
2. To compare the mean birth weight of babies born to mothers in the following age-groups: <18, 18-35, and 36-49
3. To compare the incidence of low birth weight among babies born to mothers in the following age-groups:
<18, 18-35, and 36-49
4. To determine if there is an association between the incidence of low birth weight and the age of the mother
5. To determine if the age of the mother is a predictor of the incidence of low birth weight
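The objectives listed above sound alike but call for different computations on the same data. The following is a minimal Python sketch, not part of the original lecture, which contrasts the mean birth weight per age group with the incidence of low birth weight per age group. The cut-off of less than 2,500 g for low birth weight, the function names and the sample records are assumptions made for illustration only.

```python
from statistics import mean

LOW_BIRTH_WEIGHT_G = 2500  # assumed cut-off: low birth weight is below 2,500 g

def age_group(mother_age):
    """Assign the mother's age to one of the groups named in the objectives.

    Assumes all mothers in the data are aged 49 or younger.
    """
    if mother_age < 18:
        return "<18"
    if mother_age <= 35:
        return "18-35"
    return "36-49"

def summarize(records):
    """Return {age group: (mean birth weight, proportion with low birth weight)}."""
    by_group = {}
    for mother_age, weight_g in records:
        by_group.setdefault(age_group(mother_age), []).append(weight_g)
    return {
        group: (mean(weights),
                sum(w < LOW_BIRTH_WEIGHT_G for w in weights) / len(weights))
        for group, weights in by_group.items()
    }

# Hypothetical records: (mother's age in years, birth weight in grams)
records = [(17, 2400), (16, 2600), (25, 3200), (30, 3000), (40, 2450), (38, 3100)]
summary = summarize(records)
for group, (mean_weight, lbw_proportion) in summary.items():
    print(group, mean_weight, lbw_proportion)
```

In this made-up data set, two age groups can have different mean birth weights and yet the same proportion of low birth weight babies, which is why the wording of the objective determines the analysis.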
Exercise: Consider the following research objectives and identify the potential problems
1. To determine reported cases of injuries and accidents related to waste management practices
2. Title: Assessment of HIV spread among the most vulnerable populations of Province X
General objective:
To assess the possible spread of HIV among the most vulnerable populations of Province X
3. Determine the utilization of health delivery facilities to support the health needs and problems of the
community.
4. To determine the health condition of women in the rural areas of Province X along the following:
Household morbidity
Household disability
Child and household deaths

Pregnancy/maternal and child care


Fertility and family planning
Immunization of children
Household diet and nutrition

LECTURE 4:
DEFINITION OF VARIABLES AND DATA COLLECTION
Ophelia M. Mendoza, DrPH
1. DEFINITION OF VARIABLES
The main questions to be answered when defining variables are:
a. What variables are needed for this research?
b. What specific data elements need to be collected in order to measure this variable?
These questions have to be considered in three phases of the research process, namely:
a. the formulation of research objectives;
b. the development of data collection tools; and
c. data analysis
1.1 Formulation of Specific Objectives
involves transforming abstract concepts presented in the theoretical or conceptual framework into
observable and measurable indicators

Consider this example:


Student Characteristics, Teacher Characteristics, School Characteristics -> Academic Achievement

Based on the above conceptual framework, the following research objective can be formulated:
To determine the effect of school characteristics on the academic achievement of students
The above research objective can be rephrased as follows, after making it more specific:
To compare the grades of students in large and small class sizes
1.2 Development of Data Collection Tool
involves identifying variables needed to measure or compute the indicators of interest
Nutritional status: weight, age, sex, height

involves selecting an indicator which is the most relevant to the phenomenon of interest or outcome
considered

Grades:
grade per course
weighted average grade for all courses
grades for selected courses

1.3 Data Analysis


involves formulating operational definitions
needs to answer the question: How will data be treated in the analysis?
decision needs to be made in determining categories of variables


1.3.1 Frequently used bases for categorizing variables
a. Existing/standard definitions or cut-off points
b. Statistical measures of location (ex., quintiles, deciles, etc.)
c. Objectives of the study
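The first two bases above can be sketched in Python as follows. This is an illustration only: the adult body mass index cut-offs of 18.5, 25 and 30 are used as an example of standard cut-off points, and the function names and constants are hypothetical, not taken from the lecture.

```python
from bisect import bisect_right
from statistics import quantiles

# Assumed example of standard cut-off points (adult body mass index, kg/m^2)
BMI_CUTOFFS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal", "overweight", "obese"]

def categorize_by_cutoffs(value, cutoffs, labels):
    """Place a value into a category using fixed cut-off points sorted in ascending order."""
    # bisect_right counts how many cut-offs are less than or equal to the value
    return labels[bisect_right(cutoffs, value)]

def quintile_group(value, data):
    """Return the quintile (1 to 5) of a value relative to the study data."""
    cut_points = quantiles(data, n=5)  # four cut points separating five groups
    return bisect_right(cut_points, value) + 1

# Usage with hypothetical values
print(categorize_by_cutoffs(22.0, BMI_CUTOFFS, BMI_LABELS))
print(quintile_group(50, list(range(1, 101))))
```

Standard cut-offs make results comparable across studies, while quintiles or deciles depend on the distribution of the data actually collected.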
1.3.2 Types of variables
When data analysis requires looking at relationships between variables, three types of variables can be
considered:
a. dependent variable - refers to the output, outcome or the response variable
b. independent variable - refers to the variable which is presumed to cause, affect, influence or stimulate the
outcome
c. control variable
a variable which by itself may produce changes which may be mistaken for the effect of the independent
variable being considered
needs to be controlled, held constant or randomized so that its effects are neutralized, cancelled out or
equated for all conditions
1.3.3 Levels of Variable Definition
A variable may be defined at three levels as follows:
a. Conceptual definition -- refers to the concept or issue of interest (ex., academic achievement of a student)
b. Operational definition -- actual data element collected in the study, in order to obtain information on the issue
of interest. Using the example in a) above, students' grades may be used as the operational definition of the
concept of academic achievement of a student
c. Variable definition -- variable constructed from the data, to be used in the analysis. If, for example, students'
grades will be used as the operational measure of academic achievement, there are several options as to how
grades will be used in data analysis. For example, will the individual grades for various subjects be used,
or will they be summarized in the form of an average grade? If the average will be used as a summary
figure, what kind of average grade will be computed: the arithmetic average or the weighted average
of grades?
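The choice between the two averages matters because they can give different values for the same student. A minimal Python sketch follows, assuming that the weights are the credit units of each course; the grades and units shown are hypothetical.

```python
def arithmetic_average(grades):
    """Simple mean: every course counts equally."""
    return sum(grades) / len(grades)

def weighted_average(grades, units):
    """Mean weighted by course units (assumed here to be the weights)."""
    if len(grades) != len(units):
        raise ValueError("grades and units must have the same length")
    return sum(g * u for g, u in zip(grades, units)) / sum(units)

# Hypothetical student: three courses worth 3, 3 and 6 units
grades = [90, 80, 70]
units = [3, 3, 6]
print(arithmetic_average(grades))       # 80.0
print(weighted_average(grades, units))  # 77.5
```

Whichever average is chosen becomes the variable definition and should be stated before the data are analyzed.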
2. ISSUES IN DATA COLLECTION
2.1 What data do we collect?
involves transforming concepts into operational indicators
identification and definition of variables to be collected in the study
2.2 How do we collect the data?
two aspects of how:
Mode of data collection
Design
2.2.1 Modes of data collection


review records
o cheap and fast
o common problems include incompleteness of records, non-comparability of definitions, and the difficulty of
validating entries
ask people questions
o personal interviews
o self-administered questionnaires
o group discussions (focus groups, nominal groups, etc.)
observe and/or document events as they happen
o qualitative/ethnographic research

o requires rigid training of data collectors; ensuring objectivity of observations may be a problem
o can provide a more in-depth picture of non-quantitative phenomena
2.2.2 Design
Use of primary vs secondary data
Observational vs Experimental
Cross-sectional vs Longitudinal
Paired vs Independent Samples
With or without a control group
Quantitative vs qualitative approaches
2.3 From whom do we collect data?
identification of the most appropriate respondent
crucial issue when the subjects of the study are not in a position to provide answers (e.g., babies, sick
elderly)
2.4 Who should be the data collector?
Factors to consider:
required skills
cost
potential biases that can be committed
2.5 When and how often should data be collected?
Factors to consider:
objectives of the study
design
variable(s) being collected
important factor to consider when variable being studied is affected by time or seasonal patterns
2.6 Where should data be collected?
subject's home vs health facility vs public meeting place (e.g., barangay hall)
involves issues of practicality, the need to maximize study yield as well as to minimize biases
target vs sampling population
2.7 What procedures, activities or mechanisms are needed to minimize data collection problems?
memory/recall bias
lack of cooperation
Hawthorne effect (tendency of people to change their behavior because they are observed)
Observer bias
3. DESIGN OF DATA COLLECTION TOOLS
3.1 TYPES OF DATA COLLECTION TOOLS
a. Interview Schedule - a tool used by the interviewer to ask questions and record responses during the
conduct of personal interviews.
b. Questionnaire - a data collection tool which is self-administered or completed without the assistance of an
interviewer.
c. Form - a concise data collection tool. It contains only labels or names of variables (ex., age) instead of
items phrased in question form (ex., How old are you?)
d. Guide questions - a listing of questions which serve as discussion or observation guides to be used for
qualitative modes of data collection (focus group discussions, nominal group techniques, participant
observation, etc.)
3.2 GOALS IN DESIGNING DATA COLLECTION TOOLS

3.2.1 RELEVANCE
What specific kinds of data are needed by the researcher?
The inclusion of each item in the data collection tool must be justified in relation to:
a. why the item or question will be asked (RESEARCH OBJECTIVES)
b. what will be done with the information (DATA ANALYSIS)
3.2.2 ACCURACY
enhanced when the wording and the sequence of the items/questions are designed to facilitate recall or to
motivate the respondent to answer accurately
3.3 CONSIDERATIONS IN THE CONSTRUCTION OF DATA COLLECTION TOOLS
a. Who will make the entries?
b. Wording of questions
c. Sequence and flow of questions
d. Number of questions or items asked
e. Purpose and relevance
f. Is the questionnaire or interview schedule to be used repeatedly?
g. What type of data processing will be used?
3.4. GUIDELINES TO QUESTION DEVELOPMENT AND FORMATTING
3.4.1 General Guidelines
a. Remember that the aim of designing data collection tools is to obtain complete and accurate information
which is relevant to the objectives of the data collection activity
b. Remember that the respondent is doing the data collector a favor by providing the necessary information.
c. Justify the relevance of each question or item in the data collection tool.
Avoid extraneous or irrelevant questions
Avoid back-rider questions to the extent possible.
d. Be sensitive to concerns the respondent may have about his/her privacy
Empathy: think as a respondent when developing the data collection tool
3.4.2 Question Wording
a. Be careful about questions which require respondents to recall events or facts which occurred sometime in
the past
need to minimize recall or memory bias
the respondent can be helped in recalling events by tying-up dates with significant events
b. Use simple, generally familiar words which respondents might use in a conversation. Avoid technical jargon,
formal language and colloquialism
c. Avoid questions which are ambiguous because of a generally inadequate frame of reference
ex., How many times were you sick?
d. Avoid multi-barrelled questions. These are questions which ask for more than one item at the same
time.
e. Avoid leading questions. These are questions which are phrased in such a way that the respondent gets a
clue on what the desired response is, and will be encouraged to provide it.
f. Avoid emotionally charged words in questions which arouse positive or negative feelings which might
overshadow the specific content of the question.
3.4.3 Format
a. When arranging the sequence of the questions or items in the data collection tool, start with those which are
easy to administer or to answer. The first questions should be an attempt to:

create the respondent's interest and motivation
build the respondent's confidence in the data collection activity
b. Items or questions should be grouped according to subject areas to avoid an unnatural flow
c. Respondents should be eased into embarrassing or sensitive questions by a series of lead-in questions
d. A good data collection tool must be:
d1. easy to understand
important words and phrases are underlined or printed in italics
use of all capital letters for instructions to interviewers
d2. easy to follow
use of instructions
sequential numbering
indentation
e. Consider data processing when developing the format and listing of response options to closed response
questions
3.5 PRE-TESTING A DATA COLLECTION TOOL
3.5.1 General Guidelines
a. If the data collection tool was developed using a language/dialect which is different from the one which will
be used during the actual data collection, it should be translated first before it is pre-tested. The translated
version should be the one used for pre-testing.
b. Select a sample of individuals who are representative of the population towards which the data collection
tool is eventually intended
c. Administer the pre-test under conditions comparable to those anticipated in the actual data collection activity
d. Examine the returned trial forms or questionnaires for trouble signs: items left blank or yielding no useful
information, ambiguous answers, etc.
e. Analyze the results to assess the effectiveness of the draft of the data collection tool to yield the desired
information.
f. Make appropriate deletions, additions and modifications to the data collection tool
3.5.2 Evidence for the Need for Revision
a. High incidence of "don't know" responses, or items left blank
b. High incidence of incomplete interviews
c. Responses to one or more questions fall into only one category
d. Too many ambiguous or qualified (e.g., "It depends") responses
e. Responses elicited are irrelevant to the objectives of the data collection activity
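Signs a and c above can be checked mechanically once the pre-test responses are encoded. The following Python sketch is one possible way to do it and is not part of the lecture. The 20% threshold for "high incidence", the codes treated as missing and the function name are assumptions made for illustration.

```python
from collections import Counter

# Assumed codes for a blank item or a "don't know" response
MISSING = {None, "", "don't know"}

def pretest_flags(responses, max_missing_rate=0.2):
    """Flag questions that may need revision.

    responses maps each question id to the list of answers given in the pre-test.
    Returns {question id: [list of problems found]} for flagged questions only.
    """
    flags = {}
    for question, answers in responses.items():
        problems = []
        missing = sum(1 for a in answers if a in MISSING)
        if answers and missing / len(answers) > max_missing_rate:
            problems.append("high nonresponse")
        valid = [a for a in answers if a not in MISSING]
        if valid and len(Counter(valid)) == 1:
            problems.append("single category")
        if problems:
            flags[question] = problems
    return flags

# Hypothetical pre-test data from five respondents
responses = {
    "q1": ["yes", "no", "yes", "no", "yes"],
    "q2": ["don't know", "", "yes", None, "no"],
    "q3": ["yes", "yes", "yes", "yes", "yes"],
}
print(pretest_flags(responses))
```

The remaining signs (incomplete interviews, ambiguous or irrelevant responses) still require the researcher to read the returned forms.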

Lecture 5:
EPIDEMIOLOGIC STUDY DESIGNS
Ophelia M. Mendoza, DrPH
Lecturer
WAYS OF CATEGORIZING STUDY DESIGNS
1. Objectives of the study
descriptive vs analytical

DESCRIPTIVE                                                   ANALYTICAL
Describes                                                     Explains
Is more exploratory                                           Is more explanatory
Profiles characteristics of group                             Analyzes why group has characteristics
Focuses on what                                               Focuses on why
Assumes no hypothesis                                         Assumes a hypothesis
Does not require comparisons between groups or over time      Requires comparisons between groups or over time

WAYS OF CATEGORIZING STUDY DESIGNS


2. Degree of control of the independent variables by the investigator
observational vs experimental
3. Whether or not the study outcome has already occurred
prospective vs retrospective
4. Whether or not the data has already been collected (concept and terminology used by residency programs)
prospective vs retrospective
TYPES OF EPIDEMIOLOGIC STUDY DESIGNS
1. OBSERVATIONAL STUDIES
1.1 Case Studies/ Case Series
1.2 Cohort
1.3 Case-Control
1.4 Cross-sectional
1.5 Ecologic
2. EXPERIMENTAL STUDIES
2.1 Clinical Trials
2.2 Intervention Studies
CASE STUDY/CASE SERIES
a simple descriptive account of interesting characteristics observed either in a single patient (case) or in a
group of patients (series)
Subjects of a case study or case series are purposively selected, generally because they exhibit certain
characteristics which make them different from the typical population
generally involve patients seen over a relatively short period of time
does not include control subjects
does not involve any research hypothesis
findings generally lead to the generation of hypotheses that are subsequently investigated in a cohort,
case-control, or cross-sectional study
COHORT STUDIES
ADVANTAGES:
1. May yield information on the incidence of the disease
2. Possible to compute for the relative risk
3. The temporal relationship between exposure and disease is clearly defined.
4. The design is particularly efficient for studies involving rare exposure factors.
5. It is the strongest observational design for establishing cause-effect relationships.
DISADVANTAGES:
1. Time-consuming
2. Often requires a large sample size
3. Expensive
4. Not efficient for the study of rare diseases
5. Losses to follow-up may diminish validity
6. Changes over time in diagnostic methods may lead to biased results
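Because a cohort study follows exposed and unexposed subjects forward in time, incidence can be computed in each group and the relative risk is simply their ratio. The short sketch below illustrates the arithmetic; the counts are hypothetical and the function name is an assumption made for this example.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the cumulative incidence among the exposed to that among the unexposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed

# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed subjects develop the disease.
rr = relative_risk(30, 1000, 10, 1000)
print(f"Relative risk = {rr:.2f}")
```

A relative risk above 1 means the disease occurred more often among the exposed; a value of 1 means no association.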

CASE-CONTROL STUDIES
ADVANTAGES
1. Feasible when dealing with rare diseases
2. Requires a smaller sample size than a cohort study
3. Little problem with attrition
DISADVANTAGES:
1. Incidence rates and attributable risks cannot be computed.
2. The temporal sequence between disease and exposure may be a problem
3. Big chance for bias in the selection of cases and controls
4. Difficult to obtain information on exposure if the recall period is too long.
5. Selective survival may bias the comparison.
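Since incidence cannot be computed in a case-control study, the association between exposure and disease is usually summarized with the odds ratio, a measure the notes above do not name explicitly. The sketch below shows the calculation from a 2x2 table; the counts and the function name are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
or_estimate = odds_ratio(40, 60, 20, 80)
print(f"Odds ratio = {or_estimate:.2f}")
```

When the disease is rare, the odds ratio approximates the relative risk that a cohort study would have produced.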
POPULATION-BASED CASE-CONTROL STUDY
Cases and controls are sampled from a defined population
ADVANTAGES
1. Source population is better defined.
2. It is easier to make certain that cases and controls come from the same source population
3. The exposure histories of the controls are more likely to reflect those of persons without the
disease of interest.
HOSPITAL-BASED CASE-CONTROL STUDIES
Investigator selects cases from persons with the disease of interest who are admitted to a particular hospital
Controls are selected from persons admitted with other conditions but with no evidence of the disease of
interest
ADVANTAGES
1. Subjects are more accessible.
2. Subjects tend to be more cooperative.
3. Background characteristics of cases and controls may be balanced.
4. Easier to collect exposure information from medical records and biologic experiments
CROSS-SECTIONAL STUDIES
ADVANTAGES:
1. Less time-consuming and less costly than prospective studies
2. They often serve as the starting-point in prospective cohort studies for screening-out already existing
conditions
3. The design allows the measurement of risk, although the estimate is not precise
DISADVANTAGES:
1. It does not enable the direct estimation of risk.
2. Prone to bias from selective survival
3. Often difficult to establish the temporal sequence of exposure factor and the disease
ECOLOGICAL STUDIES
- unit of observation and unit of analysis is an aggregate rather than individual persons
- most practical design to use when exposure level is relatively homogeneous in a population but differs
between populations (ex., water quality) or when individual measurements of exposure are impossible
(ex., air pollution)
- they are used to generate hypothesis, or as a quick method of examining associations
- they cannot be used as basis for making causal inference

- its most serious flaw is the risk of ecological fallacy -- i.e., the characteristics of the geographical units
are incorrectly attributed to individuals

Lecture 6:
EXPERIMENTAL STUDY DESIGNS
Ophelia M. Mendoza, DrPH
Lecturer
FEATURES:
they provide the best evidence for testing any hypothesis or to investigate possible cause-effect
relationships
they resemble cohort studies in that they require follow-up of subjects to determine outcome
its essential distinguishing feature is that it involves action, manipulation or intervention on the part of the
investigator
it typically uses a control group as a baseline against which to compare the group(s) receiving the
experimental treatment
they are generally difficult to carry out and raise some ethical issues
DEFINITION OF TERMS
a. reference population -- the group of ultimate interest
b. experimental population -- the group actually studied
c. random allocation -- process of permitting chance to determine the assignment of subjects to sub-groups
assures similarity on the average
should be distinguished from random selection
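Random allocation as defined above can be sketched in a few lines: the subjects already enrolled in the experimental population are shuffled and then dealt out to the groups. The function name, the group labels and the seed are assumptions made for this illustration; note that nothing here selects subjects from the reference population, which is what random selection would do.

```python
import random

def randomly_allocate(subject_ids, groups=("experimental", "control"), seed=None):
    """Let chance decide which group each enrolled subject joins."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible for auditing
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    allocation = {g: [] for g in groups}
    # Deal the shuffled subjects out in turn so group sizes differ by at most one.
    for i, sid in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(sid)
    return allocation

allocation = randomly_allocate(range(1, 21), seed=2024)
for group, members in allocation.items():
    print(group, sorted(members))
```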
COMMONLY USED EXPERIMENTAL DESIGNS (based on types of treatment and measurements included in
the study)
1. One-shot Case-Study Design
Treatment     Post-test
    X             T
1.1 Uses
a. to develop ideas
b. to explore researchable hypothesis (i.e., conduct fishing expeditions)
1.2 Disadvantages
a. It is difficult, if not impossible to know if any change has occurred or to assess the degree to which the
observed behavior resulted from the treatment or intervention
b. The design cannot be used as basis for making defensible conclusions
2. One-Group Pre-test Post-test Design
Pre-test     Treatment     Post-test
   T1            X             T2
2.1 Advantages and Disadvantages
a. Pre-test provides comparison between performance by the same group before and after exposure to X
b. Its major limitation is that there is no control group to permit the assessment of the possibility that the
observed change was influenced by factors other than the treatment given
2.2 Possible Sources of Error
a. maturation -- subjects growing older, more tired, less enthusiastic or less attentive
b. testing effect -- the experience of T1 by itself may increase motivation or modify attitudes
c. changes in instrumentation -- changes in the type of test given, in scoring, in observation or interviewing
techniques or calibration of instruments which make T1 and T2 different events
3. Nonrandomized Control-Group Pre-test post-test Design
                       Pre-test     Treatment     Post-test
Experimental Group        T1            X             T2
Control Group             T1            -             T2

3.1 Characteristics
a. The design requires pre and post treatment measures for both the experimental and control groups
b. Randomization is not done in the assignment of the experimental and control groups hence this is often
referred to as quasi-experimental design
3.2 Guidelines
a. Subjects in the experimental group should not be exposed to X before the pre-treatment measure
b. The control group should be drawn from a population similar to that of the experimental group
c. Analysis of pre-treatment measures should be made to ensure comparability of the experimental and control
groups
4. Randomized Control Group Post-test Only Design
                            Pre-test     Treatment     Post-test
Experimental Group     R                     X             T2
Control Group          R                     -             T2
(R = random assignment of subjects to groups)
4.1 Characteristics
a. Requires the use of both a control and experimental group, with the assignment of subjects to groups strictly
at random. The design represents that of a true experiment.
b. Pre-treatment measures are omitted since randomization techniques ensure comparability and objectivity in
the assignment of groups
4.2 Uses
a. When pre-tests are unavailable
b. When subjects' anonymity must be maintained
c. When pre-test may interact with the intervention or treatment X
5. Variations of Nonrandomized Control Group Pre-test Post-test Design
5.1 Extending design to include additional post-treatment measures
                       Pre-test     Treatment     Post-test
Experimental Group        T1            X         T2 T3 T4 ......
Control Group             T1            -         T2 T3 T4 ......
ISSUES IN EXPERIMENTAL DESIGNS
1. What comparisons shall be made?
1.1 Between the treatment being studied and the complete absence of treatment
1.2 Between the test treatment and another treatment known or believed to be without therapeutic effect
(placebo)
1.3 Between the test treatment and another treatment of established therapeutic efficacy

1.4 Between one form of treatment and another form(s) of the same treatment (e.g., different dose levels of the
same drug, routes of administration, etc.)
1.5 Between early and later effects of the same treatment
2. Differences in composition of the study and control groups
Remedies:
a. randomization
b. stratified randomization or blocking
c. matching
d. using the patient as his own control
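Of the remedies listed, stratified randomization is the easiest to show in code: subjects are first grouped by a characteristic thought to affect the outcome, and allocation is then randomized separately within each stratum so that the study and control groups end up balanced on that characteristic. The sketch below is one simple way of doing this; the function name, the stratifying variable (sex) and the subject IDs are hypothetical.

```python
import random
from collections import defaultdict

def stratified_allocation(subjects, stratum_of, seed=None):
    """Randomize to treatment/control separately within each stratum.

    subjects: list of subject IDs; stratum_of: dict mapping ID -> stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of[s]].append(s)
    assignment = {}
    for label in sorted(strata):
        members = strata[label]
        rng.shuffle(members)
        # Alternate after shuffling so each stratum is split as evenly as possible.
        for i, s in enumerate(members):
            assignment[s] = "treatment" if i % 2 == 0 else "control"
    return assignment

# Hypothetical experimental population: subjects 1-4 are female, 5-8 are male.
subjects = list(range(1, 9))
sex = {s: ("F" if s <= 4 else "M") for s in subjects}
assignment = stratified_allocation(subjects, sex, seed=7)
print(assignment)
```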
3. Subject expectations and observer bias
Open trial -- both subject and investigator are fully aware of what treatment is being given/received
Single-blind trial -- either the subject or the investigator (usually the former) is unaware of the nature of the
treatment given
Double-blind trial -- neither subject nor person assessing efficacy is aware of the nature of the treatment given
Treble-blind trial -- the subject, data collector and the data analyst are all unaware of the nature of the
treatments given
4. Interference between treatments
Cross-over design -- subjects in each group are taken off one treatment and crossed over to the treatment
previously given to other subjects
5. Sample attrition
creates problems in statistical analysis of the data
affects the comparability of the treatment and control groups
severe side-effects in treatment group may lead the investigator to withdraw patients from the trial, leaving
only the successes; treatment will appear more successful than what it really is
treatment may be so effective that patients believe themselves to have been cured and cease taking the
medication; successes disappear from the treatment group, leaving the treatment to appear less effective than
what it really is
drop-out rate should be considered in sample size estimation
drop-out data can be used as an indicator of therapeutic usefulness and effectiveness and should be considered
when drawing conclusions from the trials
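One common way to take the drop-out rate into account in sample size estimation is to inflate the computed sample size so that the required number of subjects is still expected to remain at the end of the trial. The sketch below assumes this simple adjustment (required n divided by the expected retention proportion); the function name and the figures are hypothetical.

```python
def inflate_for_dropout(n_required, dropout_percent):
    """Number to recruit so that about n_required subjects remain after drop-outs.

    Uses n_required / (1 - dropout rate), rounded up, with integer arithmetic.
    """
    if not 0 <= dropout_percent < 100:
        raise ValueError("dropout_percent must be at least 0 and below 100")
    retained_percent = 100 - dropout_percent
    # Ceiling division without floating point.
    return -(-n_required * 100 // retained_percent)

# Hypothetical trial: 200 subjects needed per group, 20% expected to drop out.
n_to_recruit = inflate_for_dropout(200, 20)
print(n_to_recruit)
```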
6. Ethical issues
the random allocation of subjects to the control and experimental groups gives rise to certain ethical
questions
informed consent in writing must be obtained from subjects
BASIC CATEGORIES OF EXPERIMENTAL DESIGNS BASED ON RANDOMIZATION PROCEDURE USED
1. Completely Randomized Design (CRD)
1.1 Description
a. Used for single factor experiments, where the effect of only one variable is being studied.
b. Random samples of size n are selected from each of k populations, representing the different
categories/groups of the variable or treatment being studied.
c. The sample size per treatment group may or may not be equal. Differences in sample size do not complicate
the computations for the one-way ANOVA to be applied to analyze the data, even when manual computation is
done.
1.2 Example
An experimenter is interested in evaluating the effectiveness of 3 methods of teaching a course. Three groups
of 8 subjects each were selected at random. The subjects were then taught using one of the 3 teaching
methods being tested. Upon completion of the course, each of the sub-groups was given a common test, and
their scores are shown below. Which of the 3 teaching methods is the most effective?

           Method 1     Method 2     Method 3     Overall
               3            4            6
               5            4            7
               2            3            8
               4            8            6
               8            7            7
               4            4            9
               3            2           10
               9            5            9
TOTAL         38           37           62          137
MEAN         4.75         4.62         7.75         5.71

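The one-way ANOVA mentioned in the description can be worked out for these scores as follows. This is a minimal sketch in plain Python rather than the lecturer's own computation; the function and variable names are assumptions made for the illustration.

```python
# Test scores from the CRD example, 8 subjects per teaching method.
scores = {
    "Method 1": [3, 5, 2, 4, 8, 4, 3, 9],
    "Method 2": [4, 4, 3, 8, 7, 4, 2, 5],
    "Method 3": [6, 7, 8, 6, 7, 9, 10, 9],
}

def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F ratio)."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(scores)
print(f"SS between = {ss_b:.2f} (df {df_b}), SS within = {ss_w:.2f} (df {df_w}), F = {f_ratio:.2f}")
```

The F ratio is compared with the tabulated F value for 2 and 21 degrees of freedom (about 3.47 at the 0.05 level). A significant F only indicates that the three means are not all equal; identifying which method is best requires a follow-up comparison of the means.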
2. Randomized Complete Blocks Design (RCBD)


2.1 Description
a. RCBD is the experimental counterpart of stratified random sampling in sample surveys.
b. The term block originated from agricultural experiments, where it referred to an area of land. Each block is
divided into k sub-blocks of equal size. Within each block, the k treatments are assigned at random to the
sub-blocks. All treatments are applied to each block.
c. In general, a block corresponds to a repetition of an experiment under essentially comparable conditions.
The principle underlying blocking of treatments is to provide a certain degree of control over the heterogeneity
of the experimental units.
d. The statistical tool used to analyze data from an RCBD is the two-way ANOVA, which tests two null
hypotheses: equality between treatments and equality between blocks.
2.2 Example
Four different machines were compared in terms of the time it takes to assemble a particular product. It was
decided to use 6 different operators to compare the machines since the operation of the machine requires a
certain amount of physical dexterity and it is known that there is a difference in the speed between operators
in operating the machines. The results of the experiment are shown in the table below:
Time (in seconds) to assemble product by machine and operator
              Machine 1   Machine 2   Machine 3   Machine 4     Total      Mean
Operator 1       42.5        39.8        40.2        41.3       163.8     40.95
Operator 2       39.3        40.1        40.5        42.2       162.1     40.52
Operator 3       39.6        40.5        41.3        43.5       164.9     41.22
Operator 4       39.9        42.3        43.4        44.2       169.8     42.45
Operator 5       42.9        42.5        44.9        45.9       176.2     44.05
Operator 6       43.6        43.1        45.1        42.3       174.1     43.52
Total           247.8       248.3       255.4       259.4      1010.9     42.12
Mean            41.30       41.38       42.57       43.23       42.12
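The two-way ANOVA for this RCBD can be sketched as below, with machines as treatments and operators as blocks. This is an illustration in plain Python, not the lecturer's computation; the function name and dictionary keys are assumptions. The Operator 1, Machine 1 time is entered as 42.5, the value that agrees with the printed row total (163.8) and column total (247.8).

```python
# Rows = operators 1-6 (blocks), columns = machines 1-4 (treatments); times in seconds.
times = [
    [42.5, 39.8, 40.2, 41.3],
    [39.3, 40.1, 40.5, 42.2],
    [39.6, 40.5, 41.3, 43.5],
    [39.9, 42.3, 43.4, 44.2],
    [42.9, 42.5, 44.9, 45.9],
    [43.6, 43.1, 45.1, 42.3],
]

def rcbd_anova(table):
    """Two-way ANOVA without replication: blocks in rows, treatments in columns."""
    b, k = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (b * k)
    ss_total = sum(x * x for row in table for x in row) - correction
    ss_blocks = sum(sum(row) ** 2 for row in table) / k - correction
    ss_treatments = sum(sum(col) ** 2 for col in zip(*table)) / b - correction
    ss_error = ss_total - ss_blocks - ss_treatments
    ms_error = ss_error / ((b - 1) * (k - 1))
    return {
        "ss_treatments": ss_treatments,
        "ss_blocks": ss_blocks,
        "ss_error": ss_error,
        "f_treatments": (ss_treatments / (k - 1)) / ms_error,
        "f_blocks": (ss_blocks / (b - 1)) / ms_error,
    }

result = rcbd_anova(times)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The F ratio for machines has 3 and 15 degrees of freedom and the one for operators has 5 and 15; each is compared with the corresponding tabulated F value to test the two null hypotheses.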
3. Factorial Design
3.1 Description
a. Factorial experiments permit the experimenter to evaluate the combined or simultaneous effects of 2 or
more experimental variables.
b. Factorial experiments enable the evaluation of interaction effects. This is the effect attributed to the
combination of variables above and beyond that which can be predicted from the variables taken singly.
3.2 Example
Five lambs were assigned at random to each of 4 treatment groups (AM control; PM control; AM treated; and
PM treated). The results are shown below.
Plasma Phospholipid Levels in Lambs According to Treatment Group
Diethylstilbestrol Status     Bleeding Time: AM     Bleeding Time: PM
Control                              8.53                 39.14
                                    20.53                 26.20
                                    12.53                 31.33
                                    14.00                 45.80
                                    10.80                 40.20
Treated                             17.53                 32.00
                                    21.07                 23.80
                                    20.80                 28.87
                                    17.33                 25.06
                                    20.07                 29.33
Source: Steel, R. and Torrie, J. Principles and Procedures of Statistics, page 201
Mean Plasma Phospholipid Levels in Lambs According to Treatment Group
Diethylstilbestrol Status     Bleeding Time: AM     Bleeding Time: PM     Total
Control                             13.28                 36.53           24.91
Treated                             19.36                 27.81           23.59
Total                               16.32                 32.17           24.25

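The interaction effect described in 3.1 can be read directly from the four cell means: it is the amount by which the effect of treatment in the PM differs from its effect in the AM. The sketch below recomputes the cell means from the 20 observations, grouped in the way that reproduces the printed means, and then forms that difference. Variable names are assumptions made for the illustration, and no significance test is attempted here.

```python
# Plasma phospholipid levels, 5 lambs per cell (diethylstilbestrol status x bleeding time).
cells = {
    ("control", "AM"): [8.53, 20.53, 12.53, 14.00, 10.80],
    ("control", "PM"): [39.14, 26.20, 31.33, 45.80, 40.20],
    ("treated", "AM"): [17.53, 21.07, 20.80, 17.33, 20.07],
    ("treated", "PM"): [32.00, 23.80, 28.87, 25.06, 29.33],
}

means = {cell: sum(values) / len(values) for cell, values in cells.items()}

# Simple effect of treatment at each bleeding time.
effect_am = means[("treated", "AM")] - means[("control", "AM")]
effect_pm = means[("treated", "PM")] - means[("control", "PM")]

# If the two factors acted independently, the two simple effects would be about equal.
interaction = effect_pm - effect_am

for cell, mean in means.items():
    print(cell, round(mean, 2))
print("Treatment effect, AM:", round(effect_am, 2))
print("Treatment effect, PM:", round(effect_pm, 2))
print("Interaction contrast:", round(interaction, 2))
```

Here treatment raises the mean level in the AM but lowers it in the PM, which is the kind of pattern that the main effects taken singly would not reveal.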

4. Latin Square Design


a. This design is useful when the order in which the treatments are given may have an effect and hence needs
to be controlled. This design has been used to advantage in many kinds of experiments where two major
sources of variation, represented by the row and column variables, are present.
b. Treatments are assigned in blocks in two different ways, namely by rows and columns. Each treatment
appears only once in each row and column. Each row and each column is a complete block
c. For example, in an experiment involving 3 treatments (A, B and C) the order in which they are given may be
any of the following:
A  B  C          A  C  B
C  A  B          C  B  A
B  C  A          B  A  C
For an experiment with 4 treatment categories, the arrangement can be:
A  B  D  C
D  C  A  B
C  A  B  D
B  D  C  A
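The defining property in item b, that each treatment appears exactly once in every row and every column, is easy to check programmatically. The sketch below verifies the 4-treatment arrangement given above and also builds a square by cyclic shifting, which is one simple construction among many; both function names are assumptions made for this illustration.

```python
def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

def cyclic_latin_square(treatments):
    """Build a Latin square by shifting the treatment list one place per row."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]

# The 4-treatment arrangement from the notes, one string per row.
four_by_four = ["ABDC", "DCAB", "CABD", "BDCA"]
print(is_latin_square(four_by_four))

for row in cyclic_latin_square(["A", "B", "C"]):
    print(" ".join(row))
```

In practice the rows, columns and treatment labels of such a square are usually randomized before use, so that the actual layout is not fixed in advance.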

Lecture 7:
MOST COMMONLY USED MODES OF DATA COLLECTION IN QUALITATIVE STUDY: IN-DEPTH INTERVIEWS
AND FOCUS GROUP DISCUSSIONS
Ophelia M. Mendoza, DrPH
Lecturer
1. COMPARISON BETWEEN QUANTITATIVE AND QUALITATIVE RESEARCH METHODS
ASPECT: General framework
QUANTITATIVE
- Usually seeks to confirm hypotheses about phenomena
- Instruments use more rigid methods of eliciting and categorizing responses to questions
- Uses highly structured methods of data collection such as questionnaires, surveys, and structured observation
QUALITATIVE
- Seeks to explore phenomena
- Instruments use more flexible, iterative style of eliciting and categorizing responses to questions
- Uses semi-structured methods such as in-depth interviews, focus group discussions, and participant
observation

ASPECT: Analytical objectives
QUANTITATIVE
- To quantify variation
- To predict causal relationships
- To describe characteristics of a population
QUALITATIVE
- To describe variation
- To describe and explain relationships
- To describe individual experiences
- To describe group norms

ASPECT: Question format
QUANTITATIVE
- Closed-ended
QUALITATIVE
- Open-ended

ASPECT: Data format
QUANTITATIVE
- Numerical (obtained by assigning numerical values to responses)
QUALITATIVE
- Textual (obtained from audiotapes, videotapes, and field notes)

ASPECT: Flexibility in study design
QUANTITATIVE
- Study design is pre-determined and stable from beginning to end
- Participant responses do not influence or determine how and which questions researchers ask next
- Study design is subject to statistical assumptions and conditions
QUALITATIVE
- Some aspects of the study are flexible (for example, the addition, exclusion, or wording of particular
interview questions)
- Participant responses affect how and which questions researchers ask next
- Study design is iterative, that is, data collection and research questions are adjusted according to what is
learned

ASPECT: Method of sample selection
QUANTITATIVE
- Makes use of a representative sample selected through the application of a probability sampling design
QUALITATIVE
- Makes use of non-probability sampling designs, usually purposive, quota and snow-ball sampling

ASPECT: Data analysis
QUANTITATIVE
- Information collected is classified according to predetermined categories (deductive process)
- Succinct, quantifiable, can be presented in numerical tables and analyzed statistically
QUALITATIVE
- Information is classified into categories which are identified in the data itself through an inductive process
- Extensive, descriptive, cannot be succinctly presented, interpretation more subjective

Adapted from Family Health International (2005). Qualitative Research Methods: A Data Collector's Guide
(Module 1: Qualitative Research Methods Overview).
2. IN-DEPTH INTERVIEWS
2.1 Description of Method

a. The in-depth interview is a technique designed to elicit a vivid picture of the participant's perspective on the
research topic. It is a useful and effective method to use when the objective is to elicit individual experiences,
opinions and feelings, as well as when addressing sensitive topics.
b. During the in-depth interview, the person being interviewed is considered an expert on the topic being
considered in the research. Hence this method is also called a key informant interview (KII).
c. Subjects or respondents of an in-depth interview are purposively selected based on their position, or a specific
characteristic which makes them the best source of information regarding the topic being considered for the
research.
d. The researcher's interviewing techniques during an in-depth interview are motivated by the desire to learn
everything the participant can share about the research topic or issue being discussed. Researchers engage
with participants by posing questions in a neutral manner, listening attentively to participants' responses, and
asking follow-up questions and probes based on those responses. They do not lead participants according to
any preconceived notions, nor do they encourage participants to provide particular answers by expressing
approval or disapproval of what they say.
e. In-depth interviews are usually conducted face-to-face and involve one interviewer and one participant.
Phone conversations and interviews with more than one participant also qualify as in-depth interviews. On
average, in-depth interviews last from one to two hours.
2.2 Skills Needed in Conducting In-depth Interviews
SKILL NEEDED: Emphasizing the participant's perspective
What required skills mean:
- Treating the participant as the expert
- Keeping the participant from interviewing you
- Balancing deference to the participant with control over the interview
- Being an engaged listener
- Demonstrating a neutral attitude
Rationale:
The interviewer's perspective on the research issue should be invisible. This avoids the risk that participants
will modify their responses to please the interviewer instead of describing their own perspectives.
Tips:
- Remember that the purpose of the interview is to elicit the participant's perspective; consider yourself a
student.
- If a participant asks for factual information during the interview, write down the questions and respond after
the interview is over.
- If a participant asks what you think, deflect the question. Let the participant know that you consider his or
her point of view more important.
- Don't overcompensate for perceived status differences by giving the participant too much control over the
interview.
- Pay attention to what participants say and follow up with relevant questions and probes.
- Be aware that what you say, how you say it, and your body language can convey your own biases and
emotional reactions. Use them instead to convey neutrality and acceptance.

SKILL NEEDED: Adapting to different personalities and emotional states
What required skills mean:
- Being able to quickly adjust your style to suit each individual participant
Rationale:
Every participant has a unique character and demeanor. By adopting an appropriate demeanor for each
individual, the interviewer can help the participant be comfortable enough to speak freely about the research
topic.
Tips:
- Different interviewing styles may be needed for different participants; for example, be able to retain control
of a conversation with a dominant personality and to animate a shy participant.
- Know how to tone down heightened emotions, such as when a participant starts crying or becomes belligerent.
- Adapting to each individual may require softening the way you broach sensitive issues, adjusting your tone of
voice to be more sober or upbeat, or exhibiting increased ...


Source: Family Health International (2005). Qualitative Research Methods: A Data Collector's Guide (Module
3: In-depth Interviews).
3. FOCUS GROUP DISCUSSIONS
3.1 Description of Method
a. A focus group is a qualitative data collection method in which one or two researchers and a number of
participants meet as a group to discuss a given research topic.
b. A focus group consists of a small number (8-12) of relatively similar individuals who provide information
during a directed and moderated interactive group discussion. Focus group participants are typically chosen
based on their ability to provide specialized knowledge or insight into the issue under study.
c. Focus groups are especially effective for capturing information about social norms and the variety of
opinions or views within a population. The richness of focus group data emerges from the group dynamic and
from the diversity of the group. Participants influence each other through their presence and their reactions to
what other people say. Because not everyone will have the same views and experiences (because of
differences in age, gender, education, access to resources, and other factors), many different viewpoints will
likely be expressed by participants.
d. One researcher (the moderator) leads the discussion by asking participants to respond to open-ended
questions, that is, questions that require an in-depth response rather than a single phrase or a simple "yes" or
"no" answer. A second researcher (the note-taker) takes detailed notes on the discussion. A principal
advantage of focus groups is that they yield a large amount of information over a relatively short period of time.
They are also effective for accessing a broad range of views on a specific topic, as opposed to achieving group
consensus.
3.2 Preparing the FGD Guide Questions
a. Ask questions that encourage description and depth
One of the advantages of a focus group over a written survey is the opportunity to achieve greater depth of
understanding using open-ended rather than yes/no questions. FGD questions often begin with:
"How do you feel about ...,"
"What is your opinion of ...," or
"Please describe ...."
Particularly effective are questions that begin with "how." Beware of "why" questions, however, at the beginning
of focus groups, because they may lead participants to justify their actions or opinions.
b. Use simple, clear language

Use language participants understand. Avoid asking questions that have several possible meanings or
questions that are so long that they are difficult to follow.
c. Avoid biased or leading questions
Avoid questions that lead respondents to answer a particular way. Similarly, avoid words such as "all,"
"always," "none," "never," "only," "just," and "merely," which may bias responses.
d. Use only one concept per question
Questions addressing more than one concept may confuse participants, leading them to answer only one part
of the question or to answer neither part. The solution is to separate two ideas into two questions.
e. List areas to probe
To ensure that the moderator consistently covers specific topics in all sessions, list probes or follow-up
questions after the main question.
Sample question: What are some factors that would motivate you to enroll for one course rather than another?
Probe: Explore the following factors:
i. fit with preferred schedule
ii. interest in subject
iii. instructor's reputation
iv. course difficulty and grading
v. career concerns
f. Organize focus group topics
Focus group discussions typically begin with general questions and end with one or two specific questions tied
to the study objectives. Because a group cannot adequately discuss a long list of questions in 90 minutes,
choose 6 to 10 questions, grouping similar questions. Once participants become comfortable, they may be
more likely to answer sensitive questions, so ask these questions toward the middle of the discussion.
3.3 Conducting the FGD
3.3.1 Participants
According to Goldenkoff (2004), "The key to focus groups is participant chemistry." To encourage participation
and openness, select participants with common concerns or backgrounds who don't know each other. The
American Statistical Association (1997) cautions, "Never put people together who are in the same chain of
command," so don't include a professor and her student or an employee and his boss in the same group. It's
not necessary to randomly select participants because results from a focus group are not meant to generalize
to a larger population. The goal is to recruit enough participants to get a full range of opinion, but not so many
as to discourage participation.

3.3.2 Setting
The setting should be convenient, comfortable, and relaxing. Rooms with one-way mirrors,
conference tables, and microphones hanging from the ceiling may make participants feel like they are
performing, so make the setting informal, because people are more likely to open up if they feel at home. If
business operations are being discussed, a conference room may be fine, but for more personal topics, living
room-style seating is better. Serving light snacks and beverages can create a friendly atmosphere. If you are
using food as an incentive, however, serve it before or after the session, so it doesn't distract participants from
the discussion.
Dressing appropriately for the setting will improve rapport. It's acceptable to wear blue jeans for a student
focus group but better to wear more professional attire among program managers or administrators.
3.3.3 Moderating
An effective moderator keeps the discussion focused without discouraging the sharing of ideas and gets all
members to contribute while making sure that one or two members don't dominate. Some of the important
qualities of moderators are the following:
a. Knowledgeable: become thoroughly familiar with the topics of the focus group.
b. Enthusiastic: value your work but remain impartial.
c. Structuring: explain the purpose for the focus group; ask whether participants have questions.
d. Clear: ask simple, easy, short questions without using jargon.
e. Approachable: blend in; make sure the group can relate to you.
f. Gentle: allow people to finish; give them time to think; tolerate pauses.
g. Sensitive: listen attentively to what is said and how it is said; be empathic.
h. Open and flexible: respond to what is important to the participants.
i. Steering: know what you want to find out; keep the group focused; keep one or two members from
dominating.
j. Critical: prepare to politely challenge what is said. For example, you might question inconsistencies in
participants' replies.
k. Remembering and integrating: relate what is said to what has previously been said.
l. Interpreting: clarify and extend meanings of participants' statements without changing the meaning.
m. Inclusive: encourage reserved members to contribute by using eye contact, body language, and directly
asking for their input.
The focus group discussion begins with an introduction that explains the purpose, ground rules, and duration
(usually between 45 and 90 minutes) and conveys the expectation that everyone will contribute, all
contributions will be valued and remain confidential, and the session will be tape-recorded. Recording
increases the accuracy of your conclusions, so test your recording equipment immediately before each focus
group.
Inform participants of any exceptions to confidentiality. For example, if a participant discloses details of child
abuse or threats to his or her safety, you may be required by law to report this. Anticipate possible emotional
reactions from participants and how you will handle them.
After the introduction, the moderator typically has group members introduce themselves or uses an icebreaking
exercise to get them involved. To preserve confidentiality and commonality, the moderator should ask
members to introduce themselves by first name only and should avoid topics that emphasize differences in
status that might threaten cohesion.
For groups that focus on sensitive issues such as race or gender, the moderator's demographic background
should match that of participants.
Skilled moderators use reinforcers and probes. Reinforcers communicate interest in what members share but
don't suggest what is expected or acceptable. Use reinforcers like, "I see," or "Let me write that down," but
avoid comments like, "Excellent response," or nodding your head after some responses but not others. Try to
smile and appear open and friendly.
Be prepared to use probes such as, "Could you tell me some more about that?" "What do you mean by that?"
or "Anything else?" Allow participants time to respond, using silence in moderation to encourage someone to
expand on an answer. Nonverbal behaviors will help you judge whether a participant is uncomfortable or just
thinking about an answer. When a participant rambles or does not state a clear point of view, ask an
interpretive question, such as, "Do you mean that your priorities have shifted from developing programs to
building support for programs?"
At the end of the discussion, summarize important points to ensure you have made the correct interpretation
and to allow participants to elaborate. Always thank respondents for their participation and ask them if they
have any questions for you.
3.4 STEPS IN ANALYZING FGD RESULTS
3.4.1 Review individual transcripts

a. Get the overall picture
Note general impressions
Review specific areas (comparison of FGD objectives vs results)
b. Edit the information collected
3.4.2 Data reduction -- this refers to the process of selecting, focusing, simplifying, abstracting and
transforming the data that appears in written-up field notes and transcriptions
a. Reading and annotating
Creating categories
Assigning categories
b. Maintaining a log book of responses - done not to generate numbers or statistical tables based on the
results, but to help the researcher assess the relative importance of the different responses
3.4.3 Data display
aims to provide an organized, compressed assembly of information that permits conclusion drawing
can be in the form of an extended text, a diagram, a chart, or a matrix
allows the analyst to extrapolate from the data enough to begin to discern systematic patterns and
interrelationships
can be designed to facilitate either intra-case or cross-case analysis of the data
a. Splitting and splicing
Pattern-searching - responses are organized such that commonalities are identified
Domain analysis - similar to pattern searching, with the additional step of data being analyzed and
classified thematically
b. Linking data
c. Making connections
Discourse analysis -- the types of relationships implied in the respondents' actual statements are
determined and indicated in diagrammatic representations
3.4.4 Conclusion drawing and verification
involves analyzing what the collected data mean vis-à-vis the FGD objectives
verification entails revisiting the data as many times as necessary to verify the conclusions derived
3.4.5 Writing up the results
In analyzing FGD results, the following should be considered:
a. WORDS - Weigh the meaning of the words used by the participants. Can a variety of words and phrases
categorize similar responses?
b. FRAMEWORK - Consider the circumstances in which a comment was made (context of previous
discussions, tone and intensity of the comment)
c. INTERNAL AGREEMENT - Figure out whether shifts in opinions during the discussion were caused by group
pressure
d. PRECISION OF RESPONSES - Decide which responses were based on personal experience and give
them greater weight than those based on vague impersonal impressions
e. THE BIG PICTURE - Pinpoint major ideas. Allocate time to step back and reflect on major findings
f. PURPOSE OF THE REPORT - Consider the objectives of the study and the information needed for
decision-making. The type and scope of reporting will guide the analytical process. For example, FGD reports
typically are:
brief oral reports that highlight key findings
descriptive reports that summarize the discussion
analytical reports that provide trends, patterns, or findings and include selected comments
References:
Dawson, S., Manderson L., and Tallo, V. (1993). A Manual for the Use of Focus Groups. International Nutrition
Foundation for Developing Countries (INFDC), Boston, MA.


Family Health International (2005). Qualitative Research Methods: A Data Collector's Guide. Module 1:
Qualitative Research Methods Overview. Research Triangle Park, NC.
Family Health International (2005). Qualitative Research Methods: A Data Collector's Guide. Module 3:
In-depth Interviews. Research Triangle Park, NC.
Family Health International (2005). Qualitative Research Methods: A Data Collector's Guide. Module 4: Focus
Groups. Research Triangle Park, NC.
Frechtling, J., Sharp, L. (1997). User-Friendly Handbook for Mixed Method Evaluations. Directorate for
Education and Human Resources, National Science Foundation.
USAID Center for Development Information and Evaluation (1996). Performance Monitoring and Evaluation
TIPS: Conducting Focus Group Interviews. USAID CDIE

Lecture 8
SAMPLING DESIGNS
Ophelia M. Mendoza, DrPH
1. Advantages of Sampling
a. It is cheaper.
b. It is faster.
c. Better quality of information can be collected.
d. More comprehensive data may be obtained.
e. It is the only possible method when the procedure is destructive.
2. Definition of Terms
a. Population - the entire group of individuals or items of interest in the study
b. Target population - the group from which representative information is desired and to which inferences will
be made. Whatever conclusions are derived from the study will be generalized to the target population.
c. Sampling population - the population from which a sample will actually be taken
Ideally, the target population should be the same as the sampling population. However, there are
certain instances when there is a gap between the two, resulting from limited resources and other field
realities. When this occurs, the investigator should determine the extent and
direction of the bias (if any) created by the gap between the target and the sampling population.
d. Elementary unit or element - an object or a person on which a measurement is actually taken
e. Sampling unit - the units which are chosen in selecting the sample
f. Sampling frame - a listing or any other material, like spot maps or aerial photographs, which shows or
accounts for the target population. It is a collection of the sampling units.
3. Components of a Sampling Design
a. Where? -- geographic area to be covered by the survey
b. Who? -- elements (households, mothers, infants, etc.) to be studied in the survey. In cases when the
subjects of the survey are not in a position to provide the information (ex., young children, sick elderly), the
actual survey respondents must be indicated
c. How many? -- sample size; number of elements to be included in the survey and its basis. The values of
important parameters considered in sample size determination must be explicitly indicated (ex., specific
variable used as basis, anticipated value of the variable, confidence level, margin of error, power of the test,
etc.)
d. How to select? -- procedures to be followed in selecting the elements to be included. If stratification variables
are used, these should be mentioned with a concise justification of why they were considered. If multi-stage
sampling is used, the sampling units at each level of selection and the corresponding sampling frames used
must be mentioned.
e. When? -- refers to the time period for the conduct of the survey. This is an important consideration when the
variable being studied has seasonality
4. Basic Sampling Designs
4.1 Non-probability Sampling Designs -- the probability of each member of the sampling population being
selected in the sample is difficult to determine or cannot be specified. Hence, the reliability of the resulting
estimates cannot be assessed
4.1.1 Judgment or Purposive Sampling -- a representative sample is selected based on an expert's subjective
judgment or on some pre-specified criteria
4.1.2 Accidental or haphazard sampling -- whatever comes to hand or whoever is available is included in the
sample
4.1.3 Quota sampling -- data collectors are given quotas to meet; they keep on collecting data in a given place
until the quota is met
4.1.4 Snowball technique -- frequently used in studying hidden populations like drug users, commercial sex
workers, etc. Selection of subjects is based on whom the earlier respondents have identified as members of the
eligible population for the survey
4.2 Probability Sampling Designs -- the rules and procedures for selecting the sample and estimating the
parameters are explicitly and rigidly specified
4.2.1 Simple Random Sampling
Characteristics: Every element in the population has an equal chance of
being included in the sample
Procedures for Sample Selection:
a. Prepare the sampling frame
b. Number all the population elements in the sampling frame consecutively from 1 to N, where N is the
population size
c. Determine the required sample size, n.
d. Select n numbers at random between 1 and N, using either the lottery method or a table of random numbers
e. The population elements in the list whose numbers correspond to the n numbers randomly selected will
comprise the simple random sample
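The selection steps above can be sketched in Python. This is only an illustration: the function name, the household labels and the seed are assumptions made for the example, and the standard library's random number generator stands in for the lottery method or a table of random numbers.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n elements from a sampling frame without replacement."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)
    # Step d: draw n distinct numbers between 1 and N.
    numbers = sorted(rng.sample(range(1, N + 1), n))
    # Step e: the elements whose numbers were drawn form the sample.
    return [frame[i - 1] for i in numbers]


# Hypothetical frame of 800 households, numbered 1 to 800.
households = [f"HH-{i:03d}" for i in range(1, 801)]
sample = simple_random_sample(households, 200, seed=1)
print(len(sample), sample[:5])
```

Fixing the seed only makes the illustration reproducible; in an actual survey the draw should not be repeated until a convenient sample comes up.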
4.2.2 Stratified Random Sampling
Characteristics: This design is used when the investigator wants to:
a. ensure that groups of interest or subsections of the population
considered important for the study are adequately represented
b. derive reasonably accurate estimates for important subsections of the population
Procedures:
a. Identify the stratification variable.
b. Classify the population elements according to the categories of the stratification variable.
c. Number the population elements consecutively from 1 to N, within each category of the stratification
variable
d. Determine the sample size needed from each stratum
e. Within each stratum, select the required number of samples by simple random sampling.
A frequent question in relation to stratified random sampling is how to allocate the computed sample
size to the various strata. There are several ways of doing this, and one of the most frequently used is
proportional allocation. It is commonly used because it results in an equal probability of
selection for every element. Data analysis is therefore simpler, since there is no need to compute and apply sampling
weights when estimating population parameters. The following example shows how proportional allocation
of samples is applied to the different strata:
Suppose we want to allocate 250 samples to 3 sample barangays included in the study. These 3 barangays
have the following populations:


BARANGAY      POPULATION SIZE
              NUMBER        %
A               3000      15.0
B              10500      52.5
C               6500      32.5
TOTAL          20000     100.0

The 250 samples can be allocated to the 3 barangays to reflect the population distribution as follows:
BARANGAY      POPULATION SIZE          SAMPLE SIZE
              NUMBER        %          NUMBER        %
A               3000      15.0             38      15.0
B              10500      52.5            131      52.5
C               6500      32.5             81      32.5
TOTAL          20000     100.0            250     100.0
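The allocation in the table can be reproduced with a short Python sketch. The function name is hypothetical, and the handling of rounding (round down, then give the leftover units to the strata with the largest fractional parts) is an assumption added so that the allocations always sum to the total sample size; the handout itself simply rounds.

```python
import math


def proportional_allocation(strata_sizes, n):
    """Allocate n sample units to strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: math.floor(x) for name, x in exact.items()}
    # Hand out the units lost to rounding down, largest fraction first.
    leftover = n - sum(alloc.values())
    by_fraction = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


barangays = {"A": 3000, "B": 10500, "C": 6500}
allocation = proportional_allocation(barangays, 250)
print(allocation)  # {'A': 38, 'B': 131, 'C': 81}
```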

4.2.3 Systematic Sampling


Characteristics:
a. Every element has an equal chance of being selected.
b. It is often used under the following conditions:
the population elements are too many to list or to number consecutively
a frame is not available
c. It is often used in combination with other designs
Procedures:
a. Determine the required sample size, n.
b. Determine the sampling interval, k, where:
$$k = \frac{\text{population size}}{\text{sample size}} = \frac{N}{n}$$
c. Select a number at random between 1 and k. The population element in the frame corresponding to the
random number selected will be the first to be included in the sample
d. Include in the sample every kth population element after the first one selected
Comparison of the method of sample selection between simple random, stratified random and systematic sampling.
Suppose we have the following:
N = 800 households, of which $N_{urban}$ = 320 and $N_{rural}$ = 480
n = 200 households, of which $n_{urban}$ = 80 and $n_{rural}$ = 120

SAMPLING DESIGN: Simple random sampling
SAMPLING FRAME: List of 800 households, numbered consecutively from 1 to 800
METHOD OF SAMPLE SELECTION: Select 200 numbers at random between 1 and 800

SAMPLING DESIGN: Stratified random sampling
SAMPLING FRAME: Two sampling frames are needed:
a. For urban areas: list of 320 urban households, numbered consecutively from 1 to 320
b. For rural areas: list of 480 rural households, numbered consecutively from 1 to 480
METHOD OF SAMPLE SELECTION: Urban and rural samples are selected separately, as follows:
a. For urban areas, 80 numbers are selected at random between 1 and 320
b. For rural areas, 120 numbers are selected at random between 1 and 480

SAMPLING DESIGN: Systematic sampling
SAMPLING FRAME: Not needed
METHOD OF SAMPLE SELECTION:
1. Compute the sampling interval, k, where k = N/n. Therefore k = 800/200 = 4. This means that for
every 4 households in the population, 1 household will be selected as sample
2. Select a random number between 1 and 4. Suppose #2 was selected. Therefore the second
household in the population to be studied is included as sample.
3. Every 4th household thereafter will be included in the study. These include households number
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, etc.
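The systematic selection in this example can be sketched as follows. The function name and its parameters are hypothetical, and the sketch assumes that N is an exact multiple of n so that the sampling interval is a whole number.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Return the positions (1-based) selected by systematic sampling."""
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        # Random start between 1 and k.
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("random start must be between 1 and k")
    # Take the start, then every kth element after it.
    return list(range(start, N + 1, k))[:n]


selected = systematic_sample(800, 200, start=2)
print(selected[:10])  # [2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
```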
4.2.4 Cluster Sampling
Characteristics:
a. It is used when a frame for the individual elementary units in the
population is not available. However, a frame for groups or clusters of
elements is available.
b. The sampling unit is different from the elementary unit.
Procedures:
a. Identify the groups or clusters of elementary units. It is best if the sizes of
the clusters are not too big and do not vary much from each other.
b. Select a random sample of clusters.
c. All elements in the selected clusters will be included in the survey.
4.2.5 Multi-stage Sampling
Characteristics:
a. It is generally used when the survey has a wide coverage and a sampling
frame for the elementary units is difficult to obtain.
b. Sampling is done in successive stages.
c. Data collection is concentrated only on the samples selected at each stage, resulting in lower cost per unit of
inquiry.
d. Stratification and systematic sampling may be incorporated at any stage.
e. Statistical analysis of the data is more complicated.
Procedures for Sample Selection
a. Identify the number of stages of selection to be used in the sampling
design and the sampling units to be used at each stage.
b. Determine the sample size necessary for each stage of selection.
c. Prepare the sampling frame for the 1st stage of selection, and select at random a sample of primary
sampling units (PSUs).
d. For each of the PSUs earlier selected, prepare the sampling frame for the 2nd stage of selection. Randomly
select the corresponding number of secondary sampling units (SSUs) from each PSU included in the sample.
e. Repeat the process of frame preparation and sample selection until the last stage of sampling is reached.
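A two-stage version of the procedure above might be sketched as follows. Everything named here (the function, the barangay and household labels, the sample sizes) is hypothetical. For brevity the sketch holds the household lists of all PSUs in one dictionary, whereas in the field the second-stage frame would be prepared only for the PSUs actually selected.

```python
import random


def two_stage_sample(frame, n_psu, n_ssu, seed=None):
    """frame maps each primary sampling unit (PSU) to a list of its secondary units (SSUs)."""
    rng = random.Random(seed)
    # Stage 1: select a random sample of PSUs.
    psus = rng.sample(sorted(frame), n_psu)
    sample = {}
    for psu in psus:
        # Stage 2: within each selected PSU, select SSUs at random.
        units = frame[psu]
        sample[psu] = rng.sample(units, min(n_ssu, len(units)))
    return sample


# Hypothetical frame: 10 barangays (PSUs) with 50 households (SSUs) each.
frame = {f"Barangay {b}": [f"B{b}-HH{h}" for h in range(1, 51)] for b in range(1, 11)}
chosen = two_stage_sample(frame, n_psu=3, n_ssu=5, seed=7)
for psu, households in chosen.items():
    print(psu, households)
```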

LECTURE 9
SAMPLE SIZE DETERMINATION
Ophelia M. Mendoza, DrPH
1. WHEN IS IT IMPORTANT TO DETERMINE THE ADEQUATE SAMPLE SIZE FOR A GIVEN STUDY?
When the study is based on a sample instead of the whole population
When a probability sampling design is used to select the sample
When it is important to derive precise estimates of the variables/parameters being studied
2. GENERAL COMMENTS ABOUT SAMPLE SIZE DETERMINATION:


2.1 Sample size determination is a complicated issue which needs a lot of:
a. statistical inputs
b. practical considerations
2.2 The formula for sample size determination differs according to:
a. type of study design
b. type of sampling design
c. type of variables being measured
d. study objectives
e. number of groups being studied and compared
2.3 The following generalizations can be made regarding the sample size requirements of a given study:
a. longitudinal study designs require larger samples than cross-sectional or case-control study designs
b. cluster sampling designs require larger samples than simple random sampling of elementary units
c. the smaller the value of the parameter being estimated, the larger the sample size needed
rare conditions
small differences
d. the more heterogeneous the variable is in the population, the larger the sample size that is necessary
e. the more precise and the higher the confidence level you wish to have for the resulting estimates, the larger
is the sample size needed
3. INFORMATION NEEDED FOR SAMPLE SIZE DETERMINATION
3.1 ESTIMATING A MEAN OR A PROPORTION:
a. the anticipated value of the parameter to be estimated in the study (e.g., the prevalence of the disease; the
average length of stay in the hospital, etc.)
Possible sources of this value are:
i. previous studies or past records
ii. values derived from the pre-test or pilot phase of the project
iii. an expert's opinion or an educated guess
iv. conducting the study in two parts
b. the degree of precision required for the resulting estimates (margin of error) - this can be expressed either
in absolute (ex., 5%) or in relative terms (ex., 5% of the value of the resulting estimate). The value of an
acceptable margin of error will depend on:
the magnitude/level of the parameter being estimated
how the results of the study will be used
available resources for the conduct of the study
c. the desired confidence level - standard levels used are 90%, 95% and 99%, with 95% being the most
commonly used
d. the estimated degree of variability of the observations (variance or the standard deviation)
3.2 SAMPLE SIZE FORMULA FOR ESTIMATING PROPORTIONS, USING SIMPLE RANDOM
SAMPLING
$$n = \frac{z^2 P Q}{d^2}$$
where:
z = a value derived from the normal distribution which depends on the desired confidence level
for the estimate. The z-values corresponding to the standard confidence levels
used when deriving estimates in research studies
are as follows:
Confidence level      z-value
90%                   1.645
95%                   1.96
99%                   2.58
P = anticipated value of the proportion to be estimated in the population
Q = 1 - P (the complement of P, where P + Q = 1)


d = the margin of error or maximum permissible error; a measure of the desired level of precision for the
resulting estimates
3.2.1 EXAMPLE:
A Municipal Health Officer wishes to conduct a survey to determine the prevalence of malnutrition among
preschoolers in his area. The only background data available is the result of a study done by his predecessor 5
years ago which indicates that the prevalence of moderate and severe malnutrition among preschoolers is
25%. If he decides to select a random sample of preschoolers for his study, how big should his sample size be
if he sets his margin of error at 5%, with 95% confidence?
For this example:
z = 1.96 (based on the desired confidence level of 95%)
P = 0.25 (malnutrition prevalence based on a survey done 5 years ago)
Q = 1 - 0.25 = 0.75
d = 0.05
Therefore,
$$n = \frac{(1.96)^2 (0.25)(0.75)}{(0.05)^2} \approx 288$$
Note that sample sizes for single proportions corresponding to a given confidence level have already been
tabulated. An example is Table 1 from the book of Lwanga and Lemeshow which is included in your handout.
From this table, we get the same sample size of 288.
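The computation can be checked with a few lines of Python. The function name is hypothetical; the inputs are those of the example. The second call uses P = 0.50 with the same z and d, purely as an illustration of how the required sample size grows as P approaches 0.50.

```python
def sample_size_proportion(p, d, z=1.96):
    """Sample size for estimating a proportion under simple random sampling."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2


n = sample_size_proportion(p=0.25, d=0.05, z=1.96)
print(round(n, 2))  # 288.12

n_max = sample_size_proportion(p=0.50, d=0.05, z=1.96)
print(round(n_max, 2))  # 384.16
```

The unrounded value for the example is about 288.1, which the handout and the Lwanga and Lemeshow table report as 288.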
3.3 COMPARING TWO SAMPLE PROPORTIONS, P1 AND P2
3.3.1 INFORMATION NEEDED:
a. Anticipated values of the two proportions, P1 and P2
b. Magnitude of the difference between P1 and P2 which the investigator regards as clinically or
practically meaningful and which should be detected by the statistical test. If targets have been specified for
the amount of change observed in the indicators being studied in the research (ex., a 30% decrease in the
prevalence of malnutrition among preschoolers between the intervention and the control groups) the difference
between P1 and P2 will be equal to the target
c. Desired level of significance of the test ($\alpha$) - this refers to the degree of confidence with which it is desired
to be certain that an observed change or comparison group difference of the magnitude specified in (b) above
would not have occurred by chance. The conventional levels of significance used are 1%, 5% and 10%, with
5% being the most commonly used.
d. Desired power of the test ($1-\beta$) - this refers to the degree of confidence with which it is desired to be certain
that an actual change or difference of the magnitude specified in (b) above will be detected. Conventional
levels of statistical power used are 80% and 90%
3.3.2 SAMPLE SIZE FORMULA FOR TESTING FOR THE DIFFERENCE BETWEEN TWO
PROPORTIONS, USING SIMPLE RANDOM SAMPLING
$$n = \frac{\left\{ z_{1-\alpha/2} \sqrt{2\bar{P}(1-\bar{P})} + z_{1-\beta} \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right\}^2}{(P_1 - P_2)^2}$$
Where:
$\bar{P} = (P_1 + P_2)/2$
$P_1$ = anticipated value of the population proportion for the first group; this is usually the
pre/baseline value, or the value of the control or the comparison group
$P_2$ = anticipated value of the population proportion for the second group; this is usually the
post/endline value, or the value of the study group
$z_{1-\alpha/2}$ = value of the standard normal distribution corresponding to the desired level of significance for the test of hypothesis
$z_{1-\beta}$ = value of the standard normal distribution corresponding to the desired power of the test
The sample size values for the difference between two proportions for a one-tailed test with a 5% level of
significance and 90% power are presented in Table 5(a) of Lwanga and Lemeshow, which is included among
your handouts.
3.3.3 EXAMPLE:
a. It is believed that the proportion of patients who develop complications after undergoing one type of surgery
is 5% while the proportion of patients who develop complications after a second type of surgery is 15%. How
large should the sample size be in each of the two groups of patients if an investigator wishes to detect, with a
power of 90%, whether the second procedure has a complication rate significantly higher than the first at the
5% level of significance? (Ans. From Table 5(a) of Lwanga and Lemeshow, the needed sample size for this
study is 153 patients per group).


b. An NGO wishes to determine the effectiveness of their health education program on HIV/AIDS which they
have implemented. Among the important indicators used in this program is the proportion of female sex
workers (FSWs) who require their clients to use condoms. Suppose experiences in other areas where similar
projects on HIV/AIDS have been conducted showed that the proportion of female sex workers who used
condoms increased from 20% at baseline (before exposure to health education) to 50% after the program. Using
these results from previous studies as basis, how many FSWs should be interviewed by the NGO in order to
determine, with 90% power and 5% level of significance, whether the proportion of FSWs who require the use
of condoms by their clients has significantly increased after their program? (Use Table 5(a) to determine the
sample size). (Ans. From Table 5(a) of Lwanga and Lemeshow, the needed sample size is 42 female sex
workers).
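Both answers can be reproduced with a short Python sketch of the formula in 3.3.2. The function name and its parameters are hypothetical. Because the tabulated values are for a one-tailed test, the sketch uses the z-value for $1-\alpha$ in place of $1-\alpha/2$ when one_tailed is true, and it rounds the result up to the next whole subject.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.90, one_tailed=True):
    """Sample size per group for testing the difference between two proportions."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


n_surgery = sample_size_two_proportions(0.05, 0.15)
n_fsw = sample_size_two_proportions(0.20, 0.50)
print(n_surgery)  # 153
print(n_fsw)      # 42
```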
3.4 SAMPLE SIZE FORMULAS FOR ESTIMATING MEANS
3.4.1 ESTIMATING A SINGLE POPULATION MEAN
3.4.1.1 Formula for sample size determination
The sample size formula for estimating a single population mean is:
$$n = \frac{z_{1-\alpha/2}^2 \, \sigma^2}{d^2}$$
where:
$\sigma$ = estimated standard deviation of the variable being studied
d = the desired precision, expressed in absolute terms
3.4.1.2 Example
Suppose a researcher wishes to determine the average increase in body weight of infant rats given treatment
A within a certain period of time. Since the study is still in its planning stage, there is no reliable estimate of the
variance or its standard deviation available. However, on the basis of a previous study, as well as of the results
of a number of pilot studies done on a small number of rats, it can be approximated that the standard deviation
of the body weight increase of infant rats would be 20g. How large should the sample size be if the researcher
wishes to have a margin of error of 10g for the resulting estimate with 95% confidence?
Solution:
In the above problem, z = 1.96 (based on a 95% confidence level)
$\sigma$ = 20
d = 10
Substituting the above values in the formula for sample size determination for estimating a single mean, we
get:
$$n = \frac{(1.96)^2 (20)^2}{(10)^2} = \frac{(3.84)(400)}{100} = \frac{1536}{100} \approx 15.4, \text{ or 16 rats}$$
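The same computation as a minimal Python sketch (the function name is hypothetical; the inputs are those of the example):

```python
import math


def sample_size_mean(sigma, d, z=1.96):
    """Sample size for estimating a single population mean."""
    return z ** 2 * sigma ** 2 / d ** 2


n = sample_size_mean(sigma=20, d=10)
print(round(n, 2), math.ceil(n))  # 15.37 16
```

The result is rounded up, not to the nearest whole number, so that the achieved precision is at least as good as the one requested.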
3.4.2 TESTING FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS
3.4.2.1 Formula for sample size determination
The formula for comparing the means of 2 groups, with variances assumed to be equal in each group, is:
$$n = \frac{2\sigma^2 \left[ z_{1-\alpha/2} + z_{1-\beta} \right]^2}{(\mu_1 - \mu_2)^2}$$
where:
$\sigma$ = estimated standard deviation of the variable being studied, assumed to be equal for each group
$\mu_1 - \mu_2$ = clinically or practically meaningful difference between the means of the 2 groups which the
investigators wish to detect in the study
and $z_{1-\alpha/2}$ and $z_{1-\beta}$ have the following values corresponding to the confidence level and power of the test,
respectively:
Confidence level      Value of $z_{1-\alpha/2}$
90%                   1.645
95%                   1.96
99%                   2.58

Power of the test     Value of $z_{1-\beta}$
70%                   0.524
80%                   0.842
90%                   1.282
95%                   1.645

3.4.2.2 Example
A researcher wishes to compare the mean weight gain of rats subjected to treatment A with those subjected to
treatment B in a given period of time. He considers a difference of at least 30g between the two groups to be
practically meaningful. How many rats per group should he include in his study if he sets his confidence level
and power of the test to be both 95%? Based on past studies, the standard deviation of the weight gains of rats
given treatments A and B were found to be equal at 20g.


Solution:
In the above problem:
$\sigma$ = 20g
$z_{1-\alpha/2}$ = 1.96
$z_{1-\beta}$ = 1.645
$\mu_1 - \mu_2$ = 30g
Substituting the above values in the formula for sample size determination for the difference between two
means:
$$n = \frac{2\sigma^2 \left[ z_{1-\alpha/2} + z_{1-\beta} \right]^2}{(\mu_1 - \mu_2)^2} = \frac{2(20)^2 [1.96 + 1.645]^2}{(30)^2} = \frac{800 (12.9960)}{900} \approx 11.6$$
n = 12 rats per group
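A minimal Python sketch of this computation, with a hypothetical function name and the z-values taken from the tables in 3.4.2.1:

```python
import math


def sample_size_two_means(sigma, diff, z_alpha=1.96, z_beta=1.645):
    """Sample size per group for comparing two means with equal variances."""
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / diff ** 2


n = sample_size_two_means(sigma=20, diff=30)
print(round(n, 1), math.ceil(n))  # 11.6 12
```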
3.5 ADDITIONAL CONSIDERATIONS IN SAMPLE SIZE DETERMINATION
a. When deciding on the sample size requirements for a study with more than one objective involving the
estimation and/or testing of several parameters and hypotheses, the sample size requirement of each important
parameter has to be computed and considered.
b. When estimating a proportion whose value is unknown, a common practice is to assume that P=0.50. The
basis for this is the fact that the variance of an indicator in the form of a proportion has its maximum
value when P=0.50 and Q=0.50. Using P=0.50 will therefore ensure an adequate sample size irrespective of the actual value
of P.
c. When the sampling design used makes use of cluster sampling instead of pure simple random sampling, the
sample size has to be corrected for the design effect (deff) i.e.,
n(cluster sampling) = n(simple random sampling) x deff
Deff is the factor by which the sample size for a cluster sample has to be increased in order to derive
estimates with the same precision as a simple random sample. It has been shown that for most health surveys,
deff = 1.5 to 2.0, with deff=2.0 being a common value used.
d. In order to ensure that the required sample size is reached, a correction factor for non-response is usually
applied at the time of sample size determination. This avoids the need to look for substitutes during data
collection, which usually introduces biases in sample selection. The non-response rate varies depending on the
survey setting (ex., urban areas generally have higher non-response rates than rural areas; surveys
which ask sensitive questions also have higher non-response rates) but in general, an inflation factor of
10% for non-response has been shown to be adequate in most situations. Therefore, if for example, the
required sample size of a given survey after applying the design effect is 800, then the revised target sample
size after applying the correction factor for non-response will be 800 + 80 = 880. Data collection activities
should therefore be planned for a sample size of 880.
e. There are instances when the computed sample size is deemed too big relative to the population size.
(There are even instances when the computed sample size is bigger than the population size.) This is when the
finite population correction (fpc) can be applied to determine the final sample size to be considered. The
sample size formula after application of the fpc is:
$$n_{fpc} = \frac{n_0}{1 + n_0/N}$$
where $n_{fpc}$ = computed sample size after application of the finite population correction
$n_0$ = initial sample size computed prior to application of fpc
N = population size
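The adjustments in (c) to (e) can be sketched in Python as follows. The function names are hypothetical, and rounding each result up to a whole unit is an assumption of the sketch. The figures 288 and N = 1000 in the last two calls are illustrative values only; the 800 + 80 = 880 case is the one given in (d).

```python
import math


def adjust_for_design_effect(n_srs, deff=2.0):
    """Inflate a simple-random-sampling sample size for cluster sampling."""
    return math.ceil(n_srs * deff)


def adjust_for_nonresponse(n, inflation_pct=10):
    """Add an allowance for non-response, given as a percentage of n."""
    return n + math.ceil(n * inflation_pct / 100)


def finite_population_correction(n0, N):
    """Reduce the initial sample size n0 for a finite population of size N."""
    return math.ceil(n0 / (1 + n0 / N))


print(adjust_for_nonresponse(800))              # 880
print(adjust_for_design_effect(288, deff=2.0))  # 576
print(finite_population_correction(288, 1000))  # 224
```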
References:
Aday, L.A. (1996). Designing and Conducting Health Surveys: A Comprehensive Guide (Second Edition).
Jossey-Bass Publishers, San Francisco.
Lwanga, S.K. and Lemeshow, S. (1991). Sample Size Determination in Health Studies: A Practical Manual.
World Health Organization, Geneva.
Magnani, R. (1997). Sampling Guide. Food and Nutrition Technical Assistance Project, USAID.

LECTURE 10
AN OVERVIEW OF DATA ANALYSIS TECHNIQUES
Ophelia M. Mendoza, DrPH


1. FACTORS TO BE CONSIDERED IN CHOOSING THE MOST APPROPRIATE STATISTICAL TEST TO BE
APPLIED IN DATA ANALYSIS
1.1 Objectives of the study
Consider the following research objectives:
a. To determine the mean birthweight of babies born to mothers in the following age-groups: <18, 18-35, and 36-49
b. To compare the mean birthweight of babies born to mothers in the following age-groups: <18, 18-35,
and 36-49
c. To compare the incidence of low birthweight among babies born to mothers in the following age-groups: <18, 18-35, and 36-49
d. To determine if there is an association between the incidence of low birthweight of the baby and the
age of the mother
e. To determine if age of the mother is a predictor of the incidence of low birthweight of the baby

1.2 Scale by which the variables are measured


a. Nominal scale
categories are always qualitative
categories are merely labels or descriptions
only counts and proportions or percentages can be applied
b. Ordinal Scale
categories can be qualitative or quantitative
categories can be ranked or ordered
quantitative data in the ordinal scale only consider the order, and not the magnitude or the value
c. Ratio scale
categories are always quantitative
has a fixed zero point
d. Interval Scale
categories are always quantitative
has an arbitrary zero point
1.3 Study Design
a. Type of Study
observational vs experimental
type of observational or experimental study
b. Mode of sample selection
sampling design used (for sample surveys)
randomization procedure applied (for experiments)
c. Type of samples
independent samples
related samples
d. Number of groups being compared
one group
two groups
three or more groups
1.4 Number of variables included in the analysis
univariate analysis
bivariate analysis
multivariate analysis


1.5 Sample size


large samples
small samples
2. STATISTICAL TECHNIQUES COMMONLY USED
2.1 Descriptive Statistics
2.1.1 Statistical Tables
a. Example of one-way tables
Table 1. Distribution of Respondents According to Sex
SEX           NUMBER      PERCENT (%)
Male             365         43.8
Female           469         56.2
TOTAL            834        100.0

Table 2. Distribution of Respondents According to Smoking Behavior
SMOKING BEHAVIOR      NUMBER      PERCENT (%)
Smoker                   174         20.9
Non-smoker               660         79.1
TOTAL                    834        100.0

b. Example of a cross-tabulation
Table 3. Distribution of Respondents According to Sex and Smoking Behavior
SEX           SMOKER      NON-SMOKER      TOTAL
Male             102          263           365
Female            72          397           469
TOTAL            174          660           834
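As an illustration of how such tables are built from individual records, the following Python sketch expands the cell counts of Table 3 into one record per respondent and then tallies them again. The variable names are hypothetical; with real data, the records would come from the survey file, not from a table of counts.

```python
from collections import Counter

# Cell counts taken from Table 3, expanded into one (sex, smoking) record per respondent.
cells = {
    ("Male", "Smoker"): 102,
    ("Male", "Non-smoker"): 263,
    ("Female", "Smoker"): 72,
    ("Female", "Non-smoker"): 397,
}
records = [pair for pair, count in cells.items() for _ in range(count)]

table = Counter(records)                                # two-way (cross-tabulated) counts
sex_totals = Counter(sex for sex, _ in records)         # one-way table by sex
smoking_totals = Counter(smk for _, smk in records)     # one-way table by smoking behavior
n = len(records)

for sex in ("Male", "Female"):
    pct = 100 * table[(sex, "Smoker")] / sex_totals[sex]
    print(f"{sex}: {pct:.1f}% smokers")
# Male: 27.9% smokers
# Female: 15.4% smokers
```

The percentages are computed within each sex (row percentages), which is the comparison needed when asking whether smoking behavior differs between males and females.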

Suppose the research objective is to determine if there is a relationship between the smoking behavior and sex
of high school students in private and public schools. Which of the two dummy tables below is the more
appropriate for answering this objective?
Table 4. Distribution of High School Students According to Smoking Behavior, Sex and Type of School Attended
                      Sex of Student        Type of School Attended
Smoking Behavior      Male      Female      Public      Private          TOTAL
Smoker
Non-smoker
TOTAL

Table 5. Distribution of High School Students According to Smoking Behavior,
Sex and Type of School Attended
                      Public Schools        Private Schools
Smoking Behavior      Male      Female      Male        Female           TOTAL
Smoker
Non-smoker
TOTAL

EXERCISE ON CROSS-TABULATION
Construct dummy tables corresponding to the following objectives:
1. To compare the seasonal patterns in the incidence of diarrhea and acute respiratory infections in 2010 and
2011, for Municipality X.
2. To compare the incidence of food poisoning among those who ate and did not eat fresh lumpia during a
wedding party, for different age-groups of guests.
3. To determine the relationship between hypertension and diabetes among adults with different levels of
physical activity.
4. To determine the relationship between the immunization status of children and the educational attainment of
mothers in urban and rural areas.


5. To determine the relationship between family planning practice and socio-economic status among Catholic
and non-Catholic women
2.1.2 Graphical Presentation
TABLE 6. TYPES OF GRAPHS COMMONLY USED IN PRESENTING STATISTICAL DATA

TYPE OF GRAPH: Histogram
TYPE OF VARIABLE/DATA BEING GRAPHED: Continuous quantitative
PURPOSE OF PRESENTING THE GRAPH: To present a frequency distribution of a quantitative
continuous variable like age, height, etc.

TYPE OF GRAPH: Frequency Polygon
TYPE OF VARIABLE/DATA BEING GRAPHED: Continuous quantitative
PURPOSE OF PRESENTING THE GRAPH: The same use as the histogram, but is better to use when
presenting more than one frequency distribution in the same graph (ex., comparison of the
weights of male and female children)

TYPE OF GRAPH: Bar Chart (horizontal or vertical)
TYPE OF VARIABLE/DATA BEING GRAPHED: Qualitative, or discrete quantitative
PURPOSE OF PRESENTING THE GRAPH: To show or compare absolute counts or relative figures
(percentages, rates, etc.) of qualitative or discrete quantitative variables

TYPE OF GRAPH: Line Diagram
TYPE OF VARIABLE/DATA BEING GRAPHED: Absolute counts as well as relative or summary figures of
both quantitative and qualitative variables for which the analysis of trends is relevant
PURPOSE OF PRESENTING THE GRAPH: Used to show trends in absolute counts, rates or means with
respect to time, age, etc.

TYPE OF GRAPH: Pie Chart
TYPE OF VARIABLE/DATA BEING GRAPHED: Qualitative, or broad categories of quantitative variables
PURPOSE OF PRESENTING THE GRAPH: Shows how a total is divided into sub-categories; used when the
number of categories is not too large

TYPE OF GRAPH: Component bar diagram/chart
TYPE OF VARIABLE/DATA BEING GRAPHED: Qualitative, or broad categories of quantitative variables
PURPOSE OF PRESENTING THE GRAPH: Same as the pie chart, but is better to use when presenting or
comparing two or more sets of data

TYPE OF GRAPH: Scatterpoint diagram
TYPE OF VARIABLE/DATA BEING GRAPHED: Quantitative (discrete or continuous)
PURPOSE OF PRESENTING THE GRAPH: To show the nature and the strength of the relationship
between two continuous quantitative variables

2.1.3 Measures of Central Tendency - measures used to summarize a set of quantitative data by computing
a representative figure.
If I were asked to describe the data using just one value, what would that value be?
a. Mean
statistical measure derived by dividing the sum of all observations by the total number of observations
included in the computation
is easily affected by outliers with extremely high or low values; it is therefore not recommended
when the data has extreme values
there are several kinds of means; the most commonly used are the arithmetic mean, the weighted mean
and the geometric mean
b. Median
computed by determining the middlemost value in a set of observations
is not affected by outliers with extremely low or high values
is usually the measure of central tendency used when the distribution is skewed
c. Mode


is determined by identifying the value which occurs most frequently in the data
it is possible for a data set not to have any mode at all; it is also possible for a data set to have several
modes
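The three measures above can be compared on a single data set. The following is a minimal sketch using Python's standard `statistics` module; the weights are hypothetical values chosen for illustration, with one deliberate outlier to show how the mean is pulled upward while the median and mode are not.

```python
import statistics

# Hypothetical weights (kg) of nine children; the last value is an outlier.
weights = [12, 13, 13, 14, 15, 15, 15, 16, 40]

mean = statistics.mean(weights)        # sum of observations / number of observations
median = statistics.median(weights)    # middlemost value of the sorted data
modes = statistics.multimode(weights)  # every value tied for the highest frequency

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

`multimode` returns a list, so it also handles data sets with several modes; when every value occurs only once it returns all of them, which corresponds to the case where no single mode stands out.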
2.1.4 Measures of Dispersion - measures used to describe the degree of variability or heterogeneity of a given
data set
If I were asked to describe how different the values in the data are using just one value, what would that
value be?

a. Range
computed by determining the difference between the highest and the lowest value in the data set
it is easily affected by outliers with extremely high or low values
b. Variance and Standard Deviation
the variance is computed by determining the average of the squared deviations from the mean of a given data set
the units of the computed value of the variance are in squared units (ex., 5.8 kg²)
if the square root of the variance is extracted, the resulting value is called the standard deviation
the variance and the standard deviation are the measures of dispersion usually used in more advanced
statistical analysis like hypothesis testing
c. Coefficient of Variation (CV)
computed by expressing the standard deviation as a percentage of the mean
used when comparing the variability of two variables which have different units of measurement
| VARIABLE | MEAN | STANDARD DEVIATION | COEFFICIENT OF VARIATION |
| --- | --- | --- | --- |
| Weight | 50.0 kg | 3.9 kg | (3.9/50.0) x 100 = 7.8% |
| Height | 160.0 cm | 7.5 cm | (7.5/160.0) x 100 = 4.7% |
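The measures of dispersion described above can be sketched in a few lines of Python. The data set `values` is hypothetical; the weight and height figures are the ones from the table. Since the notes define the variance as the average of the squared deviations, the sketch uses `statistics.pvariance`, which divides by n; `statistics.variance` divides by n - 1 instead and is the usual choice for sample data.

```python
import statistics

# Hypothetical data set, chosen so the results are round numbers.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)   # highest value minus lowest value
variance = statistics.pvariance(values)  # average of squared deviations from the mean
std_dev = statistics.pstdev(values)      # square root of the variance


def coefficient_of_variation(sd, mean):
    """Express the standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Figures taken from the weight/height table above.
cv_weight = coefficient_of_variation(3.9, 50.0)
cv_height = coefficient_of_variation(7.5, 160.0)

print(data_range, variance, std_dev)
print(round(cv_weight, 1), round(cv_height, 1))
```

Because the CV has no units, it allows the comparison made in the table: relative to its mean, weight (7.8%) is more variable than height (4.7%), even though the two standard deviations cannot be compared directly.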

2.1.5 Measures of Location


a. Median
b. Quartiles
c. Deciles
d. Percentiles
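As an illustration of the measures of location listed above, the sketch below uses `statistics.quantiles` from Python's standard library on a hypothetical set of ages (1 to 99). Different textbooks and software packages use slightly different interpolation rules for quantiles, so results on other data may differ a little from hand computations.

```python
import statistics

# Hypothetical ages: the whole numbers 1 to 99.
ages = list(range(1, 100))

quartiles = statistics.quantiles(ages, n=4)      # 3 cut points -> 4 equal groups
deciles = statistics.quantiles(ages, n=10)       # 9 cut points -> 10 equal groups
percentiles = statistics.quantiles(ages, n=100)  # 99 cut points -> 100 equal groups

print("quartiles:", quartiles)
print("5th decile:", deciles[4])
print("90th percentile:", percentiles[89])
print("median:", statistics.median(ages))
```

The second quartile, the fifth decile and the 50th percentile all coincide with the median, which is why the median appears both as a measure of central tendency and as a measure of location.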
2.2 INFERENTIAL STATISTICS
2.2.1 Estimation -- refers to the process by which a statistic computed from a random sample is used to
approximate (estimate) the value of the population parameter
a. Point estimate -- a single numerical value used to approximate a population parameter
b. Interval estimate -- an estimate which consists of two numbers, a lower and an upper limit, which serve as
the bounding values within which the parameter is expected to lie with a certain degree of confidence
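A point estimate and an interval estimate for a population mean could be computed as follows. This is only a sketch under stated assumptions: the blood pressure readings are hypothetical, the function name `mean_with_interval` is made up for illustration, and the interval uses the normal (z) approximation. For a small sample such as this one, a t-based multiplier would normally replace z and give a somewhat wider interval.

```python
import math
import statistics


def mean_with_interval(sample, confidence=0.95):
    """Return (point estimate, lower limit, upper limit) for a population mean.

    Assumption: normal (z) approximation for the sampling distribution.
    """
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z multiplier for the chosen confidence level (about 1.96 for 95%)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin


# Hypothetical systolic blood pressure readings (mmHg) from a random sample.
systolic_bp = [118, 122, 125, 130, 121, 127, 124, 119, 128, 126]
point, lower, upper = mean_with_interval(systolic_bp)
print(f"point estimate: {point}, 95% interval: {lower:.1f} to {upper:.1f}")
```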
2.2.2 Hypothesis-testing for means and proportions -- process of assessing evidence provided by the data in
favor of a statement
a. Test for a single mean (z-test or t-test)
b. Test for the difference between two or more means (z-test or t-test; analysis of variance (ANOVA))
c. Test for a single proportion (z-test)
d. Test for the difference between two proportions (z-test)
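To make item (d) concrete, here is a minimal sketch of a z-test for the difference between two proportions, using the pooled proportion under the null hypothesis of no difference. The function name and the counts (60 of 100 versus 45 of 100) are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 = p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 urban children and 45 of 100 rural children
# are fully immunized.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value (conventionally below 0.05) is taken as evidence against the statement that the two population proportions are equal.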
2.2.3 Investigating Relationships Between Variables
a. Tests of association for qualitative variables (χ² test)
b. Correlation analysis
c. Regression analysis
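The three techniques listed above can each be sketched with plain Python. The functions below are illustrative only, and all data are hypothetical: a chi-square statistic for a contingency table, the Pearson correlation coefficient, and a simple least-squares regression line.

```python
import math


def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative variables."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical 2 x 2 table: rows = exposure (yes/no), columns = disease (yes/no).
chi2 = chi_square_statistic([[30, 20], [20, 30]])

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)
print(chi2, round(r, 4), slope, intercept)
```

For a 2 x 2 table (1 degree of freedom), the computed chi-square value is compared with the 5% critical value of about 3.84.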
3. Types of Statistical Techniques and Tests Commonly Applied Corresponding to Different Types of Research
Objectives


Lecture 11
ADMINISTRATIVE ASPECTS OF RESEARCH:
DETERMINING THE PROJECT TIMETABLE AND BUDGET
Ophelia M. Mendoza, DrPH
Lecturer
1. THE RESEARCH TIMETABLE
1.1 What is the Research Timetable or Schedule?
It designates work to be done and specifies deadlines for completing tasks and deliverables. It includes:
Estimates of time/duration of each research task/activity
Start and finish dates of each task/activity
Sequence of tasks/activities
Names of responsible staff and resources assigned for each activity
It is a tool used by the Research Manager in planning, executing and controlling the various research tasks,
as well as in monitoring the progress of the research.
It defines timelines for key deliverables and sets expectations for research progress and completion
1.2 Steps in developing a research timetable
a. Determine the tasks to be included in the research timetable.
b. Determine the relationships among the tasks. This involves identifying the sequence of tasks, especially
those which need to be completed before other tasks can be started, as well as those which can be
performed at the same time.
c. Identify/assign responsible persons for each task.
d. Estimate the amount of time/effort required for each task.
e. Consider the other variables that go into building the schedule (when, where and how the tasks must be
performed; expected delays due to uncontrolled circumstances like scheduled brown-outs; time constraints of
research staff working only on a part-time basis, etc.).
f. Construct a Gantt chart to graphically present the research timetable
1.3 Determining Duration of Data Collection Activity in Research
Data collection is generally allocated the longest time among the various tasks in research
It is important for the researcher to have a good basis in determining the duration of time needed for data
collection. In certain instances, especially when the research results are needed
immediately, the time for data collection may be set at the start (ex., maximum of 2 months). In this case, the
task of the researcher is to determine how many data collectors and other resources are needed in order to
ensure that data collection can be finished within the prescribed time.
Important inputs needed to determine the duration of data collection are:
Average length of time needed to collect data from one sampling unit. For example, in a household survey, this
refers to the average length of time to complete one household interview, including the time needed to travel to
and locate the sample household. The average length of interview is one of the important variables to be
determined during the pre-testing of the data collection tool.
Number of data collectors to be used/hired
An example of how the duration of data collection is computed is as follows:
Suppose a particular research needs to interview 500 sample households. During the pre-test, it was
determined that the average time to interview one household is 2 hours including travel time. If 2 interviewers
are available to collect the data for the research, how long will it take to collect data from 500 households?
Note that the interviewers will be collecting data for 8 hours/day, 5 days per week.
No. of households/day/interviewer = 8 working hours per day ÷ 2 hrs per household
= 4 households/day/interviewer


Total number of households covered per day by the 2 interviewers:
4 households/day/interviewer x 2 interviewers = 8 households/day
Duration of time to interview 500 households:
500 households ÷ 8 households/day = 62.5 days
62.5 days ÷ 5 days/week = 12.5 weeks
12.5 weeks ÷ 4 weeks/month = 3.125 months
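The computation above can be written as a small reusable function, so that the inputs from the pre-test can be changed easily. This is a sketch only; the function and parameter names are made up for illustration, and the default values are the ones used in the worked example.

```python
def data_collection_duration(households, hours_per_household, interviewers,
                             hours_per_day=8, days_per_week=5, weeks_per_month=4):
    """Return the duration of data collection as (days, weeks, months)."""
    per_interviewer = hours_per_day / hours_per_household  # households/day/interviewer
    per_day = per_interviewer * interviewers               # households/day for the team
    days = households / per_day
    weeks = days / days_per_week
    months = weeks / weeks_per_month
    return days, weeks, months


# Worked example: 500 households, 2 hours each, 2 interviewers.
days, weeks, months = data_collection_duration(500, 2, 2)
print(days, weeks, months)
```

Doubling the number of interviewers to 4 halves every figure, which shows the trade-off between staffing and duration directly.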
Suppose in the above example, the results are needed immediately so it was set at the start that data
collection should not exceed 2 months. How many data collectors are needed in order to finish the data
collection within 2 months? (Note: There are 22 working days/month; 8 hours/day)
Total number of days available for data collection = 2 months x 22 working days/month = 44 days
Number of households to be interviewed per day in order to finish the task within 44 days:
500 sample households ÷ 44 days of data collection = 11.4, rounded up to 12 households/day
Number of interviewers needed to collect data from 12 households/day:
Recall from earlier example that one interviewer can cover 4 households/day
Therefore, 12 households/day ÷ 4 households/interviewer/day = 3 interviewers
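The reverse computation, where the deadline is fixed and the number of interviewers is the unknown, can be sketched in the same way. The function and parameter names are hypothetical; the figures are those of the example. Both divisions are rounded up, because a fraction of a household or of an interviewer cannot be scheduled.

```python
import math


def interviewers_needed(households, months_available, working_days_per_month=22,
                        hours_per_day=8, hours_per_household=2):
    """Return (days available, households needed per day, interviewers needed)."""
    days_available = months_available * working_days_per_month
    households_per_day = math.ceil(households / days_available)
    per_interviewer = hours_per_day / hours_per_household  # households/day/interviewer
    interviewers = math.ceil(households_per_day / per_interviewer)
    return days_available, households_per_day, interviewers


# Worked example: 500 households to be finished within 2 months.
days_available, households_per_day, interviewers = interviewers_needed(500, 2)
print(days_available, households_per_day, interviewers)
```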
1.4 The Gantt Chart
A Gantt chart is a horizontal bar chart used to graphically present a project schedule. It was developed in
1917 by an American engineer and social scientist named Henry L. Gantt as a tool in production control.
The horizontal or x-axis of the Gantt chart represents the total time span of the project, broken down into
increments (days, weeks or months). The vertical or the y-axis represents the various tasks in the project, with
each task/activity represented by one bar.
It is possible for the span of some of the bars to overlap, which happens in the case of tasks/ activities
which are done at the same time.
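The layout described above can be illustrated with a simple text-based chart. This is only a sketch: the task names, start weeks and durations are hypothetical, and in practice a spreadsheet or project management tool would normally be used to draw the chart.

```python
def render_gantt(tasks, total_weeks):
    """Return one text line per task: task name, then a bar along the time axis.

    Each task is a tuple (name, start_week, duration_in_weeks), with weeks
    numbered from 1.
    """
    width = max(len(name) for name, _, _ in tasks)
    lines = []
    for name, start, duration in tasks:
        bar = " " * (start - 1) + "#" * duration
        lines.append(f"{name.ljust(width)} |{bar.ljust(total_weeks)}|")
    return lines


# Hypothetical 12-week research timetable.
tasks = [
    ("Tool development", 1, 3),
    ("Pre-testing", 3, 2),
    ("Data collection", 5, 6),
    ("Data analysis", 9, 4),
]

chart = render_gantt(tasks, total_weeks=12)
for line in chart:
    print(line)
```

In this example the bars for data collection and data analysis overlap in weeks 9 and 10, representing tasks carried out at the same time.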
2. BUDGET DEVELOPMENT
2.1 Important points to remember about budget development
A budget is a financial proposal which presents a computation of all the estimated costs associated with
conducting the research, both direct and indirect.
A number of research funding agencies, like the US National Institutes of Health, use the
following criteria in reviewing the budget of research proposals submitted to them for funding: the costs
charged must be allowable; allocable; reasonable; necessary; and consistently applied irrespective of the
source of funds.
The first thing the proponent must do when developing a budget is to read the policies/guidelines of the
funding agency regarding research budgets in order to know:
What costs are allowed and disallowed
The limits set (i.e., how much/maximum amount which can be charged) for each type of cost
How different types of costs are defined/categorized
There must be consistency between the methods described in the proposal and the items/costs charged in
the budget.
The budget proposal must provide sufficient details which would enable the reviewers to determine whether
or not a particular cost charged in the budget is reasonable. For example, in presenting personnel costs, the
basic information needed would be:
Rate per person per unit of time (ex., salary/honorarium per month or per day)
Number of persons to be paid at such rate
Amount of time
In developing the research budget, it is important to request a reasonable amount of funds to conduct
the research, not more and not less, because significant over- or under-estimation implies that you do not
understand the scope of the work and might be taken against you in the final decision whether or not to
approve your research proposal.
Honesty, transparency and trust are important virtues to emulate in budget development
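The level of detail suggested above for personnel costs (rate per person per unit of time, number of persons, and amount of time) lends itself to a simple line-item computation. The sketch below uses hypothetical positions and rates with no particular currency; actual rates and allowable cost categories depend on the funding agency's guidelines.

```python
def personnel_cost(rate_per_month, persons, months):
    """Cost of one personnel line item: rate x number of persons x amount of time."""
    return rate_per_month * persons * months


# Hypothetical line items: (position, rate per month, number of persons, months).
line_items = [
    ("Interviewers", 15000, 3, 2),
    ("Field supervisor", 25000, 1, 2),
]

costs = {position: personnel_cost(rate, persons, months)
         for position, rate, persons, months in line_items}
total_personnel = sum(costs.values())

for position, cost in costs.items():
    print(f"{position}: {cost}")
print("Total personnel cost:", total_personnel)
```

Presenting each factor separately, rather than only the total, lets reviewers judge whether each cost is reasonable.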
