
• The hallmarks or main distinguishing characteristics of scientific research may be
listed as follows:
o Purposiveness
o Rigor
o Testability
o Replicability
o Precision and confidence
o Objectivity
o Generalizability
o Parsimony

• The hypothetico-deductive method involves the seven steps listed and discussed
next.
1. Identify a broad problem area
2. Define the problem statement
3. Develop hypotheses
4. Determine measures
5. Data collection
6. Data analysis
7. Interpretation of data

• The research proposal drawn up by the investigator is the result of a planned,
organized, and careful effort, and basically contains the following:
1. The purpose of the study.
2. The specific problem to be investigated.
3. The scope of the study.
4. The relevance of the study.
5. The research design offering details on:
• The sampling design.
• Data collection methods.
• Data analysis.
6. Timeframe of the study, including information on when the written report will
be handed over to the sponsors.
7. The budget, detailing the costs with reference to specific items of expenditure.
8. Selected bibliography.
• Reliability: measures the precision of a measurement (you weigh yourself on a scale
throughout the day and the weight does not vary by much). In other words, the
reliability of a measure is an indication of the stability and consistency with which the
instrument measures the concept and helps to assess the "goodness" of a measure.
o Stability of measures: The ability of a measure to remain the same over time
- despite uncontrollable testing conditions or the state of the respondents
themselves
▪ Test-retest reliability: same test, different times in order to measure
consistency (same respondents)
▪ Alternate form / parallel form reliability: check reliability of testing
procedure using a clone test. When responses on two comparable sets
of measures tapping the same construct are highly correlated, we
have parallel-form reliability
o Internal consistency: The internal consistency of measures is indicative of the
homogeneity of the items in the measure that tap the construct. In other
words, the items should "hang together as a set."
▪ Interitem consistency reliability: a test of the consistency of
respondents' answers to all the items in a measure. To the degree that
items are independent measures of the same concept, they will be
correlated with one another. The most popular test of interitem
consistency reliability is Cronbach's coefficient alpha (see the sketch
below)
▪ Split-half reliability: randomly divide the test items into two halves
and measure the correlation between the scores on the two halves
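A minimal sketch of Cronbach's coefficient alpha computed with NumPy; the item responses below are made up for illustration, not taken from the notes:

```python
import numpy as np

# Each row is a respondent, each column an item intended to tap the same construct
# (hypothetical 5-point Likert responses).
items = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
])

k = items.shape[1]                          # number of items
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")    # values above roughly 0.7 are usually deemed acceptable
```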

• Validity: extent to which a test measures what it is intended to measure (an
intelligence test that actually measures intelligence)
o Internal validity: is high when we can establish that the treatment caused the
outcome (and that the outcome is not caused by other independent variables).
Disturbing effects: the research itself, external circumstances, and error
o External validity: whether the findings are generalizable to other groups.
▪ Experiments have high internal validity (treatment applied to a group
actually caused the outcome). However, they have low external
validity: you cannot easily transfer what happens in a lab to the real
world
▪ Surveys have a large sample of the population: higher external validity
o Measurement validity
▪ Face validity: an index of content validity. Does the measure make
sense? Does the question asked operationalize what it intends to
operationalize? (the question ‘Do you hate America?’ has nothing to do
with liberalism → low face validity). Do the items, on the face of it,
look like they measure the concept they are intended to measure?
▪ Content validity: does the measure adequately capture the full
meaning of the concept? If the measure only captures a small slice of
the concept → low content validity. A solution may be to narrow the
concept or expand the number of measures
▪ Criterion validity: does the measure agree with outside measures
(that we know are related to the concept we’re trying to
operationalize). If our measures are good they should be related to
those outside measures
• Concurrent validity: people with high score on my measure
should have high scores on similar outside measures that
capture the same concept
• Predictive validity: people with high scores on my measure
should experience things that we associate with the concept
we’re capturing (if people with high socio-economic status live
longer, then people with high scores on my SES index should
live longer as well)
▪ Construct / congruent validity: when using multiple questions /
variables to operationalize the same concept. These variables should
be related and distinct from other variables that measure other
concepts. Construct validity is about ruling out other possible
explanations.
• Convergent validity: variables should be related. Is established
when the scores obtained with two different instruments
measuring the same concept are highly correlated.
• Discriminant validity: collection of indicators you use to
operationalize a concept should not be related to other
concepts. Two variables are predicted to be uncorrelated, and
the scores obtained by measuring them are indeed empirically
found to be so.
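To illustrate the convergent and discriminant validity checks above, a minimal sketch using Pearson correlations; all scores and instrument names are hypothetical:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores from two instruments measuring the same concept (a1, a2)
# and one instrument measuring a different concept (b).
a1 = np.array([3, 4, 5, 2, 4, 5, 3, 4])
a2 = np.array([3, 5, 5, 2, 4, 4, 3, 5])
b = np.array([1, 4, 2, 5, 3, 1, 4, 2])

r_convergent, _ = pearsonr(a1, a2)   # should be high for convergent validity
r_discriminant, _ = pearsonr(a1, b)  # should be low for discriminant validity
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```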
• Formative vs. reflective measures
o Reflective
▪ Direction of causality is from construct to measure
▪ Measures are expected to be correlated
▪ Indicators are interchangeable
o Formative
▪ Direction of causality is from measure to construct
▪ Correlation among measures is not required
▪ Indicators are not interchangeable

• Primary and secondary data


o Primary data refer to information obtained first-hand by the researcher on
the variables of interest for the specific purpose of the study.
o Secondary data refer to information gathered from sources that already exist,
i.e., information gathered by someone other than the researcher conducting
the current study.

• Quantitative research: objective, concrete, and measured. Uses statistics,
experiments, surveys, and observations. It does not require personalized
interpretation, so it should be replicable by anyone.
• Qualitative research: subjective, interpretive and descriptive → requires researcher
to interpret data (induction). Often includes interviews (with open-ended questions),
case studies and observation.

• Scientific statements can be tested (empirically), whereas non-scientific statements
cannot

• Empirical cycle
o Observation: observation sparks idea for hypothesis → notice a relation
o Induction: taking a statement that’s true in specific cases and inferring that
the statement is true in all cases (general rule → hypothesis set up)
o Deduction: deduce that the relation also holds in new specific instances (is all
about making predictions!). Definition of relevant concepts, measurement
instruments, procedures, the sample to collect new data
o Testing: collect new empirical data and compare them to the prediction (can
also serve as new observations for a new cycle)
o Evaluation: interpret results in terms of hypothesis

• Nature of research
o Exploratory: initial research into a hypothetical or theoretical idea. Either
taking well-defined theories and applying them in a new area or developing
your own theories from scratch. Research question mostly starts with ‘what
(is happening)?’
o Descriptive: attempts to explore and explain while providing additional
information about the topic. Builds on exploratory research. Usually requires
a lot of data. Research question mostly starts with ‘how (do these things work
together)?’
o Explanatory: tries to explain relationships between variables. Builds on both
exploratory and descriptive research. Explains why things happen. Research
question mostly starts with ‘why (do those things happen)?’

• Concepts, variables and constructs


o Concept: an empirical abstraction used to classify natural phenomena. It is the
basic idea.
o Variables: when concepts are assigned values, they can be manipulated as
variables, so that their relationships can be examined
o Constructs: abstract variables that cannot be measured directly, but only
indirectly, by measuring relevant correlated, observable behavior. Examples:
itchiness or intelligence
• Unit of analysis: the ‘who’ or ‘what’ that is being studied. Example: the organization
• Unit of observation: the unit described in the dataset. Example: workers in the
organization

• Cross-sectional study: a snapshot of a particular state at a particular moment in
time. Mostly surveys or case studies
• Longitudinal study: studies the change and development of phenomena by taking
measurements at different points in time

• Types of variables
o Independent variable: the variable that the researcher controls and
manipulates
o Dependent variable: the variable that is believed to depend on the
independent variable
o Control variable: a variable that is kept constant in each trial (the number of
people in a trial, for instance)
o Moderator variable: can strengthen or weaken the relation between the
independent and dependent variable
o Mediating variable: surfaces as a function of the independent variable and
helps to explain how the independent variable influences the dependent
variable (it lies in the causal chain between the two)

• Mixed method research: research study that combines quantitative and qualitative
methods to answer one research question
o Explanatory: first quantitative and then qualitative data are collected
(qualitative data serve the purpose to provide explanations for the
quantitative data)
o Exploratory: first qualitative and then quantitative data are collected
(qualitative data are analyzed and results are used to inform the collection of
the quantitative data)
o Convergent: calls for concurrent design in which the qualitative and
quantitative data are collected at approximately the same time
• Multi-method research: both qualitative and quantitative data are collected but are
used to answer different research questions
• Content analysis: a systematic, objective, and quantitative method for researching
messages. It can be used to analyze media reports, interviews, speeches, etc. It can
be very time consuming because researchers need to record all messages
o Manifest content: the actual word or phrase that is counted
o Latent content: the underlying meaning of the messages

• Degrees of freedom: the number of elements in a system that are free to vary. In a
contingency table with fixed row and column totals, once the free cells are filled in,
the remaining values are already determined. Calculated as: (rows - 1) * (columns - 1)
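A small worked example (the table values are made up): for a 2 x 3 table, df = (2 - 1) * (3 - 1) = 2. SciPy's chi-square test reports the same degrees of freedom:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 contingency table of observed counts.
table = [[10, 20, 30],
         [15, 25, 20]]

chi2, p, dof, expected = chi2_contingency(table)
print(dof)  # (2 - 1) * (3 - 1) = 2
```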

• Sampling
o Probability sampling: all units in the population have a known, nonzero chance
of being selected in the sample
▪ Simple random sampling: lottery system to determine which units are
selected. Applicable when population is small, homogenous and
readily available. Disadvantages: if sample is small vs big population,
minority subgroups may not be present
▪ Systematic sampling: use an ordering scheme and select elements at
regular intervals (the every-seventh-house example in the book).
Advantages: the sample is easy to collect and evenly spread.
Disadvantage: a hidden periodicity within the population may coincide
with the selection interval and bias the sample
▪ Stratified sampling: if the population embraces a number of distinct
categories, the frame can be organized into different ‘strata’ (mutually
exclusive groups). Each stratum is sampled as a sub-population.
Different sampling approaches can be applied to different strata.
Disadvantages: the sampling frame has to be prepared separately for
each stratum, and some criteria may be missed. Can be either
proportionate or disproportionate sampling.
▪ Multistage cluster sampling: e.g. two-stage sampling. In the first
stage a sample of areas is chosen (area sampling). In the second stage
a sample of respondents within those areas is selected. Advantages:
cuts down the cost of preparing the sample frame and reduces
administrative costs. Disadvantages: the sampling error may be higher
than with simple random sampling. Difference between strata and
clusters: all strata are represented in the sample, whereas not all
clusters are. Also, stratified sampling works best when the elements
within each stratum are internally homogeneous, whereas cluster
sampling works best when the elements within each cluster are
internally heterogeneous. Disadvantage: the conditions of intracluster
(within) heterogeneity and intercluster homogeneity are often not met.
▪ Multistage sampling: a complex form of cluster sampling in which two
or more levels of units are embedded (different phases). It is the process
of taking random samples of preceding random samples
▪ Double sampling: a sampling design where initially a sample is used
in a study to collect some preliminary information of interest, and
later a subsample of this primary sample is used to examine the
matter in more detail. For example, a
structured interview might indicate that a subgroup of the
respondents has more insight into the problems of the organization.
These respondents might be interviewed again and asked additional
questions.
o Non-probability sampling: some elements have no chance of being selected.
▪ Convenience sampling: type of sampling which involves the sample
being drawn from that part of the population which is close to hand
▪ Purposive sampling: researcher chooses the sample based on who
they think would be appropriate for the study
• Quota sampling: population is first segmented into mutually
exclusive sub-groups. Then judgement is used to select
subjects from each segment. Quota sampling ensures that
certain groups are adequately represented in the study
through the assignment of a quota.
• Judgmental sampling: judgment sampling design is used when
a limited number or category of people have the information
that is sought.
• Confidence interval: a range (from A to B) within which the population mean is
estimated to lie, at a given level of confidence
• Variance: represents the average squared deviation of a group of observations
from their mean
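A minimal numerical illustration with made-up observations, showing the sample variance as the average squared deviation from the mean:

```python
import numpy as np

obs = np.array([4.0, 7.0, 6.0, 3.0, 5.0])
deviations = obs - obs.mean()
sample_variance = (deviations ** 2).sum() / (len(obs) - 1)  # ddof = 1 for a sample

print(sample_variance, np.var(obs, ddof=1))  # both give 2.5
```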

• Heterogeneous vs. homogeneous samples
o Heterogeneous sample: a sample whose members vary widely on the
characteristic you’re interested in (for example, many different ages =
heterogeneity for age)
o Homogeneous sample: made up of cases that are similar to each other (all
20-year-old physics students, for example). Homogeneous samples tend to be
small and made up of similar cases.

• Sampling process
o Define the population.
o Determine the sample frame.
o Determine the sampling design.
o Determine the appropriate sample size.
o Execute the sampling process.
• Notation for population vs. sample statistics:
o Mean: μ (population), X̄ (sample)
o Standard deviation: σ (population), S (sample)

• Precision refers to how close our estimate is to the true population characteristic.
o The variability of the sampling distribution of the sample mean is called the
standard error. S is the standard deviation of the sample, n is the sample size,
and Sx̄ indicates the standard error, or the extent of precision offered by the
sample. If we want to reduce the standard error we have to increase the
sample size
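In formula form (the formula itself is not written out in the notes): Sx̄ = S / √n, so quadrupling the sample size halves the standard error.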

• Confidence denotes how certain we are that our estimates will really hold true for
the population.
o The standard error Sx̄ and the percentage or level of confidence we require
determine the width of the interval, which can be represented as X̄ ± K·Sx̄,
where K is the t statistic for the level of confidence desired.
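A minimal sketch of this interval in Python; the sample values are made up, and scipy.stats.t.ppf supplies K for the chosen confidence level:

```python
import numpy as np
from scipy import stats

sample = np.array([102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 103.1, 100.0])

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)
se = s / np.sqrt(n)               # standard error Sx̄ = S / sqrt(n)

k = stats.t.ppf(0.975, df=n - 1)  # K for a 95% confidence level
print(f"95% CI: {x_bar - k * se:.2f} to {x_bar + k * se:.2f}")
```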

• Too large a sample size (say, over 500) can, however, also become a problem: with
very large samples even very weak relationships become statistically significant, so
we may accept research findings that we should in fact reject as practically trivial
• Efficiency in sampling is attained when, for a given level of precision (standard error),
the sample size could be reduced, or for a given sample size (n), the level of precision
could be increased
1. Sample sizes larger than 30 and less than 500 are appropriate for most research.

2. Where samples are to be broken into subsamples (males/females, juniors/seniors,
etc.), a minimum sample size of 30 for each category is necessary.
3. In multivariate research (including multiple regression analyses), the sample size
should be several times (preferably ten times or more) as large as the number of
variables in the study.
4. For simple experimental research with tight experimental controls (matched pairs,
etc.), successful research is possible with samples as small as 10 to 20 in size.

• Non-response error exists to the extent that those who did respond to your survey
are different from those who did not on (one of the) characteristics of interest in
your study. Two important sources of non-response are not-at-homes and refusals.

• Power: the probability of correctly rejecting the null hypothesis when it is in fact false

• Transform the data if it is not normally distributed (e.g., so that techniques that
assume normality can be applied)

• Factor analysis: examines whether a set of correlated items reflects a smaller
number of underlying dimensions (factors). It is a multivariate technique that
confirms the dimensions of the concept that have been operationally defined, as
well as indicating which of the items are most appropriate for each dimension
(establishing construct validity).
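A minimal sketch using scikit-learn's FactorAnalysis; the data and the choice of two factors are illustrative assumptions only:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical responses: 100 respondents, 6 items assumed to tap 2 underlying dimensions.
X = rng.normal(size=(100, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)

# Loadings: how strongly each item is associated with each extracted factor.
print(fa.components_.round(2))
```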

• Kinds of studies
o Field studies: correlational studies in natural environment (noncontrived =
work continues as normal)
o Field experiments: cause-and-effect studies in natural environment. Here, as
we have seen earlier, the researcher does interfere with the natural
occurrence of events inasmuch as the independent variable is manipulated
o Lab experiments: experiments conducted in an artificial (contrived)
environment in order to establish a cause-and-effect relationship while
controlling all extraneous variables

• Kinds of studies
o Cross-sectional studies (or one-shot studies): data are gathered just once
o Longitudinal studies: data are gathered at two or more points in time.
Experimental designs are longitudinal studies, since data are collected before
and after a manipulation. Field studies may also be longitudinal

• Operationalization: reduction of abstract concepts to render them measurable.

• Univariate techniques are used when you want to examine two-variable
relationships involving a single dependent variable. For instance, if you want to
examine the effect of gender on the number of candy bars that students eat per
week, univariate statistics are appropriate
• Multivariate statistical techniques are used when you are interested in analyzing
multiple variables simultaneously
• The Wilcoxon Signed-Rank Test is a nonparametric test for examining significant
differences between two related samples or repeated measurements on a single
sample. It is used as an alternative to a paired samples t-test when the population
cannot be assumed to be normally distributed.
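A minimal sketch with SciPy; the paired measurements below are made up:

```python
from scipy.stats import wilcoxon

# Hypothetical repeated measurements on the same respondents (before/after).
before = [72, 65, 80, 74, 68, 77, 70, 66]
after = [75, 66, 84, 73, 72, 80, 74, 69]

stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.3f}")  # a small p suggests a significant difference
```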

• McNemar's Test is a rather straightforward technique to test marginal homogeneity.
Marginal homogeneity refers to equality (or the lack of a significant difference)
between one or more of the marginal row totals and the corresponding marginal
column totals.
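A minimal sketch with statsmodels; the 2 x 2 table of paired before/after outcomes is made up:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired yes/no outcomes: rows = before, columns = after.
table = [[30, 12],
         [5, 53]]

result = mcnemar(table, exact=True)
print(result.statistic, result.pvalue)
```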

• Independent Samples T-Test is carried out to see if there are any significant
differences in the means for two groups in the variable of interest. That is, a nominal
variable that is split into two subgroups (for example, smokers and nonsmokers) is
tested to see if there is a significant mean difference between the two split groups on
a dependent variable, which is measured on an interval or ratio scale (for instance,
extent of well-being; pay; or comprehension level).
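A minimal sketch with SciPy; the group labels and values are made up:

```python
from scipy.stats import ttest_ind

# Hypothetical well-being scores for the two subgroups of a nominal variable.
smokers = [5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.2]
nonsmokers = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0, 5.6]

t, p = ttest_ind(smokers, nonsmokers)
print(f"t = {t:.2f}, p = {p:.3f}")
```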

• Whereas the (independent samples) t-test indicates whether or not there is a
significant mean difference in a dependent variable between two groups, an analysis
of variance (ANOVA) helps to examine the significant mean differences among more
than two groups on an interval or ratio scale. The results of ANOVA show whether or
not the means of the various groups are significantly different from one another, as
indicated by the F statistic. The F statistic shows whether two sample variances differ
from each other or are from the same population
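A minimal sketch of a one-way ANOVA with SciPy; the three groups are made up:

```python
from scipy.stats import f_oneway

group_a = [23, 25, 27, 22, 26]
group_b = [30, 29, 32, 31, 28]
group_c = [24, 26, 25, 27, 23]

f_stat, p = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.3f}")
```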
• Simple regression analysis is used in a situation where one independent variable is
hypothesized to affect one dependent variable.

• The basic idea of multiple regression analysis is similar to that of simple regression
analysis. Only in this case, we use more than one independent variable to explain
variance in the dependent variable.
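A minimal sketch of a multiple regression with statsmodels; the variable names and data are made up, and dropping one of the predictors gives the simple regression case:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: explain job performance from experience and training hours.
df = pd.DataFrame({
    "experience": [1, 3, 5, 2, 8, 6, 4, 7],
    "training": [10, 12, 20, 8, 25, 18, 15, 22],
    "performance": [55, 60, 72, 54, 85, 78, 66, 80],
})

X = sm.add_constant(df[["experience", "training"]])  # add the intercept term
model = sm.OLS(df["performance"], X).fit()
print(model.summary())
```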

• Discriminant analysis helps to identify the independent variables that discriminate a
nominally scaled dependent variable of interest. The linear combination of
independent variables indicates the discriminating function showing the large
difference that exists in the two group means
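A minimal sketch with scikit-learn's linear discriminant analysis; the features, labels, and the new case are illustrative assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical predictors (rows = cases) and a two-group nominal outcome (0/1).
X = np.array([[2.1, 30], [1.8, 25], [3.5, 45], [3.9, 50],
              [2.0, 28], [4.1, 52], [1.7, 22], [3.6, 47]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print(lda.coef_)                 # weights of the discriminant function
print(lda.predict([[2.5, 33]]))  # predicted group for a new case
```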

• Logistic regression is also used when the dependent variable is nonmetric. However,
when the dependent variable has only two groups, logistic regression is often
preferred, because it does not face the strict assumptions that discriminant analysis
faces and because it is very similar to regression analysis
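A minimal sketch with statsmodels; the binary outcome and predictor are made up:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: does the number of store visits predict purchase (1) vs. no purchase (0)?
df = pd.DataFrame({
    "visits": [1, 2, 2, 3, 4, 5, 6, 7, 8, 9],
    "purchase": [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
})

X = sm.add_constant(df[["visits"]])
model = sm.Logit(df["purchase"], X).fit()
print(model.summary())
```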

• Conjoint analysis requires participants to make a series of trade-offs. In marketing,
conjoint analysis is used to understand how consumers develop preferences for
products or services.

• Two-way ANOVA can be used to examine the effect of two nonmetric independent
variables on a single metric dependent variable. Two-way ANOVA enables us to
examine main effects (the effects of the independent variables on the dependent
variable) but also interaction effects that exist between the independent variables
(or factors).
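A minimal sketch with statsmodels' formula interface; the factor names and data are made up, and the C(A):C(B) term captures the interaction:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: two nonmetric factors (A, B) and one metric outcome (y).
df = pd.DataFrame({
    "A": ["low", "low", "high", "high"] * 4,
    "B": ["x", "y"] * 8,
    "y": [5.1, 6.0, 7.2, 8.1, 5.3, 5.8, 7.0, 8.4,
          4.9, 6.2, 7.4, 7.9, 5.0, 6.1, 7.1, 8.2],
})

model = ols("y ~ C(A) + C(B) + C(A):C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and the interaction effect
```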

• MANOVA is similar to ANOVA, with the difference that ANOVA tests the mean
differences of more than two groups on one dependent variable, whereas MANOVA
tests mean differences among groups across several dependent variables
simultaneously, by using sums of squares and cross-product matrices
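A minimal sketch with statsmodels' MANOVA; the two dependent variables and the grouping factor are made up:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two metric dependent variables and one grouping variable.
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "y1": [5.1, 4.8, 5.3, 5.0, 4.9, 6.2, 6.0, 6.4, 6.1, 6.3],
    "y2": [2.0, 2.2, 1.9, 2.1, 2.3, 3.1, 3.0, 3.3, 2.9, 3.2],
})

manova = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc. for the group effect
```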

• Canonical correlation examines the relationship between two or more dependent
variables and several independent variables; for example, the correlation between a
set of job behaviors (such as engrossment in work, timely completion of work, and
number of absences) and their influence on a set of performance factors (such as
quality of work, the output, and rate of rejects)
• A dummy variable is a variable that has two or more distinct levels, which are coded
0 or 1. Dummy variables allow us to use nominal or ordinal variables as independent
variables to explain, understand, or predict the dependent variable.
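A minimal sketch of dummy coding with pandas; the category names are made up, and one level is dropped to serve as the reference category:

```python
import pandas as pd

df = pd.DataFrame({"department": ["sales", "hr", "it", "sales", "it"]})

# Each remaining level becomes a 0/1 dummy column; "hr" is the (dropped) reference level.
dummies = pd.get_dummies(df["department"], prefix="dept", drop_first=True, dtype=int)
print(dummies)
```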

• Multicollinearity is an often-encountered statistical phenomenon in which two or
more independent variables in a multiple regression model are highly correlated.
o If the objective of the study is to reliably estimate the individual regression
coefficients, multicollinearity is a problem.
o Multicollinearity is not a serious problem if the purpose of the study is to
predict or forecast future values of the dependent variable, because even
though the estimations of the regression coefficients may be unstable,
multicollinearity does not affect the reliability of the forecast.
o The tolerance value and the variance inflation factor (VIF - the inverse of the
tolerance value) are common measures for multicollinearity. These measures
indicate the degree to which one independent variable is explained by the
other independent variables. A common cutoff value is a tolerance value of
0.10, which corresponds to a VIF of 10 (see the sketch after this list).
o Methods to reduce collinearity:
▪ Reduce the set of independent variables to a set that is not collinear
(note that this may lead to omitted variable bias, which is also a
serious problem).
▪ Use more sophisticated ways to analyze the data, such as ridge
regression.
▪ Create a new variable that is a composite of the highly correlated
variables.
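Referring to the tolerance/VIF check above, a minimal sketch with statsmodels; the variable names and data are made up, with x2 deliberately constructed to be collinear with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)

# Hypothetical predictors; x2 is highly correlated with x1, x3 is independent.
x1 = rng.normal(size=100)
x2 = x1 * 0.9 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.2f}")
```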
