
# RESEARCH METHODOLOGY

LECTURE 6
VARIABLES, HYPOTHESIS AND ERRORS

Imran Siddiqi
Dept of CS

imran.siddiqi@gmail.com

CONTENTS
Variables and Concepts
Types of variables
Hypothesis and Types
Testing the Hypothesis
Errors and Types

VARIABLE
"An attribute that is observable, measurable, and has a dimension that can vary."

For example, temperature is a variable that is observable, measurable, and varies from high to low.

Meaning

With

A variable can be measured; a concept cannot be.

Need

CONCEPTS, INDICATORS & VARIABLES
Concepts in your study: how to measure them?
Identify the indicators
Convert the indicators to variables

Example
Concept: Rich
Indicators: Income and Assets
Income (in \$) is also a variable
Assets: house, cars, investments
Convert each of these into dollars
Based on total income and total value of assets, decide whether a given person is rich or not.

CONVERTING CONCEPTS TO VARIABLES

| Concept | Indicators | Variables | Decision level |
| --- | --- | --- | --- |
| Rich | 1. Income | 1. Income/year | 1. If > \$100,000 |
| | 2. Assets | 2. Total value of home and cars | 2. If > \$250,000 |
| Achievement | 1. Marks exam | 1. Percentage | 1. If > 80% |
| | 2. Marks practical | 2. Percentage | 2. If > 80% |
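The conversion of the concept "rich" into measurable variables and a decision rule can be sketched in a few lines of Python. This is only an illustration: the `Person` class and `is_rich` function are hypothetical names, the thresholds are the decision levels listed above, and requiring both conditions to hold at once is an assumption, since the slide does not say how the two levels are combined.

```python
from dataclasses import dataclass

# Decision levels from the slide; combining them with "and" is an assumption.
INCOME_THRESHOLD = 100_000   # income per year, in dollars
ASSETS_THRESHOLD = 250_000   # total value of assets, in dollars


@dataclass
class Person:
    """Variables derived from the indicators 'income' and 'assets'."""
    income_per_year: float
    home_value: float
    cars_value: float

    def total_assets(self) -> float:
        # Convert each asset into dollars and add them up.
        return self.home_value + self.cars_value


def is_rich(person: Person) -> bool:
    """Apply the decision levels to the measured variables."""
    return (person.income_per_year > INCOME_THRESHOLD
            and person.total_assets() > ASSETS_THRESHOLD)


# Hypothetical people, for illustration only.
alice = Person(income_per_year=120_000, home_value=300_000, cars_value=20_000)
bob = Person(income_per_year=120_000, home_value=100_000, cars_value=20_000)
print(is_rich(alice), is_rich(bob))
```

The point of the sketch is that the concept itself never appears in the computation; only the variables and the decision levels do.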

TYPES OF VARIABLES

Classification can be based on:
The causal relationship
The design of the study
The unit of measurement

Independent variable
The cause responsible for bringing about a change in a phenomenon or situation
Variable that is believed to cause or influence the dependent variable

Dependent variable
Variable that is influenced by the independent variable.

Extraneous variables
Variables affecting the cause-and-effect relationship

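EXAMPLES
Does smoking cause lung cancer?
Does nursing care cause rapid recovery?
Does drug (a) cause improvement?

| Cause (independent variable) | Effect (dependent variable) |
| --- | --- |
| Smoking | Lung cancer |
| Nursing care | Rapid recovery |
| Drug (a) | Improvement |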

EXAMPLES
Extraneous Variables
A variable that confounds the relationship between the dependent and independent variables, and therefore needs to be controlled.
E.g., "air pollution" is an extraneous variable that interferes with studying the relationship between smoking (the independent variable) and lung cancer (the dependent variable).

DESIGN

Active variables
Variables that do not pre-exist, so the researcher has to create them.
These variables can be manipulated, changed or controlled.

Attribute variables
A pre-existing characteristic or attribute which the researcher simply observes and measures.
These variables cannot be manipulated, changed or controlled.

EXAMPLE

Study designed to measure the effectiveness of three teaching models A, B, C
The researcher may change the teaching model (active variable)
No control over the characteristics of the student population: age, gender or motivation to study (attribute variables)

VARIABLES MEASUREMENT VIEWPOINT
Categorical Variables (Qualitative)
Continuous Variables (Quantitative)

VARIABLES MEASUREMENT VIEWPOINT
Categorical Variables
Measured on nominal scales
Two types

Dichotomous variables
Vary in only two values.
E.g. alive or dead, day or night etc.

Polytomous variables
More than two categories
E.g. religion: Muslim, Christian, Jew

VARIABLES MEASUREMENT VIEWPOINT
Continuous Variables
Have continuity in their measurement: they can take any value on the scale on which they are measured
E.g. age, income etc.

Hypothesis


HYPOTHESIS

Hypothesis
Brings

It is possible to conduct a study without a hypothesis as well
Hypothesis: how to construct?
Hypotheses arise from hunches or educated guesses

HYPOTHESIS - EXAMPLES

Horse race
Hunch: Horse #6 will win
Whether the hunch is true or false is known only after the race

Distribution of smokers
Hunch: there are more male smokers at your workplace than female smokers
Conclude whether the hunch was right or wrong

HYPOTHESIS - EXAMPLES

Public health
A disease is very common in people coming from a specific sub-group of the population
Finding every possible cause would take enormous time and resources
Narrow down: based on your study, identify the most probable cause, e.g. contaminated water
Perform a study: collect information to verify whether the hunch is correct or not

HYPOTHESIS - EXAMPLES

In example 1
Waited

In example 2 & 3
Designed a study to test the validity of your hunch

HYPOTHESIS
The researcher does not know about a phenomenon, situation or condition
But does have a hunch, assumption or guess
Conclude through verification
The hunch may be:
Right
Wrong
Partially right

HYPOTHESIS - DEFINITIONS

A tentative statement about something, the validity of which is usually unknown
A proposition that is stated in a testable form and that predicts a particular relationship between two or more variables.
A hypothesis is written in such a way that it can be proven or disproven by valid and reliable data; it is in order to obtain these data that we perform our study.

HYPOTHESIS - CONSIDERATIONS

Clear
No ambiguity: ambiguity in the hypothesis makes verification difficult
Unidimensional: should test one relationship at a time
Must be familiar with the subject area (literature review) before suggesting the hypothesis

HYPOTHESIS - CONSIDERATIONS
"The average age of male students in the class is higher than that of female students"
Clear
Specific
Testable

HYPOTHESIS - CONSIDERATIONS
"Suicide rates vary inversely with social cohesion"
Clear
Specific
Testable? Difficult: what is social cohesion, and how do we measure it?

HYPOTHESIS - CONSIDERATIONS

Data collection and analysis
Hypothesis cannot be tested?
One may formulate a hypothesis for which methods of verification are not available

Expressed in terms that can be measured

TYPES OF HYPOTHESIS

Categories of hypothesis
Research hypothesis
The hypothesis you want to test

Alternate hypothesis
Specifies the relationship that will be considered true in case the research hypothesis proves to be wrong.

WAYS OF FORMULATING HYPOTHESIS

There is no significant difference in the proportion of male and female smokers in the study population
A greater proportion of females than males are smokers in the study population
A total of 60% of females and 30% of males in the study population are smokers
There are twice as many female smokers as male smokers in the study population

WAYS OF FORMULATING HYPOTHESIS

Hypothesis of No Difference
When you formulate a hypothesis stipulating that there is no difference between two situations, groups or outcomes
Example: "There is no significant difference in the proportion of male and female smokers in the study population"

WAYS OF FORMULATING HYPOTHESIS

Hypothesis of Difference
A hypothesis in which a researcher stipulates that there will be a difference but does not specify its magnitude
Example: "A greater proportion of females than males are smokers in the study population"

WAYS OF FORMULATING HYPOTHESIS

Hypothesis of Point-Prevalence
A researcher has enough knowledge about the behaviour/situation
Able to state the expected prevalence in quantitative terms
Example: "A total of 60% of females and 30% of males in the study population are smokers"

WAYS OF FORMULATING HYPOTHESIS

Hypothesis of Association
Expressed as a relationship
Example: "Twice as many female smokers as male smokers"
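As an illustration of how the hypothesis of no difference could be checked against data, the sketch below runs a pooled two-proportion z-test. The lecture does not name a specific test, so the choice of test, the function name `two_proportion_z_test`, the 0.05 significance level and the survey counts (60 smokers among 100 females, 30 among 100 males) are all assumptions made for the example.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated from pooled counts.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical survey counts, for illustration only.
z, p_value = two_proportion_z_test(x1=60, n1=100, x2=30, n2=100)
ALPHA = 0.05  # assumed significance level
reject_h0 = p_value < ALPHA
print(f"z = {z:.3f}, p = {p_value:.5f}, reject H0: {reject_h0}")
```

With these made-up counts the difference is large relative to its standard error, so the hypothesis of no difference would be rejected; with equal counts in both groups the p-value is 1 and it would not be.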

HYPOTHESIS TESTING

Hypothesis testing - H0
Null hypothesis
Usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or "this product is not broken".

Alternate hypothesis
Negation of the null hypothesis, for example "this person is not healthy", "this accused is guilty" or "this product is broken".

Errors depend directly on the null hypothesis.

HYPOTHESIS TESTING

| Decision | True state of nature: H0 is True | True state of nature: H0 is False |
| --- | --- | --- |
| Reject H0 | Type I error | Correct decision |
| Accept H0 | Correct decision | Type II error |

HYPOTHESIS TESTING

H0 = This person is healthy
Telling the person that he is sick when in fact he was healthy: Type I error
Telling the person that he is sick when in fact he was sick: Correct
Telling the person that he is healthy when in fact he was sick: Type II error
Telling the person that he is healthy when in fact he was healthy: Correct

Traditionally, the probability of a Type I error is denoted by α and that of a Type II error by β.
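The four outcomes for H0 = "this person is healthy" can be written as a small lookup function. This is a minimal sketch; the function name `decision_outcome` and the way the cases are encoded are illustrative choices, not part of the lecture.

```python
def decision_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Classify a test decision against the true state of nature."""
    if h0_is_true and h0_rejected:
        return "Type I error"        # rejected a true null hypothesis
    if not h0_is_true and not h0_rejected:
        return "Type II error"       # accepted a false null hypothesis
    return "Correct decision"


# H0 = "this person is healthy"; rejecting H0 means telling the person he is sick.
cases = [
    ("told sick, actually healthy", True, True),
    ("told sick, actually sick", False, True),
    ("told healthy, actually sick", False, False),
    ("told healthy, actually healthy", True, False),
]
for label, h0_is_true, h0_rejected in cases:
    print(f"{label}: {decision_outcome(h0_is_true, h0_rejected)}")
```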

HYPOTHESIS TESTING

H0 = Defendant is innocent

| Classified as | Actually innocent | Actually terrorist |
| --- | --- | --- |
| Terrorist | False positive | True positive |
| Innocent | True negative | False negative |

[Figure: detection result with regions marked as true positives, false negatives, false positives and true negatives (rest of the image)]

PERFORMANCE MEASURES

Recall
$$R = \frac{TP}{TP + FN}$$

Precision
$$P = \frac{TP}{TP + FP}$$

F measure
$$F = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
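The three measures can be computed directly from the counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below is illustrative: the function names are hypothetical, returning 0.0 when a denominator is zero is an assumed convention, and the counts are made up.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of reported positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_measure(p: float, r: float) -> float:
    """F = 2 * P * R / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 3, 3, 1
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"precision = {p:.2f}, recall = {r:.2f}, F = {f:.2f}")
```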

EXAMPLE: FACE DETECTION

Recall, $R = \frac{TP}{TP + FN}$: how many faces were detected out of the total?
Precision, $P = \frac{TP}{TP + FP}$: did the system detect extra objects other than faces?

Precision = 3/6 = 50%

EXAMPLE - BIOMETRICS

Finger

Enrollment
Enroll all the authorized users: take their fingerprints, facial images or iris scans etc.

Validation
A person arrives
Take data (fingerprint, iris, face)
Compare with database
If matched with an individual: Allow
Else: Decline
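The enrollment and validation steps can be sketched as follows. This is a toy illustration, not a real biometric matcher: feature vectors are plain lists of numbers, the `similarity` function (based on Euclidean distance) and the threshold of 0.8 are assumptions, and all names and values are hypothetical.

```python
import math


def similarity(a, b):
    """Assumed similarity score in (0, 1]; 1.0 means identical feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def enroll(database, user_id, features):
    """Enrollment: store the template of an authorized user."""
    database[user_id] = list(features)


def validate(database, features, threshold=0.8):
    """Validation: return the matched user id, or None to decline access."""
    best_id, best_score = None, 0.0
    for user_id, template in database.items():
        score = similarity(features, template)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None


# Hypothetical feature vectors, for illustration only.
db = {}
enroll(db, "alice", [0.1, 0.9, 0.3])
enroll(db, "bob", [0.8, 0.2, 0.5])
print(validate(db, [0.12, 0.88, 0.31]))  # close to alice's template
print(validate(db, [5.0, 5.0, 5.0]))     # far from every template
```

The threshold is the only tunable decision parameter here, which is why its choice determines what kinds of errors the system makes.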

EXAMPLE - BIOMETRICS
Enrollment

What kinds of errors can the system make?

http://www.idteck.com/support/biometrics.asp

EXAMPLE

The FRR (False Rejection Rate) is the frequency with which an authorized person is denied access

The FAR (False Acceptance Rate) is the frequency with which a non-authorized person is accepted as authorized
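Given similarity scores for attempts by authorized and non-authorized people, FAR and FRR at one threshold can be estimated by counting. The sketch assumes that a higher score means a better match and that access is granted when the score is at or above the threshold; the function name and all score values are hypothetical.

```python
def far_frr(authorized_scores, unauthorized_scores, threshold):
    """Return (FAR, FRR) when access is granted for score >= threshold."""
    # Authorized people who were rejected.
    false_rejects = sum(1 for s in authorized_scores if s < threshold)
    # Non-authorized people who were accepted.
    false_accepts = sum(1 for s in unauthorized_scores if s >= threshold)
    frr = false_rejects / len(authorized_scores)
    far = false_accepts / len(unauthorized_scores)
    return far, frr


# Hypothetical similarity scores, for illustration only.
authorized = [0.9, 0.8, 0.75, 0.6, 0.95]
unauthorized = [0.2, 0.4, 0.65, 0.3, 0.1]

far_high, frr_high = far_frr(authorized, unauthorized, threshold=0.7)
far_low, frr_low = far_frr(authorized, unauthorized, threshold=0.5)
print(f"threshold 0.7: FAR = {far_high:.1f}, FRR = {frr_high:.1f}")
print(f"threshold 0.5: FAR = {far_low:.1f}, FRR = {frr_low:.1f}")
```

In this made-up data the stricter threshold rejects one authorized person and accepts no intruder, while the looser threshold does the opposite.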

EXAMPLE - BIOMETRICS
Challenge
How to find a similarity threshold value for acceptance/rejection?
Find the system's response to a large number of inquiries from authorized as well as unauthorized users.
Record the similarity scores of authorized and unauthorized cases
Plot the respective histograms/distributions


Move the decision boundary (threshold) to the right
FAR will decrease and FRR will increase

Move the decision boundary (threshold) to the left
FAR will increase and FRR will decrease

Which boundary to choose?
Depends upon your application: which errors are less serious?

HOW TO QUANTIFY SYSTEM PERFORMANCE
At different thresholds the system has different values of FAR and FRR
What do you answer if someone asks what the performance of the system is?

HOW TO QUANTIFY SYSTEM PERFORMANCE
Equal Error Rate - EER
Change the value of the threshold and plot FAR and FRR
The point where both are equal is the EER
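A simple way to approximate the EER is to sweep the threshold over a grid and keep the value at which FAR and FRR are closest. With a finite set of scores the two rates may never be exactly equal, so the sketch reports the closest point. It assumes access is granted when the score is at or above the threshold; the function names, the grid and all scores are hypothetical.

```python
def far_frr(authorized_scores, unauthorized_scores, threshold):
    """Return (FAR, FRR) when access is granted for score >= threshold."""
    false_rejects = sum(1 for s in authorized_scores if s < threshold)
    false_accepts = sum(1 for s in unauthorized_scores if s >= threshold)
    return (false_accepts / len(unauthorized_scores),
            false_rejects / len(authorized_scores))


def equal_error_rate(authorized_scores, unauthorized_scores, thresholds):
    """Return (threshold, FAR, FRR) at the point where |FAR - FRR| is smallest."""
    best = None
    for t in thresholds:
        far, frr = far_frr(authorized_scores, unauthorized_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, t, far, frr = best
    return t, far, frr


# Hypothetical similarity scores, for illustration only.
authorized = [0.52, 0.57, 0.62, 0.67, 0.72, 0.77, 0.82, 0.87, 0.92, 0.97]
unauthorized = [0.12, 0.22, 0.27, 0.32, 0.37, 0.42, 0.47, 0.57, 0.62, 0.67]
grid = [i / 20 for i in range(21)]  # thresholds 0.00, 0.05, ..., 1.00

eer_threshold, eer_far, eer_frr = equal_error_rate(authorized, unauthorized, grid)
print(f"threshold = {eer_threshold:.2f}, FAR = {eer_far:.2f}, FRR = {eer_frr:.2f}")
```

Reporting a single EER value gives one number for comparing systems, independent of any particular threshold setting.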

PERFORMANCE

The Receiver Operating Characteristic (ROC) Curve

High security: cannot afford false acceptances (FAR)
Balance
User comfort: fewer false rejections

REFERENCES

Research Methodology, Ranjit Kumar, Chapter 6
http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
http://www.intuitor.com/statistics/T1T2Errors.html
http://www.fingerprint-it.com
http://fingerchip.pagesperso-orange.fr