
What is a variable?

Characteristic or attribute
Can assume different values

DISCRETE VS. CONTINUOUS


DISCRETE
Finite or countable number of values
Counting or enumeration
Examples: Sex, STFAP Bracket

CONTINUOUS
Infinitely many values
Real line
Examples: Height, Life span

QUALITATIVE VS. QUANTITATIVE

QUALITATIVE
Categorical
Descriptive

QUANTITATIVE
Numerical
Represents amount or quantity

WHAT IS MEASUREMENT?
Process
Determine value or label of variable
Based on observations

WHY MEASURE?
Important
Determine appropriate tool

NOMINAL
ORDINAL
INTERVAL
RATIO

Numbers serve as classification


Categories are
Distinct
Mutually exclusive
Exhaustive

Weakest level of measurement


No absolute value
No operations permissible
Examples
Sex
Civil Status
Enrollment Status

Classification + Ranking
Arrange categories according to magnitude
No exact measurement between two orders
Examples

Shirt sizes
Awards
Year level
Evaluation scores

Contains all properties of ordinal and nominal


Equal intervals are present
Multiplication and division are not possible
Fixed unit of measure
No absolute zero; zero does not indicate absence of
characteristic
Examples
IQ
Calendar dates
Temperature reading

Contains all properties of ordinal, nominal, and interval levels


Has magnitude
Has fixed unit of measurement
Has an absolute zero
Strongest level of measurement
Any arithmetic operation is permissible
Examples
Weekly allowance
Speed of a car in kph
Rainfall in cm
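
The difference between the interval and ratio levels can be checked numerically. The sketch below uses illustrative readings that are not from the slides: a ratio of two Celsius temperatures changes when the same temperatures are expressed in Fahrenheit, while a ratio of two rainfall amounts is the same in centimetres and millimetres.

```python
def c_to_f(celsius):
    """Convert a Celsius reading to Fahrenheit (the zero point shifts)."""
    return celsius * 9 / 5 + 32

# Interval level: illustrative temperature readings (assumed values).
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c                      # 20 / 10
ratio_fahrenheit = c_to_f(t2_c) / c_to_f(t1_c)   # 68 / 50, a different ratio

# Ratio level: illustrative rainfall amounts (assumed values), absolute zero.
r1_cm, r2_cm = 5.0, 10.0
ratio_cm = r2_cm / r1_cm
ratio_mm = (r2_cm * 10) / (r1_cm * 10)           # same ratio after unit change

print(ratio_celsius, ratio_fahrenheit)
print(ratio_cm, ratio_mm)
```

Because the temperature ratio depends on the unit chosen, "twice as hot" is not a meaningful statement at the interval level, whereas "twice as much rainfall" is meaningful at the ratio level.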

General misconception: always collect data


Identify all pertinent data already available
Previous studies
Data compiled by agencies
NSO
NSCB
BAS
BSP

Examine
Source
Availability
Scope

What is a survey?
Data collection method
Asking questions

People who answer questions are RESPONDENTS
Sample survey
More common
Respondents: chosen objectively

Several ways for communication

Personal interviews
Telephone
Self-administered
Online surveys
Focus group discussion

Weigh pros and cons


Accuracy of data obtained
Cost and time
Ability of method to obtain data
needed

Suitable for many types of problems


Usually provides most accurate and complete responses
Ability to read and write not necessary (for respondents)
Most expensive and time consuming
Relatively high response rate
INTERVIEWER is the most crucial
Reliability of data should be ensured
Training
Editing
Field
Central

PROS
Shares some advantages of
personal interview:
INTERVIEWER
Easier to supervise work of
interviewers
Cost- and time-efficient

CONS
Respondents that can be
reached are limited
Time for interview is limited

PROS
Cost-efficient
Convenient for respondents
Own time and pace
Freedom in expressing self

CONS
More prone to
misinterpretation of questions
More prone to vague
answers
Response rate is low
Delay in responses
May be
remedied

Closely linked to self-administered questionnaires


Same pros and cons as self-administered questionnaires
Additional con
Internet forms can be manipulated

In-depth discussion among participants


Small group
Moderator
Data obtained
Sentiments
Ideas
Attitudes

Results not always conclusive


Mostly used to
Formulate hypotheses
Explain results of previous studies

Method of collecting data


Direct human intervention
Determine effects of a
certain treatment
Control group and a
treatment group
Dependent, explanatory, and
extraneous variables
Also has disadvantages
Uncommon
Volunteer subjects
Applicability of results

Basic steps
1. Specify the response variable and explanatory variables
2. Identify possible extraneous variables
3. Determine how to control extraneous variables
4. Assign treatment at random and apply assigned treatment
5. Measure response variable for each subject at the end of experiment
6. Analyze the data
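
The random assignment step can be sketched as follows. This is a minimal illustration, assuming ten hypothetical subject IDs and an equal split between one treatment group and one control group; the function name and the seed are not from the slides.

```python
import random

def assign_treatments(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)          # random order removes selection by the researcher
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject IDs, for illustration only.
subjects = [f"S{i:02d}" for i in range(1, 11)]
groups = assign_treatments(subjects, seed=42)
print(groups)
```

Shuffling before splitting gives every subject the same chance of landing in either group, which is one common way to keep extraneous variables from lining up with the treatment.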

Method of data collection


Recording observations
As phenomenon happens
Useful for
Studying reactions and behaviors
Subjects unable to express themselves

Major approaches
Duration recording (how long
behavior lasts)
Frequency count recording (how often
behavior happens)
Latency recording (length of time
between stimulus and first occurrence)
Interval recording (partition time;
number of intervals behavior occurs)
Time sampling (checks for behavior at
a specified time)

Two types
Participant
Non-participant

Use of observation is limited to
characteristics that can be
observed
Usually more successful than
surveys (nonverbal behavior)
More successful than experiments in
getting realistic data
Objective sampling procedures are
difficult to use

Internal data generated from operation and administration


Data generated from registration
Computer simulation

Commonly used in surveys


Statistical sampling theory
Said to be accurate
Based on probability theory

Sample
Subset of population of
interest
Should be representative
Homogeneous population not
always a requirement
Range of data is important
Probability theory

Sample survey
Use depends on
Type of problem
Population of interest
Amount of resources

Sampling design
High precision, low cost
Select the design which meets
Budget
Time
Precision requirement

Population
Target population. The population
about which information is desired
Sampled or sampling population.
The population from which sample is
actually obtained.

Sampling Frame. List of units or
members in a sampling
population
Sampling Unit. Member or unit
of the sampling population
Survey
Census
Sample survey

Bias. Systematic tendency for
sample to misrepresent population
Precision. Closeness of estimates to one
another under repeated sampling
Low precision: values tend to be widely spread out

Probability. Measure of relative
occurrence or non-occurrence of
one of the possible outcomes of an
experiment or procedure
Sampling Error. Error that arises because
only a sample, not the whole population, is observed
Nonsampling Error. Due to other
factors
Total Error. Deviation of an
estimate from the true value it is
supposed to estimate.

More economical
Accomplished faster
Wider scope
More accurate
Most feasible method

PROBABILITY SAMPLING
Each unit has a known,
nonzero probability of being
in the sample
Rules and procedures present
for sample selection and
estimation
Objective: make inferences

NON-PROBABILITY SAMPLING
Probabilities of selection are not
specified
Some elements may not have a
chance to be in the sample
No objective way of assessing
results obtained
Pros
Convenience
Economical
Easy

Cons
Error in sampling cannot be measured
Sample may not be representative

Accidental/Convenience Sampling. Whatever items or units
come to hand are used as the sample.
Judgment/Purposive Sampling. Sample selected in
accordance with an expert's subjective judgment. We choose
only those who best meet the purpose of the study.
Quota Sampling. Interviewer required to interview a certain
number of persons with a given set of characteristics.
Snowball Sampling

Simple Random Sampling


Stratified Sampling
Systematic Sampling
Cluster Sampling
Multi-stage Sampling

Random sample
n observations
Each subset of n observations
of the population has the
same chance of being
selected

With replacement or
without replacement
Appropriate for
homogeneous populations
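
A minimal sketch of simple random sampling with the Python standard library, assuming a hypothetical frame of 100 numbered units and a sample size of 10. The seed is fixed only so the illustration is reproducible.

```python
import random

rng = random.Random(2024)            # fixed seed: illustration only
population = list(range(1, 101))     # hypothetical frame of N = 100 unit labels
n = 10                               # assumed sample size

# Without replacement: a unit can be selected at most once.
srs_without = rng.sample(population, n)

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
srs_with = rng.choices(population, k=n)

print(sorted(srs_without))
print(sorted(srs_with))
```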

Advantages
Theory involved is much
easier to understand
Estimation methods are
simple and easy

Disadvantages
Sample chosen may be
widely spread
Population list (frame) is
needed
May not be applicable for
heterogeneous populations

Population should be
divided or stratified into
homogeneous groups
Select simple random
sample from each
subgroup
Strata
Related to topic being
studied
Different from each other,
homogeneous within
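
Stratified sampling can be sketched as a simple random sample drawn separately within each stratum. The strata, their sizes, and the 10 percent sampling fraction below are hypothetical, and drawing the same fraction from every stratum (proportional allocation) is an assumption; the slides do not say how the sample is divided among strata.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        # Proportional allocation (assumed): same fraction in every stratum.
        n_h = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical strata: students grouped by year level.
strata = {
    "first year": [f"Y1-{i}" for i in range(40)],
    "second year": [f"Y2-{i}" for i in range(30)],
    "third year": [f"Y3-{i}" for i in range(20)],
    "fourth year": [f"Y4-{i}" for i in range(10)],
}
sample = stratified_sample(strata, fraction=0.10, seed=7)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

Every stratum is guaranteed to appear in the sample, which is what a single simple random sample of the whole population cannot promise.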

Advantages
May increase precision of
estimates
Comprehensive analysis
Convenient

Disadvantages
List for each stratum is
needed
Additional prior info is
needed
Population
Sub-populations

Take every kth unit from an
ordered population
First unit is selected at
random
k is the sampling interval
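
A minimal sketch of systematic sampling, assuming a hypothetical ordered list of 100 units and a sampling interval of k = 10. The random start is taken among the first k units, then every kth unit after it is selected.

```python
import random

def systematic_sample(ordered_units, k, seed=None):
    """Select a random start among the first k units, then every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # index 0 .. k-1, chosen at random
    return ordered_units[start::k]    # every kth unit from the random start

units = list(range(1, 101))           # hypothetical ordered population, N = 100
sample = systematic_sample(units, k=10, seed=1)
print(sample)
```

If the list has a periodic pattern whose period matches k, every selected unit falls at the same point of the cycle, which is the disadvantage noted for this design.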

Advantages
Easy to draw sample
Possible to select sample
without frame
Sample is spread evenly
Likely to give more precise
estimates compared to SRS

Disadvantage
Sample may consist of only
similar types if there are
periodic regularities in the list

Sample of distinct groups
or clusters of smaller units
(elements)
Clusters are mutually
exclusive
Each cluster is
heterogeneous
aka Area Sample
M is the size of the cluster
N is the number of clusters
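
Cluster sampling can be sketched as a random selection of whole clusters, with every element of a selected cluster entering the sample. The values N = 6 clusters of size M = 5, the cluster names, and the choice of 2 clusters are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # only a list of clusters is needed
    return {name: list(clusters[name]) for name in chosen}

N, M = 6, 5   # assumed: N clusters, each of size M
clusters = {
    f"cluster_{c}": [f"c{c}_unit{u}" for u in range(1, M + 1)]
    for c in range(1, N + 1)
}
selected = cluster_sample(clusters, n_clusters=2, seed=3)
print(selected)
```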

Advantages
Population list not needed
Only list of clusters needed
Reduced transportation cost

Disadvantages
Difficult estimation
procedures
Costs and problems of
statistical analyses are
greater

Sampling accomplished
in two or more stages
First stage: primary units
Second stage: secondary units
Further stages may be
added
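
A two-stage version can be sketched as follows, assuming hypothetical primary units (schools) that each contain 30 secondary units (students). Unlike cluster sampling, only a subsample of the elements inside each selected primary unit is taken.

```python
import random

def two_stage_sample(primary_units, n_primary, n_secondary, seed=None):
    """Stage 1: sample primary units. Stage 2: sample secondary units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), n_primary)
    return {p: rng.sample(primary_units[p], n_secondary) for p in chosen}

# Hypothetical population: 8 schools with 30 students each.
primary_units = {
    f"school_{s}": [f"s{s}_student{i}" for i in range(1, 31)]
    for s in range(1, 9)
}
sample = two_stage_sample(primary_units, n_primary=3, n_secondary=5, seed=11)
print(sample)
```

A list of secondary units is needed only for the primary units actually chosen, which is where the reduced listing cost comes from.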

Advantages
Reduced listing cost
Reduced transportation
cost

Disadvantages
Estimation procedures are
difficult
Much planning is needed

Consider efficiency of the scheme


Larger sample: more confidence in conclusions
Avoid bias

Dependent on
Population
Nature
Size
Purpose of study

At least minimum sample size to properly represent population


Keep in mind
Cost
Reliability of estimates obtained
Little to no variation: no reason to take a large sample just because
the population is large
Greater variation: larger sample size, balanced with cost

Textual
Tabular
Graphical

Put important figures in the text of the report


Highlight significant figures
Should give reader clearer understanding of the significance of
the figures about conclusions made in the research problem
Not advisable for large masses of data

Political crises in the Middle East and in North Africa resulted in zero growth in Net
Primary Income (NPI). This zero growth, in turn, impeded the growth in gross national
income (GNI) from 11.5 percent the previous year to 3.6 percent this quarter.
In terms of seasonally adjusted data, GDP grew by 1.9 percent whereas GNI grew,
albeit at a slower pace, by 0.9 percent. Growth for the Agriculture, Hunting, Fishery, and
Forestry sector was recorded at 0.9 percent. This continued growth is attributed to the
rebound of production of palay, sugarcane, and corn. Meanwhile, the strong
performance of Manufacturing, Mining & Quarrying, and Construction neutralized the
contraction of Electricity, Gas, and Water Supply. As a result, the Industry sector grew
by 3.2 percent this quarter, from its 2.1 percent gain from the fourth quarter of 2010.
Services sector likewise perked up in the first quarter. The sector grew by 1.3 percent
after a 0.7 percent decline in the fourth quarter of 2010. All service subsectors posted
positive growth, apart from Public Administration and Defense.
As projected population reached 95.1 million, per capita GDP rose by 2.9 percent.
Per capita GNI and Household Final Consumption Expenditure also grew by 1.7 and 2.9
percent, respectively.

Most common method of data presentation


Allows for comparison and pattern/relationship recognition
Rows and columns
Minimal discussion or lengthy explanations in text
Three types
Leader work
Text tabulation
Formal statistical table

Leader Work

Simplest layout
No table title or column headings
Within text; support
Descriptive or introductory
statement is needed

Parts of a formal statistical table

Heading

Table number
Table title
Head note

Box Head
Spanner head
Column heading
panel

Text Tabulation
Has column headings and table
borders
No table title or number
Introductory statement needed

Formal Statistical Table


Most complete
Can stand alone
Has parts

Stub

Stub head
Center head
Row caption
Block

Field
Line
Column
Cell
Footnote
Source note

[Diagram of a formal statistical table with its parts labeled: 1 Heading, 2 Stub, 3 Notes, 4 Boxhead, 5 Field]

Portrays numerical figures
or relationships among
variables in pictorial form
General picture
Good chart must be

Accurate
Clear
Simple
Professional
Well-designed

Types

Line chart
Vertical bar chart
Horizontal bar chart
Pictograph
Pie chart
Statistical map

Chart title
Coordinate axes
Point of origin
Scale divisions
Grid lines or coordinate lines
Scale figures
Scale labels or legends
Curves
Curve legends
Footnote
Source note

Figure 1. Growth of Exports & Imports


Year-on-year growth rates in percent (%) - 2009-Q1 to 2011-Q1
(In constant 2000 prices)

[Line chart of the Exports and Imports series. Vertical axis: growth rate in percent, from (15.00) to 30.00. Horizontal axis: quarters 2010:1 to 2012:2.]

Source: NSCB
Note: Previous issues use data with 1985 as the base year; starting 2011, the NSCB uses 2000 as the new
base year.

Organization facilitates analyses and interpretation


Data characterization
Raw data
Array
Frequency distribution table

Also known as ungrouped or unclassified data


Data in its original form
Not organized yet
Recorded in the order observed

Arrangement of data according to magnitude


Ascending or descending
Advantages
Easier to detect smallest and largest values
Easy to infer concentration of data values

Disadvantages
Inconvenient as data becomes voluminous
Does not picture clearly the distribution for large masses of data

Summarized table
Classes are distinct values or intervals with frequency counts
Two types
Single-value grouping. Frequency count of observed values where
classes are distinct values.
Grouping by class intervals. Frequency count of observed values where
classes are intervals.
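
Single-value grouping amounts to counting how often each distinct value occurs. The sketch below uses a hypothetical data set (number of siblings reported by 12 respondents) that is not from the slides.

```python
from collections import Counter

# Hypothetical raw data: number of siblings of 12 respondents.
values = [2, 0, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

freq = Counter(values)            # class = distinct value, count = frequency
table = sorted(freq.items())      # order classes by value, as in an array

for value, count in table:
    print(f"{value:>5} {count:>9}")
```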

Class interval. Numbers defining a class


Class frequency. Number of observations falling under a class
interval
Class limits. End numbers of a class interval
Lower class limit (LCL)
Upper class limit (UCL)

Open class interval. Class interval with either no LCL or UCL


Class boundaries. True class limits; the number of decimal places is one
more than in the class limits.
Class size. Size of the class interval; difference between two
successive LCLs or UCLs
Class mark. Midpoint of a class interval
Modal class. Class interval with the highest frequency

1. Determine adequate number of classes, K.


K = 1 + 3.322 log10(n), where n is the total number of observations

2. Determine the range, R.


R = maximum - minimum

3. Calculate the approximate class size, C.


C = R/K

4. Determine the class size C by rounding off the approximate value to a number that
is easy to work with.
5. List the required number of class intervals
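
Steps 1 to 3 can be sketched as a short computation. The 20 scores below are hypothetical and serve only to show the arithmetic; the rounding in step 4 remains a judgment call.

```python
import math

def class_plan(data):
    """Return (K, R, C): number of classes, range, and approximate class size."""
    n = len(data)
    k = 1 + 3.322 * math.log10(n)   # step 1: adequate number of classes
    r = max(data) - min(data)       # step 2: range
    c = r / k                       # step 3: approximate class size
    return k, r, c

# Hypothetical scores, n = 20.
scores = [52, 67, 71, 45, 88, 93, 60, 75, 79, 81,
          58, 64, 70, 85, 90, 49, 66, 73, 77, 95]
k, r, c = class_plan(scores)
print(round(k, 3), r, round(c, 3))
```

Here K is a little above 5 and C a little above 9, so a class size of 10 would be one easy-to-work-with choice.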

Less than cumulative frequency distribution (<CF)


Number of observations with values smaller than the UPPER class
boundary

Greater than cumulative frequency distribution (>CF)


Number of observations with values greater than the LOWER class
boundary

To get the relative frequency, divide the frequency of each
class interval by the total number of observations.
Sum of the relative frequency column is equal to 1.
Simply multiply the relative frequency by 100 to get the
relative frequency percentage
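
The cumulative and relative frequency columns can be derived from the class frequencies alone. The frequencies below are hypothetical, listed from the lowest class to the highest.

```python
from itertools import accumulate

# Hypothetical class frequencies, lowest class first.
frequencies = [3, 5, 7, 4, 1]
n = sum(frequencies)

# <CF: observations below the upper boundary of each class (running total).
less_than_cf = list(accumulate(frequencies))

# >CF: observations above the lower boundary of each class
# (running total taken from the highest class downward).
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

relative = [f / n for f in frequencies]     # relative frequencies
percent = [100 * rf for rf in relative]     # relative frequency percentages

print(less_than_cf)
print(greater_than_cf)
print(relative)
```

The last entry of the <CF column and the first entry of the >CF column both equal n, which is a quick check on the table.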

Given the following data, construct an FDT

16500, 10850, 11850, 7500, 13500
23500, 16500, 10500, 11000, 12500
4500, 5250, 16500, 9950, 13950
18950, 24000, 15000, 10000, 9900
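
One possible FDT for these 20 values is sketched below. The number of classes, range, and approximate class size follow the steps given earlier; rounding the class size to 4000 and starting the first class at the minimum value are assumptions, so other equally valid tables exist.

```python
import math

data = [16500, 10850, 11850, 7500, 13500, 23500, 16500, 10500, 11000, 12500,
        4500, 5250, 16500, 9950, 13950, 18950, 24000, 15000, 10000, 9900]

n = len(data)
k = 1 + 3.322 * math.log10(n)      # adequate number of classes
r = max(data) - min(data)          # range
c_approx = r / k                   # approximate class size

c = 4000                           # assumption: approximate size rounded to 4000
lcl = min(data)                    # assumption: first lower class limit = minimum

classes = []                       # (LCL, UCL, frequency)
while lcl <= max(data):
    ucl = lcl + c - 1              # limits in whole units (data are integers)
    freq = sum(lcl <= x <= ucl for x in data)
    classes.append((lcl, ucl, freq))
    lcl += c

for low, high, f in classes:
    print(f"{low:>6} - {high:<6} {f:>3}")
```

With these choices the table has five classes, and the modal class is 8500 - 12499.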

Three ways
Frequency histogram
Shape of distribution
Class boundaries on the
horizontal axis, class
frequencies on the vertical
axis

Frequency polygon
Frequencies on the vertical,
class marks on horizontal
Closed shape

Ogives
For less than or greater than
cumulative frequencies
Less-than ogive: plots the less than
CF
Greater-than ogive: plots the
greater than CF

Alternative method to describe a data set


Histogram-like picture
Allows for retention of data
Partly tabular, partly graphical
Observations are divided into two: STEM and LEAF
Types
Ordered
Split

List the stem values, in order, in a vertical column


Draw a vertical line to the right of the stem value
For each observation, record the leaf portion of that
observation in the row corresponding to the appropriate stem
Reorder leaves from lowest to highest within each stem row.
If the number of leaves appearing in each row is too large,
divide the stem into two groups: one whose leaves are from 0 to 4,
and the other whose leaves are from 5 to 9.
Provide a key to stem-and-leaf coding so that the reader can
recreate actual measurements.

Given the following data for the price (in pesos) per gallon of a
sample of brands of sparkling mineral water sold in
supermarkets, construct a stem-and-leaf display

31, 40, 28, 30, 63
35, 38, 33, 42, 22
36, 68, 31, 32, 36
34, 46, 34, 34, 28
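
An ordered stem-and-leaf display for these prices can be sketched as follows, taking the tens digit as the stem and the units digit as the leaf. Showing stems that have no leaves is a presentation choice assumed here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: ordered leaves}, with stem = tens digit, leaf = units digit."""
    rows = defaultdict(list)
    for v in sorted(values):              # sorting first keeps leaves ordered
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Include stems with no observations so gaps in the data stay visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

prices = [31, 40, 28, 30, 63, 35, 38, 33, 42, 22,
          36, 68, 31, 32, 36, 34, 46, 34, 34, 28]
display = stem_and_leaf(prices)

for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 2 | 8 represents a price of 28 pesos")
```

The row for stem 3 holds 12 of the 20 leaves, so a split display with leaves 0 to 4 and 5 to 9 on separate rows may read more easily.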
