
DATA COLLECTION AND PRESENTATION
MATH142-3
Engineering Data Analysis
Course Outcome
Compute the probability distribution of a random variable for
both discrete and continuous data.
Apply statistical methods in the analysis of data.

Design experiments involving several factors.


Learning Objectives
At the end of the lesson, students are expected to:
• Differentiate the different sources of data.
• Identify the different ways of gathering data.
• Apply the suitable method (interview, observation, or experiment) based on its advantages and disadvantages.
• Differentiate the various sampling techniques.
• Plan a data-gathering strategy for a specific research problem.
• Propose a method of gathering data for a research problem.
Introduction to Data Analysis
DATA
Set of values with respect to a variable

QUALITATIVE DATA
Data that describe qualities or characteristics rather than numerical measurements.
Examples: students' attitudes towards school,
attitudes towards exam cheating,
friendliness of students to teachers.

QUANTITATIVE DATA
Data that is expressed in numbers and summarized using statistics
to give meaningful information.
Examples: heights, weights, or ages of students.
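To illustrate how quantitative data are summarized with statistics, here is a minimal Python sketch using the standard `statistics` module. The heights are hypothetical values chosen for illustration, not course data.

```python
import statistics

# Hypothetical heights (in cm) of five students; illustrative only.
heights_cm = [158, 162, 165, 170, 175]

def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_stdev": statistics.stdev(values),  # uses n - 1 in the denominator
    }

summary = summarize(heights_cm)
print(summary)
```

A single number such as the mean (166 cm here) turns a list of raw measurements into meaningful information about the group.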
Obtaining Data
PRIMARY SOURCES OF DATA
• When the data is obtained directly from individuals, objects, or processes.
Advantages:
(1) Tailored to the research objectives; no customization is needed to make the data usable.
(2) Reliable, because you control how the data is collected and can monitor its quality.
(3) Resources are allocated to gathering only the required data.
(4) Proprietary, so you enjoy advantages over those who cannot access the data.
Disadvantages:
(1) Costlier and requires more time.

SECONDARY SOURCES OF DATA
• When the data is collected by another researcher or agency that initially gathered it and makes it available.
Examples: National Statistics Office, stock prices
Advantages:
(1) Saves time and money.
(2) Relative ease of access (publications, government agencies, data aggregation websites and blogs).
(3) Eliminates duplication of effort.
Disadvantages:
(1) Might be incomplete.
(2) You cannot verify the accuracy of secondary data.
(3) Documentation may be incomplete or missing.
Methods of Collecting Data
• A survey is a data collection method in which you select a sample
of respondents from a large population in order to gather
information about that population. The process of identifying the
individuals from the population whom you will interview is known
as sampling.
Interview
Observation
Experimentation
Interviews
In-person Interviewing
• Face-to-face question and answer.
Advantages:
(1) Excellent response rates.
(2) Enables you to conduct interviews that take a longer amount of time.
(3) Allows follow-up questions on responses that are not clear.
Disadvantages:
(1) Expensive and takes more time because of interviewer training, transport, and remuneration.
(2) Some areas of a population, such as crime-prone neighborhoods, cannot be accessed, which may result in bias.

Telephone Interviewing
• Requires calling the respondents over the phone and interviewing them.
Advantages:
(1) Quick data collection.
(2) Cheaper than in-person interviewing.
Disadvantages:
(1) Questionable trustworthiness of respondents.
(2) Limits the amount of data that can be collected in a phone interview.
Online Interviewing
• Researchers send an email inviting respondents to participate in an online survey.
Advantages:
(1) Low-cost, high-volume way of interviewing.
(2) Anonymity.
Disadvantages:
(1) Risk of not getting a representative sample.
(2) No clarification on unclear responses.

Focus Groups
• The researcher identifies a group of 6 to 10 people with similar characteristics. A moderator then guides a discussion to identify the attitudes and experiences of the group. The responses are captured by video recording, voice recording, or writing; this is the data you will analyze to answer your research questions.
Advantages:
(1) Requires fewer resources and less time.
(2) The moderator may ask for clarification of unclear responses.
Disadvantages:
(1) The sample selected may not represent the population accurately.
(2) Dominant participants can influence the responses of others.

Mailed Questionnaire
• The researcher sends a printed questionnaire to the respondent's postal address. The participants fill in the questionnaire and mail it back.
Advantage:
(1) Obtains information that respondents may be unwilling to give when interviewed in person.
Disadvantages:
(1) Low response rate.
(2) Delays or loss of mail.
(3) Low literacy may inhibit clarification of responses.
Observational Data Collection Methods
There are four types of observational methods available to you as a researcher:

(A) CROSS-SECTIONAL
The researcher collects data on observed relationships once.
Advantage: cheaper and takes less time than case-control and cohort studies.
Disadvantage: the time factor is disregarded.

(B) CASE-CONTROL
The researcher identifies cases and controls and observes them. A case has the outcome (event) of interest, while a control does not. After identifying the cases and controls, the researcher moves back in time to observe how exposure to the suspected cause occurred in the two groups. This is why case-control studies are referred to as retrospective.
Example: a medical researcher suspects a certain type of cosmetic is causing skin cancer. The researcher recruits people who have skin cancer (the cases) and people who do not (the controls), then asks participants to recall the type of cosmetic they used and how frequently they used it.
Advantage: cheaper and takes less time than the cohort method.
Disadvantage: relies on respondents' memory (recall bias).
(C) COHORT
The researcher follows people with similar characteristics over a period of time.
Advantage: collects data on occurrences that happen over a long period.
Disadvantage: costly and requires more time; not advisable for events that happen rarely.

(D) ECOLOGICAL
Used when researchers are more interested in studying a population instead of individuals.
Example: say you are interested in lung cancer rates in Manila and Cebu. You obtain the number of cancer cases per 1,000 people for each city and compare them. You can then hypothesize possible causes of the differences between the two cities.
Advantage: saves time and money because the data is already available.
Disadvantage: may lead you to infer relationships (for example, about individuals) that do not actually exist (the ecological fallacy).
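The "cases per 1,000 people" figure in the ecological example is a simple rate. A minimal Python sketch, assuming hypothetical city names and counts (not real Manila or Cebu data):

```python
def rate_per_1000(cases, population):
    """Number of cases per 1,000 people in a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 1000

# Hypothetical city-level counts; illustrative only.
cities = {
    "City A": {"cases": 450, "population": 1_500_000},
    "City B": {"cases": 300, "population": 1_200_000},
}

for name, d in cities.items():
    print(name, round(rate_per_1000(d["cases"], d["population"]), 3))
```

Because only group-level totals enter the calculation, the comparison says nothing directly about which individuals developed the disease.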
Experimental Research Design
• Pre-experimental Designs
  - One-shot Case Study (treatment group only)
  - One-Group Pretest-Posttest Design (measures change)
  - Intact Group Comparison at posttest
• Experimental Designs
  - Random assignment to “treatment” and control groups
  - Posttest-Only Control Group
  - Pretest-Posttest Control Group
  - Factorial
• Quasi-experimental Designs
  - Non-random assignment to “treatment” and control groups observed
  - Nonequivalent-Control Group Design
  - Time-Series Design
• Ex-Post Facto Designs
  - Statistical controls for comparing alternative “treatments”
  - Correlational Design
  - Criterion-Group Design
Experimental and Control Groups
• An experimental group is the group that receives an
experimental procedure or a test sample.
• A control group is a group separated from the rest of the
experiment such that the independent variable being tested
cannot influence the results. This isolates the independent
variable's effects on the experiment and can help rule out
alternate explanations of the experimental results.
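Random assignment, which separates true experimental designs from the others, can be sketched in Python by shuffling participants and splitting the list. The participant IDs are hypothetical, and the even split is an assumption.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into experimental and control groups.

    With an odd number of subjects, the control group gets the extra one.
    """
    rng = random.Random(seed)  # separate generator so results can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs.
groups = randomly_assign(["P1", "P2", "P3", "P4", "P5", "P6"], seed=42)
print(groups)
```

Since chance alone decides who lands in each group, pre-existing differences between participants tend to balance out across the groups.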
Pre-Experimental Research Design
(X = treatment, O = observation or measurement)

ONE-SHOT CASE STUDY (treatment group only)
X O
Example: “X” is a new personnel policy; a job satisfaction measurement is taken and the response is observed.

ONE-GROUP PRETEST-POSTTEST DESIGN (measures change)
O1 X O2
Example: A job satisfaction measurement is taken before and after treatment “X” is applied.

INTACT GROUP COMPARISON AT POSTTEST
G1 X O
G2 Control O
Example: G1 receives the treatment and G2 does not; then a job satisfaction measurement is taken and observed (in this case, G1 and G2 may represent two different business units). The study does not include any pretesting, so any differences between the two groups prior to the study are unknown.
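In the one-group pretest-posttest design, the change is usually read from the paired differences O2 - O1. A minimal Python sketch, assuming hypothetical job satisfaction scores for the same five employees:

```python
import statistics

# Hypothetical job satisfaction scores for the same five employees,
# measured before (O1) and after (O2) treatment X; illustrative only.
pretest = [3.0, 2.5, 4.0, 3.5, 3.0]
posttest = [3.5, 3.0, 4.0, 4.0, 3.5]

def mean_change(before, after):
    """Average of the paired differences O2 - O1."""
    if len(before) != len(after):
        raise ValueError("each subject needs both a pretest and a posttest")
    return statistics.mean(a - b for a, b in zip(after, before))

print(mean_change(pretest, posttest))
```

Without a control group, a positive change cannot be attributed to X alone, which is why this design counts as pre-experimental.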
Experimental Research Design
Random assignment to “treatment” and control groups.

POSTTEST-ONLY CONTROL GROUP
X O
Control O

PRETEST-POSTTEST CONTROL GROUP
O1 X O2
O1 Control O2

FACTORIAL
X1 X2 O
X1 O
X2 O
Control O
Example: A job satisfaction measurement is taken after treatment “X1” is applied or not and a graveyard shift “X2” is implemented or not.
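A full factorial design observes every combination of factor levels. A minimal Python sketch that lists the conditions, assuming two hypothetical factors with two levels each (applied or not):

```python
from itertools import product

# Two hypothetical factors, each at two levels (applied / not applied).
factors = {
    "X1_new_policy": [False, True],
    "X2_graveyard_shift": [False, True],
}

def factorial_conditions(factors):
    """List every combination of factor levels (the cells of a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

for condition in factorial_conditions(factors):
    print(condition)
```

With 2 factors at 2 levels, this yields 2 × 2 = 4 conditions; the cell where neither factor is applied serves as the control.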
Quasi-Experimental Research Design
Non-random assignment to “treatment” and control groups observed; includes one or more control groups.

NONEQUIVALENT-CONTROL GROUP DESIGN
G1 O1 X O2
G2 O1 Control O2
Subjects receive a pretest (O1), then treatment or non-treatment, and then a posttest (O2).

TIME-SERIES DESIGN
O1 O2 … X …
Multiple observations are taken before and after a treatment is administered. Pretreatment observations establish a control group baseline. Post-treatment observations establish whether there is a consistent change in response.
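A simple way to read a time-series design is to compare the pretreatment baseline with the post-treatment observations. A minimal Python sketch, assuming hypothetical monthly observations and a treatment introduced after the fourth one:

```python
import statistics

# Hypothetical monthly observations; treatment X is introduced after month 4.
observations = [70, 72, 71, 73, 80, 82, 81, 83]
treatment_index = 4  # index of the first post-treatment observation (assumption)

def pre_post_means(series, split):
    """Mean of the baseline (pre-treatment) and post-treatment observations."""
    before, after = series[:split], series[split:]
    return statistics.mean(before), statistics.mean(after)

baseline, post = pre_post_means(observations, treatment_index)
print(baseline, post, post - baseline)
```

Comparing means is only a first look; the multiple observations on each side are what let you judge whether the change is consistent rather than a one-time fluctuation.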
Ex-Post Facto Design
Statistical controls for comparing “treatment” and “control” (relationships between two variables). It is called ex post facto (“after the fact”) because the researcher arrives after the treatment has been administered.

CORRELATIONAL DESIGN
O1 O2
Example: SAT scores (O1) and GPA (O2) are collected.

CRITERION-GROUP DESIGN
G1 O
G2 O
Example: Group 2 is compared to Group 1.
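In the correlational design, the relationship between O1 and O2 is commonly summarized with the Pearson correlation coefficient r. A minimal Python sketch, assuming hypothetical SAT scores and GPAs for five students:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical SAT scores (O1) and GPAs (O2) for five students; illustrative only.
sat = [1050, 1120, 1200, 1280, 1350]
gpa = [2.6, 2.9, 3.1, 3.4, 3.7]
print(round(pearson_r(sat, gpa), 3))
```

A value of r near +1 indicates a strong positive linear relationship, but because no treatment was assigned by the researcher, it does not by itself establish cause and effect.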
Ethical Considerations on Data Gathering
QUESTIONNAIRES:
• All information gathered must be ANONYMOUS.
• The questions should be carefully written to AVOID BIAS and should NOT be opinionated or MISLEADING.
• Participants have the right not to complete any particular item in the questionnaire and to WITHDRAW AT ANY POINT during the study.

INTERVIEWS:
• CONFIDENTIALITY - remember that an interview is a personal interaction and everything recorded needs to remain confidential to protect the participant.
• Think about who else will be reading/verifying the data and transcriptions. Do they need to sign a CONFIDENTIALITY AGREEMENT?
• Can what is being said and what is being recorded be MISINTERPRETED? Remember to make clear and detailed notes to avoid confusion.

OBSERVATIONS:
• In OVERT observation, consider that the presence of an observer may be THREATENING and may influence behaviour.
• COVERT observation violates the principle of INFORMED CONSENT and hence should be used only in situations where there is no alternative method.
Likert Scale
A Likert scale is a scale commonly involved in research that employs
questionnaires. It is the most widely used approach to scaling responses in
survey research, such that the term (or more accurately the Likert-type
scale) is often used interchangeably with rating scale, although there are
other types of rating scales. The scale is named after its inventor, Rensis
Likert.
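Scoring a Likert item means mapping each response label to a number and then summarizing. A minimal Python sketch, assuming a 5-point scale coded 1 (Strongly Disagree) to 5 (Strongly Agree); the number of points and the labels vary by study, and the responses are hypothetical:

```python
from collections import Counter
import statistics

# Assumed 5-point Likert coding; the number of points and labels vary by study.
LIKERT_5 = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def score_item(responses, scale=LIKERT_5):
    """Convert labelled responses to numbers and summarize them."""
    scores = [scale[r] for r in responses]  # KeyError flags an unexpected label
    return {
        "counts": Counter(responses),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }

# Hypothetical responses to one questionnaire item.
answers = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree"]
print(score_item(answers))
```

Because Likert responses are ordinal, many analysts report the median and frequency counts alongside, or instead of, the mean.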
Population vs Sample
Population
• Totality of all observations from which the dataset is acquired.
• All of the possible events should be considered.
• A numerical value that describes a population is known as a parameter.
Example: There are 2,500 students enrolled in MATH146.
Population: Students of MATH146
Parameter: 2,500 (population size)

Sample
• A small group taken from the population.
• A group, as heterogeneous as possible, taken from the large group to represent the population.
• A numerical value that describes a sample is known as a statistic.
Example: Of the 5,786 students enrolled in MATH10-1, 3,456 are female.
Sample: Female students in MATH10-1
Statistic: 3,456 (sample size)
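The parameter/statistic distinction can be illustrated with means: the population mean is a parameter, and the mean of a random sample drawn from it is a statistic that estimates it. A minimal Python sketch, assuming a simulated population of 2,500 hypothetical exam scores:

```python
import random
import statistics

# Hypothetical population of 2,500 exam scores; simulated, illustrative only.
rng = random.Random(0)
population = [rng.randint(50, 100) for _ in range(2500)]

parameter = statistics.mean(population)  # population mean (a parameter)
sample = rng.sample(population, 100)     # simple random sample of n = 100
statistic = statistics.mean(sample)      # sample mean (a statistic)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The parameter is fixed for a given population, whereas the statistic changes from one sample to the next.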
Statistics
Data can be used in different ways. The body of knowledge called
statistics is sometimes divided into two main areas, depending on
how data are used. The two areas are the Descriptive statistics
and Inferential statistics.

1. Descriptive statistics is used to summarize or describe the
important characteristics of a known set of population data.
2. Inferential statistics involves the use of sample data to make
inferences about a population. It goes beyond mere
description. This is the focus of quantitative research.
Fields of Statistics
Indicate whether each of the following statements is a descriptive
or inferential statistics:
1. Last school year, there were 2,000 students at Mapua SHS Intramuros and 800 students at Mapua SHS Makati. (Descriptive)
2. A recent study showed that garlic can repel mosquitoes. (Inferential)
3. Cigarette smoking is associated with 25% of the 2,500 new cases of lung cancer in 2005. (Descriptive)
4. A survey says that 1 out of 50 Filipinos is a member of a fitness center. (Inferential)
Sampling and Sampling Techniques

[Figure: a sample (n) drawn from the target population (N)]
• Effective sampling produces a sample n that is representative of N.
• Note: n is only ever representative of the N it was drawn from,
i.e., not necessarily of the general population.
Sampling
• Sampling is the basis for inferential statistics.
• A sample is a segment of a population. It is, therefore, expected to reflect
the population. By studying the characteristics of the sample one can make
inferences about the population. There are several reasons why we take a part of
the population to study rather than taking a full census of the population. These
are:
• Sampling takes less time.
• Samples cost less.
• Samples can be more accurate. Sample observations are usually of higher
quality because they are better screened for errors in measurement and
for duplication and misclassification.
• Samples can be destroyed to gain information about quality (destructive
sampling).
• A random sample is a sample obtained in such a way that each possible
sample of fixed size n has an equal probability of being selected.
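The definition of a random sample can be made concrete by listing every possible sample of size n and choosing one uniformly, so each has probability 1/C(N, n). A minimal Python sketch with a hypothetical population of five members; enumeration is only practical for tiny populations, and `random.sample` gives the same equal-probability guarantee efficiently:

```python
from itertools import combinations
from math import comb
import random

def random_sample_by_enumeration(population, n, rng=random):
    """Pick one of all possible size-n samples, each with equal probability."""
    all_samples = list(combinations(population, n))
    return rng.choice(all_samples)

population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5
n = 2
print(comb(len(population), n))      # number of possible samples
print(1 / comb(len(population), n))  # probability of each sample
print(random_sample_by_enumeration(population, n))
```

Here there are C(5, 2) = 10 possible samples, so each one is selected with probability 1/10.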
Sampling Techniques

Two types of sampling:
1. Probability Sampling - A sample in which
each element of the population has a
known and nonzero chance of being
selected is called a probability sample.
2. Non-Probability Sampling - members of
the population are not selected randomly, so
their chances of being selected are unknown
(and some may have no chance at all).

[Figure: Set of Random Numbers]
Probability Sampling Techniques
1. Simple random sampling – samples are drawn from a population
using a method such as the lottery method or using computer or
calculator to generate random numbers.
2. Stratified sampling – subdivide the population into at
least two different subpopulations (or strata) that share the
same characteristics (such as gender), and then draw a sample
from each stratum.
3. Systematic sampling – choose some starting point and then select
every kth element in the population.
4. Cluster sampling – divide the population area into sections (or
clusters), randomly select a few of those sections, and then choose
all the members from the selected sections.
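The four probability sampling techniques above can be sketched in a few lines of Python each. This is a minimal illustration, not a definitive implementation: the student records are hypothetical, and the systematic sampler assumes the common convention of a random start within the first k elements.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely (like the lottery method)."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then draw a simple random sample from each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {s: rng.sample(members, n_per_stratum) for s, members in strata.items()}

def systematic_sample(population, k, rng):
    """Random start among the first k elements, then every kth element."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep all of their members."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

rng = random.Random(1)  # seeded so the example is reproducible
# Hypothetical population of 20 students: (id, gender, section)
students = [(i, "F" if i % 2 else "M", f"S{i % 4}") for i in range(20)]
sections = {}
for s in students:
    sections.setdefault(s[2], []).append(s)

print(simple_random_sample(students, 5, rng))
print(stratified_sample(students, lambda s: s[1], 3, rng))
print(systematic_sample(students, 5, rng))
print(cluster_sample(sections, 2, rng))
```

Note the contrast between the last two: stratified sampling draws some members from every group, while cluster sampling takes all members of only a few randomly chosen groups.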
Non-Probability Sampling Techniques
1. Convenience, haphazard, or accidental sampling - members of the
population are chosen based on their relative ease of access. Sampling
friends, co-workers, or shoppers at a single mall are all examples of
convenience sampling. Such samples are biased because researchers may
unconsciously approach some kinds of respondents and avoid others, and
respondents who volunteer for a study may differ in unknown but
important ways from others.
2. Snowball sampling - The first respondent refers an acquaintance. The
acquaintance also refers a friend, and so on. Such samples are biased because
they give people with more social connections an unknown but higher chance of
selection; however, they tend to yield higher response rates.
3. Judgmental sampling or purposive sampling - The researcher chooses
the sample based on who they think would be appropriate for the study.
This is used primarily when there is a limited number of people that have
expertise in the area being researched, or when the interest of the
research is on a specific field or a small group.
Purposive Sampling Techniques
• Different types of purposive sampling include:
1. Deviant case - The researcher obtains cases that
substantially differ from the dominant pattern (a special
type of purposive sample). The case is selected in order to
obtain information on unusual cases that can be especially
problematic or especially good.
2. Case study - The research is limited to one group, often with
a similar characteristic or of small size.
3. Quota sampling - A quota is established (e.g., 65% women)
and researchers are free to choose any respondent they
wish as long as the quota is met.
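Quota sampling can be sketched as accepting respondents in the order the researcher happens to meet them until each quota is filled. A minimal Python sketch with hypothetical respondents and quotas (3 women and 2 men):

```python
def quota_sample(stream, group_of, quotas):
    """Accept respondents in the order they arrive until every quota is filled.

    Selection within each group is NOT random: whoever arrives first is taken,
    which is what makes quota sampling a non-probability method.
    """
    remaining = dict(quotas)
    selected = []
    for person in stream:
        g = group_of(person)
        if remaining.get(g, 0) > 0:
            selected.append(person)
            remaining[g] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return selected

# Hypothetical respondents in the order the researcher meets them.
arrivals = [("R1", "F"), ("R2", "M"), ("R3", "F"), ("R4", "F"),
            ("R5", "M"), ("R6", "F"), ("R7", "M")]
print(quota_sample(arrivals, lambda p: p[1], {"F": 3, "M": 2}))
```

The group proportions match the quota, yet R6 and R7 never had a chance of selection simply because they arrived later.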
Sampling Techniques
Identify the type of sampling used:
1. Motorola selects every 50th pager from the assembly line for careful testing and analysis.
2. A reporter writes the name of each senator on a separate card, shuffles the cards, and then draws 5 names.
3. A dean at the School of EE-ECE-COE surveys all students from each of 12 randomly selected classes.
4. A dean at the School of Architecture selects 15 men and 15 women from each of 4 classes.
5. Glamour Magazine obtains sample data from readers who decide to mail in a questionnaire printed in the latest issue.
6. A BIR auditor randomly selects 15 taxpayers with less than P250,000 in gross income and 15 taxpayers with gross income of at least P250,000.
7. ABS-CBN News polls 750 men and 750 women about their use of credit cards.
8. A market researcher for the Ford Motor Company interviews all drivers on each of 15 randomly selected city blocks.
9. A medical researcher from UST interviews all leukemia patients in each of 5 randomly selected Metro Manila cities.
10. A reporter for Business Week magazine interviews every 50th chief executive officer in that magazine's listing of CEOs of the 1000 companies with the highest stock market values.
11. A reporter for Business Week magazine obtains a numbered listing of the 1000 companies with the highest stock market values, uses a computer to generate 20 random numbers between 1 and 1000, and then interviews the chief executive officers of the companies corresponding to these numbers.
12. In conducting research for a psychology course, a student at PLM interviews 40 students who are leaving the canteen.
