PRESENTATION
MATH142-3
Engineering Data Analysis
Course Outcome
Compute the probability distribution of a random variable for
both discrete and continuous data.
Apply statistical methods in the analysis of data.
QUALITATIVE DATA
Data used for description, without measurement.
Examples: student attitudes towards school,
attitudes towards exam cheating,
friendliness of students to teachers.
QUANTITATIVE DATA
Data that is expressed in numbers and summarized using statistics
to give meaningful information.
Examples: heights, weights, or ages of students.
Obtaining Data
PRIMARY SOURCES OF DATA
• When the data is obtained directly from individuals, objects or processes.
Advantages:
Tailored fit to the research objectives; no customizations needed to make the data usable.
Reliable, because you control how the data is collected and can monitor its quality.
Resources are allocated to gathering only the required data.
Proprietary, so you enjoy advantages over those who cannot access the data.
Disadvantages:
Costlier and requires more time.
SECONDARY SOURCES OF DATA
• When the data was collected by another researcher or agency, and the party that initially gathered it makes it available.
Examples: National Statistics Office, stock prices
Advantages:
Saves time and money.
Relative ease of access (publications, government agencies, data aggregation websites and blogs).
Eliminates duplication of effort.
Disadvantages:
Might be incomplete.
You cannot verify the accuracy of secondary data.
Documentation may be incomplete or missing.
Methods of Collecting Data
• A survey is a data collection method where you select a sample
of respondents from a large population in order to gather
information about that population. The process of identifying
individuals from the population who you will interview is known
as sampling.
Interview
Observation
Experimentation
Interviews
In-person Interviewing
• Face-to-face question and answer.
Advantages:
(1) excellent response rates.
(2) enables you to conduct interviews that take a longer amount of time.
(3) allows follow-up questions on responses that are not clear.
Disadvantages:
(1) expensive and takes more time because of interviewer training, transport, and remuneration.
(2) some areas of a population, such as neighborhoods prone to crime, cannot be accessed, which may result in bias.
Telephone Interviewing
• Requires calling the respondents over the phone and interviewing them.
Advantages:
(1) collects data quickly.
(2) cheaper than in-person interviewing.
Disadvantages:
(1) the trustworthiness of respondents is harder to judge.
(2) limits the amount of data that can be collected in a phone interview.
Interviews
Online Interviewing
• Researchers send an email inviting respondents to participate in an online survey.
Advantages:
(1) a low-cost, high-volume way of interviewing.
(2) anonymity.
Disadvantages:
(1) risk of not getting a representative sample.
(2) no clarification of unclear responses.
Mailed Questionnaire
• The researcher sends a printed questionnaire to the postal address of the respondent. The participants fill in the questionnaire and mail it back.
Advantage:
(1) obtains information that respondents may be unwilling to give when interviewed in person.
Disadvantages:
(1) low response rate.
(2) delays or loss of mail.
(3) low literacy may inhibit clarifications on responses.
Focus Groups
• The researcher identifies a group of 6 to 10 people with similar characteristics. A moderator then guides a discussion to identify attitudes and experiences of the group. The responses are captured by video recording, voice recording or writing; this is the data you will analyze to answer your research questions.
Advantages:
(1) requires fewer resources and less time.
(2) the moderator may ask for clarification of unclear responses.
Disadvantages:
(1) the sample selected may not represent the population accurately.
(2) dominant participants can influence the responses of others.
Observational Data Collection Methods
There are four types of observational methods that are available to you as a researcher:
(A) CROSS-SECTIONAL
The researcher collects data on observed relationships once.
Advantage: cheaper and takes less time than case-control and cohort studies.
Disadvantage: the time factor is disregarded.
(B) CASE-CONTROL
The researcher identifies cases and controls and compares them. A case has experienced the outcome of interest while a control has not. After identifying the cases and controls, the researcher looks back in time to compare how past exposure differs between the two groups. This is why case-control studies are referred to as retrospective.
Example: a medical researcher suspects a certain type of cosmetic is causing skin cancer. You recruit people who have skin cancer, the cases, and people who do not, the controls. You ask participants to recall the type of cosmetic they used and how often they used it.
Advantage: cheaper and takes less time than the cohort method.
Disadvantage: relies on respondents' memory (recall bias).
Observational Data Collection Methods
(C) COHORT
The researcher follows people with similar characteristics over a period.
Advantage: collects data on occurrences that happen over a long period.
Disadvantage: costly and requires more time; not advisable for events that happen rarely.
(D) ECOLOGICAL
Used when researchers are more interested in studying a population than individuals.
Example: say you are interested in lung cancer rates in Manila and Cebu. You obtain the number of cancer cases per 1000 people for each city and compare them. You can then hypothesize possible causes of differences between the two cities.
Advantage: saves time and money because the data is already available.
Disadvantage: may lead you to infer population relationships that do not exist.
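The ecological comparison of rates per 1000 people can be sketched as below. This is a minimal illustration only: the case counts and population sizes are hypothetical placeholders, not real figures for either city, and `rate_per_1000` is a helper name introduced here.

```python
def rate_per_1000(cases: int, population: int) -> float:
    """Return the number of cases per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 1000


# Hypothetical counts for illustration only; NOT real statistics.
cities = {
    "Manila": {"cases": 900, "population": 1_800_000},
    "Cebu": {"cases": 300, "population": 1_000_000},
}

# Compute the rate for each city so the two can be compared on the same scale.
rates = {
    name: rate_per_1000(data["cases"], data["population"])
    for name, data in cities.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} cases per 1,000 people")
```

Expressing both cities as a rate rather than a raw count is what makes the comparison meaningful when the populations differ in size.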
Experimental Research Design
• Pre-experimental Designs
  • One-shot Case Study (Treatment group only)
  • One Group Pretest to Posttest Design (measures of change)
  • Intact Group Comparison at posttest
• Experimental Designs
  • Random assignment to “treatment” & control group
  • Posttest Only Control Group
  • Pretest-Posttest Control Group
  • Factorial
• Quasi-experimental Designs
  • Non-random assignment to “treatment” & control group observed
  • Nonequivalent-Control Group Design
  • Time-Series Design
• Ex-Post Facto Designs
  • Statistical controls for comparing alternative “treatments”
  • Correlational Design
  • Criterion-Group Design
Experimental and Control Groups
• An experimental group is the group that receives an
experimental procedure or a test sample.
• A control group is a group separated from the rest of the
experiment such that the independent variable being tested
cannot influence the results. This isolates the independent
variable's effects on the experiment and can help rule out
alternate explanations of the experimental results.
Pre-experimental Research Design
Notation: X = treatment, O = observation (measurement), G = group.
ONE-SHOT CASE STUDY
Treatment group only
X O
Example:
“X” is a new personnel policy; after it is introduced, a job satisfaction measurement is taken and the response is observed.
ONE GROUP PRETEST TO POSTTEST DESIGN
Measures change
O1 X O2
Example:
A job satisfaction measurement is taken before and after treatment “X” is applied.
INTACT GROUP COMPARISON
at posttest
G1 X O
G2 Control O
Example:
G1 receives the treatment, G2 does not; then a job satisfaction measurement is taken and observed (in this case G1 and G2 may represent two different business units). The study does not include any pre-testing, and therefore any differences between the two groups prior to the study are unknown.
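The "measures change" idea in the one group pretest to posttest design (O1 X O2) can be sketched as a per-subject difference. The scores below are hypothetical job satisfaction ratings invented for illustration. With no control group, the computed change cannot be attributed to X alone.

```python
from statistics import mean

# Hypothetical job-satisfaction scores (1-10 scale) for the same five employees.
pretest = [5, 6, 4, 7, 5]    # O1: measured before treatment X
posttest = [7, 6, 6, 8, 6]   # O2: measured after treatment X

# Change for each employee is the posttest score minus the pretest score.
changes = [after - before for before, after in zip(pretest, posttest)]
mean_change = mean(changes)

print("Per-subject change:", changes)
print("Mean change:", mean_change)
```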
Experimental Research Design
Random assignment to “treatment” & control group
Posttest Only Control Group
X O
Control O
Pretest-Posttest Control Group
O1 X O2
O1 Control O2
Factorial
X1 X2 O
X1 Control O
Control X2 O
Control Control O
Example:
A job satisfaction measurement is taken after treatment “X1” is applied or not and graveyard shift “X2” is implemented or not.
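Random assignment to treatment and control groups, which is what separates these designs from the pre-experimental ones, might be sketched as follows. The subject labels and the `assign_groups` helper are hypothetical; the seed is fixed only so the illustration is repeatable.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)       # local generator; does not touch global state
    shuffled = list(subjects)       # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject identifiers S01..S10.
subjects = [f"S{i:02d}" for i in range(1, 11)]
treatment, control = assign_groups(subjects, seed=42)

print("Treatment:", treatment)
print("Control:  ", control)
```

Because every subject has the same chance of landing in either group, pre-existing differences between the groups are left to chance rather than to the researcher's choice.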
Quasi-Experimental Research Design
Non-random assignment to “treatment” & control group observed.
Include one or more control groups.
Nonequivalent-Control Group Design
G1 O1 X O2
G2 O1 Control O2
Subjects receive a pretest (O1), treatment or non-treatment, and then receive a posttest (O2).
Time-Series Design
O1 O2 … X …
Multiple observations are taken before and after a treatment is administered. Pretreatment observations establish a control group baseline. Post-treatment observations establish a consistent change in response.
Ex-Post Facto Design
Statistical controls for comparing “treatment” and “control” (relationships between two variables). Called ex post facto because the researcher arrives after the treatment has been administered.
Correlational Design
O1 O2
SAT scores (O1) and GPA (O2) are collected.
Criterion-Group Design
G1 O
G2 O
Group 2 is compared to Group 1.
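For the correlational design, one common way to summarize the relationship between the two observations is the Pearson correlation coefficient; the slide itself does not name a statistic, so that choice is an assumption here. The SAT and GPA values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five students; NOT real records.
sat = [1000, 1100, 1200, 1300, 1400]   # O1
gpa = [2.6, 2.9, 3.1, 3.4, 3.8]        # O2

r = pearson_r(sat, gpa)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship; it does not by itself show that one variable causes the other.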
Ethical Considerations on Data Gathering
QUESTIONNAIRES:
• All information gathered must be ANONYMOUS.
• The questions should be carefully written to AVOID BIAS and should NOT be opinionated or MISLEADING.
• Participants have the right to not complete any particular items in the questionnaire and to WITHDRAW AT ANY POINT during the study.
INTERVIEWS:
• CONFIDENTIALITY - remember that an interview is a personal interaction and everything recorded needs to remain confidential to protect the participant.
• Think about who else will be reading/verifying the data and transcriptions. Do they need to sign a CONFIDENTIALITY AGREEMENT?
• Can what is being said and what is being recorded be MISINTERPRETED? Remember to make clear and detailed notes to avoid confusion.
OBSERVATIONS:
• In OVERT observation, it needs to be considered that the presence of an observer may be THREATENING and may exert an influence on behaviour.
• As for COVERT observation, it is the violation of the principle of INFORMED CONSENT and hence should be used only in situations where there is no other alternative method.
Likert Scale
A Likert scale is a scale commonly used in research that employs
questionnaires. It is the most widely used approach to scaling responses in
survey research, so much so that the term (or more accurately the Likert-type
scale) is often used interchangeably with rating scale, although there are
other types of rating scales. The scale is named after its inventor, Rensis
Likert.
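A minimal sketch of how Likert-type responses might be coded and summarized is given below. The five-point agreement labels and the responses are assumptions for illustration; the slide does not specify a particular set of labels.

```python
from collections import Counter
from statistics import median

# Assumed five-point agreement scale (labels are illustrative).
SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical answers to a single questionnaire item.
responses = [
    "Agree", "Neutral", "Strongly agree", "Agree",
    "Disagree", "Agree", "Strongly agree",
]

codes = [SCALE[r] for r in responses]   # convert labels to numeric codes
counts = Counter(responses)             # frequency of each label
median_code = median(codes)             # the median suits ordinal data

print(counts)
print("Median code:", median_code)
```

The median and frequency counts are used here because Likert responses are ordinal: the distance between "Agree" and "Strongly agree" is not guaranteed to equal the distance between other neighbouring labels.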
Population vs Sample
Population
• Totality of all observations from which the dataset is acquired.
• All of the possible events should be considered.
• A value that describes a population is known as a parameter.
Example:
There are 2500 students enrolled in MATH146.
Population: Students of MATH146
Parameter: 2500 (population size)
Sample
• Small group taken from the population.
• A group, as heterogeneous as possible, taken from the large group to represent the population.
• A value that describes a sample is known as a statistic.
Example:
Of the 5,786 students enrolled in MATH10-1, 3,456 are females.
Sample: Female students in MATH10-1
Statistic: 3,456 (sample size)
Statistics
Data can be used in different ways. The body of knowledge called
statistics is sometimes divided into two main areas, depending on
how data are used. The two areas are descriptive statistics
and inferential statistics.
Descriptive
3. Cigarette smoking is associated with 25% of the 2500 new cases of lung cancer in 2005.
Inferential
4. A survey says that 1 out of 50 Filipinos is a member of a fitness center.
Sampling and Sampling Techniques
[Figure: a sample (n) drawn from the target population (N)]
• Effective sampling produces a sample n that is representative of N.
• Note: n is only ever representative of the N it was drawn from,
i.e. not necessarily the general population.
Sampling
• Sampling is the basis for inferential statistics.
• A sample is a segment of a population. It is, therefore, expected to reflect
the population. By studying the characteristics of the sample one can make
inferences about the population. There are several reasons why we take a part of
the population to study rather than taking a full census of the population. These
are:
• Sampling takes less time.
• Samples cost less.
• Samples can be more accurate. Sample observations are usually of higher
quality because they are better screened for errors in measurement and
for duplication and misclassifications.
• Samples can be destroyed to gain information about quality (destructive
sampling).
• A random sample is a sample obtained in such a way that each possible
sample of fixed size n has an equal probability of being selected.
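Drawing a simple random sample, in which every possible sample of size n is equally likely, can be sketched with Python's standard library. The population of 2,500 ID numbers and the sample size of 50 are hypothetical, and `simple_random_sample` is a helper name introduced here.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample selects without replacement.
    return rng.sample(list(population), n)


# Hypothetical sampling frame: student ID numbers 1..2500.
population = list(range(1, 2501))

# The seed is fixed only so the illustration is repeatable.
sample = simple_random_sample(population, n=50, seed=1)

print(sorted(sample)[:10])
```

Sampling without replacement ensures that no student appears twice in the sample.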
Sampling Techniques