
PSYC2010

Psychological Research Methodology 2 Lecture 1

today's lecture

- timetable, assessment, housekeeping
- how to get through
- why psychology often requires statistics
- Revision from first year:
  - normal distributions
  - z-scores
  - sampling distribution of the mean

Course coordinator
Sakinah Alhadad
office: S319, Social Sciences Building
email: s.alhadad@psy.uq.edu.au

consult hours: Wednesdays 1pm-3pm or by appointment, or in the lecture break, or after the lecture

Tutorial allocation

- Tutorials start tomorrow
- sign on via mySI-net

assessment

Mid-semester exam (30%) (in lecture 5, week 3)
- MC (theory) and calculation

Assignment (15%)
- analysis, interpretation and write-up of results of a psychological experiment

Final exam (55%)
- MC (theory)
- calculation

Final grade cut-offs

Grade   Mark
7       85 - 100%
6       75 - 84.99%
5       65 - 74.99%
4       50 - 64.99%
3       45 - 49.9%
2       30 - 44.9%
1       0 - 29.9%

material

Textbook
- Field (2009 OR 2005). Discovering statistics using SPSS

Other references:
- Howell (2007). Statistical methods for psychology. 6th Ed.
- Other stats textbooks, the internet: anywhere that might help!

Calculator for tutorials and for exams
- Must not have statistical functions
- Non-programmable

Tutorial workbook (handed out in tutes)
- Electronic copy on BB

Optional material

APA manual (a must have really)
- Publication Manual of the American Psychological Association (5th ed.)
- UQ library: Z253 .A38 2001

Findlay (bit more user-friendly)
- Findlay, B. (2006). How to write psychology reports and essays (4th ed.). Frenchs Forest: Pearson
- UQ library: BF76.8 .F57 2006 (most are at Ipswich campus but you can order it in) / BF76.8 .F57 2008

To get through...

- you should understand first year statistics (but don't panic if you didn't)
- attend lectures & tutes
- Remember: if you are doing 4 courses, you are expected to work on your coursework 40 hours per week; this should be considered a full-time job. In the summer, this means that you should commit 20 hours per week to this course.
- keep up with lecture material
  - lecture notes on Blackboard
  - use textbook/s
- attend tutorials and actually do the exercises (need to do stats to understand)
- prepare for assessment and consult tutors if you are having problems (during consultation hours; tutors are part-time)

tips

- statistics is very much like learning a new language: rehearse until basic words and symbols are automatic to you
- Accept that a certain amount of doubt and confusion will occur in a stats subject. This doesn't necessarily mean there's something wrong, or that you're not coping
- don't panic if you're not following something in a lecture. Often things don't sink in first time round; think about it and follow it up in tutorials
- when you come to something that stresses you out (e.g., formulae), teach yourself to slow down rather than speed up
- keep trying to relate your stats back to real-world examples
- Make friends and form learning groups, but don't become dependent on others; take responsibility for your own learning (in honours you have to do your stats)
- Make sure you really understand the concepts from 2010; this basic knowledge is expected in 3010 (and courses after that)

Access to lecture/tute material

See the PSYC2010 BB site for:
- Lecture Slides
- General materials used in tutorials
- Practice exams

Lecture notes posted Monday evening; tutorial notes Wednesday evening/Thursday morning (after tute)
This course will be podcast (assuming I don't muck it up)
Occasionally symbols miraculously change between my PC and yours; please let me know if my slides look different to yours
Note that I will not put the pictures on your slides. They just get too big, and are not necessary.

BUT PLEASE NOTE:
- Additional material created by individual tutors may not be available
- Access to course materials is a courtesy to students, not an obligation

tutors

Can help you with:
- Understanding the lecture material or the material covered in tutorials
- Exam/assignment questions

Cannot help you with:
- Catch-up sessions/private tutoring for those who have missed class

Are available:
- Via email: Your message should include your name and a contact phone number.
- In person: Tutors will provide office hours close to exam periods. See BB site closer to the time.

PLEASE NOTE:
- Tutors are employed on a casual basis and will read their email during consultation times. Responses may not be instant.
- Use your student account: UQ and PSY spam filters may automatically remove non-student account emails before reaching their destination

other help

Forum
- A peer learning resource: communication primarily between students
- However, the assignment is not to be discussed on the forum
- The forum will also be monitored by teaching staff

Talk to me, and remain anonymous
- Give me feedback throughout the semester, anonymously

lecture 1 - preview

- general review from earlier courses
- standard normal distribution
- finding areas under the normal curve
- sampling distribution of the mean

[Figure: decision tree for choosing a statistical test, shown here as an outline; (P) = parametric test, (NP) = non-parametric test]

Data type?
- Qualitative (categorical)
  - One variable: Goodness of fit chi-squared (NP)
  - Two variables: Contingency table chi-squared (NP)
- Quantitative (measurement)
  - Question about relationship
    - Form of relationship: linear regression (P)
    - Degree of relationship: Pearson correlation & point biserial correlation (P); Spearman's rho (NP)
  - Hypothesis testing: differences
    - Single sample compared to population
      - If pop. variance known: z-test (P)
      - If only pop. mean known: t-test (P)
    - Comparison between groups
      - Two Groups
        - Dependent Groups: Matched samples t-test (P); Wilcoxon's MP signed-ranks and Sign test (NP)
        - Independent Groups: Independent groups t-test (P); Wilcoxon's Rank Sum test (NP)
      - Multiple Groups
        - Dependent Groups: Repeated-measures ANOVA (P); Friedman (NP)
        - Independent Groups: One-way ANOVA (P); Kruskal-Wallis (NP)
        - Multiple Comparisons
          - A priori (planned): Bonferroni t & Linear Contrasts (P)
          - Post hoc: Scheffe Test (P)
  - power

Recall: samples and populations

- population of scores on a variable (not of people)
- population = the entirety of scores
  - descriptive parameters, e.g., mean (mu: μ) and standard deviation (sigma: σ)
- samples are a small number of scores selected from the entire population
  - descriptive statistics, e.g., mean (x-bar: X̄) and standard deviation (s or SD)

Recall: Normal distribution

- distribution: a graphical representation that associates a frequency or probability with each value of a variable
- the normal distribution is the most important distribution in statistics
- the dependent variable is often assumed to be normally distributed (e.g., for parametric tests)
- allows inference about values of a variable in the population

[Figure: frequency distribution; x-axis: score (0 to 5), y-axis: number of people (0 to 70)]

characteristics of the normal distribution

- bell shaped
- unimodal and symmetrical (mode = mean = median)
- tails extend indefinitely (although often can't show this on graphs)
- area under the curve = 1 (100%)

Recall: standard deviation and standard scores (z-scores)

- z-transformation transforms a normal distribution into a standard normal distribution

how to do a z-transformation

computing z-scores

$$z = \frac{\text{score} - \text{mean}}{\text{standard deviation}}$$

So, a z-score simply describes how far a data point (e.g., a person's score) lies from the mean, expressed in standard deviation units.

For your score X (e.g., your height), your score as a z-score is

$$z = \frac{X - \bar{X}}{s} \quad \text{(sample)} \qquad z = \frac{X - \mu}{\sigma} \quad \text{(population)}$$

z-transformation

if you convert your raw scores into z-scores and plot the z-scores on a frequency distribution, it's called a standard normal distribution (mean = 0; and SD = 1)

[Figure: standard normal curve; x-axis in z units]

z-transformation
the standard normal distribution
(1) allows comparison of performance across different tests

(1) allows us to compare scores from different distributions

Suppose I got 7 out of 10 in an Art History essay, and 62% in a Parasitology exam. Am I doing better in Art History or Parasitology? That depends on the mean and standard deviation of these assessments.
- Suppose the Art History essay stats are: N = 37, X̄ = 4.97, s = 2.09
- Parasitology test stats are: N = 60, X̄ = 49.04, s = 8.31

Example: converting to z-scores

$$z_E = \frac{X - \bar{X}}{s} = \frac{7 - 4.97}{2.09} = 0.971$$

$$z_P = \frac{X - \bar{X}}{s} = \frac{62 - 49.04}{8.31} = 1.560$$

[Figure: standard normal curve with Art History at z = +0.97 and Parasitology at z = +1.56]
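The Art History versus Parasitology comparison can be checked with a few lines of Python. This is only a minimal sketch: the helper name `z_score` is made up for illustration, and the numbers are the ones given on the slide.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

# Values taken from the slide: essay out of 10, exam in percent.
z_art = z_score(7, 4.97, 2.09)
z_para = z_score(62, 49.04, 8.31)

print(f"Art History z  = {z_art:.3f}")
print(f"Parasitology z = {z_para:.3f}")
print("Better relative performance:",
      "Parasitology" if z_para > z_art else "Art History")
```

Because both results are in standard deviation units, they can be compared directly even though the raw marks are on different scales.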

so

- z-transformation alters the mean and SD of a variable, but not the relative location of scores
- z-transforming data of a normal distribution results in a standard normal distribution (mean = 0; and SD = 1)
- z-scores represent the number of standard deviations that X is from the mean

z-transformation
the standard normal distribution
(1) allows comparison of performance across different tests
(2) tells you how many people score above or below you on a certain measure

(2) finding areas under the curve

and can be used to calculate the probability that values will lie within a specified interval, and we can use the following formula to work out that probability.

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(X - \mu)^2 / (2\sigma^2)}$$

or not

finding areas under the curve

- Luckily someone sat down and created tables to provide these probabilities
- And even more fortunately, because any normal distribution can be transformed into the standard normal distribution (i.e., z-transformation), only one set of tables is needed

[Figure: standard normal curve; 50% of scores above/below the mean]

[Figure: standard normal curve; 68.26% of scores between -1 and +1 SDs]

[Figure: standard normal curve; 95.44% of scores between -2 and +2 SDs]
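These percentages can be reproduced without printed tables. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper `area_between` is a made-up name. The computed values (about 68.27% and 95.45%) differ from the slide figures in the last digit only because printed tables round each half-area to four decimals before doubling.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def area_between(z_low, z_high):
    """Proportion of scores falling between two z-scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

below_mean = std_normal.cdf(0)      # half the scores lie below the mean
within_1sd = area_between(-1, 1)
within_2sd = area_between(-2, 2)

print(f"below the mean:  {below_mean:.2%}")
print(f"within +/- 1 SD: {within_1sd:.2%}")
print(f"within +/- 2 SD: {within_2sd:.2%}")
```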

for example

- let's say you want to find out how much caffeine you drink relative to the rest of the university student population
  - μ = 115 mg/day, σ = 15 mg
- let's say your intake is 125 mg/day...

use of tables of areas under normal curve

1. finding the area between μ and a score above it

[Figure: normal curve f(X) with μ (115) and a score of 125 marked]

i) convert 125 to a z-score

$$z = \frac{125 - 115}{15} = \frac{10}{15} = 0.667$$

ii) use tables to find the area between the mean and z

- area = .2486 (about 25% of scores fall between 115 and 125)

use of tables of areas under normal curve

2. finding the area beyond the score

[Figure: normal curve f(X) with μ (115) and a score of 125 marked]

i) use tables to find the area beyond z

- ~25% will lie above z = .67

use of tables of areas under normal curve

3. finding the area below a score above the mean

[Figure: normal curve f(X) with μ (115) and a score of 125 marked]

i) use tables to find the area between the mean and z

- area = .2486

ii) add .50

- area = .7486

Or alternatively: area = 1 - .2514 = .7486

use of tables of areas under normal curve

4. finding the area between μ and a score below it

[Figure: normal curve f(X) with a score of 100 and μ (115) marked]

e.g., another student drinks 100 mg of caffeine

i) convert 100 to a z-score

$$z = \frac{100 - 115}{15} = \frac{-15}{15} = -1.000$$

ii) ignore the negative sign, and look up the tables

use of tables of areas under normal curve

5. area between scores on opposite sides of the mean

[Figure: normal curve f(X) with scores of 105 and 125 marked either side of μ (115)]

i) convert the two scores to z-scores

$$z_{\text{your intake}} = 0.667 \qquad z_{\text{yet another student}} = \frac{105 - 115}{15} = -0.667$$

ii) find the two areas and add them

- .2486 + .2486 = .4972

use of tables of areas under normal curve

6. area between scores on the same side of the mean

[Figure: normal curve f(X) with μ (115) and scores of 125 and 139 marked]

i) convert the scores to z-scores

$$z_{\text{another student}} = \frac{139 - 115}{15} = \frac{24}{15} = 1.600 \quad (\text{area} = .4452)$$

$$z_{\text{you}} = .667 \quad (\text{area} = .2486)$$

ii) take the difference between the two areas

- .4452 - .2486 = .1966
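The six table look-ups above can be sketched in Python as a cross-check. Assumptions made here: `statistics.NormalDist` (Python 3.8+) stands in for the printed table, and z is rounded to two decimals because that is how such tables are indexed. The helper names (`table_z`, `area_mean_to_z`, `area_beyond_z`) are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 115, 15          # caffeine example: mg/day
std_normal = NormalDist()    # mean 0, SD 1

def table_z(score, mu=MU, sigma=SIGMA):
    """z-score rounded to two decimals, as a printed table is indexed."""
    return round((score - mu) / sigma, 2)

def area_mean_to_z(z):
    """Area between the mean and z (the sign of z is ignored)."""
    return std_normal.cdf(abs(z)) - 0.5

def area_beyond_z(z):
    """Area in the tail beyond z (the sign of z is ignored)."""
    return 1 - std_normal.cdf(abs(z))

z_you = table_z(125)      # 0.67
z_low = table_z(105)      # -0.67
z_high = table_z(139)     # 1.6

case1 = area_mean_to_z(z_you)                            # mean to 125
case2 = area_beyond_z(z_you)                             # above 125
case3 = 0.5 + case1                                      # below 125
case4 = area_mean_to_z(table_z(100))                     # 100 to mean
case5 = area_mean_to_z(z_low) + area_mean_to_z(z_you)    # 105 to 125
case6 = area_mean_to_z(z_high) - area_mean_to_z(z_you)   # 125 to 139

for label, area in [("1", case1), ("2", case2), ("3", case3),
                    ("4", case4), ("5", case5), ("6", case6)]:
    print(f"case {label}: area = {area:.4f}")
```

Case 5 may differ from the slide in the fourth decimal, since the slide adds two table values that were each already rounded.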

finding a score when the area is known

We know that a score is above 80% of others, but do not know the score

First, rework the equation

$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad X = \mu + z\sigma$$

Then, remember that .80 = .50 + .30
Then, use the table to find the z-score for .30

[Figure: normal curve f(X) with μ (115) marked]

So, given this equation, and the knowledge that the z-score for an area of .30 is .84:

$$X = 115 + (15)(.84) = 127.600$$

A person with a score above 80% of other people's scores has a score of 127.6
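Going from an area back to a score can also be sketched with the inverse of the cumulative distribution. This assumes Python 3.8+ (`statistics.NormalDist.inv_cdf`); the variable names are illustrative only. The result is slightly more precise than the table value of .84, so the score comes out near 127.6 rather than exactly 127.600.

```python
from statistics import NormalDist

MU, SIGMA = 115, 15   # caffeine example: mg/day

# z-score that has 80% of the distribution below it
z_80 = NormalDist().inv_cdf(0.80)

# Rearranged z formula: X = mu + z * sigma
score_80 = MU + z_80 * SIGMA

print(f"z for the 80th percentile: {z_80:.2f}")
print(f"corresponding intake:      {score_80:.1f} mg/day")
```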

z-transformation
the standard normal distribution
(1) allows comparison of performance across different tests
(2) tells you how many people score above or below you on a certain measure
(3) allows you to make inferences concerning the probability that different scores will be obtained

3) comparing a sample with a population

- it's all well and good to compare individual scores with a mean, but research usually involves looking at groups of scores (i.e., a sample of scores)
- example: Let's say you want to know if students with a high GPA drink more (or less) caffeine than average
- in this case you need to compare a mean (X̄) for the sample with the population mean (μ)

example

- let's say you collect a sample of 50 students whose GPA is 6.0 or better. You find that the mean for this sample is 105 mg. Suppose we know the mean for the population is 115 mg. What can we conclude from this?
- you might be tempted to say that caffeine consumption negatively affects studying
- but the difference between the sample mean and the population mean could be caused by other factors

inferential statistics

- using sample data to make inferences about population parameters
- if nothing else is known, the statistics of a sample (e.g., the mean) are the best estimates of the population parameters (e.g., height of UQ students based on this class).

But samples may fail to provide good estimates of the population for two reasons:

(1) sampling bias and (2) sampling error

1) sampling bias

- due to faulty sampling methods, some important subgroups of the population may be over- or under-represented in our sample
  - Systematic variation (e.g., inadvertently got low caffeine consumers, more women study psychology)
- e.g., a classic example
  - 1948 telephone poll for US elections.
    - Thomas Dewey (Republican) was predicted to win by a large margin
    - Harry S Truman (Democrat) won easily
  - Why?

2) sampling error

- no matter how careful we are, no two samples from the same population will be identical: by chance there would be natural variation in scores (sampling error).
- if I took a random sample of 50 students from anywhere, it would be a complete fluke if the mean for that sample was exactly the population mean (115 mg).
- the term sampling error implies a mistake, but this is misleading; it's a natural thing and can't be helped.
- so the question is not whether the sample mean differs from the population mean (it almost always will) but how likely it is that the difference we observed could have occurred by chance.
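Sampling error is easy to see in a small simulation. The sketch below is illustrative only and rests on assumptions not stated on this slide: caffeine intake is treated as normally distributed, σ = 15 mg is borrowed from the earlier caffeine example, and the seed is fixed purely so the run is repeatable.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 115, 15, 50   # assumed population values; 50 students per sample

def draw_sample_mean(n=N):
    """Mean caffeine intake of one random sample of n simulated students."""
    return mean(random.gauss(MU, SIGMA) for _ in range(n))

sample_means = [draw_sample_mean() for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.1f} mg (differs from mu by {m - MU:+.1f})")
```

Each sample comes from the same population, yet every mean lands a little above or below 115 mg. That spread is sampling error, not a mistake in the procedure.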

statistical inference

- statistical inference is the foundation of hypothesis testing
- we use sample data to make inferences about population parameters
- this allows the researcher to determine the probability that a sample is from one population and not another
- it enables the researcher to evaluate the veracity (truth) of a hypothesis as if a whole population was available instead of just a small (but hopefully representative) sample

sampling distributions

- the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
- sampling distributions have means and SDs
- can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean

sampling distribution of the mean


population of four scores:
(how many cups of coffee 2010 students drink in a day)

1, 2, 3, 4
(everyone either has 1, 2, 3 or 4 coffees per day)

μ = 2.5, σ = 1.118

sampling distribution of the mean

Draw all possible samples (n = 2) with replacement:

sample   mean     sample   mean
1,1      1.0      3,1      2.0
1,2      1.5      3,2      2.5
1,3      2.0      3,3      3.0
1,4      2.5      3,4      3.5
2,1      1.5      4,1      2.5
2,2      2.0      4,2      3.0
2,3      2.5      4,3      3.5
2,4      3.0      4,4      4.0

$$\mu_{\bar{X}} = 2.5, \qquad \sigma_{\bar{X}} = 0.791$$

The subscript X̄ stands for the sampling distribution of the mean
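The 16 samples and their means can be enumerated by machine, which also confirms the two summary values. A minimal sketch using only the standard library; variable names are illustrative, and `pstdev` is used because the four scores are treated as the whole population.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [1.0, 2.0, 3.0, 4.0]   # coffees per day

# All ordered samples of size 2, drawn with replacement (4 x 4 = 16).
samples = list(product(population, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]

mu = mean(population)             # population mean
sigma = pstdev(population)        # population SD
mu_xbar = mean(sample_means)      # mean of the sampling distribution
sigma_xbar = pstdev(sample_means) # SD of the sampling distribution

print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"mu_xbar = {mu_xbar}, sigma_xbar = {sigma_xbar:.3f}")
print(f"sigma / sqrt(2) = {sigma / sqrt(2):.3f}")

# Frequency of each sample mean (the bars of the histogram).
for value, freq in sorted(Counter(sample_means).items()):
    print(f"mean {value}: {'*' * freq}")
```

The frequencies rise to a single peak at 2.5 and fall away symmetrically, even though the population itself is flat.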

sampling distribution of the mean in diagram form

[Figure: frequency histogram of the sample means; x-axis: mean (1 to 4), y-axis: frequency (0 to 5)]

sampling distribution of the mean

- the distribution resembles the normal distribution, not the original population
- the mean of the sampling distribution is equal to the mean of the actual population

$$\mu_{\bar{X}} = \mu$$

standard error of the mean

- standard error of the mean ($\sigma_{\bar{X}}$) is the standard deviation of the distribution of sample means
- it represents the typical or average distance between a sample mean X̄ and the mean of the population
- it is used to define and accurately measure sampling error

sampling distribution of the mean

how to calculate the standard error of the mean:

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}}$$

- $\sigma_X$: the standard deviation of the scores from the whole population
- $N$: the number of people in each sample

e.g.

$$\sigma_{\bar{X}} = \frac{1.118}{\sqrt{2}} = 0.791$$

making inferences from sampling distribution of the mean

- Example: You want to test the theory that high doses of caffeine improve statistical performance. To test this, you take a random sample of 25 PSYC2010 students and give them high doses of caffeine throughout the semester; the mean result for this sample is 80%. Over the years you know that PSYC2010 students have averaged 70% (standard deviation = 20).
- Question: Given a normally distributed population, with μ = 70 and σ_X = 20, what is the probability of obtaining a sample (N = 25) with a mean of 80 or higher?

making inferences from sampling distribution of the mean

calculate the standard error of the mean

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}} = \frac{20}{\sqrt{25}} = \frac{20}{5} = 4.00$$

on average we expect a sample to have a mean that differs 4 points from the true population mean

making inferences from sampling distribution of the mean

transform the sample mean of 80 to a z-score using the standard error of the mean as the denominator (not the standard deviation of the scores)

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{80 - 70}{4.00} = \frac{10}{4.00} = 2.50$$

we're actually asking about the mean of our sample relative to the sampling distribution of the mean (of the population): what is the likelihood that our mean comes from this population?

making inferences from sampling distribution of the mean


use tables to determine area beyond z

~.6%

making inferences from sampling distribution of the mean

what is the probability that the sample mean will differ from the population mean by 10 points or more?

- .0062 + .0062 = .0124
- 1.2% of samples are expected to differ by 10 or more points

[Figure: standard normal curve with an area of .0062 shaded in each tail]
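The whole caffeine-and-grades example (standard error, z-score for the sample mean, one-tailed and two-tailed areas) fits in a short sketch. It assumes Python 3.8+ for `statistics.NormalDist`, which stands in for the printed table; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 20, 25    # population mean, population SD, sample size
sample_mean = 80

# Standard error of the mean: sigma / sqrt(N)
se = sigma / sqrt(n)

# z-score of the sample mean, using the standard error as the denominator
z = (sample_mean - mu) / se

# Area beyond z in one tail, then in both tails
p_upper = 1 - NormalDist().cdf(z)
p_either = 2 * p_upper

print(f"standard error = {se:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean >= 80)         = {p_upper:.4f}")
print(f"P(differs by 10+ either way) = {p_either:.4f}")
```

Dividing by the standard error rather than by σ is the key step: with σ = 20 in the denominator, the same 10-point difference would give z = 0.50 and look entirely unremarkable.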

To Sum

- The SE is the SD of sample means
- It is a measure of how representative a sample is likely to be of the population
- A large SE (relative to the sample mean) indicates that there is a lot of variability between the means of different samples, so a sample may not be representative of the population
- A small SE indicates that sample means are similar to the population mean, so our sample is likely to be an accurate reflection of the population

lecture 1 - review

- general review
- standard normal distribution
- finding areas under the normal curve
- sampling distribution of the mean
