
Statistical Analysis

Curriculum Outcomes

Topic 1: Statistical analysis (2 hours)


1.1.1 State that error bars are a graphical representation
of the variability of data.
1.1.2 Calculate the mean and standard deviation of a set of
values.
1.1.3 State that the term standard deviation is used to
summarize the spread of values around the mean, and
that 68% of the values fall within one standard deviation
of the mean.
1.1.4 Explain how the standard deviation is useful for
comparing the means and the spread of data between two
or more samples.
1.1.5 Deduce the significance of the difference between two
sets of data using calculated values for t and the
appropriate tables.
1.1.6 Explain that the existence of a correlation does not
establish that there is a causal relationship between two
variables.

LET'S START WITH AN EXAMPLE

Imagine you want to study some aspect of bean plants.
What sorts of things could you study?
Create a hypothesis.
How will you test your hypothesis, and what will you measure?
Obviously you can't measure every bean plant that exists!
Even thousands of bean plants are unrealistic in terms of time...

We must use samples of bean plants that represent the entire population.
So what we do is grow enough bean plants to get a sample that is small enough to collect our data efficiently but large enough to represent the population as a whole.

Statistics is a branch of math!


It allows us to take small portions from habitats,
communities and populations and draw
conclusions about the larger population.
Stats measures the differences and relationships
between sets of data.
As for our experiment...

Small sample compared to large population.

Depending on our sample size, we can draw conclusions with a certain level of confidence.
We can be 95% confident...
We may even be 99% confident...
But nothing is 100% certain in science (Yikes... That makes my scientific brain hurt... TOK application!!!).

Descriptive Statistics
The mean and the standard deviation describe the data: they show us a picture that helps with interpretation of the data.

Mean
The MEAN is the average of your
data points. It is calculated by adding
your data points together and dividing
by how many points there are.
Example:
Look at these numbers:
3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
The sum of these numbers is equal to 330
There are fifteen numbers.
The mean is equal to 330 ÷ 15 = 22
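
The same calculation can be sketched in a few lines of Python. This is only an illustration of the add-then-divide steps described above, using the numbers from the example; the variable names are our own.

```python
# Data points from the example above.
data = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]

total = sum(data)      # add the data points together
count = len(data)      # how many points there are
mean = total / count   # divide the sum by the number of points

print(total, count, mean)  # 330 15 22.0
```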

Range
Is the measure of the spread of data. It is
calculated by finding the difference between the
largest and smallest values.
The range can give us an idea of how variable the data is.
Example:
Largest value is 15, smallest value is 5
The range is 10 (15 - 5 = 10)

Note: very large and very small values, called outliers, can have a very dramatic effect on the range.
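
A short Python sketch of the range, and of how much one extreme value can change it. As an assumption for illustration, it reuses the fifteen numbers from the Mean example and treats the largest value (56) as the extreme one.

```python
# Reusing the numbers from the Mean example (an illustrative choice).
data = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]

# Range = largest value - smallest value.
data_range = max(data) - min(data)

# Drop the single largest value and recalculate to see its effect.
without_largest = sorted(data)[:-1]
range_without_largest = max(without_largest) - min(without_largest)

print(data_range, range_without_largest)  # 53 37
```

Removing just one value out of fifteen shrinks the range from 53 to 37, which is why the range on its own can be misleading.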

Standard Deviation (SD)


Measures how the individual observations of a data set are dispersed (spread) around the mean.
We will learn how to calculate SD by hand, but usually you will use your graphing calculator or Excel.

Calculating SD

Try the example on your handout.
I find a table helps to organize the calculation of SD.
NOTE: In Biology, we are calculating the sample standard deviation. In math, you will calculate the population standard deviation.
On your graphing calculators: σx (the "funny s") for math, Sx for biology!
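
The handout example is not reproduced here, so the sketch below uses the numbers from the Mean example instead (an assumption for illustration). It lays the work out like a table, one row per value, and then shows the only difference between the two kinds of SD: the sample version divides by n - 1, the population version divides by n. The cross-check uses Python's standard `statistics` module.

```python
import math
import statistics

# Numbers from the Mean example, standing in for the handout data.
data = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
mean = sum(data) / len(data)

# Table-style working: (value, deviation from mean, squared deviation).
rows = [(x, x - mean, (x - mean) ** 2) for x in data]
sum_sq = sum(squared for _, _, squared in rows)

sample_sd = math.sqrt(sum_sq / (len(data) - 1))  # divide by n - 1 (Sx)
population_sd = math.sqrt(sum_sq / len(data))    # divide by n (σx)

# Cross-check the hand calculation against the standard library.
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(round(sample_sd, 2), round(population_sd, 2))  # 14.51 14.01
```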

We use standard deviation to measure the spread of our values around the mean.
If our data has a normal distribution (meaning our values are clustered around the mean) then we assume that:
About 68% of our values lie within 1 SD of the mean.
This number rises to about 95% for 2 SD from the mean.
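
The 68% and 95% figures can be checked against an ideal normal distribution. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); real samples will only match these percentages approximately.

```python
from statistics import NormalDist

# An ideal normal distribution with mean 0 and SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Fraction of values between -1 SD and +1 SD, and between -2 SD and +2 SD.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_1_sd, 3), round(within_2_sd, 3))  # 0.683 0.954
```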

Error Bars

Are graphical representations of the variability of the data. Error bars can show either the range of the data or the SD on a graph.
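
As a rough sketch of what an SD error bar represents, the code below works out where the bar would start and end for two groups. The heights are invented for illustration, and `error_bar` is a hypothetical helper name; a range-based bar would use the smallest and largest values instead.

```python
import statistics

# Hypothetical bean plant heights in cm (invented for illustration).
groups = {
    "group A": [10, 12, 14, 16, 18],
    "group B": [12, 14, 16, 18, 20],
}

def error_bar(values):
    """Return (mean, lower end, upper end) for a mean +/- 1 sample SD bar."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return m, m - sd, m + sd

for name, heights in groups.items():
    m, lower, upper = error_bar(heights)
    print(f"{name}: mean = {m:.1f}, error bar from {lower:.2f} to {upper:.2f}")
```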

Back to our bean plant...


A sample of 100 bean plants
Some extremes (some very small, some very large)
But when plotted our data should look something
like a bell curve with the majority of our data
centred around the mean.

The Normal Distribution


A flat bell curve indicates that the data is spread out widely from the mean.
Thus, the standard deviation would be large.

A bell curve that is very tall and narrow shows that the data is very close to the mean.
Thus, the standard deviation would be very small.

Significant Difference between two means
To determine if a difference between two data sets is significant, a t-test is commonly used.
A t-test compares two sets of data.

T tables
Along one side of the table of critical values of t, you see probability (p). This is the likelihood that chance alone could produce your results.
If p = 0.50, chance alone would produce a difference this large 50% of the time. This is not significant.
If p = 0.05, chance alone would produce a difference this large only 5% of the time, so we can be 95% confident that one set of data really is different from the other. This is considered to be a significant difference.

The mean, standard deviation and sample size are all used to calculate the value of t.
In the left column you will notice the degrees of freedom. This is calculated by adding the two sample sizes together and subtracting 2.
Line up the degrees of freedom and the 0.05 level of significance and this will give you the critical value of t for your data.
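
The slides do not give the formula for t, so the sketch below is an assumption: it uses the pooled-variance two-sample t-test, which is the version that matches degrees of freedom of N1 + N2 - 2. The function name `pooled_t` and the two samples are invented for illustration.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # Sample variance is the sample SD squared.
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)

    df = n1 + n2 - 2  # degrees of freedom
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, df

# Hypothetical bean plant heights in cm (invented for illustration).
group_a = [10, 12, 14, 16, 18]
group_b = [12, 14, 16, 18, 20]

t, df = pooled_t(group_a, group_b)
print(round(t, 3), df)
```

The sign of t only tells you which mean is larger; it is the size of t that gets compared with the table.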

Compare this critical value with the calculated value of t.
***If the calculated t value is larger than the number on the chart then the two groups are significantly different from each other!

LET'S DO AN EXAMPLE TO CLEAR THE MUD...
Ms. Chris conducted an experiment. She wanted to study the effect of a hair growth product on the length of toe hair. She measured the length of the hair on the toes of students in her biology class (Sample X) and then she had the students apply the growth product daily for one week and measured the length of the hair again (Sample Y).
Both groups were normally distributed.
N for sample X was 23
N for sample Y was 19
Ms. Chris did some fancy math and calculated t = 2.956.
Use a level of significance of 0.05.
Can we conclude that the hair growth product resulted in significant hair growth?

We will be testing the null hypothesis, that is, that the two groups are the same.
Step 1: Calculate the degrees of freedom:
df = (N1 + N2) - 2
df = 23 + 19 - 2 = 40

Step 2: Use the chart of critical values of t
Line up 0.05 with 40
This gives us a critical value of t of 2.021

Step 3: Evaluate
Remember*** If the calculated t value is larger than the number on the chart, then the groups are significantly different from each other.
So... We were given a calculated t of 2.956, which is larger than the table value, so....

We can conclude that the two groups are significantly different from each other and that the hair growth formula resulted in significant hair growth!
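
The three steps can be sketched as a small Python function. The table below is only a few rows copied from a standard two-tailed t-table at the 0.05 level, and `is_significant` is a hypothetical helper name. Comparing the absolute value of t is an assumption on our part, since a calculated t can come out negative depending on which sample is listed first.

```python
# A few two-tailed critical values of t at the 0.05 level, copied from a
# standard t-table. A full table has many more rows.
CRITICAL_T_05 = {10: 2.228, 17: 2.110, 20: 2.086, 30: 2.042, 40: 2.021}

def is_significant(t_calculated, n1, n2, table=CRITICAL_T_05):
    """Return (significant?, degrees of freedom, critical t)."""
    df = n1 + n2 - 2        # Step 1: degrees of freedom
    critical = table[df]    # Step 2: look up the critical value
    # Step 3: significant if the calculated t is larger than the table value.
    return abs(t_calculated) > critical, df, critical

# Ms. Chris's experiment: N = 23 and N = 19, calculated t = 2.956.
significant, df, critical = is_significant(2.956, 23, 19)
print(df, critical, significant)  # 40 2.021 True
```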

Try another...
Are two samples with df = 17 and a calculated t value of 1.935 significantly different at 0.05?

Correlation does not mean causation!


We make observations about the world around us all the time.
We might notice that our grass turns yellow when its soil is dry. This is a simple observation.
We might do an experiment to see if watering our grass prevents yellowing.

Observing that the yellowing occurs when soil is dry is a simple correlation, but the experiment gives us evidence that a lack of water is the cause of the yellowing.
Experiments provide a test that shows cause; observations without a test only show a correlation.

Odd Examples...

Ice cream sales and the number of shark attacks on swimmers are correlated.
The number of cavities in elementary school children and vocabulary size have a strong positive correlation.
Clearly neither factor causes the other. The two simply rise and fall together in the data (often because a third factor, such as hot weather or the children's age, affects both).
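
To see how strong a correlation can look on paper, here is a sketch that calculates Pearson's correlation coefficient r, one common measure of correlation (the slides do not name a specific measure, so this choice is an assumption). The monthly figures are invented for illustration, and `pearson_r` is our own helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Invented monthly figures (not real data).
ice_cream_sales = [35, 42, 46, 55, 58, 66]
shark_attacks = [1, 1, 2, 3, 3, 4]

r = pearson_r(ice_cream_sales, shark_attacks)
print(round(r, 2))  # close to 1: a strong positive correlation
```

A value of r close to 1 says the two sets of numbers move together. It says nothing about whether one causes the other; only an experiment can test that.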

Application and Consolidation


Now it's time to apply what we have learned.
Please complete the worksheet called Statistical Analysis Application Practice.
You can put your heads together with the person sitting next to you.
We will go over everything when you have had a chance to try. The best way to learn is to try!
