
Chapter 1

Everything you ever wanted to know about Statistics

What will this chapter tell us?


This chapter gives an overview of some important statistical concepts.

BUILDING STATISTICAL MODELS


Real-world phenomenon: the actual phenomenon as it exists in the world. The researcher wants to build a model that resembles the real-world phenomenon as closely as possible.

How to Build a Statistical Model?


a) Collect data from the real world.
b) Analyse the data to draw conclusions.
c) Build a statistical model based on those conclusions.

Why do we build statistical models?


Analogy: the inference that if two or more things agree with one another in some respects, they will probably agree in others.

We build statistical models of real-world processes in an attempt to predict how these processes operate under certain conditions.

Explanation: Imagine an engineer who wishes to build a bridge across a river.
a) The engineer collects data from the real world, i.e. looks at existing bridges: their structure, their usage, and the materials they are made of.
b) The engineer uses this information to construct a model.
c) The engineer might test whether the bridge can withstand strong winds by placing the model in a wind tunnel.
d) It is important that the model is an accurate representation of the real world.

Fit of the model


The degree to which a statistical model represents the data collected is known as the fit of the model.

Types of Model based on Fit


Good fit: If the model is an excellent representation of the real-world situation, it is said to be a good fit.
Moderate fit: If the model has some similarities to the real world but also some big differences, it is called a moderate fit.
Poor fit: If the model bears no structural similarities to the real bridge, it is termed a poor fit.

Population & Sample


As researchers, we are interested in finding results that apply to an entire population of people or things. The bridge-building engineer cannot make a full-size model of the bridge she wants to build, so she builds a small-scale model and tests it under various conditions. From the results obtained with the small-scale model (the sample), the engineer infers how the full-size bridge (the population) will respond. The bigger the sample, the more likely it is to reflect the whole population.

Simple Statistical Model


Mean
Sum of Squares
Variance
Standard Deviation

Mean
The mean is a hypothetical value that can be calculated for any data set. It does not have to be a value that is actually observed. It is calculated by adding the values we obtained and dividing by the number of values measured. Formula:
Mean = (1 + 2 + 3 + 3 + 4) / 5 = 13 / 5 = 2.6 (a hypothetical mean: 2.6 is not one of the observed values)

Sum of Squares

Standard Deviation(S.D)
Standard deviation is the square root of the variance, which ensures that the measure of average error is in the same units as the original measure.
Formula: S.D = √(variance)
Interpretation:
The S.D is a measure of how well the mean represents the data. A small S.D indicates that the data points are close to the mean. A large S.D indicates that the data points are distant from the mean. An S.D of 0 would mean that all of the scores were the same.
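
The four quantities of the simple statistical model can be worked through on the five scores from the mean example. This is a minimal sketch: the slides do not say whether the variance divides by N or by N - 1, so the N - 1 (sample) form is an assumption here, and all variable names are illustrative.

```python
import math

scores = [1, 2, 3, 3, 4]  # the five values from the mean example

n = len(scores)
mean = sum(scores) / n

# Deviation of each score from the mean, then squared and summed.
deviations = [x - mean for x in scores]
sum_of_squares = sum(d ** 2 for d in deviations)

# Assumption: sample variance, dividing by N - 1 rather than N.
variance = sum_of_squares / (n - 1)

# The S.D is the square root of the variance (same units as the scores).
sd = math.sqrt(variance)

print(f"mean           = {mean:.2f}")
print(f"sum of squares = {sum_of_squares:.2f}")
print(f"variance       = {variance:.2f}")
print(f"S.D            = {sd:.2f}")
```

Dividing by N instead of N - 1 would give a slightly smaller variance and S.D; the interpretation stays the same.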

Diagrammatic representation of S.D

What is a Frequency Distribution?


Also called a histogram, it is simply a graph plotting the values of observations on the horizontal axis, with a bar showing how many times each value occurred in the data set. By looking at which score has the tallest bar, we can immediately see the mode (the most frequent score).
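
As a rough illustration, a frequency distribution can be tallied and drawn as a text histogram. The scores below are reused from the mean example purely for illustration, and the bars are drawn sideways, one row per value.

```python
from collections import Counter

scores = [1, 2, 3, 3, 4]  # illustrative data, reused from the mean example

# Count how many times each value occurs.
frequencies = Counter(scores)

# One row per value; the bar length is the frequency.
for value in sorted(frequencies):
    print(f"{value}: {'#' * frequencies[value]}")

# The mode is the value with the longest bar.
mode, mode_count = frequencies.most_common(1)[0]
print(f"mode = {mode} (occurs {mode_count} times)")
```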

When is a frequency distribution normal?

If we draw a vertical line through the center of a normal distribution, it looks the same on both sides. A normal distribution is characterized by a bell-shaped curve. This shape implies that the majority of scores lie around the center of the distribution.

Properties of Frequency Distribution


There are two main ways in which a distribution can deviate from normal:
Skewness: lack of symmetry.
Kurtosis: pointiness of the curve.

Skewed distribution: a distribution that is not symmetrical, with the most frequent scores clustered at one end of the scale.

Types of skewness
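Positive skewness: the frequent scores are clustered at the lower end and the tail points towards the higher scores.
Negative skewness: the frequent scores are clustered at the higher end and the tail points towards the lower scores.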

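Kurtosis (pointiness): refers to the degree to which scores cluster in the tails of the distribution.
Types of kurtosis
Platykurtic: flatter than a normal distribution (kurtosis below 0).
Leptokurtic: pointier than a normal distribution (kurtosis above 0).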

Interpretation of Normal Distribution


Ideally we want our data to be normally distributed, that is, not too skewed and not too pointy or flat.
In a normal distribution the values of skew and kurtosis are 0. If a distribution has values of skew or kurtosis above or below 0, this indicates a deviation from normal.
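
Skew and kurtosis can be sketched from the central moments of the data. The definitions below are the plain moment-based ones, with 3 subtracted from the kurtosis so that a normal distribution scores 0; this is an assumption, since statistics packages often report bias-corrected versions that differ slightly in small samples. The function names and data sets are illustrative.

```python
def central_moment(data, k):
    """Average of (x - mean)**k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skew: 0 for a perfectly symmetrical data set."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]               # illustrative, symmetrical
right_skewed = [1, 1, 1, 2, 2, 3, 5, 9]   # illustrative, long upper tail

print("symmetric skew:   ", skewness(symmetric))
print("right-skewed skew:", skewness(right_skewed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A positive skew value corresponds to a tail towards the higher scores; a negative excess kurtosis corresponds to a flatter (platykurtic) shape.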

The Standard Deviation(S.D) and the Shape of the Distribution


The SD also tells us about the shape of the distribution.
If the mean represents the data well, most of the scores cluster close to the mean and the resulting SD is small. When the mean is a worse representation of the data, the scores are spread more widely around the mean and the SD is large.
[Figure: two distributions with the same mean (50) but different SDs (SD = 25 and SD = 15).]

The Standard Error


What is the difference between the SD (Standard Deviation) and the SE (Standard Error)?
The standard error is the SD of the sample means. It measures how much variability there is between the means of different samples.

Population Mean

If we take several samples from the same population, each sample has its own mean.

The sampling distribution is simply the frequency distribution of sample means from the same population.
Is my sample representative of the population?
This can be assessed with the standard error. A large SE (relative to the sample mean) means that there is a lot of variability between the means of different samples, so the sample we have might not be representative of the population. A small SE indicates that most sample means are similar to the population mean, so our sample is likely to be an accurate reflection of the population.
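
The idea that the SE is the SD of the sample means can be checked with a small simulation. This is a sketch under stated assumptions: the population (normal, mean 50, SD 15), the sample size, and the number of samples are all hypothetical choices, and the comparison value SD / sqrt(N) is the standard textbook result for the SE of the mean rather than something stated on the slides.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50, 15  # hypothetical population
SAMPLE_SIZE = 25           # hypothetical number of scores per sample
N_SAMPLES = 2000           # hypothetical number of samples drawn

# Draw many samples from the same population and record each sample mean.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

# The SE is the SD of the sampling distribution of the mean.
se_simulated = statistics.stdev(sample_means)
se_formula = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"SD of sample means (simulated SE): {se_simulated:.2f}")
print(f"population SD / sqrt(N):           {se_formula:.2f}")
```

Increasing SAMPLE_SIZE shrinks both numbers, which matches the earlier point that bigger samples are more likely to reflect the population.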

Confidence Interval
A confidence interval gives, with a known degree of confidence, a range within which the population value (parameter) is likely to lie. The interval is computed by adding a margin of error to, and subtracting it from, the point estimate:
Confidence interval = point estimate ± margin of error
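
A minimal sketch of the point estimate ± margin of error recipe, assuming a 95% interval for the mean whose margin is the normal critical value (about 1.96) times the standard error. The data are reused from the mean example for illustration; with a sample this small, a critical value from the t distribution would normally be preferred.

```python
import math
import statistics

sample = [1, 2, 3, 3, 4]  # illustrative data, reused from the mean example

point_estimate = statistics.mean(sample)

# Standard error of the mean: sample SD divided by sqrt(N).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Assumption: 95% confidence with a normal critical value.
z = statistics.NormalDist().inv_cdf(0.975)
margin_of_error = z * standard_error

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"point estimate  = {point_estimate:.2f}")
print(f"margin of error = {margin_of_error:.2f}")
print(f"95% interval    = ({lower:.2f}, {upper:.2f})")
```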

Hypothesis
Scientists are usually interested in testing hypotheses, that is, testing the scientific questions that they generate. Within these questions there is usually a prediction that the researcher has made. This prediction is called the experimental hypothesis. The reverse possibility, that the prediction is wrong, is called the null hypothesis.
Example: Hamburgers make you fat
The experimental hypothesis is that the more hamburgers you eat, the more you start to resemble a beached whale; the null hypothesis is that people will be equally fat regardless of how many hamburgers they eat.

One- and two-tailed tests


A directional hypothesis uses a one-tailed test
Example
The more someone reads this book, the more they want to kill its author

A non-directional hypothesis uses a two-tailed test
Example
Reading more of this book could increase or decrease the reader's desire to kill its author
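
The practical difference between the two kinds of test can be sketched with a z test statistic. The value z = 1.8 is hypothetical, and the use of the standard normal distribution is an assumption for illustration: a one-tailed test looks at one end of the distribution only, while a two-tailed test counts extreme values at both ends.

```python
from statistics import NormalDist

z = 1.8  # hypothetical test statistic, in the predicted direction

standard_normal = NormalDist()

# One-tailed: probability of a value at least this large in one direction.
p_one_tailed = 1 - standard_normal.cdf(z)

# Two-tailed: probability of a value at least this extreme in either direction.
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With these numbers the same result falls below 0.05 for the one-tailed test but not for the two-tailed test, which is why the direction of the hypothesis should be fixed before looking at the data.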

Errors in testing of hypothesis


The null hypothesis is accepted or rejected on the basis of the value of the test statistic (z, t, F, chi-square). The test statistic may land in the acceptance region or the rejection region. In making this decision there is the possibility of committing one of two errors, called the Type I error and the Type II error.

Type I error and Type II error


Type I error:
The null hypothesis H0 may be true but be rejected. This is an error and is called a Type I error.
Example: a Type I error is committed when
An intelligent student is failed.
An innocent person is punished.

Type II error:

The null hypothesis H0 may be false but be accepted. This is an error and is called a Type II error.

α (alpha), Level of Significance:
The probability of making a Type I error is denoted by α (alpha).

β (beta):
The probability of making a Type II error is denoted by β (beta).
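
Both error rates can be estimated by simulation. The sketch below assumes a two-tailed z test with a known population SD, a significance level of 0.05, and a hypothetical true mean of 0.5 when the null hypothesis is false; none of these settings come from the slides.

```python
import math
import random
from statistics import NormalDist, mean

ALPHA = 0.05   # chosen level of significance
SIGMA = 1.0    # hypothetical known population SD
N = 25         # hypothetical sample size
TRIALS = 4000  # number of simulated experiments per scenario

# Two-tailed critical value for the chosen alpha.
CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejects_null(true_mean, rng):
    """Run one experiment testing H0: mean = 0; True if H0 is rejected."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = mean(sample) / (SIGMA / math.sqrt(N))
    return abs(z) > CRITICAL_Z


rng = random.Random(1)  # fixed seed so the run is repeatable

# Type I error: H0 is true (mean really is 0) but gets rejected.
type_1_rate = sum(rejects_null(0.0, rng) for _ in range(TRIALS)) / TRIALS

# Type II error: H0 is false (mean is really 0.5) but is not rejected.
type_2_rate = sum(not rejects_null(0.5, rng) for _ in range(TRIALS)) / TRIALS

print(f"estimated alpha (Type I rate): {type_1_rate:.3f}")
print(f"estimated beta (Type II rate): {type_2_rate:.3f}")
```

The estimated Type I rate should sit close to the chosen α, while β depends on how far the true mean is from the null value and on the sample size.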
