An Introduction
Presented to:
By
Muhammad Mushtaq Ahmed Mangat
Textile Faculty
Technical University Liberec
Descriptive stat
Inferential stat
Types of samples
Number systems
Data
Variable
Independent variable
Dependent variable
Univariate data
Bivariate Data
Multivariate data
Ordinal
Nominal
Cross-sectional data
Primary
Secondary
List of data
Data Frequency
Part Two
Frequency Table
Pie Chart
Bar Chart
Area Charts
Line Charts
Dot Plot
Histogram
Radar Charts
Map chart
Polygon charts
Range
Arithmetic mean
Geometric mean
Trimmed Mean
Median
Mode
Percentiles
Variance
Percentile summary
Standard Deviation
Normal Distribution
Skewed Distribution
Kurtosis
Sampling distribution
Binomial distribution
Correlation
Hypotheses Testing
Null hypotheses
Z Score Test
Types of Z test
Z value calculation
P value
Correlation
Regression Analysis
Explanation of model:
Chi-Square Test
Crosstabs
Part One: Statistics Definition and Functions
Statistics is the art and science of collecting and understanding data. Main functions:
1. Gathering data
2. Arranging data
3. Analyzing data
4. Exploring the data
5. Estimating unknown quantities
6. Presenting results
7. Interpreting results
8. Making results available for decisions
9. Designing a plan for data collection
10. Testing hypotheses
Descriptive stat
Descriptive statistics are used to describe the main features of a collection of data in quantitative
terms (en.wikipedia.org/wiki/Descriptive_statistics)
Inferential stat
A statistical inference is a conclusion made on the basis of data which is subject to random variation
of some kind, possibly observation errors or sampling variation
(en.wikipedia.org/wiki/Inferential_statistics)
Types of samples
Random sample, Stratified sample, Quota sample, Purposive sample, Convenience sample
Number systems
Natural : 0, 1, 2, 3, 4, 5, 6, 7, ..., n
Integers: −n, ..., −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, ..., n
Positive integers: 1, 2, 3, 4, 5, ..., n
Rational: a/b where a and b are integers and b is not zero (3/4)
Real: The limit of a convergent sequence of rational numbers (-1.23, 1.234)
Complex: a + bi where a and b are real numbers and i is the square root of −1
Prime numbers: a natural number that has exactly two distinct natural number divisors: 1 and itself
(2, 3, 5, 7, 11)
Irrational number: The irrational numbers are precisely those infinite decimals which are not
repeating (e.g. π; the fraction 22/7 is only a rational approximation of π)
Data
Data refers to any kind of recorded information
Variable
A piece of information recorded for every item is called a variable
Independent variable
A variable that can be manipulated during an experiment
Dependent variable
A variable affected by the manipulation of the independent variable
Univariate data
It is a data set in which one piece of information has been recorded for each item.
Bivariate Data
Such data sets have exactly two pieces of information recorded for each item
Multivariate data
Such data sets have three or more pieces of information recorded for each item
Ordinal
Here there is a meaningful order, e.g. a scale of 1 to 5 where 1 is dull and 5 is fully bright
Nominal
Where there is no meaningful order e.g. name of different departments
Cross-sectional data
Data collected at a single point in time, e.g. grades of students in the first term
Primary
Data collected for a specific purpose
Secondary
Previously collected data for another use
List of data
It is the simplest kind of data. It represents some kind of information.
Data Frequency
The frequency of data shows how often the various values occur in the data set. It is normally presented
in the form of a histogram
(source: http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#freqtab and results of
Google image search)
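As a small illustration of data frequency, the sketch below counts how often each value occurs and reports the share of each. The ratings list and the function name frequency_table are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical ratings on a 1-5 scale (illustrative data, not from the source).
ratings = [3, 5, 3, 1, 4, 3, 5, 2, 3, 4]

def frequency_table(values):
    """Return (value, count, relative frequency) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, count / n) for value, count in sorted(counts.items())]

table = frequency_table(ratings)
for value, count, share in table:
    # One row per distinct value: the value, its count, and its percentage share.
    print(f"{value}: {count} ({share:.0%})")
```

The same rows could be drawn as bars to obtain a histogram-style chart.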
Part Two
Central Tendency and Data Spread
Frequency Table
Pie Chart
Bar Chart
Area Charts
Line Charts
Dot Plot
Useful for identifying outliers; plotting the values along a line is also useful for this purpose.
Histogram
Histogram with normal curve
Radar Charts
Map chart
Stem and Leaf Plot
Polygon charts
Variability is the extent to which data values differ from each other.
Diversity, dispersion, spread and uncertainty are used with the same meaning.
Range
Highest value − smallest value
Arithmetic mean
Geometric mean
Trimmed Mean
In this case some extreme values are removed before averaging, so that outliers do not distort the mean
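The three means named above can be sketched as follows. This is a minimal illustration using the usual textbook definitions; the function names, the trimming parameter k (number of values dropped at each end) and the data are assumptions made for the example.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("cannot trim away every value")
    ordered = sorted(values)
    return arithmetic_mean(ordered[k:len(ordered) - k])

# Hypothetical data with one extreme value (100).
data = [2, 4, 4, 5, 6, 7, 100]
print(arithmetic_mean(data))   # pulled upward by the extreme value
print(trimmed_mean(data, 1))   # extreme values at both ends removed
print(geometric_mean([1, 10, 100]))
```

Comparing the first two printed values shows how strongly a single outlier can move the arithmetic mean.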
Median
The halfway point of the ordered data set: the value at position (n+1)/2 when the number of values n is
odd, or the mean of the two middle values when n is even
Mode
The most common category
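A short sketch of the median and mode as described above. The helper names are hypothetical; when several categories tie for most common, this version simply returns the first one encountered.

```python
from collections import Counter

def median(values):
    """Middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index of position (n+1)/2 when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most common value; works for categories as well as numbers."""
    return Counter(values).most_common(1)[0][0]

print(median([7, 1, 3]))
print(median([1, 2, 3, 4]))
print(mode(["weaving", "dyeing", "weaving", "spinning"]))
```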
Percentiles
Percentiles are summary measures expressing ranks as percentage 0% to 100% rather than 1 to n.
These are used:
To indicate the data value at a given percentage
To indicate the percentage ranking of a given data value
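Both uses of percentiles can be sketched in a few lines. Several conventions for percentiles exist; the version below assumes the nearest-rank convention, and the function names and scores are hypothetical.

```python
import math

def percentile_value(values, p):
    """Data value at the p-th percentile (nearest-rank convention, 0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank in the ordered data
    return ordered[rank - 1]

def percentile_rank(values, x):
    """Percentage of data values that are less than or equal to x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)

scores = [15, 20, 35, 40, 50]  # hypothetical scores
print(percentile_value(scores, 50))
print(percentile_rank(scores, 35))
```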
Variance
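For population: σ² = Σ(x − μ)² / N, where μ is the population mean and N is the population size
For samples: s² = Σ(x − x̄)² / (n − 1), where x̄ is the sample mean and n is the sample size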
Percentile summary
The data value at or below which a given percentage of the values fall, after the values have been ordered from smallest to largest.
Standard Deviation
It indicates how different the numbers are from one another. It is the square root of the variance.
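The difference between the population and sample versions of the variance, and the step from variance to standard deviation, can be sketched as follows. The function names and data are assumptions for illustration; the only difference between the two versions is the divisor (N versus n − 1).

```python
import math

def population_variance(values):
    """Mean squared deviation from the mean, dividing by N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```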
Normal Distribution
It is an idealized, smooth, bell-shaped histogram with all of the randomness removed.
It represents an ideal set that has lots of numbers concentrated in the middle.
It is common for statistical procedures to assume that the data set is reasonably approximated by a
normal distribution. Example with 5 and standard deviation:
Skewed Distribution
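It is neither symmetric nor normal, because data values trail off more sharply on one side than on the
other. Pearson suggested the following equation to measure skewness [1]:
Skewness = 3 × (mean − median) / standard deviation
Negative skew: the longer tail is on the left. Positive skew: the longer tail is on the right.
[1] Online Statistics: An Interactive Multimedia Course of Study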
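As a hedged illustration of skewness, the sketch below uses Pearson's median-based measure, 3 × (mean − median) / standard deviation. The choice of the population standard deviation, the function name and the data are assumptions made for the example.

```python
import statistics

def pearson_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed here)
    return 3 * (mean - median) / sd

print(pearson_skewness([1, 2, 3, 4, 5]))    # symmetric data
print(pearson_skewness([1, 2, 3, 4, 10]))   # long tail to the right
print(pearson_skewness([1, 7, 8, 9, 10]))   # long tail to the left
```

The sign of the result tells the direction of the longer tail.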
Kurtosis
Sampling distribution
It is a distribution of the statistic for all possible samples of a given size from a population. It is
highly dependent on the distribution of population.
[2] Online Statistics: An Interactive Multimedia Course of Study
The following figures show distributions with different means and standard deviations.
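A sampling distribution can be approximated by simulation. Instead of enumerating all possible samples, the sketch below draws many random samples and collects their means. The population (the integers 1 to 100), the sample size, the number of samples and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Means of repeated random samples drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

population = list(range(1, 101))  # hypothetical population with mean 50.5
means = sample_means(population, sample_size=25, n_samples=2000)

# The sample means cluster around the population mean and spread far less
# than the individual population values do.
print(statistics.mean(means), statistics.pstdev(means))
print(statistics.mean(population), statistics.pstdev(population))
```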
Correlation
1. A causal, complementary, parallel, or reciprocal relationship, especially a structural,
functional, or qualitative correspondence between two comparable entities: a correlation
between drug abuse and crime.
2. Statistics. The simultaneous change in value of two numerically valued random
variables: the positive correlation between cigarette smoking and the incidence of lung
cancer; the negative correlation between age and normal vision.
3. An act of correlating or the condition of being correlated [3].
Hypotheses Testing
A statistical hypothesis test, or more briefly a hypothesis test, is a procedure for choosing between
alternatives (for or against the hypothesis) in a way that minimizes certain risks
Null hypotheses
It is denoted by Ho and represents the default possibility about the population that you will accept
unless you have convincing evidence to the contrary.
Example:
Ho: μa = μ0
Ha: μa ≠ μ0
[3] http://www.answers.com/topic/correlation
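One Tail Test
In this case the researcher claims that the population mean is greater than (or less than) the hypothesized
value μ0, in one direction only.
Two Tail Test
In this case the researcher claims that the population mean may be different from the hypothesized value μ0
in either direction (greater or lesser).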
Z Score Test
By the central limit theorem, the distribution of the sample mean is approximately normal for large
samples, which makes many statistical analyses possible. Z-tests work better if the sample size is not
too small. A Z score tells the distance of a value from the mean of a data set in units of standard deviation.
A Z-test is a statistical test in which the normal distribution is applied. It is basically used for
problems relating to large samples, when n ≥ 30 (http://www.experiment-resources.com/z-test.html#ixzz0zCnm9iX5)
Types of Z test
1. Z test for a single proportion, to test a hypothesis on a specific value of the proportion, Ho: P = Po.
2. Z test for two different groups of data, e.g. drinking habits of males and females.
3. Z test of a specific value of the population mean. It is used when the sample size is > 30 and the
standard deviation is known.
4. Z test of variance on a specific value of the population variance.
5. Z test of equality of two sets of variables when the sample size is > 30 [4].
Z value calculation
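Formula of Z value:
Z = (x̄ − μ0) / (σ / √n)
where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the population standard deviation
and n is the sample size.
The Z value is used to find the corresponding P value in a table, or it is compared with the critical Z
value. If the P value is less than alpha, we reject the null hypothesis.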
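The calculation can be sketched for a one-sample Z test of a mean with known standard deviation. The formula used is the standard one, Z = (sample mean − hypothesized mean) / (σ / √n); the numbers, the function names and the choice of a two-tailed test are assumptions for illustration. The normal table lookup is replaced by the error function from the standard library.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Distance of the sample mean from mu0 in standard-error units."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(z):
    """Probability of a value at least as extreme as z in either tail."""
    return 2 * (1 - normal_cdf(abs(z)))

# Hypothetical study: 36 observations, sample mean 52, known sigma 6, Ho: mu = 50.
alpha = 0.05
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
p = two_tailed_p(z)
print(z, p, "reject Ho" if p < alpha else "do not reject Ho")
```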
P value
The P value is the probability, calculated under the assumption that the null hypothesis is true, of
obtaining a test statistic at least as extreme as the one observed. A smaller P value gives stronger grounds
for not accepting the null hypothesis. The most common significance level is 0.05 (95% confidence); however
0.1 and 0.01 are also used.
[4] Choudhury, Amit (2009). Z-Test. Retrieved [Date of Retrieval] from Experiment Resources: http://www.experiment-resources.com/z-test.html
Correlation
1. For parametric statistics: Pearson's product-moment correlation
2. For nonparametric statistics: Spearman's rank correlation [5]
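The two coefficients can be compared on the same data. The sketch below computes Pearson's coefficient directly and obtains Spearman's coefficient as Pearson's coefficient of the ranks, with tied values sharing their average rank. The function names and data are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # increases with xs, but not along a straight line
print(pearson_r(xs, ys), spearman_rho(xs, ys))
```

Because ys always increases with xs, the rank correlation is perfect even though the relationship is not linear.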
Regression Analysis
Regression analysis is a process to find the best fit line to explain the relationship between the independent and
dependent variable. It is written as:
Simple regression:
Y=b0+ b1X+є
[5] http://www.answers.com/topic/correlation-coefficient
[6] Online Statistics: An Interactive Multimedia Course of Study
Multiple regression:
Y = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + є
Where:
b0 = intercept on the Y axis
Y = value of the dependent variable
b1…bn = coefficients of the independent variables
X = independent variable
є = noise, or the effect of unknown variables (it may be ignored)
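For the simple regression case, the best-fit line can be found with ordinary least squares. The sketch below is a minimal illustration under that assumption; the function name and the data points are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) of the least-squares line Y = b0 + b1 * X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope: coefficient of the independent variable
    b0 = mean_y - b1 * mean_x  # intercept on the Y axis
    return b0, b1

# Hypothetical points lying exactly on Y = 1 + 2X, so the noise term is zero.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(xs, ys)
print(b0, b1)
```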
Model Summary
Change Statistics
a. Predictors: (Constant), Thermal Conductivity at Dry State (W m^-1 K^-1), Sample Thickness at Dry State (mm), ...

ANOVA(b)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 10854.967 | 3 | 3618.322 | 21.659 | .000(a)
Total | 20210.112 | 59
... State (W.m^-2.s^1/2.K^-1)

Coefficients(a)
Predictor | B | Std. Error | Beta | t | Sig. | Tolerance | VIF
Thermal Resistance at Dry State (K.m^2.W^-1) | 12696.547 | 10202.604 | 1.786 | 1.244 | .219 | .004 | 249.080
Sample Thickness at Dry State (mm) | -334.270 | 199.663 | -1.937 | -1.674 | .100 | .006 | 161.903
Thermal Conductivity (W m^-1 K^-1)
Explanation of model:
Adjusted R square = .512 (51.2%) means that about 51.2% of the variation in the dependent variable is
explained by these independent variables. A significant F change shows that the model is significant.
Standardized coefficients are the coefficients of the independent variables expressed in standard deviation
units. Their significance values describe the significance of these variables in the regression equation.
A value less than 0.05 tells that the variable is significant.
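The adjusted R square and the F value can be cross-checked from the ANOVA figures reported in the output (regression sum of squares 10854.967 with 3 degrees of freedom, total 20210.112 with 59). The sketch below uses the standard formulas; the variable names are chosen for this example.

```python
# Values taken from the ANOVA output above.
ss_regression, df_regression = 10854.967, 3
ss_total, df_total = 20210.112, 59

# The residual part is whatever the regression does not explain.
ss_residual = ss_total - ss_regression
df_residual = df_total - df_regression

r_squared = ss_regression / ss_total
# Adjusted R square penalizes R square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
# F compares the mean square of the regression with the residual mean square.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 3), round(adjusted_r_squared, 3), round(f_statistic, 3))
```

The adjusted R square comes out at about 0.512 and F at about 21.66, in agreement with the reported values.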
Multinomial logistic regression [7]
It is used to:
1. Analyze the relationship between a non-metric dependent variable and metric or dichotomous
independent variables
2. Compare multiple groups through a combination of binary logistic regressions
It is used to predict, for example:
1. Influence of the father's profession and education on occupational preference
2. Effect of food and exercise on a certain disease
3. Selection of brands based on gender and age
[7] Source: www.utexas.edu/.../MultinomialLogisticRegression_BasicRelationships.ppt, SW388R7
Data Analysis & Computers II
Chi-Square Test
The chi-square test is used to find an association between two sets of variables written in the form of a
matrix (two-way table) [8]:
χ² = Σ (O − E)² / E
Where:
χ² = chi-square value
O = observed frequency
E = expected frequency
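The chi-square value for a two-way table can be sketched as follows. Expected frequencies are taken as (row total × column total) / grand total, which is the usual assumption of no association; the function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square value, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency if rows and columns were unrelated.
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Hypothetical 2 x 2 table of counts: two groups (rows) by two outcomes (columns).
table = [[10, 20],
         [20, 10]]
chi2, dof = chi_square(table)
print(chi2, dof)
```

The resulting value is then compared with the critical chi-square value for the given degrees of freedom.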
Example:
Crosstabs
It is a nonparametric test used to measure the association between two categories while
controlling for other categories.
Example:
People with high salaries are more likely to go on vacation than people with low
salaries.
The Pearson chi-square and the likelihood-ratio chi-square are most commonly used as tests of significance.