Contents

1 Statistics
    1.1 Scope
        1.1.1 Mathematical statistics
    1.2 Overview
    1.3 Data collection
        1.3.1 Sampling
        1.3.2 Experimental and observational studies
    1.4 Types of data
    1.5 Terminology and theory of inferential statistics
        1.5.1 Statistics, estimators and pivotal quantities
        1.5.2 Null hypothesis and alternative hypothesis
        1.5.3 Error
        1.5.4 Interval estimation
        1.5.5 Significance
        1.5.6 Examples
    1.6 Misuse
        1.6.1 Misinterpretation: correlation
    1.7 History of statistical science
    1.8 Applications
        1.8.1 Applied statistics, theoretical statistics and mathematical statistics
        1.8.2 Machine learning and data mining
        1.8.3 Statistics in society
        1.8.4 Statistical computing
        1.8.5 Statistics applied to mathematics or the arts
    1.9 Specialized disciplines
    1.11 References
2 Portal:Statistics
3
    3.1 See also
4 Business analytics
    4.1 Examples of application
    4.2 Types of analytics
    4.4 History
    4.5 Challenges
    4.6 Competing on analytics
    4.7 See also
    4.8 References
    4.9 Further reading
5 Descriptive statistics
    5.1.1 Univariate analysis
    5.1.2 Bivariate analysis
    5.2 References
    5.3 External links
6 Quality control
    6.3 See also
    6.4 References
    6.5 Further reading
    6.6 External links
7 Operations research
    7.1 Overview
    7.2 History
        7.2.1 Historical origins
    7.3 Problems addressed
    7.4 Management science
        7.4.1 Related fields
        7.4.2 Applications
    7.6 See also
    7.7 References
    7.8 Further reading
        7.8.2 Classic textbooks
        7.8.3 History
    7.9 External links
8 Machine learning
    8.1 Overview
    8.2 Relation to statistics
    8.3 Theory
    8.4 Approaches
        8.4.4 Deep Learning
        8.4.7 Clustering
        8.4.8 Bayesian networks
        8.4.9 Reinforcement learning
    8.5 Applications
    8.6 Ethics
    8.7 Software
        8.7.3 Proprietary software
    8.8 Journals
    8.9 Conferences
    8.11 References
9 Statistical inference
    9.1 Introduction
        9.2.1 Degree of models/assumptions
        9.2.3 Randomization-based models
        9.3.1 Frequentist inference
        9.3.2 Bayesian inference
        9.3.3 AIC-based inference
    9.4 Inference topics
    9.5 See also
    9.6 Notes
    9.7 References
    9.8 Further reading
    9.9 External links
10
    10.10 References
    10.11 Further reading
    10.12 External links
11 Regression analysis
    11.1 History
        11.4.2 Diagnostics
    11.9 Software
    11.10 See also
    11.11 References
    11.12 Further reading
    11.13 External links
12 Multivariate statistics
    12.3 History
    12.6 References
13 Data collection
    13.1 Importance
    13.2 Types
    13.4 References
14 Time series
    14.3 Analysis
        14.3.1 Motivation
        14.3.6 Classification
        14.3.9 Segmentation
    14.4 Models
        14.4.1 Notation
        14.4.2 Conditions
        14.4.3 Models
        14.4.4 Measures
    14.5 Visualization
Chapter 1
Statistics
Figure: More probability density is found as one gets closer to the expected (mean) value in a normal distribution. Statistics used in standardized testing assessment are shown. The scales include standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines, and percentages in standard nines.
Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.[1] In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model of the process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.[1]

Some popular definitions are:

The Merriam-Webster dictionary defines statistics as "classified facts representing the conditions of a people in a state especially the facts that can be stated in numbers or any other tabular or classified arrangement".[2]
Figure: Scatter plots are used in descriptive statistics to show the observed relationships between different variables.
Statistician Sir Arthur Lyon Bowley defines statistics as "numerical statements of facts in any department of inquiry placed in relation to each other".[3]
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methodologies are used in data analysis: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draws conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).[4] Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
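The distinction can be made concrete with a small numerical sketch (entirely hypothetical data; Python with numpy and scipy is assumed to be available). The descriptive step summarizes the sample itself; the inferential step makes a statement about the wider population:

import numpy as np
from scipy import stats

# Hypothetical sample of 50 measurements drawn from an unknown population.
rng = np.random.default_rng(0)
sample = rng.normal(loc=170.0, scale=8.0, size=50)   # e.g. heights in centimetres

# Descriptive statistics: central tendency and dispersion of the sample.
print("sample mean:", np.mean(sample))
print("sample standard deviation:", np.std(sample, ddof=1))

# Inferential statistics: a 95% confidence interval for the population mean,
# a statement about the population that accounts for sampling variation.
sem = stats.sem(sample)
ci = stats.t.interval(0.95, len(sample) - 1, loc=np.mean(sample), scale=sem)
print("95% confidence interval for the population mean:", ci)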
A standard statistical procedure involves the test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative").[5] Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
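A rough simulation of the Type I side of this framework (illustrative only; a two-sample t-test from scipy stands in for a generic statistical test): when a null hypothesis of "no difference" is true by construction, a test at the 5% level should falsely reject it in roughly 5% of repeated samples.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05                  # significance level of the test
n_sims, n = 10_000, 30
false_positives = 0

for _ in range(n_sims):
    # Both groups come from the SAME distribution, so the null hypothesis
    # of no difference between populations is true by construction.
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # Type I error: null falsely rejected

print("observed Type I error rate:", false_positives / n_sims)  # close to 0.05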
Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems.
Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not
until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues
to be an area of active research, for example on the problem of how to analyze Big data.
1.1 Scope
Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data,[6] or as a branch of mathematics.[7] Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty.[8][9]
1.1.1 Mathematical statistics
1.2 Overview
In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations
can be diverse topics such as all persons living in a country or every atom composing a crystal.
Ideally, statisticians compile data about the entire population (an operation called census). This may be organized
by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data types (like income), while frequency and
percentage are more useful in terms of describing categorical data (like race).
When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that
is representative of the population is determined, data is collected for the sample members in an observational or
experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, the drawing of
the sample has been subject to an element of randomness; hence the established numerical descriptors from the sample are also subject to uncertainty. To draw meaningful conclusions about the entire population, inferential statistics
is needed. It uses patterns in the sample data to draw inferences about the population represented, accounting for
randomness. These inferences may take the form of: answering yes/no questions about the data (hypothesis testing),
estimating numerical characteristics of the data (estimation), describing associations within the data (correlation) and
modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting,
prediction and estimation of unobserved values either in or associated with the population being studied; it can include
extrapolation and interpolation of time series or spatial data, and can also include data mining.
1.3 Data collection

1.3.1 Sampling
When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through statistical models. To use a sample as a guide to an entire population, it is important that it truly represents the overall population. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent to which the sample chosen is actually representative. Statistics offers methods to estimate and correct for any bias within the sample and data collection procedures. There are also methods of experimental design for experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Sampling theory is part of the mathematical discipline of probability theory. Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method. The difference in point of view between classic probability theory and sampling theory is, roughly, that probability theory starts from the given parameters of a total population to deduce probabilities that pertain to samples. Statistical inference, however, moves in the opposite direction, inductively inferring from samples to the parameters of a larger or total population.
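The idea that probability theory describes the behaviour of sample statistics can be sketched by simulation (an entirely hypothetical population; numpy assumed): repeated samples are drawn and the spread of the resulting sample means is compared with what theory predicts.

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical right-skewed population, e.g. 100,000 incomes.
population = rng.gamma(shape=2.0, scale=20_000.0, size=100_000)

n = 50
sample_means = np.array([
    rng.choice(population, size=n, replace=False).mean()   # one survey sample
    for _ in range(5_000)
])

print("population mean:          ", population.mean())
print("mean of sample means:     ", sample_means.mean())          # close to the population mean
print("spread of sample means:   ", sample_means.std(ddof=1))
print("theoretical sigma/sqrt(n):", population.std() / np.sqrt(n))  # approximately the same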
1.3.2 Experimental and observational studies
A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable are observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, like natural experiments and observational studies,[12] for which a statistician would use a modified, more structured estimation method (e.g., difference-in-differences estimation and instrumental variables, among many others) that produces consistent estimators.
Experiments
The basic steps of a statistical experiment are:
1. Planning the research, including nding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment eects, alternative hypotheses, and the estimated
experimental variability. Consideration of the selection of experimental subjects and the ethics of research
is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard
treatment or control, to allow an unbiased estimate of the dierence in treatment eects.
2. Design of experiments, using blocking to reduce the inuence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment eects and experimental error. At
this stage, the experimenters and statisticians write the experimental protocol that will guide the performance
of the experiment and which species the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.
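A minimal sketch of steps 2 and 3 (simulated, hypothetical responses; numpy assumed), in which treatments are randomly assigned and the primary analysis is the difference in group means:

import numpy as np

rng = np.random.default_rng(3)
n_subjects = 40
subjects = np.arange(n_subjects)

# Step 2: randomized assignment of treatments to subjects.
shuffled = rng.permutation(subjects)
treatment, control = shuffled[:20], shuffled[20:]

# Step 3: perform the experiment and record responses (simulated here;
# the treatment is assumed to raise the response by 2.0 on average).
response = rng.normal(10.0, 3.0, size=n_subjects)
response[treatment] += 2.0

# Primary analysis specified in the protocol: difference in group means,
# an unbiased estimate of the treatment effect under randomization.
effect_estimate = response[treatment].mean() - response[control].mean()
print("estimated treatment effect:", effect_estimate)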
Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and blindness. The Hawthorne effect refers to the finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.[13]
Observational study
An example of an observational study is one that explores the association between smoking and lung cancer. This
type of study typically uses a survey to collect observations about the area of interest and then performs statistical
analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through
a case-control study, and then look for the number of cases of lung cancer in each group.
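A common summary for such a case-control study is the odds ratio computed from the 2x2 table of exposure against disease status. A minimal sketch with invented counts (not real data):

import math

# Hypothetical 2x2 table from a case-control study (all counts invented):
#                 cases (lung cancer)   controls (no lung cancer)
# smokers                  120                     80
# non-smokers               30                    170
a, b = 120, 80     # exposed (smokers): cases, controls
c, d = 30, 170     # unexposed (non-smokers): cases, controls

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR), Woolf's method
low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"odds ratio: {odds_ratio:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")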
1.5 Terminology and theory of inferential statistics

1.5.1 Statistics, estimators and pivotal quantities

Consider independent identically distributed (IID) random variables with a given probability distribution: standard statistical inference and estimation theory defines a random sample as the random vector given by the column vector of these IID variables.[19] The population being examined is described by a probability distribution that may have unknown parameters.

A statistic is a random variable that is a function of the random sample, but not a function of unknown parameters. The probability distribution of the statistic, though, may have unknown parameters.

Consider now a function of the unknown parameter: an estimator is a statistic used to estimate such a function. Commonly used estimators include the sample mean, unbiased sample variance and sample covariance.

A random variable that is a function of the random sample and of the unknown parameter, but whose probability distribution does not depend on the unknown parameter, is called a pivotal quantity or pivot. Widely used pivots include the z-score, the chi square statistic and Student's t-value.

Between two estimators of a given parameter, the one with lower mean squared error is said to be more efficient. Furthermore, an estimator is said to be unbiased if its expected value is equal to the true value of the unknown parameter being estimated, and asymptotically unbiased if its expected value converges in the limit to the true value of such a parameter.

Other desirable properties for estimators include: UMVUE estimators, which have the lowest variance for all possible values of the parameter to be estimated (this is usually an easier property to verify than efficiency), and consistent estimators, which converge in probability to the true value of the parameter.

This still leaves the question of how to obtain estimators in a given situation and carry out the computation. Several methods have been proposed: the method of moments, the maximum likelihood method, the least squares method and the more recent method of estimating equations.
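These properties can be checked empirically. The sketch below (simulated samples; numpy assumed) compares the biased variance estimator (dividing by n) with the unbiased sample variance (dividing by n - 1), estimating the bias and mean squared error of each:

import numpy as np

rng = np.random.default_rng(4)
true_var = 4.0                  # known here only because the data are simulated
n, n_sims = 10, 20_000

biased, unbiased = [], []
for _ in range(n_sims):
    x = rng.normal(0.0, np.sqrt(true_var), n)   # one random sample
    biased.append(np.var(x, ddof=0))            # divides by n   (biased estimator)
    unbiased.append(np.var(x, ddof=1))          # divides by n-1 (unbiased sample variance)

for name, est in (("ddof=0 (biased)", np.array(biased)),
                  ("ddof=1 (unbiased)", np.array(unbiased))):
    bias = est.mean() - true_var
    mse = np.mean((est - true_var) ** 2)        # mean squared error
    print(f"{name}: mean={est.mean():.3f}  bias={bias:+.3f}  MSE={mse:.3f}")

Note that the unbiased estimator is not automatically the one with the lower mean squared error; unbiasedness and efficiency are separate properties.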
1.5.2 Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis, which is usually (but not necessarily) that no relationship exists among variables or that no change occurred over time.[20][21]

The best illustration for a novice is the predicament encountered by a criminal trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of the guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot prove a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.
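The power of a test (its probability of rejecting a false null hypothesis, i.e. of avoiding a Type II error) can be estimated by simulation. A rough sketch under an assumed effect size, sample size and significance level (numpy/scipy assumed):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha, n, effect = 0.05, 30, 0.5   # assumed significance level, group size, true effect
n_sims = 5_000

rejections = 0
for _ in range(n_sims):
    # The alternative hypothesis is true here: the groups really differ by `effect`.
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

power = rejections / n_sims        # power = 1 - Type II error rate
print("estimated power:", power)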
1.5.3 Error
Working from a null hypothesis, two basic forms of error are recognized:

Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative".

Standard deviation refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean, while standard error refers to an estimate of the difference between the sample mean and the population mean.

A statistical error is the amount by which an observation differs from its expected value; a residual is the amount an observation differs from the value the estimator of the expected value assumes on a given sample (also called prediction).

Mean squared error is used for obtaining efficient estimators, a widely used class of estimators. Root mean square error is simply the square root of mean squared error.
Figure: A least squares fit: in red the points to be fitted, in blue the fitted line.

Many statistical methods seek to minimize the residual sum of squares, and these are called "methods of least squares" in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. Residual sum of squares is also differentiable, which provides a handy property for doing
regression. Least squares applied to linear regression is called the ordinary least squares method, and least squares applied to nonlinear regression is called non-linear least squares. Also, in a linear regression model the non-deterministic part of the model is called the error term, disturbance, or more simply noise. Both linear regression and non-linear regression are addressed in polynomial least squares, which also describes the variance in a prediction of the dependent variable (y axis) as a function of the independent variable (x axis) and the deviations (errors, noise, disturbances) from the estimated (fitted) curve.
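A minimal ordinary least squares sketch (simulated data; numpy assumed): a straight line is chosen so that the residual sum of squares is minimized.

import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 1.5 * x + rng.normal(0.0, 2.0, x.size)   # linear signal plus a noise (error) term

# Ordinary least squares: intercept and slope minimizing the residual sum of squares.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print("residual sum of squares:", np.sum(residuals ** 2))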
Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems.[22]
1.5.4 Interval estimation
Figure: Confidence intervals: the red line is the true value for the mean in this example; the blue lines are random confidence intervals for 100 realizations.
A confidence interval is constructed so that, under repeated sampling, the stated proportion of such intervals (for example 95%) would contain the true population value; this does not mean that the probability that the true value lies within one particular computed interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable. Either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by "probability", that is, as a Bayesian probability.

In principle confidence intervals can be symmetrical or asymmetrical. An interval can be asymmetrical because it works as a lower or upper bound for a parameter (left-sided interval or right-sided interval), but it can also be asymmetrical because the two-sided interval is built violating symmetry around the estimate. Sometimes the bounds for a confidence interval are reached asymptotically and these are used to approximate the true bounds.
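The frequentist reading described above can be checked by simulation (a sketch; numpy and scipy assumed): many 95% confidence intervals are constructed from independent samples, and the fraction that cover the fixed true mean is counted.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_mean, sigma, n = 5.0, 2.0, 25
n_intervals, covered = 1_000, 0

for _ in range(n_intervals):
    sample = rng.normal(true_mean, sigma, n)
    sem = stats.sem(sample)
    low, high = stats.t.interval(0.95, n - 1, loc=sample.mean(), scale=sem)
    if low <= true_mean <= high:
        covered += 1      # this realization's interval covers the true value

print("coverage:", covered / n_intervals)   # close to 0.95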
1.5.5 Significance
Figure: Illustration of a p-value: the probability density of possible results under the hypothesis, with the observed data point marked and the very unlikely observations in the tails shaded. An accompanying note warns that Pr(observation | hypothesis) ≠ Pr(hypothesis | observation): the probability of observing a result given that some hypothesis is true is not equivalent to the probability that the hypothesis is true given that the result has been observed, so using the p-value as such a score commits the transposed conditional fallacy.
One common response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.
Fallacy of the transposed conditional, also known as the prosecutor's fallacy: criticisms arise because the hypothesis-testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not the probability of the null hypothesis given the observed result (a small simulation of this point follows this list). An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.[23]
Rejecting the null hypothesis does not automatically prove the alternative hypothesis.
As everything in inferential statistics, it relies on sample size, and therefore under fat tails p-values may be seriously miscomputed.
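The transposed conditional point can be illustrated by a simulation (assumed proportions and effect size; numpy/scipy assumed): even with a 5% significance level, the fraction of rejections in which the null hypothesis is actually true depends on how often tested nulls are true and on the power of the test, and is generally not 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
alpha, n, effect = 0.05, 30, 0.5
prob_null_true = 0.8          # assumption: 80% of tested hypotheses have no real effect
n_tests = 20_000

rejected = 0
rejected_with_true_null = 0
for _ in range(n_tests):
    null_is_true = rng.random() < prob_null_true
    shift = 0.0 if null_is_true else effect
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejected += 1
        rejected_with_true_null += null_is_true

# Pr(reject | null true) is alpha = 0.05 by design, but Pr(null true | rejected)
# is typically much larger; the two conditional probabilities are not the same.
print("Pr(null true | rejected) ~", rejected_with_true_null / rejected)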
1.5.6 Examples
1.6 Misuse
Main article: Misuse of statistics
Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data, which measures the extent to which a trend could be caused by random variation in the sample, may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.
There is a general perception that statistical knowledge is all-too-frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter.[24] A mistrust and misunderstanding of statistics is associated with the quotation, "There are three kinds of lies: lies, damned lies, and statistics". Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics[24] outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).[25]
Ways to avoid misuse of statistics include using proper diagrams and avoiding bias.[26] Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias.[27] Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs.[26] Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[27] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole.[28] According to Huff, "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."[29]

To assist in the understanding of statistics, Huff proposed a series of questions to be asked in each case:[30]
Who says so? (Does he/she have an axe to grind?)
How does he/she know? (Does he/she have the resources to know the facts?)
What's missing? (Does he/she give us a complete picture?)
Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)
Figure: The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.
1.6.1 Misinterpretation: correlation
The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply causation.)
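A small simulation of the confounding pattern described above (hypothetical variables; numpy assumed): X and Y are both driven by a third variable Z, so they correlate strongly even though neither causes the other, and the association largely disappears once Z is controlled for.

import numpy as np

rng = np.random.default_rng(9)
n = 10_000

z = rng.normal(0.0, 1.0, n)              # confounding variable
x = 2.0 * z + rng.normal(0.0, 1.0, n)    # X depends on Z, not on Y
y = -1.5 * z + rng.normal(0.0, 1.0, n)   # Y depends on Z, not on X

print("corr(X, Y):", np.corrcoef(x, y)[0, 1])   # strong, yet there is no causal link

# Control for Z: regress X and Y on Z and correlate the residuals.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
print("corr(X, Y | Z):", np.corrcoef(x_resid, y_resid)[0, 1])   # near zero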
1.7 History of statistical science

The mathematical foundations of statistics were laid in the 17th century with the development of probability theory by Gerolamo Cardano, Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel.[32] The method of least squares was first described by Adrien-Marie Legendre in 1805.

The modern field of statistics emerged in the late 19th and early 20th century in three stages.[33] The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well.
Galton's contributions included introducing the concepts of standard deviation, correlation, regression analysis and the application of these methods to the study of the variety of human characteristics: height, weight, eyelash length among others.[34] Pearson developed the Pearson product-moment correlation coefficient, defined as a product-moment,[35] the method of moments for the fitting of distributions to samples and the Pearson distribution, among many other things.[36] Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biostatistics (then called biometry), and the latter founded the world's first university statistics department at University College London.[37]
Ronald Fisher coined the term "null hypothesis" during the Lady tasting tea experiment; a null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation.[38][39]
The second wave of the 1910s and 20s was initiated by William Gosset, and reached its culmination in the insights of Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher's most important publications were his 1918 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance, which was the first to use the statistical term variance, his classic 1925 work Statistical Methods for Research Workers and his 1935 The Design of Experiments,[40][41][42][43] where he developed rigorous design of experiments models. He originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminator and Fisher information.[44] In his 1930 book The Genetical Theory of Natural Selection he applied statistics to various biological concepts, such as Fisher's principle (about the sex ratio), of which A. W. F. Edwards has remarked that it is "probably the most celebrated argument in evolutionary biology",[45] and the Fisherian runaway,[46][47][48][49][50][51] a concept in sexual selection about a positive feedback runaway effect found in evolution.
The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.[52]
Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations, and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.
1.8 Applications
1.8.1 Applied statistics, theoretical statistics and mathematical statistics
Applied statistics comprises descriptive statistics and the application of inferential statistics.[54][55] Theoretical statistics concerns both the logical arguments underlying the justification of approaches to statistical inference, as well as encompassing mathematical statistics. Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.
1.8.2 Machine learning and data mining
There are two applications for machine learning and data mining: data management and data analysis. Statistical tools are necessary for the data analysis.
1.8.3 Statistics in society
Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government,
and business. Statistical consultants can help organizations and companies that don't have in-house expertise relevant
to their particular questions.
1.8.4 Statistical computing
1.8.5 Statistics applied to mathematics or the arts
Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was "required learning" in most sciences. This has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.
In number theory, scatter plots of data generated by a distribution function may be transformed with familiar
tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
Methods of statistics including predictive methods in forecasting are combined with chaos theory and fractal
geometry to create video works that are considered to have great beauty.
The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature
were artistically revealed. With the advent of computers, statistical methods were applied to formalize such
distribution-driven natural processes to make and analyze moving video art.
Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.
1.9 Specialized disciplines

Statistical techniques are used in a wide range of types of scientific and social research, including:

Chemometrics (for analysis of data from chemistry)
Data mining (applying statistics and pattern recognition to discover knowledge from data)
Data science
Demography
Econometrics (statistical analysis of economic data)
Energy statistics
Engineering statistics
Epidemiology (statistical analysis of disease)
Geography and Geographic Information Systems, specifically in Spatial analysis
Image processing
Medical Statistics
Psychological statistics
Reliability engineering
Social statistics
Statistical Mechanics
In addition, there are particular types of statistical analysis that have also developed their own specialised terminology
and methodology:
Bootstrap / Jackknife resampling
Multivariate statistics
Statistical classification
Structured data analysis (statistics)
Structural equation modelling
Survey methodology
Survival analysis
Statistics in various sports, particularly baseball - known as Sabermetrics - and cricket
Statistics forms a key basis tool in business and manufacturing as well. It is used to understand measurement systems variability, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.
1.11 References
[1] Dodge, Y. (2006) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
[2] "Definition of STATISTICS". www.merriam-webster.com. Retrieved 2016-05-28.
[3] "Essay on Statistics: Meaning and Definition of Statistics". Economics Discussion. 2014-12-02. Retrieved 2016-05-28.
[4] Lund Research Ltd. "Descriptive and Inferential Statistics". statistics.laerd.com. Retrieved 2014-03-23.
[5] "What Is the Difference Between Type I and Type II Hypothesis Testing Errors?". About.com Education. Retrieved 2015-11-27.
[6] Moses, Lincoln E. (1986) Think and Explain with Statistics, Addison-Wesley, ISBN 978-0-201-15619-5. pp. 1–3
[7] Hays, William Lee (1973) Statistics for the Social Sciences, Holt, Rinehart and Winston, p. xii, ISBN 978-0-03-077945-9
[8] Moore, David (1992). "Teaching Statistics as a Respectable Subject". In F. Gordon and S. Gordon. Statistics for the Twenty-First Century. Washington, DC: The Mathematical Association of America. pp. 14–25. ISBN 978-0-88385-078-7.
[9] Chance, Beth L.; Rossman, Allan J. (2005). Preface. Investigating Statistical Concepts, Applications, and Methods (PDF).
Duxbury Press. ISBN 978-0-495-05064-3.
[10] Kannan, D.; Lakshmikantham, V., eds. (2002). Handbook of Stochastic Analysis and Applications. New York: M. Dekker. ISBN 0824706609.
[11] Schervish, Mark J. (1995). Theory of statistics (Corr. 2nd print. ed.). New York: Springer. ISBN 0387945466.
[12] Freedman, D.A. (2005) Statistical Models: Theory and Practice, Cambridge University Press. ISBN 978-0-521-67105-7
[13] McCarney R, Warner J, Iliffe S, van Haselen R, Griffin M, Fisher P (2007). "The Hawthorne Effect: a randomised, controlled trial". BMC Med Res Methodol. 7 (1): 30. doi:10.1186/1471-2288-7-30. PMC 1936999. PMID 17608932.
[14] Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Boston: Addison-Wesley.
[15] Nelder, J. A. (1990). The knowledge needed to computerise the analysis and interpretation of statistical information. In Expert systems and artificial intelligence: the need for information about data. Library Association Report, London, March, 23–27.
[16] Chrisman, Nicholas R (1998). "Rethinking Levels of Measurement for Cartography". Cartography and Geographic Information Science. 25 (4): 231–242. doi:10.1559/152304098782383043.
[17] van den Berg, G. (1991). Choosing an analysis method. Leiden: DSWO Press
[18] Hand, D. J. (2004). Measurement theory and practice: The world through quantication. London, UK: Arnold.
[19] Piazza, Elio, Probabilità e Statistica, Esculapio 2007
[20] Everitt, Brian (1998). The Cambridge Dictionary of Statistics. Cambridge, UK New York: Cambridge University Press.
ISBN 0521593468.
[21] http://www.yourstatsguru.com/epar/rp-reviewed/cohen1994/
[22] Rubin, Donald B.; Little, Roderick J. A.,Statistical analysis with missing data, New York: Wiley 2002
[23] Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine. 2 (8): e124. doi:10.1371/journal.pmed.0020124
PMC 1182327 . PMID 16060722.
[24] Huff, Darrell (1954) How to Lie with Statistics, WW Norton & Company, Inc. New York, NY. ISBN 0-393-31072-8
[25] Warne, R.; Lazo; Ramos, T.; Ritter, N. (2012). "Statistical Methods Used in Gifted Education Journals, 2006–2010". Gifted Child Quarterly. 56 (3): 134–149. doi:10.1177/0016986212444122.
[26] Drennan, Robert D. (2008). "Statistics in archaeology". In Pearsall, Deborah M. Encyclopedia of Archaeology. Elsevier Inc. pp. 2093–2100. ISBN 978-0-12-373962-9.
[27] Cohen, Jerome B. (December 1938). "Misuse of Statistics". Journal of the American Statistical Association. JSTOR. 33 (204): 657–674. doi:10.1080/01621459.1938.10502344.
[28] Freund, J. E. (1988). Modern Elementary Statistics. Credo Reference.
[29] Huff, Darrell; Irving Geis (1954). How to Lie with Statistics. New York: Norton. "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."
[30] Huff, Darrell; Irving Geis (1954). How to Lie with Statistics. New York: Norton.
[31] Willcox, Walter (1938) "The Founder of Statistics". Review of the International Statistical Institute 5 (4): 321–328. JSTOR 1400906
[32] J. Franklin, The Science of Conjecture: Evidence and Probability before Pascal,Johns Hopkins Univ Pr 2002
[33] Helen Mary Walker (1975). Studies in the history of statistical method. Arno Press.
[34] Galton, F (1877). "Typical laws of heredity". Nature. 15: 492–553. doi:10.1038/015492a0.
[35] Stigler, S. M. (1989). "Francis Galton's Account of the Invention of Correlation". Statistical Science. 4 (2): 73–79. doi:10.1214/ss/1177012580.
[36] Pearson, K. (1900). "On the Criterion that a given System of Deviations from the Probable in the Case of a Correlated System of Variables is such that it can be reasonably supposed to have arisen from Random Sampling". Philosophical Magazine Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
[37] "Karl Pearson (1857–1936)". Department of Statistical Science, University College London.
[38] Fisher (1971), Chapter II: "The Principles of Experimentation, Illustrated by a Psycho-physical Experiment", Section 8: "The Null Hypothesis".
[39] OED quote: 1935 R. A. Fisher, The Design of Experiments ii. 19, We may speak of this hypothesis as the 'null hypothesis,
and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of
experimentation.
[40] Stanley, J. C. (1966). "The Influence of Fisher's 'The Design of Experiments' on Educational Research Thirty Years Later". American Educational Research Journal. 3 (3): 223. doi:10.3102/00028312003003223.
[41] Box, JF (February 1980). "R. A. Fisher and the Design of Experiments, 1922–1926". The American Statistician. 34 (1): 1–7. doi:10.2307/2682986. JSTOR 2682986.
[42] Yates, F (June 1964). "Sir Ronald Fisher and the Design of Experiments". Biometrics. 20 (2): 307–321. doi:10.2307/2528399. JSTOR 2528399.
[43] Stanley, Julian C. (1966). "The Influence of Fisher's 'The Design of Experiments' on Educational Research Thirty Years Later". American Educational Research Journal. 3 (3): 223–229. doi:10.3102/00028312003003223. JSTOR 1161806.
[44] Agresti, Alan; David B. Hitchcock (2005). "Bayesian Inference for Categorical Data Analysis" (PDF). Statistical Methods & Applications. 14 (14): 298. doi:10.1007/s10260-005-0121-y.
[45] Edwards, A.W.F. (1998). "Natural Selection and the Sex Ratio: Fisher's Sources". American Naturalist. 151 (6): 564–569. doi:10.1086/286141. PMID 18811377.
[46] Fisher, R.A. (1915) The evolution of sexual preference. Eugenics Review (7) 184:192
[47] Fisher, R.A. (1930) The Genetical Theory of Natural Selection. ISBN 0-19-850440-3
[48] Edwards, A.W.F. (2000) Perspectives: Anecdotal, Historical and Critical Commentaries on Genetics. The Genetics Society of America (154) 1419:1426
[49] Andersson, M. (1994) Sexual selection. ISBN 0-691-00057-3
[50] Andersson, M. and Simmons, L.W. (2006) Sexual selection and mate choice. Trends, Ecology and Evolution (21) 296:302
[51] Gayon, J. (2010) Sexual selection: Another Darwinian process. Comptes Rendus Biologies (333) 134:144
[52] Neyman, J (1934). "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection". Journal of the Royal Statistical Society. 97 (4): 557–625. JSTOR 2342192.
[53] Science in a Complex World - Big Data: Opportunity or Threat?". Santa Fe Institute.
[54] Nikoletseas, M. M. (2014) Statistics: Concepts and Examples. ISBN 978-1500815684
[55] Anderson, D.R.; Sweeney, D.J.; Williams, T.A. (1994) Introduction to Statistics: Concepts and Applications, pp. 59. West
Group. ISBN 978-0-314-03309-3
Chapter 2
Portal:Statistics
Chapter 3
Chapter 4
Business analytics
Not to be confused with Business analysis.
Business analytics (BA) refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.[1] Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.[citation needed]
Business analytics makes extensive use of statistical analysis, including explanatory and predictive modeling,[2] and
fact-based management to drive decision making. It is therefore closely related to management science. Analytics
may be used as input for human decisions or may drive fully automated decisions. Business intelligence is querying,
reporting, online analytical processing (OLAP), and alerts.
In other words, querying, reporting, OLAP, and alert tools can answer questions such as what happened, how many,
how often, where the problem is, and what actions are needed. Business analytics can answer questions like why is
this happening, what if these trends continue, what will happen next (that is, predict), what is the best that can happen
(that is, optimize).[3]
4.4 History
Analytics have been used in business since the management exercises that were put into place by Frederick Winslow Taylor in the late 19th century. Henry Ford measured the time of each component in his newly established assembly line. But analytics began to command more attention in the late 1960s when computers were used in decision support systems. Since then, analytics have changed and formed with the development of enterprise resource planning (ERP) systems, data warehouses, and a large number of other software tools and processes.[3]

In later years business analytics exploded with the introduction of computers. This change brought analytics to a whole new level and made the possibilities endless. Considering how far analytics has come in history, and what the field of analytics is today, many people would never think that analytics started in the early 1900s with Ford himself.
4.5 Challenges
Business analytics depends on sucient volumes of high quality data. The diculty in ensuring data quality is integrating and reconciling data across dierent systems, and then deciding what subsets of data to make available.[3]
Previously, analytics was considered a type of after-the-fact method of forecasting consumer behavior by examining
the number of units sold in the last quarter or the last year. This type of data warehousing required a lot more storage
space than it did speed. Now business analytics is becoming a tool that can inuence the outcome of customer
interactions.[5] When a specic customer type is considering a purchase, an analytics-enabled enterprise can modify
the sales pitch to appeal to that consumer. This means the storage space for all that data must react extremely fast to
provide the necessary data in real-time.
4.8 References
[1] Beller, Michael J.; Alan Barnett (2009-06-18). Next Generation Business Analytics. Lightship Partners LLC. Retrieved
2009-06-20.
[2] Galit Schmueli and Otto Koppius. Predictive vs. Explanatory Modeling in IS Research (PDF).
[3] Davenport, Thomas H.; Harris, Jeanne G. (2007). Competing on analytics : the new science of winning. Boston, Mass.:
Harvard Business School Press. ISBN 978-1-4221-0332-6.
[4] Analytics List. Retrieved 3 April 2015.
[5] Choosing the Best Storage for Business Analytics. Dell.com. Retrieved 2012-06-25.
Chapter 5
Descriptive statistics
Descriptive statistics are statistics that quantitatively describe or summarise features of a collection of information.[1]
Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive
statistics aim to summarize a sample, rather than use the data to learn about the population that the sample
of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are
not developed on the basis of probability theory.[2] Even when a data analysis draws its main conclusions using
inferential statistics, descriptive statistics are generally also presented. For example, in papers reporting on human subjects, a table is typically included giving the overall sample size, sample sizes in important subgroups
(e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average
age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
Some measures that are commonly used to describe a data set are measures of central tendency and measures of
variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of
variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis
and skewness.[3]
5.1.1 Univariate analysis
Univariate analysis involves describing the distribution of a single variable, including its central tendency (including
the mean, median, and mode) and dispersion (including the range and quantiles of the data set, and measures of
spread such as the variance and standard deviation). The shape of the distribution may also be described via indices
such as skewness and kurtosis. Characteristics of a variable's distribution may also be depicted in graphical or tabular
format, including histograms and stem-and-leaf displays.
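As a minimal illustration of these univariate measures, the following Python sketch computes them with the standard-library statistics module; the sample values are made up purely for illustration.

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)          # central tendency: arithmetic mean
median = statistics.median(data)      # central tendency: middle value
mode = statistics.mode(data)          # central tendency: most frequent value
variance = statistics.variance(data)  # dispersion: sample variance
std_dev = statistics.stdev(data)      # dispersion: sample standard deviation
data_range = max(data) - min(data)    # dispersion: range

print(mean, median, mode, variance, std_dev, data_range)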
5.1.2 Bivariate analysis
When a sample consists of more than one variable, descriptive statistics may be used to describe the relationship
between pairs of variables. In this case, descriptive statistics include:
Cross-tabulations and contingency tables
Graphical representation via scatterplots
Quantitative measures of dependence
Descriptions of conditional distributions
The main reason for differentiating univariate and bivariate analysis is that bivariate analysis is not only a simple descriptive analysis, but also describes the relationship between two different variables.[5] Quantitative measures of
dependence include correlation (such as Pearson's r when both variables are continuous, or Spearman's rho if one or
both are not) and covariance (which reflects the scale the variables are measured on). The slope, in regression analysis,
also reflects the relationship between variables. The unstandardised slope indicates the unit change in the criterion
variable for a one-unit change in the predictor. The standardised slope indicates this change in standardised (z-score) units. Highly skewed data are often transformed by taking logarithms. Use of logarithms makes graphs more
symmetrical and look more similar to the normal distribution, making them easier to interpret intuitively.[6]:47
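To make these bivariate measures concrete, here is a small Python sketch (standard library only; the paired values are invented for illustration) that computes the sample covariance, Pearson's r, and the unstandardised regression slope.

import math

# Hypothetical paired observations (predictor x, criterion y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.7, 5.2, 6.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: reflects the scale on which the variables are measured
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Pearson's r: covariance rescaled to lie between -1 and +1
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
r = cov_xy / (sd_x * sd_y)

# Unstandardised slope: unit change in y per one-unit change in x
slope = cov_xy / (sd_x ** 2)

print(cov_xy, r, slope)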
5.2 References
[1] Mann, Prem S. (1995). Introductory Statistics (2nd ed.). Wiley. ISBN 0-471-31009-3.
[2] Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-850994-4.
[3] Investopedia, Descriptive Statistics Terms
[4] Trochim, William M. K. (2006). Descriptive statistics. Research Methods Knowledge Base. Retrieved 14 March 2011.
[5] Babbie, Earl R. (2009). The Practice of Social Research (12th ed.). Wadsworth. pp. 436–440. ISBN 0-495-59841-0.
[6] Nick, Todd G. (2007). Descriptive Statistics. Topics in Biostatistics. Methods in Molecular Biology. 404. New York:
Springer. pp. 33–52. doi:10.1007/978-1-59745-530-5_3. ISBN 978-1-58829-531-6.
Chapter 6
Quality control
This article is about the project management process. For other uses, see Quality control (disambiguation).
Quality control, or QC for short, is a process by which entities review the quality of all factors involved in production.
Quality inspector in a Volkseigener Betrieb sewing machine parts factory in Dresden, East Germany, 1977.
ISO 9000 defines quality control as "A part of quality management focused on fulfilling quality requirements".[1]
This approach places an emphasis on three aspects:
1. Elements such as controls, job management, defined and well-managed processes,[2][3] performance and integrity criteria, and identification of records
2. Competence, such as knowledge, skills, experience, and qualifications
3. Soft elements, such as personnel, integrity, confidence, organizational culture, motivation, team spirit, and
quality relationships.
Controls include product inspection, where every product is examined visually, often using a stereo microscope
for fine detail, before the product is sold into the external market. Inspectors are provided with lists and descriptions
of unacceptable product defects, such as cracks or surface blemishes.
The quality of the outputs is at risk if any of these three aspects is deficient in any way.
Quality control emphasizes testing of products to uncover defects and reporting to management, who make the decision to allow or deny product release, whereas quality assurance attempts to improve and stabilize production (and
associated processes) to avoid, or at least minimize, issues which led to the defect(s) in the first place. For contract
work, particularly work awarded by government agencies, quality control issues are among the top reasons for not
renewing a contract.[4]
6.4 References
This article incorporates public domain material from the General Services Administration document Federal
Standard 1037C (in support of MIL-STD-188).
[1] ISO 9000:2005, Clause 3.2.10
[2] Dennis Adsit (November 9, 2007). What the Call Center Industry Can Learn from Manufacturing: Part I (PDF). National
Association of Call Centers. Retrieved 21 December 2012.
[3] Dennis Adsit (November 23, 2007). What the Call Center Industry Can Learn from Manufacturing: Part II (PDF).
National Association of Call Centers. Retrieved 21 December 2012.
[4] Position Classification Standard for Quality Assurance Series, GS-1910 (PDF). US Office of Personnel Management.
March 1983. Retrieved 21 December 2012.
[5] Juran, Joseph M., ed. (1995), A History of Managing for Quality: The Evolution, Trends, and Future Directions of Managing
for Quality, Milwaukee, Wisconsin: The American Society for Quality Control, ISBN 9780873893411, OCLC 32394752
[6] Feigenbaum, Armand V. (1956). Total Quality Control. Harvard Business Review. Cambridge, Massachusetts: Harvard
University Press. 34 (6): 93–101. ISSN 0017-8012. OCLC 1751795.
[7] Ishikawa, Kaoru (1985), What Is Total Quality Control? The Japanese Way (1 ed.), Englewood Cliffs, New Jersey: Prentice-Hall, pp. 90–91, ISBN 978-0-13-952433-2, OCLC 11467749
[8] Evans, James R.; Lindsay, William M. (1999), The Management and Control of Quality (4 ed.), Cincinnati, Ohio: South-Western College Publications, p. 118, ISBN 9780538882422, OCLC 38475486, "The term total quality management,
or TQM, has been commonly used to denote the system of managing for total quality. (The term TQM was actually
developed within the Department of Defense. It has since been renamed Total Quality Leadership, since leadership outranks
management in military thought.)"
[9] "What Is Six Sigma?" (PDF). www.motorolasolutions.com. Schaumburg, Illinois: Motorola University. 2010-02-19. p. 2. Retrieved 2013-11-24. "When practiced as a management system, Six Sigma is a high performance system for
executing business strategy."
[10] Phillips, Joseph (November 2008). Quality Control in Project Management. The Project Management Hut. Retrieved
21 December 2012.
Chapter 7
Operations research
For the academic journal, see Operations Research.
Operations research, or operational research in British usage, is a discipline that deals with the application of
advanced analytical methods to help make better decisions.[1] Further, the term 'operational analysis' is used in the
British (and some British Commonwealth) military as an intrinsic part of capability development, management and
assurance. In particular, operational analysis forms part of the Combined Operational Effectiveness and Investment
Appraisals (COEIA), which support British defence capability acquisition decision-making.
It is often considered to be a sub-field of mathematics.[2] The terms management science and decision science are
sometimes used as synonyms.[3]
Employing techniques from other mathematical sciences, such as mathematical modeling, statistical analysis, and
mathematical optimization, operations research arrives at optimal or near-optimal solutions to complex decision-making problems. Because of its emphasis on human-technology interaction and because of its focus on practical
applications, operations research overlaps with other disciplines, notably industrial engineering and operations
management, and draws on psychology and organization science. Operations research is often concerned with determining the maximum (of profit, performance, or yield) or minimum (of loss, risk, or cost) of some real-world
objective. Originating in military efforts before World War II, its techniques have grown to concern problems in a
variety of industries.[4]
7.1 Overview
Operational research (OR) encompasses a wide range of problem-solving techniques and methods applied in the
pursuit of improved decision-making and efficiency, such as simulation, mathematical optimization, queueing theory
and other stochastic-process models, Markov decision processes, econometric methods, data envelopment analysis,
neural networks, expert systems, decision analysis, and the analytic hierarchy process.[5] Nearly all of these techniques
involve the construction of mathematical models that attempt to describe the system. Because of the computational
and statistical nature of most of these fields, OR also has strong ties to computer science and analytics. Operational
researchers faced with a new problem must determine which of these techniques are most appropriate given the nature
of the system, the goals for improvement, and constraints on time and computing power.
The major subdisciplines in modern operational research, as identified by the journal Operations Research,[6] are:
Computing and information technologies
Financial engineering
Manufacturing, service sciences, and supply chain management
Policy modeling and public sector work
Revenue management
Simulation
Stochastic models
Transportation
7.2 History
As a discipline, operational research originated in the efforts of military planners during World War I (convoy theory
and Lanchester's laws). In the decades after the two world wars, the techniques were more widely applied to problems in business, industry and society. Since that time, operational research has expanded into a field widely used
in industries ranging from petrochemicals to airlines, finance, logistics, and government, moving to a focus on the
development of mathematical models that can be used to analyse and optimize complex systems, and has become an
area of active academic and industrial research.[4]
7.2.1 Historical origins
Early work in operational research was carried out by individuals such as Charles Babbage. His research into the cost
of transportation and sorting of mail led to England's universal Penny Post in 1840, and studies into the dynamical
behaviour of railway vehicles in defence of the GWR's broad gauge.[7] Percy Bridgman brought operational research
to bear on problems in physics in the 1920s and would later attempt to extend these to the social sciences.[8]
Modern operational research originated at the Bawdsey Research Station in the UK in 1937 and was the result of an
initiative of the station's superintendent, A. P. Rowe. Rowe conceived the idea as a means to analyse and improve
the working of the UK's early warning radar system, Chain Home (CH). Initially, he analysed the operating of the
radar equipment and its communication networks, expanding later to include the operating personnel's behaviour.
This revealed unappreciated limitations of the CH network and allowed remedial action to be taken.[9]
Scientists in the United Kingdom including Patrick Blackett (later Lord Blackett OM PRS), Cecil Gordon, Solly
Zuckerman, (later Baron Zuckerman OM, KCB, FRS), C. H. Waddington, Owen Wansbrough-Jones, Frank Yates,
Jacob Bronowski and Freeman Dyson, and in the United States with George Dantzig looked for ways to make better
decisions in such areas as logistics and training schedules.
7.2.2 Second World War
The modern field of operational research arose during World War II. In the World War II era, operational research was
defined as "a scientific method of providing executive departments with a quantitative basis for decisions regarding
the operations under their control".[10] Other names for it included operational analysis (UK Ministry of Defence
from 1962)[11] and quantitative management.[12]
During the Second World War close to 1,000 men and women in Britain were engaged in operational research. About
200 operational research scientists worked for the British Army.[13]
Patrick Blackett worked for several different organizations during the war. Early in the war while working for the
Royal Aircraft Establishment (RAE) he set up a team known as the "Circus" which helped to reduce the number of
anti-aircraft artillery rounds needed to shoot down an enemy aircraft from an average of over 20,000 at the start of
the Battle of Britain to 4,000 in 1941.[14]
In 1941 Blackett moved from the RAE to the Navy, after first working with RAF Coastal Command in 1941 and then
early in 1942 to the Admiralty.[15] Blackett's team at Coastal Command's Operational Research Section (CC-ORS)
included two future Nobel prize winners and many other people who went on to be pre-eminent in their fields.[16]
They undertook a number of crucial analyses that aided the war effort. Britain introduced the convoy system to reduce
shipping losses, but while the principle of using warships to accompany merchant ships was generally accepted, it was
unclear whether it was better for convoys to be small or large. Convoys travel at the speed of the slowest member,
so small convoys can travel faster. It was also argued that small convoys would be harder for German U-boats to
detect. On the other hand, large convoys could deploy more warships against an attacker. Blackett's staff showed that
the losses suffered by convoys depended largely on the number of escort vessels present, rather than the size of the
convoy. Their conclusion was that a few large convoys are more defensible than many small ones.[17]
A Liberator in standard RAF green/dark earth/black night bomber finish, as originally used by Coastal Command.
While performing an analysis of the methods used by RAF Coastal Command to hunt and destroy submarines, one
of the analysts asked what colour the aircraft were. As most of them were from Bomber Command they were painted
black for night-time operations. At the suggestion of CC-ORS a test was run to see if that was the best colour to
camouflage the aircraft for daytime operations in the grey North Atlantic skies. Tests showed that aircraft painted
white were on average not spotted until they were 20% closer than those painted black. This change indicated that
30% more submarines would be attacked and sunk for the same number of sightings.[18] As a result of these findings
Coastal Command changed their aircraft to using white undersurfaces.
Other work by the CC-ORS indicated that on average if the trigger depth of aerial-delivered depth charges (DCs)
were changed from 100 feet to 25 feet, the kill ratios would go up. The reason was that if a U-boat saw an aircraft
only shortly before it arrived over the target then at 100 feet the charges would do no damage (because the U-boat
wouldn't have had time to descend as far as 100 feet), and if it saw the aircraft a long way from the target it had time
to alter course under water, so the chances of it being within the 20-foot kill zone of the charges was small. It was
more efficient to attack those submarines close to the surface, when the targets' locations were better known, than to
attempt their destruction at greater depths when their positions could only be guessed. Before the change of settings
from 100 feet to 25 feet, 1% of submerged U-boats were sunk and 14% damaged. After the change, 7% were sunk
and 11% damaged. (If submarines were caught on the surface, even if attacked shortly after submerging, the numbers
rose to 11% sunk and 15% damaged.) Blackett observed "there can be few cases where such a great operational gain
had been obtained by such a small and simple change of tactics".[19]
Bomber Command's Operational Research Section (BC-ORS) analysed a report of a survey carried out by RAF
Bomber Command. For the survey, Bomber Command inspected all bombers returning from bombing raids over
Germany over a particular period. All damage inflicted by German air defences was noted and the recommendation
was given that armour be added in the most heavily damaged areas. This recommendation was not adopted because
the fact that the aircraft returned with these areas damaged indicated these areas were not vital, and adding armour
to non-vital areas where damage is acceptable negatively affects aircraft performance. Their suggestion to remove
some of the crew so that an aircraft loss would result in fewer personnel losses was also rejected by RAF command.
Blackett's team made the logical recommendation that the armour be placed in the areas which were completely
untouched by damage in the bombers which returned. They reasoned that the survey was biased, since it only included
aircraft that returned to Britain. The untouched areas of returning aircraft were probably vital areas, which, if hit,
would result in the loss of the aircraft.[20]
When Germany organised its air defences into the Kammhuber Line, it was realised by the British that if the RAF
bombers were to fly in a bomber stream they could overwhelm the night fighters, who flew in individual cells directed
to their targets by ground controllers. It was then a matter of calculating the statistical loss from collisions against the
statistical loss from night fighters to calculate how close the bombers should fly to minimise RAF losses.[21]
The "exchange rate" ratio of output to input was a characteristic feature of operational research. By comparing the
number of flying hours put in by Allied aircraft to the number of U-boat sightings in a given area, it was possible to
redistribute aircraft to more productive patrol areas. Comparison of exchange rates established "effectiveness ratios"
useful in planning. The ratio of 60 mines laid per ship sunk was common to several campaigns: German mines in
British ports, British mines on German routes, and United States mines in Japanese routes.[22]
Operational research doubled the on-target bomb rate of B-29s bombing Japan from the Marianas Islands by increasing the training ratio from 4 to 10 percent of flying hours; revealed that wolf-packs of three United States submarines
were the most effective number to enable all members of the pack to engage targets discovered on their individual
patrol stations; and revealed that glossy enamel paint was more effective camouflage for night fighters than traditional dull
camouflage paint finish, and the smooth paint finish increased airspeed by reducing skin friction.[22]
On land, the operational research sections of the Army Operational Research Group (AORG) of the Ministry of
Supply (MoS) were landed in Normandy in 1944, and they followed British forces in the advance across Europe.
They analysed, among other topics, the effectiveness of artillery, aerial bombing and anti-tank shooting.
7.2.3 After World War II
With expanded techniques and growing awareness of the field at the close of the war, operational research was no
longer limited to only operational matters, but was extended to encompass equipment procurement, training, logistics and
infrastructure. Operations research also grew in many areas other than the military once scientists learned to apply
its principles to the civilian sector. With the development of the simplex algorithm for linear programming in 1947[23]
and the development of computers over the next three decades, operations research can now solve problems with
hundreds of thousands of variables and constraints. Moreover, the large volumes of data required for such problems
can be stored and manipulated very efficiently.[23]
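As a hedged illustration of the kind of linear-programming problem such methods solve, the sketch below uses SciPy's linprog routine (assuming SciPy is installed) on a made-up two-product mix problem; all coefficients are invented for illustration.

from scipy.optimize import linprog  # assumes SciPy is installed

# Hypothetical product-mix problem: maximize 3x + 5y subject to resource limits.
# linprog minimizes, so the objective coefficients are negated.
c = [-3, -5]
A_ub = [[1, 0],   # resource 1: x <= 4
        [0, 2],   # resource 2: 2y <= 12
        [3, 2]]   # resource 3: 3x + 2y <= 18
b_ub = [4, 12, 18]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print(result.x, -result.fun)  # optimal production plan and maximized objective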
Floorplanning: designing the layout of equipment in a factory or components on a computer chip to reduce
manufacturing time (therefore reducing cost)
Network optimization: for instance, setup of telecommunications networks to maintain quality of service during outages
Allocation problems
Facility location
Assignment problems (a small solver sketch follows this list):
Assignment problem
Generalized assignment problem
Quadratic assignment problem
Weapon target assignment problem
Bayesian search theory: looking for a target
Optimal search
Routing, such as determining the routes of buses so that as few buses are needed as possible
Supply chain management: managing the flow of raw materials and products based on uncertain demand for
the finished products
Efficient messaging and customer response tactics
Automation: automating or integrating robotic systems in human-driven operations processes
Globalization: globalizing operations processes in order to take advantage of cheaper materials, labor, land or
other productivity inputs
Transportation: managing freight transportation and delivery systems (examples: LTL shipping, intermodal
freight transport, travelling salesman problem)
Scheduling:
Personnel staffing
Manufacturing steps
Project tasks
Network data traffic: these are known as queueing models or queueing systems.
Sports events and their television coverage
Blending of raw materials in oil refineries
Determining optimal prices, in many retail and B2B settings, within the disciplines of pricing science
Operational research is also used extensively in government where evidence-based policy is used.
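The solver sketch referred to in the assignment-problems item above: a minimal example using SciPy's linear_sum_assignment (assuming SciPy and NumPy are installed) on an invented cost matrix.

import numpy as np
from scipy.optimize import linear_sum_assignment  # assumes SciPy is installed

# Hypothetical cost matrix: cost[i][j] = cost of assigning worker i to task j
cost = np.array([
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
])

# Solve the linear assignment problem: one worker per task, minimum total cost
workers, tasks = linear_sum_assignment(cost)
print(list(zip(workers, tasks)), cost[workers, tasks].sum())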
7.4 Management science
The management scientist's mandate is to use rational, systematic, science-based techniques to inform and improve
decisions of all kinds. Of course, the techniques of management science are not restricted to business applications but
may be applied to military, medical, public administration, charitable groups, political groups or community groups.
Management science is concerned with developing and applying models and concepts that may prove useful in helping
to illuminate management issues and solve managerial problems, as well as designing and developing new and better
models of organizational excellence.[25]
The application of these models within the corporate sector became known as management science.[26]
7.4.1 Related fields
Some of the fields that have considerable overlap with operations research and management science include:
7.4.2 Applications
Applications of management science are abundant, such as in airlines, manufacturing companies, service organizations, military branches, and government. The range of problems and issues to which management science has
contributed insights and solutions is vast. It includes:[25]
Scheduling airlines, trains, buses etc.
Assignment (assigning crew to flights, trains or buses; employees to projects)
Facility location (deciding the most appropriate location for new facilities such as a warehouse, factory or
fire station)
Network flows (managing the flow of water from reservoirs)
Health service (information and supply chain management for health services)
Game theory (identifying, understanding and developing the strategies adopted by companies)
Management science is also concerned with so-called "soft-operational analysis", which concerns methods for strategic
planning, strategic decision support, and problem structuring methods. In dealing with these sorts of challenges, mathematical modeling and simulation are not appropriate or will not suffice. Therefore, during the past 30 years, a number
of non-quantified modeling methods have been developed. These include:
stakeholder based approaches including metagame analysis and drama theory
morphological analysis and various forms of influence diagrams
approaches using cognitive mapping
the strategic choice approach
robustness analysis
In 2004 the US-based organization INFORMS began an initiative to market the OR profession better, including a
website entitled The Science of Better[40] which provides an introduction to OR and examples of successful applications
of OR to industrial problems. This initiative has been adopted by the Operational Research Society in the UK,
including a website entitled Learn about OR.[41]
Journals
The Institute for Operations Research and the Management Sciences (INFORMS) publishes thirteen scholarly journals about operations research, including the top two journals in their class, according to 2005 Journal Citation
Reports.[42] They are:
Decision Analysis[43]
Information Systems Research[44]
INFORMS Journal on Computing[45]
INFORMS Transactions on Education[46] (an open access journal)
Interfaces[47]
Management Science: A Journal of the Institute for Operations Research and the Management Sciences
Manufacturing & Service Operations Management
Marketing Science
Mathematics of Operations Research
Operations Research: A Journal of the Institute for Operations Research and the Management Sciences
Organization Science[48]
Service Science[49]
Transportation Science
Other journals
4OR - A Quarterly Journal of Operations Research: jointly published by the Belgian, French and Italian Operations
Research Societies (Springer);
Decision Sciences: published by Wiley-Blackwell on behalf of the Decision Sciences Institute;
European Journal of Operational Research (EJOR): founded in 1975, it is presently by far the largest operational research journal in the world, with around 9,000 pages of published papers per year. In 2004, its total
number of citations was the second largest amongst Operational Research and Management Science journals;
INFOR Journal: published and sponsored by the Canadian Operational Research Society;
International Journal of Operations Research and Information Systems (IJORIS): an official publication of the
Information Resources Management Association, published quarterly by IGI Global;[50]
Journal of Defense Modeling and Simulation (JDMS): Applications, Methodology, Technology: a quarterly journal devoted to advancing the science of modeling and simulation as it relates to the military and defense;[51]
Journal of the Operational Research Society (JORS): an official journal of The OR Society; this is the oldest
continuously published journal of OR in the world, published by Palgrave;[52]
Journal of Simulation (JOS): an official journal of The OR Society, published by Palgrave;[52]
Mathematical Methods of Operations Research (MMOR): the journal of the German and Dutch OR Societies,
published by Springer;[53]
Military Operations Research (MOR): published by the Military Operations Research Society;
[20] James F. Dunnigan (1999). Dirty Little Secrets of the Twentieth Century. Harper Paperbacks. pp. 215–217.
[21] RAF History Bomber Command 60th Anniversary. Raf.mod.uk. Retrieved 13 November 2011.
[22] Milkman, Raymond H. (May 1968). Operations Research in World War II. United States Naval Institute Proceedings.
[23] 1.2 A HISTORICAL PERSPECTIVE. PRINCIPLES AND APPLICATIONS OF OPERATIONS RESEARCH.
[24] Stafford Beer (1967) Management Science: The Business Use of Operations Research
[25] What is Management Science? Lancaster University, 2008. Retrieved 5 June 2008.
[26] What is Management Science? The University of Tennessee, 2006. Retrieved 5 June 2008.
[27] IFORS. IFORS. Retrieved 13 November 2011.
[28] Leszczynski, Mary (8 November 2011). Informs. Informs. Retrieved 13 November 2011.
[29] The OR Society. Orsoc.org.uk. Retrieved 13 November 2011.
[30] Société française de Recherche Opérationnelle et d'Aide à la Décision. ROADEF. Retrieved 13 November 2011.
[31] www.cors.ca. CORS. Cors.ca. Retrieved 13 November 2011.
[32] ASOR. ASOR. 1 January 1972. Retrieved 13 November 2011.
[33] ORSNZ. ORSNZ. Retrieved 13 November 2011.
[34] ORSP. ORSP. Retrieved 13 November 2011.
[35] ORSI. Orsi.in. Retrieved 13 November 2011.
[36] ORSSA. ORSSA. 23 September 2011. Retrieved 13 November 2011.
[37] EURO (EURO)". Euro-online.org. Retrieved 13 November 2011.
[38] SISO. Sisostds.org. Retrieved 13 November 2011.
[39] I/Itsec. I/Itsec. Retrieved 13 November 2011.
[40] The Science of Better. The Science of Better. Retrieved 13 November 2011.
[41] Learn about OR. Learn about OR. Retrieved 13 November 2011.
[42] INFORMS Journals. Informs.org. Retrieved 13 November 2011.
[43] Decision Analysis. Informs.org. Retrieved 19 March 2015.
[44] Information Systems Research. Informs.org. Retrieved 19 March 2015.
[45] INFORMS Journal on Computing. Informs.org. Retrieved 19 March 2015.
[46] INFORMS Transactions on Education. Informs.org. Retrieved 19 March 2015.
[47] Interfaces. Informs.org. Retrieved 19 March 2015.
[48] Organization Science. Informs.org. Retrieved 19 March 2015.
[49] Service Science. Informs.org. Retrieved 19 March 2015.
[50] International Journal of Operations Research and Information Systems (IJORIS) (19479328)(19479336): John Wang:
Journals. IGI Global. Retrieved 13 November 2011.
[51] The Society for Modeling & Simulation International. JDMS. Scs.org. Retrieved 13 November 2011.
[52] The OR Society;
[53] Mathematical Methods of Operations Research website. Springer.com. Retrieved 13 November 2011.
[54] TOP. Springer.com. Retrieved 13 November 2011.
7.8.2 Classic textbooks
Frederick S. Hillier & Gerald J. Lieberman, Introduction to Operations Research, McGraw-Hill: Boston MA;
10th Edition, 2014
Harvey M. Wagner, Principles of Operations Research, Englewood Cliffs, Prentice-Hall, 1969
7.8.3 History
Saul I. Gass, Arjang A. Assad, An Annotated Timeline of Operations Research: An Informal History. New
York, Kluwer Academic Publishers, 2005.
Saul I. Gass (Editor), Arjang A. Assad (Editor), Profiles in Operations Research: Pioneers and Innovators.
Springer, 2011
Maurice W. Kirby (Operational Research Society (Great Britain)). Operational Research in War and Peace:
The British Experience from the 1930s to 1970, Imperial College Press, 2003. ISBN 1-86094-366-7, ISBN
978-1-86094-366-9
J. K. Lenstra, A. H. G. Rinnooy Kan, A. Schrijver (editors) History of Mathematical Programming: A Collection
of Personal Reminiscences, North-Holland, 1991
Charles W. McArthur, Operations Analysis in the U.S. Army Eighth Air Force in World War II, History of
Mathematics, Vol. 4, Providence, American Mathematical Society, 1990
C. H. Waddington, O. R. in World War 2: Operational Research Against the U-boat, London, Elek Science,
1973.
Chapter 8
Machine learning
For the journal, see Machine Learning (journal).
Machine learning is a subfield of computer science[1] that evolved from the study of pattern recognition and computational
learning theory in artificial intelligence.[1] In 1959, Arthur Samuel defined machine learning as a "field of study that
gives computers the ability to learn without being explicitly programmed".[2] Machine learning explores the study and
construction of algorithms that can learn from and make predictions on data.[3] Such algorithms operate by building
a model from example inputs in order to make data-driven predictions or decisions,[4]:2 rather than following strictly
static program instructions.
Machine learning is closely related to (and often overlaps with) computational statistics, a discipline that also focuses
on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers
methods, theory and application domains to the field. Machine learning is employed in a range of computing tasks
where designing and programming explicit algorithms is infeasible. Example applications include spam filtering,
optical character recognition (OCR),[5] search engines and computer vision. Machine learning is sometimes conflated
with data mining,[6] where the latter sub-field focuses more on exploratory data analysis and is known as unsupervised
learning.[4]:vii[7]
Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that
lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models
allow researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results and
uncover hidden insights through learning from historical relationships and trends in the data.[8]
8.1 Overview
Tom M. Mitchell provided a widely quoted, more formal definition: "A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T,
as measured by P, improves with experience E."[9] This definition is notable for defining machine learning in
fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in his paper "Computing
Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines
do what we (as thinking entities) can do?"[10]
8.1.1 Types of problems and tasks
Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning
signal or feedback available to a learning system. These are:[11]
Supervised learning: The computer is presented with example inputs and their desired outputs, given by a
"teacher", and the goal is to learn a general rule that maps inputs to outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in
its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end.
A support vector machine is a classifier that divides its input space into two regions, separated by a linear boundary. Here, it has
learned to distinguish black and white circles.
Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience. Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration
and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor
synergies, and imitation.
Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned
system:[4]:3
In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns
unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised
way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the
classes are "spam" and "not spam".
In regression, also a supervised problem, the outputs are continuous rather than discrete.
In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known
beforehand, making this typically an unsupervised task.
Density estimation finds the distribution of inputs in some space.
Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling
is a related problem, where a program is given a list of human language documents and is tasked to find out
which documents cover similar topics.
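As a minimal sketch of the supervised classification setting described above, the following example uses scikit-learn (assumed installed); the tiny spam/not-spam dataset is invented purely for illustration.

from sklearn.tree import DecisionTreeClassifier  # assumes scikit-learn is installed

# Inputs: [number of links, number of ALL-CAPS words] per message (made-up values)
X = [[0, 1], [1, 0], [8, 12], [7, 9], [1, 2], [9, 15]]
y = ["not spam", "not spam", "spam", "spam", "not spam", "spam"]  # teacher-given labels

model = DecisionTreeClassifier().fit(X, y)   # learn a rule mapping inputs to outputs
print(model.predict([[6, 10], [0, 0]]))      # classify unseen inputs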
task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed
(unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised
methods cannot be used due to the unavailability of training data.
Machine learning also has intimate ties to optimization: many learning problems are formulated as minimization of
some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of
the model being trained and the actual problem instances (for example, in classification, one wants to assign a label
to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples). The difference
between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on
a training set, machine learning is concerned with minimizing the loss on unseen samples.[13]
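A small sketch of this optimization view, fitting a line by gradient descent on a squared loss; the data points and learning rate are arbitrary illustrative choices.

# Fit y ~ w*x + b by minimizing mean squared training loss with gradient descent.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.2, 3.9, 6.1, 8.0]   # roughly y = 2x

w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    # Gradients of the mean squared loss with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # parameters that minimize the loss on this training set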
8.2.1 Relation to statistics
Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine
learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.[14] He also
suggested the term data science as a placeholder to call the overall field.[14]
Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model,[15] wherein "algorithmic model" means more or less the machine learning algorithms like random forests.
Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical
learning.[16]
8.3 Theory
Main article: Computational learning theory
A core objective of a learner is to generalize from its experience.[17][18] Generalization in this context is the ability
of a learning machine to perform accurately on new, unseen examples/tasks after having experienced a learning data
set. The training examples come from some generally unknown probability distribution (considered representative
of the space of occurrences) and the learner has to build a general model about this space that enables it to produce
sufficiently accurate predictions in new cases.
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer
science known as computational learning theory. Because training sets are finite and the future is uncertain, learning
theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the
performance are quite common. The bias–variance decomposition is one way to quantify generalization error.
For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has
underfit the data. If the complexity of the model is increased in response, then the training error decreases. But if
the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer.[19]
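The following sketch illustrates under- and over-fitting by fitting polynomials of increasing degree with NumPy (assumed installed); the noisy sine data and the chosen degrees are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy samples of an underlying function

x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 10):   # too simple, about right, too complex
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)       # falls as complexity grows
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)  # rises once the model overfits
    print(degree, round(train_err, 3), round(test_err, 3))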
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of
learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time.
There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned
in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
8.4 Approaches
Main article: List of machine learning algorithms
8.4.1 Decision tree learning
Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions
about the item's target value.
8.4.2
8.4.3
8.4.4 Deep learning
8.4.5
8.4.6
8.4.7 Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the
same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different
clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often
defined by some similarity metric and evaluated, for example, by internal compactness (similarity between members of
the same cluster) and separation between different clusters. Other methods are based on estimated density and graph
connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
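A minimal clustering sketch using scikit-learn's k-means implementation (assumed installed); the two-dimensional points are invented for illustration.

from sklearn.cluster import KMeans  # assumes scikit-learn is installed

X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one dense group of points
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]    # another dense group of points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster membership found without any labels
print(kmeans.cluster_centers_)  # estimated cluster centres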
8.4.8 Bayesian networks
8.4.9 Reinforcement learning
8.4.10 Representation learning
8.4.11
8.4.12
8.4.13 Genetic algorithms
8.5 Applications
Applications for machine learning include:
Adaptive websites
Affective computing
Bioinformatics
Brain-machine interfaces
Cheminformatics
Classifying DNA sequences
Computational anatomy
Computer vision, including object recognition
Detecting credit card fraud
Game playing
Information retrieval
Internet fraud detection
Marketing
Machine perception
Medical diagnosis
Natural language processing
8.6 Ethics
Machine Learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may
exhibit these biases upon use, thus digitizing cultural prejudices such as institutional racism and classism.[32] Responsible collection of data thus is a critical part of machine learning. See Machine ethics for additional information.
8.7 Software
Software suites containing a variety of machine learning algorithms include the following:
8.7.1
dlib
ELKI
Encog
GNU Octave
8.7.2
KNIME
RapidMiner
8.7.3 Proprietary software
MATLAB
Microsoft Azure Machine Learning
Neural Designer
NeuroSolutions
Oracle Data Mining
RCASE
SAS Enterprise Miner
Splunk
STATISTICA Data Miner
8.8 Journals
Journal of Machine Learning Research
Machine Learning
Neural Computation
8.9 Conferences
Conference on Neural Information Processing Systems
International Conference on Machine Learning
8.11 References
[1] http://www.britannica.com/EBchecked/topic/1116194/machine-learning This tertiary source reuses information from other
sources but does not name them.
[2] Phil Simon (March 18, 2013). Too Big to Ignore: The Business Case for Big Data. Wiley. p. 89. ISBN 978-1-118-63817-0.
[3] Ron Kohavi; Foster Provost (1998). Glossary of terms. Machine Learning. 30: 271274.
[4] Machine learning and pattern recognition "can be viewed as two facets of the same field."
[5] Wernick, Yang, Brankov, Yourganov and Strother, Machine Learning in Medical Imaging, IEEE Signal Processing Magazine, vol. 27, no. 4, July 2010, pp. 25-38
[6] Mannila, Heikki (1996). Data mining: machine learning, statistics, and databases. Int'l Conf. Scientific and Statistical
Database Management. IEEE Computer Society.
[7] Friedman, Jerome H. (1998). Data Mining and Statistics: Whats the connection?". Computing Science and Statistics. 29
(1): 39.
[8] Machine Learning: What it is and why it matters. www.sas.com. Retrieved 2016-03-29.
[9] Mitchell, T. (1997). Machine Learning, McGraw Hill. ISBN 0-07-042807-7, p.2.
[10] Harnad, Stevan (2008), The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence, in Epstein,
Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking
Computer, Kluwer
[11] Russell, Stuart; Norvig, Peter (2003) [1995]. Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall. ISBN
978-0137903955.
[12] Langley, Pat (2011). The changing science of machine learning. Machine Learning. 82 (3): 275279. doi:10.1007/s10994011-5242-y.
[13] Le Roux, Nicolas; Bengio, Yoshua; Fitzgibbon, Andrew (2012). Improving First and Second-Order Methods by Modeling
Uncertainty. In Sra, Suvrit; Nowozin, Sebastian; Wright, Stephen J. Optimization for Machine Learning. MIT Press. p.
404.
[14] MI Jordan (2014-09-10). statistics and machine learning. reddit. Retrieved 2014-10-01.
[15] Cornell University Library. Breiman : Statistical Modeling: The Two Cultures (with comments and a rejoinder by the
author)". Retrieved 8 August 2015.
[16] Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer.
p. vii.
[17] Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer, ISBN 0-387-31073-8
[18] Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012) Foundations of Machine Learning, MIT Press ISBN 9780-262-01825-8.
[19] Ethem Alpaydin. "Introduction to Machine Learning" The MIT Press, 2010.
[20] Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations" Proceedings of the 26th Annual International Conference on Machine
Learning, 2009.
[21] Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). A Survey of Multilinear Subspace Learning for Tensor
Data (PDF). Pattern Recognition. 44 (7): 15401551. doi:10.1016/j.patcog.2011.01.004.
[22] Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc. pp. 13. ISBN 978-1-60198-294-0.
[23] A. M. Tillmann, "On the Computational Intractability of Exact and Approximate Dictionary Learning", IEEE Signal Processing Letters 22(1), 2015: 4549.
[24] Aharon, M, M Elad, and A Bruckstein. 2006. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse
Representation. Signal Processing, IEEE Transactions on 54 (11): 4311-4322
[25] Goldberg, David E.; Holland, John H. (1988). Genetic algorithms and machine learning. Machine Learning. 3 (2):
9599. doi:10.1007/bf00113892.
[26] Michie, D.; Spiegelhalter, D. J.; Taylor, C. C. (1994). Machine Learning, Neural and Statistical Classication. Ellis Horwood.
[27] Zhang, Jun; Zhan, Zhi-hui; Lin, Ying; Chen, Ni; Gong, Yue-jiao; Zhong, Jing-hui; Chung, Henry S.H.; Li, Yun; Shi, Yuhui (2011). Evolutionary Computation Meets Machine Learning: A Survey (PDF). Computational Intelligence Magazine.
IEEE. 6 (4): 6875. doi:10.1109/mci.2011.942584.
[28] BelKor Home Page research.att.com
[29] "The Netflix Tech Blog: Netflix Recommendations: Beyond the 5 stars (Part 1)". Retrieved 8 August 2015.
[30] Scott Patterson (13 July 2010). "'Artificial Intelligence' Gains Fans Among Investors". WSJ. Retrieved 8 August
2015.
[31] When A Machine Learning Algorithm Studied Fine Art Paintings, It Saw Things Art Historians Had Never Noticed, The
Physics at ArXiv blog
[32] Bostrom, Nick (2011). The Ethics of Artificial Intelligence (PDF). Retrieved 11 April 2016.
Chapter 9
Statistical inference
Not to be confused with Statistical interference.
Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.[1] Inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates.
The population is assumed to be larger than the observed data set; in other words, the observed data is assumed to be
sampled from a larger population.
Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and does not assume that the data came from a larger population.
9.1 Introduction
Statistical inference makes propositions about a population, using data drawn from the population with some form of
sampling. Given a hypothesis about a population, for which we wish to draw inferences, statistical inference consists
of (firstly) selecting a statistical model of the process that generates the data and (secondly) deducing propositions
from the model.
Konishi & Kitagawa state, "The majority of the problems in statistical inference can be considered to be problems
related to statistical modeling".[2] Relatedly, Sir David Cox has said, "How [the] translation from subject-matter
problem to statistical model is done is often the most critical part of an analysis".[3]
The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition
are the following:
a point estimate, i.e. a particular value that best approximates some parameter of interest;
an interval estimate, e.g. a confidence interval (or set estimate), i.e. an interval constructed using a dataset
drawn from a population so that, under repeated sampling of such datasets, such intervals would contain the
true parameter value with the probability at the stated confidence level;
a credible interval, i.e. a set of values containing, for example, 95% of posterior belief;
rejection of a hypothesis;[4]
clustering or classification of data points into groups.
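As a sketch of the first two forms of proposition, the following Python snippet computes a point estimate of a mean and an approximate large-sample 95% confidence interval using the normal approximation; the sample values are invented for illustration.

import math
import statistics

sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.8]  # hypothetical observations

n = len(sample)
point_estimate = statistics.mean(sample)                  # point estimate of the population mean
standard_error = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of the mean

z = 1.96  # approximate 97.5th percentile of the standard normal distribution
interval = (point_estimate - z * standard_error, point_estimate + z * standard_error)
print(point_estimate, interval)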
population quantities of interest, about which we wish to draw inference.[5] Descriptive statistics are typically used as
a preliminary step before more formal inferences are drawn.[6]
9.2.1 Degree of models/assumptions
9.2.2 Importance of valid models/assumptions
Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be
correct; i.e. that the data-generating mechanisms really have been correctly specified.
Incorrect assumptions of 'simple' random sampling can invalidate statistical inference.[8] More complex semi- and
fully parametric assumptions are also cause for concern. For example, incorrectly assuming the Cox model can in
some cases lead to faulty conclusions.[9] Incorrect assumptions of normality in the population also invalidate some
forms of regression-based inference.[10] The use of any parametric model is viewed skeptically by most experts
in sampling human populations: "most sampling statisticians, when they deal with confidence intervals at all, limit
themselves to statements about [estimators] based on very large samples, where the central limit theorem ensures that
these [estimators] will have distributions that are nearly normal".[11] In particular, a normal distribution "would be
a totally unrealistic and catastrophically unwise assumption to make if we were dealing with any kind of economic
population".[11] Here, the central limit theorem states that the distribution of the sample mean "for very large samples"
is approximately normally distributed, if the distribution is not heavy-tailed.
Approximate distributions
Main articles: Statistical distance, Asymptotic theory (statistics), and Approximation theory
Given the difficulty in specifying exact distributions of sample statistics, many methods have been developed for
approximating these.
With finite samples, approximation results measure how close a limiting distribution approaches the statistic's sample
distribution: for example, with 10,000 independent samples the normal distribution approximates (to two digits of
accuracy) the distribution of the sample mean for many population distributions, by the Berry–Esseen theorem.[12] Yet
for many practical purposes, the normal approximation provides a good approximation to the sample mean's distribution when there are 10 (or more) independent samples, according to simulation studies and statisticians' experience.[12]
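A small simulation sketch of this point: even for a skewed (exponential) population, the distribution of the sample mean is close to normal for modest sample sizes. The population, sample size and number of replications are arbitrary illustrative choices.

import random
import statistics

random.seed(1)
sample_size = 10
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(100_000)
]

print(statistics.mean(sample_means))   # close to the population mean, 1.0
print(statistics.stdev(sample_means))  # close to 1 / sqrt(sample_size), about 0.316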
Following Kolmogorov's work in the 1950s, advanced statistics uses approximation theory and functional analysis to
quantify the error of approximation. In this approach, the metric geometry of probability distributions is studied; this
approach quantifies approximation error with, for example, the Kullback–Leibler divergence, Bregman divergence,
and the Hellinger distance.[13][14][15]
With indefinitely large samples, limiting results like the central limit theorem describe the sample statistic's limiting
distribution, if one exists. Limiting results are not statements about finite samples, and indeed are irrelevant to finite
samples.[16][17][18] However, the asymptotic theory of limiting distributions is often invoked for work with finite
samples. For example, limiting results are often invoked to justify the generalized method of moments and the
use of generalized estimating equations, which are popular in econometrics and biostatistics. The magnitude of the
difference between the limiting distribution and the true distribution (formally, the 'error' of the approximation) can
be assessed using simulation.[19] The heuristic application of limiting results to finite samples is common practice in
many applications, especially with low-dimensional models with log-concave likelihoods (such as with one-parameter
exponential families).
9.2.3 Randomization-based models
9.3.1 Frequentist inference
9.3.2 Bayesian inference
in this way. While a user's utility function need not be stated for this sort of inference, these summaries do all
depend (to some extent) on stated prior beliefs, and are generally viewed as subjective conclusions. (Methods of prior
construction which do not require external input have been proposed but not yet fully developed.)
Formally, Bayesian inference is calibrated with reference to an explicitly stated utility, or loss function; the 'Bayes
rule' is the one which maximizes expected utility, averaged over the posterior uncertainty. Formal Bayesian inference
therefore automatically provides optimal decisions in a decision-theoretic sense. Given assumptions, data and utility,
Bayesian inference can be made for essentially any problem, although not every statistical inference need have a
Bayesian interpretation. Analyses which are not formally Bayesian can be (logically) incoherent; a feature of Bayesian
procedures which use proper priors (i.e. those integrable to one) is that they are guaranteed to be coherent. Some
advocates of Bayesian inference assert that inference must take place in this decision-theoretic framework, and that
Bayesian inference should not conclude with the evaluation and summarization of posterior beliefs.
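A minimal numerical sketch of Bayesian updating, using the conjugate Beta prior for a Bernoulli success probability; the prior parameters and the observed counts are illustrative assumptions.

# Beta prior + binomial likelihood gives a Beta posterior (conjugacy).
prior_alpha, prior_beta = 1.0, 1.0   # uniform prior belief about the success probability

successes, failures = 7, 3           # hypothetical observed data

post_alpha = prior_alpha + successes
post_beta = prior_beta + failures

posterior_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, posterior_mean)  # Beta(8, 4) posterior, mean about 0.667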
9.3.3 AIC-based inference
9.3.4
arguments behind fiducial inference on a restricted class of models on which fiducial procedures would be well-defined and useful.
9.6 Notes
[1] Upton, G., Cook, I. (2008) Oxford Dictionary of Statistics, OUP. ISBN 978-0-19-954145-4
[2] Konishi & Kitagawa (2008), p.75
[3] Cox (2006), p.197
[4] According to Peirce, acceptance means that inquiry on this question ceases for the time being. In science, all scientific
theories are revisable.
[5] Cox (2006) page 2
[6] Evans, Michael; et al. (2004). Probability and Statistics: The Science of Uncertainty. Freeman and Company. p. 267.
[7] van der Vaart, A.W. (1998) Asymptotic Statistics Cambridge University Press. ISBN 0-521-78450-6 (page 341)
[8] Kruskal 1988
[9] Freedman, D.A. (2008) Survival analysis: An Epidemiological hazard?". The American Statistician (2008) 62: 110-119.
(Reprinted as Chapter 11 (pages 169192) of Freedman (2010)).
[10] Berk, R. (2003) Regression Analysis: A Constructive Critique (Advanced Quantitative Techniques in the Social Sciences) (v.
11) Sage Publications. ISBN 0-7619-2904-5
[11] Brewer, Ken (2002). Combined Survey Sampling Inference: Weighing of Basus Elephants. Hodder Arnold. p. 6. ISBN
978-0340692295.
[12] Jørgen Hoffmann-Jørgensen's Probability With a View Towards Statistics, Volume I. Page 399.
[13] Le Cam (1986)
[14] Erik Torgerson (1991) Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics. Cambridge
University Press.
[15] Liese, Friedrich & Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer.
ISBN 0-387-73193-8.
[16] Kolmogorov (1963, p. 369): "The frequency concept, based on the notion of limiting frequency as the number of trials
increases to infinity, does not contribute anything to substantiate the applicability of the results of probability theory to real
practical problems where we have always to deal with a finite number of trials."
[17] "Indeed, limit theorems 'as n tends to infinity' are logically devoid of content about what happens at any particular n.
All they can do is suggest certain approaches whose performance must then be checked on the case at hand." Le Cam
(1986) (page xiv)
[18] Pfanzagl (1994): "The crucial drawback of asymptotic theory: What we expect from asymptotic theory are results which
hold approximately . . . . What asymptotic theory has to offer are limit theorems." (page ix) "What counts for applications
are approximations, not limits." (page 188)
[19] Pfanzagl (1994): "By taking a limit theorem as being approximately true for large sample sizes, we commit an error the
size of which is unknown. [. . .] Realistic information about the remaining errors may be obtained by simulations." (page
ix)
[20] Neyman, J. (1934) "On the two different aspects of the representative method: The method of stratified sampling and the
method of purposive selection", Journal of the Royal Statistical Society, 97 (4), 557–625 JSTOR 2342192
[21] Hinkelmann and Kempthorne(2008)
[22] ASA Guidelines for a rst course in statistics for non-statisticians. (available at the ASA website)
[23] David A. Freedman et alias Statistics.
[24] David S. Moore and George McCabe. Introduction to the Practice of Statistics.
[25] Gelman A. et al. (2013). Bayesian Data Analysis (Chapman & Hall).
[26] Peirce (1877-1878)
[27] Peirce (1883)
[28] David Freedman et alia Statistics and David A. Freedman Statistical Models.
[29] Rao, C.R. (1997) Statistics and Truth: Putting Chance to Work, World Scientific. ISBN 981-02-3111-3
[30] Peirce, Freedman, Moore and McCabe.
[31] Box, G.E.P. and Friends (2006) Improving Almost Anything: Ideas and Essays, Revised Edition, Wiley. ISBN 978-0-471-72755-2
[32] Cox (2006), page 196
[33] ASA Guidelines for a first course in statistics for non-statisticians. (available at the ASA website); David A. Freedman et al., Statistics; David S. Moore and George McCabe, Introduction to the Practice of Statistics.
[34] Neyman, Jerzy. 1923 [1990]. "On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9." Statistical Science 5 (4): 465–472. Trans. Dorota M. Dabrowska and Terence P. Speed.
[35] Hinkelmann & Kempthorne (2008)
[36] Hinkelmann and Kempthorne (2008) Chapter 6.
[37] Bandyopadhyay & Forster (2011). The quote is taken from the book's Introduction (p.3). See also "Section III: Four Paradigms of Statistics".
[38] Neyman, J. (1937) "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability", Philosophical Transactions of the Royal Society of London A, 236, 333–380.
[39] Preface to Pfanzagl.
[40] Soofi (2000)
[41] Hansen & Yu (2001)
[42] Hansen and Yu (2001), page 747.
[43] Rissanen (1989), page 84
[44] Joseph F. Traub, G. W. Wasilkowski, and H. Wozniakowski. (1988)
[45] Neyman (1956)
[46] Zabell (1992)
[47] Cox (2006) page 66
[48] Hampel 2003.
[49] Davison, page 12.
[50] Barnard, G.A. (1995) "Pivotal Models and the Fiducial Argument", International Statistical Review, 63 (3), 309–323. JSTOR 1403482
9.7 References
Bandyopadhyay, P. S.; Forster, M. R., eds. (2011), Philosophy of Statistics, Elsevier.
Bickel, Peter J.; Doksum, Kjell A. (2001). Mathematical statistics: Basic and selected topics. 1 (Second (updated
printing 2007) ed.). Prentice Hall. ISBN 0-13-850363-X. MR 443141.
Cox, D. R. (2006). Principles of Statistical Inference, Cambridge University Press. ISBN 0-521-68567-2.
Fisher, R. A. (1955), "Statistical methods and scientific induction", Journal of the Royal Statistical Society, Series B, 17, 69–78. (criticism of statistical theories of Jerzy Neyman and Abraham Wald)
Freedman, D. A. (2009). Statistical models: Theory and practice (revised ed.). Cambridge University Press.
pp. xiv+442 pp. ISBN 978-0-521-74385-3. MR 2489600.
Freedman, D. A. (2010). Statistical Models and Causal Inferences: A Dialogue with the Social Sciences (Edited
by David Collier, Jasjeet S. Sekhon, and Philip B. Stark), Cambridge University Press.
Hansen, Mark H.; Yu, Bin (June 2001). "Model Selection and the Principle of Minimum Description Length: Review paper". Journal of the American Statistical Association. 96 (454): 746–774. doi:10.1198/016214501753168398. JSTOR 2670311. MR 1939352.
Hinkelmann, Klaus; Kempthorne, Oscar (2008). Introduction to Experimental Design (Second ed.). Wiley.
ISBN 978-0-471-72756-9.
Kolmogorov, Andrei N. (1963). "On tables of random numbers". Sankhyā Ser. A. 25: 369–375. MR 178484. Reprinted as Kolmogorov, Andrei N. (1998). "On tables of random numbers". Theoretical Computer Science. 207 (2): 387–395. doi:10.1016/S0304-3975(98)00075-9. MR 1643414.
Konishi S., Kitagawa G. (2008), Information Criteria and Statistical Modeling, Springer.
Kruskal, William (December 1988). "Miracles and Statistics: the casual assumption of independence" (ASA Presidential Address). Journal of the American Statistical Association. 83 (404): 929–940. doi:10.2307/2290117. JSTOR 2290117.
Le Cam, Lucien (1986). Asymptotic Methods of Statistical Decision Theory, Springer. ISBN 0-387-96307-3
Neyman, Jerzy (1956). "Note on an Article by Sir Ronald Fisher". Journal of the Royal Statistical Society, Series B. 18 (2): 288–294. JSTOR 2983716. (reply to Fisher 1955)
Peirce, C. S. (1877–1878), "Illustrations of the Logic of Science" (series), Popular Science Monthly, vols. 12–13. Relevant individual papers:
(1878 March), "The Doctrine of Chances", Popular Science Monthly, v. 12, March issue, pp. 604–615. Internet Archive Eprint.
(1878 April), "The Probability of Induction", Popular Science Monthly, v. 12, pp. 705–718. Internet Archive Eprint.
(1878 June), "The Order of Nature", Popular Science Monthly, v. 13, pp. 203–217. Internet Archive Eprint.
(1878 August), "Deduction, Induction, and Hypothesis", Popular Science Monthly, v. 13, pp. 470–482. Internet Archive Eprint.
Peirce, C. S. (1883), A Theory of Probable Inference, Studies in Logic, pp. 126-181, Little, Brown, and
Company. (Reprinted 1983, John Benjamins Publishing Company, ISBN 90-272-3271-7)
Pfanzagl, Johann; with the assistance of R. Hamböker (1994). Parametric Statistical Theory. Berlin: Walter de Gruyter. ISBN 3-11-013863-8. MR 1291393.
Rissanen, Jorma (1989). Stochastic Complexity in Statistical Inquiry. Series in computer science. 15. Singapore: World Scientific. ISBN 9971-5-0859-1. MR 1082556.
Soofi, Ehsan S. (December 2000). "Principal Information-Theoretic Approaches" (Vignettes for the Year 2000: Theory and Methods, ed. by George Casella). Journal of the American Statistical Association. 95 (452): 1349–1353. doi:10.1080/01621459.2000.10474346. JSTOR 2669786. MR 1825292.
Traub, Joseph F.; Wasilkowski, G. W.; Wozniakowski, H. (1988). Information-Based Complexity. Academic
Press. ISBN 0-12-697545-0.
Zabell, S. L. (Aug 1992). "R. A. Fisher and Fiducial Argument". Statistical Science. 7 (3): 369–387. doi:10.1214/ss/1177011233. JSTOR 2246073.
Hampel, Frank (Feb 2003). "The proper fiducial argument" (PDF) (Research Report No. 114). Retrieved 29 March 2016.
Chapter 10
Correlation and dependence
\rho_{X,Y} = \operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\operatorname{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y},

where E is the expected value operator, cov means covariance, and corr is a widely used alternative notation for the correlation coefficient.
The Pearson correlation is defined only if both of the standard deviations are finite and nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation cannot exceed 1 in absolute value. The correlation coefficient is symmetric: corr(X,Y) = corr(Y,X).

Figure: Several sets of (x, y) points, with the Pearson correlation coefficient of x and y for each set. Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse) linear relationship (anticorrelation),[5] and some value in the open interval (−1, 1) in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.
If the variables are independent, Pearson's correlation coefficient is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables. For example, suppose the random variable X is symmetrically distributed about zero, and Y = X². Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated. However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.
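A quick numerical check of this point (a minimal sketch with invented variable names, assuming NumPy is available): sampling X symmetrically about zero and setting Y = X² gives a sample correlation near zero even though Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)   # X symmetric about zero
y = x ** 2                                # Y is a deterministic function of X

# The sample Pearson correlation is close to 0 despite perfect dependence,
# because the relationship is nonlinear.
print(round(np.corrcoef(x, y)[0, 1], 4))
```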
If we have a series of n measurements of X and Y written as x_i and y_i for i = 1, 2, ..., n, then the sample correlation coefficient can be used to estimate the population Pearson correlation r between X and Y. The sample correlation coefficient is written:

r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},

where x̄ and ȳ are the sample means of X and Y, and s_x and s_y are the sample standard deviations of X and Y.
This can also be written as:

r_{xy} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{n s_x s_y} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\,\sqrt{n\sum y_i^2 - (\sum y_i)^2}}.
If x and y are results of measurements that contain measurement error, the realistic limits on the correlation coefficient are not −1 to +1 but a smaller range.[6]
For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson's product-moment coefficient.
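The two equivalent forms of the sample correlation coefficient can be verified numerically; the following minimal NumPy sketch (the data are made up for illustration) computes r from the deviation form and from the raw-sum form and compares both with the library routine.

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0, 4.6])
y = np.array([8.0, 10.0, 12.0, 14.0, 15.0])
n = len(x)

# Deviation form: sum of products of deviations over the two root sums of squares.
dx, dy = x - x.mean(), y - y.mean()
r_dev = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Raw-sum form: n*sum(x*y) - sum(x)*sum(y) over the product of the two roots.
num = n * np.sum(x * y) - x.sum() * y.sum()
den = np.sqrt(n * np.sum(x ** 2) - x.sum() ** 2) * np.sqrt(n * np.sum(y ** 2) - y.sum() ** 2)
r_raw = num / den

print(r_dev, r_raw, np.corrcoef(x, y)[0, 1])  # all three values agree
```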
As we go from each pair to the next pair x increases, and so does y. This relationship is perfect, in the sense that an increase in x is always accompanied by an increase in y. This means that we have a perfect rank correlation, and both Spearman's and Kendall's correlation coefficients are 1, whereas in this example the Pearson product-moment correlation coefficient is 0.7544, indicating that the points are far from lying on a straight line. In the same way, if y always decreases when x increases, the rank correlation coefficients will be −1, while the Pearson product-moment correlation coefficient may or may not be close to −1, depending on how close the points are to a straight line. Although in the extreme cases of perfect rank correlation the two coefficients are both equal (being both +1 or both −1), this is not generally the case, and so values of the two coefficients cannot meaningfully be compared.[7] For example, for the three pairs (1, 1), (2, 3), (3, 2) Spearman's coefficient is 1/2, while Kendall's coefficient is 1/3.
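The three-pair example can be reproduced directly (a sketch assuming SciPy is available; the data are exactly the pairs quoted above):

```python
from scipy import stats

x = [1, 2, 3]
y = [1, 3, 2]

rho, _ = stats.spearmanr(x, y)    # Spearman's coefficient: 0.5
tau, _ = stats.kendalltau(x, y)   # Kendall's coefficient: 1/3
print(rho, tau)
```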
Figure: Pearson/Spearman correlation coefficients between X and Y are shown when the two variables' ranges are unrestricted, and when the range of X is restricted to the interval (0,1).

Most correlation measures are sensitive to the manner in which X and Y are sampled. Dependencies tend to be stronger if viewed over a wider range of values. Thus, if we consider the correlation coefficient between the heights of fathers and their sons over all adult males, and compare it to the same correlation coefficient calculated when the fathers are selected to be between 165 cm and 170 cm in height, the correlation will be weaker in the latter case. Several techniques have been developed that attempt to correct for range restriction in one or both variables, and are commonly used in meta-analysis; the most common are Thorndike's case II and case III equations.[13]
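The father/son height illustration can be mimicked by simulation. This is a minimal sketch under assumed parameters (heights drawn from a bivariate normal with standard deviation 7 cm and population correlation 0.5, values invented for the example); restricting the fathers' heights to a narrow band visibly weakens the sample correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
mean = [175.0, 175.0]                        # assumed mean heights in cm
cov = [[49.0, 24.5], [24.5, 49.0]]           # sd 7 cm each, correlation 0.5
fathers, sons = rng.multivariate_normal(mean, cov, size=200_000).T

r_all = np.corrcoef(fathers, sons)[0, 1]

# Keep only fathers between 165 cm and 170 cm: the range of X is restricted.
mask = (fathers >= 165) & (fathers <= 170)
r_restricted = np.corrcoef(fathers[mask], sons[mask])[0, 1]

print(round(r_all, 3), round(r_restricted, 3))   # the restricted value is much smaller
```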
Various correlation measures in use may be undefined for certain joint distributions of X and Y. For example, the Pearson correlation coefficient is defined in terms of moments, and hence will be undefined if the moments are undefined. Measures of dependence based on quantiles are always defined. Sample-based statistics intended to estimate population measures of dependence may or may not have desirable statistical properties such as being unbiased, or asymptotically consistent, based on the spatial structure of the population from which the data were sampled.
Sensitivity to the data distribution can be used to an advantage. For example, scaled correlation is designed to use the sensitivity to the range in order to pick out correlations between fast components of time series.[14] By reducing the range of values in a controlled manner, the correlations on long time scales are filtered out and only the correlations on short time scales are revealed.
10.6.2 Correlation and linearity
The Pearson correlation coefficient indicates the strength of a linear relationship between two variables, but its value generally does not completely characterize their relationship.[16] In particular, if the conditional mean of Y given X, denoted E(Y|X), is not linear in X, the correlation coefficient will not fully determine the form of E(Y|X).
The image on the right shows scatter plots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe.[17] The four y variables have the same mean (7.5), variance (4.12), correlation (0.816) and regression line (y = 3 + 0.5x). However, as can be seen on the plots, the distribution of the variables is very different. The first one (top left) seems to be distributed normally, and corresponds to what one would expect when considering two variables correlated and following the assumption of normality. The second one (top right) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear. In this case the Pearson correlation coefficient does not indicate that there is an exact functional relationship: only the extent to which that relationship can be approximated by a linear relationship. In the third case (bottom left), the linear relationship is perfect, except for one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816. Finally, the fourth example (bottom right) shows another example when one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear.
These examples indicate that the correlation coefficient, as a summary statistic, cannot replace visual examination of the data. Note that the examples are sometimes said to demonstrate that the Pearson correlation assumes that the data follow a normal distribution, but this is not correct.[4]
If the pair (X, Y) of random variables follows a bivariate normal distribution, the conditional mean E(Y | X) is a linear function of X:

\operatorname{E}(Y \mid X) = \operatorname{E}(Y) + r\,\sigma_Y\,\frac{X - \operatorname{E}(X)}{\sigma_X},

where E(X) and E(Y) are the expected values of X and Y, respectively, and σ_X and σ_Y are the standard deviations of X and Y, respectively.
10.10 References
[1] Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968) Applied General Statistics, Pitman. ISBN 9780273403159 (page 625)
[2] Dietrich, Cornelius Frank (1991) Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement, 2nd Edition, A. Higler. ISBN 9780750300605 (page 331)
[3] Aitken, Alexander Craig (1957) Statistical Mathematics, 8th Edition. Oliver & Boyd. ISBN 9780050013007 (page 95)
[4] Rodgers, J. L.; Nicewander, W. A. (1988). "Thirteen ways to look at the correlation coefficient". The American Statistician. 42 (1): 59–66. doi:10.1080/00031305.1988.10475524. JSTOR 2685263.
[5] Dowdy, S. and Wearden, S. (1983). Statistics for Research, Wiley. ISBN 0-471-08602-9 pp 230
[6] Francis, DP; Coats AJ; Gibson D (1999). "How high can a correlation coefficient be?". Int J Cardiol. 69 (2): 185–199. doi:10.1016/S0167-5273(99)00028-5.
[7] Yule, G.U. and Kendall, M.G. (1950), An Introduction to the Theory of Statistics, 14th Edition (5th Impression 1968). Charles Griffin & Co. pp 258–270
[8] Kendall, M. G. (1955) Rank Correlation Methods, Charles Griffin & Co.
[9] Mahdavi Damghani B. (2013). "The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model". Wilmott Magazine. doi:10.1002/wilm.10252.
[10] Székely, G. J.; Rizzo, M. L.; Bakirov, N. K. (2007). "Measuring and testing independence by correlation of distances". Annals of Statistics. 35 (6): 2769–2794. doi:10.1214/009053607000000505.
[11] Székely, G. J.; Rizzo, M. L. (2009). "Brownian distance covariance". Annals of Applied Statistics. 3 (4): 1233–1303. doi:10.1214/09-AOAS312.
[12] Lopez-Paz D., Hennig P. and Schölkopf B. (2013). "The Randomized Dependence Coefficient", Conference on Neural Information Processing Systems. Reprint
[13] Thorndike, Robert Ladd (1947). Research problems and techniques (Report No. 3). Washington DC: US Govt. print. off.
[14] Nikolić, D; Muresan, RC; Feng, W; Singer, W (2012). "Scaled correlation analysis: a better way to compute a cross-correlogram". European Journal of Neuroscience: 1–21. doi:10.1111/j.1460-9568.2011.07987.x.
[15] Aldrich, John (1995). "Correlations Genuine and Spurious in Pearson and Yule". Statistical Science. 10 (4): 364–376. doi:10.1214/ss/1177009870. JSTOR 2246135.
[16] Mahdavi Damghani, Babak (2012). "The Misleading Value of Measured Correlation". Wilmott. 2012 (1): 64–73. doi:10.1002/wilm.10167.
[17] Anscombe, Francis J. (1973). "Graphs in statistical analysis". The American Statistician. 27: 17–21. doi:10.2307/2682899. JSTOR 2682899.
Chapter 11
Regression analysis
In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between
a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a
quantile, or other location parameter of the conditional distribution of the dependent variable given the independent
variables. In all cases, the estimation target is a function of the independent variables called the regression function.
In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression
function which can be described by a probability distribution. A related but distinct approach is Necessary Condition
Analysis (NCA), which estimates the maximum (rather than average) value of the dependent variable for a given value
of the independent variable (ceiling line rather than central line) in order to identify what value of the independent
variable is necessary but not sufficient for a given value of the dependent variable.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of
machine learning. Regression analysis is also used to understand which among the independent variables are related
to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression
analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusions or false relationships, so caution is advisable;[1] for example, correlation does not imply causation.
Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of the data generating process, and
how it relates to the regression approach being used. Since the true form of the data-generating process is generally
not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression
methods can give misleading results.[2][3]
In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification.[4] The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems.[5]
11.1 History
The earliest form of regression was the method of least squares, which was published by Legendre in 1805,[6] and by
Gauss in 1809.[7] Legendre and Gauss both applied the method to the problem of determining, from astronomical
observations, the orbits of bodies about the Sun (mostly comets, but also later the then newly discovered minor
planets). Gauss published a further development of the theory of least squares in 1821,[8] including a version of the Gauss–Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean).[9][10] For Galton, regression had only this biological meaning,[11][12] but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context.[13][14] In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925.[15][16][17] Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821.
In the 1950s and 1960s, economists used electromechanical desk calculators to calculate regressions. Before 1970,
it sometimes took up to 24 hours to receive the result from one regression.[18]
Regression methods continue to be an area of active research. In recent decades, new methods have been developed
for robust regression, regression involving correlated responses such as time series and growth curves, regression
in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex
data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian
methods for regression, regression in which the predictor variables are measured with error, regression with more
predictor variables than observations, and causal inference with regression.
Y \approx f(X, \beta)

The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of the function f must be specified. Sometimes the form of this function is based on knowledge about the relationship between Y and X that does not rely on the data. If no such knowledge is available, a flexible or convenient form for f is chosen.
Assume now that the vector of unknown parameters β is of length k. In order to perform a regression analysis the user must provide information about the dependent variable Y:
If N data points of the form (Y, X) are observed, where N < k, most classical approaches to regression analysis cannot be performed: since the system of equations defining the regression model is underdetermined, there are not enough data to recover β.
If exactly N = k data points are observed, and the function f is linear, the equations Y = f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of N equations with N unknowns (the elements of β), which has a unique solution as long as the X are linearly independent. If f is nonlinear, a solution may not exist, or many solutions may exist.
The most common situation is where N > k data points are observed. In this case, there is enough information in the data to estimate a unique value for β that best fits the data in some sense, and the regression model when applied to the data can be viewed as an overdetermined system in β.
In the last case, the regression analysis provides the tools for:
11.2.1 Necessary number of independent measurements
Consider a regression model which has three unknown parameters, β0, β1, and β2. Suppose an experimenter performs 10 measurements all at exactly the same value of independent variable vector X (which contains the independent variables X1, X2, and X3). In this case, regression analysis fails to give a unique set of estimated values for the three unknown parameters; the experimenter did not provide enough information. The best one can do is to estimate the average value and the standard deviation of the dependent variable Y. Similarly, measuring at two different values of X would give enough data for a regression with two unknowns, but not for three or more unknowns.
If the experimenter had performed measurements at three different values of the independent variable vector X, then regression analysis would provide a unique set of estimates for the three unknown parameters in β.
In the case of general linear regression, the above statement is equivalent to the requirement that the matrix XᵀX is invertible.
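A minimal numerical sketch of this identifiability requirement (the model and data are invented for illustration): with repeated measurements at a single value of the independent variable, XᵀX is rank deficient and the parameters cannot be estimated, whereas measurements at three distinct values make XᵀX invertible.

```python
import numpy as np

def design(x):
    """Design matrix for the quadratic model y = b0 + b1*x + b2*x**2."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])

# Ten measurements, all at the same value of x: X^T X has rank 1, not 3,
# so the three parameters cannot be identified.
X_same = design([2.0] * 10)
print(np.linalg.matrix_rank(X_same.T @ X_same))    # 1

# Measurements spread over three different values of x: X^T X is invertible.
X_three = design([1.0, 2.0, 3.0] * 4)
print(np.linalg.matrix_rank(X_three.T @ X_three))  # 3
```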
11.2.2 Statistical assumptions
When the number of measurements, N, is larger than the number of unknown parameters, k, and the measurement errors are normally distributed, then the excess of information contained in (N − k) measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression.
variables may include values aggregated by areas. With aggregated data the modifiable areal unit problem can cause extreme variation in regression parameters.[21] When analyzing data aggregated by political boundaries, postal codes or census areas, results may be very distinct with a different choice of units.
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n.
In multiple linear regression, there are several independent variables or functions of independent variables.
Adding a term in x_i^2 to the preceding regression gives:

y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad i = 1, \dots, n.

This is still linear regression; although the expression on the right hand side is quadratic in the independent variable x_i, it is linear in the parameters β0, β1 and β2.
In both cases, ε_i is an error term and the subscript i indexes a particular observation.
Returning our attention to the straight line case: Given a random sample from the population, we estimate the population parameters and obtain the sample linear regression model:

\widehat{y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 x_i.

The residual, e_i = y_i − ŷ_i, is the difference between the value of the dependent variable predicted by the model, ŷ_i, and the true value of the dependent variable, y_i. One method of estimation is ordinary least squares. This method obtains parameter estimates that minimize the sum of squared residuals, SSE,[22][23] also sometimes denoted RSS:

\mathrm{SSE} = \sum_{i=1}^{n} e_i^2.

Minimization of this function results in a set of normal equations, a set of simultaneous linear equations in the parameters, which are solved to yield the parameter estimators, β̂0 and β̂1.
In the case of simple regression, the formulas for the least squares estimates are

\widehat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad\text{and}\quad \widehat{\beta}_0 = \bar{y} - \widehat{\beta}_1 \bar{x},

where x̄ is the mean (average) of the x values and ȳ is the mean of the y values.
Under the assumption that the population error term has a constant variance, the estimate of that variance is given by:

\widehat{\sigma}_{\varepsilon}^2 = \frac{\mathrm{SSE}}{n-2}.

This is called the mean square error (MSE) of the regression. The denominator is the sample size reduced by the number of model parameters estimated from the same data, (n − p) for p regressors or (n − p − 1) if an intercept is used.[24] In this case, p = 1 so the denominator is n − 2.
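The closed-form estimates and the MSE above translate directly into code. This is a minimal NumPy sketch with made-up data, not a substitute for a statistics package:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.3])
n = len(x)

# Least squares estimates for the straight-line model y = b0 + b1*x + e.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals, sum of squared residuals, and mean square error
# (one regressor plus an intercept, so the denominator is n - 2).
resid = y - (b0 + b1 * x)
sse = np.sum(resid ** 2)
mse = sse / (n - 2)
print(b0, b1, mse)
```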
The standard errors of the parameter estimates are given by

\widehat{\sigma}_{\beta_0} = \widehat{\sigma}_{\varepsilon}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}},
\qquad
\widehat{\sigma}_{\beta_1} = \widehat{\sigma}_{\varepsilon}\sqrt{\frac{1}{\sum (x_i - \bar{x})^2}}.

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.
11.4.1 General linear model

In the more general multiple regression model, the residual can be written as

e_i = y_i - \widehat{\beta}_1 x_{i1} - \cdots - \widehat{\beta}_p x_{ip}.
The normal equations are

\sum_{i=1}^{n}\sum_{k=1}^{p} X_{ij} X_{ik} \widehat{\beta}_k = \sum_{i=1}^{n} X_{ij} y_i, \quad j = 1, \dots, p.
In matrix notation, the normal equations are written as

(\mathbf{X}^\top\mathbf{X})\,\widehat{\boldsymbol{\beta}} = \mathbf{X}^\top\mathbf{Y},

where the ij element of X is x_ij, the i element of the column vector Y is y_i, and the j element of β̂ is β̂_j. Thus X is n×p, Y is n×1, and β̂ is p×1. The solution is

\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}.
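In practice the normal equations are solved numerically rather than by forming the inverse explicitly. The sketch below (with randomly generated illustrative data) solves (XᵀX)β̂ = XᵀY with a linear solver and checks the answer against NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations (X^T X) beta = X^T Y directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer from the library least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```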
11.4.2 Diagnostics
11.4.3 Limited dependent variables

The phrase "limited dependent" is used in econometric statistics for categorical and constrained variables.
The response variable may be non-continuous (limited to lie on some subset of the real line). For binary (zero
or one) variables, if analysis proceeds with least-squares linear regression, the model is called the linear probability
model. Nonlinear models for binary dependent variables include the probit and logit model. The multivariate probit
model is a standard method of estimating a joint relationship between several binary dependent variables and some
independent variables. For categorical variables with more than two values there is the multinomial logit. For ordinal
variables with more than two values, there are the ordered logit and ordered probit models. Censored regression
models may be used when the dependent variable is only sometimes observed, and Heckman correction type models
may be used when the sample is not randomly selected from the population of interest. An alternative to such
procedures is linear regression based on polychoric correlation (or polyserial correlations) between the categorical
variables. Such procedures differ in the assumptions made about the distribution of the variables in the population. If
the variable is positive with low values and represents the repetition of the occurrence of an event, then count models
like the Poisson regression or the negative binomial model may be used instead.
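To make one of these models concrete, here is a minimal sketch of a logit (logistic regression) fit by Newton–Raphson in NumPy; the data are simulated for the example, and a real analysis would normally rely on an established statistics package rather than this hand-rolled routine.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-X @ b)) by Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        # Newton step: b += (X^T W X)^{-1} X^T (y - p)
        b += np.linalg.solve((X.T * W) @ X, X.T @ (y - p))
    return b

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])     # intercept plus one predictor
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.random(200) < true_p).astype(float)

print(fit_logit(X, y))   # estimates should be near [0.5, 2.0]
```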
The further the extrapolation goes outside the data, the more room there is for the model to fail due to differences between the assumptions and the sample data or the true values.
It is generally advised that when performing extrapolation, one should accompany the estimated value of the dependent variable with a prediction interval that represents the uncertainty. Such intervals tend to expand rapidly as the values of the independent variable(s) move outside the range covered by the observed data.
For such reasons and others, some tend to say that it might be unwise to undertake extrapolation.[25]
However, this does not cover the full set of modelling errors that may be being made: in particular, the assumption
of a particular form for the relation between Y and X. A properly conducted regression analysis will include an
assessment of how well the assumed form is matched by the observed data, but it can only do so within the range of
values of the independent variables actually available. This means that any extrapolation is particularly reliant on the
assumptions being made about the structural form of the regression relationship. Best-practice advice here is that a
linear-in-variables and linear-in-parameters relationship should not be chosen simply for computational convenience,
but that all available knowledge should be deployed in constructing a regression model. If this knowledge includes
the fact that the dependent variable cannot go outside a certain range of values, this can be made use of in selecting
the model even if the observed dataset has no values particularly near such bounds. The implications of this step
of choosing an appropriate functional form for the regression can be great when extrapolation is considered. At a
minimum, it can ensure that any extrapolation arising from a fitted model is "realistic" (or in accord with what is
known).
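The rapid widening of prediction intervals outside the observed range can be seen from the standard simple-regression interval formula. The sketch below uses made-up data and assumes SciPy is available for the t quantile; it evaluates the interval half-width at a point inside the data and at points increasingly far outside it.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.7, 3.9, 4.1, 5.2])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # residual standard error

def pi_halfwidth(x0, level=0.95):
    """Half-width of the prediction interval for a new observation at x0."""
    t = stats.t.ppf(0.5 + level / 2, df=n - 2)
    return t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))

for x0 in (3.0, 10.0, 50.0):   # inside the data, then further and further outside
    print(x0, round(pi_halfwidth(x0), 2))
```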
11.9 Software
Main article: List of statistical packages
All major statistical software packages perform least squares regression analysis and inference. Simple linear regression and multiple regression using least squares can be done in some spreadsheet applications and on some calculators. While many statistical software packages can perform various types of nonparametric and robust regression, these methods are less standardized; different software packages implement different methods, and a method with a given name may be implemented differently in different packages. Specialized regression software has been developed for use in fields such as survey analysis and neuroimaging.
11.11 References
[1] Armstrong, J. Scott (2012). "Illusions in Regression Analysis". International Journal of Forecasting (forthcoming). 28 (3): 689. doi:10.1016/j.ijforecast.2012.02.001.
[2] David A. Freedman, Statistical Models: Theory and Practice, Cambridge University Press (2005)
[3] R. Dennis Cook; Sanford Weisberg, "Criticism and Influence Analysis in Regression", Sociological Methodology, Vol. 13. (1982), pp. 313–361
[4] Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. p. 3. "Cases [...] in which the aim is to assign each input vector to one of a finite number of discrete categories, are called classification problems. If the desired output consists of one or more continuous variables, then the task is called regression."
[5] Waegeman, Willem; De Baets, Bernard; Boullart, Luc (2008). "ROC analysis in ordinal regression learning". Pattern Recognition Letters. 29: 1–9. doi:10.1016/j.patrec.2007.07.019.
[6] A.M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, Firmin Didot, Paris, 1805. "Sur la Méthode des moindres quarrés" appears as an appendix.
[7] C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum. (1809)
[8] C.F. Gauss. Theoria combinationis observationum erroribus minimis obnoxiae. (1821/1823)
[9] Mogull, Robert G. (2004). Second-Semester Applied Statistics. Kendall/Hunt Publishing Company. p. 59. ISBN 0-7575-1181-3.
[10] Galton, Francis (1989). "Kinship and Correlation (reprinted 1989)". Statistical Science. Institute of Mathematical Statistics. 4 (2): 80–86. doi:10.1214/ss/1177012581. JSTOR 2245330.
[11] Francis Galton. "Typical laws of heredity", Nature 15 (1877), 492–495, 512–514, 532–533. (Galton uses the term "reversion" in this paper, which discusses the size of peas.)
[12] Francis Galton. Presidential address, Section H, Anthropology. (1885) (Galton uses the term "regression" in this paper, which discusses the height of humans.)
[13] Yule, G. Udny (1897). "On the Theory of Correlation". Journal of the Royal Statistical Society. Blackwell Publishing. 60 (4): 812–54. doi:10.2307/2979746. JSTOR 2979746.
[14] Pearson, Karl; Yule, G.U.; Blanchard, Norman; Lee, Alice (1903). "The Law of Ancestral Heredity". Biometrika. Biometrika Trust. 2 (2): 211–236. doi:10.1093/biomet/2.2.211. JSTOR 2331683.
[15] Fisher, R.A. (1922). "The goodness of fit of regression formulae, and the distribution of regression coefficients". Journal of the Royal Statistical Society. Blackwell Publishing. 85 (4): 597–612. doi:10.2307/2341124. JSTOR 2341124.
[16] Ronald A. Fisher (1954). Statistical Methods for Research Workers (Twelfth ed.). Edinburgh: Oliver and Boyd. ISBN 0-05-002170-2.
[17] Aldrich, John (2005). "Fisher and Regression". Statistical Science. 20 (4): 401–417. doi:10.1214/088342305000000331. JSTOR 20061201.
[18] Rodney Ramcharan. Regressions: Why Are Economists Obsessed with Them? March 2006. Accessed 2011-12-03.
[19] N. Cressie (1996) Change of Support and the Modifiable Areal Unit Problem. Geographical Systems 3:159–180.
[20] Fotheringham, A. Stewart; Brunsdon, Chris; Charlton, Martin (2002). Geographically weighted regression: the analysis of spatially varying relationships (Reprint ed.). Chichester, England: John Wiley. ISBN 978-0-471-49616-8.
[21] Fotheringham, AS; Wong, DWS (1 January 1991). "The modifiable areal unit problem in multivariate statistical analysis". Environment and Planning A. 23 (7): 1025–1044. doi:10.1068/a231025.
[22] M. H. Kutner, C. J. Nachtsheim, and J. Neter (2004), Applied Linear Regression Models, 4th ed., McGraw-Hill/Irwin, Boston (p. 25)
[23] N. Ravishankar and D. K. Dey (2002), A First Course in Linear Model Theory, Chapman and Hall/CRC, Boca Raton (p. 101)
[24] Steel, R.G.D, and Torrie, J. H., Principles and Procedures of Statistics with Special Reference to the Biological Sciences, McGraw Hill, 1960, page 288.
[25] Chiang, C.L, (2003) Statistical methods of analysis, World Scientific. ISBN 981-238-310-7 - page 274 section 9.7.4 "interpolation vs extrapolation"
[26] Good, P. I.; Hardin, J. W. (2009). Common Errors in Statistics (And How to Avoid Them) (3rd ed.). Hoboken, New Jersey:
Wiley. p. 211. ISBN 978-0-470-45798-6.
[27] Tofallis, C. (2009). "Least Squares Percentage Regression". Journal of Modern Applied Statistical Methods. 7: 526–534. doi:10.2139/ssrn.1406472.
[28] YangJing Long (2009). "Human age estimation by metric learning for regression problems" (PDF). Proc. International Conference on Computer Analysis of Images and Patterns: 74–82.
Chapter 12
Multivariate statistics
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more
than one outcome variable. The application of multivariate statistics is multivariate analysis.
Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical implementation of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the actual problem being studied.
In addition, multivariate statistics is concerned with multivariate probability distributions, in terms of both
how these can be used to represent the distributions of observed data;
how they can be used as part of statistical inference, particularly where several different quantities are of interest to the same analysis.
Certain types of problem involving multivariate data, for example simple linear regression and multiple regression,
are not usually considered as special cases of multivariate statistics because the analysis is dealt with by considering
the (univariate) conditional distribution of a single outcome variable given the other variables.
6. Redundancy analysis (RDA) is similar to canonical correlation analysis but allows the user to derive a specified number of synthetic variables from one set of (independent) variables that explain as much variance as possible in another (independent) set. It is a multivariate analogue of regression.
7. Correspondence analysis (CA), or reciprocal averaging, finds (like PCA) a set of synthetic variables that summarise the original set. The underlying model assumes chi-squared dissimilarities among records (cases).
8. Canonical (or constrained) correspondence analysis (CCA) for summarising the joint variation in two sets of variables (like redundancy analysis); combination of correspondence analysis and multivariate regression analysis. The underlying model assumes chi-squared dissimilarities among records (cases).
9. Multidimensional scaling comprises various algorithms to determine a set of synthetic variables that best represent the pairwise distances between records. The original method is principal coordinates analysis (PCoA;
based on PCA).
10. Discriminant analysis, or canonical variate analysis, attempts to establish whether a set of variables can be used
to distinguish between two or more groups of cases.
11. Linear discriminant analysis (LDA) computes a linear predictor from two sets of normally distributed data to allow for classification of new observations (see the sketch following this list).
12. Clustering systems assign objects into groups (called clusters) so that objects (cases) from the same cluster are more similar to each other than objects from different clusters.
13. Recursive partitioning creates a decision tree that attempts to correctly classify members of the population
based on a dichotomous dependent variable.
14. Artificial neural networks extend regression and clustering methods to non-linear multivariate models.
15. Statistical graphics such as tours, parallel coordinate plots, scatterplot matrices can be used to explore multivariate data.
16. Simultaneous equations models involve more than one regression equation, with different dependent variables, estimated together.
17. Vector autoregression involves simultaneous regressions of various time series variables on their own and each other's lagged values.
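To make one of the methods above concrete (item 11), here is a minimal linear discriminant analysis sketch in NumPy. The two classes are assumed to share a common covariance matrix, the data are synthetic, and the discriminant direction is w = Σ⁻¹(μ₁ − μ₀) with a midpoint threshold; real applications would typically use a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])              # shared covariance (assumed)
class0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
class1 = rng.multivariate_normal([2.0, 1.0], cov, size=200)

mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)
pooled = ((len(class0) - 1) * np.cov(class0, rowvar=False) +
          (len(class1) - 1) * np.cov(class1, rowvar=False)) / (len(class0) + len(class1) - 2)

# Fisher's linear discriminant direction and a midpoint decision threshold.
w = np.linalg.solve(pooled, mu1 - mu0)
threshold = w @ (mu0 + mu1) / 2

def classify(obs):
    """Assign a new observation to class 1 if its projection exceeds the threshold."""
    return int(obs @ w > threshold)

print(classify(np.array([2.0, 1.0])), classify(np.array([0.0, 0.0])))   # 1, 0
```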
12.3 History
Anderson's 1958 textbook, An Introduction to Multivariate Analysis,[3] educated a generation of theorists and applied statisticians; Anderson's book emphasizes hypothesis testing via likelihood ratio tests and the properties of power functions: admissibility, unbiasedness and monotonicity.[4][5]
12.6 References
[1] Hidalgo, B; Goodman, M (2013). "Multivariate or multivariable regression?". Am J Public Health. 103: 39–40. doi:10.2105/AJPH.2012.300897. PMC 3518362. PMID 23153131.
[2] Unsophisticated analysts of bivariate Gaussian problems may find useful a crude but accurate method of accurately gauging probability by simply taking the sum S of the N residuals' squares, subtracting the sum Sm at minimum, dividing this difference by Sm, multiplying the result by (N − 2) and taking the inverse anti-ln of half that product.
[3] T.W. Anderson (1958) An Introduction to Multivariate Analysis, New York: Wiley ISBN 0471026409; 2e (1984) ISBN
0471889873; 3e (2003) ISBN 0471360910
[4] Sen, Pranab Kumar; Anderson, T. W.; Arnold, S. F.; Eaton, M. L.; Giri, N. C.; Gnanadesikan, R.; Kendall, M. G.; Kshirsagar, A. M.; et al. (June 1986). "Review: Contemporary Textbooks on Multivariate Statistical Analysis: A Panoramic Appraisal and Critique". Journal of the American Statistical Association. 81 (394): 560–564. doi:10.2307/2289251. ISSN 0162-1459. JSTOR 2289251. (Pages 560–561)
[5] Schervish, Mark J. (November 1987). "A Review of Multivariate Analysis". Statistical Science. 2 (4): 396–413. doi:10.1214/ss/1177013111. ISSN 0883-4237. JSTOR 2245530.
Chapter 13
Data collection
Adélie penguins are identified and weighed each time they cross the automated weighbridge on their way to or from the sea.[1]
Data collection is the process of gathering and measuring information on targeted variables in an established systematic fashion, which then enables one to answer relevant questions and evaluate outcomes. The data collection component of research is common to all fields of study, including the physical and social sciences, humanities and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that then translates to rich data analysis and allows the building of a convincing and credible answer to questions that have been posed.
13.1 Importance
Regardless of the field of study or preference for defining data (quantitative or qualitative), accurate data collection is essential to maintaining the integrity of research. Both the selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood
of errors occurring.
A formal data collection process is necessary as it ensures that data gathered are both defined and accurate and that subsequent decisions based on arguments embodied in the findings are valid.[2] The process provides both a baseline
from which to measure and in certain cases a target on what to improve.
13.2 Types
Generally there are four types of data collection and they are:
1. Surveys: Standardized paper-and-pencil or phone questionnaires that ask predetermined questions.
2. Interviews: Structured or unstructured one-on-one directed conversations with key individuals or leaders in a
community.
3. Focus groups: Structured interviews with small groups of like individuals using standardized questions, follow-up
questions, and exploration of other topics that arise to better understand participants.
4. Action Research: An intervention that is practicable (researcher does something to implant a modification or intervention in a situation that is researchable).
Consequences from improperly collected data include:
Inability to answer research questions accurately;
Inability to repeat and validate the study.
13.4 References
[1] Lescroël, A. L.; Ballard, G.; Grémillet, D.; Authier, M.; Ainley, D. G. (2014). Descamps, Sébastien, ed. "Antarctic Climate Change: Extreme Events Disrupt Plastic Phenotypic Response in Adélie Penguins". PLoS ONE. 9 (1): e85291. doi:10.1371/journal.pone.0085291. PMC 3906005. PMID 24489657.
[2] Data Collection and Analysis By Dr. Roger Sapsford, Victor Jupp ISBN 0-7619-5046-X
[3] Weimer, J. (ed.) (1995). Research Techniques in Human Engineering. Englewood Cliffs, NJ: Prentice Hall ISBN 0-13-097072-7
Chapter 14
Time series
Time series: random data plus trend, with best-fit line and different applied filters
A time series is a series of data points listed (or graphed) in time order. Most commonly, a time series is a sequence
taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series
are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.
Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern
recognition, econometrics, mathematical finance, weather forecasting, intelligent transport and trajectory forecasting,[1]
earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and
largely in any domain of applied science and engineering which involves temporal measurements.
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and
other characteristics of the data. Time series forecasting is the use of a model to predict future values based on
previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time.[2]
Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional
studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also
distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for
house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations
further apart. In addition, time series models will often make use of the natural one-way ordering of time so that
values for a given period will be expressed as deriving in some way from past values, rather than from future values
(see time reversibility.)
Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data
(i.e. sequences of characters, such as letters and words in the English language[3] ).
14.3 Analysis
There are several types of motivation and data analysis available for time series which are appropriate for different purposes.
14.3.1 Motivation
In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the primary
goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication
engineering it is used for signal detection and estimation, while in the context of data mining, pattern recognition and
machine learning time series analysis can be used for clustering, classification, query by content, anomaly detection
as well as forecasting.
14.3.2 Exploratory analysis

14.3.3 Curve fitting
as an aid for data visualization,[17][18] to infer values of a function where no data are available,[19] and to summarize the relationships among two or more variables.[20] Extrapolation refers to the use of a fitted curve beyond the range of the observed data,[21] and is subject to a degree of uncertainty[22] since it may reflect the method used to construct the curve as much as it reflects the observed data.
The construction of economic time series involves the estimation of some components for some dates by interpolation
between values (benchmarks) for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available
information ("reading between the lines").[23] Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a related series known for all relevant dates.[24] Alternatively polynomial interpolation or spline interpolation is used, where piecewise polynomial functions are fit into time intervals such that they fit smoothly together. A different problem which is closely related to interpolation is the approximation of a complicated function by a simple function (also called regression). The main difference between regression and interpolation is that polynomial regression gives a single polynomial that models the entire data set. Spline interpolation, however, yields a piecewise continuous function composed of many polynomials to model the data set.
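The contrast between a single global polynomial and piecewise spline interpolation can be shown in a few lines; np.polyfit and SciPy's CubicSpline are used here as convenient stand-ins for the methods described (the series values are invented).

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # observation times
y = np.array([1.0, 2.2, 1.8, 3.5, 3.0, 4.1])   # observed series values

# Polynomial regression: one cubic fitted to the whole data set
# (it need not pass through the observed points).
poly = np.poly1d(np.polyfit(t, y, deg=3))

# Spline interpolation: piecewise cubics that pass through every point
# and join smoothly at the interior times.
spline = CubicSpline(t, y)

t_new = 2.5
print(poly(t_new), float(spline(t_new)))
```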
Extrapolation is the process of estimating, beyond the original observation range, the value of a variable on the basis
of its relationship with another variable. It is similar to interpolation, which produces estimates between known
observations, but extrapolation is subject to greater uncertainty and a higher risk of producing meaningless results.
14.3.4 Function Approximation
14.3.5 Prediction and forecasting
In statistics, prediction is a part of statistical inference. One particular approach to such inference is known as
predictive inference, but the prediction can be undertaken within any of the several approaches to statistical inference. Indeed, one description of statistics is that it provides a means of transferring knowledge about a sample of a
population to the whole population, and to other related populations, which is not necessarily the same as prediction
over time. When information is transferred across time, often to specific points in time, the process is known as
forecasting.
Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the
time series, representing what might happen over non-specific time-periods in the future
Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate
future, given knowledge of the most recent outcomes (forecasting).
Forecasting on time series is usually done using automated statistical software packages and programming
languages, such as R, S, SAS, SPSS, Minitab, Pandas (Python) and many others.
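As a small illustration of the kind of routine such packages automate, a naive moving-average smoother and one-step-ahead forecast in pandas (with invented monthly values) might look like this:

```python
import pandas as pd

# Monthly series (made-up values) indexed by time.
s = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119],
              index=pd.date_range("2020-01-01", periods=10, freq="MS"))

# Trailing three-month moving average; its last value can serve as a
# naive forecast for the next period.
ma = s.rolling(window=3).mean()
print(ma.iloc[-1])
```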
14.3.6 Classification

14.3.7 Regression analysis

14.3.8 Signal estimation

14.3.9 Segmentation
14.4 Models
Models for time series data can have many forms and represent different stochastic processes. When modeling
variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models,
the integrated (I) models, and the moving average (MA) models. These three classes depend linearly on previous
data points.[27] Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive
integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA)
model generalizes the former three. Extensions of these classes to deal with vector-valued data are available under
the heading of multivariate time-series models and sometimes the preceding acronyms are extended by including
an initial V for vector, as in VAR for vector autoregression. An additional set of extensions of these models is
available for use where the observed time-series is driven by some "forcing" time-series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous".
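A minimal sketch of the simplest member of this family, an AR(2) model fitted by least squares on a simulated series (the coefficients and data are invented; dedicated time-series packages provide full ARIMA machinery):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: x_t = 0.6*x_{t-1} - 0.2*x_{t-2} + e_t
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()

# Estimate the AR(2) coefficients by regressing x_t on its two lagged values.
Y = x[2:]
X = np.column_stack([x[1:-1], x[:-2]])
phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(phi)   # should be close to [0.6, -0.2]
```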
Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility
of producing a chaotic time series. However, more importantly, empirical investigations can indicate the advantage
of using predictions derived from non-linear models, over those from linear models, as for example in nonlinear
autoregressive exogenous models. Further references on nonlinear time series analysis: (Kantz and Schreiber),[28]
and (Abarbanel) [29]
Among other types of non-linear time series models, there are models to represent the changes of variance over
time (heteroskedasticity). These models represent autoregressive conditional heteroskedasticity (ARCH) and the
collection comprises a wide variety of representation (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.).
Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast
to other possible representations of locally varying variability, where the variability might be modelled as being driven
by a separate time-varying process, as in a doubly stochastic model.
In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets and
wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques
decompose a given time series, attempting to illustrate time dependence at multiple scales. See also Markov switching
multifractal (MSMF) techniques for modeling volatility evolution.
A Hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be
a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian
network. HMM models are widely used in speech recognition, for translating a time series of spoken words into text.
14.4.1 Notation
A number of dierent notations are in use for time-series analysis. A common notation specifying a time series X
that is indexed by the natural numbers is written
X = {X1 , X2 , ...}.
Another common notation is
Y = {Y_t : t ∈ T},
where T is the index set.
14.4.2 Conditions
There are two sets of conditions under which much of the theory is built:
Stationary process
Ergodic process
However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order
stationarity. Both models and applications can be developed under each of these conditions, although the models in
the latter case might be considered as only partly specied.
In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations
where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis which
makes use of a time–frequency representation of a time-series or signal.[30]
14.4.3 Models
14.4.4 Measures
Time series metrics or features that can be used for time series classification or regression analysis:[34]
Univariate linear measures
Moment (mathematics)
Spectral band power
Spectral edge frequency
Accumulated Energy (signal processing)
Characteristics of the autocorrelation function
Hjorth parameters
FFT parameters
Autoregressive model parameters
MannKendall test
Univariate non-linear measures
Measures based on the correlation sum
Correlation dimension
Correlation integral
Correlation density
Correlation entropy
Approximate entropy[35]
Sample entropy
Fourier entropy
Wavelet entropy
Rényi entropy
Higher-order methods
Marginal predictability
Dynamical similarity index
State space dissimilarity measures
Lyapunov exponent
Permutation methods
Local flow
Other univariate measures
Algorithmic complexity
Kolmogorov complexity estimates
Hidden Markov Model states
Surrogate time series and surrogate correction
Loss of recurrence (degree of non-stationarity)
Bivariate linear measures
Maximum linear cross-correlation
Linear Coherence (signal processing)
Bivariate non-linear measures
Non-linear interdependence
Dynamical Entrainment (physics)
Measures for Phase synchronization
Measures for Phase locking
Similarity measures:[36]
Cross-correlation
Dynamic Time Warping[32]
Hidden Markov Models
Edit distance
Total correlation
Newey–West estimator
Prais–Winsten transformation
Data as Vectors in a Metrizable Space
Minkowski distance
Mahalanobis distance
Data as Time Series with Envelopes
Global Standard Deviation
Local Standard Deviation
Windowed Standard Deviation
Data Interpreted as Stochastic Series
Pearson product-moment correlation coefficient
Spearman's rank correlation coefficient
Data Interpreted as a Probability Distribution Function
Kolmogorov–Smirnov test
Cramér–von Mises criterion
14.5 Visualization
Time series can be visualized with two categories of chart: overlapping charts and separated charts. Overlapping charts display all time series on the same layout, while separated charts present them on different layouts (but aligned for comparison purposes).[37]
14.5.1 Overlapping Charts
Braided Graphs
Line Charts
Slope Graphs
GapChart
14.5.2 Separated Charts
Horizon Graphs
Reduced Line Charts (small multiples)
Silhouette Graph
Circular Silhouette Graph
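A minimal plotting sketch of the two categories, assuming Python with Matplotlib (which is not among the techniques named above): an overlapping line chart places all series on one set of axes, while reduced line charts (small multiples) place each series on its own aligned panel.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
series = rng.standard_normal((3, 100)).cumsum(axis=1)   # three illustrative series

# Overlapping chart: all series share one set of axes.
fig1, ax = plt.subplots()
for i, s in enumerate(series):
    ax.plot(s, label=f"series {i + 1}")
ax.legend()
ax.set_title("Overlapping line chart")

# Separated charts (small multiples): one panel per series, with aligned axes.
fig2, axes = plt.subplots(len(series), 1, sharex=True, sharey=True)
for axis, s in zip(axes, series):
    axis.plot(s)
fig2.suptitle("Reduced line charts (small multiples)")

plt.show()
```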
14.6 Applications
Fractal geometry, using a deterministic Cantor structure, is used to model surface topography, and on this basis recent
advances in the thermoviscoelastic creep contact of rough surfaces are introduced. Various viscoelastic idealizations
are used to model the surface materials, for example Maxwell, Kelvin-Voigt, standard linear solid and Jeffrey media.
Asymptotic power laws, through hypergeometric series, were used to express the surface creep as a function of remote
forces, body temperatures and time.[38]
14.7 Software
Working with time series data is a relatively common use of statistical analysis software. As a result, there are many
offerings, both commercial and open source; a minimal open-source sketch follows the list. Some examples include:
CRAN supplementary statistics package for R[39]
Analysis and Forecasting with Weka[40]
Predictive modeling with GMDH Shell[41]
Functions and Modeling in the Wolfram Language[42]
Time Series Objects in MATLAB[43]
SAS/ETS in SAS software[44]
Expert Modeler in IBM SPSS Statistics and IBM SPSS Modeler
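As a hedged, minimal example of the kind of analysis these packages support, the following sketch fits a simple ARIMA model and produces a short forecast using Python's statsmodels library, an open-source option that is not on the list above; the data values are invented for illustration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# An illustrative series: the values are made up for this example.
data = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])

# Fit a simple ARIMA(1, 1, 1) model and forecast the next three observations.
model = ARIMA(data, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=3))
```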
14.8 See also
Distributed lag
Estimation theory
Forecasting
Hurst exponent
Monte Carlo method
Random walk
Scaled correlation
Seasonal adjustment
Sequence analysis
Signal processing
Trend estimation
Unevenly spaced time series
Time series database
14.9 References
[1] Zissis, Dimitrios; Xidias, Elias; Lekkas, Dimitrios (2015). Real-time vessel behavior prediction. Evolving Systems. 7:
1–12. doi:10.1007/s12530-015-9133-5.
[2] Imdadullah. Time Series Analysis. Basic Statistics and Data Analysis. itfeature.com. Retrieved 2 January 2014.
[3] Lin, Jessica; Keogh, Eamonn; Lonardi, Stefano; Chiu, Bill (2003). A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and
knowledge discovery. New York: ACM Press. doi:10.1145/882082.882086.
[4] Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley. ISBN 0471082562.
[5] Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall. ISBN 0130415006.
[6] Sandra Lach Arlinghaus, PHB Practical Handbook of Curve Fitting. CRC Press, 1994.
[7] William M. Kolb. Curve Fitting for Programmable Calculators. Syntec, Incorporated, 1984.
[8] S.S. Halli, K.V. Rao. 1992. Advanced Techniques of Population Analysis. ISBN 0306439972 Page 165 (cf. ... functions
are fulfilled if we have a good to moderate fit for the observed data.)
[9] The Signal and the Noise: Why So Many Predictions Fail-but Some Don't. By Nate Silver
[10] Data Preparation for Data Mining: Text. By Dorian Pyle.
[11] Numerical Methods in Engineering with MATLAB. By Jaan Kiusalaas. Page 24.
[12] Numerical Methods in Engineering with Python 3. By Jaan Kiusalaas. Page 21.
[13] Numerical Methods of Curve Fitting. By P. G. Guest, Philip George Guest. Page 349.
[14] See also: Mollier
[15] Fitting Models to Biological Data Using Linear and Nonlinear Regression. By Harvey Motulsky, Arthur Christopoulos.
[16] Regression Analysis By Rudolf J. Freund, William J. Wilson, Ping Sa. Page 269.
[17] Visual Informatics. Edited by Halimah Badioze Zaman, Peter Robinson, Maria Petrou, Patrick Olivier, Heiko Schröder.
Page 689.
[18] Numerical Methods for Nonlinear Engineering Models. By John R. Hauser. Page 227.
[19] Methods of Experimental Physics: Spectroscopy, Volume 13, Part 1. By Claire Marton. Page 150.
[20] Encyclopedia of Research Design, Volume 1. Edited by Neil J. Salkind. Page 266.
[21] Community Analysis and Planning Techniques. By Richard E. Klosterman. Page 1.
[22] An Introduction to Risk and Uncertainty in the Evaluation of Environmental Investments. DIANE Publishing. Pg 69
[23] Hamming, Richard. Numerical methods for scientists and engineers. Courier Corporation, 2012.
[24] Friedman, Milton. The interpolation of time series by related series. Journal of the American Statistical Association
57.300 (1962): 729-757.
[25] Gandhi, Sorabh, Luca Foschini, and Subhash Suri. Space-efficient online approximation of time series data: Streams,
amnesia, and out-of-order. Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010.
[26] Lawson, Charles L.; Hanson, Richard J. (1995). Solving Least Squares Problems. Philadelphia: Society for Industrial and
Applied Mathematics. ISBN 0898713560.
[27] Gershenfeld, N. (1999). The Nature of Mathematical Modeling. New York: Cambridge University Press. pp. 205208.
ISBN 0521570956.
[28] Kantz, Holger; Thomas, Schreiber (2004). Nonlinear Time Series Analysis. London: Cambridge University Press. ISBN
978-0521529020.
[29] Abarbanel, Henry (Nov 25, 1997). Analysis of Observed Chaotic Data. New York: Springer. ISBN 978-0387983721.
[30] Boashash, B. (ed.) (2003). Time-Frequency Signal Analysis and Processing: A Comprehensive Reference. Oxford: Elsevier Science. ISBN 0-08-044335-4.
[31] Nikolić, D.; Muresan, R. C.; Feng, W.; Singer, W. (2012). Scaled correlation analysis: a better way to compute a cross-correlogram. European Journal of Neuroscience. 35 (5): 742–762. doi:10.1111/j.1460-9568.2011.07987.x.
[32] Sakoe, Hiroaki; Chiba, Seibi (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE
Transactions on Acoustics, Speech and Signal Processing. doi:10.1109/TASSP.1978.1163055.
[33] Goutte, Cyril; Toft, Peter; Rostrup, Egill; Nielsen, Finn Å.; Hansen, Lars Kai (1999). On Clustering fMRI Time Series.
NeuroImage. doi:10.1006/nimg.1998.0391.
[34] Mormann, Florian; Andrzejak, Ralph G.; Elger, Christian E.; Lehnertz, Klaus (2007). Seizure prediction: the long and
winding road. Brain. 130 (2): 314–333. doi:10.1093/brain/awl241. PMID 17008335.
[35] Land, Bruce; Elias, Damian. Measuring the 'Complexity' of a time series.
[36] Ropella, G. E. P.; Nag, D. A.; Hunt, C. A. (2003). Similarity measures for automated comparison of in silico and in vitro
experimental results. Engineering in Medicine and Biology Society. 3: 2933–2936. doi:10.1109/IEMBS.2003.1280532.
[37] Tominski, Christian; Aigner, Wolfgang. The TimeViz Browser: A Visual Survey of Visualization Techniques for Time-Oriented Data. Retrieved 1 June 2014.
[38] Osama Abuzeid, Anas Al-Rabadi, Hashem Alkhaldi . Recent advancements in fractal geometric-based nonlinear time
series solutions to the micro-quasistatic thermoviscoelastic creep for rough surfaces in contact, Mathematical Problems in
Engineering, Volume 2011, Article ID 691270
[39] Hyndman, Rob J (2016-01-22). CRAN Task View: Time Series Analysis.
[40] Time Series Analysis and Forecasting with Weka - Pentaho Data Mining - Pentaho Wiki. wiki.pentaho.com. Retrieved
2016-07-07.
[41] Time Series Analysis & Forecasting Software 2016 [Free Download]. Retrieved 2016-07-07.
[42] Time Series - Wolfram Language Documentation. reference.wolfram.com. Retrieved 2016-07-07.
[43] Time Series Objects - MATLAB & Simulink. www.mathworks.com. Retrieved 2016-07-07.
[44] Econometrics and Time Series Analysis, SAS/ETS Software. Retrieved 2016-07-07.