
Maths

made EASY

Contents
Articles
ARITHMETIC MEAN
Arithmetic mean
Statistics
Mathematics
Median
Mean
Statistical population
Sampling (statistics)
Probability theory
Normal distribution
Standard deviation
Random variable
Probability distribution
Real number
Variance
Probability density function
Cumulative distribution function
Expected value
Discrete probability distribution
Continuous probability distribution
Probability mass function
Continuous function
Measure (mathematics)
Bias of an estimator
Probability
Pierre-Simon Laplace
Integral
Function (mathematics)
Calculus
Average

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

ARITHMETIC MEAN

Arithmetic mean
In mathematics and statistics, the arithmetic mean, often referred to as simply the mean or average when the
context is clear, is a measure of the central tendency of a collection of numbers. The term "arithmetic mean" is
preferred in mathematics and statistics because it helps distinguish it from other averages, such as the geometric and
harmonic mean.
In addition to mathematics and statistics, the arithmetic mean is used frequently in fields such as economics,
sociology, and history, though it is used in almost every academic field to some extent. For example, per capita GDP
gives an approximation of the arithmetic average income of a nation's population.
While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is
greatly influenced by outliers. Notably, for skewed distributions, the arithmetic mean may not accord with one's
notion of "middle", and robust statistics such as the median may be a better description of central tendency.

Definition
Suppose we have a data set $\{x_1, x_2, \ldots, x_n\}$. Then the arithmetic mean $\bar{x}$ is defined via the equation

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$
If the list is a statistical population, then the mean of that population is called a population mean. If the list is a
statistical sample, we call the resulting statistic a sample mean.

Motivating properties
The arithmetic mean has several properties that make it useful, especially as a measure of central tendency. These
include:
• If numbers $x_1, \ldots, x_n$ have mean $\bar{x}$, then $(x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x}) = 0$. Since $x_i - \bar{x}$ is the
distance from a given number to the mean, one way to interpret this property is as saying that the numbers to the
left of the mean are balanced by the numbers to the right of the mean. The mean is the only single number for
which the residuals defined this way sum to zero.
• If it is required to use a single number $X$ as an estimate for the value of numbers $x_1, \ldots, x_n$, then the arithmetic
mean does this best, in the sense of minimizing the sum of squares $(x_i - X)^2$ of the residuals. (It follows that the
mean is also the best single predictor in the sense of having the lowest root mean squared error.)
• For a normal distribution, the arithmetic mean is equal to both the median and the mode, other measures of central
tendency.
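
To make these properties concrete, here is a minimal Python sketch (our own illustration, not part of the original article) that checks both numerically for a small data set:

```python
# Check two properties of the arithmetic mean on a toy data set.
data = [2.0, 3.0, 5.0, 10.0]
mean = sum(data) / len(data)

# Property 1: the residuals about the mean sum to zero (up to rounding).
print(sum(x - mean for x in data))  # ~0.0

# Property 2: the mean minimizes the sum of squared residuals;
# any other candidate X gives a strictly larger sum.
def sum_sq(center):
    return sum((x - center) ** 2 for x in data)

print(sum_sq(mean) < sum_sq(mean + 0.5))  # True
print(sum_sq(mean) < sum_sq(mean - 0.5))  # True
```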

Problems
The arithmetic mean may be misinterpreted as the median, implying that most values are higher or lower than is
actually the case. If elements in the data set increase arithmetically when placed in some order, then the
median and arithmetic average are equal. For example, consider the data set {1,2,3,4}. The average is 2.5, as is
the median. However, when we consider a data set that cannot be arranged into an arithmetic progression, such
as {1,2,4,8,16}, the median and arithmetic average can differ significantly. In this case the arithmetic average is 6.2
and the median is 4. When one looks at the arithmetic average of a data set, one must note that the average
value can vary significantly from most values in the data set.
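
The figures in this example are easy to confirm with Python's standard statistics module (a quick illustrative check):

```python
from statistics import mean, median

print(mean([1, 2, 3, 4]), median([1, 2, 3, 4]))          # 2.5 2.5
print(mean([1, 2, 4, 8, 16]), median([1, 2, 4, 8, 16]))  # 6.2 4
```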
There are applications of this phenomenon in fields such as economics. For example, since the 1980s in the United
States median income has increased more slowly than the arithmetic average of income. Ben Bernanke has
speculated that the difference can be accounted for by technology, and less so by the decline in labour unions
and other factors.[1]

Angles
Particular care must be taken when using cyclic data such as phases or angles. Naïvely taking the arithmetic mean of
1° and 359° yields a result of 180°. This is incorrect for two reasons:
• Firstly, angle measurements are only defined up to a factor of 360° (or 2π, if measuring in radians). Thus one
could as easily call these 1° and −1°, or 1° and 719° – each of which gives a different average.
• Secondly, in this situation, 0° (equivalently, 360°) is geometrically a better average value: there is lower
dispersion about it (the points are both 1° from it, and 179° from 180°, the putative average).
In general application such an oversight will lead to the average value artificially moving towards the middle of the
numerical range. A solution to this problem is to use the optimization formulation (viz, define the mean as the central
point: the point about which one has the lowest dispersion), and redefine the difference as a modular distance (i.e.,
the distance on the circle: so the modular distance between 1° and 359° is 2°, not 358°).
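
One standard way to carry this out in practice is the circular mean: treat each angle as a unit vector, average the vectors, and take the angle of the result. The sketch below is our own illustration of that technique (the function name is invented); for the two points in this example it agrees with the modular-distance formulation described above:

```python
import math

def circular_mean_deg(angles_deg):
    """Average angles as unit vectors, then recover the angle with atan2."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360

print(circular_mean_deg([1, 359]))  # ~0.0, not the naive 180.0
```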

See also
• Assumed mean
• Average
• Central tendency
• Empirical measure
• Fréchet mean
• Generalized mean
• Geometric mean
• Inequality of arithmetic and geometric means
• Mean
• Median
• Mode
• Muirhead's inequality
• Sample mean and covariance
• Sample size
• Standard deviation
• Summary statistics
• Variance

Further reading
• Darrell Huff, How to lie with statistics, Victor Gollancz, 1954 (ISBN 0-393-31072-8).

External links
• Calculations and comparisons between arithmetic and geometric mean of two numbers [2]
• Mean or Average [3]

References
[1] Ben S. Bernanke. "The Level and Distribution of Economic Well-Being" (http://www.federalreserve.gov/newsevents/speech/bernanke20070206a.htm). Retrieved 23 July 2010.
[2] http://www.sengpielaudio.com/calculator-geommean.htm
[3] http://people.revoledu.com/kardi/tutorial/BasicMath/Average/index.html

Statistics
Statistics is the science of the collection, organization, and interpretation of data.[1] [2] It deals with all aspects of
this, including the planning of data collection in terms of the design of surveys and experiments.[1]
A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful
application of statistical analysis. Such people have often gained this experience through working in any of a wide
number of fields. There is also a discipline called mathematical statistics, which is concerned with the theoretical
basis of the subject.
The word statistics can either be singular or plural.[3] When it refers to the discipline, "statistics" is singular, as in
"Statistics is an art." When it refers to quantities (such as mean and median) calculated from a set of data,[4] statistics
is plural, as in "These statistics are misleading."

Scope
Statistics is considered by some to be a mathematical science pertaining to the collection, analysis, interpretation
or explanation, and presentation of data,[5] while others consider it a branch of mathematics[6] concerned with
collecting and interpreting data.[7] Because of its empirical roots and its focus on applications, statistics is
usually considered to be a distinct mathematical science rather than a branch of mathematics.[8] [9]

[Figure: More probability density will be found the closer one gets to the expected (mean) value in a normal
distribution. Statistics used in standardized testing assessment are shown. The scales include standard deviations,
cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines, and percentages in standard
nines.]

Statisticians improve the quality of data with the design of experiments and survey sampling. Statistics also
provides tools for prediction and forecasting using data and statistical models. Statistics is applicable to a wide
variety of academic disciplines, including natural and social sciences, government, and business.

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. This
is useful in research, when communicating the results of experiments. In addition, patterns in the data may be
modeled in a way that accounts for randomness and uncertainty in the observations, and are then used to draw
inferences about the process or population being studied; this is called inferential statistics. Inference is a vital
element of scientific advance, since it provides a prediction (based on data) for where a theory logically leads. To
further prove the guiding theory, these predictions are tested as well, as part of the scientific method. If the inference
holds true, then the descriptive statistics of the new data increase the soundness of that hypothesis. Descriptive
statistics and inferential statistics (a.k.a. predictive statistics) together comprise applied statistics.[10]

History
Some scholars pinpoint the origin of statistics to 1663, with the publication of Natural and Political Observations
upon the Bills of Mortality by John Graunt.[11] Early applications of statistical thinking revolved around the needs of
states to base policy on demographic and economic data, hence its stat- etymology. The scope of the discipline of
statistics broadened in the early 19th century to include the collection and analysis of data in general. Today,
statistics is widely employed in government, business, and the natural and social sciences.
Its mathematical foundations were laid in the 17th century with the development of probability theory by Blaise
Pascal and Pierre de Fermat. Probability theory arose from the study of games of chance. The method of least
squares was first described by Carl Friedrich Gauss around 1794. The use of modern computers has expedited
large-scale statistical computation, and has also made possible new methods that are impractical to perform
manually.

Overview
In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a population or
process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom
composing a crystal". A population can also be composed of observations of a process at various times, with the data
from each observation serving as a different member of the overall group. Data collected about this kind of
"population" constitutes what is called a time series.
For practical reasons, a chosen subset of the population called a sample is studied — as opposed to compiling data
about the entire group (an operation called census). Once a sample that is representative of the population is
determined, data is collected for the sample members in an observational or experimental setting. This data can then
be subjected to statistical analysis, serving two related purposes: description and inference.
• Descriptive statistics summarize the population data by describing what was observed in the sample numerically
or graphically. Numerical descriptors include mean and standard deviation for continuous data types (like heights
or weights), while frequency and percentage are more useful in terms of describing categorical data (like race).
• Inferential statistics uses patterns in the sample data to draw inferences about the population represented,
accounting for randomness. These inferences may take the form of: answering yes/no questions about the data
(hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within
the data (correlation), modeling relationships within the data (regression), extrapolation, interpolation, or other
modeling techniques like ANOVA, time series, and data mining.
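
As a toy illustration of the two purposes (all numbers invented), the sketch below first describes a sample, then draws a simple inference about the population mean using a large-sample z approximation:

```python
from statistics import NormalDist, mean, stdev

# Invented sample: heights (cm) of 10 randomly chosen adults.
sample = [168, 172, 171, 169, 174, 177, 170, 173, 166, 175]

# Descriptive statistics: summarize what was observed in the sample.
m, s, n = mean(sample), stdev(sample), len(sample)
print(f"mean = {m:.1f} cm, standard deviation = {s:.1f} cm")

# Inferential statistics: is a population mean of 170 cm plausible?
# (z approximation; a t-test would be more exact for n = 10.)
z = (m - 170) / (s / n ** 0.5)
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided p-value = {p:.2f}")
```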
“... it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not
study the mechanism of rain; only whether it will rain.”

Dennis Lindley, "The Philosophy of Statistics", The Statistician (2000).

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a
data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if
they were connected. For example, a study of annual income that also looks at age of death might find that poor
people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may
or may not be the cause of one another. The correlation phenomena could be caused by a third, previously
unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to
immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply
causation.)
For a sample to be used as a guide to an entire population, it is important that it is truly representative of that
overall population. Representative sampling assures that the inferences and conclusions can be safely extended from
the sample to the population as a whole. A major problem lies in determining the extent to which the sample chosen
is actually representative. Statistics offers methods to estimate and correct for any random trending within the sample
and data collection procedures. There are also methods for designing experiments that can lessen these issues at the
outset of a study, strengthening its capability to discern truths about the population. Statisticians describe stronger
methods as more "robust". (See experimental design.)
Randomness is studied using the mathematical discipline of probability theory. Probability is used in "Mathematical
statistics" (alternatively, "statistical theory") to study the sampling distributions of sample statistics and, more
generally, the properties of statistical procedures. The use of any statistical method is valid when the system or
population under consideration satisfies the assumptions of the method.
Misuse of statistics can produce subtle, but serious errors in description and interpretation — subtle in the sense that
even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision
errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper
use of statistics. Even when statistics are correctly applied, the results can be difficult to interpret for those lacking
expertise. The statistical significance of a trend in the data — which measures the extent to which a trend could be
caused by random variation in the sample — may or may not agree with an intuitive sense of its significance. The set
of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is
referred to as statistical literacy.

Statistical methods

Experimental and observational studies


A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on
the effect of changes in the values of predictors or independent variables on dependent variables or response. There
are two major types of causal statistical studies: experimental studies and observational studies. In both types of
studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable
is observed. The difference between the two types lies in how the study is actually conducted. Each can be very
effective. An experimental study involves taking measurements of the system under study, manipulating the system,
and then taking additional measurements using the same procedure to determine if the manipulation has modified the
values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead,
data are gathered and correlations between predictors and response are investigated.

Experiments
The basic steps of a statistical experiment are:
1. Planning the research, including finding the number of replicates of the study, using the following information:
preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated
experimental variability. Consideration of the selection of experimental subjects and the ethics of research is
necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard
treatment or control, to allow an unbiased estimate of the difference in treatment effects.
2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized
assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At
this stage, the experimenters and statisticians write the experimental protocol that shall guide the performance of
the experiment and that specifies the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol and analyzing the data following the
experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.
Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the
working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in
determining whether increased illumination would increase the productivity of the assembly line workers. The
researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and
checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under
the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures,
specifically for the lack of a control group and blindness. The Hawthorne effect refers to finding that an outcome (in
this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more
productive not because the lighting was changed but because they were being observed.

Observational study
An example of an observational study is one that explores the correlation between smoking and lung cancer. This
type of study typically uses a survey to collect observations about the area of interest and then performs statistical
analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through
a case-control study, and then look for the number of cases of lung cancer in each group.

Levels of measurement
There are four main levels of measurement used in statistics:
• nominal,
• ordinal,
• interval, and
• ratio.
They have different degrees of usefulness in statistical research. Ratio measurements have both a meaningful zero
value and the distances between different measurements defined; they provide the greatest flexibility in statistical
methods that can be used for analyzing the data. Interval measurements have meaningful distances between
measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in
Celsius or Fahrenheit). Ordinal measurements have imprecise differences between consecutive values, but have a
meaningful order to those values. Nominal measurements have no meaningful rank order among values.
Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically,
sometimes they are grouped together as categorical variables, whereas ratio and interval measurements are grouped
together as quantitative or continuous variables due to their numerical nature.
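
As a small illustration of why the levels matter, the mapping below (our own sketch, following the usual textbook conventions) lists summaries that become meaningful at each level; each level also inherits everything meaningful at the levels below it:

```python
# Summaries that first become meaningful at each measurement level.
MEANINGFUL_SUMMARIES = {
    "nominal":  ["mode", "frequency table"],
    "ordinal":  ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio":    ["geometric mean", "coefficient of variation"],
}
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def summaries_for(level):
    """All summaries meaningful at `level`, including inherited ones."""
    idx = LEVELS.index(level)
    return [s for lvl in LEVELS[: idx + 1] for s in MEANINGFUL_SUMMARIES[lvl]]

print(summaries_for("ordinal"))
# ['mode', 'frequency table', 'median', 'percentiles']
```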

Key terms used in statistics

Null hypothesis
Interpretation of statistical information can often involve the development of a null hypothesis in that the assumption
is that whatever is proposed as a cause has no effect on the variable being measured.
The best illustration for a novice is the predicament encountered by a jury trial. The null hypothesis, H0, asserts that
the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty.
The indictment comes because of suspicion of guilt. The H0 (status quo) stands in opposition to H1 and is
maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this
case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not
necessarily accept H0 but fails to reject H0. While to the casual observer the difference appears moot,
misunderstanding the difference is one of the most common and arguably most serious errors made by
non-statisticians. Failure to reject the H0 does NOT prove that the H0 is true, as any crook with a good lawyer who
gets off because of insufficient evidence can attest. While one cannot "prove" a null hypothesis, one can test how
close it is to being true with a power test, which tests for type II errors.

Error
Working from a null hypothesis two basic forms of error are recognised:
• Type I errors where the null hypothesis is falsely rejected giving a "false positive".
• Type II errors where the null hypothesis fails to be rejected and an actual difference between populations is
missed.
Error also refers to the extent to which individual observations in a sample differ from a central value, such as the
sample or population mean. Many statistical methods seek to minimize the mean-squared error, and these are called
"methods of least squares."
Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as
random (noise) or systematic (bias), but other types of error (e.g., blunders, such as when an analyst reports
incorrect units) can also be important.
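
To see what a Type I error rate means operationally, here is a small simulation (our own illustration): when the null hypothesis is in fact true, a test run at the 5% level should falsely reject about 5% of the time.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
ALPHA, TRIALS, rejections = 0.05, 2000, 0

for _ in range(TRIALS):
    # The null hypothesis is true here: the population mean really is 0.
    sample = [random.gauss(0, 1) for _ in range(30)]
    z = mean(sample) / (stdev(sample) / len(sample) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # z approximation to the t-test
    rejections += p < ALPHA                 # a rejection here is a Type I error

print(rejections / TRIALS)  # close to 0.05
```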

Confidence intervals
Most studies will only sample part of a population and then the result is used to interpret the null hypothesis in the
context of the whole population. Any estimates obtained from the sample only approximate the population value.
Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the
whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval of a
procedure is a range where, if the sampling and analysis were repeated under the same conditions, the interval would
include the true (population) value 95% of the time. This does not imply that the probability that the true value is in
the confidence interval is 95%. One quantity that is a probability for an estimated value is the credible interval from
Bayesian statistics.
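
A minimal sketch of computing a 95% confidence interval for a mean, using the normal approximation (our own illustration with invented data):

```python
from statistics import NormalDist, mean, stdev

sample = [4.1, 5.3, 4.8, 5.0, 4.6, 5.5, 4.9, 5.2]  # invented data
m, s, n = mean(sample), stdev(sample), len(sample)

z95 = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% interval
half_width = z95 * s / n ** 0.5
print(f"95% CI for the mean: [{m - half_width:.2f}, {m + half_width:.2f}]")
```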

Significance
Statistics rarely gives a simple yes/no answer to the question asked of it. Interpretation often comes down to
the level of statistical significance applied to the numbers, and often refers to the probability of a value accurately
rejecting the null hypothesis (sometimes referred to as the p-value).
Referring to statistical significance does not necessarily mean that the overall result is significant in real world terms.
For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small
beneficial effect, such that the drug will be unlikely to help the patient in a noticeable way.
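
The gap between statistical and practical significance is easy to demonstrate numerically: hold a negligible effect fixed and let the sample size grow, and the p-value shrinks toward zero. A sketch with invented numbers:

```python
from statistics import NormalDist

# Invented summary: a tiny 0.2-unit mean improvement (sd 10).
effect, sd = 0.2, 10.0
for n in (100, 10_000, 1_000_000):
    z = effect / (sd / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(z))
    print(f"n = {n:>9}: z = {z:5.2f}, p = {p:.4f}")
# The effect size never changes; only the p-value does.
```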

Examples
Some well-known statistical tests and procedures are:
• Analysis of variance (ANOVA)
• Chi-square test
• Correlation
• Factor analysis
• Mann–Whitney U
• Mean square weighted deviation (MSWD)
• Pearson product-moment correlation coefficient
• Regression analysis
• Spearman's rank correlation coefficient
• Student's t-test
• Time series analysis

Specialized disciplines
Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines
include:
• Actuarial science
• Applied information economics
• Biostatistics
• Business statistics
• Chemometrics (for analysis of data from chemistry)
• Data mining (applying statistics and pattern recognition to discover knowledge from data)
• Demography
• Econometrics
• Energy statistics
• Engineering statistics
• Epidemiology
• Geography and Geographic Information Systems, specifically in Spatial analysis
• Image processing
• Psychological statistics
• Reliability engineering
• Social statistics
In addition, there are particular types of statistical analysis that have also developed their own specialised
terminology and methodology:
• Bootstrap & Jackknife Resampling
• Statistical classification
• Statistical surveys
• Structured data analysis (statistics)
• Survival analysis
• Statistics in various sports, particularly baseball and cricket
Statistics is a key tool in business and manufacturing as well. It is used to understand variability in measurement
systems, to control processes (as in statistical process control or SPC), to summarize data, and to make
data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.

Statistical computing
The rapid and sustained increases in
computing power starting from the second
half of the 20th century have had a
substantial impact on the practice of
statistical science. Early statistical models
were almost always from the class of linear
models, but powerful computers, coupled
with suitable numerical algorithms, caused
an increased interest in nonlinear models
(such as neural networks) as well as the
creation of new types, such as generalized
linear models and multilevel models.

[Figure: gretl, an example of an open source statistical package]

Increased computing power has also led to the growing popularity of computationally intensive methods based on
resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of
Bayesian models more feasible. The computer revolution has implications for the future of statistics, with new
emphasis on "experimental" and "empirical" statistics. A large number of both general and special purpose
statistical software packages are now available.
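
As a flavor of the resampling methods mentioned above, here is a minimal bootstrap sketch (our own illustration) estimating the uncertainty of a sample median:

```python
import random
from statistics import median

random.seed(1)
sample = [12, 15, 9, 22, 14, 11, 18, 13, 16, 10]  # invented data

# Bootstrap: resample with replacement, recompute the statistic each time.
boot = sorted(median(random.choices(sample, k=len(sample)))
              for _ in range(5000))

# Percentile interval: the middle 95% of the bootstrap distribution.
lo, hi = boot[int(0.025 * 5000)], boot[int(0.975 * 5000)]
print(f"Bootstrap 95% interval for the median: [{lo}, {hi}]")
```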

Misuse
There is a general perception that statistical knowledge is all-too-frequently intentionally misused by finding ways to
interpret only the data that are favorable to the presenter. The famous saying, "There are three kinds of lies: lies,
damned lies, and statistics".[12] which was popularized in the USA by Samuel Clemens and incorrectly attributed by
him to Disraeli (1804–1881), has come to represent the general mistrust [and misunderstanding] of statistical
science. Harvard President Lawrence Lowell wrote in 1909 that statistics, "...like veal pies, are good if you know the
person that made them, and are sure of the ingredients."
If various studies appear to contradict one another, then the public may come to distrust such studies. For example,
one study may suggest that a given diet or activity raises blood pressure, while another may suggest that it lowers
blood pressure. The discrepancy can arise from subtle variations in experimental design, such as differences in the
patient groups or research protocols, which are not easily understood by the non-expert. (Media reports usually omit
this vital contextual information entirely, because of its complexity.)
By choosing (or rejecting, or modifying) a certain sample, results can be manipulated. Such manipulations need not
be malicious or devious; they can arise from unintentional biases of the researcher. The graphs used to summarize
data can also be misleading.
Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required
by law or regulation, forces one hypothesis (the null hypothesis) to be "favored," and can also seem to exaggerate the
importance of minor differences in large studies. A difference that is highly statistically significant can still be of no
practical significance. (See criticism of hypothesis testing and controversy over the null hypothesis.)
One response is to give greater emphasis to the p-value, rather than simply reporting whether a hypothesis is rejected
at the given level of significance. The p-value, however, does not indicate the size of the effect. Another increasingly
common approach is to report confidence intervals. Although these are produced from the same calculations as those
of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.

Statistics applied to mathematics or the arts


Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was
"required learning" in most sciences. This has changed with use of statistics in non-inferential contexts. What was
once considered a dry subject, taken in many fields as a degree-requirement, is now viewed enthusiastically. Initially
derided by some mathematical purists, it is now considered essential methodology in certain areas.
• In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools
used in statistics to reveal underlying patterns, which may then lead to hypotheses.
• Methods of statistics including predictive methods in forecasting, are combined with chaos theory and fractal
geometry to create video works that are considered to have great beauty.
• The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were
artistically revealed. With the advent of computers, methods of statistics were applied to formalize such
distribution driven natural processes, in order to make and analyze moving video art.
• Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process
that only works some of the time, the occasion of which can be predicted using statistical methodology.
• Statistics is used predictively to create art, as in applications of statistical mechanics with the statistical or
stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of
artistry does not always come out as expected, it does behave within a range predictable using statistics.

See also
• Glossary of probability and statistics
• Index of statistics articles
• List of academic statistical associations
• List of national and international statistical services
• List of important publications in statistics
• List of statistical packages (software)
• Notation in probability and statistics
• Forecasting
• Foundations of statistics
• Multivariate statistics
• Official statistics
• Regression analysis
• Statistical consultants
• Statistician, List of statisticians
• Structural equation modeling
• Statistical literacy
• Statistical modeling

Related disciplines
• Biostatistics
• Computational biology
• Computational sociology
• Network biology
• Social science
• Sociology
• Positivism
• Social research

References
• Best, Joel (2001). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists.
University of California Press. ISBN 0-520-21978-3.
• Desrosières, Alain (2004). The Politics of Large Numbers: A History of Statistical Reasoning. Trans. Camille
Naish. Harvard University Press. ISBN 0-674-68932-1.
• Hacking, Ian (1990). The Taming of Chance. Cambridge University Press. ISBN 0-521-38884-8.
• Lindley, D.V. (1985). Making Decisions (2nd ed.). John Wiley & Sons. ISBN 0-471-90808-8.
• Tijms, Henk (2004). Understanding Probability: Chance Rules in Everyday life. Cambridge University Press.
ISBN 0-521-83329-9.

External links

Online non-commercial textbooks


• "A New View of Statistics" [13], by Will G. Hopkins, AUT University
• "NIST/SEMATECH e-Handbook of Statistical Methods" [14], by U.S. National Institute of Standards and
Technology and SEMATECH
• "Online Statistics: An Interactive Multimedia Course of Study" [15], by David Lane, Joan Lu, Camille Peres,
Emily Zitek, et al.
• "The Little Handbook of Statistical Practice" [16], by Gerard E. Dallal [17], Tufts University
• "StatSoft Electronic Textbook" [18], by StatSoft [19]

Other non-commercial resources


• Statistics [20] (OECD)
• Probability Web [21] (Carleton College)
• Free online statistics course with interactive practice exercises [22] (Carnegie Mellon University)
• Resources for Teaching and Learning about Probability and Statistics [23] (ERIC)
• Rice Virtual Lab in Statistics [24] (Rice University)
• Statistical Science Web [25] (University of Melbourne)
• Applied statistics applets [26]
• Statlib: data and software archives [27]
• StatProb [28] – peer reviewed Statistics and probability Wikipedia, Sponsored by a Collaborative of Statistics and
Probability Societies[29]

References
[1] Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
[2] The Free Online Dictionary (http://www.thefreedictionary.com/dict.asp?Word=statistics)
[3] "Statistics" (http://www.merriam-webster.com/dictionary/statistics). Merriam-Webster Online Dictionary.
[4] "Statistic" (http://www.merriam-webster.com/dictionary/statistic). Merriam-Webster Online Dictionary.
[5] Moses, Lincoln E. Think and Explain with Statistics, pp. 1–3. Addison-Wesley, 1986.
[6] Hays, William Lee, Statistics for the Social Sciences, Holt, Rinehart and Winston, 1973, p. xii, ISBN 978-0-03-077945-9
[7] Statistics at Encyclopedia of Mathematics (http://us.oocities.com/mathfair2002/school/plans.htm)
[8] Moore, David (1992). "Teaching Statistics as a Respectable Subject". Statistics for the Twenty-First Century. Washington, DC: The
Mathematical Association of America. pp. 14–25.
[9] Chance, Beth L.; Rossman, Allan J. (2005). "Preface" (http://www.rossmanchance.com/iscam/preface.pdf). Investigating Statistical
Concepts, Applications, and Methods. Duxbury Press. ISBN 978-0495050643.
[10] Anderson, D.R.; Sweeney, D.J.; Williams, T.A. Statistics: Concepts and Applications, pp. 5–9. West Publishing Company, 1986.
[11] Willcox, Walter (1938) The Founder of Statistics. (http://www.jstor.org/stable/1400906) Review of the International Statistical Institute
5(4):321–328.
[12] Leonard H. Courtney (1832–1918) in a speech at Saratoga Springs, New York, August 1895, in which this sentence appeared: "After all,
facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, 'Lies – damned lies – and
statistics,' still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of." This is the earliest
documented use of the exact phrase.
[13] http://sportsci.org/resource/stats/
[14] http://www.itl.nist.gov/div898/handbook/
[15] http://onlinestatbook.com/index.html
[16] http://www.StatisticalPractice.com
[17] http://www.tufts.edu/~gdallal/
[18] http://www.statsoft.com/textbook/stathome.html
[19] http://www.statsoft.com/index.htm
[20] http://stats.oecd.org/Index.aspx
[21] http://www.mathcs.carleton.edu/probweb/probweb.html
[22] http://oli.web.cmu.edu/openlearning/forstudents/freecourses/statistics
[23] http://www.ericdigests.org/2000-2/resources.htm
[24] http://www.onlinestatbook.com/rvls.html
[25] http://www.statsci.org
[26] http://www.mbhs.edu/~steind00/statistics.html
[27] http://lib.stat.cmu.edu/
[28] http://statprob.com/encyclopedia
[29] http://statprob.com/?op=about

Mathematics
Mathematics is the study of quantity, structure, space, and change.
Mathematicians seek out patterns,[2] [3] formulate new conjectures, and
establish truth by rigorous deduction from appropriately chosen axioms
and definitions.[4]
There is debate over whether mathematical objects such as numbers
and points exist naturally or are human creations. The mathematician
Benjamin Peirce called mathematics "the science that draws necessary
conclusions".[5] Albert Einstein, on the other hand, stated that "as far as
the laws of mathematics refer to reality, they are not certain; and as far
as they are certain, they do not refer to reality."[6]

[Figure: Euclid, Greek mathematician, 3rd century BC, as imagined by Raphael in this detail from The School of
Athens.[1]]

Through the use of abstraction and logical reasoning, mathematics
evolved from counting, calculation, measurement, and the systematic
study of the shapes and motions of physical objects. Practical
mathematics has been a human activity for as far back as written records exist. Rigorous arguments first appeared in
Greek mathematics, most notably in Euclid's Elements. Mathematics continued to develop, for example in China in
300 BC, in India in AD 100, and in the Muslim world in AD 800, until the Renaissance, when mathematical
innovations interacting with new scientific discoveries led to a rapid increase in the rate of mathematical discovery
that continues to the present day.[7]

Mathematics is used throughout the world as an essential tool in many fields, including natural science, engineering,
medicine, and the social sciences. Applied mathematics, the branch of mathematics concerned with application of
mathematical knowledge to other fields, inspires and makes use of new mathematical discoveries and sometimes
leads to the development of entirely new mathematical disciplines, such as statistics and game theory.
Mathematicians also engage in pure mathematics, or mathematics for its own sake, without having any application in
mind, although practical applications for what began as pure mathematics are often discovered.[8]

Etymology
The word "mathematics" comes from the Greek μάθημα (máthēma), which means learning, study, science, and
additionally came to have the narrower and more technical meaning "mathematical study", even in Classical times.[9]
Its adjective is μαθηματικός (mathēmatikós), related to learning, or studious, which likewise further came to mean
mathematical. In particular, μαθηματικὴ τέχνη (mathēmatikḗ tékhnē), Latin: ars mathematica, meant the
mathematical art.
The apparent plural form in English, like the French plural form les mathématiques (and the less commonly used
singular derivative la mathématique), goes back to the Latin neuter plural mathematica (Cicero), based on the Greek
plural τα μαθηματικά (ta mathēmatiká), used by Aristotle, and meaning roughly "all things mathematical"; although
it is plausible that English borrowed only the adjective mathematic(al) and formed the noun mathematics anew, after
the pattern of physics and metaphysics, which were inherited from the Greek.[10] In English, the noun mathematics
takes singular verb forms. It is often shortened to maths or, in English-speaking North America, math.

History
The evolution of mathematics might be seen as an ever-increasing
series of abstractions, or alternatively an expansion of subject matter.
The first abstraction, which is shared by many animals,[11] was
probably that of numbers: the realization that a collection of two apples
and a collection of two oranges (for example) have something in
common, namely quantity of their members.

In addition to recognizing how to count physical objects, prehistoric


peoples also recognized how to count abstract quantities, like time –
days, seasons, years.[12] Elementary arithmetic (addition, subtraction,
multiplication and division) naturally followed.
Since numeracy pre-dated writing, further steps were needed for
recording numbers such as tallies or the knotted strings called quipu
used by the Inca to store numerical data. Numeral systems have been
many and diverse, with the first known written numerals created by
Egyptians in Middle Kingdom texts such as the Rhind Mathematical
Papyrus.

[Figure: Pythagoras (c.570-c.495 BC) has commonly been given credit for discovering the Pythagorean theorem.
Well-known figures in Greek mathematics also include Euclid, Archimedes, and Thales.]

The earliest uses of mathematics were in trading, land measurement,


painting and weaving patterns and the recording of time. More
complex mathematics did not appear until around 3000 BC, when the
Babylonians and Egyptians began using arithmetic, algebra and
geometry for taxation and other financial calculations, for building and
construction, and for astronomy.[13] The systematic study of
mathematics in its own right began with the Ancient Greeks between
600 and 300 BC.[14]

Mathematics has since been greatly extended, and there has been a
fruitful interaction between mathematics and science, to the benefit of
both. Mathematical discoveries continue to be made today. According
to Mikhail B. Sevryuk, in the January 2006 issue of the Bulletin of the
American Mathematical Society, "The number of papers and books
included in the Mathematical Reviews database since 1940 (the first
year of operation of MR) is now more than 1.9 million, and more than
75 thousand items are added to the database each year. The overwhelming majority of works in this ocean contain
new mathematical theorems and their proofs."[15]

[Figure: Mayan numerals]

Inspiration, pure and applied mathematics, and aesthetics


Mathematics arises from many different kinds of problems. At first
these were found in commerce, land measurement, architecture and
later astronomy; nowadays, all sciences suggest problems studied by
mathematicians, and many problems arise within mathematics itself.
For example, the physicist Richard Feynman invented the path integral
formulation of quantum mechanics using a combination of
mathematical reasoning and physical insight, and today's string theory,
a still-developing scientific theory which attempts to unify the four
fundamental forces of nature, continues to inspire new mathematics.[16]
Some mathematics is only relevant in the area that inspired it, and is
applied to solve further problems in that area. But often mathematics
inspired by one area proves useful in many areas, and joins the general
stock of mathematical concepts. A distinction is often made between
pure mathematics and applied mathematics. However pure
mathematics topics often turn out to have applications, e.g. number
theory in cryptography. This remarkable fact that even the "purest"
mathematics often turns out to have practical applications is what
Eugene Wigner has called "the unreasonable effectiveness of
mathematics".[17] As in most areas of study, the explosion of knowledge in the scientific age has led to
specialization: there are now hundreds of specialized areas in mathematics and the latest Mathematics Subject
Classification runs to 46 pages.[18] Several areas of applied mathematics have merged with related traditions outside
of mathematics and become disciplines in their own right, including statistics, operations research, and computer
science.

[Figure: Sir Isaac Newton (1643-1727), an inventor of infinitesimal calculus.]

For those who are mathematically inclined, there is often a definite aesthetic aspect to much of mathematics. Many
mathematicians talk about the elegance of mathematics, its intrinsic aesthetics and inner beauty. Simplicity and
generality are valued. There is beauty in a simple and elegant proof, such as Euclid's proof that there are infinitely
many prime numbers, and in an elegant numerical method that speeds calculation, such as the fast Fourier transform.
G. H. Hardy in A Mathematician's Apology expressed the belief that these aesthetic considerations are, in
themselves, sufficient to justify the study of pure mathematics. He identified criteria such as significance,
unexpectedness, inevitability, and economy as factors that contribute to a mathematical aesthetic.[19] Mathematicians
often strive to find proofs of theorems that are particularly elegant, a quest Paul Erdős often referred to as finding
proofs from "The Book" in which God had written down his favorite proofs.[20] [21] The popularity of recreational
mathematics is another sign of the pleasure many find in solving mathematical questions.

Notation, language, and rigor


Most of the mathematical notation in use today was not invented until
the 16th century.[22] Before that, mathematics was written out in
words, a painstaking process that limited mathematical discovery.[23]
Euler (1707–1783) was responsible for many of the notations in use
today. Modern notation makes mathematics much easier for the
professional, but beginners often find it daunting. It is extremely
compressed: a few symbols contain a great deal of information. Like
musical notation, modern mathematical notation has a strict syntax
(which to a limited extent varies from author to author and from
discipline to discipline) and encodes information that would be
difficult to write in any other way.

[Figure: Leonhard Euler, who created and popularized much of the mathematical notation used today.]

Mathematical language can also be hard for beginners. Words such as
"or" and "only" have more precise meanings than in everyday speech.
Moreover, words such as "open" and "field" have been given specialized
mathematical meanings. Mathematical jargon includes technical terms
such as homeomorphism and integrable. But there is a reason for
special notation and technical jargon: mathematics requires more precision than everyday speech. Mathematicians
refer to this precision of language and logic as "rigor".

Mathematical proof is fundamentally a matter of rigor. Mathematicians


want their theorems to follow from axioms by means of systematic
reasoning. This is to avoid mistaken "theorems", based on fallible
intuitions, of which many instances have occurred in the history of the
subject.[24] The level of rigor expected in mathematics has varied over
time: the Greeks expected detailed arguments, but at the time of Isaac
Newton the methods employed were less rigorous. Problems inherent
in the definitions used by Newton would lead to a resurgence of careful
analysis and formal proof in the 19th century. Misunderstanding the
rigor is a cause for some of the common misconceptions of
mathematics. Today, mathematicians continue to argue among
themselves about computer-assisted proofs. Since large computations
are hard to verify, such proofs may not be sufficiently rigorous.[25]
[Figure: The infinity symbol ∞ in several typefaces.]
Axioms in traditional thought were "self-evident truths", but that
conception is problematic. At a formal level, an axiom is just a string of symbols, which has an intrinsic meaning
only in the context of all derivable formulas of an axiomatic system. It was the goal of Hilbert's program to put all of
mathematics on a firm axiomatic basis, but according to Gödel's incompleteness theorem every (sufficiently
powerful) axiomatic system has undecidable formulas; and so a final axiomatization of mathematics is impossible.
Nonetheless mathematics is often imagined to be (as far as its formal content) nothing but set theory in some
axiomatization, in the sense that every mathematical statement or proof could be cast into formulas within set
theory.[26]

Mathematics as science
Carl Friedrich Gauss referred to mathematics as "the Queen of the
Sciences".[28] In the original Latin Regina Scientiarum, as well as in
German Königin der Wissenschaften, the word corresponding to
science means (field of) knowledge. Indeed, this is also the original
meaning in English, and there is no doubt that mathematics is in this
sense a science. The specialization restricting the meaning to natural
science is of later date. If one considers science to be strictly about the
physical world, then mathematics, or at least pure mathematics, is not a
science. Albert Einstein stated that "as far as the laws of mathematics
refer to reality, they are not certain; and as far as they are certain,
they do not refer to reality."[6]

Many philosophers believe that mathematics is not experimentally


falsifiable, and thus not a science according to the definition of Karl
Popper.[29] However, in the 1930s important work in mathematical
logic convinced many mathematicians that mathematics cannot be
reduced to logic alone,[27] and Karl Popper concluded that "most
mathematical theories are, like those of physics and biology,
hypothetico-deductive: pure mathematics therefore turns out to be
much closer to the natural sciences whose hypotheses are conjectures, than it seemed even recently."[30] Other
thinkers, notably Imre Lakatos, have applied a version of falsificationism to mathematics itself.

[Figure: Carl Friedrich Gauss, himself known as the "prince of mathematicians", referred to mathematics as "the
Queen of the Sciences".]

An alternative view is that certain scientific fields (such as theoretical physics) are mathematics with axioms that are
intended to correspond to reality. In fact, the theoretical physicist, J. M. Ziman, proposed that science is public
knowledge and thus includes mathematics.[31] In any case, mathematics shares much in common with many fields in
the physical sciences, notably the exploration of the logical consequences of assumptions. Intuition and
experimentation also play a role in the formulation of conjectures in both mathematics and the (other) sciences.
Experimental mathematics continues to grow in importance within mathematics, and computation and simulation are
playing an increasing role in both the sciences and mathematics, weakening the objection that mathematics does not
use the scientific method. In his 2002 book A New Kind of Science, Stephen Wolfram argues that computational
mathematics deserves to be explored empirically as a scientific field in its own right.
The opinions of mathematicians on this matter are varied. Many mathematicians feel that to call their area a science
is to downplay the importance of its aesthetic side, and its history in the traditional seven liberal arts; others feel that
to ignore its connection to the sciences is to turn a blind eye to the fact that the interface between mathematics and its
applications in science and engineering has driven much development in mathematics. One way this difference of
viewpoint plays out is in the philosophical debate as to whether mathematics is created (as in art) or discovered (as
in science). It is common to see universities divided into sections that include a division of Science and Mathematics,
indicating that the fields are seen as being allied but that they do not coincide. In practice, mathematicians are
typically grouped with scientists at the gross level but separated at finer levels. This is one of many issues considered
in the philosophy of mathematics.
Mathematical awards are generally kept separate from their equivalents in science. The most prestigious award in
mathematics is the Fields Medal,[32] [33] established in 1936 and now awarded every 4 years. It is often considered
the equivalent of science's Nobel Prizes. The Wolf Prize in Mathematics, instituted in 1978, recognizes lifetime
achievement, and another major international award, the Abel Prize, was introduced in 2003. These are awarded for
a particular body of work, which may be innovation, or resolution of an outstanding problem in an established field.
A famous list of 23 such open problems, called "Hilbert's problems", was compiled in 1900 by German
mathematician David Hilbert. This list achieved great celebrity among mathematicians, and at least nine of the
problems have now been solved. A new list of seven important problems, titled the "Millennium Prize Problems",
was published in 2000. Solution of each of these problems carries a $1 million reward, and only one (the Riemann
hypothesis) is duplicated in Hilbert's problems.

Fields of mathematics
Mathematics can, broadly speaking, be subdivided into the study of
quantity, structure, space, and change (i.e. arithmetic, algebra,
geometry, and analysis). In addition to these main concerns, there are
also subdivisions dedicated to exploring links from the heart of
mathematics to other fields: to logic, to set theory (foundations), to the
empirical mathematics of the various sciences (applied mathematics),
and more recently to the rigorous study of uncertainty.
[Figure: An abacus, a simple calculating tool used since ancient times.]
Quantity
The study of quantity starts with numbers, first the familiar natural numbers and integers ("whole numbers") and
arithmetical operations on them, which are characterized in arithmetic. The deeper properties of integers are studied
in number theory, from which come such popular results as Fermat's Last Theorem. Number theory also holds two
problems widely considered to be unsolved: the twin prime conjecture and Goldbach's conjecture.
As the number system is further developed, the integers are recognized as a subset of the rational numbers
("fractions"). These, in turn, are contained within the real numbers, which are used to represent continuous
quantities. Real numbers are generalized to complex numbers. These are the first steps of a hierarchy of numbers that
goes on to include quaternions and octonions. Consideration of the natural numbers also leads to the transfinite
numbers, which formalize the concept of "infinity". Another area of study is size, which leads to the cardinal
numbers and then to another conception of infinity: the aleph numbers, which allow meaningful comparison of the
size of infinitely large sets.

Natural numbers · Integers · Rational numbers · Real numbers · Complex numbers

Structure
Many mathematical objects, such as sets of numbers and functions, exhibit internal structure as a consequence of
operations or relations that are defined on the set. Mathematics then studies properties of those sets that can be
expressed in terms of that structure; for instance number theory studies properties of the set of integers that can be
expressed in terms of arithmetic operations. Moreover, it frequently happens that different such structured sets (or
structures) exhibit similar properties, which makes it possible, by a further step of abstraction, to state axioms for a
class of structures, and then study at once the whole class of structures satisfying these axioms. Thus one can study
groups, rings, fields and other abstract systems; together such studies (for structures defined by algebraic operations)
constitute the domain of abstract algebra. By its great generality, abstract algebra can often be applied to seemingly
unrelated problems; for instance a number of ancient problems concerning compass and straightedge constructions
were finally solved using Galois theory, which involves field theory and group theory. Another example of an
algebraic theory is linear algebra, which is the general study of vector spaces, whose elements called vectors have
both quantity and direction, and can be used to model (relations between) points in space. This is one example of the
phenomenon that the originally unrelated areas of geometry and algebra have very strong interactions in modern
mathematics. Combinatorics studies ways of enumerating the number of objects that fit a given structure.
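
As a toy illustration of stating axioms and then checking whether a given structure satisfies them (our own sketch, not from the article), the group axioms can be verified by brute force for small finite sets:

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms: closure, associativity,
    a unique identity, and an inverse for every element."""
    elems = list(elements)
    if any(op(a, b) not in elems for a, b in product(elems, elems)):
        return False  # not closed
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, elems, elems)):
        return False  # not associative
    ids = [e for e in elems if all(op(e, a) == a == op(a, e) for a in elems)]
    if len(ids) != 1:
        return False  # no identity element
    e = ids[0]
    return all(any(op(a, b) == e for b in elems) for a in elems)  # inverses

# The integers modulo 6 form a group under addition ...
print(is_group(range(6), lambda a, b: (a + b) % 6))  # True
# ... but not under multiplication (0 has no inverse).
print(is_group(range(6), lambda a, b: (a * b) % 6))  # False
```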

Combinatorics · Number theory · Group theory · Graph theory · Order theory

Space
The study of space originates with geometry – in particular, Euclidean geometry. Trigonometry is the branch of
mathematics that deals with relationships between the sides and the angles of triangles and with the trigonometric
functions; it combines space and numbers, and encompasses the well-known Pythagorean theorem. The modern
study of space generalizes these ideas to include higher-dimensional geometry, non-Euclidean geometries (which
play a central role in general relativity) and topology. Quantity and space both play a role in analytic geometry,
differential geometry, and algebraic geometry. Within differential geometry are the concepts of fiber bundles and
calculus on manifolds, in particular, vector and tensor calculus. Within algebraic geometry is the description of
geometric objects as solution sets of polynomial equations, combining the concepts of quantity and space, and also
the study of topological groups, which combine structure and space. Lie groups are used to study space, structure,
and change. Topology in all its many ramifications may have been the greatest growth area in 20th century
mathematics; it includes point-set topology, set-theoretic topology, algebraic topology and differential topology. In
particular, instances of modern day topology are metrizability theory, axiomatic set theory, homotopy theory, and
Morse theory. Topology also includes the now solved Poincaré conjecture and the controversial four color theorem,
whose only proof, by computer, has never been verified by a human.

Geometry · Trigonometry · Differential geometry · Topology · Fractal geometry · Measure theory

Change
Understanding and describing change is a common theme in the natural sciences, and calculus was developed as a
powerful tool to investigate it. Functions arise here, as a central concept describing a changing quantity. The rigorous
study of real numbers and functions of a real variable is known as real analysis, with complex analysis the equivalent
field for the complex numbers. Functional analysis focuses attention on (typically infinite-dimensional) spaces of
functions. One of many applications of functional analysis is quantum mechanics. Many problems lead naturally to
relationships between a quantity and its rate of change, and these are studied as differential equations. Many
phenomena in nature can be described by dynamical systems; chaos theory makes precise the ways in which many of
these systems exhibit unpredictable yet still deterministic behavior.

Calculus · Vector calculus · Differential equations · Dynamical systems · Chaos theory · Complex analysis

Foundations and philosophy


In order to clarify the foundations of mathematics, the fields of mathematical logic and set theory were developed.
Mathematical logic includes the mathematical study of logic and the applications of formal logic to other areas of
mathematics; set theory is the branch of mathematics that studies sets or collections of objects. Category theory,
which deals in an abstract way with mathematical structures and relationships between them, is still in development.
The phrase "crisis of foundations" describes the search for a rigorous foundation for mathematics that took place
from approximately 1900 to 1930.[34] Some disagreement about the foundations of mathematics continues to the
present day. The crisis of foundations was stimulated by a number of controversies at the time, including the controversy
over Cantor's set theory and the Brouwer-Hilbert controversy.
Mathematical logic is concerned with setting mathematics within a rigorous axiomatic framework, and studying the
implications of such a framework. As such, it is home to Gödel's incompleteness theorems which (informally) imply
that any formal system that contains basic arithmetic, if sound (meaning that all theorems that can be proven are
true), is necessarily incomplete (meaning that there are true theorems which cannot be proved in that system).
Whatever finite collection of number-theoretical axioms is taken as a foundation, Gödel showed how to construct a
formal statement that is a true number-theoretical fact, but which does not follow from those axioms. Therefore no
formal system is a complete axiomatization of full number theory. Modern logic is divided into recursion theory,
model theory, and proof theory, and is closely linked to theoretical computer science.

Mathematical logic · Set theory · Category theory

Theoretical computer science


Theoretical computer science includes computability theory, computational complexity theory, and information
theory. Computability theory examines the limitations of various theoretical models of the computer, including the
most powerful known model – the Turing machine. Complexity theory is the study of tractability by computer; some
problems, although theoretically solvable by computer, are so expensive in terms of time or space that solving them
is likely to remain practically infeasible, even with the rapid advance of computer hardware. A famous problem is the
"P=NP?" problem, one of the Millennium Prize Problems.[35] Finally, information theory is concerned with the
amount of data that can be stored on a given medium, and hence deals with concepts such as compression and
entropy.

Theory of computation · Cryptography

Applied mathematics
Applied mathematics considers the use of abstract mathematical tools in solving concrete problems in the sciences,
business, and other areas.
Applied mathematics has significant overlap with the discipline of statistics, whose theory is formulated
mathematically, especially with probability theory. Statisticians (working as part of a research project) "create data
that makes sense" with random sampling and with randomized experiments; the design of a statistical sample or
experiment specifies the analysis of the data (before the data become available). When reconsidering data from
experiments and samples or when analyzing data from observational studies, statisticians "make sense of the data"
using the art of modelling and the theory of inference – with model selection and estimation; the estimated models
and consequential predictions should be tested on new data.[36]
Computational mathematics proposes and studies methods for solving mathematical problems that are typically too
large for human numerical capacity. Numerical analysis studies methods for problems in analysis using ideas of
functional analysis and techniques of approximation theory; numerical analysis includes the study of approximation
and discretization broadly with special concern for rounding errors. Other areas of computational mathematics
include computer algebra and symbolic computation.

Mathematical physics · Fluid dynamics · Numerical analysis · Optimization · Probability theory · Statistics ·
Financial mathematics · Game theory · Mathematical biology · Mathematical chemistry · Mathematical economics ·
Control theory

See also
• Definitions of mathematics
• Dyscalculia
• Iatromathematicians
• Logics
• Mathematical anxiety
• Mathematical game
• Mathematical model
• Mathematical problem
• Mathematical structure
• Mathematics and art
• Mathematics competitions
• Mathematics education
• Mathematics portal
• Pattern
• Philosophy of mathematics
• Pseudomathematics

References
• Benson, Donald C., The Moment of Proof: Mathematical Epiphanies, Oxford University Press, USA; New Ed
edition (December 14, 2000). ISBN 0-19-513919-4.
• Boyer, Carl B., A History of Mathematics, Wiley; 2 edition (March 6, 1991). ISBN 0-471-54397-7. — A concise
history of mathematics from the Concept of Number to contemporary Mathematics.
• Courant, R. and H. Robbins, What Is Mathematics? : An Elementary Approach to Ideas and Methods, Oxford
University Press, USA; 2 edition (July 18, 1996). ISBN 0-19-510519-2.
• Davis, Philip J. and Hersh, Reuben, The Mathematical Experience. Mariner Books; Reprint edition (January 14,
1999). ISBN 0-395-92968-7. — A gentle introduction to the world of mathematics.
• Einstein, Albert (1923). Sidelights on Relativity (Geometry and Experience). E. P. Dutton & Co.
• Eves, Howard, An Introduction to the History of Mathematics, Sixth Edition, Saunders, 1990, ISBN
0-03-029558-0.
• Gullberg, Jan, Mathematics — From the Birth of Numbers. W. W. Norton & Company; 1st edition (October
1997). ISBN 0-393-04002-X. — An encyclopedic overview of mathematics presented in clear, simple language.
• Hazewinkel, Michiel (ed.), Encyclopaedia of Mathematics. Kluwer Academic Publishers 2000. — A translated
and expanded version of a Soviet mathematics encyclopedia, in ten (expensive) volumes, the most complete and
authoritative work available. Also in paperback and on CD-ROM, and online [37].
• Jourdain, Philip E. B., The Nature of Mathematics, in The World of Mathematics, James R. Newman, editor,
Dover Publications, 2003, ISBN 0-486-43268-8.
• Kline, Morris, Mathematical Thought from Ancient to Modern Times, Oxford University Press, USA; Paperback
edition (March 1, 1990). ISBN 0-19-506135-7.
• Monastyrsky, Michael (2001) (PDF). Some Trends in Modern Mathematics and the Fields Medal [38]. Canadian
Mathematical Society. Retrieved 2006-07-28.
• Oxford English Dictionary, second edition, ed. John Simpson and Edmund Weiner, Clarendon Press, 1989, ISBN
0-19-861186-2.
• The Oxford Dictionary of English Etymology, 1983 reprint. ISBN 0-19-861112-9.
• Pappas, Theoni, The Joy Of Mathematics, Wide World Publishing; Revised edition (June 1989). ISBN
0-933174-65-9.
• Peirce, Benjamin (1882). "Linear Associative Algebra" [39]. American Journal of Mathematics, Vol. 4, No. 1/4
(1881).
• Peterson, Ivars, Mathematical Tourist, New and Updated Snapshots of Modern Mathematics, Owl Books, 2001,
ISBN 0-8050-7159-8.
• Paulos, John Allen (1996). A Mathematician Reads the Newspaper. Anchor. ISBN 0-385-48254-X.
• Popper, Karl R. (1995). "On knowledge". In Search of a Better World: Lectures and Essays from Thirty Years.
Routledge. ISBN 0-415-13548-6.
• Riehm, Carl (August 2002). "The Early History of the Fields Medal" [40] (PDF). Notices of the AMS (AMS) 49
(7): 778–782.
• Sevryuk, Mikhail B. (January 2006). "Book Reviews" [41] (PDF). Bulletin of the American Mathematical Society
43 (1): 101–109. doi:10.1090/S0273-0979-05-01069-4. Retrieved 2006-06-24.
• Waltershausen, Wolfgang Sartorius von (1856, repr. 1965). Gauss zum Gedächtniss [42]. Sändig Reprint Verlag
H. R. Wohlwend. ISBN 3-253-01702-8.
• Ziman, J.M., F.R.S. (1968). Public Knowledge:An essay concerning the social dimension of science [43].

External links
• Free Mathematics books [44], a collection of freely available mathematics books.
• Encyclopaedia of Mathematics online encyclopaedia from Springer [45], Graduate-level reference work with over
8,000 entries, illuminating nearly 50,000 notions in mathematics.
• HyperMath site at Georgia State University [46]
• FreeScience Library [47] The mathematics section of FreeScience library
• Rusin, Dave: The Mathematical Atlas [48]. A guided tour through the various branches of modern mathematics.
(Can also be found at NIU.edu [49].)
• Polyanin, Andrei: EqWorld: The World of Mathematical Equations [50]. An online resource focusing on algebraic,
ordinary differential, partial differential (mathematical physics), integral, and other mathematical equations.
• Cain, George: Online Mathematics Textbooks [51] available free online.
• Tricki [52], Wiki-style site that is intended to develop into a large store of useful mathematical problem-solving
techniques.
• Mathematical Structures [53], listing information about classes of mathematical structures.
• Math & Logic: The history of formal mathematical, logical, linguistic and methodological ideas. [54] In The
Dictionary of the History of Ideas.
• Mathematician Biographies [55]. The MacTutor History of Mathematics archive, with extensive histories and
quotes from famous mathematicians.
• Metamath [56]. A site and a language that formalize mathematics from its foundations.
• Nrich [57], a prize-winning site for students from age five from Cambridge University
• Open Problem Garden [58], a wiki of open problems in mathematics
• Planet Math [59]. An online mathematics encyclopedia under construction, focusing on modern mathematics.
Uses the Attribution-ShareAlike license, allowing article exchange with Wikipedia. Uses TeX markup.
• Some mathematics applets, at MIT [60]
• Weisstein, Eric et al.: MathWorld: World of Mathematics [61]. An online encyclopedia of mathematics.
• Patrick Jones' Video Tutorials [62] on Mathematics
• Citizendium: Theory (mathematics) [63].

References
[1] No likeness or description of Euclid's physical appearance made during his lifetime survived antiquity. Therefore, Euclid's depiction in works
of art depends on the artist's imagination (see Euclid).
[2] Steen, L.A. (April 29, 1988). The Science of Patterns. Science, 240: 611–616. and summarized at Association for Supervision and
Curriculum Development (http://www.ascd.org/portal/site/ascd/template.chapter/menuitem.
1889bf0176da7573127855b3e3108a0c/?chapterMgmtId=f97433df69abb010VgnVCM1000003d01a8c0RCRD), ascd.org
[3] Devlin, Keith, Mathematics: The Science of Patterns: The Search for Order in Life, Mind and the Universe (Scientific American Paperback
Library) 1996, ISBN 978-0-7167-5047-5
[4] Jourdain.
[5] Peirce, p. 97.
[6] Einstein, p. 28. The quote is Einstein's answer to the question: "how can it be that mathematics, being after all a product of human thought
which is independent of experience, is so admirably appropriate to the objects of reality?" He, too, is concerned with The Unreasonable
Effectiveness of Mathematics in the Natural Sciences.
[7] Eves
[8] Peterson
[9] Both senses can be found in Plato. Liddell and Scott, s.v. μαθηματικός
[10] The Oxford Dictionary of English Etymology, Oxford English Dictionary, sub "mathematics", "mathematic", "mathematics"
[11] S. Dehaene; G. Dehaene-Lambertz; L. Cohen (Aug 1998). "Abstract representations of numbers in the animal and human brain". Trends in
Neuroscience 21 (8): 355–361. doi:10.1016/S0166-2236(98)01263-6.
[12] See, for example, Raymond L. Wilder, Evolution of Mathematical Concepts; an Elementary Study, passim
[13] Kline 1990, Chapter 1.
[14] " A History of Greek Mathematics: From Thales to Euclid (http:/ / books. google. com/ books?id=drnY3Vjix3kC& pg=PA1& dq&
hl=en#v=onepage& q=& f=false)". Thomas Little Heath (1981). ISBN 0-486-24073-8
[15] Sevryuk
[16] Johnson, Gerald W.; Lapidus, Michel L. (2002). The Feynman Integral and Feynman's Operational Calculus. Oxford University Press.
ISBN 0821824139.
[17] Eugene Wigner, 1960, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" (http://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html), Communications on Pure and Applied Mathematics 13(1): 1–14.
[18] Mathematics Subject Classification 2010 (http://www.ams.org/mathscinet/msc/pdfs/classification2010.pdf)
[19] Hardy, G. H. (1940). A Mathematician's Apology. Cambridge University Press. ISBN 0521427061.
[20] Gold, Bonnie; Simons, Rogers A. (2008). Proof and Other Dilemmas: Mathematics and Philosophy. MAA.
[21] Aigner, Martin; Ziegler, Gunter M. (2001). Proofs from the Book. Springer. ISBN 3540404600.
[22] Earliest Uses of Various Mathematical Symbols (http://jeff560.tripod.com/mathsym.html) (Contains many further references).
[23] Kline, p. 140, on Diophantus; p.261, on Vieta.
[24] See false proof for simple examples of what can go wrong in a formal proof. The history of the Four Color Theorem contains examples of
false proofs accidentally accepted by other mathematicians at the time.
[25] Ivars Peterson, The Mathematical Tourist, Freeman, 1988, ISBN 0-7167-1953-3. p. 4 "A few complain that the computer program can't be
verified properly", (in reference to the Appel–Haken proof of the Four Color Theorem).
[26] Patrick Suppes, Axiomatic Set Theory, Dover, 1972, ISBN 0-486-61630-4. p. 1, "Among the many branches of modern mathematics set
theory occupies a unique place: with a few rare exceptions the entities which are studied and analyzed in mathematics may be regarded as
certain particular sets or classes of objects."
[27] Zeidler, Eberhard (2004). Oxford User's Guide to Mathematics. Oxford, UK: Oxford University Press. p. 1188. ISBN 0198507631.
[28] Waltershausen
[29] Shasha, Dennis Elliot; Lazere, Cathy A. (1998). Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists. Springer.
p. 228.
[30] Popper 1995, p. 56
[31] Ziman
[32] "The Fields Medal is now indisputably the best known and most influential award in mathematics." Monastyrsky
[33] Riehm
[34] Luke Howard Hodgkin & Luke Hodgkin, A History of Mathematics, Oxford University Press, 2005.
[35] Clay Mathematics Institute (http://www.claymath.org/millennium/P_vs_NP/), P=NP, claymath.org
[36] Like other mathematical sciences such as physics and computer science, statistics is an autonomous discipline rather than a branch of
applied mathematics. Like research physicists and computer scientists, research statisticians are mathematical scientists. Many statisticians
have a degree in mathematics, and some statisticians are also mathematicians.
[37] http://eom.springer.de/default.htm
[38] http://www.fields.utoronto.ca/aboutus/FieldsMedal_Monastyrsky.pdf
[39] http://books.google.com/?id=De0GAAAAYAAJ&pg=PA1&dq=Peirce+Benjamin+Linear+Associative+Algebra+&q=
[40] http://www.ams.org/notices/200207/comm-riehm.pdf
[41] http://www.ams.org/bull/2006-43-01/S0273-0979-05-01069-4/S0273-0979-05-01069-4.pdf
[42] http://www.amazon.de/Gauss-Ged%e4chtnis-Wolfgang-Sartorius-Waltershausen/dp/3253017028
[43] http://info.med.yale.edu/therarad/summers/ziman.htm
[44] http://freebookcentre.net/SpecialCat/Free-Mathematics-Books-Download.html
[45] http://eom.springer.de
[46] http://hyperphysics.phy-astr.gsu.edu/Hbase/hmat.html
[47] http://www.freescience.info/mathematics.php
[48] http://www.math-atlas.org/
[49] http://www.math.niu.edu/~rusin/known-math/index/index.html
[50] http://eqworld.ipmnet.ru/
[51] http://www.math.gatech.edu/~cain/textbooks/onlinebooks.html
[52] http://www.tricki.org/
[53] http://math.chapman.edu/cgi-bin/structures?HomePage
[54] http://etext.lib.virginia.edu/DicHist/analytic/anaVII.html
[55] http://www-history.mcs.st-and.ac.uk/~history/
[56] http://metamath.org/
[57] http://www.nrich.maths.org/public/index.php
[58] http://garden.irmacs.sfu.ca
[59] http://planetmath.org/
[60] http://www-math.mit.edu/daimp
[61] http://www.mathworld.com/
[62] http://www.youtube.com/user/patrickJMT
[63] http://en.citizendium.org/wiki/Theory_(mathematics)

Median
In probability theory and statistics, a median is described as the numeric value separating the higher half of a
sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be
found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an
even number of observations, then there is no single middle value; the median is then usually defined to be the mean
of the two middle values.[1] [2]
In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the
median (in the case of an even sample size) and, if there is such a member, there may be more than one so that the
median may not uniquely identify a sample member. Nonetheless the value of the median is uniquely determined
with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample,
is the medoid.
At most half the population have values less than the median and at most half have values greater than the median. If
both groups contain less than half the population, then some of the population is exactly equal to the median. For
example, if a < b < c, then the median of the list {a, b, c} is b, and if a < b < c < d, then the median of the list
{a, b, c, d} is the mean of b and c, i.e. it is (b + c)/2.
The median can be used as a measure of location when a distribution is skewed, when end values are not known, or
when one requires reduced importance to be attached to outliers, e.g. because they may be measurement errors. A
disadvantage of the median is the difficulty of handling it theoretically.

Notation
The median of some variable x is denoted either as x̃ or as μ1/2(x).[3]

Measures of statistical dispersion


When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of
variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since
the median is the same as the second quartile, its calculation is illustrated in the article on quartiles.

Medians of probability distributions


For any probability distribution on the real line with cumulative distribution function F, regardless of whether it is
any kind of continuous probability distribution, in particular an absolutely continuous distribution (and therefore has
a probability density function), or a discrete probability distribution, a median m satisfies the inequalities

P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2

or

∫_(−∞, m] dF(x) ≥ 1/2 and ∫_[m, ∞) dF(x) ≥ 1/2,

in which a Lebesgue–Stieltjes integral is used. For an absolutely continuous probability distribution with probability
density function ƒ, we have

P(X ≤ m) = P(X ≥ m) = ∫_(−∞)^m ƒ(x) dx = 1/2.
Medians of particular distributions


The medians of certain types of distributions can be easily calculated from their parameters. The median of a normal
distribution with mean μ and variance σ² is μ; in fact, for a normal distribution, mean = median = mode. The median
of a uniform distribution on the interval [a, b] is (a + b)/2, which is also the mean. The median of a Cauchy
distribution with location parameter x₀ and scale parameter γ is x₀, the location parameter. The median of an
exponential distribution with rate parameter λ is the natural logarithm of 2 divided by the rate parameter: λ⁻¹ ln 2.
The median of a Weibull distribution with shape parameter k and scale parameter λ is λ(ln 2)^(1/k).
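As a quick check of the exponential entry (a standard one-line derivation, not part of the original article, written
here in LaTeX): setting the cumulative distribution function equal to one half and solving for the median m gives

F(m) = 1 - e^{-\lambda m} = \tfrac{1}{2}
\quad\Longrightarrow\quad e^{-\lambda m} = \tfrac{1}{2}
\quad\Longrightarrow\quad m = \frac{\ln 2}{\lambda} = \lambda^{-1}\ln 2.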

Medians in descriptive statistics


The median is primarily used for skewed distributions, which it summarizes differently than the arithmetic mean.
Consider the multiset { 1, 2, 2, 2, 3, 14 }. The median is 2 in this case, as is the mode, and it might be seen as a better
indication of central tendency than the arithmetic mean of 4.
Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is
simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier
values than is the mean.

Theoretical properties

An optimality property
A median is also a central point which minimizes the average of the absolute deviations. In the above example, the
median value of 2 minimizes the average of the absolute deviations (1 + 0 + 0 + 0 + 1 + 12) / 6 = 2.33; in contrast,
the mean value of 4 minimizes the average of the squares (9 + 4 + 4 + 4 + 1 + 100) / 6 = 20.33. In the language of
statistics, a value of c that minimizes

E(|X − c|)

is a median of the probability distribution of the random variable X.


However, a median c need not be uniquely defined. Where exactly one median exists, statisticians speak of "the
median" correctly; even when no unique median exists, some statisticians speak of "the median" informally.

An inequality relating means and medians


For continuous probability distributions, the difference between the median and the mean is less than or equal to one
standard deviation. See an inequality on location and scale parameters.

The sample median

Efficient computation of the sample median


Even though sorting n items generally requires O(n log n) operations, the median of n items can be computed with
only O(n) operations. In fact, one can always find the kth smallest of n items with an O(n) selection algorithm.
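As an illustrative sketch (not from the original article; the function name and structure are my own), a randomized
quickselect in Python finds the kth smallest element in expected linear time; the median of an odd-length list xs is
then kth_smallest(xs, len(xs) // 2):

import random

def kth_smallest(items, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            items = lows                      # answer lies among the smaller items
        elif k < len(lows) + len(pivots):
            return pivot                      # pivot is the k-th smallest
        else:
            k -= len(lows) + len(pivots)      # skip the smaller items and pivots
            items = [x for x in items if x > pivot]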

Easy explanation of the sample median

For an odd number of values


As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7.
Start by sorting the values: 1, 2, 5, 7, 8.
In this case, the median is 5 since it is the middle observation in the ordered list.

For an even number of values


As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7, 2.
Start by sorting the values: 1, 2, 2, 5, 7, 8.
In this case, the average of the two middlemost terms is (2 + 5)/2 = 3.5. Therefore, the median is 3.5 since it is the
average of the middle observations in the ordered list.
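A minimal Python sketch of this procedure (sorting, then taking the middle value or the mean of the two middle
values) reproduces both results above:

def median(values):
    """Median by sorting: middle value, or mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

print(median([1, 5, 2, 8, 7]))     # 5
print(median([1, 5, 2, 8, 7, 2]))  # 3.5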

Other estimates of the median


If data are represented by a statistical model specifying a particular family of probability distributions, then estimates
of the median can be obtained by fitting that family of probability distributions to the data and calculating the
theoretical median of the fitted distribution. See, for example Pareto interpolation.

Median-unbiased estimators, and bias with respect to loss functions


Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as
observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss
function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics.
The theory of median-unbiased estimators was revived by George W. Brown [4] in 1947:
An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if for fixed θ, the
median of the distribution of the estimate is at the value θ, i.e., the estimate underestimates just as often
as it overestimates. This requirement seems for most purposes to accomplish as much as the
mean-unbiased requirement and has the additional property that it is invariant under one-to-one
transformation. [page 584]
Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and
Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood
estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have
surprising robustness.

In image processing
In monochrome raster images there is a type of noise, known as salt-and-pepper noise, in which each pixel
independently becomes black (with some small probability) or white (with some small probability), and is unchanged
otherwise (with probability close to 1). An image constructed of the median values of pixel neighborhoods (such as a
3×3 square) can effectively reduce the noise in this case.
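A rough Python sketch of such a filter (assuming the image is given as a list of rows of grayscale values; the
function name and the choice to leave border pixels untouched are my own):

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # copy; border pixels stay unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]          # median of the 9 sorted values
    return out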

In multidimensional statistical inference


In multidimensional statistical inference, the value c that minimizes E(‖X − c‖) is also called a centroid.[5] In
this case ‖·‖ indicates a norm for the vector difference, such as the Euclidean norm, rather than the
one-dimensional case's use of an absolute value. (Note that in some other contexts a centroid is more like a
multidimensional mean than the multidimensional median described here.)
Like a centroid, a medoid minimizes E(‖X − c‖), but is restricted to be a member of a specified set. For
instance, the set could be a sample of points drawn from some distribution.

History
Gustav Fechner popularized the median into the formal analysis of data, although it had been used previously by
Laplace.[6]

See also
• Order statistic
• Quantile
• A median is the 2nd quartile, 5th decile, and 50th percentile.
• A sample-median is median-unbiased but can be a mean-biased estimator.
• Absolute deviation
• Concentration of measure for Lipschitz functions
• An inequality on location and scale parameters
• Median voter theory
• Median graph
• The centerpoint is a generalization of the median for data in higher dimensions.
• Median search
• Hinges

References
[1] http://mathworld.wolfram.com/StatisticalMedian.html Weisstein, Eric W. "Statistical Median." From MathWorld--A Wolfram Web
Resource.
[2] http://www.stat.psu.edu/old_resources/ClassNotes/ljs_07/sld008.htm Simon, Laura J. "Descriptive Statistics." Statistical Education
Resource Kit, Penn State Department of Statistics.
[3] http://mathworld.wolfram.com/StatisticalMedian.html
[4] http://www.universityofcalifornia.edu/senate/inmemoriam/georgewbrown.htm
[5] Carvalho, Luis; Lawrence, Charles (2008), "Centroid estimation in discrete high-dimensional spaces with applications in biology", Proc Natl
Acad Sci U S A 105 (9): 3209–3214, doi:10.1073/pnas.0712329105
[6] Keynes, John Maynard; A Treatise on Probability (1921), Pt II Ch XVII §5 (p 201).

• Brown, George W. (http://www.universityofcalifornia.edu/senate/inmemoriam/georgewbrown.htm) "On
Small-Sample Estimation." The Annals of Mathematical Statistics, Vol. 18, No. 4 (Dec., 1947), pp. 582–585.
• Lehmann, E. L. “A General Concept of Unbiasedness” The Annals of Mathematical Statistics, Vol. 22, No. 4
(Dec., 1951), pp. 587–592.
• Allan Birnbaum. 1961. “A Unified Theory of Estimation, I”, The Annals of Mathematical Statistics, Vol. 32, No. 1
(Mar., 1961), pp. 112–135
• van der Vaart, H. R. 1961. “Some Extensions of the Idea of Bias” The Annals of Mathematical Statistics, Vol. 32,
No. 2 (Jun., 1961), pp. 436–447.
• Pfanzagl, Johann; with the assistance of R. Hamböker (1994). Parametric Statistical Theory. Walter de Gruyter.
ISBN 3-11-013863-8. MR1291393

External links
• A Guide to Understanding & Calculating the Median (http://stats4students.com/
measures-of-central-tendency-2.php)
• Median as a weighted arithmetic mean of all Sample Observations (http://www.accessecon.com/pubs/EB/
2004/Volume3/EB-04C10011A.pdf)
• On-line calculator (http://www.poorcity.richcity.org/cgi-bin/inequality.cgi)
• Calculating the median (http://www.statcan.ca/english/edu/power/ch11/median/median.htm)
• A problem involving the mean, the median, and the mode. (http://mathschallenge.net/index.
php?section=problems&show=true&titleid=average_problem)
• mathworld: Statistical Median (http://mathworld.wolfram.com/StatisticalMedian.html)
• Python script (http://www.poorcity.richcity.org/oei/#GiniHooverTheil) for Median computations and income
inequality metrics
This article incorporates material from Median of a distribution on PlanetMath, which is licensed under the
Creative Commons Attribution/Share-Alike License.

Mean
In statistics, mean has two related meanings:
• the arithmetic mean (as distinguished from the geometric mean or harmonic mean).
• the expected value of a random variable, which is also called the population mean.
There are other statistical measures that are sometimes confused with averages, including the median and the
mode. Other simple statistical analyses use measures of spread, such as range, interquartile range, or standard
deviation. For a real-valued random variable X, the mean is the expectation of X. Note that not every probability
distribution has a defined mean (or variance); see the Cauchy distribution for an example.
For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1,
x2, ..., xn is typically denoted by , pronounced "x bar". This mean is a type of arithmetic mean. If the data set was
based on a series of observations obtained by sampling a statistical population, this mean is termed the "sample
mean" to distinguish it from the "population mean". The mean is often quoted along with the standard deviation: the
mean describes the central location of the data, and the standard deviation describes the spread. An alternative
measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less
sensitive to outliers, but less mathematically tractable.
If a series of observations is sampled from a larger population (measuring the heights of a sample of adults drawn
from the entire world population, for example), or from a probability distribution which gives the probabilities of
each possible result, then the larger population or probability distribution can be used to construct a "population
mean", which is also the expected value for a sample drawn from this population or probability distribution. For a
finite population, this would simply be the arithmetic mean of the given property for every member of the
population. For a probability distribution, this would be a sum or integral over every possible value weighted by the
probability of that value. It is a universal convention to represent the population mean by the symbol μ.[1] In the
case of a discrete probability distribution, the mean of a discrete random variable x is given by taking the product of
each possible value of x and its probability P(x), and then adding all these products together, giving
μ = Σ x·P(x).[2]

The sample mean may differ from the population mean, especially for small samples, but the law of large numbers
dictates that the larger the size of the sample, the more likely it is that the sample mean will be close to the
population mean.[3]
As well as statistics, means are often used in geometry and analysis; a wide range of means have been developed for
these purposes, which are not much used in statistics. These are listed below.

Examples of means

Arithmetic mean (AM)


The arithmetic mean is the "standard" average, often simply called the "mean".

The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of
values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value
(median), or the most likely (mode). For example, mean income is skewed upwards by a small number of people
with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is
the level at which half the population is below and half is above. The mode income is the most likely income, and
favors the larger number of people with lower incomes. The median or mode are often more intuitive measures of
such data.
Nevertheless, many skewed distributions are best described by their mean – such as the exponential and Poisson
distributions.
For example, the arithmetic mean of six values 34, 27, 45, 55, 22, 34 is:

(34 + 27 + 45 + 55 + 22 + 34) / 6 = 217/6 ≈ 36.17

Geometric mean (GM)


The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their
product and not their sum (as is the case with the arithmetic mean), e.g., rates of growth.

For example, the geometric mean of six values 34, 27, 45, 55, 22, 34 is:

(34 × 27 × 45 × 55 × 22 × 34)^(1/6) = 1,699,493,400^(1/6) ≈ 34.55

Harmonic mean (HM)


The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for
example speed (distance per unit of time).

For example, the harmonic mean of the six values 34, 27, 45, 55, 22, and 34 is:

6 / (1/34 + 1/27 + 1/45 + 1/55 + 1/22 + 1/34) ≈ 33.02

Relationship between AM, GM, and HM


AM, GM, and HM satisfy these inequalities:

AM ≥ GM ≥ HM

Equality holds only when all the elements of the given sample are equal.
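Computing all three means for the six values used above makes the chain of inequalities concrete (a small Python
check; nothing here beyond the sample values comes from the article):

import math

xs = [34, 27, 45, 55, 22, 34]
am = sum(xs) / len(xs)                                 # ≈ 36.17
gm = math.exp(sum(math.log(x) for x in xs) / len(xs))  # ≈ 34.55
hm = len(xs) / sum(1 / x for x in xs)                  # ≈ 33.02
assert hm <= gm <= am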

Generalized means

Power mean
The generalized mean, also known as the power mean or Hölder mean, is an abstraction of the quadratic, arithmetic,
geometric and harmonic means. It is defined for a set of n positive numbers xi by

M(m) = ( (1/n) · Σ xi^m )^(1/m)

By choosing the appropriate value for the parameter m we get:

m → ∞: maximum
m = 2: quadratic mean
m = 1: arithmetic mean
m → 0: geometric mean
m = −1: harmonic mean
m → −∞: minimum
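All of these special cases fall out of one function; a sketch in Python (my own naming, treating m = 0 as the
geometric-mean limit):

import math

def power_mean(xs, m):
    """Generalized (power/Hölder) mean of positive numbers xs."""
    if m == 0:  # limiting case: geometric mean
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    return (sum(x ** m for x in xs) / len(xs)) ** (1 / m)

xs = [34, 27, 45, 55, 22, 34]
print(power_mean(xs, 1))   # arithmetic mean
print(power_mean(xs, -1))  # harmonic mean
print(power_mean(xs, 2))   # quadratic mean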

ƒ-mean
This can be generalized further as the generalized f-mean

M_f(x1, …, xn) = f⁻¹( (1/n) · Σ f(xi) )

and again a suitable choice of an invertible ƒ will give:

f(x) = 1/x: harmonic mean
f(x) = x^m: power mean
f(x) = ln x: geometric mean

Weighted arithmetic mean


The weighted arithmetic mean is used if one wants to combine average values from samples of the same population
with different sample sizes:

x̄ = (Σ wi·xi) / (Σ wi)

The weights wi represent the sizes of the different samples. In other applications they represent a measure of the
reliability of the influence that the respective values exert on the mean.

Truncated mean
Sometimes a set of numbers might contain outliers, i.e. a datum which is much lower or much higher than the others.
Often, outliers are erroneous data caused by artifacts. In this case one can use a truncated mean. It involves
discarding given parts of the data at the top or the bottom end, typically an equal amount at each end, and then taking
the arithmetic mean of the remaining data. The number of values removed is indicated as a percentage of total
number of values.

Interquartile mean
The interquartile mean is a specific example of a truncated mean. It is simply the arithmetic mean after removing the
lowest and the highest quarter of values:

x̄ = (2/n) · Σ xi, where the sum runs over i = n/4 + 1, …, 3n/4,

assuming the values have been ordered. It is thus a specific example of a weighted mean for a specific set of
weights.

Mean of a function
In calculus, and especially multivariable calculus, the mean of a function is loosely defined as the average value of
the function over its domain. In one variable, the mean of a function f(x) over the interval (a, b) is defined by

f̄ = (1/(b − a)) ∫_a^b f(x) dx.

(See also mean value theorem.) In several variables, the mean over a relatively compact domain U in a Euclidean
space is defined by

f̄ = (1/Vol(U)) ∫_U f.

This generalizes the arithmetic mean. On the other hand, it is also possible to generalize the geometric mean to
functions by defining the geometric mean of f to be

exp( (1/(b − a)) ∫_a^b ln f(x) dx ).
More generally, in measure theory and probability theory either sort of mean plays an important role. In this context,
Jensen's inequality places sharp estimates on the relationship between these two different notions of the mean of a
function.
There is also a harmonic average of functions and a quadratic average (or root mean square) of functions.

Mean of a Probability Distribution


See expected value

Mean of angles
Most of the usual means fail on circular quantities, like angles, daytimes, fractional parts of real numbers. For those
quantities you need a mean of circular quantities.
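One standard construction of such a circular mean averages the angles as unit vectors and takes the direction of the
resultant; a Python sketch (the function name is mine):

import math

def circular_mean(angles_rad):
    """Mean direction of angles, via the mean resultant vector."""
    s = sum(math.sin(a) for a in angles_rad)
    c = sum(math.cos(a) for a in angles_rad)
    return math.atan2(s, c)

# 350° and 10° average to about 0°, not to the arithmetic mean of 180°:
print(math.degrees(circular_mean([math.radians(350), math.radians(10)])))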

Fréchet mean
The Fréchet mean gives a manner for determining the "center" of a mass distribution on a surface or, more generally,
Riemannian manifold. Unlike many other means, the Fréchet mean is defined on a space whose elements cannot
necessarily be added together or multiplied by scalars. It is sometimes also known as the Karcher mean.

Other means
• Arithmetic-geometric mean
• Arithmetic-harmonic mean
• Cesàro mean
• Chisini mean
• Contraharmonic mean
• Elementary symmetric mean
• Geometric-harmonic mean
• Heinz mean
• Heronian mean
• Identric mean
• Lehmer mean
• Logarithmic mean
• Median
• Moving average
• Root mean square
• Stolarsky mean
• Weighted geometric mean
• Weighted harmonic mean
• Rényi's entropy (a generalized f-mean)

Properties
All means share some properties and additional properties are shared by the most common means. Some of these
properties are collected here.

Weighted mean
A weighted mean M is a function which maps tuples of positive numbers to a positive number,

M : (0, ∞)ⁿ → (0, ∞),

such that the following properties hold:


• "Fixed point": M(1,1,...,1) = 1
• Homogeneity: M(λ x1, ..., λ xn) = λ M(x1, ..., xn) for all λ and xi. In vector notation: M(λ x) = λ Mx for all n-vectors
x.
• Monotonicity: If xi ≤ yi for each i, then Mx ≤ My
It follows
• Boundedness: min x ≤ Mx ≤ max x
• Continuity: M depends continuously on its arguments.
• There are means which are not differentiable. For instance, the maximum number of a tuple is considered a mean
(as an extreme case of the power mean, or as a special case of a median), but is not differentiable.
• All means listed above, with the exception of most of the Generalized f-means, satisfy the presented properties.
• If f is bijective, then the generalized f-mean satisfies the fixed point property.
• If f is strictly monotonic, then the generalized f-mean satisfy also the monotony property.
• In general a generalized f-mean will miss homogeneity.
The above properties imply techniques to construct more complex means:
If C, M1, ..., Mm are weighted means and p is a positive real number, then A and B defined by
are also weighted means.

Unweighted mean
Intuitively speaking, an unweighted mean is a weighted mean with equal weights. Since our definition of weighted
mean above does not expose particular weights, equal weights must be asserted in a different way. A different view
on homogeneous weighting is that the inputs can be swapped without altering the result.
Thus we define M to be an unweighted mean if it is a weighted mean and for each permutation π of inputs, the result
is the same.
Symmetry: Mx = M(πx) for all n-tuples x and permutations π on n-tuples.
Analogously to the weighted means, if C is a weighted mean and M1, ..., Mm are unweighted means and p is a
positive real number, then A and B defined by

are also unweighted means.

Convert unweighted mean to weighted mean


An unweighted mean can be turned into a weighted mean by repeating elements. This connection can also be used to
state that a mean is the weighted version of an unweighted mean. Say you have the unweighted mean M and weight
the numbers by natural numbers w1, …, wn. (If the weights are rational, first multiply them by the least common
denominator to make them natural numbers.) Then the corresponding weighted mean A is obtained by repeating
each number xi exactly wi times and applying M to the resulting tuple, as sketched below.
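A small Python sketch of this repetition trick (the helper and its names are mine, not the article's):

def weighted_via_repetition(M, xs, weights):
    """Turn an unweighted mean M into a weighted mean by repeating each
    value according to its natural-number weight."""
    expanded = [x for x, w in zip(xs, weights) for _ in range(w)]
    return M(expanded)

arithmetic = lambda v: sum(v) / len(v)
print(weighted_via_repetition(arithmetic, [10, 20], [1, 3]))  # (10 + 3*20)/4 = 17.5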

Means of tuples of different sizes


If a mean M is defined for tuples of several sizes, then one also expects that the mean of a tuple is bounded by the
means of partitions. More precisely
• Given an arbitrary tuple x, which is partitioned into y1, ..., yk, then

min(M y1, …, M yk) ≤ M x ≤ max(M y1, …, M yk).

(See Convex hull.)

Population and sample means


The mean of a population is denoted μ and is known as the population mean. The sample mean makes a
good estimator of the population mean, as its expected value is the same as the population mean. The sample mean
of a population is a random variable, not a constant, and consequently it will have its own distribution. For a random
sample of n observations from a normally distributed population, the sample mean distribution is

x̄ ~ N(μ, σ²/n).

Often, since the population variance is an unknown parameter, it is estimated by the mean sum of squares, which
changes the distribution of the sample mean from a normal distribution to a Student's t distribution with n − 1
degrees of freedom.
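A quick simulation illustrates the stated sampling distribution (a hedged sketch; the parameter values are
arbitrary):

import math, random, statistics

mu, sigma, n, trials = 0.0, 1.0, 25, 10_000
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(trials)]
print(statistics.fmean(means))  # close to mu
print(statistics.stdev(means))  # close to sigma / math.sqrt(n) = 0.2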

See also
• Average, same as central tendency
• Weighted mean
• Descriptive statistics
• Kurtosis
• Median
• Mode (statistics)
• Summary statistics
• Law of averages
• Spherical mean
• Algorithms for calculating mean and variance
• For an independent identical distribution from the reals, the mean of a sample is an unbiased estimator for the
mean of the population.

References
[1] IntroSTAT by L. G. Underhill and Dave Bradfield, p. 181 (http://books.google.com/books?id=f6TlVjrSAsgC&lpg=PP1&
pg=PA181#v=onepage&q&f=false)
[2] Elementary Statistics by Robert R. Johnson and Patricia J. Kuby, p. 279 (http://books.google.com/books?id=DWCAh7jWO98C&
lpg=PP1&pg=PA279#v=onepage&q&f=false)
[3] Schaum's Outline of Theory and Problems of Probability by Seymour Lipschutz and Marc Lipson, p. 141 (http://books.google.com/
books?id=ZKdqlw2ZnAMC&lpg=PP1&pg=PA141#v=onepage&q&f=false)

• Hardy, G.H.; Littlewood, J.E.; Pólya, G. (1988), Inequalities (2nd ed.), Cambridge University Press,
ISBN 978-0521358804

External links
• Comparison between arithmetic and geometric mean of two numbers (http://www.sengpielaudio.com/
calculator-geommean.htm)

Statistical population
In statistics, a statistical population is a set of entities concerning which statistical inferences are to be drawn, often
based on a random sample taken from the population. For example, if we were interested in generalizations about
crows, then we would describe the set of crows that is of interest. Notice that if we choose a population like all
crows, we will be limited to observing crows that exist now or will exist in the future. Probably, geography will also
constitute a limitation in that our resources for studying crows are also limited.
Population is also used to refer to a set of potential measurements or values, including not only cases actually
observed but those that are potentially observable. Suppose, for example, we are interested in the set of all adult
crows now alive in the county of Cambridgeshire, and we want to know the mean weight of these birds. For each
bird in the population of crows there is a weight, and the set of these weights is called the population of weights.

Subpopulation
A subset of a population is called a subpopulation. If different subpopulations have different properties, they can
often be better understood if they are first separated into distinct subpopulations.
For instance, a particular medicine may have different effects on different subpopulations, and its effects may be
obscured or dismissed if the subpopulation is not identified and examined in isolation.
Similarly, one can often estimate parameters more accurately if one separates out subpopulations: distribution of
heights among people is better modeled by considering men and women as separate subpopulations, for instance.
Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within
subpopulations into an overall population distribution.

See also
• Population
• Sample (statistics)
• Sampling (statistics)

External links
• Statistical Terms Made Simple [1]

References
[1] http://www.socialresearchmethods.net/kb/sampstat.htm

Sampling (statistics)
Sampling is that part of statistical practice concerned with the selection of an unbiased or random subset of
individual observations within a population of individuals intended to yield some knowledge about the population of
concern, especially for the purposes of making predictions based on statistical inference. Sampling is an important
aspect of data collection.
Researchers rarely survey the entire population for two reasons (Adèr, Mellenbergh, & Hand, 2008): the cost is too
high, and the population is dynamic in that the individuals making up the population may change over time. The
three main advantages of sampling are that the cost is lower, data collection is faster, and since the data set is smaller
it is possible to ensure homogeneity and to improve the accuracy and quality of the data.
Each observation measures one or more properties (such as weight, location, color) of observable bodies
distinguished as independent objects or individuals. In survey sampling, survey weights can be applied to the data to
adjust for the sample design. Results from probability theory and statistical theory are employed to guide practice. In
business and medical research, sampling is widely used for gathering information about a population.[1]

Process
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame, a set of items or events possible to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting

Population definition
Successful statistical practice is based on focused problem definition. In sampling, this includes defining the
population from which our sample is drawn. A population can be defined as including all people or items with the
characteristic one wishes to understand. Because there is very rarely enough time or money to gather information
from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that
population.
Sometimes that which defines a population is obvious. For example, a manufacturer needs to decide whether a batch
of material from production is of high enough quality to be released to the customer, or should be sentenced for scrap
or rework due to poor quality. In this case, the batch is the population.
Although the population of interest often consists of physical objects, sometimes we need to sample over time,
space, or some combination of these dimensions. For instance, an investigation of supermarket staffing could
examine checkout line length at various times, or a study on endangered penguins might aim to understand their
usage of various hunting grounds over time. For the time dimension, the focus may be on periods or discrete
occasions.
In other cases, our 'population' may be even less tangible. For example, Joseph Jagger studied the behaviour of
roulette wheels at a casino in Monte Carlo, and used this to identify a biased wheel. In this case, the 'population'
Jagger wanted to investigate was the overall behaviour of the wheel (i.e. the probability distribution of its results
over infinitely many trials), while his 'sample' was formed from observed results from that wheel. Similar
considerations arise when taking repeated measurements of some physical characteristic such as the electrical
conductivity of copper.

This situation often arises when we seek knowledge about the cause system of which the observed population is an
outcome. In such cases, sampling theory may treat the observed population as a sample from a larger
'superpopulation'. For example, a researcher might study the success rate of a new 'quit smoking' program on a test
group of 100 patients, in order to predict the effects of the program if it were made available nationwide. Here the
superpopulation is "everybody in the country, given access to this treatment" - a group which does not yet exist,
since the program isn't yet available to all.
Note also that the population from which the sample is drawn may not be the same as the population about which we
actually want information. Often there is large but not complete overlap between these two groups due to frame
issues etc. (see below). Sometimes they may be entirely separate - for instance, we might study rats in order to get a
better understanding of human health, or we might study records from people born in 2008 in order to make
predictions about people born in 2009.
Time spent in making the sampled population and population of concern precise is often well spent, because it raises
many issues, ambiguities and questions that would otherwise have been overlooked at this stage.

Sampling frame
In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling
by lots), it is possible to identify and measure every single item in the population and to include any one of them in
our sample. However, in the more general case this is not possible. There is no way to identify all rats in the set of all
rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming
election (in advance of the election).
These imprecise populations are not amenable to sampling in any of the ways below and to which we could apply
statistical theory.
As a remedy, we seek a sampling frame which has the property that we can identify every single element and include
any in our sample.[1] The most straightforward type of frame is a list of elements of the population (preferably the
entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames
include:
• Electoral register
• Telephone directory
Not all frames explicitly list population elements. For example, a street map can be used as a frame for a
door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then visit all
houses on those streets. (One advantage of such a frame is that it would include people who have recently moved and
are not yet on the list frames discussed above.)
The sampling frame must be representative of the population and this is a question outside the scope of statistical
theory demanding the judgment of experts in the particular subject matter being studied. All the above frames omit
some people who will vote at the next election and contain some people who will not; some frames will contain
multiple records for the same person. People not in the frame have no prospect of being sampled. Statistical theory
tells us about the uncertainties in extrapolating from a sample to the frame. In extrapolating from frame to
population, its role is motivational and suggestive.
To the scientist, however, representative sampling is the only justified procedure for choosing individual
objects for use as the basis of generalization, and is therefore usually the only acceptable basis for ascertaining
truth.
—Andrew A. Marino[2]
It is important to understand this difference to steer clear of confusing prescriptions found in many web pages.
In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain
timely results may prevent extending the frame far into the future.

The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in
forecasting where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli
proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early
death of a living man, Gottfried Leibniz recognized the problem in replying:
Nature has established patterns originating in the return of events but only for the most part. New illnesses
flood the human race, so that no matter how many experiments you have done on corpses, you have not
thereby imposed a limit on the nature of events so that in the future they could not vary.
—Gottfried Leibniz
Kish posited four basic problems of sampling frames:
1. Missing elements: Some members of the population are not included in the frame.
2. Foreign elements: The non-members of the population are included in the frame.
3. Duplicate entries: A member of the population is surveyed more than once.
4. Groups or clusters: The frame lists clusters instead of individuals.
A frame may also provide additional 'auxiliary information' about its elements; when this information is related to
variables or groups of interest, it may be used to improve survey design. For instance, an electoral register might
include name and sex; this information can be used to ensure that a sample taken from that frame covers all
demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone
number may provide some information about location.)
Having established the frame, there are a number of ways for organizing it to improve efficiency and effectiveness.
It's at this stage that the researcher should decide whether the sample is in fact to be the whole population and would
therefore be a census.

Probability and nonprobability sampling


A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of
being selected in the sample, and this probability can be accurately determined. The combination of these traits
makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their
probability of selection.
Example: We want to estimate the total income of adults living in a given street. We visit each household in
that street, identify all adults living there, and randomly select one adult from each household. (For example,
we can allocate each person a random number, generated from a uniform distribution between 0 and 1, and
select the person with the highest number in each household). We then interview the selected person and find
their income.
People living on their own are certain to be selected, so we simply add their income to our estimate of the
total. But a person living in a household of two adults has only a one-in-two chance of selection. To reflect
this, when we come to such a household, we would count the selected person's income twice towards the total.
(In effect, the person who is selected from that household is taken as representing the person who isn't
selected.)
In the above example, not everybody has the same probability of selection; what makes it a probability sample is the
fact that each person's probability is known. When every element in the population does have the same probability of
selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as
'self-weighting' because all sampled units are given the same weight.
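As a sketch of the weighting rule from the household example in Python (the data are invented for illustration;
each record pairs the selected adult's income with the number of adults in that household, so the weight is the
inverse of the one-in-k selection probability):

# (income of selected adult, number of adults in the household)
sampled = [(30_000, 1), (24_000, 2), (41_000, 2)]

# Weight each response by the inverse of its selection probability (1/adults):
total_estimate = sum(income * adults for income, adults in sampled)
print(total_estimate)  # 30000 + 2*24000 + 2*41000 = 160000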
Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Probability
Proportional to Size Sampling, and Cluster or Multistage Sampling. These various ways of probability sampling
have two things in common:
1. Every element has a known nonzero probability of being sampled, and
2. it involves random selection at some point.
Nonprobability sampling is any sampling method where some elements of the population have no chance of
selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or where the probability of selection
can't be accurately determined. It involves the selection of elements based on assumptions regarding the population
of interest, which forms the criteria for selection. Hence, because the selection of elements is nonrandom,
nonprobability sampling does not allow the estimation of sampling errors. These conditions give rise to exclusion
bias, placing limits on how much information a sample can provide about the population. Information about the
relationship between sample and population is limited, making it difficult to extrapolate from the sample to the
population.
Example: We visit every household in a given street, and interview the first person to answer the door.
In any household with more than one occupant, this is a nonprobability sample, because some people
are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is
more likely to answer than an employed housemate who might be at work when the interviewer calls)
and it's not practical to calculate these probabilities.
Nonprobability Sampling includes: Accidental Sampling, Quota Sampling and Purposive Sampling. In addition,
nonresponse effects may turn any probability design into a nonprobability design if the characteristics of
nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being
sampled.

Sampling methods
Within any of the types of frame identified above, a variety of sampling methods can be employed, individually or in
combination. Factors commonly influencing the choice between these designs include:
• Nature and quality of the frame
• Availability of auxiliary information about units on the frame
• Accuracy requirements, and the need to measure accuracy
• Whether detailed analysis of the sample is expected
• Cost/operational concerns

Simple random sampling


In a simple random sample ('SRS') of a given size, all subsets of the frame of that size are given an equal probability. Each
element of the frame thus has an equal probability of selection: the frame is not subdivided or partitioned.
Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for
triples, and so on). This minimises bias and simplifies analysis of results. In particular, the variance between
individual results within the sample is a good indicator of variance in the overall population, which makes it
relatively easy to estimate the accuracy of results.
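For illustration, drawing an SRS is a one-liner in most languages; the sketch below uses Python's standard library with a hypothetical frame of 1,000 unit labels.

```python
import random

frame = list(range(1, 1001))       # a hypothetical frame of 1000 unit labels
sample = random.sample(frame, 50)  # every 50-element subset is equally likely
```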
However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample
that doesn't reflect the makeup of the population. For instance, a simple random sample of ten people from a given
country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and
underrepresent the other. Systematic and stratified techniques, discussed below, attempt to overcome this problem by
using information about the population to choose a more representative sample.
SRS may also be cumbersome and tedious when sampling from an unusually large target population. In some cases,
investigators are interested in research questions specific to subgroups of the population. For example, researchers
might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable
across racial groups. SRS cannot accommodate the needs of researchers in this situation because it does not provide
subsamples of the population. Stratified sampling, which is discussed below, addresses this weakness of SRS.
Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.

Systematic sampling
Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting
elements at regular intervals through that ordered list. Systematic sampling involves a random start and then
proceeds with the selection of every kth element from then onwards. In this case, k=(population size/sample size). It
is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within
the first to the kth element in the list. A simple example would be to select every 10th name from the telephone
directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').
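A minimal sketch of this procedure (the list contents and function name are hypothetical):

```python
import random

def systematic_sample(frame, n):
    # k is the sampling interval; the start is chosen at random from the
    # first k positions, not automatically the first element in the list.
    k = len(frame) // n
    start = random.randrange(k)
    return [frame[start + i * k] for i in range(n)]

names = [f"name_{i}" for i in range(1000)]  # hypothetical ordered list
sample = systematic_sample(names, 100)      # an 'every 10th' sample
```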
As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to
implement and the stratification induced can make it efficient, if the variable by which the list is ordered is correlated
with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases.
Example: Suppose we wish to sample people from a long street that starts in a poor district (house #1)
and ends in an expensive district (house #1000). A simple random selection of addresses from this street
could easily end up with too many from the high end and too few from the low end (or vice versa),
leading to an unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures
that the sample is spread evenly along the length of the street, representing all of these districts. (Note
that if we always start at house #1 and end at #991, the sample is slightly biased towards the low end;
by randomly selecting the start between #1 and #10, this bias is eliminated.)
However, systematic sampling is especially vulnerable to periodicities in the list. If periodicity is present and the
period is a multiple or factor of the interval used, the sample is especially likely to be unrepresentative of the overall
population, making the scheme less accurate than simple random sampling.
Example: Consider a street where the odd-numbered houses are all on the north (expensive) side of the
road, and the even-numbered houses are all on the south (cheap) side. Under the sampling scheme
given above, it is impossible to get a representative sample; either the houses sampled will all be from
the odd-numbered, expensive side, or they will all be from the even-numbered, cheap side.
Another drawback of systematic sampling is that even in scenarios where it is more accurate than SRS, its theoretical
properties make it difficult to quantify that accuracy. (In the two examples of systematic sampling that are given
above, much of the potential sampling error is due to variation between neighbouring houses - but because this
method never selects two neighbouring houses, the sample will not give us any information on that variation.)
As described above, systematic sampling is an EPS method, because all elements have the same probability of
selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same
size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but
the set {4,13,24,34,...} has zero probability of selection.
Systematic sampling can also be adapted to a non-EPS approach; for an example, see discussion of PPS samples
below.
Stratified sampling
Where the population embraces a number of distinct categories, the frame can be organized by these categories into
separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements
can be randomly selected.[3] There are several potential benefits to stratified sampling.
First, dividing the population into distinct, independent strata can enable researchers to draw inferences about
specific subgroups that may be lost in a more generalized random sample.
Second, utilizing a stratified sampling method can lead to more efficient statistical estimates (provided that strata are
selected based upon relevance to the criterion in question, instead of availability of the samples). Even if a stratified
sampling approach does not lead to increased statistical efficiency, such a tactic will not result in less efficiency than
would simple random sampling, provided that each stratum is proportional to the group’s size in the population.
Third, it is sometimes the case that data are more readily available for individual, pre-existing strata within a
population than for the overall population; in such cases, using a stratified sampling approach may be more
convenient than aggregating data across groups (though this may potentially be at odds with the previously noted
importance of utilizing criterion-relevant strata).
Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to
different strata, potentially enabling researchers to use the approach best suited (or most cost-effective) for each
identified subgroup within the population.
There are, however, some potential drawbacks to using stratified sampling. First, identifying strata and implementing
such an approach can increase the cost and complexity of sample selection, as well as leading to increased
complexity of population estimates. Second, when examining multiple criteria, stratifying variables may be related to
some, but not to others, further complicating the design, and potentially reducing the utility of the strata. Finally, in
some cases (such as designs with a large number of strata, or those with a specified minimum sample size per
group), stratified sampling can potentially require a larger sample than would other methods (although in most cases,
the required sample size would be no larger than would be required for simple random sampling).
A stratified sampling approach is most effective when three conditions are met:
1. Variability within strata is minimized
2. Variability between strata is maximized
3. The variables upon which the population is stratified are strongly correlated with the desired dependent variable.
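A minimal sketch of proportionate stratified sampling, assuming a hypothetical frame of (id, region) pairs; because each stratum's share is rounded, the realized total can differ from n by a unit or two.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(frame, stratum_of, n):
    # Group the frame into strata, then draw a simple random sample within
    # each stratum, allocating n proportionally to stratum size.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(frame))  # proportional allocation
        sample.extend(random.sample(members, n_h))
    return sample

# Hypothetical frame: (id, region) pairs stratified by region.
frame = [(i, "urban" if i % 4 else "rural") for i in range(1000)]
sample = proportionate_stratified_sample(frame, lambda unit: unit[1], 100)
```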
Advantages over other sampling methods
1. Focuses on important subpopulations and ignores irrelevant ones.
2. Allows use of different sampling techniques for different subpopulations.
3. Improves the accuracy/efficiency of estimation.
4. Permits greater balancing of statistical power of tests of differences between strata by sampling equal numbers
from strata varying widely in size.
Disadvantages
1. Requires selection of relevant stratification variables which can be difficult.
2. Is not useful when there are no homogeneous subgroups.
3. Can be expensive to implement.
Poststratification
Stratification is sometimes introduced after the sampling phase in a process called "poststratification".[3] This
approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the
experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the
method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation.
Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary
variable, poststratification can be used to implement weighting, which can improve the precision of a sample's
estimates.[3]
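One common way such weighting is implemented is to rescale each respondent by the ratio of a known population share to the observed sample share; a minimal sketch, with assumed benchmark figures, follows.

```python
# Poststratification weighting after a simple random sample: reweight each
# respondent so sample shares of a known auxiliary variable (here, an
# assumed sex split) match population benchmarks.
population_share = {"F": 0.51, "M": 0.49}   # assumed known benchmarks
sample = ["F"] * 40 + ["M"] * 60            # hypothetical respondents

sample_share = {g: sample.count(g) / len(sample) for g in population_share}
weights = [population_share[g] / sample_share[g] for g in sample]
# Weighted estimates now reproduce the population's sex distribution.
```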
Oversampling
Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling,[4] the data are stratified
on the target and a sample is taken from each stratum so that the rare target class will be more represented in the
sample. The model is then built on this biased sample. The effects of the input variables on the target are often
estimated with more precision with the choice-based sample even when a smaller overall sample size is taken,
compared to a random sample. The results usually must be adjusted to correct for the oversampling.

Probability proportional to size sampling


In some cases the sample designer has access to an "auxiliary variable" or "size measure", believed to be correlated
to the variable of interest, for each element in the population. This data can be used to improve accuracy in sample
design. One option is to use the auxiliary variable as a basis for stratification, as discussed above.
Another option is probability-proportional-to-size ('PPS') sampling, in which the selection probability for each
element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection
probabilities can then be used as the basis for Poisson sampling. However, this has the drawbacks of variable sample
size, and different portions of the population may still be over- or under-represented due to chance variation in
selections. To address this problem, PPS may be combined with a systematic approach.
Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490 students
respectively (total 1500 students), and we want to use student population as the basis for a PPS sample
of size three. To do this, we could allocate the first school numbers 1 to 150, the second school 151 to
330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to 1500). We then
generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations
by multiples of 500. If our random start was 137, we would select the schools which have been allocated
numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
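The schools example can be reproduced with a short sketch of systematic PPS selection (the function name is ours; a random start of 137 would select the first, fourth, and sixth schools, as in the text):

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n):
    # Cumulate the size measures (150, 330, 530, ... in the schools example),
    # then take every (total / n)-th point after a random start.
    cum = list(accumulate(sizes))
    interval = cum[-1] / n                  # 1500 / 3 = 500 in the example
    start = random.uniform(0, interval)     # random start within one interval
    points = [start + i * interval for i in range(n)]
    return [next(j for j, c in enumerate(cum) if p <= c) for p in points]

schools = [150, 180, 200, 220, 260, 490]
print(pps_systematic(schools, 3))           # indices of the selected schools
```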
The PPS approach can improve accuracy for a given sample size by concentrating sample on large elements that
have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where
element size varies greatly and auxiliary information is often available - for instance, a survey attempting to measure
the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable. In some
cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to
produce more current estimates.

Cluster sampling
Sometimes it is cheaper to 'cluster' the sample in some way e.g. by selecting respondents from certain areas only, or
certain time-periods only. (Nearly all samples are in some sense 'clustered' in time - although this is rarely taken into
account in the analysis.)
Cluster sampling is an example of 'two-stage sampling' or 'multistage sampling': in the first stage a sample of areas is
chosen; in the second stage a sample of respondents within those areas is selected.
This can reduce travel and other administrative costs. It also means that one does not need a sampling frame listing
all elements in the target population. Instead, clusters can be chosen from a cluster-level frame, with an
element-level frame created only for the selected clusters. Cluster sampling generally increases the variability of
sample estimates above that of simple random sampling, depending on how the clusters differ between themselves,
as compared with the within-cluster variation.
One disadvantage of cluster sampling is that the precision of sample estimates depends on the
actual clusters chosen. If the chosen clusters are biased in a certain way, inferences drawn about population parameters
from the sample estimates will be inaccurate.
Multistage sampling
Multistage sampling is a complex form of cluster sampling in which two or more levels of
units are embedded one in the other. The first stage consists of constructing the clusters that will be used to sample
from. In the second stage, a sample of primary units is randomly selected from each cluster (rather than using all
units contained in all selected clusters). In following stages, in each of those selected clusters, additional samples of
units are selected, and so on. All ultimate units (individuals, for instance) selected at the last step of this procedure
are then surveyed.
This technique is thus essentially the process of taking random samples of preceding random samples. It is not as
statistically efficient as true random sampling, but it solves many of the practical problems inherent in random
sampling, and it is an effective strategy because it banks on multiple randomizations.
Multistage sampling is used frequently when a complete list of all members of the population does not exist or would be
inappropriate to construct. Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids
the large, and perhaps unnecessary, costs associated with traditional cluster sampling.
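A minimal sketch of the two-stage case, with hypothetical area lists; stage one samples clusters, stage two samples units within each selected cluster.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster):
    # Stage 1: select a sample of clusters (areas).
    chosen = random.sample(clusters, n_clusters)
    # Stage 2: select a sample of units within each selected cluster,
    # rather than taking every unit the cluster contains.
    return [unit
            for cluster in chosen
            for unit in random.sample(cluster, min(n_per_cluster, len(cluster)))]

# Hypothetical frame: 20 areas, each listing 50 resident ids.
areas = [[f"area{i}_person{j}" for j in range(50)] for i in range(20)]
sample = two_stage_sample(areas, n_clusters=5, n_per_cluster=10)
```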

Matched random sampling


Matched random sampling is a method of assigning participants to groups in which pairs of participants are first matched on some characteristic
and then individually assigned randomly to groups.[5]
The procedure for matched random sampling can be summarized in the following two contexts:
1. Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example,
IQ measurements or pairs of identical twins.
2. Those samples in which the same attribute, or variable, is measured twice on each subject, under different
circumstances. Commonly called repeated measures. Examples include the times of a group of athletes for 1500m
before and after a week of special training; the milk yields of cows before and after being fed a particular diet.

Quota sampling
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified
sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between the age of 45 and 60.
It is this second step which makes the technique one of non-probability sampling. In quota sampling the selection of
the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful.
The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random
element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.

Convenience sampling
Convenience sampling (sometimes known as grab or opportunity sampling) is a type of nonprobability sampling
which involves the sample being drawn from that part of the population which is close to hand. That is, a sample
population selected because it is readily available and convenient. The researcher using such a sample cannot
scientifically make generalizations about the total population from this sample because it would not be representative
enough. For example, if an interviewer were to conduct such a survey at a shopping center early in the morning on a
given day, the people available for interview would be limited to those present at that time and place, and their views
would not represent those of other members of society in the area who would be reached if the survey were conducted
at different times of day and on several days of the week. This type of sampling is most useful for pilot testing. Several
important considerations for researchers using convenience samples include:
1. Are there controls within the research design or experiment which can serve to lessen the impact of a
non-random, convenience sample, thereby ensuring the results will be more representative of the population?
2. Is there good reason to believe that a particular convenience sample would or should respond or behave
differently than a random sample from the same population?
3. Is the question being asked by the research one that can adequately be answered using a convenience sample?
In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit
more subjects into the sample.

Line-intercept sampling
Line-intercept sampling is a method of sampling elements in a region whereby an element is sampled if a chosen
line segment, called a “transect”, intersects the element.

Panel sampling
Panel sampling is the method of first selecting a group of participants through a random sampling method and then
asking that group for the same information again several times over a period of time. Therefore, each participant is
given the same survey or interview at two or more time points; each period of data collection is called a "wave". This
sampling methodology is often chosen for large scale or nation-wide studies in order to gauge changes in the
population with regard to any number of variables from chronic illness to job stress to weekly food expenditures.
Panel sampling can also be used to inform researchers about within-person health changes due to age or help explain
changes in continuous dependent variables such as spousal interaction. There have been several proposed methods of
analyzing panel sample data, including MANOVA, growth curves, and structural equation modeling with lagged
effects. For a more thorough look at analytical techniques for panel data, see Johnson (1995).

Event sampling methodology


Event sampling methodology (ESM) is a newer form of sampling method that allows researchers to study ongoing
experiences and events that vary across and within days in their naturally occurring environment. Because of the
frequent sampling of events inherent in ESM, it enables researchers to measure the typology of activity and detect
the temporal and dynamic fluctuations of work experiences. The popularity of ESM as a research design has
increased in recent years because it addresses a key shortcoming of cross-sectional research: researchers can now
detect intra-individual variance across time. In ESM, participants are asked to record their
experiences and perceptions in a paper or electronic diary.
There are three types of ESM:
1. Signal contingent – random beeping notifies participants to record data. The advantage of this type of ESM is
minimization of recall bias.
2. Event contingent – records data when certain events occur
3. Interval contingent – records data according to the passing of a certain period of time
ESM has several disadvantages. One of the disadvantages of ESM is that it can sometimes be perceived as invasive and
intrusive by participants. ESM also leads to possible self-selection bias. It may be that only certain types of
individuals are willing to participate in this type of study, creating a non-random sample. Another concern is related
to participant cooperation. Participants may not actually fill out their diaries at the specified times. Furthermore,
ESM may substantively change the phenomenon being studied. Reactivity or priming effects may occur, such that
repeated measurement may cause changes in the participants' experiences. This method of sampling data is also
highly vulnerable to common method variance.[6]
Further, it is important to think about whether or not an appropriate dependent variable is being used in an ESM
design. For example, it might be logical to use ESM in order to answer research questions which involve dependent
variables with a great deal of variation throughout the day. Thus, variables such as change in mood, change in stress
level, or the immediate impact of particular events may be best studied using ESM methodology. However, it is not
likely that utilizing ESM will yield meaningful predictions when measuring someone performing a repetitive task
throughout the day or when dependent variables are long-term in nature (e.g., coronary heart problems).

Replacement of selected units


Sampling schemes may be without replacement ('WOR' - no element can be selected more than once in the same
sample) or with replacement ('WR' - an element may appear multiple times in the one sample). For example, if we
catch fish, measure them, and immediately return them to the water before continuing with the sample, this is a WR
design, because we might end up catching and measuring the same fish more than once. However, if we do not
return the fish to the water (e.g. if we eat the fish), this becomes a WOR design.
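In code, the WR/WOR distinction maps directly onto sampling with and without replacement; in Python's standard library these are random.choices and random.sample respectively (the pond below is hypothetical).

```python
import random

pond = [f"fish_{i}" for i in range(100)]  # hypothetical population

wor = random.sample(pond, 10)     # without replacement: no fish can repeat
wr = random.choices(pond, k=10)   # with replacement: the same fish may recur
```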

Sample size
Formulas, tables, and power function charts are well-known approaches for determining sample size.

Formulas
Where the frame and population are identical, statistical theory yields exact recommendations on sample size.[7]
However, where it is not straightforward to define a frame representative of the population, it is more important to
understand the cause system of which the population are outcomes and to ensure that all sources of variation are
embraced in the frame. A large number of observations is of no value if major sources of variation are neglected in
the study. In other words, it is not sufficient to take a sample group that merely matches the survey category and is
easy to survey. Bartlett, Kotrlik, and Higgins (2001) published a paper titled Organizational Research: Determining
Appropriate Sample Size in Survey Research in the Information Technology, Learning, and Performance Journal[8]
that provides an explanation of Cochran's (1977) formulas. A discussion and illustration of sample size formulas,
including the formula for adjusting the sample size for smaller populations, is included. A table is provided that can
be used to select the sample size for a research problem based on three alpha levels and a set error rate.
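One commonly cited form of Cochran's formulas, as discussed in sources such as Bartlett, Kotrlik, and Higgins (2001), is sketched below for a proportion, together with the adjustment for smaller populations; treat it as an illustration rather than a substitute for the cited references.

```python
import math

def cochran_n(z, p, e, N=None):
    # n0 = z^2 * p * (1 - p) / e^2, optionally adjusted for a finite
    # population of size N (a commonly cited form of Cochran's formulas).
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(cochran_n(z=1.96, p=0.5, e=0.05))          # about 385
print(cochran_n(z=1.96, p=0.5, e=0.05, N=1000))  # about 278 after adjustment
```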

Steps for using sample size tables


1. Postulate the effect size of interest, α, and β.
2. Check sample size table[9]
1. Select the table corresponding to the selected α
2. Locate the row corresponding to the desired power
3. Locate the column corresponding to the estimated effect size.
4. The intersection of the column and row is the minimum sample size required.

Sampling and data collection


Good data collection involves:
• Following the defined sampling process
• Keeping the data in time order
• Noting comments and other contextual events
• Recording non-responses
Most sampling books and papers written by non-statisticians focus only on the data collection aspect, which is just a
small though important part of the sampling process.
Errors in sample surveys


Survey results are typically subject to some error. Total errors can be classified into sampling errors and
non-sampling errors. The term "error" here includes systematic biases as well as random errors.

Sampling errors and biases


Sampling errors and biases are induced by the sample design. They include:
1. Selection bias: When the true selection probabilities differ from those assumed in calculating the results.
2. Random sampling error: Random variation in the results due to the elements in the sample being selected at
random.

Non-sampling error
Non-sampling errors are caused by other problems in data collection and processing. They include:
1. Overcoverage: Inclusion of data from outside of the population.
2. Undercoverage: Sampling frame does not include elements in the population.
3. Measurement error: E.g. when respondents misunderstand a question, or find it difficult to answer.
4. Processing error: Mistakes in data coding.
5. Non-response: Failure to obtain complete data from all selected individuals.
After sampling, a review should be held of the exact process followed in sampling, rather than that intended, in order
to study any effects that any divergences might have on subsequent analysis. A particular problem is that of
non-response.
Two major types of nonresponse exist: unit nonresponse (referring to lack of completion of any part of the survey)
and item nonresponse (submission or participation in survey but failing to complete one or more
components/questions of the survey).[10] [11] In survey sampling, many of the individuals identified as part of the
sample may be unwilling to participate, not have the time to participate (opportunity cost),[12] or survey
administrators may not have been able to contact them. In this case, there is a risk of differences between
respondents and nonrespondents, leading to biased estimates of population parameters. This is often addressed by
improving survey design, offering incentives, and conducting follow-up studies which make a repeated attempt to
contact the unresponsive and to characterize their similarities and differences with the rest of the frame.[13] The
effects can also be mitigated by weighting the data when population benchmarks are available or by imputing data
based on answers to other questions.
Nonresponse is particularly a problem in internet sampling. Reasons for this problem include improperly designed
surveys,[11] over-surveying (or survey fatigue),[14] [15] and the fact that potential participants hold multiple e-mail
addresses, which they don't use anymore or don't check regularly. Web-based surveys also tend to demonstrate
nonresponse bias; for example, studies have shown that females and those from a white/Caucasian background are
more likely to respond than their counterparts.[16]

Survey weights
In many situations the sample fraction may be varied by stratum and data will have to be weighted to correctly
represent the population. Thus for example, a simple random sample of individuals in the United Kingdom might
include some in remote Scottish islands who would be inordinately expensive to sample. A cheaper method would
be to use a stratified sample with urban and rural strata. The rural sample could be under-represented in the sample,
but weighted up appropriately in the analysis to compensate.
More generally, data should usually be weighted if the sample design does not give each individual an equal chance
of being selected. For instance, when households have equal selection probabilities but one person is interviewed
from within each household, this gives people from large households a smaller chance of being interviewed. This can
be accounted for using survey weights. Similarly, households with more than one telephone line have a greater
chance of being selected in a random digit dialing sample, and weights can adjust for this.
Weights can also serve other purposes, such as helping to correct for non-response.
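A minimal sketch of such design weights, using the one-person-per-household design described above (the respondents are hypothetical):

```python
respondents = [
    {"name": "Ann", "adults_in_household": 1},
    {"name": "Bob", "adults_in_household": 2},
    {"name": "Fay", "adults_in_household": 3},
]
for r in respondents:
    # Selection probability is 1 / household size, so the design weight
    # (inverse selection probability) is simply the household size.
    r["weight"] = r["adults_in_household"]
```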

History
Random sampling by using lots is an old idea, mentioned several times in the Bible. In 1786 Pierre Simon Laplace
estimated the population of France by using a sample, along with a ratio estimator. He also computed probabilistic
estimates of the error. These were not expressed as modern confidence intervals but as the sample size that would be
needed to achieve a particular upper bound on the sampling error with probability 1000/1001. His estimates used
Bayes' theorem with a uniform prior probability and assumed his sample was random. The theory of small-sample
statistics developed by William Sealy Gosset put the subject on a more rigorous basis in the 20th century. However,
the importance of random sampling was not universally appreciated, and in the USA the 1936 Literary Digest
prediction of a Republican win in the presidential election went badly awry due to severe bias.[17] More than two
million people responded to the study with their names obtained through magazine subscription lists and telephone
directories. It was not appreciated that these lists were heavily biased towards Republicans and the resulting sample,
though very large, was deeply flawed.

See also
• Acceptance sampling
• Data collection
• Official statistics
• Replication (statistics)
• Sample (statistics)
• Sample size rule of thumb for estimate of population mean
• Sampling (case studies)
• Sampling error
• Gy's sampling theory
• Horvitz–Thompson estimator

References
• Adèr, H. J., Mellenbergh, G. J., & Hand, D. J. (2008). Advising on research methods: A consultant's companion.
Huizen, The Netherlands: Johannes van Kessel Publishing.
• Bartlett, J. E., II, Kotrlik, J. W., & Higgins, C. (2001). Organizational research: Determining appropriate sample
size for survey research. Information Technology, Learning, and Performance Journal, 19(1) 43–50. [18]
• Chambers, R L, and Skinner, C J (editors) (2003), Analysis of Survey Data, Wiley, ISBN 0-471-89987-9
• Cochran, William G. (1977). Sampling Techniques (Third ed.). Wiley. ISBN 0-471-16240-X.
• Deming, W. Edwards (1975) On probability as a basis for action, The American Statistician, 29(4), pp146–152.
• Deming, W. Edwards (1966). Some Theory of Sampling. Dover Publications. ISBN 0-486-64684-X.
OCLC 166526.
• Gy, P (1992) Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling
and Homogenizing
• Kish, Leslie (1995) Survey Sampling, Wiley, ISBN 0-471-10949-5
• Korn, E L, and Graubard, B I (1999) Analysis of Health Surveys, Wiley, ISBN 0-471-13773-1
• Lohr, Sharon L. (1999). Sampling: Design and Analysis. Duxbury. ISBN 0-534-35361-4.
• Pedhazur, E., & Schmelkin, L. (1991). Measurement design and analysis: An integrated approach. New York:
Psychology Press.
• Särndal, Carl-Erik, and Swensson, Bengt, and Wretman, Jan (1992). Model Assisted Survey Sampling.
Springer-Verlag. ISBN 0-387-40620-4.
• Stuart, Alan (1962) Basic Ideas of Scientific Sampling, Hafner Publishing Company, New York
• Smith, T. M. F. (1984). "Present Position and Potential Developments: Some Personal Views: Sample surveys"
[19]
. Journal of the Royal Statistical Society. Series A (General) 147 (The 150th Anniversary of the Royal
Statistical Society): 208–221. doi:10.2307/2981677. JSTOR 2981677
• Smith, T. M. F. (1993). "Populations and Selection: Limitations of Statistics (Presidential address)" [20]. Journal
of the Royal Statistical Society. Series A (Statistics in Society) 156 (2): 144–166. doi:10.2307/2982726.
JSTOR 2982726 (Portrait of T. M. F. Smith on page 144)
• Smith, T. M. F. (2001). "Biometrika centenary: Sample surveys" [21]. Biometrika 88, (1): 167–243.
doi:10.1093/biomet/88.1.167.
• Smith, T. M. F. (2001). "Biometrika centenary: Sample surveys". in D. M. Titterington and D. R. Cox.
Biometrika: One Hundred Years. Oxford University Press. pp. 165–194. ISBN 0-19-850993-6.
• Whittle, P. (May 1954). "Optimum preventative sampling" [22]. Journal of the Operations Research Society of
America 2 (2): 197–203.
• ASTM E105 Standard Practice for Probability Sampling Of Materials
• ASTM E122 Standard Practice for Calculating Sample Size to Estimate, With a Specified Tolerable Error, the
Average for Characteristic of a Lot or Process
• ASTM E141 Standard Practice for Acceptance of Evidence Based on the Results of Probability Sampling
• ASTM E1402 Standard Terminology Relating to Sampling
• ASTM E1994 Standard Practice for Use of Process Oriented AOQL and LTPD Sampling Plans
• ASTM E2234 Standard Practice for Sampling a Stream of Product by Attributes Indexed by AQL

External links
• Chapter on Sampling at the Research Methods Knowledge Base [23]
• Survey Sampling Methods at the SatPac survey software site [24]
• TRSL – Template Range Sampling Library [25] is a free-software and open-source C++ library that implements
several sampling schemes behind an (STL-like) iterator interface.
• Continuous Sampling vs. Costs - Electronics Industry Example [26]

References
[1] Ken Black (2004). Business Statistics for Contemporary Decision Making (Fourth (Wiley Student Edition for India) ed.). Wiley-India.
ISBN 9788126508099.
[2] Andrew A. Marino, Representative Sampling (http://www.ortho.lsuhsc.edu/Faculty/Marino/Point1/Representative.html)
[3] Pedhazur & Schmelkin, 1991
[4] Scott, A.J., and Wild, C.J. (1986). Fitting logistic models under case-control or choice-based sampling. J. Roy. Statist. Soc. B, 48, 170–182.
[5] Brown, Cozby, Kee, & Worden, 1999, p.371).
[6] Alliger & Williams, 1993
[7] Mathematical details are displayed in the Sample size article.
[8] http://www.osra.org/itlpj/bartlettkotrlikhiggins.pdf
[9] Cohen, 1988
[10] Berinsky, A. J. (2008). Survey non-response. In W. Donsbach & M. W. Traugott (Eds.), The SAGE handbook of public opinion research
(pp. 309-321). Thousand Oaks, CA: Sage Publications.
[11] Dillman, D. A., Eltinge, J. L., Groves, R. M., & Little, R. J. A. (2002). Survey nonresponse in design, data collection, and analysis. In R. M.
Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (pp. 3-26). New York: John Wiley & Sons.
[12] Dillman, D.A., Smyth, J.D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method. San Francisco:
Jossey-Bass.
[13] Vehovar, V., Batagelj, Z., Manfreda, K.L., & Zaletel, M. (2002). Nonresponse in web surveys. In R. M. Groves, D. A. Dillman, J. L.
Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (pp. 229-242). New York: John Wiley & Sons.
[14] Porter, Whitcomb, Weitzer (2004) Multiple surveys of students and survey fatigue. In S. R. Porter (Ed.), Overcoming survey research
problems: Vol. 121. New directions for institutional research (pp. 63-74). San Francisco, CA: Jossey Bass.
[15] Groves et al., Survey Methodology (2004) book
[16] Sax, L. J., Gilmartin, S. K., & Bryant, A. N. (2003). Assessing response rates and nonresponse bias in web and paper surveys. Research in
Higher Education, 44(4), 409-432.
[17] http://online.wsj.com/public/article/SB115974322285279370-_rk13XDUHmIcnA8DYs5VUscZG94_20071001.html?mod=rss_free
[18] http://www.osra.org/itlpj/bartlettkotrlikhiggins.pdf
[19] http://www.jstor.org/stable/2981677
[20] http://www.jstor.org/stable/2982726
[21] http://biomet.oxfordjournals.org/cgi/content/abstract/88/1/167
[22] http://www.jstor.org/stable/166605
[23] http://www.socialresearchmethods.net/kb/sampling.php
[24] http://www.statpac.com/surveys/sampling.htm
[25] http://trsl.sourceforge.net/
[26] http://inderscience.metapress.com/openurl.asp?genre=article&eissn=1740-8857&volume=4&issue=4&spage=393

Probability theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena.[1] The central
objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of
non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an
apparently random fashion. Although an individual coin toss or the roll of a die is a random event, if repeated many
times the sequence of random events will exhibit certain statistical patterns, which can be studied and predicted. Two
representative mathematical results describing such patterns are the law of large numbers and the central limit
theorem.
As a mathematical foundation for statistics, probability theory is essential to many human activities that involve
quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex
systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth
century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum
mechanics.

History
The mathematical theory of probability has its roots in attempts to analyze games of chance by Gerolamo Cardano in
the sixteenth century, and by Pierre de Fermat and Blaise Pascal in the seventeenth century (for example the
"problem of points"). Christiaan Huygens published a book on the subject in 1657.[2]
Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial.
Eventually, analytical considerations compelled the incorporation of continuous variables into the theory.
This culminated in modern probability theory, the foundations of which were laid by Andrey Nikolaevich
Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, and measure
theory and presented his axiom system for probability theory in 1933. Fairly quickly this became the mostly
undisputed axiomatic basis for modern probability theory, but alternatives exist, in particular the adoption of finite
rather than countable additivity by Bruno de Finetti.[3]
Treatment
Most introductions to probability theory treat discrete probability distributions and continuous probability
distributions separately. The more mathematically advanced, measure-theory-based treatment of probability covers
the discrete case, the continuous case, any mix of the two, and more.

Discrete probability distributions


Discrete probability theory deals with events that occur in countable sample spaces.
Examples: throwing dice, experiments with decks of cards, and random walks.
Classical definition: Initially the probability of an event to occur was defined as the number of cases favorable for the
event, over the number of total outcomes possible in an equiprobable sample space: see Classical definition of
probability.

For example, if the event is "occurrence of an even number when a die is rolled", the probability is given by $\tfrac{3}{6} = \tfrac{1}{2}$,
since 3 faces out of the 6 have even numbers and each face has the same probability of appearing.
Modern definition: The modern definition starts with a finite or countable set called the sample space, which
relates to the set of all possible outcomes in the classical sense, denoted by $\Omega$. It is then assumed that for each element
$x \in \Omega$, an intrinsic "probability" value $f(x)$ is attached, which satisfies the following properties:
1. $f(x) \in [0, 1]$ for all $x \in \Omega$;
2. $\sum_{x \in \Omega} f(x) = 1.$
That is, the probability function f(x) lies between zero and one for every value of x in the sample space Ω, and the
sum of f(x) over all values x in the sample space Ω is equal to 1. An event is defined as any subset $E$ of the sample
space $\Omega$. The probability of the event $E$ is defined as
$$P(E) = \sum_{x \in E} f(x).$$
So, the probability of the entire sample space is 1, and the probability of the null event is 0.
The function $f(x)$ mapping a point in the sample space to the "probability" value is called a probability mass
function, abbreviated as pmf. The modern definition does not try to answer how probability mass functions are
obtained; instead it builds a theory that assumes their existence.
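A short sketch of this definition for a fair die, using exact arithmetic via fractions (illustrative only):

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]               # sample space of a fair die
f = {x: Fraction(1, 6) for x in omega}   # f(x) in [0, 1], summing to 1

def prob(event):
    # P(E) is the sum of f over the outcomes in the event E.
    return sum(f[x] for x in event)

assert prob(omega) == 1                  # probability of the sample space
print(prob({2, 4, 6}))                   # 1/2: the even-number event
```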

Continuous probability distributions


Continuous probability theory deals with events that occur in a continuous sample space.
Classical definition: The classical definition breaks down when confronted with the continuous case. See Bertrand's
paradox.
Modern definition: If the outcome space of a random variable X is the set of real numbers ($\mathbb{R}$) or a subset thereof,
then a function called the cumulative distribution function (or cdf) $F$ exists, defined by $F(x) = P(X \le x)$.
That is, F(x) returns the probability that X will be less than or equal to x.
The cdf necessarily satisfies the following properties.
1. $F$ is a monotonically non-decreasing, right-continuous function;
2. $\lim_{x \to -\infty} F(x) = 0;$
3. $\lim_{x \to \infty} F(x) = 1.$
If $F$ is absolutely continuous, i.e., its derivative exists and integrating the derivative gives us the cdf back again,
then the random variable X is said to have a probability density function or pdf or simply density
$$f(x) = \frac{dF(x)}{dx}.$$
For a set $E \subseteq \mathbb{R}$, the probability of the random variable X being in $E$ is
$$P(X \in E) = \int_{x \in E} dF(x).$$
In case the probability density function exists, this can be written as
$$P(X \in E) = \int_{x \in E} f(x)\,dx.$$
Whereas the pdf exists only for continuous random variables, the cdf exists for all random variables (including
discrete random variables) that take values in $\mathbb{R}.$
These concepts can be generalized for multidimensional cases on $\mathbb{R}^n$ and other continuous sample spaces.
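As a numerical illustration of the relationship between pdf and cdf, the sketch below integrates the standard normal density with a crude Riemann sum and compares the result with the closed-form cdf:

```python
import math

def phi(t):   # standard normal pdf
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Crude Riemann sum of the pdf from -8 up to x approximates the cdf at x.
x, dt = 1.0, 1e-4
acc, t = 0.0, -8.0
while t < x:
    acc += phi(t) * dt
    t += dt
print(acc, Phi(x))   # both approximately 0.8413
```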

Measure-theoretic probability theory


The raison d'être of the measure-theoretic treatment of probability is that it unifies the discrete and the continuous
cases, and makes the difference a question of which measure is used. Furthermore, it covers distributions that are
neither discrete nor continuous nor mixtures of the two.
An example of such distributions could be a mix of discrete and continuous distributions, for example, a random
variable which is 0 with probability 1/2, and takes a random value from a standard normal distribution with probability 1/2. It
can still be studied to some extent by considering it to have a pdf of $\bigl(\delta[x] + \varphi(x)\bigr)/2$, where $\delta[x]$ is the Dirac
delta function and $\varphi(x)$ is the standard normal density.
Other distributions may not even be a mix, for example, the Cantor distribution has no positive probability for any
single point, neither does it have a density. The modern approach to probability theory solves these problems using
measure theory to define the probability space:
Given any set $\Omega$ (also called the sample space) and a σ-algebra $\mathcal{F}$ on it, a measure $P$ defined on $\mathcal{F}$ is called a
probability measure if $P(\Omega) = 1.$
If $\mathcal{F}$ is the Borel σ-algebra on the set of real numbers, then there is a unique probability measure on $\mathcal{F}$ for any cdf,
and vice versa. The measure corresponding to a cdf is said to be induced by the cdf. This measure coincides with the
pmf for discrete variables, and pdf for continuous variables, making the measure-theoretic approach free of fallacies.
The probability of a set $E$ in the σ-algebra $\mathcal{F}$ is defined as
$$P(E) = \int_{x \in E} \mu_F(dx),$$
where the integration is with respect to the measure $\mu_F$ induced by $F.$


Along with providing better understanding and unification of discrete and continuous probabilities, the
measure-theoretic treatment also allows us to work on probabilities outside $\mathbb{R}^n$, as in the theory of stochastic
processes. For example, to study Brownian motion, probability is defined on a space of functions.

Probability distributions
Certain random variables occur very often in probability theory because they well describe many natural or physical
processes. Their distributions therefore have gained special importance in probability theory. Some fundamental
discrete distributions are the discrete uniform, Bernoulli, binomial, negative binomial, Poisson and geometric
distributions. Important continuous distributions include the continuous uniform, normal, exponential, gamma and
beta distributions.
Convergence of random variables


In probability theory, there are several notions of convergence for random variables. They are listed below in the
order of strength, i.e., any subsequent notion of convergence in the list implies convergence according to all of the
preceding notions.
Weak convergence: A sequence of random variables $X_1, X_2, \ldots$ converges weakly to the random
variable $X$ if their respective cumulative distribution functions $F_1, F_2, \ldots$ converge to the cumulative
distribution function $F$ of $X$, wherever $F$ is continuous. Weak convergence is also called convergence in
distribution.
Most common shorthand notation: $X_n \xrightarrow{\mathcal{D}} X$
Convergence in probability: The sequence of random variables $X_1, X_2, \ldots$ is said to converge towards the
random variable $X$ in probability if $\lim_{n \to \infty} P\bigl(|X_n - X| \geq \varepsilon\bigr) = 0$ for every ε > 0.
Most common shorthand notation: $X_n \xrightarrow{P} X$
Strong convergence: The sequence of random variables $X_1, X_2, \ldots$ is said to converge towards the random
variable $X$ strongly if $P\bigl(\lim_{n \to \infty} X_n = X\bigr) = 1$. Strong convergence is also known as almost sure
convergence.
Most common shorthand notation: $X_n \xrightarrow{\mathrm{a.s.}} X$
As the names indicate, weak convergence is weaker than strong convergence. In fact, strong convergence implies
convergence in probability, and convergence in probability implies weak convergence. The reverse statements are
not always true.

Law of large numbers


Common intuition suggests that if a fair coin is tossed many times, then roughly half of the time it will turn up
heads, and the other half it will turn up tails. Furthermore, the more often the coin is tossed, the more likely it should
be that the ratio of the number of heads to the number of tails will approach unity. Modern probability provides a
formal version of this intuitive idea, known as the law of large numbers. This law is remarkable because it is
nowhere assumed in the foundations of probability theory, but instead emerges out of these foundations as a
theorem. Since it links theoretically derived probabilities to their actual frequency of occurrence in the real world,
the law of large numbers is considered as a pillar in the history of statistical theory.[4]

The law of large numbers (LLN) states that the sample average
$$\bar{X}_n = \frac{1}{n}\sum_{k=1}^{n} X_k$$
of independent and identically distributed random variables $X_1, X_2, \ldots$ with finite expectation $\mu = E(X_k)$
converges towards the theoretical expectation $\mu$.
It is the form of convergence of $\bar{X}_n$ that separates the weak law ($\bar{X}_n \xrightarrow{P} \mu$) from the
strong law ($\bar{X}_n \xrightarrow{\mathrm{a.s.}} \mu$) of large numbers.
It follows from the LLN that if an event of probability p is observed repeatedly during independent experiments, the
ratio of the observed frequency of that event to the total number of repetitions converges towards p.
Putting this in terms of random variables and the LLN: if $X_1, X_2, \ldots$ are independent Bernoulli random variables
taking the value 1 with probability p and 0 with probability 1−p, then $E(X_i) = p$ for all i, and it follows from the LLN
that $\bar{X}_n$ converges to p almost surely.
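A small simulation of this Bernoulli case (a minimal sketch; the seed and sample sizes are arbitrary):

```python
import random

# The observed frequency of heads in fair coin tosses approaches p = 0.5.
random.seed(1)
for n in (10, 100, 1000, 10000, 100000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)   # ratios drift toward 0.5 as n grows
```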


Central limit theorem


"The central limit theorem (CLT) is one of the great results of mathematics." (Chapter 18 in [5].) It explains the
ubiquitous occurrence of the normal distribution in nature.
The theorem states that the average of many independent and identically distributed random variables with finite
variance tends towards a normal distribution irrespective of the distribution followed by the original random
variables. Formally, let $X_1, X_2, \ldots$ be independent random variables with mean $\mu$ and variance $\sigma^2 > 0.$ Then
the sequence of random variables
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$$
converges in distribution to a standard normal random variable.
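A short simulation of the theorem for uniform(0, 1) summands, whose mean is 1/2 and variance 1/12 (a minimal sketch; the seed, n, and repetition count are arbitrary):

```python
import random
import statistics

random.seed(0)
n, reps = 50, 10000
mu, sigma = 0.5, (1 / 12) ** 0.5
z = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * n ** 0.5)
     for _ in range(reps)]
print(statistics.mean(z), statistics.stdev(z))   # approximately 0 and 1
print(sum(abs(v) <= 1.96 for v in z) / reps)     # approximately 0.95
```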

See also
• Expected value and Variance
• Fuzzy logic and Fuzzy measure theory
• Glossary of probability and statistics
• Likelihood function
• List of probability topics
• Catalog of articles in probability theory
• List of publications in statistics
• List of statistical topics
• Probabilistic proofs of non-probabilistic theorems
• Notation in probability
• Predictive modelling
• Probabilistic logic - A combination of probability theory and logic
• Probability
• Probability axioms
• Probability interpretations
• Statistical independence
• Subjective logic

References
• Pierre Simon de Laplace (1812). Analytical Theory of Probability.
The first major treatise blending calculus with probability theory, originally in French: Théorie
Analytique des Probabilités.
• Andrei Nikolajevich Kolmogorov (1950). Foundations of the Theory of Probability.
The modern measure-theoretic foundation of probability theory; the original German version
(Grundbegriffe der Wahrscheinlichkeitrechnung) appeared in 1933.
• Patrick Billingsley (1979). Probability and Measure. New York, Toronto, London: John Wiley and Sons.
• Olav Kallenberg; Foundations of Modern Probability, 2nd ed. Springer Series in Statistics. (2002). 650 pp. ISBN
0-387-95313-2
• Henk Tijms (2004). Understanding Probability. Cambridge Univ. Press.
A lively introduction to probability theory for the beginner.
• Olav Kallenberg; Probabilistic Symmetries and Invariance Principles. Springer -Verlag, New York (2005). 510
pp. ISBN 0-387-25115-4
• Gut, Allan (2005). Probability: A Graduate Course. Springer-Verlag. ISBN 0387228330.
References
[1] Probability theory, Encyclopaedia Britannica (http://www.britannica.com/ebc/article-9375936)
[2] Grinstead, Charles Miller; James Laurie Snell. "Introduction". Introduction to Probability. pp. vii.
[3] "The origins and legacy of Kolmogorov's Grundbegriffe", by Glenn Shafer and Vladimir Vovk (http://www.probabilityandfinance.com/articles/04.pdf)
[4] http://www.leithner.com.au/circulars/circular17.htm
[5] David Williams, "Probability with martingales", Cambridge 1991/2008
Normal distribution
Probability density function (figure: the red line is the standard normal distribution)
Cumulative distribution function (figure: colors match the density plot above)

notation: $X \sim \mathcal{N}(\mu,\, \sigma^2)$
parameters: μ ∈ R — mean (location); σ2 ≥ 0 — variance (squared scale)
support: x ∈ R if σ2 > 0;  x = μ if σ2 = 0
pdf: $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
cdf: $\frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$
mean: μ
median: μ
mode: μ
variance: σ2
skewness: 0
ex. kurtosis: 0
entropy: $\frac{1}{2}\ln\!\left(2\pi e\,\sigma^2\right)$
mgf: $\exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$
cf: $\exp\!\left(i\mu t - \tfrac{1}{2}\sigma^2 t^2\right)$
Fisher information: $\begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}$

In probability theory and statistics, the normal distribution, or Gaussian distribution, is an absolutely continuous
probability distribution whose cumulants of all orders above two are zero. The graph of the associated probability
density function is "bell"-shaped, with peak at the mean, and is known as the Gaussian function or bell curve:[1]
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
where parameters μ and σ2 are the mean and the variance. The distribution with μ = 0 and σ2 = 1 is called standard
normal.
normal.
The normal distribution is often used to describe, at least approximately, any variable that tends to cluster around the
mean. For example, the heights of adult males in the United States are roughly normally distributed, with a mean of
about 70 inches (1.8 m). Most men have a height close to the mean, though a small number of outliers have a height
significantly above or below the mean. A histogram of male heights will appear similar to a bell curve, with the
correspondence becoming closer if more data are used.
By the central limit theorem, under certain conditions the sum of a number of random variables with finite means
and variances approaches a normal distribution as the number of variables increases. For this reason, the normal
distribution is commonly encountered in practice, and is used throughout statistics, natural sciences, and social
sciences[2] as a simple model for complex phenomena. For example, the observational error in an experiment is
usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this
assumption.
The Gaussian distribution was named after Carl Friedrich Gauss, who introduced it in 1809 as a way of rationalizing
the method of least squares. One year later Laplace proved the first version of the central limit theorem,
demonstrating that the normal distribution occurs as a limiting distribution of arithmetic means of independent,
identically distributed random variables with finite second moment. For this reason the normal distribution is
sometimes called Laplacian, especially in French-speaking countries.

Definition
The simplest case of a normal distribution is known as the standard normal distribution, described by the probability
density function
$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.$$
The constant $\tfrac{1}{\sqrt{2\pi}}$ in this expression ensures that the total area under the curve ϕ(x) is equal to one, and the 1⁄2 in the
exponent makes the "width" of the curve (measured as half of the distance between the inflection points of the
curve) also equal to one. It is traditional[3] in statistics to denote this function with the Greek letter ϕ (phi), whereas
density functions for all other distributions are usually denoted with letters ƒ or p. The alternative glyph φ is also used
quite often, however within this article we reserve "φ" to denote characteristic functions.
More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential
distribution results from exponentiating a linear function):
$$f(x) = e^{a x^2 + b x + c}.$$
This yields the classic "bell curve" shape (provided that a < 0 so that the quadratic function is concave). Notice that
f(x) > 0 everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the
bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density
function over R, one must choose c such that $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (which is only possible when a < 0).
Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and
variance σ2 = −1/(2a). Changing to these new parameters allows us to rewrite the probability density function in a
convenient standard form,
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Notice that for a standard normal distribution, μ = 0 and σ2 = 1. The last part of the equation above shows that any
other normal distribution can be regarded as a version of the standard normal distribution that has been stretched
horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell
curve’s central peak, and σ specifies the  “width” of the bell curve.
The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ2
is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean.
The square root of σ2 is called the standard deviation and is the width of the density function.
The normal distribution is usually denoted by N(μ, σ2).[4] Commonly the letter N is written in calligraphic font (typed
as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ2,
we write
$$X \sim \mathcal{N}(\mu,\, \sigma^2).$$
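For reference, the standard form above translates directly into code; the sketch below is a plain evaluation of the density (the parameter values are arbitrary examples):

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    # Density of N(mu, sigma2) in the standard form given above.
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(normal_pdf(0.0))         # 1/sqrt(2*pi), about 0.3989
print(normal_pdf(70, 70, 9))   # peak height for mu = 70, sigma^2 = 9
```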

Alternative formulations
Some authors[5] instead of σ2 use its reciprocal τ = 1/σ2, which is called the precision. This parameterization has an
advantage in numerical applications where σ2 is very close to zero, and is more convenient to work with in analysis, as
τ is a natural parameter of the normal distribution. Another advantage of using this parameterization is in the study of
conditional distributions in the multivariate normal case.
The question of which normal distribution should be called the "standard" one is also answered differently by various
authors. Starting from the works of Gauss the standard normal was considered to be the one with variance σ2 = 1/2:
$$f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}.$$
Stigler (1982) goes even further and suggests the standard normal with variance σ2 = 1/(2π):
$$f(x) = e^{-\pi x^2}.$$
According to the author, this formulation is advantageous because of a much simpler and easier-to-remember
formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the
distribution.

Characterization
In the previous section the normal distribution was defined by specifying its probability density function. However
there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the
moments, the cumulants, the characteristic function, the moment-generating function, etc.

Probability density function


The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)),  x ∈ R.

This is a proper function only when the variance σ² is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, and called the “Gaussian function”.
When σ² = 0, the density function does not exist. However, we can consider a generalized function that behaves in a manner similar to the regular density function (in the sense that it defines a measure on the real line, and can be plugged into an integral in order to calculate expected values of different quantities):

f(x) = δ(x − μ).

This is the Dirac delta function: it is equal to infinity at x = μ and is zero elsewhere.
Properties:
• Function ƒ(x) is symmetric around the point x = μ, which is at the same time the mode, the median and the mean
of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ +
σ).
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x). More generally, the n-th derivative is given by ϕ⁽ⁿ⁾(x) = (−1)ⁿ Hn(x) ϕ(x), where Hn is the Hermite polynomial of order n.[6]

Cumulative distribution function


The cumulative distribution function (cdf) describes probabilities for a random variable to fall in the intervals of the
form (−∞, x]. The cdf of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be
computed as an integral of the probability density function:

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt.
This integral cannot be expressed in terms of elementary functions; it is conventionally written using the special function erf, called the error function: Φ(x) = ½ [1 + erf(x/√2)]. The numerical methods for calculation of the standard normal cdf are discussed below. For a generic normal random variable with mean μ and variance σ² > 0 the cdf will be equal to

F(x) = Φ((x − μ)/σ).
For a normal distribution with zero variance, the cdf is the Heaviside step function:

F(x) = 0 for x < μ,  F(x) = 1 for x ≥ μ.
The complement of the standard normal cdf, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in
engineering texts.[7] [8] This represents the tail probability of the Gaussian distribution, that is the probability that a
standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are
simple transformations of Φ, are also used occasionally.[9]
Properties:
• The standard normal cdf is 2-fold rotationally symmetric around point (0, ½):  Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x):  Φ′(x) = ϕ(x).
• The antiderivative of Φ(x) is:  ∫ Φ(x) dx = x Φ(x) + ϕ(x).

Quantile function
The inverse of the standard normal cdf, called the quantile function or probit function, is expressed in terms of the inverse error function:

Φ⁻¹(p) = √2 erf⁻¹(2p − 1),  p ∈ (0, 1).
Quantiles of the standard normal distribution are commonly denoted as zp. The quantile zp is the value such that a standard normal random variable X falls inside the interval (−∞, zp] with probability exactly p. The quantiles are used in hypothesis testing, construction of confidence intervals and Q-Q plots. The most “famous” normal quantile is 1.96 = z0.975: a standard normal random variable is greater than 1.96 in absolute value in only 5% of cases.
For a normal random variable with mean μ and variance σ², the quantile function is

F⁻¹(p) = μ + σ Φ⁻¹(p) = μ + σ√2 erf⁻¹(2p − 1).
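As a numerical illustration (a minimal sketch assuming the SciPy library, which the article itself does not reference), the quantile function can be evaluated directly; norm.ppf is SciPy’s name for Φ⁻¹:

    # Minimal sketch: evaluating the normal quantile function numerically.
    from scipy.stats import norm

    mu, sigma, p = 100.0, 15.0, 0.975

    # Standard normal quantile: z_{0.975} = 1.959963...
    print(norm.ppf(p))

    # Quantile of N(mu, sigma^2) via F^{-1}(p) = mu + sigma * Phi^{-1}(p)
    print(mu + sigma * norm.ppf(p))
    print(norm.ppf(p, loc=mu, scale=sigma))   # identical result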
Characteristic function and moment generating function


The characteristic function φX(t) of a random variable X is defined as the expected value of e^(itX), where i is the imaginary unit and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density. For a normally distributed X with mean μ and variance σ², the characteristic function is[10]

φX(t) = e^(iμt − σ²t²/2).
The moment generating function is defined as the expected value of e^(tX). For a normal distribution, the moment generating function exists and is equal to

M(t) = E[e^(tX)] = e^(μt + σ²t²/2).
The cumulant generating function is the logarithm of the moment generating function:

g(t) = ln M(t) = μt + ½ σ²t².
Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.

Moments
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ², the expectation E|X|^p exists and is finite for all p such that Re[p] > −1. Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….
• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)^p. Using standardization of normal random variables, this expectation will be equal to σ^p · E[Z^p], where Z is standard normal:

E[(X − μ)^p] = 0 if p is odd;  E[(X − μ)^p] = σ^p (p − 1)!! if p is even.
Here n!! denotes the double factorial, that is the product of every other number from n to 1.
• Central absolute moments are the moments of |X − μ|. They coincide with regular moments for all even orders, but are nonzero for all odd p’s:

E|X − μ|^p = σ^p (p − 1)!! · √(2/π) if p is odd;  E|X − μ|^p = σ^p (p − 1)!! if p is even.
• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these
moments are much more complicated, and are given in terms of confluent hypergeometric functions 1F1 and U.

These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.
• The first two cumulants are equal to μ and σ² respectively, whereas all higher-order cumulants are equal to zero.
Order   Raw moment                                      Central moment   Cumulant
1       μ                                               0                μ
2       μ² + σ²                                         σ²               σ²
3       μ³ + 3μσ²                                       0                0
4       μ⁴ + 6μ²σ² + 3σ⁴                                3σ⁴              0
5       μ⁵ + 10μ³σ² + 15μσ⁴                             0                0
6       μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶                     15σ⁶             0
7       μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶                  0                0
8       μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸         105σ⁸            0
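The central-moment column of this table is easy to check by simulation. The following is a minimal sketch assuming NumPy is available; the parameters and sample size are arbitrary choices:

    # Minimal sketch: checking E[(X-mu)^p] = sigma^p * (p-1)!! for even p.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 3.0
    x = rng.normal(mu, sigma, size=2_000_000)

    for p, exact in [(2, sigma**2), (4, 3 * sigma**4), (6, 15 * sigma**6)]:
        empirical = np.mean((x - mu) ** p)
        print(p, empirical, exact)   # empirical values approach the exact ones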

Properties

Standardizing normal random variables


As a consequence of property 1 (closure of the normal family under linear transformations, stated under Miscellaneous below), it is possible to relate all normal random variables to the standard normal. For example if X is normal with mean μ and variance σ², then

Z = (X − μ)/σ
has mean zero and unit variance, that is Z has the standard normal distribution. Conversely, having a standard normal random variable Z we can always construct another normal random variable with specific mean μ and variance σ²:

X = μ + σZ.
This “standardizing” transformation is convenient, as it allows one to compute the pdf and especially the cdf of a normal distribution from the table of pdf and cdf values for the standard normal. They will be related via

F_X(x) = Φ((x − μ)/σ),  f_X(x) = (1/σ) ϕ((x − μ)/σ).
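A minimal sketch of this relation, assuming SciPy (an assumption, not part of the original text):

    # Minimal sketch: cdf/pdf of N(mu, sigma^2) from the standard normal.
    from scipy.stats import norm

    mu, sigma, x = 70.0, 3.0, 73.0
    z = (x - mu) / sigma                      # standardization: Z = (X - mu)/sigma

    print(norm.cdf(z))                        # F_X(x) = Phi((x - mu)/sigma)
    print(norm.cdf(x, loc=mu, scale=sigma))   # identical result
    print(norm.pdf(z) / sigma)                # f_X(x) = phi((x - mu)/sigma)/sigma
    print(norm.pdf(x, loc=mu, scale=sigma))   # identical result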
Standard deviation and confidence intervals


About 68% of values drawn from a normal distribution are within one standard deviation σ > 0 away from the mean μ; about 95% of the values are within two standard deviations and about 99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the empirical rule, or the 3-sigma rule.

[Figure: Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for about 68% of the set (dark blue), while two standard deviations from the mean (medium and dark blue) account for about 95%, and three standard deviations (light, medium, and dark blue) account for about 99.7%.]

To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the cumulative normal distribution function is given by

F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2),
where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:

n   erf(n/√2)        i.e. 1 minus …    or 1 in …
1   0.682689492137   0.317310507863    3.15148718753
2   0.954499736104   0.045500263896    21.9778945081
3   0.997300203937   0.002699796063    370.398347380
4   0.999936657516   0.000063342484    15,787.192684
5   0.999999426697   0.000000573303    1,744,278.331
6   0.999999998027   0.000000001973    506,842,375.7

The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area
under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels
based on normally distributed (or asymptotically normal) estimators:

area under curve    sigma multiple n
0.80                1.281551565545
0.90                1.644853626951
0.95                1.959963984540
0.98                2.326347874041
0.99                2.575829303549
0.995               2.807033768344
0.998               3.090232306168
0.999               3.290526731492
0.9999              3.890591886413
0.99999             4.417173413469
where the value on the left of the table is the proportion of values that will fall within a given interval and n is a
multiple of the standard deviation that specifies the width of the interval.
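The first of the two tables above can be reproduced from the erf formula using only the Python standard library; a minimal sketch:

    # Minimal sketch: reproducing the n-sigma table from erf(n/sqrt(2)).
    import math

    for n in range(1, 7):
        inside = math.erf(n / math.sqrt(2))   # P(mu - n*sigma < X < mu + n*sigma)
        outside = 1.0 - inside
        print(f"{n}  {inside:.12f}  {outside:.12f}  1 in {1.0 / outside:,.1f}")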

Central limit theorem


The theorem states that under certain, fairly common conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if (x1, …, xn) is a sequence of iid random variables, each having mean μ and variance σ² (but otherwise the distributions of the xi’s can be arbitrary), then the central limit theorem states that

√n (x̄ − μ)  →d  N(0, σ²),  where x̄ = (x1 + … + xn)/n.
The theorem will hold even if the summands xi are not iid, although some constraints on the degree of dependence
and the growth rate of moments still have to be imposed.
The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and
estimators encountered in practice contain sums of certain random variables in them, even more estimators can be
represented as sums of random variables through the use of influence functions — all of these quantities are
governed by the central limit theorem and will have asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
• The Student’s t-distribution t(ν) is approximately normal N(0, 1) when ν is large.

[Figure: As the number of discrete events increases, the function begins to resemble a normal distribution.]

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the
rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in
the tails of the distribution.
A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.
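As a small illustration of the first approximation in the list above (a minimal sketch assuming SciPy; the cutoff 35 and the parameters n = 100, p = 0.3 are arbitrary choices):

    # Minimal sketch: the normal approximation B(n, p) ~ N(np, np(1-p)).
    from scipy.stats import binom, norm

    n, p = 100, 0.3
    mean, var = n * p, n * p * (1 - p)

    # Compare P(X <= 35) exactly and via the normal cdf (continuity correction).
    print(binom.cdf(35, n, p))                          # exact
    print(norm.cdf(35.5, loc=mean, scale=var ** 0.5))   # approximation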

Miscellaneous
1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ², then a linear transform aX + b (for some real numbers a and b) is also normally distributed:

aX + b ∼ N(aμ + b, a²σ²).

Also if X1, X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then their linear combination will also be normally distributed: [proof]

aX1 + bX2 ∼ N(aμ1 + bμ2, a²σ1² + b²σ2²).
2. The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is distributed normally, then
both X1 and X2 must also be normal. This is known as Cramér’s theorem. The interpretation of this property is that
a normal distribution is only divisible by other normal distributions.


3. It is a common fallacy that if two normal random variables are uncorrelated then they are independent. This is false.[proof] The correct statement is that if two random variables are jointly normal and uncorrelated, then they are independent.
4. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ² we can find n independent random variables {X1, …, Xn}, each distributed normally with mean μ/n and variance σ²/n, such that

X1 + X2 + … + Xn ∼ N(μ, σ²).
5. The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ²) random variables and a, b are arbitrary real numbers, then

aX1 + bX2 ∼ √(a² + b²) X3 + (a + b − √(a² + b²)) μ,

where X3 is also N(μ, σ²). This relationship directly follows from property (1).
6. The Kullback–Leibler divergence between two normal distributions X1 ∼ N(μ1, σ1²) and X2 ∼ N(μ2, σ2²) is given by:[11]

D_KL(X1 ‖ X2) = ln(σ2/σ1) + (σ1² + (μ1 − μ2)²)/(2σ2²) − 1/2.
The Hellinger distance between the same distributions is equal to

H²(X1, X2) = 1 − √(2σ1σ2/(σ1² + σ2²)) · e^(−(μ1 − μ2)²/(4(σ1² + σ2²))).
7. The Fisher information matrix for the normal distribution is diagonal and takes the form

I(μ, σ²) = diag(1/σ², 1/(2σ⁴)).
8. The normal distribution belongs to an exponential family with natural parameters θ1 = μ/σ² and θ2 = −1/(2σ²), and natural statistics x and x². The dual, expectation parameters for the normal distribution are η1 = μ and η2 = μ² + σ².

9. Of all probability distributions over the reals with mean μ and variance σ2, the normal distribution N(μ, σ2) is the
one with the maximum entropy.
10. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with
respect to the (±1)-connections ∇(e) and ∇(m).[12]

Related distributions
• If X is distributed normally with mean μ and variance σ², then
• The exponential of X is distributed log-normally: e^X ∼ ln N(μ, σ²).
• The absolute value of X has the folded normal distribution: |X| ∼ N_f(μ, σ²). If μ = 0 this is known as the half-normal distribution.
• The square of X/σ has the noncentral chi-square distribution with one degree of freedom: X²/σ² ∼ χ²₁(μ²/σ²). If μ = 0, the distribution is called simply chi-square.
• The variable X restricted to an interval [a, b] has the truncated normal distribution.
• (X − μ)⁻² has a Lévy distribution with location 0 and scale σ⁻².
• If X1 and X2 are two independent standard normal random variables, then
• Their sum and difference is distributed normally with mean zero and variance two: X1 ± X2 ∼ N(0, 2).
• Their product Z = X1·X2 follows the “product-normal” distribution[13] with density function fZ(z) = π⁻¹ K0(|z|), where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φZ(t) = (1 + t²)^(−1/2).
• Their ratio follows the standard Cauchy distribution: X1 ÷ X2 ∼ Cauchy(0, 1).
• Their Euclidean norm √(X1² + X2²) has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-square distribution with n degrees of freedom: X1² + X2² + … + Xn² ∼ χ²(n).
• If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ², then their sample mean is independent from the sample standard deviation, which can be demonstrated using Basu’s theorem or Cochran’s theorem. The ratio of these two quantities will have the Student’s t-distribution with n − 1 degrees of freedom: t = (X̄ − μ)√n / s ∼ t(n − 1).
• If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom: ((X1² + … + Xn²)/n) ÷ ((Y1² + … + Ym²)/m) ∼ F(n, m).

Extensions
The notion of normal distribution, being one of the most important distributions in probability theory, has been
extended far beyond the standard framework of the univariate (that is one-dimensional) case. All these extensions are
also called normal or Gaussian laws, so a certain ambiguity in names exists.
• Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X ∈ R^k is multivariate-normally distributed if any linear combination of its components, a1X1 + … + akXk, has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• Complex normal distribution deals with the complex normal vectors. A complex vector X ∈ Ck is said to be
normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal
distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ, and the
relation matrix C.
• Matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some
infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞.
A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate)
normal distribution. The variance structure of such Gaussian random element can be described in terms of the
linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own
names:
• Brownian motion,
• Brownian bridge,
• Ornstein-Uhlenbeck process.
• Gaussian q-distribution is an abstract mathematical construction which represents a  “q-analogue” of the normal
distribution.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such cases a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
• Pearson distribution — a four-parametric family of probability distributions that extend the normal law to include
different skewness and kurtosis values.
Normality tests
Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically
the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ2,
versus the alternative Ha that the distribution is arbitrary. A great number of tests (over 40) have been devised for
this problem; the more prominent of them are outlined below:
• “Visual” tests are more intuitively appealing but subjective at the same time, as they rely on informal human
judgement to accept or reject the null hypothesis.
• Q-Q plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ⁻¹(pk), x(k)), where the plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line (see the sketch after this list).
• P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where z(k) = (x(k) − μ̂)/σ̂. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
• Shapiro–Wilk test employs the fact that the line in the Q-Q plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
• Normal probability plot (rankit plot)
• Moment tests:
• D’Agostino’s K-squared test
• Jarque–Bera test
• Empirical distribution function tests:
• Kolmogorov–Smirnov test
• Lilliefors test
• Anderson–Darling test
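A minimal sketch of how some of these tests are run in practice, assuming SciPy and Matplotlib (neither is part of the original article); the simulated data set is an arbitrary example:

    # Minimal sketch: Shapiro-Wilk test, KS test, and a Q-Q plot.
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    data = rng.normal(loc=5.0, scale=2.0, size=500)

    print(stats.shapiro(data))     # large p-value: do not reject H0
    # KS test against a normal with fitted parameters (strictly, estimating
    # the parameters calls for the Lilliefors correction to the p-value):
    print(stats.kstest(data, 'norm', args=(data.mean(), data.std(ddof=1))))

    # Q-Q plot: points should lie close to a straight line under H0.
    stats.probplot(data, dist='norm', plot=plt)
    plt.show()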

Estimation of parameters
It is often the case that we don’t know the parameters of the normal distribution, but instead want to estimate them.
That is, having a sample (x1, …, xn) from a normal N(μ, σ2) population we would like to learn the approximate
values of parameters μ and σ2. The standard approach to this problem is the maximum likelihood method, which
requires maximization of the log-likelihood function:

ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ (xi − μ)².

Taking derivatives with respect to μ and σ² and solving the resulting system of first order conditions yields the maximum likelihood estimates:

μ̂ = x̄ = (1/n) Σ xi,   σ̂² = (1/n) Σ (xi − x̄)².
The estimator μ̂ is called the sample mean, since it is the arithmetic mean of all observations. The statistic x̄ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, μ̂ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

μ̂ ∼ N(μ, σ²/n).
The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix I⁻¹. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of μ̂ is proportional to 1/√n; that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.
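A minimal sketch of these maximum likelihood estimates on simulated data (assuming NumPy; the true parameters 10 and 4 are arbitrary):

    # Minimal sketch: MLE of mu and sigma^2 for a normal sample.
    import numpy as np

    rng = np.random.default_rng(2)
    sample = rng.normal(loc=10.0, scale=4.0, size=1000)

    mu_hat = sample.mean()                        # (1/n) * sum(x_i)
    sigma2_hat = np.mean((sample - mu_hat) ** 2)  # (1/n) * sum((x_i - mean)^2)
    print(mu_hat, sigma2_hat)

    # Standard error of mu_hat shrinks like 1/sqrt(n).
    print(np.sqrt(sigma2_hat / len(sample)))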
From the standpoint of the asymptotic theory, μ̂ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

√n (μ̂ − μ)  →d  N(0, σ²).
The estimator σ̂² is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of σ̂². This other estimator is denoted s², and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s² differs from σ̂² by having (n − 1) instead of n in the denominator (the so-called Bessel’s correction):

s² = (1/(n − 1)) Σ (xi − x̄)².
The difference between s² and σ̂² becomes negligibly small for large n’s. In finite samples, however, the motivation behind the use of s² is that it is an unbiased estimator of the underlying parameter σ², whereas σ̂² is biased. Also, by the Lehmann–Scheffé theorem the estimator s² is uniformly minimum variance unbiased (UMVU), which makes it the “best” estimator among all unbiased ones. However it can be shown that the biased estimator σ̂² is “better” than s² in terms of the mean squared error (MSE) criterion. In finite samples both s² and σ̂² have a scaled chi-squared distribution with (n − 1) degrees of freedom:

s² ∼ (σ²/(n − 1)) · χ²(n − 1),   σ̂² ∼ (σ²/n) · χ²(n − 1).

The first of these expressions shows that the variance of s² is equal to 2σ⁴/(n − 1), which is slightly greater than the σσ-element of the inverse Fisher information matrix, 2σ⁴/n. Thus, s² is not an efficient estimator for σ², and moreover, since s² is UMVU, we can conclude that the finite-sample efficient estimator for σ² does not exist.
Applying the asymptotic theory, both estimators s² and σ̂² are consistent, that is they converge in probability to σ² as the sample size n → ∞. The two estimators are also both asymptotically normal:

√n (s² − σ²)  →d  N(0, 2σ⁴),   √n (σ̂² − σ²)  →d  N(0, 2σ⁴).

In particular, both estimators are asymptotically efficient for σ².


By Cochran’s theorem, for the normal distribution the sample mean μ̂ and the sample variance s² are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between μ̂ and s can be employed to construct the so-called t-statistic:

t = (μ̂ − μ) / (s/√n) ∼ t(n − 1).

This quantity t has the Student’s t-distribution with (n − 1) degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic will allow us to construct the confidence interval for μ; similarly, inverting the χ² distribution of the statistic s² will give us the confidence interval for σ²:

μ ∈ [ μ̂ − t(n−1, 1−α/2) s/√n,  μ̂ + t(n−1, 1−α/2) s/√n ] ≈ μ̂ ± |z(α/2)| s/√n,
σ² ∈ [ (n−1)s² / χ²(n−1, 1−α/2),  (n−1)s² / χ²(n−1, α/2) ] ≈ s² ± |z(α/2)| √2 s²/√n,
where t(k, p) and χ²(k, p) are the pth quantiles of the t- and χ²-distributions respectively. These confidence intervals are of the level 1 − α, meaning that the true values μ and σ² fall outside of these intervals with probability α. In practice people usually take α = 5%, resulting in the 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of μ̂ and s². The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value α = 5% results in |z0.025| = 1.96.
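A minimal sketch of the exact confidence intervals above, assuming SciPy; the sample is simulated and α = 5%:

    # Minimal sketch: exact confidence intervals for mu and sigma^2.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.normal(loc=10.0, scale=4.0, size=30)

    n, alpha = len(x), 0.05
    xbar, s2 = x.mean(), x.var(ddof=1)            # sample mean, sample variance
    s = np.sqrt(s2)

    t = stats.t.ppf(1 - alpha / 2, df=n - 1)      # t-quantile
    ci_mu = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))

    chi_lo = stats.chi2.ppf(alpha / 2, df=n - 1)  # chi-square quantiles
    chi_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    ci_var = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)

    print(ci_mu, ci_var)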
Occurrence
The occurrence of normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal — the normal distribution being one of the simplest and most convenient to use, researchers are frequently tempted to assume that a certain quantity is distributed normally, without justifying such an assumption rigorously. In fact, the maturity of a scientific field can be judged by the prevalence of the normality assumption in its methods.

Exact normality
Certain quantities in physics are distributed normally, as was first
demonstrated by James Clerk Maxwell. Examples of such quantities
are:
• Velocities of the molecules in the ideal gas. More generally,
velocities of the particles in any system in thermodynamic
equilibrium will have normal distribution, due to the maximum
entropy principle.
• Probability density function of a ground state in a quantum harmonic
oscillator.
• The density of an electron cloud in 1s state. The ground state of a quantum harmonic
oscillator has the Gaussian distribution.

• The position of a particle which experiences diffusion. If initially the particle is located at a specific point (that is
its probability distribution is a dirac delta function), then after time t its location is described by a normal
distribution with variance t, which satisfies the diffusion equation  . If the initial location is
given by a certain density function g(x), then the density at time t is the convolution of g and the normal pdf.

Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the
outcome is produced by a large number of small effects acting additively and independently, its distribution will be
close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively),
or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where
infinitely divisible and decomposable distributions are involved, such as
• Binomial random variables, associated with binary response variables;
• Poisson random variables, associated with rare events;
• Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer
timescales due to the central limit theorem.
Assumed normality


“I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations.” — Pearson (1901)
There are statistical methods to empirically test that assumption; see the Normality tests section above.
• In biology:
• The logarithm of measures of size of living tissue (length, height, skin area, weight);[14]
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth;
presumably the thickness of tree bark also falls under this category;
• Certain physiological measurements, such as blood pressure of adult humans (after separation on male/female
subpopulations).
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and
stock market indices are assumed normal (these variables behave like compound interest, not like simple interest,
and so are multiplicative). Some mathematicians such as Benoît Mandelbrot argue that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
• Measurement errors in physical experiments are often assumed to be normally distributed. This assumption
allows for particularly simple practical rules for how to combine errors in measurements of different quantities.
However, whether this assumption is valid or not in practice is debatable. A famous remark of Lippmann says: 
“Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental
fact; and the experimenters, because they suppose it is a theorem of mathematics.” [15]
• In standardized testing, results can be made to have a normal distribution. This is done by either selecting the
number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into  “output” scores
by fitting them to the normal distribution. For example, the SAT’s traditional range of 200–800 is based on a
normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks (“percentiles” or “quantiles”),
normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical
procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs.
Bell curve grading assigns relative grades based on a normal distribution of scores.
Generating values from normal distribution


For computer simulations, especially in applications of the Monte Carlo method, it is often useful to generate values that have a normal distribution. All algorithms described here are concerned with generating the standard normal, since a N(μ, σ²) variate can be generated as X = μ + σZ, where Z is standard normal. The algorithms rely on the availability of a random number generator capable of producing random values distributed uniformly.

[Figure: The bean machine, a device invented by Sir Francis Galton, can be called the first generator of normal random variables. This machine consists of a vertical board with interleaved rows of pins. Small balls are dropped from the top and then bounce randomly left or right as they hit the pins. The balls are collected into bins at the bottom and settle down into a pattern resembling the Gaussian curve.]

• The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ⁻¹(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ⁻¹, which cannot be done analytically. Some approximate methods are described in Hart (1968) and in the erf article.
• A simple approximate approach that is easy to program is as follows: simply sum 12 uniform (0,1) deviates and subtract 6 — the resulting random variable will have approximately standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).[16]
• The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1]. Then the two random variables

X = √(−2 ln U) cos(2πV),   Y = √(−2 ln U) sin(2πV)

will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X² + Y² will have the chi-square distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2 ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
• The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin() and cos(). In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U² + V² is computed. If S is greater than or equal to one then the method starts over; otherwise the two quantities

X = U √(−2 ln S / S),   Y = V √(−2 ln S / S)

are returned. Again, X and Y here will be independent and standard normally distributed (a sketch of both this and the Box–Muller method appears after this list).
• The ratio method[17] starts with generating two independent uniform deviates U and V. The algorithm proceeds as follows:
• Compute X = √(8/e) (V − 0.5)/U;
• If X² ≤ 5 − 4e^(1/4) U then accept X and terminate the algorithm;
• If X² ≥ 4e^(−1.35)/U + 1.4 then reject X and start over from step 1;
• If X² ≤ −4 / ln U then accept X, otherwise start over the algorithm.
• The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the “core of the ziggurat” does a kind of rejection sampling using logarithms, exponentials and more uniform random numbers have to be employed.
• There is also some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.
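As promised above, a minimal sketch of the Box–Muller and Marsaglia polar methods, using only the Python standard library:

    # Minimal sketch of the Box-Muller and Marsaglia polar methods.
    import math
    import random

    def box_muller():
        """Return one standard normal deviate from two uniforms on (0, 1]."""
        u = 1.0 - random.random()        # shift (0,1) to (0,1] to avoid log(0)
        v = random.random()
        return math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * v)

    def marsaglia_polar():
        """Return one standard normal deviate; avoids sin/cos via rejection."""
        while True:
            u = random.uniform(-1.0, 1.0)
            v = random.uniform(-1.0, 1.0)
            s = u * u + v * v
            if 0.0 < s < 1.0:            # reject points outside the unit disk
                return u * math.sqrt(-2.0 * math.log(s) / s)

    sample = [box_muller() for _ in range(100_000)]
    mean = sum(sample) / len(sample)
    var = sum((z - mean) ** 2 for z in sample) / len(sample)
    print(mean, var)                     # should be close to 0 and 1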

Numerical approximations for the normal cdf


The standard normal cdf is widely used in scientific and statistical computing. The values Φ(x) may be approximated
very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and
continued fractions. Different approximations are used depending on the desired level of accuracy.
• Abramowitz & Stegun (1964) give the approximation for Φ(x) for x > 0 with the absolute error |ε(x)| < 7.5·10−8
(algorithm 26.2.17 [18]):where ϕ(x) is the standard normal pdf, and b0 = 0.2316419, b1 = 0.319381530, b2 =
−0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
• Hart (1968) lists almost a hundred rational function approximations for the erfc() function. His algorithms vary in the degree of complexity and the resulting precision, with maximum absolute precision of 24 digits. An algorithm by West (2009) combines Hart’s algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.
• Marsaglia (2004) suggested a simple algorithm[19] based on the Taylor series expansion for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time (for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10).
• The GNU Scientific Library calculates values of the standard normal cdf using Hart’s algorithms and
approximations with Chebyshev polynomials.
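A minimal sketch of the Abramowitz & Stegun approximation quoted above, using only the standard library; the extension to x < 0 uses the symmetry Φ(−x) = 1 − Φ(x):

    # Minimal sketch of Abramowitz & Stegun algorithm 26.2.17.
    import math

    B = (0.2316419, 0.319381530, -0.356563782,
         1.781477937, -1.821255978, 1.330274429)

    def phi(x):
        """Standard normal pdf."""
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def norm_cdf(x):
        """Phi(x) with absolute error below 7.5e-8 for x > 0."""
        if x < 0.0:
            return 1.0 - norm_cdf(-x)          # symmetry for negative x
        t = 1.0 / (1.0 + B[0] * x)
        poly = t * (B[1] + t * (B[2] + t * (B[3] + t * (B[4] + t * B[5]))))
        return 1.0 - phi(x) * poly

    print(norm_cdf(1.0))    # ~0.8413447
    print(norm_cdf(1.96))   # ~0.9750021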

History
Some authors[20] [21] attribute at least partially the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his “The Doctrine of Chances”[22] [23] the study of the coefficients in the binomial expansion of (a + b)ⁿ. De Moivre proved that the middle term in this expansion has the approximate magnitude of 2/√(2πn), and that “If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term diſtant from the middle by the Interval ℓ, has to the middle Term, is −2ℓℓ/n.” Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than the approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.[24]
In 1809 Gauss published the monograph “Theoria motus corporum coelestium in sectionibus conicis solem ambientium” where among other things he introduces and describes several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M′′, … to denote the measurements of some unknown quantity V, and sought the “most probable” estimator: the one which maximizes the probability φ(M−V) · φ(M′−V) · φ(M′′−V) · … of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss requires that his method should reduce to the well-known answer: the arithmetic mean of the measured values.[25] Starting from these principles, Gauss demonstrates that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:[26]

φ(Δ) = (h/√π) e^(−h²Δ²),

where h is “the measure of the precision of the observations”. Using this normal law as a generic model for errors in the experiments, Gauss formulates what is now known as the non-linear weighted least squares (NWLS) method.[27]

[Figure: Carl Friedrich Gauss invented the normal distribution in 1809 as a way to rationalize the method of least squares.]
Although Gauss was the first to suggest the normal distribution law, the merit of the contributions of Laplace cannot be underestimated.[28] It was Laplace who first posed the problem of aggregating several observations in 1774,[29] although his own solution led to the Laplacian distribution. It was Laplace who first calculated the value of the integral ∫ e^(−t²) dt = √π in 1782, providing the normalization constant for the normal distribution.[30] Finally, it was Laplace who in 1810 proved and presented to the Academy the fundamental central limit theorem, which emphasized the theoretical importance of the normal distribution.[31]

It is of interest to note that in 1809 an American mathematician Adrain published two derivations of the normal probability law, simultaneously and independently from Gauss.[32] His works remained unnoticed until 1871 when they were rediscovered by Abbe,[33] mainly because the scientific community was virtually non-existent in the United States at that time.

[Figure: Marquis de Laplace proved the central limit theorem in 1810, consolidating the importance of the normal distribution in statistics.]

In the middle of the 19th century Maxwell demonstrated that the normal distribution is not just a convenient mathematical tool, but may also occur in natural phenomena:[34] “The number of particles whose velocity, resolved in a certain direction, lies between x and x + dx is

N (1/(α√π)) e^(−x²/α²) dx.”
Since its introduction, the normal distribution has been known by many different names: the law of error, the law of facility of errors, Laplace’s second law, Gaussian law, etc. By the end of the 19th century some authors[35] had started to
occasionally use the name normal distribution, where the word “normal” is used as an adjective — the term was
derived from the fact that this distribution was seen as typical, common, normal. Around the turn of the 20th century
Pearson popularizes the term normal as a designation for this distribution.[36]


“Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal.’” — Pearson (1920)
Also, it was Pearson who first wrote the distribution in terms of the standard deviation σ as in modern notation. Soon after this, in 1915, Fisher added the location parameter to the formula for the normal distribution, expressing it in the way it is written nowadays:

df = (1/√(2σ²π)) e^(−(x − m)²/(2σ²)) dx.
The term “standard normal”, which denotes the normal distribution with zero mean and unit variance, came into general use around the 1950s, appearing in the popular textbooks by P.G. Hoel (1947) “Introduction to mathematical statistics” and A.M. Mood (1950) “Introduction to the theory of statistics”.[37]

See also
• Behrens–Fisher problem — the long-standing problem of testing whether two normal samples with different variances have the same means;
• Erdős–Kac theorem — on the occurrence of the normal distribution in number theory;
• Gaussian blur — convolution which uses the normal distribution as a kernel

Notes
[1] The designation  “bell curve” is ambiguous: there are many other distributions in probability theory which can be recognized as  “bell-shaped”:
the Cauchy distribution, Student’s t-distribution, generalized normal, logistic, etc.
[2] Gale Encyclopedia of Psychology — Normal Distribution (http:/ / findarticles. com/ p/ articles/ mi_g2699/ is_0002/ ai_2699000241)
[3] Halperin & et al. (1965, item 7)
[4] McPherson (1990) page 110
[5] Bernardo & Smith (2000)
[6] Patel & Read (1996, [2.1.8])
[7] Scott, Clayton; Robert Nowak (August 7, 2003). "The Q-function" (http:/ / cnx. org/ content/ m11537/ 1. 2/ ). Connexions. .
[8] Barak, Ohad (April 6, 2006). "Q function and error function" (http:/ / www. eng. tau. ac. il/ ~jo/ academic/ Q. pdf). Tel Aviv University. .
[9] Weisstein, Eric W., " Normal Distribution Function (http:/ / mathworld. wolfram. com/ NormalDistributionFunction. html)" from
MathWorld.
[10] Sanders, Mathijs A.. "Characteristic function of the univariate normal distribution" (http:/ / www. planetmathematics. com/ CharNormal.
pdf). . Retrieved 2009-03-06.
[11] http:/ / www. allisons. org/ ll/ MML/ KL/ Normal/
[12] Amari & Nagaoka (2000)
[13] Mathworld entry for Normal Product Distribution (http:/ / mathworld. wolfram. com/ NormalProductDistribution. html)
[14] Huxley (1932)
[15] Whittaker, E. T.; Robinson, G. (1967). The Calculus of Observations: A Treatise on Numerical Mathematics. New York: Dover. p. 179.
[16] Johnson et al. (1995, Equation (26.48))
[17] Kinderman & Monahan (1976)
[18] http:/ / www. math. sfu. ca/ ~cbm/ aands/ page_932. htm
[19] For example, this algorithm is given in the article Bc programming language.
[20] Johnson et al. (1994, page 85)
[21] Le Cam (2000, p. 74)
[22] De Moivre (1738)
[23] De Moivre first published his findings in 1733, in a pamphlet  “Approximatio ad Summam Terminorum Binomii (a + b)n in Seriem Expansi”
that was designated for private circulation only. But it was not until the year 1738 that he made his results publicly available. The original
pamphlet was reprinted several times, see for example Helen M. Walker (1985).
[24] Stigler (1986, p. 76)
[25] “It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations,
made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not
rigorously, yet very nearly at least, so that it is always most safe to adhere to it.” — Gauss (1809, section 177)
[26] Gauss (1809, section 177)
[27] Gauss (1809, section 179)
[28] “My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two
great astronomer mathematicians.” quote from Pearson (1905, p. 189)
[29] Laplace (1774, Problem III)
[30] Pearson (1905, p. 189)
[31] Stigler (1986, p. 144)
[32] Stigler (1978, p. 243)
[33] Stigler (1978, p. 244)
[34] Maxwell (1860), p. 23
[35] Such use is encountered in the works of Peirce, Galton and Lexis approximately around 1875.
[36] Kruskal & Stigler (1997)
[37] "Earliest uses… (entry STANDARD NORMAL CURVE)" (http:/ / jeff560. tripod. com/ s. html). .

Literature
• Aldrich, John; Miller, Jeff. "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/
stat.html).
• Aldrich, John; Miller, Jeff. "Earliest known uses of some of the words of mathematics" (http://jeff560.tripod.
com/mathword.html). In particular, the entries for “bell-shaped and bell curve” (http://jeff560.tripod.com/b.
html), “normal (distribution)” (http://jeff560.tripod.com/n.html), “Gaussian” (http://jeff560.tripod.com/g.
html), and “Error, law of error, theory of errors, etc.” (http://jeff560.tripod.com/e.html).
• Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of information geometry. Oxford University Press.
ISBN 0-8218-0531-2.
• Bernardo, J. M.; Smith, A.F.M. (2000). Bayesian Theory. Wiley. ISBN 0-471-49464-X.
• de Moivre, Abraham (1738). The Doctrine of Chances. ISBN 0821821032.
• Gavss, Carolo Friderico (1809) (in Latin). Theoria motvs corporvm coelestivm in sectionibvs conicis Solem
ambientivm [Theory of the motion of the heavenly bodies moving about the Sun in conic sections]. English
translation (http://books.google.com/books?id=1TIAAAAAQAAJ).
• Gould, Stephen Jay (1981). The mismeasure of man (first ed.). W.W. Norton. ISBN 0-393-01489-4.
• Halperin, Max; Hartley, H. O.; Hoel, P. G. (1965). "Recommended standards for statistical symbols and notation.
COPSS committee on symbols and notation" (http://jstor.org/stable/2681417). The American Statistician 19
(3): 12–14. doi:10.2307/2681417.
• Hart, John F.; et al (1968). Computer approximations. New York: John Wiley & Sons, Inc. ISBN 0882756427.
• Herrnstein, Richard J.; Murray, Charles (1994). The bell curve: intelligence and class structure in American life. Free Press. ISBN 0-02-914673-9.
• Huxley, Julian S. (1932). Problems of relative growth. London. ISBN 0486611140. OCLC 476909537.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous univariate distributions, Volume 1. Wiley.
ISBN 0-471-58495-9.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1995). Continuous univariate distributions, Volume 2. Wiley. ISBN 0-471-58494-0.
• Kruskal, William H.; Stigler, Stephen M. (1997). Normative terminology: ‘normal’ in statistics and elsewhere.
Statistics and public policy, edited by Bruce D. Spencer. Oxford University Press. ISBN 0-19-852341-6.
• la Place, M. de (1774). "Mémoire sur la probabilité des causes par les évènemens". Mémoires de Mathématique et
de Physique, Presentés à l’Académie Royale des Sciences, par divers Savans & lûs dans ses Assemblées, Tome
Sixième: 621–656. Translated by S.M.Stigler in Statistical Science 1 (3), 1986: JSTOR 2245476.
• Laplace, Pierre-Simon (1812). Analytical theory of probabilities.
• McPherson, G. (1990). Statistics in scientific investigation: its basis, application and interpretation.
Springer-Verlag. ISBN 0-387-97137-8.
• Marsaglia, George; Tsang, Wai Wan (2000). "The ziggurat method for generating random variables" (http://
www.jstatsoft.org/v05/i08/paper). Journal of Statistical Software 5 (8).
• Marsaglia, George (2004). "Evaluating the normal distribution" (http://www.jstatsoft.org/v11/i05/paper).
Journal of Statistical Software 11 (4).
• Maxwell, James Clerk (1860). "V. Illustrations of the dynamical theory of gases. — Part I: On the motions and
collisions of perfectly elastic spheres". Philosophical Magazine, series 4 19 (124): 19–32.
doi:10.1080/14786446008642818 (inactive 2010-09-14).
• Patel, Jagdish K.; Read, Campbell B. (1996). Handbook of the normal distribution. ISBN 0824715411.
• Pearson, Karl (1905). "‘Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson’. A
rejoinder". Biometrika 4: 169–212. JSTOR 2331536.
• Pearson, Karl (1920). "Notes on the history of correlation". Biometrika 13 (1): 25–45.
doi:10.1093/biomet/13.1.25. JSTOR 2331722.
• Stigler, Stephen M. (1978). "Mathematical statistics in the early states". The Annals of Statistics 6 (2): 239–265.
doi:10.1214/aos/1176344123. JSTOR 2958876.
• Stigler, Stephen M. (1982). "A modest proposal: a new standard for the normal". The American Statistician 36
(2). JSTOR 2684031.
• Stigler, Stephen M. (1986). The history of statistics: the measurement of uncertainty before 1900. Harvard
University Press. ISBN 0-674-40340-1.
• Stigler, Stephen M. (1999). Statistics on the table. Harvard University Press. ISBN 0674836014.
• Walker, Helen M. (editor) (1985) "De Moivre on the law of normal probability" in: Smith, David Eugene (1985),
A Source Book in Mathematics, Dover. ISBN 0486646904 pages 566–575. (online pdf) (http://www.york.ac.
uk/depts/maths/histstat/demoivre.pdf)
• Weisstein, Eric W. "Normal distribution" (http://mathworld.wolfram.com/NormalDistribution.html).
MathWorld.
• West, Graeme (2009). "Better approximations to cumulative normal functions" (http://www.wilmott.com/pdfs/
090721_west.pdf). Wilmott Magazine: 70–76.
• Zelen, Marvin; Severo, Norman C. (1964). Probability functions (chapter 26) (http://www.math.sfu.ca/~cbm/
aands/page_931.htm). Handbook of mathematical functions with formulas, graphs, and mathematical tables, by
Abramowitz and Stegun: National Bureau of Standards. New York: Dover. ISBN 0-486-61272-4.
Standard deviation
In probability theory and statistics, the standard deviation of a statistical population, a data set, or a probability distribution is the square root of its variance. Standard deviation is a widely used measure of the variability or dispersion, being algebraically more tractable though practically less robust than the expected deviation or average absolute deviation.

It shows how much variation there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.

[Figure: A plot of a normal distribution (or bell curve). Each colored band has a width of one standard deviation.]

For example, the average height for adult men in the United States is about 70 inches (178 cm), with a standard deviation of around 3 in (8 cm). This means that most men (about 68 percent, assuming a normal distribution) have a height within 3 in (8 cm) of the mean (67–73 in / 170–185 cm), one standard deviation, whereas almost all men (about 95%) have a height within 6 in (15 cm) of the mean (64–76 in / 163–193 cm), two standard deviations. If the standard deviation were zero, then all men would be exactly 70 in (178 cm) tall. If the standard deviation were 20 in (51 cm), then men would have much more variable heights, with a typical range of about 50–90 in (127–229 cm). Three standard deviations account for 99.7% of the sample population being studied, assuming the distribution is normal (bell-shaped).

[Figure: Cumulative probability of a normal distribution with expected value 0 and standard deviation 1.]
[Figure: A data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20.]
In addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. The reported margin of error is typically about twice the standard deviation, the radius of a 95% confidence interval. In science, researchers commonly report the standard deviation of experimental data, and only effects that fall far outside the range of standard deviation are considered statistically significant; normal random error or variation in the measurements is in this way distinguished from causal variation. Standard deviation is also important in finance, where the standard deviation on the rate of return on an investment is a measure of the volatility of the investment.

[Figure: Example of two sample populations with the same mean and different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.]

The term standard deviation was first used[1] in writing by Karl Pearson[2] in 1894, following his use of it in lectures.
This was as a replacement for earlier alternative names for the same idea: for example, Gauss used mean error.[3] A
useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data. Note,
however, that for measurements with percentage as unit, the standard deviation will have percentage points as unit.
When only a sample of data from a population is available, the population standard deviation can be estimated by a
modified quantity called the sample standard deviation, explained below.

Basic examples
Consider a population consisting of the following eight values:

These eight data points have the mean (average) of 5:

To calculate the population standard deviation, first compute the difference of each data point from the mean, and
square the result of each:

Next compute the average of these values, and take the square root:

This quantity is the population standard deviation, it is equal to the square root of the variance. The formula is
valid only if the eight values we began with form the complete population. If they instead were a random sample,
drawn from some larger,  “parent” population, then we should have used 7 instead of 8 in the denominator of the last
formula, and then the quantity thus obtained would have been called the sample standard deviation. See the section
Estimation below for more details.
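The same computation as a minimal sketch in Python (standard library only):

    # Minimal sketch: the worked example above, in code.
    import math

    values = [2, 4, 4, 4, 5, 5, 7, 9]
    n = len(values)
    mean = sum(values) / n                              # 5.0

    squared_devs = [(x - mean) ** 2 for x in values]    # 9, 1, 1, 1, 0, 0, 4, 16
    population_sd = math.sqrt(sum(squared_devs) / n)        # divide by N:     2.0
    sample_sd = math.sqrt(sum(squared_devs) / (n - 1))      # divide by N - 1: ~2.14

    print(mean, population_sd, sample_sd)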
Definition of population values


Let X be a random variable with mean value μ:

E[X] = μ.

Here the operator E denotes the average or expected value of X. Then the standard deviation of X is the quantity

σ = √( E[(X − μ)²] ).

That is, the standard deviation σ (sigma) is the square root of the average value of (X − μ)².
The standard deviation of a (univariate) probability distribution is the same as that of a random variable having that
distribution. Not all random variables have a standard deviation, since these expected values need not exist. For
example, the standard deviation of a random variable which follows a Cauchy distribution is undefined because its
expected value μ is undefined.

Discrete random variable


In the case where X takes random values from a finite data set x1, x2, …, xN, with each value having the same probability, the standard deviation is

σ = √( ((x1 − μ)² + (x2 − μ)² + … + (xN − μ)²) / N )

or, using summation notation,

σ = √( (1/N) Σ_{i=1}^{N} (xi − μ)² ).
Continuous random variable


The standard deviation of a continuous real-valued random variable X with probability density function p(x) is

σ = √( ∫ (x − μ)² p(x) dx ),

where

μ = ∫ x p(x) dx,

and where the integrals are definite integrals taken for x ranging over the sample space of X.
In the case of a parametric family of distributions, the standard deviation can be expressed in terms of the parameters. For example, in the case of the log-normal distribution with parameters μ and σ², the standard deviation is [(exp(σ²) − 1) exp(2μ + σ²)]^(1/2).
Estimation
One can find the standard deviation of an entire population in cases (such as standardized testing) where every
member of a population is sampled. In cases where that cannot be done, the standard deviation σ is estimated by
examining a random sample taken from the population. Some estimators are given below:

With standard deviation of the sample


An estimator for σ sometimes used is the standard deviation of the sample, denoted by sN and defined as follows:

sN = √( (1/N) Σ_{i=1}^{N} (xi − x̄)² ).
This estimator has a uniformly smaller mean squared error than the sample standard deviation (see below), and is the
maximum-likelihood estimate when the population is normally distributed. But this estimator, when applied to a
small or moderately sized sample, tends to be too low: it is a biased estimator.
The standard deviation of the sample is the same as the population standard deviation of a discrete random variable
that can assume precisely the values from the data set, where the probability for each value is proportional to its
multiplicity in the data set.

With sample standard deviation


The most commonly used estimator for σ is an adjusted version, the sample standard deviation, denoted by s and defined as follows:

s = √( (1/(N − 1)) Σ_{i=1}^{N} (xi − x̄)² ),

where x1, x2, …, xN are the observed values of the sample items and x̄ is the mean value of these observations.
This correction (the use of N − 1 instead of N) is known as Bessel's correction. The reason for this correction is that s² is an unbiased estimator for the variance σ² of the underlying population, if that variance exists and the sample values are drawn independently with replacement. However, s is not an unbiased estimator for the standard deviation σ; it tends to underestimate the population standard deviation.
The term standard deviation of the sample is used for the uncorrected estimator (using N) whilst the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals (x1 − x̄, …, xN − x̄).

Other estimators
Although an unbiased estimator for σ is known when the random variable is normally distributed, the formula is
complicated and amounts to a minor correction. Moreover, unbiasedness (in this sense of the word) is not always
desirable.

Identities and mathematical properties


The standard deviation is invariant under changes in location, and scales directly with the scale of the random variable. Thus, for a constant c and random variables X and Y:

σ(X + c) = σ(X),   σ(cX) = |c| σ(X).

The standard deviation of the sum of two random variables can be related to their individual standard deviations and the covariance between them:

σ(X + Y) = √( var(X) + var(Y) + 2 cov(X, Y) ),
where var and cov stand for variance and covariance, respectively.


The calculation of the sum of squared deviations can be related to moments calculated directly from the data. In
general, we have

$$\sigma^2(X) = \operatorname{E}[X^2] - \left(\operatorname{E}[X]\right)^2.$$

For a finite population with equal probabilities on all points, we have

$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \overline{x})^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \overline{x}^{\,2}.$$

Thus, the standard deviation is equal to the square root of (the average of the squares less the square of the average).
See computational formula for the variance for a proof of this fact, and for an analogous result for the sample
standard deviation.
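
The identity is easy to verify numerically; a minimal sketch with a small made-up population:

data = [6, 6, 8, 8]
N = len(data)
mean = sum(data) / N

# Direct definition: the average of the squared deviations from the mean.
var_direct = sum((x - mean) ** 2 for x in data) / N

# Shortcut: the average of the squares less the square of the average.
var_shortcut = sum(x * x for x in data) / N - mean ** 2

print(var_direct, var_shortcut)  # both print 1.0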

Interpretation and application


A large standard deviation indicates that the data points are far from the mean and a small standard deviation
indicates that they are clustered closely around the mean.
For example, each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7. Their
standard deviations are 7, 5, and 1, respectively. The third population has a much smaller standard deviation than the
other two because its values are all close to 7. In a loose sense, the standard deviation tells us how far from the mean
the data points tend to be. It will have the same units as the data points themselves. If, for instance, the data set {0, 6,
8, 14} represents the ages of a population of four siblings in years, the standard deviation is 5 years.
As another example, the population {1000, 1006, 1008, 1014} may represent the distances traveled by four athletes,
measured in meters. It has a mean of 1007 meters, and a standard deviation of 5 meters.
Standard deviation may serve as a measure of uncertainty. In physical science, for example, the reported standard
deviation of a group of repeated measurements should give the precision of those measurements. When deciding
whether measurements agree with a theoretical prediction the standard deviation of those measurements is of crucial
importance: if the mean of the measurements is too far away from the prediction (with the distance measured in
standard deviations), then the theory being tested probably needs to be revised. This makes sense since such
measurements fall outside the range of values that could reasonably be expected to occur if the prediction were
correct and the standard deviation appropriately quantified. See prediction interval.

Application examples
The practical value of understanding the standard deviation of a set of values is in appreciating how much variation
there is from the "average" (mean).

Climate
As a simple example, consider the average daily maximum temperatures for two cities, one inland and one on the
coast. It is helpful to understand that the range of daily maximum temperatures for cities near the coast is smaller
than for cities inland. Thus, while these two cities may each have the same average maximum temperature, the
standard deviation of the daily maximum temperature for the coastal city will be less than that of the inland city as,
on any particular day, the actual maximum temperature is more likely to be farther from the average maximum
temperature for the inland city than for the coastal one.
Standard deviation 81

Sports
Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at
some things and poorly at others. Chances are, the teams that lead in the standings will not show such disparity, but
will perform well in most categories. The lower the standard deviation of their ratings in each category, the more
balanced and consistent they will tend to be, whereas teams with a higher standard deviation will be more
unpredictable. For example, a team that is consistently bad in most categories will have a low standard deviation. A
team that is consistently good in most categories will also have a low standard deviation. However, a team with a
high standard deviation might be the type of team that scores a lot (strong offense) but also concedes a lot (weak
defense), or, vice versa, that might have a poor offense but compensates by being difficult to score on.
Trying to predict which teams, on any given day, will win, may include looking at the standard deviations of the
various team "stats" ratings, in which anomalies can match strengths vs. weaknesses to attempt to understand what
factors may prevail as stronger indicators of eventual scoring outcomes.
In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is more consistent
than a driver with a higher standard deviation. This information can be used to help understand where opportunities
might be found to reduce lap times.

Finance
In finance, standard deviation is a representation of the risk associated with a given security (stocks, bonds, property,
etc.), or the risk of a portfolio of securities (actively managed mutual funds, index mutual funds, or ETFs). Risk is an
important factor in determining how to efficiently manage a portfolio of investments because it determines the
variation in returns on the asset and/or portfolio and gives investors a mathematical basis for investment decisions
(known as mean-variance optimization). The overall concept of risk is that as it increases, the expected return on the
asset will increase as a result of the risk premium earned – in other words, investors should expect a higher return on
an investment when said investment carries a higher level of risk, or uncertainty of that return. When evaluating
investments, investors should estimate both the expected return and the uncertainty of future returns. Standard
deviation provides a quantified estimate of the uncertainty of future returns.
For example, let's assume an investor had to choose between two stocks. Stock A over the last 20 years had an
average return of 10%, with a standard deviation of 20 percentage points (pp) and Stock B, over the same period, had
average returns of 12%, but a higher standard deviation of 30 pp. On the basis of risk and return, an investor may
decide that Stock A is the safer choice, because Stock B's additional 2 percentage points of return is not worth the
additional 10 pp standard deviation (greater risk or uncertainty of the expected return). Stock B is likely to fall short
of the initial investment (but also to exceed the initial investment) more often than Stock A under the same
circumstances, and is estimated to return only 2% more on average. In this example, Stock A is expected to earn
about 10%, plus or minus 20 pp (a range of 30% to −10%), in about two-thirds of the future years. When considering more extreme
possible returns or outcomes in future, an investor should expect results of up to 10% plus or minus 60 pp, or a range
from 70% to (−)50%, which includes outcomes for three standard deviations from the average return (about 99.7%
of probable returns).
Calculating the average return (or arithmetic mean) of a security over a given period will generate an expected return
on the asset. For each period, subtracting the expected return from the actual return gives that period's deviation from
the mean. Squaring each deviation shows the effect of that period's result on the overall risk of the asset. Taking the
average of the squared deviations gives the variance, a measure of the overall units of risk associated with the asset.
Taking the square root of this variance gives the standard deviation of the investment tool in question.
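
The procedure just described can be sketched in a few lines of Python (the yearly returns below are invented purely for illustration):

import math

returns = [0.12, -0.05, 0.20, 0.08, 0.15]  # hypothetical yearly returns

# Expected return: the arithmetic mean of the observed returns.
expected = sum(returns) / len(returns)

# Average of the squared deviations from the expected return: the variance.
variance = sum((r - expected) ** 2 for r in returns) / len(returns)

# Its square root is the standard deviation, the quantified risk estimate.
risk = math.sqrt(variance)

print(expected)  # 0.10 (a 10% expected return)
print(risk)      # about 0.085 (8.5 percentage points of risk)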
Population standard deviation is used to set the width of Bollinger Bands, a widely adopted technical analysis tool.
For example, the upper Bollinger Band is given as x̄ + nσx. The most commonly used value for n is 2; there is about a
5% chance of going outside the bands, assuming the normal distribution is right.

Geometric interpretation
To gain some geometric insights, we will start with a population of three values, x1, x2, x3. This defines a point P =
(x1, x2, x3) in R³. Consider the line L = {(r, r, r) : r ∈ R}. This is the "main diagonal" going through the origin. If our
three given values were all equal, then the standard deviation would be zero and P would lie on L. So it is not
unreasonable to assume that the standard deviation is related to the distance of P to L. And that is indeed the case. To
move orthogonally from L to the point P, one begins at the point:

$$M = (\overline{x}, \overline{x}, \overline{x}),$$

whose coordinates are the mean of the values we started out with. A little algebra shows that the distance between P
and M (which is the same as the orthogonal distance between P and the line L) is equal to the standard deviation of
the vector x1, x2, x3, multiplied by the square root of the number of dimensions of the vector (3 in this case).

Chebyshev's inequality
An observation is rarely more than a few standard deviations away from the mean. Chebyshev's inequality ensures
that, for all distributions for which the standard deviation is defined, the fraction of the data within k standard
deviations of the mean is at least 1 − 1/k². The following table gives some example values of the minimum population
within a number of standard deviations.

Min. population    Distance from mean
50%                √2
75%                2
89%                3
94%                4
96%                5
97%                6

[4]
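
The guarantee can be checked empirically; a minimal sketch, using an exponential distribution as an arbitrary test case:

import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible
sample = [random.expovariate(1.0) for _ in range(100000)]

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)

for k in (math.sqrt(2), 2, 3):
    within = sum(1 for x in sample if abs(x - mean) < k * sd) / n
    bound = 1 - 1 / k ** 2  # Chebyshev's guaranteed minimum fraction
    print(round(k, 3), round(within, 3), round(bound, 3))
# the observed fraction always meets or exceeds the bound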

Rules for normally distributed data



The central limit theorem says that the distribution of a sum of many independent, identically distributed random
variables tends towards the famous bell-shaped normal distribution with a probability density function of

$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

[Figure: Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for
68.27% of the set, while two standard deviations from the mean (medium and dark blue) account for 95.45%; three
standard deviations (light, medium, and dark blue) account for 99.73%; and four standard deviations account for
99.994%. The two points of the curve which are one standard deviation from the mean are also the inflection points.]

where μ is the arithmetic mean of the sample. The standard deviation therefore is simply a scaling variable that
adjusts how broad the curve will be, though it also appears in the normalizing constant to keep the distribution
normalized for different widths.
If a data distribution is approximately normal then the proportion of data values within z standard deviations of the
mean is given by erf(z/√2), where erf is the error function. If a data distribution is approximately normal then
about 68% of the data values are within 1 standard deviation of the mean (mathematically, μ ± σ, where μ is the
arithmetic mean), about 95% are within two standard deviations (μ ± 2σ), and about 99.7% lie within 3 standard
deviations (μ ± 3σ). This is known as the 68-95-99.7 rule, or the empirical rule.
For various values of z, the percentage of values expected to lie in and outside the symmetric confidence interval
CI = (−zσ, zσ) is as follows:

zσ         Percentage within CI    Percentage outside CI    Ratio outside CI
1σ         68.2689492%             31.7310508%              1 / 3.1514871
1.645σ     90%                     10%                      1 / 10
1.960σ     95%                     5%                       1 / 20
2σ         95.4499736%             4.5500264%               1 / 21.977894
2.576σ     99%                     1%                       1 / 100
3σ         99.7300204%             0.2699796%               1 / 370.398
3.2906σ    99.9%                   0.1%                     1 / 1000
4σ         99.993666%              0.006334%                1 / 15,788
5σ         99.9999426697%          0.0000573303%            1 / 1,744,278
6σ         99.9999998027%          0.0000001973%            1 / 506,800,000
7σ         99.9999999997440%       0.0000000002560%         1 / 390,600,000,000
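
These table entries follow directly from the error function and can be reproduced with Python's math.erf:

import math

def within_ci(z):
    # fraction of a normal population lying within z standard deviations
    return math.erf(z / math.sqrt(2))

for z in (1, 1.645, 1.960, 2, 2.576, 3):
    inside = within_ci(z)
    print(z, 100 * inside, 100 * (1 - inside))
# e.g. z = 1 gives about 68.269% within and 31.731% outside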

Relationship between standard deviation and mean


The mean and the standard deviation of a set of data are usually reported together. In a certain sense, the standard
deviation is a "natural" measure of statistical dispersion if the center of the data is measured about the mean. This is
because the standard deviation from the mean is smaller than from any other point. The precise statement is the
following: suppose x1, ..., xn are real numbers and define the function:

$$\sigma(r) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - r)^2}.$$

Using calculus or by completing the square, it is possible to show that σ(r) has a unique minimum at the mean:

$$r = \overline{x}.$$

The coefficient of variation of a sample is the ratio of the standard deviation to the mean. It is a dimensionless
number that can be used to compare the amount of variation between populations with means that are close together.
The reason is that if you compare populations with the same standard deviations but different means, then the
coefficient of variation will be bigger for the population with the smaller mean. Thus, in comparing the variability of
data, the coefficient of variation should be used with care, and is better replaced with another method.
Often we want some information about the accuracy of the mean we obtained. We can obtain this by determining the
standard deviation of the sample mean. The standard deviation of the mean is related to the standard deviation of
the distribution by:

$$\sigma_{\text{mean}} = \frac{\sigma}{\sqrt{N}},$$

where N is the number of observations in the sample used to estimate the mean. This can easily be proven (using the
independence of the observations) with:

$$\operatorname{var}(\overline{X}) = \operatorname{var}\!\left(\frac{1}{N}\sum_{i=1}^{N}X_i\right) = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{var}(X_i) = \frac{\operatorname{var}(X)}{N},$$

hence

$$\sigma_{\text{mean}}^2 = \frac{\sigma^2}{N},$$

resulting in:

$$\sigma_{\text{mean}} = \frac{\sigma}{\sqrt{N}}.$$

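A small simulation illustrating the σ/√N relationship (assumed setup: standard normal draws, so σ = 1 and the predicted standard deviation of the mean is 1/√N):

import math
import random

random.seed(0)
N = 100        # observations per sample
trials = 5000  # number of independent sample means

# Generate many sample means of N standard normal draws each.
means = [sum(random.gauss(0, 1) for _ in range(N)) / N for _ in range(trials)]

grand_mean = sum(means) / trials
sd_of_means = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)

print(sd_of_means)       # empirically close to ...
print(1 / math.sqrt(N))  # ... the predicted sigma / sqrt(N) = 0.1
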
Worked example
The standard deviation of a discrete random variable is the root-mean-square (RMS) deviation of its values from the
mean.
If the random variable X takes on N values x1, …, xN (which are real numbers) with equal probability, then its
standard deviation σ can be calculated as follows:
1. Find the mean, , of the values.
2. For each value calculate its deviation from the mean.
3. Calculate the squares of these deviations.
4. Find the mean of the squared deviations. This quantity is the variance σ².
5. Take the square root of the variance.

This calculation is described by the following formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \overline{x})^2},$$

where x̄ is the arithmetic mean of the values xi, defined as:

$$\overline{x} = \frac{x_1 + x_2 + \cdots + x_N}{N}.$$

If not all values have equal probability, but the probability of value xi equals pi, the standard deviation can be
computed by:

$$\sigma = \sqrt{\sum_{i=1}^{N} p_i\,(x_i - \mu)^2},$$

where

$$\mu = \sum_{i=1}^{N} p_i\,x_i.$$

Suppose we wished to find the standard deviation of the distribution placing probabilities 1⁄4, 1⁄2, and 1⁄4 on the points
in the sample space 3, 7, and 19.

Step 1: find the probability-weighted mean,

μ = (1/4)(3) + (1/2)(7) + (1/4)(19) = 0.75 + 3.5 + 4.75 = 9.

Step 2: find the deviation of each value in the sample space from the mean,

3 − 9 = −6,   7 − 9 = −2,   19 − 9 = 10.

Step 3: square each of the deviations, which amplifies large deviations and makes negative values positive,

(−6)² = 36,   (−2)² = 4,   10² = 100.

Step 4: find the probability-weighted mean of the squared deviations,

σ² = (1/4)(36) + (1/2)(4) + (1/4)(100) = 9 + 2 + 25 = 36.

Step 5: take the positive square root of the quotient (converting squared units back to regular units),

σ = √36 = 6.

So, the standard deviation of the set is 6. This example also shows that, in general, the standard deviation is different
from the mean absolute deviation (which is 5 in this example).
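
The five steps translate directly into code; a minimal sketch applying them to the same distribution:

import math

values = [3, 7, 19]
probs = [0.25, 0.5, 0.25]

# Step 1: the probability-weighted mean.
mu = sum(p * x for x, p in zip(values, probs))

# Steps 2-4: the probability-weighted mean of the squared deviations.
variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

# Step 5: the positive square root of the variance.
sigma = math.sqrt(variance)

print(mu, variance, sigma)  # 9.0 36.0 6.0, matching the worked example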

Rapid calculation methods


The following two formulas can represent a running (continuous) standard deviation. A set of three power sums s0,
s1, s2 are each computed over a set of N values of x, denoted as xk:

$$s_j = \sum_{k=1}^{N} x_k^{\,j}, \qquad j = 0, 1, 2.$$

Note that s0 raises x to the zero power, and since x⁰ is always 1, s0 evaluates to N.
Given the results of these three running summations, the values s0, s1, s2 can be used at any time to compute the
current value of the running standard deviation. This definition for sj can represent the two different phases
(summation computation sj, and σ calculation):

$$\sigma = \frac{\sqrt{s_0 s_2 - s_1^{\,2}}}{s_0}.$$

Similarly for sample standard deviation,

$$s = \sqrt{\frac{s_0 s_2 - s_1^{\,2}}{s_0\,(s_0 - 1)}}.$$

In a computer implementation, as the three sj sums become large, we need to consider round-off error, arithmetic
overflow, and arithmetic underflow. The method below calculates the running sums method with reduced rounding
errors:

$$A_1 = x_1, \qquad A_k = A_{k-1} + \frac{x_k - A_{k-1}}{k},$$

where A is the mean value, and

$$Q_1 = 0, \qquad Q_k = Q_{k-1} + \frac{(k-1)(x_k - A_{k-1})^2}{k} = Q_{k-1} + (x_k - A_{k-1})(x_k - A_k).$$

Sample variance:

$$s_n^2 = \frac{Q_n}{n - 1}.$$

Standard variance:

$$\sigma_n^2 = \frac{Q_n}{n}.$$

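Both phases can be sketched in a few lines of Python (the data values are invented; variable names follow the text):

import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values

# Phase 1: accumulate the three power sums; sigma can be read off at any time.
s0 = s1 = s2 = 0.0
for x in data:
    s0 += 1.0    # x ** 0
    s1 += x      # x ** 1
    s2 += x * x  # x ** 2
print(math.sqrt(s0 * s2 - s1 * s1) / s0)  # running population sigma: 2.0

# Reduced-rounding variant: update the running mean A and sum of squares Q.
A = Q = 0.0
for k, x in enumerate(data, start=1):
    A_prev = A
    A = A_prev + (x - A_prev) / k
    Q = Q + (x - A_prev) * (x - A)
print(math.sqrt(Q / len(data)))        # standard (population) deviation: 2.0
print(math.sqrt(Q / (len(data) - 1)))  # sample standard deviation: about 2.14
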
Weighted calculation
When the values xi are weighted with unequal weights wi, the power sums s0, s1, s2 are each computed as:

$$s_j = \sum_{k=1}^{N} w_k\, x_k^{\,j}, \qquad j = 0, 1, 2,$$

and the standard deviation equations remain unchanged. Note that s0 is now the sum of the weights and not the
number of samples N.
The incremental method with reduced rounding errors can also be applied, with some additional complexity.
A running sum of weights must be computed:

$$W_1 = w_1, \qquad W_k = W_{k-1} + w_k,$$

and places where 1/k is used above must be replaced by wk/Wk:

$$A_1 = x_1, \qquad A_k = A_{k-1} + \frac{w_k}{W_k}\,(x_k - A_{k-1}),$$

$$Q_1 = 0, \qquad Q_k = Q_{k-1} + \frac{w_k W_{k-1}}{W_k}\,(x_k - A_{k-1})^2 = Q_{k-1} + w_k\,(x_k - A_{k-1})(x_k - A_k).$$


In the final division,

$$\sigma_n^2 = \frac{Q_n}{W_n},$$

and

$$s_n^2 = \frac{Q_n}{W_n}\cdot\frac{n'}{n' - 1},$$

where n is the total number of elements, and n′ is the number of elements with non-zero weights. The above formulas
become equal to the simpler formulas given above if weights are taken as equal to 1.

Combining standard deviations

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:

$$N_{X \cup Y} = N_X + N_Y - N_{X \cap Y}.$$

Standard deviations of non-overlapping (X ∩ Y = ∅) sub-populations can be aggregated as follows if the size (actual
or relative to one another) and means of each are known:

$$\mu_{X \cup Y} = \frac{N_X\,\mu_X + N_Y\,\mu_Y}{N_X + N_Y}, \qquad \sigma_{X \cup Y} = \sqrt{\frac{N_X\,\sigma_X^2 + N_Y\,\sigma_Y^2}{N_X + N_Y} + \frac{N_X N_Y\,(\mu_X - \mu_Y)^2}{(N_X + N_Y)^2}}.$$

For example, suppose it is known that the average American man has a mean height of 70 inches with a standard
deviation of 3 inches and that the average American woman has a mean height of 65 inches with a standard deviation
of 2 inches. Also assume that the number of men, N, is equal to the number of women. Then the mean and standard
deviation of heights of American adults could be calculated as:

$$\mu = \frac{70 + 65}{2} = 67.5 \text{ inches}, \qquad \sigma = \sqrt{\frac{3^2 + 2^2}{2} + \frac{(70 - 65)^2}{4}} = \sqrt{12.75} \approx 3.57 \text{ inches}.$$

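The height example can be reproduced with a small helper implementing the aggregation formula above (the group size of 1000 is arbitrary, since only the relative sizes matter):

import math

def combine(nx, mx, sx, ny, my, sy):
    # Combine two non-overlapping sub-populations given their sizes,
    # means and population standard deviations.
    n = nx + ny
    mean = (nx * mx + ny * my) / n
    var = (nx * sx ** 2 + ny * sy ** 2) / n + nx * ny * (mx - my) ** 2 / n ** 2
    return mean, math.sqrt(var)

# Equal numbers of men (mean 70 in, sd 3 in) and women (mean 65 in, sd 2 in):
print(combine(1000, 70, 3, 1000, 65, 2))  # (67.5, about 3.57)
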
For the more general case of M non-overlapping data sets X1 through XM:

$$\sigma_X = \sqrt{\frac{\sum_{i=1}^{M} N_{X_i}\left(\sigma_{X_i}^2 + \mu_{X_i}^2\right)}{\sum_{i=1}^{M} N_{X_i}} - \mu_X^2},$$

where

$$\mu_X = \frac{\sum_{i=1}^{M} N_{X_i}\,\mu_{X_i}}{\sum_{i=1}^{M} N_{X_i}}.$$

If the size (actual or relative to one another), mean, and standard deviation of two overlapping populations are
known for the populations as well as their intersection, then the standard deviation of the overall population can still
be calculated as follows:

$$\sigma_{X \cup Y} = \sqrt{\frac{N_X(\sigma_X^2 + \mu_X^2) + N_Y(\sigma_Y^2 + \mu_Y^2) - N_{X \cap Y}(\sigma_{X \cap Y}^2 + \mu_{X \cap Y}^2)}{N_{X \cup Y}} - \mu_{X \cup Y}^2}.$$

If two or more sets of data are being added in a pairwise fashion, the standard deviation of the result can be
calculated if the covariance between each pair of data sets is known:

$$\sigma_Z = \sqrt{\sum_{i} \sigma_{X_i}^2 + 2\sum_{i<j} \operatorname{cov}(X_i, X_j)}.$$

For the special case where no correlation exists between any pair of data sets, the relation reduces to the
root-mean-square:

$$\sigma_Z = \sqrt{\sum_{i} \sigma_{X_i}^2}.$$

Sample-based statistics
Standard deviations of non-overlapping (X ∩ Y = ∅) sub-samples can be aggregated as follows if the actual size and
means of each are known:

$$s_{X \cup Y} = \sqrt{\frac{(N_X - 1)\,s_X^2 + (N_Y - 1)\,s_Y^2 + \frac{N_X N_Y}{N_X + N_Y}(\mu_X - \mu_Y)^2}{N_X + N_Y - 1}}.$$

For the more general case of M non-overlapping data sets X1 through XM:

$$s_X = \sqrt{\frac{\sum_{i=1}^{M}\left[(N_{X_i} - 1)\,s_{X_i}^2 + N_{X_i}\,\mu_{X_i}^2\right] - N\,\mu_X^2}{N - 1}},$$

where:

$$N = \sum_{i=1}^{M} N_{X_i}, \qquad \mu_X = \frac{1}{N}\sum_{i=1}^{M} N_{X_i}\,\mu_{X_i}.$$

If the size, mean, and standard deviation of two overlapping samples are known for the samples as well as their
intersection, then the standard deviation of the combined sample can still be calculated in an analogous way.

See also
• Accuracy and precision
• An inequality on location and scale parameters
• Cumulant
• Deviation (statistics)
• Distance standard deviation
• Error bar
• Geometric standard deviation
• Kurtosis
• Mean absolute error
• Median
• Pooled standard deviation
• Raw score
• Root mean square
• Sample size
• Samuelson's inequality
• Saturation (color theory)
• Skewness
• Unbiased estimation of standard deviation
• Variance
• Volatility (finance)
• Yamartino method for calculating standard deviation of wind direction

External links
• A Guide to Understanding & Calculating Standard Deviation [5]
• C++ Source Code [6] (license free) C++ implementation of rapid mean, variance and standard deviation
calculation
• Interactive Demonstration and Standard Deviation Calculator [7]
• Standard Deviation – an explanation without maths [8]
• Standard Deviation, an elementary introduction [9]
• Standard Deviation, a simpler explanation for writers and journalists [10]
• Standard Deviation Calculator [11]
• Texas A&M Standard Deviation and Confidence Interval Calculators [12]
• The concept of Standard Deviation is shown in this 8-foot-tall (2.4 m) Probability Machine (named Sir Francis)
comparing stock market returns to the randomness of the beans dropping through the quincunx pattern. [13] from
Index Funds Advisors IFA.com [14]

References
[1] Dodge, Yadolah (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.
[2] Pearson, Karl (1894). "On the dissection of asymmetrical frequency curves". Phil. Trans. Roy. Soc. London, Series A 185: 719–810.
[3] Miller, Jeff. "Earliest Known Uses of Some of the Words of Mathematics" (http://jeff560.tripod.com/mathword.html).
[4] Ghahramani, Saeed (2000). Fundamentals of Probability (2nd Edition). Prentice Hall: New Jersey. p. 438.
[5] http://stats4students.com/measures-of-spread-3.php
[6] http://www.chrisevansdev.com/rapidlive-statistics/
[7] http://www.usablestats.com/tutorials/StandardDeviation
[8] http://www.techbookreport.com/tutorials/stddev-30-secs.html
[9] http://davidmlane.com/hyperstat/A16252.html
[10] http://www.robertniles.com/stats/stdev.shtml
[11] http://invsee.asu.edu/srinivas/stdev.html
[12] http://www.stat.tamu.edu/~jhardin/applets/
[13] http://www.youtube.com/watch?v=AUSKTk9ENzg
[14] http://www.ifa.com

Random variable
In mathematics, a random variable (or stochastic variable) is (in general) a measurable function that maps a
probability space into a measurable space. Random variables mapping all possible outcomes of an event into the real
numbers are frequently studied in elementary statistics and used in the sciences to make predictions based on data
obtained from scientific experiments. In addition to scientific applications, random variables were developed for the
analysis of games of chance and stochastic events.
While the above definition of a random variable requires a familiarity with measure theory to appreciate, the
language and structure of random variables can be grasped at various levels of mathematical fluency through
limiting the variables one considers. Beyond the introductory level, however, set theory and calculus are
fundamental to their study. The concept of a random variable is closely linked to the term "random variate": a
random variate is a particular outcome (value) of a random variable.
There are two types of random variables: discrete and continuous.[1] A discrete random variable maps events to
values of a countable set (e.g., the integers), with each value in the range having probability greater than or equal to
zero. A continuous random variable maps events to values of an uncountable set (e.g., the real numbers). For a
continuous random variable, the probability of any specific value is zero, whereas the probability of some infinite set
of values (such as an interval of non-zero length) may be positive. A random variable can be "mixed", with part of its
probability spread out over an interval like a typical continuous variable, and part of it concentrated on particular
values like a discrete variable. These classifications are equivalent to the categorisation of probability distributions.
A random variable has an associated probability distribution and frequently also a probability density function.
Probability density functions are commonly used for continuous variables.

Intuitive description
In the simplest case, a random variable maps events to real numbers. A random variable can be thought of as a
function mapping the sample space of a random process to a set of numbers or quantifiable labels.

Examples
For a coin toss, the possible events are heads or tails. The possible outcomes for one fair coin toss can be described
using the following random variable:

X ∈ {heads, tails},

and if the coin is equally likely to land on either side then it has a probability mass function given by:

f_X(x) = 1/2 for x ∈ {heads, tails}.

It is sometimes convenient to model this situation using a random variable which takes numbers as its values, rather
than the values head and tail. This can be done by using the real random variable defined as follows:

Y = 1 if the coin lands heads, Y = 0 if the coin lands tails,

and if the coin is equally likely to land on either side then it has a probability mass function given by:

f_Y(y) = 1/2 for y ∈ {0, 1}.

A random variable can also be used to describe the process of rolling a fair die and the possible outcomes. The most
obvious representation is to take the set {1, 2, 3, 4, 5, 6} as the sample space, defining the random variable X as the
number rolled. In this case,

P(X = k) = 1/6 for k = 1, 2, …, 6.

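A short simulation of this random variable (the seed and sample size are arbitrary):

import random
from collections import Counter

random.seed(42)

# X maps each roll of a fair die to the number rolled.
rolls = [random.randint(1, 6) for _ in range(60000)]
counts = Counter(rolls)

for k in range(1, 7):
    print(k, counts[k] / len(rolls))  # each frequency is close to 1/6 = 0.1667
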
An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction.
Then the values taken by the random variable are directions. We could represent these directions by North West,
East South East, etc. However, it is commonly more convenient to map the sample space to a random variable which
takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees
clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with
all parts of the range being "equally likely". In this case, X = the angle spun. Any real number has probability zero of
being selected, but a positive probability can be assigned to any range of values. For example, the probability of
choosing a number in [0, 180] is ½. Instead of speaking of a probability mass function, we say that the probability
density of X is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set
by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating

the density over the given set.


An example of a random variable of mixed type would be based on an experiment where a coin is flipped and the
spinner is spun only if the result of the coin toss is heads. If the result is tails, X = −1; otherwise X = the value of the
spinner as in the preceding example. There is a probability of ½ that this random variable will have the value −1.
Other ranges of values would have half the probability of the last example.

Non-real-valued form
Very commonly a random variable takes values which are numbers. This is by no means always so; one can consider
random variables of any type. This often includes vector-valued random variables or complex-valued random
variables, but in general can include arbitrary types such as sequences, sets, shapes, manifolds, matrices, and
functions.

Formal definition
Let (Ω, ℱ, P) be a probability space, and (E, ℰ) a measurable space. Then an (E, ℰ)-valued random variable is a
function X: Ω→E which is (ℱ, ℰ)-measurable, that is, a function such that for every subset B ∈ ℰ, its preimage lies in
ℱ:  X −1(B) ∈ ℱ, where X −1(B) = {ω: X(ω) ∈ B}.[2]
When E is a topological space, then the most common choice for the σ-algebra ℰ is to take it equal to the Borel
σ-algebra ℬ(E), which is the σ-algebra generated by the collection of all open sets in E. In such case the (E,
ℰ)-valued random variable is called the E-valued random variable. Moreover, when space E is the real line ℝ, then
such real-valued random variable is called simply the random variable.
The meaning of this definition is the following: suppose (Ω, ℱ, P) is the underlying probability space, and we want
to consider a probability space based on the space E with σ-algebra ℰ. In order to turn the pair (E, ℰ) into a
probability space, we need to equip it with some probability function, call it Q. This function would have to assign
the probability to each set B in ℰ. If X is some function from Ω to E, then it is natural to postulate that the probability
of B must be the same as the probability of its preimage in Ω: Q(B) = P(X −1(B)). In order for this formula to be
meaningful, X −1(B) must lie in ℱ, since the probability function P is defined only on ℱ. And this is exactly what
the definition of the random variable requires: that X −1(B) ∈ ℱ for every B ∈ ℰ.

Real-valued random variables


In this case the observation space is the real numbers with a suitable measure. Recall that (Ω, ℱ, P) is the probability
space. For a real observation space, the function X: Ω → ℝ is a real-valued random variable if

{ω : X(ω) ≤ r} ∈ ℱ for every r ∈ ℝ.

This definition is a special case of the above because the set {(−∞, r] : r ∈ ℝ} generates the Borel σ-algebra on
the real numbers, and it is enough to check measurability on a generating set. (Here we are using the fact that
{ω : X(ω) ≤ r} = X⁻¹((−∞, r]).)

Distribution functions of random variables


Associating a cumulative distribution function (CDF) with a random variable is a generalization of assigning a value
to a variable. If the CDF is a (right continuous) Heaviside step function then the variable takes on the value at the
jump with probability 1. In general, the CDF specifies the probability that the variable takes on particular values.
If a random variable X: Ω → ℝ defined on the probability space (Ω, ℱ, P) is given, we can ask questions like
"How likely is it that the value of X is bigger than 2?". This is the same as the probability of the event
{ω : X(ω) > 2}, which is often written as P(X > 2) for short, and easily obtained since

P(X > 2) = P({ω : X(ω) > 2}).

Recording all these probabilities of output ranges of a real-valued random variable X yields the probability
distribution of X. The probability distribution "forgets" about the particular probability space used to define X and
only records the probabilities of various values of X. Such a probability distribution can always be captured by its
cumulative distribution function

F_X(x) = P(X ≤ x),

and sometimes also using a probability density function. In measure-theoretic terms, we use the random variable X to
"push-forward" the measure P on Ω to a measure dF on R. The underlying probability space Ω is a technical device
used to guarantee the existence of random variables, and sometimes to construct them. In practice, one often
disposes of the space Ω altogether and just puts a measure on R that assigns measure 1 to the whole real line, i.e.,
one works with probability distributions instead of random variables.

Moments
The probability distribution of a random variable is often characterised by a small number of parameters, which also
have a practical interpretation. For example, it is often enough to know what its "average value" is. This is captured
by the mathematical concept of expected value of a random variable, denoted E[X], and also called the first
moment. In general, E[f(X)] is not equal to f(E[X]). Once the "average value" is known, one could then ask how far
from this average value the values of X typically are, a question that is answered by the variance and standard
deviation of a random variable. E[X] can be viewed intuitively as an average obtained from an infinite population,
the members of which are particular evaluations of X.
Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X,
find a collection {fi} of functions such that the expectation values E[fi(X)] fully characterise the distribution of the
random variable X.

Functions of random variables


If we have a random variable X on Ω and a Borel measurable function g: ℝ → ℝ, then Y = g(X) will also
be a random variable on Ω, since the composition of measurable functions is also measurable. (However, this is not
true if g is Lebesgue measurable.) The same procedure that allowed one to go from a probability space (Ω, P) to
(ℝ, dF_X) can be used to obtain the distribution of Y = g(X). The cumulative distribution function of Y is

F_Y(y) = P(g(X) ≤ y).

If function g is invertible, i.e. g⁻¹ exists, and is increasing, then the previous relation can be extended to obtain

F_Y(y) = P(g(X) ≤ y) = P(X ≤ g⁻¹(y)) = F_X(g⁻¹(y)),

and, again with the same hypotheses of invertibility of g, assuming also differentiability, we can find the relation
between the probability density functions by differentiating both sides with respect to y, in order to obtain

$$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d\,g^{-1}(y)}{dy}\right|.$$

If there is no invertibility of g but each y admits at most a countable number of roots (i.e., a finite, or countably
infinite, number of xi such that y = g(xi)) then the previous relation between the probability density functions can be
generalized with

$$f_Y(y) = \sum_{i} f_X(x_i)\left|\frac{dx_i}{dy}\right|,$$

where xi = gi⁻¹(y). The formulas for densities do not demand g to be increasing.

Example 1
Let X be a real-valued, continuous random variable and let Y = X².

If y < 0, then P(X² ≤ y) = 0, so

F_Y(y) = 0 for y < 0.

If y ≥ 0, then

P(X² ≤ y) = P(−√y ≤ X ≤ √y),

so

F_Y(y) = F_X(√y) − F_X(−√y) for y ≥ 0.

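A quick Monte Carlo check of this formula, taking X to be standard normal (an arbitrary choice for illustration; statistics.NormalDist requires Python 3.8+):

import random
from statistics import NormalDist

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200000)]

y = 1.5  # any fixed y >= 0
empirical = sum(1 for x in xs if x * x <= y) / len(xs)

F = NormalDist().cdf  # the cumulative distribution function of X
theoretical = F(y ** 0.5) - F(-(y ** 0.5))

print(empirical, theoretical)  # both about 0.78
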
Example 2
Suppose X is a random variable with a cumulative distribution

F_X(x) = 1 / (1 + e^(−x))^θ,

where θ > 0 is a fixed parameter. Consider the random variable Y = log(1 + e^(−X)). Then,

F_Y(y) = P(Y ≤ y) = P(log(1 + e^(−X)) ≤ y) = P(X ≥ −log(e^y − 1)).

The last expression can be calculated in terms of the cumulative distribution of X, so

F_Y(y) = 1 − F_X(−log(e^y − 1)) = 1 − e^(−θy).

Equivalence of random variables


There are several different senses in which random variables can be considered to be equivalent. Two random
variables can be equal, equal almost surely, equal in mean, or equal in distribution.
In increasing order of strength, the precise definition of these notions of equivalence is given below.

Equality in distribution
If the sample space is a subset of the real line a possible definition is that random variables X and Y are equal in
distribution if they have the same distribution functions:

F_X(x) = F_Y(x) for all x.

Two random variables having equal moment generating functions have the same distribution. This provides, for
example, a useful method of checking equality of certain functions of i.i.d. random variables. However, the moment
generating function exists only for distributions that are good enough.

Almost sure equality


Two random variables X and Y are equal almost surely if, and only if, the probability that they are different is zero:

P(X ≠ Y) = 0.

For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is
associated to the following distance:

d∞(X, Y) = ess sup_ω |X(ω) − Y(ω)|,

where "ess sup" represents the essential supremum in the sense of measure theory.

Equality
Finally, the two random variables X and Y are equal if they are equal as functions on their probability space, that is,

X(ω) = Y(ω) for all ω ∈ Ω.

Convergence
Much of mathematical statistics consists in proving convergence results for certain sequences of random variables;
see for instance the law of large numbers and the central limit theorem.
There are various senses in which a sequence (Xn) of random variables can converge to a random variable X. These
are explained in the article on convergence of random variables.

See also
• Observable variable
• Probability distribution
• Algebra of random variables
• Multivariate random variable
• Event (probability theory)
• Randomness
• Random element
• Random vector
• Random function
• Random measure
• Stochastic process

References
[1] Rice, John (1999). Mathematical Statistics and Data Analysis. Duxbury Press. ISBN 0534209343.
[2] Fristedt & Gray (1996, page 11)

Literature
• Fristedt, Bert; Gray, Lawrence (1996). A modern approach to probability theory. Boston: Birkhäuser.
ISBN 3-7643-3807-5.
• Kallenberg, O., Random Measures, 4th edition. Academic Press, New York, London; Akademie-Verlag, Berlin
(1986). MR0854102 ISBN 0-12-394960-2
• Kallenberg, O., Foundations of Modern Probability, 2nd edition. Springer-Verlag, New York, Berlin, Heidelberg
(2001). ISBN 0-387-95313-2
• Papoulis, Athanasios (1965). Probability, Random Variables, and Stochastic Processes. McGraw–Hill Kogakusha,
Tokyo, 9th edition, ISBN 0-07-119981-0.

This article incorporates material from Random variable on PlanetMath, which is licensed under the Creative
Commons Attribution/Share-Alike License.

Probability distribution
In probability theory and statistics, a probability distribution identifies either the probability of each value of a
random variable (when the variable is discrete), or the probability of the value falling within a particular interval
(when the variable is continuous).[1] The probability distribution describes the range of possible values that a random
variable can attain and the probability that the value of the random variable is within any (measurable) subset of that
range.
When the random variable takes values
in the set of real numbers, the
probability distribution is completely
described by the cumulative
distribution function, whose value at
each real x is the probability that the
random variable is smaller than or
equal to x.

The concept of the probability distribution and the random variables which they describe underlies the mathematical
discipline of probability theory, and the science of statistics.

[Figure: The normal distribution, often called the "bell curve".]

There is spread or variability in almost any value that can be measured in a
population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are
made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties
of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple
numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

There are various probability distributions that show up in various different applications. One of the more important
ones is the normal distribution, which is also known as the Gaussian distribution or the bell curve and approximates
many different naturally occurring distributions. The toss of a fair coin yields another familiar distribution, where the
possible values are heads or tails, each with probability 1/2.

Formal definition
In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function X
from a probability space (Ω, ℱ, P) to a measurable space (E, ℰ). A probability distribution is the pushforward
measure X∗P = P X⁻¹ on (E, ℰ).

Probability distributions of real-valued random variables


Because a probability distribution Pr on the real line is determined by the probability of a real-valued random
variable X being in a half-open interval (−∞, x], the probability distribution is completely characterized by its
cumulative distribution function:

F(x) = Pr[X ≤ x] for all x ∈ ℝ.

Discrete probability distribution


A probability distribution is called discrete if its cumulative distribution function only increases in jumps. More
precisely, a probability distribution is discrete if there is a finite or countable set whose probability is 1.
For many familiar discrete distributions, the set of possible values is topologically discrete in the sense that all its
points are isolated points. But, there are discrete distributions for which this countable set is dense on the real line.
Discrete distributions are characterized by a probability mass function, p, such that

$$\Pr[X = x] = p(x), \qquad \sum_{x} p(x) = 1.$$

Continuous probability distribution


By one convention, a probability distribution is called continuous if its cumulative distribution function
is continuous and, therefore, the probability measure of singletons Pr[X = x] = 0 for all x.
Another convention reserves the term continuous probability distribution for absolutely continuous distributions.
These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable
function f defined on the real numbers such that

$$\Pr[a \le X \le b] = \int_a^b f(x)\,dx.$$

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Terminology
The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be
understood as the points or elements that are actual members of the distribution.
A discrete random variable is a random variable whose probability distribution is discrete. Similarly, a continuous
random variable is a random variable whose probability distribution is continuous.

Simulated sampling
The following algorithm lets one sample from a probability distribution (either discrete or continuous). The
algorithm assumes access to the inverse of the cumulative distribution (easy to calculate with a discrete
distribution, and approximable for continuous distributions) and a computational primitive "random()" which
returns a floating-point value in the range [0, 1), such as Python's random.random:
from random import random

def sampleFrom(cdfInverse):
    # input:
    #   cdfInverse(x) - the inverse of the CDF of the probability distribution
    #     example: if the distribution is Gaussian, one can use a Taylor
    #       approximation of the inverse of erf(x)
    #     example: if the distribution is discrete, see dataToCdfInverse below
    # output:
    #   a real number sampled from the probability distribution represented
    #   by cdfInverse
    r = random()
    while r == 0:  # make sure r is not equal to 0; discontinuity possible
        r = random()
    return cdfInverse(r)

For discrete distributions, the function cdfInverse (inverse of cumulative distribution function) can be calculated
from samples as follows: for each element in the sample range (discrete values along the x-axis), calculate the
cumulative total of samples up to and including it, and normalize the result. This new discrete distribution is the
CDF, and it can be turned into an object which acts like a function: calling cdfInverse(query) returns the smallest
x-value such that the CDF is greater than or equal to the query.
def dataToCdfInverse(discreteDistribution):
    # input:
    #   discreteDistribution - a mapping (dict) from possible values to
    #     frequencies/probabilities
    #     example: {0: 1 - p, 1: p} would be a Bernoulli distribution with chance = p
    #     example: setting p = 0.5 in the above example gives a fair coin where
    #       P(X=1) -> "heads" and P(X=0) -> "tails"
    # output:
    #   a function that represents (CDF^-1)(x)
    def cdfInverse(x):
        integral = 0
        # go through the mapping (key -> value) in sorted order, adding each
        # value to the running integral; stop when integral >= x
        for key in sorted(discreteDistribution):
            integral += discreteDistribution[key]
            if integral >= x:
                break
        return key  # the last key we added
    return cdfInverse

Note that often, mathematics environments and computer algebra systems will have some way to represent
probability distributions and sample from them. This functionality might even have been developed in third-party
libraries. Such packages greatly facilitate such sampling, most likely have optimizations for common distributions,
and are likely to be more elegant than the above bare-bones solution.

Some properties
• The probability density function of the sum of two independent random variables is the convolution of each of
their density functions; a sketch of the discrete analogue appears after this list.
• The probability density function of the difference of two independent random variables is the cross-correlation
of their density functions.
• Probability distributions are not a vector space – they are not closed under linear combinations, as these do not
preserve non-negativity or total integral 1 – but they are closed under convex combination, thus forming a convex
subset of the space of functions (or measures).
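
For discrete variables the convolution is a sum rather than an integral. As a minimal illustration (two fair dice, a standard example, not from the original article), the mass function of the total can be computed directly:

from collections import defaultdict

die = {k: 1 / 6 for k in range(1, 7)}  # mass function of one fair die

# Discrete convolution: P(X + Y = s) = sum over x of P(X = x) * P(Y = s - x).
total = defaultdict(float)
for x, px in die.items():
    for y, py in die.items():
        total[x + y] += px * py

for s in sorted(total):
    print(s, round(total[s], 4))  # triangular distribution peaking at s = 7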

See also
• Copula (statistics)
• Cumulative distribution function
• Histogram
• Inverse transform sampling
• Likelihood function
• List of statistical topics
• Probability density function
• Random variable
• Riemann–Stieltjes integral application to probability theory

External links
• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the
randomness of the beans dropping through the quincunx pattern. [13] from Index Funds Advisors IFA.com [14],
youtube.com
• Interactive Discrete and Continuous Probability Distributions [2], socr.ucla.edu
• A Compendium of Common Probability Distributions [3]
• A Compendium of Distributions [4], vosesoftware.com
• Statistical Distributions - Overview [5], xycoon.com
• Probability Distributions [6] in Quant Equation Archive, sitmo.com
• A Probability Distribution Calculator [7], covariable.com
• Sourceforge.net [8], Distribution Explorer: a mixed C++ and C# Windows application that allows you to explore
the properties of 20+ statistical distributions, and calculate CDF, PDF & quantiles. Written using open-source
C++ from the Boost.org [9] Math Toolkit library.
• Explore different probability distributions and fit your own dataset online - interactive tool [10], xjtek.com

References
[1] Everitt, B.S. (2006) The Cambridge Dictionary of Statistics, Third Edition. pp. 313–314. Cambridge University Press, Cambridge. ISBN
0521690277
[2] http://www.socr.ucla.edu/htmls/SOCR_Distributions.html
[3] http://www.causascientia.org/math_stat/Dists/Compendium.pdf
[4] http://www.vosesoftware.com/content/ebook.pdf
[5] http://www.xycoon.com/contdistroverview.htm
[6] http://www.sitmo.com/eqcat/8
[7] http://www.covariable.com/continuous.html
[8] http://sourceforge.net/projects/distexplorer/
[9] http://www.boost.org
[10] http://www.xjtek.com/anylogic/demo_models/111/

Real number
In computing, 'real number' often refers to non-complex floating-point numbers.
In mathematics, the real numbers include both rational numbers, such as 42 and
−23/129, and irrational numbers, such as pi and the square root of two. A real number
can be given by an infinite decimal representation, such as 2.4871773339..., where the
digits continue indefinitely. The real numbers are sometimes thought of as points on an
infinitely long number line.
These descriptions of the real numbers, while intuitively accessible, are not sufficiently rigorous for the purposes of
pure mathematics. The discovery of a suitably rigorous definition of the real numbers—indeed, the realization that a
better definition was needed—was one of the most important developments of 19th century mathematics.

[Image: Symbol often used to denote the set of real numbers.]
Popular definitions in use today include equivalence classes of Cauchy sequences of rational numbers; Dedekind
cuts; a more sophisticated version of "decimal representation"; and an axiomatic definition of the real numbers as the
unique complete Archimedean ordered field. These definitions are all described in detail below.

Basic properties
[Image: Real numbers can be thought of as points on an infinitely long number line.]

A real number may be either rational or irrational; either algebraic or transcendental; and either positive, negative,
or zero. Real numbers are used to measure continuous quantities. They may in theory be expressed by decimal
representations that have an infinite sequence of digits to the right of the decimal point; these are often represented
in the same form as 324.823122147… The ellipsis (three dots) indicates that there would still be more digits to come.

More formally, real numbers have the two basic properties of being an ordered field, and having the least upper
bound property. The first says that real numbers comprise a field, with addition and multiplication as well as division
by nonzero numbers, which can be totally ordered on a number line in a way compatible with addition and
multiplication. The second says that if a nonempty set of real numbers has an upper bound, then it has a least upper
bound. These two together define the real numbers completely, and allow its other properties to be deduced. For
instance, we can prove from these properties that every polynomial of odd degree with real coefficients has a real
root, and that if you add the square root of −1 to the real numbers, obtaining the complex numbers, the resulting field
is algebraically closed.

Uses
In the physical sciences, most physical constants such as the universal gravitational constant, and physical variables,
such as position, mass, speed, and electric charge, are modeled using real numbers. Note importantly, however, that
all actual measurements of physical quantities yield rational numbers because the precision of such measurements
can only be finite.
Computers cannot directly operate on real numbers, but only on a finite subset of rational numbers, limited by the
number of bits used to store them. However, computer algebra systems are able to treat some irrational numbers
exactly by storing their algebraic description (such as "sqrt(2)") rather than their rational approximation.[1]
A real number is said to be computable if there exists an algorithm that yields its digits. Because there are only
countably many algorithms, but an uncountable number of reals, "most" real numbers fail to be computable. Some
constructivists accept the existence of only those reals that are computable. The set of definable numbers is broader,
but still only countable. If computers could use unlimited precision real numbers (real computation), then one could

solve NP-complete problems, and even #P-complete problems in polynomial time, answering affirmatively the P =
NP problem. Unlimited precision real numbers in the physical universe are prohibited by the holographic principle
and the Bekenstein bound.[2]
Mathematicians use the symbol R (or alternatively ℝ, the letter "R" in blackboard bold, Unicode ℝ) to represent
the set of all real numbers. The notation Rⁿ refers to an n-dimensional space with real coordinates; for example, a
value from R³ consists of three real numbers and specifies a location in 3-dimensional space.
In mathematics, real is used as an adjective, meaning that the underlying field is the field of real numbers, as in real
matrix, real polynomial and real Lie algebra. As a substantive, the term is used almost strictly in reference to the real
numbers themselves (e.g., the "set of all reals").

History
Vulgar fractions had been used by the Egyptians around 1000 BC; the Vedic "Sulba Sutras" ("The rules of chords"),
ca. 600 BC, include what may be the first "use" of irrational numbers. The concept of irrationality was implicitly
accepted by early Indian mathematicians since Manava (c. 750–690 BC), who was aware that the square roots of
certain numbers such as 2 and 61 could not be exactly determined.[3] Around 500 BC, the Greek mathematicians led
by Pythagoras realized the need for irrational numbers, in particular the irrationality of the square root of 2.
The Middle Ages saw the acceptance of zero, negative, integral and fractional numbers, first by Indian and Chinese
mathematicians, and then by Arabic mathematicians, who were also the first to treat irrational numbers as algebraic
objects,[4] which was made possible by the development of algebra. Arabic mathematicians merged the concepts of
"number" and "magnitude" into a more general idea of real numbers.[5] The Egyptian mathematician Abū Kāmil
Shujā ibn Aslam (c. 850–930) was the first to accept irrational numbers as solutions to quadratic equations or as
coefficients in an equation, often in the form of square roots, cube roots and fourth roots.[6]
In the 16th century, Simon Stevin created the basis for modern decimal notation, and insisted that there is no
difference between rational and irrational numbers in this regard.
In the 18th and 19th centuries there was much work on irrational and transcendental numbers. Lambert (1761) gave
the first flawed proof that π cannot be rational; Legendre (1794) completed the proof, and showed that π is not the
square root of a rational number. Ruffini (1799) and Abel (1842) both constructed proofs of Abel–Ruffini theorem:
that the general quintic or higher equations cannot be solved by a general formula involving only arithmetical
operations and roots.
Évariste Galois (1832) developed techniques for determining whether a given equation could be solved by radicals,
which gave rise to the field of Galois theory. Joseph Liouville (1840) showed that neither e nor e2 can be a root of an
integer quadratic equation, and then established existence of transcendental numbers, the proof being subsequently
displaced by Georg Cantor (1873). Charles Hermite (1873) first proved that e is transcendental, and Ferdinand von
Lindemann (1882), showed that π is transcendental. Lindemann's proof was much simplified by Weierstrass (1885),
still further by David Hilbert (1893), and has finally been made elementary by Hurwitz and Paul Albert Gordan.
The development of calculus in the 18th century used the entire set of real numbers without having defined them
cleanly. The first rigorous definition was given by Georg Cantor in 1871. In 1874 he showed that the set of all real
numbers is uncountably infinite but the set of all algebraic numbers is countably infinite. Contrary to widely held
beliefs, his first method was not his famous diagonal argument, which he published in 1891. See Cantor's first
uncountability proof.

Definition

Construction from the rational numbers


The real numbers can be constructed as a completion of the rational numbers in such a way that a sequence defined
by a decimal or binary expansion like {3, 3.1, 3.14, 3.141, 3.1415,...} converges to a unique real number. For details
and other constructions of real numbers, see construction of the real numbers.

Axiomatic approach
Let R denote the set of all real numbers. Then:
• The set R is a field, meaning that addition and multiplication are defined and have the usual properties.
• The field R is ordered, meaning that there is a total order ≥ such that, for all real numbers x, y and z:
• if x ≥ y then x + z ≥ y + z;
• if x ≥ 0 and y ≥ 0 then xy ≥ 0.
• The order is Dedekind-complete; that is, every non-empty subset S of R with an upper bound in R has a least
upper bound (also called supremum) in R.
The last property is what differentiates the reals from the rationals. For example, the set of rationals with square less
than 2 has a rational upper bound (e.g., 1.5) but no rational least upper bound, because the square root of 2 is not
rational.
The real numbers are uniquely specified by the above properties. More precisely, given any two Dedekind-complete
ordered fields R1 and R2, there exists a unique field isomorphism from R1 to R2, allowing us to think of them as
essentially the same mathematical object.
For another axiomatization of R, see Tarski's axiomatization of the reals.

Properties

Completeness
The main reason for introducing the reals is that the reals contain all limits. More technically, the reals are complete
(in the sense of metric spaces or uniform spaces, which is a different sense than the Dedekind completeness of the
order in the previous section). This means the following:
A sequence (xn) of real numbers is called a Cauchy sequence if for any ε > 0 there exists an integer N (possibly
depending on ε) such that the distance |xn − xm| is less than ε for all n and m that are both greater than N. In other
words, a sequence is a Cauchy sequence if its elements xn eventually come and remain arbitrarily close to each other.
A sequence (xn) converges to the limit x if for any ε > 0 there exists an integer N (possibly depending on ε) such that
the distance |xn − x| is less than ε provided that n is greater than N. In other words, a sequence has limit x if its
elements eventually come and remain arbitrarily close to x.
It is easy to see that every convergent sequence is a Cauchy sequence. An important fact about the real numbers is
that the converse is also true:
Every Cauchy sequence of real numbers is convergent to a real number.
That is, the reals are complete.
Note that the rationals are not complete. For example, the sequence (1, 1.4, 1.41, 1.414, 1.4142, 1.41421, ...), where
each term adds a digit of the decimal expansion of the positive square root of 2, is Cauchy but it does not converge to
a rational number. (In the real numbers, in contrast, it converges to the positive square root of 2.)
The existence of limits of Cauchy sequences is what makes calculus work and is of great practical use. The standard
numerical test to determine if a sequence has a limit is to test if it is a Cauchy sequence, as the limit is typically not

known in advance.
For example, the standard series of the exponential function

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

converges to a real number because for every x the sums

$$\left|\sum_{n=N}^{M} \frac{x^n}{n!}\right| \qquad (M > N)$$

can be made arbitrarily small by choosing N sufficiently large. This proves that the sequence is Cauchy, so we know
that the sequence converges even if the limit is not known in advance.
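
Numerically, the partial sums of this series settle down very quickly; a small Python sketch for x = 1 (the fifteen-term cut-off is arbitrary):

import math

x = 1.0
term = 1.0     # the n = 0 term, x**0 / 0!
partial = 1.0

for n in range(1, 15):
    term *= x / n      # build x**n / n! incrementally
    partial += term
    print(n, partial)  # successive sums cluster ever closer together

print(math.e)  # the limit the Cauchy sequence converges to: 2.71828...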

"The complete ordered field"


The real numbers are often described as "the complete ordered field", a phrase that can be interpreted in several
ways.
First, an order can be lattice-complete. It is easy to see that no ordered field can be lattice-complete, because it can
have no largest element (given any element z, z + 1 is larger), so this is not the sense that is meant.
Additionally, an order can be Dedekind-complete, as defined in the section Axioms. The uniqueness result at the end
of that section justifies using the word "the" in the phrase "complete ordered field" when this is the sense of
"complete" that is meant. This sense of completeness is most closely related to the construction of the reals from
Dedekind cuts, since that construction starts from an ordered field (the rationals) and then forms the
Dedekind-completion of it in a standard way.
These two notions of completeness ignore the field structure. However, an ordered group (in this case, the additive
group of the field) defines a uniform structure, and uniform structures have a notion of completeness (topology); the
description in the section Completeness above is a special case. (We refer to the notion of completeness in uniform
spaces rather than the related and better known notion for metric spaces, since the definition of metric space relies on
already having a characterisation of the real numbers.) It is not true that R is the only uniformly complete ordered
field, but it is the only uniformly complete Archimedean field, and indeed one often hears the phrase "complete
Archimedean field" instead of "complete ordered field". Since it can be proved that any uniformly complete
Archimedean field must also be Dedekind-complete (and vice versa, of course), this justifies using "the" in the
phrase "the complete Archimedean field". This sense of completeness is most closely related to the construction of
the reals from Cauchy sequences (the construction carried out in full in this article), since it starts with an
Archimedean field (the rationals) and forms the uniform completion of it in a standard way.
But the original use of the phrase "complete Archimedean field" was by David Hilbert, who meant still something
else by it. He meant that the real numbers form the largest Archimedean field in the sense that every other
Archimedean field is a subfield of R. Thus R is "complete" in the sense that nothing further can be added to it
without making it no longer an Archimedean field. This sense of completeness is most closely related to the
construction of the reals from surreal numbers, since that construction starts with a proper class that contains every
ordered field (the surreals) and then selects from it the largest Archimedean subfield.

Advanced properties
The reals are uncountable; that is, there are strictly more real numbers than natural numbers, even though both sets
are infinite. In fact, the cardinality of the reals equals that of the set of subsets (i.e., the power set) of the natural
numbers, and Cantor's diagonal argument states that the latter set's cardinality is strictly bigger than the cardinality of
N. Since only a countable set of real numbers can be algebraic, almost all real numbers are transcendental. The
non-existence of a subset of the reals with cardinality strictly between that of the integers and the reals is known as
the continuum hypothesis. The continuum hypothesis can neither be proved nor disproved; it is independent of
the axioms of set theory.


The real numbers form a metric space: the distance between x and y is defined to be the absolute value |x − y|. By
virtue of being a totally ordered set, they also carry an order topology; the topology arising from the metric and the
one arising from the order are identical, but yield different presentations for the topology – in the order topology as
intervals, in the metric topology as epsilon-balls. The Dedekind cuts construction uses the order topology
presentation, while the Cauchy sequences construction uses the metric topology presentation. The reals are a
contractible (hence connected and simply connected), separable metric space of dimension 1, and are everywhere
dense. The real numbers are locally compact but not compact. There are various properties that uniquely specify
them; for instance, all unbounded, connected, and separable order topologies are necessarily homeomorphic to the
reals.
Every nonnegative real number has a square root in R, and no negative number does. This shows that the order on R
is determined by its algebraic structure. Also, every polynomial of odd degree admits at least one real root: these two
properties make R the premier example of a real closed field. Proving this is the first half of one proof of the
fundamental theorem of algebra.
The reals carry a canonical measure, the Lebesgue measure, which is the Haar measure on their structure as a
topological group normalised such that the unit interval [0,1] has measure 1.
The supremum axiom of the reals refers to subsets of the reals and is therefore a second-order logical statement. It is
not possible to characterize the reals with first-order logic alone: the Löwenheim–Skolem theorem implies that there
exists a countable dense subset of the real numbers satisfying exactly the same sentences in first order logic as the
real numbers themselves. The set of hyperreal numbers satisfies the same first order sentences as R. Ordered fields
that satisfy the same first-order sentences as R are called nonstandard models of R. This is what makes nonstandard
analysis work; by proving a first-order statement in some nonstandard model (which may be easier than proving it in
R), we know that the same statement must also be true of R.

Generalizations and extensions


The real numbers can be generalized and extended in several different directions:
• The complex numbers contain solutions to all polynomial equations and hence are an algebraically closed field
unlike the real numbers. However, the complex numbers are not an ordered field.
• The affinely extended real number system adds two elements +∞ and −∞. It is a compact space. It is no longer a
field, not even an additive group; it still has a total order; moreover, it is a complete lattice.
• The real projective line adds only one value ∞. It is also a compact space. Again, it is no longer a field, not even
an additive group. However, it allows division of a non-zero element by zero. It is not ordered anymore.
• The long real line pastes together ℵ1* + ℵ1 copies of the real line plus a single point (here ℵ1* denotes the
reversed ordering of ℵ1) to create an ordered set that is "locally" identical to the real numbers, but somehow
longer; for instance, there is an order-preserving embedding of ℵ1 in the long real line but not in the real numbers.
The long real line is the largest ordered set that is complete and locally Archimedean. As with the previous two
examples, this set is no longer a field or additive group.
• Ordered fields extending the reals are the hyperreal numbers and the surreal numbers; both of them contain
infinitesimal and infinitely large numbers and thus are not Archimedean.
• Self-adjoint operators on a Hilbert space (for example, self-adjoint square complex matrices) generalize the reals
in many respects: they can be ordered (though not totally ordered), they are complete, all their eigenvalues are
real and they form a real associative algebra. Positive-definite operators correspond to the positive reals and
normal operators correspond to the complex numbers.
"Reals" in set theory


In set theory, specifically descriptive set theory, the Baire space is used as a surrogate for the real numbers since the
latter have some topological properties (connectedness) that are a technical inconvenience. Elements of Baire space
are referred to as "reals".

See also
• Completeness
• Continued fraction
• Limit of a sequence
• Real analysis
• Simon Stevin
• Imaginary number
• Complex number

References
• Georg Cantor, 1874, "Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen", Journal für die
Reine und Angewandte Mathematik, volume 77, pages 258–262.
• Robert Katz, 1964, Axiomatic Analysis, D. C. Heath and Company.
• Edmund Landau, 2001, ISBN 0-8218-2693-X, Foundations of Analysis, American Mathematical Society.
• Howie, John M., Real Analysis, Springer, 2005, ISBN 1-85233-314-6

External links
• The real numbers: Pythagoras to Stevin [7]
• The real numbers: Stevin to Hilbert [8]
• The real numbers: Attempts to understand [9]

References
[1] Cohen, Joel S. (2002). Computer algebra and symbolic computation: elementary algorithms. 1. A K Peters, Ltd. p. 32. ISBN 9781568811581.
[2] Scott Aaronson, NP-complete Problems and Physical Reality (http://arxiv.org/abs/quant-ph/0502072), ACM SIGACT News, Vol. 36, No. 1 (March 2005), pp. 30–52.
[3] T. K. Puttaswamy, "The Accomplishments of Ancient Indian Mathematicians", pp. 410–1, in Selin, Helaine; D'Ambrosio, Ubiratan (2000), Mathematics Across Cultures: The History of Non-western Mathematics, Springer, ISBN 1402002602
[4] O'Connor, John J.; Robertson, Edmund F., "Arabic mathematics: forgotten brilliance?" (http://www-history.mcs.st-andrews.ac.uk/HistTopics/Arabic_mathematics.html), MacTutor History of Mathematics archive, University of St Andrews.
[5] Matvievskaya, Galina (1987), "The Theory of Quadratic Irrationals in Medieval Oriental Mathematics", Annals of the New York Academy of Sciences 500: 253–277 [254], doi:10.1111/j.1749-6632.1987.tb37206.x
[6] Jacques Sesiano, "Islamic mathematics", p. 148, in Selin, Helaine; D'Ambrosio, Ubiratan (2000), Mathematics Across Cultures: The History of Non-western Mathematics, Springer, ISBN 1402002602
[7] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_1.html
[8] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_2.html
[9] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Real_numbers_3.html
Variance
In probability theory and statistics, the variance is used as one of several descriptors of a distribution. It describes
how far values lie from the mean. In particular, the variance is one of the moments of a distribution. In that context,
it forms part of a systematic approach to distinguishing between probability distributions. While other such
approaches have been developed, those based on moments are advantageous in terms of mathematical and
computational simplicity.
The variance is a parameter describing a theoretical probability distribution, while a sample of data from such a
distribution can be used to construct an estimate of this variance: in the simplest cases this estimate can be the
sample variance.

Background
The variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that
variable from its expected value or mean. Thus the variance is a measure of the amount of variation within the values
of that variable, taking account of all possible values and their probabilities or weightings (not just the extremes
which give the range). For example, a perfect die, when thrown, has expected value (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5,
expected absolute deviation 1.5 (the mean of the equally likely absolute deviations 3.5 − 1, 3.5 − 2, 3.5 − 3, 4 − 3.5, 5 − 3.5, 6 − 3.5, giving 2.5, 1.5, 0.5, 0.5, 1.5, 2.5), but expected squared deviation or variance of 17.5/6 ≈ 2.9 (the mean of the equally likely squared deviations 2.5², 1.5², 0.5², 0.5², 1.5², 2.5²).
As another example, if a coin is tossed twice, the number of heads is: 0 with probability 0.25, 1 with probability 0.5
and 2 with probability 0.25. Thus the variance is 0.25 × (0 − 1)² + 0.5 × (1 − 1)² + 0.25 × (2 − 1)² = 0.25 + 0 + 0.25
= 0.5. (Note that in this case, where tosses of coins are independent, the variance is additive, i.e., if the coin is tossed
n times, the variance will be 0.25n.)
Unlike expected deviation, the variance of a variable has units that are the square of the units of the variable itself.
For example, a variable measured in inches will have a variance measured in square inches. For this reason,
describing data sets via their standard deviation or root mean square deviation is often preferred over variance. In the
dice example the standard deviation is √(17.5/6) ≈ 1.7, slightly larger than the expected deviation of 1.5.
The standard deviation and the expected deviation can both be used as an indicator of the "spread" of a distribution.
The standard deviation is more amenable to algebraic manipulation, and, together with variance and its
generalization covariance, is used frequently in theoretical statistics; however the expected deviation tends to be
more robust as it is less sensitive to outliers arising from measurement anomalies or an unduly heavy-tailed
distribution.
Real-world distributions such as the distribution of yesterday’s rain throughout the day are typically not fully known,
unlike the behavior of perfect dice or an ideal distribution such as the normal distribution, because it is impractical to
account for every raindrop. Instead one estimates the mean and variance of the whole distribution as the computed
mean and variance of n samples drawn suitably randomly from the whole sample space, in this example yesterday’s
rainfall.
This method of estimation is close to optimal, with the caveat that it underestimates the variance by a factor of
(n−1)/n (when n = 1 the variance of a single sample is obviously zero regardless of the true variance), a bias which
should be corrected for when n is small. If the mean is determined in some other way than from the same samples
used to estimate the variance then this bias does not arise and the variance can safely be estimated as that of the
samples.
The variance of a real-valued random variable is its second central moment, and it also happens to be its second
cumulant. Just as some distributions do not have a mean, some do not have a variance. The mean exists whenever the
variance exists, but not vice versa.
Definition
If a random variable X has the expected value (mean) μ = E[X], then the variance of X is given by:

\operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2\right].
This definition encompasses random variables that are discrete, continuous, or neither. It can be expanded as follows:

\operatorname{Var}(X) = \operatorname{E}\left[X^2 - 2\mu X + \mu^2\right] = \operatorname{E}[X^2] - 2\mu\operatorname{E}[X] + \mu^2 = \operatorname{E}[X^2] - \mu^2.

The variance of random variable X is typically designated as Var(X), σ_X², or simply σ² (pronounced “sigma
squared”). If a distribution does not have an expected value, as is the case for the Cauchy distribution, it does not
have a variance either. Many other distributions for which the expected value does exist do not have a finite variance
because the relevant integral diverges. An example is a Pareto distribution whose index k satisfies 1 < k ≤ 2.

Continuous case
If the random variable X is continuous with probability density function f(x), then

\operatorname{Var}(X) = \int (x - \mu)^2 f(x)\,dx,

where

\mu = \int x f(x)\,dx,

and where the integrals are definite integrals taken for x ranging over the range of X.

Discrete case
If the random variable X is discrete with probability mass function x₁ ↦ p₁, ..., xₙ ↦ pₙ, then

\operatorname{Var}(X) = \sum_{i=1}^{n} p_i (x_i - \mu)^2,

where

\mu = \sum_{i=1}^{n} p_i x_i.
(When such a discrete weighted variance is specified by weights whose sum is not 1, then one divides by the sum of
the weights.) That is, it is the expected value of the square of the deviation of X from its own mean. In plain
language, it can be expressed as “The mean of the square of the deviation of each data point from the average”. It is
thus the mean squared deviation.
Examples

Exponential distribution
The exponential distribution with parameter λ is a continuous distribution whose support is the semi-infinite interval
[0,∞). Its probability density function is given by:

f(x) = \lambda e^{-\lambda x} \quad (x \ge 0),

and it has expected value μ = λ⁻¹. Therefore the variance is equal to:

\operatorname{Var}(X) = \int_0^\infty (x - \lambda^{-1})^2\, \lambda e^{-\lambda x}\,dx = \lambda^{-2}.

So for an exponentially distributed random variable σ² = μ².

Fair die
A six-sided fair die can be modelled with a discrete random variable with outcomes 1 through 6, each with equal probability 1/6. The expected value is (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. Therefore the variance can be computed to be:

\operatorname{Var}(X) = \sum_{i=1}^{6} \tfrac{1}{6}(i - 3.5)^2 = \tfrac{17.5}{6} = \tfrac{35}{12} \approx 2.92.
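The same computation written out as a short Python sketch (a direct transcription of the definition above, not library code):

    outcomes = [1, 2, 3, 4, 5, 6]
    mu = sum(outcomes) / len(outcomes)                            # 3.5
    var = sum((x - mu) ** 2 for x in outcomes) / len(outcomes)    # population variance
    print(mu, var)   # 3.5 and 2.916... (= 35/12, i.e. 17.5/6)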
Properties
Variance is non-negative because the squares are positive or zero. The variance of a constant random variable is
zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value.
Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of
the variable, the variance is unchanged. If all values are scaled by a constant, the variance is scaled by the square of
that constant. These two properties can be expressed in the following formula:

\operatorname{Var}(aX + b) = a^2\,\operatorname{Var}(X).
The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances. This stems
from the identity:

\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y),

and that for uncorrelated variables covariance is zero.

In general, for the sum of N variables X₁, ..., X_N, we have:

\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Cov}(X_i, X_j),

or

\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \operatorname{Var}(X_i) + 2\sum_{i < j} \operatorname{Cov}(X_i, X_j).
Suppose that the observations can be partitioned into equal-sized subgroups according to some second variable.
Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the
means of the subgroups. This property is known as variance decomposition or the law of total variance and plays an
important role in the analysis of variance. For example, suppose that a group consists of a subgroup of men and an
equally large subgroup of women. Suppose that the men have a mean body length of 180 and that the variance of
their lengths is 100. Suppose that the women have a mean length of 160 and that the variance of their lengths is 50.
Then the mean of the variances is (100 + 50) / 2 = 75; the variance of the means is the variance of 180, 160 which is
100. Then, for the total group of men and women combined, the variance of the body lengths will be 75 + 100 = 175.
Note that this uses N for the denominator instead of N − 1.
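The arithmetic of the example can be checked mechanically; a small Python sketch (using only the subgroup figures assumed in the text):

    # equal-sized subgroups: men and women
    mean_men, var_men = 180, 100
    mean_women, var_women = 160, 50

    mean_of_variances = (var_men + var_women) / 2        # 75
    overall_mean = (mean_men + mean_women) / 2           # 170
    variance_of_means = ((mean_men - overall_mean) ** 2
                         + (mean_women - overall_mean) ** 2) / 2   # 100 (denominator N)
    print(mean_of_variances + variance_of_means)         # 175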


   In a more general case, if the subgroups have unequal sizes, then they must be weighted proportionally to their size
in the computations of the means and variances. The formula is also valid with more than two groups, and even if the
grouping variable is continuous.
   This formula implies that the variance of the total group cannot be smaller than the mean of the variances of the
subgroups. Note, however, that the total variance is not necessarily larger than the variances of the subgroups. In the
above example, when the subgroups are analyzed separately, the variance is influenced only by the man-man
differences and the woman-woman differences. If the two groups are combined, however, then the men-women
differences enter into the variance also.
Many computational formulas for the variance are based on this equality: The variance is equal to the mean of the
square minus the square of the mean:

\operatorname{Var}(X) = \operatorname{E}[X^2] - \left(\operatorname{E}[X]\right)^2.
For example, if we consider the numbers 1, 2, 3, 4 then the mean of the squares is (1 × 1 + 2 × 2 + 3 × 3 + 4 × 4) / 4
= 7.5. The regular mean of all four numbers is 2.5, so the square of the mean is 6.25. Therefore the variance is
7.5 − 6.25 = 1.25, which is indeed the same result obtained earlier with the definition formulas. Many pocket
calculators use an algorithm that is based on this formula and that allows them to compute the variance while the
data are entered, without storing all values in memory. The algorithm is to adjust only three variables when a new
data value is entered: The number of data entered so far (n), the sum of the values so far (S), and the sum of the
squared values so far (SS). For example, if the data are 1, 2, 3, 4, then after entering the first value, the algorithm
would have n = 1, S = 1 and SS = 1. After entering the second value (2), it would have n = 2, S = 3 and SS = 5. When
all data are entered, it would have n = 4, S = 10 and SS = 30. Next, the mean is computed as M = S / n, and finally the
variance is computed as SS / n − M × M. In this example the outcome would be 30 / 4 − 2.5 × 2.5 = 7.5 − 6.25 =
1.25. If the unbiased sample estimate is to be computed, the outcome will be multiplied by n / (n − 1), which yields
1.667 in this example.
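A minimal Python sketch of this three-variable algorithm (note that the textbook formula SS/n − M² used here can suffer from catastrophic cancellation for data with a large mean; Welford's algorithm is the numerically stable alternative):

    def running_variance(data):
        # Single-pass variance via the running quantities n, S and SS.
        n = S = SS = 0
        for x in data:
            n += 1
            S += x
            SS += x * x
        mean = S / n
        var = SS / n - mean * mean        # population form (divide by n)
        sample_var = var * n / (n - 1)    # unbiased form (Bessel's correction)
        return var, sample_var

    print(running_variance([1, 2, 3, 4]))   # (1.25, 1.666...)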

Properties, formal

Sum of uncorrelated variables (Bienaymé formula)


One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum
(or the difference) of uncorrelated random variables is the sum of their variances:

\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i).

This statement is called the Bienaymé formula[1] and was discovered in 1853. It is often made with the stronger
condition that the variables are independent, but uncorrelatedness suffices. So if all the variables have the same
variance σ², then, since division by n is a linear transformation, this formula immediately implies that the variance of their mean is

\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}.
That is, the variance of the mean decreases when n increases. This formula for the variance of the mean is used in the
definition of the standard error of the sample mean, which is used in the central limit theorem.
Sum of correlated variables


In general, if the variables are correlated, then the variance of their sum is the sum of their covariances:

\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j).
(Note: This by definition includes the variance of each variable, since Cov(X,X) = Var(X).)
Here Cov is the covariance, which is zero for independent random variables (if it exists). The formula states that the
variance of a sum is equal to the sum of all elements in the covariance matrix of the components. This formula is
used in the theory of Cronbach's alpha in classical test theory.
So if the variables have equal variance σ² and the average correlation of distinct variables is ρ, then the variance of their mean is

\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\,\rho\sigma^2.
This implies that the variance of the mean increases with the average of the correlations. Moreover, if the variables
have unit variance, for example if they are standardized, then this simplifies to

\operatorname{Var}(\bar{X}) = \frac{1}{n} + \frac{n - 1}{n}\,\rho.
This formula is used in the Spearman-Brown prediction formula of classical test theory. This converges to ρ if n goes
to infinity, provided that the average correlation remains constant or converges too. So for the variance of the mean
of standardized variables with equal correlations or converging average correlation we have

\lim_{n \to \infty} \operatorname{Var}(\bar{X}) = \rho.
Therefore, the variance of the mean of a large number of standardized variables is approximately equal to their
average correlation. This makes clear that the sample mean of correlated variables does generally not converge to the
population mean, even though the Law of large numbers states that the sample mean will converge for independent
variables.

Weighted sum of variables


The scaling property and the Bienaymé formula, along with this property from the covariance page: Cov(aX, bY) =
ab Cov(X, Y) jointly imply that

\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y).
This implies that in a weighted sum of variables, the variable with the largest weight will have a disproportionately
large weight in the variance of the total. For example, if X and Y are uncorrelated and the weight of X is two times
the weight of Y, then the weight of the variance of X will be four times the weight of the variance of Y.

Decomposition
The general formula for variance decomposition or the law of total variance is: If X and Y are two random variables
and the variance of X exists, then

\operatorname{Var}(X) = \operatorname{E}\left[\operatorname{Var}(X \mid Y)\right] + \operatorname{Var}\left(\operatorname{E}[X \mid Y]\right).
Here, E(X|Y) is the conditional expectation of X given Y, and Var(X|Y) is the conditional variance of X given Y. (A
more intuitive explanation is that given a particular value of Y, then X follows a distribution with mean E(X|Y) and
variance Var(X|Y). The above formula tells how to find Var(X) based on the distributions of these two quantities
when Y is allowed to vary.) This formula is often applied in analysis of variance, where the corresponding formula is

\mathit{MS}_{\text{total}} = \mathit{MS}_{\text{between}} + \mathit{MS}_{\text{within}}.

It is also used in linear regression analysis, where the corresponding formula is

\mathit{MS}_{\text{total}} = \mathit{MS}_{\text{regression}} + \mathit{MS}_{\text{residual}}.
This can also be derived from the additivity of variances, since the total (observed) score is the sum of the predicted
score and the error score, where the latter two are uncorrelated.

Computational formula
The computational formula for the variance follows in a straightforward manner from the linearity of expected
values and the above definition:

\operatorname{Var}(X) = \operatorname{E}[X^2] - \left(\operatorname{E}[X]\right)^2.
This is often used to calculate the variance in practice, although it suffers from catastrophic cancellation if the two
components of the equation are similar in magnitude.

Characteristic property
The second moment of a random variable attains the minimum value when taken around the first moment (i.e., mean) of the random variable, i.e. \mathrm{argmin}_m \operatorname{E}\left[(X - m)^2\right] = \operatorname{E}[X]. Conversely, if a continuous function \varphi satisfies \mathrm{argmin}_m \operatorname{E}\left[\varphi(X - m)\right] = \operatorname{E}[X] for all random variables X, then it is necessarily of the form \varphi(x) = a x^2 + b, where a > 0. This also holds in the multidimensional case.[2]

Calculation from the CDF


The population variance for a non-negative random variable can be expressed in terms of the cumulative distribution
function F using

\operatorname{Var}(X) = 2\int_0^\infty u\,H(u)\,du - \left(\int_0^\infty H(u)\,du\right)^2,
where H(u) = 1 − F(u) is the right tail function. This expression can be used to calculate the variance in situations
where the CDF, but not the density, can be conveniently expressed.

Approximating the variance of a function


The delta method uses second-order Taylor expansions to approximate the variance of a function of one or more
random variables: see Taylor expansions for the moments of functions of random variables. For example, the
approximate variance of a function of one variable is given by

\operatorname{Var}\left[f(X)\right] \approx \left(f'\left(\operatorname{E}[X]\right)\right)^2 \operatorname{Var}(X),
provided that f is twice differentiable and that the mean and variance of X are finite.
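As a quick numerical check of this approximation, here is a Python sketch using only the standard library (the choice f(x) = x² and the parameters of X are arbitrary):

    import random

    random.seed(0)
    mu, sigma = 5.0, 0.5                       # X ~ Normal(mu, sigma^2)
    samples = [random.gauss(mu, sigma) ** 2 for _ in range(100_000)]   # f(X) = X^2

    m = sum(samples) / len(samples)
    simulated = sum((s - m) ** 2 for s in samples) / len(samples)
    delta = (2 * mu) ** 2 * sigma ** 2         # f'(mu)^2 * Var(X), since f'(x) = 2x
    print(simulated, delta)                    # both close to 25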
Population variance and sample variance


In general, the population variance of a finite population of size N is given by

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,

where

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i

is the population mean.


In many practical situations, the true variance of a population is not known a priori and must be computed somehow.
When dealing with extremely large populations, it is not possible to count every object in the population.
A common task is to estimate the variance of a population from a sample.[3] We take a sample with replacement of n
values y1, ..., yn from the population, where n < N, and estimate the variance on the basis of this sample. There are
several good estimators. Two of them are well known:

s_n^2 = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2

and

s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2.[4]

Both are referred to as sample variance. Here, \bar{y} denotes the sample mean:

\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.
The two estimators only differ slightly as can be seen, and for larger values of the sample size n the difference is
negligible. While the first one may be seen as the variance of the sample considered as a population, the second one
is the unbiased estimator of the population variance, meaning that its expected value E[s²] is equal to the true variance of the sampled random variable; the use of the term n − 1 is called Bessel's correction. The sample variance with n − 1 is a U-statistic for the function ƒ(x₁, x₂) = (x₁ − x₂)²/2, meaning that it is obtained by averaging a 2-sample
statistic over 2-element subsets of the population.
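The two estimators can be contrasted directly in Python; the standard library's statistics module implements both (pvariance divides by n, variance by n − 1):

    import statistics

    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print(statistics.pvariance(sample))   # denominator n:     4.0
    print(statistics.variance(sample))    # denominator n - 1: 4.571...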
The two estimators are related by

s_n^2 = \frac{n-1}{n}\,s^2.
Distribution of the sample variance


Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its
distribution. In the case that yi are independent observations from a normal distribution, Cochran's theorem shows
that s² follows a scaled chi-square distribution:

(n - 1)\,\frac{s^2}{\sigma^2} \sim \chi^2_{n-1}.

As a direct consequence, it follows that E(s²) = σ².


If the y_i are independent and identically distributed, but not necessarily normally distributed, then

\operatorname{Var}(s^2) = \sigma^4\left(\frac{2}{n-1} + \frac{\kappa}{n}\right),

where κ is the excess kurtosis of the distribution. If the conditions of the law of large numbers hold, s² is a consistent estimator of σ².

Generalizations
If X is a vector-valued random variable, with values in \mathbb{R}^n, and thought of as a column vector, then the natural generalization of variance is \operatorname{E}\left[(X - \mu)(X - \mu)^{\mathsf{T}}\right], where \mu = \operatorname{E}(X) and (X - \mu)^{\mathsf{T}} is the transpose of (X - \mu), and so is a row vector. This variance is a positive semi-definite square matrix, commonly referred to as the covariance matrix.
If X is a complex-valued random variable, with values in \mathbb{C}, then its variance is \operatorname{E}\left[(X - \mu)(X - \mu)^{\dagger}\right], where (X - \mu)^{\dagger} is the conjugate transpose of (X - \mu). This variance is also a positive semi-definite square matrix.
History
The term variance was first introduced by Ronald Fisher in his 1918 paper The Correlation Between Relatives on the
Supposition of Mendelian Inheritance:[5]
The great body of available statistics show us that the deviations of a human measurement from its mean
follow very closely the Normal Law of Errors, and, therefore, that the variability may be uniformly
measured by the standard deviation corresponding to the square root of the mean square error. When
there are two independent causes of variability capable of producing in an otherwise uniform population distributions with standard deviations σ₁ and σ₂, it is found that the distribution, when both causes act together, has a standard deviation \sqrt{\sigma_1^2 + \sigma_2^2}. It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance...

Moment of inertia
The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a
corresponding mass distribution along a line, with respect to rotation about its center of mass. It is because of this
analogy that such things as the variance are called moments of probability distributions. The covariance matrix is
related to the moment of inertia tensor for multivariate distributions. The moment of inertia of a cloud of n points
with a covariance matrix Σ is given by

I = n\left(\mathbf{1}_{3\times 3}\operatorname{tr}(\Sigma) - \Sigma\right).

This difference between moment of inertia in physics and in statistics is clear for points that are gathered along a line. Suppose many points are close to the x-axis and distributed along it. The covariance matrix might look like

\Sigma = \begin{pmatrix} 10 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}.

That is, there is the most variance in the x direction. However, physicists would consider this to have a low moment about the x-axis, so the moment-of-inertia tensor is

I = n\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 10.1 & 0 \\ 0 & 0 & 10.1 \end{pmatrix}.
See also
• Algorithms for calculating variance
• An inequality on location and scale parameters
• Average absolute deviation
• Bhatia–Davis inequality
• Covariance
• Chebyshev's inequality
• Distance variance
• Estimation of covariance matrices
• Explained variance & unexplained variance
• Kurtosis
• Mean absolute error
• Mean difference
• Popoviciu's inequality on variances
• Qualitative variation
• Sample mean and covariance
• Semivariance
• Skewness
• Standard deviation
• Weighted sample variance

External links
• A Guide to Understanding & Calculating Variance [6]
• Fisher's original paper [7] (pdf format)
• A tutorial on Analysis of Variance devised for first-year Oxford University students [8]

References
[1] Michel Loeve, "Probability Theory", Graduate Texts in Mathematics, Volume 45, 4th edition, Springer-Verlag, 1977, p. 12.
[2] A. Kagan and L. A. Shepp, "Why the variance?", Statistics and Probability Letters, Volume 38, Number 4, 1998, pp. 329–333. (online: http://dx.doi.org/10.1016/S0167-7152(98)00041-8)
[3] William Navidi, Statistics for Engineers and Scientists (2006), McGraw-Hill, p. 14.
[4] Montgomery, D.C. and Runger, G.C.: Applied statistics and probability for engineers, page 201. John Wiley & Sons, New York, 1994.
[5] Ronald Fisher (1918) The correlation between relatives on the supposition of Mendelian Inheritance (http://www.library.adelaide.edu.au/digitised/fisher/9.pdf)
[6] http://www.stats4students.com/Essentials/Measures-Of-Spread/Overview_3.php
[7] http://www.library.adelaide.edu.au/digitised/fisher/9.pdf
[8] http://www.celiagreen.com/charlesmccreery/statistics/anova.pdf
Probability density function


In probability theory, a probability
density function (abbreviated as pdf,
or just density) of an absolutely
continuous random variable is a
function that describes the relative
likelihood for this random variable to
occur at a given point in the
observation space. The probability for
a random variable to fall within a given
set is given by the integral of its
density over the set.

The terms “probability distribution function”[1] and “probability function”[2] have also been used to denote the probability density function. However, special care should be taken around this usage since it is not standard among probabilists and statisticians. In other sources, “probability distribution function” may be used when the probability distribution is defined as a function over general sets of values, or it may refer to the cumulative distribution function, or it may be a probability mass function rather than the density.

[Figure: Boxplot and probability density function (pdf) of a Gaussian probability distribution N(0, σ²).]

Absolutely continuous univariate distributions


A probability density function is most commonly associated with absolutely continuous univariate distributions. A
random variable X has density ƒ, where ƒ is a non-negative Lebesgue-integrable function, if:

\Pr[a \le X \le b] = \int_a^b f(x)\,dx.

Hence, if F is the cumulative distribution function of X, then:

F(x) = \int_{-\infty}^{x} f(u)\,du,

and (if ƒ is continuous at x)

f(x) = \frac{d}{dx} F(x).
Intuitively, one can think of ƒ(x) dx as being the probability of X falling within the infinitesimal interval [x, x + dx].
Formal definition
This definition may be extended to any probability distribution using the measure-theoretic definition of probability.
A random variable X has probability distribution X∗P: the density of X with respect to a reference measure μ is the
Radon–Nikodym derivative:

f = \frac{dX_* P}{d\mu}.

That is, ƒ is any function with the property that:

\Pr[X \in A] = \int_A f\,d\mu
for any measurable set A.

Discussion
In the continuous univariate case above, the reference measure is the Lebesgue measure. The probability mass
function of a discrete random variable is the density with respect to the counting measure over the sample space
(usually the set of integers, or some subset thereof).
Note that it is not possible to define a density with reference to an arbitrary measure (i.e. one can't choose the
counting measure as a reference for a continuous random variable). Furthermore, when it does exist, the density is
almost everywhere unique.

Further details
For example, the uniform distribution on the interval [0, 1] has probability density ƒ(x) = 1 for 0 ≤ x ≤ 1 and ƒ(x) = 0
elsewhere.
The standard normal distribution has probability density

f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.
If a random variable X is given and its distribution admits a probability density function ƒ, then the expected value of
X (if it exists) can be calculated as

\operatorname{E}[X] = \int_{-\infty}^{\infty} x f(x)\,dx.
Not every probability distribution has a density function: the distributions of discrete random variables do not; nor
does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to
any individual point.
A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous.
In this case: F is almost everywhere differentiable, and its derivative can be used as probability density:

\frac{d}{dx} F(x) = f(x).
If a probability distribution admits a density, then the probability of every one-point set {a} is zero; the same holds
for finite and countable sets.
Two probability densities ƒ and g represent the same probability distribution precisely if they differ only on a set of
Lebesgue measure zero.
In the field of statistical physics, a non-formal reformulation of the relation above between the derivative of the
cumulative distribution function and the probability density function is generally used as the definition of the
probability density function. This alternate definition is the following:
If dt is an infinitely small number, the probability that X is included within the interval (t, t + dt) is equal to ƒ(t) dt, or:

\Pr(t < X < t + dt) = f(t)\,dt.
Link between discrete and continuous distributions


It is possible to represent certain discrete random variables as well as random variables involving both a continuous
and a discrete part with a generalized probability density function, by using the Dirac delta function. For example, let
us consider a binary discrete random variable taking −1 or 1 for values, with probability ½ each.
The density of probability associated with this variable is:

f(t) = \tfrac{1}{2}\left(\delta(t + 1) + \delta(t - 1)\right).
More generally, if a discrete variable can take n different values among real numbers, then the associated probability
density function is:

f(t) = \sum_{i=1}^{n} p_i\,\delta(t - x_i),
where x1, …, xn are the discrete values accessible to the variable and p1, …, pn are the probabilities associated with
these values.
This substantially unifies the treatment of discrete and continuous probability distributions. For instance, the above
expression allows for determining statistical characteristics of such a discrete variable (such as its mean, its variance
and its kurtosis), starting from the formulas given for a continuous distribution.

Densities associated with multiple variables


For continuous random variables X1, …, Xn, it is also possible to define a probability density function associated to
the set as a whole, often called joint probability density function. This density function is defined as a function of
the n variables, such that, for any domain D in the n-dimensional space of the values of the variables X1, …, Xn, the
probability that a realisation of the set variables falls inside the domain D is

\Pr\left((X_1, \ldots, X_n) \in D\right) = \int_D f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.
If F(x1, …, xn) = Pr(X1 ≤ x1, …, Xn ≤ xn) is the cumulative distribution function of the vector (X1, …, Xn), then the
joint probability density function can be computed as a partial derivative

f(x_1, \ldots, x_n) = \frac{\partial^n F}{\partial x_1 \cdots \partial x_n}.
Marginal densities
For i = 1, 2, …, n, let ƒ_{X_i}(x_i) be the probability density function associated with variable X_i alone. This is called the “marginal” density function, and can be deduced from the probability density associated with the random variables X₁, …, Xₙ by integrating over all values of the other n − 1 variables:

f_{X_i}(x_i) = \int f(x_1, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n.
Independence
Continuous random variables X₁, …, Xₙ admitting a joint density are all independent from each other if and only if

f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1)\cdots f_{X_n}(x_n).
Corollary
If the joint probability density function of a vector of n random variables can be factored into a product of n
functions of one variable

f(x_1, \ldots, x_n) = f_1(x_1)\cdots f_n(x_n)
(where each fi is not necessarily a density) then the n variables in the set are all independent from each other, and the
marginal probability density function of each of them is given by

f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}.
Example
This elementary example illustrates the above definition of multidimensional probability density functions in the
simple case of a function of a set of two variables. Let us call \vec{R} a 2-dimensional random vector of coordinates (X, Y): the probability to obtain \vec{R} in the quarter plane of positive x and y is

\Pr(X > 0,\, Y > 0) = \int_0^\infty \int_0^\infty f_{X,Y}(x, y)\,dx\,dy.
Sums of independent random variables


The probability density function of the sum of two independent random variables U and V, each of which has a
probability density function, is the convolution of their separate density functions:

f_{U+V}(x) = \int_{-\infty}^{\infty} f_U(y)\, f_V(x - y)\,dy = \left(f_U * f_V\right)(x).
It is possible to generalize the previous relation to a sum of N independent random variables U₁, …, U_N with densities f_{U_1}, …, f_{U_N}:

f_{U_1 + \cdots + U_N}(x) = \left(f_{U_1} * \cdots * f_{U_N}\right)(x).
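A discretized illustration of the convolution formula, assuming NumPy is available (the grid step is an arbitrary choice): the sum of two independent Uniform[0, 1] variables has the triangular density obtained by convolving the two uniform densities.

    import numpy as np

    dx = 0.001
    x = np.arange(0.0, 1.0, dx)
    f = np.ones_like(x)               # density of U ~ Uniform[0, 1]
    g = np.ones_like(x)               # density of V ~ Uniform[0, 1]

    h = np.convolve(f, g) * dx        # density of U + V, supported on [0, 2]
    z = np.arange(len(h)) * dx
    print(h[np.searchsorted(z, 1.0)]) # peak of the triangular density, ~ 1.0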

Dependent variables and change of variables


If the probability density function of a random variable X is given as ƒX(x), it is possible (but often not necessary; see
below) to calculate the probability density function of some variable Y = g(X). This is also called a “change of
variable” and is in practice used to generate a random variable of arbitrary shape ƒg(X) = ƒY using a known (for
instance uniform) random number generator.
If the function g is monotonic, then the resulting density function is

f_Y(y) = \left| \frac{d}{dy}\, g^{-1}(y) \right| f_X\!\left(g^{-1}(y)\right).
Here g−1 denotes the inverse function and g' denotes the derivative.
This follows from the fact that the probability contained in a differential area must be invariant under change of
variables. That is,

\left| f_Y(y)\,dy \right| = \left| f_X(x)\,dx \right|,

or

f_Y(y) = \left| \frac{dx}{dy} \right| f_X(x) = \left| \frac{d\,g^{-1}(y)}{dy} \right| f_X\!\left(g^{-1}(y)\right).
For functions which are not monotonic the probability density function for y is

f_Y(y) = \sum_{k=1}^{n(y)} \left| \frac{d\,g_k^{-1}(y)}{dy} \right| f_X\!\left(g_k^{-1}(y)\right),

where n(y) is the number of solutions in x of the equation g(x) = y, and g_k^{-1}(y) are these solutions.
It is tempting to think that in order to find the expected value E(g(X)) one must first find the probability density ƒg(X)
of the new random variable Y = g(X). However, rather than computing

\operatorname{E}\left[g(X)\right] = \int_{-\infty}^{\infty} y\, f_{g(X)}(y)\,dy,
one may find instead

\operatorname{E}\left[g(X)\right] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx.
The values of the two integrals are the same in all cases in which both X and g(X) actually have probability density
functions. It is not necessary that g be a one-to-one function. In some cases the latter integral is computed much
more easily than the former.
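A numerical illustration of this point (a Python sketch with a crude midpoint-rule integration; X ~ Uniform[0, 1] and g(x) = x² are arbitrary choices, for which the density of Y = g(X) is f_Y(y) = 1/(2√y) on (0, 1]):

    import math

    n = 100_000
    dx = 1.0 / n
    grid = [(i + 0.5) * dx for i in range(n)]   # midpoints of [0, 1]

    direct = sum(x ** 2 for x in grid) * dx                    # integral of g(x) f_X(x) dx
    via_y = sum(y / (2.0 * math.sqrt(y)) for y in grid) * dx   # integral of y f_Y(y) dy
    print(direct, via_y)   # both approximately 1/3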

Multiple variables
The above formulas can be generalized to variables (which we will again call y) depending on more than one other
variable. ƒ(x0, x1, …, xm−1) shall denote the probability density function of the variables y depends on, and the
dependence shall be y = g(x₀, x₁, …, x_{m−1}). Then, the resulting density function is

f_Y(y) = \int_{\{x :\, g(x) = y\}} \frac{f(x_0, x_1, \ldots, x_{m-1})}{\sqrt{\sum_{j=0}^{m-1} \left(\frac{\partial g}{\partial x_j}\right)^2}}\; dV,
where the integral is over the entire (m-1)-dimensional solution of the subscripted equation and the symbolic dV
must be replaced by a parametrization of this solution for a particular calculation; the variables x0, x1, …, xm−1 are
then of course functions of this parametrization.
This derives from the following, perhaps more intuitive representation: Suppose x is an n-dimensional random
variable with joint density f. If y = H(x), where H is a bijective, differentiable function, then y has density g:

g(\mathbf{y}) = f\!\left(H^{-1}(\mathbf{y})\right)\, \left| \det \frac{\partial H^{-1}(\mathbf{y})}{\partial \mathbf{y}} \right|,
with the differential regarded as the Jacobian of the inverse of H, evaluated at y.


Using the delta-function (and assuming independence) the same result is formulated as follows.
If the probability density function of independent random variables Xi, i = 1, 2, …n are given as ƒXi(xi), it is possible
to calculate the probability density function of some variable Y = G(X1, X2, …Xn). The following formula establishes
a connection between the probability density function of Y denoted by ƒY(y) and ƒXi(xi) using the Dirac delta function:

f_Y(y) = \int\!\cdots\!\int f_{X_1}(x_1)\cdots f_{X_n}(x_n)\; \delta\!\left(y - G(x_1, \ldots, x_n)\right)\,dx_1 \cdots dx_n.
See also
• Likelihood function
• Density estimation
• Secondary measure
• Probability mass function

References
[1] PlanetMath (http://planetmath.org/?method=png&from=objects&id=2884&op=getobj)
[2] MathWorld (http://mathworld.wolfram.com/ProbabilityDistributionFunction.html)

• Ushakov, N.G. (2001), "Density of a probability distribution" (http://eom.springer.de/D/d031110.htm), in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104
Bibliography
• Pierre Simon de Laplace (1812). Analytical Theory of Probability.
The first major treatise blending calculus with probability theory, originally in French: Théorie
Analytique des Probabilités.
• Andrei Nikolajevich Kolmogorov (1950). Foundations of the Theory of Probability.
The modern measure-theoretic foundation of probability theory; the original German version
(Grundbegriffe der Wahrscheinlichkeitsrechnung) appeared in 1933.
• Patrick Billingsley (1979). Probability and Measure. New York, Toronto, London: John Wiley and Sons.
• David Stirzaker (2003). Elementary Probability.
Chapters 7 to 9 are about continuous variables. This book is filled with theory and mathematical proofs.

External links
• Weisstein, Eric W., "Probability density function" (http://mathworld.wolfram.com/ProbabilityDensityFunction.html) from MathWorld.

Cumulative distribution function


In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function,
describes the probability that a real-valued random variable X with a given probability distribution will be found at a
value less than or equal to x. Intuitively, it is the "area so far" function of the probability distribution. Cumulative
distribution functions are also used to specify the distribution of multivariate random variables.

Definition
For every real number x, the CDF of a real-valued random variable X is given by

F(x) = \Pr[X \le x],
where the right-hand side represents the probability that the random variable X takes on a value less than or equal to
x. The probability that X lies in the interval (a, b] is therefore F(b) − F(a) if a < b.
If treating several random variables X, Y, ... etc. the corresponding letters are used as subscripts while, if treating
only one, the subscript is omitted. It is conventional to use a capital F for a cumulative distribution function, in
contrast to the lower-case f used for probability density functions and probability mass functions. This applies when
discussing general distributions: some specific distributions have their own conventional notation, for example the
normal distribution.
The CDF of X can be defined in terms of the probability density function ƒ as follows:

F(x) = \int_{-\infty}^{x} f(t)\,dt.
Note that in the definition above, the "less than or equal to" sign, "≤", is a convention, not a universally used one
(e.g. Hungarian literature uses "<"), but is important for discrete distributions. The proper use of tables of the
binomial and Poisson distributions depend upon this convention. Moreover, important formulas like Lévy's inversion
formula for the characteristic function also rely on the "less or equal" formulation.
Properties
Every cumulative distribution function F is (not necessarily strictly)
monotone non-decreasing (see monotone increasing) and
right-continuous. Furthermore, we have

\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.

[Figure: From top to bottom, the cumulative distribution function of a discrete probability distribution, of a continuous probability distribution, and of a distribution which has both a continuous part and a discrete part.]
Every function with these four properties is a CDF. The properties imply that all CDFs are càdlàg functions.
If X is a discrete random variable, then it attains values x1, x2, ... with probability pi = P(xi), and the CDF of X will be
discontinuous at the points xᵢ and constant in between:

F(x) = \Pr[X \le x] = \sum_{x_i \le x} \Pr[X = x_i] = \sum_{x_i \le x} p_i.
If the CDF F of X is continuous, then X is a continuous random variable; if furthermore F is absolutely continuous,
then there exists a Lebesgue-integrable function f(x) such that

F(b) - F(a) = \Pr[a \le X \le b] = \int_a^b f(x)\,dx
for all real numbers a and b. (The first of the two equalities displayed above would not be correct in general if we
had not said that the distribution is continuous. Continuity of the distribution implies that P (X = a) = P (X = b) = 0,
so the difference between "<" and "≤" ceases to be important in this context.) The function f is equal to the derivative
of F almost everywhere, and it is called the probability density function of the distribution of X.

Point probability
The "point probability" that X is exactly b can be found as

Kolmogorov–Smirnov and Kuiper's tests


The Kolmogorov–Smirnov test is based on cumulative distribution functions and can be used to test to see whether
two empirical distributions are different or whether an empirical distribution is different from an ideal distribution.
The closely related Kuiper's test (pronounced /ˈkaɪpərz/) is useful if the domain of the distribution is cyclic as in day
of the week. For instance we might use Kuiper's test to see if the number of tornadoes varies during the year or if
sales of a product vary by day of the week or day of the month.
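A sketch of the two-sample test, assuming SciPy is available (scipy.stats.ks_2samp compares two empirical distribution functions; the samples here are synthetic):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=500)    # sample from N(0, 1)
    b = rng.normal(0.5, 1.0, size=500)    # sample from a shifted normal

    result = stats.ks_2samp(a, b)
    print(result.statistic, result.pvalue)   # small p-value: the empirical CDFs differ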

Complementary cumulative distribution function


Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular
level. This is called the complementary cumulative distribution function (ccdf) or exceedance, and is defined as

\bar{F}(x) = \Pr[X > x] = 1 - F(x).
This has applications in statistical hypothesis testing, for example, because one-sided P-value is the probability of
observing a test statistic at least as extreme as the one observed; hence, the one-sided P-value is simply given by the
ccdf.
In survival analysis, \bar{F}(x) is called the survival function and denoted S(x), while the term reliability function is common in engineering.

Folded cumulative distribution


While the plot of a cumulative distribution often has an S-like shape,
an alternative illustration is the folded cumulative distribution or
mountain plot, which folds the top half of the graph over,[1] thus using
two scales, one for the upslope and another for the downslope. This
form of illustration emphasises the median and dispersion of the
distribution or of the empirical results.

[Figure: Example of the folded cumulative distribution for a normal distribution function.]
Examples
As an example, suppose X is uniformly distributed on the unit interval
[0, 1]. Then the CDF of X is given by

F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}
Take another example, suppose X takes only the discrete values 0 and 1, with equal probability. Then the CDF of X
is given by

F(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1/2 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}
Inverse
If the CDF F is strictly increasing and continuous then F⁻¹(y), for y ∈ [0, 1], is the unique real number x such that F(x) = y.
Unfortunately, the distribution does not, in general, have an inverse. One may define, for y ∈ (0, 1),

F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}.
Example 1: The median is F⁻¹(0.5).

Example 2: Put τ = F⁻¹(0.95). Then we call τ the 95th percentile.
The inverse of the cdf is called the quantile function.
The inverse of the cdf can be used to translate results obtained for the uniform distribution to other distributions.
Some useful properties of the inverse cdf are:
1. F⁻¹ is nondecreasing
2. F^{-1}(F(x)) \le x
3. F(F^{-1}(y)) \ge y
4. F^{-1}(y) \le x if and only if y \le F(x)
5. If Y has a U[0, 1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling method.
6. If \{X_\alpha\} is a collection of independent F-distributed random variables defined on the same sample space, then there exist random variables \{Y_\alpha\} such that Y_\alpha is distributed as U[0, 1] and F^{-1}(Y_\alpha) = X_\alpha with probability 1 for all α.
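Property 5 underlies inverse transform sampling. A minimal Python sketch for the exponential distribution, whose quantile function has the closed form F⁻¹(y) = −ln(1 − y)/λ (the rate λ = 2 is an arbitrary choice):

    import math
    import random

    random.seed(0)
    lam = 2.0

    def sample_exponential():
        # Push a Uniform[0, 1] draw through the quantile function F^{-1}.
        u = random.random()
        return -math.log(1.0 - u) / lam

    draws = [sample_exponential() for _ in range(100_000)]
    print(sum(draws) / len(draws))   # close to the exponential mean 1/lam = 0.5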

Multivariate case
When dealing simultaneously with more than one random variable the joint cumulative distribution function can also
be defined. For example, for a pair of random variables X, Y, the joint CDF is given by

F(x, y) = \Pr[X \le x,\, Y \le y],
where the right-hand side represents the probability that the random variable X takes on a value less than or equal to
x and that Y takes on a value less than or equal to y.
Every multivariate CDF is:
1. Monotonically non-decreasing for each of its variables
2. Right-continuous for each of its variables.
3. \lim_{x, y \to +\infty} F(x, y) = 1
4. \lim_{x \to -\infty} F(x, y) = 0 and \lim_{y \to -\infty} F(x, y) = 0.

See also
• Descriptive statistics
• Empirical distribution function
• Cumulative frequency analysis
• Q-Q plot
• Ogive
• Single crossing condition
References
[1] Gentle, J.E. (2009). Computational Statistics (http://books.google.de/books?id=m4r-KVxpLsAC&pg=PA348). Springer. Retrieved 2010-08-06.

Expected value
In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or
mean, or first moment) of a random variable is the integral of the random variable with respect to its probability
measure.[1][2]
For discrete random variables this is equivalent to the probability-weighted sum of the possible values.
For continuous random variables with a density function it is the probability density-weighted integral of the
possible values.
The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected
value is in general not a typical value that the random variable can take on. It is often helpful to interpret the
expected value of a random variable as the long-run average value of the variable over many independent repetitions
of an experiment.
The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is
almost surely the limit of the sample mean as sample size grows to infinity. The value may not be expected in the
general sense — the "expected value" itself may be unlikely or even impossible (such as having 2.5 children), just
like the sample mean.
The expected value does not exist for some distributions with large "tails", such as the Cauchy distribution.[3]
It is possible to construct an expected value equal to the probability of an event by taking the expectation of an
indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate
properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating
probabilities by frequencies.

History
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem
of points, posed by a French nobleman chevalier de Méré. The problem was that of two players who want to finish a
game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance
each has of winning the game from that point. This problem was solved in 1654 by Blaise Pascal in his private
correspondence with Pierre de Fermat, however the idea was not communicated to the broad scientific community.
Three years later, in 1657, a Dutch mathematician Christiaan Huygens published a treatise (see Huygens (1657)) “De
ratiociniis in ludo aleæ” on probability theory, which not only lay down the foundations of the theory of probability,
but also considered the problem of points, presenting a solution essentially the same as Pascal’s. [4]
Neither Pascal nor Huygens used the term “expectation” in its modern sense. In particular, Huygens writes: “That my
Chance or Expectation to win any thing is worth just such a Sum, as wou’d procure me in the same Chance and
Expectation at a fair Lay. … If I expect a or b, and have an equal Chance of gaining them, my Expectation is worth (a + b)/2.” More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract “Théorie analytique des
probabilités”, where the concept of expected value was defined explicitly:

… This advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which
ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This
division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right
for the sum hoped for. We will call this advantage mathematical hope. ”
The use of letter E to denote expected value goes back to W.A. Whitworth (1901) “Choice and chance”. The symbol
has become popular since for English writers it meant “Expectation”, for Germans “Erwartungswert”, and for French
“Espérance mathématique”.[5]

Examples
The expected outcome from one roll of an ordinary (that is, fair) six-sided die is

\operatorname{E}[X] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = 3.5,

which is not among the possible outcomes.[6]
A common application of expected value is gambling. For example, an American roulette wheel has 38 places where
the ball may land, all equally likely. A winning bet on a single number pays 35-to-1, meaning that the original stake
is not lost, and 35 times that amount is won, so you receive 36 times what you've bet. Considering all 38 possible
outcomes, the expected value of the profit resulting from a dollar bet on a single number is the sum of potential net
loss times the probability of losing and potential net gain times the probability of winning, that is,

\operatorname{E}[\text{profit}] = -1 \cdot \tfrac{37}{38} + 35 \cdot \tfrac{1}{38} = -\tfrac{1}{19} \approx -0.0526.

The net change in your financial holdings is −$1 when you lose, and $35 when you win. Thus one may expect, on
average, to lose about five cents for every dollar bet, and the expected value of a one-dollar bet is $0.947368421. In
gambling, an event of which the expected value equals the stake (i.e. the bettor's expected profit, or net gain, is zero)
is called a “fair game”.
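The roulette computation can be checked in one line of Python (probabilities as in the text):

    p_win, p_lose = 1 / 38, 37 / 38
    expected_profit = 35 * p_win + (-1) * p_lose
    print(expected_profit)   # -0.0526..., about -5.3 cents per dollar bet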

Mathematical definition
In general, if X is a random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E[X], E(X), or \langle X \rangle, is defined as

\operatorname{E}[X] = \int_\Omega X\,dP.
When this integral converges absolutely, it is called the expectation of X. Absolute convergence is necessary because under merely conditional convergence a different ordering of the terms would give a different result, contrary to the nature of the expected value. Here the Lebesgue integral is employed. Note that not all random variables have an
expected value, since the integral may not converge absolutely (e.g., Cauchy distribution). Two variables with the
same probability distribution will have the same expected value, if it is defined.
If X is a discrete random variable with probability mass function p(x), then the expected value becomes

\operatorname{E}[X] = \sum_i x_i\, p(x_i),
as in the gambling example mentioned above.


If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

\operatorname{E}[X] = \int_{-\infty}^{\infty} x f(x)\,dx.
It follows directly from the discrete case definition that if X is a constant random variable, i.e. X = b for some fixed real number b, then the expected value of X is also b.
The expected value of an arbitrary function of X, g(X), with respect to the probability density function f(x) is given
by the inner product of f and g:

\operatorname{E}\left[g(X)\right] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.
This is sometimes called the law of the unconscious statistician. Using representations as Riemann–Stieltjes integral
and integration by parts the formula can be restated as
• \operatorname{E}[X] = \int_0^\infty \left(1 - F(x)\right)dx if \Pr[X \ge 0] = 1,
• \operatorname{E}[X] = -\int_{-\infty}^0 F(x)\,dx if \Pr[X \le 0] = 1.
As a special case, let α denote a positive real number; then

\operatorname{E}\left[|X|^\alpha\right] = \alpha \int_0^\infty t^{\alpha - 1}\Pr\left[|X| > t\right]dt.

In particular, for α = 1, this reduces to:

\operatorname{E}[X] = \int_0^\infty \left(1 - F(t)\right)dt

if \Pr[X \ge 0] = 1, where F is the cumulative distribution function of X.

Conventional terminology
• When one speaks of the "expected price", "expected height", etc. one means the expected value of a random
variable that is a price, a height, etc.
• When one speaks of the "expected number of attempts needed to get one successful attempt," one might
conservatively approximate it as the reciprocal of the probability of success for such an attempt. Cf. expected
value of the geometric distribution.

Properties

Constants
The expected value of a constant is equal to the constant itself; i.e., if c is a constant, then \operatorname{E}[c] = c.

Monotonicity
If X and Y are random variables such that X ≤ Y almost surely, then \operatorname{E}[X] \le \operatorname{E}[Y].

Linearity
The expected value operator (or expectation operator) is linear in the sense that

\operatorname{E}[X + c] = \operatorname{E}[X] + c, \qquad \operatorname{E}[X + Y] = \operatorname{E}[X] + \operatorname{E}[Y], \qquad \operatorname{E}[aX] = a\operatorname{E}[X].
Note that the second result is valid even if X is not statistically independent of Y. Combining the results from
previous three equations, we can see that

\operatorname{E}[aX + bY] = a\operatorname{E}[X] + b\operatorname{E}[Y]

for any two random variables X and Y (which need to be defined on the same probability space) and any real numbers a and b.
Iterated expectation

Iterated expectation for discrete random variables


For any two discrete random variables X, Y one may define the conditional expectation:[7]

\operatorname{E}[X \mid Y = y] = \sum_x x \cdot \Pr(X = x \mid Y = y),

which means that \operatorname{E}[X \mid Y = y] is a function of y.


Then the expectation of X satisfies

\operatorname{E}\left[\operatorname{E}[X \mid Y]\right] = \sum_y \operatorname{E}[X \mid Y = y] \cdot \Pr(Y = y).
Hence, the following equation holds:[8]

\operatorname{E}[X] = \operatorname{E}\left[\operatorname{E}[X \mid Y]\right].
The right hand side of this equation is referred to as the iterated expectation and is also sometimes called the tower
rule. This proposition is treated in law of total expectation.

Iterated expectation for continuous random variables


In the continuous case, the results are completely analogous. The definition of conditional expectation would use
inequalities, density functions, and integrals to replace equalities, mass functions, and summations, respectively.
However, the main result still holds:

\operatorname{E}[X] = \operatorname{E}\left[\operatorname{E}[X \mid Y]\right].
Inequality
If a random variable X is always less than or equal to another random variable Y, the expectation of X is less than or
equal to that of Y:
If X ≤ Y, then \operatorname{E}[X] \le \operatorname{E}[Y].
In particular, since X \le |X| and -X \le |X|, the absolute value of the expectation of a random variable is less than or equal to the expectation of its absolute value:

\left|\operatorname{E}[X]\right| \le \operatorname{E}\left[|X|\right].
Non-multiplicativity
In general, the expected value operator is not multiplicative, i.e. \operatorname{E}[XY] is not necessarily equal to \operatorname{E}[X]\operatorname{E}[Y].
If multiplicativity occurs, the X and Y variables are said to be uncorrelated (independent variables are a notable
case of uncorrelated variables). The lack of multiplicativity gives rise to study of covariance and correlation.
If one considers the joint PDF of $X$ and $Y$, say $j(x,y)$, then the expectation of $XY$ is

$$\mathrm{E}[XY] = \iint xy \, j(x,y)\,dx\,dy.$$

Now if $X$ and $Y$ are independent, then by definition $j(x,y) = f(x)g(y)$, where $f$ and $g$ are the marginal PDFs for $X$ and $Y$. Then

$$\mathrm{E}[XY] = \iint xy\, f(x) g(y)\,dy\,dx = \left[\int x f(x)\,dx\right]\left[\int y g(y)\,dy\right] = \mathrm{E}[X]\,\mathrm{E}[Y].$$

Observe that independence of X and Y is required only to write j(x,y) = f(x)g(y), and this is required to establish the second equality above.

Functional non-invariance
In general, the expectation operator and functions of random variables do not commute; that is

$$\mathrm{E}[g(X)] \ne g\bigl(\mathrm{E}[X]\bigr) \quad \text{in general}.$$
A notable inequality concerning this topic is Jensen's inequality, involving expected values of convex (or concave)
functions.

Uses and applications


The expected values of the powers of $X$ are called the moments of $X$; the moments about the mean of $X$ are expected values of powers of $X - \mathrm{E}[X]$. The moments of some random variables can be used to specify their distributions, via their moment generating functions.
To empirically estimate the expected value of a random variable, one repeatedly measures observations of the
variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the
true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals
(the sum of the squared differences between the observations and the estimate). The law of large numbers
demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate
gets smaller.
This property is often exploited in a wide variety of applications, including general problems of statistical estimation
and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most
quantities of interest can be written in terms of expectation, e.g. $\Pr(X \in A) = \mathrm{E}[\mathbf{1}_A(X)]$, where $\mathbf{1}_A$ is the indicator function of the set $A$, i.e. $\mathbf{1}_A(x) = 1$ if $x \in A$ and $0$ otherwise.
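A minimal sketch of this Monte Carlo idea, with an arbitrary choice of distribution (standard normal) and event (A = [−1, 1]); the function names are ours:

    import random

    # Estimate P(X in A) = E[1_A(X)] by averaging the indicator over samples.
    def mc_probability(sampler, indicator, n=100_000):
        return sum(indicator(sampler()) for _ in range(n)) / n

    # Example: X ~ N(0, 1), A = [-1, 1]; the true value is about 0.6827.
    est = mc_probability(lambda: random.gauss(0, 1), lambda x: -1 <= x <= 1)
    print(est)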
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose $X$ is a discrete random variable with values $x_i$ and corresponding probabilities $p_i$. Now consider a weightless rod on which are placed weights, at locations $x_i$ along the rod and having masses $p_i$ (whose sum is one). The point at which the rod balances is $\mathrm{E}[X]$.
Expected values can also be used to compute the variance, by means of the computational formula for the variance:

$$\operatorname{Var}(X) = \mathrm{E}[X^2] - \bigl(\mathrm{E}[X]\bigr)^2.$$
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator $\hat{A}$ operating on a quantum state vector $|\psi\rangle$ is written as $\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$. The uncertainty in $\hat{A}$ can be calculated using the formula $(\Delta A)^2 = \langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2$.

Expectation of matrices
If $X$ is an $m \times n$ matrix, then the expected value of the matrix is defined as the matrix of expected values:

$$\mathrm{E}[X] = \begin{pmatrix} \mathrm{E}[X_{11}] & \cdots & \mathrm{E}[X_{1n}] \\ \vdots & \ddots & \vdots \\ \mathrm{E}[X_{m1}] & \cdots & \mathrm{E}[X_{mn}] \end{pmatrix}.$$

This is utilized in covariance matrices.

Formulas for special cases

Discrete distribution taking only non-negative integer values


When a random variable takes only values in $\{0, 1, 2, 3, \ldots\}$ we can use the following formula for computing its expectation:

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i).$$
Proof:

$$\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j);$$

interchanging the order of summation, we have

$$\sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X],$$
as claimed. This result can be a useful computational shortcut. For example, suppose we toss a coin where the probability of heads is $p$. How many tosses can we expect until the first heads (not including the heads itself)? Let $X$ be this number. Note that we are counting only the tails and not the heads which ends the experiment; in particular, we can have $X = 0$. The expectation of $X$ may be computed by

$$\mathrm{E}[X] = \sum_{i=1}^{\infty} (1-p)^i = \frac{1-p}{p}.$$

This is because the number of tosses is at least $i$ exactly when the first $i$ tosses yielded tails, so that $\Pr(X \ge i) = (1-p)^i$. This matches the expectation of a random variable with a geometric distribution (counting the failures before the first success). We used the formula for a geometric progression:

$$\sum_{i=1}^{\infty} r^i = \frac{r}{1-r}.$$
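The tail-sum formula is easy to check numerically; this sketch (ours) truncates the infinite series, which is an approximation, and the value of p is arbitrary:

    # Number of tails before the first head with P(heads) = p:
    # P(X >= i) = (1 - p)**i, so E[X] = sum over i >= 1 of (1 - p)**i.
    p = 0.3
    tail_sum = sum((1 - p) ** i for i in range(1, 1000))  # truncated series
    print(tail_sum, (1 - p) / p)  # both approximately 2.3333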

Continuous distribution taking non-negative values


Analogously with the discrete case above, when a continuous random variable X takes only non-negative values, we can use the following formula for computing its expectation:

$$\mathrm{E}[X] = \int_0^{\infty} \Pr(X \ge x)\,dx.$$
Proof: It is first assumed that $X$ has a density $f_X(x)$. Then

$$\int_0^{\infty} \Pr(X \ge x)\,dx = \int_0^{\infty} \int_x^{\infty} f_X(t)\,dt\,dx;$$

interchanging the order of integration, we have

$$\int_0^{\infty} \int_0^{t} f_X(t)\,dx\,dt = \int_0^{\infty} t\, f_X(t)\,dt = \mathrm{E}[X],$$

as claimed. In case no density exists, the identity $\mathrm{E}[X] = \int_0^{\infty} \Pr(X \ge x)\,dx$ can still be seen to hold, by applying the same interchange of the order of integration (Fubini's theorem) to the underlying probability measure.
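As a numeric illustration (ours, with an arbitrary choice of distribution), the identity can be checked for an exponential random variable with rate λ, for which Pr(X ≥ x) = e^(−λx) and E[X] = 1/λ; the integral is approximated by a simple Riemann sum:

    import math

    # E[X] = integral over [0, inf) of P(X >= x) dx, here for X ~ Exp(lam).
    lam = 2.0
    dx = 1e-4
    upper = 20.0  # truncation point; exp(-40) is negligible
    integral = sum(math.exp(-lam * (k * dx)) * dx for k in range(int(upper / dx)))
    print(integral, 1 / lam)  # both approximately 0.5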
See also
• Conditional expectation
• An inequality on location and scale parameters
• Expected value is also a key concept in economics, finance, and many other subjects
• The general term expectation
• Moment (mathematics)
• Expectation value (quantum mechanics)
• Wald's equation for calculating the expected value of a random number of random variables

Historical background
• Edwards, A.W.F (2002). Pascal’s arithmetical triangle: the story of a mathematical idea (2nd ed.). JHU Press.
ISBN 0-8018-6946-3.
• Huygens, Christiaan (1657). De ratiociniis in ludo aleæ (English translation, published in 1714: [9]).

External links
• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the
randomness of the beans dropping through the quincunx pattern. [13] from Index Funds Advisors IFA.com [14],
youtube.com
• Expectation [10] on PlanetMath

References
[1] Sheldon M Ross (2007). "§2.4 Expectation of a random variable" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA38). Introduction to probability models (9th ed.). Academic Press. p. 38 ff. ISBN 0125980620.
[2] Richard W Hamming (1991). "§2.5 Random variables, mean and the expected value" (http://books.google.com/books?id=jX_F-77TA3gC&pg=PA64). The art of probability for scientists and engineers. Addison-Wesley. p. 64 ff. ISBN 0201406861.
[3] For a discussion of the Cauchy distribution, see Richard W Hamming (1991). "Example 8.7–1 The Cauchy distribution" (http://books.google.com/books?id=jX_F-77TA3gC&printsec=frontcover&dq=isbn:0201406861&cd=1#v=onepage&q=Cauchy&f=false). The art of probability for scientists and engineers. Addison-Wesley. p. 290 ff. ISBN 0201406861. "Sampling from the Cauchy distribution and averaging gets you nowhere – one sample has the same distribution as the average of 1000 samples!"
[4] In the foreword to his book, Huygens writes: "It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs." (cited in Edwards (2002)). Thus, Huygens learned about de Méré's problem in 1655 during his visit to France; later on in 1656 from his correspondence with Carcavi he learned that his method was essentially the same as Pascal's; so that before his book went to press in 1657 he knew about Pascal's priority in this subject.
[5] "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/stat.html).
[6] Sheldon M Ross. "Example 2.15" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA39). cited work. p. 39. ISBN 0125980620.
[7] Sheldon M Ross. "Chapter 3: Conditional probability and conditional expectation" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA97). cited work. p. 97 ff. ISBN 0125980620.
[8] Sheldon M Ross. "§3.4: Computing expectations by conditioning" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA105). cited work. p. 105 ff. ISBN 0125980620.
[9] http://www.york.ac.uk/depts/maths/histstat/huygens.pdf
[10] http://planetmath.org/?op=getobj&from=objects&id=505

Discrete probability distribution


In probability theory and statistics, a discrete probability distribution
is a probability distribution characterized by a probability mass
function. Thus, the distribution of a random variable X is discrete, and
X is then called a discrete random variable, if

[Figure: the probability mass function of a discrete probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, and 0.3; a set not containing any of these points has probability zero. Companion figures show the cdf of a discrete probability distribution, of a continuous probability distribution, and of a distribution which has both a continuous part and a discrete part.]

$$\sum_{u} \Pr(X = u) = 1 \qquad (1)$$
as u runs through the set of all possible values of X. It follows that such a random variable can assume only a finite
or countably infinite number of values. That is, the possible values might be listed, although the list might be infinite.
For example, count observations such as the numbers of birds in flocks comprise only natural number values
{0, 1, 2, ...}. By contrast, continuous observations such as the weights of birds comprise real number values and
would typically be modeled by a continuous probability distribution such as the normal.
Discrete probability distribution 132

In cases more frequently considered, this set of possible values is a topologically discrete set in the sense that all its
points are isolated points. But there are discrete random variables for which this countable set is dense on the real
line.
Among the most well-known discrete probability distributions that are used for statistical modeling are the Poisson
distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative
binomial distribution. In addition, the discrete uniform distribution is commonly used in computer programs that
make equal-probability random selections between a number of choices.
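For instance, such an equal-probability selection is a one-liner in Python's standard library (the list of options is a made-up example):

    import random

    # Discrete uniform selection: each option has probability 1/len(options).
    options = ["rock", "paper", "scissors"]
    print(random.choice(options))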

Alternative description
Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative
distribution function (cdf) increases only by jump discontinuities—that is, its cdf increases only where it "jumps" to
a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which
the random variable may take. The number of such jumps may be finite or countably infinite. The set of locations of
such jumps need not be topologically discrete; for example, the cdf might jump at each rational number.
Consequently, a discrete probability distribution is often represented as a generalized probability density function
involving Dirac delta functions, which substantially unifies the treatment of continuous and discrete distributions.
This is especially useful when dealing with probability distributions involving both a continuous and a discrete part.

Representation in terms of indicator functions


For a discrete random variable $X$, let $u_0, u_1, \ldots$ be the values it can take with non-zero probability. Denote

$$\Omega_i = X^{-1}(u_i) = \{\omega : X(\omega) = u_i\}, \qquad i = 0, 1, 2, \ldots$$

These are disjoint sets, and by formula (1)

$$\Pr\Bigl(\bigcup_i \Omega_i\Bigr) = \sum_i \Pr(X = u_i) = 1.$$

It follows that the probability that X takes any value except for u0, u1, ... is zero, and thus one can write X as

$$X(\omega) = \sum_i u_i \mathbf{1}_{\Omega_i}(\omega)$$

except on a set of probability zero, where $\mathbf{1}_A$ is the indicator function of A. This may serve as an alternative
definition of discrete random variables.

See also
• Stochastic vector
• Continuous probability distribution
Continuous probability distribution 133

Continuous probability distribution


In probability theory, a probability distribution is called continuous if its cumulative distribution function is
continuous. This is equivalent to saying that for random variables X with the distribution in question, Pr[X = a] = 0
for all real numbers a, i.e.: the probability that X attains the value a is zero, for any number a. If the distribution of X
is continuous then X is called a continuous random variable.
While for a discrete probability distribution an event with probability zero is impossible (e.g. rolling 3.5 on a
standard die is impossible, and has probability zero), this is not true in the case of a continuous random variable. For
example, if one measures the width of an oak leaf, the result 3.5 cm is possible, but has probability zero because
there are infinitely many possible values even between 3 cm and 4 cm. Each of these individual outcomes has
probability zero, yet the probability that the outcome will fall into that interval is nonzero. This apparent paradox is
resolved by the fact that the probability that X attains some value within an infinite set, such as an interval, cannot be
found by naively adding the probabilities for individual values. Formally, each value has an infinitesimally small
probability, which statistically is equivalent to zero.

Comparison with absolute continuity


The term "continuous" is sometimes used as a synonym for "absolutely continuous with respect to Lebesgue" (see
Radon–Nikodym theorem). An absolutely continuous distribution (with respect to Lebesgue) has a probability
density function.[1] For a random variable X, being absolutely continuous is equivalent to saying that the probability
that X attains a value in any given subset S of its range with Lebesgue measure zero is equal to zero. This does not
follow from the condition Pr[X = a] = 0 for all real numbers a, since there are uncountable sets with
Lebesgue-measure zero (e.g. the Cantor set).
A random variable with the Cantor distribution is continuous (according to the first convention) but is not absolutely
continuous.
In practical applications, random variables are often either discrete, absolutely continuous, or mixtures thereof.
However, the Cantor distribution is neither discrete nor a weighted average of discrete and absolutely continuous
distributions.
The normal distribution, continuous uniform distribution, Beta distribution, and Gamma distribution are well known
absolutely continuous distributions. The normal distribution, also called the Gaussian or the bell curve, is ubiquitous
in nature and statistics due to the central limit theorem: every variable that can be modelled as a sum of many small
independent variables is approximately normal.

External links
• Continuous Random Variables. [2] John Appleby, School of Mathematical Sciences, Dublin City University.
• Hazewinkel, Michiel, ed. (2001), "Continuous distribution" [3], Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104

References
[1] Feller: An Introduction to Probability Theory and its Applications, volume 2, page 139
[2] http://webpages.dcu.ie/~applebyj/ms207/CNSRV1.pdf
[3] http://eom.springer.de/c/c025620.htm
Probability mass function 134

Probability mass function


In probability theory, a probability mass function (pmf) is a function
that gives the probability that a discrete random variable is exactly
equal to some value. A pmf differs from a probability density function
(pdf) in that the values of a pdf, defined only for continuous random
variables, are not probabilities as such. Instead, the integral of a pdf
over a range of possible values (a, b] gives the probability of the random variable falling within that range. See notation for the meaning of (a, b].

[Figure: the graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.]

Mathematical description
Suppose that X: S → R is a discrete random variable defined on a
sample space S. Then the probability mass function fX: R → [0, 1] for
X is defined as

$$f_X(x) = \Pr(X = x) = \Pr(\{ s \in S : X(s) = x \}).$$

[Figure: the probability mass function of a fair die. All the numbers on the die have an equal chance of appearing on top when the die is rolled.]

Note that fX is defined for all real numbers, including those not in the image of X; indeed, fX(x) = 0 for all x ∉ X(S).
Since the image of X is countable, the probability mass function fX(x) is zero for all but a countable number of values
of x. The discontinuity of probability mass functions reflects the fact that the cumulative distribution function of a
discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the
probability mass function is zero at all such points.

Example
Suppose that S is the sample space of all outcomes of a single toss of a fair coin, and X is the random variable
defined on S assigning 0 to "tails" and 1 to "heads". Since the coin is fair, the probability mass function is

$$f_X(x) = \begin{cases} \tfrac{1}{2}, & x \in \{0, 1\}, \\ 0, & \text{otherwise}. \end{cases}$$
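A small Python sketch (ours) that encodes this pmf and checks the defining properties, namely non-negative values summing to 1:

    # pmf of a fair coin: 0 = "tails", 1 = "heads"; f(x) = 0 off {0, 1}.
    def f(x):
        return {0: 0.5, 1: 0.5}.get(x, 0.0)

    assert f(0) + f(1) == 1.0                    # values sum to 1
    assert f(2.7) == 0.0                         # zero off the image of X
    assert all(f(x) >= 0 for x in (0, 1, 2.7))   # non-negative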
See also
• Discrete probability distribution
Probability mass function 135

References
• Johnson, N.L., Kotz, S., Kemp A. (1993) Univariate Discrete Distributions (2nd Edition). Wiley. ISBN
0-471-54897-9 (p 36)

Continuous function
In mathematics, a continuous function is a function for which, intuitively, small changes in the input result in small
changes in the output. Otherwise, a function is said to be "discontinuous". A continuous function with a continuous
inverse function is called "bicontinuous". An intuitive (though imprecise) idea of continuity is given by the common
statement that a continuous function is a function whose graph can be drawn without lifting the chalk from the
blackboard.
Continuity of functions is one of the core concepts of topology, which is treated in full generality below. The
introductory portion of this article focuses on the special case where the inputs and outputs of functions are real
numbers. In addition, this article discusses the definition for the more general case of functions between two metric
spaces. In order theory, especially in domain theory, one considers a notion of continuity known as Scott continuity.
Other forms of continuity do exist but they are not discussed in this article.
As an example, consider the function h(t) which describes the height of a growing flower at time t. This function is
continuous. In fact, there is a dictum of classical physics which states that in nature everything is continuous. By
contrast, if M(t) denotes the amount of money in a bank account at time t, then the function jumps whenever money
is deposited or withdrawn, so the function M(t) is discontinuous. (However, if one assumes a discrete set as the
domain of function M, for instance the set of points of time at 4:00 PM on business days, then M becomes a continuous function, as is every function whose domain is a discrete subset of the reals.)

Real-valued continuous functions

Historical infinitesimal definition


Cauchy defined continuity of a function in the following intuitive terms: an infinitesimal change in the independent
variable corresponds to an infinitesimal change of the dependent variable (see Cours d'analyse, page 34).

Definition in terms of limits


Suppose we have a function that maps real numbers to real numbers and whose domain is some interval, like the
functions h and M above. Such a function can be represented by a graph in the Cartesian plane; the function is
continuous if, roughly speaking, the graph is a single unbroken curve with no "holes" or "jumps".
In general, we say that the function f is continuous at some point c of its domain if, and only if, the following holds:
• The limit of f(x) as x approaches c through the domain of f exists and is equal to f(c); in mathematical notation, $\lim_{x \to c} f(x) = f(c)$. If the point c in the domain of f is not a limit point of the domain, then this condition is vacuously true, since x cannot approach c through values not equal to c. Thus, for example, every function whose
domain is the set of all integers is continuous.
We call a function continuous if, and only if, it is continuous at every point of its domain. More generally, we say
that a function is continuous on some subset of its domain if it is continuous at every point of that subset.
The notation C(Ω) or C0(Ω) is sometimes used to denote the set of all continuous functions with domain Ω.
Similarly, C1(Ω) is used to denote the set of differentiable functions whose derivative is continuous, C²(Ω) for the
twice-differentiable functions whose second derivative is continuous, and so on (see differentiability class). In the
field of computer graphics, these three levels are sometimes called g0 (continuity of position), g1 (continuity of
Continuous function 136

tangency), and g2 (continuity of curvature). The notation C(n, α)(Ω) occurs in the definition of a more subtle concept,
that of Hölder continuity.

Weierstrass definition (epsilon-delta) of continuous functions


Without resorting to limits, one can define continuity of real functions as follows.
Again consider a function ƒ that maps a set of real numbers to another set of real numbers, and suppose c is an
element of the domain of ƒ. The function ƒ is said to be continuous at the point c if the following holds: For any
number ε > 0, however small, there exists some number δ > 0 such that for all x in the domain of ƒ with
c − δ < x < c + δ, the value of ƒ(x) satisfies

$$|f(x) - f(c)| < \varepsilon.$$

Alternatively written: given subsets I, D of R, continuity of ƒ : I → D at c ∈ I means that for every ε > 0 there exists a δ > 0 such that for all x ∈ I,

$$|x - c| < \delta \ \Rightarrow\ |f(x) - f(c)| < \varepsilon.$$
A form of this epsilon-delta definition of continuity was first given by Bernard Bolzano in 1817. Preliminary forms
of a related definition of the limit were given by Cauchy,[1] though the formal definition and the distinction between
pointwise continuity and uniform continuity were first given by Karl Weierstrass.
More intuitively, we can say that if we want to get all the ƒ(x) values to stay in some small neighborhood around ƒ(c),
we simply need to choose a small enough neighborhood for the x values around c, and we can do that no matter how
small the ƒ(x) neighborhood is; ƒ is then continuous at c.
In modern terms, this is generalized by the definition of continuity of a function with respect to a basis for the
topology, here the metric topology.

Heine definition of continuity


The following definition of continuity is due to Heine.
A real function ƒ is continuous if for any sequence (xn) such that

$$\lim_{n \to \infty} x_n = L,$$

it holds that

$$\lim_{n \to \infty} f(x_n) = f(L).$$
(We assume that all the points xn as well as L belong to the domain of ƒ.)
One can say, briefly, that a function is continuous if, and only if, it preserves limits.
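The Heine criterion suggests a simple numeric experiment (only suggestive, since finitely many evaluations cannot prove continuity): evaluate ƒ along a sequence tending to c and compare with ƒ(c). The function names below are ours, and the step function is a made-up counterexample:

    # Heine criterion, numerically: if x_n -> c then f(x_n) should -> f(c).
    def along_sequence(f, c, n=10):
        xs = [c + 1.0 / 2 ** k for k in range(1, n + 1)]  # x_k -> c from above
        return [f(x) for x in xs], f(c)

    continuous = lambda x: 3 * x + 1
    step = lambda x: 1.0 if x > 0 else 0.0

    print(along_sequence(continuous, 0.0))  # values approach f(0) = 1
    print(along_sequence(step, 0.0))        # values stay at 1, but f(0) = 0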
Weierstrass's and Heine's definitions of continuity are equivalent on the reals. The usual (easier) proof makes use of
the axiom of choice, but in the case of global continuity of real functions it was proved by Wacław Sierpiński that
the axiom of choice is not actually needed.[2]
In the more general setting of topological spaces, the concept analogous to the Heine definition of continuity is called
sequential continuity. In general, the condition of sequential continuity is weaker than the analogue of Cauchy
continuity, which is just called continuity (see continuity (topology) for details). However, if instead of sequences
one uses nets (sets indexed by a directed set, not only the natural numbers), then the resulting concept is equivalent
to the general notion of continuity in topology. Sequences are sufficient on metric spaces because they are
first-countable spaces (every point has a countable neighborhood basis, hence representative points in each
neighborhood are enough to ensure continuity), but general topological spaces are not first-countable, hence
sequences do not suffice, and nets must be used.
Continuous function 137

Definition using oscillation

[Figure: the failure of a function to be continuous at a point is quantified by its oscillation.]

Continuity can also be defined in terms of oscillation: a function ƒ is continuous at a point x0 if and only if the oscillation of ƒ at x0 is zero;[3] in symbols, $\omega_f(x_0) = 0$. A benefit of this definition is that it quantifies discontinuity: the oscillation gives how much the function is discontinuous at a point.
This definition is useful in descriptive set theory to study the set of discontinuities and continuous points – the
continuous points are the intersection of the sets where the oscillation is less than ε (hence a Gδ set) – and gives a
very quick proof of one direction of the Lebesgue integrability condition.[4]
The oscillation definition is equivalent to the ε-δ definition by a simple re-arrangement, and by using a limit (lim sup, lim inf)
to define oscillation: if (at a given point) for a given ε0 there is no δ that satisfies the ε-δ definition, then the
oscillation is at least ε0, and conversely if for every ε there is a desired δ, the oscillation is 0. The oscillation
definition can be naturally generalized to maps from a topological space to a metric space.
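A rough numeric sketch of the oscillation (ours; it samples the function on a finite grid over a small interval, so it only approximates the sup and inf):

    # Approximate oscillation of f at x0: sup f - inf f over (x0 - eps, x0 + eps),
    # estimated here on a sample grid.
    def oscillation(f, x0, eps=1e-6, grid=1000):
        xs = [x0 - eps + 2 * eps * k / grid for k in range(grid + 1)]
        vals = [f(x) for x in xs]
        return max(vals) - min(vals)

    step = lambda x: 1.0 if x > 0 else 0.0
    print(oscillation(step, 0.0))             # about 1: discontinuous at 0
    print(oscillation(lambda x: x * x, 0.0))  # about 0: continuous at 0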

Definition using the hyperreals


Non-standard analysis is a way of making Newton-Leibniz-style infinitesimals mathematically rigorous. The real
line is augmented by the addition of infinite and infinitesimal numbers to form the hyperreal numbers. In
nonstandard analysis, continuity can be defined as follows.
A function ƒ from the reals to the reals is continuous if its natural extension to the hyperreals has the property
that for real x and infinitesimal dx, ƒ(x+dx) − ƒ(x) is infinitesimal.[5]
In other words, an infinitesimal increment of the independent variable corresponds to an infinitesimal change of the
dependent variable, giving a modern expression to Augustin-Louis Cauchy's definition of continuity.

Examples
• All polynomial functions are continuous.
• If a function has a domain which is not an interval, the notion of a continuous function as one whose graph you
can draw without taking your pencil off the paper is not quite correct. Consider the functions f(x) = 1/x and g(x) =
(sin x)/x. Neither function is defined at x = 0, so each has the domain R \ {0} of all real numbers except 0, and each
function is continuous. The question of continuity at x = 0 does not arise, since x = 0 is neither in the domain of f
nor in the domain of g. The function f cannot be extended to a continuous function whose domain is R, since no
matter what value is assigned at 0, the resulting function will not be continuous. On the other hand, since the limit
of g at 0 is 1, g can be extended continuously to R by defining its value at 0 to be 1.
Continuous function 138

• The exponential functions, logarithms, square root function, trigonometric functions and absolute value function
are continuous. Rational functions, however, are not necessarily continuous on all of R.
• An example of a continuous rational function is f(x) = 1/(x − 2). The question of continuity at x = 2 does not arise, since x = 2 is not in the domain of f.
• An example of a discontinuous function is the function f defined by f(x) = 1 if x > 0, f(x) = 0 if x ≤ 0. Pick for
instance ε = 1⁄2. There is no δ-neighborhood around x = 0 that will force all the f(x) values to be within ε of f(0).
Intuitively we can think of this type of discontinuity as a sudden jump in function values.
• Another example of a discontinuous function is the signum or sign function.
• A more complicated example of a discontinuous function is Thomae's function.
• Dirichlet's function, in the modified form

$$f(x) = \begin{cases} x, & x \text{ rational}, \\ 0, & x \text{ irrational}, \end{cases}$$

is continuous at only one point, namely x = 0. [6]

Facts about continuous functions


If two functions f and g are continuous, then f + g, fg, and f/g are continuous. (Note: the only possible points x of discontinuity of f/g are the solutions of the equation g(x) = 0; but then any such x does not belong to the domain of the function f/g. Hence f/g is continuous on its entire domain, or, in other words, is continuous.)
The composition f o g of two continuous functions is continuous.
If a function is differentiable at some point c of its domain, then it is also continuous at c. The converse is not true: a
function that is continuous at c need not be differentiable there. Consider for instance the absolute value function at
c = 0.

Intermediate value theorem


The intermediate value theorem is an existence theorem, based on the real number property of completeness, and
states:
If the real-valued function f is continuous on the closed interval [a, b] and k is some number between f(a) and
f(b), then there is some number c in [a, b] such that f(c) = k.
For example, if a child grows from 1 m to 1.5 m between the ages of two and six years, then, at some time between
two and six years of age, the child's height must have been 1.25 m.
As a consequence, if f is continuous on [a, b] and f(a) and f(b) differ in sign, then, at some point c in [a, b], f(c) must
equal zero.
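This sign-change corollary is exactly what the bisection method for root finding exploits; a minimal sketch (ours, with an arbitrary sample function):

    # Bisection: f continuous on [a, b] with f(a) * f(b) < 0 has a root in [a, b].
    def bisect(f, a, b, tol=1e-10):
        assert f(a) * f(b) < 0, "need a sign change on [a, b]"
        while b - a > tol:
            m = (a + b) / 2
            if f(a) * f(m) <= 0:  # root lies in the left half
                b = m
            else:                 # root lies in the right half
                a = m
        return (a + b) / 2

    print(bisect(lambda x: x ** 2 - 2, 0.0, 2.0))  # approximately 1.41421356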

Extreme value theorem


The extreme value theorem states that if a function f is defined on a closed interval [a,b] (or any closed and bounded
set) and is continuous there, then the function attains its maximum, i.e. there exists c ∈ [a,b] with f(c) ≥ f(x) for all
x ∈ [a,b]. The same is true of the minimum of f. These statements are not, in general, true if the function is defined
on an open interval (a,b) (or any set that is not both closed and bounded), as, for example, the continuous function
f(x) = 1/x, defined on the open interval (0,1), does not attain a maximum, being unbounded above.
Continuous function 139

Directional continuity

A right continuous function A left continuous function

A function may happen to be continuous in only one direction, either from the "left" or from the "right". A
right-continuous function is a function which is continuous at all points when approached from the right.
Technically, the formal definition is similar to the definition above for a continuous function but modified as
follows:
The function ƒ is said to be right-continuous at the point c if the following holds: For any number ε > 0, however small, there exists some number δ > 0 such that for all x in the domain with c < x < c + δ, the value of ƒ(x) will satisfy

$$|f(x) - f(c)| < \varepsilon.$$
Notice that x must be larger than c, that is on the right of c. If x were also allowed to take values less than c, this
would be the definition of continuity. This restriction makes it possible for the function to have a discontinuity at c,
but still be right continuous at c, as pictured.
Likewise a left-continuous function is a function which is continuous at all points when approached from the left,
that is, c − δ < x < c.
A function is continuous if and only if it is both right-continuous and left-continuous.

Continuous functions between metric spaces


Now consider a function f from one metric space (X, dX) to another metric space (Y, dY). Then f is continuous at the
point c in X if for any positive real number ε, there exists a positive real number δ such that all x in X satisfying dX(x,
c) < δ will also satisfy dY(f(x), f(c)) < ε.
This can also be formulated in terms of sequences and limits: the function f is continuous at the point c if for every
sequence (xn) in X with limit lim xn = c, we have lim f(xn) = f(c). Continuous functions transform limits into limits.
This latter condition can be weakened as follows: f is continuous at the point c if and only if for every convergent
sequence (xn) in X with limit c, the sequence (f(xn)) is a Cauchy sequence, and c is in the domain of f. Continuous
functions transform convergent sequences into Cauchy sequences.
The set of points at which a function between metric spaces is continuous is a Gδ set – this follows from the ε-δ
definition of continuity.
Continuous function 140

Continuous functions between topological spaces


The above definitions of continuous
functions can be generalized to functions
from one topological space to another in a
natural way; a function f : X → Y, where X
and Y are topological spaces, is continuous
if and only if for every open set V ⊆ Y, the
inverse image

$$f^{-1}(V) = \{ x \in X \mid f(x) \in V \}$$

is open.

[Figure: continuity of a function at a point.]

Definitions
Several equivalent definitions for a topological structure exist and thus there are several equivalent ways to define a
continuous function.

Open and closed set definition


The most common notion of continuity in topology defines continuous functions as those functions for which the preimages (or inverse images) of open sets are open. Similar to the open set formulation is the closed set formulation, which says that preimages (or inverse images) of closed sets are closed.

Neighborhood definition
Definitions based on preimages are often difficult to use directly. Instead, suppose we have a function f : X → Y,
where X and Y are topological spaces.[7] We say f is continuous at x for some x ∈ X if for any neighborhood V of
f(x), there is a neighborhood U of x such that f(U) ⊆ V. Although this definition appears complicated, the intuition is
that no matter how "small" V becomes, we can always find a U containing x that will map inside it. If f is continuous
at every x ∈ X, then we simply say f is continuous.
Continuous function 141

In a metric space, it is equivalent to consider the neighbourhood system of open balls centered at x and f(x) instead of
all neighborhoods. This leads to the standard δ-ε definition of a continuous function from real analysis, which says
roughly that a function is continuous if all points close to x map to points close to f(x). This only really makes sense
in a metric space, however, which has a notion of distance.
Note, however, that if the target space is Hausdorff, it is still true that f is continuous at a if and only if the limit of f
as x approaches a is f(a). At an isolated point, every function is continuous.

Sequences and nets


In several contexts, the topology of a space is conveniently specified in terms of limit points. In many instances, this
is accomplished by specifying when a point is the limit of a sequence, but for some spaces that are too large in some
sense, one specifies also when a point is the limit of more general sets of points indexed by a directed set, known as
nets. A function is continuous only if it takes limits of sequences to limits of sequences. In the former case,
preservation of limits is also sufficient; in the latter, a function may preserve all limits of sequences yet still fail to be
continuous, and preservation of nets is a necessary and sufficient condition.
In detail, a function f : X → Y is sequentially continuous if whenever a sequence (xn) in X converges to a limit x, the
sequence (f(xn)) converges to f(x). Thus sequentially continuous functions "preserve sequential limits". Every
continuous function is sequentially continuous. If X is a first-countable space, then the converse also holds: any
function preserving sequential limits is continuous. In particular, if X is a metric space, sequential continuity and
continuity are equivalent. For non first-countable spaces, sequential continuity might be strictly weaker than
continuity. (The spaces for which the two properties are equivalent are called sequential spaces.) This motivates the
consideration of nets instead of sequences in general topological spaces. Continuous functions preserve limits of
nets, and in fact this property characterizes continuous functions.

Closure operator definition


Given two topological spaces (X, cl) and (X′, cl′), where cl and cl′ are two closure operators, a function

$$f : (X, \mathrm{cl}) \to (X', \mathrm{cl}')$$

is continuous if for all subsets A of X

$$f(\mathrm{cl}(A)) \subseteq \mathrm{cl}'(f(A)).$$
One might therefore suspect that, given two topological spaces (X, int) and (X′, int′), where int and int′ are two interior operators, a function

$$f : (X, \mathrm{int}) \to (X', \mathrm{int}')$$

is continuous if for all subsets A of X

$$f(\mathrm{int}(A)) \subseteq \mathrm{int}'(f(A)),$$

or perhaps if the reverse inclusion

$$\mathrm{int}'(f(A)) \subseteq f(\mathrm{int}(A))$$

holds;

however, neither of these conditions is either necessary or sufficient for continuity.


Instead, we must resort to inverse images: given two topological spaces (X, int) and (X′, int′), where int and int′ are two interior operators, a function

$$f : (X, \mathrm{int}) \to (X', \mathrm{int}')$$

is continuous if for all subsets A of X′

$$f^{-1}(\mathrm{int}'(A)) \subseteq \mathrm{int}(f^{-1}(A)).$$

We can also write that, given two topological spaces (X, cl) and (X′, cl′), where cl and cl′ are two closure operators, a function

$$f : (X, \mathrm{cl}) \to (X', \mathrm{cl}')$$

is continuous if for all subsets A of X′

$$\mathrm{cl}(f^{-1}(A)) \subseteq f^{-1}(\mathrm{cl}'(A)).$$

Closeness relation definition


Given two topological spaces (X, δ) and (X′, δ′), where δ and δ′ are two closeness relations, a function

$$f : (X, \delta) \to (X', \delta')$$

is continuous if for all points x of X and all subsets A of X,

$$x \,\delta\, A \ \Rightarrow\ f(x) \,\delta'\, f(A).$$

This is another way of writing the closure operator definition.

Useful properties of continuous maps


Some facts about continuous maps between topological spaces:
• If f : X → Y and g : Y → Z are continuous, then so is the composition g ∘ f : X → Z.
• If f : X → Y is continuous and
• X is compact, then f(X) is compact.
• X is connected, then f(X) is connected.
• X is path-connected, then f(X) is path-connected.
• X is Lindelöf, then f(X) is Lindelöf.
• X is separable, then f(X) is separable.
• The identity map idX : (X, τ2) → (X, τ1) is continuous if and only if τ1 ⊆ τ2 (see also comparison of topologies).

Other notes
If a set is given the discrete topology, all functions with that space as a domain are continuous. If the domain set is
given the indiscrete topology and the range set is at least T0, then the only continuous functions are the constant
functions. Conversely, any function whose range is indiscrete is continuous.
Given a set X, a partial ordering can be defined on the possible topologies on X. A continuous function between two
topological spaces stays continuous if we strengthen the topology of the domain space or weaken the topology of the
codomain space. Thus we can consider the continuity of a given function a topological property, depending only on
the topologies of its domain and codomain spaces.
For a function f from a topological space X to a set S, one defines the final topology on S by letting the open sets of S
be those subsets A of S for which f−1(A) is open in X. If S has an existing topology, f is continuous with respect to
this topology if and only if the existing topology is coarser than the final topology on S. Thus the final topology can
be characterized as the finest topology on S which makes f continuous. If f is surjective, this topology is canonically

identified with the quotient topology under the equivalence relation defined by f. This construction can be
generalized to an arbitrary family of functions X → S.
Dually, for a function f from a set S to a topological space X, one defines the initial topology on S by letting the open sets of S be those subsets A of S of the form A = f−1(U) for some open set U in X. If S has an existing topology, f is continuous with
respect to this topology if and only if the existing topology is finer than the initial topology on S. Thus the initial
topology can be characterized as the coarsest topology on S which makes f continuous. If f is injective, this topology
is canonically identified with the subspace topology of S, viewed as a subset of X. This construction can be
generalized to an arbitrary family of functions S → X.
Symmetric to the concept of a continuous map is an open map, for which images of open sets are open. In fact, if an
open map f has an inverse, that inverse is continuous, and if a continuous map g has an inverse, that inverse is open.
If a function is a bijection, then it has an inverse function. The inverse of a continuous bijection is open, but need not
be continuous. If it is, this special function is called a homeomorphism. If a continuous bijection has as its domain a
compact space and its codomain is Hausdorff, then it is automatically a homeomorphism.

Continuous functions between partially ordered sets


In order theory, continuity of a function between posets is Scott continuity. Let X be a complete lattice; then a function f : X → X is (Scott) continuous if, for each directed subset Y of X, we have sup f(Y) = f(sup Y).

Continuous binary relation


A binary relation R on A is continuous if R(a, b) holds whenever there are sequences (ak) and (bk) in A which converge to a and b respectively and for which R(ak, bk) holds for all k. Clearly, if one treats R as a characteristic function in two variables, this definition of continuity is identical to that for continuous functions.

Continuity space
A continuity space[8] [9] is a generalization of metric spaces and posets, which uses the concept of quantales, and
that can be used to unify the notions of metric spaces and domains.[10]

See also
• Absolute continuity
• Bounded linear operator
• Classification of discontinuities
• Coarse function
• Continuous functor
• Continuous stochastic process
• Dini continuity
• Discrete function
• Equicontinuity
• Lipschitz continuity
• Normal function
• Piecewise
• Scott continuity
• Semicontinuity
• Smooth function
• Symmetrically continuous function
• Uniform continuity

References
• Visual Calculus [11] by Lawrence S. Husch, University of Tennessee (2001)

References
[1] Grabiner, Judith V. (March 1983). "Who Gave You the Epsilon? Cauchy and the Origins of Rigorous Calculus" (http://www.maa.org/pubs/Calc_articles/ma002.pdf). The American Mathematical Monthly 90 (3): 185–194. doi:10.2307/2975545.
[2] "Heine continuity implies Cauchy continuity without the Axiom of Choice" (http://www.apronus.com/math/cauchyheine.htm). Apronus.com.
[3] Introduction to Real Analysis (http://ramanujan.math.trinity.edu/wtrench/texts/TRENCH_REAL_ANALYSIS.PDF), updated April 2010, William F. Trench, Theorem 3.5.2, p. 172
[4] Introduction to Real Analysis (http://ramanujan.math.trinity.edu/wtrench/texts/TRENCH_REAL_ANALYSIS.PDF), updated April 2010, William F. Trench, 3.5 "A More Advanced Look at the Existence of the Proper Riemann Integral", pp. 171–177
[5] http://www.math.wisc.edu/~keisler/calc.html
[6] http://www.quantiphile.com/2010/09/13/a-function-that-is-continuous-at-only-one-point/
[7] f is a function f : X → Y between two topological spaces (X,TX) and (Y,TY). That is, the function f is defined on the elements of the set X, not on the elements of the topology TX. However continuity of the function does depend on the topologies used.
[8] Quantales and continuity spaces (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.851&rep=rep1&type=pdf), RC Flagg - Algebra Universalis, 1997
[9] All topologies come from generalized metrics, R Kopperman - American Mathematical Monthly, 1988
[10] Continuity spaces: Reconciling domains and metric spaces, B Flagg, R Kopperman - Theoretical Computer Science, 1997
[11] http://archives.math.utk.edu/visual.calculus/

Measure (mathematics)
In mathematics, more specifically in measure theory, a measure on a
set is a systematic way to assign to each suitable subset a number,
intuitively interpreted as the size of the subset. In this sense, a measure
is a generalization of the concepts of length, area, volume, et cetera. A
particularly important example is the Lebesgue measure on a Euclidean
space, which assigns the conventional length, area and volume of
Euclidean geometry to suitable subsets of Rn, n = 1, 2, 3, .... For
instance, the Lebesgue measure of [0, 1] in the real numbers is its
length in the everyday sense of the word, specifically 1.

To qualify as a measure (see Definition below), a function that assigns


a non-negative real number or +∞ to a set's subsets must satisfy a few
conditions. One important condition is countable additivity. This
condition states that the size of the union of a sequence of disjoint
subsets is equal to the sum of the sizes of the subsets. However, it is in
general impossible to consistently associate a size to each subset of a
given set and also satisfy the other axioms of a measure. This problem
was resolved by defining measure only on a sub-collection of all
subsets; the subsets on which the measure is to be defined are called
measurable and they are required to form a sigma-algebra, meaning that unions, intersections and complements of sequences of measurable subsets are measurable. Non-measurable sets in a Euclidean space, on which the Lebesgue measure cannot be consistently defined, are necessarily complex to the point of incomprehensibility, in a sense badly mixed up with their complement; indeed, their existence is a non-trivial consequence of the axiom of choice.

[Figure: informally, a measure has the property of being monotone in the sense that if A is a subset of B, the measure of A is less than or equal to the measure of B. Furthermore, the measure of the empty set is required to be 0.]
Measure (mathematics) 145

Measure theory was developed in successive stages during the late 19th and early 20th centuries by Emile Borel,
Henri Lebesgue, Johann Radon and Maurice Fréchet, among others. The main applications of measures are in the
foundations of the Lebesgue integral, in Andrey Kolmogorov's axiomatisation of probability theory and in ergodic
theory. In integration theory, specifying a measure allows one to define integrals on spaces more general than subsets
of Euclidean space; moreover, the integral with respect to the Lebesgue measure on Euclidean spaces is more general
and has a richer theory than its predecessor, the Riemann integral. Probability theory considers measures that assign
to the whole set the size 1, and considers measurable subsets to be events whose probability is given by the measure.
Ergodic theory considers measures that are invariant under, or arise naturally from, a dynamical system.

Definition
Let Σ be a σ-algebra over a set X. A function μ from Σ to the extended real number line is called a measure if it
satisfies the following properties:
• Non-negativity: $\mu(E) \ge 0$ for all $E \in \Sigma$.
• Null empty set: $\mu(\varnothing) = 0$.
• Countable additivity (or σ-additivity): for all countable collections $\{E_i\}_{i \in \mathbb{N}}$ of pairwise disjoint sets in Σ:

$$\mu\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} \mu(E_i).$$
The second condition may be treated as a special case of countable additivity, if the empty collection is allowed as a
countable collection (and the empty sum is interpreted as 0). Otherwise, if the empty collection is disallowed (but
finite collections are allowed), the second condition still follows from countable additivity provided, however, that
there is at least one set having finite measure.
The pair (X, Σ) is called a measurable space, the members of Σ are called measurable sets, and the triple (X, Σ, μ)
is called a measure space.
If only the second and third conditions of the definition of measure above are met, and μ takes on at most one of the
values ±∞, then μ is called a signed measure.
A probability measure is a measure with total measure one (i.e., μ(X) = 1); a probability space is a measure space
with a probability measure.
For measure spaces that are also topological spaces various compatibility conditions can be placed for the measure
and the topology. Most measures met in practice in analysis (and in many cases also in probability theory) are Radon
measures. Radon measures have an alternative definition in terms of linear functionals on the locally convex space of
continuous functions with compact support. This approach is taken by Bourbaki (2004) and a number of other
authors. For more details see Radon measure.
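For finite sets the definition can be exercised directly; the following sketch (ours) implements the counting measure on subsets of a finite set and checks additivity for disjoint sets (finite rather than countable additivity, for computability):

    # Counting measure on subsets of a finite set: mu(S) = number of elements.
    def mu(S):
        return len(S)

    A, B = {1, 2}, {3}                         # pairwise disjoint sets
    assert mu(set()) == 0                      # null empty set
    assert mu(A | B) == mu(A) + mu(B)          # additivity for disjoint sets
    assert A <= A | B and mu(A) <= mu(A | B)   # monotonicity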

Properties
Several further properties can be derived from the definition of a countably additive measure.

Monotonicity
A measure μ is monotonic: If E1 and E2 are measurable sets with E1 ⊆ E2, then

$$\mu(E_1) \le \mu(E_2).$$
Measures of infinite unions of measurable sets


A measure μ is countably subadditive: If E1, E2, E3, … is a countable sequence of sets in Σ, not necessarily disjoint, then

$$\mu\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) \le \sum_{i=1}^{\infty} \mu(E_i).$$

A measure μ is continuous from below: If E1, E2, E3, … are measurable sets and En is a subset of En + 1 for all n, then the union of the sets En is measurable, and

$$\mu\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \lim_{n \to \infty} \mu(E_n).$$
Measures of infinite intersections of measurable sets


A measure μ is continuous from above: If E1, E2, E3, … are measurable sets and En + 1 is a subset of En for all n, then the intersection of the sets En is measurable; furthermore, if at least one of the En has finite measure, then

$$\mu\Bigl(\bigcap_{i=1}^{\infty} E_i\Bigr) = \lim_{n \to \infty} \mu(E_n).$$

This property is false without the assumption that at least one of the En has finite measure. For instance, for each n ∈ N, let

$$E_n = [n, \infty) \subset \mathbb{R},$$

which all have infinite Lebesgue measure, but the intersection is empty.

Sigma-finite measures
A measure space (X, Σ, μ) is called finite if μ(X) is a finite real number (rather than ∞). It is called σ-finite if X can be
decomposed into a countable union of measurable sets of finite measure. A set in a measure space has σ-finite
measure if it is a countable union of sets with finite measure.
For example, the real numbers with the standard Lebesgue measure are σ-finite but not finite. Consider the closed
intervals [k,k+1] for all integers k; there are countably many such intervals, each has measure 1, and their union is the
entire real line. Alternatively, consider the real numbers with the counting measure, which assigns to each finite set
of reals the number of points in the set. This measure space is not σ-finite, because every set with finite measure
contains only finitely many points, and it would take uncountably many such sets to cover the entire real line. The
σ-finite measure spaces have some very convenient properties; σ-finiteness can be compared in this respect to the
Lindelöf property of topological spaces. They can be also thought of as a vague generalization of the idea that a
measure space may have 'uncountable measure'.

Completeness
A measurable set X is called a null set if μ(X)=0. A subset of a null set is called a negligible set. A negligible set need
not be measurable, but every measurable negligible set is automatically a null set. A measure is called complete if
every negligible set is measurable.
A measure can be extended to a complete one by considering the σ-algebra of subsets Y which differ by a negligible
set from a measurable set X, that is, such that the symmetric difference of X and Y is contained in a null set. One
defines μ(Y) to equal μ(X).

Examples
Some important measures are listed here.
• The counting measure is defined by μ(S) = number of elements in S.
• The Lebesgue measure on R is a complete translation-invariant measure on a σ-algebra containing the intervals in
R such that μ([0,1]) = 1; and every other measure with these properties extends Lebesgue measure.
• Circular angle measure is invariant under rotation.
• The Haar measure for a locally compact topological group is a generalization of the Lebesgue measure (and also
of counting measure and circular angle measure) and has similar uniqueness properties.
• The Hausdorff measure which is a refinement of the Lebesgue measure to some fractal sets.
• Every probability space gives rise to a measure which takes the value 1 on the whole space (and therefore takes
all its values in the unit interval [0,1]). Such a measure is called a probability measure. See probability axioms.
• The Dirac measure δa (cf. Dirac delta function) is given by δa(S) = χS(a), where χS is the characteristic function of
S. The measure of a set is 1 if it contains the point a and 0 otherwise.
Other 'named' measures used in various theories include: Borel measure, Jordan measure, ergodic measure, Euler
measure, Gaussian measure, Baire measure, Radon measure and Young measure.
In physics an example of a measure is spatial distribution of mass (see e.g., gravity potential), or another
non-negative extensive property, conserved (see conservation law for a list of these) or not. Negative values lead to
signed measures, see "generalizations" below.
Liouville measure, known also as the natural volume form on a symplectic manifold, is useful in classical statistical
and Hamiltonian mechanics.
Gibbs measure is widely used in statistical mechanics, often under the name canonical ensemble.

Non-measurable sets
If the axiom of choice is assumed to be true, not all subsets of Euclidean space are Lebesgue measurable; examples
of such sets include the Vitali set, and the non-measurable sets postulated by the Hausdorff paradox and the
Banach–Tarski paradox.

Generalizations
For certain purposes, it is useful to have a "measure" whose values are not restricted to the non-negative reals or
infinity. For instance, a countably additive set function with values in the (signed) real numbers is called a signed
measure, while such a function with values in the complex numbers is called a complex measure. Measures that take
values in Banach spaces have been studied extensively. A measure that takes values in the set of self-adjoint
projections on a Hilbert space is called a projection-valued measure; these are used mainly in functional analysis for
the spectral theorem. When it is necessary to distinguish the usual measures which take non-negative values from
generalizations, the term positive measure is used. Positive measures are closed under conical combination but not
general linear combination, while signed measures are the linear closure of positive measures.

Another generalization is the finitely additive measure, sometimes called a content. This is the same as a
measure except that instead of requiring countable additivity we require only finite additivity. Historically, this
definition was used first, but proved to be not so useful. It turns out that in general, finitely additive measures are
connected with notions such as Banach limits, the dual of L∞ and the Stone–Čech compactification. All these are
linked in one way or another to the axiom of choice.
A charge is a generalization in both directions: it is a finitely additive, signed measure.
The remarkable result in integral geometry known as Hadwiger's theorem states that the space of
translation-invariant, finitely additive, not-necessarily-nonnegative set functions defined on finite unions of compact
convex sets in Rn consists (up to scalar multiples) of one "measure" that is "homogeneous of degree k" for each k =
0, 1, 2, ..., n, and linear combinations of those "measures". "Homogeneous of degree k" means that rescaling any set
by any factor c > 0 multiplies the set's "measure" by ck. The one that is homogeneous of degree n is the ordinary
n-dimensional volume. The one that is homogeneous of degree n − 1 is the "surface volume". The one that is
homogeneous of degree 1 is a mysterious function called the "mean width", a misnomer. The one that is
homogeneous of degree 0 is the Euler characteristic.

See also
• Outer measure
• Inner measure
• Hausdorff measure
• Product measure
• Pushforward measure
• Lebesgue measure
• Vector measure
• Almost everywhere
• Lebesgue integration
• Caratheodory extension theorem
• Measurable function
• Geometric measure theory
• Volume form
• Fuzzy measure theory

References
• R. G. Bartle, 1995. The Elements of Integration and Lebesgue Measure. Wiley Interscience.
• Bourbaki, Nicolas (2004), Integration I, Springer Verlag, ISBN 3-540-41129-1 Chapter III.
• R. M. Dudley, 2002. Real Analysis and Probability. Cambridge University Press.
• Folland, Gerald B. (1999), Real Analysis: Modern Techniques and Their Applications, John Wiley and Sons, ISBN 0-471-31716-0 Second edition.
• D. H. Fremlin, 2000. Measure Theory [1]. Torres Fremlin.
• Paul Halmos, 1950. Measure theory. Van Nostrand and Co.
• R. Duncan Luce and Louis Narens (1987). "measurement, theory of," The New Palgrave: A Dictionary of
Economics, v. 3, pp. 428–32.
• M. E. Munroe, 1953. Introduction to Measure and Integration. Addison Wesley.
• K. P. S. Bhaskara Rao and M. Bhaskara Rao (1983), Theory of Charges: A Study of Finitely Additive Measures,
London: Academic Press, pp. x + 315, ISBN 0-1209-5780-9
• Shilov, G. E., and Gurevich, B. L., 1978. Integral, Measure, and Derivative: A Unified Approach, Richard A.
Silverman, trans. Dover Publications. ISBN 0-486-63519-8. Emphasizes the Daniell integral.

External links
• Tutorial: Measure Theory for Dummies [2]

References
[1] http://www.essex.ac.uk/maths/people/fremlin/mt.htm
[2] http://www.ee.washington.edu/techsite/papers/documents/UWEETR-2006-0008.pdf

Bias of an estimator
In statistics, bias (or bias function) of an estimator is the difference between this estimator's expected value and the
true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased.
Otherwise the estimator is said to be biased.
In ordinary English, the term bias is pejorative. In statistics, there are problems for which it may be good to use an
estimator with a small, but nonzero, bias. In some cases, an estimator with a small bias may have lesser mean
squared error or be median-unbiased (rather than mean-unbiased, the standard unbiasedness property). The property
of median-unbiasedness is invariant under transformations while the property of mean-unbiasedness may be lost
under nonlinear transformations.

Definition
Suppose θ̂ is an estimator of parameter θ. Then the bias of this estimator is defined to be

Bias(θ̂) = E[θ̂] − θ,

where E[·] denotes expected value.


An estimator is said to be unbiased if its bias is equal to zero for all values of parameter θ.
There are more general notions of bias and unbiasedness. What this article calls "bias" is called "mean-bias", to
distinguish mean-bias from the other notions, notably "median-unbiased" estimators. The general theory of unbiased
estimators is briefly discussed near the end of this article.
In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using
the mean signed difference.
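
A minimal simulation sketch of this idea (Python; illustrative only, the helper names are ad hoc): the bias is approximated by the mean signed difference between the estimate and the true parameter over many synthetic samples.

import random

def estimate_bias(estimator, sampler, theta, reps=100_000):
    # Monte Carlo approximation of bias = E[estimate] - theta,
    # i.e. the mean signed difference mentioned above.
    total = 0.0
    for _ in range(reps):
        total += estimator(sampler())
    return total / reps - theta

mu = 3.0
sampler = lambda: [random.gauss(mu, 1.0) for _ in range(10)]
sample_mean = lambda xs: sum(xs) / len(xs)

print(estimate_bias(sample_mean, sampler, mu))  # close to 0: unbiased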

Examples

Sample variance
Suppose X1, ..., Xn are independent and identically distributed (i.i.d.) random variables with expectation μ and
variance σ². If the sample mean and (uncorrected) sample variance are defined as

X̄ = (1/n) Σ Xi  and  S² = (1/n) Σ (Xi − X̄)²,

then S² is a biased estimator of σ², because

E[S²] = ((n − 1)/n) σ² ≠ σ².

In other words, the expected value of the sample variance does not equal the population variance σ2, unless
multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased estimator of the population
mean μ.
The reason that S² is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator
for μ: X̄ is the number that makes the sum Σ(Xi − X̄)² as small as possible. That is, plugging any other number into
this sum, for example m = μ, can only increase it. Therefore σ², which is the expected value of (1/n) Σ(Xi − μ)²,
will always be greater than the expected value of the sample variance S².


Note that the usual definition of sample variance,

s² = (1/(n − 1)) Σ (Xi − X̄)²,

is an unbiased estimator of the population variance.

This can be seen by noticing that

E[ Σ (Xi − X̄)² ] = (n − 1) σ²,

and hence E[s²] = σ², which gives the result.[1] (The linked pdf contains the full proof, by Scott D. Anderson.)
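
A hedged simulation sketch of the two definitions (Python, standard library only; not part of the original article): with n = 5 the 1/n estimator averages to about (n − 1)/n · σ², while the 1/(n − 1) estimator averages to about σ².

import random

def var_biased(xs):    # divides by n: E[S^2] = (n-1)/n * sigma^2
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_unbiased(xs):  # divides by n - 1: E[s^2] = sigma^2
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

sigma2, n, reps = 4.0, 5, 200_000
tot_b = tot_u = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    tot_b += var_biased(xs)
    tot_u += var_unbiased(xs)

print(tot_b / reps)  # about (n-1)/n * sigma^2 = 3.2
print(tot_u / reps)  # about sigma^2 = 4.0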

Estimating a Poisson probability


A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson
distribution:[2] [3] Suppose X has a Poisson distribution with expectation λ, and suppose it is desired to estimate

e^(−2λ).

(For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and λ is the
average number of calls per minute, then e^(−2λ) is the probability that no calls arrive in the next two minutes.)
Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e.

E[δ(X)] = Σ (x = 0 to ∞) δ(x) λ^x e^(−λ)/x! = e^(−2λ),

the only function of the data constituting an unbiased estimator is

δ(X) = (−1)^X.

To see this, note that when decomposing e^(−λ) from the above expression for the expectation, the sum that is left is a
Taylor series expansion of e^(−λ) as well, yielding e^(−λ) e^(−λ) = e^(−2λ) (see Characterizations of the exponential function).
If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is
obviously very likely to be near 0, which is the opposite extreme. And if X is observed to be 101, then the estimate is
even more absurd: it is −1, although the quantity being estimated obviously must be positive.
The (biased) maximum likelihood estimator

e^(−2X)

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the
sense that its mean squared error

e^(−4λ) − 2 e^(λ(1/e² − 3)) + e^(λ(1/e⁴ − 1))

is smaller; compare the unbiased estimator's MSE of

1 − e^(−4λ).

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is

e^(λ(1/e² − 1)) − e^(−2λ).
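
The comparison can be checked by simulation; the following Python sketch (illustrative, not from the article) draws Poisson variates by inversion and estimates the two mean squared errors at λ = 2.

import math, random

lam, reps = 2.0, 200_000
mse_unbiased = mse_mle = 0.0
target = math.exp(-2 * lam)

for _ in range(reps):
    # Draw X ~ Poisson(lam) by inversion (adequate for small lam).
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    mse_unbiased += ((-1) ** k - target) ** 2    # delta(X) = (-1)^X
    mse_mle += (math.exp(-2 * k) - target) ** 2  # MLE e^(-2X)

print(mse_unbiased / reps)  # about 1 - e^(-4*lam), i.e. nearly 1
print(mse_mle / reps)       # far smaller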

Maximum of a discrete uniform distribution


The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1
through to n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the
maximum-likelihood estimator of n is X, even though the expectation of X is only (n + 1)/2; we can only be certain
that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1.
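
A short Python sketch (illustrative only) of the two estimators for a box of n = 100 tickets:

import random

n, reps = 100, 200_000
tot_mle = tot_unb = 0.0
for _ in range(reps):
    x = random.randint(1, n)   # one ticket drawn from 1..n
    tot_mle += x               # maximum-likelihood estimate: X
    tot_unb += 2 * x - 1       # unbiased estimate: 2X - 1

print(tot_mle / reps)  # about (n + 1)/2 = 50.5, badly biased
print(tot_unb / reps)  # about n = 100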

Median-unbiased estimators, and bias with respect to other loss functions


Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as
observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute loss function, as
observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics.
The theory of median-unbiased estimators was revived by George W. Brown [4] in 1947:
An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if for fixed θ, the
median of the distribution of the estimate is at the value θ, i.e., the estimate underestimates just as often
as it overestimates. This requirement seems for most purposes to accomplish as much as the
mean-unbiased requirement and has the additional property that it is invariant under one-to-one
transformation.[4]
Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and
Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood
estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have
surprising robustness.

Effect of transformations
Note that when a transformation is applied to a mean-unbiased estimator, the result need not be a mean-unbiased
estimator of its corresponding population statistic. That is, for a non-linear function f and a mean-unbiased estimator
U of a parameter p, the composite estimator f(U) need not be a mean-unbiased estimator of f(p). For example the
square root of the unbiased estimator of the population variance is not a mean-unbiased estimator of the population
standard deviation.
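
A brief Python sketch of this effect (illustrative, not from the article): averaging the square root of the unbiased variance estimator over many samples gives a value noticeably below σ.

import random

sigma, n, reps = 2.0, 5, 200_000
tot = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased for sigma^2
    tot += s2 ** 0.5                              # sqrt(s2): biased for sigma

print(tot / reps)  # noticeably below sigma = 2.0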

See also
• Omitted-variable bias
• Consistent estimator
• Estimation theory
• Expected loss
• Expected value
• Loss function
• Median
• Statistical decision theory

References
• Brown, George W. [4] "On Small-Sample Estimation." The Annals of Mathematical Statistics, Vol. 18, No. 4
(Dec., 1947), pp. 582–585. JSTOR 2236236
• Lehmann, E.L. "A General Concept of Unbiasedness" The Annals of Mathematical Statistics, Vol. 22, No. 4
(Dec., 1951), pp. 587–592. JSTOR 2236928
• Allan Birnbaum. 1961. "A Unified Theory of Estimation, I", The Annals of Mathematical Statistics, Vol. 32, No.
1 (Mar., 1961), pp. 112–135
• van der Vaart, H.R. 1961. "Some Extensions of the Idea of Bias" The Annals of Mathematical Statistics, Vol. 32,
No. 2 (Jun., 1961), pp. 436–447.
• Pfanzagl, Johann. 1994. Parametric Statistical Theory. Walter de Gruyter.
• Stuart, Alan; Ord, Keith; Arnold, Steven [F.] (1999). Classical Inference and the Linear Model. Kendall's
Advanced Theory of Statistics. 2A (Sixth ed.). London: Arnold. pp. xxii+885. MR1687411. ISBN 0-340-66230-1.
• V.G. Voinov and M.S. Nikulin. "Unbiased Estimators and Their Applications", in two volumes (vol. 1, Univariate
case; vol. 2, Multivariate case). Kluwer Academic Publishers: Dordrecht, 1993, 1996.

References
[1] http://biology.ucf.edu/~pascencio/classes/Methods/Proof%20that%20Sample%20Variance%20is%20Unbiased.pdf
[2] J.P. Romano and A.F. Siegel, Counterexamples in Probability and Statistics, Wadsworth & Brooks/Cole, Monterey, CA, 1986
[3] Hardy, M. (1 March 2003). "An Illuminating Counterexample" (http://jstor.org/stable/3647938). American Mathematical Monthly 110 (3):
234–238. doi:10.2307/3647938. ISSN 0002-9890.
[4] Brown (1947), page 583

Probability
Probability is a way of expressing knowledge or belief that an event will occur or has occurred. The concept has
been given an exact mathematical meaning in probability theory, which is used extensively in such areas of study as
mathematics, statistics, finance, gambling, science, and philosophy to draw conclusions about the likelihood of
potential events and the underlying mechanics of complex systems.

Interpretations
The word probability does not have a consistent direct definition. In fact, there are two broad categories of
probability interpretations, whose adherents possess different (and sometimes conflicting) views about the
fundamental nature of probability:
1. Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The
probability of a random event denotes the relative frequency of occurrence of an experiment's outcome, when
repeating the experiment. Frequentists consider probability to be the relative frequency "in the long run" of
outcomes.[1]
2. Bayesians, however, assign probabilities to any statement whatsoever, even when no random process is involved.
Probability, for a Bayesian, is a way to represent an individual's degree of belief in a statement, or an objective
degree of rational belief, given the evidence.

Etymology
The word probability derives from the Latin probabilitas, which can also mean probity, a measure of the authority of a
witness in a legal case in Europe, often correlated with the witness's nobility. In a sense, this differs considerably from
the modern meaning of probability, which, in contrast, is used as a measure of the weight of empirical evidence, and
is arrived at from inductive reasoning and statistical inference.[2] [3]

History
The scientific study of probability is a modern development. Gambling shows that there has been an interest in
quantifying the ideas of probability for millennia, but exact mathematical descriptions of use in those problems only
arose much later.
According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis)
meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion
was one such as sensible people would undertake or hold, in the circumstances."[4] However, in legal contexts
especially, 'probable' could also apply to propositions for which there was good evidence.[5]
Aside from some elementary considerations made by Girolamo Cardano in the 16th century, the doctrine of
probabilities dates to the correspondence of Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657)
gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713)
and Abraham de Moivre's Doctrine of Chances (1718) treated the subject as a branch of mathematics. See Ian
Hacking's The Emergence of Probability and James Franklin's The Science of Conjecture for histories of the early
development of the very concept of mathematical probability.
The theory of errors may be traced back to Roger Cotes's Opera Miscellanea (posthumous, 1722), but a memoir
prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of
observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally
probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous
errors are discussed and a probability curve is given.
Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the
principles of the theory of probabilities. He represented the law of probability of errors by a curve y = φ(x), x
being any error and y its probability, and laid down three properties of this curve:
1. it is symmetric as to the y-axis;
2. the x-axis is an asymptote, the probability of the error ∞ being 0;
3. the area enclosed is 1, it being certain that an error exists.
He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to
unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the
probabilities of a system of concurrent errors.
The method of least squares is due to Adrien-Marie Legendre (1805), who introduced it in his Nouvelles méthodes
pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). In ignorance of
Legendre's contribution, an Irish-American writer, Robert Adrain, editor of "The Analyst" (1808), first deduced the
law of facility of error,

φ(x) = c e^(−h²x²),

h being a constant depending on precision of observation, and c a scale factor ensuring that the area under the
curve equals 1. He gave two proofs, the second being essentially the same as John Herschel's (1850). Gauss gave the
first proof which seems to have been known in Europe (the third after Adrain's) in 1809. Further proofs were given
by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F.
Donkin (1844, 1856), and Morgan Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864),

Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for r, the probable error of a single
observation, is well known.
In the nineteenth century authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833),
Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and
Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory.
Andrey Markov introduced the notion of Markov chains (1906) playing an important role in theory of stochastic
processes and its applications.
The modern theory of probability based on the measure theory was developed by Andrey Kolmogorov (1931).
On the geometric side (see integral geometry) contributors to The Educational Times were influential (Miller,
Crofton, McColl, Wolstenholme, Watson, and Artemas Martin).

Mathematical treatment
In mathematics, a probability of an event A is represented by a real number in the range from 0 to 1 and written as
P(A), p(A) or Pr(A).[6] An impossible event has a probability of 0, and a certain event has a probability of 1.
However, the converses are not always true: probability 0 events are not always impossible, nor probability 1 events
certain. The rather subtle distinction between "certain" and "probability 1" is treated at greater length in the article on
"almost surely".
The opposite or complement of an event A is the event [not A] (that is, the event of A not occurring); its probability is
given by P(not A) = 1 − P(A).[7] As an example, the chance of not rolling a six on a six-sided die is 1 − (chance of
rolling a six) = 1 − 1/6 = 5/6. See Complementary event for a more complete treatment.
If both the events A and B occur on a single performance of an experiment, this is called the intersection or joint
probability of A and B, denoted as P(A ∩ B). If two events, A and B, are independent then the joint probability is

P(A and B) = P(A ∩ B) = P(A) P(B);[8]

for example, if two coins are flipped the chance of both being heads is (1/2)(1/2) = 1/4.
If either event A or event B or both events occur on a single performance of an experiment, this is called the union of
the events A and B, denoted as P(A ∪ B). If two events are mutually exclusive then the probability of either
occurring is

P(A or B) = P(A ∪ B) = P(A) + P(B).

For example, the chance of rolling a 1 or 2 on a six-sided die is P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 1/3.


If the events are not mutually exclusive then

P(A or B) = P(A) + P(B) − P(A and B).

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a
face card (J, Q, K) (or one that is both) is 13/52 + 12/52 − 3/52 = 11/26, because of the 52 cards of a deck 13 are
hearts, 12 are face cards, and 3 are both: here the possibilities included in the "3 that are both" are included in each
of the "13 hearts" and the "12 face cards" but should only be counted once.
Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional
probability is written P(A|B), and is read "the probability of A, given B". It is defined by

P(A|B) = P(A ∩ B) / P(B).[9]

If P(B) = 0 then P(A|B) is undefined.
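
These rules can be verified by brute-force enumeration of a finite sample space; the following Python sketch (illustrative only, not from the article) does so for two dice.

from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two dice and
# check the addition rule P(A or B) = P(A) + P(B) - P(A and B).
outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if o[0] == 6}          # first die shows 6
B = {o for o in outcomes if o[0] + o[1] == 7}   # faces sum to 7

P = lambda e: len(e) / len(outcomes)
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12

# Conditional probability P(A|B) = P(A and B) / P(B):
print(P(A & B) / P(B))  # 1/6: these particular events are independent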

Summary of probabilities

Event        Probability
A            P(A) ∈ [0, 1]
not A        P(not A) = 1 − P(A)
A or B       P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = P(A) + P(B)   if A and B are mutually exclusive
A and B      P(A ∩ B) = P(A | B) P(B)
             = P(A) P(B)   if A and B are independent
A given B    P(A | B) = P(A ∩ B) / P(B)

Theory
Like other theories, the theory of probability is a representation of probabilistic concepts in formal terms—that is, in
terms that can be considered separately from their meaning. These formal terms are manipulated by the rules of
mathematics and logic, and any results are then interpreted or translated back into the problem domain.
There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and
the Cox formulation. In Kolmogorov's formulation (see probability space), sets are interpreted as events and
probability itself as a measure on a class of sets. In Cox's theorem, probability is taken as a primitive (that is, not
further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions.
In both cases, the laws of probability are the same, except for technical details.
There are other methods for quantifying uncertainty, such as the Dempster-Shafer theory or possibility theory, but
those are essentially different and not compatible with the laws of probability as they are usually understood.

Applications
Two major applications of probability theory in everyday life are in risk assessment and in trade on commodity
markets. Governments typically apply probabilistic methods in environmental regulation where it is called "pathway
analysis", often measuring well-being using methods that are stochastic in nature, and choosing projects to undertake
based on statistical analyses of their probable effect on the population as a whole.
A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices -
which have ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely
vs. less likely sends prices up or down, and signals other traders of that opinion. Accordingly, the probabilities are
not assessed independently nor necessarily very rationally. The theory of behavioral finance emerged to describe the
effect of such groupthink on pricing, on policy, and on peace and conflict.
It can reasonably be said that the discovery of rigorous methods to assess and combine probability assessments has
had a profound effect on modern society. Accordingly, it may be of some importance to most citizens to understand
how odds and probability assessments are made, and how they contribute to reputations and to decisions, especially
in a democracy.
Another significant application of probability theory in everyday life is reliability. Many consumer products, such as
automobiles and consumer electronics, utilize reliability theory in the design of the product in order to reduce the
probability of failure. The probability of failure may be closely associated with the product's warranty.

Relation to randomness
In a deterministic universe, based on Newtonian concepts, there is no probability if all conditions are known. In the
case of a roulette wheel, if the force of the hand and the period of that force are known, then the number on which
the ball will stop would be a certainty. Of course, this also assumes knowledge of inertia and friction of the wheel,
weight, smoothness and roundness of the ball, variations in hand speed during the turning and so forth. A
probabilistic description can thus be more useful than Newtonian mechanics for analyzing the pattern of outcomes of
repeated rolls of a roulette wheel. Physicists face the same situation in the kinetic theory of gases, where the system, while
deterministic in principle, is so complex (with the number of molecules typically of the order of magnitude of the
Avogadro constant, 6.02×10²³) that only a statistical description of its properties is feasible.
A revolutionary discovery of 20th century physics was the random character of all physical processes that occur at
sub-atomic scales and are governed by the laws of quantum mechanics. The wave function itself evolves
deterministically as long as no observation is made, but, according to the prevailing Copenhagen interpretation, the
randomness caused by the wave function collapsing when an observation is made, is fundamental. This means that
probability theory is required to describe nature. Others never came to terms with the loss of determinism. Albert
Einstein famously remarked in a letter to Max Born: Jedenfalls bin ich überzeugt, daß der Alte nicht würfelt. (I am
convinced that God does not play dice). Although alternative viewpoints exist, such as that of quantum decoherence
being the cause of an apparent random collapse, at present there is a firm consensus among physicists that
probability theory is necessary to describe quantum phenomena.

See also
• Black Swan theory
• Calculus of predispositions
• Chance
• Class membership probabilities
• Decision theory
• Equiprobable
• Fuzzy measure theory
• Game theory
• Gaming mathematics
• Information theory
• Important publications in probability
• Measure theory
• Negative probability
• Probabilistic argumentation
• Probabilistic logic
• Random fields
• Random variable
• List of scientific journals in probability
• List of statistical topics
• Stochastic process
• Wiener process

References
• Kallenberg, O. (2005) Probabilistic Symmetries and Invariance Principles. Springer -Verlag, New York. 510
pp. ISBN 0-387-25115-4
• Kallenberg, O. (2002) Foundations of Modern Probability, 2nd ed. Springer Series in Statistics. 650 pp. ISBN
0-387-95313-2
• Olofsson, Peter (2005) Probability, Statistics, and Stochastic Processes, Wiley-Interscience. 504 pp ISBN
0-471-67969-0.

Quotations
• Damon Runyon, "It may be that the race is not always to the swift, nor the battle to the strong - but that is the way
to bet."
• Pierre-Simon Laplace "It is remarkable that a science which began with the consideration of games of chance
should have become the most important object of human knowledge." Théorie Analytique des Probabilités, 1812.
• Richard von Mises "The unlimited extension of the validity of the exact sciences was a characteristic feature of
the exaggerated rationalism of the eighteenth century" (in reference to Laplace). Probability, Statistics, and Truth,
p 9. Dover edition, 1981 (republication of second English edition, 1957).

External links
• Probability and Statistics EBook [10]
• Edwin Thompson Jaynes. Probability Theory: The Logic of Science. Preprint: Washington University, (1996). —
HTML index with links to PostScript files [11] and PDF [12] (first three chapters)
• People from the History of Probability and Statistics (Univ. of Southampton) [13]
• Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton) [14]
• Earliest Uses of Symbols in Probability and Statistics [15] on Earliest Uses of Various Mathematical Symbols [16]
• A tutorial on probability and Bayes’ theorem devised for first-year Oxford University students [17]
• pdf file of An Anthology of Chance Operations (1963) [18] at UbuWeb
• Probability Theory Guide for Non-Mathematicians [19]
• Understanding Risk and Probability [20] with BBC raw

References
[1] The Logic of Statistical Inference, Ian Hacking, 1965
[2] The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference, Ian Hacking,
Cambridge University Press, 2006, ISBN 0521685575, 9780521685573
[3] The Cambridge History of Seventeenth-century Philosophy, Daniel Garber, 2003
[4] Jeffrey, R.C., Probability and the Art of Judgment, Cambridge University Press. (1992). pp. 54–55. ISBN 0-521-39459-7
[5] Franklin, J., The Science of Conjecture: Evidence and Probability Before Pascal, Johns Hopkins University Press. (2001). pp. 22, 113, 127
[6] Olofsson, Peter. (2005) Page 8.
[7] Olofsson, page 9
[8] Olofsson, page 35.
[9] Olofsson, page 29.
[10] http://wiki.stat.ucla.edu/socr/index.php/EBook
[11] http://omega.albany.edu:8008/JaynesBook.html
[12] http://bayes.wustl.edu/etj/prob/book.pdf
[13] http://www.economics.soton.ac.uk/staff/aldrich/Figures.htm
[14] http://www.economics.soton.ac.uk/staff/aldrich/Probability%20Earliest%20Uses.htm
[15] http://jeff560.tripod.com/stat.html
[16] http://jeff560.tripod.com/mathsym.html
[17] http://www.celiagreen.com/charlesmccreery/statistics/bayestutorial.pdf
[18] http://ubu.com/historical/young/index.html

[19] http://probability.infarom.ro
[20] http://www.bbc.co.uk/raw/money/express_unit_risk/

Pierre-Simon Laplace
Pierre-Simon, marquis de Laplace

[Portrait: Pierre-Simon Laplace (1749–1827). Posthumous portrait by Madame Feytaud, 1842.]

Born: 23 March 1749, Beaumont-en-Auge, Normandy, France
Died: 5 March 1827 (aged 77), Paris, France
Nationality: France
Fields: Astronomy and mathematics
Institutions: École Militaire (1769–1776)
Alma mater: University of Caen
Academic advisors: Jean d'Alembert, Christophe Gadbled, Pierre Le Canu
Doctoral students: Siméon Denis Poisson
Known for: Work in celestial mechanics; Laplace's equation; the Laplacian; the Laplace transform; the Laplace distribution; Laplace's demon; the Laplace expansion; the Young–Laplace equation; the Laplace number; the Laplace limit; the Laplace invariant; the Laplace principle

Pierre-Simon, marquis de Laplace (23 March 1749 – 5 March 1827) was a French mathematician and astronomer
whose work was pivotal to the development of mathematical astronomy and statistics. He summarized and extended
the work of his predecessors in his five volume Mécanique Céleste (Celestial Mechanics) (1799–1825). This work
translated the geometric study of classical mechanics to one based on calculus, opening up a broader range of
problems. In statistics, the so-called Bayesian interpretation of probability was mainly developed by Laplace.[1]

He formulated Laplace's equation, and pioneered the Laplace transform which appears in many branches of
mathematical physics, a field that he took a leading role in forming. The Laplacian differential operator, widely used
in applied mathematics, is also named after him.
He restated and developed the nebular hypothesis of the origin of the solar system and was one of the first scientists
to postulate the existence of black holes and the notion of gravitational collapse.
He is remembered as one of the greatest scientists of all time, sometimes referred to as a French Newton or Newton
of France, with a phenomenal natural mathematical faculty superior to any of his contemporaries.[2]
He became a count of the First French Empire in 1806 and was named a marquis in 1817, after the Bourbon
Restoration.

Early life
Many details of the life of Laplace were lost when the family château burned in 1925.[3] Laplace was born in
Beaumont-en-Auge, Normandy in 1749. According to W. W. Rouse Ball (A Short Account of the History of
Mathematics, 4th edition, 1908), he was the son of a small cottager or perhaps a farm-labourer, and owed his
education to the interest excited in some wealthy neighbours by his abilities and engaging presence. Very little is
known of his early years. It would seem that from a pupil he became an usher in the school at Beaumont; but, having
procured a letter of introduction to d'Alembert, he went to Paris to push his fortune. However, Karl Pearson[3] is
scathing about the inaccuracies in Rouse Ball's account and states,
Indeed Caen was probably in Laplace's day the most intellectually active of all the towns of Normandy.
It was here that Laplace was educated and was provisionally a professor. It was here he wrote his first
paper published in the Mélanges of the Royal Society of Turin, Tome iv. 1766–1769, at least two years
before he went at 22 or 23 to Paris in 1771. Thus before he was 20 he was in touch with Lagrange in
Turin. He did not go to Paris a raw self-taught country lad with only a peasant background! In 1765 at
the age of sixteen Laplace left the "School of the Duke of Orleans" in Beaumont and went to the
University of Caen, where he appears to have studied for five years. The 'Ecole militaire' of Beaumont
did not replace the old school until 1776.
His parents were from comfortable families. His father was Pierre Laplace, and his mother was Marie-Anne Sochon.
The Laplace family was involved in agriculture until at least 1750, but Pierre Laplace senior was also a cider
merchant and syndic of the town of Beaumont.
Pierre Simon Laplace attended a school in the village run by a Benedictine priory, his father intending that he would
be ordained in the Roman Catholic Church, and at sixteen he was sent to further his father's intention at the
University of Caen, reading theology.[4]
At the university, he was mentored by two enthusiastic teachers of mathematics, Christophe Gadbled and Pierre Le
Canu, who awoke his zeal for the subject. Laplace never graduated in theology but left for Paris with a letter of
introduction from Le Canu to Jean le Rond d'Alembert.[4]
According to his great-great-grandson,[3] d'Alembert received him rather poorly, and to get rid of him gave him a
thick mathematics book, saying to come back when he had read it. When Laplace came back a few days later,
d'Alembert was even less friendly and did not hide his opinion that it was impossible that Laplace could have read
and understood the book. But upon questioning him, he realized that it was true, and from that time he took Laplace
under his care.
Another version is that Laplace solved overnight a problem that d'Alembert set him for submission the following
week, then solved a harder problem the following night. D'Alembert was impressed and recommended him for a
teaching place in the École Militaire.[5]
With a secure income and undemanding teaching, Laplace now threw himself into original research and, in the next
seventeen years, 1771–1787, he produced much of his original work in astronomy.[6]

Laplace further impressed the Marquis de Condorcet, and even in 1771 Laplace felt that he was entitled to
membership in the French Academy of Sciences. However, in that year, admission went to Alexandre-Théophile
Vandermonde and in 1772 to Jacques Antoine Joseph Cousin. Laplace was disgruntled, and at the beginning of
1773, d'Alembert wrote to Lagrange in Berlin to ask if a position could be found for Laplace there. However,
Condorcet became permanent secretary of the Académie in February and Laplace was elected associate member on
31 March, at age 24.[7]
He married Marie-Charlotte de Courty de Romanges in his late thirties and the couple had a daughter, Sophie, and a
son, Charles-Émile (b. 1789).[3] [8]

Analysis, probability and astronomical stability


Laplace's early published work in 1771 started with differential equations and finite differences but he was already
starting to think about the mathematical and philosophical concepts of probability and statistics.[9] However, before
his election to the Académie in 1773, he had already drafted two papers that would establish his reputation. The first,
Mémoire sur la probabilité des causes par les événements was ultimately published in 1774 while the second paper,
published in 1776, further elaborated his statistical thinking and also began his systematic work on celestial
mechanics and the stability of the solar system. The two disciplines would always be interlinked in his mind.
"Laplace took probability as an instrument for repairing defects in knowledge."[10] Laplace's work on probability and
statistics is discussed below with his mature work on the Analytic theory of probabilities.

Stability of the solar system


Sir Isaac Newton had published his Philosophiae Naturalis Principia Mathematica in 1687 in which he gave a
derivation of Kepler's laws, which describe the motion of the planets, from his laws of motion and his law of
universal gravitation. However, though Newton had privately developed the methods of calculus, all his published
work used cumbersome geometric reasoning, unsuitable to account for the more subtle higher-order effects of
interactions between the planets. Newton himself had doubted the possibility of a mathematical solution to the
whole, even concluding that periodic divine intervention was necessary to guarantee the stability of the solar system.
Dispensing with the hypothesis of divine intervention would be a major activity of Laplace's scientific life.[11] It is
now generally regarded that Laplace's methods on their own, though vital to the development of the theory, are not
sufficiently precise to demonstrate the stability of the Solar System,[12] and indeed, the Solar System is now
understood to be chaotic, although it actually appears to be fairly stable.
One particular problem from observational astronomy was the apparent instability whereby Jupiter's orbit appeared
to be shrinking while that of Saturn was expanding. The problem had been tackled by Leonhard Euler in 1748 and
Joseph Louis Lagrange in 1763 but without success.[13] In 1776, Laplace published a memoir in which he first
explored the possible influences of a purported luminiferous ether or of a law of gravitation that did not act
instantaneously. He ultimately returned to an intellectual investment in Newtonian gravity.[14] Euler and Lagrange
had made a practical approximation by ignoring small terms in the equations of motion. Laplace noted that though
the terms themselves were small, when integrated over time they could become important. Laplace carried his
analysis into the higher-order terms, up to and including the cubic. Using this more exact analysis, Laplace
concluded that any two planets and the sun must be in mutual equilibrium and thereby launched his work on the
stability of the solar system.[15] Gerald James Whitrow described the achievement as "the most important advance in
physical astronomy since Newton".[11]
Laplace had a wide knowledge of all sciences and dominated all discussions in the Académie.[16] Laplace seems to
have regarded analysis merely as a means of attacking physical problems, though the ability with which he invented
the necessary analysis is almost phenomenal. As long as his results were true he took but little trouble to explain the
steps by which he arrived at them; he never studied elegance or symmetry in his processes, and it was sufficient for
him if he could by any means solve the particular question he was discussing.[6]

On the figure of the Earth


During the years 1784–1787 he published some memoirs of exceptional power. Prominent among these is one read
in 1783, reprinted as Part II of Théorie du Mouvement et de la figure elliptique des planètes in 1784, and in the third
volume of the Méchanique céleste. In this work, Laplace completely determined the attraction of a spheroid on a
particle outside it. This is memorable for the introduction into analysis of spherical harmonics or Laplace's
coefficients, and also for the development of the use of the potential, a name first used by George Green in 1828.[6]

Spherical harmonics
In 1783, in a paper sent to the Académie, Adrien-Marie Legendre had
introduced what are now known as associated Legendre functions.[6] If two
points in a plane have polar co-ordinates (r, θ) and (r ', θ'), where r ' ≥ r, then,
by elementary manipulation, the reciprocal of the distance between the points,
d, can be written as:

1/d = 1/√(r² + r′² − 2 r r′ cos(θ′ − θ)).

[Figure: spherical harmonics]

This expression can be expanded in powers of r/r′ using Newton's generalized binomial theorem to give:

1/d = (1/r′) Σ (k = 0 to ∞) (r/r′)^k P^0_k(cos(θ′ − θ)).

The sequence of functions P^0_k(cos φ) is the set of so-called "associated Legendre functions" and their usefulness
arises from the fact that every function of the points on a circle can be expanded as a series of them.[6]
Laplace, with scant regard for credit to Legendre, made the non-trivial extension of the result to three dimensions to
yield a more general set of functions, the spherical harmonics or Laplace coefficients. The latter term is not now in
common use. Every function of the points on a sphere can be expanded as a series of them.[6]
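
A numerical sketch of the two-dimensional expansion above (Python; assumes SciPy's eval_legendre is available) confirms that the truncated Legendre series reproduces the reciprocal distance:

import math
from scipy.special import eval_legendre  # assumes SciPy is installed

r, rp, gamma = 0.3, 2.0, 0.8  # r' >= r; gamma = angle between the points
d = math.sqrt(r**2 + rp**2 - 2 * r * rp * math.cos(gamma))

# Partial sum of (1/r') * sum_k (r/r')^k P_k(cos gamma):
series = sum((r / rp) ** k * eval_legendre(k, math.cos(gamma))
             for k in range(40)) / rp

print(1 / d, series)  # the two values agree to many decimal places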

Potential theory
This paper is also remarkable for the development of the idea of the scalar potential.[6] The gravitational force acting
on a body is, in modern language, a vector, having magnitude and direction. A potential function is a scalar function
that defines how the vectors will behave. A scalar function is computationally and conceptually easier to deal with
than a vector function.
Alexis Clairault had first suggested the idea in 1743 while working on a similar problem though he was using
Newtonian-type geometric reasoning. Laplace described Clairault's work as being "in the class of the most beautiful
mathematical productions".[17] However, Rouse Ball alleges that the idea "was appropriated from Joseph Louis
Lagrange, who had used it in his memoirs of 1773, 1777 and 1780".[6]
Laplace applied the language of calculus to the potential function and showed that it always satisfies the differential
equation:[6]

∇²V = 0.

His subsequent work on gravitational attraction was based on this result. The quantity ∇²V has been termed the
concentration of V, and its value at any point indicates the "excess" of the value of V there over its mean value in
the neighbourhood of the point.
the neighbourhood of the point. Laplace's equation, a special case of Poisson's equation, appears ubiquitously in
mathematical physics. The concept of a potential occurs in fluid dynamics, electromagnetism and other areas. Rouse
Ball speculated that it might be seen as "the outward sign" of one of the "prior forms" in Kant's theory of perception.[6]
The spherical harmonics turn out to be critical to practical solutions of Laplace's equation. Laplace's equation in
spherical coordinates, such as are used for mapping the sky, can be simplified, using the method of separation of
variables into a radial part, depending solely on distance from the centre point, and an angular or spherical part. The
solution to the spherical part of the equation can be expressed as a series of Laplace's spherical harmonics,
simplifying practical computation.

Planetary and lunar inequalities

Jupiter-Saturn great inequality


Laplace presented a memoir on planetary inequalities in three sections, in 1784, 1785, and 1786. This dealt mainly
with the identification and explanation of the perturbations now known as the "great Jupiter-Saturn inequality".
Laplace solved a longstanding problem in the study and prediction of the movements of these planets. He showed by
general considerations, first, that the mutual action of two planets could never cause large changes in the
eccentricities and inclinations of their orbits; but then, even more importantly, that peculiarities arose in the
Jupiter-Saturn system because of the near approach to commensurability of the mean motions of Jupiter and Saturn.
(Commensurability, in this context, means related by ratios of small whole numbers. Two periods of Saturn's orbit
around the Sun almost equal five of Jupiter's. The corresponding difference between multiples of the mean motions,
(2nJ − 5nS), corresponds to a period of nearly 900 years, and it occurs as a small divisor in the integration of a very
small perturbing force with this same period. As a result, the integrated perturbations with this period are
disproportionately large, about 0.8° (degrees of arc in orbital longitude) for Saturn and about 0.3° for Jupiter.)
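
A back-of-the-envelope check of this period (Python; the orbital periods are approximate modern values, not figures from the article):

# Rough check of the ~900-year period of the great Jupiter-Saturn
# inequality from the planets' mean motions (periods in Julian years).
T_jupiter, T_saturn = 11.862, 29.457             # approximate orbital periods
n_J, n_S = 360.0 / T_jupiter, 360.0 / T_saturn   # mean motions, deg/year

beat = 360.0 / abs(2 * n_J - 5 * n_S)            # period of 2nJ - 5nS
print(beat)  # roughly 900 years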
Further developments of these theorems on planetary motion were given in his two memoirs of 1788 and 1789, but
with the aid of Laplace's discoveries, the tables of the motions of Jupiter and Saturn could at last be made much more
accurate. It was on the basis of Laplace's theory that Delambre computed his astronomical tables.[6]

Lunar inequalities
Laplace also produced an analytical solution (as it turned out later, a partial solution), to a significant problem
regarding the motion of the Moon. Edmond Halley had been the first to suggest, in 1695,[18] that the mean motion of
the Moon was apparently getting faster, by comparison with ancient eclipse observations, but he gave no data. (It
was not yet known in Halley's or Laplace's times that what is actually occurring includes a slowing-down of the
Earth's rate of rotation: see also Ephemeris time - History. When measured as a function of mean solar time rather
than uniform time, the effect appears as a positive acceleration.) In 1749 Richard Dunthorne confirmed Halley's
suspicion after re-examining ancient records, and produced the first quantitative estimate for the size of this apparent
effect:[19] a centurial rate of +10" (arcseconds) in lunar longitude (a surprisingly good result for its time, not far
different from values assessed later, e.g. in 1786 by de Lalande[20] , and to compare with values from about 10" to
nearly 13" being derived about a century later.)[21] [22] The effect became known as the secular acceleration of the
Moon, but until Laplace, its cause remained unknown.
Laplace gave an explanation of the effect in 1787, showing how an acceleration arises from changes (a secular
reduction) in the eccentricity of the Earth's orbit, which in turn is one of the effects of planetary perturbations on the
Earth. Laplace's initial computation accounted for the whole effect, thus seeming to tie up the theory neatly with both
modern and ancient observations. However, in 1853, J C Adams caused the question to be re-opened by finding an
error in Laplace's computations: it turned out that only about half of the Moon's apparent acceleration could be

accounted for on Laplace's basis by the change in the Earth's orbital eccentricity.[23] (Adams showed that Laplace
had in effect only considered the radial force on the moon and not the tangential; the partial result had hence
overestimated the acceleration, and the remaining (negative) terms, when accounted for, showed that Laplace's cause
could not explain more than about half of the acceleration. The other half was subsequently shown to be due to tidal
acceleration.[24])
Laplace used his results concerning the lunar acceleration when completing his attempted "proof" of the stability of
the whole solar system on the assumption that it consists of a collection of rigid bodies moving in a vacuum.[6]
All the memoirs above alluded to were presented to the Académie des sciences, and they are printed in the Mémoires
présentés par divers savants.[6]

Celestial mechanics
Laplace now set himself the task to write a work which should "offer a complete solution of the great mechanical
problem presented by the solar system, and bring theory to coincide so closely with observation that empirical
equations should no longer find a place in astronomical tables." The result is embodied in the Exposition du système
du monde and the Mécanique céleste.[6]
The former was published in 1796, and gives a general explanation of the phenomena, but omits all details. It
contains a summary of the history of astronomy. This summary procured for its author the honour of admission to
the forty of the French Academy and is commonly esteemed one of the masterpieces of French literature, though it is
not altogether reliable for the later periods of which it treats.[6]
Laplace developed the nebular hypothesis of the formation of the solar system, first suggested by Emanuel
Swedenborg and expanded by Immanuel Kant, a hypothesis that continues to dominate accounts of the origin of
planetary systems. According to Laplace's description of the hypothesis, the solar system had evolved from a
globular mass of incandescent gas rotating around an axis through its centre of mass. As it cooled, this mass
contracted, and successive rings broke off from its outer edge. These rings in their turn cooled, and finally condensed
into the planets, while the sun represented the central core which was still left. On this view, Laplace predicted that
the more distant planets would be older than those nearer the sun.[6] [25]
As mentioned, the idea of the nebular hypothesis had been outlined by Immanuel Kant in 1755,[25] and he had also
suggested "meteoric aggregations" and tidal friction as causes affecting the formation of the solar system. Laplace
was probably aware of this, but, like many writers of his time, he generally did not reference the work of others.[3]
Laplace's analytical discussion of the solar system is given in his Méchanique céleste published in five volumes. The
first two volumes, published in 1799, contain methods for calculating the motions of the planets, determining their
figures, and resolving tidal problems. The third and fourth volumes, published in 1802 and 1805, contain
applications of these methods, and several astronomical tables. The fifth volume, published in 1825, is mainly
historical, but it gives as appendices the results of Laplace's latest researches. Laplace's own investigations embodied
in it are so numerous and valuable that it is regrettable to have to add that many results are appropriated from other
writers with scanty or no acknowledgement, and the conclusions – which have been described as the organized result
of a century of patient toil – are frequently mentioned as if they were due to Laplace.[6]
Jean-Baptiste Biot, who assisted Laplace in revising it for the press, says that Laplace himself was frequently unable
to recover the details in the chain of reasoning, and, if satisfied that the conclusions were correct, he was content to
insert the constantly recurring formula, "Il est aisé à voir que..." ("It is easy to see that..."). The Mécanique céleste is
not only the translation of Newton's Principia into the language of the differential calculus, but it completes parts of
which Newton had been unable to fill in the details. The work was carried forward in a more finely tuned form in
Félix Tisserand's Traité de mécanique céleste (1889–1896), but Laplace's treatise will always remain a standard
authority.[6]

Arcueil
In 1806, Laplace bought a house in Arcueil, then a village and not yet
absorbed into the Paris conurbation. Claude Louis Berthollet was a
near neighbour and the pair formed the nucleus of an informal
scientific circle, latterly known as the Society of Arcueil. Because of
their closeness to Napoleon, Laplace and Berthollet effectively
controlled advancement in the scientific establishment and admission
to the more prestigious offices. The Society built up a complex
pyramid of patronage.[26] In 1806, he was also elected a foreign
member of the Royal Swedish Academy of Sciences.

[Image: Laplace's house at Arcueil]

Napoleon
An account of a famous interaction between Laplace and Napoleon is provided by Rouse Ball:[6]
Laplace went in state to Napoleon to accept a copy of his work, and the following account of the interview is
well authenticated, and so characteristic of all the parties concerned that I quote it in full. Someone had told
Napoleon that the book contained no mention of the name of God; Napoleon, who was fond of putting
embarrassing questions, received it with the remark, 'M. Laplace, they tell me you have written this large book
on the system of the universe, and have never even mentioned its Creator.' Laplace, who, though the most
supple of politicians, was as stiff as a martyr on every point of his philosophy, drew himself up and answered
bluntly, 'Je n'avais pas besoin de cette hypothèse-là.' ("I had no need of that hypothesis.") Napoleon, greatly
amused, told this reply to Lagrange, who exclaimed, 'Ah! c'est une belle hypothèse; ça explique beaucoup de
choses.' ("Ah, it is a fine hypothesis; it explains many things.")

Black holes
Laplace also came close to propounding the concept of the black hole. He pointed out that there could be massive
stars whose gravity is so great that not even light could escape from their surface (see escape velocity).[27] Laplace
also speculated that some of the nebulae revealed by telescopes may not be part of the Milky Way and might actually
be galaxies themselves. Thus, he anticipated Edwin Hubble's major discovery 100 years in advance.
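
In modern notation, the critical radius at which the Newtonian escape velocity v = √(2GM/r) reaches the speed of light is r = 2GM/c²; a small Python sketch (illustrative physical constants) evaluates it for one solar mass:

# Newtonian "dark star" radius: the radius at which the escape velocity
# reaches the speed of light, i.e. r = 2GM/c^2.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m/s
M_sun = 1.989e30   # solar mass, kg

r = 2 * G * M_sun / c**2
print(r)  # about 2.95e3 m: roughly 3 km for one solar mass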

Analytic theory of probabilities


In 1812, Laplace issued his Théorie analytique des probabilités in which he laid down many fundamental results in
statistics. In 1819, he published a popular account of his work on probability. This book bears the same relation to
the Théorie des probabilités that the Système du monde does to the Méchanique céleste.[6]

Probability-generating function
The method of estimating the ratio of the number of favourable cases, compared to the whole number of possible
cases, had been previously indicated by Laplace in a paper written in 1779. It consists of treating the successive
values of any function as the coefficients in the expansion of another function, with reference to a different variable.
The latter is therefore called the probability-generating function of the former. Laplace then shows how, by means of
interpolation, these coefficients may be determined from the generating function. Next he attacks the converse
problem, and from the coefficients he finds the generating function; this is effected by the solution of a finite
difference equation.[6]
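
A small sketch of the idea in modern terms (Python; assumes SymPy is available): expanding a generating function recovers the sequence it encodes as the coefficients of successive powers.

from sympy import symbols, series

x = symbols('x')
# Generating function of the sequence a_k = (1/2)^k: sum over k of (x/2)^k.
g = 1 / (1 - x / 2)

# Expanding the generating function recovers the a_k as the
# coefficients of x**k.
print(series(g, x, 0, 6))  # 1 + x/2 + x**2/4 + x**3/8 + ...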

Least squares
This treatise includes an exposition of the method of least squares, a remarkable testimony to Laplace's command
over the processes of analysis. The method of least squares for the combination of numerous observations had been
given empirically by Carl Friedrich Gauss (around 1794) and Legendre (in 1805), but the fourth chapter of this work
contains a formal proof of it, on which the whole of the theory of errors has been since based. This was effected only
by a most intricate analysis specially invented for the purpose, but the form in which it is presented is so meagre and
unsatisfactory that, in spite of the uniform accuracy of the results, it was at one time questioned whether Laplace had
actually gone through the difficult work he so briefly and often incorrectly indicates.[6]
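
In modern form the method reduces to a small linear-algebra computation; a Python sketch follows (assumes NumPy; the data values are made up for illustration):

import numpy as np

# Least squares in its simplest modern form: choose slope a and
# intercept b minimising the sum of squared residuals.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)  # slope and intercept of the best-fitting line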

Inductive probability
While he conducted much research in physics, another major theme of his life's endeavours was probability theory.
In his Essai philosophique sur les probabilités (1814), Laplace set out a mathematical system of inductive reasoning
based on probability, which we would today recognise as Bayesian. He begins the text with a series of principles of
probability, the first six being:
1) Probability is the ratio of the "favored events" to the total possible events.
2) The first principle assumed equal probabilities for all events. When this is not true, we must first determine the
probabilities of each event. Then, the probability is the sum of the probabilities of all possible favored events.
3) For independent events, the probability of the occurrence of all is the probability of each multiplied together.
4) For events not independent, the probability of event B following event A (or event A causing B) is the probability
of A multiplied by the probability that, given A, B will occur.
5) The probability that A will occur, given B has occurred, is the probability of A and B occurring divided by the
probability of B.
6) Three corollaries are given for the sixth principle, which amount to Bayesian probability. Where event Ai ∈ {A1,
A2, ... An} exhausts the list of possible causes for event B, Pr(B) = Pr(A1) + Pr(A2) + ... + Pr(An). Then

Pr(Ai | B) = Pr(Ai) Pr(B | Ai) / Σj Pr(Aj) Pr(B | Aj).

One well-known formula arising from his system is the rule of succession, given as principle seven. Suppose that
some trial has only two possible outcomes, labeled "success" and "failure". Under the assumption that little or
nothing is known a priori about the relative plausibilities of the outcomes, Laplace derived a formula for the
probability that the next trial will be a success:

Pr(next trial is a success) = (s + 1)/(n + 2),
where s is the number of previously observed successes and n is the total number of observed trials. It is still used as
an estimator for the probability of an event if we know the event space, but only have a small number of samples.
The rule of succession has been subject to much criticism, partly due to the example which Laplace chose to
illustrate it. He calculated that the probability that the sun will rise tomorrow, given that it has never failed to in the
past, was

Pr(sun will rise tomorrow) = (d + 1)/(d + 2),
where d is the number of times the sun has risen in the past. This result has been derided as absurd, and some authors
have concluded that all applications of the Rule of Succession are absurd by extension. However, Laplace was fully
aware of the absurdity of the result; immediately following the example, he wrote, "But this number [i.e., the
probability that the sun will rise tomorrow] is far greater for him who, seeing in the totality of phenomena the
principle regulating the days and seasons, realizes that nothing at the present moment can arrest the course of it."[28]
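
A direct Python sketch of the rule (illustrative; the sunrise count d below is a placeholder, not a figure from Laplace):

def rule_of_succession(successes, trials):
    # Laplace's estimate of the probability that the next trial succeeds.
    return (successes + 1) / (trials + 2)

print(rule_of_succession(0, 0))    # 1/2: total ignorance
print(rule_of_succession(9, 10))   # 10/12
d = 2_000_000                      # placeholder count of past sunrises
print(rule_of_succession(d, d))    # very close to 1, as in the example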

Laplace's demon
Laplace strongly believed in causal determinism, which is expressed in the following quote from the introduction to
the Essai:
We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect
which at a certain moment would know all forces that set nature in motion, and all positions of all items of
which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would
embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom;
for such an intellect nothing would be uncertain and the future just like the past would be present before its
eyes.
—Pierre Simon Laplace, A Philosophical Essay on Probabilities[29]
This intellect is often referred to as Laplace's Superman or Laplace's demon (in the same vein as Maxwell's demon).
Note that the description of the hypothetical intellect described above by Laplace as a demon does not come from
Laplace, but from later biographers: Laplace saw himself as a scientist who hoped that humanity would progress in a
better scientific understanding of the world, which, if and when eventually completed, would still need a tremendous
calculating power to compute it all in a single instant.

Laplace transforms
As early as 1744, Euler, followed by Lagrange, had started looking for solutions of differential equations in the
form:[30]

z = ∫ X(x) e^(ax) dx  and  z = ∫ X(x) x^A dx.
In 1785, Laplace took the key forward step in using integrals of this form in order to transform a whole difference
equation, rather than simply as a form for the solution, and found that the transformed equation was easier to solve
than the original.[31] [32]
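
A sketch of the modern Laplace transform that grew out of this step (Python; assumes SymPy's laplace_transform, which returns the transform together with its region of convergence):

from sympy import symbols, exp, laplace_transform

t, s = symbols('t s', positive=True)

# Modern form: F(s) = integral of f(t) * exp(-s*t) over t from 0 to oo.
F = laplace_transform(exp(-2 * t), t, s)
print(F)  # (1/(s + 2), -2, True): transform, abscissa of convergence, condition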

Other discoveries and accomplishments

Mathematics
Amongst the other discoveries of Laplace in pure and applicable mathematics are:
• Discussion, contemporaneously with Alexandre-Théophile Vandermonde, of the general theory of determinants,
(1772);[6]
• Proof that every equation of an even degree must have at least one real quadratic factor;[6]
• Solution of the linear partial differential equation of the second order;[6]
• He was the first to consider the difficult problems involved in equations of mixed differences, and to prove that
the solution of an equation in finite differences of the first degree and the second order might be always obtained
in the form of a continued fraction;[6] and
• In his theory of probabilities:
• Evaluation of several common definite integrals;[6] and
• General proof of the Lagrange reversion theorem.[6]

Surface tension
Laplace built upon the qualitative work of Thomas Young to develop the theory of capillary action and the
Young-Laplace equation.

Speed of sound
Laplace in 1816 was the first to point out that the speed of sound in air depends on the heat capacity ratio. Newton's
original theory gave too low a value, because it does not take account of the adiabatic compression of the air which
results in a local rise in temperature and pressure. Laplace's investigations in practical physics were confined to those
carried on by him jointly with Lavoisier in the years 1782 to 1784 on the specific heat of various bodies.[6]
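
A worked check of the correction (Python; the sea-level values for air are illustrative approximations):

# Newton's isothermal estimate vs. Laplace's adiabatic correction for
# the speed of sound in air.
p, rho, gamma = 101_325.0, 1.204, 1.4   # Pa, kg/m^3, heat capacity ratio

newton = (p / rho) ** 0.5               # isothermal: too low
laplace = (gamma * p / rho) ** 0.5      # adiabatic: close to observation
print(newton, laplace)                  # about 290 m/s vs about 343 m/s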

Political ambitions
According to W. W. Rouse Ball, as Napoleon's power increased Laplace begged him to give him the post of Minister
of the Interior. However this is disputed by Pearson.[3] Napoleon, who desired the support of men of science, did
make him Minister of the Interior in November 1799, but a little less than six weeks saw the close of Laplace's
political career. Napoleon later (in his Mémoires de Sainte Hélène) wrote of his dismissal as follows:[6]
Géomètre de premier rang, Laplace ne tarda pas à se montrer administrateur plus que médiocre; dès son
premier travail nous reconnûmes que nous nous étions trompé. Laplace ne saisissait aucune question sous son
véritable point de vue: il cherchait des subtilités partout, n'avait que des idées problématiques, et portait enfin
l'esprit des `infiniment petits' jusque dans l'administration. (Geometrician of the first rank, Laplace was not
long in showing himself a worse than average administrator; since his first actions in office we recognized our
mistake. Laplace did not consider any question from the right angle: he sought subtleties everywhere, only
conceived problems, and finally carried the spirit of "infinitesimals" into the administration.)
Lucien, Napoleon's brother, was given the post. Although Laplace was removed
from office, it was desirable to retain his allegiance. He was accordingly raised to
the senate, and to the third volume of the Mécanique céleste he prefixed a note
that of all the truths therein contained the most precious to the author was the
declaration he thus made of his devotion towards the peacemaker of Europe. In
copies sold after the Bourbon Restoration this was struck out. (Pearson points out
that the censor would not have allowed it anyway.) In 1814 it was evident that
the empire was falling; Laplace hastened to tender his services to the Bourbons,
and in 1817 during the Restoration he was rewarded with the title of marquis.

According to Rouse Ball, the contempt that his more honest colleagues felt for
his conduct in the matter may be read in the pages of Paul Louis Courier. His
knowledge was useful on the numerous scientific commissions on which he
served, and probably accounts for the manner in which his political insincerity was overlooked.[6]
He died in Paris in 1827. His brain was removed by his physician, François Magendie, and kept for many years,
eventually being displayed in a roving anatomical museum in Britain. It was reportedly smaller than the average
brain.[3]

Honours
• Asteroid 4628 Laplace is named for him.[33]
• He is one of only seventy-two people to have their name engraved on the Eiffel Tower.
• The European Space Agency's working-title for the international Europa Jupiter System Mission is "Laplace".

Quotes
• What we know is not much. What we do not know is immense. (attributed)
• I had no need of that hypothesis. ("Je n'avais pas besoin de cette hypothèse-là", as a reply to Napoleon, who had
asked why he hadn't mentioned God in his book on astronomy.)
• "It is therefore obvious that ..." (frequently used in the Celestial Mechanics when he had proved something and
mislaid the proof, or found it clumsy. Notorious as a signal for something true, but hard to prove.)
• The weight of evidence for an extraordinary claim must be proportioned to its strangeness.[34]
• "...(This simplicity of ratios will not appear astonishing if we consider that) all the effects of nature are only
mathematical results of a small number of immutable laws." [29]

Bibliography

By Laplace
• Œuvres complètes de Laplace [35], 14 vol. (1878–1912), Paris: Gauthier-Villars (copy from Gallica in French)
• Théorie du mouvement et de la figure elliptique des planètes (1784) Paris (not in Œuvres complètes)
• Précis de l'histoire de l'astronomie [36]

English translations
• Bowditch, N. (trans.) (1829–1839) Mécanique céleste, 4 vols, Boston
• New edition by Reprint Services ISBN 078122022X
• — [1829–1839] (1966–1969) Celestial Mechanics, 5 vols, including the original French
• Pound, J. (trans.) (1809) The System of the World, 2 vols, London: Richard Phillips
• — The System of the World (v.1) [37]
• — The System of the World (v.2) [38]
• — [1809] (2007) The System of the World, vol.1, Kessinger, ISBN 1432653679
• Toplis, J. (trans.) (1814) A treatise upon analytical mechanics [39] Nottingham: H. Barnett
• Truscott, F. W. & Emory, F. L. (trans.) (2007) [1902]. A Philosophical Essay on Probabilities.
ISBN 1602063281., translated from the French 6th ed. (1840)
• A Philosophical Essay on Probabilities (1902) [40] at the Internet Archive

About Laplace and his work


• Andoyer, H. (1922). L'œuvre scientifique de Laplace. Paris: Payot. (in French)
• Bigourdan, G. (1931). "La jeunesse de P.-S. Laplace" (in French). La Science moderne 9: 377–384.
• Crosland, M. (1967). The Society of Arcueil: A View of French Science at the Time of Napoleon I. Cambridge
MA: Harvard University Press. ISBN 043554201X.
• Dale, A. I. (1982). "Bayes or Laplace? an examination of the origin and early application of Bayes' theorem".
Archive for the History of the Exact Sciences 27: 23–47.
• David, F. N. (1965) "Some notes on Laplace", in Neyman, J. & LeCam, L. M. (eds) Bernoulli, Bayes and
Laplace, Berlin, pp30–44

• Deakin, M. A. B. (1981). "The development of the Laplace transform". Archive for the History of the Exact
Sciences 25: 343–390. doi:10.1007/BF01395660.
• — (1982). "The development of the Laplace transform". Archive for the History of the Exact Sciences 26:
351–381. doi:10.1007/BF00418754.
• Dhombres, J. (1989). "La théorie de la capillarité selon Laplace: mathématisation superficielle ou étendue" (in
French). Revue d'Histoire des sciences et de leurs applications 62: 43–70.
• Duveen, D. & Hahn, R. (1957). "Laplace's succession to Bezout's post of Examinateur des élèves de l'artillerie".
Isis 48: 416–427. doi:10.1086/348608.
• Finn, B. S. (1964). "Laplace and the speed of sound". Isis 55: 7–19. doi:10.1086/349791.
• Fourier, J. B. J. (1827). "Éloge historique de M. le Marquis de Laplace". Mémoires de l'Académie Royale des
Sciences 10: lxxxi–cii., delivered 15 June 1829, published in 1831. (in French)
• Gillispie, C. C. (1972). "Probability and politics: Laplace, Condorcet, and Turgot". Proceedings of the American
Philosophical Society 116(1): 1–20.
• — (1997) Pierre Simon Laplace 1749–1827: A Life in Exact Science, Princeton: Princeton University Press,
ISBN 0-691-01185-0
• Grattan-Guinness, I., 2005, "'Exposition du système du monde' and 'Traité de méchanique céleste'" in his
Landmark Writings in Western Mathematics. Elsevier: 242–57.
• Hahn, R. (1955). "Laplace's religious views". Archives internationales d'histoire des sciences 8: 38–40.
• — (1982). Calendar of the Correspondence of Pierre Simon Laplace (Berkeley Papers in the History of Science,
vol.8 ed.). Berkeley, CA: University of California.
• — (1994). New Calendar of the Correspondence of Pierre Simon Laplace (Berkeley Papers in the History of
Science, vol.16 ed.). Berkeley, CA: University of California.
• — (2005) Pierre Simon Laplace 1749–1827: A Determined Scientist, Cambridge, MA: Harvard University Press,
ISBN 0-674-01892-3
• Israel, Werner (1987). "Dark stars: the evolution of an idea". in Hawking, Stephen W.; Israel, Werner. 300 Years
of Gravitation. Cambridge University Press. pp. 199–276
• O'Connor, John J.; Robertson, Edmund F., "Pierre-Simon Laplace" [41], MacTutor History of Mathematics
archive, University of St Andrews. (1999)
• Rouse Ball, W. W. [1908] (2003) "Pierre Simon Laplace (1749–1827) [42]", in A Short Account of the History of
Mathematics, 4th ed., Dover, ISBN 0486206300
• Stigler, S. M. (1975). "Napoleonic statistics: the work of Laplace" [43]. Biometrika (Biometrika, Vol. 62, No. 2)
62 (2): 503–517. doi:10.2307/2335393.
• — (1978). "Laplace's early work: chronology and citations". Isis 69(2): 234–254.
• Whitrow, G. J. (2001) "Laplace, Pierre-Simon, marquis de", Encyclopaedia Britannica, Deluxe CDROM edition
• Whittaker, E. T. (1949a). "Laplace" [44]. Mathematical Gazette (The Mathematical Gazette, Vol. 33, No. 303) 33
(303): 1–12. doi:10.2307/3608408.
• — (1949b). "Laplace". American Mathematical Monthly 56(6): 369–372.
• Wilson, C. (1985). "The Great Inequality of Jupiter and Saturn: from Kepler to Laplace". Archive for the History
of the Exact Sciences 33(1–3): 15–290. doi:10.1007/BF00328048.
• Young, T. (1821). Elementary Illustrations of the Celestial Mechanics of Laplace: Part the First, Comprehending
the First Book [45]. London: John Murray. (available from Google Books)

External links
• "Laplace, Pierre (1749–1827)" [46]. Eric Weisstein's World of Scientific Biography. Wolfram Research. Retrieved
2007-08-24.
• "Pierre-Simon Laplace [41]" in the MacTutor History of Mathematics archive.
• "Bowditch's English translation of Laplace's preface" [47]. Méchanique Céleste. The MacTutor History of
Mathematics archive. Retrieved 2007-09-04.
• Guide to the Pierre Simon Laplace Papers [48] at The Bancroft Library
• Pierre-Simon Laplace [49] at the Mathematics Genealogy Project
• English translation [50] of a large part of Laplace's work in probability and statistics, provided by Richard
Pulskamp [51]

References
[1] Stephen M. Stigler (1986) The history of statistics. Harvard University Press. Chapter 3.
[2] [Anon.] (1911) "Pierre Simon, Marquis De Laplace (http://www.1911encyclopedia.org/Pierre_Simon,_Marquis_De_Laplace)",
Encyclopaedia Britannica
[3] "Laplace, being Extracts from Lectures delivered by Karl Pearson", Biometrika, vol. 21, Dec. 1929, pp. 202–16
[4] O'Connor, John J.; Robertson, Edmund F., "Pierre-Simon Laplace" (http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html),
MacTutor History of Mathematics archive, University of St Andrews, accessed 25 August 2007
[5] Gillispie (1997) pp3–4
[6] Rouse Ball (1908)
[7] Gillispie (1997) pp5
[8] "Pierre-Simon Laplace". Catholic Encyclopedia. New York: Robert Appleton Company. 1913.
[9] Gillispie (1989) pp7–12
[10] Gillispie (1989) pp14–15
[11] Whitrow (2001)
[12] Celletti, A. & Perozzi, E. (2007). Celestial Mechanics: The Waltz of the Planets. Berlin: Springer. pp. 91–93. ISBN 0-387-30777-X.
[13] Whittaker (1949b)
[14] Gillispie (1989) pp29–35
[15] Gillispie (1989) pp35–36
[16] School of Mathematics and Statistics (http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html), University of St Andrews,
Scotland.
[17] Grattan-Guinness, I. (2003). Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences
(http://books.google.com/?id=f5FqsDPVQ2MC&pg=PA1098&lpg=PA1098&dq=laplace+potential+1784). Baltimore: Johns Hopkins University Press.
pp. 1097–1098. ISBN 0801873967.
[18] E Halley (1695), "Some Account of the Ancient State of the City of Palmyra, with Short Remarks upon the Inscriptions Found there"
(http://rstl.royalsocietypublishing.org/content/19/215-235/160.full.pdf), Phil. Trans., vol.19 (1695-1697), pages 160-175; esp. at pages 174-175.
[19] Richard Dunthorne (1749), "A Letter from the Rev. Mr. Richard Dunthorne to the Reverend Mr. Richard Mason F. R. S. and Keeper of the
Wood-Wardian Museum at Cambridge, concerning the Acceleration of the Moon"
(http://rstl.royalsocietypublishing.org/content/46/491-496/162.full.pdf), Philosophical Transactions (1683-1775), Vol. 46 (1749-1750) #492,
pp.162-172; also given in Philosophical Transactions (abridgements) (1809), vol.9 (for 1744-49), p669-675
(http://www.archive.org/stream/philosophicaltra09royarich#page/669/mode/2up) as "On the Acceleration of the Moon, by the Rev. Richard Dunthorne".
[20] J de Lalande (1786): "Sur les equations seculaires du soleil et de la lune"
(http://www.academie-sciences.fr/membres/in_memoriam/Lalande/Lalande_pdf/Mem1786_p390.pdf), Memoires de l'Academie Royale des
Sciences, pp.390-397, at page 395.
[21] J D North (2008), "Cosmos: an illustrated history of astronomy and cosmology", (University of Chicago Press, 2008), chapter 14, at page
454 (http://books.google.com/books?id=qq8Luhs7rTUC&pg=PA454).
[22] See also P Puiseux (1879), "Sur l'acceleration seculaire du mouvement de la Lune"
(http://archive.numdam.org/article/ASENS_1879_2_8__361_0.pdf), Annales Scientifiques de l'Ecole Normale Superieure, 2nd series vol.8
(1879), pp.361-444, at pages 361-5.
[23] J C Adams (1853), "On the Secular Variation of the Moon's Mean Motion"
(http://rstl.royalsocietypublishing.org/content/143/397.full.pdf), in Phil. Trans. R. Soc. Lond., vol.143 (1853), pages 397-406.
[24] Roy, A. E. (2005). Orbital Motion (http://books.google.com/?id=Hzv7k2vH6PgC&pg=PA313&lpg=PA313&dq=laplace+secular+acceleration).
London: CRC Press. pp. 313. ISBN 0750310154.
[25] Owen, T. C. (2001) "Solar system: origin of the solar system", Encyclopaedia Britannica, Deluxe CDROM edition
[26] Crosland (1967) p.1
[27] See Israel (1987), sec. 7.2.

[28] Laplace, Pierre Simon, A Philosophical Essay on Probabilities, translated from the 6th French edition by Frederick Wilson Truscott and
Frederick Lincoln Emory, Dover Publications (New York, 1951)
[29] Laplace, Pierre Simon, A Philosophical Essay on Probabilities, translated from the 6th French edition by Frederick Wilson Truscott and
Frederick Lincoln Emory, Dover Publications (New York, 1951) pp.4
[30] Grattan-Guiness, in Gillispie (1997) p.260
[31] Grattan-Guiness, in Gillispie (1997) pp261–262
[32] Deakin (1981)
[33] Schmadel, L. D. (2003). Dictionary of Minor Planet Names (5th rev. ed.). Berlin: Springer-Verlag. ISBN 3540002383.
[34] A sense of place in the heartland (http://www.jsonline.com/story/index.aspx?id=497783&format=print), The Milwaukee Journal
Sentinel Online
[35] http://gallica.bnf.fr/ark:/12148/bpt6k775950
[36] http://books.google.com/books?id=QYpOb3N7zBMC
[37] http://books.google.com/books?id=yW3nd4DSgYYC
[38] http://books.google.com/books?id=f7Kv2iFUNJoC
[39] http://books.google.com/books?id=c2YSAAAAIAAJ
[40] http://www.archive.org/details/philosophicaless00lapliala
[41] http://www-history.mcs.st-andrews.ac.uk/Biographies/Laplace.html
[42] http://www.maths.tcd.ie/pub/HistMath/People/Laplace/RouseBall/RB_Laplace.html
[43] http://jstor.org/stable/2335393
[44] http://jstor.org/stable/3608408
[45] http://books.google.com/?id=20AJAAAAIAAJ&dq=laplace
[46] http://scienceworld.wolfram.com/biography/Laplace.html
[47] http://www-history.mcs.st-andrews.ac.uk/history/Extras/Laplace_mechanique_celeste.html
[48] http://www.oac.cdlib.org/findaid/ark:/13030/kt8q2nf3g7/
[49] http://genealogy.math.ndsu.nodak.edu/id.php?id=108295
[50] http://www.cs.xu.edu/math/Sources/Laplace/index.html
[51] http://www.cs.xu.edu/math/Sources/index.html

Integral
Integration is an important concept in mathematics and, together with
differentiation, is one of the two main operations in calculus. Given a
function ƒ of a real variable x and an interval [a, b] of the real line, the
definite integral

[Figure: a definite integral of a function can be represented as the signed area of the region bounded by its graph.]

\int_a^b f(x)\,dx

is defined informally to be the net signed area of the region in the xy-plane bounded by the graph of ƒ, the x-axis, and
the vertical lines x = a and x = b.
The term integral may also refer to the notion of antiderivative, a function F whose derivative is the given function ƒ.
In this case, it is called an indefinite integral, while the integrals discussed in this article are termed definite
integrals. Some authors maintain a distinction between antiderivatives and indefinite integrals.
The principles of integration were formulated independently by Isaac Newton and Gottfried Leibniz in the late 17th
century. Through the fundamental theorem of calculus, which they independently developed, integration is
connected with differentiation: if ƒ is a continuous real-valued function defined on a closed interval [a, b], then, once
an antiderivative F of ƒ is known, the definite integral of ƒ over that interval is given by

\int_a^b f(x)\,dx = F(b) - F(a).

Integrals and derivatives became the basic tools of calculus, with numerous applications in science and engineering.
A rigorous mathematical definition of the integral was given by Bernhard Riemann. It is based on a limiting
procedure which approximates the area of a curvilinear region by breaking the region into thin vertical slabs.
Beginning in the nineteenth century, more sophisticated notions of integrals began to appear, where the type of the
function as well as the domain over which the integration is performed has been generalised. A line integral is
defined for functions of two or three variables, and the interval of integration [a, b] is replaced by a certain curve
connecting two points on the plane or in the space. In a surface integral, the curve is replaced by a piece of a surface
in the three-dimensional space. Integrals of differential forms play a fundamental role in modern differential
geometry. These generalizations of integral first arose from the needs of physics, and they play an important role in
the formulation of many physical laws, notably those of electrodynamics. There are many modern concepts of
integration, among these, the most common is based on the abstract mathematical theory known as Lebesgue
integration, developed by Henri Lebesgue.

History

Pre-calculus integration
Integration can be traced as far back as ancient Egypt ca. 1800 BC, with the Moscow Mathematical Papyrus
demonstrating knowledge of a formula for the volume of a pyramidal frustum. The first documented systematic
technique capable of determining integrals is the method of exhaustion of Eudoxus (ca. 370 BC), which sought to
find areas and volumes by breaking them up into an infinite number of shapes for which the area or volume was
known. This method was further developed and employed by Archimedes and used to calculate areas for parabolas
and an approximation to the area of a circle. Similar methods were independently developed in China around the 3rd
century AD by Liu Hui, who used it to find the area of the circle. This method was later used in the 5th century by
Chinese father-and-son mathematicians Zu Chongzhi and Zu Geng to find the volume of a sphere.[1] That same
century, the Indian mathematician Aryabhata used a similar method in order to find the volume of a cube.[2]
The next major step in integral calculus came in Iraq when the 11th century mathematician Ibn al-Haytham (known
as Alhazen in Europe) devised what is now known as "Alhazen's problem", which leads to an equation of the fourth
degree, in his Book of Optics. While solving this problem, he performed an integration in order to find the volume of
a paraboloid. Using mathematical induction, he was able to generalize his result for the integrals of polynomials up
to the fourth degree. He thus came close to finding a general formula for the integrals of polynomials, but he was not
concerned with any polynomials higher than the fourth degree.[3] Some ideas of integral calculus are also found in
the Siddhanta Shiromani, a 12th century astronomy text by Indian mathematician Bhāskara II.
The next significant advances in integral calculus did not begin to appear until the 16th century. At this time the
work of Cavalieri with his method of indivisibles, and work by Fermat, began to lay the foundations of modern
calculus. Further steps were made in the early 17th century by Barrow and Torricelli, who provided the first hints of
a connection between integration and differentiation.
At around the same time, there was also a great deal of work being done by Japanese mathematicians, particularly by
Seki Kōwa.[4] He made a number of contributions, namely in methods of determining areas of figures using
integrals, extending the method of exhaustion.

Newton and Leibniz


The major advance in integration came in the 17th century with the independent discovery of the fundamental
theorem of calculus by Newton and Leibniz. The theorem demonstrates a connection between integration and
differentiation. This connection, combined with the comparative ease of differentiation, can be exploited to calculate
integrals. In particular, the fundamental theorem of calculus allows one to solve a much broader class of problems.
Equal in importance is the comprehensive mathematical framework that both Newton and Leibniz developed. Given
the name infinitesimal calculus, it allowed for precise analysis of functions within continuous domains. This
framework eventually became modern calculus, whose notation for integrals is drawn directly from the work of
Leibniz.

Formalizing integrals
While Newton and Leibniz provided a systematic approach to integration, their work lacked a degree of rigour.
Bishop Berkeley memorably attacked infinitesimals as "the ghosts of departed quantities". Calculus acquired a
firmer footing with the development of limits and was given a suitable foundation by Cauchy in the first half of the
19th century. Integration was first rigorously formalized, using limits, by Riemann. Although all bounded piecewise
continuous functions are Riemann integrable on a bounded interval, subsequently more general functions were
considered, to which Riemann's definition does not apply, and Lebesgue formulated a different definition of integral,
founded in measure theory (a subfield of real analysis). Other definitions of integral, extending Riemann's and
Lebesgue's approaches, were proposed.

Notation
Isaac Newton used a small vertical bar above a variable to indicate integration, or placed the variable inside a box.
The vertical bar was easily confused with ẋ or x′, which Newton used to indicate differentiation, and the box
notation was difficult for printers to reproduce, so these notations were not widely adopted.
The modern notation for the indefinite integral was introduced by Gottfried Leibniz in 1675 (Burton 1988, p. 359;
Leibniz 1899, p. 154). He adapted the integral symbol, ∫, from an elongated letter s, standing for summa (Latin for
"sum" or "total"). The modern notation for the definite integral, with limits above and below the integral sign, was
first used by Joseph Fourier in Mémoires of the French Academy around 1819–20, reprinted in his book of 1822
(Cajori 1929, pp. 249–250; Fourier 1822, §231).

Terminology and notation


If a function has an integral, it is said to be integrable. The function for which the integral is calculated is called the
integrand. The region over which a function is being integrated is called the domain of integration. Usually this
domain will be an interval in which case it is enough to give the limits of that interval, which are called the limits of
integration. If the integral does not have a domain of integration, it is considered indefinite (one with a domain is
considered definite). In general, the integrand may be a function of more than one variable, and the domain of
integration may be an area, volume, a higher dimensional region, or even an abstract space that does not have a
geometric structure in any usual sense.
The simplest case, the integral of a real-valued function f of one real variable x on the interval [a, b], is denoted by

\int_a^b f(x)\,dx.

The ∫ sign represents integration; a and b are the lower limit and upper limit, respectively, of integration, defining
the domain of integration; f is the integrand, to be evaluated as x varies over the interval [a,b]; and dx is the variable
of integration. In correct mathematical typography, the dx is separated from the integrand by a space (as shown).
Some authors use an upright d (a roman "d" in dx rather than an italic one).

The variable of integration dx has different interpretations depending on the theory being used. For example, it can
be seen as strictly a notation indicating that x is a dummy variable of integration, as a reflection of the weights in the
Riemann sum, a measure (in Lebesgue integration and its extensions), an infinitesimal (in non-standard analysis) or
as an independent mathematical quantity: a differential form. More complicated cases may vary the notation slightly.
In so-called modern Arabic mathematical notation, which aims at pre-university levels of education in the Arab
world and is written from right to left, an inverted integral symbol is used (W3C 2006).

Introduction
Integrals appear in many practical situations. Consider a swimming pool. If it is rectangular with a flat bottom, then
from its length, width, and depth we can easily determine the volume of water it can contain (to fill it), the area of its
surface (to cover it), and the length of its edge (to rope it). But if it is oval with a rounded bottom, all of these
quantities call for integrals. Practical approximations may suffice for such trivial examples, but precision engineering
(of any discipline) requires exact and rigorous values for these elements.
To start off, consider the curve y = f(x) between x = 0 and x = 1, with
f(x) = √x. We ask:
What is the area under the function f, in the interval from 0 to 1?
and call this (yet unknown) area the integral of f. The notation for this
integral will be

\int_0^1 \sqrt{x}\,dx.

[Figure: approximations to the integral of √x from 0 to 1, with 5 right samples (above) and 12 left samples (below).]

As a first approximation, look at the unit square given by the sides x = 0 to x = 1 and y = f(0) = 0 and y = f(1) = 1. Its
area is exactly 1. As it is, the true value of the integral must be somewhat less. Decreasing the width of the
approximation rectangles shall give a better result; so cross the interval in five steps, using the approximation points
0, 1⁄5, 2⁄5, and so on to 1. Fit a box for each step using the right end height of each curve piece, thus √1⁄5, √2⁄5, and so
on to √1 = 1. Summing the areas of these rectangles, we get a better approximation for the sought integral, namely

\sqrt{\tfrac{1}{5}}\left(\tfrac{1}{5}-0\right)+\sqrt{\tfrac{2}{5}}\left(\tfrac{2}{5}-\tfrac{1}{5}\right)+\cdots+\sqrt{\tfrac{5}{5}}\left(1-\tfrac{4}{5}\right) \approx 0.7497.

Notice that we are taking a sum of finitely many function values of f, multiplied with the differences of two
subsequent approximation points. We can easily see that the approximation is still too large. Using more steps
produces a closer approximation, but will never be exact: replacing the 5 subintervals by twelve as depicted, we will
get an approximate value for the area of 0.6203, which is too small. The key idea is the transition from adding
finitely many differences of approximation points multiplied by their respective function values to using infinitely
many fine, or infinitesimal steps.
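These approximations are easy to reproduce numerically. The following sketch (the helper endpoint_sum is mine, not from the text) recovers the five-step right-endpoint value of about 0.7497 and the twelve-step left-endpoint value of about 0.6203 described above, which bracket the true area:

```python
# Reproduce the right- and left-endpoint approximations of the area
# under f(x) = sqrt(x) on [0, 1] described in the text.
from math import sqrt

def endpoint_sum(f, a, b, n, right=True):
    """Sum f at the right (or left) end of n equal steps, times the step width."""
    h = (b - a) / n
    offset = 1 if right else 0
    return sum(f(a + (i + offset) * h) * h for i in range(n))

print(endpoint_sum(sqrt, 0, 1, 5, right=True))    # ~0.7497, too large
print(endpoint_sum(sqrt, 0, 1, 12, right=False))  # ~0.6203, too small
# Both approach the exact value 2/3 as the number of steps grows.
```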
As for the actual calculation of integrals, the fundamental theorem of calculus, due to Newton and Leibniz, is the
fundamental link between the operations of differentiating and integrating. Applied to the square root curve, f(x) =
x^{1/2}, it says to look at the antiderivative F(x) = (2/3)x^{3/2}, and simply take F(1) − F(0), where 0 and 1 are the boundaries
of the interval [0,1]. So the exact value of the area under the curve is computed formally as

\int_0^1 \sqrt{x}\,dx = F(1) - F(0) = \tfrac{2}{3}.

(This is a case of a general rule, that for f(x) = x^q, with q ≠ −1, the related function, the so-called antiderivative, is
F(x) = x^{q+1}/(q + 1).)
The notation

\int f(x)\,dx

conceives the integral as a weighted sum, denoted by the elongated s, of function values, f(x), multiplied by
infinitesimal step widths, the so-called differentials, denoted by dx. The multiplication sign is usually omitted.
Historically, after the failure of early efforts to rigorously interpret infinitesimals, Riemann formally defined
integrals as a limit of weighted sums, so that the dx suggested the limit of a difference (namely, the interval width).
Shortcomings of Riemann's dependence on intervals and continuity motivated newer definitions, especially the
Lebesgue integral, which is founded on an ability to extend the idea of "measure" in much more flexible ways. Thus
the notation

\int_A f\,d\mu

refers to a weighted sum in which the function values are partitioned, with μ measuring the weight to be assigned to
each value. Here A denotes the region of integration.
Differential geometry, with its "calculus on manifolds", gives the familiar notation yet another interpretation. Now
f(x) and dx become a differential form, ω = f(x) dx, a new differential operator d, known as the exterior derivative
appears, and the fundamental theorem becomes the more general Stokes' theorem,

\int_\Omega d\omega = \int_{\partial\Omega} \omega,

from which Green's theorem, the divergence theorem, and the fundamental theorem of calculus follow.
More recently, infinitesimals have reappeared with rigor, through modern innovations such as non-standard analysis.
Not only do these methods vindicate the intuitions of the pioneers; they also lead to new mathematics.
Although there are differences between these conceptions of integral, there is considerable overlap. Thus, the area of
the surface of the oval swimming pool can be handled as a geometric ellipse, a sum of infinitesimals, a Riemann
integral, a Lebesgue integral, or as a manifold with a differential form. The calculated result will be the same for all.

Formal definitions
There are many ways of formally defining an integral, not all of which are equivalent. The differences exist mostly
to deal with differing special cases which may not be integrable under other definitions, but also occasionally for
pedagogical reasons. The most commonly used definitions of integral are Riemann integrals and Lebesgue integrals.

Riemann integral
The Riemann integral is defined in terms of Riemann sums of
functions with respect to tagged partitions of an interval. Let [a,b] be a
closed interval of the real line; then a tagged partition of [a,b] is a
finite sequence

a = x_0 \le t_1 \le x_1 \le t_2 \le x_2 \le \cdots \le x_{n-1} \le t_n \le x_n = b.

[Figure: an integral approached as a Riemann sum based on a tagged partition, with irregular sampling positions and widths (max in red). True value is 3.76; estimate is 3.648.]

This partitions the interval [a,b] into n sub-intervals [x_{i−1}, x_i] indexed by i, each of which is "tagged" with a
distinguished point t_i ∈ [x_{i−1}, x_i]. A Riemann sum of a function f with respect to such a tagged partition is
defined as

\sum_{i=1}^{n} f(t_i)\,(x_i - x_{i-1});

[Figure: Riemann sums converging as the intervals halve, whether sampled at the right, minimum, maximum, or left of each sub-interval.]

thus each term of the sum is the area of a rectangle with height equal to the function value at the distinguished point
of the given sub-interval, and width the same as the sub-interval width. Let Δ_i = x_i − x_{i−1} be the width of sub-interval
i; then the mesh of such a tagged partition is the width of the largest sub-interval formed by the partition,
max_{i=1…n} Δ_i. The Riemann integral of a function f over the interval [a,b] is equal to S if:
For all ε > 0 there exists δ > 0 such that, for any tagged partition of [a,b] with mesh less than δ, we have

\left| S - \sum_{i=1}^{n} f(t_i)\,\Delta_i \right| < \varepsilon.

When the chosen tags give the maximum (respectively, minimum) value of each interval, the Riemann sum becomes
an upper (respectively, lower) Darboux sum, suggesting the close connection between the Riemann integral and the
Darboux integral.
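The definition transcribes almost directly into code. In this sketch (riemann_sum and the random partitions are my own illustrative choices, not from the text), Riemann sums over arbitrary tagged partitions of [0, 1] settle near \int_0^1 \sqrt{x}\,dx = 2/3 as the mesh shrinks, regardless of where the tags sit:

```python
# Riemann sums over arbitrary tagged partitions: as the mesh shrinks,
# the sums settle near the true value no matter how the tags are chosen.
import random
from math import sqrt

def riemann_sum(f, points, tags):
    """Sum f(t_i) * (x_i - x_{i-1}) over a tagged partition."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(points, points[1:], tags))

random.seed(1)
for n in (10, 100, 1000):
    points = sorted([0.0, 1.0] + [random.random() for _ in range(n - 1)])
    tags = [random.uniform(x0, x1) for x0, x1 in zip(points, points[1:])]
    print(n, riemann_sum(sqrt, points, tags))  # tends to 2/3 = 0.666...
```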

Lebesgue integral
The Riemann integral is not defined for a wide range of functions and situations of importance in applications (and
of interest in theory). For example, the Riemann integral can easily integrate density to find the mass of a steel beam,
but cannot accommodate a steel ball resting on it. This motivates other definitions, under which a broader assortment
of functions is integrable (Rudin 1987). The Lebesgue integral, in particular, achieves great flexibility by directing
attention to the weights in the weighted sum.
The definition of the Lebesgue integral thus begins with a measure, μ. In the simplest case, the Lebesgue measure
μ(A) of an interval A = [a,b] is its width, b − a, so that the Lebesgue integral agrees with the (proper) Riemann
integral when both exist. In more complicated cases, the sets being measured can be highly fragmented, with no
continuity and no resemblance to intervals.
To exploit this flexibility, Lebesgue integrals reverse the approach to the weighted sum. As Folland (1984, p. 56)
puts it, "To compute the Riemann integral of f, one partitions the domain [a,b] into subintervals", while in the
Lebesgue integral, "one is in effect partitioning the range of f".
One common approach first defines the integral of the indicator function of a measurable set A by:

\int 1_A \, d\mu = \mu(A).

This extends by linearity to a measurable simple function s, which attains only a finite number, n, of distinct
non-negative values:

\int s\,d\mu = \sum_{i=1}^{n} a_i\,\mu(A_i)

(where the image of A_i under the simple function s is the constant value a_i). Thus if E is a measurable set one defines

\int_E s\,d\mu = \sum_{i=1}^{n} a_i\,\mu(A_i \cap E).

Then for any non-negative measurable function f one defines

\int_E f\,d\mu = \sup\left\{ \int_E s\,d\mu : 0 \le s \le f,\ s\ \text{simple} \right\};

that is, the integral of f is set to be the supremum of all the integrals of simple functions that are less than or equal to
f. A general measurable function f is split into its positive and negative values by defining

f^+(x) = \max(f(x), 0), \qquad f^-(x) = \max(-f(x), 0).

Finally, f is Lebesgue integrable if

\int_E |f|\,d\mu < \infty,

and then the integral is defined by

\int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu.

When the measure space on which the functions are defined is also a locally compact topological space (as is the
case with the real numbers R), measures compatible with the topology in a suitable sense (Radon measures, of which
the Lebesgue measure is an example) and the integral with respect to them can be defined differently, starting from the
integrals of continuous functions with compact support. More precisely, the compactly supported functions form a
vector space that carries a natural topology, and a (Radon) measure can be defined as any continuous linear
functional on this space; the value of a measure at a compactly supported function is then also by definition the
integral of the function. One then proceeds to expand the measure (the integral) to more general functions by
continuity, and defines the measure of a set as the integral of its indicator function. This is the approach taken by
Bourbaki (2004) and a certain number of other authors. For details see Radon measures.

Other integrals
Although the Riemann and Lebesgue integrals are the most widely used definitions of the integral, a number of
others exist, including:
• The Riemann–Stieltjes integral, an extension of the Riemann integral.
• The Lebesgue-Stieltjes integral, further developed by Johann Radon, which generalizes the Riemann–Stieltjes and
Lebesgue integrals.
• The Daniell integral, which subsumes the Lebesgue integral and Lebesgue-Stieltjes integral without the
dependence on measures.
• The Henstock-Kurzweil integral, variously defined by Arnaud Denjoy, Oskar Perron, and (most elegantly, as the
gauge integral) Jaroslav Kurzweil, and developed by Ralph Henstock.
• The Itō integral and Stratonovich integral, which define integration with respect to stochastic processes such as
Brownian motion.

Properties

Linearity
• The collection of Riemann integrable functions on a closed interval [a, b] forms a vector space under the
operations of pointwise addition and multiplication by a scalar, and the operation of integration

f \mapsto \int_a^b f(x)\,dx

is a linear functional on this vector space. Thus, firstly, the collection of integrable functions is closed under
taking linear combinations; and, secondly, the integral of a linear combination is the linear combination of the
integrals,

\int_a^b (\alpha f + \beta g)(x)\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx.

• Similarly, the set of real-valued Lebesgue integrable functions on a given measure space E with measure μ is
closed under taking linear combinations and hence forms a vector space, and the Lebesgue integral

f \mapsto \int_E f\,d\mu

is a linear functional on this vector space, so that

\int_E (\alpha f + \beta g)\,d\mu = \alpha \int_E f\,d\mu + \beta \int_E g\,d\mu.

• More generally, consider the vector space of all measurable functions on a measure space (E,μ), taking values in a
locally compact complete topological vector space V over a locally compact topological field K, f : E → V. Then
one may define an abstract integration map assigning to each function f an element of V or the symbol ∞,

f \mapsto \int_E f\,d\mu,

that is compatible with linear combinations. In this situation the linearity holds for the subspace of functions
whose integral is an element of V (i.e. "finite"). The most important special cases arise when K is R, C, or a
finite extension of the field Qp of p-adic numbers, and V is a finite-dimensional vector space over K, and when
K=C and V is a complex Hilbert space.
Linearity, together with some natural continuity properties and normalisation for a certain class of "simple"
functions, may be used to give an alternative definition of the integral. This is the approach of Daniell for the case of
real-valued functions on a set X, generalized by Nicolas Bourbaki to functions with values in a locally compact
topological vector space. See (Hildebrandt 1953) for an axiomatic characterisation of the integral.

Inequalities for integrals


A number of general inequalities hold for Riemann-integrable functions defined on a closed and bounded interval [a,
b] and can be generalized to other notions of integral (Lebesgue and Daniell).
• Upper and lower bounds. An integrable function f on [a, b] is necessarily bounded on that interval. Thus there
are real numbers m and M so that m ≤ f(x) ≤ M for all x in [a, b]. Since the lower and upper sums of f over [a, b]
are therefore bounded by, respectively, m(b − a) and M(b − a), it follows that

m(b - a) \le \int_a^b f(x)\,dx \le M(b - a).

• Inequalities between functions. If f(x) ≤ g(x) for each x in [a, b] then each of the upper and lower sums of f is
bounded above by the upper and lower sums, respectively, of g. Thus

\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.

This is a generalization of the above inequalities, as M(b − a) is the integral of the constant function with value
M over [a, b].
• Subintervals. If [c, d] is a subinterval of [a, b] and f(x) is non-negative for all x, then

\int_c^d f(x)\,dx \le \int_a^b f(x)\,dx.

• Products and absolute values of functions. If f and g are two functions then we may consider their pointwise
products and powers, and absolute values:

(fg)(x) = f(x)\,g(x), \qquad f^2(x) = (f(x))^2, \qquad |f|(x) = |f(x)|.

If f is Riemann-integrable on [a, b] then the same is true for |f|, and

\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.

Moreover, if f and g are both Riemann-integrable then f^2, g^2, and fg are also Riemann-integrable, and

\left( \int_a^b (fg)(x)\,dx \right)^2 \le \left( \int_a^b f(x)^2\,dx \right) \left( \int_a^b g(x)^2\,dx \right).

This inequality, known as the Cauchy–Schwarz inequality, plays a prominent role in Hilbert space theory,
where the left hand side is interpreted as the inner product of two square-integrable functions f and g on the
interval [a, b].
• Hölder's inequality. Suppose that p and q are two real numbers, 1 ≤ p, q ≤ ∞ with 1/p + 1/q = 1, and f and g are
two Riemann-integrable functions. Then the functions |f|^p and |g|^q are also integrable and the following Hölder's
inequality holds:

\left| \int f(x)g(x)\,dx \right| \le \left( \int |f(x)|^p\,dx \right)^{1/p} \left( \int |g(x)|^q\,dx \right)^{1/q}.

For p = q = 2, Hölder's inequality becomes the Cauchy–Schwarz inequality.


• Minkowski inequality. Suppose that p ≥ 1 is a real number and f and g are Riemann-integrable functions. Then |f|^p,
|g|^p and |f + g|^p are also Riemann integrable and the following Minkowski inequality holds:

\left( \int |f(x)+g(x)|^p\,dx \right)^{1/p} \le \left( \int |f(x)|^p\,dx \right)^{1/p} + \left( \int |g(x)|^p\,dx \right)^{1/p}.

An analogue of this inequality for the Lebesgue integral is used in the construction of Lp spaces.

Conventions
In this section f is a real-valued Riemann-integrable function. The integral

\int_a^b f(x)\,dx

over an interval [a, b] is defined if a < b. This means that the upper and lower sums of the function f are evaluated on
a partition a = x_0 ≤ x_1 ≤ … ≤ x_n = b whose values x_i are increasing. Geometrically, this signifies that integration
takes place "left to right", evaluating f within intervals [x_i, x_{i+1}] where an interval with a higher index lies to the
right of one with a lower index. The values a and b, the end-points of the interval, are called the limits of integration
of f. Integrals can also be defined if a > b:
• Reversing limits of integration. If a > b then define

\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx.

This, with a = b, implies:

\int_a^a f(x)\,dx = 0.

• Integrals over intervals of length zero. If a is a real number then

\int_a^a f(x)\,dx = 0.

The first convention is necessary in consideration of taking integrals over subintervals of [a, b]; the second says that
an integral taken over a degenerate interval, or a point, should be zero. One reason for the first convention is that the
integrability of f on an interval [a, b] implies that f is integrable on any subinterval [c, d], but in particular integrals
have the property that:
• Additivity of integration on intervals. If c is any element of [a, b], then

\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.

With the first convention the resulting relation

\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx

is then well-defined for any cyclic permutation of a, b, and c.



Instead of viewing the above as conventions, one can also adopt the point of view that integration of differential
forms is performed on oriented manifolds only. If M is such an oriented m-dimensional manifold, and M′ is the same
manifold with opposed orientation and ω is an m-form, then one has:

\int_M \omega = -\int_{M'} \omega.

These conventions correspond to interpreting the integrand as a differential form, integrated over a chain. In measure
theory, by contrast, one interprets the integrand as a function f with respect to a measure μ and integrates over a
subset A, without any notion of orientation; one writes \int_A f\,d\mu to indicate integration over the subset
A. This is a minor distinction in one dimension, but becomes subtler on higher dimensional manifolds; see
Differential form: Relation with measures for details.

Fundamental theorem of calculus


The fundamental theorem of calculus is the statement that differentiation and integration are inverse operations: if a
continuous function is first integrated and then differentiated, the original function is retrieved. An important
consequence, sometimes called the second fundamental theorem of calculus, allows one to compute integrals by
using an antiderivative of the function to be integrated.

Statements of theorems
• Fundamental theorem of calculus. Let f be a real-valued integrable function defined on a closed interval [a, b]. If
F is defined for x in [a, b] by

F(x) = \int_a^x f(t)\,dt,

then F is continuous on [a, b]. If f is continuous at x in [a, b], then F is differentiable at x, and F ′(x) = f(x).
• Second fundamental theorem of calculus. Let f be a real-valued integrable function defined on a closed interval [a,
b]. If F is a function such that F ′(x) = f(x) for all x in [a, b] (that is, F is an antiderivative of f), then

\int_a^b f(x)\,dx = F(b) - F(a).

In particular, these are true whenever f is continuous on [a, b].
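Both statements can be checked symbolically. A small sketch with sympy, using the arbitrary continuous integrand f(t) = cos t:

```python
# Check the fundamental theorem symbolically for f(t) = cos(t).
import sympy

t, x = sympy.symbols('t x')
f = sympy.cos(t)

# First form: F(x) = integral of f from a = 0 to x, then differentiate.
F = sympy.integrate(f, (t, 0, x))   # sin(x)
print(sympy.diff(F, x))             # cos(x), i.e. F'(x) = f(x)

# Second form: evaluate a definite integral from an antiderivative.
G = sympy.integrate(sympy.cos(x), x)           # sin(x)
print(G.subs(x, sympy.pi / 2) - G.subs(x, 0))  # 1
```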



Extensions

Improper integrals
A "proper" Riemann integral assumes the integrand is defined and
finite on a closed and bounded interval, bracketed by the limits of
integration. An improper integral occurs when one or more of these
conditions is not satisfied. In some cases such integrals may be defined
by considering the limit of a sequence of proper Riemann integrals on
progressively larger intervals.

If the interval is unbounded, for instance at its upper end, then the
improper integral is the limit as that endpoint goes to infinity:

\int_a^\infty f(x)\,dx = \lim_{t \to \infty} \int_a^t f(x)\,dx.

If the integrand is only defined or finite on a half-open interval, for instance (a,b], then again a limit may provide a
finite result:

\int_a^b f(x)\,dx = \lim_{s \to a^+} \int_s^b f(x)\,dx.

[Figure: the improper integral \int_0^\infty \frac{dx}{(x+1)\sqrt{x}} has unbounded intervals for both domain and range.]

That is, the improper integral is the limit of proper integrals as one endpoint of the interval of integration approaches
either a specified real number, or ∞, or −∞. In more complicated cases, limits are required at both endpoints, or at
interior points.

Consider, for example, the function f(x) = 1/((x+1)√x) integrated from 0 to ∞ (shown in the figure above). At the lower bound, as x goes to
0 the function goes to ∞, and the upper bound is itself ∞, though the function goes to 0. Thus this is a doubly
improper integral. Integrated, say, from 1 to 3, an ordinary Riemann sum suffices to produce a result of π/6. To
integrate from 1 to ∞, a Riemann sum is not possible. However, any finite upper bound, say t (with t > 1), gives a
well-defined result, 2 arctan(√t) − π/2. This has a finite limit as t goes to infinity, namely π/2. Similarly, the integral
from 1⁄3 to 1 allows a Riemann sum as well, coincidentally again producing π/6. Replacing 1⁄3 by an arbitrary positive
value s (with s < 1) is equally safe, giving π/2 − 2 arctan(√s). This, too, has a finite limit as s goes to zero,
namely π/2. Combining the limits of the two fragments, the result of this improper integral is π.
This process does not guarantee success; a limit may fail to exist, or may be unbounded. For example, over the
bounded interval 0 to 1 the integral of 1/x does not converge; and over the unbounded interval 1 to ∞ the integral of
1/√x does not converge.
It may also happen that an integrand is unbounded at an interior point, in which case the integral must be split at that
point, and the limit integrals on both sides must exist and must be bounded. Thus

\int_{-1}^{1} \frac{dx}{\sqrt[3]{x^2}} = 6.

But the similar integral

\int_{-1}^{1} \frac{dx}{x}

cannot be assigned a value in this way, as the integrals above and below zero do not independently converge.
(However, see Cauchy principal value.)
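Numerical libraries carry out this limiting strategy automatically. A sketch with scipy, assuming the integrand reconstructed above, f(x) = 1/((x+1)√x), whose improper integral over (0, ∞) is π:

```python
# Evaluate the doubly improper integral of 1/((x+1)*sqrt(x)) over (0, inf).
# scipy.integrate.quad accepts infinite limits and integrable endpoint
# singularities such as the one at x = 0 here.
import numpy as np
from scipy.integrate import quad

f = lambda x: 1.0 / ((x + 1.0) * np.sqrt(x))

value, error_estimate = quad(f, 0, np.inf)
print(value, np.pi)  # both ~3.141592653589793

# The same value emerges from the two one-sided fragments used in the text:
print(quad(f, 0, 1)[0] + quad(f, 1, np.inf)[0])
```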

Multiple integration
Integrals can be taken over regions other than intervals. In general, an
integral over a set E of a function f is written:

\int_E f(x)\,dx.

[Figure: a double integral as the volume under a surface.]

Here x need not be a real number, but can be another suitable quantity, for instance, a vector in R3. Fubini's theorem
shows that such integrals can be rewritten as an iterated integral. In other words, the integral can be calculated by
integrating one coordinate at a time.
Just as the definite integral of a positive function of one variable represents the area of the region between the graph
of the function and the x-axis, the double integral of a positive function of two variables represents the volume of the
region between the surface defined by the function and the plane which contains its domain. (The same volume can
be obtained via the triple integral — the integral of a function in three variables — of the constant function f(x, y, z)
= 1 over the above mentioned region between the surface and the plane.) If the number of variables is higher, then
the integral represents a hypervolume, a volume of a solid of more than three dimensions that cannot be graphed.
For example, the volume of the cuboid of sides 4 × 6 × 5 may be obtained in two ways:
• By the double integral

\iint_D 5\,dx\,dy

of the function f(x, y) = 5 calculated in the region D in the xy-plane which is the base of the cuboid. For
example, if a rectangular base of such a cuboid is given via the xy inequalities 2 ≤ x ≤ 7, 4 ≤ y ≤ 9, our above
double integral now reads

\int_4^9 \int_2^7 5\,dx\,dy.

From here, integration is conducted with respect to either x or y first; in this example, integration is first done
with respect to x, since the limits for x sit on the inner integral. Once the first integration is completed,
the result is again integrated with respect to the other variable.
The result will equate to the volume under the surface.
• By the triple integral

\iiint_{\text{cuboid}} 1\,dx\,dy\,dz

of the constant function 1 calculated on the cuboid itself.
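Both computations are short calls in a numerical library. A sketch with scipy.integrate for the rectangular-base example above (the result 125 corresponds to the 5 × 5 base of that example with height 5, not to the 4 × 6 × 5 cuboid):

```python
# Volume over the rectangular base 2 <= x <= 7, 4 <= y <= 9 with height 5,
# computed as an iterated (double) integral.
from scipy.integrate import dblquad

# dblquad integrates func(y, x) with x in [a, b] and y in [gfun(x), hfun(x)].
volume, err = dblquad(lambda y, x: 5.0, 2, 7, lambda x: 4, lambda x: 9)
print(volume)  # 125.0 = 5 * (7 - 2) * (9 - 4)
```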

Line integrals
The concept of an integral can be extended to more general domains of
integration, such as curved lines and surfaces. Such integrals are
known as line integrals and surface integrals respectively. These have
important applications in physics, as when dealing with vector fields.
A line integral (sometimes called a path integral) is an integral where
the function to be integrated is evaluated along a curve. Various
different line integrals are in use. In the case of a closed curve it is also
called a contour integral.
The function to be integrated may be a scalar field or a vector field.
The value of the line integral is the sum of values of the field at all
points on the curve, weighted by some scalar function on the curve
(commonly arc length or, for a vector field, the scalar product of the
vector field with a differential vector in the curve). This weighting
distinguishes the line integral from simpler integrals defined on intervals. Many simple formulas in physics have
natural continuous analogs in terms of line integrals; for example, the fact that work is equal to force, F, multiplied
by displacement, s, may be expressed (in terms of vector quantities) as:

W = \mathbf{F} \cdot \mathbf{s}.

For an object moving along a path C in a vector field F such as an electric field or gravitational field, the total work
done by the field on the object is obtained by summing up the differential work done in moving from s to s + ds.
This gives the line integral

W = \int_C \mathbf{F}(\mathbf{s}) \cdot d\mathbf{s}.
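As a concrete sketch (the field and the curve are my own illustrative choices, not from the text): for F(x, y) = (−y, x) along the unit circle traversed once, the work integral reduces to \int_0^{2\pi} F(r(t)) \cdot r'(t)\,dt = 2\pi, which a direct summation reproduces:

```python
# Approximate the line integral of F = (-y, x) around the unit circle
# by summing F(r(t)) . r'(t) over small parameter steps dt.
from math import cos, sin, pi

def work_along_circle(steps=100000):
    dt = 2 * pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        x, y = cos(t), sin(t)      # position r(t)
        dx, dy = -sin(t), cos(t)   # velocity r'(t)
        fx, fy = -y, x             # field F evaluated at r(t)
        total += (fx * dx + fy * dy) * dt
    return total

print(work_along_circle())  # ~6.283185..., i.e. 2*pi
```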

Surface integrals
A surface integral is a definite integral taken over a surface (which
may be a curved set in space); it can be thought of as the double
integral analog of the line integral. The function to be integrated may
be a scalar field or a vector field. The value of the surface integral is
the sum of the field at all points on the surface. This can be achieved
by splitting the surface into surface elements, which provide the
partitioning for Riemann sums.

[Figure: the definition of the surface integral relies on splitting the surface into small surface elements.]

For an example of applications of surface integrals, consider a vector
field v on a surface S; that is, for each point x in S, v(x) is a vector.
Imagine that we have a fluid flowing through S, such that v(x)
determines the velocity of the fluid at x. The flux is defined as the
quantity of fluid flowing through S in unit amount of time. To find the flux, we need to take the dot product of v with
the unit surface normal to S at each point, which will give us a scalar field, which we integrate over the surface:

\int_S \mathbf{v} \cdot \mathbf{n}\, dS.

The fluid flux in this example may be from a physical fluid such as water or air, or from electrical or magnetic flux.
Thus surface integrals have applications in physics, particularly with the classical theory of electromagnetism.
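The element-splitting description translates directly into a computation. A sketch (the field v(x, y, z) = (x, y, z) and the unit sphere are illustrative choices; the exact flux is 4π by the divergence theorem):

```python
# Approximate the flux of v(x, y, z) = (x, y, z) through the unit sphere
# by splitting the surface into small elements in spherical coordinates.
from math import sin, cos, pi

def flux_through_unit_sphere(n_theta=400, n_phi=800):
    d_theta = pi / n_theta
    d_phi = 2 * pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta          # polar angle
        dA = sin(theta) * d_theta * d_phi    # area of one surface element
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x, y, z = sin(theta)*cos(phi), sin(theta)*sin(phi), cos(theta)
            # On the unit sphere the outward normal is (x, y, z) itself,
            # so v . n = x*x + y*y + z*z (= 1 for this particular field).
            total += (x*x + y*y + z*z) * dA
    return total

print(flux_through_unit_sphere())  # ~12.566..., i.e. 4*pi
```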

Integrals of differential forms


A differential form is a mathematical concept in the fields of multivariable calculus, differential topology and
tensors. The modern notation for the differential form, as well as the idea of the differential forms as being the
wedge products of exterior derivatives forming an exterior algebra, was introduced by Élie Cartan.
We initially work in an open set in R^n. A 0-form is defined to be a smooth function f. When we integrate a function f
over an m-dimensional subspace S of R^n, we write it as

\int_S f(x)\,dx^1 \cdots dx^m.

(The superscripts are indices, not exponents.) We can consider dx^1 through dx^n to be formal objects themselves,
rather than tags appended to make integrals look like Riemann sums. Alternatively, we can view them as covectors,
and thus a measure of "density" (hence integrable in a general sense). We call dx^1, …, dx^n the basic 1-forms.
We define the wedge product, "∧", a bilinear "multiplication" operator on these elements, with the alternating
property that

dx^a \wedge dx^a = 0

for all indices a. Note that alternation along with linearity and associativity implies dx^b ∧ dx^a = −dx^a ∧ dx^b. This also
ensures that the result of the wedge product has an orientation.
We define the set of all these products to be basic 2-forms, and similarly we define the set of products of the form
dx^a ∧ dx^b ∧ dx^c to be basic 3-forms. A general k-form is then a weighted sum of basic k-forms, where the weights are
the smooth functions f. Together these form a vector space with basic k-forms as the basis vectors, and 0-forms
(smooth functions) as the field of scalars. The wedge product then extends to k-forms in the natural way. Over Rn at
most n covectors can be linearly independent, thus a k-form with k > n will always be zero, by the alternating
property.
In addition to the wedge product, there is also the exterior derivative operator d. This operator maps k-forms to
(k+1)-forms. For a k-form ω = f dx^a over R^n, we define the action of d by:

d\omega = \sum_{i=1}^{n} \frac{\partial f}{\partial x^i}\, dx^i \wedge dx^a,

with extension to general k-forms occurring linearly.


This more general approach allows for a more natural coordinate-free approach to integration on manifolds. It also
allows for a natural generalisation of the fundamental theorem of calculus, called Stokes' theorem, which we may
state as

\int_\Omega d\omega = \int_{\partial\Omega} \omega,

where ω is a general k-form, and ∂Ω denotes the boundary of the region Ω. Thus, in the case that ω is a 0-form and
Ω is a closed interval of the real line, this reduces to the fundamental theorem of calculus. In the case that ω is a
1-form and Ω is a two-dimensional region in the plane, the theorem reduces to Green's theorem. Similarly, using
2-forms, and 3-forms and Hodge duality, we can arrive at Stokes' theorem and the divergence theorem. In this way
we can see that differential forms provide a powerful unifying view of integration.

Summations
The discrete equivalent of integration is summation. Summations and integrals can be put on the same foundations
using the theory of Lebesgue integrals or time scale calculus.

Methods

Computing integrals
The most basic technique for computing definite integrals of one real variable is based on the fundamental theorem
of calculus. Let f(x) be the function of x to be integrated over a given interval [a, b]. Then, find an antiderivative of f;
that is, a function F such that F' = f on the interval. By the fundamental theorem of calculus—provided the integrand
and integral have no singularities on the path of integration—

\int_a^b f(x)\,dx = F(b) - F(a).

The integral is not actually the antiderivative, but the fundamental theorem provides a way to use antiderivatives to
evaluate definite integrals.
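A computer algebra system applies the same two-step recipe: find an antiderivative F, then evaluate F(b) − F(a). A sketch with sympy (the integrand x e^x and the interval [0, 1] are arbitrary choices):

```python
# Find an antiderivative symbolically, then apply the fundamental theorem.
import sympy

x = sympy.Symbol('x')
f = x * sympy.exp(x)                  # a standard integration-by-parts case

F = sympy.integrate(f, x)             # antiderivative: (x - 1)*exp(x)
print(F)

a, b = 0, 1
print(F.subs(x, b) - F.subs(x, a))    # definite integral over [0, 1]: 1
print(sympy.integrate(f, (x, a, b)))  # the same answer in one call
```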
The most difficult step is usually to find the antiderivative of f. It is rarely possible to glance at a function and write
down its antiderivative. More often, it is necessary to use one of the many techniques that have been developed to
evaluate integrals. Most of these techniques rewrite one integral as a different one which is hopefully more tractable.
Techniques include:
• Integration by substitution
• Integration by parts
• Changing the order of integration
• Integration by trigonometric substitution
• Integration by partial fractions
• Integration by reduction formulae
• Integration using parametric derivatives
• Integration using Euler's formula
• Differentiation under the integral sign
• Contour Integration
Alternate methods exist to compute more complex integrals. Many nonelementary integrals can be expanded in a
Taylor series and integrated term by term. Occasionally, the resulting infinite series can be summed analytically. The
method of convolution using Meijer G-functions can also be used, assuming that the integrand can be written as a
product of Meijer G-functions. There are also many less common ways of calculating definite integrals; for instance,
Parseval's identity can be used to transform an integral over a rectangular region into an infinite sum. Occasionally,
an integral can be evaluated by a trick; for an example of this, see Gaussian integral.
Computations of volumes of solids of revolution can usually be done with disk integration or shell integration.
Specific results which have been worked out by various techniques are collected in the list of integrals.

Symbolic algorithms
Many problems in mathematics, physics, and engineering involve integration where an explicit formula for the
integral is desired. Extensive tables of integrals have been compiled and published over the years for this purpose.
With the spread of computers, many professionals, educators, and students have turned to computer algebra systems
that are specifically designed to perform difficult or tedious tasks, including integration. Symbolic integration
presents a special challenge in the development of such systems.
A major mathematical difficulty in symbolic integration is that in many cases, a closed formula for the antiderivative
of a rather simple-looking function does not exist. For instance, it is known that the antiderivatives of the functions
exp(x^2), x^x and (sin x)/x cannot be expressed in closed form involving only rational and exponential functions,
logarithm, trigonometric and inverse trigonometric functions, and the operations of multiplication and composition;
in other words, none of the three given functions is integrable in elementary functions. Differential Galois theory
provides general criteria that allow one to determine whether the antiderivative of an elementary function is
elementary. Unfortunately, it turns out that functions with closed expressions of antiderivatives are the exception
rather than the rule. Consequently, computerized algebra systems have no hope of being able to find an
antiderivative for a randomly constructed elementary function. On the positive side, if the 'building blocks' for
antiderivatives are fixed in advance, it may still be possible to decide whether the antiderivative of a given
function can be expressed using these blocks and operations of multiplication and composition, and to find the
symbolic answer whenever it exists. The Risch algorithm, implemented in Mathematica and other computer algebra
systems, does just that for functions and antiderivatives built from rational functions, radicals, logarithm, and
exponential functions.
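This behaviour is easy to observe in practice. In the sketch below, sympy (whose integrator includes a Risch-style procedure) answers with the special functions Si and erfi, or returns the integral unevaluated when no closed form is found:

```python
# Antiderivatives that are provably not elementary: a CAS answers with
# special functions or returns the integral unevaluated.
import sympy

x = sympy.Symbol('x')

print(sympy.integrate(sympy.sin(x) / x, x))  # Si(x), the sine integral
print(sympy.integrate(sympy.exp(x**2), x))   # sqrt(pi)*erfi(x)/2
print(sympy.integrate(x**x, x))              # Integral(x**x, x), unevaluated
```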
Some special integrands occur often enough to warrant special study. In particular, it may be useful to have, in the
set of antiderivatives, the special functions of physics (like the Legendre functions, the hypergeometric function, the
Gamma function, the incomplete Gamma function and so on; see Symbolic integration for more details). Extending
the Risch-Norman algorithm so that it includes these functions is possible but challenging.
Most humans are not able to integrate such general formulae, so in a sense computers are more skilled at integrating
highly complicated formulae. Very complex formulae are unlikely to have closed-form antiderivatives, so how much
of an advantage this presents is a philosophical question that is open for debate.

Numerical quadrature
The integrals encountered in a basic calculus course are deliberately chosen for simplicity; those found in real
applications are not always so accommodating. Some integrals cannot be found exactly, some require special
functions which themselves are a challenge to compute, and others are so complex that finding the exact answer is
too slow. This motivates the study and application of numerical methods for approximating integrals, which today
use floating-point arithmetic on digital electronic computers. Many of the ideas arose much earlier, for hand
calculations; but the speed of general-purpose computers like the ENIAC created a need for improvements.
The goals of numerical integration are accuracy, reliability, efficiency, and generality. Sophisticated methods can
vastly outperform a naive method by all four measures (Dahlquist & Björck 2008; Kahaner, Moler & Nash 1989;
Stoer & Bulirsch 2002). Consider, for example, the integral

∫ from −2 to 2 of (1/5)·[ (1/100)(322 + 3x(98 + x(37 + x))) − 24x/(1 + x²) ] dx

which has the exact answer 94⁄25 = 3.76. (In ordinary practice the answer is not known in advance, so an important
task — not explored here — is to decide when an approximation is good enough.) A “calculus book” approach
divides the integration range into, say, 16 equal pieces, and computes function values.

Spaced function values


x −2.00 −1.50 −1.00 −0.50 0.00 0.50 1.00 1.50 2.00

f(x) 2.22800 2.45663 2.67200 2.32475 0.64400 −0.92575 −0.94000 −0.16963 0.83600

x −1.75 −1.25 −0.75 −0.25 0.25 0.75 1.25 1.75

f(x) 2.33041 2.58562 2.62934 1.64019 −0.32444 −1.09159 −0.60387 0.31734

Using the left end of each piece, the rectangle method sums 16
function values and multiplies by the step width, h, here 0.25, to get an
approximate value of 3.94325 for the integral. The accuracy is not
impressive, but calculus formally uses pieces of infinitesimal width, so
initially this may seem little cause for concern. Indeed, repeatedly
doubling the number of steps eventually produces an approximation of
3.76001. However, 2¹⁸ pieces are required, a great computational
expense for such little accuracy; and a reach for greater accuracy can
force steps so small that arithmetic precision becomes an obstacle.

A better approach replaces the horizontal tops of the rectangles with slanted tops touching the function at the ends
of each piece. This trapezium rule is almost as easy to calculate; it sums all 17 function values, but weights the first
and last by one half, and again multiplies by the step width. This immediately improves the approximation to
3.76925, which is noticeably more accurate. Furthermore, only 2¹⁰ pieces are needed to achieve 3.76000,
substantially less computation than the rectangle method for comparable accuracy.

[Figure: Numerical quadrature methods: Rectangle, Trapezoid, Romberg, Gauss]

Romberg's method builds on the trapezoid method to great effect. First, the step lengths are halved incrementally,
giving trapezoid approximations denoted by T(h₀), T(h₁), and so on, where hₖ₊₁ is half of hₖ. For each new step size,
only half the new function values need to be computed; the others carry over from the previous size (as shown in the
table above). But the really powerful idea is to interpolate a polynomial through the approximations, and extrapolate
to T(0). With this method a numerically exact answer here requires only four pieces (five function values)! The
Lagrange polynomial interpolating {(hₖ, T(hₖ))} for k = 0…2, namely {(4.00, 6.128), (2.00, 4.352), (1.00, 3.908)},
is 3.76 + 0.148h², producing the extrapolated value 3.76 at h = 0.
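A minimal sketch of the extrapolation step: since the trapezoid error here is exactly 0.148h², a single Richardson step that cancels the h² term already recovers 3.76.

    # Romberg-style extrapolation on the example integral.
    def f(x):
        return ((322 + 3*x*(98 + x*(37 + x))) / 100 - 24*x / (1 + x*x)) / 5

    def trapezoid(f, a, b, n):
        h = (b - a) / n
        return h * (sum(f(a + i*h) for i in range(1, n)) + (f(a) + f(b)) / 2)

    # T(h) for h = 4, 2, 1, i.e. n = 1, 2, 4 pieces.
    T = [trapezoid(f, -2.0, 2.0, n) for n in (1, 2, 4)]
    print([round(t, 3) for t in T])     # [6.128, 4.352, 3.908]

    # Richardson step: eliminate the h**2 error term between halvings.
    R = [(4*T[i+1] - T[i]) / 3 for i in range(2)]
    print([round(r, 3) for r in R])     # [3.76, 3.76], already exact here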

Gaussian quadrature often requires noticeably less work for superior accuracy. In this example, it can compute the
function values at just two x positions, ±2⁄√3, then double each value and sum to get the numerically exact answer.
The explanation for this dramatic success lies in error analysis, and a little luck. An n-point Gaussian method is exact
for polynomials of degree up to 2n−1. The function in this example is a degree 3 polynomial, plus a term that cancels
because the chosen endpoints are symmetric around zero. (Cancellation also benefits the Romberg method.)
Shifting the range left a little, so the integral is from −2.25 to 1.75, removes the symmetry. Nevertheless, the
trapezoid method is rather slow, the polynomial interpolation method of Romberg is acceptable, and the Gaussian
method requires the least work — if the number of points is known in advance. As well, rational interpolation can
use the same trapezoid evaluations as the Romberg method to greater effect.
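A sketch of the two-point Gaussian rule on the original symmetric range, using the nodes quoted above:

    # Two-point Gaussian quadrature on [-2, 2]: nodes at +/- 2/sqrt(3),
    # each weighted by 2 (half the interval width).
    from math import sqrt

    def f(x):
        return ((322 + 3*x*(98 + x*(37 + x))) / 100 - 24*x / (1 + x*x)) / 5

    node = 2 / sqrt(3)
    print(round(2 * (f(-node) + f(node)), 10))   # 3.76, numerically exact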

Quadrature method cost comparison

Method      Trapezoid     Romberg      Rational    Gauss
Points      1048577       257          129         36
Rel. Err.   −5.3×10⁻¹³    −6.3×10⁻¹⁵   8.8×10⁻¹⁵   3.1×10⁻¹⁵

In practice, each method must use extra evaluations to ensure an error bound on an unknown function; this tends to
offset some of the advantage of the pure Gaussian method, and motivates the popular Gauss–Kronrod quadrature
formulae. Symmetry can still be exploited by splitting this integral into two ranges, from −2.25 to −1.75 (no
symmetry), and from −1.75 to 1.75 (symmetry). More broadly, adaptive quadrature partitions a range into pieces
based on function properties, so that data points are concentrated where they are needed most.
Simpson's rule, named for Thomas Simpson (1710–1761), uses a parabolic curve to approximate integrals. In many
cases, it is more accurate than the trapezoidal rule and others. The rule states that

∫ from a to b of f(x) dx ≈ ((b − a)/6)·[ f(a) + 4 f((a + b)/2) + f(b) ],

with an error of

−((b − a)⁵/2880)·f⁗(ξ), where ξ is some point between a and b and f⁗ denotes the fourth derivative.
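A sketch of the composite form of the rule, applied to the running example (the assumption here is the standard even-n composite weighting 1, 4, 2, …, 4, 1):

    # Composite Simpson's rule; n must be even.
    def simpson(f, a, b, n):
        h = (b - a) / n
        s = f(a) + f(b)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * f(a + i*h)
        return s * h / 3

    def f(x):
        return ((322 + 3*x*(98 + x*(37 + x))) / 100 - 24*x / (1 + x*x)) / 5

    print(simpson(f, -2.0, 2.0, 16))   # approximately 3.76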
The computation of higher-dimensional integrals (for example, volume calculations) makes important use of such
alternatives as Monte Carlo integration.
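A minimal Monte Carlo sketch for the one-dimensional example: sample uniformly, average, and scale by the interval length. The payoff of the method is not speed here but the fact that the same recipe works unchanged in many dimensions.

    # Monte Carlo estimate: (interval length) * (mean of sampled values).
    import random

    def f(x):
        return ((322 + 3*x*(98 + x*(37 + x))) / 100 - 24*x / (1 + x*x)) / 5

    random.seed(0)
    N = 100_000
    estimate = 4.0 * sum(f(random.uniform(-2, 2)) for _ in range(N)) / N
    print(estimate)   # near 3.76, with error shrinking like 1/sqrt(N)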
A calculus text is no substitute for numerical analysis, but the reverse is also true. Even the best adaptive numerical
code sometimes requires a user to help with the more demanding integrals. For example, improper integrals may
require a change of variable or methods that can avoid infinite function values, and known properties like symmetry
and periodicity may provide critical leverage.

See also
• Lists of integrals – integrals of the most common functions
• Multiple integral
• Numerical integration
• Integral equation
• Riemann integral
• Riemann–Stieltjes integral
• Henstock–Kurzweil integral
• Lebesgue integration
• Darboux integral
• Riemann sum
• Symbolic integration

References
• Apostol, Tom M. (1967), Calculus, Vol. 1: One-Variable Calculus with an Introduction to Linear Algebra (2nd
ed.), Wiley, ISBN 978-0-471-00005-1
• Bourbaki, Nicolas (2004), Integration I, Springer Verlag, ISBN 3-540-41129-1. In particular chapters III and IV.
• Burton, David M. (2005), The History of Mathematics: An Introduction (6th ed.), McGraw-Hill, p. 359,
ISBN 978-0-07-305189-5
• Cajori, Florian (1929), A History Of Mathematical Notations Volume II [5], Open Court Publishing, pp. 247–252,
ISBN 978-0-486-67766-8
• Dahlquist, Germund; Björck, Åke (2008), "Chapter 5: Numerical Integration" [6], Numerical Methods in Scientific
Computing, Volume I, Philadelphia: SIAM
• Folland, Gerald B. (1984), Real Analysis: Modern Techniques and Their Applications (1st ed.), John Wiley &
Sons, ISBN 978-0-471-80958-6
• Fourier, Jean Baptiste Joseph (1822), Théorie analytique de la chaleur [7], Chez Firmin Didot, père et fils, p. §231
Available in translation as Fourier, Joseph (1878), The analytical theory of heat [8], Freeman, Alexander (trans.),
Cambridge University Press, pp. 200–201
• Heath, T. L., ed. (2002), The Works of Archimedes [9], Dover, ISBN 978-0-486-42084-4
(Originally published by Cambridge University Press, 1897, based on J. L. Heiberg's Greek version.)
• Hildebrandt, T. H. (1953), "Integration in abstract spaces" [10], Bulletin of the American Mathematical Society 59
(2): 111–139, ISSN 0273-0979
• Kahaner, David; Moler, Cleve; Nash, Stephen (1989), "Chapter 5: Numerical Quadrature", Numerical Methods
and Software, Prentice Hall, ISBN 978-0-13-627258-8
• Leibniz, Gottfried Wilhelm (1899), Gerhardt, Karl Immanuel, ed., Der Briefwechsel von Gottfried Wilhelm
Leibniz mit Mathematikern. Erster Band [11], Berlin: Mayer & Müller
• Miller, Jeff, Earliest Uses of Symbols of Calculus [12], retrieved 2009-11-22
• O’Connor, J. J.; Robertson, E. F. (1996), A history of the calculus [13], retrieved 2007-07-09
• Rudin, Walter (1987), "Chapter 1: Abstract Integration", Real and Complex Analysis (International ed.),
McGraw-Hill, ISBN 978-0-07-100276-9
• Saks, Stanisław (1964), Theory of the integral [14] (English translation by L. C. Young. With two additional notes
by Stefan Banach. Second revised ed.), New York: Dover
• Stoer, Josef; Bulirsch, Roland (2002), "Chapter 3: Topics in Integration", Introduction to Numerical Analysis (3rd
ed.), Springer, ISBN 978-0-387-95452-3.
• W3C (2006), Arabic mathematical notation [15]

External links
• Riemann Sum [16] by Wolfram Research

Online tools
• Wolfram Integrator [17] — Free online symbolic integration with Mathematica
• Mathematical Assistant on Web [18] — symbolic computations online. Allows one to integrate in small steps (with
hints for the next step: integration by parts, substitution, partial fractions, application of formulas, and others),
powered by Maxima
• Function Calculator [19] from WIMS [20]
• Online integral calculator [21], numberempire.com
• Calculus : Integrate [22], quickmath.com

Online books
• Keisler, H. Jerome, Elementary Calculus: An Approach Using Infinitesimals [23], University of Wisconsin
• Stroyan, K.D., A Brief Introduction to Infinitesimal Calculus [24], University of Iowa
• Mauch, Sean, Sean's Applied Math Book [25], CIT, an online textbook that includes a complete introduction to
calculus
• Crowell, Benjamin, Calculus [26], Fullerton College, an online textbook
• Garrett, Paul, Notes on First-Year Calculus [27]
• Hussain, Faraz, Understanding Calculus [28], an online textbook
• Kowalk, W.P., Integration Theory [29], University of Oldenburg. A new concept to an old problem. Online
textbook
• Sloughter, Dan, Difference Equations to Differential Equations [30], an introduction to calculus
• Numerical Methods of Integration [31] at Holistic Numerical Methods Institute
• P.S. Wang, Evaluation of Definite Integrals by Symbolic Manipulation [32] (1972) - a cookbook of definite
integral techniques

References
[1] Shea, Marilyn (May 2007), Biography of Zu Chongzhi (http://hua.umf.maine.edu/China/astronomy/tianpage/0014ZuChongzhi9296bw.html), University of Maine, retrieved 9 January 2009
Katz, Victor J. (2004), A History of Mathematics, Brief Version, Addison-Wesley, pp. 125–126, ISBN 978-0-321-16193-2
[2] Victor J. Katz (1995), "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3): 163–174 [165]
[3] Victor J. Katz (1995), "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3): 163–174 [165–9 & 173–4]
[4] http://www2.gol.com/users/coynerhm/0598rothman.html
[5] http://www.archive.org/details/historyofmathema027671mbp
[6] http://www.mai.liu.se/~akbjo/NMbook.html
[7] http://books.google.com/books?id=TDQJAAAAIAAJ
[8] http://www.archive.org/details/analyticaltheory00fourrich
[9] http://www.archive.org/details/worksofarchimede029517mbp
[10] http://projecteuclid.org/euclid.bams/1183517761
[11] http://name.umdl.umich.edu/AAX2762.0001.001
[12] http://jeff560.tripod.com/calculus.html
[13] http://www-history.mcs.st-andrews.ac.uk/HistTopics/The_rise_of_calculus.html
[14] http://matwbn.icm.edu.pl/kstresc.php?tom=7&wyd=10&jez=
[15] http://www.w3.org/TR/arabic-math/
[16] http://mathworld.wolfram.com/RiemannSum.html
[17] http://integrals.wolfram.com
[18] http://user.mendelu.cz/marik/maw/index.php?lang=en&form=integral
[19] http://wims.unice.fr/wims/wims.cgi?module=tool/analysis/function.en
[20] http://wims.unice.fr
[21] http://www.numberempire.com/integralcalculator.php
[22] http://www.quickmath.com/webMathematica3/quickmath/page.jsp?s1=calculus&s2=integrate&s3=basic
[23] http://www.math.wisc.edu/~keisler/calc.html
[24] http://www.math.uiowa.edu/~stroyan/InfsmlCalculus/InfsmlCalc.htm
[25] http://www.its.caltech.edu/~sean/book/unabridged.html
[26] http://www.lightandmatter.com/calc/
[27] http://www.math.umn.edu/~garrett/calculus/
[28] http://www.understandingcalculus.com
[29] http://einstein.informatik.uni-oldenburg.de/20910.html
[30] http://math.furman.edu/~dcs/book
[31] http://numericalmethods.eng.usf.edu/topics/integration.html
[32] http://www.lcs.mit.edu/publications/specpub.php?id=660

Function (mathematics)
The mathematical concept of a function expresses the
intuitive idea that one quantity (the argument of the
function, also known as the input) completely
determines another quantity (the value, or the output).
A function assigns a unique value to each input of a
specified type. The argument and the value may be real
numbers, but they can also be elements from any given
sets: the domain and the codomain of the function. An
example of a function with the real numbers as both its
domain and codomain is the function f(x) = 2x, which
assigns to every real number the real number with
twice its value. In this case, it is written that f(5) = 10.

In addition to elementary functions on numbers, functions include maps between algebraic structures like groups
and maps between geometric objects like manifolds. In the abstract set-theoretic approach, a function is a relation
between the domain and the codomain that associates each element in the domain with exactly one element in the
codomain. An example of a function with domain {A,B,C} and codomain {1,2,3} associates A with 1, B with 2,
and C with 3.

[Figure: Graph of an example function. Both the domain and the range in the picture are the set of real numbers between −1 and 1.5.]

There are many ways to describe or represent functions: by a formula, by an algorithm that computes it, by a plot or
a graph. A table of values is a common way to specify a function in statistics, physics, chemistry, and other sciences.
A function may also be described through its relationship to other functions, for example, as the inverse function or a
solution of a differential equation. There are uncountably many different functions from the set of natural numbers to
itself, most of which cannot be expressed with a formula or an algorithm.
In a setting where they have numerical outputs, functions may be added and multiplied, yielding new functions.
Collections of functions with certain properties, such as continuous functions and differentiable functions, usually
required to be closed under certain operations, are called function spaces and are studied as objects in their own
right, in such disciplines as real analysis and complex analysis. An important operation on functions, which
distinguishes them from numbers, is the composition of functions.

Overview
Because functions are so widely used, many traditions have grown up around their use. The symbol for the input to a
function is often called the independent variable or argument and is often represented by the letter x or, if the input
is a particular time, by the letter t. The symbol for the output is called the dependent variable or value and is often
represented by the letter y. The function itself is most often called f, and thus the notation y = f(x) indicates that a
function named f has an input named x and an output named y.

The set of all permitted inputs to a given function is called the domain
of the function. The set of all resulting outputs is called the image or
range of the function. The image is often a subset of some larger set,
called the codomain of a function. Thus, for example, the function
f(x) = x² could take as its domain the set of all real numbers, as its
image the set of all non-negative real numbers, and as its codomain the
set of all real numbers. In that case, we would describe f as a
real-valued function of a real variable. Sometimes, especially in
computer science, the term "range" refers to the codomain rather than
the image, so care needs to be taken when using the word.

It is usual practice in mathematics to introduce functions with temporary names like ƒ. For example, ƒ(x) = 2x+1
implies ƒ(3) = 7; when a name for the function is not needed, the form y = 2x+1 may be used. If a function is often
used, it may be given a more permanent name.

[Figure: A function ƒ takes an input, x, and returns an output ƒ(x). One metaphor describes the function as a "machine" or "black box" that converts the input into the output.]

Functions need not act on numbers: the domain and codomain of a function may be arbitrary sets. One example of a
function that acts on non-numeric inputs takes English words as inputs and returns the first letter of the input word as
output. Furthermore, functions need not be described by any expression, rule or algorithm: indeed, in some cases it
may be impossible to define such a rule. For example, the association between inputs and outputs in a choice
function often lacks any fixed rule, although each input element is still associated to one and only one output.
A function of two or more variables is considered in formal mathematics as having a domain consisting of ordered
pairs or tuples of the argument values. For example Sum(x,y) = x+y operating on integers is the function Sum with a
domain consisting of pairs of integers. Sum then has a domain consisting of elements like (3,4), a codomain of
integers, and an association between the two that can be described by a set of ordered pairs like ((3,4), 7). Evaluating
Sum(3,4) then gives the value 7 associated with the pair (3,4).
A family of objects indexed by a set is equivalent to a function. For example, the sequence 1, 1/2, 1/3, ..., 1/n, ... can
be written as the ordered sequence <1/n> where n is a natural number, or as a function f(n) = 1/n from the set of
natural numbers into the set of rational numbers.
Dually, a surjective function partitions its domain into disjoint sets indexed by the codomain. This partition is known
as the kernel of the function, and the parts are called the fibers or level sets of the function at each element of the
codomain. (A non-surjective function divides its domain into disjoint and possibly-empty subsets).

Definition
One precise definition of a function is that it consists of an ordered triple of sets, which may be written as (X, Y, F).
X is the domain of the function, Y is the codomain, and F is a set of ordered pairs. In each of these ordered pairs (a,
b), the first element a is from the domain, the second element b is from the codomain, and every element in the
domain is the first element in one and only one ordered pair. The set of all b is known as the image of the function.
Some authors use the term "range" to mean the image, others to mean the codomain.
The notation ƒ:X→Y indicates that ƒ is a function with domain X and codomain Y.
In most practical situations, the domain and codomain are understood from context, and only the relationship
between the input and output is given; thus the full notation ƒ: X → Y together with its rule is usually abbreviated
to a formula such as y = ƒ(x).

The graph of a function is its set of ordered pairs. Such a set can be plotted on a pair of coordinate axes; for example,
(3, 9) is the point of intersection of the lines x = 3 and y = 9.
A function is a special case of a more general mathematical concept, the relation, for which the restriction that each
element of the domain appear as the first element in one and only one ordered pair is removed (or, in other words,
the restriction that each input be associated to exactly one output). A relation is "single-valued" or "functional" when
for each element of the domain set, the graph contains at most one ordered pair (and possibly none) with it as a first
element. A relation is called "left-total" or simply "total" when for each element of the domain, the graph contains at
least one ordered pair with it as a first element (and possibly more than one). A relation that is both left-total and
single-valued is a function.
In some parts of mathematics, including recursion theory and functional analysis, it is convenient to study partial
functions in which some values of the domain have no association in the graph; i.e., single-valued relations. For
example, the function f such that f(x) = 1/x does not define a value for x = 0, and so is only a partial function from the
real line to the real line. The term total function can be used to stress the fact that every element of the domain does
appear as the first element of an ordered pair in the graph. In other parts of mathematics, non-single-valued relations
are similarly conflated with functions: these are called multivalued functions, with the corresponding term
single-valued function for ordinary functions.
Some authors (especially in set theory) define a function as simply its graph f, with the restriction that the graph
should not contain two distinct ordered pairs with the same first element. Indeed, given such a graph, one can
construct a suitable triple by taking the set of all first elements as the domain and the set of all second elements as the
codomain: this automatically causes the function to be total and surjective. However, most authors in advanced
mathematics outside of set theory prefer the greater power of expression afforded by defining a function as an
ordered triple of sets.
Many operations in set theory—such as the power set—have the class of all sets as their domain; therefore, although
they are informally described as functions, they do not fit the set-theoretical definition outlined above.

Vocabulary
A specific input in a function is called an argument of the function. For each argument value x, the corresponding
unique y in the codomain is called the function value at x, output of ƒ for an argument x, or the image of x under ƒ.
The image of x may be written as ƒ(x) or as y.
The graph of a function ƒ is the set of all ordered pairs (x, ƒ(x)), for all x in the domain X. If X and Y are subsets of R,
the real numbers, then this definition coincides with the familiar sense of "graph" as a picture or plot of the function,
with the ordered pairs being the Cartesian coordinates of points.
A function can also be called a map or a mapping. Some authors, however, use the terms "function" and "map" to
refer to different types of functions. Other specific types of functions include functionals and operators.

Notation
Formal description of a function typically involves the function's name, its domain, its codomain, and a rule of
correspondence. Thus we frequently see a two-part notation, an example being

ƒ: N → R
n ↦ n/π

where the first part is read:
• "ƒ is a function from N to R" (one often writes informally "Let ƒ: X → Y" to mean "Let ƒ be a function from X to
Y"), or
• "ƒ is a function on N into R", or
• "ƒ is an R-valued function of an N-valued variable",
and the second part is read:
• "n maps to n/π".
Here the function named "ƒ" has the natural numbers as domain, the real numbers as codomain, and maps n to itself
divided by π. Less formally, this long form might be abbreviated

ƒ(n) = n/π,

where f(n) is read as "f as function of n" or "f of n". There is some loss of information: we no longer are explicitly
given the domain N and codomain R.
It is common to omit the parentheses around the argument when there is little chance of confusion, thus: sin x; this is
known as prefix notation. Writing the function after its argument, as in x ƒ, is known as postfix notation; for example,
the factorial function is customarily written n!, even though its generalization, the gamma function, is written Γ(n).
Parentheses are still used to resolve ambiguities and denote precedence, though in some formal settings the
consistent use of either prefix or postfix notation eliminates the need for any parentheses.

Functions with multiple inputs and outputs


The concept of function can be extended to an object that takes a combination of two (or more) argument values to a
single result. This intuitive concept is formalized by a function whose domain is the Cartesian product of two or
more sets.
For example, consider the function that associates two integers to their product: ƒ(x, y) = x·y. This function can be
defined formally as having domain Z×Z, the set of all integer pairs; codomain Z; and, for graph, the set of all pairs
((x,y), x·y). Note that the first component of any such pair is itself a pair (of integers), while the second component is
a single integer.
The function value of the pair (x,y) is ƒ((x,y)). However, it is customary to drop one set of parentheses and consider
ƒ(x,y) a function of two variables, x and y. Functions of two variables may be plotted on the three-dimensional
Cartesian coordinate system as ordered triples of the form (x, y, f(x,y)).
The concept can still further be extended by considering a function that also produces output that is expressed as
several variables. For example, consider the function swap(x, y) = (y, x) with domain R×R and codomain R×R as
well. The pair (y, x) is a single value in the codomain seen as a Cartesian product.

Currying
An alternative approach to handling functions with multiple arguments is to transform them into a chain of functions
that each takes a single argument. For instance, one can interpret Add(3,5) to mean "first produce a function that
adds 3 to its argument, and then apply the 'Add 3' function to 5". This transformation is called currying: Add 3 is
curry(Add) applied to 3. There is a bijection between the function spaces C^(A×B) and (C^B)^A.
When working with curried functions it is customary to use prefix notation with function application considered
left-associative, since juxtaposition of multiple arguments—as in (ƒ x y)—naturally maps to evaluation of a curried
function. Conversely, the → and ⟼ symbols are considered to be right-associative, so that curried functions may be
defined by a notation such as ƒ: Z → Z → Z = x ⟼ y ⟼ x·y
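A minimal sketch of this transformation in Python (the helper name curry is illustrative, not a standard library function):

    # Currying: turn a two-argument function into a chain of
    # one-argument functions.
    def add(x, y):
        return x + y

    def curry(f):
        return lambda x: lambda y: f(x, y)

    add3 = curry(add)(3)   # "the function that adds 3 to its argument"
    print(add3(5))         # 8, the same value as add(3, 5)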

Binary operations
The familiar binary operations of arithmetic, addition and multiplication, can be viewed as functions from R×R to R.
This view is generalized in abstract algebra, where n-ary functions are used to model the operations of arbitrary
algebraic structures. For example, an abstract group is defined as a set X and a function ƒ from X×X to X that satisfies
certain properties.
Traditionally, addition and multiplication are written in the infix notation: x+y and x×y instead of +(x, y) and ×(x, y).

Injective and surjective functions


Three important kinds of function are the injections (or one-to-one functions), which have the property that if ƒ(a) =
ƒ(b) then a must equal b; the surjections (or onto functions), which have the property that for every y in the
codomain there is an x in the domain such that ƒ(x) = y; and the bijections, which are both one-to-one and onto. This
nomenclature was introduced by the Bourbaki group.
When the definition of a function by its graph only is used, since the codomain is not defined, the "surjection" must
be accompanied with a statement about the set the function maps onto. For example, we might say ƒ maps onto the
set of all real numbers.

Function composition
The function composition of two or more functions takes the output of one or more functions as the input of others.
The functions ƒ: X → Y and g: Y → Z can be composed by first applying ƒ to an argument x to obtain y = ƒ(x) and
then applying g to y to obtain z = g(y). The composite function formed in this way from general ƒ and g may be
written g ∘ ƒ.

[Figure: A composite function g(f(x)) can be visualized as the combination of two "machines". The first takes input x and outputs f(x). The second takes f(x) and outputs g(f(x)).]

This notation follows the form

(g ∘ ƒ)(x) = g(ƒ(x)),

such that the function on the right acts first and the function on the left acts second, reversing English reading order.
We remember the order by reading the notation as "g of ƒ". The order is important, because rarely do we get the same
result both ways. For example, suppose ƒ(x) = x² and g(x) = x+1. Then g(ƒ(x)) = x²+1, while ƒ(g(x)) = (x+1)², which is
x²+2x+1, a different function.
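The example above can be checked directly; a small sketch (the helper name compose is illustrative):

    # Function composition: compose(g, f) is the function x -> g(f(x)).
    def compose(g, f):
        return lambda x: g(f(x))

    f = lambda x: x**2
    g = lambda x: x + 1

    print(compose(g, f)(3))   # g(f(3)) = 3**2 + 1 = 10
    print(compose(f, g)(3))   # f(g(3)) = (3 + 1)**2 = 16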
In a similar way, the function given above by the formula y = 5x − 20x³ + 16x⁵ can be obtained by composing
several functions, namely the addition, negation, and multiplication of real numbers.
An alternative to the colon notation, convenient when functions are being composed, writes the function name above
the arrow; for example, if ƒ is followed by g, where g produces the complex number e^(ix), the arrows for ƒ and g
are drawn in sequence. A more elaborate form of this is the commutative diagram.



Identity function
The unique function over a set X that maps each element to itself is called the identity function for X, and typically
denoted by idX. Each set has its own identity function, so the subscript cannot be omitted unless the set can be
inferred from context. Under composition, an identity function is "neutral": if ƒ is any function from X to Y, then

ƒ ∘ idX = ƒ = idY ∘ ƒ.
Restrictions and extensions


Informally, a restriction of a function ƒ is the result of trimming its domain.
More precisely, if ƒ is a function from a set X to Y, and S is any subset of X, the restriction of ƒ to S is the function ƒ|S
from S to Y such that ƒ|S(s) = ƒ(s) for all s in S.
If g is a restriction of ƒ, then it is said that ƒ is an extension of g.
The overriding of f: X → Y by g: W → Y (also called overriding union) is an extension of g denoted as (f ⊕ g): (X
∪ W) → Y. Its graph is the set-theoretical union of the graphs of g and f|X \ W. Thus, it relates any element of the
domain of g to its image under g, and any other element of the domain of f to its image under f. Overriding is an
associative operation; it has the empty function as an identity element. If f|X ∩ W and g|X ∩ W are pointwise equal
(e.g., the domains of f and g are disjoint), then the union of f and g is defined and is equal to their overriding union.
This definition agrees with the definition of union for binary relations.

Inverse function
If ƒ is a function from X to Y then an inverse function for ƒ, denoted by ƒ⁻¹, is a function in the opposite direction,
from Y to X, with the property that a round trip (a composition) returns each element to itself. Not every function has
an inverse; those that do are called invertible. The inverse function exists if and only if ƒ is a bijection.
As a simple example, if ƒ converts a temperature in degrees Celsius C to degrees Fahrenheit F, the function
converting degrees Fahrenheit to degrees Celsius would be a suitable ƒ⁻¹.
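A sketch of this example, using the usual conversion formula F = 9C/5 + 32 (the function names are illustrative):

    # fahrenheit is a bijection from R to R, so an inverse exists;
    # the round trip returns each element to itself.
    def fahrenheit(c):
        return 9*c/5 + 32

    def fahrenheit_inv(f):
        return 5*(f - 32)/9

    print(fahrenheit(100))                  # 212.0
    print(fahrenheit_inv(fahrenheit(37)))   # 37.0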

The notation for composition is similar to multiplication; in fact, sometimes it is denoted using juxtaposition, gƒ,
without an intervening circle. With this analogy, identity functions are like the multiplicative identity, 1, and inverse
functions are like reciprocals (hence the notation).
For functions that are injections or surjections, generalized inverse functions can be defined, called left and right
inverses respectively. Left inverses map to the identity when composed to the left; right inverses when composed to
the right.

Image of a set
The concept of the image can be extended from the image of a point to the image of a set. If A is any subset of the
domain, then ƒ(A) is the subset of im ƒ consisting of all images of elements of A. We say that ƒ(A) is the image of A
under ƒ.
Use of ƒ(A) to denote the image of a subset A⊆X is consistent so long as no subset of the domain is also an element of
the domain. In some fields (e.g., in set theory, where ordinals are also sets of ordinals) it is convenient or even
necessary to distinguish the two concepts; the customary notation is ƒ[A] for the set { ƒ(x): x ∈ A }; some authors
write ƒ`x instead of ƒ(x), and ƒ``A instead of ƒ[A].
Notice that the image of ƒ is the image ƒ(X) of its domain, and that the image of ƒ is a subset of its codomain.

Inverse image
The inverse image (or preimage, or more precisely, complete inverse image) of a subset B of the codomain Y
under a function ƒ is the subset of the domain X defined by

ƒ⁻¹(B) = {x ∈ X : ƒ(x) ∈ B}.

So, for example, the preimage of {4, 9} under the squaring function is the set {−3,−2,2,3}.
In general, the preimage of a singleton set (a set with exactly one element) may contain any number of elements. For
example, if ƒ(x) = 7, then the preimage of {5} is the empty set but the preimage of {7} is the entire domain. Thus the
preimage of an element in the codomain is a subset of the domain. The usual convention about the preimage of an
element is that ƒ⁻¹(b) means ƒ⁻¹({b}), i.e.

ƒ⁻¹(b) = {x ∈ X : ƒ(x) = b}.

In the same way as for the image, some authors use square brackets to avoid confusion between the inverse image
and the inverse function. Thus they would write ƒ⁻¹[B] and ƒ⁻¹[b] for the preimage of a set and a singleton.
The preimage of a singleton set is sometimes called a fiber. The term kernel can refer to a number of related
concepts.
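For a function given on a finite domain, the preimage of a set can be computed by direct search; a sketch using the squaring example above (the helper name preimage is illustrative):

    # Preimage of B under f, for a finite domain.
    def preimage(f, domain, B):
        return {x for x in domain if f(x) in B}

    print(preimage(lambda x: x*x, range(-5, 6), {4, 9}))   # {-3, -2, 2, 3}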

Specifying a function
A function can be defined by any mathematical condition relating each argument to the corresponding output value.
If the domain is finite, a function ƒ may be defined by simply tabulating all the arguments x and their corresponding
function values ƒ(x). More commonly, a function is defined by a formula, or (more generally) an algorithm — a
recipe that tells how to compute the value of ƒ(x) given any x in the domain.
There are many other ways of defining functions. Examples include piecewise definitions, induction or recursion,
algebraic or analytic closure, limits, analytic continuation, infinite series, and as solutions to integral and differential
equations. The lambda calculus provides a powerful and flexible syntax for defining and combining functions of
several variables.

Computability
Functions that send integers to integers, or finite strings to finite strings, can sometimes be defined by an algorithm,
which gives a precise description of a set of steps for computing the output of the function from its input. Functions
definable by an algorithm are called computable functions. For example, the Euclidean algorithm gives a precise
process to compute the greatest common divisor of two positive integers. Many of the functions studied in the
context of number theory are computable.
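The Euclidean algorithm mentioned above makes a compact example of a computable function:

    # Greatest common divisor by the Euclidean algorithm: a finite,
    # explicit procedure from inputs to output.
    def gcd(a, b):
        while b:
            a, b = b, a % b
        return a

    print(gcd(252, 105))   # 21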
Fundamental results of computability theory show that there are functions that can be precisely defined but are not
computable. Moreover, in the sense of cardinality, almost all functions from the integers to integers are not
computable. The number of computable functions from integers to integers is countable, because the number of
possible algorithms is. The number of all functions from integers to integers is higher: the same as the cardinality of
the real numbers. Thus most functions from integers to integers are not computable. Specific examples of
uncomputable functions are known, including the busy beaver function and functions related to the halting problem
and other undecidable problems.

Function spaces
The set of all functions from a set X to a set Y is denoted by X → Y, by [X → Y], or by Y^X.
The latter notation is motivated by the fact that, when X and Y are finite and of size |X| and |Y|, then the number of
functions X → Y is |Y^X| = |Y|^|X|. This is an example of the convention from enumerative combinatorics that provides
notations for sets based on their cardinalities. Other examples are the multiplication sign X×Y, used for the Cartesian
product, where |X×Y| = |X|·|Y|; the factorial sign X!, used for the set of permutations, where |X!| = |X|!; and the
binomial coefficient sign (X choose n), used for the set of n-element subsets, where |(X choose n)| = (|X| choose n).
If ƒ: X → Y, it may reasonably be concluded that ƒ ∈ [X → Y].
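The counting rule |Y^X| = |Y|^|X| can be verified by brute force for small sets; a sketch:

    # Enumerate every function from a finite X to a finite Y as a table
    # of values, one value chosen from Y for each element of X.
    from itertools import product

    X = ['a', 'b']
    Y = [1, 2, 3]

    functions = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]
    print(len(functions))      # 9
    print(len(Y) ** len(X))    # 9 = |Y| ** |X|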

Pointwise operations
If ƒ: X → R and g: X → R are functions with a common domain of X and common codomain of a ring R, then the
sum function ƒ + g: X → R and the product function ƒ ⋅ g: X → R can be defined as follows:

(ƒ + g)(x) = ƒ(x) + g(x) and (ƒ ⋅ g)(x) = ƒ(x) ⋅ g(x)

for all x in X.
This turns the set of all such functions into a ring. The binary operations in that ring have as domain ordered pairs of
functions, and as codomain functions. This is an example of climbing up in abstraction, to functions of more
complex types.
By taking some other algebraic structure A in the place of R, we can turn the set of all functions from X to A into an
algebraic structure of the same type in an analogous way.
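A sketch of the pointwise definitions for real-valued functions (the helper names are illustrative):

    # Pointwise sum and product of functions sharing a domain.
    def pointwise_sum(f, g):
        return lambda x: f(x) + g(x)

    def pointwise_product(f, g):
        return lambda x: f(x) * g(x)

    f = lambda x: x + 1
    g = lambda x: 2*x

    print(pointwise_sum(f, g)(3))       # (3 + 1) + (2*3) = 10
    print(pointwise_product(f, g)(3))   # (3 + 1) * (2*3) = 24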

Other properties
There are many other special classes of functions that are important to particular branches of mathematics, or
particular applications. Here is a partial list:

• bijection, injection and surjection, or individually: injective, surjective, and bijective function
• continuous
• differentiable, integrable
• linear, polynomial, rational
• algebraic, transcendental
• trigonometric
• fractal
• convex, monotonic, unimodal
• holomorphic, meromorphic, entire
• vector-valued
• computable
• odd or even

History

Functions prior to Leibniz


Historically, some mathematicians can be regarded as having foreseen and come close to a modern
formulation of the concept of function. Among them is Oresme (1323-1382) . . . In his theory, some general
ideas about independent and dependent variable quantities seem to be present.[1] [2]
Ponte further notes that "The emergence of a notion of function as an individualized mathematical entity can be
traced to the beginnings of infinitesimal calculus".[1]

The notion of "function" in analysis


As a mathematical term, "function" was coined by Gottfried Leibniz, in a 1673 letter, to describe a quantity related
to a curve, such as a curve's slope at a specific point.[3] [4] The functions Leibniz considered are today called
differentiable functions. For this type of function, one can talk about limits and derivatives; both are measurements
of the output or the change in the output as it depends on the input or the change in the input. Such functions are the
basis of calculus.
Johann Bernoulli "by 1718, had come to regard a function as any expression made up of a variable and some
constants",[5] and Leonhard Euler during the mid-18th century used the word to describe an expression or formula
involving variables and constants e.g., x2+3x+2.[6]
Alexis Claude Clairaut (in approximately 1734) and Euler introduced the familiar notation " f(x) ".[6]
At first, the idea of a function was rather limited. Joseph Fourier, for example, claimed that every function had a
Fourier series, something no mathematician would claim today. By broadening the definition of functions,
mathematicians were able to study "strange" mathematical objects such as continuous functions that are nowhere
differentiable. These functions were first thought to be only theoretical curiosities, and they were collectively called
"monsters" as late as the turn of the 20th century. However, powerful techniques from functional analysis have
shown that these functions are, in a precise sense, more common than differentiable functions. Such functions have
since been applied to the modeling of physical phenomena such as Brownian motion.
During the 19th century, mathematicians started to formalize all the different branches of mathematics. Weierstrass
advocated building calculus on arithmetic rather than on geometry, which favoured Euler's definition over Leibniz's
(see arithmetization of analysis).
Dirichlet and Lobachevsky are traditionally credited with independently giving the modern "formal" definition of a
function as a relation in which every first element has a unique second element. Eves asserts that "the student of
mathematics usually meets the Dirichlet definition of function in his introductory course in calculus",[7] but Dirichlet's
claim to this formalization is disputed by Imre Lakatos:
There is no such definition in Dirichlet's works at all. But there is ample evidence that he had no idea of this
concept. In his [1837], for instance, when he discusses piecewise continuous functions, he says that at points
of discontinuity the function has two values: ...
(Proofs and Refutations, 151, Cambridge University Press 1976.)
In the context of "the Differential Calculus" George Boole defined (circa 1849) the notion of a function as follows:
"That quantity whose variation is uniform . . . is called the independent variable. That quantity whose variation
is referred to the variation of the former is said to be a function of it. The Differential calculus enables us in
every case to pass from the function to the limit. This it does by a certain Operation. But in the very Idea of an
Operation is . . . the idea of an inverse operation. To effect that inverse operation in the present instance is the
business of the Int[egral] Calculus."[8]

The logician's "function" prior to 1850


Logicians of this time were primarily involved with analyzing syllogisms (the 2000 year-old Aristotelian forms and
otherwise), or as Augustus De Morgan (1847) stated it: "the examination of that part of reasoning which depends
upon the manner in which inferences are formed, and the investigation of general maxims and rules for constructing
arguments".[9] At this time the notion of (logical) "function" is not explicit, but at least in the work of De Morgan
and George Boole it is implied: we see abstraction of the argument forms, the introduction of variables, the
introduction of a symbolic algebra with respect to these variables, and some of the notions of set theory.
De Morgan's 1847 "FORMAL LOGIC OR, The Calculus of Inference, Necessary and Probable" observes that "[a]
logical truth depends upon the structure of the statement, and not upon the particular matters spoken of"; he wastes
no time (preface page i) abstracting: "In the form of the proposition, the copula is made as abstract as the terms". He

immediately (p. 1) casts what he calls "the proposition" (present-day propositional function or relation) into a form
such as "X is Y", where the symbols X, "is", and Y represent, respectively, the subject, copula, and predicate. While
the word "function" does not appear, the notion of "abstraction" is there, "variables" are there, the notion of inclusion
in his symbolism “all of the Δ is in the О” (p. 9) is there, and lastly a new symbolism for logical analysis of the
notion of "relation" (he uses the word with respect to this example " X)Y " (p. 75) ) is there:
" A1 X)Y To take an X it is necessary to take a Y" [or To be an X it is necessary to be a Y]
" A1 Y)X To take an Y it is sufficient to take a X" [or To be a Y it is sufficient to be an X], etc.
In his 1848 The Nature of Logic Boole asserts that "logic . . . is in a more especial sense the science of reasoning by
signs", and he briefly discusses the notions of "belonging to" and "class": "An individual may possess a great variety
of attributes and thus belonging to a great variety of different classes".[10] Like De Morgan he uses the notion of
"variable" drawn from analysis; he gives an example of "represent[ing] the class oxen by x and that of horses by y
and the conjunction and by the sign + . . . we might represent the aggregate class oxen and horses by x + y".[11]

The logicians' "function" 1850-1950


Eves observes "that logicians have endeavored to push down further the starting level of the definitional
development of mathematics and to derive the theory of sets, or classes, from a foundation in the logic of
propositions and propositional functions".[12] But by the late 19th century the logicians' research into the foundations
of mathematics was undergoing a major split. The direction of the first group, the Logicists, can probably be
summed up best by Bertrand Russell 1903:9 -- "to fulfil two objects, first, to show that all mathematics follows from
symbolic logic, and secondly to discover, as far as possible, what are the principles of symbolic logic itself."
The second group of logicians, the set-theorists, emerged with Georg Cantor's "set theory" (1870–1890) but were
driven forward partly as a result of Russell's discovery of a paradox that could be derived from Frege's conception of
"function", but also as a reaction against Russell's proposed solution.[13] Zermelo's set-theoretic response was his
1908 Investigations in the foundations of set theory I -- the first axiomatic set theory; here too the notion of
"propositional function" plays a role.

George Boole's The Laws of Thought 1854; John Venn's Symbolic Logic 1881
In his An Investigation into the laws of thought Boole now defined a function in terms of a symbol x as follows:
"8. Definition.-- Any algebraic expression involving symbol x is termed a function of x, and may be
represented by the abbreviated form f(x)"[14]
Boole then used algebraic expressions to define both algebraic and logical notions, e.g., 1−x is logical NOT(x), xy is
the logical AND(x,y), x + y is the logical OR(x, y), x(x+y) is xx+xy, and "the special law" xx = x² = x.[15]
In his 1881 Symbolic Logic Venn was using the words "logical function" and the contemporary symbolism (x = f(y),
y = f⁻¹(x), cf page xxi) plus the circle-diagrams historically associated with Venn to describe "class relations",[16] the
notions "'quantifying' our predicate", "propositions in respect of their extension", "the relation of inclusion and
exclusion of two classes to one another", and "propositional function" (all on p. 10), the bar over a variable to
indicate not-x (page 43), etc. Indeed he equated unequivocally the notion of "logical function" with "class" [modern
"set"]: "... on the view adopted in this book, f(x) never stands for anything but a logical class. It may be a compound
class aggregated of many simple classes; it may be a class indicated by certain inverse logical operations, it may be
composed of two groups of classes equal to one another, or what is the same thing, their difference declared equal to
zero, that is, a logical equation. But however composed or derived, f(x) with us will never be anything else than a
general expression for such logical classes of things as may fairly find a place in ordinary Logic".[17]

Frege's Begriffsschrift 1879


Gottlob Frege's Begriffsschrift (1879) preceded Giuseppe Peano (1889), but Peano had no knowledge of Frege 1879
until after he had published his 1889.[18] Both writers strongly influenced Bertrand Russell (1903). Russell in turn
influenced much of 20th-century mathematics and logic through his Principia Mathematica (1913) jointly authored
with Alfred North Whitehead.
At the outset Frege abandons the traditional "concepts subject and predicate", replacing them with argument and
function respectively, which he believes "will stand the test of time. It is easy to see how regarding a content as a
function of an argument leads to the formation of concepts. Furthermore, the demonstration of the connection
between the meanings of the words if, and, not, or, there is, some, all, and so forth, deserves attention".[19]
Frege begins his discussion of "function" with an example: Begin with the expression[20] "Hydrogen is lighter than
carbon dioxide". Now remove the sign for hydrogen (i.e., the word "hydrogen") and replace it with the sign for
oxygen (i.e., the word "oxygen"); this makes a second statement. Do this again (using either statement) and
substitute the sign for nitrogen (i.e., the word "nitrogen") and note that "This changes the meaning in such a way that
"oxygen" or "nitrogen" enters into the relations in which "hydrogen" stood before".[21] There are three statements:
• "Hydrogen is lighter than carbon dioxide."
• "Oxygen is lighter than carbon dioxide."
• "Nitrogen is lighter than carbon dioxide."
Now observe in all three a "stable component, representing the totality of [the] relations";[22] call this the function,
i.e.,
"... is lighter than carbon dioxide", is the function.
Frege calls the argument of the function "[t]he sign [e.g., hydrogen, oxygen, or nitrogen], regarded as replaceable by
others that denotes the object standing in these relations".[23] He notes that we could have derived the function as
"Hydrogen is lighter than . . .." as well, with an argument position on the right; the exact observation is made by
Peano (see more below). Finally, Frege allows for the case of two (or more arguments). For example, remove
"carbon dioxide" to yield the invariant part (the function) as:
• "... is lighter than ... "
The one-argument function Frege generalizes into the form Φ(A) where A is the argument and Φ( ) represents the
function, whereas the two-argument function he symbolizes as Ψ(A, B) with A and B the arguments and Ψ( , ) the
function and cautions that "in general Ψ(A, B) differs from Ψ(B, A)". Using his unique symbolism he translates for
the reader the following symbolism:
"We can read |--- Φ(A) as "A has the property Φ. |--- Ψ(A, B) can be translated by "B stands in the relation Ψ
to A" or "B is a result of an application of the procedure Ψ to the object A".[24]

Peano 1889 The Principles of Arithmetic 1889


Peano defined the notion of "function" in a manner somewhat similar to Frege, but without the precision.[25] First
Peano defines the sign "K means class, or aggregate of objects",[26] the objects of which satisfy three simple
equality-conditions,[27] a = a, (a = b) = (b = a), IF ((a = b) AND (b = c)) THEN (a = c). He then introduces φ, "a
sign or an aggregate of signs such that if x is an object of the class s, the expression φx denotes a new object". Peano
adds two conditions on these new objects: First, that the three equality-conditions hold for the objects φx; secondly,
that "if x and y are objects of class s and if x = y, we assume it is possible to deduce φx = φy".[28] Given all these
conditions are met, φ is a "function presign". Likewise he identifies a "function postsign". For example if φ is the
function presign a+, then φx yields a+x, or if φ is the function postsign +a then xφ yields x+a.[29]

Bertrand Russell's The Principles of Mathematics 1903


While the influence of Cantor and Peano was paramount,[30] in Appendix A "The Logical and Arithmetical
Doctrines of Frege" of The Principles of Mathematics, Russell arrives at a discussion of Frege's notion of function,
"...a point in which Frege's work is very important, and requires careful examination".[31] In response to his 1902
exchange of letters with Frege about the contradiction he discovered in Frege's Begriffsschrift Russell tacked this
section on at the last moment.
For Russell the bedeviling notion is that of "variable": "6. Mathematical propositions are not only characterized by
the fact that they assert implications, but also by the fact that they contain variables. The notion of the variable is one
of the most difficult with which logic has to deal. For the present, I openly wish to make it plain that there are
variables in all mathematical propositions, even where at first sight they might seem to be absent. . . . We shall find
always, in all mathematical propositions, that the words any or some occur; and these words are the marks of a
variable and a formal implication".[32]
As expressed by Russell "the process of transforming constants in a proposition into variables leads to what is called
generalization, and gives us, as it were, the formal essence of a proposition ... So long as any term in our proposition
can be turned into a variable, our proposition can be generalized; and so long as this is possible, it is the business of
mathematics to do it";[33] these generalizations Russell named propositional functions".[34] Indeed he cites and
quotes from Frege's Begriffsschrift and presents a vivid example from Frege's 1891 Function und Begriff: That "the
essence of the arithmetical function 2·x³ + x is what is left when the x is taken away, i.e., in the above instance 2·( )³
+ ( ). The argument x does not belong to the function but the two taken together make the whole".[31] Russell agreed
with Frege's notion of "function" in one sense: "He regards functions -- and in this I agree with him -- as more
fundamental than predicates and relations" but Russell rejected Frege's "theory of subject and assertion", in particular
"he thinks that, if a term a occurs in a proposition, the proposition can always be analysed into a and an assertion
about a".[31]

Evolution of Russell's notion of "function" 1908-1913


Russell would carry his ideas forward in his 1908 Mathematical logic as based on the theory of types and into his
and Whitehead's 1910-1913 Principia Mathematica. By the time of Principia Mathematica Russell, like Frege,
considered the propositional function fundamental: "Propositional functions are the fundamental kind from which the
more usual kinds of function, such as "sin x" or "log x" or "the father of x" are derived. These derivative functions . . .
are called “descriptive functions". The functions of propositions . . . are a particular case of propositional
functions".[35]
Propositional functions: Because his terminology is different from the contemporary, the reader may be confused
by Russell's "propositional function". An example may help. Russell writes a propositional function in its raw form,
e.g., as φŷ: "ŷ is hurt". (Observe the circumflex or "hat" over the variable y). For our example, we will assign just 4
values to the variable ŷ: "Bob", "This bird", "Emily the rabbit", and "y". Substitution of one of these values for
variable ŷ yields a proposition; this proposition is called a "value" of the propositional function. In our example
there are four values of the propositional function, e.g., "Bob is hurt", "This bird is hurt", "Emily the rabbit is hurt"
and "y is hurt." A proposition, if it is significant—i.e., if its truth is determinate—has a truth-value of truth or
falsity. If a proposition's truth value is "truth" then the variable's value is said to satisfy the propositional function.
Finally, per Russell's definition, "a class [set] is all objects satisfying some propositional function" (p. 23). Note the
word "all" -- this is how the contemporary notions of "For all ∀" and "there exists at least one instance ∃" enter the
treatment (p. 15).
To continue the example: Suppose (from outside the mathematics/logic) one determines that the propositions "Bob is
hurt" has a truth value of "falsity", "This bird is hurt" has a truth value of "truth", "Emily the rabbit is hurt" has an
indeterminate truth value because "Emily the rabbit" doesn't exist, and "y is hurt" is ambiguous as to its truth value
because the argument y itself is ambiguous. While the two propositions "Bob is hurt" and "This bird is hurt" are

significant (both have truth values), only the value "This bird" of the variable ŷ satisfies the propositional function
φŷ: "ŷ is hurt". When one goes to form the class α: φŷ: "ŷ is hurt", only "This bird" is included, given the four values
"Bob", "This bird", "Emily the rabbit" and "y" for variable ŷ and their respective truth-values: falsity, truth,
indeterminate, ambiguous.
Russell defines functions of propositions with arguments, and truth-functions f(p).[36] For example, suppose one
were to form the "function of propositions with arguments" p1: "NOT(p) AND q" and assign its variables the values
of p: "Bob is hurt" and q: "This bird is hurt". (We are restricted to the logical linkages NOT, AND, OR and
IMPLIES, and we can only assign "significant" propositions to the variables p and q). Then the "function of
propositions with arguments" is p1: NOT("Bob is hurt") AND "This bird is hurt"). To determine the truth value of
this "function of propositions with arguments" we submit it to a "truth function", e.g., f(p1): f( NOT("Bob is hurt")
AND "This bird is hurt") ), which yields a truth value of "truth".
The notion of a "many-one" functional relation": Russell first discusses the notion of "identity", then defines a
descriptive function (pages 30ff) as the unique value ιx that satisfies the (2-variable) propositional function (i.e.,
"relation") φŷ.
N.B. The reader should be warned here that the order of the variables is reversed! y is the independent
variable and x is the dependent variable, e.g., x = sin(y).[37]
Russell symbolizes the descriptive function as "the object standing in relation to y": R'y =DEF (ιx)(x R y). Russell
repeats that "R'y is a function of y, but not a propositional function [sic]; we shall call it a descriptive function. All
the ordinary functions of mathematics are of this kind. Thus in our notation "sin y" would be written " sin 'y ", and
"sin" would stand for the relation sin 'y has to y".[38]

Hardy 1908
Hardy 1908, pp. 26–28 defined a function as a relation between two variables x and y such that "to some values of x
at any rate correspond values of y." He neither required the function to be defined for all values of x nor to associate
each value of x to a single value of y. This broad definition of a function encompasses more relations than are
ordinarily considered functions in contemporary mathematics.

The Formalist's "function": David Hilbert's axiomatization of mathematics (1904-1927)


David Hilbert set himself the goal of "formalizing" classical mathematics "as a formal axiomatic theory, and this
theory shall be proved to be consistent, i.e., free from contradiction".[39] In his 1927 The Foundations of
Mathematics Hilbert frames the notion of function in terms of the existence of an "object":
13. A(a) → A(ε(A)) Here ε(A) stands for an object of which the proposition A(a) certainly holds if it holds of
any object at all; let us call ε the logical ε-function".[40] [The arrow indicates “implies”.]
Hilbert then illustrates the three ways in which the ε-function is to be used: firstly for the "for all" and "there exists"
notions, secondly to represent the "object of which [a proposition] holds", and lastly how to cast it into the choice
function.
Recursion theory and computability: But the unexpected outcome of Hilbert's and his student Bernays's effort was
failure; see Gödel's incompleteness theorems of 1931. At about the same time, in an effort to solve Hilbert's
Entscheidungsproblem, mathematicians set about to define what was meant by an "effectively calculable function"
(Alonzo Church 1936), i.e., "effective method" or "algorithm", that is, an explicit, step-by-step procedure that would
succeed in computing a function. Various models for algorithms appeared, in rapid succession, including Church's
lambda calculus (1936), Stephen Kleene's μ-recursive functions (1936) and Alan Turing's (1936-7) notion of
replacing human "computers" with utterly-mechanical "computing machines" (see Turing machines). It was shown
that all of these models could compute the same class of computable functions. Church's thesis holds that this class
of functions exhausts all the number-theoretic functions that can be calculated by an algorithm. The outcomes of
these efforts were vivid demonstrations that, in Turing's words, "there can be no general process for determining
whether a given formula U of the functional calculus K [Principia Mathematica] is provable";[41] see more at
Independence (mathematical logic) and Computability theory.

Development of the set-theoretic definition of "function"


Set theory began with the work of the logicians on the notion of "class" (modern "set"), for example De Morgan
(1847), Jevons (1880), Venn (1881), Frege (1879), and Peano (1889). It was given a push by Georg Cantor's attempt to
define the infinite in set-theoretic terms (1870–1890) and the subsequent discovery of an antinomy (contradiction,
paradox) in this treatment (Cantor's paradox), by Russell's discovery (1902) of an antinomy in Frege's 1879
(Russell's paradox), by the discovery of more antinomies in the early 20th century (e.g., the 1897 Burali-Forti
paradox and the 1905 Richard paradox), and by resistance to Russell's complex treatment of logic[42] and dislike of
his axiom of reducibility[43] (1908, 1910–1913) that he proposed as a means to evade the antinomies.

Russell's paradox 1902


In 1902 Russell sent a letter to Frege pointing out that Frege's 1879 Begriffsschrift allowed a function to be an
argument of itself: "On the other hand, it may also be that the argument is determinate and the function indeterminate
. . .."[44] From this unconstrained situation Russell was able to form a paradox:
"You state ... that a function, too, can act as the indeterminate element. This I formerly believed, but now this
view seems doubtful to me because of the following contradiction. Let w be the predicate: to be a predicate
that cannot be predicated of itself. Can w be predicated of itself?"[45]
Frege responded promptly that "Your discovery of the contradiction caused me the greatest surprise and, I would
almost say, consternation, since it has shaken the basis on which I intended to build arithmetic".[46]
From this point forward development of the foundations of mathematics became an exercise in how to dodge
"Russell's paradox", framed as it was in "the bare [set-theoretic] notions of set and element".[47]

Zermelo's set theory (1908) modified by Skolem (1922)


The notion of "function" appears as Zermelo's axiom III—the Axiom of Separation (Axiom der Aussonderung). This
axiom constrains us to use a propositional function Φ(x) to "separate" a subset MΦ from a previously formed set M:
"AXIOM III. (Axiom of separation). Whenever the propositional function Φ(x) is definite for all elements of a
set M, M possesses a subset MΦ containing as elements precisely those elements x of M for which Φ(x) is
true".[48]
As there is no universal set—sets originate by way of Axiom II from elements of (non-set) domain B -- "...this
disposes of the Russell antinomy so far as we are concerned".[49] But Zermelo's "definite criterion" is imprecise, and
is fixed by Weyl, Fraenkel, Skolem, and von Neumann.[50]
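
In modern terms, Separation behaves like a comprehension restricted to an already-given set. A loose Python analogy (an illustration only, not Zermelo's formalism):

# Separation as a restricted comprehension: from an existing set M,
# keep exactly those elements for which the propositional function
# phi(x) is true.
M = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

def phi(x):
    return x % 2 == 0  # a "definite" property: x is even

M_phi = {x for x in M if phi(x)}
print(M_phi)  # {0, 2, 4, 6, 8}
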
In fact Skolem in his 1922 referred to this "definite criterion" or "property" as a "definite proposition":
"... a finite expression constructed from elementary propositions of the form a ε b or a = b by means of the five
operations [logical conjunction, disjunction, negation, universal quantification, and existential
quantification]".[51]
van Heijenoort summarizes:
"A property is definite in Skolem's sense if it is expressed . . . by a well-formed formula in the simple predicate
calculus of first order in which the sole predicate constants are ε and possibly, =. ... Today an axiomatization
of set theory is usually embedded in a logical calculus, and it is Weyl's and Skolem's approach to the
formulation of the axiom of separation that is generally adopted".[52]
In this quote the reader may observe a shift in terminology: nowhere is mentioned the notion of "propositional
function", but rather one sees the words "formula", "predicate calculus", "predicate", and "logical calculus." This
shift in terminology is discussed more in the section that covers "function" in contemporary set theory.
The Wiener–Hausdorff–Kuratowski "ordered pair" definition 1914–1921


The history of the notion of "ordered pair" is not clear. As noted above, Frege (1879) proposed an intuitive ordering
in his definition of a two-argument function Ψ(A, B). Norbert Wiener in his 1914 (see below) observes that his own
treatment essentially "revert(s) to Schröder's treatment of a relation as a class of ordered couples".[53] Russell (1903)
considered the definition of a relation (such as Ψ(A, B)) as a "class of couples" but rejected it:
"There is a temptation to regard a relation as definable in extension as a class of couples. This is the formal
advantage that it avoids the necessity for the primitive proposition asserting that every couple has a relation
holding between no other pairs of terms. But it is necessary to give sense to the couple, to distinguish the
referent [domain] from the relatum [converse domain]: thus a couple becomes essentially distinct from a class
of two terms, and must itself be introduced as a primitive idea. . . . It seems therefore more correct to take an
intensional view of relations, and to identify them rather with class-concepts than with classes."[54]
By 1910-1913 and Principia Mathematica Russell had given up on the requirement for an intensional definition of a
relation, stating that "mathematics is always concerned with extensions rather than intensions" and "Relations, like
classes, are to be taken in extension".[55] To demonstrate the notion of a relation in extension Russell now embraced
the notion of ordered couple: "We may regard a relation ... as a class of couples ... the relation determined by φ(x, y)
is the class of couples (x, y) for which φ(x, y) is true".[56] In a footnote he clarified his notion and arrived at this
definition:
"Such a couple has a sense, i.e., the couple (x, y) is different from the couple (y, x) unless x = y. We shall call it
a "couple with sense," ... it may also be called an ordered couple.[56]
But he goes on to say that he would not introduce the ordered couples further into his "symbolic treatment"; he
proposes his "matrix" and his unpopular axiom of reducibility in their place.
An attempt to solve the problem of the antinomies led Russell to propose his "doctrine of types" in an appendix B of
his 1903 The Principles of Mathematics.[57] In a few years he would refine this notion and propose in his 1908 The
Theory of Types two axioms of reducibility, the purpose of which were to reduce (single-variable) propositional
functions and (dual-variable) relations to a "lower" form (and ultimately into a completely extensional form); he and
Alfred North Whitehead would carry this treatment over to Principia Mathematica 1910-1913 with a further
refinement called "a matrix".[58] The first axiom is *12.1; the second is *12.11. To quote Wiener the second axiom
*12.11 "is involved only in the theory of relations".[59] Both axioms, however, were met with skepticism and
resistance; see more at Axiom of reducibility. By 1914 Norbert Wiener, using Whitehead and Russell's symbolism,
eliminated axiom *12.11 (the "two-variable" (relational) version of the axiom of reducibility) by expressing a
relation as an ordered pair "using the null set. At approximately the same time, Hausdorff (1914, p. 32) gave the
definition of the ordered pair (a, b) as { {a,1}, {b, 2} }. A few years later Kuratowski (1921) offered a definition that
has been widely used ever since, namely { {a, b}, {a} }".[60] As noted by Suppes (1960) "This definition . . . was
historically important in reducing the theory of relations to the theory of sets".[61]
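
The behavior of the Kuratowski definition is easy to check directly; a quick Python illustration (mine, using frozensets for the inner sets):

# Kuratowski encoding of the ordered pair (a, b) as { {a, b}, {a} }.
# Order matters: swapping the components changes the encoding set.
def kuratowski_pair(a, b):
    return frozenset({frozenset({a, b}), frozenset({a})})

print(kuratowski_pair(1, 2) == kuratowski_pair(1, 2))  # True
print(kuratowski_pair(1, 2) == kuratowski_pair(2, 1))  # False: (1, 2) != (2, 1)
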
Observe that while Wiener "reduced" the relational *12.11 form of the axiom of reducibility he did not reduce nor
otherwise change the propositional-function form *12.1; indeed he declared this "essential to the treatment of
identity, descriptions, classes and relations".[62]

Schönfinkel's notion of "function" as a many-one "correspondence" 1924


Where exactly the general notion of "function" as a many-one relationship derives from is unclear. Russell in his
1920 Introduction to Mathematical Philosophy states that "It should be observed that all mathematical functions
result from one-many [sic -- contemporary usage is many-one] relations . . . Functions in this sense are descriptive
functions".[63] A reasonable possibility is the Principia Mathematica notion of "descriptive function" -- R 'y =DEF
(ιx)(x R y): "the singular object that has a relation R to y". Whatever the case, by 1924, Moses Schonfinkel expressed
the notion, claiming it to be "well known":
"As is well known, by function we mean in the simplest case a correspondence between the elements of some
domain of quantities, the argument domain, and those of a domain of function values ... such that to each
argument value there corresponds at most one function value".[64]
According to Willard Quine, Schönfinkel's 1924 "provide[s] for ... the whole sweep of abstract set theory. The crux
of the matter is that Schönfinkel lets functions stand as arguments. ¶ For Schönfinkel, substantially as for Frege,
classes are special sorts of functions. They are propositional functions, functions whose values are truth values. All
functions, propositional and otherwise, are for Schönfinkel one-place functions".[65] Remarkably, Schönfinkel
reduces all mathematics to an extremely compact functional calculus consisting of only three functions: Constancy,
fusion (i.e., composition), and mutual exclusivity. Quine notes that Haskell Curry (1958) carried this work forward
"under the head of combinatory logic".[66]

von Neumann's set theory 1925


By 1925 Abraham Fraenkel (1922) and Thoralf Skolem (1922) had amended Zermelo's set theory of 1908. But von
Neumann was not convinced that this axiomatization ruled out the antinomies.[67] So he proposed his own
theory, his 1925 An axiomatization of set theory. It explicitly contains a "contemporary", set-theoretic version of the
notion of "function":
"[Unlike Zermelo's set theory] [w]e prefer, however, to axiomatize not "set" but "function". The latter notion
certainly includes the former. (More precisely, the two notions are completely equivalent, since a function can
be regarded as a set of pairs, and a set as a function that can take two values.)".[68]
His axiomatization creates two "domains of objects" called "arguments" (I-objects) and "functions" (II-objects);
where they overlap are the "argument functions" (I-II objects). He introduces two "universal two-variable
operations" -- (i) the operation [x, y]: ". . . read 'the value of the function x for the argument y) and (ii) the operation
(x, y): ". . . (read 'the ordered pair x, y'") whose variables x and y must both be arguments and that itself produces an
argument (x,y)". To clarify the function pair he notes that "Instead of f(x) we write [f,x] to indicate that f, just like x,
is to be regarded as a variable in this procedure". And to avoid the "antinomies of naive set theory, in Russell's first
of all . . . we must forgo treating certain functions as arguments".[69] He adopts a notion from Zermelo to restrict
these "certain functions"[70]

Since 1950

Notion of "function" in contemporary set theory


Both axiomatic and naive forms of Zermelo's set theory as modified by Fraenkel (1922) and Skolem (1922) define
"function" as a relation, define a relation as a set of ordered pairs, and define an ordered pair as a set of two
"dissymetric" sets.
While the reader of Suppes (1960) Axiomatic Set Theory or Halmos (1970) Naive Set Theory observes the use of
function-symbolism in the axiom of separation, e.g., φ(x) (in Suppes) and S(x) (in Halmos), they will see no mention
of "proposition" or even "first order predicate calculus". In their place are "expressions of the object language",
"atomic formulae", "primitive formulae", and "atomic sentences".
Kleene 1952 defines the words as follows: "In word languages, a proposition is expressed by a sentence. Then a
'predicate' is expressed by an incomplete sentence or sentence skeleton containing an open place. For example, "___
is a man" expresses a predicate ... The predicate is a propositional function of one variable. Predicates are often
called 'properties' ... The predicate calculus will treat of the logic of predicates in this general sense of 'predicate', i.e.,
as propositional function".[71]
The reason for the disappearance of the words "propositional function" e.g., in Suppes (1960), and Halmos (1970), is
explained by Alfred Tarski 1946 together with further explanation of the terminology:
"An expression such as x is an integer, which contains variables and, on replacement of these variables by
constants becomes a sentence, is called a SENTENTIAL [i.e., propositional cf his index] FUNCTION. But
mathematicians, by the way, are not very fond of this expression, because they use the term "function" with a
different meaning. ... sentential functions and sentences composed entirely of mathematical symbols (and not
words of everyday language), such as: x + y = 5 are usually referred to by mathematicians as FORMULAE.
In place of "sentential function" we shall sometimes simply say "sentence" --- but only in cases where there is
no danger of any misunderstanding".[72]
For his part Tarski calls the relational form of function a "FUNCTIONAL RELATION or simply a
FUNCTION".[73] After a discussion of this "functional relation" he asserts that:
"The concept of a function which we are considering now differs essentially from the concepts of a sentential
[propositional] and of a designatory function .... Strictly speaking ... [these] do not belong to the domain of
logic or mathematics; they denote certain categories of expressions which serve to compose logical and
mathematical statements, but they do not denote things treated of in those statements... . The term "function" in
its new sense, on the other hand, is an expression of a purely logical character; it designates a certain type of
things dealt with in logic and mathematics."[74]
See more about "truth under an interpretation" at Alfred Tarski.

Further developments
The idea of structure-preserving functions, or homomorphisms, led to the abstract notion of morphism, the key
concept of category theory. More recently, the concept of functor has been used as an analogue of a function in
category theory.[75]

See also
• List of mathematical functions
• Functional predicate
• Function composition
• Functional
• Functional decomposition
• Functor
• Generalized function
• Implicit function
• Parametric equation
• Plateau
• Proportionality
• Vertical line test

References
• Anton, Howard (1980), Calculus with Analytical Geometry, Wiley, ISBN 978-0-471-03248-9
• Bartle, Robert G. (1976), The Elements of Real Analysis (2nd ed.), Wiley, ISBN 978-0-471-05464-1
• Husch, Lawrence S. (2001), Visual Calculus [11], University of Tennessee, retrieved 2007-09-27
• Katz, Robert (1964), Axiomatic Analysis, D. C. Heath and Company.
• Ponte, João Pedro (1992), "The history of the concept of function and some educational implications" [76], The
Mathematics Educator 3 (2): 3–8, ISSN 1062-9017
• Thomas, George B.; Finney, Ross L. (1995), Calculus and Analytic Geometry (9th ed.), Addison-Wesley,
ISBN 978-0-201-53174-9
• Youschkevitch, A. P. (1976), "The concept of function up to the middle of the 19th century", Archive for History
of Exact Sciences 16 (1): 37–85, doi:10.1007/BF00348305.
• Monna, A. F. (1972), "The concept of function in the 19th and 20th centuries, in particular with regard to the
discussions between Baire, Borel and Lebesgue", Archive for History of Exact Sciences 9 (1): 57–84,
doi:10.1007/BF00348540.
• Kleiner, Israel (1989), "Evolution of the Function Concept: A Brief Survey" [77], The College Mathematics
Journal (Mathematical Association of America) 20 (4): 282–300, doi:10.2307/2686848.
• Ruthing, D. (1984), "Some definitions of the concept of function from Bernoulli, Joh. to Bourbaki, N.",
Mathematical Intelligencer 6 (4): 72–77.
• Dubinsky, Ed; Harel, Guershon (1992), The Concept of Function: Aspects of Epistemology and Pedagogy,
Mathematical Association of America, ISBN 0883850818.
• Malik, M. A. (1980), "Historical and pedagogical aspects of the definition of function", International Journal of
Mathematical Education in Science and Technology 11 (4): 489–492, doi:10.1080/0020739800110404.
• Boole, George (1854), An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories
of Logic and Probabilities, Walton and Maberly, London UK; Macmillan and Company, Cambridge UK.
Republished as a googlebook.
• Eves, Howard. (1990), Foundations and Fundamental Concepts of Mathematics: Third Edition, Dover
Publications, Inc. Mineola, NY, ISBN 0-486-69609-X (pbk)
• Frege, Gottlob. (1879), Begriffsschrift: eine der arithmetischen nachgebildete Formelsprache des reinen
Denkens, Halle
• Grattan-Guinness, Ivor and Bornet, Gérard (1997), George Boole: Selected Manuscripts on Logic and its
Philosophy, Springer-Verlag, Berlin, ISBN 3-7643-5456-9 (Berlin...)
• Halmos, Paul R. (1970) Naive Set Theory, Springer-Verlag, New York, ISBN 0-387-90092-6.
• Hardy, Godfrey Harold (1908), A Course of Pure Mathematics, Cambridge University Press (published 1993),
ISBN 978-0-521-09227-2
• Reichenbach, Hans (1947) Elements of Symbolic Logic, Dover Publishing Inc., New York NY, ISBN
0-486-24004-5.
• Russell, Bertrand (1903) The Principles of Mathematics: Vol. 1, Cambridge at the University Press, Cambridge,
UK, republished as a googlebook.
• Russell, Bertrand (1920) Introduction to Mathematical Philosophy (second edition), Dover Publishing Inc., New
York NY, ISBN 0-486-27724-0 (pbk).
• Suppes, Patrick (1960) Axiomatic Set Theory, Dover Publications, Inc, New York NY, ISBN 0-486-61630-4. cf
his Chapter 1 Introduction.
• Tarski, Alfred (1946) Introduction to Logic and to the Methodology of Deductive Sciences, republished 1995 by
Dover Publications, Inc., New York, NY ISBN 0-486-28462-x
• Venn, John (1881) Symbolic Logic, Macmillan and Co., London UK. Republished as a googlebook.
• van Heijenoort, Jean (1967, 3rd printing 1976), From Frege to Gödel: A Source Book in Mathematical Logic,
1879-1931, Harvard University Press, Cambridge, MA, ISBN 0-674-32449-8 (pbk)
• Gottlob Frege (1879) Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought
with commentary by van Heijenoort, pages 1–82
• Giuseppe Peano (1889) The principles of arithmetic, presented by a new method with commentary by van
Heijenoort, pages 83–97
• Bertrand Russell (1902) Letter to Frege with commentary by van Heijenoort, pages 124-125. Wherein Russell
announces his discovery of a "paradox" in Frege's work.
• Gottlob Frege (1902) Letter to Russell with commentary by van Heijenoort, pages 126-128.
• David Hilbert (1904) On the foundations of logic and arithmetic, with commentary by van Heijenoort, pages
129-138.
• Jules Richard (1905) The principles of mathematics and the problem of sets, with commentary by van
Heijenoort, pages 142-144. The Richard paradox.
• Bertrand Russell (1908a) Mathematical logic as based on the theory of types, with commentary by Willard
Quine, pages 150-182.
• Ernst Zermelo (1908) A new proof of the possibility of a well-ordering, with commentary by van Heijenoort,
pages 183-198. Wherein Zermelo rails against Poincaré's (and therefore Russell's) notion of impredicative
definition.
• Ernst Zermelo (1908a) Investigations in the foundations of set theory I, with commentary by van Heijenoort,
pages 199-215. Wherein Zermelo attempts to solve Russell's paradox by structuring his axioms to restrict the
universal domain B (from which objects and sets are pulled by definite properties) so that it itself cannot be a
set, i.e., his axioms disallow a universal set.
• Norbert Wiener (1914) A simplification of the logic of relations, with commentary by van Heijenoort, pages
224-227
• Thoralf Skolem (1922) Some remarks on axiomatized set theory, with commentary by van Heijenoort, pages
290-301. Wherein Skolem defines Zermelo's vague "definite property".
• Moses Schönfinkel (1924) On the building blocks of mathematical logic, with commentary by Willard Quine,
pages 355-366. The start of combinatory logic.
• John von Neumann (1925) An axiomatization of set theory, with commentary by van Heijenoort , pages
393-413. Wherein von Neumann creates "classes" as distinct from "sets" (the "classes" are Zermelo's "definite
properties"), and now there is a universal set, etc.
• David Hilbert (1927) The foundations of mathematics by van Heijenoort, with commentary, pages 464-479.
• Whitehead, Alfred North and Russell, Bertrand (1913, 1962 edition), Principia Mathematica to *56, Cambridge
at the University Press, London UK, no ISBN or US card catalog number.

External links
• The Wolfram Functions Site [78] gives formulae and visualizations of many mathematical functions.
• Shodor: Function Flyer [79], interactive Java applet for graphing and exploring functions.
• xFunctions [80], a Java applet for exploring functions graphically.
• Draw Function Graphs [81], online drawing program for mathematical functions.
• Functions [82] from cut-the-knot.
• Function at ProvenMath [83].
• Comprehensive web-based function graphing & evaluation tool [84]

References
[1] The history of the function concept in mathematics (http://www.educ.fc.ul.pt/docentes/jponte/docs-uk/92 Ponte (Functions).doc)
J.P. Ponte, 1992
[2] Another short but useful history is found in Eves 1990 pages 234-235
[3] Thompson, S.P.; Gardner, M.; Calculus Made Easy. 1998. pp. 10–11. ISBN 0312185480.
[4] Eves dates Leibniz's first use to the year 1694 and also similarly relates the usage to "as a term to denote any quantity connected with a curve,
such as the coordinates of a point on the curve, the slope of the curve, and so on" (Eves 1990:234).
[5] Eves 1990:234
[6] Eves 1990:235
[7] Eves asserts that Dirichlet "arrived at the following formulation: "[The notion of] a variable is a symbol that represents any one of a set of
numbers; if two variables x and y are so related that whenever a value is assigned to x there is automatically assigned, by some rule or
correspondence, a value to y, then we say y is a (single-valued) function of x. The variable x . . . is called the independent variable and the
variable y is called the dependent variable. The permissible values that x may assume constitute the domain of definition of the function, and
the values taken on by y constitute the range of values of the function . . . it stresses the basic idea of a relationship between two sets of
numbers" Eves 1990:235.
[8] Boole circa 1849 Elementary Treatise on Logic not mathematical including philosophy of mathematical reasoning in Grattan-Guiness and
Bornet 1997:40
[9] De Morgan 1847:1
[10] Boole 1848 in Grattan-Guiness and Bornet 1997:1, 2
[11] Boole 1848 in Grattan-Guiness and Bornet 1997:6
[12] Eves 1990:222
[13] Some of this criticism is intense: see the introduction by Willard Quine preceding Russell 1908 Mathematical logic as based on the theory of
types in van Heijenoort 1967:151. See also von Neumann's introduction to his 1925 Axiomatization of Set Theory in van Heijenoort 1967:395
[14] Boole 1854:86
[15] cf Boole 1854:31-34. Boole discusses this "special law" with its two algebraic roots x = 0 or 1, on page 37.
[16] Although he gives others credit, cf Venn 1881:6


[17] Venn 1881: 86-87
[18] cf van Heijenoort's introduction to Peano 1889 in van Heijenoort 1967. For most of his logical symbolism and notions of propositions Peano
credits "many writers, especially Boole". In footnote 1 he credits Boole 1847, 1848, 1854, Schröder 1877, Peirce 1880, Jevons 1883, MacColl
1877, 1878, 1878a, 1880; cf van Heijenoort 1967:86).
[19] Frege 1879 in van Heijenoort 1967:7
[20] Frege's exact words are "expressed in our formula language" and "expression", cf Frege 1879 in van Heijenoort 1967:21-22.
[21] This example is from Frege 1879 in van Heijenoort 1967:21-22
[22] Frege 1879 in van Heijenoort 1967:21-22
[23] Frege cautions that the function will have "argument places" where the argument should be placed as distinct from other places where the
same sign might appear. But he does not go deeper into how to signify these positions and Russell 1903 observes this.
[24] Gottlob Frege (1879) in van Heijenoort 1967:21-24
[25] "...Peano intends to cover much more ground than Frege does in his Begriffsschrift and his subsequent works, but he does not till that ground
to any depth comparable to what Frege does in his self-allotted field", van Heijenoort 1967:85
[26] van Heijenoort 1967:89.
[27] van Heijenoort 1967:91.
[28] All symbols used here are from Peano 1889 in van Heijenoort 1967:91).
[29] cf van Heijenoort 1967:91
[30] "In Mathematics, my chief obligations, as is indeed evident, are to Georg Cantor and Professor Peano. If I had become acquainted sooner
with the work of Professor Frege, I should have owed a great deal to him, but as it is I arrived independently at many results which he had
already established", Russell 1903:viii. He also highlights Boole's 1854 Laws of Thought and Ernst Schröder's three volumes of
"non-Peanesque methods" 1890, 1891, and 1895 cf Russell 1903:10
[31] Russell 1903:505
[32] Russell 1903:5-6
[33] Russell 1903:7
[34] Russell 1903:19
[35] Russell 1910-1913:15
[36] Whitehead and Russell 1910-1913:6, 8 respectively
[37] Something similar appears in Tarski 1946. Tarski refers to a "relational function" as a "ONE-MANY [sic!] or FUNCTIONAL RELATION
or simply a FUNCTION". Tarski comments about this reversal of variables on page 99.
[38] Whitehead and Russell 1910-1913:31. This paper is important enough that van Heijenoort reprinted it as Whitehead and Russell 1910
Incomplete symbols: Descriptions with commentary by W. V. Quine in van Heijenoort 1967:216-223
[39] Kleene 1952:53
[40] Hilbert in van Heijenoort 1967:466
[41] Turing 1936-7 in Martin Davis The Undecidable 1965:145
[42] cf Kleene 1952:45
[43] "The nonprimitive and arbitrary character of this axiom drew forth severe criticism, and much of subsequent refinement of the logistic
program lies in attempts to devise some method of avoiding the disliked axiom of reducibility" Eves 1990:268.
[44] Frege 1879 in van Heijenoort 1967:23
[45] Russell (1902) Letter to Frege in van Heijenoort 1967:124
[46] Frege (1902) Letter to Russell in van Heijenoort 1967:127
[47] van Heijenoort's commentary to Russell's Letter to Frege in van Heijenoort 1967:124
[48] The original uses an Old High German symbol in place of Φ cf Zermelo 1908a in van Heijenoort 1967:202
[49] Zermelo 1908a in van Heijenoort 1967:203
[50] cf van Heijenoort's commentary before Zermelo 1908 Investigations in the foundations of set theory I in van Heijenoort 1967:199
[51] Skolem 1922 in van Heijenoort 1967:292-293
[52] van Heijenoort's introduction to Abraham Fraenkel's The notion "definite" and the independence of the axiom of choice in van Heijenoort
1967:285.
[53] But Wiener offers no date or reference cf Wiener 1914 in van Heijenoort 1967:226
[54] Russell 1903:99
[55] both quotes from Whitehead and Russell 1913:26
[56] Whitehead and Russell 1913:26
[57] Russell 1903:523-529
[58] *12 The Hierarchy of Types and the axiom of Reducibility in Principia Mathematica 1913:161
[59] Wiener 1914 in van Heijenoort 1967:224
[60] commentary by van Heijenoort preceding Norbert Wiener's (1914) A simplification of the logic of relations in van Heijenoort 1967:224.
[61] Suppes 1960:32. This same point appears in van Heijenoort's commentary before Wiener (1914) in van Heijenoort 1967:224.
[62] Wiener 1914 in van Heijenoort 1967:224
[63] Russell 1920:46
[64] Schönfinkel (1924) On the building blocks of mathematical logic in van Heijenoort 1967:359
[65] commentary by W. V. Quine preceding Schönfinkel (1924) On the building blocks of mathematical logic in van Heijenoort 1967:356.
[66] cf Curry and Feys 1958; Quine in van Heijenoort 1967:357.
[67] von Neumann's critique of the history observes the split between the logicists (e.g., Russell et al.) and the set-theorists (e.g., Zermelo et al.)
and the formalists (e.g., Hilbert), cf von Neumann 1925 in van Heijenoort 1967:394-396.
[68] von Neumann 1925 in van Heijenoort 1967:396
[69] All quotes from von Neumann 1925 in van Heijenoort 1967:397-398
[70] This notion is not easy to summarize; see more at van Heijenoort 1967:397.
[71] Kleene 1952:143-145
[72] Tarski 1946:5
[73] Tarski 1946:98
[74] Tarski 1946:102
[75] John C. Baez; James Dolan (1998). Categorification (http://arxiv.org/abs/math/9802029).
[76] http://www.math.tarleton.edu/Faculty/Brawner/550%20MAED/History%20of%20functions.pdf
[77] http://jstor.org/stable/2686848
[78] http://functions.wolfram.com/
[79] http://www.shodor.org/interactivate/activities/FunctionFlyer/
[80] http://math.hws.edu/xFunctions/
[81] http://rechneronline.de/function-graphs/
[82] http://www.cut-the-knot.org/do_you_know/FunctionMain.shtml
[83] http://www.apronus.com/provenmath/cartesian.htm
[84] http://sporkforge.com/math/fcn_graph_eval.php

Calculus
Calculus (Latin, calculus, a small stone used for counting) is a branch in mathematics focused on limits, functions,
derivatives, integrals, and infinite series. This subject constitutes a major part of modern mathematics education. It
has two major branches, differential calculus and integral calculus, which are related by the fundamental theorem
of calculus. Calculus is the study of change,[1] in the same way that geometry is the study of shape and algebra is the
study of operations and their application to solving equations. A course in calculus is a gateway to other, more
advanced courses in mathematics devoted to the study of functions and limits, broadly called mathematical analysis.
Calculus has widespread applications in science, economics, and engineering and can solve many problems for
which algebra alone is insufficient.
Historically, calculus was called "the calculus of infinitesimals", or "infinitesimal calculus". More generally, calculus
(plural calculi) may refer to any method or system of calculation guided by the symbolic manipulation of
expressions. Some examples of other well-known calculi are propositional calculus, variational calculus, lambda
calculus, pi calculus, and join calculus.
History

Ancient
The ancient period introduced some of the ideas of integral
calculus, but does not seem to have developed these ideas in a
rigorous or systematic way. Calculating volumes and areas, the
basic function of integral calculus, can be traced back to the
Egyptian Moscow papyrus (c. 1820 BC), in which an Egyptian
successfully calculated the volume of a pyramidal frustum.[2] [3]
From the school of Greek mathematics, Eudoxus (c. 408−355 BC)
used the method of exhaustion, which prefigures the concept of the
limit, to calculate areas and volumes while Archimedes (c.
287−212 BC) developed this idea further, inventing heuristics
which resemble integral calculus.[4] The method of exhaustion was
later reinvented in China by Liu Hui in the 3rd century AD in
order to find the area of a circle.[5] In the 5th century AD, Zu
Chongzhi established a method which would later be called
Cavalieri's principle to find the volume of a sphere.[6]

Medieval
[Figure: Isaac Newton is one of the most famous contributors to the development of calculus, with, among other
things, the use of calculus in his laws of motion and gravitation.]

Around AD 1000, the Islamic mathematician Ibn al-Haytham (Alhacen) was the first to derive the formula for the
sum of the fourth powers of an arithmetic progression, using a method that is readily generalizable to finding the
formula for the sum of any
higher integral powers, which he used to perform an integration.[7] In the 11th century, the Chinese polymath Shen
Kuo developed 'packing' equations that dealt with integration. In the 12th century, the Indian mathematician
Bhāskara II developed an early derivative representing infinitesimal change, and he described an early form of
Rolle's theorem.[8] Also in the 12th century, the Persian mathematician Sharaf al-Dīn al-Tūsī discovered the
derivative of cubic polynomials, an important result in differential calculus.[9] In the 14th century, Indian
mathematician Madhava of Sangamagrama, along with other mathematician-astronomers of the Kerala school of
astronomy and mathematics, described special cases of Taylor series,[10] which are treated in the text Yuktibhasa.[11]
[12] [13]

Modern
In Europe, the foundational work was a treatise due to Bonaventura Cavalieri, who argued that volumes and areas
should be computed as the sums of the volumes and areas of infinitesimal thin cross-sections. The ideas were similar
to Archimedes' in The Method, but this treatise was lost until the early part of the twentieth century. Cavalieri's work
was not well respected since his methods can lead to erroneous results, and the infinitesimal quantities he introduced
were disreputable at first.
The formal study of calculus combined Cavalieri's infinitesimals with the calculus of finite differences developed in
Europe at around the same time. The combination was achieved by John Wallis, Isaac Barrow, and James Gregory,
the latter two proving the second fundamental theorem of calculus around 1675.
The product rule and chain rule, the notion of higher derivatives, Taylor series, and analytical functions were
introduced by Isaac Newton in an idiosyncratic notation which he used to solve problems of mathematical physics.
In his publications, Newton rephrased his ideas to suit the mathematical idiom of the time, replacing calculations
with infinitesimals by equivalent geometrical arguments which were considered beyond reproach. He used the
methods of calculus to solve the problem of planetary motion, the shape of the surface of a rotating fluid, the
oblateness of the earth, the motion of a weight sliding on a cycloid, and many other problems discussed in his
Principia Mathematica. In other work, he developed series expansions for functions, including fractional and
irrational powers, and it was clear that he understood the principles of the Taylor series. He did not publish all these
discoveries, and at this time infinitesimal methods were still considered disreputable.
These ideas were systematized into a true calculus of
infinitesimals by Gottfried Wilhelm Leibniz, who was originally
accused of plagiarism by Newton.[14] He is now regarded as an
independent inventor of and contributor to calculus. His
contribution was to provide a clear set of rules for manipulating
infinitesimal quantities, allowing the computation of second and
higher derivatives, and providing the product rule and chain rule,
in their differential and integral forms. Unlike Newton, Leibniz
paid a lot of attention to the formalism—he often spent days
determining appropriate symbols for concepts.

Leibniz and Newton are usually both credited with the invention
of calculus. Newton was the first to apply calculus to general
physics and Leibniz developed much of the notation used in
calculus today. The basic insights that both Newton and Leibniz
provided were the laws of differentiation and integration, second
and higher derivatives, and the notion of an approximating polynomial series. By Newton's time, the fundamental
theorem of calculus was known.

[Figure: Gottfried Wilhelm Leibniz was originally accused of plagiarizing Sir Isaac Newton's unpublished work
(only in Britain, not in continental Europe), but is now regarded as an independent inventor of and contributor to
calculus.]

When Newton and Leibniz first published their results, there was
great controversy over which mathematician (and therefore which
country) deserved credit. Newton derived his results first, but Leibniz published first. Newton claimed Leibniz stole
ideas from his unpublished notes, which Newton had shared with a few members of the Royal Society. This
controversy divided English-speaking mathematicians from continental mathematicians for many years, to the
detriment of English mathematics. A careful examination of the papers of Leibniz and Newton shows that they
arrived at their results independently, with Leibniz starting first with integration and Newton with differentiation.
Today, both Newton and Leibniz are given credit for developing calculus independently. It is Leibniz, however, who
gave the new discipline its name. Newton called his calculus "the science of fluxions".

Since the time of Leibniz and Newton, many mathematicians have contributed to the continuing development of
calculus. In the 19th century, calculus was put on a much more rigorous footing by mathematicians such as Cauchy,
Riemann, and Weierstrass (see (ε, δ)-definition of limit). It was also during this period that the ideas of calculus were
generalized to Euclidean space and the complex plane. Lebesgue generalized the notion of the integral so that
virtually any function has an integral, while Laurent Schwartz extended differentiation in much the same way.
Calculus is a ubiquitous topic in most modern high schools and universities around the world.[15]
Significance
While some of the ideas of calculus were developed earlier in Egypt, Greece, China, India, Iraq, Persia, and Japan,
the modern use of calculus began in Europe, during the 17th century, when Isaac Newton and Gottfried Wilhelm
Leibniz built on the work of earlier mathematicians to introduce its basic principles. The development of calculus
was built on earlier concepts of instantaneous motion and area underneath curves.
Applications of differential calculus include computations involving velocity and acceleration, the slope of a curve,
and optimization. Applications of integral calculus include computations involving area, volume, arc length, center
of mass, work, and pressure. More advanced applications include power series and Fourier series. Calculus can be
used to compute the trajectory of a shuttle docking at a space station or the amount of snow in a driveway.
Calculus is also used to gain a more precise understanding of the nature of space, time, and motion. For centuries,
mathematicians and philosophers wrestled with paradoxes involving division by zero or sums of infinitely many
numbers. These questions arise in the study of motion and area. The ancient Greek philosopher Zeno gave several
famous examples of such paradoxes. Calculus provides tools, especially the limit and the infinite series, which
resolve the paradoxes.

Foundations
In mathematics, foundations refers to the rigorous development of a subject from precise axioms and definitions.
Working out a rigorous foundation for calculus occupied mathematicians for much of the century following Newton
and Leibniz and is still to some extent an active area of research today.
There is more than one rigorous approach to the foundation of calculus. The usual one today is via the concept of
limits defined on the continuum of real numbers. An alternative is nonstandard analysis, in which the real number
system is augmented with infinitesimal and infinite numbers, as in the original Newton-Leibniz conception. The
foundations of calculus are included in the field of real analysis, which contains full definitions and proofs of the
theorems of calculus as well as generalizations such as measure theory and distribution theory.

Principles

Limits and infinitesimals


Calculus is usually developed by manipulating very small quantities. Historically, the first method of doing so was
by infinitesimals. These are objects which can be treated like numbers but which are, in some sense, "infinitely
small". An infinitesimal number dx could be greater than 0, but less than any number in the sequence 1, 1/2, 1/3, ...
and less than any positive real number. Any integer multiple of an infinitesimal is still infinitely small, i.e.,
infinitesimals do not satisfy the Archimedean property. From this point of view, calculus is a collection of techniques
for manipulating infinitesimals. This approach fell out of favor in the 19th century because it was difficult to make
the notion of an infinitesimal precise. However, the concept was revived in the 20th century with the introduction of
non-standard analysis and smooth infinitesimal analysis, which provided solid foundations for the manipulation of
infinitesimals.
In the 19th century, infinitesimals were replaced by limits. Limits describe the value of a function at a certain input
in terms of its values at nearby input. They capture small-scale behavior, just like infinitesimals, but use the ordinary
real number system. In this treatment, calculus is a collection of techniques for manipulating certain limits.
Infinitesimals get replaced by very small numbers, and the infinitely small behavior of the function is found by
taking the limiting behavior for smaller and smaller numbers. Limits are the easiest way to provide rigorous
foundations for calculus, and for this reason they are the standard approach.
Differential calculus
Differential calculus is the study of the
definition, properties, and applications of
the derivative of a function. The process of
finding the derivative is called
differentiation. Given a function and a point
in the domain, the derivative at that point is
a way of encoding the small-scale behavior
of the function near that point. By finding
the derivative of a function at every point in
its domain, it is possible to produce a new
function, called the derivative function or
just the derivative of the original function.
In mathematical jargon, the derivative is a
linear operator which inputs a function and outputs a second function. This is more abstract than many of the
processes studied in elementary algebra, where functions usually input a number and output another number. For
example, if the doubling function is given the input three, then it outputs six, and if the squaring function is given
the input three, then it outputs nine. The derivative, however, can take the squaring function as an input. This means
that the derivative takes all the information of the squaring function—such as that two is sent to four, three is sent to
nine, four is sent to sixteen, and so on—and uses this information to produce another function. (The function it
produces turns out to be the doubling function.)

[Figure: Tangent line at (x, f(x)). The derivative f′(x) of a curve at a point is the slope (rise over run) of the line
tangent to that curve at that point.]

The most common symbol for a derivative is an apostrophe-like mark called prime. Thus, the derivative of the
function f is f′, pronounced "f prime." For instance, if f(x) = x² is the squaring function, then f′(x) = 2x is its
derivative, the doubling function.
If the input of the function represents time, then the derivative represents change with respect to time. For example,
if f is a function that takes a time as input and gives the position of a ball at that time as output, then the derivative of
f is how the position is changing in time, that is, it is the velocity of the ball.
If a function is linear (that is, if the graph of the function is a straight line), then the function can be written y = mx +
b, where:

$m = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}$
This gives an exact value for the slope of a straight line. If the graph of the function is not a straight line, however,
then the change in y divided by the change in x varies. Derivatives give an exact meaning to the notion of change in
output with respect to change in input. To be concrete, let f be a function, and fix a point a in the domain of f. (a,
f(a)) is a point on the graph of the function. If h is a number close to zero, then a + h is a number close to a.
Therefore (a + h, f(a + h)) is close to (a, f(a)). The slope between these two points is

$m = \frac{f(a+h) - f(a)}{h}.$
This expression is called a difference quotient. A line through two points on a curve is called a secant line, so m is
the slope of the secant line between (a, f(a)) and (a + h, f(a + h)). The secant line is only an approximation to the
behavior of the function at the point a because it does not account for what happens between a and a + h. It is not
possible to discover the behavior at a by setting h to zero because this would require dividing by zero, which is
impossible. The derivative is defined by taking the limit as h tends to zero, meaning that it considers the behavior of f
for all small values of h and extracts a consistent value for the case when h equals zero:

$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}.$
Geometrically, the derivative is the slope of the tangent line to the graph of f at a. The tangent line is a limit of secant
lines just as the derivative is a limit of difference quotients. For this reason, the derivative is sometimes called the
slope of the function f.
Here is a particular example, the derivative of the squaring function at the input 3. Let f(x) = x² be the squaring
function.

$f'(3) = \lim_{h \to 0} \frac{(3+h)^2 - 3^2}{h} = \lim_{h \to 0} \frac{6h + h^2}{h} = \lim_{h \to 0} (6 + h) = 6$
[Figure: The derivative f′(x) of a curve at a point is the slope of the line tangent to that curve at that point,
determined as the limiting value of the slopes of secant lines. The function drawn (in red) is f(x) = x³ − x; the
tangent line (in green) through the point (−3/2, −15/8) has slope 23/4. The vertical and horizontal scales in the
image differ.]

The slope of the tangent line to the squaring function at the point (3,9) is 6, that is to say, it is going up six times as fast
as it is going to the right. The limit process just described can be performed for any point in the domain of the
squaring function. This defines the derivative function of the squaring function, or just the derivative of the squaring
function for short. A similar computation to the one above shows that the derivative of the squaring function is the
doubling function.
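
The limiting process can be imitated numerically. A short Python sketch (an illustration, with step sizes chosen arbitrarily) shows the difference quotient of the squaring function at a = 3 approaching 6:

# Difference quotients of f(x) = x**2 at a = 3 for shrinking h:
# the secant slopes approach the tangent slope, 6.
def f(x):
    return x ** 2

a = 3.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    slope = (f(a + h) - f(a)) / h  # slope of a secant line
    print(h, slope)
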
Leibniz notation
A common notation, introduced by Leibniz, for the derivative in the example above is

$\frac{dy}{dx} = 2x.$
In an approach based on limits, the symbol dy/dx is to be interpreted not as the quotient of two numbers but as a
shorthand for the limit computed above. Leibniz, however, did intend it to represent the quotient of two
infinitesimally small numbers, dy being the infinitesimally small change in y caused by an infinitesimally small
change dx applied to x. We can also think of d/dx as a differentiation operator, which takes a function as an input and
gives another function, the derivative, as the output. For example:

$\frac{d}{dx}(x^2) = 2x.$
In this usage, the dx in the denominator is read as "with respect to x". Even when calculus is developed using limits
rather than infinitesimals, it is common to manipulate symbols like dx and dy as if they were real numbers; although
it is possible to avoid such manipulations, they are sometimes notationally convenient in expressing operations such
as the total derivative.
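
The view of d/dx as an operator taking a function to another function can be tried out with a symbolic library; for example, with SymPy (my example, assuming SymPy is installed):

# d/dx as an operator: diff maps the squaring expression to the
# doubling expression.
import sympy as sp

x = sp.symbols('x')
print(sp.diff(x**2, x))       # 2*x
print(sp.diff(sp.sin(x), x))  # cos(x)
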

Integral calculus
Integral calculus is the study of the definitions, properties, and applications of two related concepts, the indefinite
integral and the definite integral. The process of finding the value of an integral is called integration. In technical
language, integral calculus studies two related linear operators.
The indefinite integral is the antiderivative, the inverse operation to the derivative. F is an indefinite integral of f
when f is a derivative of F. (This use of upper- and lower-case letters for a function and its indefinite integral is
common in calculus.)
The definite integral inputs a function and outputs a number, which gives the area between the graph of the input
and the x-axis. The technical definition of the definite integral is the limit of a sum of areas of rectangles, called a
Riemann sum.
A motivating example is the distance traveled in a given time.

$\text{distance} = \text{speed} \times \text{time}$
If the speed is constant, only multiplication is needed, but if the speed changes, then we need a more powerful
method of finding the distance. One such method is to approximate the distance traveled by breaking up the time into
many short intervals of time, then multiplying the time elapsed in each interval by one of the speeds in that interval,
and then taking the sum (a Riemann sum) of the approximate distance traveled in each interval. The basic idea is that
if only a short time elapses, then the speed will stay more or less the same. However, a Riemann sum only gives an
approximation of the distance traveled. We must take the limit of all such Riemann sums to find the exact distance
traveled.
If f(x) in the figure below represents speed
as it varies over time, the distance traveled
(between the times represented by a and b) is the
area of the shaded region s.
To approximate that area, an intuitive method
would be to divide up the distance between a
and b into a number of equal segments, the
length of each segment represented by the
symbol Δx. For each small segment, we can
choose one value of the function f(x). Call that
value h. Then the area of the rectangle with base
Δx and height h gives the distance (time Δx
multiplied by speed h) traveled in that segment.
Associated with each segment is the average
value of the function above it, f(x)=h. The sum
of all such rectangles gives an approximation of the area between the axis and the curve, which is an approximation
of the total distance traveled.

[Figure: Integration can be thought of as measuring the area under a curve, defined by f(x), between two points
(here a and b).]
A smaller value for Δx will give more rectangles and in most cases a better approximation, but for an exact answer
we need to take a limit as Δx approaches zero.
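
A Riemann sum is straightforward to compute directly. The following Python sketch (a left-endpoint version, my choice) approximates the area under f(x) = x² on [0, 1], whose exact value is 1/3:

# Left-endpoint Riemann sum with n rectangles of width dx; increasing
# n refines the approximation toward the exact area.
def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

f = lambda x: x ** 2
for n in [10, 100, 1000]:
    print(n, riemann_sum(f, 0.0, 1.0, n))  # tends to 1/3
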

The symbol of integration is $\int$, an elongated S (the S stands for "sum"). The definite integral is written as:

$\int_a^b f(x)\,dx$
and is read "the integral from a to b of f-of-x with respect to x." The Leibniz notation dx is intended to suggest
dividing the area under the curve into an infinite number of rectangles, so that their width Δx becomes the
infinitesimally small dx. In a formulation of the calculus based on limits, the notation

$\int_a^b \cdots\,dx$
is to be understood as an operator that takes a function as an input and gives a number, the area, as an output; dx is
not a number, and is not being multiplied by f(x).
The indefinite integral, or antiderivative, is written:

$\int f(x)\,dx$
Functions differing by only a constant have the same derivative, and therefore the antiderivative of a given function
is actually a family of functions differing only by a constant. Since the derivative of the function y = x² + C, where C
is any constant, is y′ = 2x, the antiderivative of the latter is given by:

$\int 2x\,dx = x^2 + C$
An undetermined constant like C in the antiderivative is known as a constant of integration.


Fundamental theorem
The fundamental theorem of calculus states that differentiation and integration are inverse operations. More
precisely, it relates the values of antiderivatives to definite integrals. Because it is usually easier to compute an
antiderivative than to apply the definition of a definite integral, the Fundamental Theorem of Calculus provides a
practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that
differentiation is the inverse of integration.
The Fundamental Theorem of Calculus states: If a function f is continuous on the interval [a, b] and if F is a function
whose derivative is f on the interval (a, b), then

$\int_a^b f(x)\,dx = F(b) - F(a).$
Furthermore, for every x in the interval (a, b),

$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$
This realization, made by both Newton and Leibniz, who based their results on earlier work by Isaac Barrow, was
key to the massive proliferation of analytic results after their work became known. The fundamental theorem
provides an algebraic method of computing many definite integrals—without performing limit processes—by
finding formulas for antiderivatives. It is also a prototype solution of a differential equation. Differential equations
relate an unknown function to its derivatives, and are ubiquitous in the sciences.
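
The theorem can be checked numerically in the same setting. With F(x) = x³/3 as an antiderivative of f(x) = x², the difference F(1) - F(0) should agree with a fine Riemann sum (a sketch under those assumptions):

# Fundamental theorem, numerically: F(b) - F(a) matches the limit of
# Riemann sums of f over [a, b], here for f(x) = x**2 on [0, 1].
F = lambda x: x ** 3 / 3
f = lambda x: x ** 2

a, b, n = 0.0, 1.0, 100_000
dx = (b - a) / n
riemann = sum(f(a + i * dx) * dx for i in range(n))
print(F(b) - F(a))  # 0.3333...
print(riemann)      # approximately the same
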

Applications
Calculus is used in every branch of the physical sciences, actuarial
science, computer science, statistics, engineering, economics, business,
medicine, demography, and in other fields wherever a problem can be
mathematically modeled and an optimal solution is desired. It allows
one to go from (non-constant) rates of change to the total change or
vice versa, and many times in studying a problem we know one and are
trying to find the other.

Physics makes particular use of calculus; all concepts in classical mechanics and electromagnetism are interrelated
through calculus. The mass of an object of known density, the moment of inertia of objects, as well as the total
energy of an object within a conservative field can be found by the use of calculus. An example of the use of
calculus in mechanics is Newton's second law of motion: historically stated, it expressly uses the term "rate of
change", which refers to the derivative, saying The rate of change of momentum of a body is equal to the resultant
force acting on the body and is in the same direction. Commonly expressed today as Force = Mass × acceleration, it
involves differential calculus because acceleration is the time derivative of velocity, or the second time derivative of
trajectory or spatial position. Starting from knowing how an object is accelerating, we use calculus to derive its path.

[Figure: The logarithmic spiral of the Nautilus shell is a classical image used to depict the growth and change
related to calculus.]

Maxwell's theory of electromagnetism and Einstein's theory of general relativity are also expressed in the language
of differential calculus. Chemistry also uses calculus in determining reaction rates and radioactive decay. In biology,
population dynamics starts with reproduction and death rates to model population changes.
Calculus can be used in conjunction with other mathematical disciplines. For example, it can be used with linear
algebra to find the "best fit" linear approximation for a set of points in a domain. Or it can be used in probability
theory to determine the probability of a continuous random variable from an assumed density function. In analytic
geometry, the study of graphs of functions, calculus is used to find high points and low points (maxima and minima),
slope, concavity and inflection points.
Green's Theorem, which gives the relationship between a line integral around a simple closed curve C and a double
integral over the plane region D bounded by C, is applied in an instrument known as a planimeter which is used to
calculate the area of a flat surface on a drawing. For example, it can be used to calculate the amount of area taken up
by an irregularly shaped flower bed or swimming pool when designing the layout of a piece of property.
In the realm of medicine, calculus can be used to find the optimal branching angle of a blood vessel so as to
maximize flow. From the decay laws for a particular drug's elimination from the body, it's used to derive dosing
laws. In nuclear medicine, it's used to build models of radiation transport in targeted tumor therapies.
In economics, calculus allows for the determination of maximal profit by providing a way to easily calculate both
marginal cost and marginal revenue.
Calculus is also used to find approximate solutions to equations; in practice it's the standard way to solve differential
equations and do root finding in most applications. Examples are methods such as Newton's method, fixed point
iteration, and linear approximation. For instance, spacecraft use a variation of the Euler method to approximate
curved courses within zero gravity environments.
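
To make the root-finding idea concrete, here is a minimal sketch of Newton's method in Python; the target function, its derivative, and the starting guess are illustrative choices, not part of the original text.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f using Newton's method.

    Starting from the guess x0, repeatedly follow the tangent line
    x_{n+1} = x_n - f(x_n) / f'(x_n) until the update is tiny.
    """
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x  # best estimate if tolerance was not reached

# Example: solve x^2 - 2 = 0, i.e. approximate sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # 1.4142135623730951
```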

See also

Lists
• List of differentiation identities
• List of calculus topics
• Publications in calculus
• Table of integrals

Related topics
• Calculus of finite differences
• Calculus with polynomials
• Complex analysis
• Differential equation
• Differential geometry
• Elementary calculus
• Fourier series
• Integral equation
• Mathematical analysis
• Mathematics
• Multivariable calculus
• Non-classical analysis
• Non-standard analysis
• Non-standard calculus
• Precalculus (mathematical education)
• Product Integrals
• Stochastic calculus
• Taylor series
• Time-scale calculus

References

Books
• Larson, Ron; Bruce H. Edwards (2010). Calculus, 9th ed., Brooks Cole Cengage Learning. ISBN 9780547167022.
• McQuarrie, Donald A. (2003). Mathematical Methods for Scientists and Engineers, University Science Books. ISBN 9781891389245.
• Stewart, James (2008). Calculus: Early Transcendentals, 6th ed., Brooks Cole Cengage Learning. ISBN 9780495011668.
• Thomas, George B.; Maurice D. Weir; Joel Hass; Frank R. Giordano (2008). Calculus, 11th ed., Addison-Wesley. ISBN 0-321-48987-X.

Other resources

Further reading
• Courant, Richard. Introduction to Calculus and Analysis 1. ISBN 978-3540650584.
• Landau, Edmund. Differential and Integral Calculus, American Mathematical Society. ISBN 0-8218-2830-4.
• Adams, Robert A. (1999). Calculus: A Complete Course. ISBN 978-0-201-39607-2.
• Albers, Donald J.; Richard D. Anderson and Don O. Loftsgaarden, ed. (1986). Undergraduate Programs in the Mathematics and Computer Sciences: The 1985–1986 Survey, Mathematical Association of America No. 7.
• Bell, John Lane. A Primer of Infinitesimal Analysis, Cambridge University Press, 1998. ISBN 978-0-521-62401-5. Uses synthetic differential geometry and nilpotent infinitesimals.
• Cajori, Florian. "The History of Notations of the Calculus." Annals of Mathematics, 2nd Ser., Vol. 25, No. 1 (Sep. 1923), pp. 1–46.
• Lebedev, Leonid P. and Michael J. Cloud. Approximating Perfection: A Mathematician's Journey into the World of Mechanics, Ch. 1: "The Tools of Calculus", Princeton Univ. Press, 2004.
• Pickover, Cliff (2003). Calculus and Pizza: A Math Cookbook for the Hungry Mind. ISBN 978-0-471-26987-8.
• Spivak, Michael (September 1994). Calculus. Publish or Perish publishing. ISBN 978-0-914098-89-8.
• Apostol, Tom M. (1967). Calculus, Volume 1, One-Variable Calculus with an Introduction to Linear Algebra. Wiley. ISBN 9780471000051.
• Apostol, Tom M. (1969). Calculus, Volume 2, Multi-Variable Calculus and Linear Algebra with Applications. Wiley. ISBN 9780471000075.
• Thompson, Silvanus P. and Martin Gardner (1998). Calculus Made Easy. ISBN 978-0-312-18548-0.
• Mathematical Association of America (1988). Calculus for a New Century; A Pump, Not a Filter. The Association, Stony Brook, NY. ED 300 252.
• Thomas/Finney (1996). Calculus and Analytic Geometry, 9th ed., Addison Wesley. ISBN 978-0-201-53174-9.
• Weisstein, Eric W. "Second Fundamental Theorem of Calculus." [16] From MathWorld—A Wolfram Web Resource.

Online books
• Crowell, B. (2003). "Calculus". Light and Matter, Fullerton. Retrieved 6 May 2007 from http://www.lightandmatter.com/calc/calc.pdf [17]
• Garrett, P. (2006). "Notes on first year calculus". University of Minnesota. Retrieved 6 May 2007 from http://www.math.umn.edu/~garrett/calculus/first_year/notes.pdf [18]
• Faraz, H. (2006). "Understanding Calculus". Retrieved 6 May 2007 from http://www.understandingcalculus.com/ [19] (HTML only)
• Keisler, H. J. (2000). "Elementary Calculus: An Approach Using Infinitesimals". Retrieved 29 August 2010 from http://www.math.wisc.edu/~keisler/calc.html [23]
• Mauch, S. (2004). "Sean's Applied Math Book". California Institute of Technology. Retrieved 6 May 2007 from http://www.cacr.caltech.edu/~sean/applied_math.pdf [20]
• Sloughter, Dan (2000). "Difference Equations to Differential Equations: An introduction to calculus". Retrieved 17 March 2009 from http://synechism.org/drupal/de2de/ [21]
• Stroyan, K.D. (2004). "A brief introduction to infinitesimal calculus". University of Iowa. Retrieved 6 May 2007 from http://www.math.uiowa.edu/~stroyan/InfsmlCalculus/InfsmlCalc.htm [24] (HTML only)
• Strang, G. (1991). "Calculus". Massachusetts Institute of Technology. Retrieved 6 May 2007 from http://ocw.mit.edu/ans7870/resources/Strang/strangtext.htm [22]
• Smith, William V. (2001). "The Calculus". Retrieved 4 July 2008 [23] (HTML only).

External links
• Weisstein, Eric W., "Calculus [24]" from MathWorld.
• Topics on Calculus [25] at PlanetMath.
• Calculus Made Easy (1914) by Silvanus P. Thompson [26] Full text in PDF
• Calculus [27] on In Our Time at the BBC. (listen now [28])
• Calculus.org: The Calculus page [29] at University of California, Davis – contains resources and links to other
sites
• COW: Calculus on the Web [30] at Temple University – contains resources ranging from pre-calculus and
associated algebra
• Earliest Known Uses of Some of the Words of Mathematics: Calculus & Analysis [31]
• Online Integrator (WebMathematica) [32] from Wolfram Research
• The Role of Calculus in College Mathematics [33] from ERICDigests.org
• OpenCourseWare Calculus [34] from the Massachusetts Institute of Technology
• Infinitesimal Calculus [35] – an article on its historical development, in Encyclopaedia of Mathematics, Michiel Hazewinkel, ed.
• Elements of Calculus I [36] and Calculus II for Business [37], OpenCourseWare from the University of Notre
Dame with activities, exams and interactive applets.
• Calculus for Beginners and Artists [38] by Daniel Kleitman, MIT
• Calculus Problems and Solutions [39] by D. A. Kouba
• Solved problems in calculus [40]

References
[1] Latorre, Donald R.; Kenelly, John W.; Reed, Iris B.; Biggers, Sherry (2007), Calculus Concepts: An Applied Approach to the Mathematics of Change (http://books.google.com/books?id=bQhX-3k0LS8C), Cengage Learning, p. 2, ISBN 0-618-78981-2, Chapter 1, p. 2 (http://books.google.com/books?id=bQhX-3k0LS8C&pg=PA2)
[2] There is no exact evidence on how it was done; some, including Morris Kline (Mathematical Thought from Ancient to Modern Times, Vol. I), suggest trial and error.
[3] Helmer Aslaksen. Why Calculus? (http://www.math.nus.edu.sg/aslaksen/teaching/calculus.html) National University of Singapore.
[4] Archimedes, Method, in The Works of Archimedes ISBN 978-0-521-66160-7
[5] Dun, Liu; Fan, Dainian; Cohen, Robert Sonné (1966). A comparison of Archimedes' and Liu Hui's studies of circles (http://books.google.com/books?id=jaQH6_8Ju-MC). Chinese Studies in the History and Philosophy of Science and Technology. 130. Springer. p. 279. ISBN 0-792-33463-9. Chapter, p. 279 (http://books.google.com/books?id=jaQH6_8Ju-MC&pg=PA279)
[6] Zill, Dennis G.; Wright, Scott; Wright, Warren S. (2009). Calculus: Early Transcendentals (http://books.google.com/books?id=R3Hk4Uhb1Z0C) (3rd ed.). Jones & Bartlett Learning. p. xxvii. ISBN 0-763-75995-3. Extract of page 27 (http://books.google.com/books?id=R3Hk4Uhb1Z0C&pg=PR27)
[7] Victor J. Katz (1995). "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3), pp. 163–174.
[8] Ian G. Pearce. Bhaskaracharya II. (http://turnbull.mcs.st-and.ac.uk/~history/Projects/Pearce/Chapters/Ch8_5.html)
[9] J. L. Berggren (1990). "Innovation and Tradition in Sharaf al-Din al-Tusi's Muadalat", Journal of the American Oriental Society 110 (2), pp. 304–309.
[10] "Madhava" (http://www-gap.dcs.st-and.ac.uk/~history/Biographies/Madhava.html). Biography of Madhava. School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 2006-09-13.
[11] "An overview of Indian mathematics" (http://www-history.mcs.st-andrews.ac.uk/HistTopics/Indian_mathematics.html). Indian Maths. School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 2006-07-07.
[12] "Science and technology in free India" (http://www.kerala.gov.in/keralcallsep04/p22-24.pdf) (PDF). Government of Kerala — Kerala Call, September 2004. Prof. C. G. Ramachandran Nair. Retrieved 2006-07-09.
[13] Charles Whish (1834), "On the Hindu Quadrature of the circle and the infinite series of the proportion of the circumference to the diameter exhibited in the four Sastras, the Tantra Sahgraham, Yucti Bhasha, Carana Padhati and Sadratnamala", Transactions of the Royal Asiatic Society of Great Britain and Ireland (Royal Asiatic Society of Great Britain and Ireland) 3 (3): 509–523, doi:10.1017/S0950473700001221, JSTOR 25581775
[14] Leibniz, Gottfried Wilhelm. The Early Mathematical Manuscripts of Leibniz. Cosimo, Inc., 2008. Page 228. Copy online (http://books.google.com/books?hl=en&lr=&id=7d8_4WPc9SMC&oi=fnd&pg=PA3&dq=Gottfried+Wilhelm+Leibniz+accused+of+plagiarism+by+Newton&ots=09h9BdTlbE&sig=hu5tNKpBJxHcpj8U3kR_T2bZqrY#v=onepage&q=plagairism&f=false)
[15] UNESCO – World Data on Education (http://nt5.scbbs.com/cgi-bin/om_isapi.dll?clientID=137079235&infobase=iwde.nfo&softpage=PL_frame)
[16] http://mathworld.wolfram.com/SecondFundamentalTheoremofCalculus.html
[17] http://www.lightandmatter.com/calc/calc.pdf
[18] http://www.math.umn.edu/~garrett/calculus/first_year/notes.pdf
[19] http://www.understandingcalculus.com/
[20] http://www.cacr.caltech.edu/~sean/applied_math.pdf
[21] http://synechism.org/drupal/de2de/
[22] http://ocw.mit.edu/ans7870/resources/Strang/strangtext.htm
[23] http://www.math.byu.edu/~smithw/Calculus/
[24] http://mathworld.wolfram.com/Calculus.html
[25] http://planetmath.org/encyclopedia/TopicsOnCalculus.html
[26] http://djm.cc/library/Calculus_Made_Easy_Thompson.pdf
[27] http://www.bbc.co.uk/programmes/b00mrfwq
[28] http://www.bbc.co.uk/iplayer/console/b00mrfwq/In_Our_Time_Calculus
[29] http://www.calculus.org
[30] http://cow.math.temple.edu/
[31] http://www.economics.soton.ac.uk/staff/aldrich/Calculus%20and%20Analysis%20Earliest%20Uses.htm
[32] http://integrals.wolfram.com/
[33] http://www.ericdigests.org/pre-9217/calculus.htm
[34] http://ocw.mit.edu/OcwWeb/Mathematics/index.htm
[35] http://eom.springer.de/I/i050950.htm
[36] http://ocw.nd.edu/mathematics/elements-of-calculus-i
[37] http://ocw.nd.edu/mathematics/calculus-ii-for-business
[38] http://math.mit.edu/~djk/calculus_beginners/
[39] http://www.math.ucdavis.edu/~kouba/ProblemsList.html
[40] http://calculus.solved-problems.com/

Average
In mathematics, an average, or central tendency,[1] of a data set is a measure of the "middle" value of the data set.
There are many different descriptive statistics that can be chosen as a measurement of the central tendency of the
data items. These include the arithmetic mean, the median and the mode. Other statistical measures, such as the
standard deviation and the range, are called measures of spread and describe how spread out the data are.

An average is a single value that is meant to typify a list of values. If all the numbers in the list are the same, then
this number should be used. If the numbers are not the same, an easy way to get a representative value from a list is
to pick a number from it at random; usually, though, the average is calculated by combining the values from
the set in a specific way and computing a single number as being the average of the set.

The most common method is the arithmetic mean, but there are many other types of central tendency, such as the median
(which is used most often when the distribution of the values is skewed, with some small numbers of very high
values, as seen with house prices or incomes).[2]

Calculation

Arithmetic mean
If n numbers are given, each number denoted by a_i, where i = 1, ..., n, the arithmetic mean is the sum of the a_i's
divided by n, or

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} a_i.$$
The arithmetic mean, often simply called the mean, of two numbers, such as 2 and 8, is obtained by finding a value
A such that 2 + 8 = A + A. One may find that A = (2 + 8)/2 = 5. Switching the order of 2 and 8 to read 8 and 2 does
not change the resulting value obtained for A. The mean 5 is not less than the minimum 2 nor greater than the
maximum 8. If we increase the number of terms in the list for which we want an average, we get, for example, that
the arithmetic mean of 2, 8, and 11 is found by solving for the value of A in the equation 2 + 8 + 11 = A + A + A. One
finds that A = (2 + 8 + 11)/3 = 7.
Changing the order of the three members of the list does not change the result: A = (8 + 11 + 2)/3 = 7 and that 7 is
between 2 and 11. This summation method is easily generalized for lists with any number of elements. However, the
mean of a list of integers is not necessarily an integer. "The average family has 1.7 children" is a jarring way of
making a statement that is more appropriately expressed by "the average number of children in the collection of
families examined is 1.7".
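
The summation method translates directly into a short Python sketch, reusing the 2, 8 and 2, 8, 11 examples above:

```python
def arithmetic_mean(values):
    """Sum the values and divide by how many there are."""
    return sum(values) / len(values)

print(arithmetic_mean([2, 8]))      # 5.0
print(arithmetic_mean([2, 8, 11]))  # 7.0
```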

Geometric mean
The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. In
algebraic terms, the geometric mean of a_1, a_2, ..., a_n is defined as

$$\mathrm{GM} = \sqrt[n]{a_1 a_2 \cdots a_n}.$$
Geometric mean can be thought of as the antilog of the arithmetic mean of the logs of the numbers.
Example: the geometric mean of 2 and 8 is $\sqrt{2 \cdot 8} = \sqrt{16} = 4$.

Harmonic mean
Harmonic mean for a set of numbers a_1, a_2, ..., a_n is defined as the reciprocal of the arithmetic mean of the
reciprocals of the a_i's:

$$\mathrm{HM} = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}.$$

One example where it is useful is calculating the average speed. For example, if the speed for going from point A to
B was 60 km/h, and the speed for returning from B to A was 40 km/h, then the average speed is given by

$$\frac{2}{\frac{1}{60} + \frac{1}{40}} = 48 \text{ km/h}.$$
Inequality concerning AM, GM, and HM


A well known inequality concerning arithmetic, geometric, and harmonic means for any set of positive numbers is

$$\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM}.$$
It is easy to remember noting that the alphabetical order of the letters A, G, and H is preserved in the inequality. See
Inequality of arithmetic and geometric means.
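
A short Python check of the three means and the inequality, using the 60 km/h and 40 km/h speeds from the harmonic-mean example above:

```python
import math

def amean(xs):
    return sum(xs) / len(xs)

def gmean(xs):
    # n-th root of the product, computed via logs for numerical stability
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def hmean(xs):
    return len(xs) / sum(1 / x for x in xs)

speeds = [60, 40]
print(amean(speeds))  # 50.0
print(gmean(speeds))  # ~48.99
print(hmean(speeds))  # 48.0, the average speed from the text
assert hmean(speeds) <= gmean(speeds) <= amean(speeds)
```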

Mode and median


The most frequently occurring number in a list is called the mode. The mode of the list (1, 2, 2, 3, 3, 3, 4) is 3. The
mode is not necessarily well defined; the list (1, 2, 2, 3, 3, 5) has the two modes 2 and 3. The mode can be subsumed
under the general method of defining averages by understanding it as taking the list and setting each member of the
list equal to the most common value in the list if there is a most common value. This list is then equated to the
resulting list with all values replaced by the same value. Since they are already all the same, this does not require any
change. The mode is more meaningful and potentially useful if there are many numbers in the list, and the frequency
of the numbers progresses smoothly (e.g., if out of a group of 1000 people, 30 people weigh 61 kg, 32 weigh 62 kg,
29 weigh 63 kg, and all the other possible weights occur less frequently, then 62 kg is the mode).
The mode has the advantage that it can be used with non-numerical data (e.g., red cars are most frequent), while
other averages cannot.
The median is the middle number of the group when they are ranked in order. (If there are an even number of
numbers, the mean of the middle two is taken.)
Thus to find the median, order the list according to its elements' magnitude and then repeatedly remove the pair
consisting of the highest and lowest values until either one or two values are left. If exactly one value is left, it is the
median; if two values, the median is the arithmetic mean of these two. This method takes the list 1, 7, 3, 13 and
orders it to read 1, 3, 7, 13. Then the 1 and 13 are removed to obtain the list 3, 7. Since there are two elements in this
remaining list, the median is their arithmetic mean, (3 + 7)/2 = 5.
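
The pair-removal procedure just described translates directly into Python (a sketch; in practice one simply sorts once and indexes the middle):

```python
def median(values):
    """Find the median by repeatedly stripping the lowest and highest values."""
    ordered = sorted(values)
    while len(ordered) > 2:
        ordered = ordered[1:-1]  # drop the smallest and largest
    if len(ordered) == 1:
        return ordered[0]
    return (ordered[0] + ordered[1]) / 2  # mean of the two middle values

print(median([1, 7, 3, 13]))  # 5.0, as in the text
```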

Average Percentage Return


The average percentage return is a type of average used in finance. It is an example of a geometric mean. For
example, if we are considering a period of two years, and the investment return in the first year is −10% and the
return in the second year is +60%, then the average percentage return, R, can be obtained by solving the equation: (1
− 10%) × (1 + 60%) = (1 − 0.1) × (1 + 0.6) = (1 + R) × (1 + R). The value of R that makes this equation true is 0.2,
or 20%. Note that changing the order to find the average percentage returns of +60% and −10% gives the same result
as the average percentage returns of −10% and +60%.
This method can be generalized to examples in which the periods are not all of one-year duration. Average
percentage of a set of returns is a variation on the geometric average that provides the intensive property of a return
per year corresponding to a list of percentage returns. For example, consider a period of a half of a year for which the
return is −23% and a period of two and one half years for which the return is +13%. The average percentage return
for the combined period is the single year return, R, that is the solution of the following equation: (1 − 0.23)^0.5 × (1 +
0.13)^2.5 = (1 + R)^(0.5+2.5), giving an average percentage return R of 0.0600 or 6.00%.
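
The same computation in a short Python sketch (the helper name is illustrative); solving the equation reduces to taking a root of the total growth factor:

```python
def average_percentage_return(returns_and_periods):
    """Single-period return R solving prod (1 + r_i)^t_i == (1 + R)^sum(t_i)."""
    total_growth = 1.0
    total_time = 0.0
    for r, t in returns_and_periods:
        total_growth *= (1 + r) ** t
        total_time += t
    return total_growth ** (1 / total_time) - 1

print(average_percentage_return([(-0.10, 1), (0.60, 1)]))     # 0.2, i.e. 20%
print(average_percentage_return([(-0.23, 0.5), (0.13, 2.5)])) # ~0.0600, i.e. 6.00%
```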

Types
The table of mathematical symbols explains the symbols used below.

Name: Equation or description

Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median: The middle value that separates the higher half from the lower half of the data set

Geometric median: A rotation-invariant extension of the median for points in $\mathbb{R}^n$

Mode: The most frequent value in the data set

Geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$

Harmonic mean: $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

Quadratic mean (or RMS): $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$

Generalized mean: $\left(\frac{1}{n}\sum_{i=1}^{n} x_i^m\right)^{1/m}$

Weighted mean: $\frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

Truncated mean: The arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded

Interquartile mean: A special case of the truncated mean, using the interquartile range

Midrange: $\frac{\max(x) + \min(x)}{2}$

Winsorized mean: Similar to the truncated mean, but, rather than deleting the extreme values, they are set equal to the largest and smallest values that remain

Annualization: The single-period return that, compounded over the total length of the periods, reproduces the overall return (see "Average Percentage Return" above)

Solutions to variational problems


Several measures of central tendency can be characterized as solving a variational problem, in the sense of the
calculus of variations, namely minimizing variation from the center. That is, given a measure of statistical
dispersion, one asks for a measure of central tendency that minimizes variation: such that variation from the center is
minimal among all choices of center. In a quip, "dispersion precedes location". In the sense of L^p spaces, the
correspondence is:

L^p: dispersion, central tendency
L^1: average absolute deviation, median
L^2: standard deviation, mean
L^∞: maximum deviation, midrange

Thus standard deviation about the mean is lower than standard deviation about any other point, and the maximum
deviation about the midrange is lower than the maximum deviation about any other point. The uniqueness of this
characterization of the mean follows from convex optimization. Indeed, for a given (fixed) data set x, the function

$$f_2(c) = \| x - c \|_2 = \left( \sum_{i=1}^{n} (x_i - c)^2 \right)^{1/2}$$

represents the dispersion about a constant value c relative to the L^2 norm. Because the function f_2 is a strictly convex
coercive function, the minimizer exists and is unique.

Note that the median in this sense is not in general unique, and in fact any point between the two central points of a
discrete distribution minimizes average absolute deviation. The dispersion in the L^1 norm, given by

$$f_1(c) = \| x - c \|_1 = \sum_{i=1}^{n} | x_i - c |,$$

is not strictly convex, whereas strict convexity is needed to ensure uniqueness of the minimizer. In spite of this, the
minimizer is unique for the L^∞ norm.
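
As a rough numerical illustration of this correspondence, the following Python sketch scans candidate centers c over a grid and confirms that the L^2 dispersion is minimized near the mean and the L^1 dispersion near the median. The grid scan is an illustrative brute-force choice, not part of the original text:

```python
def argmin_over_grid(dispersion, data, steps=100000):
    """Scan candidate centers c and return the one with the smallest dispersion."""
    lo, hi = min(data), max(data)
    best_c, best_val = lo, dispersion(data, lo)
    for k in range(1, steps + 1):
        c = lo + (hi - lo) * k / steps
        val = dispersion(data, c)
        if val < best_val:
            best_c, best_val = c, val
    return best_c

def l2_dispersion(xs, c):
    return sum((x - c) ** 2 for x in xs) ** 0.5

def l1_dispersion(xs, c):
    return sum(abs(x - c) for x in xs)

data = [1.0, 2.0, 4.0, 8.0, 10.0]
print(argmin_over_grid(l2_dispersion, data))  # ~5.0, the mean
print(argmin_over_grid(l1_dispersion, data))  # ~4.0, the median
```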

Miscellaneous types
Other more sophisticated averages are: trimean, trimedian, and normalized mean.
One can create one's own average metric using the generalized f-mean:

$$y = f^{-1}\!\left( \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} \right),$$

where f is any invertible function. The harmonic mean is an example of this using f(x) = 1/x, and the geometric mean
is another, using f(x) = log x. Another example, expmean (exponential mean), is a mean using the function f(x) = e^x,
and it is inherently biased towards the higher values. However, this method for generating means is not general
enough to capture all averages. A more general method for defining an average, y, takes any function of a list
g(x_1, x_2, ..., x_n), which is symmetric under permutation of the members of the list, and equates it to the same function
with the value of the average replacing each member of the list: g(x_1, x_2, ..., x_n) = g(y, y, ..., y). This most general
definition still captures the important property of all averages that the average of a list of identical elements is that
element itself. The function g(x_1, x_2, ..., x_n) = x_1 + x_2 + ... + x_n provides the arithmetic mean. The function g(x_1, x_2, ...,
x_n) = x_1 · x_2 · ... · x_n provides the geometric mean. The function g(x_1, x_2, ..., x_n) = x_1^{-1} + x_2^{-1} + ... + x_n^{-1} provides the
harmonic mean. (See John Bibby (1974), "Axiomatisations of the average and a further generalisation of monotonic
sequences," Glasgow Mathematical Journal, vol. 15, pp. 63–65.)
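
A compact Python sketch of the generalized f-mean; as the text notes, f = log recovers the geometric mean and f(x) = 1/x the harmonic mean:

```python
import math

def f_mean(values, f, f_inv):
    """Generalized f-mean: f_inv of the arithmetic mean of f(x) over the list."""
    return f_inv(sum(f(x) for x in values) / len(values))

values = [2, 8]
print(f_mean(values, math.log, math.exp))                # 4.0, the geometric mean
print(f_mean(values, lambda x: 1 / x, lambda y: 1 / y))  # 3.2, the harmonic mean
print(f_mean(values, math.exp, math.log))                # expmean, biased toward 8
```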

In data streams
The concept of an average can be applied to a stream of data as well as a bounded set, the goal being to find a value
about which recent data is in some way clustered. The stream may be distributed in time, as in samples taken by
some data acquisition system from which we want to remove noise, or in space, as in pixels in an image from which
we want to extract some property. An easy-to-understand and widely used application of average to a stream is the
simple moving average in which we compute the arithmetic mean of the most recent N data items in the stream. To
advance one position in the stream, we add 1/N times the new data item and subtract 1/N times the data item N
places back in the stream.
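
A minimal Python sketch of that incremental update, assuming a fixed window of N items (a deque remembers the item N places back):

```python
from collections import deque

class SimpleMovingAverage:
    """Arithmetic mean of the most recent N items, updated incrementally."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.mean = 0.0

    def update(self, x):
        # Add 1/N of the new item; once the window is full, also
        # subtract 1/N of the item N places back in the stream.
        self.window.append(x)
        if len(self.window) > self.n:
            oldest = self.window.popleft()
            self.mean += (x - oldest) / self.n
        else:
            # Window still filling: recompute the mean over what we have.
            self.mean = sum(self.window) / len(self.window)
        return self.mean

sma = SimpleMovingAverage(3)
for value in [1, 2, 3, 4, 5]:
    print(sma.update(value))  # 1.0, 1.5, 2.0, 3.0, 4.0
```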

Averages of functions
The concept of average can be extended to functions.[3] In calculus, the average value of an integrable function f on
an interval [a, b] is defined by

$$\bar{f} = \frac{1}{b-a} \int_a^b f(x)\,dx.$$
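For example, the average value of f(x) = x² on [0, 2] is

$$\frac{1}{2-0}\int_0^2 x^2\,dx = \frac{1}{2}\cdot\frac{8}{3} = \frac{4}{3}.$$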

Etymology
An early meaning (c. 1500) of the word average is "damage sustained at sea". The root is found in Arabic as awar,
in Italian as avaria, in French as avarie and in Dutch as averij. Hence an average adjuster is a person who assesses
an insurable loss.
Marine damage is either particular average, which is borne only by the owner of the damaged property, or general
average, where the owner can claim a proportional contribution from all the parties to the marine venture. The type
of calculations used in adjusting general average gave rise to the use of "average" to mean "arithmetic mean".
However, according to the Oxford English Dictionary, the earliest usage in English (1489 or earlier) appears to be an
old legal term for a tenant's day labour obligation to a sheriff, probably anglicised from "avera" found in the English
Domesday Book (1085). This pre-existing term thus lay to hand when an equivalent for avarie was wanted.

References
• Hardy, G.H.; Littlewood, J.E.; Pólya, G. (1988), Inequalities (2nd ed.), Cambridge University Press,
ISBN 978-0521358804

External links
• Median as a weighted arithmetic mean of all Sample Observations [4]
• Calculations and comparison between arithmetic and geometric mean of two values [2]

References
[1] In statistics, the term central tendency is used in some fields of empirical research to refer to what statisticians sometimes call "location".
[2] An axiomatic approach to averages is provided by John Bibby (1974), "Axiomatisations of the average and a further generalization of monotonic sequences", Glasgow Mathematical Journal, vol. 15, pp. 63–65.
[3] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities (2nd ed.), Cambridge University Press, ISBN 978-0521358804, 1988.
[4] http://economicsbulletin.vanderbilt.edu/2004/volume3/EB-04C10011A.pdf
Andy Marchbanks, Anonymous Dissident, Anonymous editor, Anwar saadat, Arbitrarily0, Aroundthewayboy, Artichoker, Artorius, Asaba, Ashawley, Ashiabor, AugPi, AxelBoldt, Bart133,
Bdesham, Beefyt, Beetstra, Behco, Beland, BenFrantzDale, BiT, Billgordon1099, Blehfu, Bo Jacoby, Bobo192, Bodnotbod, Brianga, Brutha, BryanG, Bsodmike, Btyner, Buchanan-Hermit,
Bulgaroctonus, Butcheries, CJLL Wright, CRGreathouse, CSWarren, CWii, CYD, Calculator1000, CambridgeBayWeather, Captain-n00dle, Cathardic, Ceyockey, Charles Matthews, Chatfecter,
Chillwithabong, Chris the speller, ChrisFontenot13, Chrism, Christopher Parham, Chrysrobyn, Ck lostsword, Clemwang, Cmichael, Coffee2theorems, Conversion script, Coppertwig, Corrigann,
Crazy Boris with a red beard, Crisófilax, Cutler, DRHagen, DRTllbrg, DVD R W, DanielCD, Danielb613, Danski14, Dave6, DavidMcKenzie, Davidkazuhiro, Dcoetzee, Ddiazhn, Ddofborg, Ddr,
Decayintodust, DeeDeeKerby, Dekuntz, Delldot, Den fjättrade ankan, DerHexer, Dhanya139, Dick Beldin, DieterVanUytvanck, Diomidis Spinellis, Dirkbb, Discospinster, Doctorambient,
DomCleal, Dominus, Drappel, Dycedarg, Dylan Lake, Earth, Eb Oesch, Economist 2007, Egriffin, Elanb, Elaragirl, Elliskev, Emerah, Enigmaman, Epbr123, Eric Olson, Esrever, Eve
Teschlemacher, Everyking, Falcon8765, Fatla00, Felixrising, Flamurai, Forlornturtle, Forty two, Frehley, Frencheigh, Fsiler, Furrykef, G.engelstein, G716, Gabbe, Gail, Gary King, Gatoclass,
Gauge, Gauravm1312, Geneffects, George Drummond, Gerriet42, Giftlite, Gilliam, Gingemonkey, Giraffedata, Gjshisha, GlassCobra, Glen, Gogowitsch, Gouveia2, Graham87, Greentryst, Greg
L, Gurch, Gvanrossum, Gyro Copter, Gzkn, Gökhan, H3llkn0wz, HaakonHjortland, Hadleywickham, Haham hanuka, Haizum, HalfShadow, Harmil, HatlessAtlas, Hawaiian717, Hede2000,
Heezy, Helix84, Henrygb, Hgberman, Hgrenbor, Hu12, Hut 8.5, Iccaldwell, Imagine Reason, Inomyabcs, Intangir, Iridescent, Isaac Dupree, Isis, Isomorphic, IustinPop, JForget, JNW, JRBrown,
Jacob grace, JadeInOz, Jake04961, Jamamala, Jamned, Janderk, Jcw69, Jean15paul, Jeremy68, Jeremykemp, Jfitzg, Jim.belk, Jmoorhouse, Jni, Joerite, John Newbury, John11235813, JohnCD,
Jratt, Jts10101, Justanyone, Justinep, KJS77, Kainaw, Kbolino, Kelvie, Khunglongcon, Kingpin13, Kingturtle, Kiril Simeonovski, Kjtobo, Knkw, Knutux, Krinkle, Kungfuadam, Kuratowski's
Ghost, Kuru, Kvng, Kyle824, LGW3, Lambiam, Larry_Sanger, Ldm, Learning4ever, LeaveSleaves, Legare, Lethe, LewisWasGenius, Lgauthie, Lilac Soul, LizardJr8, Loodog, Lucasgw8, Luna
Santin, LysolPionex, M2Ys4U, M360 Real, MBisanz, MCepek, MER-C, MONGO, Madir, Madoka, MagneticFlux, Magnus Bakken, Malo, Mapley, Marcos, MarkSweep, Markhebner,
Markkawika, Markpravda, Marokwitz, Matthew Yeager, Mbloore, Mbweissman, McKay, Mcorson, Melchoir, Melcombe, Mercuryeagle, Mets501, Mhinckley, Miaow Miaow, Michael Hardy,
Miguel.mateo, Mike Rosoft, Mimithebrain, MisterSheik, Mjg3456789, Mjrice, Mjroyster, Moeron, Mollerup, Mooli, MrOllie, Ms2ger, Msm, Mud4t, Munckin, Murb:, Mwtoews, NHRHS2010,
Nakon, Nathandean, NawlinWiki, Nbarth, Nectarflowed, NeilRickards, Neudachnik, Neutrality, Ngoddard, Niallharkin, Nigholith, Noe, Nonagonal Spider, Normy rox, NorwegianBlue, Novalis,
O18, Octahedron80, Ocvailes, Oleg Alexandrov, Omegatron, Omicronpersei8, Oxymoron83, P toolan, P.Silveira, PMHauge, Pak21, Pakaran, Paul August, Peter ryan 1976, Pharaoh of the
Wizards, PhilKnight, Philip Trueman, Pickledweller, Pinethicket, Piotrus, Poochy, Ppardal, Psb777, Pstanton, Psychlohexane, Publius3, Qwfp, RDBury, RJaguar3, RadioKirk, Ranjithsutari,
RayAYang, Razorflame, Rednblu, Reedy, Rettetast, Revipm, Rich Farmbrough, Richard001, Rickyfrizzlebum, Rompe, Rose Garden, Rseay267, Ryk, Salix alba, Sam Korn, Sameer r,
SamuelTheGhost, Sander123, Savidan, Savie Kumara, SebastianHelm, Seresin, Shanes, Shaun ward, Sietse Snel, Skbkekas, SkerHawx, Skittle, Slakr, Slowking Man, Someguy1221,
Sonicblade128, SpeedyGonsales, Speedyboy, Spinality, Spliffy, Sstoneb, Stemonitis, Stepheng3, Stevvers, StewartMH, Storkk, Stpasha, Suffusion of Yellow, Surachit, Susurrus, Swcurran,
THEN WHO WAS PHONE?, Takanoha, Taxman, Tayste, TedE, Tempodivalse, Thadius856, The Thing That Should Not Be, The sock that should not be, Thingg, ThomasNichols,
ThomasStrohmann, Thr4wn, Tide rolls, Titoxd, Tlroche, Tom harrison, Tomi, Tompa, Tosayit, Tpbradbury, TradingBands, Triwbe, Urdutext, Useight, Vaughan Pratt, Verbum Veritas, Versus22,
Vice regent, VictorAnyakin, VladimirReshetnikov, Voyagerfan5761, Waggers, Warniats, Wavelength, Wikipe-tan, Wikipelli, Wildingd, William Avery, Winchelsea, Wmahan, Wolfkeeper,
Wonglkd, Woood, Wykypydya, X-Fi6, Yachtsman1, Yamaguchi先生, Yochai Twitto, Zafiroblue05, Zenkat, Zhieaanm, Zigger, Zvika, ‫کشرز‬, 1353 anonymous edits

Random variable  Source: http://en.wikipedia.org/w/index.php?oldid=384035501  Contributors: Aearluin, AlanUS, Albert Rosado, Albmont, Alfredo J. Herrera Lago, Algebraist, AllanBz,
Andrew Maiman, Andyjsmith, Anonymous Dissident, ArnoldReinhold, AxelBoldt, Belizefan, BenFrantzDale, Bewrocrat, Bjcairns, Brian Tvedt, BrokenSegue, Bryan Derksen, CaseInPoint,
Ccerer, Constructive editor, Conversion script, Courcelles, Creidieki, Damodarnb, Danielx, Davewho2, Dbtfz, Demonocracy, Dick Beldin, Discospinster, DrEricH, Dysepsion, Dysprosia,
Error792, FF2010, Fangz, Flammifer, Flavio Guitian, Frencheigh, Fresheneesz, G716, Giftlite, Glane23, Googl, Graham87, Gulbrand, Haham hanuka, Hede2000, Helgus, Hiken86, Illykai,
JamesBWatson, Jason Goldstick, Jheald, Jitse Niesen, Jk350, Jmath666, Justin W Smith, Karada, Kbodouhi, Keegan, Keithalewis, Kiril Simeonovski, Kjs50, Kku, Kurtan, LOL, Lambiam,
Loodog, Lova Falk, LoveMonkey, Marc van Leeuwen, MarkS, Markaci, Marner, Mathemajor, Maxim Razin, Maximus Rex, Mcorazao, Melcombe, Memming, Metacomet, Michael Hardy,
Miguel, MisterSheik, Mitchoner, Mmernex, Moberg, Mothmolevna, Msh210, Nageh, Ncmathsadist, Numbo3, O18, Oksala, Oleg Alexandrov, Oruaann, Orz, Owenozier, Oxymoron83, P64,
Patrick, Paul August, Paul Pogonyshev, Pax:Vobiscum, Peleg, Phill, Pintu 052, Pooryorick, Pstudier, Qwfp, Rdsmith4, Relativefrequency, Rich Farmbrough, Seberle, Shawnc, Shoefly, Sl, Solian
en, Stevertigo, Stpasha, Svick, TedDunning, Thehotelambush, Thelittlestspoon, Tomek81, Topology Expert, Tsirel, VMS Mosaic, Waldir, Yintan, Yworo, Zimbie, Zundark, Zzxterry, 176
anonymous edits

Probability distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385275015  Contributors: (:Julien:), 198.144.199.xxx, 3mta3, A.M.R., A5, Abhinav316, AbsolutDan, Adrokin,
Alansohn, Alexius08, Ap, Applepiein, Avenue, AxelBoldt, BD2412, Baccyak4H, Bfigura's puppy, Bhoola Pakistani, Bkkbrad, Bryan Derksen, Btyner, Calvin 1998, Caramdir, Cburnett, Chirlu,
Chris the speller, Classical geographer, Closedmouth, Conversion script, Courcelles, Damian Yerrick, Davhorn, David Eppstein, David Vose, DavidCBryant, Dcljr, Delldot, Den fjättrade ankan,
Dick Beldin, Digisus, Dino, Domminico, Dysprosia, Eliezg, Emijrp, Epbr123, Eric Kvaalen, Fintor, Firelog, Fnielsen, G716, Gaius Cornelius, Gala.martin, Gandalf61, Gate2quality, Giftlite,
Gjnyasa, GoodDamon, Graham87, Hu12, ImperfectlyInformed, It Is Me Here, Iwaterpolo, J.delanoy, JRSpriggs, Jan eissfeldt, JayJasper, Jclemens, Jipumarino, Jitse Niesen, Jon Awbrey,
Josuechan, Jsd115, Jsnx, Jtkiefer, Knutux, Larryisgood, LiDaobing, Lilac Soul, Lollerskates, Lotje, Loupeter, MGriebe, MarkSweep, Markhebner, Marner, Megaloxantha, Melcombe, Mental
Blank, Michael Hardy, Miguel, MisterSheik, Morton.lin, MrOllie, Napzilla, Nbarth, Noodle snacks, NuclearWarfare, O18, OdedSchramm, Ojigiri, OverInsured, Oxymoron83, PAR, Pabristow,
Patrick, Paul August, Pax:Vobiscum, Pgan002, Phys, Ponnu, Poor Yorick, Populus, Ptrf, Quietbritishjim, Qwfp, Riceplaytexas, Rich Farmbrough, Richard D. LeCour, Rinconsoleao,
Roger.simmons, Rursus, Salgueiro, Salix alba, Samois98, Sandym, Schmock, Seglea, Serguei S. Dukachev, ShaunES, Shizhao, Silly rabbit, SiobhanHansa, Sky Attacker, Statlearn, Stpasha,
TNARasslin, TakuyaMurata, Tarotcards, Tayste, Techman224, Thamelry, The Anome, The Thing That Should Not Be, TheCoffee, Tomi, Topology Expert, Tordek ar, Tsirel, Ttony21,
Unyoyega, Uvainio, VictorAnyakin, Whosasking, Whosyourjudas, X-Bert, Zundark, 218 anonymous edits

Real number  Source: http://en.wikipedia.org/w/index.php?oldid=387056239  Contributors: 345Kai, AbcXyz, Acct001, Acetic Acid, Addshore, AgentPeppermint, Ahoerstemeier, Aitias, Aizenr,
Akanemoto, Alai, AlexBedard, Aliotra, Amalas, Andre Engels, Andres, Andrewrost3241981, Angielaj, AnnaFrance, Anonymous56789, Antonio Lopez, Arichnad, Arthur Rubin, AstroNomer,
Avaya1, AxelBoldt, AzaToth, Bagatelle, Balster neb, BenB4, Bertik, Bobo192, Boemanneke, Borgx, Brian0918, Brion VIBBER, Bryan Derksen, CONFIQ, CRGreathouse, Carwil,
Catherineyronwode, Charles Gaudette, Charles Matthews, Chinju, Chris Roy, Christian List, Conversion script, CorvetteZ51, Curps, Cyan, CynicalMe, DVdm, DYLAN LENNON, Damian
Yerrick, Debresser, Demmy100, Den fjättrade ankan, DerHexer, Digger3000, Dmharvey, Dmmaus, DonAByrd, Doradus, Doshell, Dysprosia, Długosz, Eddideigel, Egarres, Eiyuu Kou, Ejrh, El
C, Elizabeyth, Elroch, Equendil, Eric119, Euandrew, FilipeS, FocalPoint, Franklin.vp, Fredrik, Freezercake4d4, Fresheneesz, Fropuff, Frungi, Future Perfect at Sunrise, Gaius Cornelius,
Galoubet, Gemini1980, Gene Ward Smith, Gesslein, Giftlite, Goodnightmush, Grafen, Graham87, Grover cleveland, Hans Adler, Heartyact, Helenginn, Herbee, Hmains, Ian Maxwell, Ideyal,
Immunize, Isnow, Iulianu, IvanDurak, J.delanoy, J04n, Jaberwocky6669, JackSchmidt, Jagged 85, JamesMazur22, Jcrocker, Jerzy, Jiddisch, Jitse Niesen, Joeblakesley, Josh Cherry, Josh Parris,
Jrtayloriv, Jshadias, Jumbuck, Jusdafax, Karch, Klemen Kocjancic, Koeplinger, Leks81, Lethe, Linas, LizardJr8, Lockeownzj00, LongAgedUser, Loodog, MC10, MPerel, Macrakis, Mani1,
Marek69, Markcollinsx, Marquez, Masgatotkaca, Mejor Los Indios, Michael Hardy, Michael Keenan, Miguel, MikeHobday, Miles, Modernist, Motomuku, Mr Death, Ms2ger, Msh210,
Myahmyah, Mygerardromance, N Shar, Nabla, NawlinWiki, Nbarth, Newone, Niking87, Nil Einne, Nk, No Guru, Nono64, Northumbrian, Notinasnaid, Nowhither, Oleg Alexandrov, Omtay38,
Oxymoron83, Panoramix, Patrick, Paul August, Paxsimius, Pcap, Pdcook, Peql, Peterhi, PhotoBox, Piano non troppo, Pierre de Lyon, Pinethicket, Pizza Puzzle, Pizza1512, Platonicglove,
Pmanderson, Pomte, Poochy, Populus, Puddleglum Marshwiggle, Quaeler, Qwfp, R.e.b., R3m0t, Raja Hussain, Randomblue, RaseaC, Rasmus Faber, Renfield, Rich Farmbrough, Rmrfstar,
Romanm, Rph3742, Salix alba, Sam Hocevar, Sapphic, Scepia, Sesu Prime, Sfmammamia, Siddhant, Sjakkalle, Skizzik, Slowking Man, Smithpith, Sorrywikipedia1117, SpeedyGonsales, Spur,
Stephanwehner, Stevenj, Stevertigo, Stewartadcock, SuperMidget, Symane, T00h00, TGothier, Taejo, TakuyaMurata, Tarquin, Tarret, TenPoint, Tero, The Thing That Should Not Be, Tide rolls,
Tkuvho, Tobby72, Tobias Bergemann, Toby, Toby Bartels, Tosha, Tparameter, Trovatore, Tubalubalu, Tubby23, Tweenk, VKokielov, Varlaam, WAREL, Wimt, Wolfrock, WpZurp, Wshun,
X42bn6, XJamRastafire, Xantharius, Yarnalgo, Zero sharp, Zundark, Ævar Arnfjörð Bjarmason, 386 anonymous edits

Variance  Source: http://en.wikipedia.org/w/index.php?oldid=386749529  Contributors: 16@r, 212.153.190.xxx, 28bytes, ABCD, Aastrup, Abramjackson, AbsolutDan, Accretivehealth,
Adamjslund, Adonijahowns, Adpete, Afa86, Ahoerstemeier, Alai, Albmont, Alex756, AmiDaniel, Amir Aliev, Anameofmyveryown, Andre.holzner, Angela, Animum, AntiVMan,
Anuphysicsguy, As530, Auntof6, Awickert, Baccyak4H, Bart133, BenFrantzDale, Blotwell, Bmju, Bobo The Ninja, Borgx, Brandon Moore, Brian Sayrs, Bryan Derksen, Brzak, Btyner, CanDo,
Casey Abell, Cazort, Centrx, Cfp, Cgsguy2, Compassghost, Conversion script, Coppertwig, Cremepuff222, Cruise, Cryptomatt, Cumulonix, Cybercobra, DRE, DavidCBryant, Davwillev, Dcljr,
Dearleighton, Den fjättrade ankan, Diophantus, Disavian, DoctorW, Docu, Double Blind, Duncharris, Dylan Lake, Ehrenkater, Elgreengeeto, Emrahertr, EnJx, Eric-Wester, Eric.nickel, Eykanal,
Fibonacci, Foam bubble, G716, Gap, Garamatt, Giftlite, Gjshisha, Glimz, Graft, Guanaco, Gurch, Gzkn, Hao2lian, Happy-melon, Hede2000, Het, Hgberman, Ht686rg90, Hu12, Hulk1986, I am
not a dog, Inezz40, Inter16, Isaac Dupree, J.delanoy, JackSchmidt, Jackzhp, Jessemv, Jfessler, Jheiv, Jmath666, Joepwijers, Johnny Au, Josh Cherry, Jt68, Juha, JulesEllis, Junkinbomb, Justin W
Smith, Jutta, Katzmik, Keenan Pepper, Keilana, Kiril Simeonovski, Kstarsinic, Kurykh, Kymacpherson, LOL, Lambiam, Larry_Sanger, LeaW, Lilac Soul, Madprog, Mandarax, Marek69,
MarkSweep, Matthew.daniels, Maxí, Mbloore, McKay, Mebden, Mejor Los Indios, Melcombe, Mgreenbe, Michael Hardy, Michel M Verstraete, Mjg3456789, MrOllie, Msanford, Mwilde,
Mwtoews, Natalie Erin, Nbarth, Nevillerichards, Nicogla, Nijdam, Notedgrant, O18, Oleg Alexandrov, Orphan Wiki, Ottawa4ever, Paresnah, Patrick, Paul Pogonyshev, PerfectStorm, Pgan002,
Phantomsteve, Phoenix00017, Pichote, PimRijkee, Piotrus, Pmanderson, Pokipsy76, Psychlohexane, Qwfp, Ranger2006, Rbj, Rich Farmbrough, RobertCoop, RobinK, Robinh, Romanski, SD5,
Salix alba, Sanchom, SchfiftyThree, SereneStorm, Shoeofdeath, Shreevatsa, SimonP, Sinverso, Sirnumberguy, Skbkekas, Sligocki, Spinality, Spoon!, Stpasha, TedPavlic, The Thing That Should
Not Be, Thermochap, Thesilverbail, Thomag, Tide rolls, Tilo1111, Tim Starling, Tomi, TomyDuby, Unamofa, Unyoyega, Vaughan Pratt, Voidxor, Waldir, Wikomidia, William Graham,
Wmahan, WordsOnLitmusPaper, Wykypydya, Yamamoto Ichiro, Zippanova, Zirconscot, Zundark, Zven, Борис Пряха, 408 anonymous edits

Probability density function  Source: http://en.wikipedia.org/w/index.php?oldid=386782326  Contributors: 3mta3, A. Pichler, Autopilot, AxelBoldt, BlaiseFEgan, CBM, Chowbok,
ChrisIsBelow, Ciphers, Classicalecon, Complexica, Compvis, Cyp, Damiano.varagnolo, Dino, Dirk gently, Disavian, ENRGO, Fangz, Fnielsen, Freshraisin, Gcm, Giftlite, Henrygb, Icek, Inike,
Jalal0, Jayen466, Jeff G., John G. Miles, Jost Riedel, Jovan, Khoikhoi, Kku, Kn4sbs, KudzuVine, LOL, Lankiveil, Letournp, LiDaobing, Loodog, Markhebner, MassimoAr, Mattopia, Meduz,
Melcombe, Michael Hardy, Miguel, MisterSheik, Mmernex, Ohanian, Oleg Alexandrov, Paul Pogonyshev, Pgan002, Piotrus, Qwfp, Rferreirapt, Rich Farmbrough, Rimoll, Rotem Dan,
Ruthiebabes, Ryguasu, Salgueiro, Sjoosse, Skippy le Grand Gourou, Srleffler, Stpasha, Tac-Tics, TakuyaMurata, Tayste, Tercer, The Anome, Theodds, ThorinMuglindir, Tiaanvangraan, Tiles,
Tpb, Tsirel, Velocidex, Winterfors, Wmahan, Xanthius, ‫لیقع فشاک‬, 144 anonymous edits

Cumulative distribution function  Source: http://en.wikipedia.org/w/index.php?oldid=382709149  Contributors: 129.186.205.xxx, Adoniscik, Aeusoes1, Aitias, Ap, AxelBoldt, Bertrus,
Betaeleven, Bunyk, Casey1138, Cburnett, Constructive editor, Conversion script, Cretog8, David Haslam, Davwillev, Dick Beldin, Duncanka12, DylanW, Earlh, Elitropia, Flavio Guitian,
Fresheneesz, Gerbrant, Giftlite, Graham87, GregorB, Hede2000, Hu12, HyDeckar, Inike, Internetkid2006, Jeffq, Jeppesn, Jitse Niesen, Jmsteele, KHamsun, Kwamikagami, LOL, Larry_Sanger,
Lese, LiDaobing, Ling.Nut, Llorenzi, Loodog, Marqueed, Max Duchess, Melcombe, Michael Hardy, Miguel, MisterSheik, Neelix, Nickj, O18, Obradovic Goran, Oleg Alexandrov, OttoA, Paul
Pogonyshev, Phuzion, Qwfp, R.J.Oosterbaan, RajeevA, Rumping, SMesser, Sannse, Screwpassenger, Shaww, Sluzzelin, Splash, Spoon!, Sullivan.t.j, TakuyaMurata, Tedunning, Tiagofassoni,
Toby, User A1, Vlcb, Wmahan, X-Bert, Zach1994, Zundark, ‫ينام‬, 100 anonymous edits

Expected value  Source: http://en.wikipedia.org/w/index.php?oldid=387174758  Contributors: 65.197.2.xxx, A. Pichler, Aaronchall, Adamdad, Albmont, Almwi, AxelBoldt, B7582, Banus,
Bdesham, BenFrantzDale, Bjcairns, Brews ohare, Brockert, Bth, Btyner, CKCortez, Caesura, Calbaer, Caramdir, Carbuncle, Cburnett, Centrx, Charles Matthews, Chris the speller, Cloudguitar,
Coffee2theorems, Conversion script, Cretog8, Dartelaar, Daryl Williams, DavidCBryant, Dpv, Draco flavus, Drpaule, El C, Elliotreed, Fibonacci, FilipeS, Fintor, Fresheneesz, Funandtrvl,
Gala.martin, Gary King, Giftlite, Glass Sword, GraemeL, Grafen, Grapetonix, Greghm, Grubber, Guanaco, H2g2bob, HenningThielemann, Hyperbola, INic, Iakov, Idunno271828, Ikelos,
Jabowery, Jancikotuc, Jcmo, Jitse Niesen, Jj137, Jordsan, Jrincayc, Jsondow, Jt68, KMcD, Karol Langner, Katzmik, Kazabubu, Kurykh, LALess, LOL, Lee Daniel Crocker, Leighliu, Levineps,
Lponeil, MHoerich, MarSch, Markhebner, Mccready, Melchoir, Melcombe, Mgreenbe, Michael Hardy, Mindbuilder, Minimac, MrOllie, Netheril96, NinjaCharlie, O18, Obradovic Goran, Oleg
Alexandrov, Openlander, Ossiemanners, PAR, Patrick, Percy Snoodle, Pgreenfinch, Phdb, PierreAbbat, Pol098, Poor Yorick, Populus, Puckly, Q4444q, Qwfp, R3m0t, Reetep, Reric, Rjwilmsi,
RobHar, Robinh, Romanempire, Ronald King, Rray, Ryguasu, Saebjorn, Salix alba, Schmock, SebastianHelm, Shredderyin, Shreevatsa, Skarl the Drummer, Steve Kroon, Steven J. Anderson,
Stpasha, Tarotcards, Tarquin, Taxman, TedPavlic, Tejastheory, The Bad Boy 3584, TheObtuseAngleOfDoom, Tide rolls, Tobi Kellner, Tomi, Troy112233, Tsirel, Unfree, Unyoyega, Varuag
doos, Viesta, Werner.van.belle, Wmahan, Yesitsapril, Zero0000, ZeroOne, Zojj, ZomBGolth, Zvika, 207 anonymous edits

Discrete probability distribution  Source: http://en.wikipedia.org/w/index.php?oldid=377893105  Contributors: (:Julien:), Alan smithee, Algebraist, AxelBoldt, Billinghurst, Bjcairns, Bob.v.R,
CRGreathouse, Classicalecon, Closedmouth, Conversion script, Dreadstar, Dues Ex Machina, G716, Gary King, Giftlite, Hammerite, Incnis Mrsi, Jamelan, Kurykh, Linas, Melcombe, Michael
Hardy, MisterSheik, Nabla, NawlinWiki, Novosyolov, Oleg Alexandrov, P64, Ptmc2112, Qwfp, Rich Farmbrough, RoseParks, Rumping, Salix alba, TakuyaMurata, The enemies of god, Trevor
MacInnis, Zalle, Zundark, 26 anonymous edits

Continuous probability distribution  Source: http://en.wikipedia.org/w/index.php?oldid=384004327  Contributors: 1ForTheMoney, Avenue, AxelBoldt, Conversion script, Coppertwig, Edison,
Filemon, It Is Me Here, Jamelan, LOL, Larry_Sanger, MarkSweep, Melcombe, Michael Hardy, Minesweeper.007, MisterSheik, Patrick, Paul August, Pjrobertson, Psolrzan, Qwfp, Radagast83,
Reedy, Rhetth, Ricardogpn, Ruinia, Rumping, Sevilledade, Stijn Vermeeren, Stpasha, Ulner, 29 anonymous edits

Probability mass function  Source: http://en.wikipedia.org/w/index.php?oldid=376291855  Contributors: Bjcairns, Booyabazooka, Brunton, Casper2k3, CesarB, Eraserhead1, Giftlite, Incnis
Mrsi, J.delanoy, Jiejunkong, Jitse Niesen, Jj137, LOL, LimoWreck, Melcombe, Memming, Michael Hardy, MisterSheik, Noyder, Oleg Alexandrov, Pi.C.Noizecehx, Qwfp, Rama, Salgueiro, Silly
rabbit, The Anome, Typofier, Zundark, 60 anonymous edits

Continuous function  Source: http://en.wikipedia.org/w/index.php?oldid=386068785  Contributors: 213.253.39.xxx, ABCD, AdamSmithee, Aetheling, Ams80, Andywall, Ap, Army1987,
Arthena, Ashted, AxelBoldt, Bdmy, BenKovitz, Bethnim, Bloodshedder, CRGreathouse, Charles Matthews, Cheeser1, Cic, Conversion script, Ctmt, D.M. from Ukraine, Dallashan, Darth Panda,
Dcoetzee, DomenicDenicola, Domitori, Dr.K., Dysprosia, EdC, Edemaine, Error792, Evilchicken1234, Fabartus, Felix Wiemann, Fgnievinski, Fiedorow, Fresheneesz, Giftlite, Glenn, Gombang,
Graham87, Grinevitski, Gthb, Harriv, Henry Delforn, Hqb, HyDeckar, Hyacinth, Iameukarya, Ian Pitchford, Igiffin, Igrant, Intangir, Isomorphic, Iulianu, Jacj, JahJah, Jim.belk, Jimp, Jitse Niesen,
Joseaperez, Jrtayloriv, Jshadias, K-UNIT, Katzmik, Klutzy, Kompik, LachlanA, Lambiam, Larryisgood, Lee Larson, Leoremy, Linas, Lupin, MC10, MSGJ, Markus Krötzsch, MathMartin, Mdd,
Michael Hardy, Mikez, Monkey 32606, Mormegil, Mplourde, Msh210, Musicpvm, NawlinWiki, Nbarth, Oleg Alexandrov, PV=nRT, Paul August, Pdn, Penumbra2000, Pillcrow, Pizza Puzzle,
QYV, Qz, RDBury, Ramzzhakim, Rbb l181, Rhetth, Rick Norwood, Rinconsoleao, Roman3, Sabbut, Salgueiro, Sapphic, Sbacle, Schneelocke, Seb35, Sligocki, Smmurphy, Splarka, Stan
Lioubomoudrov, Stca74, Stevenj, StradivariusTV, Sullivan.t.j, Svick, T00h00, TedPavlic, Template namespace initialisation script, Thehotelambush, Thenub314, Thierry Caro, Tiagofassoni,
Timhoooey, Tkuvho, Tlevine, Tobias Bergemann, Toby, Tosha, Tuxedo junction, Ulipaul, Ultramarine, Wolfrock, Wshun, Xantharius, Yacht, Youandme, Zoicon5, Zundark, ZyMOS, 127
anonymous edits

Measure (mathematics)  Source: http://en.wikipedia.org/w/index.php?oldid=383839920  Contributors: 16@r, 3mta3, ABCD, AiusEpsi, Akulo, Alansohn, Albmont, AleHitch, Andre Engels,
Arvinder.virk, Ashigabou, AxelBoldt, Baaaaaaar, Bdmy, Beaumont, BenFrantzDale, Benandorsqueaks, Bgpaulus, Boobahmad101, Boplin, Brian Tvedt, BrianS36, CRGreathouse, CSTAR,
Caesura, Cdamama, Charles Matthews, Charvest, Conversion script, Danielbojczuk, Daniele.tampieri, Dark Charles, Dave Ordinary, DealPete, Digby Tantrum, Dino, Discospinster, Dowjgyta,
Dpv, Dysprosia, EIFY, Edokter, Elwikipedista, Empty Buffer, Everyking, Fibonacci, Finell, Foxjwill, Gabbe, Gadykozma, Gar37bic, Gauge, Geevee, Geometry guy, Giftlite, Gilliam, Googl,
Harriv, Henning Makholm, Hesam7, Irvin83, Isnow, Iwnbap, Jay Gatsby, Jheald, Jorgen W, Joriki, Juliancolton, Jóna Þórunn, Keenanpepper, Kiefer.Wolfowitz, Lambiam, Le Docteur, Lethe,
Levineps, Linas, Loisel, Loren Rosen, Lupin, MABadger, MER-C, Manop, MarSch, Markjoseph125, Masterpiece2000, Mat cross, MathKnight, MathMartin, Matthew Auger, Mebden,
Melcombe, Michael Hardy, Miguel, Mike Segal, Mimihitam, Mousomer, MrRage, Msh210, Nbarth, Obradovic Goran, Oleg Alexandrov, OverlordQ, Patrick, Paul August, PaulTanenbaum,
Pdenapo, PhotoBox, Pmanderson, Point-set topologist, Prumpf, Ptrf, RMcGuigan, Rat144, RayAYang, Revolver, Rgdboer, Richard L. Peterson, Rktect, Salgueiro, Salix alba, SchfiftyThree,
Semistablesystem, Stca74, Sullivan.t.j, Sverdrup, Sławomir Biały, TakuyaMurata, Takwan, The Infidel, The Thing That Should Not Be, Thehotelambush, Thomasmeeks, Tobias Bergemann,
Toby, Toby Bartels, Tosha, Tsirel, Turms, Uranographer, Vivacissamamente, Weialawaga, Xantharius, Zero sharp, Zundark, Zvika, 134 anonymous edits

Bias of an estimator  Source: http://en.wikipedia.org/w/index.php?oldid=377588775  Contributors: Aaron Kauppi, AgentRew, Ahauptfleisch, Aniboy2000, Barichd, Belchman, BenFrantzDale,
Bo Jacoby, C45207, Cancan101, Dcoetzee, Ecov, Farshidforouz, Gauravm1312, Giftlite, Hongooi, Jheald, Kiefer.Wolfowitz, Landroni, Marmelad, Melcombe, Michael Hardy, Mikael
Häggström, Mmernex, Nbarth, Netopir, O18, Pharaoh of the Wizards, Piil, Q0k, Qwfp, Reenus, Schomerus, Sergey shandar, Shadiakiki1986, Sohanz, Spoon!, Stpasha, ThatProf, Thr4wn,
Tybruce, Uraza, Wikomidia, Willsmith, Zvika, 63 anonymous edits

Probability  Source: http://en.wikipedia.org/w/index.php?oldid=387105618  Contributors: 21655, APH, Abby, Abby1019, AbsolutDan, Acerperi, Acroterion, Aitias, Aka042, Alansohn,
Alberg15, Alexjohnc3, Aliyah4499, Altenmann, Amalthea, Andeggs, AndrewHowse, Antandrus, Antonwalter, Ap, Arakunem, Arcfrk, Arenarax, Arjun01, ArnoLagrange, Avenue, BRUTE,
Badgernet, Beaumont, Bfinn, Bhound89, Bjcairns, Bobblewik, Bobo192, Braddodson, Brendo4, Brianjd, Brumski, Bryan Derksen, Btball, Buttonius, CBM, CO, CSTAR, Cactus.man, Caltas,
CanisRufus, Capitalist, Capitan Obvio, Capricorn42, Captmog, Carricko, Ceannaideachd, Cenarium, Centrx, Charles Matthews, CharlotteWebb, Chas zzz brown, Chetan.Panchal, Ciphers,
Classical geographer, Clausen, Clovis Sangrail, Connormah, Conversion script, Coppertwig, Craphouse, CrazyChemGuy, Cremepuff222, Cyclone49, D, DEMcAdams, DJ Clayworth, Dabomb87,
Danno12345, DarkFalls, DaveBrondsema, David Martland, David from Downunder, Dbtfz, Debator of mathematics, Dekisugi, Demicx, Demnevanni, Desteg, Dhammapal, Dirtytedd,
Discospinster, Disneycat, DopefishJustin, Doug Bell, Drestros power, Drivi86, Drmies, Dysprosia, ESkog, Ebsith, Edgar181, Ehheh, El Caro, Eliotwiki, Enchanter, Eog1916, Epbr123, Ettrig,
Evercat, Excirial, Fangz, Fantastic4boy, Fastilysock, Favonian, Fetchcomms, FishSpeaker, Flammifer, Footballfan190, FrF, FrankSanMiguel, Fred Bauder, Free Software Knight, FreplySpang,
G716, Gail, Garion96, Giftlite, Giggy, GoldenPi, Googie man, Graham87, Grstain, Guess Who, Gwernol, Hadal, Haduong, Hagedis, Happy-melon, Hasanbay, Hasihfiadhfoiahsio, Henrygb,
Heron, Hirak 99, Hoomank, Hu12, Hut 8.5, II MusLiM HyBRiD II, INic, Ideyal, Ignacio Icke, Infarom, Instinct, Ixfd64, J.delanoy, JJL, JTN, Ja 62, Jacek Kendysz, Jackollie, Jake Wartenberg,
JamesTeterenko, Jaysweet, Jeff G., Jeffw57, Jheald, Jimmaths, Jitse Niesen, Jj137, Jmlk17, Jni, John Vandenberg, Johnleemk, Johnuniq, Jonik, JosephCampisi, Jpbowen, Jung dalglish, Jwpurple,
KG6YKN, Kaisershatner, Kaksag, Kbodouhi, Kevmus, King Mir, Kingpin13, Klapper, Koyaanis Qatsi, Krantz2, Kurtan, Kushalneo, Kzollman, Lambiam, Larklight, Learnhead, Lee J Haywood,
Lenoxus, Levineps, LiDaobing, Liang9993, Lifung, Lipedia, Lit-sci, Localhost00, Looxix, LoveMonkey, Lugnuts, MER-C, Mabsjenbu123, Mac Davis, Mario777Zelda, MarkSweep,
Markjoseph125, Marquez, MathMartin, Matthew Auger, Mattisse, Maximaximax, McSly, Mebden, Melcombe, Menthaxpiperita, Metagraph, Mets501, Michael Hardy, Mikemoral, Mild Bill
Hiccup, Mindmatrix, Minesweeper, MisterSheik, Mlpkr, Mortein, MrOllie, Msh210, Myasuda, Mycroft80, NYKevin, NatusRoma, NawlinWiki, Ncmvocalist, NewEnglandYankee, Nigholith,
Nijdam, Noctibus, NoisyJinx, Nsaa, Ogai, Omicronpersei8, Onore Baka Sama, OwenX, Oxymoron83, Packersfannn101, Paine Ellsworth, PaperTruths, Patrick, Paul August, Paulcd2000,
Pax:Vobiscum, Pd THOR, Pdn, Peter.C, Peterjhlee, PhilKnight, Philip Trueman, Philippe, Philtr, Pinethicket, Pointless.FF59F5C9, Progicnet, Psyche825, Puchiko, Putgeminmouth, QmunkE,
Qwertyus, Qwfp, RVS, RabinZhao, Randomblue, RandorXeus, Ranger2006, RattleMan, Razorflame, Readro, Recentchanges, Reddi, Reedy, Regancy42, Requestion, RexNL, Richard001,
Richardajohns, Riotrocket8676, Rogimoto, Ronhjones, Ronz, Rtc, RuM, Sagittarian Milky Way, Salix alba, Santa Sangre, Scfencer, SchfiftyThree, Schwnj, Sengkang, Sevilledade,
ShawnAGaddy, Shoeofdeath, Sina2, SiobhanHansa, Sluzzelin, Snoyes, Solipsist, Someguy1221, SonOfNothing, Srinivasasha, Stephen Compall, Stevenmitchell, Stux, Suicidalhamster, Suisui,
SusanLesch, Swpb, Sycthos, Symane, Takeda, Tarheel95, Tautologist, Taxisfolder, Tayste, The Thing That Should Not Be, The Transhumanist, TheGreenCarrot, Thesoxlost, Thingg, Tide rolls,
TigerShark, Tintenfischlein, Treisijs, Trovatore, Twisted86, Uncle Dick, UnitedStatesian, Valodim, Vandal B, Vanished User 1004, Varnesavant, VasilievVV, Velho, Vericuester, Vicarious,
Virgilian, Vivacissamamente, Voyagerfan5761, Wafulz, Wapcaplet, Wetman, Wikistudent 1, Wile E. Heresiarch, William915, Wimt, Wmahan, Wordsmith, Wormdoggy, Wxlfsr, Wyatts, XKL,
Yamakiri, Ybbor, Yerpo, Youandme, YourEyesOnly, Zach1994, Zalle, Zundark, ‫ןמיירפ‬, 701 anonymous edits

Pierre-Simon Laplace  Source: http://en.wikipedia.org/w/index.php?oldid=385035312  Contributors: 16@r, 213.253.39.xxx, 3mta3, 5 albert square, Ac1201, Adam McMaster, Ahoerstemeier,
Ajb, Alfio, Amicon, Aminrahimian, Andre Engels, Andres, Angela, AnonMoos, Arcadia616, Asperal, Asyndeton, AtticusX, Attilios, AugPi, Avicennasis, Bachrach44, Bemoeial, Ben-Zin,
Bender235, BerndGehrmann, Bkonrad, Blueboy814, Bracodbk, Bsskchaitanya, Bubba73, C.Fred, Can't sleep, clown will eat me, Caroldermoid, Charles Matthews, Chicheley, Chris Hardy, Chris
the speller, ChrisfromHouston, Corystight1, Courcelles, Cozy, CrocodileMile, Curps, Cutler, Cyan, D6, DJ Clayworth, Dadude3320, Dchristle, Deb, Den fjättrade ankan, Dispersion,
Doctorsundar, Docu, Dv82matt, ERcheck, Eeekster, Electron9, Ellywa, Elsweyn, Elysnoss, Emerson7, Eric Kvaalen, Everyking, Francis Schonken, Gaara144, Gadfium, Gauss, Geni, GeoGreg,
Giftlite, Gliese876, Gmaxwell, Goochelaar, GraemeL, Graham87, GregorB, Haham hanuka, Hannoscholtz, HappyApple, Hektor, Hemmingsen, Hongooi, Hqb, Husond, Indiedude, J.delanoy,
J04n, JASpencer, Jaerik, Jamesmorrison, Jaredwf, Jaytan, Jmu2108, Johan1298, John, Johnbibby, Jojit fb, Joseph Solis in Australia, Jugbo, Julesd, Jumbuck, Jusdafax, Knutux, Kostisl, Kraxler,
LarryB55, Lexor, LilHelpa, Lova Falk, Lradrama, Lucidish, Lunarian, Lupo, Lzur, Mackensen, Maestlin, Maghnus, Manop, Marcus2, Markus Poessel, MartinHarper, Mashford, Metacomet,
Metasquares, Michael Hardy, Mike Rosoft, Mild Bill Hiccup, Mion, Mitteldorf, Mneideng, Monegasque, Mpatel, Mschlindwein, NBeale, NeueWelt, Neutrality, New World Man, Nicolaennio,
Nixdorf, Nk, Oleg Alexandrov, Olivier, Paine Ellsworth, Palnot, Paolo.dL, Paul August, PaulGarner, PdDemeter, Piniricc65, Pizza1512, Plucas58, Pmanderson, Pohick2, Pointqwert, Postdlf,
Pred, Promus Kaa, Psients, Ptranouez, Punstar, QueenAdelaide, Quess, QuiteUnusual, Qwfp, RJHall, RS1900, Randomblue, Rbj, Rdanneskjold, Renatops, Riisikuppi, Robma, Rory096, Rwv37,
SMStigler, Sadi Carnot, Sam Hocevar, Samuel, Santa Sangre, Schlier22, SchuminWeb, ScienceApologist, SevereTireDamage, SimonTrew, SlamDiego, Snoyes, StephenFerg, Stpasha, Studerby,
Stwalkerster, Sublium, TangoTheory, Tarotcards, TedE, Terry0051, The Thing That Should Not Be, Themerejoy, Tiddly Tom, Tomas e, Tomixdf, Tpbradbury, Tt 225, Uksam88, Unara,
Urhixidur, Utcursch, UtilityIsKing, Vojvodaen, Vsmith, WolfmanSF, XJamRastafire, XM, Zoicon5, 409 anonymous edits

Integral  Source: http://en.wikipedia.org/w/index.php?oldid=387177827  Contributors: 129.174.59.xxx, 4C, 6birc, A bit iffy, Ace Frahm, Acegikmo1, Admiral Norton, Adrruiz, Ais523,
Alansohn, Aleksander.adamowski, Alexius08, Alsandro, Aly89, AnOddName, Andrei Stroe, Andrew Moylan, Andrewcmcardle, Anonymous Dissident, Antillarum, Apokrif, Arcfrk,
ArnoldReinhold, Arunirde, AxelBoldt, Azuredu, Baccala@freesoft.org, BarretBonden, Bdesham, Bdmy, Beefman, Bemoeial, BenFrantzDale, Benjaminwill, Benzi455, Berria, Bethnim, Bnitin,
Bo Jacoby, Bobo192, Bomac, Borb, Boreas231, Bovineone, Brews ohare, Brufydsy, Bsmntbombdood, Butko, CSTAR, CalebNoble, Calltech, Caltas, Calvin 1998, Capefeather, Caramdir,
Cardamon, Cassini83, Cat2020, Catgut, Centrx, Cflm001, Chait1027, Charles Matthews, Chetvorno, Chinju, Ciko, Closedmouth, Conversion script, Crakkpot, Cronholm144, DHN, Da nuke,
DabMachine, Daryl Williams, Davewild, DavidCBryant, Dchristle, Den fjättrade ankan, Diberri, Discospinster, Diza, Djradon, Doctormatt, Dojarca, Drdonzi, Dugwiki, Dysprosia, E2eamon,
Eagleal, Edward, Edward Knave, Einsteins37, ElTchanggo, Electron9, Emaalt, Emet truth, Enochlau, Epbr123, Espressobongo, Evil saltine, Favonian, Ferengi, Filemon, FilipeS, Fintor,
FractalFusion, FrankTobia, Franklin.vp, Fredrik, Frokor, Fulvius, Fyyer, Gadykozma, Gandalf61, Garo, Gary King, Geometry guy, Gesslein, Giftlite, Glaurung, Glenn, Gmcastil, Gnixon,
Goethean, Greg Stevens, HairyFotr, Hajhouse, Hakeem.gadi, Hakkasberra, Hal0920, Herald747, Heron, Hotstreets, Hrafeiro, Icairns, Igny, Inter, Introareforcommonpublic, Iridescent, Iulianu,
Ivan Štambuk, JB Gnome, JForget, JKBlock, JRSpriggs, Jagged 85, JakeVortex, Jakob.scholbach, Jalesh, Jdlambert, Jfgrcar, Jim.belk, Jitse Niesen, Johnlemartirao, JonMcLoone, JonezyKiDx,
Jose77, Josh dos, Jugander, Jynus, KSmrq, Kapoor Amit, Karada, Karimjb, Karol Langner, Katzmik, Kawautar, Keegan, Kendelarosa5357, Kevin Baas, KieferSkunk, Kiensvay, King Bee,
Kingpin13, Kirbytime, Knakts, Kntrabssi, Kumioko, Kurykh, Kusunose, Kwantus, Kyle1278, LOL, Lambiam, Leland McInnes, Lethe, Levi.vaieua, LiDaobing, Light current, Lightdarkness,
Lindberg G Williams Jr, Lir, Loisel, Loom91, Luís Felipe Braga, MC10, MJBurrage, MONGO, Madmath789, Marek69, MarkSweep, Matqkks, Matsoftware, Maxvcore, Mcld, Mcorazao,
Melchoir, Mets501, MiNombreDeGuerra, Michael Hardy, MightyBig, Mike2vil, Mindmatrix, Minestrone Soup, Miquonranger03, Momusufan, Mormegil, MrOllie, Ms2ger, Mtz1010,
MuZemike, Mìthrandir, Nbarth, Nikai, Nitya Dharma, Nnedass, NonvocalScream, Obeattie, Oleg Alexandrov, OrgasGirl, Ourhomeplanet, Ozob, PMG, Paolo.dL, Patrick, Paul August, Paul
Matthews, Paxsimius, Pcb21, PhilipMW, Phillip J, PhySusie, Physicistjedi, PiMaster3, Pie4all88, Plasticup, Point-set topologist, Pooryorick, Programmar, Python eggs, Quuxplusone, RJFJR,
Raamin, Radomir, Rama's Arrow, Randomblue, Raven4x4x, Razorflame, RedWordSmith, Rich Farmbrough, Rjwilmsi, Rklawton, Robbyjo, Roboquant, Rracecarr, Rubybrian, SJP, Saforrest,
Salix alba, Satori42, Schneelocke, Scott MacLean, Sdornan, Seresin, Shadowjams, Shipmaster, Showgun45, Siddhant, Silly rabbit, Simon-in-sagamihara, Slakr, Sligocki, Smithpith, Snigbrook,
Someguy1221, Spellcast, Splashkid2, Ssd, StaticGull, Stca74, Stevertigo, Stlrams22, StradivariusTV, Stw, Sushant gupta, Susurrus, Sławomir Biały, TJRC, TStein, Tabletop, Tbsmith, Template
namespace initialisation script, The Anome, Thegeneralguy, Thenub314, Thiseye, Tjdw, Tobby72, Tobias Bergemann, Tomyumgoong, Topology Expert, TroyBurm, Twp, Ukexpat, Urdutext,
VKokielov, Van helsing, Vanished User 0001, Velvetron, Viames, Vladislav Pogorelov, Waabu, Waltpohl, Wik, Wile E. Heresiarch, Wilsonater23456, Wimt, WinterSpw, Witchinghour,
Wknight94, Wood Thrush, Wordsoup, Wtshymanski, Xantharius, Yacht, Ybbor, Yosha, Youandme, Yuyudevil, Zfr, Zoicon5, ZooFari, Zundark, 461 anonymous edits

Function (mathematics)  Source: http://en.wikipedia.org/w/index.php?oldid=387220321  Contributors: 21655, ABCD, Aatomic1, Aazn, AbsolutDan, Ac44ck, Adam majewski, Adi4094, Ae-a,
Agüeybaná, Aksi great, Al.locke, Aleph4, Alex43223, Alexb@cut-the-knot.com, Alexius08, Ali Obeid, Altenmann, Ams80, Andre Engels, Andreas Kaufmann, Andres, Andy.melnikov, Angela,
AnonGuy, Anonymous Dissident, Anthony Kull, Arammozuob, Arcfrk, Army1987, Artem Karimov, Arthur Rubin, Asdfqwe123, Autonova, AvicAWB, Avraham, AxelBoldt, Ayda D, Bidabadi,
Bigoperm, Bo Jacoby, Boute, Bradgib, Brainyiscool, Brianjd, BridgeBuilderKiwi, CBM, CBM2, CRGreathouse, Calayodhen, Can't sleep, clown will eat me, Carl.bunderson, Cenarium, Cetinsert,
Ceyockey, Charles Matthews, Chas zzz brown, Cheese Sandwich, Chridd, Christian List, Clark Kimberling, Classicalecon, Cmathio, Cnyrock, Crucis, Cs32en, Cybercobra, DARTH SIDIOUS 2,
Danakil, Darkreason, Daven200520, David Eppstein, David Gerard, David Shear, David spector, Dcoetzee, DefLog, Dfass, Digby Tantrum, Dino, Dmcq, Dominus, Domitori, Donarreiskoffer,
Dpr, Drmies, DuaneLAnderson, Dylan Lake, Dysprosia, ELDRAS, East of Borschov, El C, Elizabeyth, Enviroboy, Epbr123, Equendil, EugeneZelenko, Excirial, Falcon8765, Fastily, Favonian,
Foxjwill, Frankenpuppy, Fredrik, Fresheneesz, Furrykef, Fæ, GTBacchus, Gantlord, Gary King, Gauge, Gene s, Geometry guy, Gesslein, Giftlite, Glenn, Gogo Dodo, Gregbard, H.ehsaan,
Henrygb, Heryu, Hmains, Hu, Hxxvxxy, Hydrogen Iodide, I-20, I69U, IGraph, Iainspeed, Ilya Voyager, Imjustmatthew, Immunize, Indil, Indon, Iner22, Inimino, Iulianu, J heisenberg, J00tel,
Jacj, Jackol, Jagged 85, Jcobb, Jeff3000, Jerry teps, Jiddisch, Jimp, Jitse Niesen, Jojhutton, Jomomaindahouse, Jon Awbrey, Jorge Stolfi, Josh Parris, Jrtayloriv, Juliancolton, Jumbuck, Jusdafax,
Justin W Smith, Jvohn, KSmrq, Kablammo, Katalaveno, Ken Kuniyuki, Keta, Kevs, Kierstend97, Kilbad, Kku, Kusunose, L Kensington, Lambiam, LarryLACa, Laurens-af, LeaveSleaves,
Leibniz, Lightmouse, LinuxDude, LizardJr8, Lobas, Logicist, Lucky13pjn, Lunisneko, MFH, MagnaMopus, MarSch, Marc Venot, Marcos, Marcos (usurped), Marek69, MathMartin, Matijap,
MattGiuca, Merovingian, Mets501, Michael Hardy, Mindmatrix, Mintleaf, Misza13, Mjhsrocker, Mjhy0926, Mkawick, Mor, Mormegil, Mousomer, MrOllie, MrRadioGuy, Msh210, Mufka,
Natalie Erin, NawlinWiki, Newbyguesses, Nguyễn Hữu Dung, Nikai, Noisy, Ntmatter, Oleg Alexandrov, Onevalefan, Orphan Wiki, Ouzel Ring, OverlordQ, Oxymoron83, Ozob, Pak21, Palica,
Palnot, Paolo.dL, Patrick, Paul August, PaulTanenbaum, Pcap, Peterhi, Philip Trueman, Phils, PhotoBox, Piano non troppo, Pj.de.bruin, Pooryorick, Populus, Porcher, Possum, Pruneau, Quaeler,
R'n'B, RG2, Radon210, Ramu50, Randall Holmes, Rasmus Faber, Reach Out to the Truth, RedWolf, Reinderien, Renegadeshark, RexNL, Rich Farmbrough, Rick Norwood, Ronhjones, Rossami,
Rousearts, Rrburke, Ruud Koot, Ryan Reich, Salix alba, Sam Staton, Sampayu, Sbandrews, Schapel, SchfiftyThree, Senehas, Shades78, Shd, Silvaskull, Sirhanx2, SixWingedSeraph, Sligocki,
Someguy1221, Sonett72, SpeedyGonsales, Splash, Spoon!, Sprachpfleger, Stefano85, Stephen Shaw, Steven Russell, Stevertigo, Sverdrup, Symane, Sławomir Biały, THEN WHO WAS
PHONE?, TakuyaMurata, Tarif Ezaz, Taxman, Template namespace initialisation script, The editor1, TheDJ, TheNightFly, Thehotelambush, Thenub314, Thomasmeeks, Tide rolls, Tilla,
Tkuvho, Tobias Bergemann, Toby Bartels, Tooto, Tosha, Tparameter, Trixx, Turgidson, Twsx, Tyrol5, Urdutext, UserDoe, VKokielov, Vagary, Vary, Velho, Versus22, Vivacissamamente,
Vriullop, Wafulz, Wavelength, Waxex, Weixifan, Wild one, Winekeke, Wolfrock, Woodstone, Wshun, Wvbailey, Yacht, Yamakiri, Yamamoto Ichiro, Yoshigev, Zayzya, Zfr, Zimbardo Cookie
Experiment, Zundark, Zy26, Zzuuzz, 543 anonymous edits

Calculus  Source: http://en.wikipedia.org/w/index.php?oldid=386904227  Contributors: 01001, 07fan, 129.132.2.xxx, 14chuck7, 1exec1, 207.77.174.xxx, 24.44.206.xxx, 4.21.52.xxx,
4twenty42o, 64.252.67.xxx, 6birc, ABCD, APH, Aaronbrick, Abcdwxyzsh, Abmax, Abrech, AbsolutDan, Accident4, Ace Frahm, Acepectif, Acroterion, Adamantios, Ahoerstemeier, Ahy1,
Akrabbim, Aktsu, Alansohn, AlexiusHoratius, Ali, Allen Moore, Allen3, Allen4names, Alpha Beta Epsilon, Alpha Omicron, AltGrendel, AmeliaElizabeth, AnOddName, AndHab, Andonic,
Andorphin, Andre Engels, Andrewlp1991, Andrewpmk, AndyZ, Angela, Angr, Animum, Antandrus, Antonio Lopez, Ap, Appropo, Arcfrk, Arno, Arthur Rubin, Arthursimms, Asjafari,
Astropithicus, Asyndeton, Atallcostsky, Aurumvorax, AustinKnight, Avenue, Awh, AxelBoldt, B, BOARshevik, Badagnani, Ballz4kidz, Barneca, Baronnet, Batmanand, Bazookacidal, Bcherkas,
Bcrowell, Beerad34, Bellas118, BenB4, Berek, Berndt, Bethnim, Bethpage89, Bevo, Bfesser, Bgpaulus, BiT, Billymac00, Binary TSO, Bingbong555, Bkell, Bkessler23, Black Falcon, Black
Kite, Blahdeeblaj, Blainster, BlueDevil, Bmk, Bobblewik, Bobo192, Bogey97, Bonadea, Bongwarrior, Bookmaker, Bookmarks4life, Boznia, Brian Everlasting, Brianga, Brion VIBBER,
BryanHolland, Bsroiaadn, Buckner 1986, Buillon sexycat800, Burris, C S, C quest000, CART fan, CBM, CDutcher, CIreland, CL8, CSTAR, Cabalamat, Cabhan, Caesar1313, Calculuschicken,
Callmebill, Calqulus, Caltas, Calton, Calvin 1998, Camw, Can't sleep, clown will eat me, CanadianLinuxUser, Cap'n Refsmmat, Capricorn42, Carasora, CardinalDan, CarlBBoyer, Carso32,
Castro92, Catgut, CathySc, Cdthedude, Cenarium, Cessator, Cfarsi3, Charles Matthews, Cheeser1, Chibitoaster, Choster, Christofurio, Chriszim, Chun-hian, Ckatz, Cmarkides, Coldsquid,
Commander Keane, CommonModeNoise, Comrademikhail, Conversion script, Courcelles, Courtneylynn45, CptCutLess, Cronholm144, Crotalus horridus, Css2002, Cthompson, Cymon,
DARTH SIDIOUS 2, DHN, DMacks, DVdm, Da Gingerbread Man, Damian Yerrick, Damicatz, Daniel Arteaga, Daniel Hughes 88, Daniel J. Leivick, Daniel Quinlan, Daniel5127,
DanielDeibler, Daniele.tampieri, Dannery4, Danski14, Darth Panda, Daryl Williams, Daven200520, Davewild, David Newton, DavidCBryant, Daxfire, Db099221, Dbach, DeadEyeArrow,
Debator of mathematics, Deeptrivia, Dekisugi, Delbert Grady, DerHexer, Dferg, Diddlefart, Diginity, Diletante, Dimimimon7, Dionyziz, Discospinster, Diverman, Dmharvey, Doctormatt,
Dominus, Domthedude001, Dontwerryaboutit, DopefishJustin, DragonflySixtyseven, Drdonzi, DreamGuy, Drilnoth, Drywallandoswald, Dtgm, Dullfig, Dyknowsore, Dysepsion, Dysprosia, EJF,
EdH, Edcolins, Edmoil, Eduardoporcher, Edward, Edward321, Egil, Egmontaz, Einsteins37, Eisnel, Ekotkie, El C, Elementaro, Eliyak, Elkman, Eloquence, Email4mobile, Emann74, Emily
Jensen, Emmett, Empty Buffer, Epbr123, Escape Orbit, Espressobongo, Estel, Everyking, Evil saltine, Excirial, Existentialistcowboy, Eyu100, Faithlessthewonderboy, Falcorian,
Farquaadhnchmn, Favonian, Feezo, Feinstein, Fephisto, Fetchcomms, Fiedorow, FilipeS, Filippowiki, Finell, Fintler, Fixthatspelling, Flex, Flutefreek, Foobar333, Footballfan190, Four Dog
Night, Fowler&fowler, Foxtrotman, Frazzydee, Freakinadrian, Fredrik, FrozenPurpleCube, Frymaster, Furrykef, Fuzzform, G.W., G026r, GT5162, Gabriel Kielland, Gabrielleitao, Gadfium,
Gaelen S., Gaff, Gaius Cornelius, Gaopeng, Gene Ward Smith, Genius101, Geoking66, Geometry guy, Gesslein, Giftlite, Gilliam, Glane23, Gnat, Goeagles4321, Gofeel, Gogo Dodo, Golezan,
Goocanfly, Goodwisher, Googl, Gop 24fan, Gracenotes, Graham87, Grokmoo, Groovybill, Groundling, Gscshoyru, Guanaco, Guiltyspark, Gurchzilla, Guy M, Gwernol, Gwguffey, Habhab38,
Hadal, Hajhouse, Hannes Eder, Hanse, Haonhien, Harryboyles, Hawkhkylax26, Hawthorn, Hdt83, Headbomb, Headhold, Hebrides, Heimstern, Helios Entity 2, Helix84, Helvetius, Heron,
Hesacon, Hgetnet, High Elf, Hike395, Hippasus, HolIgor, Homestarmy, Hotstreets, Htim, Hut 8.5, Hydrogen Iodide, IDX, II MusLiM HyBRiD II, Icrosson, Ictlogist, Idealitem, Ideyal, Ieremias,
If I Am Blocked I Will Cry, Igiffin, Ike9898, Ikiroid, Ilikepie2221, Imjustmatthew, Infinity0, Infrogmation, Inquisitus, Insanity Incarnate, Interrobang², Ioscius, Iosef, Irish Souffle, IronGargoyle,
Ironman104, IslandHopper973, Izzy007, J.Wolfe@unsw.edu.au, J.delanoy, JDPhD, JForget, JFreeman, JJL, JTB01, JWillFiji, JaGa, Jacek Kendysz, Jackbaird, Jacob Nathaniel Richardson,
Jacobolus, Jagged 85, JaimenKing, Jak86, Jake Wartenberg, James, James086, Jan1nad, Jandjabel, Jason Lynn Carter, Jasongallagher, Jay.perna, Jclemens, Jeff3000, JeffPla, Jengirl1988,
JensenDied, Jenssss, Jersey Devil, Jfiling, Jfilloy, JimR, JimVC3, Jimothy 46, Jimp, JinJian, Jitse Niesen, Jj137, Jjacobsmeyer, Jman9 91, John Kershaw, John254, Johnnybfat, Joodeak, Joseph
Solis in Australia, Joshuac333, Jpo, Junglecat, Justinep, Jwpurple, Jxg, Jyril, Jóna Þórunn, KRS, Kai Hillmann, Kamrama, Karl Dickman, Katanaofdoom, Katzmik, Kbdank71, Kemiv, Ken
Kuniyuki, Kesac, Ketsuekigata, Killdevil, Killfire72, Koavf, Kocher2006, Koyos, Kragen, KrakatoaKatie, Krich, Kristinadam, Kubigula, Kukooo, Kuru, L Kensington, L33tweasley, LLcopp,
Lambiam, Le coq d'or, LeaveSleaves, Leszek Jańczuk, Lethe, Lifung, Lightdarkness, Likebox, Lindmere, Lir, LittleDan, LittleOldMe, Littleyoda20, Loelin, Lollerskates, Lradrama, Luka666,
Luna Santin, Lupo, M.hayek, M1ss1ontomars2k4, MER-C, MONGO, MacGyverMagic, Madchester, Madmath789, Magioladitis, Malatesta, Mani1, Manuel Trujillo Berges, MapsMan,
Marcushdaniel, Mariewoestman, MarkMarek, Markus Krötzsch, Mashford, Math.geek3.1415926, Matthias Heiler, Mauler90, Maurice Carbonaro, Maurreen, Mav, Maxis ftw, Maxstr9,
Mayumashu, Meisterkoch, Melos Antropon, Mentifisto, Merube 89, Mets501, Mgmei, Mgummess, Michael Hardy, Michaelh09, Mike2vil, Miked2009, Minestrone Soup, Miskin,
MithrandirAgain, Mjpurses, Mlm42, Modernage, Modulatum, Moink, Mokakeiche, Mr Stephen, MrOllie, MrSomeone, Mrbond69, Mrhurtin, Ms2ger, Mspraveen, Musicman69123,
Mygerardromance, N.j.hansen, Nahum Reduta, Nandesuka, Narcissus, Natural Philosopher, NawlinWiki, Nbarth, Ndkl, NeilN, Neokamek, Nick Garvey, Nigel5, Nikai, NinSmartSpasms, Ninly,
Nixeagle, Nnedass, Nneonneo, Nohup.in, Nolefan3000, NuclearWarfare, NuclearWinner, Nucleusboy, Nufy8, Nuttyskin, OSJ1961, Obey, Obradovic Goran, Oleg Alexandrov, Oliver202,
Olop4444, Omicronpersei8, Oreo Priest, Orlady, Orphic, Otheus, OverlordQ, OwenX, Owlgorithm, Ozob, P Carn, Pabix, Pakula, Pascal.Tesson, Pattymayonaise, Paul August, Pcap, Peere,
Penguinpwrdbox, Peruvianllama, Peter Grey, Petter Strandmark, Pgunnels, Phil Bastian, Philip Trueman, PhotoBox, PhySusie, Physprob, Piano non troppo, Pieburningbateater, PierreAbbat,
Pilif12p, PinchasC, Pinethicket, Pizza Puzzle, Pluppy, Pmanderson, Pmeisel, Pnzrfaust, Poison ivy boy, Pokemon1989, Pomakis, Pramboy, Pranathi, Professor Fiendish, Proffie, Programmar,
Puchiko, Puck is awesome., PurpleRain, Pvjthomas, Pyrospirit, Qertis, Quangbao, Quantumobserver, Quintote, Qxz, RHaworth, RJHall, Ragesoss, Ral315, Ramblagir, Ramin325, Razimantv,
Razorflame, Rdsmith4, Reach Out to the Truth, Reaperman, Recentchanges, Recognizance, Reconsider the static, Red Winged Duck, RedWolf, Reepnorp, RekishiEJ, Renato Caniatti, Rettetast,
Revolver, Rich Farmbrough, Rick Norwood, Rjwilmsi, Rl, Roastytoast, RobHar, Robertgreer, RodC, Rokfaith, Rorro, Rossami, Rotem Dan, Routeh, Roy Brumback, Royboycrashfan, Roylee,
Rpchase, Rpg9000, Rrenner, Rtyq2, Rustysrfbrds99, Rxc, Ryan Postlethwaite, Ryulong, SFC9394, Salix alba, Saupreissen, Savidan, Schneelocke, ScienceApologist, Sciurinæ, Scottydude,
Sdornan, SeoMac, Sephiroth BCR, Sfngan, Shanel, Sheeana, SheepNotGoats, Shinjiman, Shizhao, Shunpiker, Silly rabbit, Simetrical, Sjakkalle, Sjforman, Skal, Skater, Skiasaurus, Skydot,
Smashville, Smeira, SmilesALot, Smithbcs, Smithpith, Smoken Flames, Snotchstar!, SoSaysChappy, Soltras, Someones life, Sp00n, SpK, SpLoT, Spartan-James, Specs112, Spkid64, Splash,
Spreadthechaos, SpuriousQ, Sr1111, Srkris, Stammer, StaticGull, Stephenb, Stevenmattern, Stevertigo, Stickee, Stizz, Storeye, Stumps, Stwalkerster, Suyashmanjul, Swegei, Symane, TBadger,
TakuyaMurata, Tangent747, Tanweer Morshed, Tarek j, Tarret, Tawker, Taxman, Tbonnie, Tbsmith, Tcncv, TedE, Tedjn, Telempe, Template namespace initialisation script, Terence,
Terminaterjohn, Tetracube, Tfeeney65, That1dude35, Thatguyflint, The Anome, The Thing That Should Not Be, The Transhumanist, The Transhumanist (AWB), The wub, TheMidnighters,
Themfromspace, Thenub314, Thomasmeeks, Thunderboltz, ThuranX, Tide rolls, Tiga-Now, Tikiwont, Timo Honkasalo, Timwi, Tkuvho, Tobby72, Tomayres, Tony Fox, Torvik, Tosha,
Tothebarricades.tk, Travisc, Trd89, Tribaal, TrigWorks, Trovatore, Trusilver, Truth100, Tualha, TutterMouse, Tzf, Ukexpat, V10, VDWI, VMS Mosaic, Variable, VasilievVV, Viriditas,
Visualerror, WPIsFlawed, Wa2ise, Wapcaplet, Watsonksuplayer, Wayward, Welsh, Widdma, Wik, Wiki alf, WikiZorro, Wikiklrsc, Wikilibrarian, WikipedianMarlith, Willardsimmons, William
felton, Wimt, Wknight94, Wolfrock, Worrydoes, Wowz00r, Wraithdart, Wwoods, X!, Xantharius, Xharze, Xnuala, Xod, Xornok, Xrchz, Y Strings 9 The Bar, Yacht, Yamamoto Ichiro, Yazaq,
YellowMonkey, Yerpo, Yongtze28, Yosri, Youandme, YourBrain, Yute, Zachorious, Zaraki, Zchenyu, Zenohockey, ‫ןושרג ןב‬, ‫ سأ مساب‬2, 1565 anonymous edits

Average  Source: http://en.wikipedia.org/w/index.php?oldid=386187730  Contributors: 16@r, 2D, AGToth, AbsolutDan, AirdishStraus, Alai, Amal Hatun, Amirab, Andrewa, Angsteh,
Anonymous Dissident, Appleptic, Aqui, Ariel., Arthur Rubin, Averagejef, Averagejefpet, B4hand, Belg4mit, Belovedfreak, Berland, Bevo, Bigvic318, Bo Jacoby, CSWarren, CWY2190,
CardinalDan, Carlsotr, Chamal N, Charles Matthews, Chase me ladies, I'm the Cavalry, ClubOranje, CroydThoth, DEMcAdams, Dag Hovland, DanMS, Daniel.Cardenas, Darkspots,
DeadEyeArrow, Den fjättrade ankan, DerHexer, Derek Ross, Diegotorquemada, Direvus, Discospinster, Dmcq, Dmmaus, Donarreiskoffer, Dtrebbien, Dweller, Ehrenkater, Ellmist, Ellywa,
Epbr123, FrancoGG, FreplySpang, Fresheneesz, Ft1, Fuzheado, G716, Gail, Gap, Gco, Gene Nygaard, Giants27, Giftlite, Gilgamesh, Gilliam, Gizurr, Gogo Dodo, Gothmog.es, Grick,
Hadleywickham, Hans Adler, Happy-melon, Heavyweight Gamer, Henrygb, Herrsheng, Hgberman, Hirak 99, Hokanomono, Htim, Hu12, Hydrogen Iodide, Hypnosifl, Impdog, Infrangible,
Insanity Incarnate, Io, J.delanoy, JPD, JaGa, Jake Wartenberg, JamesBWatson, Jeffussing, Jffootball133, John254, Johnbibby, Jonas AGX, Jshtz4, Justinfr, Jérôme, Kaldari, Khalad, Kiril
Simeonovski, Kiscica, Kjtobo, Kndiaye, Kostisl, Kri, Kyorosuke, L33tminion, Lambiam, Lammidhania, Luboogers25, Lulu of the Lotus-Eaters, Magic.crow, Majorly, Marek69, Melcombe,
Michael Hardy, Michal Jurosz, Mmxx, Mohsinvyr, Molotron, Montrealais, Moosetraxx, Mormegil, Mwalcoff, Mwtoews, NawlinWiki, Nbarth, Neutron65, Nickylame2, Nishkid64, O18,
Octahedron80, Oleg Alexandrov, OnBeyondZebrax, Optichan, Ozob, PRB, Patrick, Patsup, Paul August, PeterStJohn, Pilif12p, Pinethicket, Piotrus, Plf515, Pmanderson, Pseudomonas, QRS III,
Qwfp, RDBury, RTFVerterra, Rjohnson92, Robroot, RoyBoy, Rrburke, Ruakh, Salgueiro, Savidan, Scottqwerty123, Seabhcan, SharkD, Silly rabbit, Sirmylesnagopaleentheda, Smithpith, Snoyes,
Srathahtars, Stephan Leeds, Stickee, SunDragon34, Syrthiss, Sławomir Biały, TakuyaMurata, Tannin, Tarquin, Thatguyflint, Tide rolls, Timwi, Userabc, Vsmith, Wikid77, Wildscop, ZackV,
Zadcat, Zenohockey, ‫لیقع فشاک‬, 316 anonymous edits

Image Sources, Licenses and Contributors


File:The Normal Distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:The_Normal_Distribution.svg  License: Public Domain  Contributors: Original uploader was Heds 1 at
en.wikipedia
File:Gretl screenshot.png  Source: http://en.wikipedia.org/w/index.php?title=File:Gretl_screenshot.png  License: GNU General Public License  Contributors: Den fjättrade ankan, Hannibal,
WikipediaMaster
File:Euclid.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Euclid.jpg  License: Public Domain  Contributors: Cyberpunk, Deerstop, Fishbone16, HUB, Mattes, Petropoxy
(Lithoderm Proxy), 5 anonymous edits
Image:Kapitolinischer Pythagoras adjusted.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Kapitolinischer_Pythagoras_adjusted.jpg  License: GNU Free Documentation License
 Contributors: Original uploader was Galilea at de.wikipedia
File:maya.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Maya.svg  License: GNU Free Documentation License  Contributors: Bryan Derksen
File:GodfreyKneller-IsaacNewton-1689.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:GodfreyKneller-IsaacNewton-1689.jpg  License: unknown  Contributors: Algorithme,
Beyond My Ken, Bjankuloski06en, Grenavitar, Infrogmation, Kelson, Kilom691, Porao, Saperaud, Semnoz, Siebrand, Sparkit, Thomas Gun, Wknight94, Wst, Zaphod, 4 anonymous edits
File:Leonhard Euler 2.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Leonhard_Euler_2.jpg  License: unknown  Contributors: Haham hanuka, Herbythyme, Serge Lachinov,
Shakko, 6 anonymous edits
File:Infinity symbol.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Infinity_symbol.svg  License: Public Domain  Contributors: Darapti, Hello71, Indolences, Kilom691, Magister
Mathematicae, Wst, 6 anonymous edits
File:Carl Friedrich Gauss.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Carl_Friedrich_Gauss.jpg  License: unknown  Contributors: Bcrowell, Blösöf, Conscious, Gabor, Joanjoc,
Kaganer, Kilom691, Luestling, Mattes, Rovnet, Schaengel89, Ufudu, 4 anonymous edits
File:Abacus 6.png  Source: http://en.wikipedia.org/w/index.php?title=File:Abacus_6.png  License: unknown  Contributors: Flominator, German, Grön, Luestling, RHorning
File:Elliptic curve simple.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Elliptic_curve_simple.svg  License: GNU Free Documentation License  Contributors: User:Pbroks13
File:Rubik's cube.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Rubik's_cube.svg  License: GNU Free Documentation License  Contributors: User:Booyabazooka
File:Group diagdram D6.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Group_diagdram_D6.svg  License: Public Domain  Contributors: User:Cepheus
File:Lattice of the divisibility of 60.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Lattice_of_the_divisibility_of_60.svg  License: Creative Commons Attribution-Sharealike 2.5
 Contributors: User:Ed g2s
File:Illustration to Euclid's proof of the Pythagorean theorem.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Illustration_to_Euclid's_proof_of_the_Pythagorean_theorem.svg
 License: Public Domain  Contributors: Darapti, Gerbrant
File:Sine cosine plot.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Sine_cosine_plot.svg  License: Creative Commons Attribution-Sharealike 2.5  Contributors: User:Qualc1
File:Hyperbolic triangle.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Hyperbolic_triangle.svg  License: Public Domain  Contributors: Bender235, Kieff, 1 anonymous edits
File:Torus.png  Source: http://en.wikipedia.org/w/index.php?title=File:Torus.png  License: Public Domain  Contributors: Kieff, Rimshot, SharkD
File:Mandel zoom 07 satellite.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Mandel_zoom_07_satellite.jpg  License: Creative Commons Attribution-Sharealike 2.5  Contributors:
User:Wolfgangbeyer
File:Measure illustration.png  Source: http://en.wikipedia.org/w/index.php?title=File:Measure_illustration.png  License: Public Domain  Contributors: User:Oleg Alexandrov
File:Integral as region under curve.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Integral_as_region_under_curve.svg  License: Creative Commons Attribution-Sharealike 2.5
 Contributors: 4C
File:Vector field.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Vector_field.svg  License: Public Domain  Contributors: User:Fibonacci
File:Airflow-Obstructed-Duct.png  Source: http://en.wikipedia.org/w/index.php?title=File:Airflow-Obstructed-Duct.png  License: Public Domain  Contributors: Original uploader was User A1
at en.wikipedia
File:Limitcycle.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Limitcycle.jpg  License: GNU Free Documentation License  Contributors: Dcoetzee, It Is Me Here, Kilom691,
Knutux
File:Lorenz attractor.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Lorenz_attractor.svg  License: Creative Commons Attribution 2.5  Contributors: User:Dschwen
File:Princ argument ex1.png  Source: http://en.wikipedia.org/w/index.php?title=File:Princ_argument_ex1.png  License: GNU Free Documentation License  Contributors: User:ThibautLienart
File:Venn A intersect B.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Venn_A_intersect_B.svg  License: Public Domain  Contributors: User:Cepheus
File:Commutative diagram for morphism.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Commutative_diagram_for_morphism.svg  License: Public Domain  Contributors:
User:Cepheus
File:DFAexample.svg  Source: http://en.wikipedia.org/w/index.php?title=File:DFAexample.svg  License: Public Domain  Contributors: User:Cepheus
File:Caesar3.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Caesar3.svg  License: Public Domain  Contributors: User:Cepheus
Image:Gravitation space source.png  Source: http://en.wikipedia.org/w/index.php?title=File:Gravitation_space_source.png  License: GNU Free Documentation License  Contributors:
Duesentrieb, Schekinov Alexey Victorovich, Superborsuk, WikipediaMaster
Image:BernoullisLawDerivationDiagram.svg  Source: http://en.wikipedia.org/w/index.php?title=File:BernoullisLawDerivationDiagram.svg  License: GNU Free Documentation License
 Contributors: User:MannyMax
Image:Composite trapezoidal rule illustration small.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Composite_trapezoidal_rule_illustration_small.svg  License: Attribution
 Contributors: User:Pbroks13
Image:Maximum boxed.png  Source: http://en.wikipedia.org/w/index.php?title=File:Maximum_boxed.png  License: Public Domain  Contributors: User:Freiddy
Image:Two red dice 01.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Two_red_dice_01.svg  License: Public Domain  Contributors: Stephen Silver
Image:Oldfaithful3.png  Source: http://en.wikipedia.org/w/index.php?title=File:Oldfaithful3.png  License: Public Domain  Contributors: Anynobody, Maksim, Mdd, Nandhp, Oleg Alexandrov,
WikipediaMaster, 6 anonymous edits
Image:Market Data Index NYA on 20050726 202628 UTC.png  Source: http://en.wikipedia.org/w/index.php?title=File:Market_Data_Index_NYA_on_20050726_202628_UTC.png  License:
Public Domain  Contributors: Denniss, Jodo
Image:Arbitrary-gametree-solved.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Arbitrary-gametree-solved.svg  License: Public Domain  Contributors: User:Qef
Image:Signal transduction v1.png  Source: http://en.wikipedia.org/w/index.php?title=File:Signal_transduction_v1.png  License: GNU Free Documentation License  Contributors: Original
uploader was Roadnottaken at en.wikipedia
Image:Ch4-structure.png  Source: http://en.wikipedia.org/w/index.php?title=File:Ch4-structure.png  License: GNU Free Documentation License  Contributors: Benjah-bmm27, Dbc334,
Maksim
Image:GDP PPP Per Capita IMF 2008.png  Source: http://en.wikipedia.org/w/index.php?title=File:GDP_PPP_Per_Capita_IMF_2008.png  License: Creative Commons Attribution 3.0
 Contributors: User:Sbw01f
Image:Simple feedback control loop2.png  Source: http://en.wikipedia.org/w/index.php?title=File:Simple_feedback_control_loop2.png  License: unknown  Contributors: Corona
Image:Normal Distribution PDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Normal_Distribution_PDF.svg  License: Public Domain  Contributors: User:Inductiveload
Image:Normal Distribution CDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Normal_Distribution_CDF.svg  License: Public Domain  Contributors: User:Inductiveload
Image:standard deviation diagram.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Standard_deviation_diagram.svg  License: Public Domain  Contributors: Chesnok, Juiced lemon,
Krinkle, Manuelt15, Mwtoews, Petter Strandmark, Revolus, Tom.Reding, Wknight94, 17 anonymous edits
Image:De moivre-laplace.gif  Source: http://en.wikipedia.org/w/index.php?title=File:De_moivre-laplace.gif  License: Public Domain  Contributors: User:Stpasha
Image:QHarmonicOscillator.png  Source: http://en.wikipedia.org/w/index.php?title=File:QHarmonicOscillator.png  License: GNU Free Documentation License  Contributors: Inductiveload,
Maksim, Pieter Kuiper
Image:Fisher iris versicolor sepalwidth.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Fisher_iris_versicolor_sepalwidth.svg  License: Creative Commons Attribution-Sharealike
3.0  Contributors: User:Pbroks13
Image:Planche de Galton.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Planche_de_Galton.jpg  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Antoinetav
Image:Carl Friedrich Gauss.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Carl_Friedrich_Gauss.jpg  License: unknown  Contributors: Bcrowell, Blösöf, Conscious, Gabor,
Joanjoc, Kaganer, Kilom691, Luestling, Mattes, Rovnet, Schaengel89, Ufudu, 4 anonymous edits
Image:Pierre-Simon Laplace.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Pierre-Simon_Laplace.jpg  License: unknown  Contributors: Ashill, Ecummenic, Elcobbola,
Gene.arboit, Jimmy44, Olivier2, 霧木諒二
File:cumulativeSD.svg  Source: http://en.wikipedia.org/w/index.php?title=File:CumulativeSD.svg  License: Public Domain  Contributors: User:Inductiveload, User:Wolfkeeper
Image:Standard deviation illustration.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Standard_deviation_illustration.gif  License: unknown  Contributors: Forlornturtle
File:Comparison standard deviations.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Comparison_standard_deviations.svg  License: Public Domain  Contributors: User:JRBrown
File:Standard deviation diagram.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Standard_deviation_diagram.svg  License: Public Domain  Contributors: Chesnok, Juiced lemon,
Krinkle, Manuelt15, Mwtoews, Petter Strandmark, Revolus, Tom.Reding, Wknight94, 17 anonymous edits
Image:Latex real numbers.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Latex_real_numbers.svg  License: GNU Free Documentation License  Contributors: User:Arichnad
File:Number-line.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Number-line.gif  License: Public Domain  Contributors: Original uploader was MathsIsFun at en.wikipedia
Image:Boxplot vs PDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Boxplot_vs_PDF.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors: Original
uploader was Jhguch at en.wikipedia
Image:Discrete probability distribution illustration.png  Source: http://en.wikipedia.org/w/index.php?title=File:Discrete_probability_distribution_illustration.png  License: Public Domain
 Contributors: User:Oleg Alexandrov
Image:FoldedCumulative.PNG  Source: http://en.wikipedia.org/w/index.php?title=File:FoldedCumulative.PNG  License: Creative Commons Attribution-Sharealike 3.0  Contributors: Rumping
Image:Discrete probability distrib.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Discrete_probability_distrib.svg  License: Public Domain  Contributors: User:Oleg Alexandrov
Image:Discrete probability distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Discrete_probability_distribution.svg  License: Public Domain  Contributors: User:Incnis
Mrsi
Image:Normal probability distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Normal_probability_distribution.svg  License: Public Domain  Contributors: User:Incnis
Mrsi
Image:Mixed probability distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Mixed_probability_distribution.svg  License: Public Domain  Contributors: User:Incnis Mrsi
Image:Fair dice probability distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Fair_dice_probability_distribution.svg  License: Public Domain  Contributors: User:Oleg
Alexandrov
File:Rapid Oscillation.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Rapid_Oscillation.svg  License: Creative Commons Attribution 3.0  Contributors: Pbroks13 (talk); original uploader was Pbroks13 at en.wikipedia
Image:Right-continuous.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Right-continuous.svg  License: Public Domain  Contributors: w:User:JacjJacj
Image:Left-continuous.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Left-continuous.svg  License: Public Domain  Contributors: Jacj, Plasticspork
Image:continuity topology.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Continuity_topology.svg  License: Public Domain  Contributors: User:Dcoetzee
Image:Measure illustration.png  Source: http://en.wikipedia.org/w/index.php?title=File:Measure_illustration.png  License: Public Domain  Contributors: User:Oleg Alexandrov
File:Flag of France.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Flag_of_France.svg  License: Public Domain  Contributors: User:SKopp, User:SKopp, User:SKopp, User:SKopp,
User:SKopp, User:SKopp
Image:Rotating spherical harmonics.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Rotating_spherical_harmonics.gif  License: GNU Free Documentation License  Contributors:
Cyp, Jengelh, Pieter Kuiper, 1 anonymous edits
Image:Laplace house Arcueil.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Laplace_house_Arcueil.jpg  License: unknown  Contributors: User:cutler
Image:Pierre-Simon-Laplace (1749-1827).jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Pierre-Simon-Laplace_(1749-1827).jpg  License: unknown  Contributors: Gabor,
Luestling, Olivier2, Umherirrender
File:Integral example.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Integral_example.svg  License: Creative Commons Attribution-Sharealike 2.5  Contributors: User:KSmrq
File:ArabicIntegralSign.svg  Source: http://en.wikipedia.org/w/index.php?title=File:ArabicIntegralSign.svg  License: Public Domain  Contributors: ZooFari
File:Integral approximations.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Integral_approximations.svg  License: GNU Free Documentation License  Contributors: User:KSmrq
File:Integral Riemann sum.png  Source: http://en.wikipedia.org/w/index.php?title=File:Integral_Riemann_sum.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors:
User:KSmrq
File:Riemann sum convergence.png  Source: http://en.wikipedia.org/w/index.php?title=File:Riemann_sum_convergence.png  License: Creative Commons Attribution-Sharealike 2.5
 Contributors: User:KSmrq
File:Improper integral.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Improper_integral.svg  License: GNU Free Documentation License  Contributors: User:KSmrq
File:Volume under surface.png  Source: http://en.wikipedia.org/w/index.php?title=File:Volume_under_surface.png  License: Public Domain  Contributors: User:Oleg Alexandrov
File:Line-Integral.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Line-Integral.gif  License: GNU Free Documentation License  Contributors: Cronholm144, Darapti, Nandhp,
SkiDragon
File:Surface integral illustration.png  Source: http://en.wikipedia.org/w/index.php?title=File:Surface_integral_illustration.png  License: Public Domain  Contributors: Darapti, Oleg
Alexandrov, WikipediaMaster
File:Numerical quadrature 4up.png  Source: http://en.wikipedia.org/w/index.php?title=File:Numerical_quadrature_4up.png  License: Creative Commons Attribution-Sharealike 2.5
 Contributors: User:KSmrq
Image:Graph of example function.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Graph_of_example_function.svg  License: Creative Commons Attribution 2.5  Contributors:
KSmrq
File:Function machine2.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Function_machine2.svg  License: Public Domain  Contributors: Wvbailey (talk). Original uploader was
Wvbailey at en.wikipedia. Later version(s) were uploaded by Threecheersfornick at en.wikipedia.
Image:Function machine5.png  Source: http://en.wikipedia.org/w/index.php?title=File:Function_machine5.png  License: Public Domain  Contributors: User:Wvbailey
File:Gottfried Wilhelm von Leibniz.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Gottfried_Wilhelm_von_Leibniz.jpg  License: unknown  Contributors: Beyond My Ken,
Davidlud, Eusebius, Factumquintus, Gabor, Luestling, Mattes, Schaengel89, Svencb, Tomisti, 4 anonymous edits
File:Tangent derivative calculusdia.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Tangent_derivative_calculusdia.svg  License: GNU Free Documentation License  Contributors:
Minestrone Soup
File:Sec2tan.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Sec2tan.gif  License: GNU Free Documentation License  Contributors: User:OSJ1961
File:NautilusCutawayLogarithmicSpiral.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:NautilusCutawayLogarithmicSpiral.jpg  License: Attribution  Contributors: User:Chris 73
License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/