A randomness extractor, often simply called an "extractor", is a function which, when applied to output from a weakly random entropy source together with a short, uniformly random seed, generates a highly random output that appears independent from the source and uniformly distributed.[1] Examples of weakly random sources include radioactive decay or thermal noise; the only restriction on possible sources is that there is no way they can be fully controlled, calculated or predicted, and that a lower bound on their entropy rate can be established. For a given source, a randomness extractor can even be considered a true random number generator (TRNG); but there is no single extractor that has been proven to produce truly random output from every type of weakly random source.

Sometimes the term "bias" is used to denote a weakly random source's departure from
uniformity, and in older literature, some extractors are called unbiasing
algorithms,[2] as they take the randomness from a so-called "biased" source and
output a distribution that appears unbiased. The weakly random source will always
be longer than the extractor's output, but an efficient extractor is one that
lowers this ratio of lengths as much as possible, while simultaneously keeping the
seed length low. Intuitively, this means that as much randomness as possible has
been "extracted" from the source.

Note that an extractor has some conceptual similarities with a pseudorandom generator (PRG), but the two concepts are not identical. Both are functions that take as input a small, uniformly random seed and produce a longer output that "looks" uniformly random. Some pseudorandom generators are, in fact, also extractors. (When a PRG is based on the existence of hard-core predicates, one can think of the weakly random source as a set of truth tables of such predicates and prove that the output is statistically close to uniform.[3]) However, the general PRG definition does not specify that a weakly random source must be used, and while in the case of an extractor the output should be statistically close to uniform, in a PRG it is only required to be computationally indistinguishable from uniform, a somewhat weaker notion.

NIST Special Publication 800-90B (draft) recommends several extractors, including the SHA hash family, and states that if the amount of entropy input is twice the number of bits output from them, that output can be considered essentially fully random.[4]
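As a hedged illustration of that rule of thumb (a minimal sketch, not the exact conditioning construction specified in SP 800-90B; the function name and parameters are my own):

```python
import hashlib

def condition_with_sha256(raw: bytes) -> bytes:
    """Hash weakly random bytes down to a 256-bit output.

    Per the 2x rule of thumb quoted above, the caller should ensure that
    `raw` carries at least 512 bits of min-entropy (twice the output size)
    before treating the 256-bit digest as essentially fully random.
    """
    return hashlib.sha256(raw).digest()
```

For example, `condition_with_sha256(noisy_samples)`, where `noisy_samples` holds raw readings from a jittery timer or other weak entropy source.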

Formal definition of extractors
The min-entropy of a distribution $X$ (denoted $H_\infty(X)$) is the largest real number $k$ such that $\Pr[X = x] \leq 2^{-k}$ for every $x$ in the range of $X$. In essence, this measures how likely $X$ is to take its most likely value, giving a worst-case bound on how random $X$ appears. Letting $U_\ell$ denote the uniform distribution over $\{0,1\}^\ell$, clearly $H_\infty(U_\ell) = \ell$.

For an $n$-bit distribution $X$ with min-entropy $k$, we say that $X$ is an $(n, k)$ distribution.
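For instance (a small sketch with illustrative numbers of my own), min-entropy can be computed directly from a probability table:

```python
import math

def min_entropy(dist: dict) -> float:
    """H_inf(X) = -log2(max_x Pr[X = x]), for probabilities in a dict."""
    return -math.log2(max(dist.values()))

# A biased bit with Pr[0] = 0.9 has only about 0.152 bits of min-entropy,
# far below the full bit a uniform coin flip would provide.
print(min_entropy({0: 0.9, 1: 0.1}))  # ~0.152
print(min_entropy({0: 0.5, 1: 0.5}))  # 1.0
```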

Definition (Extractor): Let $\text{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a function that takes as input a sample from an $(n, k)$ distribution $X$ and a $d$-bit seed from $U_d$, and outputs an $m$-bit string. $\text{Ext}$ is a $(k, \epsilon)$-extractor if, for all $(n, k)$ distributions $X$, the output distribution of $\text{Ext}$ is $\epsilon$-close to $U_m$.

In the above definition, $\epsilon$-close refers to statistical distance.

Intuitively, an extractor takes a weakly random $n$-bit input and a short, uniformly random seed and produces an $m$-bit output that looks uniformly random. The aim is to have a low $d$ (i.e. to use as little uniform randomness as possible) and as high an $m$ as possible (i.e. to get out as many close-to-random bits of output as we can).
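As a concrete illustration (a minimal sketch of my own, not a construction discussed in this article), the classic inner-product extractor outputs the single bit $\langle x, s \rangle \bmod 2$; real constructions output many bits, but one bit keeps the idea visible:

```python
def inner_product_extract(x: int, seed: int) -> int:
    """Ext(x, s) = <x, s> mod 2, treating x and seed as bit vectors.

    The output bit is the parity of the positions where both the sample
    and the seed have a 1.
    """
    return bin(x & seed).count("1") % 2
```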

Strong extractors
An extractor is strong if concatenating the seed with the extractor's output yields
a distribution that is still close to uniform.

Definition (Strong Extractor): A $(k, \epsilon)$-strong extractor is a function

$$\text{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$$

such that for every $(n, k)$ distribution $X$, the distribution $U_d \circ \text{Ext}(X, U_d)$ (where the two copies of $U_d$ denote the same random variable) is $\epsilon$-close to $U_{m+d}$.
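To make the definition concrete, the following sketch (illustrative only; the toy source, parameters and helper names are my own) measures the statistical distance of $U_d \circ \text{Ext}(X, U_d)$ from uniform for the one-bit inner-product extractor above, on a small oblivious bit-fixing source:

```python
def statistical_distance(p: dict, q: dict) -> float:
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) / 2

d = 4
# Toy (4, 2) source: first two bits fixed to 0, last two bits uniform.
source = [int("00" + a + b, 2) for a in "01" for b in "01"]

# Joint distribution of (seed, Ext(X, seed)) under a uniform seed.
joint = {}
for s in range(2 ** d):
    for x in source:
        key = (s, bin(x & s).count("1") % 2)
        joint[key] = joint.get(key, 0.0) + 1.0 / (2 ** d * len(source))

uniform = {(s, b): 1.0 / 2 ** (d + 1) for s in range(2 ** d) for b in (0, 1)}
print(statistical_distance(joint, uniform))  # 0.125 for this toy source
```

Here the joint distribution is $1/8$-close to uniform: the four seeds that ignore both free bits of the source give a deterministic output, while every other seed gives a perfectly uniform bit.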

Explicit extractors
Using the probabilistic method, it can be shown that there exists a $(k, \epsilon)$-extractor, i.e. that the construction is possible. However, it is usually not enough merely to show that an extractor exists. An explicit construction is needed, which is given as follows:

Definition (Explicit Extractor): For functions $k(n)$, $\epsilon(n)$, $d(n)$, $m(n)$, a family $\text{Ext} = \{\text{Ext}_n\}$ of functions

$$\text{Ext}_n : \{0,1\}^n \times \{0,1\}^{d(n)} \to \{0,1\}^{m(n)}$$

is an explicit $(k, \epsilon)$-extractor if $\text{Ext}(x, y)$ can be computed in polynomial time (in its input length) and, for every $n$, $\text{Ext}_n$ is a $(k(n), \epsilon(n))$-extractor.

By the probabilistic method, it can be shown that there exists a $(k, \epsilon)$-extractor with seed length

$$d = \log(n - k) + 2\log\left(\frac{1}{\epsilon}\right) + O(1)$$

and output length

$$m = k + d - 2\log\left(\frac{1}{\epsilon}\right) - O(1).$$[5]
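Plugging in illustrative numbers (my own choice, just to show the scale of these bounds): with $n = 1024$, $k = 512$ and $\epsilon = 2^{-20}$,

```latex
d = \log(n-k) + 2\log\left(\tfrac{1}{\epsilon}\right) + O(1)
  = \log 512 + 2 \cdot 20 + O(1) = 49 + O(1), \\
m = k + d - 2\log\left(\tfrac{1}{\epsilon}\right) - O(1)
  = 512 + 49 - 40 - O(1) = 521 - O(1).
```

In other words, roughly fifty uniformly random seed bits suffice, in principle, to turn the source's 512 bits of min-entropy (plus most of the seed itself) into close-to-uniform output.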
Dispersers
A variant of the randomness extractor with weaker properties is the disperser.

Randomness extractors in cryptography
One of the most important aspects of cryptography is random key generation.[6] It is often necessary to generate secret and random keys from sources that are semi-secret or which may be compromised to some degree. By taking a single, short (and secret) random key as a source, an extractor can be used to generate a longer pseudo-random key, which can then be used for public-key encryption. More specifically, when a strong extractor is used, its output will appear uniformly random even to someone who sees part (but not all) of the source, for example someone who knows the source but not the seed, or vice versa. This property of extractors is particularly useful in what is commonly called Exposure-Resilient cryptography, in which the desired extractor is used as an Exposure-Resilient Function (ERF). Exposure-Resilient cryptography takes into account the fact that it is difficult to keep secret the initial exchange of data which often takes place during the initialization of an encryption application, e.g., the sender of encrypted information has to provide the receivers with information which is required for decryption.

The following paragraphs define and establish an important relationship between two kinds of ERF, the k-ERF and the k-APRF, which are useful in Exposure-Resilient cryptography.

Definition (k-ERF): An adaptive k-ERF is a function $f$ where, for a random input $r$, when a computationally unbounded adversary $A$ can adaptively read all of $r$ except for $k$ bits,

$$|\Pr\{A^r(f(r)) = 1\} - \Pr\{A^r(R) = 1\}| \leq \epsilon(n)$$

for some negligible function $\epsilon(n)$ (defined below).

The goal is to construct an adaptive ERF whose output is highly random and
uniformly distributed. But a stronger condition is often needed in which every
output occurs with almost uniform probability. For this purpose Almost-Perfect
Resilient Functions (APRF) are used. The definition of an APRF is as follows:

Definition (k-APRF): A $k = k(n)$ APRF is a function $f$ where, for any setting of $n - k$ bits of the input $r$ to any fixed values, the probability vector $p$ of the output $f(r)$ over the random choices of the $k$ remaining bits satisfies $|p_i - 2^{-m}| < 2^{-m}\epsilon(n)$ for all $i$ and for some negligible function $\epsilon(n)$.
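As a tiny sanity check (a toy example of my own, not from the article), the parity function with $m = 1$ is a perfect k-APRF for every $k \geq 1$: however the other $n - k$ bits are fixed, a single free uniform bit already makes the output exactly uniform.

```python
from itertools import product

def parity(bits) -> int:
    """f(r) = XOR of all input bits, i.e. an output of m = 1 bit."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

k = 1
# Fix the first n - k = 5 bits to arbitrary values; enumerate the free bit.
for fixed in [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0)]:
    outputs = [parity(fixed + free) for free in product((0, 1), repeat=k)]
    p1 = sum(outputs) / len(outputs)
    print(fixed, (1 - p1, p1))  # always (0.5, 0.5): exactly uniform
```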

Kamp and Zuckerman[7] have proved a theorem stating that if a function $f$ is a k-APRF, then $f$ is also a k-ERF. More specifically, any extractor having sufficiently small error and taking as input an oblivious, bit-fixing source is also an APRF and therefore also a k-ERF. A more specific extractor is expressed in this lemma:

Lemma: Any $2^{-m}\epsilon(n)$-extractor $f : \{0,1\}^n \to \{0,1\}^m$ for the set of $(n, k)$ oblivious bit-fixing sources, where $\epsilon(n)$ is negligible, is also a k-APRF.

Kamp and Zuckerman prove this lemma by examining the distance from uniform of the output, which for a $2^{-m}\epsilon(n)$-extractor is at most $2^{-m}\epsilon(n)$; this immediately satisfies the condition of the APRF.[7]

The lemma leads to the following theorem, stating that a k-APRF function as described does in fact exist:

Theorem (existence): For any positive constant $\gamma \leq \frac{1}{2}$, there exists an explicit k-APRF $f : \{0,1\}^n \to \{0,1\}^m$, computable in a linear number of arithmetic operations on $m$-bit strings, with $m = \Omega(n^{2\gamma})$ and $k = n^{\frac{1}{2} + \gamma}$.

Definition (negligible function): In the proof of this theorem, we need a definition of a negligible function. A function $\epsilon(n)$ is defined as being negligible if $\epsilon(n) = O\left(\frac{1}{n^c}\right)$ for all constants $c$.

Proof: Consider the following $\epsilon$-extractor: the function $f$ is an extractor for the set of $(n, \delta n)$ oblivious bit-fixing sources, $f : \{0,1\}^n \to \{0,1\}^m$, with $m = \Omega(\delta^2 n)$, $\epsilon = 2^{-cm}$ and $c > 1$.

The proof of this extractor's existence with $\delta \leq 1$, as well as the fact that it is computable in time linear in $m$, can be found in the paper by Jesse Kamp and David Zuckerman (p. 1240).

That this extractor fulfills the criteria of the lemma is trivially true, as $\epsilon = 2^{-cm}$ is a negligible function.

The size of $m$ is:

$$m = \Omega(\delta^2 n) = \Omega(n) \geq \Omega(n^{2\gamma})$$

Since we know $\delta \leq 1$, the lower bound on $m$ is dominated by $n$. In the last step we use the fact that $\gamma \leq \frac{1}{2}$, which means that the power of $n$ is at most $1$; and since $n$ is a positive integer, $n^{2\gamma}$ is at most $n$.

The value of $k$ is calculated by using the definition of the extractor, where we know:

$$(n, k) = (n, \delta n) \Rightarrow k = \delta n$$
and by using the value of $m$ we have:

$$m = \delta^2 n = n^{2\gamma}$$

Using this value of $m$ we account for the worst case, where $k$ is at its lower bound. Now by algebraic calculation we get:

$$\delta^2 n = n^{2\gamma}$$
$$\Rightarrow \delta^2 = n^{2\gamma - 1}$$
$$\Rightarrow \delta = n^{\gamma - \frac{1}{2}}$$

which, inserted in the value of $k$, gives

$$k = \delta n = n^{\gamma - \frac{1}{2}} \, n = n^{\gamma + \frac{1}{2}},$$

which proves that there exists an explicit k-APRF extractor with the given properties. $\Box$
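For concreteness (illustrative numbers of my own), instantiating the proof with $\gamma = \frac{1}{4}$ and $n = 2^{16}$ gives:

```latex
\delta = n^{\gamma - \frac{1}{2}} = 2^{-4}, \qquad
k = \delta n = n^{3/4} = 2^{12} = 4096, \qquad
m = \delta^2 n = n^{2\gamma} = 2^{8} = 256,
```

i.e. an input of 65536 bits carrying 4096 bits of min-entropy yields 256 close-to-uniform output bits.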

Examples
Von Neumann extractor
Further information: Bernoulli sequence
Perhaps the earliest example is due to John von Neumann. His extractor took
successive pairs of consecutive bits (non-overlapping) from the input stream. If
the two bits matched, no output was generated. If the bits differed, the value of
the first bit was output. The Von Neumann extractor can be shown to produce a
uniform output even if the distribution of input bits is not uniform so long as
each bit has the same probability of being one and there is no correlation between
successive bits.[8]

Thus, it takes as input a Bernoulli sequence with $p$ not necessarily equal to $1/2$, and outputs a Bernoulli sequence with $p = 1/2$. More generally, it applies to any exchangeable sequence; it relies only on the fact that for any pair, 01 and 10 are equally likely: for independent trials these have probabilities $p \cdot (1 - p) = (1 - p) \cdot p$, while for an exchangeable sequence the probability may be more complicated, but both orderings are equally likely.
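A straightforward sketch of the procedure just described (variable and function names are my own):

```python
import random
from typing import Iterable, Iterator

def von_neumann_extractor(bits: Iterable[int]) -> Iterator[int]:
    """Von Neumann's unbiasing procedure.

    Reads non-overlapping pairs of input bits: equal pairs are discarded,
    and for unequal pairs the first bit is emitted. Assumes the input bits
    are independent with a fixed (possibly unknown) bias p.
    """
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:
            return  # odd leftover bit, nothing to emit
        if first != second:
            yield first

# Example: a heavily biased but independent source still yields fair bits.
biased = (1 if random.random() < 0.9 else 0 for _ in range(10_000))
out = list(von_neumann_extractor(biased))
print(len(out), sum(out) / len(out))  # about 900 bits on average, mean ~0.5
```

Note the cost of unbiasing: with bias $p$, only a fraction $2p(1-p)$ of the input pairs produce an output bit.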

Chaos machine
Another approach is to use the output of a chaos machine applied to the input
stream. This approach generally relies on properties of chaotic systems. Input bits
are pushed to the machine, evolving orbits and trajectories in multiple dynamical
systems. Thus, small differences in the input produce very different outputs. Such
a machine has a uniform output even if the distribution of input bits is not
uniform or has serious flaws, and can therefore use weak entropy sources.
Additionally, this scheme allows for increased complexity, quality, and security of
the output stream, controlled by specifying three parameters: time cost, memory
required, and secret key.
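A purely illustrative toy (my own construction, carrying no uniformity or security guarantee) that mimics the data flow just described, using a logistic map as the dynamical system:

```python
def chaos_machine_toy(bits, r=3.99, state=0.5, steps_per_bit=16):
    """Push each input bit into the state, evolve the orbit, emit one bit."""
    out = []
    for b in bits:
        state = (state + (0.25 if b else 0.0)) % 1.0  # perturb with input
        if state == 0.0:
            state = 0.5  # avoid the map's fixed point at 0
        for _ in range(steps_per_bit):
            state = r * state * (1.0 - state)  # iterate the logistic map
        out.append(1 if state > 0.5 else 0)  # sample a bit from the orbit
    return out
```

An actual chaos machine additionally exposes the time-cost, memory and secret-key parameters mentioned above; none of that is modeled here.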

Cryptographic hash function
It is also possible to use a cryptographic hash function as a randomness extractor. However, not every hashing algorithm is suitable for this purpose.[citation needed]

Applications
Randomness extractors are used widely in cryptographic applications, whereby a
cryptographic hash function is applied to a high-entropy, but non-uniform source,
such as disk drive timing information or keyboard delays, to yield a uniformly
random result.
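A hedged sketch of this pattern (timer jitter as the weak source; the names and parameters are mine, and the entropy actually present in such timings varies by platform):

```python
import hashlib
import time

def harvest_timing_entropy(samples: int = 4096) -> bytes:
    """Collect high-resolution timer jitter (a weak, non-uniform source)
    and condition it with SHA-256, as described above."""
    pool = bytearray()
    for _ in range(samples):
        t = time.perf_counter_ns()
        pool.append(t & 0xFF)  # keep only the jittery low-order byte
    return hashlib.sha256(bytes(pool)).digest()
```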

Randomness extractors have played a part in recent developments in quantum cryptography, where photons are used by the randomness extractor to generate secure random bits.[1]

Randomness extraction is also used in some branches of computational complexity theory.

Randomness extraction is also used to convert data into a simple random sample, which is normally distributed and independent, as desired in statistics.
