Wednesday, October 29, 2014

Martin-Löf randomness

In this post, I sum up the basic idea of Martin-Löf randomness. Actually, this post is mainly a series of personal notes on the matter. I decided to publish it here nonetheless, as it may interest someone else. In particular, for the sake of brevity, I rely on informal definitions of computable functions, Turing machines, recursively enumerable sets, etc. The reader should just think of them as any method that can be translated into a computer program.

    1. Statistics background

Let's fix a finite alphabet X of size d. We denote by X^n the set of sequences of length n on X. We denote by X^* (resp. X^{\omega}) the set of finite (resp. infinite) sequences on X. As usual, the concatenation of words is denoted by u \cdot v, or simply uv. The length of a finite word u is denoted by |u|. We assume that X is given the uniform probability measure \mu(x) = d^{-1}. This measure naturally extends to X^n and X^{\omega}. For instance,
  \begin{equation*}     \forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n)   \end{equation*}
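For instance, on the binary alphabet (d = 2), the cylinder of the word w = 011 has measure
  \begin{equation*}     \mu(011 \cdot X^{\omega}) = \mu(0) \mu(1) \mu(1) = (1/2)^3 = 1/8   \end{equation*}
and, in general, \mu(w \cdot X^{\omega}) = d^{-|w|}.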
We are interested in finding what it means for an individual sequence in X^{\omega} to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Some of the commonly accepted (or "ought to be") features of random sequences are various laws of large numbers. For example, in a "truly" random infinite sequence S, the frequency of any symbol x \in X should be close to \mu(x). Put another way, after fixing some arbitrary k, we can compute the frequency freq(x, S \upharpoonright k) of the symbol x in the prefix S \upharpoonright k of length k, and if we find that this quantity deviates too much from \mu(x), we conclude that the sequence S is not random.
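
To make this concrete, here is a minimal Python sketch of this statistic (the helper names freq and deviation are mine):

    from fractions import Fraction

    def freq(x, w):
        """Empirical frequency of the symbol x in the finite word w."""
        return Fraction(w.count(x), len(w))

    def deviation(x, w, d):
        """|freq(x, w) - mu(x)| for the uniform measure mu(x) = 1/d."""
        return abs(freq(x, w) - Fraction(1, d))

    # Example on the binary alphabet X = {'0', '1'} (d = 2):
    w = "0110100110010110"       # a prefix of length 16 of some sequence S
    print(deviation('1', w, 2))  # prints 0: '1' occurs exactly 8 times out of 16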

Reformulating this in statistical jargon, we are designing a test for the null hypothesis "H_0: the sequence is random", with the test statistic given by the frequency of the symbol x in the prefix of length k of S. Obviously, the fact that this statistic deviates from its expected value does not guarantee that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level \epsilon > 0, and define a function f(\epsilon) and a subset U(\epsilon) \subseteq X^* such that, for all n,
  \begin{align}     ~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\     ~&\mu(U(\epsilon) \cap X^n) < \epsilon   \end{align}
The function f can be chosen to be the best one satisfying this relation, i.e., any smaller function would push \mu(U(\epsilon) \cap X^n) above \epsilon. The equations above mean that if S is drawn "randomly", then the probability that the frequency computed from S \upharpoonright n deviates from its expected value by more than f(\epsilon) is less than \epsilon. In other words, the probability that a randomly selected infinite sequence S is rejected by the test, i.e., the probability that S \upharpoonright n \in U(\epsilon), is bounded by \epsilon. Statisticians say that the set U(\epsilon) is the critical region of the test at level \epsilon. Actually, a test can be viewed as a subset U \subseteq \mathbb{R}^+ \times X^* with U(\epsilon) = \{w \in X^*,~ (\epsilon, w) \in U\}.
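
The function f is left implicit above; a convenient concrete choice (my own, via Hoeffding's inequality P(|freq(x, S \upharpoonright n) - \mu(x)| > t) \leq 2 e^{-2nt^2}, letting the threshold also depend on the prefix length n for sharpness) is f(\epsilon, n) = \sqrt{\ln(2/\epsilon)/(2n)}:

    import math

    def f(eps, n):
        """Hoeffding threshold: P(|freq - mu| > f) <= 2*exp(-2*n*f**2) = eps."""
        return math.sqrt(math.log(2 / eps) / (2 * n))

    def in_critical_region(x, w, d, eps):
        """Does the word w belong to the critical region U(eps)?"""
        n = len(w)
        return abs(w.count(x) / n - 1 / d) > f(eps, n)

    # A prefix of length 20 containing 17 ones is rejected at level eps = 0.05:
    print(in_critical_region('1', '1' * 17 + '0' * 3, 2, 0.05))  # True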

Intuitively, the sequence S is non-random if it is rejected by the test at all levels. Thus S is random if
  \begin{equation}     S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}   \end{equation}

    2. Effective tests

One is rightly dubious about a definition of randomness based solely on the frequency of a given symbol in the prefix of length n of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes every imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of \mathbb{R}^+ \times X^*: for any fixed sequence S, the sets U(\epsilon) = \{S \upharpoonright n,~ d^{-n} < \epsilon\} form a perfectly valid test that rejects S at every level, so no sequence at all would be random. The major contribution of Martin-Löf was to focus on the effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.

To do so, instead of indexing the confidence levels by \epsilon \in \mathbb{R}^+, and without loss of generality, we use the levels \epsilon_m = 2^{-m}, m \in \mathbb{N}. An effective test is then defined as a recursively enumerable subset U \subseteq \mathbb{N} \times X^* such that the subsets U(m) = \{w \in X^*,~ (m,w) \in U\} satisfy, for all n, m,
  \begin{align}     ~&U(m) \supseteq U(m+1)\\     ~&\mu(U(m) \cap X^n) < \epsilon_m   \end{align}
The major point is the requirement that the set U is recursively enumerable. Roughly speaking, this means that there exists an algorithm that enumerates the pairs (m,w) \in U, i.e., that eventually detects every prefix that fails the test at level \epsilon_m.
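
To make "recursively enumerable" concrete, here is a toy Python sketch of such an enumerator for the frequency test, with \epsilon_m = 2^{-m} and the Hoeffding threshold above as f (still my choice, not part of the definition). It dovetails over all pairs (m, w) and eventually outputs each pair belonging to U; for brevity it ignores the closure requirement discussed in the next paragraph.

    import math
    from itertools import count, product

    X = "01"  # binary alphabet, d = 2

    def rejected(m, w):
        """(m, w) is in U iff the frequency of '1' in w deviates beyond the
        Hoeffding threshold for eps_m = 2^-m, which keeps the measure of
        U(m) restricted to X^n below eps_m."""
        n = len(w)
        return abs(w.count("1") / n - 0.5) > math.sqrt((m + 1) * math.log(2) / (2 * n))

    def enumerate_test():
        """Dovetailed enumeration of the r.e. set U: stage k visits every pair
        (m, w) with m + |w| = k, so each pair is examined exactly once."""
        for k in count(2):
            for m in range(1, k):
                for w in map("".join, product(X, repeat=k - m)):
                    if rejected(m, w):
                        yield (m, w)

    gen = enumerate_test()
    print([next(gen) for _ in range(3)])  # [(1, '000'), (1, '111'), (1, '0000')]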

To be precise, one also requires that, if (m,w) \in U, then (n,v) \in U for all n \leq m and all v \sqsupseteq w. This requirement comes from the fact that we want to test randomness of infinite sequences S \in X^{\omega}. If (m, S \upharpoonright k) \in U, i.e., S is rejected by the test at level \epsilon_m, then it is natural to require that any longer prefix S \upharpoonright l, l \geq k, is also rejected at the less constrained levels \epsilon_n, n \leq m. Intuitively, S \upharpoonright l comprises the information in S \upharpoonright k, so if S \upharpoonright k provides a clue that S is non-random, then S \upharpoonright l should also cause S to be considered non-random.

    3. Universal test and random sequences

It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf proved that there exists a universal effective test U that "simulates" every other effective test. More precisely, if V is any other effective test, then there exists a constant c, depending only on U and V, such that the critical regions of U and V satisfy, for all m,
  \begin{equation}     V(m+c) \subseteq U(m)   \end{equation}
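
The standard construction enumerates all effective tests V^{(1)}, V^{(2)}, \dots via a universal machine and sets U(m) = \bigcup_i V^{(i)}(m+i+1), so that \mu(U(m) \cap X^n) \leq \sum_i 2^{-(m+i+1)} < 2^{-m} and the constant for V^{(i)} is c = i+1. The Python sketch below only illustrates this level-shifting trick on a finite, hand-picked list of enumerators (reusing enumerate_test above); a genuine universal test must dovetail over all Turing machines.

    from itertools import islice

    def universal(tests):
        """Merge the given test enumerators, shifting the level of test i by
        i + 1: a pair (lvl, w) of test i witnesses w in U(m) for every
        m <= lvl - i - 1, and the measures at level m then sum below 2^-m."""
        gens = list(enumerate(tests, start=1))
        while gens:
            for i, g in list(gens):
                pair = next(g, None)
                if pair is None:
                    gens.remove((i, g))  # this enumerator is exhausted
                    continue
                lvl, w = pair
                for m in range(1, lvl - i):
                    yield (m, w)

    # Toy usage with two copies of the frequency test above:
    U = universal([enumerate_test(), enumerate_test()])
    print(list(islice(U, 4)))  # [(1, '000000'), (1, '111111'), (1, '0000000'), (1, '1111111')]
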
Now, let R be the set of random sequences with respect to the universal test U, i.e.
  \begin{equation}     R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}   \end{equation}
Then, by the universality of U, every S \in R is random relative to every effective test V.

A direct consequence of this definition is that \mu(R) = 1, since \mu(U(m) \cdot X^{\omega}) \leq \epsilon_m \rightarrow 0. In particular, R is dense in X^{\omega}. This means that, for any finite sequence w \in X^*, there is a random sequence S \in R that extends w, i.e., w \sqsubset S. Otherwise, we would have w \cdot X^{\omega} \cap R = \emptyset, i.e., w \cdot X^{\omega} would be included in the complement of R, which has measure zero, giving the contradiction 0 < \mu(w \cdot X^{\omega}) \leq 0. In other words, from a measure-theoretic point of view, R is large.

Finally, one should notice that the set of random sequences R has been defined in terms of the probability distribution \mu. Actually, one could also take any probability measure \mu on X^{\omega}, and define a set R_{\mu} of infinite sequences that are random with respect to \mu. The only restriction is that \mu must be computable, since otherwise we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can be easily transposed to many other topological spaces equipped with a measure, as long as an appropriate definition of "computable stuff" is available.



1. Per Martin-Löf, "The Definition of Random Sequences", Information and Control, 9(6): 602–619, 1966.
