Wednesday, October 29, 2014

Martin-Löf randomness

In this post, I sum up the basic idea of Martin-Löf randomness. Actually, this post is mainly a series of personal notes on the matter. I decided to publish it here nonetheless, as it may interest someone else. In particular, for the sake of brevity, I rely on informal definitions of computable functions, Turing machines, recursively enumerable sets, etc. The reader should just think of them as any method that can be translated into a computer program.

    1. Statistics background

Let's fix a finite alphabet X of size d. We denote by X^n the set of sequences of length n on X. We denote by X^* (resp. X^{\omega}) the set of finite (resp. infinite) sequences on X. As usual, the concatenation of words is denoted by u \cdot v, or simply uv. The length of a finite word u is denoted by |u|. We assume that X is given the uniform probability measure \mu(x) = d^{-1}. This measure naturally extends to X^n and X^{\omega}. For instance,
  \begin{equation*}     \forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n)   \end{equation*}
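For instance, on the binary alphabet (d = 2), the cylinder of the word w = 011 has measure
  \begin{equation*}     \mu(011 \cdot X^{\omega}) = \mu(0) \mu(1) \mu(1) = (1/2)^3 = 1/8   \end{equation*}
and, in general, \mu(w \cdot X^{\omega}) = d^{-|w|}.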
We are interested in finding what it means for an individual sequence in X^{\omega} to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Some of the commonly accepted (or "ought to be") features of random sequences are various laws of large numbers. For example, in a "truly" random infinite sequence S, the frequency of any symbol x \in X should be close to \mu(x). Put another way, after fixing some arbitrary k, we can compute the frequency freq(x, S \upharpoonright k) of the symbol x in the prefix S \upharpoonright k of length k, and if we find that this quantity deviates too much from \mu(x), we conclude that the sequence S is not random.
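
To make this concrete, here is a minimal Python sketch of this statistic (the helper names freq and deviation are mine):

    from fractions import Fraction

    def freq(x, w):
        """Empirical frequency of the symbol x in the finite word w."""
        return Fraction(w.count(x), len(w))

    def deviation(x, w, d):
        """|freq(x, w) - mu(x)| for the uniform measure mu(x) = 1/d."""
        return abs(freq(x, w) - Fraction(1, d))

    # Example on the binary alphabet X = {'0', '1'} (d = 2):
    w = "0110100110010110"       # a prefix of length 16 of some sequence S
    print(deviation('1', w, 2))  # prints 0: '1' occurs exactly 8 times out of 16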

Reformulating this in statistical jargon, we are designing a test for the null hypothesis "H_0: the sequence is random", with the test statistic given by the frequency of the symbol x in the prefix of length k of S. Obviously, the fact that this statistic deviates from its expected value does not guarantee that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level \epsilon > 0, and define a function f(\epsilon) and a subset U(\epsilon) \subseteq X^* such that, for all n,
  \begin{align}     ~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\     ~&\mu(U(\epsilon) \cap X^n) < \epsilon   \end{align}
The function f can be chosen to be the best one satisfying this relation, i.e., any smaller function would push \mu(U(\epsilon) \cap X^n) above \epsilon. The equations above mean that if S is drawn "randomly", then the probability that the frequency computed from S \upharpoonright n deviates from its expected value by more than f(\epsilon) is less than \epsilon. In other words, the probability that a randomly selected infinite sequence S is rejected by the test, i.e., the probability that S \upharpoonright n \in U(\epsilon), is bounded by \epsilon. Statisticians say that the set U(\epsilon) is the critical region of the test at level \epsilon. Actually, a test can be viewed as a subset U \subseteq \mathbb{R}^+ \times X^* with U(\epsilon) = \{w \in X^*,~ (\epsilon, w) \in U\}.
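
The function f is left implicit above; a convenient concrete choice (my own, via Hoeffding's inequality P(|freq(x, S \upharpoonright n) - \mu(x)| > t) \leq 2 e^{-2nt^2}, letting the threshold also depend on the prefix length n for sharpness) is f(\epsilon, n) = \sqrt{\ln(2/\epsilon)/(2n)}:

    import math

    def f(eps, n):
        """Hoeffding threshold: P(|freq - mu| > f) <= 2*exp(-2*n*f**2) = eps."""
        return math.sqrt(math.log(2 / eps) / (2 * n))

    def in_critical_region(x, w, d, eps):
        """Does the word w belong to the critical region U(eps)?"""
        n = len(w)
        return abs(w.count(x) / n - 1 / d) > f(eps, n)

    # A prefix of length 20 containing 17 ones is rejected at level eps = 0.05:
    print(in_critical_region('1', '1' * 17 + '0' * 3, 2, 0.05))  # True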

Intuitively, the sequence S is non-random if it is rejected by the test at all levels. Thus S is random if
  \begin{equation}     S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}   \end{equation}

    2. Effective tests

One is rightly dubious about a definition of randomness based solely on the frequency of a given symbol in the prefix of length n of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes every imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of \mathbb{R}^+ \times X^*: for any fixed sequence S, the sets U(\epsilon) = \{S \upharpoonright n,~ d^{-n} < \epsilon\} form a perfectly valid test that rejects S at every level, so no sequence at all would be random. The major contribution of Martin-Löf was to focus on the effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.

To do so, instead of indexing the confidence levels by \epsilon \in \mathbb{R}^+, and without loss of generality, we use the levels \epsilon_m = 2^{-m}, m \in \mathbb{N}. An effective test is then defined as a recursively enumerable subset U \subseteq \mathbb{N} \times X^* such that the subsets U(m) = \{w \in X^*,~ (m,w) \in U\} satisfy, for all n, m,
  \begin{align}     ~&U(m) \supseteq U(m+1)\\     ~&\mu(U(m) \cap X^n) < \epsilon_m   \end{align}
The major point is the requirement that the set U is recursively enumerable. Roughly speaking, this means that there exists an algorithm that enumerates the pairs (m,w) \in U, i.e., that eventually detects every prefix that fails the test at level \epsilon_m.
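
To make "recursively enumerable" concrete, here is a toy Python sketch of such an enumerator for the frequency test, with \epsilon_m = 2^{-m} and the Hoeffding threshold above as f (still my choice, not part of the definition). It dovetails over all pairs (m, w) and eventually outputs each pair belonging to U; for brevity it ignores the closure requirement discussed in the next paragraph.

    import math
    from itertools import count, product

    X = "01"  # binary alphabet, d = 2

    def rejected(m, w):
        """(m, w) is in U iff the frequency of '1' in w deviates beyond the
        Hoeffding threshold for eps_m = 2^-m, which keeps the measure of
        U(m) restricted to X^n below eps_m."""
        n = len(w)
        return abs(w.count("1") / n - 0.5) > math.sqrt((m + 1) * math.log(2) / (2 * n))

    def enumerate_test():
        """Dovetailed enumeration of the r.e. set U: stage k visits every pair
        (m, w) with m + |w| = k, so each pair is examined exactly once."""
        for k in count(2):
            for m in range(1, k):
                for w in map("".join, product(X, repeat=k - m)):
                    if rejected(m, w):
                        yield (m, w)

    gen = enumerate_test()
    print([next(gen) for _ in range(3)])  # [(1, '000'), (1, '111'), (1, '0000')]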

To be precise, one also requires that, if (m,w) \in U, then (n,v) \in U for all n \leq m and all v \sqsupseteq w. This requirement comes from the fact that we want to test randomness of infinite sequences S \in X^{\omega}. If (m, S \upharpoonright k) \in U, i.e., S is rejected by the test at level \epsilon_m, then it is natural to require that any longer prefix S \upharpoonright l, l \geq k, is also rejected at the less constrained levels \epsilon_n, n \leq m. Intuitively, S \upharpoonright l comprises the information in S \upharpoonright k, so if S \upharpoonright k provides a clue that S is non-random, then S \upharpoonright l should also cause S to be considered non-random.

    3. Universal test and random sequences

It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf proved that there exists a universal effective test U that "simulates" every other effective test. More precisely, if V is any other effective test, then there exists a constant c, depending only on U and V, such that the critical regions of U and V satisfy, for all m,
  \begin{equation}     V(m+c) \subseteq U(m)   \end{equation}
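
The standard construction enumerates all effective tests V^{(1)}, V^{(2)}, \dots via a universal machine and sets U(m) = \bigcup_i V^{(i)}(m+i+1), so that \mu(U(m) \cap X^n) \leq \sum_i 2^{-(m+i+1)} < 2^{-m} and the constant for V^{(i)} is c = i+1. The Python sketch below only illustrates this level-shifting trick on a finite, hand-picked list of enumerators (reusing enumerate_test above); a genuine universal test must dovetail over all Turing machines.

    from itertools import islice

    def universal(tests):
        """Merge the given test enumerators, shifting the level of test i by
        i + 1: a pair (lvl, w) of test i witnesses w in U(m) for every
        m <= lvl - i - 1, and the measures at level m then sum below 2^-m."""
        gens = list(enumerate(tests, start=1))
        while gens:
            for i, g in list(gens):
                pair = next(g, None)
                if pair is None:
                    gens.remove((i, g))  # this enumerator is exhausted
                    continue
                lvl, w = pair
                for m in range(1, lvl - i):
                    yield (m, w)

    # Toy usage with two copies of the frequency test above:
    U = universal([enumerate_test(), enumerate_test()])
    print(list(islice(U, 4)))  # [(1, '000000'), (1, '111111'), (1, '0000000'), (1, '1111111')]
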
Now, let R be the set of random sequences with respect to the universal test U, i.e.
  \begin{equation}     R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}   \end{equation}
Then, by the universality of U, every S \in R is random relative to every effective test V.

A direct consequence of this definition is that \mu(R) = 1, since \mu(U(m) \cdot X^{\omega}) \leq \epsilon_m \rightarrow 0. In particular, R is dense in X^{\omega}. This means that, for any finite sequence w \in X^*, there is a random sequence S \in R that extends w, i.e., w \sqsubset S. Otherwise, we would have w \cdot X^{\omega} \cap R = \emptyset, i.e., w \cdot X^{\omega} would be included in the complement of R, which has measure zero, giving the contradiction 0 < \mu(w \cdot X^{\omega}) \leq 0. In other words, from a measure-theoretic point of view, R is large.

Finally, one should notice that the set of random sequences R has been defined in terms of the probability distribution \mu. Actually, one could also take any probability measure \mu on X^{\omega}, and define a set R_{\mu} of infinite sequences that are random with respect to \mu. The only restriction is that \mu must be computable, since otherwise we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can be easily transposed to many other topological spaces equipped with a measure, as long as an appropriate definition of "computable stuff" is available.



1. Per Martin-Löf, "The Definition of Random Sequences", Information and Control, 9(6): 602–619, 1966.
