## Tuesday, December 8, 2015

### Lovász Local Lemma (non-constructive)

I read the short (and interesting) presentation of the original local Lovász lemma from a lecture by Alistair Sinclair. I report here the proof of this lemma (which was left as an exercise, although a simpler case was proven whose proof contains all the ingredients).

This lemma is extremely useful in the probabilistic method: if you can prove that the set of objects satisfying a certain number of constraints has positive probability (or density), then this set cannot be empty. In particular, an object satisfying those constraints do exist. However, this method does not explicitly build the desired object. The constructive version of the Lovász local lemma was given by Moser and Tardos, as explained in this previous post.

1. Lovász Local Lemma

Let $A_1,\dots,A_n$ be a set of bad events. These events are not necessarily mutually independent. For each index $1\le i \le n$, we denote by $D_i \subseteq \{1,\dots,n\}$ the set of indices such that the events $A_i, A_j$ with $j \not\in D_i$ are mutually independent. In other words, $D_i$ is the dependency set of the event $A_i$.

The lemma states:
If there exists real values $x_1,\dots,x_n \in [0,1)$ such that $$\mathbb{P}(A_i) \le x_i \cdot \prod_{j \in D_i} (1-x_j) ~~~~~ (\star)$$ then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n (1-x_i).$$
The proof goes as follows. We first show that for any index $1\le i \le n$ and any strict subset $S \subset \{1,\dots,n\}$, we have
$$\mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) \le x_i.$$
We prove this claim by induction on the size $m = |S|$. The case $m = 0$, i.e., $S = \emptyset$, follows from $(\star)$. Now, assume the result holds for all subset of size less than $m$. We split the set $S$ in two disjoint parts $S_D = S \cap D_i$ and $S_I = S - S_D$. The indices in $S_D$ (resp. $S_I$) are exactly the indices in $S$ of events that are not independent (resp. that are independent) from $A_i$.  We have, by the chain rule of conditional probabilities:
\begin{align*} \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) &= \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{k \in S_D} \overline{A_k} \land \bigwedge\limits_{l \in S_I} \overline{A_l} \right) \\ &= \frac{\mathbb{P}\left(A_i \land \bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\ &\le \frac{\mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)} \end{align*}
Since the events $A_i, A_l$ , $l \in S_I$, are mutually independent, we have:
\begin{align*} \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) &\le \frac{\mathbb{P}\left(A_i \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\ &\le \frac{x_i \cdot \prod_{j\in D_i} (1-x_j)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)} \end{align*}
Regarding the denominator, denoting $S_D = \{k_1,\dots,k_s\}$, and $S_D^r = \{k_1,\dots,k_r\}$ for any $1 \le r \le s$, we have
\begin{align*} \mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right) &= \prod_{r = 1}^s \mathbb{P}\left( \overline{A}_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\\ &= \prod_{r = 1}^s \left(1 - \mathbb{P}\left( A_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\right) \\ &\ge \prod_{r = 1}^s (1 - x_{k_r}) = \prod_{j \in S_I} (1 - x_j). \end{align*}
The last inequality follows from the induction hypothesis. We conclude that
\begin{align*} \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) &\le x_i \cdot \frac{\prod_{j \in D_i} (1 - x_j)}{\prod_{j \in S_I} (1 - x_j)} \\ &\le x_i ~~~ \text{ since } S_I \subseteq D_i. \end{align*}
This concludes the induction proof.

We apply this inequality to prove the Lovász lemma. Indeed, we have
\begin{align*} \mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) &= \prod_{i=1}^n \left( 1 - \mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{1 \le j < i} \overline{A}_j \right)\right) \\ &\ge \prod_{i=1}^n (1-x_i). \end{align*}

2. Other formulations

The previous formulation is quite general. A restricted formulation is given by the following:
Assume $\mathbb{P}(A_i) \le p < 1$ for all $1 \le i \le n$. Let $d$ be the maximum size of the dependency sets $D_i$'s. If $e\cdot p\cdot (d+1) \le 1$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge (1- \frac{1}{d+1})^n > 0.$$
Indeed,let's take $x_i = 1/(d+1)$. It suffices to prove that
$$p \le \frac{1}{d+1} \cdot \left(1-\frac{1}{d+1}\right)^d$$
But we have
\begin{align*} \left(1-\frac{1}{d+1}\right)^d \ge \left(1-\frac{1}{d}\right)^d \ge \frac{1}{e} \end{align*}

Another restricted formulation is given as follows:
If $\sum_{j \in D_i} \mathbb{P}(A_j) \le 1/4$ for all $i$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n \left(1- 2\mathbb{P}(A_i)\right) > 0.$$
Indeed, let's take $x_i = 2\mathbb{P}(A_i)$. The condition imposes that each $x_i \in [0,1]$. One can easily shows by induction on the size of $D_i$ that
\begin{align*} \prod_{j \in D_i} \left(1- 2\mathbb{P}(A_j)\right) &\ge 1 - \sum_{j\in D_i} 2\mathbb{P}(A_j) \\ &\ge \frac{1}{2} \end{align*}
Therefore, $\mathbb{P}(A_i) \le x_i \prod_{D_i} (1-x_j)$ is satisfied for all $i$.

pb

## Thursday, December 3, 2015

### Gibbs inequality applied to binomial inequality

1. Gibbs inequality

Let $P = \{p_x\}_{x \in X}$ and $Q = \{q_x\}_{x \in X}$ be two probability distributions on a finite set $X$. I simply assume that the $q_x$ are all positive, $q_x > 0$. The Gibbs inequality states that
$$- \sum_{x \in X} p_x \cdot \log p_x \le - \sum_{x \in X} p_x \cdot \log q_x$$
with equality iff the two distributions are identical. We use the convention that $x\cdot \log x = 0$ if $x = 0$. This result follows from the classic technique of convex inequality   (a.k.a Jensen's inequality). Indeed, the function $f(x) = x\cdot \log x$ is convex on the positive reals, so
\begin{align*} \sum_x p_x \cdot \log \frac{p_x}{q_x} &= \sum_x q_x \cdot \frac{p_x}{q_x} \cdot \log \frac{p_x}{q_x} \\ &= \sum_x q_x \cdot f\left( \frac{p_x}{q_x}\right) \\ &\ge f\left(\sum_x q_x \frac{p_x}{q_x}\right) \\ &\ge f(1) = 0 \end{align*}

2. Binomial and entropy

Varying the probability distributions in the Gibbs inequality can yield quite surprising result. For instance, let $n$ be a positive integer, and $0 \le p \le 1$ a real number such that $n\cdot p$ is an integer. Then
$$\begin{equation*} \binom{n}{n p} \le 2^{n\cdot h(p)} \end{equation*}$$
where $h(p) = - p\cdot \log - (1-p)\cdot \log (1-p)$. The function $h$ represents the entropy of a coin with bias $p$. The intuition behind this inequality is the following: to identify a subset of $np$ elements in a set of $n$ elements $\{1,2,\dots,n\}$, you need at least $\log \binom{n}{np}$ bits. Now, there is an approximate procedure to identify such a subset: for each element $i = 1,2,\dots,n$, you toss the bias coin, and add the element $i$ to your subset iff the coin turns, e.g., head. With such a procedure, on average, you identify a subset of size $np$, and you could associate with this subset the corresponding sequence of coin tosses as an identifier. By Shannon's source coding theorem, you can store such sequences of coin tosses using about $n\cdot h(p)$ bits. The inequality states that the latter procedure uses (slightly) more bits than necessary.

The proof goes as follows. Let $X_1,\dots,X_n$ be independent random variables following the same distribution
\begin{align*} \mathbb{P}(X_i = 1) &= p \\ \mathbb{P}(X_i = 0) &= 1-p \end{align*}
Let $X = (X_1,\dots,X_n)$ be the joint variable, with values in $\{0,1\}^n$. Consider also the uniform random variable $U$ with values in the subset of vectors $u \in \{0,1\}^n$ having exactly $k = np$ entries set to $1$ (and the others to $0$). We denote by $|u|$ the number of entries set to $1$ in $u$. The probability distribution of $U$ is given by, for any $u \in \{0,1\}^n$,
$$\begin{equation*} \mathbb{P}(U = u) = \left\{ \begin{array}{cc} 0 & \text{ if } |u| \ne k\\ \binom{n}{k}^{-1} & \text{ otherwise} \end{array} \right. \end{equation*}$$

Applying the Gibbs inequality to the distributions of $X$ and $U$, we get
\begin{align*} \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(U = u) &\le \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(X = u)\\ \end{align*}
\begin{align*} \log \binom{n}{k} &\le \sum_{u,~ |u|=k} - \binom{n}{k}^{-1} \cdot \log p^k(1-p)^{n-k}\\ &\le -k \cdot \log p -(n-k)\cdot\log (1-p) \\ &\le n \cdot h(p) ~~~~\text{ (since } k = np \text{)}. \end{align*}

3. The bound is quite tight

The bound on $\binom{n}{np}$ is quite tight. Indeed, Stirling's formula states that:
$$n! \sim \sqrt{2\pi n}\cdot n^n \cdot e^{-n}$$
Applying this formula to the binomial, we get (with $q = 1-p$)
\begin{align*} \binom{n}{np} &= \frac{n!}{(np)!(nq)!} \\ &= \frac{\sqrt{2\pi n}\cdot n^n \cdot e^{-n}}{\sqrt{2\pi n p}\cdot (np)^{np} \cdot e^{-np}\cdot \sqrt{2\pi nq}\cdot (nq)^{nq} \cdot e^{-nq}}\\ &= \frac{2^{n\cdot h(p)}}{\sqrt{2\pi \cdot n \cdot p\cdot (1-p)}}. \end{align*}
Taking the logarithm of both sides, we see that
$$\frac{1}{n} \log \binom{n}{np} \rightarrow h(p) \text{ as } n \rightarrow \infty$$
This means that, as $n$ gets larger and larger, the coin-tossing procedure'' to identify a subset of size $np$ in $n$ gets more and more precise, in the sense that it uses the optimal number of bits $\log \binom{n}{np} \simeq n \cdot h(p)$.

pb

## Tuesday, December 1, 2015

### Weird binomial identities

In this post, I present some relatively weird binomial identities. I will start by the symmetric case, and then consider the twisted case.

1. Symmetric case

We prove the following identity
$$C_{l,l}(y) \underset{def}{=} \sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}$$
where $y$ is a formal variable, and $\rho = y/(1-y)$. This identity follows from the classical method of coefficient identification.

Indeed, let's introduce a second variable $x$ and consider the polynomial
$$P = \sum_{i=0}^l y^i \cdot (1+x)^{l+i}$$
By applying the binomial formula to $(1+x)^{l+i}$, we see that the coefficient of $x^l$ in $P$ is
$$C_{l,l}(y) = \sum_{i=0}^l \binom{l+i}{l} y^i$$

On the other hand, in $P$, we have a geometric series
\begin{align} P &= \sum_{i=0}^l y^i (1+x)^{l+i} \\ &= (1+x)^l \cdot \sum_{i=0}^l (y+y x)^i \\ &= (1+x)^l \cdot \frac{1-y^{l+1}(1+x)^{l+1}}{1-y-y x} \\ &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \frac{1}{1-\rho x} \\ &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \sum_{s \ge 0} \rho^s x^s \end{align}

From this expression, the coefficient of $x^l$ is given by the convolution
\begin{align} C_{l,l}(y) &= \sum_{s = 0}^l \left( \frac{1}{1-y} \binom{l}{l-s} - \frac{y^{l+1}}{1-y} \binom{2l+1}{l-s}\right) \cdot \rho^s \\ &= \frac{1}{1-y} \cdot \sum_{s = 0}^l \binom{l}{l-s} \rho^s - \frac{y^{l+1}}{1-y} \sum_{s=0}^l \binom{2l+1}{l-s} \rho^s \\ &= \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y} \rho^l \sum_{s=0}^l \binom{2l+1}{l-s} \left(\frac{1}{\rho}\right)^{l-s} \\ \end{align}

Thanks to the symmetry $\binom{2l+1}{l-s} = \binom{2l+1}{l + 1 + s}$, we see that the sum in the last expression is
$$\frac{1}{2} \sum_{s = 0}^{2l+1} \binom{2l+1}{s} \left(\frac{1}{\rho}\right)^s = \frac{1}{2} \left(1+\frac{1}{\rho}\right)^{2l+1}$$

Therefore, writing the two expressions for the coefficient yields the claim
$$\sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}$$

In particular, with $y = 1/2$, we get
$$\sum_{i=0}^l \binom{l+i}{l} \frac{1}{2^i} = 2^l$$

2. Twisted case

We now consider the twisted case
$$C_{a,b}(y) \underset{def}{=} \sum_{i=0}^a \binom{b+i}{b} y^i.$$
The method is exactly the same except that we compute the coefficient of $x^b$ in the polynomial
$$\sum_{i=0}^a y^i (1+x)^{b+i}.$$
We get
$$C_{a,b}(y) = \frac{(1+\rho)^b}{1-y} - \frac{y^{a+1}}{1-y} \cdot \sum_{s=0}^b \binom{a+b+1}{b-s} \rho^s.$$
Unfortunately, the last sum does not simplify as before because of the asymmetry $a \ne b$ (a priori).

However, we can measure this twist by focusing on the quantity $y^b C_{a,b}(y) + y^a C_{b,a}(y)$. Regarding the parts containing the sums, we have
\begin{align} \sum_{s=0}^b &\binom{a+b+1}{b-s} \rho^s + \sum_{s=0}^a \binom{a+b+1}{a-s} \rho^s \\ &= \rho^b \cdot \sum_{s=0}^b \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \rho^a \cdot \sum_{s=0}^a \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s \\ &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=b+1}^{a+b+1} \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \dots \\ &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s + \dots \\ &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} + (\rho^a - \rho^b) \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\ \end{align}

Whence the identities
$$y^b C_{a,b}(y) + y^a C_{b,a}(y) \\ = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y} - \frac{y^{a+b+1}}{1-y} \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^a - \rho^b) \sum_{s=0}^a \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\ = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y} - \frac{y^{a+b+1}}{1-y} \rho^a \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^b - \rho^a) \sum_{s=0}^b \binom{a+b+1}{b-s} \left(\frac{1}{\rho}\right)^s$$

In particular, if $y = 1/2$, we have $\rho = 1$, and the twist is canceled so that
$$\sum_{i=0}^a \binom{b+i}{b} \frac{1}{2^{b+i}} + \sum_{i=0}^b \binom{a+i}{a} \frac{1}{2^{a+i}} = 2$$
which is, somewhat, funny.

Question. Is there a combinatorial proof of these identities ?

pb