Suppose we have IID random variables $X_1, \dots, X_n$ with the Bernoulli$(\theta)$ distribution. We observe a sample of the $X_i$'s as follows: let $Y_1, \dots, Y_n$ be independent Bernoulli$(1/2)$ random variables, suppose that all the $X_i$'s and $Y_i$'s are independent, and define the sample size $N = \sum_{i=1}^n Y_i$. The $Y_i$'s indicate which of the $X_i$'s are in the sample, and we want to study the fraction of successes in the sample, defined by
$$Z = \frac{1}{N} \sum_{i=1}^n X_i Y_i,$$
with $Z = 0$ when $N = 0$.
For $\epsilon > 0$, we want to find an upper bound for $P(Z > \theta + \epsilon)$ that decays exponentially with $n$. Hoeffding's inequality does not apply immediately because of the dependencies between the variables.
Answers:
We can establish a connection with Hoeffding's inequality in a fairly direct way.
Note that we have
$$\{Z > \theta + \epsilon\} = \Big\{\sum_{i=1}^n X_i Y_i > (\theta + \epsilon) \sum_{i=1}^n Y_i\Big\} = \Big\{\sum_{i=1}^n (X_i - \theta - \epsilon)\, Y_i > 0\Big\}.$$
Define $Z_i = (X_i - \theta - \epsilon)\, Y_i + \epsilon/2$, so that the $Z_i$ are iid, $\mathbb{E} Z_i = 0$, and
$$P(Z > \theta + \epsilon) = P\Big(\sum_i Z_i > n\epsilon/2\Big) \leq e^{-n\epsilon^2/2}$$
by a straightforward application of Hoeffding's inequality (since $Z_i \in [-\theta - \epsilon/2,\; 1 - \theta - \epsilon/2]$, each $Z_i$ takes values in an interval of length one).
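As a quick numerical sanity check, the tail probability and the bound $e^{-n\epsilon^2/2}$ can be compared by simulation. This is a minimal sketch; the parameter values, seed, and function name are my own choices, not part of the original argument:

```python
import math
import random

rng = random.Random(123)
n, theta, eps, trials = 40, 0.4, 0.15, 20_000

def sample_Z():
    """One draw of Z = (sum_i X_i Y_i) / N, with Z = 0 on an empty sample."""
    xs = [rng.random() < theta for _ in range(n)]  # X_i ~ Bernoulli(theta)
    ys = [rng.random() < 0.5 for _ in range(n)]    # Y_i ~ Bernoulli(1/2)
    N = sum(ys)
    return sum(x and y for x, y in zip(xs, ys)) / N if N else 0.0

empirical = sum(sample_Z() > theta + eps for _ in range(trials)) / trials
bound = math.exp(-n * eps ** 2 / 2)
print(empirical, bound)  # the empirical tail should sit below the bound
```

The empirical tail frequency should come out well below the analytic bound, which is rather loose at these parameter values.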
There is a rich and fascinating related literature that has developed in recent years, in particular on topics related to random matrix theory, with various practical applications. If you are interested in this sort of thing, I recommend:
I think the exposition is clear and provides a very pleasant way to get up to speed with the literature quickly.
Details to take care of the $N = 0$ case.
For Alecos.
This answer keeps mutating. The current version does not relate to the discussion I had with @cardinal in the comments (although it was through this discussion that I thankfully realized that the conditioning approach did not appear to lead anywhere).
For this attempt, I will use another part of Hoeffding's original 1963 paper, namely section 5 "Sums of Dependent Random Variables".
Set
$$W_i = \frac{Y_i}{\sum_{j=1}^n Y_j},$$
while we set $W_i = 0$ if $\sum_{i=1}^n Y_i = 0$.
Then we have the variable
$$Z = \sum_{i=1}^n W_i X_i.$$
We are interested in the probability
$$P(Z - \theta \geq \epsilon).$$
As for many other inequalities, Hoeffding starts his reasoning by noting that, for any $h > 0$,
$$P(Z - \theta \geq \epsilon) \leq e^{-h(\theta + \epsilon)}\, \mathbb{E}\big[e^{hZ}\big].$$
For the dependent-variables case, like Hoeffding we use the fact that $\sum_{i=1}^n W_i = 1$ and invoke Jensen's inequality for the (convex) exponential function to write
$$e^{hZ} = \exp\Big(h \sum_{i=1}^n W_i X_i\Big) \leq \sum_{i=1}^n W_i\, e^{h X_i},$$
and, linking the results, arrive at
$$P(Z - \theta \geq \epsilon) \leq e^{-h(\theta + \epsilon)} \sum_{i=1}^n \mathbb{E}\big[W_i\, e^{h X_i}\big].$$
Focusing on our case, since $W_i$ and $X_i$ are independent, the expected values can be separated:
$$P(Z - \theta \geq \epsilon) \leq e^{-h(\theta + \epsilon)} \sum_{i=1}^n \mathbb{E}[W_i]\, \mathbb{E}\big[e^{h X_i}\big].$$
In our case, the $X_i$ are i.i.d. Bernoullis with parameter $\theta$, and $\mathbb{E}[e^{h X_i}]$ is their common moment generating function in $h$, $\mathbb{E}[e^{h X_i}] = 1 - \theta + \theta e^h$. Moreover, $\sum_{i=1}^n \mathbb{E}[W_i] = \mathbb{E}\big[\sum_{i=1}^n W_i\big] = P(N \geq 1) = 1 - (1/2)^n$. So
$$P(Z - \theta \geq \epsilon) \leq \big(1 - (1/2)^n\big)\, e^{-h(\theta + \epsilon)} \big(1 - \theta + \theta e^h\big).$$
Minimizing the RHS with respect to $h$, we get
$$e^{h^*} = \frac{(\theta + \epsilon)(1 - \theta)}{\theta\, (1 - \theta - \epsilon)}.$$
Plugging it into the inequality and manipulating, we obtain
$$P(Z - \theta \geq \epsilon) \leq \big(1 - (1/2)^n\big) \left(\frac{\theta}{\theta + \epsilon}\right)^{\theta + \epsilon} \left(\frac{1 - \theta}{1 - \theta - \epsilon}\right)^{1 - \theta - \epsilon},$$
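As a sanity check on the minimization step, the closed-form minimum of $e^{-h(\theta+\epsilon)}\big(1-\theta+\theta e^h\big)$ can be compared against a brute-force grid search over $h$. This is a sketch; the example values, grid, and variable names are mine:

```python
import math

theta, eps = 0.3, 0.1
q = theta + eps

# closed-form minimizer and minimum of f(h) = exp(-h q) * (1 - theta + theta * exp(h))
h_star = math.log(q * (1 - theta) / (theta * (1 - q)))
closed_min = (theta / q) ** q * ((1 - theta) / (1 - q)) ** (1 - q)

# brute-force minimum over a grid of h values in (0, 5)
f = lambda h: math.exp(-h * q) * (1 - theta + theta * math.exp(h))
grid_min = min(f(k / 1000) for k in range(1, 5000))

print(h_star, closed_min, grid_min)  # the two minima should agree closely
```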
while
$$\left(\frac{\theta}{\theta + \epsilon}\right)^{\theta + \epsilon} \left(\frac{1 - \theta}{1 - \theta - \epsilon}\right)^{1 - \theta - \epsilon} = e^{-K(\theta + \epsilon,\, \theta)},$$
where $K(q, p) = q \ln(q/p) + (1 - q) \ln\big((1 - q)/(1 - p)\big)$ is the Kullback–Leibler divergence between Bernoulli$(q)$ and Bernoulli$(p)$.
Hoeffding shows that
$$K(\theta + \epsilon,\, \theta) \geq 2\epsilon^2.$$
Courtesy of the OP (thanks, I was getting a bit exhausted...)
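The inequality that the Bernoulli Kullback–Leibler divergence $K(\theta + \epsilon, \theta)$ is at least $2\epsilon^2$ can also be checked numerically over a grid. A sketch; the grid resolution is my own choice:

```python
import math

def K(q, p):
    """Kullback-Leibler divergence between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Check K(theta + eps, theta) >= 2 * eps^2 over a grid of valid (theta, eps) pairs.
ok = all(
    K(t + e, t) >= 2 * e ** 2
    for t in (i / 100 for i in range(1, 99))
    for e in (j / 1000 for j in range(1, 1000))
    if t + e < 1
)
print(ok)  # True
```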
So, finally, the "dependent variables approach" gives us
$$P(Z - \theta \geq \epsilon) \leq B_D = \big(1 - (1/2)^n\big)\, e^{-2\epsilon^2}.$$
Let's compare this to Cardinal's bound, which is based on an "independence" transformation, $B_I = e^{-n\epsilon^2/2}$. For our bound to be tighter, we need
$$B_D \leq B_I \iff \big(1 - (1/2)^n\big)\, e^{-2\epsilon^2} \leq e^{-n\epsilon^2/2} \iff \ln\big(1 - (1/2)^n\big) \leq \Big(2 - \frac{n}{2}\Big)\epsilon^2.$$
So for $n \leq 4$ we have $B_D \leq B_I$ for every $\epsilon$. For $n \geq 5$, $B_I$ pretty quickly becomes tighter than $B_D$, except for very small $\epsilon$, and even this small "window" quickly shrinks to zero. For example, for $n = 12$, $B_I$ is tighter whenever $\epsilon \geq 0.008$. So, all in all, Cardinal's bound is more useful.
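The crossover for $n = 12$ can be reproduced numerically, using $B_D = (1 - (1/2)^n)\, e^{-2\epsilon^2}$ and $B_I = e^{-n\epsilon^2/2}$. A sketch; the grid step is my own choice:

```python
import math

def B_D(n, eps):
    """Dependent-variables bound (1 - (1/2)^n) * exp(-2 * eps^2)."""
    return (1 - 0.5 ** n) * math.exp(-2 * eps ** 2)

def B_I(n, eps):
    """Cardinal's bound exp(-n * eps^2 / 2)."""
    return math.exp(-n * eps ** 2 / 2)

n = 12
# smallest eps on a 0.0001 grid at which B_I becomes at least as tight as B_D
threshold = next(k / 10_000 for k in range(1, 10_000)
                 if B_I(n, k / 10_000) <= B_D(n, k / 10_000))
print(threshold)  # about 0.008, matching the claim in the text
```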
COMMENT

To avoid misleading impressions regarding Hoeffding's original paper, I have to mention that Hoeffding examines the case of a deterministic convex combination of dependent random variables. Specifically, his $W_i$'s are numbers, not random variables, while each $X_i$ is a sum of independent random variables, and dependence may exist between the $X_i$'s. He then considers various "U-statistics" that can be represented in this way.