A technical lemma

I'm not sure how intuitive this is, but the main technical result underlying your statement of the Halmos-Savage Theorem is the following:

Lemma. Let $\mu$ be a $\sigma$-finite measure on $(S, \mathcal{A})$. Suppose that $\aleph$ is a collection of measures on $(S, \mathcal{A})$ such that $\nu \ll \mu$ for every $\nu \in \aleph$. Then there exist a sequence of nonnegative numbers $\{c_i\}_{i=1}^\infty$ and a sequence of elements of $\aleph$, $\{\nu_i\}_{i=1}^\infty$, such that $\sum_{i=1}^\infty c_i = 1$ and $\nu \ll \sum_{i=1}^\infty c_i \nu_i$ for every $\nu \in \aleph$.

This is taken verbatim from Theorem A.78 in Schervish's Theory of Statistics (1995). There he attributes it to Lehmann's Testing Statistical Hypotheses (1986), where the result is attributed to Halmos and Savage themselves (see Lemma 7). Another good reference is Shao's Mathematical Statistics (second edition, 2003), where the relevant results are Lemma 2.1 and Theorem 2.2.

The lemma above states that if you start with a family of measures dominated by a $\sigma$-finite measure, then you can in fact replace the dominating measure by a countable convex combination of measures from within the family. Schervish writes before stating Theorem A.78,

"In statistical applications, we will often have a class of measures, each of which is absolutely continuous with respect to a single $\sigma$-finite measure. It would be nice if the single dominating measure were in the original class or could be constructed from the class. The following theorem addresses this problem."

A concrete example

Suppose we take a measurement of a quantity $X$ which we believe to be uniformly distributed on the interval $[0, \theta]$ for some unknown $\theta > 0$. In this statistical problem, we are implicitly considering the set $\mathcal{P}$ of Borel probability measures on $\mathbb{R}$ consisting of the uniform distributions on all intervals of the form $[0, \theta]$. That is, if $\lambda$ denotes Lebesgue measure and, for $\theta > 0$, $P_\theta$ denotes the $\operatorname{Uniform}([0, \theta])$ distribution (i.e.,

$P_\theta(A) = \frac{1}{\theta} \lambda(A \cap [0, \theta]) = \int_A \frac{1}{\theta} \mathbf{1}_{[0, \theta]}(x) \, dx$

for every Borel $A \subseteq \mathbb{R}$), then we simply have

$\mathcal{P} = \{P_\theta : \theta > 0\}.$

This is the set of candidate distributions for our measurement $X$.

The family $\mathcal{P}$ is clearly dominated by Lebesgue measure $\lambda$ (which is $\sigma$-finite), so the lemma above (with $\aleph = \mathcal{P}$) guarantees the existence of a sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers summing to $1$ and a sequence $\{Q_i\}_{i=1}^\infty$ of uniform distributions in $\mathcal{P}$ such that

$P_\theta \ll \sum_{i=1}^\infty c_i Q_i$

for each $\theta > 0$. In this example, we can construct such sequences explicitly!

First, let $(\theta_i)_{i=1}^\infty$ be an enumeration of the positive rational numbers (this can be done explicitly), and let $Q_i = P_{\theta_i}$ for each $i$ . Next, let $c_i = 2^{-i}$ , so that $\sum_{i=1}^\infty c_i = 1$ . I claim that this combination of $\{c_i\}_{i=1}^\infty$ and $\{Q_i\}_{i=1}^\infty$ works.
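To make "this can be done explicitly" concrete, here is a minimal Python sketch. It assumes one particular enumeration of the positive rationals, the Calkin-Wilf order generated by Newman's recurrence; the argument above works with any enumeration, and the function names `positive_rationals` and `truncated_mixture_density` are hypothetical, chosen only for illustration. The second function evaluates a finite partial sum of the Lebesgue density of $\sum_i c_i Q_i$ with $c_i = 2^{-i}$, so it is a lower bound on the true density, not its exact value.

```python
from fractions import Fraction
from itertools import islice


def positive_rationals():
    """Yield each positive rational exactly once, in Calkin-Wilf order."""
    q = Fraction(1)
    while True:
        yield q
        # Newman's recurrence: q_next = 1 / (2 * floor(q) - q + 1)
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)


def truncated_mixture_density(x, n_terms=200):
    """Partial sum of sum_i 2^{-i} * (1 / theta_i) * 1_{[0, theta_i]}(x).

    theta_i is the i-th rational from positive_rationals() (i starts at 1).
    Only the first n_terms terms are summed, so this is a lower bound.
    """
    total = 0.0
    for i, theta in enumerate(islice(positive_rationals(), n_terms), start=1):
        if 0 <= x <= theta:
            total += 2.0 ** (-i) / float(theta)
    return total


if __name__ == "__main__":
    print([str(q) for q in islice(positive_rationals(), 8)])
    print(truncated_mixture_density(2.5) > 0)
```

The lower bound is already positive at any fixed $x \ge 0$ once the partial sum includes some $\theta_i \ge x$, which is the numerical counterpart of the domination claim.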

To see this, fix $\theta > 0$ and let $A$ be a Borel subset of $\mathbb{R}$ such that $\sum_{i=1}^\infty c_i Q_i(A) = 0$ . We need to show that $P_\theta(A) = 0$ . Since $\sum_{i=1}^\infty c_i Q_i(A) = 0$ and each summand is non-negative, it follows that $c_i Q_i(A) = 0$ for each $i$ . Moreover, since each $c_i$ is positive, it follows that $Q_i(A) = 0$ for each $i$ . That is, for all $i$ we have

$Q_i(A) = P_{\theta_i}(A) = \frac{1}{\theta_i} \lambda(A \cap [0, \theta_i]) = 0.$

Since each $\theta_i$ is positive, it follows that $\lambda(A \cap [0, \theta_i]) = 0$ for each $i$.

Now choose a subsequence $\{\theta_{i_k}\}_{k=1}^\infty$ of $\{\theta_i\}_{i=1}^\infty$ which decreases to $\theta$ (this can be done since $\mathbb{Q}$ is dense in $\mathbb{R}$). Then $A \cap [0, \theta_{i_k}] \downarrow A \cap [0, \theta]$ as $k \to \infty$, and $\lambda(A \cap [0, \theta_{i_1}]) \le \theta_{i_1} < \infty$, so by continuity of measure from above we conclude that

$\lambda(A \cap [0, \theta]) = \lim_{k \to \infty} \lambda(A \cap [0, \theta_{i_k}]) = 0,$

and so $P_\theta(A) = 0$. This proves the claim.

Thus, in this example we were able to explicitly construct a countable convex combination of probability measures from our dominated family which still dominates the entire family. The Lemma above guarantees that this can be done for any dominated family (at least as long as the dominating measure is $\sigma$ -finite).
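As a worked check of the example (a sketch under the choices $c_i = 2^{-i}$ and $Q_i = P_{\theta_i}$ made above; the symbol $P^*$ for the mixture is borrowed from the theorem's notation), one can write down the Lebesgue density of $P^* = \sum_{i=1}^\infty c_i Q_i$. Interchanging sum and integral, which is justified because all terms are nonnegative, gives

$\frac{dP^*}{d\lambda}(x) = \sum_{i=1}^\infty \frac{2^{-i}}{\theta_i} \mathbf{1}_{[0, \theta_i]}(x).$

For every $x \ge 0$ there is a rational $\theta_i \ge x$, so this density is strictly positive on $[0, \infty)$, and it vanishes for $x < 0$. Hence $P^*(A) = 0$ exactly when $\lambda(A \cap [0, \infty)) = 0$, which gives $P_\theta \ll P^*$ for every $\theta > 0$ in one step. It also shows why a mixture is needed here: no single $P_\theta$ dominates the family, because $P_\theta((\theta, \infty)) = 0$ while $P_{\theta'}((\theta, \infty)) > 0$ for any $\theta' > \theta$.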

The Halmos-Savage Theorem

So now on to the Halmos-Savage Theorem (for which I will use slightly different notation than in the question due to personal preference). Given the Halmos-Savage Theorem, the Fisher-Neyman factorization theorem is just one application of the Doob-Dynkin lemma and the chain rule for Radon-Nikodym derivatives away!

Halmos-Savage Theorem. Let $(\mathcal{X}, \mathcal{B}, \mathcal{P})$ be a dominated statistical model (meaning that $\mathcal{P}$ is a set of probability measures on $\mathcal{B}$ and there is a $\sigma$-finite measure $\mu$ on $\mathcal{B}$ such that $P \ll \mu$ for all $P \in \mathcal{P}$). Let $T : (\mathcal{X}, \mathcal{B}) \to (\mathcal{T}, \mathcal{C})$ be a measurable function, where $(\mathcal{T}, \mathcal{C})$ is a standard Borel space. Then the following are equivalent:

1. $T$ is sufficient for $\mathcal{P}$ (meaning that there is a probability kernel $r : \mathcal{B} \times \mathcal{T} \to [0, 1]$ such that $r(B, T)$ is a version of $P(B \mid T)$ for all $B \in \mathcal{B}$ and $P \in \mathcal{P}$).

2. There exists a sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers such that $\sum_{i=1}^\infty c_i = 1$ and a sequence $\{P_i\}_{i=1}^\infty$ of probability measures in $\mathcal{P}$ such that $P \ll P^*$ for all $P \in \mathcal{P}$, where $P^* = \sum_{i=1}^\infty c_i P_i$, and for each $P \in \mathcal{P}$ there exists a $T$-measurable version of $dP/dP^*$.

Proof. By the lemma above, we may immediately replace $\mu$ by $P^* = \sum_{i=1}^\infty c_i P_i$ for some sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers such that $\sum_{i=1}^\infty c_i = 1$ and a sequence $\{P_i\}_{i=1}^\infty$ of probability measures in $\mathcal{P}$ .

(1. implies 2.) Suppose $T$ is sufficient. Then we must show that there are $T$ -measurable versions of $dP/dP^*$ for all $P \in \mathcal{P}$ . Let $r$ be the probability kernel in the statement of the theorem. For each $A \in \sigma(T)$ and $B \in \mathcal{B}$ we have

$\begin{aligned} P^*(A \cap B) &= \sum_{i=1}^\infty c_i P_i(A \cap B) \\ &= \sum_{i=1}^\infty c_i \int_A P_i(B \mid T) \, dP_i \\ &= \sum_{i=1}^\infty c_i \int_A r(B, T) \, dP_i \\ &= \int_A r(B, T) \, dP^*. \end{aligned}$

Thus $r(B, T)$ is a version of $P^*(B \mid T)$ for all $B \in \mathcal{B}$.

For each $P \in \mathcal{P}$ , let $f_P$ denote a version of the Radon-Nikodym derivative $dP/dP^*$ on the measurable space $(\mathcal{X}, \sigma(T))$ (so in particular $f_P$ is $T$ -measurable). Then for all $B \in \mathcal{B}$ and $P \in \mathcal{P}$ we have

$\begin{aligned} P(B) &= \int_{\mathcal{X}} P(B \mid T) \, dP \\ &= \int_{\mathcal{X}} r(B, T) \, dP \\ &= \int_{\mathcal{X}} r(B, T) f_P \, dP^* \\ &= \int_{\mathcal{X}} P^*(B \mid T) f_P \, dP^* \\ &= \int_{\mathcal{X}} E_{P^*}[\mathbf{1}_B f_P \mid T] \, dP^* \\ &= \int_B f_P \, dP^*. \end{aligned}$

Thus in fact $f_P$ is a $T$-measurable version of $dP/dP^*$ on $(\mathcal{X}, \mathcal{B})$. This proves that the first condition of the theorem implies the second.

(2. implies 1.) Suppose one can choose a $T$-measurable version $f_P$ of $dP/dP^*$ for each $P \in \mathcal{P}$. For each $B \in \mathcal{B}$, let $r(B, t)$ denote a particular version of $P^*(B \mid T = t)$ (i.e., $r(B, t)$ is a function such that $r(B, T)$ is a version of $P^*(B \mid T)$). Since $(\mathcal{T}, \mathcal{C})$ is a standard Borel space, we may choose $r$ in a way that makes it a probability kernel (see, e.g., Theorem B.32 in Schervish's Theory of Statistics (1995)). We will show that $r(B, T)$ is a version of $P(B \mid T)$ for any $P \in \mathcal{P}$ and any $B \in \mathcal{B}$. Thus, let $A \in \sigma(T)$ and $B \in \mathcal{B}$ be given. Then for all $P \in \mathcal{P}$ we have

$\begin{aligned} P(A \cap B) &= \int_A \mathbf{1}_B f_P \, dP^* \\ &= \int_A E_{P^*}[\mathbf{1}_B f_P \mid T] \, dP^* \\ &= \int_A P^*(B \mid T) f_P \, dP^* \\ &= \int_A r(B, T) f_P \, dP^* \\ &= \int_A r(B, T) \, dP. \end{aligned}$

This shows that $r(B, T)$ is a version of $P(B \mid T)$ for any $P \in \mathcal{P}$ and any $B \in \mathcal{B}$, and the proof is done.
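Returning to the remark about the Fisher-Neyman factorization theorem, here is a sketch of the direction "sufficient implies factorization", assuming condition 2 of the theorem holds. The names $g_P$ and $h$ are introduced here only for illustration. Let $f_P$ be a $T$-measurable version of $dP/dP^*$. By the Doob-Dynkin lemma there is a measurable function $g_P : \mathcal{T} \to [0, \infty)$ with $f_P = g_P \circ T$. Since each $P_i \ll \mu$, we also have $P^* \ll \mu$, so we may set $h = dP^*/d\mu$. The chain rule for Radon-Nikodym derivatives, applied to $P \ll P^* \ll \mu$, then gives

$\frac{dP}{d\mu}(x) = \frac{dP}{dP^*}(x) \, \frac{dP^*}{d\mu}(x) = g_P(T(x)) \, h(x) \quad \text{for } \mu\text{-almost every } x.$

So every density in the model factors into a part that depends on $x$ only through $T(x)$ and a part $h$ that does not depend on $P$.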

Summary. The important technical result underlying the Halmos-Savage theorem as presented here is the fact that a dominated family of probability measures is actually dominated by a countable convex combination of probability measures from that family. Given that result, the rest of the Halmos-Savage theorem is mostly just manipulations with basic properties of Radon-Nikodym derivatives and conditional expectations.

Artem Mavrin

Intuitive understanding of the Halmos-Savage theorem
