Prediction interval for a binomial random variable

What is the formula (approximate or exact) for a prediction interval for a binomial random variable?

Suppose $Y \sim \mathsf{Binom}(n, p)$ , and we observe $y$ (drawn from $Y$ ). The $n$ is known.

Our goal is to obtain a 95% prediction interval for a new draw from $Y$ .

The point estimate is $n\hat{p}$ , where $\hat{p}=\frac{y}{n}$ . A confidence interval for $\hat{p}$ is straightforward, but I cannot find a formula for a prediction interval for $Y$ . If we knew $p$ (rather than $\hat{p}$ ), a 95% prediction interval would just involve finding the quantiles of a binomial. Is there something obvious I am overlooking?
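For reference, the known-$p$ case mentioned in the question can be sketched in base R as follows. The values of `n` and `p` are illustrative assumptions, not data from the question.

```r
# Known-p case: the prediction interval comes straight from binomial quantiles.
# n and p are illustrative values only (assumptions, not from the question).
n <- 100
p <- 0.7

lower <- qbinom(0.025, size = n, prob = p)  # smallest y with P(Y <= y) >= 0.025
upper <- qbinom(0.975, size = n, prob = p)  # smallest y with P(Y <= y) >= 0.975

# Probability that a new draw falls in [lower, upper]. Because Y is discrete,
# this is at least 0.95 rather than exactly 0.95.
coverage <- pbinom(upper, n, p) - pbinom(lower - 1, n, p)
coverage
```

The difficulty raised in the question is that this recipe ignores the uncertainty in $\hat{p}$ when $\hat{p}$ is plugged in for $p$.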

Statseeker

See "What non-Bayesian methods are there for predictive inference?". In this case the method using pivots is not available (I don't think), but you can use one of the predictive likelihoods. Or, of course, a Bayesian approach.

Scortchi - Reinstate Monica

Hi folks, I'd like to take a moment to address the concerns that were raised. - Regarding confidence for p: I'm not interested in that. - Regarding predictions being 95% of the distribution: yes, that is exactly what prediction intervals are, regardless of context (in regression you must assume normal errors, whereas confidence intervals rely on the CLT). - Yes, the example of predicting the number of heads in a coin flip is correct. What makes this problem difficult is that we do not know "p"; we only have an estimate.

Statseeker

@Addison Read the book Statistical Intervals by G. Hahn and W. Meeker. They explain the difference between confidence intervals, prediction intervals, tolerance intervals, and Bayesian credible intervals. A 95% prediction interval does not contain 95% of the distribution. It does what most frequentist intervals do. If you sample repeatedly from B(n, p) and use the same method each time to produce a 95% prediction interval for p, then 95% of the prediction intervals will contain the true value of p. If you want to cover 95% of the distribution, construct a tolerance interval.

Michael R. Chernick 13/01/19

Tolerance intervals cover a percentage of the distribution. For a 95% tolerance interval for 90% of the distribution, repeat the process many times and use the same method to generate the interval each time; in approximately 95% of the cases at least 90% of the distribution will fall in the interval, and 5% of the time less than 90% of the distribution will be contained in the interval.

Michael R. Chernick

Lawless & Fredette (2005), "Frequentist prediction intervals and predictive distributions", Biometrika , 92 , 3 is another good reference, in addition to those in the link I gave.

Scortchi - Reinstate Monica

Ok, let's try this. I'll give two answers: the Bayesian one, which is in my opinion simple and natural, and one of the possible frequentist ones.

Bayesian solution

We assume a Beta prior on $p$ , i.e., $p \sim Beta(\alpha,\beta)$ , because the Beta-Binomial model is conjugate, which means that the posterior distribution is also a Beta distribution with parameters $\hat{\alpha}=\alpha+k,\hat{\beta}=\beta+n-k$ (I'm using $k$ to denote the number of successes in $n$ trials, instead of $y$ ). Thus, inference is greatly simplified. Now, if you have some prior knowledge on the likely values of $p$ , you could use it to set the values of $\alpha$ and $\beta$ , i.e., to define your Beta prior; otherwise you could assume a uniform (noninformative) prior, with $\alpha=\beta=1$ , or another noninformative prior. In any case, your posterior is

$Pr(p|n,k)=Beta(\alpha+k,\beta+n-k)$
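As a minimal sketch of this conjugate update in base R (the prior and data values below are illustrative assumptions):

```r
# Conjugate Beta-Binomial update, base R only.
# Prior and data values are illustrative assumptions.
alpha <- 1; beta <- 1      # uniform prior on p
n <- 100; k <- 70          # k successes observed in n trials

alpha_post <- alpha + k        # posterior first shape parameter
beta_post  <- beta + n - k     # posterior second shape parameter

post_mean <- alpha_post / (alpha_post + beta_post)

# 95% equal-tailed credible interval for p.
# Note: this is an interval for p, not yet a prediction interval for new data.
cred_int <- qbeta(c(0.025, 0.975), alpha_post, beta_post)
cred_int
```

The credible interval for $p$ is only an intermediate quantity here; the prediction interval for future successes needs the posterior predictive distribution.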

In Bayesian inference, all that matters is the posterior probability, meaning that once you know it, you can make inferences for all other quantities in your model. You want to make inference on the observables $y$ : in particular, on a vector of new results $\mathbf{y}=y_1,\dots,y_m$ , where $m$ is not necessarily equal to $n$ . Specifically, for each $j=0,\dots,m$ , we want to compute the probability of having exactly $j$ successes in the next $m$ trials, given that we got $k$ successes in the preceding $n$ trials; this is the posterior predictive mass function:

$Pr(j|m,n,k)=\int_0^1 Pr(j,p|m,n,k)dp=\int_0^1 Pr(j|m,p,n,k)Pr(p|n,k)dp$

However, our Binomial model for $Y$ means that, conditionally on $p$ having a certain value, the probability of having $j$ successes in $m$ trials does not depend on past results: it is simply

$f(j|m,p)=\binom{m}{j} p^j(1-p)^{m-j}$

Thus the expression becomes

$Pr(j|m,n,k)=\int_0^1 \binom{m}{j} p^j(1-p)^{m-j} Pr(p|n,k)dp=\int_0^1 \binom{m}{j} p^j(1-p)^{m-j} Beta(\alpha+k,\beta+n-k)dp$

The result of this integral is a well-known distribution called the Beta-Binomial distribution: skipping the intermediate steps, we get the horrible expression

$Pr(j|m,n,k)=\frac{m!}{j!(m-j)!}\frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+k)\Gamma(\beta+n-k)}\frac{\Gamma(\alpha+k+j)\Gamma(\beta+n+m-k-j)}{\Gamma(\alpha+\beta+n+m)}$
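The closed-form expression can be checked numerically. The sketch below, in base R, evaluates it on the log scale and compares one value against direct numerical integration of the binomial likelihood times the Beta posterior density. The helper name `dbetabinom_post` is hypothetical (it is not a function from any package), and the data values are illustrative assumptions.

```r
# Posterior predictive (Beta-Binomial) pmf, written with log-gamma for stability.
# dbetabinom_post is a hypothetical helper name, not a package function.
dbetabinom_post <- function(j, m, n, k, alpha = 1, beta = 1) {
  a <- alpha + k
  b <- beta + n - k
  exp(lchoose(m, j) +
        lgamma(a + b) - lgamma(a) - lgamma(b) +
        lgamma(a + j) + lgamma(b + m - j) - lgamma(a + b + m))
}

n <- 100; k <- 70; m <- 20      # illustrative values (assumptions)
pmf <- dbetabinom_post(0:m, m, n, k)

# Cross-check one value against numerical integration of
# (binomial likelihood) x (Beta posterior density) over p in [0, 1].
j <- 14
by_integral <- integrate(
  function(p) dbinom(j, m, p) * dbeta(p, 1 + k, 1 + n - k),
  lower = 0, upper = 1
)$value

c(closed_form = pmf[j + 1], integral = by_integral, total_mass = sum(pmf))
```

Working with `lgamma` and `lchoose` avoids overflow in the Gamma functions when $n$ and $m$ are large.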

Our point estimate for $j$ , given quadratic loss, is of course the mean of this distribution, i.e.,

$\mu=\frac{m(\alpha+k)}{(\alpha+\beta+n)}$

Now, let's look for a prediction interval. Since this is a discrete distribution, we do not have a closed-form expression for $[j_1,j_2]$ such that $Pr(j_1\leq j \leq j_2)= 0.95$ . The reason is that, depending on how you define a quantile, for a discrete distribution the quantile function is either not a function or is a discontinuous function. But this is not a big problem: for small $m$ , you can just write down the $m$ probabilities $Pr(j=0|m,n,k),Pr(j\leq 1|m,n,k),\dots,Pr(j \leq m-1|m,n,k)$ and from here find $j_1,j_2$ such that

$Pr(j_1\leq j \leq j_2)=Pr(j\leq j_2|m,n,k)-Pr(j < j_1|m,n,k)\geq 0.95$

Of course you would find more than one such pair, so you would ideally look for the shortest interval $[j_1,j_2]$ such that the above is satisfied. Note that

$Pr(j=0|m,n,k)=p_0,Pr(j\leq 1|m,n,k)=p_1,\dots,Pr(j \leq m-1|m,n,k)=p_{m-1}$

are just the values of the CMF (Cumulative Mass Function) of the Beta-Binomial distribution, and as such there is a closed form expression, but this is in terms of the generalized hypergeometric function and thus is quite complicated. I'd rather just install the R package extraDistr and call pbbinom to compute the CMF of the Beta-Binomial distribution. Specifically, if you want to compute all the probabilities $p_0,\dots,p_{m-1}$ in one go, just write:

library(extraDistr)  
jvec <- seq(0, m-1, by = 1) 
probs <- pbbinom(jvec, m, alpha = alpha + k, beta = beta + n - k)

where alpha and beta are the values of the parameters of your Beta prior, i.e., $\alpha$ and $\beta$ (thus 1 if you're using a uniform prior over $p$ ). Of course it would all be much simpler if R provided a quantile function for the Beta-Binomial distribution, but unfortunately it doesn't.
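In the absence of a ready-made quantile function, one can be sketched from the pmf in base R. The helper name `qbetabinom_post` is hypothetical, and the values n = 100, k = 70, m = 20 with a uniform prior are illustrative assumptions.

```r
# A hand-rolled quantile function for the posterior predictive distribution.
# qbetabinom_post is a hypothetical helper, built only from base R.
qbetabinom_post <- function(prob, m, n, k, alpha = 1, beta = 1) {
  a <- alpha + k
  b <- beta + n - k
  j <- 0:m
  # Beta-Binomial pmf: choose(m, j) * B(a + j, b + m - j) / B(a, b)
  pmf <- exp(lchoose(m, j) + lbeta(a + j, b + m - j) - lbeta(a, b))
  cdf <- cumsum(pmf)
  cdf[m + 1] <- 1  # guard against rounding error in the last cumulative sum
  # smallest j whose cumulative probability reaches each requested level
  sapply(prob, function(q) j[which(cdf >= q)[1]])
}

limits <- qbetabinom_post(c(0.025, 0.975), m = 20, n = 100, k = 70)
limits
```

This uses the "smallest value whose cumulative probability reaches the level" convention, the same one base R uses for `qbinom`; other quantile conventions for discrete distributions would give slightly different endpoints.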

Practical example with the Bayesian solution

Let $n=100$ , $k=70$ (thus we initially observed 70 successes in 100 trials). We want a point estimate and a 95%-prediction interval for the number of successes $j$ in the next $m=20$ trials. Then

n <- 100
k <- 70
m <- 20
alpha <- 1
beta  <- 1

where I assumed a uniform prior on $p$ : depending on the prior knowledge for your specific application, this may or may not be a good prior. Thus

bayesian_point_estimate <- m * (alpha + k)/(alpha + beta + n) #13.92157

Clearly a non-integer estimate for $j$ doesn't make sense, so we could just round to the nearest integer (14). Then, for the prediction interval:

jvec <- seq(0, m-1, by = 1)
library(extraDistr)
probabilities <- pbbinom(jvec, m, alpha = alpha + k, beta = beta + n - k)

The probabilities are

> probabilities
 [1] 1.335244e-09 3.925617e-08 5.686014e-07 5.398876e-06
 [5] 3.772061e-05 2.063557e-04 9.183707e-04 3.410423e-03
 [9] 1.075618e-02 2.917888e-02 6.872028e-02 1.415124e-01
[13] 2.563000e-01 4.105894e-01 5.857286e-01 7.511380e-01
[17] 8.781487e-01 9.546188e-01 9.886056e-01 9.985556e-01

For an equal-tail probabilities interval, we want the smallest $j_2$ such that $Pr(j\leq j_2|m,n,k)\ge 0.975$ and the largest $j_1$ such that $Pr(j < j_1|m,n,k)=Pr(j \le j_1-1|m,n,k)\le 0.025$ . This way, we will have

$Pr(j_1\leq j \leq j_2|m,n,k)=Pr(j\leq j_2|m,n,k)-Pr(j < j_1|m,n,k)\ge 0.975-0.025=0.95$

Thus, by looking at the above probabilities, we see that $j_2=18$ and $j_1=9$ . The probability of this Bayesian prediction interval is 0.9778494, which is larger than 0.95. We could find shorter intervals such that $Pr(j_1\leq j \leq j_2|m,n,k)\ge 0.95$ , but in that case at least one of the two inequalities for the tail probabilities wouldn't be satisfied.
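The remark about shorter intervals can be explored with a brute-force search over all pairs $(j_1, j_2)$. The following base R sketch recomputes the posterior predictive pmf from scratch so that it does not depend on `extraDistr`; the variable names `grid`, `ok`, and `best` are my own.

```r
# Search all [j1, j2] for the narrowest interval with predictive mass >= 0.95.
# Base R only; same data and uniform prior as in the example above.
n <- 100; k <- 70; m <- 20; alpha <- 1; beta <- 1
a <- alpha + k
b <- beta + n - k
j <- 0:m
pmf  <- exp(lchoose(m, j) + lbeta(a + j, b + m - j) - lbeta(a, b))
cdf0 <- c(0, cumsum(pmf))   # cdf0[i + 1] = Pr(j < i), for i = 0, ..., m + 1

grid <- expand.grid(j1 = j, j2 = j)
grid <- grid[grid$j1 <= grid$j2, ]
grid$mass  <- cdf0[grid$j2 + 2] - cdf0[grid$j1 + 1]  # Pr(j1 <= j <= j2)
grid$width <- grid$j2 - grid$j1

ok   <- grid[grid$mass >= 0.95, ]
best <- ok[order(ok$width, -ok$mass), ][1, ]  # narrowest; ties -> larger mass
best
```

With the cumulative probabilities listed above, this search should return $[10,18]$ with mass of about 0.959: one unit shorter than the equal-tail interval, but with a lower-tail probability of about 0.029, which exceeds 0.025.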

Frequentist solution

I'll follow the treatment of Krishnamoorthy and Peng, 2011. Let $Y\sim Binom(m,p)$ and $X\sim Binom(n,p)$ be independent binomial random variables. We want a $1-2\alpha$ prediction interval for $Y$ , based on an observation of $X$ (here $\alpha$ denotes a tail probability, not the Beta prior parameter of the Bayesian solution). In other words we look for $I=[L(X;n,m,\alpha),U(X;n,m,\alpha)]$ such that:

$Pr_{X,Y}(Y\in I)=Pr_{X,Y}(L(X;n,m,\alpha)\leq Y\leq U(X;n,m,\alpha))\geq 1-2\alpha$

The " $\geq 1-2\alpha$ " is due to the fact that we are dealing with a discrete random variable, and thus we cannot expect to get exact coverage. We can, however, look for an interval that always has at least the nominal coverage, i.e., a conservative interval. Now, it can be proved that the conditional distribution of $X$ given $X+Y=k+j=s$ is hypergeometric with sample size $s$ , number of successes in the population $n$ and population size $n+m$ . Thus the conditional pmf is

$Pr(X=k|X+Y=s,n,n+m)=\frac{\binom{n}{k}\binom{m}{s-k}}{\binom{m+n}{s}}$

The conditional CDF of $X$ given $X+Y=s$ is thus

$Pr(X\leq k|s,n,n+m)=H(k;s,n,n+m)=\sum_{i=0}^k\frac{\binom{n}{i}\binom{m}{s-i}}{\binom{m+n}{s}}$
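The conditional pmf above contains no $p$ , and this can be checked numerically. The base R sketch below builds the conditional distribution of $X$ given $X+Y=s$ from the two binomials for two different values of $p$ and compares both with `dhyper`. The total `s = 84` and the two values of `p` are arbitrary illustrative choices.

```r
# Check numerically that Pr(X = k | X + Y = s) is hypergeometric and free of p.
n <- 100; m <- 20
s <- 84          # arbitrary illustrative total number of successes
k <- 0:n

cond_pmf <- function(p) {
  # joint probability of X = k and Y = s - k; dbinom gives 0 when s - k
  # falls outside 0..m
  joint <- dbinom(k, n, p) * dbinom(s - k, m, p)
  joint / sum(joint)
}

# dhyper(x, white, black, draws): white = n, black = m, draws = s
from_hyper <- dhyper(k, n, m, s)

diff_p03 <- max(abs(cond_pmf(0.3) - from_hyper))
diff_p07 <- max(abs(cond_pmf(0.7) - from_hyper))
c(diff_p03, diff_p07)
```

Both differences should be at the level of floating-point rounding, whatever value of $p$ is used.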

The first great thing about this CDF is that it does not depend on $p$ , which we don't know. The second great thing is that it allows us to find our PI easily: as a matter of fact, if we observed a value $k$ of $X$ , then the $1-\alpha$ lower prediction limit is the smallest integer $L$ such that

$Pr(X\geq k|k+L,n,n+m)=1-H(k-1;k+L,n,n+m)>\alpha$

Correspondingly, the $1-\alpha$ upper prediction limit is the largest integer $U$ such that

$Pr(X\leq k|k+U,n,n+m)=H(k;k+U,n,n+m)>\alpha$

Thus, $[L,U]$ is a prediction interval for $Y$ with coverage at least $1-2\alpha$ . Note that when $p$ is close to 0 or 1, this interval is conservative even for large $n$ , $m$ , i.e., its coverage is considerably larger than $1-2\alpha$ .
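The two defining conditions, and the coverage claim, can be sketched in base R as follows. The helper names `exact_binom_pi` and `coverage` are hypothetical, and this is only my reading of the construction described above, not code from Krishnamoorthy and Peng. The coverage is computed exactly by summing joint binomial probabilities, not by simulation.

```r
# exact_binom_pi is a hypothetical helper implementing the two conditions above.
exact_binom_pi <- function(k, n, m, alpha = 0.025) {
  cand <- 0:m
  # lower limit: smallest L with Pr(X >= k | X + Y = k + L) > alpha
  p_low <- phyper(k - 1, n, m, k + cand, lower.tail = FALSE)
  # upper limit: largest U with Pr(X <= k | X + Y = k + U) > alpha
  p_upp <- phyper(k, n, m, k + cand)
  c(lower = min(cand[p_low > alpha]), upper = max(cand[p_upp > alpha]))
}

# Exact coverage for a given p: sum the joint binomial probabilities of all
# (x, y) pairs for which y lands inside the interval computed from x.
coverage <- function(p, n, m, alpha = 0.025) {
  total <- 0
  for (x in 0:n) {
    lim <- exact_binom_pi(x, n, m, alpha)
    inside <- pbinom(lim[["upper"]], m, p) - pbinom(lim[["lower"]] - 1, m, p)
    total <- total + dbinom(x, n, p) * inside
  }
  total
}

cov <- sapply(c(0.05, 0.5, 0.7, 0.95), coverage, n = 100, m = 20)
cov
```

Each entry of `cov` should be at least 0.95, and comparing the entries for $p$ near 0 or 1 with those for moderate $p$ shows how conservative the interval becomes at the extremes.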

Practical example with the Frequentist solution

Same setting as before, but we don't need to specify $\alpha$ and $\beta$ (there are no priors in the Frequentist framework):

n <- 100
k <- 70
m <- 20

The point estimate is now obtained using the maximum likelihood estimate of the probability of success, $\hat{p}=\frac{k}{n}$ , which in turn leads to the following estimate for the number of successes in $m$ trials:

frequentist_point_estimate <- m * k/n #14

For the prediction interval, the procedure is a bit different. We look for the largest $U$ such that $Pr(X\leq k|k+U,n,n+m)=H(k;k+U,n,n+m)>\alpha$ , thus let's compute the above expression for all $U$ in $[0,m]$ :

jvec <- seq(0, m, by = 1)
probabilities <- phyper(k,n,m,k+jvec)

We can see that the largest $U$ such that the probability is still larger than 0.025 is

jvec[which.min(probabilities > 0.025) - 1] # 18

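Same as for the Bayesian approach. The lower prediction bound $L$ is the smallest integer such that $Pr(X\geq k|k+L,n,n+m)=1-H(k-1;k+L,n,n+m)>\alpha$ , thus

probabilities <- 1-phyper(k-1,n,m,k+jvec)
# which.max() on a logical vector returns the index of the first TRUE,
# i.e. the smallest L that satisfies the inequality
jvec[which.max(probabilities > 0.025)] # 9

Thus our frequentist "exact" prediction interval is $[L,U]=[9,18]$ , which here coincides with the Bayesian equal-tail interval.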

DeltaIV
