Steps to find a posterior distribution when it might be simple enough to have an analytic form?


This was also asked on Computer Science.

I'm trying to compute a Bayesian estimate of some coefficients for an autoregression, with 11 data samples:

$$Y_i = \mu + \alpha Y_{i-1} + \epsilon_i$$
where $\epsilon_i$ is normal with mean 0 and variance $\sigma_e^2$. The prior distribution of the vector $(\mu,\alpha)^t$ is normal with mean $(0,0)$ and a diagonal covariance matrix with diagonal entries equal to $\sigma_p^2$.

Based on the autoregression formula, this means that the data points (the $Y_i$) are normally distributed with mean $\mu + \alpha Y_{i-1}$ and variance $\sigma_e^2$. So the joint density for all the data points $Y$ (assuming independence, which is fine for the program I'm writing) would be:

$$p(Y\mid(\mu,\alpha)^t)=\prod_{i=2}^{11}\frac{1}{\sqrt{2\pi\sigma_e^2}}\exp\left(-\frac{(Y_i-\mu-\alpha Y_{i-1})^2}{2\sigma_e^2}\right).$$

By Bayes' theorem, we can take the product of the density above with the prior density, and then we just need the normalizing constant. My guess is that this should work out to a Gaussian distribution, so we can worry about the normalizing constant at the end rather than computing it explicitly via integrals over $\mu$ and $\alpha$.

This is the part I'm having trouble with. How do I compute the product of the prior density (which is multivariate) and this product of univariate data densities? The posterior needs to be purely a density in $\mu$ and $\alpha$, but I can't see how to get that from this product.

Any pointers are really helpful, even if you just point me in the right direction and I have to do the messy algebra (which is what I've already tried several times).

As a starting point, here is the form of the numerator of Bayes' rule:

$$\frac{1}{(2\pi\sigma_e^2)^{5}\,2\pi\sigma_p^2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=2}^{11}(Y_i-\mu-\alpha Y_{i-1})^2-\frac{\mu^2}{2\sigma_p^2}-\frac{\alpha^2}{2\sigma_p^2}\right].$$

The question is how to see that this reduces to a Gaussian density in $(\mu,\alpha)^t$.
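As a numerical sanity check, the numerator of Bayes' rule can be evaluated directly on the log scale. Below is a small Python sketch; the data vector and the two variances are made-up values, purely for illustration:

```python
import numpy as np

def log_numerator(y, mu, alpha, sigma_e2, sigma_p2):
    """Log of the unnormalized posterior: the log-likelihood of Y_2..Y_11
    (each term N(mu + alpha*Y_{i-1}, sigma_e^2)) plus the log of the
    N((0,0), sigma_p^2 * I) prior on (mu, alpha)."""
    resid = y[1:] - mu - alpha * y[:-1]   # Y_i - mu - alpha*Y_{i-1}, i = 2..11
    n = len(resid)                        # 10 conditional terms
    return (-0.5 * n * np.log(2 * np.pi * sigma_e2)
            - np.log(2 * np.pi * sigma_p2)
            - np.sum(resid**2) / (2 * sigma_e2)
            - (mu**2 + alpha**2) / (2 * sigma_p2))

y = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.5, 0.7, 0.6, 0.8, 0.7])  # 11 samples
print(log_numerator(y, mu=0.1, alpha=0.8, sigma_e2=0.25, sigma_p2=1.0))
```

Working on the log scale avoids underflow in the product of ten exponentials; the exact value of the normalizing constant is not needed for this comparison.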

Added

Ultimately, this comes down to the following general problem. If you are given some quadratic expression such as

$$A\mu^2+B\mu\alpha+C\alpha^2+J\mu+K\alpha+L,$$

how do you put it into the quadratic form $(\mu-\hat\mu,\alpha-\hat\alpha)\,Q\,(\mu-\hat\mu,\alpha-\hat\alpha)^t$ for some $2\times 2$ matrix $Q$? It's simple enough in easy cases, but what process do you use to obtain the estimates $\hat\mu$ and $\hat\alpha$?

Note that I tried the direct approach of expanding the matrix formula and then trying to match coefficients as above. The problem, in my case, is that the constant $L$ is zero, and then I end up with three equations in two unknowns, so just matching coefficients is underdetermined (even assuming a symmetric quadratic-form matrix).

ely
source
My answer to [this question](stats.stackexchange.com/questions/22852/…) may be helpful. Note that you need a prior for your first observation - the iterations stop there.
probabilityislogic
I don't see why I need it in this case. I'm supposed to treat the time intervals like they are conditionally independent given the observation. Notice that the product of the joint density is just from i=2..11. I don't think I'm supposed to get a sequentially updated formula here, just a single formula for the posterior p((μ,α)t|Y).
ely
The "multivariate" in the prior p(α,μ) is not in contradiction with the "univariate" in the data densities, because they are densities in the yi's.
Xi'an

Answers:


The clue in my answer to the previous question is to look at how I integrated out the parameters - you will do exactly the same integrals here. Your question assumes the variance parameters are known, so they are constants. You only need to look at the $\mu,\alpha$ dependence in the numerator. To see this, note that we can write:

$$p(\mu,\alpha\mid Y)=\frac{p(\mu,\alpha)\,p(Y\mid\mu,\alpha)}{\int\int p(\mu,\alpha)\,p(Y\mid\mu,\alpha)\,d\mu\,d\alpha}$$
$$=\frac{\frac{1}{(2\pi\sigma_e^2)^{5}\,2\pi\sigma_p^2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=2}^{11}(Y_i-\mu-\alpha Y_{i-1})^2-\frac{\mu^2}{2\sigma_p^2}-\frac{\alpha^2}{2\sigma_p^2}\right]}{\int\int\frac{1}{(2\pi\sigma_e^2)^{5}\,2\pi\sigma_p^2}\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=2}^{11}(Y_i-\mu-\alpha Y_{i-1})^2-\frac{\mu^2}{2\sigma_p^2}-\frac{\alpha^2}{2\sigma_p^2}\right]d\mu\,d\alpha}$$

Notice how we can pull the first factor $\frac{1}{(2\pi\sigma_e^2)^{5}\,2\pi\sigma_p^2}$ out of the double integral in the denominator, so it cancels with the numerator. We can also pull out the sum-of-squares factor $\exp\left[-\frac{1}{2\sigma_e^2}\sum_{i=2}^{11}Y_i^2\right]$, and it will also cancel. What we are left with is now (after expanding the squared term):

$$=\frac{\exp\left[-\frac{10\mu^2+\alpha^2\sum_{i=1}^{10}Y_i^2-2\mu\sum_{i=2}^{11}Y_i-2\alpha\sum_{i=2}^{11}Y_iY_{i-1}+2\mu\alpha\sum_{i=1}^{10}Y_i}{2\sigma_e^2}-\frac{\mu^2}{2\sigma_p^2}-\frac{\alpha^2}{2\sigma_p^2}\right]}{\int\int\exp\left[-\frac{10\mu^2+\alpha^2\sum_{i=1}^{10}Y_i^2-2\mu\sum_{i=2}^{11}Y_i-2\alpha\sum_{i=2}^{11}Y_iY_{i-1}+2\mu\alpha\sum_{i=1}^{10}Y_i}{2\sigma_e^2}-\frac{\mu^2}{2\sigma_p^2}-\frac{\alpha^2}{2\sigma_p^2}\right]d\mu\,d\alpha}$$

Now we can use a general result from the normal pdf.

$$\int_{-\infty}^{\infty}\exp\left(-az^2+bz-c\right)dz=\sqrt{\frac{\pi}{a}}\exp\left(\frac{b^2}{4a}-c\right)$$
This follows from completing the square in $-az^2+bz$ and noting that $c$ does not depend on $z$. Note that the inner integral over $\mu$ is of this form with $a=\frac{10}{2\sigma_e^2}+\frac{1}{2\sigma_p^2}$, $b=\frac{\sum_{i=2}^{11}Y_i-\alpha\sum_{i=1}^{10}Y_i}{\sigma_e^2}$, and $c=\frac{\alpha^2\sum_{i=1}^{10}Y_i^2-2\alpha\sum_{i=2}^{11}Y_iY_{i-1}}{2\sigma_e^2}+\frac{\alpha^2}{2\sigma_p^2}$. After doing this integral, you will find that the remaining integral over $\alpha$ is also of this form, so you can use the formula again with different $a,b,c$. Then you should be able to write your posterior in the form $\frac{1}{2\pi|V|^{1/2}}\exp\left[-\frac{1}{2}(\mu-\hat\mu,\alpha-\hat\alpha)V^{-1}(\mu-\hat\mu,\alpha-\hat\alpha)^T\right]$, where $V$ is a $2\times 2$ covariance matrix.
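The integral identity above is easy to check numerically before trusting it in the algebra. Here is a quick Python sketch with arbitrary example values of $a$, $b$, $c$, using a plain Riemann sum in place of the exact integral:

```python
import numpy as np

# Check:  integral of exp(-a z^2 + b z - c) dz  =  sqrt(pi/a) * exp(b^2/(4a) - c)
a, b, c = 1.7, 0.4, 2.3                 # arbitrary example values (a > 0)

z = np.linspace(-20.0, 20.0, 400001)    # the integrand is essentially 0 outside this range
dz = z[1] - z[0]
numeric = np.sum(np.exp(-a * z**2 + b * z - c)) * dz   # simple Riemann sum
closed_form = np.sqrt(np.pi / a) * np.exp(b**2 / (4 * a) - c)

print(numeric, closed_form)             # the two values agree closely
```

Because the integrand decays like a Gaussian, truncating the range at $\pm 20$ and summing on a fine grid is more than accurate enough for this comparison.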

Let me know if you need more clues.

update

(note: the formula above is corrected - it should be $10\mu^2$ instead of $\mu^2$)

If we look at the quadratic form you've written in the update, we notice there are 5 coefficients ($L$ is irrelevant for the posterior, as we can always add any constant, which will cancel in the denominator). We also have 5 unknowns: $\hat\mu,\hat\alpha,Q_{11},Q_{12}=Q_{21},Q_{22}$. Hence this is a "well posed" problem so long as the equations are linearly independent. If we expand the quadratic $(\mu-\hat\mu,\alpha-\hat\alpha)Q(\mu-\hat\mu,\alpha-\hat\alpha)^t$ we get:

$$Q_{11}(\mu-\hat\mu)^2+Q_{22}(\alpha-\hat\alpha)^2+2Q_{12}(\mu-\hat\mu)(\alpha-\hat\alpha)$$
$$=Q_{11}\mu^2+2Q_{12}\mu\alpha+Q_{22}\alpha^2-(2Q_{11}\hat\mu+2Q_{12}\hat\alpha)\mu-(2Q_{22}\hat\alpha+2Q_{12}\hat\mu)\alpha$$
$$+Q_{11}\hat\mu^2+Q_{22}\hat\alpha^2+2Q_{12}\hat\mu\hat\alpha$$

Comparing the second-order coefficients, we get $A=Q_{11}$, $B=2Q_{12}$, $C=Q_{22}$, which tells us what the (inverse) covariance matrix looks like. We also have two slightly more complicated equations for $\hat\mu,\hat\alpha$ after substituting for $Q$. These can be written in matrix form as:

$$\begin{pmatrix}2A & B\\ B & 2C\end{pmatrix}\begin{pmatrix}\hat\mu\\ \hat\alpha\end{pmatrix}=\begin{pmatrix}-J\\ -K\end{pmatrix}$$

Thus the estimates are given by:

$$\begin{pmatrix}\hat\mu\\ \hat\alpha\end{pmatrix}=\begin{pmatrix}2A & B\\ B & 2C\end{pmatrix}^{-1}\begin{pmatrix}-J\\ -K\end{pmatrix}=\frac{1}{4AC-B^2}\begin{pmatrix}BK-2JC\\ BJ-2KA\end{pmatrix}$$

Showing that we do not have unique estimates unless $4AC\neq B^2$. Now we have:

$$A=\frac{10}{2\sigma_e^2}+\frac{1}{2\sigma_p^2}\qquad B=\frac{\sum_{i=1}^{10}Y_i}{\sigma_e^2}\qquad C=\frac{\sum_{i=1}^{10}Y_i^2}{2\sigma_e^2}+\frac{1}{2\sigma_p^2}\qquad J=-\frac{\sum_{i=2}^{11}Y_i}{\sigma_e^2}\qquad K=-\frac{\sum_{i=2}^{11}Y_iY_{i-1}}{\sigma_e^2}$$

Note that if we define $X_i=Y_{i-1}$ for $i=2,\dots,11$ and take the limit $\sigma_p^2\to\infty$, then the estimates for $\mu,\alpha$ are given by the usual least squares estimates $\hat\alpha=\frac{\sum_{i=2}^{11}(Y_i-\bar Y)(X_i-\bar X)}{\sum_{i=2}^{11}(X_i-\bar X)^2}$ and $\hat\mu=\bar Y-\hat\alpha\bar X$, where $\bar Y=\frac{1}{10}\sum_{i=2}^{11}Y_i$ and $\bar X=\frac{1}{10}\sum_{i=2}^{11}X_i=\frac{1}{10}\sum_{i=1}^{10}Y_i$. So the posterior estimates are a weighted average between the OLS estimates and the prior estimate $(0,0)$.
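The whole recipe above can be sketched in a few lines of Python: build $A,B,C,J,K$ from the data, apply the explicit inverse of the $2\times 2$ system for $(\hat\mu,\hat\alpha)$, and verify the flat-prior limit against ordinary least squares. The data and variance values here are made up, purely for illustration:

```python
import numpy as np

def posterior_mode(y, sigma_e2, sigma_p2):
    """Posterior mean/mode of (mu, alpha) for the AR(1) model above."""
    n = len(y) - 1                                   # 10 conditional terms for 11 samples
    A = n / (2 * sigma_e2) + 1 / (2 * sigma_p2)
    B = np.sum(y[:-1]) / sigma_e2
    C = np.sum(y[:-1]**2) / (2 * sigma_e2) + 1 / (2 * sigma_p2)
    J = -np.sum(y[1:]) / sigma_e2
    K = -np.sum(y[1:] * y[:-1]) / sigma_e2
    # Solve [[2A, B], [B, 2C]] (mu, alpha)^t = (-J, -K)^t via the explicit inverse
    det = 4 * A * C - B**2
    mu_hat = (B * K - 2 * J * C) / det
    alpha_hat = (B * J - 2 * K * A) / det
    return mu_hat, alpha_hat

rng = np.random.default_rng(0)
y = rng.normal(size=11)                              # 11 made-up samples

# With a very flat prior (sigma_p^2 huge) the estimates should approach OLS:
mu_hat, alpha_hat = posterior_mode(y, sigma_e2=1.0, sigma_p2=1e12)
x, yy = y[:-1], y[1:]
alpha_ols = np.sum((yy - yy.mean()) * (x - x.mean())) / np.sum((x - x.mean())**2)
mu_ols = yy.mean() - alpha_ols * x.mean()
print(mu_hat - mu_ols, alpha_hat - alpha_ols)        # both differences are ~0
```

Conversely, shrinking $\sigma_p^2$ toward zero pulls the estimates toward the prior mean $(0,0)$, which is the "weighted average" behaviour described above.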

probabilityislogic
source
This isn't particularly helpful because I mentioned specifically that it's not the denominator that matters here. The denominator is just a normalizing constant, which will be obvious once you reduce the numerator to a Gaussian form. So tricks for evaluating the integrals in the denominator are mathematically really cool, but just not needed for my application. The only issue I need resolution with is manipulating the numerator.
ely
This answer gives you both numerator and denominator. The numerator exhibits the proper second degree polynomial in (α,μ) that leads to the normal quadratic form, as stressed by probabilityislogic.
Xi'an
@ems - by calculating the normalising constant you will construct the quadratic form required. It will contain the terms needed to complete the square.
probabilityislogic
I don't understand how this gives you the quadratic form. I've worked out the two integrals in the denominator using the Gaussian integral identity that you posted. In the end, I just get a huge, messy constant. There doesn't seem to be any clear way to take that constant and turn it into something times a determinant to the 1/2 power, etc. Not to mention I don't see how any of this explains how to calculate the new 'mean vector' $(\hat\mu,\hat\alpha)^t$. This is what I was asking for help with in the original question.
ely
Thanks tremendously for the detailed addition. I was making some silly errors when trying to do the algebra to figure out the quadratic form. Your comments about the relation to the OLS estimator are highly interesting and appreciated as well. I think this will speed up my code because I'll be able to draw samples from an analytic form that has built-in, optimized methods. My original plan was to use Metropolis-Hastings to sample from this, but it was very slow. Thanks!
ely