How to define a rejection region when there's no UMP?

13

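Consider the linear regression model

$$y = X\beta + u,$$

$$u \sim N(0, \sigma^2 I),$$

$$E(u \mid X) = 0.$$

Let $H_0: \sigma_0^2 = \sigma^2$ vs $H_1: \sigma_0^2 \neq \sigma^2$.

We can deduce that $\frac{y^T M_X y}{\sigma^2} \sim \chi^2(n-k)$, where $\dim(X) = n \times k$. And $M_X$ is the typical notation for the annihilator matrix, $M_X y = \hat{y}$, where $\hat{y}$ denotes the residuals from regressing the dependent variable $y$ on $X$.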

The book I'm reading states the following: [image]

I've previously asked what criteria should be used to define a rejection region (RR). See the answers to that question; the main one was to choose the RR that makes the test as powerful as possible.

In this case, with the alternative being a bilateral composite hypothesis, there's usually no UMP test. Also, from the answer given in the book, the authors don't show whether they studied the power of their RR. Nevertheless, they chose a two-tailed RR. Why is that, since the hypothesis doesn't 'unilaterally' determine the RR?

Edit: This image is in the solution manual of this book as the solution to exercise 4.14.

An old man in the sea.
Please add a reference for the book. Related: P-value in a two-tail test with asymmetric null distribution.
Scortchi - Reinstate Monica
@Scortchi thanks for the link. May I ask you a question about this question? Do you find it interesting? I'm trying to gauge whether I'm asking interesting questions, or whether I should direct my interests to other areas...
An old man in the sea.
Not everybody finds theory interesting, of course, but some people do (including me), & we have nearly 2k qs tagged mathematical-statistics. So a good q. IMO. It's a bit broad, but I think a good answer would survey various approaches & considerations, & a motivating example helps a lot. However, I'd have chosen as simple an example as possible: tests on the variance of a Normal distribution with known mean, or the mean of an exponential distribution.
Scortchi - Reinstate Monica
@Scortchi thanks for your feedback. Sometimes I'm not sure whether I structure the question well, since I'm still studying this.
An old man in the sea.
2
You should define $M_X$
Taylor

Answers:

7

It's easier first to work with the case where the regression coefficients are known & the null hypothesis therefore simple. Then the sufficient statistic is $T = \sum z^2$, where $z$ is the residual; its distribution under the null is also a chi-squared, scaled by $\sigma_0^2$ & with degrees of freedom equal to the sample size $n$.
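For reference, here is a sketch of the log-likelihood that the ratios in this answer are built from. It assumes the $n$ residuals $z_i$ are independent $N(0, \sigma^2)$, which is the known-coefficients case just described; the symbol $\ell$ is the one used in the formulas of this answer.

$$\ell(\sigma; T, n) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{T}{2\sigma^2}, \qquad T = \sum_{i=1}^{n} z_i^2 .$$

The data enter only through $T$, which is why $T$ is sufficient for $\sigma$.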

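Write down the ratio of the likelihoods under $\sigma = \sigma_1$ & $\sigma = \sigma_2$, & confirm that it's an increasing function of $T$ for any $\sigma_2 > \sigma_1$:

The log-likelihood ratio function is

$$\ell(\sigma_2; T, n) - \ell(\sigma_1; T, n) = \frac{n}{2}\left[\log\left(\frac{\sigma_1^2}{\sigma_2^2}\right) + \frac{T}{n}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right)\right],$$

which is linear in $T$ with positive gradient when $\sigma_2 > \sigma_1$.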

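So by the Karlin–Rubin theorem each of the one-tailed tests $H_0: \sigma = \sigma_0$ vs $H_A: \sigma < \sigma_0$ & $H_0: \sigma = \sigma_0$ vs $H_A: \sigma > \sigma_0$ is uniformly most powerful. Clearly there's no UMP test of $H_0: \sigma = \sigma_0$ vs $H_A: \sigma \neq \sigma_0$. As discussed here, carrying out both one-tailed tests & applying a multiple-comparisons correction leads to the commonly used test with equally sized rejection regions in both tails, & it's quite reasonable when you're going to claim either that $\sigma > \sigma_0$ or that $\sigma < \sigma_0$ when you reject the null.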
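As a rough illustration, running each of the two one-tailed tests at size α/2 gives a two-tailed rejection region with equal tail areas. The sketch below assumes $T/\sigma_0^2 \sim \chi^2(n)$ under the null (the known-coefficients case). The helper names `chi2_cdf`, `chi2_ppf` and `equal_tailed_region` are illustrative, not from the answer, and the series-based CDF is a stand-in for a library routine.

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared(df) <= x) via the power series for the incomplete gamma.
    Adequate for moderate x; not a production routine."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def chi2_ppf(p, df):
    """Quantile of chi-squared(df) for 0 < p < 1, found by bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equal_tailed_region(n, sigma0, alpha=0.05):
    """Cutoffs (c1, c2) for T: reject H0 if T < c1 or T > c2."""
    c1 = sigma0**2 * chi2_ppf(alpha / 2.0, n)        # lower one-tailed test
    c2 = sigma0**2 * chi2_ppf(1.0 - alpha / 2.0, n)  # upper one-tailed test
    return c1, c2

c1, c2 = equal_tailed_region(n=10, sigma0=1.0)
print(f"reject H0 if T < {c1:.3f} or T > {c2:.3f}")
```

Because the two rejection regions are disjoint, the sizes add, so the combined test has size exactly α.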

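Next find the ratio of the likelihoods under $\sigma = \hat{\sigma}$, the maximum-likelihood estimate of $\sigma$, & $\sigma = \sigma_0$:

As $\hat{\sigma}^2 = \frac{T}{n}$,

$$\ell(\hat{\sigma}; T, n) - \ell(\sigma_0; T, n) = \frac{n}{2}\left[\log\left(\frac{n\sigma_0^2}{T}\right) + \frac{T}{n\sigma_0^2} - 1\right]$$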

This is a fine statistic for quantifying how much the data support $H_A: \sigma \neq \sigma_0$ over $H_0: \sigma = \sigma_0$. And confidence intervals formed from inverting the likelihood-ratio test have the appealing property that all parameter values inside the interval have higher likelihood than those outside. The asymptotic distribution of twice the log-likelihood ratio is well known, but for an exact test you needn't try to work out its distribution: just use the tail probabilities of the corresponding values of $T$ in each tail.

If you can't have a uniformly most powerful test, you might want one that's most powerful against the alternatives closest to the null. Find the derivative of the log-likelihood function with respect to $\sigma$, the score function:

$$\frac{d\ell(\sigma; T, n)}{d\sigma} = \frac{T}{\sigma^3} - \frac{n}{\sigma}$$

Evaluating its magnitude at $\sigma_0$ gives a locally most powerful test of $H_0: \sigma = \sigma_0$ vs $H_A: \sigma \neq \sigma_0$. Because the test statistic's bounded below, with small samples the rejection region may be confined to the upper tail. Again, the asymptotic distribution of the squared score is well known, but you can get an exact test in the same way as for the LRT.
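A minimal sketch of the exact score test, assuming $T/\sigma_0^2 \sim \chi^2(n)$ under the null. The score at $\sigma_0$ equals $(T - n\sigma_0^2)/\sigma_0^3$, so rejecting for a large magnitude means rejecting when $|T/\sigma_0^2 - n| > c$, with $c$ chosen to give size α. The names `chi2_cdf` and `score_test_region` are illustrative, and the series-based CDF is a stand-in for a library routine.

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared(df) <= x) via the power series for the incomplete gamma.
    Adequate for moderate x; not a production routine."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def score_test_region(n, sigma0, alpha=0.05):
    """Cutoffs (lower, upper) for T; lower == 0.0 means no lower-tail region."""
    def size(c):
        # t = T / sigma0**2 ~ chi2(n) under H0; reject when |t - n| > c.
        # chi2_cdf returns 0 for non-positive arguments, so the lower-tail
        # term vanishes once c >= n.
        return chi2_cdf(n - c, n) + 1.0 - chi2_cdf(n + c, n)

    lo, hi = 0.0, float(n)
    while size(hi) > alpha:      # widen the bracket until the size is <= alpha
        hi *= 2.0
    for _ in range(100):         # size(c) decreases in c, so bisect on c
        mid = 0.5 * (lo + hi)
        if size(mid) > alpha:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return max(n - c, 0.0) * sigma0**2, (n + c) * sigma0**2

for n in (2, 20):
    lower, upper = score_test_region(n, sigma0=1.0)
    print(f"n = {n}: reject H0 if T < {lower:.3f} or T > {upper:.3f}")
```

With a very small sample the lower cutoff comes out as zero, which is the "confined to the upper tail" situation described above.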

Another approach is to restrict your attention to unbiased tests, viz those for which the power under any alternative exceeds the size. Check your sufficient statistic has a distribution in the exponential family; then for a size α test, ϕ(T)=1 if T<c1 or T>c2, else ϕ(T)=0, you can find the uniformly most powerful unbiased test by solving

$$E\left(\phi(T)\right) = \alpha$$

$$E\left(T\phi(T)\right) = \alpha\, E(T)$$
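A minimal sketch of solving these two conditions numerically, assuming $T/\sigma_0^2 \sim \chi^2(n)$ under the null. It relies on the identity $t\,f_n(t) = n\,f_{n+2}(t)$ for chi-squared densities, which turns the second condition into "the rejection region also has probability α under $\chi^2(n+2)$". The names `chi2_cdf`, `chi2_ppf` and `umpu_region` are illustrative, and the series-based CDF is a stand-in for a library routine.

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared(df) <= x) via the power series for the incomplete gamma.
    Adequate for moderate x; not a production routine."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def chi2_ppf(p, df):
    """Quantile of chi-squared(df) for 0 < p < 1, found by bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def umpu_region(n, sigma0, alpha=0.05):
    """Cutoffs (c1, c2) for T satisfying both unbiasedness conditions."""
    def c2_given(c1):
        # First condition: acceptance probability 1 - alpha under chi2(n).
        return chi2_ppf(chi2_cdf(c1, n) + 1.0 - alpha, n)

    def gap(c1):
        # Second condition: acceptance probability 1 - alpha under chi2(n + 2).
        c2 = c2_given(c1)
        return chi2_cdf(c2, n + 2) - chi2_cdf(c1, n + 2) - (1.0 - alpha)

    lo, hi = 0.0, chi2_ppf(alpha, n)   # gap < 0 near 0, gap > 0 near hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    c1 = 0.5 * (lo + hi)
    return c1 * sigma0**2, c2_given(c1) * sigma0**2

c1, c2 = umpu_region(n=10, sigma0=1.0)
print(f"reject H0 if T < {c1:.3f} or T > {c2:.3f}")
```

The two tail areas of the resulting region are not equal: relative to the equal-tail-areas cutoffs, both cutoffs move to the right.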

A plot helps show the bias in the equal-tail-areas test & how it arises:

[Figure: plot of the power of the test against alternatives]

At values of σ a little under σ0 the increased probability of the test statistic's falling in the lower-tail rejection region doesn't compensate for the reduced probability of its falling in the upper-tail rejection region, & the power of the test drops below its size.
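Without the plot, the bias can be located by evaluating the power function of the equal-tail-areas test on either side of σ0. This is a minimal sketch, assuming $T/\sigma^2 \sim \chi^2(n)$ when the true scale is σ; the names `chi2_cdf`, `chi2_ppf` and `equal_tailed_power` are illustrative, and the series-based CDF is a stand-in for a library routine.

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared(df) <= x) via the power series for the incomplete gamma.
    Adequate for moderate x; not a production routine."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def chi2_ppf(p, df):
    """Quantile of chi-squared(df) for 0 < p < 1, found by bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equal_tailed_power(sigma, n, sigma0=1.0, alpha=0.05):
    """Probability of rejecting H0: sigma = sigma0 when the true scale is sigma."""
    c1 = sigma0**2 * chi2_ppf(alpha / 2.0, n)
    c2 = sigma0**2 * chi2_ppf(1.0 - alpha / 2.0, n)
    # Rescale the cutoffs: T / sigma**2 ~ chi2(n) under the true sigma.
    return chi2_cdf(c1 / sigma**2, n) + 1.0 - chi2_cdf(c2 / sigma**2, n)

for s in (0.90, 0.95, 1.00, 1.05, 1.10):
    print(f"sigma = {s:.2f}  power = {equal_tailed_power(s, n=2):.4f}")
```

Comparing the printed values with α = 0.05 shows on which side of σ0 the power falls below the size.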

Being unbiased is good; but it's not self-evident that having a power slightly lower than the size over a small region of the parameter space within the alternative is so bad as to rule out a test altogether.

Two of the above two-tailed tests coincide (for this case, not in general):

The LRT is UMP among unbiased tests. In cases where this isn't true the LRT may still be asymptotically unbiased.

I think all, even the one-tailed tests, are admissible, i.e. there's no test more powerful or as powerful under all alternatives—you can make the test more powerful against alternatives in one direction only by making it less powerful against alternatives in the other direction. As the sample size increases, the chi-squared distribution becomes more & more symmetric, & all the two-tailed tests will end up being much the same (another reason for using the easy equal-tailed test).

With the composite null hypothesis, the arguments become a little more complicated, but I think you can get practically the same results, mutatis mutandis. Note that one but not the other of the one-tailed tests is UMP!

Scortchi - Reinstate Monica
Scortchi thanks for your answer. I still have some doubts, though. Firstly, could you elaborate a bit more on the following sentence? «applying a multiple-comparisons correction leads to the commonly used test with equally sized rejection regions in both tails, & it's quite reasonable when you're going to claim either that σ>σ0 or that σ<σ0 when you reject the null.» Also why do you say it's reasonable? I think this is the core of my question if I'm not mistaken. ;)
An old man in the sea.
I read this paragraph from your linked answer, but I did not understand it well: «Doubling the lowest one-tailed p-value can be seen as a multiple-comparisons correction for carrying out two one-tailed tests.» I would be thankful if you could please explain it a bit more. ;)
An old man in the sea.
See Bonferroni correction. If you carry out two separate size α/2 tests the family-wise Type I error is no more than α, & when the rejection regions are disjoint it's exactly α. I wanted to point out that the equal-tail-areas test can be seen in this way because people sometimes seem to think the only reasons to use it are ease of calculation & approximation to the other tests. In fact each test has its own rationale: so I wouldn't say this was the core of your question; it's a matter of horses for courses.
Scortchi - Reinstate Monica
1

In this case, with the alternative being a bilateral composite hypothesis there's usually no UMP test.

I am not sure if that is true in general. Certainly, a lot of the classical results (Neyman-Pearson, Karlin-Rubin) are based on either simple or one-sided hypotheses, but generalizations to two-sided composite hypotheses do exist. You can find some notes on that here, and more discussion in the textbook here.

For your problem specifically, I don't know whether a UMP test exists or not. But intuitively, it seems to me that under 0-1 loss, a one-sided test will probably be inadmissible, and thus the class of admissible tests will be all two-sided tests. Given the class of two-sided tests, the goal is to find the one with the largest power, which should automatically happen by choosing quantiles around the one mode of the $\chi^2$. (This is all based on intuition.)

Greenparker
3
There's clearly not a uniformly most powerful test in this case because of the existence of different tests most powerful against particular alternatives in different directions from σ0. For a "best" test defined in terms of power you'd have to look for the uniformly most powerful test of all unbiased tests, or of all invariant tests; or for a locally most powerful test; or something like that - & perhaps end up settling for any admissible test.
Scortchi - Reinstate Monica