Shrunken


There has been some confusion in my head about two kinds of estimators of the population value of the Pearson correlation coefficient.

A. Fisher (1915) showed that for a bivariate normal population the empirical r is a negatively biased estimator of ρ, although the bias can be of a practically considerable amount only for small sample size (n < 30). The sample r underestimates ρ in the sense that it is closer to 0 than ρ. (Except when the latter is 0 or ±1, for then r is unbiased.) Several nearly unbiased estimators of ρ have been proposed, the best probably being the Olkin and Pratt (1958) corrected r:

$$r_\text{unbiased} = r \left[ 1 + \frac{1 - r^2}{2(n - 3)} \right]$$
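As a quick numerical check of the formula (a sketch in Python; the function name is mine), the correction pulls r slightly away from 0, countering the underestimation of |ρ|:

```python
import math

def olkin_pratt(r, n):
    """First-order Olkin-Pratt correction for the negative bias of sample r."""
    if n <= 3:
        raise ValueError("need n > 3")
    return r * (1 + (1 - r**2) / (2 * (n - 3)))

# For r = 0.5 and n = 20 the corrected value is slightly above 0.5;
# the correction vanishes as n grows or as r approaches 0 or +/-1.
print(olkin_pratt(0.5, 20))
```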

B. R² (and r² in simple regression) is positively biased relative to ρ², meaning in absolute value: r is farther from 0 than ρ (is that statement true?). The texts say it is the same problem as the over-estimation of the standard deviation parameter by its sample value. There exist many formulas to "adjust" the observed R² closer to its population parameter, R²_adj being the most well-known (but not the best). The root of such an adjusted r²_adj is called shrunken r:

$$r_\text{shrunk} = \pm\sqrt{1 - (1 - r^2)\,\frac{n - 1}{n - 2}}$$
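A matching sketch (mine, in Python; it assumes the one-predictor case, where the Wherry-type adjustment is R²_adj = 1 − (1 − r²)(n − 1)/(n − 2)):

```python
import math

def shrunken_r(r, n):
    """Shrunken r: signed square root of the adjusted r^2 (one predictor)."""
    adj = 1 - (1 - r**2) * (n - 1) / (n - 2)
    # The adjustment can go below 0 for small |r| and small n; clip at 0.
    return math.copysign(math.sqrt(max(adj, 0.0)), r)

# For r = 0.5 and n = 20 the shrunken value is noticeably below 0.5.
print(shrunken_r(0.5, 20))
```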

So here are two different estimators of ρ. Very different: the first one inflates r, the second deflates r. How to reconcile them? Where should one use/report the one, and where the other?

In particular, can it be true that the "shrunken" estimator is (nearly) unbiased too, like the "unbiased" one, but only in a different context: the asymmetrical context of regression? For in OLS regression we consider the values on one side (the predictor) as fixed, without random error from sample to sample. (And to add here, regression does not need bivariate normality.)
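To make the two directions of bias concrete, here is a small Monte Carlo sketch (my own code, assuming bivariate normality as in Fisher's setting; ρ = 0.5 and n = 10 are arbitrary illustrative choices). At these values the mean of r falls below ρ while the mean of r² rises above ρ², so the two biases really do point in opposite directions:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.5, 10, 20000
cov = [[1, rho], [rho, 1]]

rs = np.empty(reps)
for i in range(reps):
    # Draw a bivariate normal sample and record the sample correlation.
    x, y = rng.multivariate_normal([0, 0], cov, size=n).T
    rs[i] = np.corrcoef(x, y)[0, 1]

print("mean r  :", rs.mean())       # below rho: r is negatively biased
print("mean r^2:", (rs**2).mean())  # above rho^2: r^2 is positively biased here
```

Note that this does not hold for every ρ: as NRH's comment below suggests, for ρ close enough to 1 the bias of r² can flip sign.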

ttnphns
source
I wonder if this just comes down to something based on Jensen's inequality. That, and bivariate normality is probably a bad assumption in most cases.
shadowtalker
Also, my understanding of the issue in B. is that the regression R² is an overestimate because the regression fit can be improved arbitrarily by adding predictors. That doesn't sound to me like the same issue as in A.
shadowtalker
Is it actually true that r² is a positively biased estimate of ρ² for all values of ρ? For the bivariate normal distribution this does not seem to be the case for ρ large enough.
NRH
Can bias go in the opposite direction for the square of an estimator? For example, with a simpler estimator, can it be shown that $E[\hat\theta - \theta] < 0 < E[\hat\theta^2 - \theta^2]$ for some ranges of θ? I think this would be difficult to do if θ = ρ, but perhaps a simpler example could be worked out.
Anthony

Answers:


Regarding the bias in the correlation: when sample sizes are small enough for bias to have any practical significance (e.g., the n < 30 you suggested), then bias is likely to be the least of your worries, because the sheer inaccuracy (sampling variability) of the estimate is terrible.

Regarding the bias of R² in multiple regression, there are many different adjustments that pertain to unbiased population estimation vs. unbiased estimation in an independent sample of equal size. See Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of analytical methods. The Journal of Experimental Education, 69, 203-224.

Modern-day regression methods also address the shrinkage of regression coefficients, and of R² as a consequence; e.g., the elastic net with k-fold cross-validation, see http://web.stanford.edu/~hastie/Papers/elasticnet.pdf.

Fred Oswald
source
I don't know if this really answers the question
shadowtalker

I think the answer lies in the context of simple versus multiple regression. In simple regression with one IV and one DV, R² is not positively biased, and in fact may be negatively biased, given that r is negatively biased. But in multiple regression with several IVs that may themselves be correlated, R² may be positively biased because of any "suppression" that may be happening. Thus, my take is that the observed R² overestimates the corresponding population R², but only in multiple regression.
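The multiple-regression half of this claim is easy to illustrate with a sketch (my own code; n = 30 and k = 5 are arbitrary choices): with k predictors that are pure noise, the expected sample R² is about k/(n − 1) even though the population R² is exactly 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, reps = 30, 5, 5000

r2s = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, k))
    y = rng.standard_normal(n)          # y is independent of every predictor
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2s[i] = 1 - resid.var() / y.var()  # sample R^2 of the fitted model

# Mean sample R^2 is near k/(n-1) = 5/29, far above the true value 0.
print(r2s.mean())
```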

Dingus
source
"R sq is not positively biased, and in fact may be negatively biased": interesting. Can you show it or give a reference? In a bivariate normal population, can the observed sample R² statistic be a negatively biased estimator?
ttnphns
I think you are wrong. Could you give a reference to back up your claim?
Richard Hardy
Sorry, but this was more of a thought exercise, so I have no reference.
Dingus
I was going off of Comment A above, where Fisher showed that in a bivariate normal situation, r is a negatively biased estimator of ρ. If that is the case, would it not follow that R² is also negatively biased?
Dingus
Perhaps this will aid the conversation: digitalcommons.unf.edu/cgi/…
Dingus