What can be concluded about the data when the arithmetic mean is very close to the geometric mean?

Is there anything significant about a geometric mean and an arithmetic mean that fall very close to each other, say within ~0.1%? What conjectures can be made about such a data set?

I have been working on the analysis of a data set and noticed that, curiously, the two values are very, very close. Not exactly equal, but close. A quick sanity check against the arithmetic mean-geometric mean inequality, together with a review of the data acquisition, shows nothing suspicious about the integrity of my data set in terms of how I produced the values.

descriptive-statistics mean geometric-mean user12289

A small note: first check that all your data are positive. An even number of negative values can leave you with a positive product, and some packages may not flag the potential problem (the AM-GM inequality relies on the values all being positive). See, for example (in R): `x=c(-5,-5,1,2,3,10); prod(x)^(1/length(x))`

[1] 3.383363 (whereas the arithmetic mean is 1)

Glen_b -Reinstate Monica 28/06
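The same pitfall can be reproduced outside R. The following is a minimal Python sketch, not taken from the thread: `geometric_mean` is a hypothetical helper written here for illustration, and the choice to raise `ValueError` on non-positive input is an assumption about how one might want to guard the calculation.

```python
import math

def geometric_mean(values):
    """Geometric mean computed through logs; rejects non-positive input."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

x = [-5, -5, 1, 2, 3, 10]

# The naive product-then-root formula silently "works": the two negative
# values cancel in the product (1500), so the sixth root is a real number.
naive_gm = math.prod(x) ** (1 / len(x))
arithmetic_mean = sum(x) / len(x)
print(f"naive GM = {naive_gm:.6f}, AM = {arithmetic_mean}")

# The guarded version refuses the same data instead of returning a number
# that appears to violate the AM-GM inequality.
try:
    geometric_mean(x)
except ValueError as err:
    print("guarded GM refused the data:", err)
```

Working through logarithms rather than the raw product also avoids overflow for long data sets, which is one reason to prefer it even when all values are positive.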

To elaborate on @Glen_b's point, a data set $\{-x,0,x\}$ always has equal arithmetic and geometric means, namely zero. Yet we can spread the three values as far apart as we like.

hardmath

The arithmetic and geometric means share the same generalized formula, with $p=1$ giving the former and $p \rightarrow 0$ giving the latter. It then becomes intuitively clear that the two get closer and closer as the data values $x$ become more and more equal, approaching a constant.

ttnphns
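As an illustration of this comment, here is a small Python sketch of the generalized (power) mean $M_p = \left(\frac{1}{n}\sum_i x_i^p\right)^{1/p}$. The comment does not write the formula out, so this form, the function name `power_mean`, and the two sample data sets are assumptions made for the example.

```python
import math

def power_mean(values, p):
    """Generalized (power) mean M_p of strictly positive values.

    p = 1 is the arithmetic mean; p = 0 is taken as the limit p -> 0,
    which is the geometric mean.
    """
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of strictly positive values")
    n = len(values)
    if p == 0:
        return math.exp(math.fsum(math.log(v) for v in values) / n)
    return (math.fsum(v ** p for v in values) / n) ** (1 / p)

spread_out = [1, 2, 4]            # hypothetical data with large relative spread
nearly_equal = [2.00, 2.01, 2.02] # hypothetical data that is almost constant

for name, data in [("spread_out", spread_out), ("nearly_equal", nearly_equal)]:
    am = power_mean(data, 1)
    near_gm = power_mean(data, 1e-3)  # small p, already close to the limit
    gm = power_mean(data, 0)
    print(f"{name}: AM = {am:.6f}, M_0.001 = {near_gm:.6f}, GM = {gm:.6f}")
```

For the nearly constant data the arithmetic and geometric means almost coincide, while for the spread-out data they differ visibly.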

Answers:

The arithmetic mean is related to the geometric mean through the Arithmetic-Mean-Geometric-Mean (AMGM) inequality, which states that:

$\frac{x_1+x_2+\cdots+x_n} n \geq \sqrt[n]{x_1 x_2\cdots x_n},$

where equality is achieved if and only if $x_1=x_2=\cdots=x_n$. So your data points are probably all very close to each other.

Alex R.

This is right. Typically, the smaller the variation of the values, the closer the two means.

Michael M

The variation would have to be small BY COMPARISON with the sizes of the observations. Thus it is the coefficient of variation, $\sigma/\mu$, that would have to be small.

Michael Hardy
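A quick numerical check of this point, as a Python sketch: the two data sets below are hypothetical and chosen so that they have the same standard deviation but different means. The helper names `am_gm_ratio` and `coefficient_of_variation` are made up for this example, and the population standard deviation is used as an arbitrary choice.

```python
import math
import statistics

def am_gm_ratio(values):
    """Ratio of arithmetic mean to geometric mean for positive values."""
    values = list(values)
    am = statistics.fmean(values)
    gm = math.exp(statistics.fmean([math.log(v) for v in values]))
    return am / gm

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

small_values = [9, 10, 11]     # same absolute spread ...
large_values = [99, 100, 101]  # ... around a mean ten times larger

for name, data in [("small_values", small_values), ("large_values", large_values)]:
    print(
        f"{name}: sd = {statistics.pstdev(data):.4f}, "
        f"CV = {coefficient_of_variation(data):.5f}, "
        f"AM/GM - 1 = {am_gm_ratio(data) - 1:.2e}"
    )
```

Both sets have identical standard deviations, yet the AM/GM ratio is much closer to 1 for the set with the smaller coefficient of variation.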

Does AMGM stand for something? If so, it would be nice to have it spelled out.

Richard Hardy

@RichardHardy: AMGM stands for 'arithmetic mean - geometric mean'.

@user1108, thanks, I actually worked it out after reading the other posts. I just think it could be spelled out in the answer (not only in the comments).

Richard Hardy

Elaborating on @Alex R's answer, one way to see the AMGM inequality is as a Jensen's inequality effect. By Jensen's inequality:

$\log\left( \frac{1}{n} \sum_i x_i \right) \geq \frac{1}{n} \sum_i \log x_i$

Then take the exponential of both sides:

$\frac{1}{n} \sum_i x_i \geq \exp\left( \frac{1}{n} \sum_i \log x_i \right)$

The right hand side is the geometric mean since $\left(x_1 \cdot x_2 \cdot \ldots \cdot x_n \right)^{1/n} = \exp\left(\frac{1}{n} \sum_i \log x_i \right)$.

When does the AMGM inequality hold with near equality? When the Jensen's inequality effect is small. What drives the Jensen's inequality effect here is the concavity, the curvature of the logarithm. If your data is spread across an area where the logarithm has curvature, the effect will be big. If your data is spread across a region where the logarithm is basically affine, then the effect will be small.

For example, if the data has little variation and is clumped together in a sufficiently small neighborhood, then the logarithm will look like an affine function in that region (a theme of calculus is that if you zoom in enough on a smooth function, it will look like a line). For data sufficiently close together, the arithmetic mean of the data will be close to the geometric mean.

Matthew Gunn
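To put a rough number on "sufficiently close together", one can expand the logarithm to second order around the arithmetic mean. This is only a sketch: it assumes every $x_i$ lies close to the mean relative to its size, and the symbols $\mu$ (arithmetic mean) and $\sigma^2$ (population variance) are notation introduced here rather than part of the answer.

```latex
\begin{align*}
\log x_i &\approx \log\mu + \frac{x_i-\mu}{\mu} - \frac{(x_i-\mu)^2}{2\mu^2}, \\
\frac{1}{n}\sum_i \log x_i &\approx \log\mu - \frac{\sigma^2}{2\mu^2}, \\
\frac{\mathrm{AM}}{\mathrm{GM}} &\approx \exp\!\left(\frac{\sigma^2}{2\mu^2}\right)
  \approx 1 + \frac{1}{2}\left(\frac{\sigma}{\mu}\right)^{2}.
\end{align*}
```

Under that assumption, an AM/GM ratio of about $1.001$ corresponds to $\sigma/\mu \approx \sqrt{0.002} \approx 0.045$. The expansion says nothing about data containing a value far from the mean, where the second-order approximation of the logarithm no longer applies.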

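Let's find an upper bound on the range of the data $x_1\le x_2 \le \cdots \le x_n$ given that their arithmetic mean (AM) is a small multiple $1+\delta$ of their geometric mean (GM) (with $\delta \ge 0$). In the question, $\delta\approx 0.001$ but we don't know $n$.

Because the ratio of the two means does not change when the units of measurement change, choose units in which the GM is $1$. We then seek to maximize $x_n$ subject to the constraints $x_1+x_2+\cdots+x_n = n(1+\delta)$ and $x_1\cdot x_2\cdots x_n = 1$.

This is done by taking $x_1=x_2=\cdots=x_{n-1}=x$, say, and $x_n=z \ge x$. Thus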

$n(1+\delta) = x_1 + \cdots + x_n = (n-1)x + z$

and

$1 = x_1\cdot x_2 \cdots x_n = x^{n-1}z.$

The solution $x$ is a root between $0$ and $1$ of

$(1-n)x^n + n(1+\delta)x^{n-1} - 1.$

It is easily found iteratively. Here are the graphs of the optimal $x$ and $z$ as a function of $\delta$ for $n=6, 20, 50, 150$ , left to right:

As soon as $n$ reaches any appreciable size, even a tiny ratio of $1.001$ is consistent with one large outlying $x_n$ (the upper red curves) and a group of tightly clustered $x_i$ (the lower blue curves).
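The root can be located numerically in a few lines. The sketch below uses plain bisection on the interval $(0, 1)$, which is an assumption made for this example: the answer only says the root is "easily found iteratively" and does not name a method. The function name `most_spread_configuration` is likewise hypothetical.

```python
def most_spread_configuration(n, delta, iterations=200):
    """Return (x, z) with n - 1 values equal to x and one value z >= x,
    such that the geometric mean is 1 and the arithmetic mean is 1 + delta."""
    if n < 2 or delta <= 0:
        raise ValueError("need n >= 2 and delta > 0")

    def f(x):
        # Polynomial whose root in (0, 1) is sought.
        return (1 - n) * x**n + n * (1 + delta) * x**(n - 1) - 1

    # f(0) = -1 < 0 and f(1) = n * delta > 0, and f is increasing on (0, 1),
    # so bisection converges to the unique root in that interval.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    z = n * (1 + delta) - (n - 1) * x  # from the sum constraint
    return x, z

for n in (6, 20, 50, 150):
    x, z = most_spread_configuration(n, 0.001)
    print(f"n = {n:3d}: x = {x:.6f}, z = {z:.6f}, range = {z - x:.6f}")
```

Calling `most_spread_configuration(150, 0.002)` gives values that satisfy both constraints, $x^{149}z = 1$ and $149x + z = 150.3$.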

At the other extreme, suppose $n=2k$ is even (for simplicity). The minimum range is achieved when half the $x_i$ equal one value $x \le 1$ and the other half equal another value $z \ge 1$ . Now the solution (which is easily checked) is

$x^k = 1+\delta \pm \sqrt{\delta^2 + 2\delta}.$

For tiny $\delta$ , we may ignore the $\delta^2$ as an approximation and also approximate the $k^\text{th}$ root to first order, giving

$x \approx 1 + \frac{\delta-\sqrt{2\delta}}{k};\ z \approx 1 + \frac{\delta+\sqrt{2\delta}}{k}.$

The range is approximately $\sqrt{32\delta}/n$ .

In this manner we have obtained upper and lower bounds on the possible range of the data. We have learned that they depend heavily on the amount of data $n$ . The upper bound shows the range can be appreciable even for tiny $\delta$ , thereby improving our sense of just how close to each other the data points really need to be--and placing a lower limit on their range, too.

Similar analyses, just as easily carried out, can inform you--quantitatively--of how tightly clustered the $x_i$ might be in terms of any other measure of spread, such as their variance or coefficient of variation.

whuber

On the right of your right hand graph you seem to have $n=150, \delta=0.002, x\approx 0.9954, z \approx 1.983, k=75$. I do not see how these values are near your stated formulae approximations, which seem to give $x \approx 0.99918, z\approx 1.00087$. Perhaps I have misunderstood.

Henry

@Henry I don't know how you came up with those numbers. When $n=150$, the requirements are that $x^{149} z=1$ and $149x + z=150(1.002)=150.3$. Neither of those comes close to being true for the values you supply. When you plug in $x=0.995416$ and $z=1.98308$, you get the correct values.

whuber

I tried what looks to me like your $z \approx 1 + \dfrac{\delta+\sqrt{2\delta}}{k} = 1+\dfrac{0.002+\sqrt{2\times 0.002} }{75} \approx 1.00087$ and similarly for $x$. But now I see this is answering a different question.

Henry

@Henry That solves a different problem: those are the values that give a minimum range. I did not post graphs for those. Indeed, with your $x$ and $z$ we have $75x+75z\approx 150.3$ and $x^{75}z^{75}\approx 1$, as required.

whuber