Because the probability element of $X$ is $f(x)\,dx$, the change of variable $y = x\sigma + \mu$ is equivalent to $x = (y-\mu)/\sigma$, whence

$$f(x)\,dx = f\left(\frac{y-\mu}{\sigma}\right)\,d\left(\frac{y-\mu}{\sigma}\right) = \frac{1}{\sigma} f\left(\frac{y-\mu}{\sigma}\right)\,dy.$$

It follows that the density of $Y$ is

$$f_Y(y) = \frac{1}{\sigma} f\left(\frac{y-\mu}{\sigma}\right).$$

Consequently, the entropy of $Y$ is

$$H(Y) = -\int_{-\infty}^{\infty} \log\left(\frac{1}{\sigma} f\left(\frac{y-\mu}{\sigma}\right)\right) \frac{1}{\sigma} f\left(\frac{y-\mu}{\sigma}\right)\,dy$$

which, upon changing the variable back to $x = (y-\mu)/\sigma$, produces

$$\begin{aligned}
H(Y) &= -\int_{-\infty}^{\infty} \log\left(\frac{1}{\sigma} f(x)\right) f(x)\,dx \\
&= -\int_{-\infty}^{\infty} \left(\log\left(\frac{1}{\sigma}\right) + \log(f(x))\right) f(x)\,dx \\
&= \log(\sigma) \int_{-\infty}^{\infty} f(x)\,dx - \int_{-\infty}^{\infty} \log(f(x))\, f(x)\,dx \\
&= \log(\sigma) + H_f.
\end{aligned}$$
These calculations used basic properties of the logarithm, the linearity of integration, and the fact that $f(x)\,dx$ integrates to unity (the Law of Total Probability).
The conclusion is
The entropy of $Y = X\sigma + \mu$ is the entropy of $X$ plus $\log(\sigma)$.

In words, shifting a random variable does not change its entropy (we may think of the entropy as depending on the values of the probability density, but not on where those values occur), while scaling a variable (which, for $\sigma \ge 1$, "stretches" or "smears" it out) increases its entropy by $\log(\sigma)$. This supports the intuition that high-entropy distributions are "more spread out" than low-entropy distributions.
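As a quick sanity check of this scaling rule, here is a minimal numerical sketch (assuming SciPy is available; the choice of an Exponential base distribution is arbitrary) comparing the entropy of a shifted and scaled variable with the entropy of the original plus $\log(\sigma)$:

```python
# Minimal numerical check of H(sigma*X + mu) = H(X) + log(sigma).
# Assumes SciPy; the Exponential(1) base distribution is an arbitrary choice.
import numpy as np
from scipy import stats

mu, sigma = 5.0, 3.0

base = stats.expon()                               # X with density f; H_f = 1
shifted_scaled = stats.expon(loc=mu, scale=sigma)  # Y = sigma*X + mu

print(base.entropy())                    # 1.0
print(shifted_scaled.entropy())          # 1.0 + log(3) ≈ 2.0986
print(base.entropy() + np.log(sigma))    # matches the line above
```

Note that the `loc` shift has no effect on the reported entropy, while the `scale` factor contributes exactly $\log(\sigma)$.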
As a consequence of this result, we are free to choose convenient values of $\mu$ and $\sigma$ when computing the entropy of any distribution. For example, the entropy of a Normal$(\mu,\sigma)$ distribution can be found by setting $\mu=0$ and $\sigma=1$. The logarithm of the density in this case is

$$\log(f(x)) = -\frac{1}{2}\log(2\pi) - x^2/2,$$

whence, because $E[X^2] = 1$ for a standard Normal variable,

$$H = -E\left[-\frac{1}{2}\log(2\pi) - X^2/2\right] = \frac{1}{2}\log(2\pi) + \frac{1}{2}.$$
Consequently the entropy of a Normal$(\mu,\sigma)$ distribution is obtained simply by adding $\log\sigma$ to this result, giving

$$H = \frac{1}{2}\log(2\pi) + \frac{1}{2} + \log(\sigma) = \frac{1}{2}\log\left(2\pi e \sigma^2\right)$$
as reported by Wikipedia.
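For what it's worth, this closed form is easy to cross-check numerically; here is a small sketch (again assuming SciPy, with arbitrary test values of $\mu$ and $\sigma$):

```python
# Cross-check of H = (1/2) log(2*pi*e*sigma^2) against SciPy's built-in entropy.
# Assumes SciPy; mu and sigma are arbitrary test values.
import numpy as np
from scipy import stats

mu, sigma = 2.0, 1.7

closed_form = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(closed_form)                                # ≈ 1.9496
print(stats.norm(loc=mu, scale=sigma).entropy())  # same value; mu plays no role
```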