Given two histograms, how do we assess whether they are similar or not?
Is it enough to simply look at the two histograms? A simple one-to-one bin mapping has the problem that if one histogram is slightly different or slightly shifted, we will not get the desired result.
Any suggestions?
histogram
image-processing
Answers:
A recent paper that may be worth reading is:
Cao, Y.; Petzold, L. Accuracy limitations and the measurement of errors in the stochastic simulation of chemically reacting systems, 2006.
Although the focus of this paper is on comparing stochastic simulation algorithms, the main idea is essentially how to compare two histograms.
You can access the PDF from the author's page.
There are several distance measures between two histograms. You can read a good categorization of these measures in:
The most popular distance functions are listed here for your convenience:
A Matlab implementation of some of these distances is available from my GitHub repository: https://github.com/meshgi/Histogram_of_Color_Advancements/tree/master/distance You can also search for authors such as Yossi Rubner, Ofir Pele, Marco Cuturi, and Haibin Ling for more state-of-the-art distances.
Update: Alternative explanations of these distances appear here and there in the literature, so I list them here for the sake of completeness.
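The original list of distance formulas did not survive here, so the following is a sketch of a few of the most common bin-to-bin measures (chi-squared, intersection, Bhattacharyya) in NumPy; these are standard definitions, not necessarily the exact ones from the answer's list.

```python
import numpy as np

def normalize(h):
    """Scale a histogram so its bins sum to 1."""
    h = np.asarray(h, dtype=float)
    return h / h.sum()

def chi_squared(h1, h2, eps=1e-10):
    """Chi-squared distance: 0.5 * sum of (p - q)^2 / (p + q)."""
    p, q = normalize(h1), normalize(h2)
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima (1 = identical)."""
    return np.sum(np.minimum(normalize(h1), normalize(h2)))

def bhattacharyya(h1, h2):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    bc = np.sum(np.sqrt(normalize(h1) * normalize(h2)))
    return -np.log(max(bc, 1e-10))

h1 = [10, 20, 30, 40]
h2 = [12, 18, 33, 37]
print(chi_squared(h1, h2), intersection(h1, h2), bhattacharyya(h1, h2))
```

Note that all of these compare histograms bin by bin, so they share the sensitivity to small shifts mentioned in the question; cross-bin measures such as the Earth Mover's Distance are more robust to that.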
The standard answer to this question is the chi-squared test. The KS test is for unbinned data, not binned data. (If you have the unbinned data, then by all means use a KS-style test, but if you only have the histogram, the KS test is not appropriate.)
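To illustrate the chi-squared approach on binned data, one option (an assumption on my part, not spelled out in the answer) is to treat the two histograms as a 2 × k contingency table and run a test of homogeneity with SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Two histograms over the same bins (raw counts, not normalized --
# the chi-squared test needs the actual counts).
hist1 = np.array([22, 48, 76, 53, 21])
hist2 = np.array([19, 52, 71, 58, 25])

# Stack the histograms into a 2 x k contingency table and test for
# homogeneity: were both samples drawn from the same distribution?
chi2, p, dof, expected = chi2_contingency(np.array([hist1, hist2]))
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```

A large p-value means the binned counts are consistent with a common underlying distribution.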
You're looking for the Kolmogorov-Smirnov test. Don't forget to divide the bar heights by the sum of all observations of each histogram.
Note that the KS-test is also reporting a difference if e.g. the means of the distributions are shifted relative to one another. If translation of the histogram along the x-axis is not meaningful in your application, you may want to subtract the mean from each histogram first.
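As a sketch of both points above (and bearing in mind the later answers' caveat that the KS test assumes continuous, unbinned data), here is a hypothetical example using SciPy's two-sample KS test on raw samples, with and without subtracting the mean:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=500)
b = rng.normal(loc=0.5, scale=1.0, size=500)  # same shape, shifted mean

# Plain KS test: sensitive to the shift between the two means.
stat_raw, p_raw = ks_2samp(a, b)

# Centering each sample removes the translation along the x-axis,
# so only differences in shape remain.
stat_centered, p_centered = ks_2samp(a - a.mean(), b - b.mean())

print(f"raw:      D = {stat_raw:.3f}, p = {p_raw:.4f}")
print(f"centered: D = {stat_centered:.3f}, p = {p_centered:.4f}")
```

The raw test rejects because of the shift alone, while the centered test does not, since the two shapes are identical.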
As David's answer points out, the chi-squared test is necessary for binned data as the KS test assumes continuous distributions. Regarding why the KS test is inappropriate (naught101's comment), there has been some discussion of the issue in the applied statistics literature that is worth raising here.
An amusing exchange began with the claim (García-Berthou and Alcaraz, 2004) that one third of Nature papers contain statistical errors. However, a subsequent paper (Jeng, 2006, "Error in statistical tests of error in statistical tests" -- perhaps my all-time favorite paper title) showed that García-Berthou and Alcaraz used KS tests on discrete data, leading to their reporting inaccurate p-values in their meta-study. The Jeng (2006) paper provides a nice discussion of the issue, even showing that one can modify the KS test to work for discrete data. In this specific case, the distinction boils down to the difference between a discrete uniform distribution of the trailing digit on {0, 1, ..., 9} and a continuous uniform distribution on [0, 9].
You can compute the cross-correlation (or convolution) between the two histograms. That will take slight translations into account.
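One way to sketch this idea (the helper name and normalization are my own assumptions): normalize both histograms, cross-correlate them at every lag, and take the maximum, so a histogram shifted by a bin or two still scores close to a perfect match.

```python
import numpy as np

def _unit(h):
    """Center a histogram and scale it to unit norm."""
    h = np.asarray(h, dtype=float)
    h = h - h.mean()
    n = np.linalg.norm(h)
    return h / n if n > 0 else h

def shift_tolerant_similarity(h1, h2):
    """Maximum normalized cross-correlation over all lags (1 = identical,
    and still high when one histogram is a slightly shifted copy)."""
    return float(np.correlate(_unit(h1), _unit(h2), mode="full").max())

base    = [0, 1, 4, 9, 4, 1, 0, 0]
shifted = [0, 0, 1, 4, 9, 4, 1, 0]  # same shape, moved one bin to the right
print(shift_tolerant_similarity(base, shifted))  # high despite the shift
```

A plain bin-by-bin distance would penalize the one-bin shift heavily, while the maximum over lags finds the alignment where the two shapes coincide.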