News reports say that CERN will announce tomorrow that the Higgs boson has been experimentally detected with 5σ evidence. According to that article:
5σ equates to a 99.99994% chance that the data the CMS and ATLAS detectors are seeing are not just random noise, and a 0.00006% chance that they have been hoodwinked; 5σ is the certainty required for something to be officially labelled a scientific "discovery".
This isn't super rigorous, but it seems to say that physicists use the standard "hypothesis testing" statistical methodology, setting α to 0.0000006, which corresponds to z = 5 (two-tailed)? Or is there some other meaning?
In much of science, of course, setting alpha to 0.05 is done routinely. This would be equivalent to "two-σ" evidence, although I have never heard it called that. Are there other fields (besides particle physics) where a much more stringent definition of alpha is standard? Does anyone know of a reference for how the five-σ rule came to be accepted in particle physics?
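As a rough check of the numbers above, the σ-to-p conversion can be sketched with the standard normal tail. This is a minimal illustration, assuming a Gaussian test statistic; the function names are made up for this example. Note that the quoted 0.00006% matches the two-tailed value, whereas particle physicists conventionally quote the one-sided tail, which is half as large.

```python
import math

def two_tailed_p(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

def one_tailed_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Compare the usual alpha = 0.05 cutoff (z = 1.96) with 2, 3, 4 and 5 sigma.
for z in (1.96, 2, 3, 4, 5):
    print(f"{z} sigma: one-tailed p = {one_tailed_p(z):.3g}, "
          f"two-tailed p = {two_tailed_p(z):.3g}")
```

Under these assumptions, "two-σ" is close to, but slightly stricter than, the two-tailed 0.05 cutoff, which sits at z = 1.96.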
Update: I am asking this question for a simple reason. My book Intuitive Biostatistics (like most statistics books) has a section explaining how arbitrary the usual "P < 0.05" rule is. I would like to add this example of a scientific field where a much (much!) smaller value of α is considered necessary. But if the example is actually more complicated, involving Bayesian methods (as some comments below suggest), it would not be very suitable or would require much more explanation.
Answers:
In most applications of statistics there is that old chestnut about 'all models are wrong, some are useful'. This being the case, we would only expect a model to perform at a given level, since we are describing some incredibly complicated process using a simple model.
Physics is very different, so intuition developed from statistical models isn't so appropriate. In physics, and in particular particle physics, which deals directly with fundamental physical laws, the model really is supposed to be an exact description of reality. Any departure from what the model predicts must be completely explained by experimental noise, not by a limitation of the model. This means that if the model is correct and the experimental apparatus is understood, the statistical significance should be very high, hence the high bar that is set.
The other reason is historical: the particle physics community has been burned in the past by 'discoveries' at lower significance levels that were later retracted, so it is generally more cautious now.
History and origin
According to Robert D. Cousins [1] and Tommaso Dorigo [2], the 5σ threshold originates in the particle physics work of the 1960s, when numerous histograms from scattering experiments were searched for peaks or bumps that might indicate a newly discovered particle. The threshold is a rough rule to account for the multiple comparisons being made.
Both authors refer to a 1968 article by Rosenfeld [3], which dealt with the question of whether or not there are far-out mesons and baryons, for which several 4σ effects were measured. The article answered the question negatively by arguing that the number of published claims corresponds to the statistically expected number of fluctuations. Along with several calculations supporting this argument, the article promoted the use of the 5σ level:
and later in the paper (emphasis is mine)
Tommaso seems to be careful in stating that it started with the Rosenfeld article
But by the 80s the use of 5σ had become widespread. For instance, the astronomer Steve Schneider [4] mentions in 1989 that it is something being taught (emphasis mine in the quote below):
Yet in particle physics many publications were still based on 4σ discrepancies up until the late 90s. This only changed to 5σ at the beginning of the 21st century. It was probably prescribed as a guideline for publications around 2003 (see the prologue in Franklin's book Shifting Standards [5]).
Modern use
Currently, the 5σ threshold is a textbook standard. For instance, it is the subject of an article on physics.org [6] and appears in some of Glen Cowan's works, such as the statistics section of the Review of Particle Physics from the Particle Data Group [7] (albeit with several critical side notes).
The use of the 5σ level is now ascribed to four reasons:
History: based on practice, 5σ was found to be a good threshold. (Exotic effects seem to appear at random even between 3σ and 4σ, such as the recent 750 GeV diphoton excess.)
The look-elsewhere effect (or multiple comparisons): either because multiple hypotheses are tested, or because experiments are performed many times, people adjust for this (very roughly) by raising the bound to 5σ. This relates to the history argument.
Systematic effects and uncertainty in σ: often the uncertainty of the experimental outcome is not well known. The σ is derived, but the derivation includes weak assumptions such as the absence of systematic effects, or the possibility of ignoring them. Raising the threshold seems to be a way to protect against such effects. (This is a bit strange, though. The computed σ has no relation to the size of systematic effects, so the logic breaks down; an example is the "discovery" of superluminal neutrinos, which was reported with a 6σ significance.)
Extraordinary claims require extraordinary evidence: scientific results are reported in a frequentist way, for instance using confidence intervals or p-values, but they are often interpreted in a Bayesian way. The 5σ level is claimed to account for this.
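The look-elsewhere argument in the list above (and Rosenfeld's point about the expected number of fluctuations) can be sketched numerically. This is only an illustration under strong assumptions: the "looks" are treated as independent, the test statistic as Gaussian and one-sided, and N_LOOKS = 1000 is a hypothetical number chosen for the example, not a figure from any of the cited papers.

```python
import math
from statistics import NormalDist

def one_tailed_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def expected_false_alarms(local_z, n_looks):
    """Expected number of pure-noise fluctuations at or above local_z."""
    return n_looks * one_tailed_p(local_z)

def global_p(local_z, n_looks):
    """Chance of at least one such fluctuation in n_looks independent looks.

    Computes 1 - (1 - p)^n with log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(n_looks * math.log1p(-one_tailed_p(local_z)))

def global_z(local_z, n_looks):
    """Global p-value converted back to a one-sided z-score."""
    return -NormalDist().inv_cdf(global_p(local_z, n_looks))

N_LOOKS = 1000  # hypothetical number of independent places searched

for z in (3, 4, 5):
    # A negative global z means a fluctuation that size is more likely than not.
    print(f"local {z} sigma: expected false alarms = "
          f"{expected_false_alarms(z, N_LOOKS):.3g}, "
          f"global p = {global_p(z, N_LOOKS):.3g}, "
          f"global z = {global_z(z, N_LOOKS):.2f}")
```

With these made-up inputs, a local 3σ bump is expected by chance more than once, while a local 5σ effect still corresponds to somewhat more than 3σ globally, which is the rough sense in which the higher bar compensates for searching in many places.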
Several criticisms of the 5σ threshold have been written by Louis Lyons [8, 9], and the earlier mentioned articles by Robert D. Cousins [1] and Tommaso Dorigo [2] also provide critique.
Other Fields
It is interesting to note that many other scientific fields do not have similar thresholds, or otherwise do not deal with the issue. I imagine this makes some sense for experiments with humans, where it is very costly (or impossible) to extend an experiment that gave a 0.05 or 0.01 significance.
The result of not taking these effects into account is that over half of published results may be wrong, or at least not reproducible. (This has been argued for psychology by Monya Baker [10], and I believe many others have made similar arguments. I personally think the situation may be even worse in nutritional science.) Now people in fields other than physics are thinking about how they should deal with this issue (for the case of medicine/pharmacology, see [11]).
[1] Cousins, R. D. (2017). The Jeffreys–Lindley paradox and discovery criteria in high energy physics. Synthese, 194(2), 395-432.
[2] Dorigo, T. (2013). Demystifying The Five-Sigma Criterion. Retrieved from science20.com, 2019-03-07.
[3] Rosenfeld, A. H. (1968). Are there any far-out mesons or baryons? Web source: escholarship.
[4] Burbidge, G., Roberts, M., Schneider, S., Sharp, N., & Tifft, W. (1990, November). Panel discussion: Redshift related problems. In NASA Conference Publication (Vol. 3098, p. 462). Photocopy on harvard.edu.
[5] Franklin, A. (2013). Shifting standards: Experiments in particle physics in the twentieth century. University of Pittsburgh Press.
[6] What does the 5 sigma mean? Retrieved from physics.org, 2019-03-07.
[7] Beringer, J., Arguin, J. F., Barnett, R. M., Copic, K., Dahl, O., Groom, D. E., ... & Yao, W. M. (2012). Review of particle physics. Physical Review D-Particles, Fields, Gravitation and Cosmology, 86(1), 010001. (Section 36.2.2, Significance tests, page 394; aps.org.)
[8] Lyons, L. (2013). Discovering the Significance of 5 sigma. arXiv preprint arXiv:1310.1284.
[9] Lyons, L. (2014). Statistical Issues in Searches for New Physics. arXiv preprint.
[10] Baker, M. (2015). Over half of psychology studies fail reproducibility test. Nature News. Retrieved from nature.com, 2019-03-07.
[11] Horton, R. (2015). Offline: what is medicine's 5 sigma? The Lancet, 385(9976), 1380. Retrieved from thelancet.com, 2019-03-07.
For a reason entirely different from that of physics, there are other fields with much stricter alphas when they engage in hypothesis testing. Genetic epidemiology is among them, especially when it uses a "GWAS" (Genome-Wide Association Study) to look at various genetic markers for disease.
Because a GWAS is a massive exercise in multiple hypothesis testing, the state-of-the-art analysis techniques are all built around alphas much stricter than 0.05. Other such "candidate screening" study techniques that follow in the wake of the genomics studies will likely do the same.
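To give a feel for how strict such an alpha gets, here is a minimal sketch using a plain Bonferroni correction. It assumes the commonly quoted convention of roughly one million independent tests across the genome (N_TESTS below is that assumption, not a property of any particular study), and the function names are made up for this example; real GWAS pipelines may use other corrections.

```python
from statistics import NormalDist

def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

def two_tailed_z(alpha):
    """|z| whose two-tailed standard normal p-value equals alpha."""
    return -NormalDist().inv_cdf(alpha / 2)

N_TESTS = 1_000_000  # assumed number of independent tests

per_test_alpha = bonferroni_alpha(0.05, N_TESTS)
print(f"per-test alpha = {per_test_alpha:.1e}")
print(f"equivalent two-tailed z = {two_tailed_z(per_test_alpha):.2f}")
```

Under these assumptions the per-test threshold lands in the same general range as the physicists' 5σ, even though it is reached by an explicit multiple-testing correction rather than by convention.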
The level is so high to avoid premature announcements of news that later turns out to be spurious. For more discussion on this, see
https://physics.stackexchange.com/questions/8752/standard-deviation-in-particle-physics?rq=1
https://physics.stackexchange.com/questions/31126/how-many-sigma-did-the-discovery-of-the-w-boson-have