" Como não classificar pela classificação média " de Evan Miller propõe usar o limite inferior de um intervalo de confiança para obter uma "pontuação" agregada sensata para os itens classificados. No entanto, está trabalhando com um modelo de Bernoulli: as classificações são positivas ou negativas.
What is a reasonable confidence interval to use for a rating model that assigns a discrete score of 1 to k stars, assuming that the number of ratings for an item may be small?
I think I can see how to adapt the centre of the Wilson and Agresti-Coull intervals as
where either or (probably better) is the average rating over all items. However, I'm not sure how to adapt the width of the interval. My (revised) best guess would be
with , but I can't justify with more than hand-waving it as an analogy of Agresti-Coull, taking that as
Are there standard confidence intervals which apply? (Note that I don't have subscriptions to any journals or easy access to a university library; by all means give proper references, but please supplement with the actual result!)
Like Karl Broman said in his answer, a Bayesian approach would likely be a lot better than using confidence intervals.
The Problem With Confidence Intervals
Why might using confidence intervals not work too well? One reason is that if you don't have many ratings for an item, then your confidence interval is going to be very wide, so the lower bound of the confidence interval will be small. Thus, items without many ratings will end up at the bottom of your list.
Intuitively, however, you probably want items without many ratings to be near the average item, so you want to wiggle your estimated rating of the item toward the mean rating over all items (i.e., you want to push your estimated rating toward a prior). This is exactly what a Bayesian approach does.
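To make the problem concrete, here is a minimal sketch of the lower bound of the Wilson score interval in the Bernoulli (positive/negative) setting that Evan Miller's article uses. The function name `wilson_lower_bound` and the default `z = 1.96` (roughly a 95% interval) are assumptions chosen for illustration, not something taken from the question.

```python
import math

def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    z = 1.96 is an assumed default (about a 95% two-sided interval).
    """
    if total == 0:
        return 0.0  # no ratings at all: the bound carries no information
    p_hat = positive / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total
                           + z * z / (4 * total * total))
    return (centre - margin) / denom

few = wilson_lower_bound(2, 2)       # 2 ratings, both positive
many = wilson_lower_bound(80, 100)   # 100 ratings, 80 positive
print(few, many)
```

The item with two positive ratings out of two gets a much lower bound than the item with 80 out of 100, even though its observed proportion is higher. That is the behaviour described above: sparsely rated items sink toward the bottom of the list.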
Bayesian Approach I: Normal Distribution over Ratings
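One way of moving the estimated rating toward a prior is, as in Karl's answer, to use an estimate of the form $w R + (1 - w) C$, where $R$ is the item's own mean rating, $C$ is the value you shrink toward (for example, the mean rating over all items), and $w$ is a weight between 0 and 1.
This estimate can, in fact, be given a Bayesian interpretation as the posterior estimate of the item's mean rating when individual ratings come from a normal distribution centered around that mean.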
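As a rough sketch of this weighted estimate: the function name `shrunk_rating`, the weight rule `w = n / (n + m)`, and the threshold `m` are all assumptions for illustration, since the text above does not fix how the weight is chosen. Under a normal model with a normal prior on the item's mean, a weight of this shape does arise, with `m` playing the role of the ratio of rating variance to prior variance.

```python
def shrunk_rating(item_mean: float, n_ratings: int,
                  prior_mean: float, m: float = 5.0) -> float:
    """Weighted estimate w*R + (1 - w)*C.

    R = item_mean, C = prior_mean.
    The weight w = n / (n + m) is an assumed choice: m acts like a number
    of "pseudo-ratings" that all sit at the prior mean.
    """
    w = n_ratings / (n_ratings + m)
    return w * item_mean + (1 - w) * prior_mean

# One 5-star rating barely moves the estimate away from the prior mean of 3;
# a hundred ratings averaging 5 stars moves it most of the way.
one_rating = shrunk_rating(5.0, 1, 3.0)
hundred_ratings = shrunk_rating(5.0, 100, 3.0)
print(one_rating, hundred_ratings)
```

With no ratings at all, `w` is 0 and the estimate is exactly the prior mean, which is the "near the average item" behaviour asked for earlier.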
However, assuming that ratings come from a normal distribution has two problems:
Bayesian Approach II: Multinomial Distribution over Ratings
So instead of assuming a normal distribution for ratings, let's assume a multinomial distribution. That is, given some specific item, there's a probability $p_1$ that a random user will give it 1 star, a probability $p_2$ that a random user will give it 2 stars, and so on.
Of course, we have no idea what these probabilities are. As we get more and more ratings for this item, we can guess that $p_1$ is close to $n_1/n$, where $n_1$ is the number of users who gave it 1 star and $n$ is the total number of users who rated the item, but when we first start out, we have nothing. So we place a Dirichlet prior $\mathrm{Dir}(\alpha_1, \dots, \alpha_k)$ on these probabilities.
What is this Dirichlet prior? We can think of each $\alpha_i$ parameter as being a "virtual count" of the number of times some virtual person gave the item $i$ stars. For example, if $\alpha_1 = 2$, $\alpha_2 = 1$, and all the other $\alpha_i$ are equal to 0, then we can think of this as saying that two virtual people gave the item 1 star and one virtual person gave the item 2 stars. So before we even get any actual users, we can use this virtual distribution to provide an estimate of the item's rating.
[One way of choosing the $\alpha_i$ parameters would be to set $\alpha_i$ equal to the overall proportion of votes of $i$ stars. (Note that the $\alpha_i$ parameters aren't necessarily integers.)]
Then, once actual ratings come in, simply add their counts to the virtual counts of your Dirichlet prior. Whenever you want to estimate the rating of your item, simply take the mean over all of the item's ratings (both its virtual ratings and its actual ratings).
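The update described above can be sketched in a few lines. The function name `dirichlet_mean_rating` and the list layout (index `i` holds the count of `i + 1` star ratings) are assumptions for illustration; the prior is the two-virtual-1-star, one-virtual-2-star example from the text, on an assumed 5-star scale.

```python
def dirichlet_mean_rating(counts, prior):
    """Mean star rating over actual counts plus virtual (prior) counts.

    counts[i] and prior[i] are the actual and virtual numbers of
    (i + 1)-star ratings. Prior entries need not be integers.
    """
    if len(counts) != len(prior):
        raise ValueError("counts and prior must cover the same star levels")
    combined = [c + a for c, a in zip(counts, prior)]
    total = sum(combined)
    if total == 0:
        raise ValueError("need at least one actual or virtual rating")
    return sum((i + 1) * weight for i, weight in enumerate(combined)) / total

prior = [2, 1, 0, 0, 0]                    # virtual counts from the example above
no_data = dirichlet_mean_rating([0, 0, 0, 0, 0], prior)
three_fives = dirichlet_mean_rating([0, 0, 0, 0, 3], prior)
print(no_data, three_fives)
```

Before any real ratings arrive, the estimate is the prior's own mean, (2·1 + 1·2)/3 = 4/3. After three real 5-star ratings it becomes (2·1 + 1·2 + 3·5)/6 = 19/6, so the real data pulls the estimate up while the virtual counts still hold it back.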
This situation cries out for a Bayesian approach. There are simple approaches for Bayesian rankings of ratings here (pay particular attention to the comments, which are interesting) and here, and then a further commentary on these here. As one of the comments in the first of these links points out: