Portanto, essa pergunta é um pouco confusa, mas incluirei gráficos coloridos para compensar isso! Primeiro os antecedentes e depois as perguntas.
fundo
Digamos que você tenha uma distribuição multinomial dimensional com probailites iguais nas categorias. Seja as contagens normalizadas ( ) dessa distribuição, ou seja:
Now the distribution over has support over the -simplex but with discrete steps. For example, with this distribution have the following support (the red dots):
Another distribution with similar support is the -dimensional distribution, that is, a uniform distribution over the unit simplex. For example, here are random draws from a 3-dimesional :
Now I had the idea that the distribution of from the distribution could be characterized as draws from a that are discretized to the discrete support of . The discretization I had in mind (and that seems to work well) is to take each point in the simplex and "round it off" to the closest point that is in the support of . For the 3-dimensional simplex you get the following partition where points in each coloured area should "round off" to the closest red point:
Since the Dirichlet distribution is uniform, the resulting density/probability for each of the points is proportional to the area/volume that gets "rounded off" to each point. For the two dimensional and three dimensional cases these probabilities are:
(these probabilities are from Monte Carlo simulations)
So it seems like, at least for 2 and 3 dimensions, the resulting probability distribution from discretizing in this particular way is the same as the probability distribution for . That is the normalized outcome of a distribution. I've also tried with 4-dimensions and it seems to work there to.
Question(s)
So my main question is:
When discretizing a uniform Dirichlet in this particular way, does the relation with a hold for further dimensions? Does the relation hold at all? (I've only tried this using Monte Carlo simulation...)
Further I wonder:
- If this relation holds, is it a known result? And is there some source I can cite for this?
- If this discretization of a uniform Dirichlet doesn't have this relation with the Multinomial. Is there some similar construction that has?
Some Context
My reason for asking this question is that I'm looking at the similarity between the non-parametric Bootstrap and the Bayesian Bootstrap, and then this came up. I have also noticed that the pattern on the coloured areas on the 3-dimesional simplex above looks like (and should be) a Voronoi diagram. One way (I hope) you can think about this is as a sequence of Pascal's Triangle/Simpex (http://www.math.rutgers.edu/~erowland/pascalssimplices.html). Where the size of the colored areas follow the second row of Pascal's' triangle in the 2-d case, the third row of Pascal's tetrahedron in the 3-d case, and so on. This would explain the connection with the multinomial distribution, but here I'm really in deep water...
fonte
Respostas:
Those the two distributions are different for everyn≥4 .
Notation
I'm going to rescale your simplex by a factorn , so that the lattice points have integer coordinates. This doesn't change anything, I just think it makes the notation a little less cumbersome.
LetS be the (n−1) -simplex, given as the convex hull of the points (n,0,…,0) , ..., (0,…,0,n) in Rn . In other words, these are the points where all coordinates are non-negative, and where the coordinates sum to n .
LetΛ denote the set of lattice points, i.e. those points in S where all coordinates are integral.
IfP is a lattice point, we let VP denote its Voronoi cell, defined as those points in S which are (strictly) closer to P than to any other point in Λ .
We put two probability distributions we can put onΛ . One is the multinomial distribution, where the point (a1,...,an) has the probability 2−nn!/(a1!⋯an!) . The other we will call the Dirichlet model, and it assigns to each P∈Λ a probability proportional to the volume of VP .
Very informal justification
I'm claiming that the multinomial model and the Dirichlet model give different distributions onΛ , whenever n≥4 .
To see this, consider the casen=4 , and the points A=(2,2,0,0) and B=(3,1,0,0) . I claim that VA and VB are congruent via a translation by the vector (1,−1,0,0) . This means that VA and VB have the same volume, and thus that A and B have the same probability in the Dirichlet model. On the other hand, in the multinomial model, they have different probabilities (2−4⋅4!/(2!2!) and 2−4⋅4!/3! ), and it follows that the distributions cannot be equal.
The fact thatVA and VB are congruent follows from the following plausible but non-obvious (and somewhat vague) claim:
Plausible Claim: The shape and size ofVP is only affected by the "immediate neighbors" of P , (i.e. those points in Λ which differ from P by a vector that looks like (1,−1,0,…,0) , where the 1 and −1 may be in other places)
It's easy to see that the configurations of "immediate neighbors" ofA and B are the same, and it then follows that VA and VB are congruent.
In casen≥5 , we can play the same game, with A=(2,2,n−4,0,…,0) and B=(3,1,n−4,0,…,0) , for example.
I don't think this claim is completely obvious, and I'm not going to prove it, instead of a slightly different strategy. However, I think this is a more intuitive answer to why the distributions are different forn≥4 .
Rigorous proof
TakeA and B as in the informal justification above. We only need to prove that VA and VB are congruent.
GivenP=(p1,…,pn)∈Λ , we will define WP as follows: WP is the set of points (x1,…,xn)∈S , for which max1≤i≤n(ai−pi)−min1≤i≤n(ai−pi)<1 . (In a more digestible manner: Let vi=ai−pi . WP is the set of points for which the difference between the highest and lowest vi is less than 1.)
We will show thatVP=WP .
Step 1
Claim:VP⊆WP .
This is fairly easy: Suppose thatX=(x1,…,xn) is not in WP . Let vi=xi−pi , and assume (without loss of generality) that v1=max1≤i≤nvi , v2=min1≤i≤nvi . v1−v2≥1 Since ∑ni=1vi=0 , we also know that v1>0>v2 .
Let nowQ=(p1+1,p2−1,p3,…,pn) . Since P and X both have non-negative coordinates, so does Q , and it follows that Q∈S , and so Q∈Λ . On the other hand, dist2(X,P)−dist2(X,Q)=v21+v22−(1−v1)2−(1+v2)2=−2+2(v1−v2)≥0 . Thus, X is at least as close to Q as to P , so X∉VP . This shows (by taking complements) that Vp⊆WP .
Step 2
Claim: TheWP are pairwise disjoint.
Suppose otherwise. LetP=(p1,…,pn) and Q=(q1,…,qn) be distinct points in Λ , and let X∈WP∩WQ . Since P and Q are distinct and both in Λ , there must be one index i where pi≥qi+1 , and one where pi≤qi−1 . Without loss of generality, we assume that p1≥q1+1 , and p2≤q2−1 . Rearranging and adding together, we get q1−p1+p2−q2≥2 .
Consider now the numbersx1 and x2 . From the fact that X∈WP , we have x1−p1−(x2−p2)<1 . Similarly, X∈WQ implies that x2−q2−(x1−q1)<1 . Adding these together, we get q1−p1+p2−q2<2 , and we have a contradiction.
Step 3
We have shown thatVP⊆WP , and that the WP are disjoint. The VP cover S up to a set of measure zero, and it follows that WP=VP (up to a set of measure zero). [Since WP and VP are both open, we actually have WP=VP exactly, but this is not essential.]
Now, we are almost done. Consider the pointsA=(2,2,n−4,0,…,0) and B=(3,1,n−4,0,…,0) . It is easy to see that WA and WB are congruent and translations of each other: the only way they could differ, is if the boundary of S (other than the faces on which A and B both lie) would ``cut off'' either WA or WB but not the other. But to reach such a part of the boundary of S , we would need to change one coordinate of A or B by at least 1, which would be enough to guarantee to take us out of WA and WB anyway. Thus, even though S does look different from the vantage points A and B , the differences are too far away to be picked up by the definitions of WA and WB , and thus WA and WB are congruent.
It follows then thatVA and VB have the same volume, and thus the Dirichlet model assigns them the same probability, even though they have different probabilities in the multinomial model.
fonte