Como amostra de uma distribuição discreta (categórica) no espaço de log?

12

Suponhamos que tem uma distribuição discreta definida pelo vector modo que a categoria seja desenhada com probabilidade e assim por diante. Descobri então que alguns dos valores na distribuição são tão pequenos que sobrecarregam a representação do número de ponto flutuante do meu computador. Portanto, para compensar, faço todos os meus cálculos no espaço de log. Agora que tem um vector de log-espaço $\theta_0, \theta_1, ..., \theta_N$ $0$ $\theta_0$ . $log(\theta_0), log(\theta_1), ..., log(\theta_N)$

É possível provar a partir da distribuição de tal forma que as probabilidades originais segurar (categoria é desenhado com probabilidade ), mas sem nunca deixar log-espaço? Em outras palavras, como faço para amostrar dessa distribuição sem underflows? $i$ $\theta_i$

random-generation Josh Hansen
fonte

15

É possível obter amostras da distribuição categórica, dadas as probabilidades de log sem deixar espaço no log usando o truque Gumbel-max . A idéia é que, se você receber probabilidades log anormais , isso pode ser traduzido para probabilidades apropriadas usando a função softmax $\alpha_1,\dots,\alpha_k$

p_{i} = \frac{\exp (α_{i})}{\sum_{j} \exp (α_{j})}

$p_i = \frac{\exp(\alpha_i)}{\sum_j \exp(\alpha_j)}$

então, para amostrar a partir dessa distribuição, você pode usar o fato de que se são amostras independentes colhidas da distribuição Gumbel padrão parametrizada pelo local , $g_1,\dots,g_k \sim \mathcal{G}(0)$ $m$

F (G \leq g) = \exp (- \exp (- g + m))

$F(G \le g) = \exp(-\exp(-g+m))$

então pode ser mostrado (veja as referências abaixo) que

\begin{aligned} \underset{i}{a r g m a x} {g_{i} + α_{i}} & \sim \frac{\exp (α_{i})}{\sum_{j} \exp (α_{j})} \\ max_{i} {g_{i} + α_{i}} & \sim G (\log \sum_{i} \exp {α_{i}}) \end{aligned}

$\DeclareMathOperator*{\argmax}{arg\,max} \begin{align} \argmax_i \,\{\, g_i + \alpha_i \,\} &\sim \frac{\exp(\alpha_i)}{\sum_j \exp(\alpha_j)} \\ \max_i\,\{\, g_i + \alpha_i \,\} &\sim \mathcal{G}(\; \log\sum_i\exp\{\alpha_i\}\;) \end{align}$

e podemos pegar

z = \underset{i}{a r g m a x} {g_{i} + α_{i}}

$z = \argmax_i \,\{\, g_i + \alpha_i \,\}$

$p_1,\dots,p_k$ probabilities. This approach was described in greater detail in blog entries by Ryan Adams and Laurent Dinh, moreover Chris J. Maddison, Daniel Tarlow and Tom Minka gave a speach (slides) on Neural Information Processing Systems conference (2014) and wrote a paper titled A* Sampling that generalized those ideas (see also Maddison, 2016; Maddison, Mnih and Teh, 2016; Jang and Poole, 2016), who refer to Yellott (1977) mentioning his as the one among those who first described this property.

It is pretty easy to implement it using inverse transform sampling by taking $g_i=-\log(-\log u_i)$ where $u_i$ are draws from uniform distribution on $(0,1)$ . It is certainly not the most time-efficient algorithms for sampling from categorical distribution, but it let's you to stay in log-space what may be an advantage in some scenarios.

Maddison, C. J., Tarlow, D., & Minka, T. (2014). A* sampling. [In:] Advances in Neural Information Processing Systems (pp. 3086-3094).

Yellott, J. I. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15(2), 109-144.

Maddison, C. J., Mnih, A., & Teh, Y. W. (2016). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. arXiv preprint arXiv:1611.00712.

Jang, E., Gu, S., & Poole, B. (2016). Categorical Reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144.

Maddison, C. J. (2016). A Poisson process model for Monte Carlo. arXiv preprint arXiv:1602.05986.

Tim
fonte

5

Here is one common way to avoid underflow/overflow.

Let $m = \max_i \log(\theta_i)$ .

Let $\theta_i' = \exp( \log(\theta_i) - m )$ .

You can sample from $\theta' = [\theta_1' , \theta_2',...]$ .

Siddharth Gopal
fonte

1

This works as long as the difference between any one value and the max isn't too great---when that happens, the exp can lose precision, leading to distributions like [1.0, 3.45e-66, 0.0, 7.54e-121]. I'd like to hold out for some answer that is robust even in that case. But for now I'm upvoting your answer.

Josh Hansen

Como amostra de uma distribuição discreta (categórica) no espaço de log?

Respostas: