I am trying to understand how the LARS algorithm can be modified to generate the Lasso. While I understand LARS, I cannot see the lasso modification in the paper by Tibshirani et al. In particular, I do not see why the sign condition holds, i.e. why the sign of a non-zero coordinate must agree with the sign of the current correlation. Could someone please help me with this? I think I am looking for a mathematical proof using the KKT conditions on the original L1-norm problem, i.e. the Lasso. Thanks a lot!
Answers:
Let $X$ (size $n \times p$) denote a set of standardized inputs, $y$ (size $n \times 1$) the centered responses, $\beta$ (size $p \times 1$) the regression weights, and $\lambda > 0$ the $l_1$-norm penalization coefficient.
The LASSO problem then writes

$$\beta^* = \arg\min_{\beta} \; \tfrac{1}{2}\,\| y - X\beta \|_2^2 + \lambda \| \beta \|_1 .$$

The KKT stationarity conditions require that, for every active (non-zero) coordinate $a \in A$,

$$X_a^T \left( y - X\beta^* \right) = \lambda \, \operatorname{sign}(\beta_a^*),$$

with $A$ representing the set of active predictors.
Because $\lambda$ must be positive (it is a penalisation coefficient), it is clear that the sign of $\beta_a^*$ (the weight of any non-zero, hence active, predictor) must be the same as that of $X_a^T(y - X\beta^*) = X_a^T r$, i.e. its correlation with the current regression residual.
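This sign condition can be checked numerically. Below is a small sketch (the coordinate-descent solver and the synthetic data are my own illustration, not part of the derivation above): it fits a lasso solution and verifies that every active weight agrees in sign with the correlation of its predictor with the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 0.5
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                   # standardized inputs
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
y = y - y.mean()                                 # centered response

def lasso_cd(X, y, lam, iters=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]     # partial residual excluding j
            z = X[:, j] @ r_j
            # soft-thresholding update for coordinate j
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j])
    return b

b = lasso_cd(X, y, lam)
r = y - X @ b
active = np.flatnonzero(b != 0)
# KKT sign condition: sign(beta_a) == sign(X_a^T r) for every active predictor
assert all(np.sign(b[a]) == np.sign(X[:, a] @ r) for a in active)
```

At the optimum, $X_a^T r$ equals exactly $\lambda \operatorname{sign}(\beta_a^*)$ for the active coordinates, so the assertion passes as soon as the coordinate descent has converged.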
@Mr._White provided a great intuitive explanation of the major difference between LARS and Lasso; the only point I would add is that the lasso is (kind of) like a backward selection approach, knocking out a term at each step as long as one exists whose ("normalized" over $X^T X$) correlation calls for it. LARS keeps everything in there, basically performing the lasso in every possible order. That does mean that in the lasso, each iteration depends on which terms have already been removed.
Efron's implementation illustrates the differences very well: lars.R in the source package for lars. Notice the update step of the $X^T X$ matrix and $\zeta$ starting at line 180, and the dropping of the terms for which $\zeta_{min} < \zeta_{current}$. I can imagine some weird situations arising in spaces $A$ where the terms are unbalanced ($x_1$ and $x_2$ very correlated but not with the others, $x_2$ with $x_3$ but not with the others, etc.), where the selection order could become quite biased.
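The dropping rule can be sketched in a few lines of numpy (this is my own illustration with hypothetical names, not the code in lars.R): the lasso modification truncates the LARS step at the first point where an active coefficient would cross zero, and removes that predictor from the active set.

```python
import numpy as np

def lasso_step(beta_active, d_active, gamma_lars):
    """One LARS step with the lasso modification (a sketch).

    beta_active : current coefficients of the active predictors
    d_active    : equiangular update direction for those coefficients
    gamma_lars  : step length plain LARS would take
    """
    with np.errstate(divide="ignore"):
        cross = -beta_active / d_active          # step at which each coef hits zero
    cross = np.where(cross > 0, cross, np.inf)   # only crossings ahead of us count
    j = int(np.argmin(cross))
    if cross[j] < gamma_lars:                    # lasso modification: truncate & drop
        beta_new = beta_active + cross[j] * d_active
        beta_new[j] = 0.0                        # exactly zero, avoid rounding dust
        return beta_new, j                       # predictor j leaves the active set
    return beta_active + gamma_lars * d_active, None

# Example: the second coefficient would flip sign mid-step, so it is dropped.
beta_new, dropped = lasso_step(np.array([0.8, 0.3]), np.array([0.5, -1.0]), 1.0)
# dropped == 1, beta_new == [0.95, 0.0]
```

Plain LARS would take the full step of length 1.0 and let the second coefficient go negative; the modification stops at 0.3, where that coefficient reaches zero, which is exactly what preserves the sign condition from the first answer.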