I am trying to understand how the LARS algorithm can be modified to generate the Lasso. While I understand LARS, I cannot see the lasso modification in the paper by Tibshirani et al. In particular, I do not see why the sign condition holds, i.e. why the sign of a non-zero coordinate must agree with the sign of the current correlation. Could someone please help me with this? I think I am looking for a mathematical proof using the KKT conditions on the original L1-norm problem, i.e. the Lasso. Thanks a lot!
Answers:
Let $X$ (size $n \times p$) denote a set of standardized inputs, $y$ (size $n \times 1$) the centered responses, $\beta$ (size $p \times 1$) the regression weights, and $\lambda > 0$ the $l_1$-norm penalization coefficient.
The LASSO problem then writes

$$\beta^* = \arg\min_{\beta} \; \tfrac{1}{2}\,\| y - X\beta \|_2^2 + \lambda \| \beta \|_1 .$$

The KKT stationarity conditions require that, for every active (non-zero) coordinate $a \in A$,

$$X_a^T \left( y - X\beta^* \right) = \lambda \, \operatorname{sign}(\beta_a^*),$$

with $A$ representing the set of active predictors.
Because $\lambda$ must be positive (it is a penalisation coefficient), it is clear that the sign of $\beta_a^*$ (the weight of any non-zero, hence active, predictor) must be the same as that of $X_a^T(y - X\beta^*) = X_a^T r$, i.e. its correlation with the current regression residual.
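This sign condition can be checked numerically. Below is a small sketch (the coordinate-descent solver and the synthetic data are my own illustration, not part of the derivation above): it fits a lasso solution and verifies that every active weight agrees in sign with the correlation of its predictor with the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 0.5
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                   # standardized inputs
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
y = y - y.mean()                                 # centered response

def lasso_cd(X, y, lam, iters=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]     # partial residual excluding j
            z = X[:, j] @ r_j
            # soft-thresholding update for coordinate j
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j])
    return b

b = lasso_cd(X, y, lam)
r = y - X @ b
active = np.flatnonzero(b != 0)
# KKT sign condition: sign(beta_a) == sign(X_a^T r) for every active predictor
assert all(np.sign(b[a]) == np.sign(X[:, a] @ r) for a in active)
```

At the optimum, $X_a^T r$ equals exactly $\lambda \operatorname{sign}(\beta_a^*)$ for the active coordinates, so the assertion passes as soon as the coordinate descent has converged.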
@Mr._White provided a great intuitive explanation of the major difference between LARS and Lasso; the only point I would add is that the lasso is (kind of) like a backward selection approach, knocking out a term at each step as long as one exists whose ("normalized" over $X^T X$) correlation calls for it. LARS keeps everything in there, basically performing the lasso in every possible order. That does mean that in the lasso, each iteration depends on which terms have already been removed.
Efron's implementation illustrates the differences very well: lars.R in the source package for lars. Notice the update step of the $X^T X$ matrix and $\zeta$ starting at line 180, and the dropping of the terms for which $\zeta_{min} < \zeta_{current}$. I can imagine some weird situations arising in spaces $A$ where the terms are unbalanced ($x_1$ and $x_2$ very correlated but not with the others, $x_2$ with $x_3$ but not with the others, etc.), where the selection order could become quite biased.
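The dropping rule can be sketched in a few lines of numpy (this is my own illustration with hypothetical names, not the code in lars.R): the lasso modification truncates the LARS step at the first point where an active coefficient would cross zero, and removes that predictor from the active set.

```python
import numpy as np

def lasso_step(beta_active, d_active, gamma_lars):
    """One LARS step with the lasso modification (a sketch).

    beta_active : current coefficients of the active predictors
    d_active    : equiangular update direction for those coefficients
    gamma_lars  : step length plain LARS would take
    """
    with np.errstate(divide="ignore"):
        cross = -beta_active / d_active          # step at which each coef hits zero
    cross = np.where(cross > 0, cross, np.inf)   # only crossings ahead of us count
    j = int(np.argmin(cross))
    if cross[j] < gamma_lars:                    # lasso modification: truncate & drop
        beta_new = beta_active + cross[j] * d_active
        beta_new[j] = 0.0                        # exactly zero, avoid rounding dust
        return beta_new, j                       # predictor j leaves the active set
    return beta_active + gamma_lars * d_active, None

# Example: the second coefficient would flip sign mid-step, so it is dropped.
beta_new, dropped = lasso_step(np.array([0.8, 0.3]), np.array([0.5, -1.0]), 1.0)
# dropped == 1, beta_new == [0.95, 0.0]
```

Plain LARS would take the full step of length 1.0 and let the second coefficient go negative; the modification stops at 0.3, where that coefficient reaches zero, which is exactly what preserves the sign condition from the first answer.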