Gaussian RBF vs. Gaussian kernel

The only real difference is in the regularisation that is applied. A regularised RBF network typically uses a penalty based on the squared norm of the weights. For the kernel version, the penalty is typically on the squared norm of the weights of the linear model implicitly constructed in the feature space induced by the kernel. The key practical difference this makes is that the penalty for the RBF network depends on the centres of the RBF network (and hence on the sample of data used), whereas for the RBF kernel the induced feature space is the same regardless of the sample of data, so the penalty is a penalty on the function of the model, rather than on its parameterisation.

In other words, for both models we have

$f(\vec{x}') = \sum_{i=1}^\ell \alpha_i \mathcal{K}(\vec{x}_i, \vec{x}')$

For the RBF network approach, the training criterion is

$L = \sum_{i=1}^\ell (y_i - f(\vec{x}_i))^2 + \lambda \|\alpha\|^2$

For the RBF kernel method, we have that $\mathcal{K}(\vec{x},\vec{x}') = \phi(\vec{x})\cdot\phi(\vec{x}')$, and $\vec{w} = \sum_{i=1}^\ell \alpha_i\phi(\vec{x}_i)$. This means that a squared norm penalty on the weights of the model in the induced feature space, $\vec{w}$, can be written in terms of the dual parameters, $\vec{\alpha}$, as

$\|\vec{w}\|^2 = \vec{\alpha}^T\matrix{K}\vec{\alpha},$

where $\matrix{K}$ is the matrix of pairwise evaluations of the kernel for all training patterns. The training criterion is then

$L = \sum_{i=1}^\ell (y_i - f(\vec{x}_i))^2 + \lambda \vec{\alpha}^T\matrix{K}\vec{\alpha}$.
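The identity $\|\vec{w}\|^2 = \vec{\alpha}^T\matrix{K}\vec{\alpha}$ is stated without its intermediate steps. A short expansion, using only the definitions of $\vec{w}$ and $\mathcal{K}$ given earlier and writing $K_{ij} = \mathcal{K}(\vec{x}_i, \vec{x}_j)$ (an index notation introduced here for convenience), might look like this:

```latex
\begin{aligned}
\|\vec{w}\|^2
  &= \vec{w}\cdot\vec{w}
   = \Big(\sum_{i=1}^\ell \alpha_i\phi(\vec{x}_i)\Big)\cdot
     \Big(\sum_{j=1}^\ell \alpha_j\phi(\vec{x}_j)\Big) \\
  &= \sum_{i=1}^\ell\sum_{j=1}^\ell \alpha_i\alpha_j\,
     \phi(\vec{x}_i)\cdot\phi(\vec{x}_j) \\
  &= \sum_{i=1}^\ell\sum_{j=1}^\ell \alpha_i\alpha_j\,
     \mathcal{K}(\vec{x}_i,\vec{x}_j)
   = \sum_{i=1}^\ell\sum_{j=1}^\ell \alpha_i K_{ij}\,\alpha_j
   = \vec{\alpha}^T\matrix{K}\vec{\alpha}.
\end{aligned}
```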

The only difference between the two models is the $\matrix{K}$ in the regularisation term.
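To make the comparison concrete, here is a minimal NumPy sketch that fits both criteria in closed form on the same kernel matrix. It assumes the RBF centres coincide with the training points (as in the shared expression for $f$), that $\matrix{K}$ is invertible for the kernel-machine solution, and a Gaussian kernel of the form $\exp(-\|\vec{x}-\vec{x}'\|^2 / 2\sigma^2)$. The function names, the toy data and the values of `width` and `lam` are illustrative choices, not part of the original answer.

```python
import numpy as np


def gaussian_kernel(A, B, width=1.0):
    """Pairwise Gaussian kernel exp(-||a - b||^2 / (2 * width^2)) (assumed form)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width**2))


def fit_rbf_network(K, y, lam):
    """Minimise ||y - K a||^2 + lam * ||a||^2: solve (K^T K + lam I) a = K^T y."""
    n = K.shape[0]
    return np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)


def fit_kernel_machine(K, y, lam):
    """Minimise ||y - K a||^2 + lam * a^T K a: solve (K + lam I) a = y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


# Toy regression problem, made up purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

K = gaussian_kernel(X, X, width=1.0)
lam = 0.1
alpha_net = fit_rbf_network(K, y, lam)
alpha_ker = fit_kernel_machine(K, y, lam)

print("RBF-network penalty ||a||^2:", alpha_net @ alpha_net)
print("kernel penalty a^T K a     :", alpha_ker @ K @ alpha_ker)
print("max gap between fitted values:", np.max(np.abs(K @ (alpha_net - alpha_ker))))
```

Both solutions use the same basis functions and the same data-fit term; only the linear system that defines $\vec{\alpha}$ changes, which is where the $\matrix{K}$ in the regulariser shows up.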

The key theoretical advantage of the kernel approach is that it allows you to interpret a non-linear model as a linear model following a fixed non-linear transformation that doesn't depend on the sample of data. Thus any statistical learning theory that exists for linear models automatically transfers to the non-linear version. However, this all breaks down as soon as you try to tune the kernel parameters, at which point we are back to pretty much the same point, theoretically speaking, as we were with RBF (and MLP) neural networks. So the theoretical advantage is perhaps not as great as we would like.

Is it likely to make any real difference in terms of performance? Probably not much. The "no free lunch" theorems suggest that there is no a priori superiority of any algorithm over all others, and the difference in the regularisation is fairly subtle, so if in doubt, try both and choose the best according to, e.g., cross-validation.

Dikran Marsupial

The regulariser is $\|\vec{\alpha}\|^2 = \vec{\alpha}^T\matrix{I}\vec{\alpha}$ for the RBF network and $\vec{\alpha}^T\matrix{K}\vec{\alpha}$ for the kernel machine. They would become more similar as the width of the basis function approaches zero, as $K$ would approach $I$. I think this is essentially because $K$ is accounting for the correlation between basis functions.

Dikran Marsupial
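The limiting behaviour described in the comment above can be checked numerically. The sketch below assumes a Gaussian kernel of the form $\exp(-d^2 / 2\sigma^2)$ and a handful of made-up, distinct 1-D inputs; the widths tried are arbitrary.

```python
import numpy as np

# Hypothetical 1-D training inputs, chosen only for illustration.
x = np.linspace(-2.0, 2.0, 9)  # distinct points, spacing 0.5
sq_dists = (x[:, None] - x[None, :]) ** 2

# Record how far K is from the identity for progressively narrower kernels.
gaps = {}
for width in (1.0, 0.3, 0.1, 0.03):
    K = np.exp(-sq_dists / (2.0 * width**2))
    gaps[width] = np.max(np.abs(K - np.eye(len(x))))
    print(f"width={width:<5} max|K - I| = {gaps[width]:.3e}")
```

As the width shrinks relative to the spacing between points, the off-diagonal entries of $K$ decay towards zero, so the two penalties coincide in the limit (for distinct training points).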

@CagdasOzgenc The way I look at it is that the $K$ in the regulariser weights the penalisation differently for each basis vector, and the penalty depends on the selection of the other basis vectors. This weight depends on their correlations, so if you pick a different sample, the weights change to compensate. The other way to look at it is that the model is defined in a feature space determined by $\phi(x)$, which doesn't depend on the choice of basis vectors (providing they span the space containing the data).

Dikran Marsupial

@CagdasOzgenc Sure, we can transform the space of the basis functions by an eigen-decomposition of $K$ and regain a $\|\vec{\alpha}'\|^2$ style regulariser (indeed that is a useful trick in optimising the regularisation parameter - doi.org/10.1016/j.neunet.2007.05.005). However, that transformation eliminates the dependency on the original choice of basis function. For the two things to be equal would require $\vec{\alpha}^T\matrix{K}\vec{\alpha} = \mu\vec{\alpha}^T\matrix{I}\vec{\alpha}$, which is not generally true (especially not for the RBF kernel).

Dikran Marsupial
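A rough sketch of the eigen-decomposition idea from the comment above, under stated assumptions: random 2-D inputs, a unit-width Gaussian kernel, and the change of variables $\vec{\alpha}' = \Lambda^{1/2} V^T \vec{\alpha}$ for $K = V \Lambda V^T$. This particular transformation is one natural reading of the comment, not necessarily the exact construction used in the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 2))  # hypothetical training inputs
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)      # Gaussian kernel, unit width (assumed)

# K is symmetric positive semi-definite, so K = V diag(s) V^T.
s, V = np.linalg.eigh(K)
s = np.clip(s, 0.0, None)        # guard against tiny negative round-off

alpha = rng.standard_normal(8)   # arbitrary dual parameters
alpha_prime = np.sqrt(s) * (V.T @ alpha)  # alpha' = diag(s)^{1/2} V^T alpha

kernel_penalty = alpha @ K @ alpha
transformed_penalty = alpha_prime @ alpha_prime
plain_penalty = alpha @ alpha

print("a^T K a  :", kernel_penalty)
print("||a'||^2 :", transformed_penalty)
print("||a||^2  :", plain_penalty)
print("eigenvalue range of K:", s.min(), "to", s.max())
```

The first two numbers agree because the transformation absorbs $K$ into the coordinates. The spread of eigenvalues shows that $K$ is not a multiple of $I$, so no single $\mu$ can make $\vec{\alpha}^T\matrix{K}\vec{\alpha}$ equal $\mu\|\vec{\alpha}\|^2$ for every $\vec{\alpha}$.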

Thank you. I will reflect on this and get back to you. At the moment it seems I am not at your level of understanding. I need to do more thinking :).

Cagdas Ozgenc

@CagdasOzgenc no problem, most of the standard texts explain it through eigenfunctions of the kernel function, which makes my brain hurt as well! ;o)

Dikran Marsupial
