K-Means Golfe em Cluster

8

K-Means Clustering ( Wikipedia )

A tarefa aqui é bastante simples, execute uma única iteração de um algoritmo de clustering k-means em uma matriz binária. Esta é essencialmente a tarefa de configuração do principal algoritmo k-means, achei que a instalação poderia ser mais fácil e atrair linguagens de golfe para tentar também. A matriz passada para você terá o seguinte formato:

 0 0 0 0 0 0 0 0 1
 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 1
 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 1 0 0 1
 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0
 0 1 0 0 0 1 0 0 0
 0 1 0 0 0 0 0 0 1
 0 0 0 0 0 0 0 0 0

1 representa um ponto, enquanto 0 representa uma falta de pontos. Seu trabalho será gerar k-1centróides aleatoriamente e executar o agrupamento inicial dos dados em torno dos centróides gerados. kdefine-se como min(grid#Width, grid#Height)-1. A rotulagem de cada centróide deve ir de 2até k. Por exemplo, nesses cenários, você pode gerar os seguintes centróides:

Centroid 2 was generated at: (1.0, 4.0)
Centroid 3 was generated at: (1.0, 5.0)
Centroid 4 was generated at: (5.0, 1.0)
Centroid 5 was generated at: (3.0, 3.0)
Centroid 6 was generated at: (0.0, 2.0)
Centroid 7 was generated at: (6.0, 6.0)
Centroid 8 was generated at: (2.0, 6.0)

Depois de gerar os centróides, você deve percorrer todos os pontos marcados com a 1, pois podemos tratar os pontos marcados 0como espaço vazio. Para cada centróide, você deve decidir qual centróide é o mais próximo do ponto em questão. Aqui estão as distâncias para o exemplo:

(0,8) distance from centroid 2 is 4.123105625617661
(0,8) distance from centroid 3 is 3.1622776601683795
(0,8) distance from centroid 4 is 8.602325267042627
(0,8) distance from centroid 5 is 5.830951894845301
(0,8) distance from centroid 6 is 6.0
(0,8) distance from centroid 7 is 6.324555320336759
(0,8) distance from centroid 8 is 2.8284271247461903
(1,1) distance from centroid 2 is 3.0
(1,1) distance from centroid 3 is 4.0
(1,1) distance from centroid 4 is 4.0
(1,1) distance from centroid 5 is 2.8284271247461903
(1,1) distance from centroid 6 is 1.4142135623730951
(1,1) distance from centroid 7 is 7.0710678118654755
(1,1) distance from centroid 8 is 5.0990195135927845
(2,8) distance from centroid 2 is 4.123105625617661
(2,8) distance from centroid 3 is 3.1622776601683795
(2,8) distance from centroid 4 is 7.615773105863909
(2,8) distance from centroid 5 is 5.0990195135927845
(2,8) distance from centroid 6 is 6.324555320336759
(2,8) distance from centroid 7 is 4.47213595499958
(2,8) distance from centroid 8 is 2.0
(3,2) distance from centroid 2 is 2.8284271247461903
(3,2) distance from centroid 3 is 3.605551275463989
(3,2) distance from centroid 4 is 2.23606797749979
(3,2) distance from centroid 5 is 1.0
(3,2) distance from centroid 6 is 3.0
(3,2) distance from centroid 7 is 5.0
(3,2) distance from centroid 8 is 4.123105625617661
(4,5) distance from centroid 2 is 3.1622776601683795
(4,5) distance from centroid 3 is 3.0
(4,5) distance from centroid 4 is 4.123105625617661
(4,5) distance from centroid 5 is 2.23606797749979
(4,5) distance from centroid 6 is 5.0
(4,5) distance from centroid 7 is 2.23606797749979
(4,5) distance from centroid 8 is 2.23606797749979
(4,8) distance from centroid 2 is 5.0
(4,8) distance from centroid 3 is 4.242640687119285
(4,8) distance from centroid 4 is 7.0710678118654755
(4,8) distance from centroid 5 is 5.0990195135927845
(4,8) distance from centroid 6 is 7.211102550927978
(4,8) distance from centroid 7 is 2.8284271247461903
(4,8) distance from centroid 8 is 2.8284271247461903
(6,1) distance from centroid 2 is 5.830951894845301
(6,1) distance from centroid 3 is 6.4031242374328485
(6,1) distance from centroid 4 is 1.0
(6,1) distance from centroid 5 is 3.605551275463989
(6,1) distance from centroid 6 is 6.082762530298219
(6,1) distance from centroid 7 is 5.0
(6,1) distance from centroid 8 is 6.4031242374328485
(7,1) distance from centroid 2 is 6.708203932499369
(7,1) distance from centroid 3 is 7.211102550927978
(7,1) distance from centroid 4 is 2.0
(7,1) distance from centroid 5 is 4.47213595499958
(7,1) distance from centroid 6 is 7.0710678118654755
(7,1) distance from centroid 7 is 5.0990195135927845
(7,1) distance from centroid 8 is 7.0710678118654755
(7,5) distance from centroid 2 is 6.082762530298219
(7,5) distance from centroid 3 is 6.0
(7,5) distance from centroid 4 is 4.47213595499958
(7,5) distance from centroid 5 is 4.47213595499958
(7,5) distance from centroid 6 is 7.615773105863909
(7,5) distance from centroid 7 is 1.4142135623730951
(7,5) distance from centroid 8 is 5.0990195135927845
(8,1) distance from centroid 2 is 7.615773105863909
(8,1) distance from centroid 3 is 8.06225774829855
(8,1) distance from centroid 4 is 3.0
(8,1) distance from centroid 5 is 5.385164807134504
(8,1) distance from centroid 6 is 8.06225774829855
(8,1) distance from centroid 7 is 5.385164807134504
(8,1) distance from centroid 8 is 7.810249675906654
(8,8) distance from centroid 2 is 8.06225774829855
(8,8) distance from centroid 3 is 7.615773105863909
(8,8) distance from centroid 4 is 7.615773105863909
(8,8) distance from centroid 5 is 7.0710678118654755
(8,8) distance from centroid 6 is 10.0
(8,8) distance from centroid 7 is 2.8284271247461903
(8,8) distance from centroid 8 is 6.324555320336759

O resultado final do algoritmo de agrupamento é que não deve haver 1s na matriz, apenas números de centróides. É por isso que era importante rotular os centróides 2-k+1, permitindo substituí-los da seguinte maneira:

 0 0 0 0 0 0 0 0 8
 0 6 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 8
 0 0 5 0 0 0 0 0 0
 0 0 0 0 0 5 0 0 7
 0 0 0 0 0 0 0 0 0
 0 4 0 0 0 0 0 0 0
 0 4 0 0 0 7 0 0 0
 0 4 0 0 0 0 0 0 7
 0 0 0 0 0 0 0 0 0

Esse é o layout inicial de cluster para 7 centróides na grade fornecida, considerando centróides gerados aleatoriamente. Seu trabalho é produzir a versão em cluster da grade de entrada binária.

Regras

  • Os k-1centróides devem ser gerados aleatoriamente e devem estar em qualquer lugar de (0,0)até (grid#Width, grid#Height).
    • O valor de ké min(grid#Width, grid#Height)-1.
    • Os centróides gerados DEVEM ser numerados de 2a k.
  • O formato de entrada DEVE ser uma grade de 0s e 1s, onde 0s representam espaço vazio e 1s representam pontos.
    • Uma grade é uma sequência usando 1 caractere por célula e \ncomo delimitadores de linha ou uma matriz 2D.
    • Não é garantido que a grade passada seja quadrada, mas que não fique vazia.
  • A saída final pode usar matrizes ou uma sequência delimitada.
  • O código mais curto vence, é o .
Urna de polvo mágico
fonte
1
Relacionado .
milhas

Respostas:

1

Mathematica 109 Bytes

Se eu entendi a pergunta corretamente, algo assim deve funcionar. "KMeans" é um dos métodos internos para FindClusters e ClusteringComponents.

SparseArray[Thread[(p=Position[#,1])->1+ClusteringComponents[p,Min[d=Dimensions@#]-1,1,Method->"KMeans"]],d]&

Uso: %@in//MatrixFormpara obter visualização, iné uma matriz inteira.

Kelly Lowder
fonte