Burrows, Wheeler e Costas

15

fundo

A transformação Burrows – Wheeler (BWT) é uma permutação reversível dos caracteres de uma string que resulta em grandes execuções de caracteres semelhantes para certos tipos de strings, como texto sem formatação. É usado, por exemplo, no algoritmo de compactação bzip2 .

O BWT é definido da seguinte maneira:

Dada uma sequência de entrada, como codegolf, calcule todas as rotações possíveis e classifique-as em ordem lexicográfica:

codegolf
degolfco
egolfcod
fcodegol
golfcode
lfcodego
odegolfc
olfcodeg

O BWT da string codegolfé a string que consiste no último caractere de cada string nessa ordem, ou seja, a última coluna do bloco acima. Pois codegolf, isso produz fodleocg.

Por si só, essa transformação não é reversível, pois as strings codegolfe golfcoderesultam na mesma string. No entanto, se soubermos que a sequência termina com um f, existe apenas uma pré-imagem possível.

Tarefa

Implemente um programa ou função involutivo que leia uma única string de STDIN ou como argumento de linha de comando ou função e imprima ou retorne o BWT ou seu inverso da string de entrada.

Se a sequência de entrada não contiver espaços, seu envio deverá anexar um único espaço à entrada e calcular o BWT.

Se a sequência de entrada já contiver um espaço, deverá calcular a pré-imagem do BWT que possui um espaço à direita e retirar esse espaço.

Exemplos

INPUT:  ProgrammingPuzzles&CodeGolf
OUTPUT: fs&e grodllnomzomaiCrGgPePzu
INPUT:  fs&e grodllnomzomaiCrGgPePzu
OUTPUT: ProgrammingPuzzles&CodeGolf
INPUT:  bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
OUTPUT: <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
INPUT:  <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
OUTPUT: bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
INPUT:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
INPUT:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Regras adicionais

  • Você não pode usar nenhum operador interno que calcule o BWT (ou seu inverso) de uma sequência.

  • Você não pode usar nenhum operador interno que inverta funções.

  • Seu código pode imprimir uma nova linha à direita se você escolher STDOUT para saída, mesmo se ele não suportar novas linhas na entrada.

  • Seu código precisa funcionar para qualquer entrada de 500 ou menos caracteres ASCII imprimíveis (0x20 a 0x7E), incluindo no máximo um espaço.

  • Para qualquer uma das entradas possíveis descritas acima, seu código deve terminar em menos de dez minutos na minha máquina (Intel Core i7-3770, 16 GiB RAM). O último caso de teste deve ser o mais lento, portanto, certifique-se de cronometrar seu código com esse.

    Para a maioria dos idiomas e abordagens, deve ser fácil cumprir o prazo. Essa regra visa unicamente impedir a implementação de uma transformação como uma força bruta inversa da outra.

  • Aplicam-se regras de código padrão de golfe. O menor envio em bytes vence.

Dennis
fonte
"Você não pode usar nenhum operador interno que inverta funções." Eu ficaria muito surpreso se isso existisse!
Hugh Allen
4
@HughAllen J possui inv.
217 Dennis
Pode funcionar para funções triviais ou internas, mas como poderia funcionar para funções gerais definidas pelo usuário? Este "inv" está documentado em algum lugar?
Hugh Allen
@HughAllen: Eu realmente não conheço J, mas aqui está um exemplo de invtrabalho em uma função definida pelo usuário. Não tenho certeza se invfuncionaria para o BWT, mas é melhor prevenir do que remediar.
Dennis
Relacionado, consulte codegolf.stackexchange.com/questions/4771/… para obter um bzip2 completo.
Keith Randall

Respostas:

9

Pitão, 29 bytes

?tthu+VzSGzz}dzseMS.>L+z\ hlz

Demonstração. Equipamento de teste.

Isso se divide diretamente em um segmento de codificação e decodificação, com o código decidindo qual usar com um ternário.

?tthu+VzSGzz}dzseMS.>L+z\ hlz
                                 Implicit:
                                 z = input()
                                 d = ' '

?           }dz                  If there is a space in the input,
    u     zz                     Update G to the result of the following,
                                 with G starting as z and repeating len(z) times:
     +V                          The vectorized sum of
       zSG                       z and sorted(G)
   h                             Take the first such result, which will consist of
                                 the first character of z followed by the
                                 first cyclic permuation of the pre-BWT string,
                                 which must start with ' '.
 tt                              Remove the first two characters and return.
                 L               Otherwise, left-map (map with the variable as the 
                                 left parameter and a constant as the right)
               .>                cyclic right shift
                  +z\            of z + ' '
                      hlz        over range(len(z)+1)
              S                  Sort the shifted strings,
            eM                   take their last charactes,
           s                     combine into one string and return.
isaacg
fonte
6

CJam, 41 36 35 bytes

q:XS&LX,{X\.+$}*0=1>XS+_,,\fm<$zW>?

Teste aqui.

Explicação

q:X   e# Read STDIN and store it in X.
S&    e# Take the set intersection with " ". We'll use this as a truthy/falsy value to
      e# select the correct output later.

# Compute the iBWT:
LX,   e# Push an empty array, compute the length of X.
{     e# Run the following block that many times:
  X\  e# Push X and pull the other array on top.
  .+  e# Add the characters of X to the corresponding line of the other array,
      e# i.e. prepend X as a new column.
  $   e# Sort the rows.
}*
0=    e# Since we just sorted the rows, the first permutation of the output will be
      e# one starting with a space, followed by the string we actually want. So just
      e# pick the first permutation.
1>    e# Remove the leading space.

# Compute the BWT:
XS+   e# Push X and append a space.
_,    e# Get that string's length N.
,\    e# Turn it into a range [0 .. N-1], swap it with the string.
fm<   e# Map each value in the range to the string shifted left by that many characters.
$     e# Sort the permutations.
zW>   e# Transpose the grid and discard all lines but the last.

?     e# Choose between the iBWT and the BWT based on whether the input had a space.
Martin Ender
fonte
2

Perl 5, 179

Outro não tão bom de mim. Provavelmente, há alguns ganhos a serem obtidos nesse caso, mas não competirá com os idiomas de golfe criados para esse fim.

$_=<>;@y=/./g;if(/ /){@x=sort@y;for(1..$#y){@x=sort map$y[$_].$x[$_],0..$#x}
say$x[0]=~/^ (.*)/}else{push@y," ";say map/(.)$/,sort map{$i=$_;join"",
map$y[($_+$i)%@y],0..$#y}0..$#y}

Sem golfe:

# Read input
$_ = <>;
# Get all the chars of the input
my @chars = /./g;

if (/ /) {
  # If there's a space, run the IBWT:
  # Make the first column of the table
  my @working = sort @chars;

  # For each remaining character
  for (1 .. $#chars) {
    # Add the input as a new column to the left of @working,
    # then sort @working again
    @working = sort map {
      $chars[$_] . $working[$_]
    } 0 .. $#working;
  }
  # Print the first element of @working (the one beginning with space), sans space
  say $working[0] =~ /^ (.*)/;
} else {
  # BWT
  # Add a space to the end of the string
  push @chars, " ";
  # Get all the rotations of the string and sort them
  @rows = sort map {
    my $offset = $_;
    join "", map {
      $chars[($_ + $offset) % @chars]
    } 0 .. $#chars
  } 0 .. $#chars;

  # Print all the last characters
  say map /(.)$/, @rows;
}
hobbs
fonte
Algumas pequenas melhorias: 1. Se você usa o -nswitch (normalmente contado como 1 byte), não precisa $_=<>;. 2. for(1..$#y){...}pode se tornar ... for 1..$#y. 3. $"é inicializado " "e um byte mais curto.
Dennis
@Dennis boa chamada. Eu tinha tido várias instruções no forem um ponto para que o formulário postfix não foi uma vitória, mas quando eu pared para baixo para que eu não percebi :)
Hobbs