Print the word containing a string and the first word

10

I want to find a string in a line of text and print the string (between spaces) and the first word of the phrase.

For example:

"This is a single text line"
"Another thing"
"It is better you try again"
"Better"

The list of strings is:

text
thing
try
Better

What I am trying to get is a table like this:

This [tab] text
Another [tab] thing
It [tab] try
Better

I tried with grep, but nothing happened. Any suggestions?

command-line text-processing regex Felipe Lira

So, basically: "If the line contains a string, print the first word + the string". Right?

Sergiy Kolodyazhnyy

12

Bash/grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" | 
        while read -r line
    do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"

        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi

        echo "$string"
    done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better

wjandrea

9

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save it as first-plus-word and run it as

perl first-plus-word file.txt text thing try Better

It builds a regex from the input words. Each line is then matched against the regex; if there is a match, the first word of the line is printed, and if the matched word differs from the first word, the matched word is printed too.
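
For readers who do not use Perl, the same idea (one escaped alternation built from the search words, then first word plus match) can be sketched in Python. This is only an illustration, not part of the answer above: the function name first_plus_word is made up, and it takes the first whitespace-separated token as the "first word", which differs slightly from the Perl /^\S+/ when a line starts with whitespace.

```python
import re


def first_plus_word(lines, words):
    """Return one output row per line that contains any of the words.

    Hypothetical helper: mirrors the Perl answer's approach of joining
    the escaped words into a single alternation wrapped in \\b...\\b.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    rows = []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # no search word on this line, print nothing
        first = line.split()[0]
        found = match.group(1)
        # Avoid printing the word twice when it is also the first word.
        rows.append(first if found == first else first + "\t" + found)
    return rows
```

Calling first_plus_word with the four sample lines and the words text, thing, try and Better yields one tab-separated row per line, with the last line reduced to the single word Better.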

choroba

9

Here is an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

where file2 is the list of words and file1 contains the phrases.
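
The awk one-liner loads the word list into an array and then tests every field of every phrase for membership. A rough Python equivalent, given here only as a sketch, uses a set in place of the awk array; the function name match_fields and the use of in-memory lists instead of the two files are assumptions made for the example.

```python
def match_fields(word_list, sentences):
    """Sketch of the awk logic: set lookup on each whitespace-separated field.

    word_list plays the role of file2, sentences the role of file1.
    """
    wanted = {word.strip() for word in word_list}  # like a[$0]++ in awk
    rows = []
    for sentence in sentences:
        # Drop double quotes, as gsub(/"/,"",$0) does, then split into fields.
        fields = sentence.replace('"', "").split()
        for index, field in enumerate(fields):
            if field in wanted:
                # A match in the first field is printed on its own.
                rows.append(field if index == 0 else fields[0] + "\t" + field)
    return rows
```

Unlike a substring search, this matches whole fields only, so "better" does not match "Better" and "nothing" does not match "thing".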

steeldriver

2

Nice one! I put it into a script file, paste.ubuntu.com/23063130 , just for convenience

Sergiy Kolodyazhnyy

8

Here is the Python version:

#!/usr/bin/env python
from __future__ import print_function
import sys

# List of strings that you want
# to search for in the file. Change it
# as you see fit. Remember the commas.
strings = [
          'text', 'thing',
          'try', 'Better'
          ]


with open(sys.argv[1]) as input_file:
    for line in input_file:
        for string in strings:
            if string in line:
                words = line.strip().split()
                # Print the first word, then a tab and the string,
                # unless the line consists of a single word.
                print(words[0], end="")
                if len(words) > 1:
                    # Concatenate so that no extra space follows the tab.
                    print("\t" + string)
                else:
                    print("")

Demo:

$> cat input_file.txt                                                          
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt                                      
This    text
Another     thing
It  try
Better

Side note: the script is python3 compatible, so you can run it with either python2 or python3.

Sergiy Kolodyazhnyy

7

Try:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab before Better is a problem, then try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If that is found, it is replaced with the first word on the line (if any), a tab, and the matched word.

ta; b; :a; s/^\t//; p

If the substitute command resulted in a substitution, meaning that one of your words was found on the line, then the ta command tells sed to jump to label a. If not, we branch (b) to the next line. :a defines the label a. So, if one of your words was found, we (a) do the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.
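
The control flow described above (substitute, skip the line if nothing was substituted, otherwise strip one leading tab and print) can be illustrated with Python's re module. This is a sketch under assumptions, not a translation the answer provides: the name sed_like is invented, the keyword list is hard-coded as in the sed command, and [A-Za-z0-9] stands in for [[:alnum:]]. Python's backtracking regex engine is not POSIX leftmost-longest, so results could differ from sed on lines containing several keywords.

```python
import re

# Optional "first word + whitespace + anything", then a keyword, then the rest.
PATTERN = re.compile(r"(?:([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")


def sed_like(line):
    """Return the rewritten line, or None when sed would print nothing."""
    new_line, count = PATTERN.subn(
        # Group 1 is None when the keyword is the first word on the line.
        lambda m: (m.group(1) or "") + "\t" + m.group(2),
        line,
        count=1,
    )
    if count == 0:
        return None  # corresponds to the `b` branch: go to the next line
    # Corresponds to s/^\t//: drop a single leading tab if present.
    if new_line.startswith("\t"):
        new_line = new_line[1:]
    return new_line
```

Lines are expected without their trailing newline, since "." does not match a newline in either tool.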

John1024

7

A simple bash/sed approach:

$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words 
This    text
Another thing
It  try
    Better

The while read w; do ...; done < words loop iterates over each line of the file words and saves it as $w. The -n makes sed print nothing by default. The sed command then matches a double quote followed by non-whitespace (\"(\S*); the parentheses "capture" what \S* matches, the first word, so we can later refer to it as \1), then 0 or more characters (.*), then the word we are looking for ($w), and 0 or more characters again (.*). If this matches, we replace it with only the first word, a tab and $w (\1\t$w), and print the line (that is what the p in s///p does).

Terdon

5

This is the Ruby version

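str_list = ['text', 'thing', 'try', 'Better']

File.open(ARGV[0]) do |f|
  f.each_line do |line|
    # Check every search string against every line, rather than pairing
    # line N only with string N (which fails once the counts differ).
    str_list.each do |str|
      next unless line.include?(str)
      words = line.split(' ')
      # Print the first word alone when it is the search string itself.
      if words[0] == str
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end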

The sample text file hello.txt contains

This is a single text line
Another thing
It is better you try again
Better

Running it with ruby source.rb hello.txt results in

This    text
Another thing
It      try
Better

Anwar
