Loading a file with more than one line of JSON into Pandas

Question 1

I am trying to read a JSON file into a Python pandas (0.14.0) DataFrame. Here is the first line of the JSON file:

{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "P_Mk0ygOilLJo4_WEvabAA", "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "date": "2005-08-26", "text": "This is a pretty typical cafe.  The sandwiches and wraps are good but a little overpriced and the food items are the same.  The chicken caesar salad wrap is my favorite here but everything else is pretty much par for the course.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}

I am doing the following: df = pd.read_json(path).

I get the following error (with full traceback):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data

What is the "Trailing data" error? How do I read the file into a DataFrame?

Following some suggestions, here are a few lines of the .json file:

{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "P_Mk0ygOilLJo4_WEvabAA", "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "date": "2005-08-26", "text": "This is a pretty typical cafe.  The sandwiches and wraps are good but a little overpriced and the food items are the same.  The chicken caesar salad wrap is my favorite here but everything else is pretty much par for the course.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "TNJRTBrl0yjtpAACr1Bthg", "review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3, "date": "2005-11-23", "text": "I agree with other reviewers - this is a pretty typical financial district cafe.  However, they have fantastic pies.  I ordered three pies for an office event (apple, pumpkin cheesecake, and pecan) - all were delicious, particularly the cheesecake.  The sucker weighed in about 4 pounds - no joke.\n\nNo surprises on the cafe side - great pies and cakes from the catering business.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "H_mngeK3DmjlOu595zZMsA", "review_id": "i3eQTINJXe3WUmyIpvhE9w", "stars": 3, "date": "2005-11-23", "text": "Decent enough food, but very overpriced. Just a large soup is almost $5. Their specials are $6.50, and with an overpriced soda or juice, it's approaching $10. A bit much for a cafe lunch!", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}

The .json file I am using contains one JSON object per line, as per the specification.

I tried jsonlint.com as suggested, and it gives the following error:

Parse error on line 14:
...t7sRT4zwdbzQ8KQmw"}{    "votes": {
----------------------^
Expecting 'EOF', '}', ',', ']'
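
The jsonlint message and the pandas "Trailing data" error point at the same thing: the file is a sequence of JSON objects, one per line, not a single JSON document. A minimal sketch of that behaviour using only the standard library's json module (which reports the condition as "Extra data"); the two records are shortened from the sample lines above:

```python
import json

# Two JSON objects, one per line, shortened from the sample lines above.
text = (
    '{"review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3}\n'
    '{"review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3}\n'
)

# Parsing the whole text as ONE JSON document fails: once the first
# object is closed, the parser still finds more data.
try:
    json.loads(text)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing each non-empty line on its own succeeds.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_file_error)
print(len(records))
```

Each line is valid JSON by itself, which is why the line-by-line approaches in the answers work.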

Question 2

As of Pandas version 0.19.0 you can use the lines parameter, like so:

import pandas as pd

data = pd.read_json('/path/to/file.json', lines=True)
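
To try this without the original file, here is a small self-contained sketch. It assumes a pandas version that supports lines=True and feeds two shortened records through an in-memory buffer instead of a path; the variable names are only illustrative:

```python
import io

import pandas as pd

# Two records in one-object-per-line layout, shortened from the question's data.
sample = (
    '{"review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "type": "review"}\n'
    '{"review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3, "type": "review"}\n'
)

# lines=True tells read_json to treat every line as a separate JSON object.
df = pd.read_json(io.StringIO(sample), lines=True)

print(df.shape)
print(df["stars"].tolist())
```

Each input line becomes one row, and each top-level key becomes a column.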

Question 3

You have to read it line by line. For example, you can use the following code provided by ryptophan on reddit:

from io import StringIO

import pandas as pd

# read the entire file into a Python list of lines
# (text mode, so every line is a str rather than bytes)
with open('your.json', 'r', encoding='utf8') as f:
    data = f.readlines()

# remove the trailing "\n" from each line and skip blank lines
data = [line.rstrip() for line in data if line.strip()]

# each element of 'data' is an individual JSON object.
# I want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON document:
# basically, add square brackets to the beginning
# and end, and have all the individual JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"

# now, load it into pandas (wrapped in StringIO, because recent
# pandas versions deprecate passing a literal JSON string)
data_df = pd.read_json(StringIO(data_json_str))

Question 4

The following code helped me load the JSON content into a DataFrame:

import json
import pandas as pd

with open('Appointment.json', encoding="utf8") as f:
    # parse each non-empty line from a string into a dict
    data = [json.loads(line) for line in f if line.strip()]
df = pd.DataFrame(data)  # load the list of dicts into a DataFrame
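
The records in the question also carry a nested votes object, which ends up as a single column of dicts when the parsed lines are loaded directly. If separate columns are wanted, one possible approach, sketched here under the assumption of pandas 1.0 or later (where pd.json_normalize is available at the top level), is to flatten the parsed records:

```python
import json

import pandas as pd

# Two records shortened from the question's data; "votes" is a nested object.
lines = [
    '{"votes": {"funny": 0, "useful": 0, "cool": 0}, "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3}',
    '{"votes": {"funny": 0, "useful": 0, "cool": 0}, "review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3}',
]

# Parse each line into a dict, then flatten nested keys into dotted column names.
records = [json.loads(line) for line in lines]
flat_df = pd.json_normalize(records)

print(sorted(flat_df.columns))
```

The nested keys appear as votes.funny, votes.useful and votes.cool alongside the top-level columns.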

Question 5

I had a similar problem.

It turns out that pd.read_json('myfile.json') will look in the parent folder automatically, but it returns this "trailing data" error if you are not in the same folder as the file.

I figured this out because when I tried open('myfile.json', 'r'), I got a FileNotFound error, so I checked the paths.

I was not able to move myfile.json into the same folder as my notebook.

Changing to pd.read_json('../myfile.json') just worked.

Question 6

I faced the same problem too. It happens when your data is written on separate lines delimited by line endings such as '\n'. You first need to read the file line by line and then convert each line into Python built-in types. I solved it this way:

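import json
import pandas as pd

with open("/path/to/file") as f:
    content = f.readlines()

# json.loads rather than eval: eval executes arbitrary code
# and fails on JSON literals such as true, false and null
data = [json.loads(c) for c in content if c.strip()]
data = pd.DataFrame(data)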

Good luck!

Answer 2

You have extra data in the file that is not part of the JSON object.

Martijn Pieters

Answer 3

What do the last lines of the JSON file look like?

Bryan Oakley

Answer 4

This example read just fine for me in pandas 0.16.0. Which version of pandas are you using?

Andy Hayden

Answer 5

@user62198 upgrade to 0.16.0; there were some fixes to read_json.

Andy Hayden

Answer 6

@Cornel Ghiban, I can load the whole file or read an individual line. Converting to the format you mentioned looks like it could be a bit difficult, since there are more than 5 million records.

user62198
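
For a file with millions of records, one option is to let pandas read the line-delimited file in pieces rather than converting it first. A rough sketch, assuming a pandas version where read_json accepts chunksize together with lines=True; the temporary file, the name reviews.json and the chunk size of 2 are only for illustration:

```python
import os
import tempfile

import pandas as pd

# Three records shortened from the question's data, one JSON object per line.
lines = [
    '{"review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3}',
    '{"review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3}',
    '{"review_id": "i3eQTINJXe3WUmyIpvhE9w", "stars": 3}',
]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, written only so the example is self-contained.
    path = os.path.join(tmp, "reviews.json")
    with open(path, "w", encoding="utf8") as f:
        f.write("\n".join(lines) + "\n")

    total_rows = 0
    star_sum = 0
    # With chunksize set, read_json returns an iterator of DataFrames
    # instead of loading every line at once.
    reader = pd.read_json(path, lines=True, chunksize=2)
    for chunk in reader:
        total_rows += len(chunk)
        star_sum += int(chunk["stars"].sum())
    reader.close()

print(total_rows, star_sum)
```

Each chunk is an ordinary DataFrame, so it can be filtered or aggregated before the next one is read, which keeps memory use bounded by the chunk size.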

Answer 7

As of Pandas version 0.19.0 you can use the lines parameter, like so:

import pandas as pd

data = pd.read_json('/path/to/file.json', lines=True)

Andrew

Any idea how to work around this issue related to the lines argument? github.com/pandas-dev/pandas/issues/15132

Chuck

Answer 8