Issue
ad
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 826, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 841, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 920, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1052, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1083, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1220, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1238, in pandas._libs.parsers.TextReader._string_convert
File "pandas\_libs\parsers.pyx", line 1429, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 2: invalid start byte
I am getting the above error while reading my CSV.
To rectify this, I used the unicode_escape encoding:
csv_df=pd.read_csv(file_path,header=0,squeeze=True,dtype=str,keep_default_na=False,encoding='unicode_escape')
However, now I am getting \xa0 for the space between two words:
'ObjectStatus': 'IN\xa0SERVICE'
My CSV has:
Key Values
RequestID
ObjectType CONTAINER
ObjectName INMUNVMBMHPBNB6001ENBCMW005
ObjectStatus IN SERVICE
ObjectType CONTAINER
Solution
The unicode_escape codec is for text that contains literal escape codes, such as the four-character sequence \\xa0 (backslash, x, a, 0), as opposed to the single character \xa0. What you see displayed is just Python's debug representation (repr) of the string, which prints \xa0 to show that the character isn't a regular space. Your file is probably encoded in cp1252 or latin1, as byte 0xa0 is the NO-BREAK SPACE in those encodings.
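To make the distinction concrete, here is a minimal sketch using only the standard library. The byte string `raw` is a stand-in for the relevant part of the file, not the actual file contents.

```python
# Bytes as they might appear in the file: "IN", a 0xA0 byte, then "SERVICE".
raw = b"IN\xa0SERVICE"

# Decoding as UTF-8 fails, because 0xA0 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# cp1252 (and latin1) map byte 0xA0 to U+00A0 NO-BREAK SPACE.
text = raw.decode("cp1252")

# A literal escape code is four characters; the character it names is one.
literal = "\\xa0"  # backslash, x, a, 0
single = "\xa0"    # one NO-BREAK SPACE character

# unicode_escape is meant for turning the literal form into the character.
decoded_literal = literal.encode("ascii").decode("unicode_escape")

print(utf8_error)                 # the same kind of error as in the traceback
print(len(literal), len(single))  # lengths of the two forms
print(repr(text))                 # debug representation, shows the escape
print(text)                       # printed form, shows what looks like a space
```

The string held in `text` is the same in both of the last two lines; only the way it is displayed differs.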
Example:
>>> d = {'ObjectStatus': 'IN\xa0SERVICE'}
>>> d
{'ObjectStatus': 'IN\xa0SERVICE'}
>>> print(d['ObjectStatus'])
IN SERVICE
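If the file really is cp1252, one possible approach is to pass that encoding to pd.read_csv and, if ordinary spaces are wanted, replace the no-break spaces after loading. The sketch below is an illustration under assumptions: the sample data, the comma delimiter, and the column names Key and Values are hypothetical stand-ins for the real file, and squeeze is left out because newer pandas versions no longer accept it.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: cp1252 bytes
# containing a NO-BREAK SPACE (byte 0xA0) between "IN" and "SERVICE".
raw = (
    "Key,Values\n"
    "ObjectType,CONTAINER\n"
    "ObjectStatus,IN\u00a0SERVICE\n"
).encode("cp1252")

# Read with the encoding the file is assumed to use, instead of unicode_escape.
csv_df = pd.read_csv(
    io.BytesIO(raw),  # with a real file, pass file_path here instead
    header=0,
    dtype=str,
    keep_default_na=False,
    encoding="cp1252",
)

# The value still contains U+00A0 at this point; that is correct decoding.
status_before = csv_df.loc[csv_df["Key"] == "ObjectStatus", "Values"].iloc[0]

# Optional: swap no-break spaces for ordinary spaces in the Values column.
csv_df["Values"] = csv_df["Values"].str.replace("\u00a0", " ", regex=False)
status_after = csv_df.loc[csv_df["Key"] == "ObjectStatus", "Values"].iloc[0]

print(repr(status_before))
print(repr(status_after))
```

Whether to keep or replace the no-break space depends on how the values are used later; comparisons against 'IN SERVICE' typed with a regular space only match after the replacement.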
Answered By – Mark Tolonen
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.