UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 2: invalid start byte, tried all encoding styles

Issue

ad
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 826, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 841, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 920, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1052, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1083, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1220, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1238, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1429, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 2: invalid start byte

I am getting the above error while reading my CSV.

To work around it, I used the unicode_escape encoding:

csv_df=pd.read_csv(file_path,header=0,squeeze=True,dtype=str,keep_default_na=False,encoding='unicode_escape')   

However, I now get \xa0 in place of the space between two words:

'ObjectStatus': 'IN\xa0SERVICE'

My CSV has:

Key          Values
RequestID   
ObjectType   CONTAINER
ObjectName   INMUNVMBMHPBNB6001ENBCMW005
ObjectStatus IN SERVICE
ObjectType   CONTAINER

Solution

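The unicode_escape codec is meant for text that contains literal escape sequences: the four characters \\xa0 (backslash, x, a, 0), as opposed to the single character \xa0. What you see in the output is just Python's debug representation (repr) of the string, which prints \xa0 to show that the character isn't a regular space. Your file is probably encoded in cp1252 or latin1, since byte 0xa0 is the NO-BREAK SPACE in those encodings, so reading it with encoding='cp1252' (or 'latin1') should decode it correctly.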

Example:

>>> d = {'ObjectStatus': 'IN\xa0SERVICE'}
>>> d
{'ObjectStatus': 'IN\xa0SERVICE'}
>>> print(d['ObjectStatus'])
IN SERVICE
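
To see where the error comes from at the byte level, here is a minimal sketch using only the standard library. The byte string `raw` is a made-up stand-in for the relevant part of the file, assuming it really is cp1252 or latin1 as guessed above.

```python
import unicodedata

# Hypothetical sample: the bytes as they would sit in a cp1252/latin1 file.
raw = b"IN\xa0SERVICE"

# 0xA0 has the bit pattern of a UTF-8 continuation byte, so it cannot
# start a character; this is the "invalid start byte" in the traceback.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as cp1252 maps byte 0xA0 to the single character U+00A0.
text = raw.decode("cp1252")

# unicode_escape is aimed at the escaped spelling (backslash, x, a, 0).
escaped = rb"IN\xa0SERVICE"          # 4 literal characters instead of 1 byte
from_escape = escaped.decode("unicode_escape")

print(utf8_ok)
print(unicodedata.name(text[2]))
print(repr(text))
print(from_escape == text)
```

The repr line shows the same `\xa0` seen in the question, even though the string holds one no-break space character rather than four characters.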

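Applied to the read_csv call from the question, the fix might look like the sketch below. It assumes the file is cp1252 and comma-separated; the in-memory `raw` bytes stand in for `file_path` so the snippet runs on its own, and the `squeeze` argument is left out because recent pandas versions no longer accept it. Replacing the no-break space with a regular space is optional and only needed if downstream code compares against plain spaces.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file on disk: cp1252 bytes with 0xA0
# between "IN" and "SERVICE". The comma delimiter is an assumption.
raw = b"Key,Values\nObjectType,CONTAINER\nObjectStatus,IN\xa0SERVICE\n"

csv_df = pd.read_csv(
    io.BytesIO(raw),          # in real use, pass file_path here
    header=0,
    dtype=str,
    keep_default_na=False,
    encoding="cp1252",        # assumed encoding; 'latin1' maps 0xA0 the same way
)

# Optional: normalise no-break spaces to ordinary spaces.
csv_df["Values"] = csv_df["Values"].str.replace("\xa0", " ", regex=False)

status = csv_df.loc[csv_df["Key"] == "ObjectStatus", "Values"].iloc[0]
print(status)
```
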
Answered By – Mark Tolonen

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
