One of the errors you are likely to encounter when programming in Python is the UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128). It can occur when you decode a sequence of bytes into a character string.
To solve the error, replace or skip the invalid bytes that trigger it. Alternatively, use the UTF-8 encoding instead of ASCII.
This guide walks through how to go about it. So, let's dive in.
How to Recreate the UnicodeDecodeError
To get a better understanding of the UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128), let's look at some examples that recreate the issue.
Error example 1: How the error can occur in a file
# Error 1
with open('file.txt', 'r', encoding='ascii') as f:
    content = f.readlines()
print(content)
Output:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
We try to read the contents of the file using the ASCII encoding. However, the file contains bytes that fall outside the ASCII range, which results in the UnicodeDecodeError.
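To try this locally, the sketch below first writes a file that begins with a non-ASCII character and then reads it back as ASCII. The file name `demo_non_ascii.txt`, the temporary directory, and the sample text are assumptions made for illustration; they are not the file used in the example above.

```python
import os
import tempfile

# Hypothetical sample text: the emoji U+1F600 encodes to four UTF-8 bytes,
# the first of which is 0xf0.
sample_text = "\U0001F600 hello\n"

# Write the sample to a throwaway file using UTF-8.
path = os.path.join(tempfile.mkdtemp(), "demo_non_ascii.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample_text)

# Reading the same file as ASCII fails on the very first byte.
try:
    with open(path, "r", encoding="ascii") as f:
        f.readlines()
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

Catching the exception here is only so the script can print the message and keep running; without the `try` block the program would stop with the traceback.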
Error example 2: When you encode and decode a string
In this example, we encode a string that contains non-ASCII characters using the utf-8 encoding and then try to decode the result using the ASCII encoding. The issue, again, is that the string has characters outside the ASCII range, which causes the error.
This error happens when a sequence of bytes cannot be decoded into a string under the specified encoding. The message implies that the ASCII codec cannot decode the byte with the value 0xe4 at position 7, because that value lies outside the ASCII range of 0 to 127.
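A minimal sketch of this second scenario follows. The sample string is an assumption chosen for illustration: its first seven characters are plain ASCII and the next one encodes to UTF-8 bytes beginning with 0xe4, so the failure lands on the same byte value and position mentioned above.

```python
# Hypothetical sample string; "Hello, " occupies byte positions 0 to 6.
text = "Hello, 你好"

# Encoding with UTF-8 turns each Chinese character into three bytes.
encoded = text.encode("utf-8")

# Decoding those bytes as ASCII fails at the first byte above 127.
try:
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The position in the message is a byte offset into `encoded`, not a character index into `text`, which matters once multi-byte characters appear earlier in the data.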
Solutions for Fixing the UnicodeDecodeError
Now that we have a clear picture of how the UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128) can be recreated, let's look at some solutions for fixing it.
Solution 1: Use a different encoding
One solution is to use a different encoding that can handle the characters in question. For example, instead of ASCII we can use UTF-8, which supports a much wider range of characters.
# Solution: Use UTF-8 Encoding
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.readlines()
print(content)
Output:
['\n', 'dfxyz\n', '\n']
The code now runs correctly because UTF-8 can represent every character defined in the Unicode standard. It is a variable-length encoding, so different characters take different numbers of bytes, from one to four.
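The variable-length behaviour can be checked directly. The sketch below measures how many bytes a few sample characters take in UTF-8; the characters themselves are arbitrary picks for illustration.

```python
# Sample characters chosen for illustration:
# "a" (U+0061), "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600).
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to the number of bytes in its UTF-8 encoding.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```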
UTF-8 is also backward compatible with ASCII, hence all valid ASCII characters are also valid UTF-8.
Solution 2: Use the errors parameter to handle invalid characters
A different solution is to use the errors parameter when opening a file or when decoding a byte string. This parameter specifies how decoding errors are handled. We can set it to ignore to skip invalid bytes, or to replace to substitute each of them with a replacement character.
# Solution: Use ignore
with open('file.txt', 'r', encoding='ascii', errors='ignore') as f:
    content = f.readlines()
print(content)
# Solution: Use replace
with open('file.txt', 'r', encoding='ascii', errors='replace') as f:
    content = f.readlines()
print(content)
Output:
['��������������������\n', 'dfxyz\n', '\n']
As the solution above shows, the errors parameter controls how invalid bytes are handled when decoding a sequence of bytes. Setting it to ignore drops the invalid bytes, while setting it to replace substitutes each one with the Unicode replacement character (U+FFFD), which is typically displayed as a question mark inside a diamond.
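The same errors parameter is accepted by `bytes.decode`, so it also applies when decoding a byte string held in memory rather than a file. A short sketch, assuming a sample byte string made up for illustration:

```python
# Hypothetical sample: "café au lait" encoded as UTF-8, where "é" is two bytes.
raw = "caf\u00e9 au lait".encode("utf-8")

# errors="ignore" silently drops every byte ASCII cannot decode.
ignored = raw.decode("ascii", errors="ignore")

# errors="replace" turns each undecodable byte into U+FFFD.
replaced = raw.decode("ascii", errors="replace")

print(ignored)
print(replaced)
```

Both options lose information: ignore removes the character entirely, and replace leaves one marker per undecodable byte, so neither recovers the original text.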
Solution 3: Use consistent encoding
When encoding and decoding strings, use the same encoding for all operations. This way, all characters are encoded and decoded correctly.
With utf-8 used for both encoding and decoding, all the characters in the string are converted to bytes and later decoded back into the same characters.
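A minimal sketch of this approach, assuming a sample string and a constant named `ENCODING` that are both introduced here for illustration:

```python
# Hypothetical constant: one encoding name shared by every operation.
ENCODING = "utf-8"

# Hypothetical sample string containing non-ASCII characters.
original = "na\u00efve caf\u00e9"

# Encode and decode with the same encoding.
as_bytes = original.encode(ENCODING)
round_tripped = as_bytes.decode(ENCODING)

print(round_tripped == original)
```

Keeping the encoding name in a single constant makes it harder for one call to fall back to a different codec by accident.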