Byte Strings Are Decoded To ASCII In IO By Default In Python
2 min read · Jan 13, 2022
The aim of this page 📝 is to explain why Python seemingly prints characters even for byte strings. I ran into this while writing scripts that request values from the Consul KV store with the require module: the values arrived (logically) with the b' prefix when I printed them to the console for the users (my teammates).
1. encode and decode the terminology
- string = <class 'str'> = decoded byte string → the form processed by the brain
- byte string = <class 'bytes'> = encoded string → the form processed by the machine
- by default, Python shows the bytes of a byte string as ASCII characters when it is printed (bytes outside the printable ASCII range appear as \x escapes), which is confusing unless you know about it
- to remove the b' prefix from the output, you need to call the decode() method on the byte string
- but if there are non-ASCII characters, you will feel the pain immediately, even when just accessing or printing the variable
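The points above can be sketched for the plain ASCII case, which is the one that caused the original confusion. This is a minimal illustration, not the actual script: the value b"hello" is made-up sample data standing in for whatever a key-value store might return.

```python
# Made-up sample value, standing in for bytes returned by a key-value store.
raw = b"hello"

# print() falls back to the repr of a bytes object: printable ASCII bytes
# are shown as characters, wrapped in b'...'.
shown_raw = str(raw)
print(shown_raw)   # b'hello'

# decode() converts the bytes into a str, so the b' prefix disappears.
text = raw.decode("ascii")
print(text)        # hello

# Bytes outside the ASCII range are shown as \x escapes instead.
non_ascii = "š".encode("utf-8")
print(non_ascii)   # b'\xc5\xa1'
```

Nothing is actually decoded when a byte string is printed; the characters are only how its representation is displayed, which is why the b' prefix stays until decode() is called.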
>>> a = 'š'.encode('utf-8')
>>> a
b'\xc5\xa1'
>>> a.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> a.decode('utf-8')
'š'
>>> a
b'\xc5\xa1'
>>> print(a)
b'\xc5\xa1'
>>> a = 'š'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u0161' in position 0: ordinal not in range(128)
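For scripts that print such values to teammates, one possible way to avoid both the b' prefix and the UnicodeDecodeError is a small helper like the one below. This is only a sketch: to_text is a hypothetical name, and the choice of UTF-8 as the default encoding is an assumption about the data, not something the store guarantees.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes if needed."""
    if isinstance(value, bytes):
        # errors="replace" substitutes U+FFFD for bytes that are invalid in
        # the chosen encoding instead of raising UnicodeDecodeError.
        return value.decode(encoding, errors="replace")
    return str(value)


greeting = to_text(b"hello")
letter = to_text(b"\xc5\xa1")          # decoded as UTF-8
lossy = to_text(b"\xc5\xa1", "ascii")  # both bytes are invalid in ASCII

print(greeting)       # hello
print(ascii(letter))  # '\u0161'
print(ascii(lossy))   # '\ufffd\ufffd'
```

Replacing undecodable bytes keeps the script running but loses information, so raising the error (the default behaviour of decode()) may still be the better choice when the exact value matters.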