Explaining Bytes Objects in the Context of Python's hashlib Library

Bytes literals accept only ASCII characters, and there’s no such thing as plain text (remember "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel on Software)!

Pavol Kutaj
May 9, 2023

The hashlib module in Python provides a common interface to many different secure hash and message digest algorithms.

  • Included are the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180–2) as well as RSA’s MD5 algorithm (defined in internet RFC 1321).
  • Here’s an example of how you can use the sha256 function from the hashlib module:
import hashlib

message = 'Hello, World!'
hashed_message = hashlib.sha256(message.encode('utf-8')).hexdigest()

print(hashed_message)
# Output: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
  • In this example, we first encode the message string into bytes using the encode method with 'utf-8' encoding.
  • Then we pass the encoded message to the sha256 function of the hashlib module, which returns a hash object.
  • Finally, we use the hexdigest method to get the hexadecimal representation of the hash.
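The encoding step is not optional: hashlib works on bytes, so passing a str directly raises a TypeError. The following is a minimal sketch of that behaviour. The variable names are illustrative only, and it also shows that feeding the data in pieces with update() gives the same hash as a single call.

```python
import hashlib

message = 'Hello, World!'

# Passing a str directly fails: hash functions operate on bytes, not text.
try:
    hashlib.sha256(message)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding first works.
one_shot = hashlib.sha256(message.encode('utf-8')).hexdigest()

# update() lets you feed the same bytes in chunks.
chunked = hashlib.sha256()
chunked.update('Hello, '.encode('utf-8'))
chunked.update('World!'.encode('utf-8'))

print(str_accepted)                     # False
print(one_shot == chunked.hexdigest())  # True
```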

Why are we using hexdigest?

  • The hexdigest method is used to get a hexadecimal string representation of the binary data returned by the digest method.
  • This can be useful for displaying the hash in a more human-readable format or for storing it in a database or file.
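  • A hexadecimal string is twice as long as the raw digest (two characters per byte), but it is far shorter than writing out the individual bits, it contains only printable characters, and it can be easily converted back to bytes with bytes.fromhex() if needed.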
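To make the relationship between digest and hexdigest concrete, here is a small sketch (variable names are illustrative) comparing the two forms for SHA-256 and converting between them:

```python
import hashlib

hash_object = hashlib.sha256(b'Hello, World!')

raw = hash_object.digest()      # bytes object: 32 bytes for SHA-256
text = hash_object.hexdigest()  # str: two hex characters per byte

print(type(raw).__name__, len(raw))    # bytes 32
print(type(text).__name__, len(text))  # str 64

# Both forms carry the same information and convert in either direction.
print(bytes.fromhex(text) == raw)  # True
print(raw.hex() == text)           # True
```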

The b prefix in front of a string literal in Python signifies that it is a bytes literal.

  • You could also pass a bytes literal directly, without calling encode():
import hashlib
hash_object = hashlib.sha256(b"example@test.com")
hex_dig = hash_object.hexdigest()
print(hex_dig)
# Output: 273f6ec2fc79031c824daff15d9415db2e8f2dd2a934b6b8b13540b5f94062b0
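  • In other words, a = b'example@test.com' and a = 'example@test.com'.encode('utf-8') are equivalent in Python, because every character in this string is ASCII and UTF-8 encodes each ASCII character as the same single byte.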
  • Both create a bytes object with the same content.
  • When you use a bytes literal with the b prefix to create a bytes object, you don't need to specify an encoding because the bytes object is created directly from the literal.
  • The characters in the bytes literal are interpreted as ASCII characters and are converted to their corresponding byte values.
  • If you use special characters that are not part of the ASCII character set in a bytes literal, you will get a SyntaxError.
  • To create a bytes object containing special characters that are not part of the ASCII character set, you can use the encode() method of a string and specify an encoding that supports the special characters:
>>> a = 'example@test.com£'.encode('utf-8')
>>> print(a)
b'example@test.com\xc2\xa3'
>>> b'example@test.com£'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
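Why this matters for hashing can be sketched as follows. For non-ASCII text, the bytes (and therefore the hash) depend on which encoding you pick. The choice of latin-1 as the second encoding is only an assumption for illustration; any encoding that supports the character would make the same point.

```python
import hashlib

# For ASCII-only text, the literal and the encoded string are the same bytes.
print(b'example@test.com' == 'example@test.com'.encode('utf-8'))  # True

# A non-ASCII character has no single byte form: it depends on the encoding.
utf8_bytes = 'example@test.com£'.encode('utf-8')
latin1_bytes = 'example@test.com£'.encode('latin-1')
print(utf8_bytes)    # b'example@test.com\xc2\xa3'
print(latin1_bytes)  # b'example@test.com\xa3'

# Different bytes produce different hashes.
same_hash = hashlib.sha256(utf8_bytes).digest() == hashlib.sha256(latin1_bytes).digest()
print(same_hash)  # False

# Escape sequences keep a bytes literal ASCII-only in the source code.
print(b'example@test.com\xc2\xa3' == utf8_bytes)  # True
```

If two systems need to agree on a hash of the same text, they also need to agree on the encoding used before hashing.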


Pavol Kutaj

Today I Learnt | Infrastructure Support Engineer at snowplow.io with a passion for cloud infrastructure/terraform/python/docs. More at https://pavol.kutaj.com