Explaining Bytes Objects in the Context of Python's hashlib Library
Bytes literals accept only ASCII characters, and there is no such thing as plain text (remember "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" on Joel on Software)!
May 9, 2023
The hashlib module in Python provides a common interface to many different secure hash and message digest algorithms.
- Included are the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180–2) as well as RSA’s MD5 algorithm (defined in internet RFC 1321).
- Here's an example of how you can use the sha256 function from the hashlib module:
import hashlib
message = 'Hello, World!'
hashed_message = hashlib.sha256(message.encode('utf-8')).hexdigest()
print(hashed_message)
# Output: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
- In this example, we first encode the message string into bytes using the encode method with the 'utf-8' encoding.
- Then we pass the encoded message to the sha256 function of the hashlib module.
- Finally, we use the hexdigest method to get the hexadecimal representation of the hash.
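The three steps above can also be written out one at a time. The following is a minimal sketch rather than anything from the original example: the variable names (data, hash_object, hex_digest, rejected_str) are illustrative. It also shows what happens when the encoding step is skipped, since hashlib refuses to hash a str and raises a TypeError.

```python
import hashlib

message = 'Hello, World!'

# Step 1: str -> bytes. Hash functions operate on bytes, not on text.
data = message.encode('utf-8')
print(type(data).__name__)  # bytes

# Step 2: create the hash object from the bytes.
hash_object = hashlib.sha256(data)

# Step 3: read the digest as a hexadecimal string.
hex_digest = hash_object.hexdigest()
print(len(hex_digest))  # 64 hex characters = 32 bytes = 256 bits

# Skipping step 1 fails: a str has to be encoded before hashing.
try:
    hashlib.sha256(message)
    rejected_str = False
except TypeError:
    rejected_str = True
print(rejected_str)  # True
```

Keeping the encode step explicit makes the choice of encoding visible, which matters because the hash depends on the exact bytes and not on the text as displayed.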
Why are we using hexdigest?
- The hexdigest method returns the hash as a string of hexadecimal digits, rather than the raw bytes returned by the digest method.
- This is useful for displaying the hash in a human-readable format or for storing it in a text-based database column or file.
- The hexadecimal string is twice as long as the raw digest (64 characters for the 32 bytes of SHA-256), but it contains only printable characters and can be converted back to bytes with bytes.fromhex if needed.
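To make the relationship between digest and hexdigest concrete, here is a small sketch that compares the two results and converts between them. The names raw and hex_string are illustrative; bytes.hex and bytes.fromhex are standard methods of the built-in bytes type.

```python
import hashlib

hash_object = hashlib.sha256(b'Hello, World!')

raw = hash_object.digest()            # bytes object holding the raw hash
hex_string = hash_object.hexdigest()  # str of hexadecimal digits

print(len(raw))         # 32 (bytes)
print(len(hex_string))  # 64 (characters, two hex digits per byte)

# hexdigest() is the hexadecimal encoding of digest().
print(raw.hex() == hex_string)  # True

# The hex string converts back to the original raw bytes.
print(bytes.fromhex(hex_string) == raw)  # True
```

Which form to keep is a judgment call: raw bytes are smaller, while the hex string is easier to print, log, and compare by eye.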
The b prefix in front of a string literal in Python signifies that it is a bytes literal.
- You could also do just this, without the encode() call:
import hashlib
hash_object = hashlib.sha256(b"example@test.com")
hex_dig = hash_object.hexdigest()
print(hex_dig)
# Output: 273f6ec2fc79031c824daff15d9415db2e8f2dd2a934b6b8b13540b5f94062b0
- In other words, a = b'example@test.com' and a = 'example@test.com'.encode('utf-8') are equivalent in Python.
- Both create a bytes object with the same content.
- When you use a bytes literal with the b prefix to create a bytes object, you don't need to specify an encoding because the bytes object is created directly from the literal.
- The characters in the bytes literal are interpreted as ASCII characters and are converted to their corresponding byte values.
- If you use characters that are not part of the ASCII character set in a bytes literal, you will get a SyntaxError.
- To create a bytes object containing non-ASCII characters, you can use the encode() method of a string and specify an encoding that supports those characters:
>>> a = 'example@test.com£'.encode('utf-8')
>>> print(a)
b'example@test.com\xc2\xa3'
>>> b'example@test.com£'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
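The points above about bytes literals and encode() can be checked directly. The sketch below is illustrative only: the variable names are made up, and the comparison with the 'latin-1' encoding is an addition that does not appear in the original text. It is included to show that the same character can become different bytes, and therefore a different hash, depending on the encoding chosen.

```python
import hashlib

# A bytes literal and an encoded ASCII-only string hold the same bytes.
literal = b'example@test.com'
encoded = 'example@test.com'.encode('utf-8')
print(literal == encoded)  # True
print(hashlib.sha256(literal).hexdigest()
      == hashlib.sha256(encoded).hexdigest())  # True

# Each ASCII character in a bytes literal becomes one byte value.
print(list(b'abc'))  # [97, 98, 99]

# A non-ASCII character has no single byte form: it depends on the encoding.
text = '£'
utf8_bytes = text.encode('utf-8')      # two bytes: 0xc2 0xa3
latin1_bytes = text.encode('latin-1')  # one byte: 0xa3
print(len(utf8_bytes), len(latin1_bytes))  # 2 1

# Different bytes mean different hashes, even though the text looks the same.
print(hashlib.sha256(utf8_bytes).hexdigest()
      == hashlib.sha256(latin1_bytes).hexdigest())  # False

# Non-ASCII bytes can still be written in a literal using escape sequences.
print(b'\xc2\xa3' == utf8_bytes)  # True
```

This is why it helps to pick one encoding, typically UTF-8, and use it consistently whenever hashes of text need to be compared.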