Python hashing tutorial explains the hashing concept in Python. We explain hash tables and Python hashable objects.
last modified January 29, 2024
Python hashing tutorial explains the hashing concept in Python. We explain hash tables and Python hashable objects.
Hash tables are used to implement map and set data structures in many common programming languages, such as C++, Java, and Python. Python uses hash tables for dictionaries and sets. A hash table is an unordered collection of key-value pairs, where each key is unique. Hash tables offer a combination of efficient lookup, insert and delete operations. These are the best properties of arrays and linked lists.
Hashing is the process of using an algorithm to map data of any size to a fixed length. This is called a hash value. Hashing is used to create high performance, direct access data structures where large amount of data is to be stored and accessed quickly. Hash values are computed with hash functions.
An object is hashable if it has a hash value which never changes during its lifetime. (It can have different values during multiple invocations of Python programs.) A hashable object needs a hash method. In order to perform comparisons, a hashable needs an eq method.
**Note: ** Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally. Python immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id.
**Note: ** If a class does not define an eq method it should not define a hash operation either; if it defines eq but not hash, its instances will not be usable as items in hashable collections.
The hash function returns the hash value of the object if it has one. Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Objects can implement the hash method.
Python immutable builtins, such as integers, strings, or tuples, are hashable.
builtin_hashables.py
#!/usr/bin/python
val = 100
print(val.hash()) print(“falcon”.hash()) print((1,).hash())
The example prints the values of three hashables: an integer, a string, and a tuple.
Python custom objects are hashable by default. Their hash is derived from their Id.
custom_object.py
#!/usr/bin/python
class User:
def __init__(self, name, occupation):
self.name = name
self.occupation = occupation
u1 = User(‘John Doe’, ‘gardener’) u2 = User(‘John Doe’, ‘gardener’)
print(‘hash of user 1’) print(hash(u1))
print(‘hash of user 2’) print(hash(u2))
if (u1 == u2): print(‘same user’) else: print(‘different users’)
In the example, we have two instances of a User.
u1 = User(‘John Doe’, ‘gardener’) u2 = User(‘John Doe’, ‘gardener’)
We have two instances with the same data.
print(‘hash of user 1’) print(hash(u1))
The hash function returns the hash value of the object. The default implementation is derived from the Id of the object.
$ python custom_object.py hash of user 1 -9223371894419573195 hash of user 2 142435202673 different users
Even though the user details are the same, the comparison yields differet objects. In order to change it, we need to implement the eq method.
In the second example, we implement a custom eq method.
custom_object2.py
#!/usr/bin/python
class User:
def __init__(self, name, occupation):
self.name = name
self.occupation = occupation
def __eq__(self, other):
return self.name == other.name \
and self.occupation == other.occupation
def __str__(self):
return f'{self.name} {self.occupation}'
u1 = User(‘John Doe’, ‘gardener’) u2 = User(‘John Doe’, ‘gardener’)
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
Now the comparison returns the expected output for us; however, we cannot insert the objects into a Python set; it would result in TypeError: unhashable type: ‘User’. In order to change this, we implement the hash method.
In the third example, we implement the eq and the hash methods.
custom_object3.py
#!/usr/bin/python
class User:
def __init__(self, name, occupation):
self.name = name
self.occupation = occupation
def __eq__(self, other):
return self.name == other.name \
and self.occupation == other.occupation
def __hash__(self):
return hash((self.name, self.occupation))
def __str__(self):
return f'{self.name} {self.occupation}'
u1 = User(‘John Doe’, ‘gardener’) u2 = User(‘John Doe’, ‘gardener’)
users = {u1, u2}
print(len(users))
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
print(’————————————’)
u1.occupation = ‘programmer’
users = {u1, u2}
print(len(users))
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
The example compares two objects that have custom implementation of the eq and hash methods. The objects can be inserted into a Python set and when an attribute is later changed, we get the expected output.
def hash(self): return hash((self.name, self.occupation))
The implementation of the hash function returns a hash value computed with the hash function from a tuple of attributes.
2 different users
In the fourth example, we add a mutable object to our custom class. This results in un unexpected output.
Note: If a class defines mutable objects and implements an eq method, it should not implement hash, since the implementation of hashable collections requires that a key’s hash value is immutable.
custom_object4.py
#!/usr/bin/python
class User:
def __init__(self, name, occupation, colours):
self.name = name
self.occupation = occupation
self.colours = colours
def __eq__(self, other):
return self.name == other.name \
and self.occupation == other.occupation
def __hash__(self):
return hash((self.name, self.occupation))
def __str__(self):
return f'{self.name} {self.occupation} {self.colours}'
u1 = User(‘John Doe’, ‘gardener’, [‘steelblue’, ‘green’, ‘red’]) u2 = User(‘John Doe’, ‘gardener’, [‘steelblue’, ‘green’, ‘red’])
s1 = {u1, u2} print(len(s1))
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
print(’———————–’)
u1.colours[1] = ‘blue’
s2 = {u1, u2} print(len(s2))
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
In the example, we have a list as an attribute. A list is a mutable object and it has consequences for the hashing algorithm.
u1.colours[1] = ‘blue’
We change an element of a list of the first user. However, this is not reflected.
1 same user John Doe gardener [‘steelblue’, ‘blue’, ‘red’] == John Doe gardener [‘steelblue’, ‘green’, ‘red’]
The output still says that the objects are equal.
–>
Since Python 3.7, we have the dataclass decorator, which automatically generates some boilerplate code.
The dataclass decorator has a frozen argument (False by default). If specified, the fields will be frozen (i.e. read-only). If eq is set to True, which it is by default then the hash method is implemented and object instances will be hashable.
decorator.py
#!/usr/bin/python
from dataclasses import dataclass
@dataclass(frozen=True) class User:
name: str
occupation: str
u1 = User(‘John Doe’, ‘gardener’) u2 = User(‘John Doe’, ‘gardener’)
if (u1 == u2): print(‘same user’) print(f’{u1} == {u2}’) else: print(‘different users’)
users = {u1, u2} print(len(users))
The example uses the @dataclass decorator.
$ python decorator.py same user User(name=‘John Doe’, occupation=‘gardener’) == User(name=‘John Doe’, occupation=‘gardener’) 1
Python hash function - language reference
In this article we have covered hashing in Python.
My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
List all Python tutorials.