Intro to Python Sets and Using Them for Deduplication

1. attributes

  • collection
  • unordered
  • elements are unique
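  • mutable (there is also frozenset, which is immutable)
  • each element must be hashable (like the keys of a dictionary); for built-in types this means immutable values such as numbers, strings, and tuples of hashable items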
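The properties listed above can be checked with a short script. This is a minimal sketch; the variable names (colors, frozen, pairs) are illustrative and not part of the original notes.

```python
# Uniqueness: repeated values collapse into a single element.
colors = {"red", "green", "red"}
unique_count = len(colors)

# Mutability: a set can grow and shrink in place.
colors.add("blue")
colors.discard("green")

# frozenset is the immutable variant: it has no add() method.
frozen = frozenset(colors)
try:
    frozen.add("yellow")
    frozen_is_mutable = True
except AttributeError:
    frozen_is_mutable = False

# Elements must be hashable: a list is rejected, a tuple is accepted.
try:
    bad = {[1, 2], [3, 4]}
    list_allowed = True
except TypeError:
    list_allowed = False
pairs = {(1, 2), (3, 4)}
```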

2. syntax

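  • the literal form is similar to a dict's: curly braces, but with plain elements instead of key: value pairs
  • avoid naming a variable set, because that shadows the built-in set() constructor
>>> s = {333,555,77,32,124}
>>> s
{32, 555, 77, 333, 124}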

3. constructor

  • NOTE: {} is already reserved for creating an empty dictionary, so an empty set must be created with the set() constructor
  • of the 4 main collection types (list, dict, set, tuple), set is the only one without a literal for the empty collection
>>> f = {}
>>> f
{}
>>> type(f)
<class 'dict'>
>>> g = []
>>> g
[]
>>> type(g)
<class 'list'>
>>> b = ()
>>> b
()
>>> type(b)
<class 'tuple'>
>>> e = set()
>>> e
set()
>>> type(e)
<class 'set'>
  • set() also accepts an iterable; any duplicates in it are discarded
>>> j = set([1,2,2,2,3,4,5,6,11,6,])
>>> j
{1, 2, 3, 4, 5, 6, 11}
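The constructor is not limited to lists. The sketch below passes a few other iterables to set() and repeats the empty-literal pitfall; the variable names are made up for illustration.

```python
# set() accepts any iterable, not only lists.
from_string = set("hello")         # the distinct characters of the string
from_tuple = set((1, 2, 2, 3))
from_range = set(range(5))
from_dict = set({"a": 1, "b": 2})  # iterating a dict yields its keys

# Pitfall: {} is an empty dict, set() is the empty set.
empty_dict = {}
empty_set = set()
```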

4. membership

  • this is a fundamental use of sets; note that the items of a set cannot be retrieved by position/index
  • membership is tested with the in and not in operators
>>> j
{1, 2, 3, 4, 5, 6, 11}


>>> j[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

>>> 11 in j
True
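A short script version of the membership checks, plus one way to get positional access when it is needed. This is a sketch only; the allowed and requests names are hypothetical examples, not taken from the notes.

```python
j = {1, 2, 3, 4, 5, 6, 11}

# Membership tests with in and not in.
has_eleven = 11 in j
missing_seven = 7 not in j

# A set cannot be indexed, but sorted() returns a list that can.
ordered = sorted(j)
smallest = ordered[0]

# Typical use: filter a list against a set of allowed values.
allowed = {"GET", "POST"}
requests = ["GET", "DELETE", "POST", "GET"]
accepted = [r for r in requests if r in allowed]
```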

5. deduplication

  • the set constructor is commonly used to remove duplicate items efficiently from a sequence of hashable objects; because sets are unordered, the original order is not guaranteed to survive
>>> l = [1,1,2,4,6,7,1,44,108,108,108]
>>> dedup = set(l)
>>> dedup
{1, 2, 4, 6, 7, 44, 108}
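When the result has to be a list, or the first-seen order matters, two common variations are worth knowing. The sketch below is one possible approach, not the only one; the words list is invented for the example.

```python
l = [1, 1, 2, 4, 6, 7, 1, 44, 108, 108, 108]

# set() removes duplicates but makes no promise about order.
dedup = set(l)

# sorted() gives a deterministic list when the items are comparable.
dedup_sorted = sorted(set(l))

# dict.fromkeys() keeps first-seen order
# (dicts preserve insertion order since Python 3.7).
words = ["pear", "apple", "pear", "fig", "apple"]
dedup_ordered = list(dict.fromkeys(words))
```

Both variations still require hashable items, the same restriction that applies to set elements.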

6. sources

Pavol Kutaj

Infrastructure Support Engineer/Technical Writer (Snowplow Analytics) with a passion for Python/writing documentation. More about me: https://pavol.kutaj.com