# Intro to Python Sets and Using them for Deduplication

The aim of this page is to cover Python sets. As beautifully explained in *Set Theory: the Method To Database Madness* by Vaidehi Joshi (Medium), sets are essential concepts for working with data(bases). The set is a built-in data structure in Python that comes in a mutable (`set`) and an immutable (`frozenset`) type. I use it mostly for deduplication. For example, I have hundreds of data processing jobs with the environment in their name suffix (foo-prod1, bar-prod1, acme-dev1, xxx-qa1), and I quickly need to get the unique environments (a set of 5 environments out of a list of 200 jobs). I am also moved to share these notes because of the following claim made on LeetCode:

> If I had to choose three built in functions/methods that I wasn’t comfortable with at the start and have found them super helpful, I’d probably say enumerate, zip and set
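The environment-deduplication use case from the introduction can be sketched as follows. This is a minimal sketch that assumes every job name ends in `-<environment>`. The job list and the `environment_of` helper are hypothetical illustrations, not the author's actual jobs or code.

```python
# Hypothetical job names following the "<name>-<environment>" pattern;
# in practice these would come from your own job inventory.
jobs = ["foo-prod1", "bar-prod1", "acme-dev1", "xxx-qa1", "baz-dev1"]


def environment_of(job_name: str) -> str:
    """Hypothetical helper: return the suffix after the last hyphen."""
    return job_name.rsplit("-", 1)[-1]


# A set comprehension keeps each environment only once.
environments = {environment_of(job) for job in jobs}
print(sorted(environments))  # → ['dev1', 'prod1', 'qa1']
```

Splitting on the *last* hyphen (`rsplit` with `maxsplit=1`) keeps this working for job names that contain hyphens themselves, such as `my-job-qa1`.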

• collection
• unordered
• elements are unique
• mutable (there is also a `frozenset`, which is immutable)
• each element must be hashable, which in practice means immutable (like the keys of a dictionary)
• the literal form uses curly braces, similar to dicts
```python
>>> s = {333, 555, 77, 32, 124}
>>> s
{32, 555, 77, 333, 124}
```
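The properties listed above can be checked in a short sketch: a `set` can be changed in place, a `frozenset` cannot, and elements must be hashable. The variable names are illustrative only.

```python
# Sets are mutable: elements can be added and removed in place.
s = {1, 2, 3}
s.add(4)
s.discard(1)  # discard() does not raise if the element is missing
print(sorted(s))  # → [2, 3, 4]

# A frozenset is immutable, so it has no add()/discard() methods at all.
fs = frozenset([1, 2, 3])
print(hasattr(fs, "add"))  # → False

# Elements must be hashable, like dictionary keys; a list is not.
try:
    bad = {[1, 2], [3]}
    list_allowed = True
except TypeError:
    list_allowed = False
print(list_allowed)  # → False

# A frozenset is hashable, so it can itself be an element of a set.
nested = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in nested)  # → True
```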
• NOTE: `{}` is already reserved for creating an empty dictionary, so an empty set has to be created with the `set()` constructor
• out of the 4 main collection types (list, dict, set, tuple), `set` is the only one without a literal for its empty form
```python
>>> f = {}
>>> f
{}
>>> type(f)
<class 'dict'>
>>> g = []
>>> g
[]
>>> type(g)
<class 'list'>
>>> b = ()
>>> b
()
>>> type(b)
<class 'tuple'>
>>> e = set()
>>> e
set()
>>> type(e)
<class 'set'>
```
• you can create a set from any iterable
• any duplicates in it are discarded
```python
>>> j = set([1, 2, 2, 2, 3, 4, 5, 6, 11, 6])
>>> j
{1, 2, 3, 4, 5, 6, 11}
```
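Because `set()` accepts any iterable, not just lists, the same deduplication applies to strings, dictionaries, and generators. The sketch below illustrates this with arbitrary example values. Results are wrapped in `sorted()` because a set's display order is not guaranteed.

```python
# A string is an iterable of characters.
letters = set("mississippi")
print(sorted(letters))  # → ['i', 'm', 'p', 's']

# Iterating a dict yields its keys, so set() collects the keys.
keys = set({"prod1": 3, "dev1": 1, "qa1": 2})
print(sorted(keys))  # → ['dev1', 'prod1', 'qa1']

# Any generator works too; here, the remainders of 0..9 divided by 3.
remainders = set(n % 3 for n in range(10))
print(sorted(remainders))  # → [0, 1, 2]
```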
• deduplication is the fundamental use; note that, because sets are unordered, their items cannot be retrieved by position/index
• membership is tested with the `in` and `not in` operators
```python
>>> j
{1, 2, 3, 4, 5, 6, 11}
>>> j[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable
>>> 11 in j
True
```
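Since indexing is not available, here is a sketch of the usual workarounds. Convert to a sorted list when positions matter, or use `next(iter(...))` to peek at an arbitrary element. Membership tests on a set hash the value rather than scanning every item, so they take constant time on average, unlike `in` on a list.

```python
j = {1, 2, 3, 4, 5, 6, 11}

# Membership tests use hashing, not a linear scan.
print(11 in j, 7 not in j)  # → True True

# There is no j[0]; to work with positions, convert to a sorted list first.
ordered = sorted(j)
print(ordered[0])  # → 1

# To grab an arbitrary element without removing it from the set:
some_item = next(iter(j))
print(some_item in j)  # → True
```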
• the `set()` constructor is commonly used to efficiently remove duplicate items from a series of objects
```python
>>> l = [1, 1, 2, 4, 6, 7, 1, 44, 108, 108, 108]
>>> dedup = set(l)
>>> dedup
{1, 2, 4, 6, 7, 44, 108}
```
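Because sets are unordered, `set(l)` does not promise to keep the original order of first occurrences. When that order matters, a common alternative is `dict.fromkeys()`. That technique uses a dict rather than a set, relying on the guarantee since Python 3.7 that dicts preserve insertion order. The environment list below is a hypothetical example.

```python
# Hypothetical environment suffixes, in the order the jobs were listed.
envs = ["prod1", "dev1", "prod1", "qa1", "dev1"]

# set() removes duplicates but does not promise any particular order.
unique_envs = set(envs)
print(sorted(unique_envs))  # → ['dev1', 'prod1', 'qa1']

# dict keys are unique as well, and dicts keep insertion order, so
# dict.fromkeys() deduplicates while keeping first occurrences in place.
ordered_unique_envs = list(dict.fromkeys(envs))
print(ordered_unique_envs)  # → ['prod1', 'dev1', 'qa1']
```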

Infrastructure Support Engineer/Technical Writer (snowplow.io) with a passion for Python/writing documentation. More about me: https://pavol.kutaj.com
