Set

set is a mutable, unordered collection of unique objects. frozenset is similar to set, but immutable. See docs.python: set, frozenset for documentation.
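As a small sketch of what "immutable" means here (the variable names are only for illustration): a frozenset supports the read-only operations of set, lacks the methods that modify in place, and can still be combined into new objects.

```python
primes = frozenset([2, 3, 5, 7])

# read-only operations work the same way as for set
print(3 in primes)                  # True
print(len(primes))                  # 4

# methods that modify a set in place are not available
print(hasattr(primes, 'add'))       # False
print(hasattr(set(), 'add'))        # True

# operations that build a new object are fine
more_primes = primes | {11, 13}
print(type(more_primes).__name__)   # frozenset
print(sorted(more_primes))          # [2, 3, 5, 7, 11, 13]
```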

Initialization

Sets are declared as a collection of objects separated by commas within {} curly braces. The set() function can be used to initialize an empty set (a bare {} creates an empty dict instead) and to convert iterables.

>>> empty_set = set()
>>> empty_set
set()

>>> nums = {-0.1, 3, 2, -5, 7, 1, 6.3, 5}
# note that the order is not the same as declaration
>>> nums
{-0.1, 1, 2, 3, 5, 6.3, 7, -5}

# duplicates are automatically removed
>>> set([3, 2, 11, 3, 5, 13, 2])
{2, 3, 5, 11, 13}
>>> set('initialize')
{'a', 'n', 't', 'l', 'e', 'i', 'z'}
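One thing that is easy to trip over: empty braces give a dict, not a set. A quick check, with illustrative variable names:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)   # dict
print(type(empty_set).__name__)      # set

# braces with at least one element (and no key: value pairs) give a set
single = {42}
print(type(single).__name__)         # set
```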

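set doesn't allow unhashable objects as elements. Mutable built-in types like list, dict and set are unhashable. A tuple is allowed as long as all of its own elements are hashable.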

>>> {1, 3, [1, 2], 4}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> {1, 3, (1, 2), 4}
{3, 1, (1, 2), 4}
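To check ahead of time whether an object can be a set element, you can try hashing it. The is_hashable() function below is a hypothetical helper written for this illustration, not a built-in.

```python
def is_hashable(obj):
    """Return True if obj can be used as a set element (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2)))             # True
# a tuple is hashable only if everything inside it is hashable
print(is_hashable((1, [2, 3])))        # False
# a set cannot be an element of another set, but a frozenset can
print(is_hashable({2, 3}))             # False
print(is_hashable(frozenset({2, 3})))  # True

nested = {1, frozenset({2, 3})}
print(frozenset({3, 2}) in nested)     # True
```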

Set methods and operations

The in operator checks if a value is present in the given set. Since set uses a hash table (similar to dict keys), the lookup time is constant on average. For large collections, this is much faster than sequences like list or tuple, which have to be searched one element at a time.

>>> colors = {'red', 'blue', 'green'}
>>> 'blue' in colors
True
>>> 'orange' in colors
False
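A rough way to see the difference yourself is to time the same lookup on a list and on a set using the timeit module. The sizes and repeat count below are arbitrary choices, and the actual timings will vary by machine, so no output is shown.

```python
from timeit import timeit

nums_list = list(range(100_000))
nums_set = set(nums_list)

# look up a value near the end, which is the worst case for the list
target = 99_999

list_time = timeit(lambda: target in nums_list, number=100)
set_time = timeit(lambda: target in nums_set, number=100)

print(f'list: {list_time:.5f}s')
print(f'set : {set_time:.5f}s')
```

On typical machines the set lookup should come out far ahead, and the gap grows as the collection gets bigger.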

Here are some examples of set operations like union, intersection, etc. You can use either methods or operators; both give you a new set object instead of modifying in place. The difference is that set methods accept any iterable, whereas operators require both operands to be set (or frozenset) objects.

>>> color_1 = {'teal', 'light blue', 'green', 'yellow'}
>>> color_2 = {'light blue', 'black', 'dark green', 'yellow'}

# union of two sets: color_1 | color_2
>>> color_1.union(color_2)
{'light blue', 'green', 'dark green', 'black', 'teal', 'yellow'}

# common items: color_1 & color_2
>>> color_1.intersection(color_2)
{'light blue', 'yellow'}

# items from color_1 not present in color_2: color_1 - color_2
>>> color_1.difference(color_2)
{'teal', 'green'}
# items from color_2 not present in color_1: color_2 - color_1
>>> color_2.difference(color_1)
{'dark green', 'black'}

# items present in one of the sets, but not both
# i.e. union of previous two operations: color_1 ^ color_2
>>> color_1.symmetric_difference(color_2)
{'green', 'dark green', 'black', 'teal'}
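Here's a short sketch of the difference between methods and operators when the other operand isn't a set. The extra color names are made up for this example.

```python
color_1 = {'teal', 'light blue', 'green', 'yellow'}

# methods accept any iterable (even several at once)
combined = color_1.union(['black', 'teal'], ('white',))
print(sorted(combined))
# ['black', 'green', 'light blue', 'teal', 'white', 'yellow']

# operators need a set (or frozenset) on both sides
try:
    color_1 | ['black', 'teal']
except TypeError:
    print('operator rejects a list')

# convert explicitly if you prefer the operator form
same = (color_1 | set(['black', 'teal'])) == color_1.union(['black', 'teal'])
print(same)           # True

# the original set is unchanged in all cases
print(len(color_1))   # 4
```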

The dict methods keys() and items() return set-like view objects (see the Dict chapter), so you can apply set operators on them. The values() view isn't set-like, since values can repeat and need not be hashable.

>>> marks_1 = dict(Rahul=86, Ravi=92, Rohit=75)
>>> marks_2 = dict(Jo=89, Rohit=78, Joe=75, Ravi=100)

>>> marks_1.keys() & marks_2.keys()
{'Ravi', 'Rohit'}
>>> marks_1.keys() - marks_2.keys()
{'Rahul'}
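A few more operations on dict views, as a sketch. Only keys() and items() support set operators directly, so the values are converted with set() first.

```python
marks_1 = dict(Rahul=86, Ravi=92, Rohit=75)
marks_2 = dict(Jo=89, Rohit=78, Joe=75, Ravi=100)

# names present in either dict
all_names = marks_1.keys() | marks_2.keys()
print(sorted(all_names))            # ['Jo', 'Joe', 'Rahul', 'Ravi', 'Rohit']

# the result of an operator on views is a plain set
print(type(all_names).__name__)     # set

# items() compares (key, value) pairs; no pair is identical here
common_items = marks_1.items() & marks_2.items()
print(common_items)                 # set()

# values() doesn't support set operators, so convert it first
try:
    marks_1.values() & marks_2.values()
except TypeError:
    print('values() does not support set operators')

common_marks = set(marks_1.values()) & set(marks_2.values())
print(common_marks)                 # {75}
```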

Methods like add(), update(), symmetric_difference_update(), intersection_update() and difference_update() will do the modifications in-place.

>>> color_1 = {'teal', 'light blue', 'green', 'yellow'}
>>> color_2 = {'light blue', 'black', 'dark green', 'yellow'}

# union
>>> color_1.update(color_2)
>>> color_1
{'light blue', 'green', 'dark green', 'black', 'teal', 'yellow'}

# adding a single value
>>> color_2.add('orange')
>>> color_2
{'black', 'yellow', 'dark green', 'light blue', 'orange'}
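The other in-place methods work along the same lines. In this sketch, each one is applied to a fresh copy so that the results can be compared against the same starting sets.

```python
color_1 = {'teal', 'light blue', 'green', 'yellow'}
color_2 = {'light blue', 'black', 'dark green', 'yellow'}

# keep only the items also present in color_2
common = set(color_1)          # work on a copy
common.intersection_update(color_2)
print(sorted(common))          # ['light blue', 'yellow']

# remove the items that are present in color_2
only_1 = set(color_1)
only_1.difference_update(color_2)
print(sorted(only_1))          # ['green', 'teal']

# keep the items present in exactly one of the two sets
either = set(color_1)
either.symmetric_difference_update(color_2)
print(sorted(either))          # ['black', 'dark green', 'green', 'teal']

# in-place methods return None, so don't assign their result
result = color_1.update(color_2)
print(result)                  # None
```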

The pop() method removes and returns an arbitrary element (you cannot choose which one, and it isn't guaranteed to be random). Use the remove() method if you want to delete an element based on its value; it raises KeyError if the element doesn't exist. The discard() method is similar to remove(), but it will not generate an error if the element doesn't exist. The clear() method will delete all the elements.

>>> colors = {'red', 'blue', 'green'}

>>> colors.pop()
'blue'
>>> colors
{'green', 'red'}

>>> colors.clear()
>>> colors
set()
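The difference between remove() and discard() shows up only when the element is missing. A minimal sketch:

```python
colors = {'red', 'blue', 'green'}

colors.remove('blue')
print(sorted(colors))        # ['green', 'red']

# remove() raises KeyError if the element is missing
try:
    colors.remove('orange')
except KeyError:
    print('orange not found')

# discard() silently does nothing in that case
colors.discard('orange')
colors.discard('red')
remaining = sorted(colors)
print(remaining)             # ['green']

# pop() on an empty set also raises KeyError
colors.clear()
try:
    colors.pop()
except KeyError:
    print('cannot pop from an empty set')
```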

Here are some examples of comparison operations.

>>> names_1 = {'Ravi', 'Rohit'}
>>> names_2 = {'Ravi', 'Ram', 'Rohit', 'Raj'}

>>> names_1 == names_2
False

# same as: names_1 <= names_2
>>> names_1.issubset(names_2)
True

# same as: names_2 >= names_1
>>> names_2.issuperset(names_1)
True

# disjoint means there are no common elements: not names_1 & names_2
>>> names_1.isdisjoint(names_2)
False
>>> names_1.isdisjoint({'Jo', 'Joe'})
True
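The strict forms of these operators, < and >, additionally require the two sets to be unequal. A short sketch using the same names:

```python
names_1 = {'Ravi', 'Rohit'}
names_2 = {'Ravi', 'Ram', 'Rohit', 'Raj'}

# <= allows the two sets to be equal, < does not
print(names_1 <= names_1)    # True
print(names_1 < names_1)     # False
print(names_1 < names_2)     # True

# order of elements doesn't matter for equality
print({'Ravi', 'Rohit'} == {'Rohit', 'Ravi'})      # True

# like other set methods, issubset() accepts any iterable
print(names_1.issubset(['Ravi', 'Rohit', 'Raj']))  # True
```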

Exercises

  • Write a function that checks whether an iterable has duplicate values or not.

    >>> has_duplicates('pip')
    True
    >>> has_duplicates((3, 2))
    False
    
  • What does the above function return for has_duplicates([3, 2, 3.0])?