DEV Community

Cover image for Three Implementations of a Bag in Python
Farley Knight
Farley Knight

Posted on

Three Implementations of a Bag in Python

One of the first data structures discussed in Sedgewick's Algorithms is the Bag. A Bag is a data structure, which holds an unordered collection. We say unordered because it is in contrast to ordered collections like conventional arrays. Because we are not concerned with order, we can store the elements in any way we want.

This article will discuss three different implementations of Bag, each using a different internal data structure. By using different internal data structures, a clearer picture is created on the value of these interfaces, and can also help identify time complexity trade-offs.

The Bag Interface

The image below describes a Bag interface in Java:

Alt Text

It describes five operations:

  1. The constructor Bag() takes zero arguments
  2. The method add(item) inserts an item into the Bag.
  3. The method isEmpty() tells us if the Bag is empty.
  4. The method size() tells us the size of the Bag.
  5. The interface Iterable<Item> in Java allows the use of the for .. in .. loop.

Implementing Iterable in Python

Unlike Java, Python does not have a formal concept of a typed interface. However, it does have facilities to allow an object to be iterable, much like Java, using the __iter__ and _next__. The following is an example of how these methods work in Python:

class DoubleYourNumber:
  def __init__(self, start=0, stop=10):
    self.start = start
    self.stop  = stop
  def __iter__(self):
    self.current = self.start
    return self
  def __next__(self):
    if self.current <= self.stop:
      value = self.current * 2
      self.current += 1
      return value
    else:
      raise StopIteration

The above is an example of a class that supports the Python Iterator interface. Below is an example of how this class could be used to iterate:

>>> for value in  DoubleYourNumber(stop=5):
...    print(value)
...
0
2
4
6
8
10

You can get the same effect by using iter and next directly on the iterator object:

>>> doubler = DoubleYourNumber(stop=5)
>>> iter_doubler = iter(doubler)
>>> next(iter_doubler)
0
>>> next(iter_doubler)
2
>>> next(iter_doubler)
4
>>> next(iter_doubler)
6
>>> next(iter_doubler)
8
>>> next(iter_doubler)
10
>>> next(iter_doubler)
Traceback (most recent call last):
...
StopIteration

In a way, you can think of iterators as following this while loop pattern:

iter_object = iter(object)

while True:
  try:
    value = next(iter_object)
    # Do something with `value`
  except StopIteration:
    break # Leave the infinite while loop

We'll be using this pattern in our implementations of Bag.

Unit Testing a Bag

Since we don't have a formal interface, we can instead rely on unit testing to confirm the implementation is useful for our needs. Below is a set of tests that we can use to verify our implementation.

import unittest

from bag_implementation import Bag

class BagUnitTest(unittest.TestCase):
  def test_new_bag_is_empty(self):
    """
    A Bag should initially be empty.
    """
    bag = Bag()
    self.assertEqual(bag.is_empty(), True)

  def test_add_increases_size_by_one(self):
    """
    Calling bag.add(...) should increase it's size by one.
    """
    bag = Bag()
    bag.add(1)
    self.assertEqual(bag.size(), 1)

  def test_bag_can_be_iterated_over(self):
    """
    Bag uses the iterable Python interface.
    """
    bag = Bag()
    for i in [1,2,3]:
      bag.add(i)

    sum = 0
    for i in bag:
      sum += i

    self.assertEqual(sum, 6)

Creating a Bag Using the Built-In list

The first implementation of Bag is mostly a wrapper around list. Since list is an ordered collection, if we drop the ordered constraint list can be used as an unordered collection. Below is a possible implementation using list:

class BagWithList(object):
  def __init__(self):
    self.collection = list()

  def add(self, item):
    self.collection.append(item)

  def size(self):
    return len(self.collection)

  def is_empty(self):
    return len(self.collection) == 0

  def __iter__(self):
    return iter(self.collection)

The official Python wiki has a page on the time complexity of some of the built-in data structures. In particular, the list operations has the following big-O in the average case:

Version Operation Big-O
list Append O(1)
list Get Length O(1)
list Iteration O(n)

Also this data structure uses O(n) space.

Using a Linked List

By default Python does not come with a version of a linked list as a native data structure. Causing us to write our own below, with a custom iterator.

class LinkedListNode(object):
  def __init__(self, value):
    self.value     = value
    self.next_node = None

  def add(self, value):
    if self.next_node == None:
      self.next_node = LinkedListNode(value)
    else:
      self.next_node.add(value)

  def size(self):
    if self.next_node == None:
      return 1
    else:
      return 1 + self.next_node.size()

  def __iter__(self):
    return LinkedListIter(self)

class LinkedListIter(object):
  def __init__(self, current_node):
    self.current_node = current_node

  def __next__(self):
    if self.current_node == None:
      raise StopIteration
    else:
      value = self.current_node.value
      self.current_node = self.current_node.next_node
      return value

We could use this implementation of a Linked List in our implementation of a Bag.

class BagWithLinkedList(object):
  def __init__(self):
    self.linked_list = None

  def add(self, value):
    if self.linked_list == None:
      self.linked_list = LinkedListNode(value)
    else:
      self.linked_list.add(value)

  def size(self):
    if self.linked_list == None:
      return 0
    else:
      return self.linked_list.size()

  def is_empty(self):
    return self.linked_list == None

  def __iter__(self):
    if self.linked_list == None:
      return iter([])
    else:
      return iter(self.linked_list)

The time complexity of these methods are not as attractive as the previous. There are other use cases for linked lists that show how to use it effectively. Despite this, it still uses O(n) space complexity, as the list version did.

Version Operation Big-O
Linked List Append O(n)
Linked List Get Length O(n)
Linked List Iteration O(n)

Using defaultdict from collections as a Multiset

A set is a commonly known mathematical object. A set is only concerned with membership, or whether an element belongs to collection or not. Like bags, sets are unordered. However, unlike bags, they only allow an element to appear once. If we relax this constraint, and keep track of the number of times an element appears, then we have a multiset.

Multisets appear in C++ as a built-in data structure. However, Python does not have anything named as a multiset. Instead, there is collections.Counter and collections.defaultdict. The former keeps track of both counts and the insertion order. We only need to keep track of counts, so we will use defaultdict with a default value of 0 by using defaultdict(int).

from collections import defaultdict

class Multiset(object):
  def __init__(self):
    self.counts = defaultdict(int)
    self.total = 0    

  def add(self, item):
    self.counts[item] += 1
    self.total += 1

  def size(self):
    return self.total

class MultisetIterator(object):
  def __init__(self, multiset):
    self.multiset   = multiset
    self.keys       = list(multiset.counts.keys()) # Choose a fixed ordering
    self.key_index  = 0 # Current `keys` index
    self.key_repeat = 0 # Number of times we've repeated the current key

  def current_key(self):
    return self.keys[self.key_index]

  def current_count(self):
    return self.multiset.counts[self.current_key()]

  def __next__(self):
    key   = self.current_key()
    count = self.current_count()

    if self.key_repeat == count:
      if self.key_index + 1 < len(self.keys):
        self.key_index += 1
        self.key_repeat = 0
      else:
        raise StopIteration

    self.key_repeat += 1
    return self.current_key()

Next, we use our Multiset in the implementation of Bag.

class BagWithMultiset(object):
  def __init__(self):
    self.multiset = Multiset()

  def add(self, item):
    self.multiset.add(item)

  def size(self):
    return self.multiset.size()

  def is_empty(self):
    return self.multiset.size() == 0

  def __iter__(self):
    return MultisetIterator(self.multiset)

This version of Bag might be the most performant, given the time and space complexity.

Version Operation Big-O Average Case Big-O Worst Case
Multiset Append O(1) O(n)
Multiset Get Length O(1) O(1)
Multiset Iteration O(n) O(n)

These results are almost on par with the list implementation. Perhaps the worst case for appending, is worth revisiting. However, the space complexity is not O(n), but in fact O(k) where k is the number of unique elements in the collection. If there are many repeated values, this might be more advantageous than the two other versions of Bag.

Conclusion

We covered the basics of Python iterators, and considered three different versions of Bag: one using list, one using our hand-rolled linked list implementation, and finally a version of a multiset using Python's defaultdict. This data structure is not very common, and does not have many advantages to ordered arrays. Nevertheless, because of its simplicity, we can use the Bag data structure as an introduction to algorithms.

With that.. happy algorithming, friends!

Top comments (0)