Teaching as a form of learning: Binary Search Trees (BST)

Last week, an e-mail was sent out to an internal mailing list [1], looking for volunteers to lecture on data structures & algorithms topics, but nobody replied. I jumped at the opportunity. I volunteered to lecture on binary search trees, without any previous knowledge on the topic. Why in the world would I volunteer to teach something that I know nothing about ? Well, one of the best ways to learn something is attempt to teach it. I love teaching as much as I love learning; teaching forces you to deeply understand the subject, internalize the material, and requires distilling the concepts into digestible bits and pieces. Here’s Ken Thompson’s advice [2].

But I love teaching: the hard work of the first class, the fun of the second class. Then the misery of the third.

—Ken Thompson

In the post, I’m covering what a binary search tree is, its structure, and the basic operations: find, insert, and delete. We also cover the three methods for traversing the tree. At the end of this post are practice problems to help drive the concepts home.

Next is an example problem that glues the concepts together.

Example: Badging in & out of the office

Picture an office requiring employees to badge in and out as they enter and exit the premise. We want to keep track of the employees in the building. We also want to be check if a specific employee is currently in the building. Let’s implement a binary search tree to maintain this dataset.

As employees badge in, we insert the Employee object (using the employee ID as the key) into the three. When employees badge out, we remove them from the tree. Occassionally, we search the database asking “Is Employee X currently in this building?”

Tree structure and performance

What do binary search trees offer over other data structures [3] ?

The binary tree data structure elegantly combines the flexibility and benefits of two existing data structures: linked lists and arrays. Linked lists efficiently add and delete nodes, by updating pointers, but inefficiently search. Linked lists require iterating over every node (i.e “from the head of the list”) until the desired node is found. Arrays, on the other hand, are perfect for searching – when taking advantage of the index – but are unsuitable for insertions/deletions. When inserting at a[i], all indexes greater than i must be shifted up; when an item is deleted at a[i], all indexes greater than i must be shifted down.

This table compares the average runtime complexity of the three data structures.

Data Structure Search Insert Delete
Arrays O(1) O(N) O(N)
Linked Lists O(N) O(1) O(1)
BSTs O(logn) O(logn) O(logn)

Binary search trees are suitable for specific problems. Binary search trees can be used to solve problems such as runway reservations [3]. Before we dive into the structure and implement the methods, let’s cover some terminology.

Tree Terminology

Here’s a visual representation of a binary search tree.

    15
   /  \
  7    21
 / \
5  10

A tree consists of nodes that are connected by edges. Those lines above represent the edges.

  • Path – Walking node to node, along the edges, creates a sequence known as the path.
  • Root – The node at the very top of the tree. There is only one root in the tree.
  • Parent – Every node, except for the root, has an edge connecting to the node above it.
  • Traversechecking or performing an operation on the node
  • Leaf – a node that contains no children
  • Level – The number of generations from the root node (level 0). For example, the root’s childrens are at level 1.

Structure & Operations

The structure of a tree is straight forward. It contains a reference to its root node, which is an instance of the Node class. TheNode class contains two references: leftChild, rightChild. The leftChild references a node whos key (i.e “used for comparing”) is less than the current node’s key, and rightChild references a node with a key greater than the current node’s key. This ordering restriction is a property of BSTs, but not all trees.

Here’s some sample code of the data structure:

 public class BinaryTree<E> {
     Node root;

     private class Node<E> {
         int key;
         E data;
         private Node leftChild;
         private Node rightChild;

         @Override
         public String toString(){
             String myString = "<Node [" + this.key + "]>";
             return myString;
         }
     }
}

Searching

Searching for the node is quite easy. Our method, find, accepts a key parameter. We start at the root. If the key we are searcing for is less than the node’s key, continue walking along the edge connecting to the leftChild. If the key is greater than the node’s key, continue walking along the edge connecting to the rightChild. When the key we are searching for is equal to that of the node, return the node.

public Node<E> find(int key){
    Node<E> currentNode = this.root;
    while(currentNode !=null){
        if (currentNode.key == key){
            return currentNode;
        }

        if (key < currentNode.key){
            currentNode = currentNode.leftChild;
        } else {
            currentNode = currentNode.rightChild;
        }
    }

    return currentNode;
}

Inserting

Inserting a node is a little more involved, but not by much. We apply an algorithm similar to find but we check our base condition is when the leftChild or rightChild is null.

public void insert(int key, E data){
    Node<E> newNode = new Node<E>();
    newNode.key = key;
    newNode.data = data;
    if(root == null){
        root = newNode;
        return;
    }else{
        Node<E> currentNode = root;
        Node<E> parent;
        while(true){
            parent = currentNode;
            if(key < currentNode.key){
                currentNode = currentNode.leftChild;
                if (currentNode == null){
                    parent.leftChild = newNode;
                    return;
                }
            }else{
                currentNode = currentNode.rightChild;
                if (currentNode == null){
                    parent.rightChild = newNode;
                    return;
                }
            }

        }
    }
}

Traversing

There are three ways to traverse a binary tree:
  • Pre Order
  • In Order
  • Post Order

The most common of the three is in order. In our example code, we only print the currentNode.

public void inorder(Node<E> localRoot){

    if(localRoot !=null){
        Node<E> currentNode = localRoot;
        inorder(currentNode.leftChild);
        System.out.println(currentNode);
        inorder(currentNode.rightChild);
    }


}

public void preorder(Node<E> localRoot){
    if (localRoot != null){
        Node<E> currentNode = localRoot;
        System.out.println(currentNode);
        preorder(currentNode.leftChild);
        preorder(currentNode.rigthChild);
    }
}

public void postOrder(Node<E> localRoot){
    if (localRoot != null){
        Node<E> currentNode = localRoot;
        postOrder(currentNode.leftChild);
        postOrder(currentNode.rightChild);
        System.out.println(currentNode);
    }
}

Deleting

There are three cases we must consider when deleting a node:
  • node with zero children
  • node with single child
  • node with two children

Let’s consider the first case whend deleting a node with zero children.

 Before               After

    15                  15
   /  \                /  \
  7    21             7    21
 / \                   \
5  10                   10

Delete with zero children

We maintain a reference to the deleted node’s parent. We “delete” the node by updating the parent’s pointer (leftChild or rightChild) to null. The only edge case we consider is if the deleted node is the root. In thtat case, we just update root to null.

public boolean delete(int key){
    Node<E> currentNode = root;
    Node<E> parent = currentNode;
    boolean isLeftChild = false;

    while (currentNode.key != key){
        parent = currentNode;
        if (key < currentNode.key){
            isLeftChild = true;
            currentNode = currentNode.leftChild;
        } else {
            currentNode = currentNode.rightChild;
        }
        // node not found
        if (currentNode == null){
            return false;
        }
    }
    // Case 1: No children
    if (currentNode.leftChild == null && currentNode.rightChild == null){
        if (currentNode == root){
            root = null;
            return true;
        }
        else if (isLeftChild){
            parent.leftChild = null;
            return true;
        } else {
            parent.rightChild = null;
            return true;
        }
    }
    return false;
}

Delete node with single child

We just covered how to update the tree when a node has no children. Deleting a node, with a single child, is more involved, but just a little more. There’s four cases for the deleted node:

  • it’s the leftChild of its parent and has a leftChild
  • it’s the leftChild of its parent and has a rightChild
  • it’s the rightChild of its parent and has a leftChild
  • it’s the rightChild of its parent and has a rightChild

In the first two diagrams, we are deleting node 7. In the last two, we are deleting node 21.

     Left child with single left child

    15                15
   /  \              /  \
  7   21            5   21
 /
5

     Left Child with a single right child

    15                15
   /  \              /  \
  7   21            9   21
   \
    9

     Right child with single left child

    15                15
   /  \              /  \
  7    21           7    18
      /
     18

     Right child with single right child

    15                15
   /  \              /  \
  7    21           7    25
         \
          25
if (currentNode.leftChild != null && currentNode.rightChild == null){
    if(currentNode == root){
        root = currentNode.leftChild;
    }
    else if(isLeftChild){
        parent.leftChild = currentNode.leftChild;
    } else {
        parent.rightChild = currentNode.leftChild;
    }
}
else if(currentNode.rightChild != null && currentNode.leftChild == null){
    if (currentNode == root){
        root = currentNode.rightChild;
    }
    else if(isLeftChild){
        parent.leftChild = currentNode.rightChild;
    }
    else {
        parent.rightChild = currentNode.rightChild;
    }
}
    15                15
   /  \              /  \
  7   21            8   21
 / \               / \
5   9             5   9
   /
  8
    15
   /  \
  7   21
 / \
5   9
     \
     12
    /  \
   10   14

When deleting a node, we replace the node with its inorder successor. The successor is the minimum node, within the deleted node’s right child subtree. Take a moment and think about that. Why the right subtree, and not the left?

Would the next largest number ever be to the left of the current node? No. By definition, all nodes to the left are less thanand all node to the right are greater than. Therefore, the smallest number (greater than the current node’s key) will be the minimum node, to the right.

Once we identify the successor, we update the successor’s parent leftChild pointer with the successor’s rightChild pointer (this may be null). We finalize the connections by updating the successor’s rightChild pointer to the delete node’s rightChild.

public Node<E> getSuccesor(Node<E> toBeDeleted){
Node<E> parentSuccessor = toBeDeleted;
Node<E> successor = toBeDeleted;
Node<E> currentNode = toBeDeleted.rightChild;
while(currentNode.leftChild != null){
    parentSuccessor = successor;
    successor = currentNode;
    currentNode = currentNode.leftChild;
}
if(toBeDeleted.rightChild != successor){
    parentSuccessor.leftChild = successor.rightChild;
    successor.rightChild = toBeDeleted.rightChild;
}
return successor;

Drawbacks

The binary search tree sounds like the dream, so why use an alternative data structure?

My manager shared a story of an interview candidate who inserted a sequence of sorted items into a binary search tree. Pause for a moment and visualize the data structure. What do you get?

A linked list!

This increases the height of the tree and we lose advantage of the O(logn) performance. Binary Search tree is suited best forrandomized keys, not sorted.

Example

public class Employee {
    private int id;
    private String firstname;
    private String lastname;

    public Employee (int employeeid, String firstname, String lastname){
        this.id = employeeid;
        this.firstname = firstname;
        this.lastname = lastname;
    }

    @Override
    public String toString(){
        // < 05: Joe Schmoe >
        String myString = "< " + this.id + " " + this.firstname + " >";
        return myString;
    }

    public static void main(String[] args){
        Employee jboyd = new Employee(15, "jess", "boyd");
        Employee mstreiter = new Employee(7, "myles", "streiter");
        Employee mdavis = new Employee(21, "matt", "davis");
        Employee mmadsen= new Employee(5, "martin", "madsen");
        Employee mchung = new Employee(19, "matt", "chung");

        BinaryTree<Employee> employees = new BinaryTree<Employee>();
        employees.insert(jboyd.id, jboyd);
        employees.insert(mstreiter.id, mstreiter);
        employees.insert(mdavis.id, mdavis);
        employees.insert(mmadsen.id, mmadsen);

        // Is myles in the building?
        // returns true
        System.out.println(employees.find(mstreiter.id));

        // Jess clocks out
        employes.delete(jboyd.id);

        // Is Matt in the building?
        // returns false
        System.out.println(employees.find(mchung.id));

    }
}

Exercises

  1. Write a function that verifies that a binary search tree is valid.
    15
   /  \
  7   21
 / \
5   17

Acknowledgements

Thanks to Jess Boyd for correcting my initial draft, Matt Davis for pointing out the drawbacks of BSTs, and Martin Madsen for techniques to make the post more comprehensible.

Footnotes

[1] In a previous post, I shared how I am teaching myself data structures and algorithms. In the breakroom work, a colleague overheard a group forming to study together – the weekend bravehearts. I was lucky enough to catch the group at the head end. We meet Friday after work, and Saturday mornings. If you live in the greater Seattle area and wanted to join, e-mail me and I’ll bring you along.
[2] This is a quote pulled from interview, from the book “Coders at work”, with Ken Thompson
[3] (1, 2) MIT’s lecturer presents a scheduling problem and compare the advantages and disadvatages of previously discussed datastructures (e.g “linked lists”, “arrays”, “heaps”). I recommend watching it:https://www.youtube.com/watch?v=9Jry5-82I68