Skip to main content

Trees

By Afshine Amidi and Shervine Amidi

General concepts

Definition

A tree is a DAG with the following properties:

  • Incoming edge:

    • There is exactly one node that has no incoming edge, and that node is called the root.

    • Each of the other nodes has exactly one incoming edge.

  • Outgoing edge: A node cannot have an outgoing edge that points to itself.

There are two types of nodes in a tree:

  • Internal node: Node that has one or more children.

  • Leaf: Node that does not have any children.

Here are some examples of graphs that are not trees, along with the associated reason:

Self-loopMultiple parentsCycleMultiple roots
Remark

By construction, a tree with NN nodes has N1N-1 edges.

Notations

Names given to nodes follow a family-like vocabulary. The most common ones are presented in the table below:

ConceptDescriptionIllustration
ParentThe parent p of a node n is the predecessor of that node.

n has at most one parent.
GrandparentThe grandparent g of a node n is the predecessor of the predecessor of that node.

n has at most one grandparent.
ChildA child c of a node n is a successor of that node.

n can have 0, 1 or multiple children.
SiblingA sibling s of a node n is a different node with the same parent as n.

n can have 0, 1 or multiple siblings.
UncleAn uncle u of a node n is a child of n’s grandparent that is not its parent.

n can have 0, 1 or multiple uncles.

Depth

The depth of a given node nn, noted depth(n)\textrm{depth}(n), is the number of edges from the root rr of the tree to node nn.

Remark

The root node has a depth of 0.

Height

The height of a given node nn, noted height(n)\textrm{height}(n), is the number of edges of the longest path from node nn to the deepest leaf.

The height of a tree is the height of its root node.

Remark

Leaf nodes have a height of 0.

Lowest Common Ancestor

In a given tree, the lowest common ancestor of two nodes pp and qq, noted LCA(p,q)\textrm{LCA}(p, q), is defined as being the deepest node that has both pp and qq as descendants. The examples below use sLCA(p,q)s \triangleq \textrm{LCA}(p, q).

Remark

For the purposes of this definition, we may consider a node to be a descendant of itself.

Node distance

The distance d(p,q)d(p,q) between two nodes pp and qq is the minimum number of edges between these two nodes. If we note sLCA(p,q)s\triangleq\textrm{LCA}(p,q), we have the following formula:

d(p,q)=depth(p)+depth(q)2×depth(s)\boxed{d(p,q)=\textrm{depth}(p)+\textrm{depth}(q)-2\times\textrm{depth}(s)}

For instance, nodes pp and qq have a distance of 3 in the tree above.

Serialization

Tree serialization is a method to encode a tree into a generic format (e.g. string) without losing information.

Remark

There can be more than one way to serialize a tree. The figure above illustrates a level-by-level approach in the case of binary trees.

Binary trees

Definition

A binary tree is a tree where each node has at most 2 children. The table below sums up possible properties of binary trees:

TypeDescriptionIllustration
FullEvery node has either 0 or 2 children.
CompleteAll levels are completely filled, except possibly the last one where all nodes are to the left.
PerfectAll internal nodes have 2 children and all leaves are at the same level.
Remark

A perfect tree has exactly 2h+112^{h+1}-1 nodes with hh the height of the tree.

Diameter

The diameter of a binary tree is the longest distance between any of its two nodes.

For example, the binary tree above has a diameter of 5.

Main tree traversals

The table below summarizes the 3 main ways to recursively traverse a binary tree:

TraversalAlgorithmIllustrationExample
Pre-order1. Visit the root
2. Visit the left subtree
3. Visit the right subtree
In-order1. Visit the left subtree
2. Visit the root
3. Visit the right subtree
Post-order1. Visit the left subtree
2. Visit the right subtree
3. Visit the root
Remark

Pre-order, in-order and post-order traversals are all variations of DFS.

Balanced tree

A binary tree is said to be balanced if the difference in height of each node restricted to its left and right subtrees is at most 1.

BalancedNot balanced

Heaps

Definition

A heap is a complete binary tree with an additional property that makes it either a min-heap or a max-heap:

TypeDescriptionIllustrationExample
Min-heapThe value of each node is lower than its children’s, if any.
By definition, the root has the lowest value.
Max-heapThe value of each node is higher than its children’s, if any.
By definition, the root has the highest value.

Since a heap is a complete binary tree, it is convenient to represent it with an array of size nn, where nn is the number of nodes.

For a given node of index i[ ⁣[0,n1] ⁣]i\in[\![0, n-1]\!],

  • its parent has an index of i12\displaystyle\left\lfloor\frac{i-1}{2}\right\rfloor

  • its left child has an index of 2i+12i+1 and its right child has an index of 2i+22i+2

The following parts will use max-heaps for consistency. However, one could also use min-heaps to obtain the same results.

Heapify up

Let's suppose that the last child is potentially not fulfilling the properties of the max-heap. The heapify up operation, also called bubble up, aims at finding the correct place for this node.

This operation is done in O(log(n))\mathcal{O}(\log(n)) time:

  • Update step: While the node's parent has a lower value than the node's, swap the two.

  • Final step: We now have a valid max-heap.

Heapify down

Let’s suppose that the root is potentially not fulfilling the properties of the max-heap. The heapify down operation, also called *bubble down, aims at finding the correct place for this node.

This operation is done in O(log(n))\mathcal{O}(\log(n)) time:

  • Update step: While the highest-value child of the node has a higher value than the node's, swap the two.

  • Final step: We now have a valid max-heap.

Operations

The main operations that can be performed on a max-heap are explained below:

Search We distinguish two cases:

  • Maximum value: Look at the value corresponding to the root of the heap. It takes O(1)\mathcal{O}(1) time.

  • Any other value: Traverse the tree, given that we have no information as to where each node is. It takes O(n)\mathcal{O}(n) time.

Insertion It takes O(log(n))\mathcal{O}(\log(n)) time.

  • Placeholder step: Add new node as the last child.

  • Heapify up step: Heapify up the child to its final position.

Deletion It takes O(log(n))\mathcal{O}(\log(n)) time.

  • Swap step: Swap node with last child and remove new last child.

  • Heapify step: Move the newly-placed node to its final position depending on the situation:

    • Node's value is higher than parent's: Heapify up to its final position.

    • Node's value is lower than highest of its children: Heapify down to its final position.

    • Node's value is lower than parent's and higher than highest of its children: There is nothing to do.

k smallest elements

Given array A=[a0,...,an1]A=[a_0, ..., a_{n-1}], the goal is to find the kk smallest elements, with knk\leqslant n.

Trick Use a max-heap of size kk.

Algorithm The solution is found in O(nlog(k))\mathcal{O}(n\log(k)) time and O(k)\mathcal{O}(k) space:

  • Initialization:

    • Set an empty max-heap that will keep track of the kk smallest elements.

    • Add the first kk elements into the max-heap.

At any given point, we note MM the value of the root of the max-heap, i.e. its maximum value.

  • Update step: We need to see whether any of the remaining elements ai[ ⁣[k,n1] ⁣]a_{i\in[\![k, n-1]\!]} is potentially part of the kk smallest elements:

    • If ai<Ma_i < M, pop the max-heap and insert aia_i in O(log(k))\mathcal{O}(\log(k)) time.

    • If aiMa_i\geqslant M, it means that the element is greater than any of the current kk smallest elements. We do not do anything. The check is done in O(1)\mathcal{O}(1) time.

  • Final step: The max-heap contains the kk smallest elements.

Remark

Similarly, the kk largest elements can be retrieved using a min-heap.

Binary search trees

Definition

A binary search tree (BST) is a binary tree where each node has the following properties:

  • Its value is greater than any node values in its left subtree

  • Its value is less than any node values in its right subtree

We note hh the height of the BST, where hh can go anywhere between:

Best caseWorst case
DescriptionTree is balancedEvery node has at most 1 child
Heighthlog(n)h\approx\log(n)h=nh=n
Illustration

Operations

The main operations that can be performed on a BST each have a time complexity of O(h)\mathcal{O}(h) and are explained below:

Search Starting from the root, compare the node value with the value vv we want to search for.

  • Node value different than vv:

    • If it is higher than vv, go to its left child.

    • If it is lower than vv, go to its right child.

  • Node value equal to vv: We found the target node.

Continue this process recursively until either finding the target node or hitting the end of the tree, in which case vv is not present in the BST.

Insertion Let's suppose we want to insert a node of value vv in the BST.

In order to do that, we check if there is already a node with the same value in O(h)\mathcal{O}(h) time. At the end of the search, there are two possible situations:

  • Node is found: There is nothing else to do, since the element that we want to insert is already in the tree.

  • Node is not found: By construction, the last node seen during the search is a leaf.

    • If vv is higher than the value of the last node, add it as its right child.

    • If vv is lower than the value of the last node, add it as its left child.

Deletion The process depends on the number of children of the node to be deleted. We have the following situations:

  • 0 child: Delete the node.

  • 1 child: Replace the node with its child.

  • 2 children: Replace the node with the max of its left subtree, which is also equal to the in-order predecessor.

N-ary trees

Definition

An NN-ary tree is a tree where each node has at most NN children.

Trie

A trie, also called prefix tree, is an NN-ary tree that allows for efficient storing and fast retrieval of words. The path from the root node to another given node forms a word that is deduced by concatenating the characters along that path.

Each node nn is defined by the following quantities:

  • A character cc.

  • A hash table h={(c1,n1),...,(ck,nk)}h=\{(c_1, n_1), ..., (c_k, n_k)\} gathering the children of that node, where cic_i is the child's character and nin_i its associated node.

  • A boolean that indicates whether the word formed by the path leading to that node is in the dictionary or not ◯.

By convention, the root is a node for which cc is an empty string.

Note that even though all characters of a word are present in the correct order, it does not necessarily mean that the word itself is present in the trie. In order to ensure that the word is indeed in the trie, an important additional condition is for the boolean of the last character to be set to .

The operations that can be done with a trie are described below:

Search We would like to know whether a word ww is in the trie.

  • Character step: Starting from the root, traverse the trie by checking one character of ww at a time. If any character is missing, it means that the word is not present.

  • Word step: Check whether the boolean of the node corresponding to the last character is set to . If it is not, it means that the word is not present.

Insertion We would like to insert a word ww in the trie.

  • Character step: Starting from the root, traverse the trie one character of ww at a time. Starting from the first missing character (if applicable), add corresponding nodes until completing the word.

  • Word step: Set the boolean of the last character to .

Deletion We would like to remove a word ww from the trie.

  • Character step: Traverse the trie one character of ww at a time.

  • Word step: Set the boolean of the last character to ◯.

Remark

Use cases of a trie include searching for a word starting with a given prefix.


Want more content like this?

Subscribe here to be notified of new Super Study Guide releases!