Skip to main content

Advanced trees

By Afshine Amidi and Shervine Amidi

Self-balancing trees

Definition

A self-balancing tree is a BST that keeps its height in O(log(n))\mathcal{O}(\log(n)) by maintaining specific properties. Examples of such trees include red-black trees and AVL trees.

Tree rotations

Most self-balancing tree operations involve tree rotations, which are done by changing the shape of the BST while keeping its properties satisfied. There are two kinds of rotations: left and right rotations. We give the recipe to do both below.

We arbitrarily focus on the three following nodes, which are useful during a tree rotation:

  • Pivot pp: Node that becomes the new root.

  • Original root rr: Root of the original tree.

  • Moving child cc: Node that changes parents.

Tree rotations have a time complexity of O(1)\mathcal{O}(1).

Left rotation
BeforeAfter
PivotRight child of rootNew root
Original rootRootLeft child of pivot
Moving childLeft child of pivotRight child of original root
Illustration
Right rotation
BeforeAfter
PivotLeft child of rootNew root
Original rootRootRight child of pivot
Moving childRight child of pivotLeft child of original root
Illustration
Remark

Performing tree rotations does not change the in-order traversal of the BST.

Red-black tree

A red-black tree is a self-balancing tree with height O(log(n))\mathcal{O}(\log(n)) that maintains the following properties:

  • Coloring:

    • Each node is either red or black.

    • The root and leaves (\texttt{NIL}) are black.

    • If a node is red, then its children are black.

  • Count: All paths from a given node to any of the leaves have the same number of black nodes, often referred to as the black-height.

AVL tree

An Adelson-Velsky and Landis (AVL) tree is a self-balancing tree with height O(log(n))\mathcal{O}(\log(n)). Its key property is that any of its nodes must have the heights hL,hRh_L, h_R of their child subtrees TL,TRT_L, T_R differ by at most 1:

hLhR1\boxed{|h_L-h_R|\leqslant1}

Efficient trees

Range query

Given an array A=[a0,...,an1]A=[a_0, ..., a_{n-1}], a range query qf(A,i,j)q_f(A, i, j) is defined as being an operation ff over elements ai,...,aja_i, ..., a_j.

The following are the most common types of range queries:

MinimumMaximumSum
Operationmin(ai,...,aj)\min(a_i, ..., a_j)max(ai,...,aj)\max(a_i, ..., a_j)ai+...+aja_i+...+a_j
Illustration

In the case of a range sum query qSq_S, we note that for 0ij<n0\leqslant i\leqslant j<n, we have:

qS(A,i,j)=qS(A,0,j)qS(A,0,i1)\boxed{q_{S}(A, i, j)=q_{S}(A, 0, j)-q_{S}(A, 0, i-1)}

This means that if we can compute qS(A,0,i)q_S(A, 0, i) for all ii, then we can handle any range sum query on array AA.

Prefix sum array

A prefix sum array P=[p0,...,pn1]P=[p_0, ..., p_{n-1}] is an array that computes range sum queries on AA in O(1)\mathcal{O}(1) time.

Construction It is built in O(n)\mathcal{O}(n) time as follows:

p0=a0andi[ ⁣[1,n1] ⁣],pi=pi1+ai\boxed{p_0=a_0}\quad\textrm{and}\quad\forall i\in[\![1, n-1]\!],\quad \boxed{p_i=p_{i-1}+a_i}

By convention, we set p10p_{-1}\triangleq0.

Access The computation of a range sum query takes O(1)\mathcal{O}(1) time as follows:

qS(A,i,j)=pjpi1\boxed{q_S(A, i, j)=p_j-p_{i-1}}

Update If we update aiai+xa_i\longleftarrow a_i+x, it takes O(n)\mathcal{O}(n) time to update PP:

k[ ⁣[i,n1] ⁣],pkpk+x\forall k\in[\![i, n-1]\!], \quad \boxed{p_k\longleftarrow p_k+x}

Binary indexed tree

A binary indexed tree BB, also called Fenwick tree, is a data structure that efficiently handles range sum queries on array A=[a0,...,an1]A=[a_0, ..., a_{n-1}]. Updates in AA are reflected in BB in O(log(n))\mathcal{O}(\log(n)) time.

Rescaling A prerequisite is to map the input array indices from [ ⁣[0,n1] ⁣][\![0, n-1]\!] to [ ⁣[1,n] ⁣][\![1,n]\!]. We rewrite AA as A=[0,a1,...,an]A'=[0, a_1', ..., a_n'] with ai=ai1a_{i}'=a_{i-1}.

Intuition Elements of the binary indexed tree leverage the binary representations of their indices i[ ⁣[1,n] ⁣]i\in[\![1,n]\!]. In order to do that, it relies on the concept of lowest set bit ll. We note lil_i the lowest set bit of ii.

A given element bib_i of the binary indexed tree BB is said to be responsible for the range of indices [ ⁣[i2li+1,i] ⁣][\![i-2^{l_i}+1, i]\!] and we have:

bi=j=i2li+1iaj\boxed{b_i=\sum_{j=i-2^{l_i}+1}^ia_j'}

The data structure can be represented either as an array or as a tree:

ArrayTree
Ranges of indices are indicated by
horizontal lines
- Ranges of indices are next to nodes
- Parent of bib_i is bi2lib_{i-2^{l_i}}
- Tree depth is O(log(n))\mathcal{O}(\log(n))

Construction The binary indexed tree is built iteratively in O(n)\mathcal{O}(n) time:

  • Initialization: Initialize BB with the values of AA':

    i[ ⁣[0,n] ⁣],biai\forall i\in[\![0, n]\!],\quad\boxed{b_i\longleftarrow a_i'}

  • Compute step: For each index i[ ⁣[1,n] ⁣]i\in[\![1,n]\!], the relevant partial sums are propagated through the overlapping ranges of responsible values:

    If i+2lin,bi+2libi+2li+bi\textrm{If }i+2^{l_i}\leqslant n,\quad\boxed{b_{i+2^{l_i}}\longleftarrow b_{i+2^{l_i}}+b_i}

Access In order to compute qS(A,0,i)q_S(A', 0, i), we need to sum relevant elements of BB in O(log(n))\mathcal{O}(\log(n)) time:

  • Initialization: We start by initializing the sum SS to bib_i.

  • Compute step: Starting from ji2lij\longleftarrow i-2^{l_i} and then by further decrements of 2lj2^{l_j}, we update SS until jj is equal to 0:

    SS+bj\boxed{S\longleftarrow S+b_j}

Update If we update aiai+xa_i'\longleftarrow a_i'+x, it takes O(log(n))\mathcal{O}(\log(n)) time to update BB:

  • Initialization: We start by updating bib_i as follows:

    bibi+x\boxed{b_i\longleftarrow b_i+x}

  • Compute step: We continue the same update process with ji+2lij\longleftarrow i+2^{l_i}, and then by further increments of 2lj2^{l_j} until reaching the maximum index of the array:

    bjbj+x\boxed{b_j\longleftarrow b_j+x}

Segment tree

A segment tree SS is a data structure represented by a binary tree of nodes s[ ⁣[i,j] ⁣]s_{[\![i, j]\!]} that is designed to support the main types of range queries (minimum, maximum, sum) on an array A=[a0,...,an1]A=[a_0, ..., a_{n-1}]. It has an update time complexity of O(log(n))\mathcal{O}(\log(n)).

Intuition Each node of the segment tree is responsible for a range [ ⁣[l,r] ⁣][\![l, r]\!] of indices of the original array. In particular, each node s[ ⁣[l,r] ⁣]s_{[\![l,r]\!]} falls into one of the following categories:

  • Case lrl\neq r: It has two children. By noting m=l+r2\displaystyle m=\left\lfloor\frac{l+r}{2}\right\rfloor:

    • Its left child is responsible for indices [ ⁣[l,m] ⁣][\![l, m]\!].

    • Its right child is responsible for indices [ ⁣[m+1,r] ⁣][\![m+1, r]\!].

  • Case l=rl=r: It has 0 child and it is responsible for index {l}\{l\}.

The root node is responsible for indices [ ⁣[0,n1] ⁣][\![0, n-1]\!]. The segment tree has a height of O(log(n))\mathcal{O}(\log(n)).

*For illustrative purposes, we are going to focus on range sum queries so that operations of the segment tree can be easily compared to those of the data structures we saw previously.}

Construction The segment tree is built recursively in O(n)\mathcal{O}(n) time, where the value of each node depends on where it is located:

  • Node has no children: Its value is the corresponding value in the original array:

    s{e}=ae\boxed{s_{\{e\}}=a_e}

  • Node has two children: Its value is the sum of the values of its child nodes:

    s[ ⁣[l,r] ⁣]=s[ ⁣[l,m] ⁣]+s[ ⁣[m+1,r] ⁣]\boxed{s_{[\![l,r]\!]}=s_{[\![l,m]\!]}+s_{[\![m+1,r]\!]}}

Access The operation qS(A,i,j)q_S(A, i, j) is done with a complexity of O(log(n))\mathcal{O}(\log(n)) time.

Given an interval I=[ ⁣[i,j] ⁣]I=[\![i, j]\!], we start from the root node of the segment tree and make recursive calls.

At node sINs_{I_N}, there are 3 possible situations:

  • Case INII_N\cap I\neq\varnothing: There is an overlap.

    • Sub-case INII_N\subseteq I: There is a total overlap, so the information of that node is added to the total sum.

    • Sub-case IN⊈II_N\not\subseteq I: There is a partial overlap, so we are going to look at both children of the node.

  • Case INI=I_N\cap I=\varnothing: There is no overlap, the answer is not there. We don't look further.

Update If we update aiai+xa_i\longleftarrow a_i+x, it takes O(log(n))\mathcal{O}(\log(n)) time to update SS.

We start at the root node and make recursive calls. We check whether ii is in the interval IN=[ ⁣[l,r] ⁣]I_N=[\![l, r]\!] associated with the node:

  • Case iINi\in I_N: The node contains information related to aia_i. We update the node by adding xx:

    sINsIN+x\boxed{s_{I_N}\longleftarrow s_{I_N}+x}

    By noting m=l+r2\displaystyle m=\left\lfloor\frac{l+r}{2}\right\rfloor:

    • Sub-case imi\leqslant m: ii is also located in its left child.

    • Sub-case i>mi > m: ii is also located in its right child.

  • Case iINi\notin I_N: The node does not contain information related to aia_i.


Want more content like this?

Subscribe here to be notified of new Super Study Guide releases!