# Advanced trees

*By Afshine Amidi and Shervine Amidi*

## Self-balancing trees

### Definition

A self-balancing tree is a BST that keeps its height in $\mathcal{O}(\log(n))$ by maintaining specific properties. Examples of such trees include red-black trees and AVL trees.

### Tree rotations

Most self-balancing tree operations involve tree rotations, which are done by changing the shape of the BST while keeping its properties satisfied. There are two kinds of rotations: left and right rotations. We give the recipe to do both below.

We arbitrarily focus on the three following nodes, which are useful during a tree rotation:

*Pivot*$p$: Node that becomes the new root.*Original root*$r$: Root of the original tree.*Moving child*$c$: Node that changes parents.

Tree rotations have a time complexity of $\mathcal{O}(1)$.

Left rotation | ||
---|---|---|

Before | After | |

Pivot | Right child of root | New root |

Original root | Root | Left child of pivot |

Moving child | Left child of pivot | Right child of original root |

Illustration |

Right rotation | ||
---|---|---|

Before | After | |

Pivot | Left child of root | New root |

Original root | Root | Right child of pivot |

Moving child | Right child of pivot | Left child of original root |

Illustration |

Performing tree rotations does not change the in-order traversal of the BST.

### Red-black tree

A red-black tree is a self-balancing tree with height $\mathcal{O}(\log(n))$ that maintains the following properties:

*Coloring:*Each node is either red or black.

The root and leaves (\texttt{NIL}) are black.

If a node is red, then its children are black.

*Count:*All paths from a given node to any of the leaves have the same number of black nodes, often referred to as the black-height.

### AVL tree

An Adelson-Velsky and Landis (AVL) tree is a self-balancing tree with height $\mathcal{O}(\log(n))$. Its key property is that any of its nodes must have the heights $h_L, h_R$ of their child subtrees $T_L, T_R$ differ by at most 1:

## Efficient trees

### Range query

Given an array $A=[a_0, ..., a_{n-1}]$, a range query $q_f(A, i, j)$ is defined as being an operation $f$ over elements $a_i, ..., a_j$.

The following are the most common types of range queries:

Minimum | Maximum | Sum | |
---|---|---|---|

Operation | $\min(a_i, ..., a_j)$ | $\max(a_i, ..., a_j)$ | $a_i+...+a_j$ |

Illustration |

In the case of a range sum query $q_S$, we note that for $0\leqslant i\leqslant j<n$, we have:

This means that if we can compute $q_S(A, 0, i)$ for all $i$, then we can handle any range sum query on array $A$.

### Prefix sum array

A prefix sum array $P=[p_0, ..., p_{n-1}]$ is an array that computes range sum queries on $A$ in $\mathcal{O}(1)$ time.

**Construction** It is built in $\mathcal{O}(n)$ time as follows:

By convention, we set $p_{-1}\triangleq0$.

**Access** The computation of a range sum query takes $\mathcal{O}(1)$ time as follows:

**Update** If we update $a_i\longleftarrow a_i+x$, it takes $\mathcal{O}(n)$ time to update $P$:

### Binary indexed tree

A binary indexed tree $B$, also called Fenwick tree, is a data structure that efficiently handles range sum queries on array $A=[a_0, ..., a_{n-1}]$. Updates in $A$ are reflected in $B$ in $\mathcal{O}(\log(n))$ time.

**Rescaling** A prerequisite is to map the input array indices from $[\![0, n-1]\!]$ to $[\![1,n]\!]$. We rewrite $A$ as $A'=[0, a_1', ..., a_n']$ with $a_{i}'=a_{i-1}$.

**Intuition** Elements of the binary indexed tree leverage the binary representations of their indices $i\in[\![1,n]\!]$. In order to do that, it relies on the concept of lowest set bit $l$. We note $l_i$ the lowest set bit of $i$.

A given element $b_i$ of the binary indexed tree $B$ is said to be *responsible* for the range of indices $[\![i-2^{l_i}+1, i]\!]$ and we have:

The data structure can be represented either as an array or as a tree:

Array | Tree |
---|---|

Ranges of indices are indicated by horizontal lines | - Ranges of indices are next to nodes - Parent of $b_i$ is $b_{i-2^{l_i}}$ - Tree depth is $\mathcal{O}(\log(n))$ |

**Construction** The binary indexed tree is built iteratively in $\mathcal{O}(n)$ time:

*Initialization:*Initialize $B$ with the values of $A'$:$\forall i\in[\![0, n]\!],\quad\boxed{b_i\longleftarrow a_i'}$*Compute step:*For each index $i\in[\![1,n]\!]$, the relevant partial sums are propagated through the overlapping ranges of responsible values:$\textrm{If }i+2^{l_i}\leqslant n,\quad\boxed{b_{i+2^{l_i}}\longleftarrow b_{i+2^{l_i}}+b_i}$

**Access** In order to compute $q_S(A', 0, i)$, we need to sum relevant elements of $B$ in $\mathcal{O}(\log(n))$ time:

*Initialization:*We start by initializing the sum $S$ to $b_i$.*Compute step:*Starting from $j\longleftarrow i-2^{l_i}$ and then by further decrements of $2^{l_j}$, we update $S$ until $j$ is equal to 0:$\boxed{S\longleftarrow S+b_j}$

**Update** If we update $a_i'\longleftarrow a_i'+x$, it takes $\mathcal{O}(\log(n))$ time to update $B$:

*Initialization:*We start by updating $b_i$ as follows:$\boxed{b_i\longleftarrow b_i+x}$*Compute step:*We continue the same update process with $j\longleftarrow i+2^{l_i}$, and then by further increments of $2^{l_j}$ until reaching the maximum index of the array:$\boxed{b_j\longleftarrow b_j+x}$

### Segment tree

A segment tree $S$ is a data structure represented by a binary tree of nodes $s_{[\![i, j]\!]}$ that is designed to support the main types of range queries (minimum, maximum, sum) on an array $A=[a_0, ..., a_{n-1}]$. It has an update time complexity of $\mathcal{O}(\log(n))$.

**Intuition** Each node of the segment tree is responsible for a range $[\![l, r]\!]$ of indices of the original array. In particular, each node $s_{[\![l,r]\!]}$ falls into one of the following categories:

*Case $l\neq r$:*It has two children. By noting $\displaystyle m=\left\lfloor\frac{l+r}{2}\right\rfloor$:Its left child is responsible for indices $[\![l, m]\!]$.

Its right child is responsible for indices $[\![m+1, r]\!]$.

*Case $l=r$:*It has 0 child and it is responsible for index $\{l\}$.

The root node is responsible for indices $[\![0, n-1]\!]$. The segment tree has a height of $\mathcal{O}(\log(n))$.

*For illustrative purposes, we are going to focus on range sum queries so that operations of the segment tree can be easily compared to those of the data structures we saw previously.}

**Construction** The segment tree is built recursively in $\mathcal{O}(n)$ time, where the value of each node depends on where it is located:

*Node has no children:*Its value is the corresponding value in the original array:$\boxed{s_{\{e\}}=a_e}$*Node has two children:*Its value is the sum of the values of its child nodes:$\boxed{s_{[\![l,r]\!]}=s_{[\![l,m]\!]}+s_{[\![m+1,r]\!]}}$

**Access** The operation $q_S(A, i, j)$ is done with a complexity of $\mathcal{O}(\log(n))$ time.

Given an interval $I=[\![i, j]\!]$, we start from the root node of the segment tree and make recursive calls.

At node $s_{I_N}$, there are 3 possible situations:

*Case $I_N\cap I\neq\varnothing$:*There is an overlap.*Sub-case $I_N\subseteq I$:*There is a total overlap, so the information of that node is added to the total sum.*Sub-case $I_N\not\subseteq I$:*There is a partial overlap, so we are going to look at both children of the node.

*Case $I_N\cap I=\varnothing$:*There is no overlap, the answer is not there. We don't look further.

**Update** If we update $a_i\longleftarrow a_i+x$, it takes $\mathcal{O}(\log(n))$ time to update $S$.

We start at the root node and make recursive calls. We check whether $i$ is in the interval $I_N=[\![l, r]\!]$ associated with the node:

*Case $i\in I_N$:*The node contains information related to $a_i$. We update the node by adding $x$:$\boxed{s_{I_N}\longleftarrow s_{I_N}+x}$By noting $\displaystyle m=\left\lfloor\frac{l+r}{2}\right\rfloor$:

*Sub-case $i\leqslant m$:*$i$ is also located in its left child.*Sub-case $i > m$:*$i$ is also located in its right child.

*Case $i\notin I_N$:*The node does not contain information related to $a_i$.

**Subscribe here** to be notified of new Super Study Guide releases!