# Arrays and strings

*By Afshine Amidi and Shervine Amidi*

## Arrays

### Definition

An array $A$ is a data structure of fixed size $n$ that stores elements $a_{0}, ..., a_{n-1}$ in a sequential way. Each element $a_i$ can be accessed by its index $i$.

Most programming languages start their indices from 0. The resulting arrays are said to be 0-indexed.

### Operations

The main operations that can be performed on an array are explained below:

Type | Time | Description | Illustration |
---|---|---|---|

Access | O(1) | Using index i, we can directly access the ith element of the array with A[i]. | |

Search | O(n) | We need to search the array by checking each element one by one until finding the desired value. | |

Insertion | O(n) | 1. Elements at indices i and up are moved to the right. 2. The new element is inserted at index i. Note that if there is no space for the new element to be added in the existing array, we need to create a bigger array and copy existing elements over there. | |

Deletion | O(n) | 1. Elements at indices i+1 and up are moved to the left. 2. The former last element of the array is either ignored or removed. |

### Kadane's algorithm

Kadane's algorithm computes the maximum subarray sum of a given array $A$.

**Trick** Scan the array and keep track of the following quantities:

$S_{i}^\textrm{\,current}$: the maximum subarray sum of $A$ up to index $i$ that contains $a_i$.

$S_{i}^\textrm{\,max}$: the maximum subarray sum of $A$ up to index $i$.

**Algorithm** The solution is found in $\mathcal{O}(n)$ time and $\mathcal{O}(1)$ space:

*Initialization:*Set $S_{0}^\textrm{\,current} = S_{0}^\textrm{\,max} = a_0$.*Update step:*For each index $i\in[\![1, n-1]\!]$, compute $S_{i}^\textrm{\,current}$ by distinguishing two cases:*Case $S_{i-1}^\textrm{\,current} < 0$:*Appending $a_i$ to the currently tracked subarray sum is worse than starting over. We set $S_{i}^\textrm{\,current} = a_i$.*Case $S_{i-1}^\textrm{\,current} \geqslant 0$:*It is worth continuing on the current subarray and appending $a_i$ to it. We set $S_{i}^\textrm{\,current} = S_{i-1}^\textrm{\,current} + a_i$.Then, the maximum subarray sum seen so far is updated if a new maximum is detected:

$S_{i}^\textrm{\,max} = \max\left(S_{i-1}^\textrm{\,max}, S_{i}^\textrm{\,current}\right)$

In the end, we output the maximum subarray sum of the array:

In order to guarantee a space complexity of $\mathcal{O}(1)$, values of $S^{\textrm{current}}$ and $S^{\textrm{max}}$ are updated in place at each step $i$.

### Merge intervals

The merge intervals problem is a classic problem that aims at producing an array of non-overlapping intervals based on an array $I=[I_0, ..., I_{n-1}]$ of intervals that are potentially overlapping.

We note:

$I=[I_{0}, ..., I_{n-1}]$ the input array of potentially-overlapping intervals, with:

$I_k=[a_k, b_k]\quad\textrm{ and }\quad a_k\leqslant b_k\quad\quad\textrm{ for }k\in[\![0, n-1]\!]$$I'=[I_{0}', ..., I_{m-1}']$ the output array of non-overlapping intervals, with:

$I_k'=[a_k', b_k']\quad\textrm{ and}\quad a_k'\leqslant b_k'\quad\quad\textrm{ for }k\in[\![0, m-1]\!]\textrm{ with }m\leqslant n$

The output array $I'$ is obtained in $\mathcal{O}(n\log(n))$ time and $\mathcal{O}(n)$ space:

*Initialization:*The input array $I$ is sorted with respect to the lower bound $a_k$ of each interval $I_k$ in $\mathcal{O}(n\log(n))$ time. Without loss of generality, we renumber intervals so that $a_0 \leqslant ... \leqslant a_{n-1}$.

The output array is initialized with $I_0'=I_0$.

*Update step:*Starting from $k=1$, we would like to incorporate $I_k=[a_k, b_k]$ into the output array. By noting $[a_i', b_j]$ the last added interval of $I'$, we have the following cases:*Case $b_k < b_j$:*$I_k$ does not add any new information since it is entirely overlapping with the last added interval. We discard $I_k$.*Case $a_k\leqslant b_j\leqslant b_k$:*$I_k$ is partially overlapping with the last added interval, so we update the current upper bound to $b_k$.*Case $b_j < a_k$:*There is no overlap between $I_k$ and the last added interval, so we add $I_k$ to the final array $I'$ at index $i+1$.

We repeat the *update step* for all intervals $I_k$ in a sequential way to obtain the output array of non-overlapping intervals $I'$.

## Strings

### Definition

A string $s$ is a data structure that can be seen as an array of characters $s_0, ..., s_{n-1}$.

The following table summarizes a few categories of strings that have special properties:

Type | Description | Example |
---|---|---|

Palindrome | Reads the same backward and forward. | kayak, madam, radar |

Anagram | Forms another string using a rearrangement of its letters. | listen, silent |

Strobogrammatic | Remains the same when rotated by 180 degrees. | 69, 818, 1001 |

### Longest substring

Given a string $s_{0}...s_{n-1}$, the goal of this problem is to find the length of the longest substring that does not contain repeated characters.

A naive approach would check the constraint on every possible substring using a nested \texttt{for}-loop in $\mathcal{O}(n^2)$ time. However, there is a more efficient approach that uses the sliding window trick.

**Trick** Use a left pointer $l$ and a right pointer $r$ to delimit a window which has the constraint to not have repeated characters. We have the following rules:

*Constraint is not violated:*The current solution may have not reached its full potential yet. The $r$ pointer advances until the constraint is violated.*Constraint is violated:*The current solution is not valid anymore, so we need to trim the solution down. The $l$ pointer advances until the constraint is no longer violated.

**Algorithm** The solution is found in $\mathcal{O}(n)$ time and $\mathcal{O}(n)$ space:

*Initialization:*Initialize an empty hash set that aims at storing the characters that are part of the current substring.

Set a global counter $c$ to 0, which keeps track of the maximum substring length encountered so far.

Set the left pointer $l$ and right pointer $r$ to 0.

*Compute step:*$s_r$ is the new character proposed to be added to the current substring.If $s_r$ is already in the hash set, then adding it again would make us have $s_r$ twice in the substring.

Increment $l$ and remove $s_l$ from the hash set until $s_r$ is not there anymore. The substring gets trimmed down.

Now, the updated substring is ready to have $s_r$ without violating the constraint. Add $s_r$ to the hash set.

Check whether the length $r-l+1$ of the resulting substring is greater than the counter $c$. If it is, store the new maximum in $c$.

Increment $r$ by 1.

Repeat this process until $r$ reaches the end of the string.

*Final step:*$c$ is the length of the biggest substring without repeated characters.

The sliding window trick is found in many string problems.

**Subscribe here** to be notified of new Super Study Guide releases!