Skip to main content

Arrays and strings

By Afshine Amidi and Shervine Amidi

Arrays

Definition

An array AA is a data structure of fixed size nn that stores elements a0,...,an1a_{0}, ..., a_{n-1} in a sequential way. Each element aia_i can be accessed by its index ii.

Most programming languages start their indices from 0. The resulting arrays are said to be 0-indexed.

Operations

The main operations that can be performed on an array are explained below:

TypeTimeDescriptionIllustration
AccessO(1)Using index i, we can directly access the ith element of the array with A[i].
SearchO(n)We need to search the array by checking each element one by one until finding the desired value.
InsertionO(n)1. Elements at indices i and up are moved to the right.
2. The new element is inserted at index i.

Note that if there is no space for the new element to be added in the existing array, we need to create a bigger array and copy existing elements over there.
DeletionO(n)1. Elements at indices i+1 and up are moved to the left.
2. The former last element of the array is either ignored or removed.

Kadane's algorithm

Kadane's algorithm computes the maximum subarray sum of a given array AA.

Trick Scan the array and keep track of the following quantities:

  • SicurrentS_{i}^\textrm{\,current}: the maximum subarray sum of AA up to index ii that contains aia_i.

  • SimaxS_{i}^\textrm{\,max}: the maximum subarray sum of AA up to index ii.

Algorithm The solution is found in O(n)\mathcal{O}(n) time and O(1)\mathcal{O}(1) space:

  • Initialization: Set S0current=S0max=a0S_{0}^\textrm{\,current} = S_{0}^\textrm{\,max} = a_0.

  • Update step: For each index i[ ⁣[1,n1] ⁣]i\in[\![1, n-1]\!], compute SicurrentS_{i}^\textrm{\,current} by distinguishing two cases:

    • Case Si1current<0S_{i-1}^\textrm{\,current} < 0: Appending aia_i to the currently tracked subarray sum is worse than starting over. We set Sicurrent=aiS_{i}^\textrm{\,current} = a_i.

    • Case Si1current0S_{i-1}^\textrm{\,current} \geqslant 0: It is worth continuing on the current subarray and appending aia_i to it. We set Sicurrent=Si1current+aiS_{i}^\textrm{\,current} = S_{i-1}^\textrm{\,current} + a_i.

      Then, the maximum subarray sum seen so far is updated if a new maximum is detected:

      Simax=max(Si1max,Sicurrent)S_{i}^\textrm{\,max} = \max\left(S_{i-1}^\textrm{\,max}, S_{i}^\textrm{\,current}\right)

In the end, we output the maximum subarray sum of the array:

Smax=Sn1max\boxed{S^\textrm{\,max} = S_{n-1}^\textrm{\,max}}
Remark

In order to guarantee a space complexity of O(1)\mathcal{O}(1), values of ScurrentS^{\textrm{current}} and SmaxS^{\textrm{max}} are updated in place at each step ii.

Merge intervals

The merge intervals problem is a classic problem that aims at producing an array of non-overlapping intervals based on an array I=[I0,...,In1]I=[I_0, ..., I_{n-1}] of intervals that are potentially overlapping.

We note:

  • I=[I0,...,In1]I=[I_{0}, ..., I_{n-1}] the input array of potentially-overlapping intervals, with:

    Ik=[ak,bk] and akbk for k[ ⁣[0,n1] ⁣]I_k=[a_k, b_k]\quad\textrm{ and }\quad a_k\leqslant b_k\quad\quad\textrm{ for }k\in[\![0, n-1]\!]

  • I=[I0,...,Im1]I'=[I_{0}', ..., I_{m-1}'] the output array of non-overlapping intervals, with:

    Ik=[ak,bk] andakbk for k[ ⁣[0,m1] ⁣] with mnI_k'=[a_k', b_k']\quad\textrm{ and}\quad a_k'\leqslant b_k'\quad\quad\textrm{ for }k\in[\![0, m-1]\!]\textrm{ with }m\leqslant n

The output array II' is obtained in O(nlog(n))\mathcal{O}(n\log(n)) time and O(n)\mathcal{O}(n) space:

  • Initialization:

    • The input array II is sorted with respect to the lower bound aka_k of each interval IkI_k in O(nlog(n))\mathcal{O}(n\log(n)) time. Without loss of generality, we renumber intervals so that a0...an1a_0 \leqslant ... \leqslant a_{n-1}.

    • The output array is initialized with I0=I0I_0'=I_0.

  • Update step: Starting from k=1k=1, we would like to incorporate Ik=[ak,bk]I_k=[a_k, b_k] into the output array. By noting [ai,bj][a_i', b_j] the last added interval of II', we have the following cases:

    • Case bk<bjb_k < b_j: IkI_k does not add any new information since it is entirely overlapping with the last added interval. We discard IkI_k.

    • Case akbjbka_k\leqslant b_j\leqslant b_k: IkI_k is partially overlapping with the last added interval, so we update the current upper bound to bkb_k.

    • Case bj<akb_j < a_k: There is no overlap between IkI_k and the last added interval, so we add IkI_k to the final array II' at index i+1i+1.

We repeat the update step for all intervals IkI_k in a sequential way to obtain the output array of non-overlapping intervals II'.

Strings

Definition

A string ss is a data structure that can be seen as an array of characters s0,...,sn1s_0, ..., s_{n-1}.

The following table summarizes a few categories of strings that have special properties:

TypeDescriptionExample
PalindromeReads the same backward and forward.kayak, madam, radar
AnagramForms another string using a rearrangement of its letters.listen, silent
StrobogrammaticRemains the same when rotated by 180 degrees.69, 818, 1001

Longest substring

Given a string s0...sn1s_{0}...s_{n-1}, the goal of this problem is to find the length of the longest substring that does not contain repeated characters.

A naive approach would check the constraint on every possible substring using a nested \texttt{for}-loop in O(n2)\mathcal{O}(n^2) time. However, there is a more efficient approach that uses the sliding window trick.

Trick Use a left pointer ll and a right pointer rr to delimit a window which has the constraint to not have repeated characters. We have the following rules:

  • Constraint is not violated: The current solution may have not reached its full potential yet. The rr pointer advances until the constraint is violated.

  • Constraint is violated: The current solution is not valid anymore, so we need to trim the solution down. The ll pointer advances until the constraint is no longer violated.

Algorithm The solution is found in O(n)\mathcal{O}(n) time and O(n)\mathcal{O}(n) space:

  • Initialization:

    • Initialize an empty hash set that aims at storing the characters that are part of the current substring.

    • Set a global counter cc to 0, which keeps track of the maximum substring length encountered so far.

    • Set the left pointer ll and right pointer rr to 0.

  • Compute step: srs_r is the new character proposed to be added to the current substring.

    • If srs_r is already in the hash set, then adding it again would make us have srs_r twice in the substring.

      Increment ll and remove sls_l from the hash set until srs_r is not there anymore. The substring gets trimmed down.

    • Now, the updated substring is ready to have srs_r without violating the constraint. Add srs_r to the hash set.

    • Check whether the length rl+1r-l+1 of the resulting substring is greater than the counter cc. If it is, store the new maximum in cc.

    • Increment rr by 1.

Repeat this process until rr reaches the end of the string.

  • Final step: cc is the length of the biggest substring without repeated characters.
Remark

The sliding window trick is found in many string problems.


Want more content like this?

Subscribe here to be notified of new Super Study Guide releases!