
Algorithmic concepts

By Afshine Amidi and Shervine Amidi

Overview

Algorithm

Given a problem, an algorithm $\mathcal{A}$ is a set of well-defined instructions that runs in a finite amount of time and space. It receives an input $I$ and returns an output $O$ that satisfies the constraints of the problem.

As an example, a problem can be to check whether a number is even. In order to do that, an algorithm could be to check whether the number is divisible by 2.
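For instance, this check could be sketched in Python as follows (the function name is_even is only illustrative):

```python
def is_even(number: int) -> bool:
    # A number is even exactly when its remainder after division by 2 is 0.
    return number % 2 == 0

print(is_even(4))  # True
print(is_even(7))  # False
```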

Iteration

An iterative algorithm is an algorithm that runs through a sequence of actions. It is characterized by either a for or a while loop.

Suppose we want to return the sum of all of the elements of a given list. An example of an iterative algorithm would be to sequentially add each element of the list to a variable, and return its final value.
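A possible Python sketch of this iterative algorithm (the function name sum_list is only illustrative):

```python
def sum_list(numbers: list) -> int:
    # Sequentially add each element of the list to a running total.
    total = 0
    for value in numbers:
        total += value
    return total

print(sum_list([1, 2, 3, 4]))  # 10
```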

Recursion

A recursive algorithm uses a function that calls itself. It is composed of the following components:

  • Base case: This is the set of inputs for which the outputs are known.
  • Recursive formula: The answer of the current step is based on function calls relying on previous steps, eventually using the base case answer.

A classic problem is to compute the power of a number $x^n$ without explicitly using the power operation. In order to do that, a recursive solution could rely on the following cases:

\begin{cases} \: x^0=1&\textrm{ is known}\\ \: x^n=x^{\frac{n}{2}}\times x^{\frac{n}{2}}&\textrm{when }n\in\mathbb{N}^*\textrm{ is even}\\ \: x^n=x\times x^{\frac{n-1}{2}}\times x^{\frac{n-1}{2}}&\textrm{when }n\in\mathbb{N}^*\textrm{ is odd} \end{cases}
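A possible recursive implementation of these cases in Python, where the half power is computed once per call so that $n$ is roughly halved at every step:

```python
def power(x: float, n: int) -> float:
    # Base case: x^0 = 1.
    if n == 0:
        return 1
    # Recursive formula: reuse the answer computed for n // 2.
    half = power(x, n // 2)
    if n % 2 == 0:
        return half * half       # x^n = x^(n/2) * x^(n/2)
    return x * half * half       # x^n = x * x^((n-1)/2) * x^((n-1)/2)

print(power(2, 10))  # 1024
```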

Call stack

In a recursive algorithm, the space used by the function calls $c_i$ is called the stack space.

Stack overflow

The problem of stack overflow occurs when a recursive algorithm uses more stack space than the maximum allowed $N$.

A solution to circumvent this bottleneck is to convert the code from recursive to iterative so that it relies on memory space, which is typically much larger than stack space.
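As an illustrative sketch (the sum-of-integers example is ours), the recursive version below can exceed Python's default recursion limit, while the iterative version only uses constant stack space:

```python
def sum_recursive(n: int) -> int:
    # Uses one stack frame per value of n: a large n can exceed
    # Python's recursion limit (about 1000 by default) and raise
    # a RecursionError, i.e. a stack overflow.
    if n == 0:
        return 0
    return n + sum_recursive(n - 1)

def sum_iterative(n: int) -> int:
    # Same computation with a loop: only constant stack space is used.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

print(sum_iterative(10**6))  # works fine
# sum_recursive(10**6)       # would typically raise RecursionError
```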

Memoization

Memoization is an optimization technique aimed at speeding up runtime by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
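A minimal Python sketch of this idea using a dictionary as the cache; the standard-library decorator functools.lru_cache provides the same behavior automatically:

```python
from functools import lru_cache

cache = {}

def expensive_square(n: int) -> int:
    # Return the cached result if this input was already seen.
    if n in cache:
        return cache[n]
    result = n * n          # stand-in for an expensive computation
    cache[n] = result
    return result

# Equivalent behavior with the standard-library decorator:
@lru_cache(maxsize=None)
def expensive_square_cached(n: int) -> int:
    return n * n
```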

Types of algorithms

Brute-force

A brute-force approach aims at listing all the possible output candidates of a problem and checking whether any of them satisfies the constraints. It is generally the least efficient way of solving a problem.

To illustrate this technique, let's consider the following problem: given a sorted array $A$, we want to return all pairs of elements that sum up to a given number.

  • A brute-force approach would try all possible pairs $(a_i, a_j)$ and return those that sum up to that number. This method produces the desired result, but not in minimal time.
  • A non-brute-force approach could use the fact that the array is sorted and scan it with the two-pointer technique, as sketched after this list.
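A possible Python sketch of the two-pointer approach (the function name pair_sums is ours), assuming the array is sorted in increasing order:

```python
def pair_sums(array: list, target: int) -> list:
    # One pointer starts at each end of the sorted array.
    left, right = 0, len(array) - 1
    pairs = []
    while left < right:
        current = array[left] + array[right]
        if current == target:
            pairs.append((array[left], array[right]))
            left += 1
            right -= 1
        elif current < target:
            left += 1    # sum too small: move the left pointer up
        else:
            right -= 1   # sum too large: move the right pointer down
    return pairs

print(pair_sums([1, 2, 3, 4, 6], 7))  # [(1, 6), (3, 4)]
```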

Backtracking

A backtracking algorithm recursively generates potential solutions and prunes those that do not satisfy the problem constraints. It can be seen as a version of brute-force that discards invalid candidates as soon as possible.

As an example, the $N$-Queens problem aims at finding a configuration of $N$ queens on an $N\times N$ chessboard where no two queens attack each other. A backtracking approach would consist of placing queens one at a time and pruning solution branches that involve queens attacking each other.
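One possible backtracking sketch in Python, where queens are placed one row at a time and any placement that attacks a previously placed queen is pruned (representing a partial configuration as a list of column indices is our choice):

```python
def solve_n_queens(n: int) -> list:
    # queens[i] holds the column of the queen placed in row i.
    solutions = []

    def is_safe(queens: list, row: int, col: int) -> bool:
        # A placement is invalid if it shares a column or a diagonal
        # with any queen already placed in a previous row.
        for prev_row, prev_col in enumerate(queens):
            if prev_col == col or abs(prev_col - col) == row - prev_row:
                return False
        return True

    def place(queens: list) -> None:
        row = len(queens)
        if row == n:                        # all queens placed: valid solution
            solutions.append(list(queens))
            return
        for col in range(n):
            if is_safe(queens, row, col):   # prune attacking placements
                place(queens + [col])

    place([])
    return solutions

print(len(solve_n_queens(4)))  # 2 solutions on a 4x4 board
```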

Greedy

A greedy algorithm makes choices that are seen as optimal at every given step. However, it is important to note that the resulting solution may not be optimal globally. This technique often leads to relatively low-complexity algorithms that work reasonably well within the constraints of the problem.

To illustrate this concept, let's consider the problem of finding the longest path from a given starting point in a weighted graph. A greedy approach constructs the final path by iteratively selecting the next edge that has the highest weight. The resulting solution may miss a longer path that has large edge weights "hidden" behind a low-weighted edge.
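A possible Python sketch of this greedy heuristic (the graph representation and function name are ours); it illustrates the locally optimal choice, not a globally optimal algorithm:

```python
def greedy_path(graph: dict, start: str) -> list:
    # graph maps each node to a dict of {neighbor: edge_weight}.
    path = [start]
    visited = {start}
    current = start
    while True:
        # Consider only edges leading to unvisited nodes.
        candidates = {v: w for v, w in graph.get(current, {}).items()
                      if v not in visited}
        if not candidates:
            break
        # Greedy choice: follow the heaviest outgoing edge.
        current = max(candidates, key=candidates.get)
        visited.add(current)
        path.append(current)
    return path

graph = {"A": {"B": 5, "C": 1}, "B": {"D": 1}, "C": {"E": 10}, "D": {}, "E": {}}
print(greedy_path(graph, "A"))  # ['A', 'B', 'D']: misses the heavier path via C
```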

Divide and conquer

A divide and conquer (D&C) algorithm computes the final result of a problem by recursively dividing it into independent subproblems:

  • Divide: The problem is divided into several independent subproblems.

  • Conquer: Each subproblem is solved independently.

  • Combine: The result of each subproblem is combined to find the final answer.

Algorithms following the D&C principle include sorting algorithms such as merge sort and quick sort.
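For instance, a minimal merge sort sketch in Python that follows the three D&C steps:

```python
def merge(left: list, right: list) -> list:
    # Combine: merge two sorted lists into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def merge_sort(array: list) -> list:
    # Base case: a list of 0 or 1 element is already sorted.
    if len(array) <= 1:
        return array
    # Divide: split the array into two halves.
    middle = len(array) // 2
    # Conquer: sort each half independently.
    left = merge_sort(array[:middle])
    right = merge_sort(array[middle:])
    # Combine: merge the sorted halves.
    return merge(left, right)

print(merge_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```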

Dynamic programming

Dynamic programming (DP) is a problem-solving method that relies on finding answers to overlapping subproblems.

A common example of a problem solved with DP is the computation of Fibonacci numbers.

There are two main approaches:

Top-down: this approach finds the target value by recursively computing previous values.

  • Step 1: Try computing the desired value $F_n$ and notice that it is based on previous values.

  • Step 2: Try computing the previous values, which themselves rely on earlier values, some of which may have already been computed. In this case, we can use memoization to avoid duplicate operations.

  • Step 3: Use the newly computed values to deduce $F_n$.
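As a sketch, a top-down computation of Fibonacci numbers in Python, using memoization via functools.lru_cache and the convention $F_0=0$, $F_1=1$:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_top_down(n: int) -> int:
    # Base cases: F_0 = 0 and F_1 = 1.
    if n <= 1:
        return n
    # Recursive formula: F_n = F_(n-1) + F_(n-2).
    # Thanks to memoization, each value is computed only once.
    return fib_top_down(n - 1) + fib_top_down(n - 2)

print(fib_top_down(10))  # 55
```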

Bottom-up: this approach starts from already-known results and iteratively computes succeeding values until it reaches the target value.

  • Step 1: Compute $F_0$, $F_1$, $F_2$, etc. in a predetermined way. These values are typically stored in an array.

  • Step 2: Deduce $F_n$.
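A corresponding bottom-up sketch in Python, filling an array of previously computed values:

```python
def fib_bottom_up(n: int) -> int:
    # Store F_0, ..., F_n in an array and fill it iteratively.
    if n <= 1:
        return n
    values = [0] * (n + 1)
    values[1] = 1
    for i in range(2, n + 1):
        values[i] = values[i - 1] + values[i - 2]
    return values[n]

print(fib_bottom_up(10))  # 55
```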

In summary, the two main ways of solving a problem with DP are:

  • Top-down: recursive approach
  • Bottom-up: iterative approach
Remark

A key difference between DP and D&C strategies is that DP relies on overlapping subproblems whereas D&C bases itself on independent subproblems.

Complexity

Definition

The concept of complexity is used to quantify the efficiency of an algorithm. There are two types of complexities:

  • Time: How many operations are made?
  • Space: How much extra space is needed?

Both measures are usually given as a function of the input size nn, although other parameters can also be used.

Notations

The complexity $f$ of an algorithm can be described using a known function $g$ with the following notations:

  • $f=o(g)$ ("little oh of $g$"): $\forall\epsilon>0, \exists n_0, \forall n \geqslant n_0,\ \vert f(n)\vert\leqslant\epsilon\vert g(n)\vert$. Meaning: $f$ is negligible compared to $g$, i.e. $f(n) \underset{n\rightarrow+\infty}{\ll} g(n)$.
  • $f=\mathcal{O}(g)$ ("big oh of $g$"): $\exists c>0, \exists n_0, \forall n\geqslant n_0,\ \vert f(n)\vert \leqslant c\vert g(n)\vert$. Meaning: $f$ is upper-bounded by $g$, i.e. $f(n)\underset{n\rightarrow+\infty}{\leqslant}g(n)$.
  • $f=\Omega(g)$ ("omega of $g$"): $\exists c>0, \exists n_0, \forall n\geqslant n_0,\ \vert f(n)\vert \geqslant c\vert g(n)\vert$. Meaning: $f$ is lower-bounded by $g$, i.e. $f(n)\underset{n\rightarrow+\infty}{\geqslant}g(n)$.
  • $f=\Theta(g)$ ("theta of $g$"): $\exists c_1, c_2>0, \exists n_0, \forall n\geqslant n_0,\ c_1\vert g(n)\vert \leqslant \vert f(n)\vert \leqslant c_2\vert g(n)\vert$. Meaning: $f$ is similar to $g$, i.e. $f(n)\underset{n\rightarrow+\infty}{\sim}g(n)$.
Remark

The big oh notation is frequently used to describe the time and space complexity of a given algorithm.

Orders of magnitude

The main kinds of runtime complexities $T(n)$, as a function of the input size $n$, are highlighted below:

  • $\mathcal{O}(1)$, "constant". Hash table lookup: accessing the value of a given key does not depend on the size of the hash table.
  • $\mathcal{O}(\log(n))$, "logarithmic". Binary search: the search space is divided by 2 at each iteration.
  • $\mathcal{O}(n)$, "linear". Linear search: all elements of the input are visited.
  • $\mathcal{O}(n\log(n))$, "linearithmic". Merge sort: the array is halved at every step, and each step requires all elements to be checked.
  • $\mathcal{O}(n^2)$, "quadratic". Bubble sort: each pair of elements of the input array is checked.
  • $\mathcal{O}(2^n)$, "exponential". 0/1 knapsack problem: the naive approach consists of trying out every combination of items in the final set.
  • $\mathcal{O}(n!)$, "factorial". Traveling salesman problem: the naive approach consists of trying all possible permutations of cities to visit.

A graph of these functions shows how differently they grow as $n$ increases.

As a rule of thumb, we have:

\boxed{\mathcal{O}(1) < \mathcal{O}(\log(n)) < \mathcal{O}(n) < \mathcal{O}(n\log(n)) < \mathcal{O}(n^2) < \mathcal{O}(2^n) < \mathcal{O}(n!)}

Master theorem

The master theorem gives the asymptotic order of a runtime $T(n)$ that satisfies a recurrence relation of the form below:

\boxed{T(n)=aT\left(\frac{n}{b}\right)+\Theta(n^d)}

The solution depends on the relationship between $a\in\mathbb{N}^*$, $b\in\mathbb{N}^*\backslash\{1\}$ and $d\geqslant0$:

  • Case $d < \log_b(a)$: $T(n)=\Theta(n^{\log_b(a)})$. Example: binary tree traversal, where after a node is visited, we move on to its left and right subtrees: $T(n)=2T\left(\frac{n}{2}\right)+\Theta(1)\quad\longrightarrow\quad T(n)=\Theta(n)$.
  • Case $d = \log_b(a)$: $T(n)=\Theta(n^{d}\log(n))$. Example: binary search, where the search space is divided by 2 at each pass: $T(n)=T\left(\frac{n}{2}\right)+\Theta(1)\quad\longrightarrow\quad T(n)=\Theta(\log(n))$.
  • Case $d > \log_b(a)$: $T(n)=\Theta(n^{d})$. Example: $T(n)=4T\left(\frac{n}{2}\right)+\Theta(n^3)\quad\longrightarrow\quad T(n)=\Theta(n^3)$.
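As another worked example (not part of the list above), merge sort satisfies a recurrence with $a=2$, $b=2$ and $d=1$, so $d=\log_b(a)$ and the second case applies:

T(n)=2T\left(\frac{n}{2}\right)+\Theta(n)\quad\longrightarrow\quad T(n)=\Theta(n\log(n))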

Problem complexity

Problems can be divided into classes that quantify how hard it is to solve them and to verify whether a proposed solution works. Two well-known classes are described below:

  • P ("polynomial time"): the problem can be solved in polynomial time. Example: sorting, since an array of $n$ elements can be sorted in polynomial time with selection sort.
  • NP ("nondeterministic polynomial time"): a solution to the problem can be verified in polynomial time. Example: the traveling salesman problem, since given a path of $n$ cities and a target value, we can determine whether the length of the path is lower than or equal to the target value in $\mathcal{O}(n)$ time.

We note the following:

  • $\textrm{P}\subseteq \textrm{NP}$: If it is possible to find a solution in polynomial time, then it is possible to verify a candidate in polynomial time by computing the solution directly.
  • $\textrm{P}\stackrel{?}{\supseteq}\textrm{NP}$: It is unclear whether being able to verify a solution in polynomial time implies that we can solve the problem in polynomial time. The general consensus is that $\textrm{P}\not\supseteq\textrm{NP}$ (hence $\textrm{P}\neq\textrm{NP}$), but this has not been formally proven yet.

Any problem in NP can be reduced in polynomial time to any problem in the following classes:

  • NP-hard problems, which are at least as hard as the problems in NP
  • NP-complete problems, which are NP-hard problems that also belong to NP

