# Graphs

By Afshine Amidi and Shervine Amidi

## General concepts​

### Definition​

A graph $G$ is defined by its vertices $V$ and edges $E$ and is often noted $G = (V, E)$. The following table summarizes the two main types of graph:

Undirected graphDirected graph
Edges do not have a direction.Edges have a direction.
Remark

The terms "node" and "vertex" can be used interchangeably.

### Graph representation​

Let's consider the following graph:

It can be represented in two different ways:

TypeDescriptionIllustration
list
Collection of unordered lists where each
entry $V_i$ maps to all the nodes $V_j$ such
that $E_{ij}$ exists in the graph.
matrix
Matrix of boolean values where the
entry $(i, j)$ indicates whether $E_{ij}$ is
present in the graph.
In an undirected graph, this matrix is
always symmetric.

### Degree​

A node can have the following characteristics based on its adjacent edges:

GraphNotationDefinitionIllustration
UndirectedDegreeNumber of connected edges
DirectedIn-DegreeNumber of connected inbound edges.
Out-DegreeNumber of connected outbound edges
Remark

Nodes of even (resp. odd) degree are called even (resp. odd) nodes.

### Handshaking lemma​

In an undirected graph, the sum of node degrees is an even number.

Indeed, every edge links two nodes, therefore responsible for increasing the sum of degrees by 2 (one per node). We have the following formula:

$\boxed{\sum_{v\in V}\textrm{deg}(v) = 2|E|}$
Remark

A consequence of the above formula is that there is an even number of odd nodes.

### Cyclicity​

The cyclicity of a graph is a property that depends on whether the graph has a cycle:

AcyclicCyclic
Does not contain a cycleContains at least a cycle
Remark

A directed acyclic graph is commonly abbreviated as DAG.

### Properties​

A graph can have any of the following properties:

TypeDescriptionIllustration
CompleteEvery pair of vertices is connected by an edge.
ConnectedThere exists a path between every pair of nodes.
BipartiteVertices can be split into two disjoint sets such that every edge of the graph links a vertex from one set to one in the other.

## Graph traversal​

Breadth-First Search (BFS) is an algorithm that explores a graph level by level.

The graph traversal is performed in $\mathcal{O}(|V| + |E|)$ time and $\mathcal{O}(|V|)$ space:

• Initialization: The following quantities are tracked:

• Queue $q$ that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

• Hash set $h_{v}$ that keeps track of visited nodes. It is initially empty.

• Update step: Dequeue element from $q$. We have the following cases:

• Node already visited: Skip it.

• Node not visited yet: Add it to the set $h_{v}$ of visited nodes, and look at its neighbors: if they were not visited before, enqueue them in $q$.

• Final step: Repeat the update step until the queue gets empty. The set $h_{v}$ represents the set of visited nodes.

Depth-First Search (DFS) is an algorithm that explores a graph by prioritizing depth.

The graph traversal is performed in $\mathcal{O}(|V|+|E|)$ time and $\mathcal{O}(|V|)$ space:

• Initialization: The following quantities are tracked:

• Stack $s$ that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

• Hash set $h_{v}$ that keeps track of visited nodes. It is initially empty.

• Update step: Pop element from $s$. We have the following cases:

• Node already visited: Skip it.

• Node not visited yet: Add it to the set $h_{v}$ of visited nodes, and look at its neighbors: if they were not visited before, push them to $s$.

• Final step: Repeat the update step until the stack gets empty. The set $h_{v}$ represents the set of visited nodes.

Remark

DFS can also be implemented recursively.

### Graph traversal summary​

The table below highlights the main differences between BFS and DFS:

Breadth-First Search (BFS)Depth-First Search (DFS)
MindsetLevel by levelDepth first
Possible
approaches
Iteration using a queue- Iteration using a stack
- Recursion
Illustration

### Number of islands​

A classic application of graph traversal algorithms is to count the number of islands in an $m\times n$ grid, where each cell is either marked as $\texttt{1}$ (land) or $\texttt{0}$ (water).

Trick Perform a BFS starting from each unvisited land cell of the grid to explore the associated island and skip cells that have already been visited.

Algorithm The entire grid is explored in $\mathcal{O}(m\times n)$ time and $\mathcal{O}(m\times n)$ space as follows:

• Initialization: Set the counter $c$ that tracks the number of islands to 0.

• Explore step: For each cell of the grid, we have the following situations:

• Water not visited: Skip this cell as it is not part of any island.

• Land not visited: Perform a BFS that starts from that cell and visits all neighboring land cells. This process uses a temporary queue $q$ to track nodes to visit, and marks visited nodes by changing their values to $\texttt{X}$.

At the end of the exploration, add 1 to the number of islands $c$.

• Water/Land already visited: Skip this cell as it was already explored.

• Final step: The number of islands in the grid is equal to $c$.

We note that:

• Using an iterative graph traversal algorithm is helpful in preventing stack overflow in case the exploration of an island leads to too many stack calls.
• It would have been equally possible to make island explorations using DFS.

### Robot room cleaner​

The goal of this classic problem is to clean a room composed of $n$ cells using a robot cleaner that can only perform the 3 following actions:

Turn rightMove cellClean cell
Rotate 90◦ to the right while staying in the same cell.Move forward by one cell in the current direction, provided that there is no obstacle.Clean current cell.

The configuration of the room is not known by the robot but is assumed to be of finite size. It has obstacles along the way that the robot needs to avoid. We assume that the robot initially starts in a cell with no obstacle.

Trick

• Perform a DFS using an exploration strategy that consists of trying all possible directions and that backtracks if an obstacle is found.

• Rotating to the right can be done by keeping track of the direction $(d_x, d_y)$ that the robot is pointing towards.

Algorithm The robot cleans the room in $\mathcal{O}(n)$ time and $\mathcal{O}(n)$ space as follows:

• Initialization: We take the convention that:

• The initial cell of the robot is $(x, y) = (0, 0)$.

• The robot is initially facing up: $(d_x, d_y) = (0, 1)$.

A hash set is used to keep track of the coordinates of the cells visited by the robot.

• Explore step: This is the main step where the robot cleans the cell and moves to an unvisited cell.

The robot performs the following actions:

• It cleans the cell.

• It tries to visit each of the 4 neighboring cells by right-rotating and trying to move there. If the neighboring cell is an obstacle or has already been visited, the robot skips it.

• When it does not have any more new cells to visit, the robot backtracks by turning right twice, moving and turning right twice again to be pointing back to the initial direction.

• Final step: When the robot cleaned all cells, it backtracks to its original cell.

### Topological sort​

In a given directed graph, a topological sort (topsort) is a linear ordering of the nodes such that for all directed edges $E_{ij}=(V_i, V_j)$, $V_i$ appears before $V_j$.

We note that:

• A graph can have multiple topological orderings.

• A DAG always has a topological sort.

• If a graph contains a cycle, then it cannot have a valid ordering.

### Kahn's algorithm​

Kahn's algorithm aims at finding a topological ordering of a given DAG. It finds a solution in $\mathcal{O}(|V|+|E|)$ time and $\mathcal{O}(|V|)$ space:

• Initialization: Initialize an empty array that keeps track of the final ordering of vertices.

• Compute step: Repeat the following procedure until all nodes are visited:

• Pick a random node of in-degree 0 and visit it.

• Decrement by 1 the in-degrees of the nodes that the newly-visited node is connected to.

• Final step: When all nodes are visited, the array contains a valid ordering of vertices.

## Shortest path​

### Weighted graph​

A weighted graph is a graph where each edge $E_{ij}$ has an associated weight $w_{ij}$.

Remark

When the weight is interpreted as a distance, it is often noted $d_{i, j}$.

### Dijkstra's algorithm​

Dijkstra's algorithm is a greedy algorithm that aims at finding the shortest paths between a source node $V_S$ and all other nodes $j\in V$ of a weighted graph that has no negative edge weights.

The algorithm finds a solution in $\mathcal{O}(|E|\log(|V|))$ time and $\mathcal{O}(|V|)$ space:

• Initialization:

• Set the distances $d_{V_S,\hspace{0.2em}j}$ from $V_S$ to each of the other nodes $j\in V$ to $+\infty$ to indicate that they are initially unreachable.

• Set the distance $d_{V_S,\hspace{0.2em}V_S}$ from $V_S$ to itself to 0.

• Initialize a hash table $h_p$ that keeps track of the chosen predecessor for each node. By convention, the predecessor of $V_S$ is itself.

• Compute step: Repeat the following procedure until all nodes are visited:

• Pick the unvisited node $i$ with the lowest distance $d_{V_S,\hspace{0.2em}i}$ and visit it.

• Look at the unvisited nodes $j$ that have an edge coming from the newly-visited node $i$.

• Case $d_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}$: This means that the distance of the path from node $V_S$ to $j$ via node $i$ is smaller than the current distance. We update the corresponding distance:

$\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}$

Also, we update $h_p$ to indicate that the predecessor of $j$ is now $i$.

• Case $d_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}$: This means that the proposed path does not improve the current distance. We do not make any updates.

• Final step:

• The final distances are those of the shortest paths between $V_S$ and each of the other nodes $j\in V$ of the graph.

• The associated shortest paths can be reconstructed using $h_p$.

Remark

Given that choices are based on local optima, Dijkstra's algorithm does not guarantee a correct solution when there are negative edge weights. Indeed, if such an edge were to be found "too late" in the exploration process, the algorithm may visit a node earlier than desired and produce a suboptimal path.

### A* algorithm​

The $\textrm{A}^{\star}$ algorithm is an extension of Dijkstra's algorithm that is used when the goal is to find the shortest path between a source node $V_S$ and a fixed target node $V_T$.

To better direct the search towards the target node, it modifies the distance used to select the next node by using a heuristic $h_{j,\hspace{0.2em}V_T}$ that approximates the remaining distance between unvisited nodes $j$ and the target node $V_T$.

The distance used to determine the next node $j$ to visit is given by $d^{\hspace{0.05em}A^{\star}}$:

$\boxed{d_{V_S,\hspace{0.2em}j}^{\hspace{0.05em}A^{\star}}=d_{V_S,\hspace{0.2em}j}+h_{j,\hspace{0.2em}V_T}}$
Remark

The $\textrm{A}^{\star}$ algorithm heavily relies on a good choice of heuristic $h$.

### Bellman-Ford algorithm​

The Bellman-Ford algorithm aims at finding the shortest paths between a source node $V_S$ and all other nodes $j\in V$ of a weighted graph.

A solution is found in $\mathcal{O}(|V|\times|E|)$ time and $\mathcal{O}(|V|)$ space:

• Initialization:

• Set the distances $d_{V_S,\hspace{0.2em}j}$ from $V_S$ to each of the other nodes $j\in V$ to $+\infty$ to indicate that they are initially unreachable.

• Set the distance $d_{V_S,\hspace{0.2em}V_S}$ from $V_S$ to itself to 0.

• Initialize a hash table $h_p$ that keeps track of the chosen predecessor for each node. By convention, the predecessor of $V_S$ is itself.

• Compute step: Repeat the following procedure until there is no more updates, which is at most $|V|-1$ times:

For each node $i$ of the graph, look at its neighbors $j$ and update the distance $d_{V_S,\hspace{0.2em}j}$ between the source node $V_S$ and the node $j$ depending on the following situations:

• Case $d_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}$: This means that the distance of the path from node $V_S$ to $j$ via node $i$ is smaller than the current distance. We update the corresponding distance:

$\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}$

Also, we update $h_p$ to indicate that the predecessor of $j$ is now $i$.

• Case $d_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}$: This means that the proposed path does not improve the current distance. We do not make any updates.

• Final step:

• The final distances are those of the shortest paths between $V_S$ and each of the other nodes $j\in V$ of the graph.

• The associated shortest paths can be reconstructed using $h_p$.

Compared to Dijkstra's algorithm, Bellman-Ford's algorithm has the advantage of supporting negative edge weights. However, it cannot run on a graph having negative cycles, i.e. cycles with a negative sum of edge weights.

### Floyd-Warshall algorithm​

The Floyd-Warshall algorithm aims at finding the shortest paths between all pairs of nodes of a weighted graph.

A solution is found in $\mathcal{O}(|V|^3)$ time and $\mathcal{O}(|V|^2)$ space:

• Initialization: Initialize a matrix of distances of size $|V|\times|V|$, where each cell $(i, j)$ represents the distance of the shortest path from node $i$ to node $j$. We know that:

• The distance of a node to itself is 0, so each element of the diagonal is equal to 0.

• The distances of nodes linked by edges are given.

• Update step: For each intermediary node $k\in V$, loop through start nodes $i\in V$ and end nodes $j \in V$:

• Case $d_{i,j} > d_{i,k} + d_{k,j}$: This means that the distance from node $i$ to $j$ via node $k$ is smaller than the current distance. We make the following update:

$\boxed{d_{i, j} \longleftarrow d_{i, k} + d_{k, j}}$

• Case $d_{i,j} \leqslant d_{i,k} + d_{k,j}$: This means that the proposed path does not improve the current distance. We do not make any updates.

• Final step: The resulting matrix gives the shortest path between each pair of nodes $i$ and $j$.

This algorithm allows for negatively weighted edges but it does not allow for negative cycles like the other shortest path algorithms.

Remark

In the same fashion as in Dijkstra's and Bellman-Ford's algorithms, we can reconstruct the resulting shortest paths by keeping track of nodes during updates.

### Summary​

The differences between the main shortest paths algorithms are summarized below:

AlgorithmShortest path
between
Negative
weights
TimeSpace
Dijkstra$V_S$ and all other nodesNo$\mathcal{O}(\vert E\vert\log(\vert V\vert))$$\mathcal{O}(\vert V\vert)$
$\textrm{A}^\star$$V_S$ and $V_T$No$\mathcal{O}(\vert E\vert\log(\vert V\vert))$$\mathcal{O}(\vert V\vert)$
Bellman-Ford$V_S$ and all other nodesYes$\mathcal{O}(\vert V\vert\times\vert E\vert))$$\mathcal{O}(\vert V\vert)$
Floyd-WarshallAll pairs of nodesYes$\mathcal{O}(\vert V\vert^3)$$\mathcal{O}(\vert V\vert^2)$

Want more content like this?

Subscribe here to be notified of new Super Study Guide releases!