# Graphs

*By Afshine Amidi and Shervine Amidi*

## General concepts

### Definition

A graph $G$ is defined by its vertices $V$ and edges $E$ and is often noted $G = (V, E)$. The following table summarizes the two main types of graph:

Undirected graph | Directed graph |
---|---|

Edges do not have a direction. | Edges have a direction. |

The terms "node" and "vertex" can be used interchangeably.

### Graph representation

Let's consider the following graph:

It can be represented in two different ways:

Type | Description | Illustration |
---|---|---|

Adjacency list | Collection of unordered lists where each entry $V_i$ maps to all the nodes $V_j$ such that $E_{ij}$ exists in the graph. | |

Adjacency matrix | Matrix of boolean values where the entry $(i, j)$ indicates whether $E_{ij}$ is present in the graph. In an undirected graph, this matrix is always symmetric. |

### Degree

A node can have the following characteristics based on its adjacent edges:

Graph | Notation | Definition | Illustration |
---|---|---|---|

Undirected | Degree | Number of connected edges | |

Directed | In-Degree | Number of connected inbound edges. | |

Out-Degree | Number of connected outbound edges |

Nodes of even (resp. odd) degree are called even (resp. odd) nodes.

### Handshaking lemma

In an undirected graph, the sum of node degrees is an even number.

Indeed, every edge links two nodes, therefore responsible for increasing the sum of degrees by 2 (one per node). We have the following formula:

A consequence of the above formula is that there is an even number of odd nodes.

### Cyclicity

The cyclicity of a graph is a property that depends on whether the graph has a cycle:

Acyclic | Cyclic |
---|---|

Does not contain a cycle | Contains at least a cycle |

A directed acyclic graph is commonly abbreviated as DAG.

### Properties

A graph can have any of the following properties:

Type | Description | Illustration |
---|---|---|

Complete | Every pair of vertices is connected by an edge. | |

Connected | There exists a path between every pair of nodes. | |

Bipartite | Vertices can be split into two disjoint sets such that every edge of the graph links a vertex from one set to one in the other. |

## Graph traversal

### Breadth-First Search

Breadth-First Search (BFS) is an algorithm that explores a graph level by level.

The graph traversal is performed in $\mathcal{O}(|V| + |E|)$ time and $\mathcal{O}(|V|)$ space:

*Initialization:*The following quantities are tracked:Queue $q$ that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

Hash set $h_{v}$ that keeps track of visited nodes. It is initially empty.

*Update step:*Dequeue element from $q$. We have the following cases:*Node already visited:*Skip it.*Node not visited yet:*Add it to the set $h_{v}$ of visited nodes, and look at its neighbors: if they were not visited before, enqueue them in $q$.

*Final step:*Repeat the*update step*until the queue gets empty. The set $h_{v}$ represents the set of visited nodes.

### Depth-First Search

Depth-First Search (DFS) is an algorithm that explores a graph by prioritizing depth.

The graph traversal is performed in $\mathcal{O}(|V|+|E|)$ time and $\mathcal{O}(|V|)$ space:

*Initialization:*The following quantities are tracked:Stack $s$ that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

Hash set $h_{v}$ that keeps track of visited nodes. It is initially empty.

*Update step:*Pop element from $s$. We have the following cases:*Node already visited:*Skip it.*Node not visited yet:*Add it to the set $h_{v}$ of visited nodes, and look at its neighbors: if they were not visited before, push them to $s$.

*Final step:*Repeat the*update step*until the stack gets empty. The set $h_{v}$ represents the set of visited nodes.

DFS can also be implemented recursively.

### Graph traversal summary

The table below highlights the main differences between BFS and DFS:

Breadth-First Search (BFS) | Depth-First Search (DFS) | |
---|---|---|

Mindset | Level by level | Depth first |

Possibleapproaches | Iteration using a queue | - Iteration using a stack - Recursion |

Illustration |

### Number of islands

A classic application of graph traversal algorithms is to count the number of islands in an $m\times n$ grid, where each cell is either marked as $\texttt{1}$ (land) or $\texttt{0}$ (water).

**Trick** Perform a BFS starting from each unvisited *land* cell of the grid to explore the associated island and skip cells that have already been visited.

**Algorithm** The entire grid is explored in $\mathcal{O}(m\times n)$ time and $\mathcal{O}(m\times n)$ space as follows:

*Initialization:*Set the counter $c$ that tracks the number of islands to 0.*Explore step:*For each cell of the grid, we have the following situations:*Water not visited:*Skip this cell as it is not part of any island.*Land not visited:*Perform a BFS that starts from that cell and visits all neighboring land cells. This process uses a temporary queue $q$ to track nodes to visit, and marks visited nodes by changing their values to $\texttt{X}$.At the end of the exploration, add 1 to the number of islands $c$.

*Water/Land already visited:*Skip this cell as it was already explored.

*Final step:*The number of islands in the grid is equal to $c$.

We note that:

- Using an iterative graph traversal algorithm is helpful in preventing stack overflow in case the exploration of an island leads to too many stack calls.
- It would have been equally possible to make island explorations using DFS.

### Robot room cleaner

The goal of this classic problem is to clean a room composed of $n$ cells using a robot cleaner that can only perform the 3 following actions:

Turn right | Move cell | Clean cell |
---|---|---|

Rotate 90◦ to the right while staying in the same cell. | Move forward by one cell in the current direction, provided that there is no obstacle. | Clean current cell. |

The configuration of the room is not known by the robot but is assumed to be of finite size. It has obstacles along the way that the robot needs to avoid. We assume that the robot initially starts in a cell with no obstacle.

**Trick**

Perform a DFS using an exploration strategy that consists of trying all possible directions and that backtracks if an obstacle is found.

Rotating to the right can be done by keeping track of the direction $(d_x, d_y)$ that the robot is pointing towards.

**Algorithm** The robot cleans the room in $\mathcal{O}(n)$ time and $\mathcal{O}(n)$ space as follows:

*Initialization:*We take the convention that:The initial cell of the robot is $(x, y) = (0, 0)$.

The robot is initially facing up: $(d_x, d_y) = (0, 1)$.

A hash set is used to keep track of the coordinates of the cells visited by the robot.

*Explore step:*This is the main step where the robot cleans the cell and moves to an unvisited cell.The robot performs the following actions:

It cleans the cell.

It tries to visit each of the 4 neighboring cells by right-rotating and trying to move there. If the neighboring cell is an obstacle or has already been visited, the robot skips it.

When it does not have any more new cells to visit, the robot backtracks by turning right twice, moving and turning right twice again to be pointing back to the initial direction.

*Final step:*When the robot cleaned all cells, it backtracks to its original cell.

### Topological sort

In a given directed graph, a topological sort (topsort) is a linear ordering of the nodes such that for all directed edges $E_{ij}=(V_i, V_j)$, $V_i$ appears before $V_j$.

We note that:

A graph can have multiple topological orderings.

A DAG always has a topological sort.

If a graph contains a cycle, then it cannot have a valid ordering.

### Kahn's algorithm

Kahn's algorithm aims at finding a topological ordering of a given DAG. It finds a solution in $\mathcal{O}(|V|+|E|)$ time and $\mathcal{O}(|V|)$ space:

*Initialization:*Initialize an empty array that keeps track of the final ordering of vertices.*Compute step:*Repeat the following procedure until all nodes are visited:Pick a random node of in-degree 0 and visit it.

Decrement by 1 the in-degrees of the nodes that the newly-visited node is connected to.

*Final step:*When all nodes are visited, the array contains a valid ordering of vertices.

## Shortest path

### Weighted graph

A weighted graph is a graph where each edge $E_{ij}$ has an associated weight $w_{ij}$.

When the weight is interpreted as a distance, it is often noted $d_{i, j}$.

### Dijkstra's algorithm

Dijkstra's algorithm is a greedy algorithm that aims at finding the shortest paths between a source node $V_S$ and all other nodes $j\in V$ of a weighted graph that has no negative edge weights.

The algorithm finds a solution in $\mathcal{O}(|E|\log(|V|))$ time and $\mathcal{O}(|V|)$ space:

*Initialization:*Set the distances $d_{V_S,\hspace{0.2em}j}$ from $V_S$ to each of the other nodes $j\in V$ to $+\infty$ to indicate that they are initially unreachable.

Set the distance $d_{V_S,\hspace{0.2em}V_S}$ from $V_S$ to itself to 0.

Initialize a hash table $h_p$ that keeps track of the chosen predecessor for each node. By convention, the predecessor of $V_S$ is itself.

*Compute step:*Repeat the following procedure until all nodes are visited:Pick the unvisited node $i$ with the lowest distance $d_{V_S,\hspace{0.2em}i}$ and visit it.

Look at the unvisited nodes $j$ that have an edge coming from the newly-visited node $i$.

*Case $d_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}$:*This means that the distance of the path from node $V_S$ to $j$ via node $i$ is smaller than the current distance. We update the corresponding distance:$\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}$Also, we update $h_p$ to indicate that the predecessor of $j$ is now $i$.

*Case $d_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}$:*This means that the proposed path does not improve the current distance. We do not make any updates.

*Final step:*The final distances are those of the shortest paths between $V_S$ and each of the other nodes $j\in V$ of the graph.

The associated shortest paths can be reconstructed using $h_p$.

Given that choices are based on local optima, Dijkstra's algorithm does not guarantee a correct solution when there are negative edge weights. Indeed, if such an edge were to be found "too late" in the exploration process, the algorithm may visit a node earlier than desired and produce a suboptimal path.

### A* algorithm

The $\textrm{A}^{\star}$ algorithm is an extension of Dijkstra's algorithm that is used when the goal is to find the shortest path between a source node $V_S$ and a fixed target node $V_T$.

To better direct the search towards the target node, it modifies the distance used to select the next node by using a heuristic $h_{j,\hspace{0.2em}V_T}$ that approximates the remaining distance between unvisited nodes $j$ and the target node $V_T$.

The distance used to determine the next node $j$ to visit is given by $d^{\hspace{0.05em}A^{\star}}$:

The $\textrm{A}^{\star}$ algorithm heavily relies on a good choice of heuristic $h$.

### Bellman-Ford algorithm

The Bellman-Ford algorithm aims at finding the shortest paths between a source node $V_S$ and all other nodes $j\in V$ of a weighted graph.

A solution is found in $\mathcal{O}(|V|\times|E|)$ time and $\mathcal{O}(|V|)$ space:

*Initialization:*Set the distances $d_{V_S,\hspace{0.2em}j}$ from $V_S$ to each of the other nodes $j\in V$ to $+\infty$ to indicate that they are initially unreachable.

Set the distance $d_{V_S,\hspace{0.2em}V_S}$ from $V_S$ to itself to 0.

Initialize a hash table $h_p$ that keeps track of the chosen predecessor for each node. By convention, the predecessor of $V_S$ is itself.

*Compute step:*Repeat the following procedure until there is no more updates, which is at most $|V|-1$ times:For each node $i$ of the graph, look at its neighbors $j$ and update the distance $d_{V_S,\hspace{0.2em}j}$ between the source node $V_S$ and the node $j$ depending on the following situations:

*Case $d_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}$:*This means that the distance of the path from node $V_S$ to $j$ via node $i$ is smaller than the current distance. We update the corresponding distance:$\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}$Also, we update $h_p$ to indicate that the predecessor of $j$ is now $i$.

*Case $d_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}$:*This means that the proposed path does not improve the current distance. We do not make any updates.

*Final step:*The final distances are those of the shortest paths between $V_S$ and each of the other nodes $j\in V$ of the graph.

The associated shortest paths can be reconstructed using $h_p$.

Compared to Dijkstra's algorithm, Bellman-Ford's algorithm has the advantage of supporting negative edge weights. However, it cannot run on a graph having negative cycles, i.e. cycles with a negative sum of edge weights.

### Floyd-Warshall algorithm

The Floyd-Warshall algorithm aims at finding the shortest paths between all pairs of nodes of a weighted graph.

A solution is found in $\mathcal{O}(|V|^3)$ time and $\mathcal{O}(|V|^2)$ space:

*Initialization:*Initialize a matrix of distances of size $|V|\times|V|$, where each cell $(i, j)$ represents the distance of the shortest path from node $i$ to node $j$. We know that:The distance of a node to itself is 0, so each element of the diagonal is equal to 0.

The distances of nodes linked by edges are given.

*Update step:*For each intermediary node $k\in V$, loop through start nodes $i\in V$ and end nodes $j \in V$:*Case $d_{i,j} > d_{i,k} + d_{k,j}$:*This means that the distance from node $i$ to $j$ via node $k$ is smaller than the current distance. We make the following update:$\boxed{d_{i, j} \longleftarrow d_{i, k} + d_{k, j}}$*Case $d_{i,j} \leqslant d_{i,k} + d_{k,j}$:*This means that the proposed path does not improve the current distance. We do not make any updates.

*Final step:*The resulting matrix gives the shortest path between each pair of nodes $i$ and $j$.

This algorithm allows for negatively weighted edges but it does not allow for negative cycles like the other shortest path algorithms.

In the same fashion as in Dijkstra's and Bellman-Ford's algorithms, we can reconstruct the resulting shortest paths by keeping track of nodes during updates.

### Summary

The differences between the main shortest paths algorithms are summarized below:

Algorithm | Shortest path between | Negative weights | Time | Space |
---|---|---|---|---|

Dijkstra | $V_S$ and all other nodes | No | $\mathcal{O}(\vert E\vert\log(\vert V\vert))$ | $\mathcal{O}(\vert V\vert)$ |

$\textrm{A}^\star$ | $V_S$ and $V_T$ | No | $\mathcal{O}(\vert E\vert\log(\vert V\vert))$ | $\mathcal{O}(\vert V\vert)$ |

Bellman-Ford | $V_S$ and all other nodes | Yes | $\mathcal{O}(\vert V\vert\times\vert E\vert))$ | $\mathcal{O}(\vert V\vert)$ |

Floyd-Warshall | All pairs of nodes | Yes | $\mathcal{O}(\vert V\vert^3)$ | $\mathcal{O}(\vert V\vert^2)$ |

**Subscribe here** to be notified of new Super Study Guide releases!