Skip to main content

Graphs

By Afshine Amidi and Shervine Amidi

General concepts

Definition

A graph GG is defined by its vertices VV and edges EE and is often noted G=(V,E)G = (V, E). The following table summarizes the two main types of graph:

Undirected graphDirected graph
Edges do not have a direction.Edges have a direction.
Remark

The terms "node" and "vertex" can be used interchangeably.

Graph representation

Let's consider the following graph:

It can be represented in two different ways:

TypeDescriptionIllustration
Adjacency
list
Collection of unordered lists where each
entry ViV_i maps to all the nodes VjV_j such
that EijE_{ij} exists in the graph.
Adjacency
matrix
Matrix of boolean values where the
entry (i,j)(i, j) indicates whether EijE_{ij} is
present in the graph.
In an undirected graph, this matrix is
always symmetric.

Degree

A node can have the following characteristics based on its adjacent edges:

GraphNotationDefinitionIllustration
UndirectedDegreeNumber of connected edges
DirectedIn-DegreeNumber of connected inbound edges.
Out-DegreeNumber of connected outbound edges
Remark

Nodes of even (resp. odd) degree are called even (resp. odd) nodes.

Handshaking lemma

In an undirected graph, the sum of node degrees is an even number.

Indeed, every edge links two nodes, therefore responsible for increasing the sum of degrees by 2 (one per node). We have the following formula:

vVdeg(v)=2E\boxed{\sum_{v\in V}\textrm{deg}(v) = 2|E|}
Remark

A consequence of the above formula is that there is an even number of odd nodes.

Cyclicity

The cyclicity of a graph is a property that depends on whether the graph has a cycle:

AcyclicCyclic
Does not contain a cycleContains at least a cycle
Remark

A directed acyclic graph is commonly abbreviated as DAG.

Properties

A graph can have any of the following properties:

TypeDescriptionIllustration
CompleteEvery pair of vertices is connected by an edge.
ConnectedThere exists a path between every pair of nodes.
BipartiteVertices can be split into two disjoint sets such that every edge of the graph links a vertex from one set to one in the other.

Graph traversal

Breadth-First Search (BFS) is an algorithm that explores a graph level by level.

The graph traversal is performed in O(V+E)\mathcal{O}(|V| + |E|) time and O(V)\mathcal{O}(|V|) space:

  • Initialization: The following quantities are tracked:

    • Queue qq that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

    • Hash set hvh_{v} that keeps track of visited nodes. It is initially empty.

  • Update step: Dequeue element from qq. We have the following cases:

    • Node already visited: Skip it.

    • Node not visited yet: Add it to the set hvh_{v} of visited nodes, and look at its neighbors: if they were not visited before, enqueue them in qq.

  • Final step: Repeat the update step until the queue gets empty. The set hvh_{v} represents the set of visited nodes.

Depth-First Search (DFS) is an algorithm that explores a graph by prioritizing depth.

The graph traversal is performed in O(V+E)\mathcal{O}(|V|+|E|) time and O(V)\mathcal{O}(|V|) space:

  • Initialization: The following quantities are tracked:

    • Stack ss that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.

    • Hash set hvh_{v} that keeps track of visited nodes. It is initially empty.

  • Update step: Pop element from ss. We have the following cases:

    • Node already visited: Skip it.

    • Node not visited yet: Add it to the set hvh_{v} of visited nodes, and look at its neighbors: if they were not visited before, push them to ss.

  • Final step: Repeat the update step until the stack gets empty. The set hvh_{v} represents the set of visited nodes.

Remark

DFS can also be implemented recursively.

Graph traversal summary

The table below highlights the main differences between BFS and DFS:

Breadth-First Search (BFS)Depth-First Search (DFS)
MindsetLevel by levelDepth first
Possible
approaches
Iteration using a queue- Iteration using a stack
- Recursion
Illustration

Number of islands

A classic application of graph traversal algorithms is to count the number of islands in an m×nm\times n grid, where each cell is either marked as 1\texttt{1} (land) or 0\texttt{0} (water).

Trick Perform a BFS starting from each unvisited land cell of the grid to explore the associated island and skip cells that have already been visited.

Algorithm The entire grid is explored in O(m×n)\mathcal{O}(m\times n) time and O(m×n)\mathcal{O}(m\times n) space as follows:

  • Initialization: Set the counter cc that tracks the number of islands to 0.

  • Explore step: For each cell of the grid, we have the following situations:

    • Water not visited: Skip this cell as it is not part of any island.

    • Land not visited: Perform a BFS that starts from that cell and visits all neighboring land cells. This process uses a temporary queue qq to track nodes to visit, and marks visited nodes by changing their values to X\texttt{X}.

      At the end of the exploration, add 1 to the number of islands cc.

    • Water/Land already visited: Skip this cell as it was already explored.

  • Final step: The number of islands in the grid is equal to cc.

We note that:

  • Using an iterative graph traversal algorithm is helpful in preventing stack overflow in case the exploration of an island leads to too many stack calls.
  • It would have been equally possible to make island explorations using DFS.

Robot room cleaner

The goal of this classic problem is to clean a room composed of nn cells using a robot cleaner that can only perform the 3 following actions:

Turn rightMove cellClean cell
Rotate 90◦ to the right while staying in the same cell.Move forward by one cell in the current direction, provided that there is no obstacle.Clean current cell.

The configuration of the room is not known by the robot but is assumed to be of finite size. It has obstacles along the way that the robot needs to avoid. We assume that the robot initially starts in a cell with no obstacle.

Trick

  • Perform a DFS using an exploration strategy that consists of trying all possible directions and that backtracks if an obstacle is found.

  • Rotating to the right can be done by keeping track of the direction (dx,dy)(d_x, d_y) that the robot is pointing towards.

Algorithm The robot cleans the room in O(n)\mathcal{O}(n) time and O(n)\mathcal{O}(n) space as follows:

  • Initialization: We take the convention that:

    • The initial cell of the robot is (x,y)=(0,0)(x, y) = (0, 0).

    • The robot is initially facing up: (dx,dy)=(0,1)(d_x, d_y) = (0, 1).

      A hash set is used to keep track of the coordinates of the cells visited by the robot.

  • Explore step: This is the main step where the robot cleans the cell and moves to an unvisited cell.

    The robot performs the following actions:

    • It cleans the cell.

    • It tries to visit each of the 4 neighboring cells by right-rotating and trying to move there. If the neighboring cell is an obstacle or has already been visited, the robot skips it.

    • When it does not have any more new cells to visit, the robot backtracks by turning right twice, moving and turning right twice again to be pointing back to the initial direction.

  • Final step: When the robot cleaned all cells, it backtracks to its original cell.

Topological sort

In a given directed graph, a topological sort (topsort) is a linear ordering of the nodes such that for all directed edges Eij=(Vi,Vj)E_{ij}=(V_i, V_j), ViV_i appears before VjV_j.

We note that:

  • A graph can have multiple topological orderings.

  • A DAG always has a topological sort.

  • If a graph contains a cycle, then it cannot have a valid ordering.

Kahn's algorithm

Kahn's algorithm aims at finding a topological ordering of a given DAG. It finds a solution in O(V+E)\mathcal{O}(|V|+|E|) time and O(V)\mathcal{O}(|V|) space:

  • Initialization: Initialize an empty array that keeps track of the final ordering of vertices.

  • Compute step: Repeat the following procedure until all nodes are visited:

    • Pick a random node of in-degree 0 and visit it.

    • Decrement by 1 the in-degrees of the nodes that the newly-visited node is connected to.

  • Final step: When all nodes are visited, the array contains a valid ordering of vertices.

Shortest path

Weighted graph

A weighted graph is a graph where each edge EijE_{ij} has an associated weight wijw_{ij}.

Remark

When the weight is interpreted as a distance, it is often noted di,jd_{i, j}.

Dijkstra's algorithm

Dijkstra's algorithm is a greedy algorithm that aims at finding the shortest paths between a source node VSV_S and all other nodes jVj\in V of a weighted graph that has no negative edge weights.

The algorithm finds a solution in O(Elog(V))\mathcal{O}(|E|\log(|V|)) time and O(V)\mathcal{O}(|V|) space:

  • Initialization:

    • Set the distances dVS,jd_{V_S,\hspace{0.2em}j} from VSV_S to each of the other nodes jVj\in V to ++\infty to indicate that they are initially unreachable.

    • Set the distance dVS,VSd_{V_S,\hspace{0.2em}V_S} from VSV_S to itself to 0.

    • Initialize a hash table hph_p that keeps track of the chosen predecessor for each node. By convention, the predecessor of VSV_S is itself.

  • Compute step: Repeat the following procedure until all nodes are visited:

    • Pick the unvisited node ii with the lowest distance dVS,id_{V_S,\hspace{0.2em}i} and visit it.

    • Look at the unvisited nodes jj that have an edge coming from the newly-visited node ii.

      • Case dVS,j>dVS,i+di,jd_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}: This means that the distance of the path from node VSV_S to jj via node ii is smaller than the current distance. We update the corresponding distance:

        dVS,jdVS,i+di,j\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}

        Also, we update hph_p to indicate that the predecessor of jj is now ii.

      • Case dVS,jdVS,i+di,jd_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}: This means that the proposed path does not improve the current distance. We do not make any updates.

  • Final step:

    • The final distances are those of the shortest paths between VSV_S and each of the other nodes jVj\in V of the graph.

    • The associated shortest paths can be reconstructed using hph_p.

Remark

Given that choices are based on local optima, Dijkstra's algorithm does not guarantee a correct solution when there are negative edge weights. Indeed, if such an edge were to be found "too late" in the exploration process, the algorithm may visit a node earlier than desired and produce a suboptimal path.

A* algorithm

The A\textrm{A}^{\star} algorithm is an extension of Dijkstra's algorithm that is used when the goal is to find the shortest path between a source node VSV_S and a fixed target node VTV_T.

To better direct the search towards the target node, it modifies the distance used to select the next node by using a heuristic hj,VTh_{j,\hspace{0.2em}V_T} that approximates the remaining distance between unvisited nodes jj and the target node VTV_T.

The distance used to determine the next node jj to visit is given by dAd^{\hspace{0.05em}A^{\star}}:

dVS,jA=dVS,j+hj,VT\boxed{d_{V_S,\hspace{0.2em}j}^{\hspace{0.05em}A^{\star}}=d_{V_S,\hspace{0.2em}j}+h_{j,\hspace{0.2em}V_T}}
Remark

The A\textrm{A}^{\star} algorithm heavily relies on a good choice of heuristic hh.

Bellman-Ford algorithm

The Bellman-Ford algorithm aims at finding the shortest paths between a source node VSV_S and all other nodes jVj\in V of a weighted graph.

A solution is found in O(V×E)\mathcal{O}(|V|\times|E|) time and O(V)\mathcal{O}(|V|) space:

  • Initialization:

    • Set the distances dVS,jd_{V_S,\hspace{0.2em}j} from VSV_S to each of the other nodes jVj\in V to ++\infty to indicate that they are initially unreachable.

    • Set the distance dVS,VSd_{V_S,\hspace{0.2em}V_S} from VSV_S to itself to 0.

    • Initialize a hash table hph_p that keeps track of the chosen predecessor for each node. By convention, the predecessor of VSV_S is itself.

  • Compute step: Repeat the following procedure until there is no more updates, which is at most V1|V|-1 times:

    For each node ii of the graph, look at its neighbors jj and update the distance dVS,jd_{V_S,\hspace{0.2em}j} between the source node VSV_S and the node jj depending on the following situations:

    • Case dVS,j>dVS,i+di,jd_{V_S,\hspace{0.2em}j} > d_{V_S,\hspace{0.2em}i} + d_{i, j}: This means that the distance of the path from node VSV_S to jj via node ii is smaller than the current distance. We update the corresponding distance:

      dVS,jdVS,i+di,j\boxed{d_{V_S,\hspace{0.2em}j} \longleftarrow d_{V_S,\hspace{0.2em}i} + d_{i, j}}

      Also, we update hph_p to indicate that the predecessor of jj is now ii.

    • Case dVS,jdVS,i+di,jd_{V_S,\hspace{0.2em}j} \leqslant d_{V_S,\hspace{0.2em}i} + d_{i, j}: This means that the proposed path does not improve the current distance. We do not make any updates.

  • Final step:

    • The final distances are those of the shortest paths between VSV_S and each of the other nodes jVj\in V of the graph.

    • The associated shortest paths can be reconstructed using hph_p.

Compared to Dijkstra's algorithm, Bellman-Ford's algorithm has the advantage of supporting negative edge weights. However, it cannot run on a graph having negative cycles, i.e. cycles with a negative sum of edge weights.

Floyd-Warshall algorithm

The Floyd-Warshall algorithm aims at finding the shortest paths between all pairs of nodes of a weighted graph.

A solution is found in O(V3)\mathcal{O}(|V|^3) time and O(V2)\mathcal{O}(|V|^2) space:

  • Initialization: Initialize a matrix of distances of size V×V|V|\times|V|, where each cell (i,j)(i, j) represents the distance of the shortest path from node ii to node jj. We know that:

    • The distance of a node to itself is 0, so each element of the diagonal is equal to 0.

    • The distances of nodes linked by edges are given.

  • Update step: For each intermediary node kVk\in V, loop through start nodes iVi\in V and end nodes jVj \in V:

    • Case di,j>di,k+dk,jd_{i,j} > d_{i,k} + d_{k,j}: This means that the distance from node ii to jj via node kk is smaller than the current distance. We make the following update:

      di,jdi,k+dk,j\boxed{d_{i, j} \longleftarrow d_{i, k} + d_{k, j}}

    • Case di,jdi,k+dk,jd_{i,j} \leqslant d_{i,k} + d_{k,j}: This means that the proposed path does not improve the current distance. We do not make any updates.

  • Final step: The resulting matrix gives the shortest path between each pair of nodes ii and jj.

This algorithm allows for negatively weighted edges but it does not allow for negative cycles like the other shortest path algorithms.

Remark

In the same fashion as in Dijkstra's and Bellman-Ford's algorithms, we can reconstruct the resulting shortest paths by keeping track of nodes during updates.

Summary

The differences between the main shortest paths algorithms are summarized below:

AlgorithmShortest path
between
Negative
weights
TimeSpace
DijkstraVSV_S and all other nodesNoO(Elog(V))\mathcal{O}(\vert E\vert\log(\vert V\vert))O(V)\mathcal{O}(\vert V\vert)
A\textrm{A}^\starVSV_S and VTV_TNoO(Elog(V))\mathcal{O}(\vert E\vert\log(\vert V\vert))O(V)\mathcal{O}(\vert V\vert)
Bellman-FordVSV_S and all other nodesYesO(V×E))\mathcal{O}(\vert V\vert\times\vert E\vert))O(V)\mathcal{O}(\vert V\vert)
Floyd-WarshallAll pairs of nodesYesO(V3)\mathcal{O}(\vert V\vert^3)O(V2)\mathcal{O}(\vert V\vert^2)

Want more content like this?

Subscribe here to be notified of new Super Study Guide releases!