Graphs
By Afshine Amidi and Shervine Amidi
General concepts
Definition
A graph is defined by its vertices and edges and is often noted . The following table summarizes the two main types of graph:
Undirected graph | Directed graph |
---|---|
Edges do not have a direction. | Edges have a direction. |
![]() | ![]() |
The terms "node" and "vertex" can be used interchangeably.
Graph representation
Let's consider the following graph:
It can be represented in two different ways:
Type | Description | Illustration |
---|---|---|
Adjacency list | Collection of unordered lists where each entry maps to all the nodes such that exists in the graph. | ![]() |
Adjacency matrix | Matrix of boolean values where the entry indicates whether is present in the graph. In an undirected graph, this matrix is always symmetric. | ![]() |
Degree
A node can have the following characteristics based on its adjacent edges:
Graph | Notation | Definition | Illustration |
---|---|---|---|
Undirected | Degree | Number of connected edges | ![]() |
Directed | In-Degree | Number of connected inbound edges. | ![]() |
Out-Degree | Number of connected outbound edges | ![]() |
Nodes of even (resp. odd) degree are called even (resp. odd) nodes.
Handshaking lemma
In an undirected graph, the sum of node degrees is an even number.
Indeed, every edge links two nodes, therefore responsible for increasing the sum of degrees by 2 (one per node). We have the following formula:
A consequence of the above formula is that there is an even number of odd nodes.
Cyclicity
The cyclicity of a graph is a property that depends on whether the graph has a cycle:
Acyclic | Cyclic |
---|---|
Does not contain a cycle | Contains at least a cycle |
![]() | ![]() |
A directed acyclic graph is commonly abbreviated as DAG.
Properties
A graph can have any of the following properties:
Type | Description | Illustration |
---|---|---|
Complete | Every pair of vertices is connected by an edge. | ![]() |
Connected | There exists a path between every pair of nodes. | ![]() |
Bipartite | Vertices can be split into two disjoint sets such that every edge of the graph links a vertex from one set to one in the other. | ![]() |
Graph traversal
Breadth-First Search
Breadth-First Search (BFS) is an algorithm that explores a graph level by level.
The graph traversal is performed in time and space:
Initialization: The following quantities are tracked:
Queue that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.
Hash set that keeps track of visited nodes. It is initially empty.
Update step: Dequeue element from . We have the following cases:
Node already visited: Skip it.
Node not visited yet: Add it to the set of visited nodes, and look at its neighbors: if they were not visited before, enqueue them in .
Final step: Repeat the update step until the queue gets empty. The set represents the set of visited nodes.
Depth-First Search
Depth-First Search (DFS) is an algorithm that explores a graph by prioritizing depth.
The graph traversal is performed in time and space:
Initialization: The following quantities are tracked:
Stack that keeps track of the next nodes to potentially visit. In the beginning, the first node is the only element inside.
Hash set that keeps track of visited nodes. It is initially empty.
Update step: Pop element from . We have the following cases:
Node already visited: Skip it.
Node not visited yet: Add it to the set of visited nodes, and look at its neighbors: if they were not visited before, push them to .
Final step: Repeat the update step until the stack gets empty. The set represents the set of visited nodes.
DFS can also be implemented recursively.
Graph traversal summary
The table below highlights the main differences between BFS and DFS:
Breadth-First Search (BFS) | Depth-First Search (DFS) | |
---|---|---|
Mindset | Level by level | Depth first |
Possible approaches | Iteration using a queue | - Iteration using a stack - Recursion |
Illustration | ![]() | ![]() |
Number of islands
A classic application of graph traversal algorithms is to count the number of islands in an grid, where each cell is either marked as (land) or (water).
Trick Perform a BFS starting from each unvisited land cell of the grid to explore the associated island and skip cells that have already been visited.
Algorithm The entire grid is explored in time and space as follows:
Initialization: Set the counter that tracks the number of islands to 0.
Explore step: For each cell of the grid, we have the following situations:
Water not visited: Skip this cell as it is not part of any island.
Land not visited: Perform a BFS that starts from that cell and visits all neighboring land cells. This process uses a temporary queue to track nodes to visit, and marks visited nodes by changing their values to .
At the end of the exploration, add 1 to the number of islands .
Water/Land already visited: Skip this cell as it was already explored.
Final step: The number of islands in the grid is equal to .
We note that:
- Using an iterative graph traversal algorithm is helpful in preventing stack overflow in case the exploration of an island leads to too many stack calls.
- It would have been equally possible to make island explorations using DFS.
Robot room cleaner
The goal of this classic problem is to clean a room composed of cells using a robot cleaner that can only perform the 3 following actions:
Turn right | Move cell | Clean cell |
---|---|---|
Rotate 90◦ to the right while staying in the same cell. | Move forward by one cell in the current direction, provided that there is no obstacle. | Clean current cell. |
![]() | ![]() | ![]() |
The configuration of the room is not known by the robot but is assumed to be of finite size. It has obstacles along the way that the robot needs to avoid. We assume that the robot initially starts in a cell with no obstacle.
Trick
Perform a DFS using an exploration strategy that consists of trying all possible directions and that backtracks if an obstacle is found.
Rotating to the right can be done by keeping track of the direction that the robot is pointing towards.
Algorithm The robot cleans the room in time and space as follows:
Initialization: We take the convention that:
The initial cell of the robot is .
The robot is initially facing up: .
A hash set is used to keep track of the coordinates of the cells visited by the robot.
Explore step: This is the main step where the robot cleans the cell and moves to an unvisited cell.
The robot performs the following actions:
It cleans the cell.
It tries to visit each of the 4 neighboring cells by right-rotating and trying to move there. If the neighboring cell is an obstacle or has already been visited, the robot skips it.
When it does not have any more new cells to visit, the robot backtracks by turning right twice, moving and turning right twice again to be pointing back to the initial direction.
Final step: When the robot cleaned all cells, it backtracks to its original cell.
Topological sort
In a given directed graph, a topological sort (topsort) is a linear ordering of the nodes such that for all directed edges , appears before .
We note that:
A graph can have multiple topological orderings.
A DAG always has a topological sort.
If a graph contains a cycle, then it cannot have a valid ordering.
Kahn's algorithm
Kahn's algorithm aims at finding a topological ordering of a given DAG. It finds a solution in time and space:
Initialization: Initialize an empty array that keeps track of the final ordering of vertices.
Compute step: Repeat the following procedure until all nodes are visited:
Pick a random node of in-degree 0 and visit it.
Decrement by 1 the in-degrees of the nodes that the newly-visited node is connected to.
Final step: When all nodes are visited, the array contains a valid ordering of vertices.
Shortest path
Weighted graph
A weighted graph is a graph where each edge has an associated weight .
When the weight is interpreted as a distance, it is often noted .
Dijkstra's algorithm
Dijkstra's algorithm is a greedy algorithm that aims at finding the shortest paths between a source node and all other nodes of a weighted graph that has no negative edge weights.
The algorithm finds a solution in time and space:
Initialization:
Set the distances from to each of the other nodes to to indicate that they are initially unreachable.
Set the distance from to itself to 0.
Initialize a hash table that keeps track of the chosen predecessor for each node. By convention, the predecessor of is itself.
Compute step: Repeat the following procedure until all nodes are visited:
Pick the unvisited node with the lowest distance and visit it.
Look at the unvisited nodes that have an edge coming from the newly-visited node .
Case : This means that the distance of the path from node to via node is smaller than the current distance. We update the corresponding distance:
Also, we update to indicate that the predecessor of is now .
Case : This means that the proposed path does not improve the current distance. We do not make any updates.
Final step:
The final distances are those of the shortest paths between and each of the other nodes of the graph.
The associated shortest paths can be reconstructed using .
Given that choices are based on local optima, Dijkstra's algorithm does not guarantee a correct solution when there are negative edge weights. Indeed, if such an edge were to be found "too late" in the exploration process, the algorithm may visit a node earlier than desired and produce a suboptimal path.
A* algorithm
The algorithm is an extension of Dijkstra's algorithm that is used when the goal is to find the shortest path between a source node and a fixed target node .
To better direct the search towards the target node, it modifies the distance used to select the next node by using a heuristic that approximates the remaining distance between unvisited nodes and the target node .
The distance used to determine the next node to visit is given by :
The algorithm heavily relies on a good choice of heuristic .
Bellman-Ford algorithm
The Bellman-Ford algorithm aims at finding the shortest paths between a source node and all other nodes of a weighted graph.
A solution is found in time and space:
Initialization:
Set the distances from to each of the other nodes to to indicate that they are initially unreachable.
Set the distance from to itself to 0.
Initialize a hash table that keeps track of the chosen predecessor for each node. By convention, the predecessor of is itself.
Compute step: Repeat the following procedure until there is no more updates, which is at most times:
For each node of the graph, look at its neighbors and update the distance between the source node and the node depending on the following situations:
Case : This means that the distance of the path from node to via node is smaller than the current distance. We update the corresponding distance:
Also, we update to indicate that the predecessor of is now .
Case : This means that the proposed path does not improve the current distance. We do not make any updates.
Final step:
The final distances are those of the shortest paths between and each of the other nodes of the graph.
The associated shortest paths can be reconstructed using .
Compared to Dijkstra's algorithm, Bellman-Ford's algorithm has the advantage of supporting negative edge weights. However, it cannot run on a graph having negative cycles, i.e. cycles with a negative sum of edge weights.
Floyd-Warshall algorithm
The Floyd-Warshall algorithm aims at finding the shortest paths between all pairs of nodes of a weighted graph.
A solution is found in time and space:
Initialization: Initialize a matrix of distances of size , where each cell represents the distance of the shortest path from node to node . We know that:
The distance of a node to itself is 0, so each element of the diagonal is equal to 0.
The distances of nodes linked by edges are given.
Update step: For each intermediary node , loop through start nodes and end nodes :
Case : This means that the distance from node to via node is smaller than the current distance. We make the following update:
Case : This means that the proposed path does not improve the current distance. We do not make any updates.
Final step: The resulting matrix gives the shortest path between each pair of nodes and .
This algorithm allows for negatively weighted edges but it does not allow for negative cycles like the other shortest path algorithms.
In the same fashion as in Dijkstra's and Bellman-Ford's algorithms, we can reconstruct the resulting shortest paths by keeping track of nodes during updates.
Summary
The differences between the main shortest paths algorithms are summarized below:
Algorithm | Shortest path between | Negative weights | Time | Space |
---|---|---|---|---|
Dijkstra | and all other nodes | No | ||
and | No | |||
Bellman-Ford | and all other nodes | Yes | ||
Floyd-Warshall | All pairs of nodes | Yes |
Subscribe here to be notified of new Super Study Guide releases!