This post is part 4 of 4 in a Series
In the first three posts of this series, we have made it through the outline of graphs, the integration of the google matrix distance api, and the setup of the min-heap data structure, all for the focus of this application, which is the implementation of Prim’s algorithm for finding the Minimum Spanning Tree. This implementation, like all of the examples so far, is completed using Ruby.
There are several online resources for understanding Prim’s algorithm. One video, among others, that are helpful is from Bo Qian HERE. Princeton made these lecture slides available HERE. These resources go through several examples in details, and this algorithm is much easier to conceptualize with the interactive examples and walk through.
High Level Implementation:
Before jumping into the code we take a look at the pseudocode for this algorithm, courtesy of Wikipedia:
- Initialize a tree with a single vertex, chosen arbitrarily from the graph.
- Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
- Repeat step 2 (until all vertices are in the tree).
So we will follow the following exercise:
First, we take all nodes from the graph, and put them into a hash we will call @excluded_nodes. Then we initialize an empty hash @included_nodes.
Next we traverse the graph by asking the following: ‘What is the minimum edge that connects the nodes in @excluded_nodes to the nodes in @included_nodes’? Each time we find a minimum edge, we add it to our list to return. We also take that node from the @excluded_nodes (the one that uses the minimum edge to connect with the @included_nodes) and transfer it from being in the @excluded_nodes to being in the @included_nodes.
Eventually, @included_nodes will have all of the nodes, and @excluded_nodes will be empty. At this point, all of the minimum edges will have been added to our list, and we can return this list.
Visually, this looks like this after the first step:
The rest of this post explores the methods and time complexity to implement the above pseudocode.
The above visual example and pseudocode requires us to set up a few data structures and initialize variables.
First we initialize the PrimMinimumSpanningTree with our graph. We set all nodes of the graph to have a key of infinite. Each graph node’s key will be updated along the way, as we learn the various edge weights between the nodes.
Next, we initialize a new min-heap (discussed in post 3), and set the first node to be a key less than all others (0 as opposed to infinite). This allows us to start the algorithm at a random node.
# prim_minimum_spanning_tree.rb class PrimMinimumSpanningTree INT_MAX = Float::INFINITY def initialize(graph) @graph = graph @included_nodes = Hash.new @included_edges =  @excluded_nodes = Hash.new @min_heap = MinHeap.new end def minimum_spanning_tree @graph.update_all_nodes_to_distance(INT_MAX) # setup the heap of nodes and the excluded sets # time complexity is O(n) -> http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity @graph.node_list.values.each_with_index do |node, index| if index == 0 node.key = 0 # set the first input (random) to be key of 0 (rather than INT_MAX default) end @excluded_nodes[node.node_data] = node @min_heap << node # add node to min_heap (comparable is on key) end # ... end # ... end
Now comes, the algorithm itself. While nodes exist in the following set, we traverse the graph from the ‘minimum node’. This ‘minimum node’ is the node with the smallest key pulled from our min-heap.
From the minimum node we find the minimum edge that connects the minimum node to the included set, and then transfer the node in question
# while there are nodes that exist in the excluded set # O(n) while @excluded_nodes.count > 0 # find the node that has the minimum key min_node = @min_heap.extract_min # (log(n)) -> total becomes n(log(n)) # find the minimum edge from this min node to the included set of nodes # time complexity of O(E) - where E is the edges from a node # total times is 2E (undirected graph - all edges) + 2E * log(n) min_edge = find_min_edge(min_node, @min_heap, @included_nodes) @included_edges << min_edge unless min_edge.nil? @included_nodes[min_node.node_data] = min_node @excluded_nodes.delete(min_node.node_data) end
The complexity comes in the form of finding the minimum edge. The below code shows the helper method:
# find the minimum edge from the min_node (from excluded to group) to those nodes already included # recompute keys and parents for nodes as applicable def find_min_edge(min_node, min_heap, included_nodes) min_edge_weight = INT_MAX min_edge_index = nil # loop through all of min_node's neighbor nodes # get the node that is in the included_nodes that has smallest edge weight # min edge is edge between this min node and its parent # we could get this from the graph edges by using the key of min node and parent min_node.neighbors.each_with_index do |neighbor, index| # reconstruct the key and parent for adjacent nodes that are not already included if min_node.edges[index].weight < neighbor.key neighbor.parent = min_node.node_data neighbor.key = min_node.edges[index].weight # O(log(n)) if min_heap.contains_element(neighbor) min_heap.delete_element(neighbor) min_heap << neighbor end end # if the neighbor node is not excluded (meaning it is in the set) if min_node.edges[index].weight < min_edge_weight && included_nodes.has_key?(neighbor.node_data) min_edge_weight = min_node.edges[index].weight min_edge_index = index end end min_edge = min_node.edges[min_edge_index] unless min_edge_index.nil? return min_edge end
We start by initializing a ‘min_edge_weight’ and ‘min_edge_index’. These will allow us to keep track of the minimum edge to return. Next, we loop through all of the minimum node’s adjacent nodes (neighbors). For each neighbor, if the edge weight to this neighbor is less than the previous key (which starts at infinite), the key is updated to the new edge weight. It is then removed and reinserted with the new key to the min-heap that we initialized earlier. Note that to find, delete, and reinsert the node to the min-heap requires additional bookkeeping covered in post 3. This keeps the contains/delete/reinsert min-heap operations at 2 * O(log(n)), which reduces to O(log(n)) (more details later in this post).
In the above calculation, we compare the edge from the minimum node to the neighbor. If it is less than the initial variable, we flag it to be compared in subsequent iterations. The minimum edge is returned after the above loop.
Algorithm Time Complexity Analysis
If we walk through the steps we see three distinct operations:
- Updating all node keys and adding all nodes to a new min-heap:
- Iterating through all nodes of the graph (as they are found, they are added to the @included_nodes hash)
- The find_min_edge helper method that loops through a node’s neighbors and uses min-heap operations to find the minimum edge.
Step 1 can be reduced to O(n) for updating all distances + O(n) for initializing and populating the min-heap. But wait, if insert to a min-heap is O(log(n)), and we build the heap by iterating through each node (O(n)), wouldn’t the total be O(n * log(n))? I thought so too, but this post explains the details, along with the link to this article.
The idea is that most of the insert work, which uses the O(log(n)) sift_up method is done on the bottom levels of the tree, and the overall build heap operation can be simplified to O(n). For clarity distinguishing edges and nodes, we will refer to nodes as ‘V’ (for vertex), which makes the complexity O(V) for step 1.
Step 2 of iterating through all nodes gives us an O(V) loop. Within the loop we have an extract_min operation at O(log(V)). Step 3 of the find_min_edge helper method has a complexity of 2 * log(V) + edges from V. Taken into account with the outer loop, the total will be 2 * E * log(V). This is because over the iteration of all nodes, we will call the min_heap operations for each edge (times 2 because the graph is undirected).
If we add all of the complexities, we have:
O(V) + V(log(V)) + E(log(V)). This reduces to O((E+V)log(V)), which because E is larger than V can be expressed as total running time of O(E(log(V)).
This post also details the above analysis HERE
In the application paired with this algorithm, we have a perfect example of a matrix graph. In this case, because we have V(V-1)/2 edges, the running time will be slightly worse than if the algorithm was implemented using adjacency matrix. The running time can replace E with V2, which brings the complexity of the matrix graph to O(V2 * (log(V)). For most graph applications, the number of edges would not be an exponential factor of vertices, however.
This post concludes the 4 part post series. To recap, we built a ruby on rails application which leverages the Google Maps Matrix API, the Google Maps Distance Service, custom built minimum heap and graph data structures, and Prim’s Minimum Spanning Tree algorithm to demonstrate a visual example of the MST algorithm in practice.
There are many additional resources and next steps worth visiting, one in particular being the 'Algorithms and Containers' project HERE and HERE. Using the Priority Queue provided here implemented using a Fibonacci Heap, could provide better run time on Prim's algorithm.
I hope you have enjoyed the post. If you find any errors, opportunities for improvement, or other ideas, please comment below or email me at firstname.lastname@example.org.