Tarjan's Strongly Connected Components Algorithm Files: sc.c, sc.h If a set of vertices within a graph are "strongly connected", then their is a path between any vertex pair (vs, vf) where vs is any starting vertex and vf is any finishing vertex. That is, each vertex is reachable from every other vertex. Tarjan's strongly connected (SC) component algorithm determines which vertices in a directed graph are strongly connected. The algorithm proceeds in a depth first search manner. It is described below: SC(v) { N[v] = c; /* Mark v visited by assigning it a visit number. */ L[v] = c; /* Low-link initially equal to visit number. */ c++; push v onto the stack; for each w in OUT(v) { if N[w] == UNDEFINED { /* N[w] == UNDEFINED means w is unvisited. */ Add edge (v, w) to T; DFS(w); L[v] = min(L[v], L[w]); /* Low-link number can propagate upward. */ } else if w is on the stack { L[v] = min(L[v], N[w]); } } /* Check if SC component found. */ if L[v] == N[v] { pop vertices off stack down to v; /* These vertices make up an SC component. */ } } main_program { T := EMPTY; /* T is the tree made up of traversed edges. */ c : = 0; /* c is the counter for visit numbers. */ for each vertex, v, in the graph, N[v] = UNDEFINED /* Mark v "unvisited". */ SC(v0); /* v0 is the starting vertex. */ } As the DFS proceeds it assigns a DFS (or visit) number to each vertex as it is visited. The algorithm also maintains a low-link number, which is initially assigned the same value as the visit number when a vertex is visited for the first time. A stack of vertices is maintained, and when a vertex is visited for the first time it is placed on the stack. We describe Tarjan's algorithm by considering the imaginary DFS tree that is built up as the algorithm proceeds. As the DFS tree is built vertices, v, are assigned visit numbers N[v] and low-link numbers L[v]. There are two ways that it is possible for the low-link number to be updated as DFS proceeds. Suppose the algorithm is at vertex v in the graph. The first kind of update is when the algorithm follows an edge to a vertex w in OUT(v) that was previously visited and is still in the stack. In this case, L[v] is updated to the smaller of L[v] and N[w]. The second kind of update is for L[w] to propagate up the DFS tree and overwrite L[v], since after the DFS from w is completed L[v] is updated to the smaller of L[v] and L[w]. After the DFS from a vertex, v, is complete, the algorithm checks to see if L[v] and N[v] are still equal (recall that the low-link and visit numbers were initially the same). If L[v] is equal to N[v] then vertices are popped off the stack, down to and including vertex v. A set of vertices popped off the stack together make up an SC component. After the DFS terminates at the starting vertex, all SC components have been found. If the the SC components are condensed into single vertices with edges between them, we have an acyclic graph. A side-effect of Tarjan's SC component algorithm is that the SC components pop off the stack in topological order. ---------- In the source files sc.c and sc.h, SC components are represented using arrays of vertex numbers. The structure type for SC component results is given below: typedef struct sc_result { int size, n_sets; int *vertices; int *sets_s, *sets_f; } sc_result_t; Field sets_n stores the number of SC components found in the search. Field size specifies the array sizes. The SC components are stored using three arrays: - sets_s[i] gives the starting position in the vertices[] array of SC component i. - sets_f[i] gives the finishing position in the vertices[] array of SC component i. - vertices[] is used for storing the vertex numbers of vertices in the SC components. For example if there are three SC components the vertices in each are stored as follows: - SC0: vertices[sets_s[0]] ... vertices[sets_f[0]]. - SC1: vertices[sets_s[1]] ... vertices[sets_f[1]]. - SC2: vertices[sets_s[2]] ... vertices[sets_f[2]]. Note that array entries sets[3] onward are set to UNDEFINED.