Friday, January 14, 2011

Directed Graph nodes traverse using WITH RECURSIVE Query of PG

This is a Question which I had seen people generally ask, when they Move their DBs to PG or PGPlus Advanced Server and Look for Workaround of Oracle Feature of Traverse of Directed Graph in PG.

Oracle Provides CONNECT BY NOCYCLE for traversing a Directed Graph.

As mentioned in Following link:
http://www.adp-gmbh.ch/ora/sql/connect_by_nocycle.html
However, CONNECT BY NOCYCLE implementation, some times miss some paths for Traversing Graph as you can see in following Example:
create table directed_graph (
  node_from char(1),
  node_to   char(1)
);

insert into directed_graph values ('A', 'C');
insert into directed_graph values ('A', 'B');
insert into directed_graph values ('B', 'E');
insert into directed_graph values ('C', 'H');
insert into directed_graph values ('H', 'I');
insert into directed_graph values ('I', 'D');
insert into directed_graph values ('D', 'F');
insert into directed_graph values ('D', 'C');
insert into directed_graph values ('F', 'J');
insert into directed_graph values ('J', 'K');
insert into directed_graph values ('J', 'G');
insert into directed_graph values ('K', 'L');
insert into directed_graph values ('F', 'L');
insert into directed_graph values ('K', 'M');
Output of Oracle "CONNECT BY NOCYCLE" paths are given below:
A->C
 C->H
  H->I
   I->D
    D->F
     F->J
      J->K
       K->L
       K->M
      J->G
     F->L
A->B
 B->E

Now, lets look at the Directed Graph:
A-->C<--D-->F-->L
|   |   ^   |   ^
v   v   |   v   |
B   H-->I   J-->K
|           |   |
v           v   v
E           G   M
In Above, we can see that there is a path from A->C->H->I->D->C. However that path would not be visible using CONNECT BY NOCYCLE.

In such case, if user is using the PG or PGPLUS8.4, then they can use WITH RECURSIVE Query to get all the paths, as given below:
WITH RECURSIVE travel
AS(SELECT node_from, node_to, 1 as depth,
ARRAY[node_from::varchar, node_to::varchar] as path, false as cycle
FROM directed_graph
UNION ALL
SELECT a.node_from, b.node_to, a.depth+1,
a.path||ARRAY[b.node_to::varchar], (b.node_to=ANY (path)) as cycle
FROM travel a
JOIN directed_graph b on (b.node_from = a.node_to) WHERE NOT cycle
)
SELECT node_from,node_to, path FROM travel ORDER BY node_from, node_to, depth
Output as given below:
 node_from | node_to |        path         
-----------+---------+---------------------
 A         | B       | {A,B}
 A         | C       | {A,C}
 A         | C       | {A,C,H,I,D,C}
 A         | D       | {A,C,H,I,D}
 A         | E       | {A,B,E}
 A         | F       | {A,C,H,I,D,F}
 A         | G       | {A,C,H,I,D,F,J,G}
 A         | H       | {A,C,H}
 A         | I       | {A,C,H,I}
 A         | J       | {A,C,H,I,D,F,J}
 A         | K       | {A,C,H,I,D,F,J,K}
 A         | L       | {A,C,H,I,D,F,L}
 A         | L       | {A,C,H,I,D,F,J,K,L}
 A         | M       | {A,C,H,I,D,F,J,K,M}
 B         | E       | {B,E}
 C         | C       | {C,H,I,D,C}
 C         | D       | {C,H,I,D}
 C         | F       | {C,H,I,D,F}
 C         | G       | {C,H,I,D,F,J,G}
 C         | H       | {C,H}
 C         | I       | {C,H,I}
 C         | J       | {C,H,I,D,F,J}
 C         | K       | {C,H,I,D,F,J,K}
 C         | L       | {C,H,I,D,F,L}
 C         | L       | {C,H,I,D,F,J,K,L}
 C         | M       | {C,H,I,D,F,J,K,M}
 D         | C       | {D,C}
 D         | D       | {D,C,H,I,D}
 D         | F       | {D,F}
 D         | G       | {D,F,J,G}
 D         | H       | {D,C,H}
 D         | I       | {D,C,H,I}
 D         | J       | {D,F,J}
 D         | K       | {D,F,J,K}
 D         | L       | {D,F,L}
 D         | L       | {D,F,J,K,L}
 D         | M       | {D,F,J,K,M}
 F         | G       | {F,J,G}
 F         | J       | {F,J}
 F         | K       | {F,J,K}
 F         | L       | {F,L}
 F         | L       | {F,J,K,L}
 F         | M       | {F,J,K,M}
 H         | C       | {H,I,D,C}
 H         | D       | {H,I,D}
 H         | F       | {H,I,D,F}
 H         | G       | {H,I,D,F,J,G}
 H         | H       | {H,I,D,C,H}
 H         | I       | {H,I}
 H         | J       | {H,I,D,F,J}
 H         | K       | {H,I,D,F,J,K}
 H         | L       | {H,I,D,F,L}
 H         | L       | {H,I,D,F,J,K,L}
 H         | M       | {H,I,D,F,J,K,M}
 I         | C       | {I,D,C}
 I         | D       | {I,D}
 I         | F       | {I,D,F}
 I         | G       | {I,D,F,J,G}
 I         | H       | {I,D,C,H}
 I         | I       | {I,D,C,H,I}
 I         | J       | {I,D,F,J}
 I         | K       | {I,D,F,J,K}
 I         | L       | {I,D,F,L}
 I         | L       | {I,D,F,J,K,L}
 I         | M       | {I,D,F,J,K,M}
 J         | G       | {J,G}
 J         | K       | {J,K}
 J         | L       | {J,K,L}
 J         | M       | {J,K,M}
 K         | L       | {K,L}
 K         | M       | {K,M}
(71 rows)
WITH RECURSIVE Query is really helpful in getting all possible path for each node.

7 comments:

  1. hi, very nice example! helped me a lot! however i like to know if it's possible to query only the path with the lowest depth within your recursive statement if there is more than 1 path?

    ReplyDelete
  2. is it possible to have this algorithm for undirected graphs?

    ReplyDelete
  3. Only problem is... the output is NOT correct =/

    Most of the paths are incorrect and many are missing (A->H isn't in the output for e.g.).

    For example, C | H | {A,C} is wrong: C and H are directly connected, so not need for node A. And H is not even in the path...

    ReplyDelete
  4. Following my previous message, here is a better solution. It give nodes "from" and "to" and the connecting paths properly, up to a given number of cycles. Depth is also correct. The only problem is the number of cycles when above 2.

    WITH RECURSIVE travel
    AS(
    SELECT node_from,
    node_to,
    1 as depth,
    ARRAY[node_from::varchar, node_to::varchar] as path,
    0 as nb_cycles
    FROM directed_graph


    UNION ALL


    SELECT t.node_from, g.node_to,
    t.depth+1,
    t.path||ARRAY[g.node_to::varchar],
    nb_cycles + CASE WHEN (g.node_to=ANY (path)) THEN 1 ELSE 0 END
    FROM travel t
    JOIN directed_graph g on (g.node_from = t.node_to)
    WHERE nb_cycles<1
    )
    SELECT node_from,node_to,depth, nb_cycles, path
    FROM travel
    ORDER BY node_from, node_to, depth



    Sample output:

    node_from node_to depth nb_cycles path
    A B 1 0 {A,B}
    A C 1 0 {A,C}
    A C 5 1 {A,C,H,I,D,C}
    A D 4 0 {A,C,H,I,D}
    A E 2 0 {A,B,E}
    A F 5 0 {A,C,H,I,D,F}
    A G 7 0 {A,C,H,I,D,F,J,G}
    A H 2 0 {A,C,H}
    A I 3 0 {A,C,H,I}
    A J 6 0 {A,C,H,I,D,F,J}
    A K 7 0 {A,C,H,I,D,F,J,K}
    A L 6 0 {A,C,H,I,D,F,L}
    A L 8 0 {A,C,H,I,D,F,J,K,L}
    A M 8 0 {A,C,H,I,D,F,J,K,M}
    B E 1 0 {B,E}
    C C 4 1 {C,H,I,D,C}
    C D 3 0 {C,H,I,D}
    [...]

    71 rows. Total runtime 9.410 ms.

    ReplyDelete
    Replies
    1. Thanks for the query.... This is really good. It has covered all possible path.

      Delete