I'd like yr opinions on this idea that might greatly enhance the
performance of CORAL. It might be that this is an optimization that
has already been suggested : i dont know. Else, it is probably wrong :-)
anyway, here goes ...
consider the vanilla Rt Recursive Anc (magic rewritten)
m_anc_bf(Y) :-
m_anc_bf(X),
parent(X,Y).
anc_bf(X,Y) :-
m_anc_bf(X),
parent(X,Y).
anc_bf(X,Z) :-
m_anc_bf(X),
parent(X,Y),
anc_bf(Y,Z).
when run on a tree data set, with the query anc(1,X), _all_ the nodes of
the tree are magic nodes. However, the very last layer (which contains
the greatest number of nodes in a tree) need not be marked, since any query
on a leaf node is bound to fail. This is the intuition behind the suggestion.
In the two rules defining anc_bf(X,Y), the predicate m_anc_bf(X) acts as a
selection filter for the rest of the rule. However, the join of m_anc_bf(X)
and parent(X,Y) also acts as a join filter with its own selectivity. In
this case, the join selectivity is much greater than the query selectivity
which is 1. But we do not use the join selectivity for restricting the number
of goals generated. If instead, we rewrote the program as follows :
m_anc_bf(Y) :-
m_anc_bf(X),
parent(X,Y), parent(Y,_).
anc_bf(X,Y) :-
m_anc_bf(X),
parent(X,Y).
anc_bf(X,Z) :-
m_anc_bf(X),
parent(X,Y),
anc_bf(Y,Z).
the number of magic nodes is restricted to all the non-leaf nodes, and the
program runs much faster ( a clear order of magnitude : i tried it ).
The only change is that the join with parent() is being pushed down to
the scc that computes m_anc_bf(). It is true that the join with parent()
is being performed now an extra time in the m_anc_bf() computation.
Also, you might say, this is fine for a tree, but what about a chain,
where this buys next to nothing. This is where the next modification
comes in. Since we are anyway pushing the join with parent() down into the
computation of m_anc_bf(), why not store the result and use it, instead
of recomputing the join.
m1anc_bf(X,Y) :-
m_anc_bf(X),
parent(X,Y). // this rule just catches the query coming in
m1anc_bf(X,Y) :-
m1anc_bf(X, Z),
parent(Z,Y). // this is the recursive "magic" rule
anc_bf(X,Y) :-
m1anc_bf(X,Y). // since the join is stored already, use it
anc_bf(X,Z) :-
m1anc_bf(X, Y), // since the join is stored already, use it
anc_bf(Y,Z). // here's where the saving is !
While the simple join_pushing approach did a bit worse than plain magic
on a chain data set, this caching join_pushing did much better than
magic. I realize that in this particular case, we can use factoring to
throw away the recursive rule for anc_bf(), but that seems to me to be
a totally independent issue. I was able to do the same for the sg program.
It also seems to be orthogonal to whether supmagic or magic is being used.
In general, if a magic predicate is being used in the rhs of a rule in a
join with another predicate p, we can push the join with p if p is
in the same scc as the magic predicate, or in a lower scc.