points by YeGoblynQueenne 5 years ago

>> It says it operates in-memory but also that it starts up from a set of known facts (which sounds like it could just read the current matview at startup).

That "it starts up from a set of known facts" refers to the "bottom-up" evaluation of datalog programs, in contrast with the top-down evaluation in Prolog.

Consider for instance the following database of facts and rules:

  grandparent(X,Y):- parent(X,Z), parent(Z,Y). % (1)
  parent(maria, kostas).
  parent(kostas,yannis).

Prolog's top-down evaluation starts with the "head" of rule (1), grandparent(X,Y) and proceeds to its "body", (parent(X,Z), parent(Z,Y)) to instantiate the variables {X,Y,Z} according to the facts in the database and derive the new fact grandparent(maria,yannis).

Datalog's bottom-up evaluation instead starts with the facts in the program: parent(maria,kostas), parent(kostas, yannis) and adds those to the set of immediate consequences of the program (TP is called the "immediate consequence operator"):

  TP₀ = {parent(maria,kostas), parent(kostas,yannis)}

Now the "parent" facts in the body of the "grandparent" rule can be instantiated as follows:

  (parent(maria,kostas), parent(kostas,yannis)).

And finally the head of the rule can be instantiated to derive the same new atom as with Prolog's top-down evaluation:

  TP₁ = TP₀ ∪ {grandparent(maria,yannis)}

The advantage of bottom-up evaluation is decidability, as long as a program is datalog, which means no function symbols other than constants (which are function symbols of arity 0, i.e. with 0 arguments) and no negation. For instance, given the following program with a left-recursive clause, datalog's bottom-up evaluation terminates:

  ancestor(X,Y):- parent(X,Y). % (2)
  ancestor(X,Y):- ancestor(X,Z), ancestor(Z,Y). % (3)
  parent(maria,kostas).
  parent(kostas,yannis).

Given this program, datalog's bottom-up evaluation will first add the two facts of "parent" to the immediate consequences of the program; then instantiate the fact parent(X,Y) in the body of rule (2) to the two facts of "parent" in the program; and finally instantiate the head of rule (2), deriving two new facts of "ancestor":

  TP₁ = {ancestor(maria,kostas), ancestor(kostas,yannis), parent(maria,kostas),
  parent(kostas,yannis)}

Then the two facts of "ancestor" in the body of rule (3) will be instantiated to the facts derived in the previous step and a new fact derived from the instantiation of the variables in the head of the rule :

  TP₂ = TP₀ ∪ {ancestor(maria,yannis)}

And then the program will terminate because there are no more new facts to derive.

By contrast, Prolog's top-down evaluation will first correctly derive the two new "ancestor" facts from the head of rule (2) but then will try to evaluate the first ancestor(X,Z) fact in the body of rule (3). This will take it back to the head of rule (3) and evaluation will be stuck in an infinite recursion.

The trade-off is that bottom-up evaluation cannot handle negation or functions (including the cons function used to construct lists) and so is, in that sense, incomplete, whereas Prolog can do both. For this reason, datalog is preferred as a database language that offers a limited ability for reasoning (certainly richer than what is possible e.g. with SQL, that essentially only allows facts) whereas Prolog is preferred for general-purpose programming.