Datalog
Encyclopedia
Datalog is a query
and rule language for deductive database
s that syntactically is a subset of Prolog
. Its origins date back to the beginning of logic programming
, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker
organized a workshop on logic
and database
s. David Maier
is credited with coining the term Datalog.
and complete
. It can be done efficiently even for large databases. Query evaluation is usually done using bottom-up strategies.
In contrast to Prolog, it
These rules make the set of all possible proofs finite, with the consequence that all datalog programs terminate (unlike Prolog programs). As a consequence, statements and predicates of a program can be stated in any order (unlike Prolog). Various methods have been proposed to efficiently perform queries, e.g. the Magic Sets algorithm, or tabled logic programming.
Datalog engines are behind specialised database system
s such as Intellidimension's database for the semantic web
. Moreover, some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999
standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2
.
Two extensions that have been made to Datalog include an extension to allow object-oriented programming and an extension to allow disjunctions
as heads of clause
s. Both extensions have major impacts on the definition of Datalog's semantics and on the implementation of a corresponding Datalog interpreter.
parent(bill,mary).
parent(mary,john).
These two lines define two facts, i.e. things that always hold. They can be intuitively understood as: the parent of mary is bill and the parent of john is mary.
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
These two lines describe the rules that define the ancestor relationship. A rule consists of two main parts separated by the :- symbol. The part to the left of this symbol is the head, the part to the right the body of the rule. A rule is read (and can be intuitively understood) as if it is known that . Uppercase letters stand for variables. Hence in the example the first rule can be read as X is the ancestor of Y if it is known that X is the parent of Y. And the second rule as X is the ancestor of Y if it is known that X is the parent of some Z and Z is the ancestor of Y. The ordering of the clauses is irrelevant in Datalog in contrast to Prolog which depends on the ordering of clauses for computing the result of the query call.
Datalog distinguishes between extensional and intensional predicate symbols. While extensional predicate symbols are only defined by facts, intensional predicate symbols are defined only by rules. In the example above
?- ancestor(bill,X).
The query above asks for all ancestors of bill and would return mary and john when posed against a Datalog system containing the facts and rules described above.
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...
and rule language for deductive database
Deductive database
A Deductive database is a database system that can make deductions based on rules and facts stored in the database. Datalog is the language typically used to specify facts, rules and queries in deductive databases...
s that syntactically is a subset of Prolog
Prolog
Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...
. Its origins date back to the beginning of logic programming
Logic programming
Logic programming is, in its broadest sense, the use of mathematical logic for computer programming. In this view of logic programming, which can be traced at least as far back as John McCarthy's [1958] advice-taker proposal, logic is used as a purely declarative representation language, and a...
, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker
Jack Minker
Jack Minker is a leading authority in artificial intelligence, deductive databases, logic programming and non-monotonic reasoning. He is also an internationally recognized leader in the field of human rights of computer scientists.-Career:...
organized a workshop on logic
Logic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...
and database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
s. David Maier
David Maier
David Maier is a professor of computer science at Portland State University.He has been chairman of the program committee of ACM SIGMOD. He also served as an associate editor of ACM Transactions on Database Systems...
is credited with coining the term Datalog.
Features, limitations and extensions
Query evaluation with Datalog is based on first order logic, and is thus soundSoundness
In mathematical logic, a logical system has the soundness property if and only if its inference rules prove only formulas that are valid with respect to its semantics. In most cases, this comes down to its rules having the property of preserving truth, but this is not the case in general. The word...
and complete
Completeness
In general, an object is complete if nothing needs to be added to it. This notion is made more specific in various fields.-Logical completeness:In logic, semantic completeness is the converse of soundness for formal systems...
. It can be done efficiently even for large databases. Query evaluation is usually done using bottom-up strategies.
In contrast to Prolog, it
- disallows complex terms as arguments of predicates, e.g. p(1, 2) is admissible but not p(f1(1), 2),
- imposes certain stratificationStratification (mathematics)-In mathematical logic:In mathematical logic, stratification is any consistent assignment of numbers to predicate symbols guaranteeing that a unique formal interpretation of a logical theory exists...
restrictions on the use of negation and recursionRecursionRecursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...
, and - only allows range-restricted variables, i.e. each variable in the conclusion of a rule must also appear in a not negated clause in the premisePremisePremise can refer to:* Premise, a claim that is a reason for, or an objection against, some other claim as part of an argument...
of this rule.
These rules make the set of all possible proofs finite, with the consequence that all datalog programs terminate (unlike Prolog programs). As a consequence, statements and predicates of a program can be stated in any order (unlike Prolog). Various methods have been proposed to efficiently perform queries, e.g. the Magic Sets algorithm, or tabled logic programming.
Datalog engines are behind specialised database system
Database system
A database system is a term that is typically used to encapsulate the constructs of a data model, database Management system and database....
s such as Intellidimension's database for the semantic web
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...
. Moreover, some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999
SQL:1999
SQL:1999 was the fourth revision of the SQL database query language. The latest revision of the standard is SQL:2008.-Summary:The SQL:1999 standard, also known as SQL3, was published in 1999. Unlike previous editions, the standard's name used a colon instead of a hyphen for consistency with the...
standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2
IBM DB2
The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...
.
Two extensions that have been made to Datalog include an extension to allow object-oriented programming and an extension to allow disjunctions
as heads of clause
Clause (logic)
In logic, a clause is a finite disjunction ofliterals. Clausesare usually written as follows, where the symbols l_i areliterals:l_1 \vee \cdots \vee l_nIn some cases, clauses are written as sets of literals, so that clause above...
s. Both extensions have major impacts on the definition of Datalog's semantics and on the implementation of a corresponding Datalog interpreter.
Example
Example Datalog program:parent(bill,mary).
parent(mary,john).
These two lines define two facts, i.e. things that always hold. They can be intuitively understood as: the parent of mary is bill and the parent of john is mary.
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
These two lines describe the rules that define the ancestor relationship. A rule consists of two main parts separated by the :- symbol. The part to the left of this symbol is the head, the part to the right the body of the rule. A rule is read (and can be intuitively understood) as if it is known that . Uppercase letters stand for variables. Hence in the example the first rule can be read as X is the ancestor of Y if it is known that X is the parent of Y. And the second rule as X is the ancestor of Y if it is known that X is the parent of some Z and Z is the ancestor of Y. The ordering of the clauses is irrelevant in Datalog in contrast to Prolog which depends on the ordering of clauses for computing the result of the query call.
Datalog distinguishes between extensional and intensional predicate symbols. While extensional predicate symbols are only defined by facts, intensional predicate symbols are defined only by rules. In the example above
ancestor
is an intensional predicate symbol, and parent
is extensional. Predicates may also be defined by facts and rules and therefore neither be purely extensional nor intensional, but any datalog program can be rewritten into an equivalent program without such predicate symbols with duplicate roles.?- ancestor(bill,X).
The query above asks for all ancestors of bill and would return mary and john when posed against a Datalog system containing the facts and rules described above.
Systems implementing Datalog
Most implementations of Datalog stem from university projects. Here is a short list of systems that are either based on Datalog or provide a Datalog interpreter:Free software/Open source
- DES, an open-source implementation of Datalog to be used for teaching Datalog in courses. (GNU GPL)
- bddbddb, an implementation of Datalog done at Stanford UniversityStanford UniversityThe Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is a private research university on an campus located near Palo Alto, California. It is situated in the northwestern Santa Clara Valley on the San Francisco Peninsula, approximately northwest of San...
. It is mainly used to query Java bytecode including points-to analysis on large Java programs. (GNU LGPL) - IRIS, an open-source Datalog engine implemented in JavaJava (programming language)Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
. IRIS extends Datalog with function symbols, built-in predicates, locally stratified or un-stratified logic programs (using the well-founded semantics), unsafe rules and XML schema data types. (GNU LGPL v2.1) - Oroboro, an open-source RDFResource Description FrameworkThe Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...
processing framework implemented in JavaJava (programming language)Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
, includes a query and inference language based on Datalog. (GNU GPL v3+) - Inter4QL, an open-source command-line interpreter of Datalog-like 4QL query language implemented in C++ for Windows, Mac OS X and Linux. Negation is allowed in heads and bodies of rules as well as in recursion. (GNU GPL v3)
- Cascalog, a Clojure library for querying data stored on HadoopHadoopApache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data...
clusters (GNU GPL v3) - Clojure Datalog, a contributed ClojureClojureClojure |closure]]") is a recent dialect of the Lisp programming language created by Rich Hickey. It is a general-purpose language supporting interactive development that encourages a functional programming style, and simplifies multithreaded programming....
library implementing aspects of Datalog. (Eclipse Public License 1.0) - XSBXSBXSB is the name of a dialect of the Prolog programming language and its implementation developed at Stony Brook University in collaboration with the Katholieke Universiteit Leuven, the New University of Lisbon, Uppsala University and software vendor XSB, Inc....
, a logic programming and deductive database system for Unix and Windows. (GNU LGPL) - ConceptBase, a deductive and object-oriented database system based on a Datalog query evaluator. It is mainly used for conceptual modeling and metamodeling. (FreeBSD-style license)
- Datalog, a lightweight deductive database system written in Lua. (GNU LGPL)
- The JenaJena (framework)Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these...
Semantic Web framework includes a Datalog implementation as part of its general purpose rule engine, which also forms the implementation of OWLWeb Ontology LanguageThe Web Ontology Language is a family of knowledge representation languages for authoring ontologies.The languages are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web...
and RDFS support
Non-free software
- .QL.QL.QL is an object-oriented query language used to retrieve data from relational database management systems. It is reminiscent of the standard query language SQL and the object-oriented programming language Java. .QL is an object-oriented variant of a logic programming language known in the...
, a commercial object-oriented variant of Datalog created by Semmle. (patents pending) - SecPALSecPALSecPAL is a declarative, logic-based, security policy language that has been developed to support the complex access control requirements of large scale distributed computing environments.-Common Access Control Requirements:...
a security policy language developed by Microsoft ResearchMicrosoft ResearchMicrosoft Research is the research division of Microsoft created in 1991 for developing various computer science ideas and integrating them into Microsoft products. It currently employs Turing Award winners C.A.R. Hoare, Butler Lampson, and Charles P... - DLVDLVThe DLV system is a disjunctive logic programming system, implementing the stable model semantics under the Answer set programming paradigm. It extends the datalog language to allow the use of OR in rules...
is a commercial Datalog extension that supports disjunctive head clauses. - Datalog for Racket, an implementation of Datalog for the Racket programming language.(unknown licence, possibly free software)
- OverLog, an implementation of Datalog for overlay networks. (unknown licence)
- LogicBlox, a commercial implementation of Datalog used for web-based retail planning and insurance applications
- Meld, a commercial extension to Datalog for programming distributed systems
- Intellidimension, provides several commercial implementations of datalog engine based on Semantic Web standards
- StrixDBStrixDBStrixDB is a Triplestore designed to manipulate middle sized RDF graphs.- Features :StrixDB main features are:*compliance with SPARQL and SPARQL/Update....
: a commercial RDF graph store, SPARQLSPARQLSPARQL is an RDF query language; its name is an acronym that stands for SPARQL Protocol and RDF Query Language. It was made a standard by the RDF Data Access Working Group of the World Wide Web Consortium, and considered as one of the key technologies of semantic web...
compliant with LuaLuaLua may refer to:* Lua , a Roman goddess* Lua , a traditional Hawaiian martial art* Lua , a lightweight, extensible programming language* Lua , a single by the folk rock band Bright Eyes...
API and Datalog inference capabilities. Could be used as httpd (Apache HTTP ServerApache HTTP ServerThe Apache HTTP Server, commonly referred to as Apache , is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100 million website milestone...
) module or standalone. (although beta versions are under the Perl Artistic License 2.0)
See also
- Answer set programmingAnswer set programmingAnswer set programming is a form of declarative programming oriented towards difficult search problems. It is based on the stable model semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers -- programs for generating stable...
- SWRL
- D (data language specification)D (data language specification)D is a set of requirements for what Christopher J. Date and Hugh Darwen believe a relational database query language ought to be like. It is proposed in their book The Third Manifesto.-Overview:...
- D4 (programming language)D4 (programming language)D4 is a computer language used in Dataphor, a relational database management system.-Syntax:Alphora, the creators of D4, have given it a Pascal like syntax...
- IBM DB2IBM DB2The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...
- Conjunctive queryConjunctive queryIn database theory, a conjunctive query is a restricted form of first-order queries. A large part of queries issued on relational databases can be written as conjunctive queries, and large parts of other first-order queries can be written as conjunctive queries....
- 4QL Query Language