ERROL
Encyclopedia
ERROL is a declarative database query
and manipulation
language for the Entity-relationship model
(ERM). It is applicable to any data model
on which ERM can be mapped
, virtually any general purpose database data model. It is based on the capability of ER diagrams to be described accurately by simple
Natural language
(NL) sentences. A specification of a complex operation upon an ERM database can be described accurately by a complex
and/or compound
NL sentence constructed from the simple sentences describing the respective ER diagram. An ERROL expression mimics such NL sentence with one-to-one correspondence
between ERROL subexpressions and NL subsentences: An ERROL expression can look like the corresponding NL sentence, or at least like a similar, equivalent one. This allows to write in ERROL very complex queries by simple conversion from their NL specifications. It also allows a straightforward checking of an ERROL expression meeting a complex NL specification. With such characteristics it can be a foundation for future Data management
languages, more convenient for human
s to use than existing languages which may need complex expressions even for moderately complex NL expressions (e.g., SQL
; see Example below).
ERROL is also applicable to newer applications like querying the Semantic web
using ontologies. As well it is applicable to other than ERM semantic data model
s, e.g., Object-Role Modeling (ORM), which has many similarities to the ERM. ERROL has some similarities to the later Gellish
(in particular to Gellish English
), a formal language with a strong connection to natural languages, and can use its dictionaries
.
Reshaped relational algebra (RRA; Markowitz and Raz 1983b, 1984; Raz 1986), with operators that follow the semantics
of respective major NL constructs, has been developed to support ERROL over relational database
s. It is used both to specify ERROL's semantics concisely and accurately, and to implement ERROL effectively over relational database
s.
ERROL and its RRA translation expose and exploit the connection between the way humans reason and talk about needed information, possibly very complex, and the database operations needed in order to compute this information from the database data. A sequence of such RRA operations is generated automatically from an ERROL expression by an ERROL-to-RRA compiler
. The compiler output has been applied directly to a relational database, and also has a translation to SQL
, the standard interface, for a straightforward, "regular" execution by SQL database systems, to take advantage of their query optimization
.
(NL) can be vague and ambiguous, ERROL (Markowitz and Raz 1983a) is accurate with well-defined semantics, and thus provides a check for NL queries. With these properties ERROL is a good compromise between NL as a query language and other database languages: Long experience with database languages shows that simple queries are easy to phrase with both NL and most database languages. However, some complicated queries may be hard to phrase accurately in most languages, including NL. In most cases NL is the natural query tool for humans, but being sometimes vague and ambiguous, often a complex NL query's meaning is not accurately defined, and a dialogue that involves a well-defined formal language feedback is needed to verify the intended meaning when NL is used. In practice a query specification is typically initially thought of, formulated, in NL, and then translated by an expert to the desired database language. Validating translation correctness (semantic equivalence) is usually difficult for complex queries and relies completely on the expert's understanding of the NL specification. Wrong interpretation can be very costly, both in execution time and implications. Being very close to NL, and identical to it in most cases, ERROL eases these difficulties.
It is worthwhile noting that the English language
utilization in ERROL is different in nature from its utilization in other computer languages like COBOL
and SQL
: While in the latter English-like language commands
are utilized, in ERROL a predicate (in a query or other manipulation) is specified by an English-like expression.
Though originally motivated by the linguistic
aspect of ERM, ERROL has been found to provide a powerful paradigm beyond this aspect: Specification of a complex query predicate by Navigation
in an ER Diagram ("query by navigation"; the emphasis is on navigation in an ER Diagram, versus any other data structure). Even without exploiting many NL elements, the navigation itself together with basic schema
elements provides accurate specification. As a matter of fact the navigation is independent of any specific NL, and presents the semantic relations (Relationships and comparisons) between objects of interest (Entities and attributes, and resulting objects by arithmetic operations, aggregations, and logic derivations), where the ER diagram provides a limited form of semantic network
. All the needed information for a complete specification over a given schema exists in a skeleton
ERROL specification which includes only ER Diagram elements, and may include constants, aggregate function
s, logical connective
s, arithmetic operations
and comparison operators. Using such language-independent representation, a specification can be easily machine-translated accurately among different natural languages, using small numbers of each language's syntactic
constructs, and having an ER diagram described by respective simple sentences in different languages. Specification reconstruction in English (that can be re-processed correctly by the ERROL compiler) from the language-independent representation has been done successfully (within the ERROL System project; see the Brief history section below).
Reshaped relational algebra (RRA; Markowitz and Raz 1983b, 1984) has been developed to support ERROL over relational database
s. RRA is equivalent to Relational algebra
(RA) in expression power (each one algebra's operator can be expressed by the other algebra's operators (Raz 1986) ). A strong correlation exists between ERROL constructs (and corresponding NL's) and RRA operators. This is used both to specify ERROL's semantics
concisely and accurately, and to implement ERROL effectively over relational database
s. For any ERROL specification (expression; query or other manipulation) a corresponding RRA expression is derived in a straightforward way. This RRA expresstion computes the specification over any relational database with schema that is semantically relevant to the specification (possibly through schema transformation for ERM compatibility with the specification, when needed; the computation result is a relation).
comprises portions of each type). An ERROL expression describes a navigation hypertree (hypertree
is an acyclic
hypergraph
), generated by navigating in the ER diagram. Names and constants are nodes. Entities and Relationships define one types of edges, sets of attribute names. A connecting ERROL construct defines a second type of edge, a "regular" tree
(dual node) edge. (See (Raz 1987) below for more detail.) The navigation may include jumps in the diagram (in case of comparisons) or repetitions over same sections in the diagrams (if the query specification requires repeated utilization of same diagram elements). A complex specification of a Hypertree, a computaion by a long NL sentence, may suffer from ambiguities due to unclear connection between sentence parts. ERROL solves this problem by inserted parantheses (primarily parentheses) when needed, to uniquely define an expression's hypertree with the correct intended meaning.
used for pseudocode
) is an extension of a subset of the English language
and an enhancement of the initial ERROL. In SE several "syntactic sugar
" elements of ERROL have been made more flexible, and further flexibility with word usage and sentence structure has been introduced, to get it closer to NL. However, the original correspondence between NL and respective ERROL basic constructs has been preserved. In what follows no distinction is made between ERROL and SE.
The linguistic aspect of the Entity relationship model, and the way it is utilized by ERROL for navigation in an ER Diagram is demonstrated by the following example:
s. RRA is equivalent to the Relational Algebra
(RA) in expression power (each one algebra's operator can be expressed by the other algebra's operators (Raz 1986) ), and has strong analogies to some basic NL constructs. As such it is ideal for describing the semantics
of ERROL, as well as for implementing ERROL over relational database
s. For any ERROL specification (expression; query or other manipulation) a crresponding RRA expression computes the specification over a relational database. The main difference between RRA and RA is with natural join
and projection
operations embedded in various RRA operators.
An ERROL expression (or respective hypertree) can be translated in a straightforward way to an RRA expression. An RRA expression is a partial order (typically with several compatible sequences) of RRA operations, which provides a procedure for computing the value of the ERROL expression over a relational database (i.e., computing the query or data manipulation resulting relation) by manipulating its relations. Each ERROL subexpression type has a corresponding RRA operator. The subexpression variables (entity, relationship, and attribute names) and constants comprise the operator's parameters. The operators' order is determined by the hypertree structure of the (possibly paranthesized) ERROL expression. By proper renaming, the join operation embedded in RRA operators automatically connects between corresponding attributes in entities and relationships, and resolves references by using a single entity identifying temporary name for all that entity occurrences in the query that have the same reference symbol (for same entity occurrences with different reference symbols, different temporary names are given; for same entity occurrences with no reference symbol, different temporary names are given).
RRA expression simplification, and consistency
checking together with subexpression contradiction
and tautology
identification, can be done using RRA axiom
s and theorem
s (Raz 1986). RRA expression computation optimization
can be done similarly to the ways it is done for Relational algebra
(RA).
as the subject of his M.Sc. thesis
(Markowitz 1983) at the Technion - Israel Institute of Technology
(advisor: Yoav Raz). Further ERROL enhancements and implementation, including ERROL to RRA translation
(M.Sc. project of Reuven Cohen; Cohen 1984), have been done by Yoav Raz together with graduate students. In 1984 Yoav Raz, Victor Markowitz, and Reuven Cohen won the Computer Science Award of ILA – The Israeli Information Technology Association.
The ERROL System (Raz et al. 1984) implements database queries and manipulations over a relational database using SE and RRA. During the years 1982-1988 it has been developed at the Technion, Israel
, using UNIX
, Lex
, YACC
, and Ingres, and further enhanced at UCSD (see output examples in Raz 1987).
(three-way) relationships.
Using the schema in Example 1 above consider the following imaginary complex query:
A possible straightforward skeleton ERROL expression for this query is the following:
or the following:
ERROL query:
If half the number of "Red" items requested by any single department is considered, the NL sentence is slightly changed:
A possible ERROL query is the following:
ERROL query:
ERROL query:
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...
and manipulation
Data Manipulation Language
A data manipulation language is a family of syntax elements similar to a computer programming language used for inserting, deleting and updating data in a database...
language for the Entity-relationship model
Entity-relationship model
In software engineering, an entity-relationship model is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements...
(ERM). It is applicable to any data model
Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....
on which ERM can be mapped
Data mapping
Data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks including:...
, virtually any general purpose database data model. It is based on the capability of ER diagrams to be described accurately by simple
Simple sentence
A simple sentence is a sentence structure that contains one independent clause and no dependent clauses.-Examples:*The runner jumped....
Natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
(NL) sentences. A specification of a complex operation upon an ERM database can be described accurately by a complex
Complex sentence
A complex sentence is a sentence with one independent clause and at least one dependent clause.-Examples:* I ate the meal that you cooked....
and/or compound
Compound sentence (linguistics)
A compound sentence is composed of at least two independent clauses. It does not require a dependent clause. The clauses are joined by a coordinating conjunction , a correlative conjunction , a semicolon that functions as a conjunction, or a conjunctive adverb preceded by a semicolon. A conjunction...
NL sentence constructed from the simple sentences describing the respective ER diagram. An ERROL expression mimics such NL sentence with one-to-one correspondence
Bijection
A bijection is a function giving an exact pairing of the elements of two sets. A bijection from the set X to the set Y has an inverse function from Y to X. If X and Y are finite sets, then the existence of a bijection means they have the same number of elements...
between ERROL subexpressions and NL subsentences: An ERROL expression can look like the corresponding NL sentence, or at least like a similar, equivalent one. This allows to write in ERROL very complex queries by simple conversion from their NL specifications. It also allows a straightforward checking of an ERROL expression meeting a complex NL specification. With such characteristics it can be a foundation for future Data management
Data management
Data management comprises all the disciplines related to managing data as a valuable resource.- Overview :The official definition provided by DAMA International, the professional organization for those in the data management profession, is: "Data Resource Management is the development and execution...
languages, more convenient for human
Human
Humans are the only living species in the Homo genus...
s to use than existing languages which may need complex expressions even for moderately complex NL expressions (e.g., SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
; see Example below).
ERROL is also applicable to newer applications like querying the Semantic web
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...
using ontologies. As well it is applicable to other than ERM semantic data model
Semantic data model
A semantic data model in software engineering has various meanings:# It is a conceptual data model in which semantic information is included. This means that the model describes the meaning of its instances...
s, e.g., Object-Role Modeling (ORM), which has many similarities to the ERM. ERROL has some similarities to the later Gellish
Gellish
Gellish is a controlled natural language, also called a formal language, in which information and knowledge can be expressed in such a way that it is computer-interpretable, as well as system-independent. Gellish is a structured subset of natural language that is suitable for information modelling...
(in particular to Gellish English
Gellish English
Gellish English is a variant of Gellish and is a formal language, which means that it is structured and formalised subset of natural English that is computer interpretable. Its definition includes an English dictionary of concepts that is arranged in a taxonomy and that is extended into an ontology...
), a formal language with a strong connection to natural languages, and can use its dictionaries
Dictionary
A dictionary is a collection of words in one or more specific languages, often listed alphabetically, with usage information, definitions, etymologies, phonetics, pronunciations, and other information; or a book of words in one language with their equivalents in another, also known as a lexicon...
.
Reshaped relational algebra (RRA; Markowitz and Raz 1983b, 1984; Raz 1986), with operators that follow the semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
of respective major NL constructs, has been developed to support ERROL over relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s. It is used both to specify ERROL's semantics concisely and accurately, and to implement ERROL effectively over relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s.
ERROL and its RRA translation expose and exploit the connection between the way humans reason and talk about needed information, possibly very complex, and the database operations needed in order to compute this information from the database data. A sequence of such RRA operations is generated automatically from an ERROL expression by an ERROL-to-RRA compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
. The compiler output has been applied directly to a relational database, and also has a translation to SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
, the standard interface, for a straightforward, "regular" execution by SQL database systems, to take advantage of their query optimization
Query optimization
Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good query plan is identified. This may or not be the absolute best strategy because there are many ways of doing plans. There is a trade-off...
.
Introduction
While Natural languageNatural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
(NL) can be vague and ambiguous, ERROL (Markowitz and Raz 1983a) is accurate with well-defined semantics, and thus provides a check for NL queries. With these properties ERROL is a good compromise between NL as a query language and other database languages: Long experience with database languages shows that simple queries are easy to phrase with both NL and most database languages. However, some complicated queries may be hard to phrase accurately in most languages, including NL. In most cases NL is the natural query tool for humans, but being sometimes vague and ambiguous, often a complex NL query's meaning is not accurately defined, and a dialogue that involves a well-defined formal language feedback is needed to verify the intended meaning when NL is used. In practice a query specification is typically initially thought of, formulated, in NL, and then translated by an expert to the desired database language. Validating translation correctness (semantic equivalence) is usually difficult for complex queries and relies completely on the expert's understanding of the NL specification. Wrong interpretation can be very costly, both in execution time and implications. Being very close to NL, and identical to it in most cases, ERROL eases these difficulties.
It is worthwhile noting that the English language
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...
utilization in ERROL is different in nature from its utilization in other computer languages like COBOL
COBOL
COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....
and SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
: While in the latter English-like language commands
Command (computing)
In computing, a command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Most commonly a command is a directive to some kind of command line interface, such as a shell....
are utilized, in ERROL a predicate (in a query or other manipulation) is specified by an English-like expression.
Though originally motivated by the linguistic
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
aspect of ERM, ERROL has been found to provide a powerful paradigm beyond this aspect: Specification of a complex query predicate by Navigation
Navigation
Navigation is the process of monitoring and controlling the movement of a craft or vehicle from one place to another. It is also the term of art used for the specialized knowledge used by navigators to perform navigation tasks...
in an ER Diagram ("query by navigation"; the emphasis is on navigation in an ER Diagram, versus any other data structure). Even without exploiting many NL elements, the navigation itself together with basic schema
Logical schema
A Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags...
elements provides accurate specification. As a matter of fact the navigation is independent of any specific NL, and presents the semantic relations (Relationships and comparisons) between objects of interest (Entities and attributes, and resulting objects by arithmetic operations, aggregations, and logic derivations), where the ER diagram provides a limited form of semantic network
Semantic network
A semantic network is a network which represents semantic relations among concepts. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges.- History :...
. All the needed information for a complete specification over a given schema exists in a skeleton
Skeleton (category theory)
In mathematics, a skeleton of a category is a subcategory which, roughly speaking, does not contain any extraneous isomorphisms. In a certain sense, the skeleton of a category is the "smallest" equivalent category which captures all "categorical properties". In fact, two categories are equivalent...
ERROL specification which includes only ER Diagram elements, and may include constants, aggregate function
Aggregate function
In computer science, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list....
s, logical connective
Logical connective
In logic, a logical connective is a symbol or word used to connect two or more sentences in a grammatically valid way, such that the compound sentence produced has a truth value dependent on the respective truth values of the original sentences.Each logical connective can be expressed as a...
s, arithmetic operations
Arithmetic
Arithmetic or arithmetics is the oldest and most elementary branch of mathematics, used by almost everyone, for tasks ranging from simple day-to-day counting to advanced science and business calculations. It involves the study of quantity, especially as the result of combining numbers...
and comparison operators. Using such language-independent representation, a specification can be easily machine-translated accurately among different natural languages, using small numbers of each language's syntactic
Syntax
In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....
constructs, and having an ER diagram described by respective simple sentences in different languages. Specification reconstruction in English (that can be re-processed correctly by the ERROL compiler) from the language-independent representation has been done successfully (within the ERROL System project; see the Brief history section below).
Reshaped relational algebra (RRA; Markowitz and Raz 1983b, 1984) has been developed to support ERROL over relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s. RRA is equivalent to Relational algebra
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
(RA) in expression power (each one algebra's operator can be expressed by the other algebra's operators (Raz 1986) ). A strong correlation exists between ERROL constructs (and corresponding NL's) and RRA operators. This is used both to specify ERROL's semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
concisely and accurately, and to implement ERROL effectively over relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s. For any ERROL specification (expression; query or other manipulation) a corresponding RRA expression is derived in a straightforward way. This RRA expresstion computes the specification over any relational database with schema that is semantically relevant to the specification (possibly through schema transformation for ERM compatibility with the specification, when needed; the computation result is a relation).
Language overview
ERROL is a declarative language (a specification of what is requested is given; rather than a procedural language which specifies the way to compute it; for example, SQLSQL
SQL is a programming language designed for managing data in relational database management systems ....
comprises portions of each type). An ERROL expression describes a navigation hypertree (hypertree
Hypertree
A hypergraph H is called a hypertree, if it admits a host graph T such that T is a tree, in other words if there exists a tree T such that every hyperedge of H induces a subtree in T....
is an acyclic
Acyclic
Acyclic can refer to:* In chemistry, a compound which is not cyclic, e.g. alkanes and acyclic aliphatic compounds* In mathematics:** A graph without a cycle, especially*** A directed acyclic graph...
hypergraph
Hypergraph
In mathematics, a hypergraph is a generalization of a graph, where an edge can connect any number of vertices. Formally, a hypergraph H is a pair H = where X is a set of elements, called nodes or vertices, and E is a set of non-empty subsets of X called hyperedges or links...
), generated by navigating in the ER diagram. Names and constants are nodes. Entities and Relationships define one types of edges, sets of attribute names. A connecting ERROL construct defines a second type of edge, a "regular" tree
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
(dual node) edge. (See (Raz 1987) below for more detail.) The navigation may include jumps in the diagram (in case of comparisons) or repetitions over same sections in the diagrams (if the query specification requires repeated utilization of same diagram elements). A complex specification of a Hypertree, a computaion by a long NL sentence, may suffer from ambiguities due to unclear connection between sentence parts. ERROL solves this problem by inserted parantheses (primarily parentheses) when needed, to uniquely define an expression's hypertree with the correct intended meaning.
Structured English
Structured English (SE; Raz 1987; not to be confused with Structured EnglishStructured English
Structured English is the use of the English language with the syntax of structured programming. Thus structured English aims at getting the benefits of both the programming logic and natural language...
used for pseudocode
Pseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...
) is an extension of a subset of the English language
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...
and an enhancement of the initial ERROL. In SE several "syntactic sugar
Syntactic sugar
Syntactic sugar is a computer science term that refers to syntax within a programming language that is designed to make things easier to read or to express....
" elements of ERROL have been made more flexible, and further flexibility with word usage and sentence structure has been introduced, to get it closer to NL. However, the original correspondence between NL and respective ERROL basic constructs has been preserved. In what follows no distinction is made between ERROL and SE.
Highlights
- The navigation hypertree with correct RRA interpretation (respective RRA operator substitution for any ERROL construct type) provides all the semantics needed to define accurately an ERROL (SE) expression (see examples in Raz 1987). A navigation hypertree has one-to-one correspondence with a skeletonSkeleton (category theory)In mathematics, a skeleton of a category is a subcategory which, roughly speaking, does not contain any extraneous isomorphisms. In a certain sense, the skeleton of a category is the "smallest" equivalent category which captures all "categorical properties". In fact, two categories are equivalent...
expression of ERROL, a canonical formCanonical formGenerally, in mathematics, a canonical form of an object is a standard way of presenting that object....
that describes the navigation hypertree only by Entitys, Relationships, and Attributes, together with ConstantsConstant (programming)In computer programming, a constant is an identifier whose associated value cannot typically be altered by the program during its execution...
, Aggregate functionAggregate functionIn computer science, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list....
s, Logical connectiveLogical connectiveIn logic, a logical connective is a symbol or word used to connect two or more sentences in a grammatically valid way, such that the compound sentence produced has a truth value dependent on the respective truth values of the original sentences.Each logical connective can be expressed as a...
s, Arithmetic operationsArithmeticArithmetic or arithmetics is the oldest and most elementary branch of mathematics, used by almost everyone, for tasks ranging from simple day-to-day counting to advanced science and business calculations. It involves the study of quantity, especially as the result of combining numbers...
, and Comparison operators. - An ERROL expression has one navigation hypertree, but typically several navigation hypertrees exist (for different ways of NL phrasing) that provide the same result, by navigating in the ER Diagram through same elements in different ways (e.g., in different directions, in different orders).
- Scope of an aggregate functionAggregate functionIn computer science, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list....
is determined by parantheses when the intended scope is not properly determined by the defaults. Any ERROL subexpression can be aggregated by inserting aggregation function name and parentheses in the right place. - DecompositionDecomposition (computer science)Decomposition in computer science, also known as factoring, refers to the process by which a complex problem or system is broken down into parts that are easier to conceive, understand, program, and maintain.- Overview :...
: A long NL specification can be broken into several simpler sentences. Correspondingly a respective ERROL expression can be replaced by a set of respective simple expressions separated by delimiterDelimiterA delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.Delimiters represent...
s, (e.g., " ; "), which is equivalent to the original expression (see example below). - ReferenceReferenceReference is derived from Middle English referren, from Middle French rèférer, from Latin referre, "to carry back", formed from the prefix re- and ferre, "to bear"...
or correlation: Often it is required that entities can be referenced properly in a sentence or across sentences. In NL it is done by using expressions like "his", "the above", "the second above", etc. In ERROL it is done as well, primarily by reference (correlation) symbols like (x), attached to entity name repetitions in an expression, and by some supported, or ad-hoc defined per a given schema, English constructs. All entity name repetitions marked with (x) are the same entity occurrence in the database for any instance for which the query predicate is true (see examples below). - AssignmentAssignment (computer science)In computer programming, an assignment statement sets or re-sets the value stored in the storage location denoted by a variable name. In most imperative computer programming languages, assignment statements are one of the basic statements...
and SubstitutionSubstitutionSubstitution may refer to:- Sciences :* Substitution , a syntactic transformation on strings of symbols of a formal language* Substitution of variables* Substitution cipher, a method of encryption...
: An ERROL subexpression can be named using assignment and be utilized in a long expression by its name (see example below). In particular entities and relationships can be defined and utilized. - Reserved wordReserved wordReserved words are one type of grammatical construct in programming languages. These words have special meaning within the language and are predefined in the language’s formal specifications...
s: Errol has a set of reserved words, for example, for commandsCommand (computing)In computing, a command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Most commonly a command is a directive to some kind of command line interface, such as a shell....
(e.g., "get") and logic operatorsLogical connectiveIn logic, a logical connective is a symbol or word used to connect two or more sentences in a grammatically valid way, such that the compound sentence produced has a truth value dependent on the respective truth values of the original sentences.Each logical connective can be expressed as a...
(e.g., "or"). - Transitive closureTransitive closureIn mathematics, the transitive closure of a binary relation R on a set X is the transitive relation R+ on set X such that R+ contains R and R+ is minimal . If the binary relation itself is transitive, then the transitive closure will be that same binary relation; otherwise, the transitive closure...
. This feature enhances the expression and computing power of ERROL considerably. It is done by allowing to define (and name) relationships which are the transitive closure of existing reflexiveReflexive relationIn mathematics, a reflexive relation is a binary relation on a set for which every element is related to itself, i.e., a relation ~ on S where x~x holds true for every x in S. For example, ~ could be "is equal to".-Related terms:...
relationships (i.e., between an entity to itself, e.g., EMPLOYEE MANAGES EMPLOYEE: A manager can be direct, or several levels above) either already appearing in the database, or defined by an ERROL expression. - An interactive graphical interface (Graphical ERROL) can be built in a straightforward way to construct the navigation hypertree (and corresponding ERROL expression; the ERROL system can already reconstruct an NL expression from an RRA expression using the system's dictionary) by picking up elements from the ER Diagram, and other needed elements (logical, comparison, constants, etc.). Such component has not been implemented within the framework of the ERROL System project described below (see the Brief history section below).
- SE can be replaced by a respective subset of any other Natural languageNatural languageIn the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
(e.g., FrenchFrench languageFrench is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...
, Hebrew, etc.) with basic syntactic constructs that have semantics similar to the English constructs used for SE. Skeleton expression translations require only names' translations, and language fontsUnicode typefacesA Unicode font is a computer font that contains a wide range of characters, letters, digits, glyphs, symbols, ideograms, logograms, etc., which are collectively mapped into the standard Universal Character Set, derived from many different languages and scripts from around the world...
replacement when needed. Such components have not been implemented within the framework of the ERROL System project described below (see the Brief history section below).
The linguistic aspect of the Entity relationship model, and the way it is utilized by ERROL for navigation in an ER Diagram is demonstrated by the following example:
Example
See additional examples in a section below.Reshaped relational algebra
The Reshaped relational algebra (RRA; Markowitz and Raz 1983b, 1984; an alternative formulation can be found in Raz 1986 ) has been developed to support ERROL over relational databaseRelational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s. RRA is equivalent to the Relational Algebra
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
(RA) in expression power (each one algebra's operator can be expressed by the other algebra's operators (Raz 1986) ), and has strong analogies to some basic NL constructs. As such it is ideal for describing the semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
of ERROL, as well as for implementing ERROL over relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s. For any ERROL specification (expression; query or other manipulation) a crresponding RRA expression computes the specification over a relational database. The main difference between RRA and RA is with natural join
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
and projection
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
operations embedded in various RRA operators.
An ERROL expression (or respective hypertree) can be translated in a straightforward way to an RRA expression. An RRA expression is a partial order (typically with several compatible sequences) of RRA operations, which provides a procedure for computing the value of the ERROL expression over a relational database (i.e., computing the query or data manipulation resulting relation) by manipulating its relations. Each ERROL subexpression type has a corresponding RRA operator. The subexpression variables (entity, relationship, and attribute names) and constants comprise the operator's parameters. The operators' order is determined by the hypertree structure of the (possibly paranthesized) ERROL expression. By proper renaming, the join operation embedded in RRA operators automatically connects between corresponding attributes in entities and relationships, and resolves references by using a single entity identifying temporary name for all that entity occurrences in the query that have the same reference symbol (for same entity occurrences with different reference symbols, different temporary names are given; for same entity occurrences with no reference symbol, different temporary names are given).
RRA expression simplification, and consistency
Consistency proof
In logic, a consistent theory is one that does not contain a contradiction. The lack of contradiction can be defined in either semantic or syntactic terms. The semantic definition states that a theory is consistent if and only if it has a model, i.e. there exists an interpretation under which all...
checking together with subexpression contradiction
Contradiction
In classical logic, a contradiction consists of a logical incompatibility between two or more propositions. It occurs when the propositions, taken together, yield two conclusions which form the logical, usually opposite inversions of each other...
and tautology
Tautology (logic)
In logic, a tautology is a formula which is true in every possible interpretation. Philosopher Ludwig Wittgenstein first applied the term to redundancies of propositional logic in 1921; it had been used earlier to refer to rhetorical tautologies, and continues to be used in that alternate sense...
identification, can be done using RRA axiom
Axiom
In traditional logic, an axiom or postulate is a proposition that is not proven or demonstrated but considered either to be self-evident or to define and delimit the realm of analysis. In other words, an axiom is a logical statement that is assumed to be true...
s and theorem
Theorem
In mathematics, a theorem is a statement that has been proven on the basis of previously established statements, such as other theorems, and previously accepted statements, such as axioms...
s (Raz 1986). RRA expression computation optimization
Query optimization
Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good query plan is identified. This may or not be the absolute best strategy because there are many ways of doing plans. There is a trade-off...
can be done similarly to the ways it is done for Relational algebra
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
(RA).
Brief history
Both RRA and initial version of ERROL, with all needed for queries linguistic constructs, including implementation guidelines, have been developed by Victor M. MarkowitzVictor M. Markowitz
Victor M. Markowitz is Chief Informatics Officer & Associate Director at DOE Joint Genome Institute , and head of Lawrence Berkeley National Laboratory's Biological Data Management and Technology Center....
as the subject of his M.Sc. thesis
Thesis
A dissertation or thesis is a document submitted in support of candidature for an academic degree or professional qualification presenting the author's research and findings...
(Markowitz 1983) at the Technion - Israel Institute of Technology
Technion - Israel Institute of Technology
The Technion – Israel Institute of Technology is a research-intensive institute of technology in Haifa, Israel. Originally called the Technikum, it was founded in 1912...
(advisor: Yoav Raz). Further ERROL enhancements and implementation, including ERROL to RRA translation
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
(M.Sc. project of Reuven Cohen; Cohen 1984), have been done by Yoav Raz together with graduate students. In 1984 Yoav Raz, Victor Markowitz, and Reuven Cohen won the Computer Science Award of ILA – The Israeli Information Technology Association.
The ERROL System (Raz et al. 1984) implements database queries and manipulations over a relational database using SE and RRA. During the years 1982-1988 it has been developed at the Technion, Israel
Israel
The State of Israel is a parliamentary republic located in the Middle East, along the eastern shore of the Mediterranean Sea...
, using UNIX
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...
, Lex
Lex programming tool
Lex is a computer program that generates lexical analyzers . Lex is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt, is the standard lexical analyzer generator on many Unix systems, and a tool exhibiting its behavior is specified as part of the...
, YACC
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
, and Ingres, and further enhanced at UCSD (see output examples in Raz 1987).
ERROL (SE) examples
The examples below are given primarily in their skeleton representation, or close to that, rather than their more close-to-NL forms (e.g., set operations rather than quantifiers) to more clearly hint on their RRA translations.Example 1
This example relates to a factory database. The portion of the database relevant to this example (and other examples below) has departments, items in stock, and suppliers of these items as entities. Departments REQUEST (order) items from suppliers. Suppliers SUPPLY items to departments. Both last sentences above define ternaryTernary
Ternary is an adjective meaning "composed of three items". It can refer to:* Ternary complex, a complex formed by the interaction of three molecules* Ternary compound, a type of chemical compound...
(three-way) relationships.
- Entities are defined by the following relations:
-
- DEPARTMENT(did, name, floor)
- ITEM(iid, name, color)
- SUPPLIER(sid, name)
- The relationships are defined by the following relations:
-
- REQUEST(did, iid, sid, quantity)
- SUPPLY(did, iid, sid, quantity)
- Possible simple ERROL queries over this schema for respective Natural language (NL) specifications are as follows:
- NL: "Find id and name for items supplied to the Engineering department."
-
- Get id, name of ITEM SUPPLIED to DEPARTMENT with name="Engineering", or
- Get id, name of ITEMS SUPPLIED to the Engineering DEPARTMENT - (additional dictionaryDictionaryA dictionary is a collection of words in one or more specific languages, often listed alphabetically, with usage information, definitions, etymologies, phonetics, pronunciations, and other information; or a book of words in one language with their equivalents in another, also known as a lexicon...
definition is required for using this ERROL expression)
- NL: "Find the names of suppliers from whom red or blue items are requested by the Engineering department."
-
- Get names of SUPLIERS REQUESTED ITEMS with color="Red" OR "Blue" by DEPARTMENT with name="Engineering"
Example 2
(from example 2 in (Raz 1987) )Using the schema in Example 1 above consider the following imaginary complex query:
- "Find departments such that each ( is located in floor lower than the floor of a department requesting number of items greater than 20) or ( (not supplied by supplier named "Tom") and has the name "Engineering" )."
A possible straightforward skeleton ERROL expression for this query is the following:
- Get DEPARTMENT (floor < floor DEPARTMENT REQUEST COUNT ITEM > 20) OR (NOT SUPPLY SUPPLIER name = "Tom") AND name = "Engineering"
or the following:
- Get DEPARTMENT(x) WHERE (E1 AND E2) OR ((NOT E3) AND E4); E1:=floor DEPARTMENT(x)< floor DEPARTMENT(y); E2:=DEPARTMENT(y) REQUEST COUNT ITEM>20; E3:=DEPARTMENT(x) SUPPLY SUPPLIER name = "Tom"; E4:=DEPARTMENT(x) name="Engineering" (comment: the parentheses in the first subexpression are unnecessary due to common defaultDefault (computer science)A default, in computer science, refers to a setting or value automatically assigned to a software application, computer program or device, outside of user intervention. Such settings are also called presets, especially for electronic devices...
s)
-
- Comment: Judging by the ERROL compilerCompilerA compiler is a computer program that transforms source code written in a programming language into another computer language...
output in example 2 of the last reference below, it seems that some parantheses in the input expression there have been lost, probably due to a (paper) "cut and pasteCut and pasteIn human-computer interaction, cut and paste and copy and paste offer user-interface interaction techniques for transferring text, data, files or objects from a source to a destination. Most ubiquitously, users require the ability to cut and paste sections of plain text...
" error, that had also propagated to the example 2 text there.
- Comment: Judging by the ERROL compiler
Example 3
Using the schema in Example 1 above:- "Find pairs of supplier, department, such that the supplier supplies to the department more than half the number of all the "Red" items requested by all the departments."
ERROL query:
- Get SUPPLIER(x), DEPARTMENT SUPPLIED by SUPPLIER(x) COUNT ITEMS > 0.5*COUNT ITEMS (REQUESTED AND have color="Red")
If half the number of "Red" items requested by any single department is considered, the NL sentence is slightly changed:
- "Find pairs of supplier, department, such that the supplier supplies to the department more than half the number of "Red" items requested by any department."
A possible ERROL query is the following:
- Get SUPPLIER(x), DEPARTMENT SUPPLIED by SUPPLIER(x) COUNT ITEMS > 0.5*COUNT ITEMS (REQUESTED AND have color="Red") by DEPARTMENT
-
- Here in the second COUNT operator the aggregation is done separately for each DEPARTMENT.
-
- The symbol x is for a reference.
Example 4
Using the schema Example 1 above:- "Find pairs of supplier, department such that the supplier supplies to the department all the items that this department requests from that supplier."
ERROL query:
- Get SUPPLIER(x), DEPARTMENT(y) SUPPLIED by SUPPLIER(x) SET ITEMS = SET ITEMS REQUESTED by DEPARTMENT(y) from SUPPLIER(x)."
-
- The symbols x, y are for references.
Example 5
Using the schema in Example 1 above:- "Find the departments that get all the quantity of Red Bolts they order."
ERROL query:
- Get DEPARTMENT(x) SUPPLIED with SUM quantity of ITEMS with color="RED" AND name="Bolt" = SUM quantity of REQUESTED ITEMS with color="RED" AND name="Bolt" by DEPARTMENT(x)
-
- The symbol x is for reference.
See also
- Natural languageNatural languageIn the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
- Entity-relationship modelEntity-relationship modelIn software engineering, an entity-relationship model is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements...
(ERM) - Object-Role Modeling (ORM)
- GellishGellishGellish is a controlled natural language, also called a formal language, in which information and knowledge can be expressed in such a way that it is computer-interpretable, as well as system-independent. Gellish is a structured subset of natural language that is suitable for information modelling...
- Gellish EnglishGellish EnglishGellish English is a variant of Gellish and is a formal language, which means that it is structured and formalised subset of natural English that is computer interpretable. Its definition includes an English dictionary of concepts that is arranged in a taxonomy and that is extended into an ontology...
- Reshaped relational algebra
External links
- ERROL page in Yoav Raz's site