Gellish database
Universal data structure
Gellish databases are semantic databases that all share the same universally applicable data structure. That data structure is suitable to contain any fact that is expressed in a Gellish language variant, such as Gellish English or Gellish Dutch. This means that the structure of a Gellish database does not need to be extended when the scope of the database increases. The data structure is based on an extended version of the object-relationship-object (ORO) principles, which state that every atomic fact is expressed as one or more binary relations. Higher-order relations are converted into collections of binary relations. A Gellish database consists of one or more Naming Tables and one or more Fact Tables. A Naming Table and a Fact Table can also be combined in a Message Table, which is intended especially for data exchange between systems. Each row in a Fact Table contains one main fact and a large number of auxiliary facts. The auxiliary facts provide additional information about the main fact, for example its status, author, creation date, validity context and language. A Naming Table contains the terms, names, codes, abbreviations and synonyms by which the concepts and individual objects are named, possibly in multiple languages. It also contains the unique identifiers of the named concepts and things.
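The split between a Naming Table and a Fact Table can be illustrated with a minimal Python sketch. It is not the official table definition: the class and field names are assumptions made for illustration, only one auxiliary fact (status) is shown, and the synonym row is invented. The UIDs are the ones used in the example table later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingRow:
    uid: int        # unique identifier of the named concept or thing
    language: str
    name: str       # a term, name, code, abbreviation or synonym


@dataclass(frozen=True)
class FactRow:
    fact_uid: int
    left_uid: int
    relation_type_uid: int
    right_uid: int
    status: str = "accepted"   # one auxiliary fact; real tables carry more


naming_table = [
    NamingRow(1, "English", "The Eiffel tower"),
    NamingRow(1, "English", "Eiffel Tower"),        # synonym, invented for illustration
    NamingRow(5138, "English", "is located in"),
    NamingRow(2700887, "English", "Paris"),
]
fact_table = [FactRow(101, 1, 5138, 2700887)]


def first_name(uid, language):
    """Return the first name listed for a UID in the given language."""
    for row in naming_table:
        if row.uid == uid and row.language == language:
            return row.name
    raise KeyError(f"no {language} name for UID {uid}")


def render(fact, language="English"):
    """Turn a row of UIDs back into a readable binary expression."""
    uids = (fact.left_uid, fact.relation_type_uid, fact.right_uid)
    return " ".join(first_name(uid, language) for uid in uids)


print(render(fact_table[0]))  # The Eiffel tower is located in Paris
```

Because the Fact Table refers only to UIDs, adding a synonym or a name in another language touches the Naming Table alone.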
The universal data structure was originally defined as an implementation method for ISO 10303-221 and ISO 15926-2, both of which are International Standards based on the same ORO principles. A Gellish database requires that each fact is expressed by one or more relations and that each relation is explicitly classified by a relation type (= fact type). The Gellish language requires that the relation type shall be one of the Gellish standard relation types. It also requires that each individual thing is explicitly classified by a kind of thing, which kind shall be selected from a dictionary, such as ISO 15926-4, the Gellish English Dictionary-Taxonomy or a proprietary extension of it. In conventional databases the relation types and the classification of individual things are usually implied by the database structure, which makes them limited and not extensible. Standard fact types (relation types) in a Gellish database can be chosen from one of the above ISO standards or from the Gellish English Dictionary-Taxonomy.
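The two requirements above (every relation has a standard relation type, every individual thing has an explicit classification) lend themselves to a mechanical check. The following Python sketch assumes a tiny stand-in for the dictionary; the sets `STANDARD_RELATION_TYPES` and `KNOWN_KINDS` and the function `check_facts` are hypothetical names, not part of any Gellish tooling.

```python
CLASSIFICATION = "is classified as a"

# Tiny stand-ins for the dictionary; a real dictionary is far larger.
STANDARD_RELATION_TYPES = {"is located in", CLASSIFICATION}
KNOWN_KINDS = {"tower", "city"}

facts = [
    ("The Eiffel tower", "is located in", "Paris"),
    ("The Eiffel tower", CLASSIFICATION, "tower"),
    ("Paris", CLASSIFICATION, "city"),
]


def check_facts(facts):
    """Return a list of violations of the two classification requirements."""
    # Individuals that appear on the left of a classification relation.
    classified = {left for left, rel, _ in facts if rel == CLASSIFICATION}
    problems = []
    for left, rel, right in facts:
        if rel not in STANDARD_RELATION_TYPES:
            problems.append(f"non-standard relation type: {rel}")
        if rel == CLASSIFICATION:
            if right not in KNOWN_KINDS:
                problems.append(f"unknown kind: {right}")
            continue
        # Any object that is not a known kind must be a classified individual.
        for obj in (left, right):
            if obj not in KNOWN_KINDS and obj not in classified:
                problems.append(f"unclassified individual: {obj}")
    return problems


print(check_facts(facts))  # []
```

A fact such as `("Big Ben", "stands near", "London")` would be reported three times: once for the relation type and once for each unclassified object.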
Gellish Databases
Each Gellish Database consists of one or more Gellish Data Tables. Each of those Gellish Data Tables has basically the same structure, which is standardized and independent of any application system. This differs from conventional databases, which usually have proprietary data structures in which every database table is different. Each Gellish Data Table shall contain at least the obligatory columns that are defined in the specification document 'Definition of Gellish Databases and Data Exchange Messages'.
The content of Gellish Data Tables can be created in conformance with ISO 10303-221 or ISO 15926, or can be compliant with the grammar and the dictionary of the formal Gellish English language (or a Gellish variant in any other natural language). The standardized tables, in combination with the formal Gellish language, make it possible to combine an arbitrary number of Gellish Data Tables into one database. Such a database may be centralized or distributed. It is also possible to combine the results of a query sent to various independent data stores, which then act as a distributed database.
The various Gellish Data Tables all have the same core of column definitions. Apart from that core, the tables may also have one or more of the optional columns. Preferred collections of columns are defined in standard Gellish Data Table subsets.
A Gellish Database may be implemented in various formats. It can take the form of an SQL database, RDF/XML, or even XLS (Excel spreadsheet tables).
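As an illustration of the SQL option, the sketch below uses Python's built-in `sqlite3` module to create several tables with an identical column definition and to merge them with a plain `UNION`. The three-column layout is a deliberate simplification of the obligatory columns, and the table names `site_a`, `site_b` and `combined` are hypothetical.

```python
import sqlite3

# Simplified column set; a real Gellish Data Table has more obligatory columns.
COLUMNS = "left_object TEXT, relation_type TEXT, right_object TEXT"

conn = sqlite3.connect(":memory:")
for table in ("site_a", "site_b", "combined"):
    conn.execute(f"CREATE TABLE {table} ({COLUMNS})")

conn.executemany("INSERT INTO site_a VALUES (?, ?, ?)", [
    ("The Eiffel tower", "is located in", "Paris"),
    ("The Eiffel tower", "is classified as a", "tower"),
])
conn.executemany("INSERT INTO site_b VALUES (?, ?, ?)", [
    ("Paris", "is classified as a", "city"),
    ("The Eiffel tower", "is classified as a", "tower"),  # also present in site_a
])

# Identical structure means the tables can be merged without any conversion.
# UNION (rather than UNION ALL) drops the fact that both tables contain.
conn.execute(
    "INSERT INTO combined SELECT * FROM site_a UNION SELECT * FROM site_b"
)

rows = conn.execute(
    "SELECT * FROM combined ORDER BY left_object, relation_type"
).fetchall()
for row in rows:
    print(row)
```

The merge step contains no table-specific mapping logic, which is the practical benefit of giving every table the same column definitions.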
Limitations of conventional databases
Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definitions of those tables and columns determine the storage capabilities of the database, whereas the relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure that defines the expression capabilities of the database. Similar rules apply to the structure of data exchange files and thus to the information that is exchanged in electronic data files. This conventional database technology has some major constraints:
- Data that was not covered during the database design, and is thus not included in the data model, cannot be stored in the database nor exchanged via such a data file structure.
- Different databases have different data structures, so data in one database cannot be integrated with data from other databases, nor exchanged between databases, without dedicated data conversion.
- A database modification or extension requires redesign of the database structure, modification of software and data conversion, which makes it a relatively complicated and costly exercise.
Another characteristic of conventional databases is that hardly any international standards are available or used for the content of the databases, that is, the data entered by their users. This typically means that local conventions are applied to limit the diversity of data that may be entered in those databases. Because local conventions usually differ from one another, data entered in one database cannot be compared or integrated with data in other databases, even if the database structures are the same and even if the application domain of the databases is the same. For example, a company may run the same system at various sites to store data about equipment, and still be unable to compare the performance data for the same type of equipment between locations, because the equipment types have different names and the properties are also different.
Characteristics of a Gellish Database
A Gellish database does not have the semantic limitations of conventional databases, because of the flexible and open Gellish language and because of its standard universal data structure (grammar), which is simple and interpretable by both computers and humans. A Gellish database consists of one or more database tables, each of which has the same table structure (column definitions). The fact that those Gellish Database tables are standardized and universally applicable makes a Gellish database application independent. A standardized Gellish database table is universally applicable because it enables the application of the following two fundamental principles:
- Explicit classification of individual things or explicit specialization of classes, with an unlimited number of classes in a dictionary. The Gellish database table can store any kind of object, because any individual object can be introduced by specifying an explicit classification relation between the object and a class. Classes (kinds of objects or concepts) can be selected from the very large number of classes that are already defined in the Gellish English Dictionary. If the proper class is not available, it can be added by specifying a subtype-supertype relation between the new class and its direct supertype. This is fundamentally different from conventional databases, which predefine the object types (classes) about which information can be stored by defining a limited number of entity types and attribute types in a fixed data model.
- Explicit classification of relations (facts), by an extensible, unlimited number of standardized relation types. The Gellish database table can store any kind of fact about any kind of object, because any fact is expressed by a relation, and those relations are explicitly classified by relation types. The relation types can be selected from the standardized relation types that are defined in the Gellish Dictionary, or can be added to the dictionary as proprietary extensions. This is fundamentally different from conventional databases, which predefine a fixed and limited number of relation types between the columns in the database tables (and usually define those relation types only implicitly).
As a consequence, a Gellish database does not need to be modified or extended when the scope of an application changes and facts from different Gellish databases can be merged and integrated whenever required without a need for a conversion exercise.
Furthermore, the content of a Gellish Database uses a common Gellish Dictionary for all its data, including, for example, equipment types, property types, document types and activity types.
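The first principle, extending the dictionary by adding a subtype instead of changing the table structure, can be sketched in a few lines of Python. The relation phrase "is a specialization of" and the kinds "building", "physical object" and "lattice tower" are assumptions chosen for illustration; they are not quoted from the Gellish English Dictionary.

```python
CLASSIFICATION = "is classified as a"
SPECIALIZATION = "is a specialization of"   # assumed phrase for subtype-supertype

facts = [
    # Hypothetical dictionary content.
    ("tower", SPECIALIZATION, "building"),
    ("building", SPECIALIZATION, "physical object"),
    # A new class is added as a row, not as a schema change.
    ("lattice tower", SPECIALIZATION, "tower"),
    ("The Eiffel tower", CLASSIFICATION, "lattice tower"),
]


def supertypes(kind, facts):
    """Return all direct and indirect supertypes of a kind, nearest first."""
    result = []
    frontier = [kind]
    while frontier:
        current = frontier.pop(0)
        for left, rel, right in facts:
            if left == current and rel == SPECIALIZATION and right not in result:
                result.append(right)
                frontier.append(right)
    return result


def is_a(individual, kind, facts):
    """True if the individual is classified by the kind or by one of its subtypes."""
    for left, rel, right in facts:
        if left == individual and rel == CLASSIFICATION:
            if right == kind or kind in supertypes(right, facts):
                return True
    return False


print(supertypes("lattice tower", facts))  # ['tower', 'building', 'physical object']
print(is_a("The Eiffel tower", "building", facts))  # True
```

Queries phrased against "tower" or "building" keep working after the more specific class is introduced, because the taxonomy itself is stored as facts.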
Gellish Expressions in a Gellish Database
A Gellish Database is a database that contains one or more standardised Gellish Database tables. Each such table contains the same predefined columns and is suitable for expressing virtually any kind of fact in a way that is computer interpretable and system independent. The table can be implemented as an MS Access database table, an SQL database table or simply as a standard table in a spreadsheet. The core of a Gellish Database table consists of three columns, just as is the case in RDF/Notation 3. Each row with those three columns in such a table expresses a main (binary) fact. For example, the fact that the Eiffel tower is located in Paris can be expressed as follows:
Left hand object | Relation type | Right hand object |
---|---|---|
The Eiffel tower | is located in | Paris |
The Eiffel tower | is classified as a | tower |
Paris | is classified as a | city |
The left hand objects and the right hand objects may either be selected from the Gellish English dictionary or may be new proprietary objects that are introduced by defining them on separate lines. If such a new object is an individual thing, it shall be defined by a classification relation with a class, as is done in the above table. If the new object is a class, it shall be defined on a separate line by a specialization relation with its direct supertype. The relation types (such as 'is located in' and 'is classified as a') shall be selected from the Gellish English dictionary; otherwise the expression cannot be called standard Gellish, but becomes a proprietary extension of Gellish English.
Multi-language support
Furthermore, a Gellish database structure supports the simultaneous use of multiple languages. This is enabled because a Gellish database table contains a separate column for the language in which a fact is expressed (see the example table below). Thus a Gellish database supports the use of various natural language specific versions of Gellish. In principle, there is a Gellish variant language for each natural language, depending on the availability of a translation of the Gellish concepts. For example, the Gellish English Dictionary defines Gellish English, and contains partial translations to Gellish Deutsch (German) and Gellish Nederlands (Dutch). International terminology (such as most units of measure and mathematical concepts) is included as International Gellish.
Unique identifiers, homonyms, synonyms and automatic translation
A Gellish database uses a unique identifier for each thing, irrespective of whether it is a user object, a concept from the Gellish dictionary, a fact or a relation type. The following Gellish database table is an extended version of the above example and includes the language in which the fact is expressed as well as the identifiers of the objects.
Language | UID of left hand object | Name of left hand object | UID of fact | UID of relation type | Name of relation type | UID of right hand object | Name of right hand object |
---|---|---|---|---|---|---|---|
English | 1 | The Eiffel tower | 101 | 5138 | is located in | 2700887 | Paris |
English | 1 | The Eiffel tower | 102 | 1225 | is classified as a | 40903 | tower |
Dutch | 1 | De Eiffel toren | 103 | 4691 | is a translation of | 1 | The Eiffel tower |
The unique identifiers enable the use of synonyms and homonyms and enable a computer to translate a Gellish expression in one language automatically into a Gellish expression in another language. This is possible because the meaning of a Gellish expression is captured as a relation between the unique identifiers, so that the meaning is language independent.
This adds automatic translation capabilities to Gellish expressions, because a Gellish message can be created e.g. in Gellish English whereas computer software can present it in another Gellish variant, such as Gellish Dutch if a dictionary or a translation is available, such as on the third line in the above table.
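The translation mechanism can be sketched in Python using the three rows of the table above. The sketch assumes that, in a row with relation type UID 4691 ('is a translation of'), the language column applies only to the name of the left hand object; that reading and the function names `build_names` and `present` are assumptions, not part of the Gellish specification. Where no name is available in the target language, the original name is kept.

```python
TRANSLATION_UID = 4691  # 'is a translation of', as in the table above

# (language, left_uid, left_name, fact_uid, rel_uid, rel_name, right_uid, right_name)
rows = [
    ("English", 1, "The Eiffel tower", 101, 5138, "is located in", 2700887, "Paris"),
    ("English", 1, "The Eiffel tower", 102, 1225, "is classified as a", 40903, "tower"),
    ("Dutch", 1, "De Eiffel toren", 103, 4691, "is a translation of", 1, "The Eiffel tower"),
]


def build_names(rows):
    """Collect a (uid, language) -> name lookup from the rows themselves."""
    names = {}
    for lang, left_uid, left_name, _fact, rel_uid, rel_name, right_uid, right_name in rows:
        names.setdefault((left_uid, lang), left_name)
        if rel_uid != TRANSLATION_UID:
            # In ordinary rows all three names are in the row's language.
            names.setdefault((rel_uid, lang), rel_name)
            names.setdefault((right_uid, lang), right_name)
    return names


def present(row, language, names):
    """Present the main fact of a row in another language where names exist."""
    _lang, left_uid, left_name, _fact, rel_uid, rel_name, right_uid, right_name = row
    parts = ((left_uid, left_name), (rel_uid, rel_name), (right_uid, right_name))
    return tuple(names.get((uid, language), fallback) for uid, fallback in parts)


names = build_names(rows)
print(present(rows[0], "Dutch", names))
# ('De Eiffel toren', 'is located in', 'Paris')
```

Only the Eiffel tower has a Dutch name in this small table, so the relation type and the right hand object fall back to English; a fuller dictionary would supply those names as well.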
Auxiliary facts
A full Gellish database table has a number of additional columns that enable the expression of auxiliary facts or data about the main facts. For example, columns for:
- a textual definition of the left hand object
- the context in which a fact is valid
- a unit of measure with its UID
- the status of the fact (accepted, proposed, deleted, replaced, etc.)
- the originator of the fact
- the date of creation of the fact
- etc.
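A minimal Python sketch of how auxiliary columns might be used in practice is given below: each row carries its main fact plus a few auxiliary facts, and a query selects main facts by status. The field names, originators and dates are invented for illustration and do not reproduce the official column definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FactRow:
    # Main fact
    left: str
    relation_type: str
    right: str
    # Auxiliary facts (a small, hypothetical selection)
    status: str = "accepted"
    originator: str = ""
    created: Optional[date] = None
    validity_context: str = ""


rows = [
    FactRow("The Eiffel tower", "is located in", "Paris",
            status="accepted", originator="editor-1", created=date(2020, 1, 1)),
    FactRow("The Eiffel tower", "is located in", "Lyon",
            status="deleted", originator="editor-2", created=date(2019, 6, 1)),
    FactRow("The Eiffel tower", "is classified as a", "tower",
            status="proposed", originator="editor-1"),
]


def main_facts(rows, statuses=("accepted",)):
    """Return the main facts of all rows whose status is in `statuses`."""
    return [(r.left, r.relation_type, r.right) for r in rows if r.status in statuses]


print(main_facts(rows))
# [('The Eiffel tower', 'is located in', 'Paris')]
```

Keeping deleted and proposed rows in the table, rather than removing them, preserves the history of a fact while still allowing a consumer to read only the accepted ones.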
Other documentation about Gellish
Other documentation about Gellish can be found at:
- http://www.gellish.net/
- http://sourceforge.net/project/showfiles.php?group_id=28353
For example,
- The “Gellish Application Handbook” provides extensive guidance on how to express information or knowledge in the Gellish English language about physical objects, their design and operation, in other words on what to fill-in in a Gellish database when that language is applied.
- The 'Gellish English language extension manual' (previously referred to as the ‘Guide on STEPlib’) describes how proprietary extensions to the dictionary can be specified and how they can be proposed as additions to the official Gellish English Dictionary-Taxonomy.
- The document 'Example of a Road in Gellish' (pdf) and an accompanying Gellish Data Table in the form of an Excel spreadsheet together form an illustrative example of the expression of knowledge about roads and of information about a particular road, both expressed in Gellish English with a translation to Gellish Dutch and vice versa.