Semantic spectrum
The semantic spectrum (sometimes referred to as the ontology spectrum, the smart data continuum or semantic precision) is a series of increasingly precise, or rather semantically expressive, definitions for data elements in knowledge representations, especially for machine use.
At the low end of the spectrum is a simple binding of a single word or phrase and its definition. At the high end is a full ontology that specifies relationships between data elements using precise URIs for relationships and properties.
With increased specificity comes increased precision and the ability to use tools to automatically integrate systems, but also increased cost to build and maintain a metadata registry.
Some steps in the semantic spectrum include the following:
- glossary: A simple list of terms and their definitions. A glossary focuses on creating a complete list of the domain-specific terms and acronyms. It is useful for creating clear and unambiguous definitions for terms, and because it can be created with simple word processing tools, few technical tools are necessary.
- controlled vocabulary: A simple list of terms, definitions and naming conventions. A controlled vocabulary frequently has some type of oversight process associated with adding or removing data element definitions to ensure consistency. Terms are often defined in relationship to each other.
- data dictionary: Terms, definitions, naming conventions and one or more representations of the data elements in a computer system. Data dictionaries often define data types, validation checks such as enumerated values and the formal definitions of each of the enumerated values.
- data model: Terms, definitions, naming conventions and one or more representations of the data elements, as well as the beginning of a specification of the relationships between data elements, including abstractions and containers.
- taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element". The difference between a data model and a formal taxonomy is the arrangement of data elements into a formal tree structure where each element in the tree is a formally defined concept with associated properties.
- ontology: A complete, machine-readable specification of a conceptualization using URIs (and then IRIs) for all data elements, properties and relationship types. The W3C standard language for representing ontologies is the Web Ontology Language (OWL). Ontologies frequently contain formal business rules formed in discrete logic statements that relate data elements to one another.
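To make the progression concrete, the sketch below represents one hypothetical term, `customer`, at several steps of the spectrum using only the Python standard library. Every name, definition and the `http://example.org/` URI prefix is an illustrative assumption, not part of any real registry or standard vocabulary.

```python
from dataclasses import dataclass, field

# Glossary: a term bound to a human-readable definition.
glossary = {"customer": "A person or organization that buys goods or services."}

# Data dictionary: adds a representation (data type) and enumerated values.
@dataclass
class DataElement:
    name: str
    definition: str
    data_type: str
    allowed_values: dict = field(default_factory=dict)  # code -> definition

customer_type = DataElement(
    name="customer_type",
    definition="Category of customer.",
    data_type="string",
    allowed_values={"P": "Person", "O": "Organization"},
)

# Taxonomy: every concept has exactly one parent, up to a single root.
taxonomy = {"party": None, "customer": "party", "retail_customer": "customer"}

def is_a(concept, ancestor):
    """Return True if `concept` equals or descends from `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = taxonomy[concept]
    return False

# Ontology: statements as (subject, property, object) triples named by URIs.
EX = "http://example.org/vocab#"  # hypothetical namespace
triples = {
    (EX + "Customer", EX + "subClassOf", EX + "Party"),
    (EX + "Customer", EX + "places", EX + "Order"),
}
```

Each step keeps what the previous one offered and adds structure: the glossary can only be read by people, the data dictionary can drive validation, the taxonomy supports inheritance queries such as `is_a("retail_customer", "party")`, and the triples name relationships as well as things.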
Typical questions for determining semantic precision
The following is a list of questions that may arise in determining semantic precision.
- correctness: How can correct syntax and semantics be enforced? Are tools (such as XML Schema) readily available to validate syntax of data exchanges?
- adequacy/expressivity/scope: Does the system represent everything that is of practical use for the purpose? Is an emphasis being placed on data that is externalized (exposed or transferred between systems)?
- efficiency: How efficiently can the representation be searched / queried, and, possibly, reasoned on?
- complexity: How steep is the learning curve for defining new concepts, querying for them or constraining them? Are there appropriate tools for simplifying typical workflows? (See also: ontology editor)
- translatability: Can the representation easily be transformed (e.g. by vocabulary-based transformation) into an equivalent representation so that semantic equivalence is ensured?
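The translatability question can be illustrated with a minimal sketch of a vocabulary-based transformation: a table of semantic-equivalence statements drives the renaming of elements from one vocabulary to another. The element names and the `transform` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical semantic-equivalence statements between two vocabularies:
# each source element name maps to the target element with the same meaning.
EQUIVALENT = {
    "PersonGivenName": "first_name",
    "PersonSurName": "last_name",
    "PersonBirthDate": "date_of_birth",
}

def transform(record, equivalences):
    """Rename the keys of `record` using the equivalence table.

    Elements with no declared equivalent are returned separately so that
    a loss of meaning is visible instead of silent.
    """
    translated, unmapped = {}, {}
    for name, value in record.items():
        if name in equivalences:
            translated[equivalences[name]] = value
        else:
            unmapped[name] = value
    return translated, unmapped

source = {"PersonGivenName": "Ada", "PersonSurName": "Lovelace", "Title": "Countess"}
translated, unmapped = transform(source, EQUIVALENT)
print(translated)  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
print(unmapped)    # {'Title': 'Countess'}
```

Returning the unmapped elements rather than dropping them is a design choice: it shows where the two vocabularies are not yet semantically equivalent.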
Determining location on the semantic spectrum
Many organizations today are building a metadata registry to store their data definitions and to perform metadata publishing. The question of where they are on the semantic spectrum frequently arises. To determine where your systems are, some of the following questions are frequently useful.
- Is there a centralized glossary of terms for the subject matter?
- Does the glossary of terms include precise definitions for each term?
- Is there a central repository to store data elements that includes data type information?
- Is there an approval process associated with the creation of and changes to data elements?
- Are coded data elements fully enumerated? Does each enumeration have a full definition?
- Is there a process in place to remove duplicate or redundant data elements from the metadata registry?
- Are there one or more classification schemes used to classify data elements?
- Are document exchanges and web services created using the data elements?
- Can the central metadata registry be used as part of a model-driven architecture?
- Are there staff members trained to extract data elements that can be reused in metadata structures?
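Some of the questions above can be checked mechanically. The following is a minimal sketch, assuming a registry is available as a list of dictionaries with the hypothetical keys `name`, `definition`, `data_type` and `codes`. It flags missing definitions, missing data types, undefined enumeration codes and possible duplicates. Comparing normalized definitions is only a crude stand-in for a real duplicate review.

```python
def audit_registry(elements):
    """Return a list of human-readable findings for a list of element dicts.

    Each element is assumed to look like:
    {"name": str, "definition": str, "data_type": str, "codes": {code: definition}}
    """
    findings = []
    seen_definitions = {}
    for element in elements:
        name = element["name"]
        if not element.get("definition", "").strip():
            findings.append(f"{name}: missing definition")
        if not element.get("data_type"):
            findings.append(f"{name}: missing data type")
        for code, meaning in element.get("codes", {}).items():
            if not meaning.strip():
                findings.append(f"{name}: code {code!r} has no definition")
        # Treat identical normalized definitions as possible duplicates.
        key = element.get("definition", "").strip().lower()
        if key and key in seen_definitions:
            findings.append(f"{name}: possible duplicate of {seen_definitions[key]}")
        elif key:
            seen_definitions[key] = name
    return findings

registry = [
    {"name": "gender_code", "definition": "Gender of a person.", "data_type": "string",
     "codes": {"F": "Female", "M": "Male", "U": ""}},
    {"name": "sex_code", "definition": "gender of a person.", "data_type": "string"},
    {"name": "birth_date", "definition": "", "data_type": "date"},
]
for finding in audit_registry(registry):
    print(finding)
```

The remaining questions (approval processes, staff training, use in web services) are organizational and cannot be answered from the registry contents alone.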
Strategic nature of semantics
Today, much of the World Wide Web is stored as Hypertext Markup Language. Search engines are severely hampered by their inability to understand the meaning of published web pages. These limitations have led to the advent of the Semantic Web movement.
In the past, many organizations that created custom database applications used isolated teams of developers that did not formally publish their data definitions. These teams frequently used internal data definitions that were incompatible with other computer systems. This made enterprise application integration and data warehousing extremely difficult and costly. Many organizations today require that teams consult a centralized data registry before new applications are created.
The job title of an individual who is responsible for coordinating an organization's data is data architect.
History
The first reference to this term was at the 1999 AAAI Ontologies Panel. The panel was organized by Chris Welty, who at the prodding of Fritz Lehmann and in collaboration with the panelists (Fritz, Mike Uschold, Mike Gruninger, and Deborah McGuinness) came up with a "spectrum" of kinds of information systems that were, at the time, referred to as ontologies. The "ontology spectrum" picture appeared in print in the introduction to Formal Ontology and Information Systems: Proceedings of the 2001 Conference. The ontology spectrum was also featured in a talk at the Semantics for the Web meeting in 2000 at Dagstuhl by Deborah McGuinness. McGuinness produced a paper describing the points on that spectrum that appeared in the book that emerged (much later) from that workshop, called "Spinning the Semantic Web." Later, Leo Obrst extended the spectrum into two dimensions (which technically is not really a spectrum anymore) and added a lot more detail, which was included in his book, The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management.
The concept of semantic precision in business systems was popularized by Dave McComb in his book Semantics in Business Systems: The Savvy Manager's Guide, published in 2003, in which he frequently uses the term "semantic precision".
This discussion centered around a 10-level partition that included the following levels (listed in order of increasing semantic precision):
- Simple Catalog of Data Elements
- Glossary of Terms and Definitions
- Thesauri, Narrow Terms, Relationships
- Informal "Is-a" relationships
- Formal "Is-a" relationships
- Formal instances
- Frames (properties)
- Value Restrictions
- Disjointness, Inverse, Part-of
- General Logical Constraints
Note that there was a special emphasis on adding formal is-a relationships to the spectrum, which seems to have been dropped.
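Two of the higher levels in this list, formal "is-a" relationships and disjointness, can be sketched in a few lines. The example below is an illustration under assumed class names, not a reconstruction of any system discussed by the panel. It computes the transitive closure of is-a and uses it to detect an individual asserted into two disjoint classes.

```python
# Formal "is-a": each class lists its direct superclasses (hypothetical names).
SUPERCLASSES = {
    "Animal": set(),
    "Mammal": {"Animal"},
    "Bird": {"Animal"},
    "Dog": {"Mammal"},
}
# Disjointness: no individual may belong to both classes of a pair.
DISJOINT = {frozenset({"Mammal", "Bird"})}

def ancestors(cls):
    """All classes that `cls` is-a, including itself (transitive closure)."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in SUPERCLASSES[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def violates_disjointness(asserted_classes):
    """True if the asserted classes imply membership in two disjoint classes."""
    implied = set()
    for cls in asserted_classes:
        implied |= ancestors(cls)
    return any(pair <= implied for pair in DISJOINT)

print(ancestors("Dog") == {"Dog", "Mammal", "Animal"})  # True
print(violates_disjointness({"Dog"}))          # False
print(violates_disjointness({"Dog", "Bird"}))  # True
```

An informal is-a hierarchy could not support the last check, because nothing would guarantee that membership in `Dog` implies membership in `Mammal`.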
The company Cerebra has also popularized this concept by describing the data formats that exist within an enterprise in terms of their ability to store semantically precise metadata. Their list includes:
- HTML
- Word processing documents
- Microsoft Excel
- Relational databases
- XML
- XML Schema
- Taxonomies
- Ontologies
What the concepts share in common is the ability to store information with increasing precision to facilitate intelligent agents.
See also
- Enterprise messaging system
- Semantics
- SKOS
- Ontology (computer science)
- Collabulary
- Web service
- Conceptual interoperability