Freebase (database)
Freebase is a large collaborative knowledge base
consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions. Freebase aims to create a global resource which allows people (and machines) to access common information more effectively. It was developed by the American software company Metaweb and has been running publicly since March 2007. Metaweb was acquired by Google in a private sale announced July 16, 2010.
Freebase data is available for free/libre for commercial and non-commercial use under a Creative Commons Attribution License, and an open API, RDF endpoint, and database dump are provided for programmers. Google's News Timeline includes media information from Freebase.
Overview
On March 3, 2007, Metaweb publicly announced Freebase, described by the company as "an open shared database of the world's knowledge," and "a massive, collaboratively-edited database of cross-linked data." Often understood as a database model in the style of a Wikipedia-turned-database or an entity-relationship model, Freebase provides an interface that allows non-programmers to fill in structured data, or metadata, about general information, and to categorize or connect data items in meaningful, or 'semantic', ways.
At its launch, Tim O'Reilly described it this way: "Freebase is the bridge between the bottom up vision of Web 2.0 collective intelligence and the more structured world of the semantic web."
Freebase contains data harvested from sources such as Wikipedia, ChefMoz, NNDB, and MusicBrainz, as well as data contributed by individual users. The structured data is licensed under the Creative Commons Attribution License, and a JSON-based HTTP API is provided so that programmers can build applications on any platform that use Freebase data. The source code for the Metaweb application itself is proprietary.
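The following sketch is a rough illustration of the JSON-over-HTTP style of access. It wraps an MQL query in a JSON envelope and encodes the result into a GET URL. Several details are assumptions made for illustration: the endpoint URL, the `{"query": ...}` envelope shape, and the helper name `build_mqlread_url`. None of them comes from the Freebase API reference. The query itself follows the query-by-example style that MQL is known for, where `[]` marks a slot to be filled with results. No request is sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint (assumption): this example only builds a URL
# and never contacts a server.
MQLREAD_ENDPOINT = "https://api.example.org/mqlread"


def build_mqlread_url(query: dict, endpoint: str = MQLREAD_ENDPOINT) -> str:
    """Wrap an MQL query in an assumed {"query": ...} envelope and URL-encode it."""
    envelope = {"query": query}
    return endpoint + "?" + urlencode({"query": json.dumps(envelope)})


# Query by example: supply the known fields, leave [] where results should go.
album_query = {"type": "/music/artist", "name": "The Police", "album": []}
url = build_mqlread_url(album_query)
print(url)

# Round-trip check: decoding the URL recovers the original envelope.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(decoded == {"query": album_query})
```

The request is JSON carried inside an ordinary HTTP query string. Because of that, any platform with an HTTP client and a JSON library could, in principle, talk to such a service.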
Freebase runs on a database infrastructure created in-house by Metaweb that uses a graph model. Instead of using tables and keys to define data structures, Freebase defines its data structure as a set of nodes and a set of links that establish relationships between the nodes. Because this structure is non-hierarchical, Freebase can model more complex relationships between individual elements than a conventional relational database can, and users are free to enter new objects and relationships into the underlying graph.
Queries to the database are made in the Metaweb Query Language (MQL).
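The node-and-link model and MQL's query-by-example idea can be sketched together. The minimal Python version below is a hypothetical toy, not Metaweb's implementation. Every link is stored as a (node, property, value) triple, and a flat query dictionary is matched against the graph:

- Concrete values act as constraints.
- `None` asks for a single value to be filled in.
- An empty list asks for all matching values.

The node ids such as `/en/the_police` and the sample albums are illustrative data only. Nested queries are not supported.

```python
from collections import defaultdict


class Graph:
    """A toy graph store: nodes joined by labelled links (hypothetical design)."""

    def __init__(self):
        # (node id, property) -> list of linked values, in insertion order
        self.links = defaultdict(list)

    def add(self, node, prop, value):
        self.links[(node, prop)].append(value)

    def values(self, node, prop):
        # .get avoids creating empty entries as a side effect of lookups
        return self.links.get((node, prop), [])

    def nodes(self):
        return sorted({node for node, _ in self.links})


def match(graph, query):
    """Evaluate a flat, MQL-style query by example against the graph."""
    constraints = {k: v for k, v in query.items() if v is not None and v != []}
    results = []
    for node in graph.nodes():
        # Every concrete value in the query must be linked from this node.
        if not all(v in graph.values(node, k) for k, v in constraints.items()):
            continue
        filled = {}
        for key, value in query.items():
            found = graph.values(node, key)
            if value is None:        # fill in one value
                filled[key] = found[0] if found else None
            elif value == []:        # fill in every value
                filled[key] = list(found)
            else:                    # constraint, echoed back unchanged
                filled[key] = value
        results.append(filled)
    return results


g = Graph()
g.add("/en/the_police", "type", "/music/artist")
g.add("/en/the_police", "name", "The Police")
g.add("/en/the_police", "album", "Outlandos d'Amour")
g.add("/en/the_police", "album", "Synchronicity")
g.add("/en/sting", "type", "/music/artist")
g.add("/en/sting", "name", "Sting")

print(match(g, {"type": "/music/artist", "name": "The Police", "album": []}))
print(match(g, {"type": "/music/artist", "name": None}))
```

The first query returns one result, with both albums filled in. The second returns both artists, each with its name filled in. New kinds of links can be added at any time without changing a table definition, which is the flexibility the paragraph above describes.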
Development
Danny Hillis first described his idea for creating a knowledge web, which he called Aristotle, in a paper in 2000. He said he did not try to build the system until he had recruited two technical experts as co-founders. Robert Cook, an expert in parallel computing and database design, is Metaweb’s executive vice president for product development. John Giannandrea, formerly chief technologist at Tellme Networks and chief technologist of the Web browser group at Netscape/AOL, is the company’s chief technology officer.
Originally accessible by invitation only, Freebase opened full anonymous read access to the public in its alpha stage of development, and now requires registration only for data contributions.
On October 29, 2008, at the International Semantic Web Conference 2008, Freebase released its RDF service for generating RDF representations of Freebase topics, allowing Freebase to be used as Linked Data.
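The sketch below illustrates what an RDF representation of a topic involves. It serializes a topic's property values as N-Triples, with one `subject predicate object .` statement per line. The namespace `http://rdf.example.org/ns/` is a placeholder. The rule that turns an id like `/music/artist` into `music.artist` is an assumption. This is not the actual output format of Freebase's RDF service.

```python
from dataclasses import dataclass

# Placeholder namespace (assumption), not the real Freebase RDF namespace.
NS = "http://rdf.example.org/ns/"


@dataclass(frozen=True)
class Ref:
    """A link to another topic; serialized as a URI instead of a literal."""
    topic_id: str


def uri(path: str) -> str:
    # Assumed naming rule: "/music/artist" -> <NS + "music.artist">
    return "<" + NS + path.lstrip("/").replace("/", ".") + ">"


def literal(text: str) -> str:
    # N-Triples does not allow raw quotes, backslashes, or line breaks in a
    # literal; escape backslashes first so later escapes are not doubled.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def topic_to_ntriples(topic_id: str, properties: dict) -> str:
    """Emit one 'subject predicate object .' line per property value."""
    subject = uri(topic_id)
    lines = []
    for prop, values in properties.items():
        for value in values:
            obj = uri(value.topic_id) if isinstance(value, Ref) else literal(value)
            lines.append(f"{subject} {uri(prop)} {obj} .")
    return "\n".join(lines)


print(topic_to_ntriples("/en/the_police", {
    "/type/object/name": ["The Police"],
    "/type/object/type": [Ref("/music/artist")],
}))
```

Each topic and property becomes a URI, so other datasets can link to them directly. That is the property that makes the data usable as Linked Data.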
Organization and policy
Freebase's subjects (which often correspond to Wikipedia articles) are called topics, and the data stored about them depends on their types, that is, on how they are classified. For example, an entry for Arnold Schwarzenegger, the former Governor of California, would be entered as a topic that includes a variety of types describing him as an actor, bodybuilder, and politician. Freebase had approximately 11.5 million topics as of April 2010.
Freebase's ontologies (structured categories), known in Freebase as "types", are themselves user-editable. Each type has a number of defined predicates, called "properties".
[U]nlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.
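A small data model makes the relationship between topics, types, and properties concrete. In it, a topic carries several types and may only hold values for properties that one of those types defines. The type ids, property ids, and the `add_value` check below are hypothetical, chosen to mirror the Schwarzenegger example. They are not Freebase's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Type:
    """A schema: a type id plus the property ids it defines (hypothetical ids)."""
    type_id: str
    properties: frozenset


@dataclass
class Topic:
    name: str
    types: list = field(default_factory=list)
    values: dict = field(default_factory=dict)

    def allowed_properties(self) -> set:
        # A topic may use any property defined by any of its types.
        allowed = set()
        for t in self.types:
            allowed |= t.properties
        return allowed

    def add_value(self, prop: str, value) -> None:
        if prop not in self.allowed_properties():
            raise KeyError(f"{prop} is not defined by any type on {self.name}")
        self.values.setdefault(prop, []).append(value)


actor = Type("/film/actor", frozenset({"/film/actor/film"}))
bodybuilder = Type("/sports/bodybuilder", frozenset({"/sports/bodybuilder/competition"}))
politician = Type("/government/politician", frozenset({"/government/politician/office"}))

arnold = Topic("Arnold Schwarzenegger", types=[actor, bodybuilder, politician])
arnold.add_value("/film/actor/film", "The Terminator")
arnold.add_value("/government/politician/office", "Governor of California")
```

Adding a new type to a topic widens the set of properties it can hold without touching the topic's existing data. This is one way a single entry can describe a person in several roles at once.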
Freebase differs from the wiki model in several ways. Users can create their own types, but these types aren't adopted into the 'public commons' until a Metaweb employee promotes them. In addition, users cannot modify each other's types. Freebase cannot open up schema permissions because external applications rely on the schemas: changing a type's schema, for instance by deleting a property or changing a simple property, might break queries for API users and even within Freebase itself, in saved views, for example.
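The permissions reasoning above can be made concrete. If a saved query refers to a property that has since been deleted from a type, the query no longer lines up with the schema. The sketch below enforces a creator-only edit rule and reports which keys of a saved query a type no longer defines. The class, function, type id, and user names are all hypothetical.

```python
class Schema:
    """A user-created type with an owner and a set of property names (hypothetical)."""

    def __init__(self, type_id: str, creator: str, properties):
        self.type_id = type_id
        self.creator = creator
        self.properties = set(properties)

    def remove_property(self, user: str, prop: str) -> None:
        # Creator-only editing, mirroring the policy described in the text.
        if user != self.creator:
            raise PermissionError(f"{user} may not edit {self.type_id}")
        self.properties.discard(prop)


def broken_keys(schema: Schema, saved_query: dict) -> list:
    """Keys of a saved query (other than "type") that the schema no longer defines."""
    return sorted(k for k in saved_query if k != "type" and k not in schema.properties)


band = Schema("/user/alice/music/band", "alice", {"name", "album", "genre"})
saved_view = {"type": "/user/alice/music/band", "name": None, "album": []}

print(broken_keys(band, saved_view))   # []
try:
    band.remove_property("bob", "album")
except PermissionError as err:
    print(err)                          # bob may not edit /user/alice/music/band
band.remove_property("alice", "album")
print(broken_keys(band, saved_view))   # ['album']
```

Even when the creator makes a legitimate edit, every saved view and external query that asked for `album` is now stale. This is the trade-off behind keeping schema permissions closed.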
Similar to Wikipedia's administrator policy, Metaweb promotes some users to expert status, which gives them some administrative permissions.
The underlying data storage supports multilingual data, but every user’s display language is forcibly set to English. This will change at some point; for now, the only access to non-English data is via MQL.
Business and community
The Freebase system is built and patented by Metaweb, a for-profit company, which delivers targeted advertising on Freebase.com. In terms of Freebase's relationship with the open data community:
...we have no formal relationship with other open data projects. Though the definition of open data is pretty loose, we try to follow general open data principles by not restricting access to Freebase information to registered users, charging users to access our information, imposing restrictive licenses over the use of Freebase information, or using proprietary or closed technology as a barrier to accessing Freebase information.
Freebase is planning formal mappings of some of its types to established ontologies like FOAF, though this is not a priority.
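A formal mapping of this kind amounts to a translation table between two vocabularies. The sketch below rewrites triples that use Freebase-style ids into FOAF terms. The FOAF and `rdf:type` namespace URIs are the published ones. The mapping table itself is hypothetical, since the text only describes such mappings as planned. The sample person is illustrative data.

```python
# Published namespace URIs for FOAF and rdf:type.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping tables: the text only says such mappings were planned.
TYPE_MAP = {"/people/person": FOAF + "Person"}
PROPERTY_MAP = {"/type/object/name": FOAF + "name"}


def to_foaf(triples):
    """Translate (subject, predicate, object) triples into FOAF terms.

    Triples with no mapped equivalent are dropped rather than guessed.
    """
    out = []
    for subject, predicate, obj in triples:
        if predicate == "/type/object/type" and obj in TYPE_MAP:
            out.append((subject, RDF_TYPE, TYPE_MAP[obj]))
        elif predicate in PROPERTY_MAP:
            out.append((subject, PROPERTY_MAP[predicate], obj))
    return out


triples = [
    ("/en/example_person", "/type/object/type", "/people/person"),
    ("/en/example_person", "/type/object/name", "Example Person"),
    ("/en/example_person", "/people/person/date_of_birth", "1970-01-01"),
]
for t in to_foaf(triples):
    print(t)
```

Only the type and the name survive the translation. Each additional property needs its own mapping decision, which suggests why such work could be slow and easily deprioritized.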
In the future, the company also hopes to generate profit by organizing proprietary data.
Criticism
Lack of notability guidelines: Unlike Wikipedia, Freebase has no notability guidelines. Instead, it permits any data that might be of interest to other people; it does not permit transient data or data of only personal interest. Under these guidelines, commercial content is permitted if it is structured, factual data. Because of this, some have raised concerns about spam.
Denormalisation: A type or base created on Freebase cannot be edited by anyone but its creator. This policy is meant to prevent inexperienced or malevolent users from breaking schemas. As a result, a half-complete schema cannot be improved by other users and must instead be reproduced completely, producing non-cooperative and often duplicate types.
Information of absence: Freebase has no way to represent null, nothing, unknown, or N/A values. The "None" topic is badly broken, as shown by the many people who appear to share it as their spouse. As it stands, someone looking for "fires of unknown cause" can only look for fires with no recorded cause, without knowing whether the cause is genuinely unknown or the data is simply missing.
Bulk import tools: Such tools are used internally at Metaweb, but the reconciliation process for imported data has so far proved too complicated for public release, and the public bulk tools are very limited.
Multilingual implementation: Freebase has translations (or translation support) for many of its topics, but its types are currently implemented (or at least described) in English, which makes it harder to develop a universal schema.
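The information-of-absence problem comes down to conflating two states. One is a value that is known to be unknown; the other is a value that nobody has entered. One hypothetical way to keep them apart, not something Freebase offered, is an explicit marker value. The records below are illustrative.

```python
from enum import Enum


class Absent(Enum):
    """Explicit markers for values someone has deliberately recorded as absent."""
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "n/a"


# Illustrative records: fire_c simply has no cause entered yet.
fires = {
    "fire_a": {"cause": "lightning"},
    "fire_b": {"cause": Absent.UNKNOWN},
    "fire_c": {},
}


def status(record: dict, prop: str) -> str:
    if prop not in record:
        return "missing"            # no one has entered anything
    value = record[prop]
    if isinstance(value, Absent):
        return value.value          # "unknown" or "n/a", asserted on purpose
    return "known"


unknown_cause = [k for k, r in fires.items() if status(r, "cause") == "unknown"]
missing_cause = [k for k, r in fires.items() if status(r, "cause") == "missing"]
print(unknown_cause, missing_cause)  # ['fire_b'] ['fire_c']
```

With a dedicated marker, a search for "fires of unknown cause" returns only `fire_b`. Pointing a property at an ordinary topic named "None" cannot make that distinction, and it also makes every such record appear linked to the same entity.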
Popular applications
- Google Refine http://code.google.com/p/freebase-gridworks/ - a power tool for data cleaning and discovery.
- Powerset http://powerset.com/ - a semantic search engine that searched Freebase for answers to natural-language questions (purchased by Microsoft and used in their Bing search engine)
- Freebase genealogy - family-tree viewer
- FMDb - a Freebase IMDB
- Freebase sets - a clone of Google sets using Freebase data
- Freebase Schema Explorer - a visualiser for Freebase's ontologies
- Thinkbase - a visual graph-based exploration tool
- Seevl - a music recommendation engine based on relationships from Freebase and DBpedia
- Ranker - lets users build "top N" lists based on types and entities from Freebase
- Ookaboo - Creative Commons images that illustrate more than 500,000 topics from Freebase
See also
- DBpedia
- YAGO
- Cyc
- True Knowledge
- Semantic Web
- Entity-relationship model