Open science data
Open science data is a type of open data focussed on making the observations and results of scientific activities available for anyone to analyze and reuse. While the idea of open science data has been actively promoted since the 1950s, the rise of the Internet has significantly lowered the cost and time required to publish or obtain data.
History
The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957-1958. The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.
In 1995 GCDIS (US) put its position clearly in On the Full and Open Exchange of Scientific Data (a publication of the Committee on Geophysical and Environmental Data, National Research Council):
"The Earth's atmosphere, oceans, and biosphere form an integrated system that transcends national boundaries. To understand the elements of the system, the way they interact, and how they have changed with time, it is necessary to collect and analyze environmental data from all parts of the world. Studies of the global environment require international collaboration for many reasons:
- to address global issues, it is essential to have global data sets and products derived from these data sets;
- it is more efficient and cost-effective for each nation to share its data and information than to collect everything it needs independently; and
- the implementation of effective policies addressing issues of the global environment requires the involvement from the outset of nearly all nations of the world.
International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)."
The last phrase highlights the traditional cost of disseminating information by print and post. The Internet has largely removed this cost, making data far easier to disseminate. It has also become correspondingly cheaper to create, sell and control many data resources, which has led to the current concerns over non-open data.
More recent uses of the term include:
- SAFARI 2000 (South Africa, 2001) used a license informed by ICSU and NASA policies
- the human genome (Kent, 2002)
- An Open Data Consortium on geospatial data (2003)
- Manifesto for Open Chemistry (Murray-Rust and Rzepa, 2004)
- Presentations to JISC and OAI under the title "open data" (Murray-Rust, 2005)
- Science Commons launch (2004)
- First Open Knowledge Forums (London, UK) run by the Open Knowledge Foundation on open data in relation to civic information and geodata (February and April 2005)
- The Blue Obelisk group in chemistry (mantra: Open Data, Open Source, Open Standards) (2005)
- The Petition for Open Data in Crystallography, launched by the Crystallography Open Database Advisory Board (2005)
- XML Conference & Exposition 2005 (Connolly 2005)
- SPARC Open Data mailing list (2005)
- First draft of the Open Knowledge Definition explicitly references "Open Data" (2005)
- XTech (Dumbill, 2005), (Bray and O'Reilly 2006)
In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly-funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.
In 2005 Edd Dumbill introduced an "Open Data" theme in XTech, including:
- Open government.
- Public web services.
- Grassroots data.
- Scientific and academic publishing.
- Intellectual property.
- Blogging and personal content.
- Semantic Web.
In 2006 Science Commons ran a 2-day conference in Washington where the primary topic could be described as Open Data. It was reported that the amount of micro-protection of data (e.g. by license) in areas such as biotechnology was creating a tragedy of the anticommons, in which the costs of obtaining licenses from a large number of owners made it uneconomic to do research in the area.
In 2007 SPARC and Science Commons announced a consolidation and enhancement of their author addenda.
In 2010 the Panton Principles launched, advocating Open Data in science and setting out four principles with which providers must comply for their data to be Open.
Relation to open access
Much data is made available through scholarly publication, which now attracts intense debate under "Open Access". The Budapest Open Access Initiative (2001) coined this term:
By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
The logic of the declaration permits re-use of the data although the term "literature" has connotations of human-readable text and can imply a scholarly publication process. In Open Access discourse the term "full-text" is often used which does not emphasize the data contained within or accompanying the publication.
Some Open Access publishers do not require the authors to assign copyright and the data associated with these publications can normally be regarded as Open Data. Some publishers have Open Access strategies where the publisher requires assignment of the copyright and where it is unclear that the data in publications can be truly regarded as Open Data.
The ALPSP and STM publishers have issued a statement about the desirability of making data freely available:
Publishers recognise that in many disciplines data itself, in various forms, is now a key output of research. Data searching and mining tools permit increasingly sophisticated use of raw data. Of course, journal articles provide one ‘view’ of the significance and interpretation of that data – and conference presentations and informal exchanges may provide other ‘views’ – but data itself is an increasingly important community resource. Science is best advanced by allowing as many scientists as possible to have access to as much prior data as possible; this avoids costly repetition of work, and allows creative new integration and reworking of existing data.
and
We believe that, as a general principle, data sets, the raw data outputs of research, and sets or sub-sets of that data which are submitted with a paper to a journal, should wherever possible be made freely accessible to other scholars. We believe that the best practice for scholarly journal publishers is to separate supporting data from the article itself, and not to require any transfer of or ownership in such data or data sets as a condition of publication of the article in question.
However, this statement has had no effect on the open availability of primary data related to publications in journals of ALPSP and STM members. Data tables provided by authors as supplements to a paper are still available to subscribers only.
See also
- Open data
- CODATA (Committee on Data for Science and Technology)
- Science Commons
- OpenWetWare
- Open Connectome Project
External links
- Research Data Canada
- Open Data In Science article (P Murray-Rust)