WebCite - AbsoluteAstronomy.com

WebCite is a service that archives web pages on demand. Authors can subsequently cite the archived web pages through WebCite, in addition to citing the original URL of the web page. Readers are able to retrieve the archived web pages indefinitely, without regard to whether the original web page is revised or removed (so-called link rot). Such archiving is especially important in the academic context. WebCite is a non-profit consortium supported by publishers and editors, and it can be used by individual authors and readers without charge. It is a member of the International Internet Preservation Consortium.

Rather than relying on a web crawler, which archives pages in a "random" fashion, WebCite lets users who want to cite web pages in a scholarly article initiate the archiving process themselves. They then cite, instead of or in addition to the original URL, a WebCite address with an identifier that specifies a snapshot of the contents of the particular page they meant to cite.

One may archive all types of web content, including HTML web pages, PDF files, style sheets, JavaScript and digital images. WebCite also archives metadata about the collected resources such as access time, MIME type, and content length. This metadata is useful in establishing the authenticity and provenance of the archived collection.

History

Conceived in 1997 by Gunther Eysenbach, WebCite was publicly described the following year when an article on Internet quality control declared that such a service could also measure the citation impact of web pages. In the same year, a pilot service was set up at the address webcite.net. Shortly thereafter, Google and the Internet Archive entered the market, seemingly reducing the need for a service like WebCite.

The WebCite idea was revived in 2003, when a study published in the journal Science concluded that no appropriate and agreed-on archiving solution yet existed for publishing. Neither the Internet Archive nor Google allows for “on-demand” archiving by authors, and they do not have interfaces to scholarly journals and publishers to automate the archiving of cited links. By 2008, over 200 journals had begun routinely using WebCite.

WebCite is a member of the International Internet Preservation Consortium. It "feeds its content" to other digital preservation projects, including the Internet Archive. Lawrence Lessig, an American academic who writes extensively on copyright and technology, used WebCite in his amicus brief in the United States Supreme Court case of MGM Studios, Inc. v. Grokster, Ltd.

Copyright issues

WebCite maintains the legal position that its archiving activities are allowed by the copyright doctrines of fair use and implied license. To support the fair use argument, WebCite notes that its archived copies are transformative, socially valuable for academic research, and not harmful to the market value of any copyrighted work. WebCite argues that caching and archiving web pages is not considered a copyright infringement when the archiver offers the copyright owner an opportunity to "opt-out" of the archive system, thus creating an implied license. To that end, WebCite will not archive Web sites in violation of "do-not-cache" and "no-archive" metadata, as well as robot exclusion standards, the absence of which creates an "implied license" for web archive services to preserve the content.

In a similar case involving Google's web caching activities, on January 19, 2006, the United States District Court for the District of Nevada agreed with that argument in the case of Field v. Google (CV-S-04-0413-RCJ-LRL), holding that fair use and an "implied license" meant that Google's caching of Web pages did not constitute copyright violation. The "implied license" referred to general Internet standards.

Process

WebCite allows on-demand prospective archiving. It is not crawler-based; pages are only archived if the citing author or publisher requests it. No cached copy will appear in a WebCite search unless the author or another person has specifically cached it beforehand.

To initiate the caching and archiving of a page, an author may use WebCite's "archive" menu option or create a WebCite bookmarklet that will allow web surfers to cache pages just by clicking a button in their bookmarks folder.

One can retrieve or cite archived pages through a transparent format such as

http://webcitation.org/query?url=URL&date=DATE

where URL is the URL that was archived, and DATE indicates the caching date. For example,

http://webcitation.org/query?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMain_Page&date=2008-03-04

or the alternate short form http://webcitation.org/5W56XTY5h
retrieves an archived copy of the URL http://en.wikipedia.org/wiki/Main_Page that is closest to the date of March 4, 2008.
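The query format described above can be assembled programmatically. The following is a minimal Python sketch, not WebCite's own code: the function name `webcite_query_url` is hypothetical, and the only thing taken from the text is the `query?url=URL&date=DATE` pattern with a percent-encoded URL.

```python
from urllib.parse import quote

# Base address taken from the query format shown in the text.
WEBCITE_QUERY = "http://webcitation.org/query"


def webcite_query_url(url: str, date: str) -> str:
    """Build a lookup address for the snapshot of `url` closest to `date` (YYYY-MM-DD).

    Hypothetical helper for illustration only.
    """
    # safe="" also percent-encodes ":" and "/", as in the example in the text.
    return f"{WEBCITE_QUERY}?url={quote(url, safe='')}&date={quote(date, safe='')}"


print(webcite_query_url("http://en.wikipedia.org/wiki/Main_Page", "2008-03-04"))
```

For the Wikipedia Main Page and the date 2008-03-04, this reproduces the long-form address given in the example above.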

WebCite does not work for pages that contain a no-cache tag: it respects the author's request not to have their web page cached.
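As a rough illustration of how an archiver might detect such an opt-out before caching a page, the Python sketch below scans a page's `meta` tags. The text does not specify which tags WebCite checks, so everything here is an assumption: the token list, the choice of the `robots`, `Pragma` and `Cache-Control` meta tags, and the names `OptOutScanner` and `page_opts_out` are all hypothetical.

```python
from html.parser import HTMLParser

# Assumed opt-out tokens; the text names these only informally.
OPT_OUT_TOKENS = {"noarchive", "no-archive", "nocache", "no-cache", "do-not-cache"}
# Assumed meta tags that may carry an opt-out.
META_NAMES = {"robots"}
META_HTTP_EQUIV = {"pragma", "cache-control"}


class OptOutScanner(HTMLParser):
    """Sets `opted_out` when a relevant <meta> tag carries an opt-out token."""

    def __init__(self):
        super().__init__()
        self.opted_out = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "").lower() for name, value in attrs}
        relevant = (attr.get("name") in META_NAMES
                    or attr.get("http-equiv") in META_HTTP_EQUIV)
        if not relevant:
            return
        tokens = {t.strip() for t in attr.get("content", "").split(",")}
        if tokens & OPT_OUT_TOKENS:
            self.opted_out = True


def page_opts_out(html: str) -> bool:
    """Return True if the page markup asks not to be cached or archived."""
    scanner = OptOutScanner()
    scanner.feed(html)
    return scanner.opted_out


print(page_opts_out('<meta name="robots" content="noindex, noarchive">'))
```

A real archiver would also need to consult the site's robot exclusion file; this sketch covers page-level metadata only.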

You can archive a page simply by loading a link like this in your browser:

http://webcitation.org/archive?url=urltoarchive&email=yourmail
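An archive request link of this form can be built the same way. This Python sketch is illustrative only: the helper name `webcite_archive_url` is hypothetical, and that the service accepts percent-encoded `url` and `email` parameters in exactly this form is an assumption based on the pattern above. The code only constructs the address and makes no network request.

```python
from urllib.parse import urlencode

# Base address taken from the archive link pattern shown in the text.
WEBCITE_ARCHIVE = "http://webcitation.org/archive"


def webcite_archive_url(url_to_archive: str, email: str) -> str:
    """Build an archive request link (hypothetical helper, for illustration)."""
    # urlencode percent-encodes reserved characters such as ":", "/" and "@".
    params = urlencode({"url": url_to_archive, "email": email})
    return f"{WEBCITE_ARCHIVE}?{params}"


print(webcite_archive_url("http://example.org/paper.html", "author@example.org"))
```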

June–July 2009 Outages

In June 2009, attempts to create new citations failed. The project's creator wrote on June 19 that increased server load generated by Wikipedia had prompted migration of the service to a new server. By the end of June 2009, attempts to access the project's website returned a message that it was "undergoing maintenance", and previously archived links became inaccessible. The archiving service resumed operation by the second week of July 2009, and previously archived links became accessible again.

September 2011 Outage

On September 4, 2011, Gunther Eysenbach, owner of the "webcitation.org" domain, was notified that not only were new citation requests producing errors, but previous citations were failing as well. Subsequently, the following message appeared on various WebCite pages:

Sep 6, 2011 - On Sep 3rd (just before the long labor day weekend), WebCite went down due to a hardware failure. While we are restoring the database from our backups, no new snapshots can be made, and old snapshots may be temporarily unavailable. We apologize for any inconvenience caused.

On September 11, 2011, at 7:38 pm EST, WebCite appeared to be up and running again:

Sep 11, 2011 - We apologize for the recent outage following the week of Sep 3rd, 2011. WebCite went down due to a hardware failure, and restoring our huge database took a couple of days. Everything should be back to normal. We apologize for any inconvenience caused.

Business model

The term WebCite is a registered trademark. WebCite does not charge individual users, journal editors, or publishers any fee to use its service. WebCite earns revenue from publishers who want to "have their publications analyzed and cited webreferences archived", and accepts donations. Early support was from the University of Toronto
