SCIgen
SCIgen is a program created by scientists at the Massachusetts Institute of Technology that randomly generates nonsense in the form of computer science research papers, including graphs, diagrams, and citations. It uses a context-free grammar to form all elements of the papers, and its authors state that their aim is "to maximize amusement, rather than coherence."
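The idea of generating text from a context-free grammar can be illustrated with a minimal sketch. The grammar, the symbol names, and the functions `expand` and `generate_sentence` below are hypothetical and far smaller than anything SCIgen actually uses; the sketch only shows the general technique of recursively replacing each nonterminal with a randomly chosen production.

```python
import random

# Hypothetical toy grammar (not SCIgen's): each nonterminal maps to a list
# of alternative productions. A production is a list of symbols; any symbol
# that is not a key of GRAMMAR is treated as a terminal and emitted as-is.
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "can be made", "ADJ", "."],
        ["In this paper, we", "VERB", "NOUN_PHRASE", "."],
    ],
    "VERB": [["demonstrate"], ["argue"], ["confirm"]],
    "NOUN_PHRASE": [["the", "ADJ", "NOUN"], ["NOUN"]],
    "ADJ": [["scalable"], ["probabilistic"], ["redundant"]],
    "NOUN": [["access points"], ["methodologies"], ["redundancy"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal strings."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to rewrite
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one sentence; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    text = " ".join(expand("SENTENCE", rng))
    return text.replace(" .", ".")  # attach the final period to the last word


if __name__ == "__main__":
    for i in range(3):
        print(generate_sentence(seed=i))
```

Because each rule rewrites a single nonterminal regardless of its surroundings, every sentence is grammatical, but nothing ties one random choice to the next, which is why the output reads as plausible-sounding nonsense.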
Sample output
Opening abstract of Rooter: A Methodology for the Typical Unification of Access Points and Redundancy:
Prominent results
In 2005 a paper generated by SCIgen, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted as a "non-reviewed" paper to the 2005 World Multiconference on Systemics, Cybernetics and Informatics (WMSCI), and the authors were invited to speak. The authors of SCIgen described their hoax on their website, and it soon received great publicity when picked up by Slashdot.
WMSCI withdrew their invitation, but the SCIgen team went anyway, renting space in the hotel separately from the conference and delivering a series of randomly generated talks on their own "track." The organizer of these conferences is Professor Nagib Callaos. The WMSCI was also sponsored by the Institute of Electrical and Electronics Engineers (IEEE) from 2000 until 2005. The IEEE stopped granting sponsorship to Callaos in 2006, although Callaos received IEEE sponsorship again in 2008.
Submitting the paper was a deliberate attempt to embarrass WMSCI, which the authors claim accepts low-quality papers and sends unsolicited requests for submissions in bulk to academics. As the SCIgen website states:
Computing writer Stan Kelly-Bootle noted in ACM Queue that many sentences in the "Rooter" paper were individually plausible, which he regarded as posing a problem for automated detection of hoax articles. He suggested that even human readers might be taken in by the effective use of jargon ("The pun on root/router is par for MIT-graduate humor, and at least one occurrence of methodology is mandatory") and attribute the paper's apparent incoherence to their own limited knowledge. His conclusion was that "a reliable gibberish filter requires a careful holistic review by several peer domain experts".
List of works with notable acceptance
- Rob Thomas: Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, 2005 for WMSCI (see above)
- Mathias Uslar's paper was accepted to the IPSI-BG conference.
- Professor Genco Gülan published a paper in the 3rd International Symposium of Interactive Media Design.
- Students at Iran's Sharif University of Technology published a paper in the Journal of Applied Mathematics and Computation (which is published by Elsevier). The students wrote under the false, non-Persian surname MosallahNejad, which translates literally as "from an Armed Breed". The paper was subsequently removed when the publishers were informed that it was a joke paper.
- A paper titled "Towards the Simulation of E-Commerce" by Herbert Schlangemann was accepted as a reviewed paper at the "International Conference on Computer Science and Software Engineering" (CSSE) and was briefly in the IEEE Xplore database. The author is named after the Swedish short film Der Schlangemann. Furthermore, the author was invited to be a session chair during the conference. Details are given on the official Herbert Schlangemann blog. The official review comment: "This paper presents cooperative technology and classical Communication. In conclusion, the result shows that though the much-touted amphibious algorithm for the refinement of randomized algorithms is impossible, the well-known client-server algorithm for the analysis of voice-over-IP by Kumar and Raman runs in _(n) time. The authors can clearly identify important features of visualization of DHTs and analyze them insightfully. It is recommended that the authors should develop ideas more cogently, organizes them more logically, and connects them with clear transitions"
- Mikhail Gelfand published a translation of the "Rooter" article in the Russian-language Journal of Scientific Publications of Aspirants and Doctorants in August 2008. Gelfand was protesting against the journal, which was apparently not peer reviewed and was being used by Russian PhD candidates to publish in an "accredited" scientific journal, charging them 4000 rubles to do so. The accreditation was revoked two weeks later.
Spoofing Google Scholar and h-index calculators
A 2010 paper by Cyril Labbé from Grenoble University (Joseph Fourier University) demonstrated the vulnerability of h-index calculations based on Google Scholar output by feeding it a large set of SCIgen-generated documents that cited each other (effectively an academic link farm). Using this method, the author managed to rank "Ike Antkare" ahead of Albert Einstein, for instance.
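Why mutual citation inflates the h-index can be seen from the definition: an author has index h if h of their papers have at least h citations each. The following is a minimal sketch, not a reconstruction of the 2010 experiment. The functions `h_index` and `link_farm_counts` are hypothetical, and the fully connected farm (every paper cites every other paper) is a simplifying assumption.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


def link_farm_counts(n_papers):
    """Citation counts for a hypothetical farm where every paper cites all others.

    Assumption for illustration: the farm is fully connected, so each of the
    n_papers documents receives n_papers - 1 citations.
    """
    return [n_papers - 1] * n_papers


if __name__ == "__main__":
    # An ordinary publication record: a few well-cited papers.
    print(h_index([10, 8, 5, 4, 3]))
    # A fabricated author with 100 generated papers citing each other.
    print(h_index(link_farm_counts(100)))
```

Under this assumption the index grows linearly with the number of generated documents (n papers give an h-index of n - 1), so an index computed from unvetted citation data can be raised simply by generating more papers.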
See also
- Derailment (thought disorder)
- Infinite monkey theorem
- Postmodernism Generator
- snarXiv
- Sokal affair
- Turing test