Inauthentic Text
An inauthentic text is a computer-generated expository document that is meant to appear genuine but is actually meaningless. Such texts are frequently created to be intermixed with genuine documents and thus manipulate the results of search engines, as with spam blogs. They are also carried along in email in order to fool spam filters by giving the spam the superficial characteristics of legitimate text.
Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated press or Flarf poetry. They have also been used to challenge the credibility of a publication: MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. This led the students to claim that the bar for submissions was too low.
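The kind of computer-assisted nonsense described above can be illustrated with a word-level Markov chain. The sketch below is only an approximation of the general idea; it is not the Emacs Dissociated press implementation and not SCIgen, and the function names build_chain and generate, the order parameter, and the sample sentence are assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    if order < 1:
        raise ValueError("order must be at least 1")
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain at random and return `length` words (empty string if no chain)."""
    rng = random.Random(seed)
    if not chain:
        return ""
    states = sorted(chain)  # fixed ordering so a given seed is reproducible
    state = rng.choice(states)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this word run only occurs at the end of the source):
            # restart from a randomly chosen known state.
            state = rng.choice(states)
            output.extend(state)
            continue
        nxt = rng.choice(followers)
        output.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(output[:length])


if __name__ == "__main__":
    # Hypothetical source text; any longer genuine document works better.
    source = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_chain(source, order=1), length=12, seed=0))
```

Each output word is locally plausible because it followed the preceding words somewhere in the source, yet the text as a whole carries no intended meaning, which is what makes such output superficially resemble legitimate prose.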
With the amount of computer-generated text outpacing the ability of humans to curate it, some means of distinguishing computer-generated text from genuine text is needed. Yet automated approaches to determining absolutely whether a text is authentic face intrinsic challenges of semantics. Noam Chomsky coined the phrase "Colorless green ideas sleep furiously" as an example of a grammatically correct but semantically incoherent sentence; some point out that in certain contexts one could give this sentence (or any phrase) meaning.
The first group to use the expression in this sense was at Indiana University (see External links below). Their work explains in detail an attempt to detect inauthentic texts and to identify the pernicious problems such texts cause in cyberspace. The site provides a means of submitting text and assesses, based on supervised learning, whether a corpus is inauthentic. Many users have submitted incorrect types of data and have commented on the resulting scores. The application is meant for a specific kind of data; submitting, say, an email will therefore not return a meaningful score.
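As a rough illustration of what assessment "based on supervised learning" means in general, the sketch below learns one average feature vector (centroid) per label from labelled examples and assigns a new text to the nearest centroid. Everything in it is an assumption made for the example: the two features (type-token ratio and mean word length), the nearest-centroid method, the function names, and the toy training strings. It says nothing about the features or model the Indiana University detector actually uses.

```python
import math


def features(text):
    """Return two illustrative features: type-token ratio and mean word length."""
    words = text.lower().split()
    if not words:
        return (0.0, 0.0)
    type_token_ratio = len(set(words)) / len(words)
    mean_word_length = sum(len(w) for w in words) / len(words)
    return (type_token_ratio, mean_word_length)


def train(labelled_texts):
    """Average the feature vectors of each label to get one centroid per label."""
    sums, counts = {}, {}
    for text, label in labelled_texts:
        f = features(text)
        prev = sums.get(label, (0.0, 0.0))
        sums[label] = (prev[0] + f[0], prev[1] + f[1])
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest to the text's features."""
    f = features(text)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


if __name__ == "__main__":
    # Toy, invented training data; a real detector needs a large labelled corpus
    # of the specific kind of document it is meant to score.
    training = [
        ("a a a a", "inauthentic"),
        ("a b c d", "authentic"),
    ]
    model = train(training)
    print(predict(model, "x x x x"))
```

The sketch also shows why the kind of input matters: a model trained on one kind of document has centroids that reflect only that kind, so a text of a different kind is still assigned to whichever centroid happens to be nearest, and the result is not meaningful.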
External links
- An Inauthentic Paper Detector from Indiana University School of Informatics