Wordfilter
A wordfilter is a script, typically used on Internet forums or chat rooms, that automatically scans users' posts or comments as they are submitted and automatically changes or censors particular words or phrases.
The most primitive wordfilters search only for a specific string and replace it regardless of context. More advanced wordfilters distinguish between a word and other words that merely contain the same letters, for example filtering "ass" but not "grass". The most advanced wordfilters may use regular expressions.
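The three tiers described above can be contrasted in a few lines of Python. This is a minimal sketch, not any particular forum's code; the banned-word list and the function names `naive_filter`, `word_filter` and `regex_filter` are hypothetical.

```python
import re

BANNED = ["ass"]  # hypothetical banned-word list, for illustration only


def naive_filter(text: str) -> str:
    """Primitive filter: mask every exact (case-sensitive) occurrence, wherever it appears."""
    for word in BANNED:
        text = text.replace(word, "*" * len(word))
    return text


def word_filter(text: str) -> str:
    """Whole-word filter: mask the word only where it stands on its own."""
    for word in BANNED:
        # \b marks a word boundary, so the "ass" inside "grass" is not matched
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        text = pattern.sub("*" * len(word), text)
    return text


def regex_filter(text: str) -> str:
    """Regex filter: one pattern also covers stretched spellings and the plural."""
    # a+ and s+ allow "aaass" or "assss"; the optional (?:es) also matches "asses"
    pattern = re.compile(r"\ba+s+s+(?:es)?\b", re.IGNORECASE)
    return pattern.sub(lambda m: "*" * len(m.group()), text)


print(naive_filter("Keep off the grass, you ass"))  # Keep off the gr***, you ***
print(word_filter("Keep off the grass, you ass"))   # Keep off the grass, you ***
print(regex_filter("You aaass, you asses"))         # You *****, you *****
```

The naive version damages "grass", the whole-word version leaves it alone, and the regular expression catches variant spellings that a fixed list of strings would miss.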
Removal of vulgar language
Most commonly, wordfilters are used to censor language considered inappropriate by the operators of the forum or chat room. Expletives are typically partially replaced ('f*ck'), completely replaced ('****'), or replaced by nonsense words ('fark'). This relieves administrators and moderators of the task of constantly patrolling the board for such language. It may also help the message board avoid being blocked by content-control software installed on users' computers or networks, since such software often blocks access to Web pages that contain vulgar language.
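The three replacement styles differ only in what is substituted for a match, so one matching routine can serve all of them. A minimal Python sketch, assuming whole-word, case-insensitive matching; the helper names `censor`, `partial`, `complete` and `nonsense` are hypothetical.

```python
import re
from typing import Callable


def censor(text: str, word: str, replace: Callable[[str], str]) -> str:
    """Find whole-word, case-insensitive matches of `word` and pass each to `replace`."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: replace(m.group()), text)


def partial(match: str) -> str:
    # Mask only the vowels: "fuck" -> "f*ck"
    return re.sub(r"[aeiou]", "*", match, flags=re.IGNORECASE)


def complete(match: str) -> str:
    # Mask every character: "fuck" -> "****"
    return "*" * len(match)


def nonsense(substitute: str) -> Callable[[str], str]:
    # Swap in a fixed harmless word: "fuck" -> "fark"
    return lambda match: substitute


post = "what the fuck"
print(censor(post, "fuck", partial))           # what the f*ck
print(censor(post, "fuck", complete))          # what the ****
print(censor(post, "fuck", nonsense("fark")))  # what the fark
```

Passing the replacement in as a function keeps the matching rules in one place, so an administrator can change the censoring style without touching the word list.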
Filtered phrases may be replaced permanently when the post is saved (for example, in phpBB 1.x), or the original phrase may be saved and only displayed as the censored text. In some software, users can view the text behind the wordfilter by quoting the post.
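The difference between filtering on save and filtering on display, and why quoting can reveal the original wording, can be sketched as follows. This is a hypothetical model, not phpBB's code: the `Post` class, the one-word `censor` function and the `[quote]` format are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


def censor(text: str) -> str:
    # Stand-in wordfilter with a one-word list; a real board would use a configurable list
    return re.sub(r"\bfuck\b", "****", text, flags=re.IGNORECASE)


@dataclass
class Post:
    stored_text: str  # what actually gets written to the database


def save_filtered(raw: str) -> Post:
    """Filter on save: the original wording is lost for good."""
    return Post(censor(raw))


def save_original(raw: str) -> Post:
    """Keep the original; censor only when the post is displayed."""
    return Post(raw)


def display(post: Post) -> str:
    return censor(post.stored_text)


def quote(post: Post) -> str:
    # A quote button that copies the stored text unfiltered into the reply box
    return f"[quote]{post.stored_text}[/quote]"


a = save_filtered("what the fuck")
b = save_original("what the fuck")
print(display(a), "|", display(b))  # what the **** | what the ****
print(quote(a))                     # [quote]what the ****[/quote]
print(quote(b))                     # [quote]what the fuck[/quote]
```

Both posts look identical on the page; only the second still holds the original text, which a quote feature that reads the stored value will expose.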
Cliché control
Clichés (particular words or phrases constantly reused in posts, also known as "memes") often develop on forums. Some users find that these clichés add to the fun, but others find them tedious, especially when overused. Administrators may configure the wordfilter to replace an annoying cliché with a more embarrassing phrase, or to remove it altogether.
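Clichés are usually multi-word phrases, so the filter has to tolerate variations in spacing and capitalization. A minimal Python sketch; the `CLICHE_RULES` table and its replacement text are hypothetical examples, with an empty string meaning "remove".

```python
import re

# Hypothetical cliché rules: phrase -> replacement; "" removes the phrase entirely
CLICHE_RULES = {
    "all your base": "I like ponies",
    "first post": "",
}


def phrase_pattern(phrase: str) -> re.Pattern:
    # Allow any run of whitespace between the words, so "all  your\nbase" still matches
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def filter_cliches(text: str) -> str:
    for phrase, replacement in CLICHE_RULES.items():
        text = phrase_pattern(phrase).sub(lambda m: replacement, text)
    # Collapse runs of spaces (e.g. ones left behind by a removal) and trim the ends
    return re.sub(r" {2,}", " ", text).strip()


print(filter_cliches("ALL YOUR   BASE are belong to us"))  # I like ponies are belong to us
print(filter_cliches("Nice article. First post"))          # Nice article.
```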
Vandalism control
Internet forums are sometimes attacked by vandals who try to fill the forum with repeated nonsense messages, or by spammers who try to insert links to their commercial websites. The site's wordfilter may be configured to remove the nonsense text used by the vandals, or to remove all links to particular websites from posts.
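Removing links to particular websites amounts to finding URLs in a post and dropping those whose host is on a blocklist. A minimal Python sketch; the domains in `BLOCKED_DOMAINS` are hypothetical placeholders, and the simple `https?://\S+` pattern is an assumption that will not catch every way of writing a link.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; a real site would load this from its admin settings
BLOCKED_DOMAINS = {"spam.example", "cheap-pills.example"}

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    host = urlparse(url).hostname or ""  # hostname is returned lowercased
    # Match the domain and its subdomains (www.spam.example), but not notspam.example
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def strip_blocked_links(text: str) -> str:
    return URL_RE.sub(lambda m: "" if is_blocked(m.group()) else m.group(), text)


post = "Deals at https://www.spam.example/buy and info at https://en.wikipedia.org/wiki/Spam"
print(strip_blocked_links(post))
# Deals at  and info at https://en.wikipedia.org/wiki/Spam
```

Comparing parsed hostnames rather than searching for the domain as a substring avoids blocking unrelated sites whose names happen to end in the same letters.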
Lameness filter
Lameness filters are text-based wordfilters used by Slash-based websites to stop junk comments from being posted in response to stories. Some of the things they are designed to filter include:
- Too many capital letters
- Too much repetition
- ASCII art
- Comments which are too short or long
- Use of HTML tags that try to break web pages
- Comment titles consisting solely of "first post"
- Any occurrence of the word "gay" or other terms deemed (by the programmers) to be offensive/vulgar
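Several of the checks in the list above reduce to simple text heuristics. The Python sketch below approximates the capitals, repetition, ASCII art, length, HTML and "first post" checks; the word-list check works like an ordinary wordfilter and is omitted. All thresholds and the disallowed-tag list are hypothetical, not Slash's actual settings.

```python
import re

# All thresholds are hypothetical; they are not Slash's actual settings
MAX_CAPS_RATIO = 0.5      # share of letters that may be upper case
MAX_REPEAT_RATIO = 0.5    # share of words that may be the single most common word
MAX_SYMBOL_RATIO = 0.3    # share of characters that may be punctuation or symbols
MIN_LENGTH, MAX_LENGTH = 10, 4000
BAD_TAGS = re.compile(r"<\s*/?\s*(script|iframe|table)\b", re.IGNORECASE)


def lameness_reasons(title: str, body: str) -> list[str]:
    """Return the reasons a comment looks 'lame'; an empty list means it passes."""
    reasons = []
    letters = [c for c in body if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > MAX_CAPS_RATIO:
        reasons.append("too many capital letters")
    words = body.lower().split()
    if len(words) >= 4 and max(words.count(w) for w in set(words)) / len(words) > MAX_REPEAT_RATIO:
        reasons.append("too much repetition")
    symbols = [c for c in body if not c.isalnum() and not c.isspace()]
    if body and len(symbols) / len(body) > MAX_SYMBOL_RATIO:
        reasons.append("looks like ASCII art")
    if not MIN_LENGTH <= len(body) <= MAX_LENGTH:
        reasons.append("too short or too long")
    if BAD_TAGS.search(body):
        reasons.append("disallowed HTML tag")
    if title.strip().lower() == "first post":
        reasons.append("'first post' title")
    return reasons


print(lameness_reasons("first post", "This is a perfectly ordinary comment."))
# ["'first post' title"]
print(lameness_reasons("Hi", "spam spam spam spam spam eggs"))
# ['too much repetition']
```

Returning a list of reasons rather than a single yes/no lets the site tell the poster which rule was triggered.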
Circumventing filters
Since wordfilters are automated and look only for particular sequences of characters, users aware of the filters will sometimes try to circumvent them by changing their lettering just enough to avoid detection. A user trying to avoid a vulgarity filter might type "shi-" instead of "shit", for example. Some administrators respond by revising the wordfilters to catch common substitutions; others make filter evasion a punishable offense in its own right. A simple example of evading a wordfilter would be entering "f.uck" instead of "fuck", or using leet. More advanced evasion techniques include using images, inserting hidden tags (such as fu[i][/i]ck), or substituting Cyrillic characters that look like Latin letters.
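A filter can undo some of these lettering tricks by normalizing the text before matching. The Python sketch below handles three of them: punctuation inside a word, empty BBCode tag pairs, and a handful of Cyrillic lookalikes. The homoglyph table is a small hypothetical sample, and text rendered as an image is beyond the reach of any text-based filter.

```python
import re

# A few Cyrillic letters that look like Latin ones (far from an exhaustive list)
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0443": "y",  # Cyrillic у
    "\u0445": "x",  # Cyrillic х
})
EMPTY_TAGS = re.compile(r"\[(\w+)\]\[/\1\]")        # empty BBCode pairs like [i][/i]
INNER_PUNCT = re.compile(r"(?<=\w)[^\w\s]+(?=\w)")  # punctuation inside a word, like f.uck


def normalize(text: str) -> str:
    text = EMPTY_TAGS.sub("", text)  # remove empty tags before their brackets are stripped
    text = text.translate(HOMOGLYPHS)
    text = INNER_PUNCT.sub("", text)
    return text.lower()


def contains_banned(text: str, banned: set[str]) -> bool:
    cleaned = normalize(text)
    return any(re.search(rf"\b{re.escape(w)}\b", cleaned) for w in banned)


banned = {"fuck", "shit"}
print(contains_banned("f.uck", banned))        # True
print(contains_banned("fu[i][/i]ck", banned))  # True
print(contains_banned("fu\u0441k", banned))    # True (Cyrillic с in place of c)
print(contains_banned("shi-", banned))         # False: a truncated word still slips through
```

Because the inner-punctuation rule also joins legitimate words such as "well-known", a real filter would typically match against the normalized copy while storing and displaying the original text.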
Another method is to use a soft hyphen. A soft hyphen only marks where a word may be split when text is broken across lines, and is otherwise not displayed. Placing one in the middle of a word breaks up the character sequence, so in some cases the wordfilter will not recognise the word even though it looks unchanged to readers.
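The soft hyphen trick works because the hidden character is still part of the string that a naive filter compares. A minimal Python sketch of the problem and one possible countermeasure, assuming the filter may simply discard invisible "format" characters (Unicode general category Cf, which includes U+00AD SOFT HYPHEN) before matching:

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def strip_invisible(text: str) -> str:
    """Remove soft hyphens and other invisible format characters (Unicode category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


disguised = "sh" + SOFT_HYPHEN + "it"        # normally displayed as "shit"
print("shit" in disguised)                   # False: the plain string search misses it
print("shit" in strip_invisible(disguised))  # True once the soft hyphen is gone
```

Category Cf also contains characters with legitimate uses, such as the zero-width joiner in emoji sequences and some scripts, so this stripping is best applied to a copy used only for matching.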
Some more advanced filters, such as those in the online game RuneScape, can detect evasions such as "sh1t" instead of "shit". The downside of such sensitive wordfilters, however, is that legitimate phrases get filtered out as well.
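Both sides of this trade-off show up in a small experiment. The Python sketch below is not RuneScape's actual filter: it assumes a hypothetical leet table, removes every non-letter (including spaces), and then searches for the banned word anywhere in what remains.

```python
import re

# Hypothetical leet table: digits and symbols often typed in place of letters
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})


def aggressive_match(text: str, word: str) -> bool:
    """Undo leet, drop every non-letter (spaces included), then search anywhere."""
    squashed = re.sub(r"[^a-z]", "", text.lower().translate(LEET))
    return word in squashed


print(aggressive_match("sh1t", "shit"))          # True: the leet spelling is caught
print(aggressive_match("s h 1 t", "shit"))       # True: spacing it out does not help either
print(aggressive_match("Just push it", "shit"))  # True: a false positive, "pushit" contains "shit"
print(aggressive_match("Hello there", "shit"))   # False
```

The same normalization that catches "sh1t" also flags the innocent "push it", which is exactly the cost of a more sensitive filter.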
Censorship aspects
Wordfilters are coded into Internet forums or chat rooms and operate only on material submitted to the forum or chat room in question. This distinguishes wordfilters from content-control software, which is typically installed on an end user's PC or computer network and can filter all Internet content sent to or from that PC or network. Since wordfilters alter a user's words without his or her consent, some users consider them a form of censorship, while others consider them an acceptable part of a forum operator's right to control the contents of the forum.
Cultural significance
Some wordfilters originally implemented for their humorous value became Internet memes. One example was 4chan's wordfilter, which replaced "wapanese" (slang for a Westerner obsessed with Japanese culture) with the initially nonsensical "weeaboo", a word taken from a strip of the webcomic The Perry Bible Fellowship. "Weeaboo" then became a popular synonym for "wapanese"; as of November 2009, it had seven times as many Google hits as "wapanese".
False positives
The Scunthorpe problem (named after the English town of Scunthorpe, whose name contains such a string) occurs when a wordfilter, spam filter, or search engine blocks content because the text contains a string of letters that also spells an obscene word.