Internet linguistics
Internet linguistics is a sub-domain of linguistics advocated by David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human-computer interaction (HCI), which led to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts have acknowledged that linguistics has a contributing role to play in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. This will benefit both linguists and web users.
The study of Internet linguistics can be approached from four main perspectives: sociolinguistic, educational, stylistic and applied. Further dimensions have developed as a result of technological advances, including the development of the Web as a corpus and the spread and influence, through the mass media and literary works, of the stylistic variations brought forth by the Internet. In view of the increasing number of users connected to the Internet, the linguistic future of the Internet remains to be determined, as new computer-mediated technologies continue to emerge and people adapt their languages to suit these new mediums. The Internet continues to play a significant role in both encouraging and diverting attention away from the usage of languages.
The distinction between the Internet and the World Wide Web, or simply the Web, must be made clear. The Internet is a network of networks which connects computers worldwide, while the Web is one part of this network. The Web is the medium through which people access the information that travels over the Internet.
The evolution of these new mediums of communication has raised much concern with regard to the way language is being used. According to Crystal (2005), these concerns are neither without grounds nor unseen in history: they surface almost always when a new technological breakthrough influences language, as seen in the 15th century when printing was introduced, the 19th century when the telephone was invented and the 20th century when broadcasting began to penetrate our society.
At a personal level, CMC such as SMS text messaging and mobile emailing (push mail) has greatly enhanced instantaneous communication. Examples of devices that offer such services include the iPhone and the BlackBerry.
In schools, it is not uncommon for educators and students to be given personalized school email accounts for communication and interaction purposes. Classroom discussions are increasingly being brought onto the Internet in the form of discussion forums. For instance, at Nanyang Technological University, students engage in collaborative learning at the university’s portal, edveNTUre, where they participate in forum discussions, take online quizzes and view streaming podcasts prepared by their course instructors, among other activities. In 2008, iTunes U began to collaborate with universities, converting the Apple music service into a store that makes academic lectures and scholastic materials available for free. It has partnered with more than 600 institutions in 18 countries, including Oxford, Cambridge and Yale Universities.
These forms of academic social networking and media are expected to grow as educators from all over the world continue to seek new ways to better engage students. It is commonplace for students at New York University to interact with “guest speakers weighing in via Skype, library staffs providing support via instant messaging, and students accessing library resources from off campus.” This will affect the way language is used as students and teachers begin to use more of these CMC platforms.
At a professional level, it is common for companies to have their computers and laptops connected to the Internet (via wired and wireless connections) and for employees to have individual email accounts. This greatly facilitates internal communication (among staff of the company) and external communication (with parties outside the organization). Mobile communication devices such as smartphones are increasingly making their way into the corporate world. For instance, in 2008, Apple announced its intention to step up its efforts to help companies incorporate the iPhone into their enterprise environment, facilitated by technological developments in streamlining integrated features (push e-mail, calendar and contact management) using ActiveSync.
In general, these new CMCs that are made possible by the Internet have altered the way people use language: there is heightened informality and, consequently, a growing fear of language deterioration. However, as David Crystal puts it, these changes should be seen positively, as they reflect the power of the creativity of a language.
use, specifically on Standard English, which in turn affects language education. The rise and rapid spread of Internet use has brought about new linguistic features specific to the Internet platform. These include, but are not limited to, an increase in the use of informal written language, inconsistency in written styles and stylistics, and the use of new abbreviations in Internet chats and SMS text messaging, where technological constraints on word count contributed to the rise of new abbreviations. Such acronyms exist primarily for practical reasons: apart from technological limitations, they reduce the time and effort required to communicate through these mediums. Examples of common acronyms include lol (laughing out loud; a general expression of laughter), omg (oh my god) and gtg (got to go).
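As a rough illustration of how such abbreviations map back to their full forms, the following Python sketch expands the three examples given above. The `ABBREVIATIONS` table and the `expand_abbreviations` function are hypothetical names introduced only for this example; they do not come from any established tool.

```python
import re

# Hypothetical lookup table, limited to the examples given in the text.
ABBREVIATIONS = {
    "lol": "laughing out loud",
    "omg": "oh my god",
    "gtg": "got to go",
}

def expand_abbreviations(message):
    """Replace known abbreviations with their full forms, ignoring case."""
    def replace(match):
        word = match.group(0)
        # Unknown words are returned unchanged.
        return ABBREVIATIONS.get(word.lower(), word)
    # \b\w+\b matches whole words, so "lol" inside "lollipop" is untouched.
    return re.sub(r"\b\w+\b", replace, message)

print(expand_abbreviations("omg that was funny lol, gtg"))
```

A plain dictionary lookup is used here because the set of abbreviations is small and fixed; it makes no attempt to handle abbreviations whose meaning depends on context.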
The educational perspective has been considerably established in research on the Internet's impact on language education. It is a crucial aspect, as it involves educating current and future generations of students in the appropriate and timely use of the informal language that arises from Internet usage. There are concerns about the growing infiltration of informal language and incorrect word use into academic or formal situations, such as the use of casual words like "guy" or the choice of the word "preclude" in place of "precede" in academic papers by students. Educators have also noted that spelling and grammar problems occur at a higher frequency in students' academic work, with the use of abbreviations such as "u" for "you" and "2" for "to" being the most common.
Linguists and professors like Eleanor Johnson suspect that widespread mistakes in writing are strongly connected to Internet usage, and educators have similarly reported new kinds of spelling and grammar mistakes in student work. There is, however, no scientific evidence to confirm the proposed connection. Though there are valid concerns about Internet usage and its impact on students' academic and formal writing, the perceived severity of the problem is magnified by the informal nature of the new media platforms. Naomi S. Baron (2008) argues in Always On that student writing suffers little impact from the use of Internet-mediated communication (IMC) such as Internet chat, SMS text messaging and e-mail. A study published in the British Journal of Developmental Psychology found that students who regularly texted (sent messages via SMS using a mobile phone) displayed a wider range of vocabulary, which may have a positive impact on their reading development.
Though the use of the Internet has resulted in stylistics that are not deemed appropriate in academic and formal language use, Internet use may not hinder language education but instead aid it. The Internet has shown in different ways that it can enhance language learning, especially second or foreign language learning. In relation to Internet linguistics, language education through the Internet is most significantly applied through its communication aspect (the use of e-mails, discussion forums, chat messengers, blogs, etc.).
IMC allows for greater interaction between language learners and native speakers of the language, providing more error correction and better opportunities to learn the standard language, and in the process allowing learners to pick up specific skills such as negotiation and persuasion.
Future research also includes new varieties of expressions that the Internet and its various technologies are constantly producing and their effects not only on written languages but also their spoken forms. The communicative style of Internet language is best observed in the CMC channels below, as there are often attempts to overcome technological restraints such as transmission time lags and to re-establish social cues that are often vague in written text.
The 160-character limit imposed by the cell phone has motivated users to exercise their linguistic creativity to overcome it. A similar example of new technology with character constraints is Twitter, which has a 140-character limit. There have been debates as to whether the new abbreviated forms introduced in users’ Tweets are "lazy" or whether they are creative fragments of communication. Despite the ongoing debate, there is no doubt that Twitter has contributed new lingo to the linguistic landscape and brought about a new dimension of communication.
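To make the effect of these limits concrete, the sketch below checks how much room a message leaves under the 160-character and 140-character limits mentioned above. It simply counts characters; real SMS encoding rules are more involved and are not modeled here. The names `SMS_LIMIT`, `TWEET_LIMIT` and `remaining` are assumptions made for this example.

```python
SMS_LIMIT = 160    # character limit cited in the text for SMS
TWEET_LIMIT = 140  # character limit cited in the text for Twitter

def remaining(message, limit):
    """Characters left before the limit is reached (negative if over)."""
    return limit - len(message)

full_form = "oh my god I have got to go"
abbreviated = "omg I have gtg"

# The abbreviated form leaves more room under either limit.
for text in (full_form, abbreviated):
    print(len(text), remaining(text, SMS_LIMIT), remaining(text, TWEET_LIMIT))
```

Comparing the two strings shows the practical motivation for abbreviation: the shorter form conveys the same message while using fewer of the available characters.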
The cell phone has also created a new literary genre – cell phone novels. A typical cell phone novel consists of several chapters which readers download in short installments. These novels are in their "raw" form as they do not go through editing processes like traditional novels. They are written in short sentences, similar to text-messaging.
Authors of such novels are also able to receive feedback and new ideas from their readers through emails or online feedback channels. Unlike in traditional novel writing, readers’ ideas sometimes get incorporated into the storyline, and authors may also decide to change their story’s plot according to the demand for and popularity of their novel (typically gauged by the number of download hits).
Despite their popularity, there has also been criticism regarding the novels’ "lack of diverse vocabulary" and poor grammar.
, videoblog, audioblog and moblog. These developments in interactive blogging have created new linguistic conventions and styles, with more expected to arise in the future.
and noob. Emoticons are further examples of how users have adapted different expressions to suit the limitations of cyberspace communication, one of which is the "loss of emotivity".
Communication in niches such as the role-playing games (RPGs) of multi-user domains (MUDs) and virtual worlds is highly interactive, with emphasis on speed, brevity and spontaneity. As a result, CMC is generally more vibrant, volatile, unstructured and open. There is often a complex organization of sequences and exchange structures evident in the connection of conversational strands and short turns. Some of the CMC strategies used include capitalization for words such as EMPHASIS, the use of symbols such as the asterisk to enclose words, as seen in *stress*, and the creative use of punctuation like ???!?!?!?. Besides contributing to these new forms of language, virtual worlds are also being used to teach languages. Virtual world language learning provides students with simulations of real-life environments, allowing them to find creative ways to improve their language skills. Virtual worlds are good tools for language learning among younger learners because they already see such places as a "natural place to learn and play".
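The three emphasis strategies described above (capitalization, asterisks around a word and creative punctuation) are regular enough to be spotted with simple patterns. The following Python sketch is one possible way to do so; the pattern names and the `find_cmc_features` function are hypothetical, and the patterns are deliberately crude (for instance, an acronym such as SMS would also count as capitalized emphasis).

```python
import re

# Hypothetical patterns for the three strategies named in the text.
PATTERNS = {
    "capitalized_emphasis": re.compile(r"\b[A-Z]{2,}\b"),   # e.g. EMPHASIS
    "asterisk_stress": re.compile(r"\*\w+\*"),              # e.g. *stress*
    "repeated_punctuation": re.compile(r"[?!]{2,}"),        # e.g. ???!?!?!?
}

def find_cmc_features(message):
    """Return the matches for each emphasis strategy in a message."""
    return {name: pattern.findall(message) for name, pattern in PATTERNS.items()}

print(find_cmc_features("that was a *really* BAD idea???!?!?!?"))
```

Counting such matches over a collection of chat messages would give a first, approximate measure of how often each strategy is used.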
Email
One of the most popular Internet-related technologies to be studied under this perspective is email, which has expanded the stylistics of languages in many ways. A study of the linguistic profile of emails has shown that there is a hybrid of speech and writing styles in terms of format, grammar and style. Email is rapidly replacing traditional letter-writing because of its convenience, speed and spontaneity. It is often associated with informality, as it feels temporary and can be deleted easily. However, as this medium of communication matures, email is no longer confined to sending informal messages between friends and relatives. Instead, business correspondence is increasingly being carried out through email. Job seekers are also using email to send their resumes to potential employers. The result of a move towards more formal usage will be a medium representing a range of formal and informal stylistics.
While email has been blamed for students’ increased usage of informal language in their written work, David Crystal argues that email is "not a threat, for language education" because email with its array of stylistic expressiveness can act as a domain for language learners to make their own linguistic choices responsibly. Furthermore, the younger generation’s high propensity for using email may improve their writing and communication skills because of the efforts they are making to formulate their thoughts and ideas, albeit through a digital medium.
The applied perspective views the linguistic exploitation of the Internet in terms of its communicative capabilities, both good and bad. The Internet provides a platform where users can experience multilingualism. Although English is still the dominant language used on the Internet, other languages are gradually increasing in their number of users. The Global Internet usage page provides some information on the number of Internet users by language, nationality and geography. This multilingual environment continues to increase in diversity as more language communities become connected to the Internet. The Internet is thus a platform where speakers of minority and endangered languages can seek to revive their language use and create awareness. It provides these languages with opportunities for progress in two important regards: language documentation and language revitalization.
Foundations such as the Hans Rausing Endangered Languages Project (HRELP), funded by Arcadia, also help to develop interest in linguistic documentation. The HRELP is a project that seeks to document endangered languages and to preserve and disseminate documentation materials, among other aims. The materials gathered are made available online under its Endangered Languages Archive (ELAR) program.
Other online materials that support language documentation include the Language Archive Newsletter, which provides news and articles about topics in endangered languages. The web version of Ethnologue also provides brief information on all of the world’s known living languages. Making resources and information on endangered languages and language documentation available on the Internet allows researchers to build on these materials and hence preserve endangered languages.
, language revitalization through the internet is no longer restricted to literate users.
Hawaiian educators have been taking advantage of the Internet in their language revitalization programs. The graphical bulletin board system, Leoki (Powerful Voice), was established in 1994. The content, interface and menus of the system are entirely in the Hawaiian language. It is installed throughout the immersion school system and includes components for e-mails, chat, dictionary and online newspaper among others. In higher institutions such as colleges and universities where the Leoki system is not yet installed, the educators make use of other software and Internet tools such as Daedalus Interchange, e-mails and the Web to connect students of Hawaiian language with the broader community.
Another use of the Internet is having students of minority languages write about their native cultures in their native languages for distant audiences. Also, in an attempt to preserve their language and culture, Occitan speakers have been taking advantage of the Internet to reach out to other Occitan speakers around the world. These methods give people reasons to use minority languages by communicating in them. In addition, the use of digital technologies, which the younger generation think of as ‘cool’, will appeal to them and in turn maintain their interest in and usage of their native languages.
The Internet can also be exploited for activities such as terrorism, internet fraud and pedophilia. In recent years, there has been an increase in crimes that involve the use of Internet services such as e-mail and Internet Relay Chat (IRC), as it is relatively easy to remain anonymous. These activities raise concerns for security and protection. From a forensic linguistic point of view, there are many potential areas to explore. While developing a chat room child protection procedure based on search term filtering is effective, there is still minimal linguistically orientated literature to facilitate the task. In other areas, the Semantic Web has been involved in tasks such as personal data protection, which helps to prevent fraud.
at the 1989 ACL meeting in Vancouver. It was met with much controversy as they lacked theoretical integrity, leading to much skepticism of their role in the field. It was not until the publication of the journal ‘Using Large Corpora’ in 1993 that the relationship between computational linguistics and corpora became widely accepted.
To establish whether the Web is a corpus, it is worthwhile to turn to the definition established by McEnery and Wilson (1996, p. 21).
Relating more closely to the Web as a corpus, Manning and Schütze (1999, p. 120) further streamline the definition:
Hit counts for carefully constructed search engine queries were used to identify rank orders for word sense frequencies, as an input to a word sense disambiguation engine. This method was further explored with the introduction of the concept of parallel corpora, in which existing Web pages that exist in parallel in local and major languages are brought together. It was demonstrated that it is possible to build a language-specific corpus from a single document in that specific language.
In the area of language modeling, the Web has been used to address data sparseness. Lexical statistics have been gathered for resolving prepositional phrase attachments, while Web documents were used to seek a balance in the corpus.
In the area of information retrieval, a Web track was integrated as a component in the community’s TREC evaluation initiative. The sample of the Web used for this exercise amounts to around 100 GB, comprising largely documents in the .gov top-level domain.
The British National Corpus contains ample information on the dominant meanings and usage patterns for the 10,000 words that form the core of English.
The number of words in the British National Corpus (ca. 100 million) is sufficient for many empirical strategies that linguists and lexicographers use to learn about language, and is satisfactory for technologies that use quantitative information about the behavior of words as input (such as parsing).
However, for some other purposes it is insufficient, as a consequence of the Zipfian nature of word frequencies. Because the bulk of the lexical stock occurs fewer than 50 times in the British National Corpus, the corpus is insufficient for statistically stable conclusions about such words. Furthermore, for some rarer words, rare meanings of common words, and combinations of words, no data has been found. Researchers find that probabilistic models of language based on very large quantities of data are better than ones based on estimates from smaller, cleaner data sets.
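The point about rare words can be illustrated with a small Python sketch that measures what share of the distinct words in a token list occur fewer than 50 times, the threshold used above. The function name `rare_word_share` and the toy token list are made up for illustration; they are not data from the British National Corpus.

```python
from collections import Counter

def rare_word_share(tokens, threshold=50):
    """Fraction of distinct words that occur fewer than `threshold` times."""
    counts = Counter(tokens)
    if not counts:
        return 0.0  # an empty corpus has no rare words
    rare = sum(1 for count in counts.values() if count < threshold)
    return rare / len(counts)

# Toy token list (invented): two frequent words and two rare ones.
tokens = ["the"] * 60 + ["of"] * 55 + ["breath"] * 3 + ["zipfian"] * 1
print(rare_word_share(tokens))  # 2 of the 4 distinct words are rare: 0.5
```

On a real corpus with a Zipfian frequency distribution, this share would be expected to be large, which is why a much bigger source of text is attractive for studying infrequent words.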
A test to find contiguous words like ‘deep breath’ revealed 868,631 Web pages containing the terms in AlltheWeb. The numbers found through search engines are more than three times the counts generated by the British National Corpus, indicating the significant size of the English corpus available on the Web.
The massive size of the text available on the Web can be seen in the analysis of controlled data in which corpora of different languages were mixed in various proportions. In the estimates of Web size in words obtained using AltaVista, English was at the top of the list with 76,598,718,000 words. Next was German, with 7,035,850,000 words, along with six other languages with over a billion hits. Even languages with fewer hits on the Web, such as Slovenian, Croatian, Malay and Turkish, have more than one hundred million words on the Web. This reveals the potential strength and accuracy of using the Web as a corpus given its significant size, which warrants much additional research, such as the project currently being carried out by the British National Corpus to exploit its scale.
As Web texts are easily produced (in terms of cost and time) and have many different authors working on them, there is often little concern for accuracy. Grammatical and typographical errors are regarded as “erroneous” forms that cause the Web to be a dirty corpus. Nonetheless, it may still be useful even with some noise.
The issue of whether sublanguages should be included remains unsettled. Proponents of inclusion argue that removing all sublanguages would result in an impoverished view of language. Since language is made up of lexicons, grammar and a wide array of different sublanguages, they should be included. However, it is only recently that this has become a viable option. Striking a middle ground by including some sublanguages is contentious because the choice of which to include and which to leave out is arbitrary.
The decision of what to include in a corpus lies with corpus developers, and it has been made pragmatically. The desiderata and criteria used for the British National Corpus serve as a good model for a general-purpose, general-language corpus, with the focus on being representative replaced by a focus on being balanced.
Search engines such as Google serve as a default means of access to the Web and its wide array of linguistic resources. However, for linguists working in the field of corpora, they present a number of challenges. These include the limited number of instances presented by search engines (1,000 or 5,000 maximum); insufficient context for each instance (Google provides a fragment of around ten words); results selected according to criteria that are distorted from a linguistic point of view, as search terms in titles and headings often occupy the top result slots; the inability to specify searches according to linguistic criteria, such as the citation form of a word, or word class; and the unreliability of statistics, with results varying according to search engine load and many other factors. At present, in view of the conflicting priorities among the different stakeholders, the best solution is for linguists to attempt to correct these problems themselves. This would open up a large number of possibilities for harnessing the rich potential of the Web.
and literary works. The infiltration of Internet stylistics is important as mass audiences are exposed to the works, reinforcing certain Internet specific language styles which may not be acceptable in standard or more formal forms of language.
Apart from internet slang, grammatical and typographical errors are features of writing on the Internet and other CMC channels. As users of the Internet get accustomed to these errors, the errors progressively infiltrate everyday language use, in both written and spoken forms. It is also common to witness such errors in mass media works, from typographical errors in news articles to grammatical errors in advertisements and even internet slang in drama dialogues.
commercial in the United States, acronyms such as "BFF Jill" (which means "Best Friend Forever, Jill") were used. More businesses have adopted Internet slang in their advertisements as more people grow up using the Internet and other CMC platforms, in an attempt to relate to and connect with them better. Such commercials have received relatively enthusiastic feedback from their audiences.
The use of Internet lingo has also spread into music, most notably popular music. One example is Trey Songz's lyrics for "LOL :-)", which incorporate much Internet lingo and mention Twitter and texting.
The spread of Internet linguistics is also present in films made by both commercial and independent filmmakers. Though independent films are primarily screened at film festivals, DVDs of them are often available for purchase over the Internet, as are paid live streams, making the films more easily accessible to the public. The very nature of commercial films, which are screened at public cinemas, allows for wide exposure to the mainstream mass audience, resulting in a faster and wider spread of Internet slang. One such commercial film is titled "LOL" (an acronym for Laugh Out Loud or Laughing Out Loud), starring Miley Cyrus and Demi Moore. This movie is a 2011 remake of Lisa Azuelos' popular 2008 French film, similarly titled "LOL (Laughing Out Loud)".
The use of internet slang is not limited to the English language but extends to other languages as well. The Korean language has incorporated the English alphabet in the formation of some of its slang, while other slang terms were formed from common misspellings arising from fast typing. The new Korean slang is further reinforced and brought into everyday language use by television shows such as soap operas or comedy dramas like “High Kick Through the Roof”, released in 2009.
Commonly misused punctuation marks include the semicolon and colon, and the hyphen and dash. Grammatical errors have also become more rampant in written works, especially in popular fiction. Readers of the Twilight series by Stephenie Meyer have pointed out several grammatical issues in the works. For example, the second sentence of the second paragraph of Chapter 21, "Trails", of Eclipse reads “and I needed come to grips with the consequences” instead of “and I needed to come to grips with the consequences”. These mistakes were reinforced by the transmission of the works over the Internet.
As the number of Internet users increases rapidly around the world, the cultural backgrounds, linguistic habits and language differences of users are brought onto the Web at a much faster pace. These individual differences among Internet users will significantly impact the future of Internet linguistics, notably in the aspect of the multilingual Web. The Internet is on its way to becoming a more diverse, multilingual Web, with a wider variety of languages being used. From 2000 to 2010, Internet penetration experienced its greatest growth in non-English-speaking countries and regions such as China, India and Africa, resulting in more languages apart from English penetrating the Web. This leads to the possibility of English losing its status as the dominant language of the Internet in the future.
Also, the interaction between English and other languages will be an important area of study. As global users interact with each other, references to different languages may continue to increase, resulting in the formation of new Internet stylistics that span languages. The Chinese and Korean languages have already experienced the infiltration of English, leading to the formation of their multilingual Internet lingo.
In its current state, the Internet provides a form of education and promotion for minority languages. However, just as cross-language interaction has resulted in the infiltration of English into Chinese and Korean to form new slang, minority languages are also affected by the more common languages used on the Internet (such as English and Spanish). While language interaction can cause a loss in the authentic standard of minority languages, familiarity with the majority language can also affect minority languages in adverse ways. For example, users attempting to learn a minority language may opt to read about it in a majority language and stop there, resulting in a loss instead of a gain in potential speakers of the minority language. Also, speakers of minority languages may be encouraged to learn the more common languages used on the Web in order to gain access to more resources, in turn leading to a decline in the usage of their own language. The future of endangered minority languages in view of the spread of the Internet remains to be observed.
Main perspectives
David Crystal has identified four main perspectives for further investigation: the sociolinguistic perspective, the educational perspective, the stylistic perspective and the applied perspective. The four perspectives are interlinked and affect one another.
Sociolinguistic perspective
This perspective deals with how society views the impact of Internet development on languages. The advent of the Internet has revolutionized communication in many ways; it changed the way people communicate and created new platforms with far-reaching social impact. Significant avenues include but are not limited to SMS text messaging, e-mails, chatgroups, virtual worlds and the Web.
The evolution of these new media of communication has raised much concern about the way language is being used. According to Crystal (2005), these concerns are neither groundless nor unprecedented: they surface almost every time a new technological breakthrough influences language, as seen in the 15th century when printing was introduced, the 19th century when the telephone was invented and the 20th century when broadcasting began to penetrate society.
At a personal level, CMC such as SMS text messaging and mobile e-mailing (push mail) has greatly enhanced instantaneous communication. Examples of devices that offer these services include the iPhone and the BlackBerry.
In schools, it is not uncommon for educators and students to be given personalized school email accounts for communication and interaction purposes. Classroom discussions are increasingly being brought onto the Internet in the form of discussion forums. For instance, at Nanyang Technological University, students engage in collaborative learning at the university’s portal, edveNTUre, where they participate in forum discussions and online quizzes and view streaming podcasts prepared by their course instructors, among other activities. In 2008, iTunes U began to collaborate with universities, converting the Apple music service into a store that makes academic lectures and scholastic materials available for free; it has partnered with more than 600 institutions in 18 countries, including Oxford, Cambridge and Yale universities.
These forms of academic social networking and media are expected to grow as educators all over the world continue to seek new ways to better engage students. It is commonplace for students at New York University to interact with “guest speakers weighing in via Skype, library staffs providing support via instant messaging, and students accessing library resources from off campus.” This will affect the way language is used as students and teachers begin to use more of these CMC platforms.
At a professional level, it is common for companies to have their computers and laptops connected to the Internet (via wired or wireless connections) and for employees to have individual email accounts. This greatly facilitates internal communication (among the company’s staff) and external communication (with parties outside the organization). Mobile communication devices such as smartphones are increasingly making their way into the corporate world. For instance, in 2008, Apple announced its intention to step up efforts to help companies incorporate the iPhone into their enterprise environments, facilitated by technological developments in streamlining integrated features (push e-mail, calendar and contact management) using ActiveSync.
In general, the new forms of CMC made possible by the Internet have altered the way people use language: there is heightened informality and, consequently, a growing fear of language deterioration. However, as David Crystal puts it, these changes should be seen positively, as they reflect the creative power of language.
Themes
The sociolinguistics of the Internet may also be examined through five interconnected themes.
- Multilingualism – It looks at the prevalence and status of various languages on the Internet.
- Language change – From a sociolinguistic perspective, language change is influenced by the physical constraints of technology (e.g. typed text) and shifting socio-economic priorities such as globalization. It explores linguistic changes over time, with emphasis on Internet lingo.
- Conversation discourse – It explores the changes in patterns of social interaction and communicative practice on the Internet.
- Stylistic diffusion – It involves the study of the spread of Internet jargon and related linguistic forms into common usage. As language changes, conversation discourse and stylistic diffusion overlap with the aspect of language stylistics. See below: Stylistic perspective
- Metalanguage and folk linguistics – It involves looking at the way these linguistic forms and changes on the Internet are labelled and discussed (e.g. the view that the impact of Internet lingo has resulted in the 'death' of the apostrophe and the loss of capitalization).
Educational perspective
The educational perspective of Internet linguistics examines the Internet's impact on formal language use, specifically on Standard English, which in turn affects language education. The rise and rapid spread of Internet use has brought about new linguistic features specific to the Internet platform. These include, but are not limited to, an increase in the use of informal written language, inconsistency in written styles and stylistics, and the use of new abbreviations in Internet chats and SMS text messaging, where technological constraints on message length contributed to their rise. Apart from these technological limitations, such acronyms exist primarily for practical reasons: they reduce the time and effort required to communicate through these media. Examples of common acronyms include lol (for laughing out loud; a general expression of laughter), omg (oh my god) and gtg (got to go).
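The acronyms listed above can be treated as a simple lookup table. The following is a minimal sketch in Python, not an established tool: the function name `expand_abbreviations` and the table `ABBREVIATIONS` are hypothetical, and the table holds only the three examples from the text.

```python
import re

# Illustrative table only: the three acronyms named in the text above.
ABBREVIATIONS = {
    "lol": "laughing out loud",
    "omg": "oh my god",
    "gtg": "got to go",
}


def expand_abbreviations(message):
    """Expand known acronyms, matching whole words case-insensitively."""

    def replace(match):
        word = match.group(0)
        # Words that are not in the table are returned unchanged.
        return ABBREVIATIONS.get(word.lower(), word)

    # Each run of letters is looked up as one word, so "lollipop" is untouched.
    return re.sub(r"[A-Za-z]+", replace, message)


print(expand_abbreviations("omg that was funny lol, gtg"))
```

A real system would need a far larger table and some handling of ambiguity, since an acronym such as lol can carry more than one meaning.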
The educational perspective is well established in research on the Internet's impact on language education. It is a crucial aspect, as it involves educating current and future generations of students in the appropriate and timely use of the informal language that arises from Internet usage. There are concerns about the growing infiltration of informal language and incorrect word use into academic or formal situations, such as the use of casual words like "guy" or the choice of the word "preclude" in place of "precede" in students' academic papers. Educators have also noted spelling and grammar errors occurring at a higher frequency in students' academic work, with the use of abbreviations such as "u" for "you" and "2" for "to" being the most common.
Linguists and professors like Eleanor Johnson suspect that widespread mistakes in writing are strongly connected to Internet usage, and educators have similarly reported new kinds of spelling and grammar mistakes in student work. There is, however, no scientific evidence to confirm the proposed connection. Though there are valid concerns about Internet usage and its impact on students' academic and formal writing, its severity is magnified by the informal nature of the new media platforms. Naomi S. Baron (2008) argues in Always On that student writing suffers little impact from the use of Internet-mediated communication (IMC) such as Internet chat, SMS text messaging and e-mail. A study published in the British Journal of Developmental Psychology found that students who regularly texted (sent messages via SMS using a mobile phone) displayed a wider range of vocabulary, which may have a positive impact on their reading development.
Though the use of the Internet has resulted in stylistics that are not deemed appropriate in academic and formal language use, Internet use may not hinder language education but instead aid it. The Internet has proven in different ways that it can enhance language learning, especially second or foreign language learning. In relation to Internet linguistics, language education through the Internet is most significantly applied through its communication aspect (use of e-mails, discussion forums, chat messengers, blogs, etc.).
IMC allows for greater interaction between language learners and native speakers of the language, providing more error correction and better opportunities to learn the standard language, and in the process allowing learners to pick up specific skills such as negotiation and persuasion.
Stylistic perspective
This perspective examines how the Internet and its related technologies have encouraged new and different forms of creativity in language, especially in literature. It looks at the Internet as a medium through which new language phenomena have arisen. This new mode of language is interesting to study because it is an amalgam of both spoken and written languages. For example, traditional writing is static compared to the dynamic nature of the new language on the Internet, where words can appear in different colors and font sizes on the computer screen. Yet this new mode of language also contains other elements not found in natural languages. One example is the concept of framing found in emails and discussion forums. In replying to emails, people generally use the sender’s email message as a frame to write their own messages. They can choose to respond to certain parts of an email message while leaving other bits out. In discussion forums, one can start a new thread and anyone, regardless of their physical location, can respond to the idea or thought that was set down through the Internet. This is something that is usually not found in written language.
Future research also includes the new varieties of expression that the Internet and its various technologies are constantly producing, and their effects not only on written languages but also on their spoken forms. The communicative style of Internet language is best observed in the CMC channels below, as there are often attempts to overcome technological constraints such as transmission time lags and to re-establish social cues that are often vague in written text.
Mobile phones
Mobile phones (also called "cell phones") have an expressive potential beyond their basic communicative functions. This can be seen in text-messaging poetry competitions such as the one held by The Guardian. The 160-character limit imposed by the cell phone has motivated users to exercise their linguistic creativity to overcome it. A similar example of new technology with character constraints is Twitter, which has a 140-character limit. There have been debates as to whether the new abbreviated forms introduced in users’ Tweets are "lazy" or whether they are creative fragments of communication. Despite the ongoing debate, there is no doubt that Twitter has contributed new lingo to the linguistic landscape and brought about a new dimension of communication.
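The effect of these character limits can be illustrated with a short Python sketch. It assumes that one Python character counts as one character towards the limit, which is a simplification, since real SMS counting depends on the character encoding in use; the names `SMS_LIMIT`, `TWEET_LIMIT`, `fits` and `characters_left` are hypothetical.

```python
# Limits as stated in the text above.
SMS_LIMIT = 160
TWEET_LIMIT = 140


def fits(message, limit):
    """Return True if the message is within the character limit."""
    return len(message) <= limit


def characters_left(message, limit):
    """Return the remaining characters (negative if over the limit)."""
    return limit - len(message)


full = "I have got to go now, see you later"
short = "gtg, c u l8r"
print(characters_left(full, TWEET_LIMIT), characters_left(short, TWEET_LIMIT))
```

The abbreviated message leaves more room within the same limit, which is the practical pressure behind the shortened forms discussed above.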
The cell phone has also created a new literary genre – cell phone novels. A typical cell phone novel consists of several chapters which readers download in short installments. These novels are in their "raw" form as they do not go through editing processes like traditional novels. They are written in short sentences, similar to text-messaging.
Authors of such novels are also able to receive feedback and new ideas from their readers through emails or online feedback channels. Unlike in traditional novel writing, readers’ ideas sometimes get incorporated into the storyline, and authors may also decide to change their story’s plot according to the demand for and popularity of their novel (typically gauged by the number of downloads).
Despite their popularity, there has also been criticism regarding the novels’ "lack of diverse vocabulary" and poor grammar.
Blogs
Blogging has brought about new ways of writing diaries and, from a linguistic perspective, the language used in blogs is "in its most 'naked' form", published for the world to see without undergoing a formal editing process. This is what makes blogs stand out, because almost all other forms of printed language have gone through some form of editing and standardization. David Crystal stated that blogs were "the beginning of a new stage in the evolution of the written language". Blogs have become so popular that they have expanded beyond written blogs, with the emergence of the photoblog, videoblog, audioblog and moblog. These developments in interactive blogging have created new linguistic conventions and styles, with more expected to arise in the future.
Virtual worlds
Virtual worlds provide insights into how users are adapting natural language for communication within these new media. The Internet language that has arisen through user interactions in text-based chatrooms and computer-simulated worlds has led to the development of slang within digital communities. Examples include pwn and noob. Emoticons are further examples of how users have adapted different expressions to suit the limitations of cyberspace communication, one of which is the "loss of emotivity".
Communication in niches such as role-playing games (RPGs) in Multi-User Domains (MUDs) and virtual worlds is highly interactive, with emphasis on speed, brevity and spontaneity. As a result, CMC is generally more vibrant, volatile, unstructured and open. There is often a complex organization of sequences and exchange structures, evident in the connection of conversational strands and short turns. Some of the CMC strategies used include capitalization of words for EMPHASIS, the use of symbols such as the asterisk to enclose words, as in *stress*, and the creative use of punctuation like ???!?!?!?. Besides contributing these new forms to language, virtual worlds are also being used to teach languages. Virtual world language learning provides students with simulations of real-life environments, allowing them to find creative ways to improve their language skills. Virtual worlds are good tools for language learning among younger learners because they already see such places as a "natural place to learn and play".
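The three CMC strategies mentioned above (capitalization, asterisk-enclosed words and repeated punctuation) are regular enough to be spotted with simple patterns. The sketch below is a rough Python illustration; the pattern names and the function `find_cmc_markers` are hypothetical, and the capitalization pattern is a crude heuristic that would also flag ordinary acronyms such as CMC.

```python
import re

# Hypothetical patterns, one per strategy described in the text.
PATTERNS = {
    # Two or more consecutive capital letters forming a whole word.
    "capital_emphasis": re.compile(r"\b[A-Z]{2,}\b"),
    # A single word enclosed in asterisks, e.g. *stress*.
    "asterisk_stress": re.compile(r"\*\w+\*"),
    # Runs of two or more question or exclamation marks.
    "repeated_punctuation": re.compile(r"[?!]{2,}"),
}


def find_cmc_markers(message):
    """Return every match of each pattern found in the message."""
    return {name: pattern.findall(message) for name, pattern in PATTERNS.items()}


print(find_cmc_markers("I said NO, that is *so* unfair ???!?!?!?"))
```

Counting such markers across a chat log would give a rough measure of how heavily participants rely on typography to replace the social cues of speech.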
While email has been blamed for students’ increased usage of informal language in their written work, David Crystal argues that email is "not a threat, for language education" because email with its array of stylistic expressiveness can act as a domain for language learners to make their own linguistic choices responsibly. Furthermore, the younger generation’s high propensity for using email may improve their writing and communication skills because of the efforts they are making to formulate their thoughts and ideas, albeit through a digital medium.
Instant messaging
Like other forms of online communication, instant messaging has also developed its own acronyms and short forms. However, instant messaging is quite different from email and chatgroups because it allows participants to interact with one another in real time while conversing in private. With instant messaging, there is an added dimension of familiarity among participants. This increased degree of intimacy allows greater informality in language and "typographical idiosyncrasies". There are also greater occurrences of stylistic variation because there can be a very wide age gap between participants. For example, a granddaughter can catch up with her grandmother through instant messaging. Unlike chatgroups where participants come together with shared interests, there is no pressure to conform in language here.
Applied perspective
The applied perspective views the linguistic exploitation of the Internet in terms of its communicative capabilities, both good and bad. The Internet provides a platform where users can experience multilingualism. Although English is still the dominant language used on the Internet, other languages are gradually increasing in their number of users. The Global Internet usage page provides some information on the number of users of the Internet by language, nationality and geography. This multilingual environment continues to increase in diversity as more language communities become connected to the Internet. The Internet is thus a platform where speakers of minority and endangered languages can seek to revive their language use and create awareness. It offers these languages opportunities for progress in two important regards: language documentation and language revitalization.
Language documentation
Firstly, the Internet facilitates language documentation. Digital archives of media such as audio and video recordings not only help to preserve language documentation, but also allow for global dissemination through the Internet. Publicity about endangered languages, such as Webster (2003), has helped to spur worldwide interest in linguistic documentation. Foundations such as the Hans Rausing Endangered Languages Project (HRELP), funded by Arcadia, also help to develop interest in linguistic documentation. The HRELP is a project that seeks, among other aims, to document endangered languages and to preserve and disseminate documentation materials. The materials gathered are made available online under its Endangered Languages Archive (ELAR) program.
Other online materials that support language documentation include the Language Archive Newsletter, which provides news and articles on topics in endangered languages. The web version of Ethnologue also provides brief information on all of the world’s known living languages. Making resources and information on endangered languages and language documentation available on the Internet allows researchers to build on these materials and hence helps to preserve endangered languages.
Language revitalization
Secondly, the Internet facilitates language revitalization. Over the years, the digital environment has developed in various sophisticated ways that allow for virtual contact. From e-mails and chats to instant messaging, these virtual environments have helped to bridge the spatial distance between communicators. E-mail has been adopted in language courses to encourage students to communicate in various styles, such as conference-type formats, and to generate discussions. Similarly, e-mail facilitates language revitalization in the sense that speakers of a minority language who have moved to a location where their native language is not spoken can take advantage of the Internet to communicate with family and friends, thus maintaining the use of their native language. With the development and increasing use of broadband telephone communication such as Skype, language revitalization through the Internet is no longer restricted to literate users.
Hawaiian educators have been taking advantage of the Internet in their language revitalization programs. The graphical bulletin board system Leoki (Powerful Voice) was established in 1994. The content, interface and menus of the system are entirely in the Hawaiian language. It is installed throughout the immersion school system and includes components for e-mail, chat, a dictionary and an online newspaper, among others. In institutions of higher education such as colleges and universities, where the Leoki system is not yet installed, educators make use of other software and Internet tools such as Daedalus Interchange, e-mail and the Web to connect students of the Hawaiian language with the broader community.
Another use of the Internet includes having students of minority languages write about their native cultures in their native languages for distant audiences. Also, in an attempt to preserve their language and culture, Occitan speakers have been taking advantage of the Internet to reach out to other Occitan speakers around the world. These methods provide reasons for using minority languages by giving speakers occasions to communicate in them. In addition, the use of digital technologies, which the younger generation thinks of as ‘cool’, will appeal to young speakers and in turn maintain their interest in and usage of their native languages.
Exploitation of the Internet
- See also: Forensic linguistics
The Internet can also be exploited for activities such as terrorism, internet fraud and pedophilia. In recent years, there has been an increase in crimes that involve Internet channels such as e-mail and Internet Relay Chat (IRC), as it is relatively easy to remain anonymous. These activities raise concerns for security and protection. From a forensic linguistic point of view, there are many potential areas to explore. While developing a chat room child protection procedure based on search term filtering is effective, there is still minimal linguistically oriented literature to facilitate the task. In other areas, it is observed that the Semantic Web has been involved in tasks such as personal data protection, which helps to prevent fraud.
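The search term filtering mentioned above can be illustrated with a minimal sketch. The watch list, the function name `flag_message` and the whole-word matching rule are assumptions made for illustration only; they do not describe any particular content-control product, and a real procedure would need a curated list and linguistic analysis beyond simple term matching.

```python
import re

# Hypothetical watch list, for illustration only.
WATCH_TERMS = {"home alone", "secret", "address"}

def flag_message(message, terms=WATCH_TERMS):
    """Return the sorted watch terms that occur in the message as whole words or phrases."""
    text = message.lower()
    hits = []
    for term in terms:
        # \b anchors prevent matches inside longer words such as "secretary".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            hits.append(term)
    return sorted(hits)
```

Matching on word boundaries keeps a term such as "secret" from firing on "secretary", which hints at why purely term-based filters benefit from linguistically informed refinement.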
Dimensions of Internet linguistics
The dimensions covered in this section include looking at the Web as a corpus and issues of language identification and normalization. The impacts of Internet linguistics on everyday life are examined under the spread and influence of Internet stylistics, trends of language change on the Internet and conversation discourse.
The Web as a corpus
With the Web being a huge reservoir of data and resources, language scientists and technologists are increasingly turning to the Web for language data. Corpora were first formally mentioned in the field of computational linguistics at the 1989 ACL meeting in Vancouver. They were met with much controversy, as they were seen to lack theoretical integrity, which led to much skepticism about their role in the field. It was not until the publication of the journal ‘Using Large Corpora’ in 1993 that the relationship between computational linguistics and corpora became widely accepted.
To establish whether the Web is a corpus, it is worthwhile to turn to the definition established by McEnery and Wilson (1996, p. 21).
Relating more closely to the Web as a corpus, Manning and Schütze (1999, p. 120) further streamline the definition:
Hit counts from carefully constructed search engine queries were used to identify rank orders for word sense frequencies, as an input to a word sense disambiguation engine. This method was further explored with the introduction of the concept of a parallel corpus, in which existing Web pages that exist in parallel in local and major languages are brought together. It was demonstrated that it is possible to build a language-specific corpus from a single document in that specific language.
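As a rough illustration of ranking word senses by hit counts, the sketch below sorts senses by the number of hits returned for a query built around each sense. The function name `rank_senses` and the counts themselves are invented for the example; they are not results from any search engine or from the studies described above.

```python
def rank_senses(hit_counts):
    """Rank senses by hit count (descending) and attach relative frequencies."""
    total = sum(hit_counts.values())
    if total == 0:
        return []
    ranked = sorted(hit_counts.items(), key=lambda item: item[1], reverse=True)
    return [(sense, count, count / total) for sense, count in ranked]

# Made-up hit counts for queries pairing "bank" with a cue word for each sense.
example_counts = {
    "bank (river edge)": 900,
    "bank (financial institution)": 9000,
    "bank (row of objects)": 100,
}

for sense, count, share in rank_senses(example_counts):
    print(f"{sense}: {count} hits ({share:.1%})")
```

The resulting rank order, rather than the raw counts, is what would be passed on to a disambiguation step, since raw hit counts vary with the search engine used.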
Themes
There has been much discussion about possible developments in the arena of the Web as a corpus. The use of the Web as a data source for word sense disambiguation was brought forward in the EU MEANING project in 2002. It relied on the assumption that within a domain, words often have a single meaning, and that domains are identifiable on the Web. This was further explored by using Web technology to gather manual word sense annotations on the Word Expert Web site.
In areas of language modeling, the Web has been used to address data sparseness. Lexical statistics have been gathered for resolving prepositional phrase attachments, while Web documents were used to seek a balance in the corpus.
In areas of information retrieval, a Web track was integrated as a component in the community’s TREC evaluation initiative. The sample of the Web used for this exercise amounted to around 100 GB, comprising largely documents in the .gov top-level domain.
British National Corpus
- See also: British National Corpus
The British National Corpus contains ample information on the dominant meanings and usage patterns for the 10,000 words that form the core of English.
The number of words in the British National Corpus (ca 100 million) is sufficient for many empirical strategies for learning about language for linguists and lexicographers, and is satisfactory for technologies that utilize quantitative information about the behavior of words as input (parsing).
However, for some other purposes it is insufficient, as an outcome of the Zipfian nature of word frequencies. The bulk of the lexical stock occurs fewer than 50 times in the British National Corpus, which is too few for statistically stable conclusions about such words. Furthermore, for some rarer words, rare meanings of common words, and combinations of words, no data are found at all. Researchers find that probabilistic models of language based on very large quantities of data are better than ones based on estimates from smaller, cleaner data sets.
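The effect of a Zipfian distribution on a fixed-size corpus can be sketched by counting how many distinct word types fall below a frequency threshold. The function name `rare_type_share`, the simple tokenizer and the toy text are assumptions for illustration; the threshold of 50 follows the figure quoted above.

```python
from collections import Counter
import re

def rare_type_share(text, threshold=50):
    """Return the share of distinct word types occurring fewer than `threshold` times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c < threshold)
    return rare / len(counts)

# Toy text: two frequent types ("the", "cat") and three rare ones.
toy = "the cat " * 60 + "the dog saw the heron"
print(rare_type_share(toy))
```

On real corpora the same count would be run over the full token stream; the toy text only shows that a handful of frequent types can coexist with a majority of types too rare to support stable statistics.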
The multilingual Web
The Web is clearly a multilingual corpus. It is estimated that 71% of the pages (453 million out of 634 million Web pages indexed by the Excite engine) were written in English, followed by Japanese (6.8%), German (5.1%), French (1.8%), Chinese (1.5%), Spanish (1.1%), Italian (0.9%), and Swedish (0.7%).
A test to find contiguous words like ‘deep breath’ revealed 868,631 Web pages containing the terms in AlltheWeb. The counts returned by the search engines are more than three times those generated by the British National Corpus, indicating the significant size of the English corpus available on the Web.
The massive size of the text available on the Web can be seen in an analysis of controlled data in which corpora of different languages were mixed in various proportions. Estimates of Web size in words, based on AltaVista, put English at the top of the list with 76,598,718,000 words. German is next, with 7,035,850,000 words, along with six other languages each exceeding a billion. Even languages with fewer hits on the Web, such as Slovenian, Croatian, Malay, and Turkish, have more than one hundred million words on the Web. This reveals the potential strength and accuracy of using the Web as a corpus given its significant size, which warrants much additional research, such as the project currently being carried out by the British National Corpus to exploit its scale.
Challenges
In areas of language modeling, there are limitations on the applicability of any language model, as the statistics for different types of text will differ. When a language technology application is put into use (applied to a new text type), it is not certain that the language model will fare in the same way as it does when applied to the training corpus. It is found that there are substantial variations in model performance when the training corpus changes. The lack of a theory of text types limits the assessment of the usefulness of language-modeling work.
As Web texts are easily produced (in terms of cost and time) and many different authors work on them, there is often little concern for accuracy. Grammatical and typographical errors are regarded as “erroneous” forms that cause the Web to be a dirty corpus. Nonetheless, it may still be useful even with some noise.
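The point about language models behaving differently on new text types can be illustrated with a minimal sketch: a unigram model with add-one smoothing is trained on one text type and scored on two short samples. The sentences, the names `train_unigram` and `cross_entropy`, and the choice of a unigram model are all assumptions made for the example, not a method taken from the work described above.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return token counts and the total token count for a unigram model."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def cross_entropy(model, tokens, vocab_size):
    """Average negative log2 probability per token under add-one smoothing."""
    counts, total = model
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / len(tokens)

# Invented samples: formal legal prose for training, chat-style text as the new type.
train = "the court ruled that the appeal was dismissed".split()
same_type = "the court dismissed the appeal".split()
other_type = "lol omg that was so funny".split()

vocab = set(train) | set(same_type) | set(other_type)
model = train_unigram(train)
h_same = cross_entropy(model, same_type, len(vocab))
h_other = cross_entropy(model, other_type, len(vocab))
print(f"same text type: {h_same:.2f} bits/token")
print(f"new text type:  {h_other:.2f} bits/token")
```

A higher cross-entropy on the chat-style sample reflects the mismatch between training and application text; real evaluations would use far larger samples and stronger models, but the direction of the effect is the same.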
The issue of whether sublanguages should be included remains unsettled. Proponents argue that removing all sublanguages would result in an impoverished view of language. Since language is made up of lexicons, grammar and a wide array of different sublanguages, they should be included. However, it is only recently that including them has become a viable option. Striking a middle ground by including some sublanguages is contentious because the choice of which to include and which to leave out is arbitrary.
The decision of what to include in a corpus lies with corpus developers, and it has been made pragmatically. The desiderata and criteria used for the British National Corpus serve as a good model for a general-purpose, general-language corpus, with the aim of being representative replaced by that of being balanced.
Search engines such as Google serve as a default means of access to the Web and its wide array of linguistic resources. However, for linguists working in the field of corpora, they present a number of challenges. These include the limited number of instances presented by the search engines (1,000 or 5,000 maximum); insufficient context for each instance (Google provides a fragment of around ten words); results selected according to criteria that are distorted from a linguistic point of view, as search terms in titles and headings often occupy the top result slots; the inability to specify searches according to linguistic criteria, such as the citation form of a word, or word class; and the unreliability of statistics, with results varying according to search engine load and many other factors. At present, in view of the conflicting priorities among the different stakeholders, the best solution is for linguists to attempt to correct these problems themselves. Doing so would open up a large number of possibilities for harnessing the rich potential of the Web.
Representation
Despite the sheer size of the Web, it may still not be representative of all the languages and domains in the world, and neither are other corpora. However, the huge quantities of text, in numerous languages and language types and on a huge range of topics, make it a good starting point that opens up a large number of possibilities in the study of corpora.
Impact of its spread and influence
Stylistics arising from Internet usage have spread beyond the new media into other areas and platforms, including but not limited to films, music and literary works. The infiltration of Internet stylistics is important because mass audiences are exposed to these works, which reinforces certain Internet-specific language styles that may not be acceptable in standard or more formal forms of language.
Apart from Internet slang, grammatical errors and typographical errors are features of writing on the Internet and other CMC channels. As users of the Internet get accustomed to these errors, the errors progressively infiltrate everyday language use, in both written and spoken forms. It is also common to witness such errors in mass media works, from typographical errors in news articles to grammatical errors in advertisements and even Internet slang in drama dialogues.
Mass media
There have been instances of television advertisements using Internet slang, reinforcing the penetration of Internet stylistics into everyday language use. For example, in a Cingular commercial in the United States, acronyms such as "BFF Jill" (which means "Best Friend Forever, Jill") were used. As more people grow up using the Internet and other CMC platforms, more businesses have adopted Internet slang in their advertisements in an attempt to relate and connect to them better. Such commercials have received relatively enthusiastic feedback from their audiences.
The use of Internet lingo has also spread into the arena of music, most noticeably in popular music. A recent example is Trey Songz's lyrics for "LOL :-)", which incorporate much Internet lingo as well as mentions of Twitter and texting.
The spread of Internet linguistics is also present in films made by both commercial and independent filmmakers. Though independent films are primarily screened at film festivals, DVDs of them are often available for purchase over the Internet, as are paid live streams, making the films more easily available to the public. The very nature of commercial films being screened at public cinemas allows for wide exposure to the mainstream mass audience, resulting in a faster and wider spread of Internet slang. A recent commercial film is titled "LOL" (an acronym for Laugh Out Loud or Laughing Out Loud), starring Miley Cyrus and Demi Moore. This movie is a 2011 remake of Lisa Azuelos's popular 2008 French film, similarly titled "LOL (Laughing Out Loud)".
The use of Internet slang is not limited to the English language but extends to other languages as well. The Korean language has incorporated the English alphabet in the formation of some of its slang, while other slang terms were formed from common misspellings arising from fast typing. The new Korean slang is further reinforced and brought into everyday language use by television shows such as soap operas or comedy dramas like “High Kick Through the Roof”, released in 2009.
Literary works
The use of Internet slang has affected the proper use of punctuation and increased the occurrence of grammatical and typographical errors in Standard English, in academic and journalistic works as well as in fiction writing. It has been observed, however, that incorrect spellings in written works are more often caused by typing errors than by the use of Internet slang or abbreviations.
Commonly misused punctuation marks include the semicolon and colon, the hyphen and the dash. Grammatical errors have also become more rampant in written works, especially in popular fiction. Readers of the Twilight series by Stephenie Meyer have pointed out several grammatical issues in the works. For example, the second sentence of the second paragraph of Chapter 21, "Trails", of Eclipse reads “and I needed come to grips with the consequences” instead of “and I needed to come to grips with the consequences”. These mistakes were reinforced by the transmission of the works over the Internet.
Linguistic future of the Internet
With the emergence of more computer- and Internet-mediated communication systems, coupled with the readiness with which people adapt to meet the new demands of a more technologically sophisticated world, it is expected that users will remain under pressure to alter their language use to suit the new dimensions of communication.
As the number of Internet users increases rapidly around the world, the cultural backgrounds, linguistic habits and language differences among users are brought into the Web at a much faster pace. These individual differences among Internet users will significantly impact the future of Internet linguistics, notably in the aspect of the multilingual Web. The Internet is on its way to becoming a more diverse multilingual Web, with a wider variety of languages being used. From 2000 to 2010, Internet penetration experienced its greatest growth in non-English-speaking regions such as China, India and Africa, resulting in more languages apart from English penetrating the Web. This leads to the possibility of English losing its status as the dominant language of the Internet in the future.
The interaction between English and other languages will also be an important area of study. As global users interact with each other, references to different languages may continue to increase, resulting in the formation of new Internet stylistics that span languages. Chinese and Korean have already been infiltrated by English, leading to the formation of their own multilingual Internet lingo.
At present, the Internet provides a form of education and promotion for minority languages. However, just as cross-language interaction has led English to infiltrate Chinese and Korean and form new slang, minority languages are also affected by the more common languages used on the Internet (such as English and Spanish). While language interaction can cause a loss in the authentic standard of minority languages, familiarity with the majority language can also affect the minority languages in adverse ways. For example, users attempting to learn a minority language may opt to read about it in a majority language and stop there, resulting in a loss instead of a gain in potential speakers of the minority language. Speakers of minority languages may also be encouraged to learn the more common languages used on the Web in order to gain access to more resources, leading in turn to a decline in the usage of their own language. The future of endangered minority languages in view of the spread of the Internet remains to be seen.
See also
- Internet slang
- Stylistics (linguistics)
- Standard English
- Applied linguistics
- Glossary of Internet-related terminology
- Appendix: Internet Slang
Further reading
- Aitchison, J., & Lewis, D. M. (Eds.). (2003). New Media Language. London and New York: Routledge. ISBN 0415283035
- Baron, N. S. (2000). Alphabet to Email: How Written English Evolved and Where It’s Heading. London and New York: Routledge. ISBN 0415186854
- Beard, A. (2004). Language Change. London and New York: Routledge. ISBN 0415320569
- Biewer, C., Nesselhauf, N., & Hundt, M. (Eds.). (2006). Corpus Linguistics and the Web. The Netherlands: Rodopi. ISBN 9042021284
- Boardman, M. (2005). The Language of Websites. New York and London: Routledge. ISBN 0415328543
- Crystal, D. (2004). A Glossary of Netspeak and Textspeak. Edinburgh: Edinburgh University Press. ISBN 0748619828
- Crystal, D. (2004). The Language Revolution (Themes for the 21st Century). United Kingdom: Polity Press Ltd. ISBN 0745633129
- Crystal, D. (2006). Language and the Internet (2nd Ed.). Cambridge: Cambridge University Press. ISBN 9780521868594
- Crystal, D. (2011). Internet Linguistics: A Student Guide. New York: Routledge. ISBN 9780415602716
- Dieter, J. (2007). Webliteralität: Lesen und Schreiben im World Wide Web. ISBN 3833497297
- Enteen, J. (2010). Virtual English: Internet Use, Language, and Global Subjects. London and New York: Routledge. ISBN 041597724X
- Gerrand, P. (2009). Minority Languages on the Internet: Promoting the Regional Languages of Spain. VDM Verlag. ISBN 3639191110
- Gibbs, D., & Krause, K. (Eds.). (2006). Cyberlines 2.0.: Languages and Cultures of the Internet. Australia: James Nicholas Publishers. ISBN 1875408428
- Jenkins, J. (2003). World Englishes: A Resource Book for Students. London and New York: Routledge. ISBN 0415258065
- Macfadyen, L. P., Roche, J., & Doff, S. (2005). Communicating Across Cultures in Cyberspace : A Bibliographical Review of Intercultural Communication Online. Lit Verlag. ISBN 3825876136
- Thurlow, C., Lengel, L. B., & Tomic, A. (2004). Computer Mediated Communication: Social Interaction and the Internet. London: Sage Publications. ISBN 0761949542