English on the Internet
The English language is sometimes described as the lingua franca of computing. In comparison to other sciences, where Latin and Greek are the principal sources of vocabulary, computer science borrows more extensively from English. Due to the technical limitations of early computers, and the lack of international standards on the Internet, computer users were limited to using English and the Latin alphabet. However, this historical limitation is much less pronounced today. Most software products are localized in numerous languages, and the use of the Unicode character encoding has resolved problems with non-Latin alphabets. Some limitations were lifted only recently, such as those on domain names, which previously allowed only ASCII characters.
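Internationalized domain names get around the ASCII-only restriction by mapping each non-ASCII label to an ASCII-compatible form (Punycode with an `xn--` prefix), so the DNS itself still carries only ASCII. A minimal sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the domain `bücher.example` and the helper names are hypothetical illustrations.

```python
# Convert an internationalized domain name to its ASCII-compatible form and back.
# Uses the standard-library "idna" codec (IDNA 2003); the domain is hypothetical.


def to_ascii_domain(domain):
    """Return the ASCII-compatible encoding (as bytes) of a domain name."""
    return domain.encode("idna")


def from_ascii_domain(ace):
    """Decode an ASCII-compatible domain name back to Unicode text."""
    return ace.decode("idna")


if __name__ == "__main__":
    ace = to_ascii_domain("bücher.example")
    print(ace)                     # b'xn--bcher-kva.example'
    print(from_ascii_domain(ace))  # bücher.example
```

Pure-ASCII labels such as `example` pass through unchanged; only labels containing other characters receive the `xn--` form.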
Influence on other languages
The computing terminology of many languages borrows from English. Some language communities actively resist that trend, while in others English is used extensively and more directly. This section gives some examples of the use of English terminology in other languages and notes any significant differences.
Bulgarian
Both English and Russian have influenced Bulgarian computing vocabulary. However, in many cases the borrowed terminology is translated rather than transcribed phonetically. Combined with the use of the Cyrillic alphabet, this can make loanwords difficult to recognize. For example, the Bulgarian term for motherboard is 'дънна платка' (roughly /danna platka/, literally "bottom board").
- компютър /compiutar/ - computer
- твърд диск /tvard disk/ - hard disk
- дискета /disketa/ - floppy disk; like the French disquette
- уеб сайт /web sait/ - web site; but also "интернет страница" /internet stranitsa/
Faroese
The Faroese language has a sparse scientific vocabulary based on the language itself. Many Faroese scientific words are borrowed and/or modified versions of, in particular, Nordic and English equivalents. The vocabulary is constantly evolving, so new words often die out and only a few survive to become widely used. Examples of successful words include "telda" (computer), "kurla" (at sign) and "ambætari" (server).
French
In French there are some generally accepted English loanwords, but there is also a distinct effort to avoid them. In France, the Académie française is responsible for the standardisation of the language and often coins new technological terms. Some of these are accepted in practice; in other cases the English loanwords remain predominant. In Quebec, the Office québécois de la langue française has a similar function.
- email/mail (in Europe); courriel (mainly in Quebec, but increasingly used in French-speaking Europe); informally mèl; more formally "courrier électronique"
- pourriel - Spam
- hameçonnage, phishing - Phishing
- télécharger - to download
- site web - web site
- lien - hyperlink
- base de données - Database
- caméra web - Webcam
- amorcer, démarrer, booter - to boot
- redémarrer, rebooter - to reboot
- arrêter, éteindre - to shut down
- amorçable, bootable - Bootable
- overclocking, surfréquençage, surcadençage - Overclocking
- watercooling, refroidissement à l'eau - Water cooling
- tuning PC - Case modding
German
In German, English words are very often used as well:
- noun: Computer, Website, Software, E-Mail, Blog
- verb: downloaden, booten, crashen
Icelandic
The Icelandic language has its own vocabulary of scientific terms, but English borrowings still exist. English or Icelandicised words are mostly used in casual conversation, whereas the native Icelandic terms may be longer or less widespread.
Russian
- History of computer hardware in Soviet Bloc countries
- Computer Russification
Spanish
English has significantly influenced the Castilian lexicon of the software industry and the internet in Latin America. English terms are frequently left untranslated; the following are listed with their Spanish equivalents:
- email: correo electrónico
- messenger: mensajero
- webcam: cámara web
- website: página web, sitio web
- blog: bitácora, 'blog'
Not translated
- web
- flog
Undecided
Many computing terms in Spanish share a common root with their English counterpart. In these cases, both terms are understood, but the Spanish is preferred for formal use:
- mouse vs ratón
- net vs red
Character encoding
Early computer software and hardware had very little support for alphabets other than the Latin alphabet. As a result, it was difficult or impossible to represent languages based on other scripts. The ASCII character encoding, created in the 1960s, defined only 128 characters (7 bits); 8-bit extended encodings raised the limit to 256. With the use of additional software it was possible to provide support for some languages, for instance those based on the Cyrillic alphabet. However, languages with large character sets, such as Chinese or Japanese, need far more characters than the 256 available in a single-byte encoding. Some computers built in the former USSR had native support for the Cyrillic alphabet.
The wide adoption of Unicode, and of UTF-8 on the web, has resolved most of these historical limitations. ASCII remains the de facto standard for command interpreters, programming languages and text-based communication protocols.
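The difference between the 7-bit ASCII repertoire and UTF-8 shows up clearly when a few words from this article are encoded. A minimal Python sketch: ASCII rejects anything outside its 128 characters, while UTF-8 stores ASCII text byte-for-byte unchanged and spends two bytes on each Cyrillic letter and three on each common Chinese character. The Chinese sample word and the helper names are illustrative assumptions.

```python
# Compare how words in different scripts fare under ASCII and UTF-8.

SAMPLES = {
    "English": "computer",
    "Bulgarian": "компютър",
    "Chinese": "电脑",  # illustrative choice: "computer" in simplified Chinese
}


def ascii_encodable(text):
    """True if every character fits in the 128-character ASCII repertoire."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def utf8_report(text):
    """Return (number of characters, number of bytes in UTF-8)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        chars, nbytes = utf8_report(word)
        print(language, ascii_encodable(word), chars, nbytes)
```

Because UTF-8 leaves ASCII bytes untouched, older ASCII-based protocols and tools can usually handle UTF-8 text without modification.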
- Mojibake
- Common mistakes
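Mojibake typically appears when bytes written in one encoding are decoded with another. A minimal Python sketch, assuming text saved as UTF-8 but read back as Latin-1 (ISO 8859-1); because Latin-1 maps every byte value to some character, the wrong decode never raises an error, which is why the garbling can go unnoticed. Reversing the mistaken step recovers the original. The function names are hypothetical.

```python
# Simulate and repair a common kind of mojibake: UTF-8 bytes misread as Latin-1.


def garble(text):
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the mistake: re-encode as Latin-1, then decode correctly as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "дънна платка"
    bad = garble(original)
    print(bad)                      # unreadable Latin-1 characters
    print(repair(bad) == original)  # True
```

Pure ASCII text survives this round trip unchanged, so the problem only becomes visible with non-Latin scripts or accented letters.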
Programming language
The syntax of most programming languages uses English keywords, so it could be argued that some knowledge of English is required to use them. However, all programming languages belong to the class of formal languages, which are very different from any natural language, including English.
Some examples of non-English programming languages:
- Although it uses English keywords, Ruby allows the use of Japanese characters in variable names and other elements of the code.
- Arabic: ARLOGO
- Bangla: BangaBhasha
- Chinese: Chinese BASIC
- Dutch: Superlogo
- French: LSE, WinDev, Pascal (although the English version is more widespread)
- Hebrew: Hebrew Programming Language
- Icelandic: Fjölnir
- Indian Languages: Hindawi Programming System
- Russian: Glagol
- Spanish: Lexico
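The Ruby item above describes a common compromise: English keywords with identifiers in any script. As an illustration in Python rather than Ruby, Python 3 also accepts non-ASCII identifiers, so only the keywords (`def`, `return`) remain English. The Japanese function and parameter names below are hypothetical, chosen for the example.

```python
# Keywords (def, return) stay English; identifiers are Japanese.
# Hypothetical names: 面積 "area", 幅 "width", 高さ "height".


def 面積(幅, 高さ):
    """Return the area of a rectangle."""
    return 幅 * 高さ


if __name__ == "__main__":
    print(面積(3, 4))             # 12
    print("面積".isidentifier())  # True
```

Whether such names are wise in shared code is a separate question; the sketch only shows that the language grammar permits them.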
Communication protocols
Many application protocols, especially those depending on widespread standardisation to be effective, use text strings for requests and parameters, rather than the binary values commonly used in lower-layer protocols. The request strings are generally based on English words, although in some cases the strings are contractions or acronyms of English expressions, which renders them somewhat cryptic to anyone not familiar with the protocol, whatever their proficiency in English. Nevertheless, the use of word-like strings is a convenient mnemonic device that allows a person familiar with the protocol (and with sufficient knowledge of English) to execute it manually from a keyboard, usually to diagnose a problem with the service.
Examples:
- FTP: USER, PASS (password), PASV (passive), PORT, RETR (retrieve), STOR (store), QUIT
- SMTP: HELO (hello), MAIL, RCPT (recipient), DATA, QUIT
- HTTP: GET, PUT, POST, HEAD (headers), DELETE, TRACE, OPTIONS
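Because these requests are plain text, a client can assemble them with ordinary string handling. A minimal Python sketch, assuming an HTTP/1.1 request for a hypothetical host `example.com` (no network connection is made): the method must be one of the English-derived verbs listed above, the request is pure ASCII, and lines end with CR LF as the protocol requires. The helper names are hypothetical.

```python
from typing import Tuple

# HTTP methods listed in the article; all are English words or abbreviations.
HTTP_METHODS = {"GET", "PUT", "POST", "HEAD", "DELETE", "TRACE", "OPTIONS"}


def build_request(method, path, host):
    """Build a minimal HTTP/1.1 request as the ASCII bytes a client would send."""
    if method not in HTTP_METHODS:
        raise ValueError("unknown method: " + method)
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_request_line(raw) -> Tuple[str, str, str]:
    """Split the first line of a raw request into (method, path, version)."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = first_line.split(" ")
    return method, path, version


if __name__ == "__main__":
    request = build_request("GET", "/", "example.com")
    print(request)
    print(parse_request_line(request))  # ('GET', '/', 'HTTP/1.1')
```

The same text could be typed by hand into a terminal connection to a web server, which is exactly the manual-testing use described above.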
Response codes, that is, the strings sent back by the recipient of a request, are by contrast typically numeric, for instance in HTTP (some of whose codes have been borrowed by other protocols):
- 200 OK: the request succeeded
- 301 Moved Permanently: redirects the request to a new address
- 404 Not Found: the requested page does not exist
This is because response codes must also convey unambiguous information, but can carry various nuances that the requester may optionally use to vary its subsequent actions. In HTTP, for example, the first digit gives the general class of the response (2xx success, 3xx redirection, 4xx client error, 5xx server error) and the remaining digits refine it. Conveying all such "sub-codes" with alphabetic words would be unwieldy and would negate the advantage of using pseudo-English words. Since responses are usually generated by software, they do not need to be mnemonic. Numeric codes are also easier to analyse and categorise when they are processed by software rather than by a human testing the protocol by manual input.
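The ease of categorising numeric codes can be made concrete: software can classify an HTTP status by its leading digit alone and look up the human-readable reason phrase only when it is needed. A minimal Python sketch using the standard library's `http.HTTPStatus` enumeration; the `status_class` and `describe` helpers and their labels are illustrative, not part of any library.

```python
from http import HTTPStatus

# Classes defined by the first digit of an HTTP status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Categorise a status code using only its leading digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def describe(code):
    """Combine the numeric code, its standard reason phrase and its class."""
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (200, 301, 404):
        print(describe(code))
    # 200 OK (success)
    # 301 Moved Permanently (redirection)
    # 404 Not Found (client error)
```

A client that does not recognise a specific code can still act sensibly on its class, which is one of the "nuances" a numeric scheme makes cheap to express.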
BIOS
Many personal computers have a BIOS chip that displays text in English during boot.
Keyboard shortcut
Keyboard shortcuts are usually defined in terms of English words, such as CTRL+F for "find".
English on the World Wide Web
English is the largest language on the World Wide Web, with 27% of internet users. See the article on Internet linguistic patterns for more details.
English speakers
Web user percentages usually focus on raw comparisons of the first language of those who access the web. Just as important is a consideration of second- and foreign-language users; the first language of a user does not necessarily reflect which language he or she regularly employs when using the web.
Native speakers
English-language users appear to be a plurality of web users, consistently cited as around one-third of the overall total (near one billion). This reflects the relative affluence of English-speaking countries and their high Internet penetration rates.
This lead may be eroding, mainly because of a rapid increase in Chinese users, which broadly parallels China's advance on other economic fronts. In fact, if first-language speakers are compared, Chinese ought, in time, to outstrip English by a wide margin (837+ million for Mandarin Chinese, 370+ million for English).
First-language users among other relatively affluent countries appear generally stable, the two largest being German and Japanese, which each have between 5% and 10% of the overall share.
As a foreign language
Even if a gradual decline in English first-language users is inevitable, it does not follow that English will cease to be the language of choice for those accessing the World Wide Web. There is an enormous pool of English second-language speakers who use the language in technical, governmental and educational spheres and access the Internet in English.
A classic example of this scenario is India, the world's second most populous country. With economic growth, English has been spreading rapidly as the emerging lingua franca in India. In 1995 it was thought that perhaps only 4% of the population was truly fluent in English (still an impressive 40 million). A decade later, by 2005, India had the world's largest population able to speak and understand English, and the second-largest population of fluent English speakers (after only the U.S.). It is expected to have the world's largest number of English speakers within a decade.
Chinese is rarely employed as a lingua franca outside China by non-ethnic Chinese. Further, China is not truly monoglot: Mandarin is official, but different spoken varieties of Chinese are often mutually unintelligible. There is, however, an established written standard that serves as a common written language.
In the future, then, English and Chinese may hold roughly equal positions at the top of the ranking of web users by first language, but English will likely continue to dominate as the default choice for those accessing the World Wide Web in a second language.
Other world languages that could conceivably begin to challenge English include Spanish and Arabic, though it remains to be seen whether these, like Chinese, will be used on the Internet largely by first-language speakers.
World Wide Web content
One widely quoted figure for the share of web content in English is 80%. Other sources show figures five to fifteen points lower, though still well over 50%.
There are two notable facts about these percentages:
The share of web content in English exceeds the share of first-language English users by as much as 2 to 1.
Given the enormous lead it already enjoys and its increasing use as a lingua franca in other spheres, English web content may continue to dominate even as English first-language Internet users decline. This is a classic positive feedback loop: new Internet users find it helpful to learn English and employ it on-line, thus reinforcing the language's prestige and forcing subsequent new users to learn English as well.
Certain other factors (some predating the medium's appearance) have propelled English into a majority web-content position. Most notable in this regard is the tendency for researchers and professionals to publish in English to ensure maximum exposure. The largest database of medical bibliographical information, for example, shows English was the majority language choice for the past forty years and its share has continually increased over the same period.
The fact that non-Anglophones regularly publish in English only reinforces the language's dominance. English has a rich technical vocabulary (largely because native and non-native speakers alike use it to communicate technical ideas), and many IT and technical professionals use English regardless of their country of origin (Linus Torvalds, for instance, comments his code in English, despite being from Finland and having Swedish as his first language).