Bengali script
The Bengali alphabet (bangla lipi or bônggolipi) is the writing system for the Bengali language. The script, with variations, is used for Assamese and is the basis for the Meitei, Bishnupriya Manipuri, Kokborok, Garo and Mundari alphabets. All these languages are spoken in the eastern region of South Asia. Historically, the script has also been used to write the Sanskrit language in the same region. From a classificatory point of view, the Bengali script is an abugida, i.e. its vowel graphemes are mainly realized not as independent letters, but as diacritics attached to its consonant letters. It is written from left to right and lacks distinct letter cases. It is recognizable by a distinctive horizontal line running along the tops of the letters that links them together, a property it shares with two other popular Indian scripts: Devanagari (used for Hindi, Marathi and Nepali) and Gurmukhi (used for Punjabi). The Bengali script is, however, less blocky and has a more sinuous shape.

History

The Bengali script evolved from the Eastern Nagari script, which belongs to the Brahmic family of scripts, along with Devanagari and other writing systems of the Indian subcontinent. Both Eastern Nagari and Devanagari were derived from the ancient Nagari script. In addition to differences in how the letters are pronounced in the different languages, there are some typographical differences between the version of the script used for Assamese, Bishnupriya Manipuri and Maithili and the version used for Bengali and other languages.

Illustration:
  • The character ক্ষ (Assamese khyô, Bengali khio) is considered a separate letter in Assamese script (ক্ষ ) but considered a conjunct (orthographically ক্‌+ষ ) in Bengali. In both languages, it functions as though it were orthographically খ্য .
  • rô is represented as র in Bengali, ৰ in Assamese, and either of the two variants in Bishnupriya Manipuri and Maithili.
  • wô is represented as ৱ in Assamese, Bishnupriya Manipuri, and Maithili, but is collapsed with ব bô in Bengali.
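
In Unicode text, the letters in the list above are separate code points, while ক্ষ is stored as a sequence of three code points. The short Python sketch below prints the code points and official Unicode character names using only the standard `unicodedata` module. The constant and function names are illustrative choices, not part of any standard.

```python
import unicodedata

# Letters from the list above, written as escapes so the code points are explicit.
RA_BENGALI = "\u09b0"        # rô as written in Bengali
RA_ASSAMESE = "\u09f0"       # rô as written in Assamese
WA_ASSAMESE = "\u09f1"       # wô as written in Assamese
KSSA = "\u0995\u09cd\u09b7"  # khio/khyô: kô + hôshonto + murdhonno shô

def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, text in [("Bengali rô", RA_BENGALI), ("Assamese rô", RA_ASSAMESE),
                    ("Assamese wô", WA_ASSAMESE), ("khio/khyô", KSSA)]:
    print(label, describe(text))
```

Whether the three-code-point sequence counts as one letter or as a conjunct is therefore a matter of each language's alphabet, not of the encoding.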


The Bengali script was originally not associated with any particular language, but was often used in the eastern regions of medieval India. It was standardized into the modern Bengali script by Ishwar Chandra Vidyasagar during the rule of the British East India Company. The script was originally used to write Sanskrit. Epics of Hindu scripture, including the Mahabharata and the Ramayana, were written in the Mithilakshar/Tirhuta script in this region. After the medieval period, the use of Sanskrit as the sole written language gave way to Pali, and eventually to the vernacular languages we know now as Maithili, Bengali, and Assamese. There is a rich legacy of Indian literature written in this script, which is still occasionally used to write Sanskrit today.

Standardization

In the Bengali script, clusters of consonants are represented by different and sometimes quite irregular forms; thus, learning to read is complicated by the sheer size of the full set of letters and letter combinations, numbering about 350. Efforts at standardizing the alphabet for the Bengali language continue in such notable centres as the Bangla Academies (unaffiliated) at Dhaka (Bangladesh) and Kolkata (West Bengal, India). The alphabet is nevertheless not yet uniform, as many people continue to use various archaic forms of letters, resulting in concurrent forms for the same sounds. Among the various regional variations within this script, only the Assamese and Bengali variations exist today in the formalized system.

It seems likely that the standardization of the alphabet will be greatly influenced by the need to typeset it on computers. The large alphabet can be represented, with a great deal of ingenuity, within the ASCII character set, omitting certain irregular conjuncts. Work has been underway since around 2001 to develop Unicode fonts, and it seems likely that it will split into two variants, traditional and modern.

In this and other articles on Wikipedia dealing with the Bengali language, a Romanization scheme used by linguists specializing in Bengali phonology is included along with IPA transcription.

A recent effort by the government of West Bengal focused on simplifying Bengali orthography in primary school texts.

Description of Bengali glyphs

The glyphs of the Bengali script can be divided into vowel diacritics, consonant and vowel letters (including consonant conjuncts), modifiers, digits, and punctuation marks.

Vowels

The Bengali script has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ shôrobôrno "vowel letter". These shôrobôrnos represent six of the seven main vowel sounds of Bengali, along with two vowel diphthongs. All of these are used in both Bengali and Assamese, the two main languages using the script. There is no standard character in the script for the Bengali main vowel sound /æ/, and the vowel length differences suggested by distinct vowel graphemes (e.g., hrôshsho i vs. dirgho i) do not hold in the spoken language. Also, the grapheme called ri does not really represent a vowel phoneme, but rather the sound sequence /ri/.

When a vowel sound occurs at the beginning of a syllable or when it follows another vowel, it is written using a distinct letter. But when a vowel sound follows a consonant (or a consonant cluster), it is written with a diacritic which, depending on the vowel, can appear above, below, before or after the consonant. The diacritic cannot appear without a consonant. A diacritic form is named by adding a "-kar" to the end of the name of the corresponding vowel letter (see table below).
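
The choice between the full vowel letter and its diacritic can be sketched in a few lines of Python. The mapping below pairs the Unicode code points of the vowel letters with their vowel signs; the function name `write_vowel` and its parameters are hypothetical, and ô is left out because it has no diacritic form.

```python
import unicodedata

# Full vowel letter -> diacritic ("-kar") form, as Unicode code points.
VOWEL_SIGN = {
    "\u0986": "\u09be",  # a      -> akar
    "\u0987": "\u09bf",  # short i -> hrôshsho ikar
    "\u0988": "\u09c0",  # long i  -> dirgho ikar
    "\u0989": "\u09c1",  # short u -> hrôshsho ukar
    "\u098a": "\u09c2",  # long u  -> dirgho ukar
    "\u098b": "\u09c3",  # ri     -> rikar
    "\u098f": "\u09c7",  # e      -> ekar
    "\u0990": "\u09c8",  # oi     -> oikar
    "\u0993": "\u09cb",  # o      -> okar
    "\u0994": "\u09cc",  # ou     -> oukar
}

def write_vowel(vowel, after_consonant=None):
    """Spell a vowel: diacritic after a consonant, full letter otherwise."""
    if after_consonant is None:
        return vowel
    return after_consonant + VOWEL_SIGN[vowel]

KA = "\u0995"  # the consonant kô
print(write_vowel("\u0987"))      # syllable-initial i, written as a full letter
print(write_vowel("\u0987", KA))  # ki, written as kô plus the ikar diacritic
print(unicodedata.name(VOWEL_SIGN["\u0987"]))
```

Note that the diacritic is always stored after the consonant in the encoded text, even for signs such as the ikar that are drawn to the left of the consonant; the visual placement is handled by the font and the text renderer.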

An exception to the above system is the vowel /ɔ/. This has no diacritic form, but is considered inherent in every consonant letter. To specifically denote the absence of this inherent vowel [ɔ] following a consonant, a diacritic called the hôshonto (্) may be written underneath the consonant.

Although there are only two diphthongs in the inventory of the script, the Bengali sound system has in fact many diphthongs. Most of these diphthongs are represented by juxtaposing the graphemes of their forming vowels, as in কেউ keu /keu/.
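
The rules in the preceding paragraphs (a diacritic after a consonant, an inherent ô when nothing is written, the hôshonto to suppress it, and juxtaposed vowels for most diphthongs) can be illustrated with a deliberately tiny romanizer. This is a minimal sketch under strong assumptions: the tables cover only a handful of characters, the function name `romanize` is hypothetical, and the inherent vowel is always rendered as ô, although in real words it may also be pronounced /o/ or dropped.

```python
# Tiny illustrative tables; a real romanizer would cover the whole script.
CONSONANTS = {"\u0995": "k", "\u0997": "g", "\u09a4": "t"}   # kô, gô, tô
VOWEL_SIGNS = {"\u09be": "a", "\u09bf": "i", "\u09c1": "u",
               "\u09c7": "e", "\u09cb": "o"}                 # diacritic forms
VOWEL_LETTERS = {"\u0986": "a", "\u0987": "i", "\u0989": "u",
                 "\u098f": "e", "\u0993": "o"}               # full forms
HOSHONTO = "\u09cd"
INHERENT = "\u00f4"  # the romanization letter ô

def romanize(text):
    """Romanize text built only from the characters in the tables above."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in VOWEL_LETTERS:
            out.append(VOWEL_LETTERS[ch])
            i += 1
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == HOSHONTO:        # inherent vowel explicitly suppressed
                i += 2
            elif nxt in VOWEL_SIGNS:   # diacritic replaces the inherent vowel
                out.append(VOWEL_SIGNS[nxt])
                i += 2
            else:                      # nothing written: inherent vowel
                out.append(INHERENT)
                i += 1
        else:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
    return "".join(out)

print(romanize("\u0995"))              # kô
print(romanize("\u0995\u09be"))        # ka
print(romanize("\u0995\u09cd"))        # k
print(romanize("\u0995\u09c7\u0989"))  # keu
```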

The table below shows the vowels present in the modern (i.e., since the late nineteenth century) inventory of the Bengali alphabet, which has abandoned three historical vowels, rri, li, and lli, traditionally placed between ri and e.

Bengali vowels (স্বরবর্ণ shôrobôrno)
Full form | Name of full form | IPA transcription | Diacritic form, used with the consonant [kɔ] (ক) | Name of diacritic form | Transliteration | IPA
অ | shôro ô (shôre ô) "vowel ô" | /ɔ/ and /o/ | (none) | (none) | kô and ko | /kɔ/ and /ko/
আ | shôro a (shôre a) "vowel a" | /a/ | কা | akar | ka | /ka/
ই | hrôshsho i (hrôshsho i) "short i" | /i/ | কি | hrôshsho ikar (roshshikar) | ki | /ki/
ঈ | dirgho i "long i" | /i/ | কী | dirgho ikar (dirghikar) | ki | /ki/
উ | hrôshsho u (rôshsho u) "short u" | /u/ | কু | hrôshsho ukar (roshshukar) | ku | /ku/
ঊ | dirgho u "long u" | /u/ | কূ | dirgho ukar (dirghukar) | ku | /ku/
ঋ | ri | /ri/ | কৃ | rikar/rifôla | kri | /kri/
এ | e | /e/ and /æ/ | কে | ekar | ke and kê | /ke/ and /kæ/
ঐ | oi | /oj/ | কৈ | oikar | koi | /koj/
ও | o | /o/ | কো | okar | ko | /ko/
ঔ | ou | /ow/ | কৌ | oukar | kou | /kow/

Consonants

Consonant letters are called ব্যঞ্জনবর্ণ bênjonbôrno "consonant letter" in Bengali. The names of these letters are typically just the consonant sound plus the inherent vowel ô. Since the inherent vowel is assumed and not written, most letters' names look identical to the letter itself (e.g. the name of the letter ঘ is itself ঘ ghô). Some letters that have lost their distinctive pronunciation in Modern Bengali are called by a more elaborate name. For example, since the consonant phoneme /n/ can be written ন, ণ, or ঞ (depending on the spelling of the particular word), these letters are not simply called nô; instead, they are called দন্ত্য ন donto nô ("dental n"), মূর্ধন্য ণ murdhonno nô ("cerebral n"), and ঞীয়/ইঙ niô/ingô. Similarly, the phoneme /ʃ/ can be written শ talobbo shô ("palatal s"), ষ murdhonno shô ("cerebral s"), or স donto shô ("dental s"), depending on the word. Since the consonant ঙ /ŋ/ cannot occur at the beginning of a word in Bengali, its name is not ঙ ngô but উঙ ungô. Similarly, since semivowels ([j], [w], [e̯], [o̯]) cannot occur at the beginning of a Bengali word, the name for the "semi-vowel" য় is not অন্তঃস্থ য় but অন্তঃস্থ অ ôntostho ô.

In the earlier inventories of the Bengali alphabet, one can find a second bô (called ôntostho bô) following lô. This ôntostho bô originally represented a /v/ or /w/ sound, but later merged with the borgio bô in the Bengali language. The two bôs were written identically but occurred in two different places in the inventory. In the orthography of Bangladesh, only borgio bô is retained. The ôntostho bô continues to be used in the Indian state of West Bengal.

The table below presents the Bengali consonant letters in their traditional order.
Bengali consonants (ব্যঞ্জনবর্ণ bênjonbôrno)
Letter | Name of consonant | Transliteration | IPA
ক | kô | k | /k/
খ | khô | kh | /kʰ/
গ | gô | g | /ɡ/
ঘ | ghô | gh | /ɡʱ/
ঙ | ungô, umô | ņ | /ŋ/
চ | chô | ch | /tʃ/
ছ | chhô | chh | /tʃʰ/
জ | borgio jô (burgijjô) | j | /dʒ/
ঝ | jhô | jh | /dʒʱ/
ঞ | ingô, niô | n | /n/
ট | ţô | ţ | /ʈ/
ঠ | ţhô | ţh | /ʈʰ/
ড | đô | đ | /ɖ/
ঢ | đhô | đh | /ɖʱ/
ণ | murdhonno nô (moddhennô) | n | /n/
ত | tô | t | /t̪/
থ | thô | th | /t̪ʰ/
দ | dô | d | /d̪/
ধ | dhô | dh | /d̪ʱ/
ন | donto nô (dontennô) | n | /n/
প | pô | p | /p/
ফ | phô | ph | /pʰ/
ব | bô (the so-called borgio bô) | b | /b/
ভ | bhô | bh | /bʱ/
ম | mô | m | /m/
য | ôntostho jô (ontostejô) | j | /dʒ/
র | (bôe bindu/shunno) rô | r | /ɾ/
ল | lô | l | /l/
শ | talobbo shô (taleboshshô) | sh and s | /ʃ/ and /s/
ষ | murdhonno shô (muddhennoshshô) (peţ kaţa shô) | sh | /ʃ/
স | donto shô (donteshshô) | sh and s | /ʃ/ and /s/
হ | hô | h | /h/
য় | ôntostho ô (ontosteô) | e and – | /e̯/ and –
ড় | đôe shunno/bindu ŗô | ŗ | /ɽ/
ঢ় | đhôe shunno/bindu ŗô | ŗh | /ɽ/
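
The table above shows that several letters share a single phoneme, which is why those letters need the more elaborate names described earlier. A small Python sketch can make the many-to-one relationship explicit by grouping letters by their IPA value. Only a few rows are included, the names `LETTER_TO_IPA` and `letters_by_phoneme` are illustrative, and the two letters listed as /ʃ/ and /s/ are simplified here to /ʃ/.

```python
from collections import defaultdict

# A few rows of the table above: letter (as a code point) -> phoneme (IPA).
LETTER_TO_IPA = {
    "\u09a8": "n",   # donto nô
    "\u09a3": "n",   # murdhonno nô
    "\u099e": "n",   # niô
    "\u09b6": "ʃ",   # talobbo shô (simplified)
    "\u09b7": "ʃ",   # murdhonno shô
    "\u09b8": "ʃ",   # donto shô (simplified)
    "\u099c": "dʒ",  # borgio jô
    "\u09af": "dʒ",  # ôntostho jô
    "\u09ae": "m",   # mô
}

def letters_by_phoneme(table):
    """Invert a letter -> phoneme table into phoneme -> list of letters."""
    groups = defaultdict(list)
    for letter, ipa in table.items():
        groups[ipa].append(letter)
    return dict(groups)

for ipa, letters in letters_by_phoneme(LETTER_TO_IPA).items():
    print(f"/{ipa}/ is written with {len(letters)} letter(s): {' '.join(letters)}")
```

A phoneme with more than one letter in its group is one whose spelling has to be learned word by word.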

Consonant conjuncts

Up to four consecutive consonants not separated by vowels can be orthographically represented as a ligature called a "consonant conjunct" (Bengali: যুক্তাক্ষর juktakkhor or যুক্তবর্ণ juktobôrno). Typically, the first consonant in the conjunct is shown above and/or to the left of the following consonants. Many consonants appear in an abbreviated or compressed form when serving as part of a conjunct. Others simply take exceptional forms in conjuncts, bearing little or no resemblance to the base character.

Often, consonant conjuncts are not actually pronounced as would be implied by the pronunciation of the individual components. For example, adding ল lô underneath শ shô in Bengali creates the conjunct শ্ল, which is not pronounced shlô but slô in Bengali. Many conjuncts represent Sanskrit sounds that were lost thousands of years before modern Bengali was ever spoken, as in জ্ঞ, which is a combination of জ jô and ঞ niô, but is not pronounced jnô. Instead, it is pronounced ggõ in Bengali. Thus, as conjuncts often represent (combinations of) sounds that cannot be easily understood from the components, the following descriptions are concerned only with the construction of the conjunct, and not the resulting pronunciation. For this reason, a variant of the IAST romanization scheme is used instead of the phonemic romanization.
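
In Unicode text a conjunct is not stored as a single character. It is stored as its component consonants joined by the hôshonto (U+09CD), and the font and text renderer choose the fused, compressed or abbreviated shape. The sketch below builds the two conjuncts mentioned above; the helper name `make_conjunct` is hypothetical.

```python
HOSHONTO = "\u09cd"  # BENGALI SIGN VIRAMA

def make_conjunct(*consonants):
    """Join consonants with hôshonto so a renderer can draw them as one conjunct."""
    return HOSHONTO.join(consonants)

SHA, LA = "\u09b6", "\u09b2"   # shô, lô
JA, NYA = "\u099c", "\u099e"   # jô, niô

print(make_conjunct(SHA, LA))   # displayed as a single conjunct glyph
print(make_conjunct(JA, NYA))
print(len(make_conjunct(JA, NYA)))  # 3 code points, although it looks like one letter
```

Because the stored sequence reflects construction rather than pronunciation, it matches the component-based descriptions in the following sections.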

Fused forms

Some consonants fuse in such a way that one stroke of the first consonant also serves as a stroke of the next.
  • The consonants can be placed on top of one another, sharing their vertical line: ক্ক kkô গ্ন gnô গ্ল glô ন্ন nnô প্ন pnô প্প ppô ল্ল llô etc.
  • As the last member of a conjunct, ৱ wô and ব bô can hang on the vertical line under the preceding consonants, taking the shape of ব bô (here referred to as বফলা bôfôla): গ্ব gwô ণ্ব দ্ব dwô/dbô ল্ব lwô শ্ব śwô.
  • The consonants can also be placed side-by-side, sharing their vertical line: দ্দ ddô ন্দ ndô ব্দ bdô ব্জ bjô প্ট শ্চ ścô শ্ছ śchô etc.

Approximated forms

Some consonants are simply written closer to one another to indicate that they are in a conjunct together.
  • As the last member of a conjunct, গ gô can appear unaltered, with the preceding consonant simply written closer to it: দ্গ dgô.
  • As the last member of a conjunct, ৱ wô and ব bô can appear immediately to the right of the preceding consonant, taking the shape of ব bô (here referred to as বফলা bôfôla): ধ্ব dhwô ব্ব bbô হ্ব hwô.

Compressed forms

Some consonants are compressed (and often simplified) when appearing as the first member of a conjunct.
  • As the first member of a conjunct, the consonants ঙ ŋô চ cô ড ব bô are often compressed and placed at the top-left of the following consonant, with little or no change to the basic shape: ঙ্ক্ষ ঙ্খ ŋkhô ঙ্ঘ ŋghô ঙ্ম ŋmô চ্চ ccô চ্ছ cchô চ্ঞ cñô ড্ড ব্ব bbô.
  • As the first member of a conjunct, ত tô is compressed and placed above the following consonant, with little or no change to the basic shape: ত্ন tnô ত্ম tmô ত্ব twô.
  • As the first member of a conjunct, ম mô is compressed and simplified to a curved shape. It is placed above or to the top-left of the following consonant: ম্ন mnô ম্প mpô ম্ফ mfô ম্ব mbô ম্ভ mbhô ম্ম mmô ম্ল mlô.
  • As the first member of a conjunct, ষ is compressed and simplified to an oval shape with a diagonal stroke through it. It is placed to the top-left of the following consonants: ষ্ক ষ্ট ষ্ঠ ষ্প ষ্ফ ষ্ম .
  • As the first member of a conjunct, স sô is compressed and simplified to a ribbon shape. It is placed above or to the top-left of the following consonant: স্ক skô স্খ skhô স্ট স্ত stô স্থ sthô স্ন snô স্প spô স্ফ sfô স্ব swô স্ম smô স্ল slô.

Abbreviated forms

Some consonants are abbreviated when appearing in conjuncts, losing part of their basic shape.
  • As the first member of a conjunct, জ jô can lose its final downstroke: জ্জ jjô জ্ঞ jñô জ্ব jwô.
  • As the first member of a conjunct, ঞ ñô can lose its bottom half: ঞ্চ ñcô ঞ্ছ ñchô ঞ্জ ñjô ঞ্ঝ ñjhô.
  • As the last member of a conjunct, ঞ ñô can lose its left half (the এ part): জ্ঞ jñô.
  • As the first member of a conjunct, ণ and প pô can lose their downstroke: ণ্ঠ ণ্ড প্ত ptô প্স psô.
  • As the first member of a conjunct, ত tô and ভ bhô can lose their final upward tail: ত্ত ttô ত্থ tthô ত্র trô ভ্র bhrô.
  • As the last member of a conjunct, থ thô can lose its final upstroke, taking the form of হ hô instead: ন্থ nthô ম্থ mthô স্থ sthô.
  • As the last member of a conjunct, ম mô can lose its initial downstroke: ক্ম kmô গ্ম gmô ঙ্ম ŋmô ট্ম ণ্ম ত্ম tmô দ্ম dmô ন্ম nmô ম্ম mmô শ্ম śmô ষ্ম স্ম smô.
  • As the last member of a conjunct, স sô can lose its top half: ক্স ksô ন্স nsô.

Variant forms

Some consonants have forms that are used regularly, but only within conjuncts.
  • As the first member of a conjunct, ঙ ŋô can appear as a loop and curl: ঙ্ক ŋkô ঙ্গ ŋgô.
  • As the last member of a conjunct, the curled top of ধ dhô is replaced by a straight downstroke to the right: গ্ধ gdhô দ্ধ ddhô ন্ধ ndhô ব্ধ bdhô.
  • As the first member of a conjunct, র rô appears as a diagonal stroke (called রেফ ref) above the following member: র্ক rkô র্খ rkhô র্গ rgô র্ঘ rghô etc.
  • As the last member of a conjunct, র rô appears as a wavy horizontal line (called রফলা rôfôla) under the previous member: খ্র khrô গ্র grô ঘ্র ghrô ব্র brô etc.
    • In some fonts, certain conjuncts with রফলা rôfôla appear using the compressed (and often simplified) form of the previous consonant: জ্র jrô ট্র ঠ্র ড্র ম্র mrô স্র srô.
    • In some fonts, certain conjuncts with রফলা rôfôla appear using the abbreviated form of the previous consonant: ক্র krô ত্র trô ভ্র bhrô
  • As the last member of a conjunct, য yô appears as a wavy vertical line (called যফলা jôfôla) to the right of the previous member: ক্য kyô খ্য khyô গ্য gyô ঘ্য ghyô etc.
    • In some fonts, certain conjuncts with যফলা jôfôla appear using special fused forms: দ্য dyô ন্য nyô শ্য śyô ষ্য স্য syô হ্য hyô.

Exceptions

  • When followed by র rô, ক kô takes on the abbreviated form of ত tô with the addition of a curl to the right: ক্র krô.
  • When preceded by the abbreviated form of ঞ ñô, চ cô takes the shape of ব bô: ঞ্চ ñcô
  • When preceded by another ট , ট is reduced to a leftward curl: ট্ট .
  • When preceded by ষ , ণ appears as two loops to the right: ষ্ণ .
  • As the first member of a conjunct, or when word-final and followed by no vowel, ত tô can appear as ৎ (called খণ্ড-ত or "broken tô"): ৎস tsô ৎপ tpô ৎক tkô etc.
  • When preceded by হ hô, ন nô appears as a curl to the right: হ্ন hnô.
  • Certain combinations simply must be memorized: ক্ষ হ্ম hmô.

Exceptional consonant-vowel combinations

When serving as a vowel sign, উ u, ঊ ū, and ঋ take on many exceptional forms.
  • উ u
    • When following গ gô or শ śô, it takes on a variant form resembling the final tail of ও: গু gu শু śu.
    • When following a ত tô that is already part of a conjunct with ন nô or স sô, it is fused with the ত tô to resemble ও o: ন্তু ntu স্তু stu.
    • When following র rô, and in many fonts also following the variant রফলা rôfôla, it appears as an upward curl to the right of the preceding consonant as opposed to a downward loop below: রু ru গ্রু gru ত্রু tru থ্রু thru দ্রু dru ধ্রু dhru ব্রু bru ভ্রু bhru শ্রু śru.
    • When following হ hô, it appears as an extra curl: হু hu.
  • ঊ ū
    • When following র rô, and in many fonts also following the variant রফলা rôfôla, it appears as a downstroke to the right of the preceding consonant as opposed to a downward hook below: রূ rū গ্রূ grū থ্রূ thrū দ্রূ drū ধ্রূ dhrū ভ্রূ bhrū শ্রূ śrū.
  • ঋ
    • When following হ hô, it takes the variant shape of ঊ ū: হৃ.


Conjuncts of three consonants also exist, and follow the same rules as above. Examples include স sô + ত tô +র rô = স্ত্র strô, ম mô + প pô + র rô = ম্প্র mprô, ঙ ŋô + ক kô + ষ = ঙ্ক্ষ , জ jô + জ jô + ৱ wô = জ্জ্ব jjwô, ক kô + ষ + ম mô = ক্ষ্ম . Theoretically, four-consonant conjuncts can also be created, as in র rô + স sô + ট + র rô = র্স্ট্র , but these are not found in real words.
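
Since each member of an encoded conjunct is separated from the next by a hôshonto, the number of consonants in a cluster can be recovered by splitting on that character. The following Python sketch does this for the three- and four-consonant examples above. The function name `conjunct_members` is hypothetical, and the consonant check uses a rough code point range (U+0995 to U+09B9) rather than a precise list.

```python
HOSHONTO = "\u09cd"
CONSONANT_RANGE = range(0x0995, 0x09BA)  # kô (U+0995) through hô (U+09B9), approximately

def conjunct_members(conjunct):
    """Return the member consonants of a conjunct encoded with hôshonto."""
    members = conjunct.split(HOSHONTO)
    if not all(len(m) == 1 and ord(m) in CONSONANT_RANGE for m in members):
        raise ValueError("expected single consonants joined by hôshonto")
    return members

STRA = "\u09b8\u09cd\u09a4\u09cd\u09b0"               # sô + tô + rô
RSTRA = "\u09b0\u09cd\u09b8\u09cd\u099f\u09cd\u09b0"  # rô + sô + ţô + rô

print(len(conjunct_members(STRA)), conjunct_members(STRA))
print(len(conjunct_members(RSTRA)), conjunct_members(RSTRA))
```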

Modifiers and others

Modifier and other graphemes in Bengali
Symbol with [kɔ] (ক) | Name | Function | Transliteration | IPA
ক্ | hôshonto "final hôsh" | Suppresses the inherent vowel [ɔ] | k | /k/
কৎ | khônđo tô "broken tô" | Final unaspirated dental [t̪] (ত) | kôt | /kɔt̪/
কং | ônushshôr | Final velar nasal | kôņ | /kɔŋ/
কঃ | bishôrgo | 1. pronounced as a syllable-final voiceless breath, as in উঃ; 2. not pronounced, but geminates the following consonant, as in দুঃসময়; 3. not pronounced at all, as in দুঃস্থ | - | 1. [uh]; 2. [d̪uʃːomɔj]; 3. [d̪ustʰo]
কঁ | chôndrobindu "moon-dot" | Vowel nasalization | kôñ | /kɔ̃/


ঃ -h and ং -ng are also often used as abbreviation marks in Bengali, with ং -ng used when the next sound following the abbreviation would be a nasal sound, and ঃ -h otherwise. For example ডঃ ḍôh stands for ডক্টর ḍôkṭor "doctor" and নং nông stands for নম্বর nômbor "number". Some abbreviations have no marking at all, as in ঢাবি ḍhabi for ঢাকা বিশ্ববিদ্যালয় Ḍhaka Bishshobiddalôe "Dhaka University". The full stop can also be used when writing out English letters as initials, such as ই.ইউ. i iu "E.U.".

The jôphôla is sometimes used as a diacritic to indicate non-Bengali vowels of various kinds in transliterated foreign words. For example, the schwa is indicated by a jôphôla, the French u and the German umlaut ü as উ্য, the German umlaut ö as ও্য or এ্য, etc.

The apostrophe, known in Bengali as ঊর্ধ্বকমা urdhokôma "upper comma", is sometimes used to distinguish between homographs, as in পাটা paţa "plank" and পা'টা paţa "the leg". Sometimes a hyphen is used for the same purpose (as in পা-টা, an alternative of পা'টা).

Digits and numerals

The Bengali script has ten digits (graphemes or symbols indicating the numbers from 0 to 9), which are variants of Indian numerals (known as Arabic numerals in the West). Bengali digits have no horizontal headstroke or "matra".
Bengali digits
English digits 0 1 2 3 4 5 6 7 8 9
Bengali digits ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
Bengali names of digits (romanized) shunno êk dui tin char pãch chhôe shat aţ nôe
Bengali names of digits (Bengali script) শূন্য এক দুই তিন চার পাঁচ ছয় সাত আট নয়


Numbers larger than 9 are written in Bengali using a positional base 10 numeral system (the decimal system), just as in English. A period or dot is used to denote the decimal separator, which separates the integral and the fractional parts of a decimal number. When writing large numbers with many digits, commas are used as delimiters to group digits, indicating the thousand (হাজার hajar), the hundred thousand or lakh (লাখ lakh or লক্ষ lokkho), and the ten million or hundred lakh or crore (কোটি or ক্রোড়) units. In other words, going leftwards from the decimal separator, the first grouping consists of three digits, and the subsequent groupings always consist of two digits.

For example, the English number 17,557,345 will be written in traditional Bengali as ১,৭৫,৫৭,৩৪৫ (এক কোটি, পঁচাত্তর লাখ, সাতান্ন হাজার, তিন শ পঁয়তাল্লিশ, "one crore, seventy-five lakhs, fifty-seven thousand, three hundred forty-five").
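
The grouping rule can be sketched in a few lines of Python: take the last three digits as one group, then split the remaining digits into pairs from the right. The function names group_indian and to_bengali_number are hypothetical, and the sketch assumes non-negative integers only (no decimal part).

```python
# Map ASCII digits to the ten Bengali digits.
BENGALI_DIGITS = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")

def group_indian(n: int) -> str:
    """Group digits as 3 on the right, then 2 at a time (thousand, lakh, crore)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # peel off pairs from the right
        head = head[:-2]
    groups.insert(0, head)           # whatever is left (1 or 2 digits)
    return ",".join(groups + [tail])

def to_bengali_number(n: int) -> str:
    """Apply the grouping, then swap in Bengali digits."""
    return group_indian(n).translate(BENGALI_DIGITS)

print(group_indian(17557345))       # 1,75,57,345
print(to_bengali_number(17557345))  # ১,৭৫,৫৭,৩৪৫
```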

The matra

Whereas in western scripts (Latin, Cyrillic, etc.) the letter-forms stand on an invisible baseline, the Bengali letter-forms hang from a visible horizontal left-to-right headstroke called মাত্রা matra (not to be confused with its Hindi cognate matra, which denotes the dependent forms of Hindi vowels). The presence and absence of this matra can be important. For example, the letter ত [tɔ] and the numeral ৩ "3" are distinguishable only by the presence or absence of the matra, as is the case between the consonant cluster ত্র [trɔ] and the independent vowel এ [e]. The letter-forms also employ the concepts of letter-width and letter-height (the vertical space between the visible matra and an invisible baseline).

Punctuation marks

Bengali punctuation marks, apart from the downstroke daŗi (।), the Bengali equivalent of a full stop, have been adopted from western scripts and their usage is similar. Commas, semicolons, colons, quotation marks, etc. are the same as in English. The concept of using capital letters is absent in the Bengali script, hence proper names are unmarked.

Characteristics of the Bengali text

Size

The consonant graphemes and the full form of vowel graphemes fit into an imaginary rectangle of uniform size (i.e. uniform width and height). The size of a consonant conjunct, regardless of its complexity, is deliberately maintained the same as that of a single consonant grapheme, so that diacritic vowel forms can be attached to it without any distortion.

Spacing

In a typical Bengali text, orthographic words, i.e., words as they are written, can be seen as being separated from each other by an even spacing. Graphemes within a word are also evenly spaced, but this spacing is much narrower than the spacing between words.

Punctuation marks

As discussed earlier, the Bengali punctuation marks are often the same as their English counterparts, both in form and function.

Characteristics of the orthographic word

In every Bengali orthographic word, one can find different kinds of graphemes and their combinations, and they are as follows:
  • Independent form of vowel graphemes, which can be found at the beginning of a word or after another vowel sound.
  • Consonant graphemes (or consonant conjuncts) with no diacritic vowel form attached to them. An inherent vowel (either /ɔ/ or /o/, depending on context) is nevertheless assumed to be attached to them.
  • Consonant graphemes (or consonant conjuncts) with a diacritic vowel form attached to them.
  • Other modifier symbols indicating nasalization of vowels, suppression of the inherent vowel, etc.

The "matra"

The matra, or horizontal headstroke, of each grapheme usually joins with those of its neighbours, often forming a single continuous headstroke over the entire orthographic word, with the different graphemes hanging down from it. This gives Bengali text a distinct look. Among other modern Indic scripts, Devanagari also has this characteristic.

Inconsistencies

The following inconsistencies are inherent in the Bengali script and orthography. They often put additional burden on the person learning the script. The inconsistencies manifest themselves in various ways. Sometimes there are multiple different letters or symbols for the same sound (over-production). Sometimes a letter loses its original sound value. In other instances, the coverage of phonological information by the script is incomplete, inconsistent and/or ambiguous. Most of these inconsistencies can be attributed to the fact that the script was originally conceived to represent Sanskrit sounds.

Redundant graphemes for the vowel sounds [i] and [u]

The Bengali script has two symbols for the vowel sound [i] and two symbols for the vowel sound [u]. This redundancy stems from the time when this script was used to write Sanskrit, a language that had a short [i] and a long [iː], and a short [u] and a long [uː]. These letters are preserved in the Bengali script with their traditional names of hrôshsho i/u (lit. "short i/u") and dirgho i/u (lit. "long i/u") despite the fact that they are no longer pronounced differently in ordinary speech. These graphemes do serve an etymological function, however, in preserving the original Sanskrit spelling in tôtshomo Bengali words (i.e., words that were borrowed from Sanskrit).

The vowel grapheme ri

The grapheme called ri does not really represent a vowel phoneme in Bengali, rather the consonant-vowel combination /ri/. Nevertheless, it is included in the vowel section of the inventory of the Bengali script. This inconsistency is also a remnant from Sanskrit, where the grapheme represents a retroflex approximant, a sound considered a vowel in Sanskrit.

The vowel sound [æ]

Even though the near-open front unrounded vowel [æ] is one of the seven main vowel sounds in the standard Bengali language, no distinct vowel symbol has been allotted for it in the script, since there is no [æ] sound in Sanskrit, the primary written language when the script was conceived. As a result, this sound is orthographically realized by multiple means in modern Bengali orthography, usually using some combination of এ, অ and the jôfôla (diacritic form of the consonant grapheme য ôntostho jô) as seen in the following examples:
  • word-initially: এত [æt̪o] "so much", এ্যাকাডেমী [ækademi] "academy", অ্যামিবা [æmiba] "amoeba"
  • word-medially and following a consonant: দেখা [d̪ækha] "to see", ব্যস্ত [bæst̪o] "busy", ব্যাকরণ [bækɔron] "grammar".

ত and ৎ

In native or tôdbhôbo Bengali words, syllable-final ত tô is pronounced /t̪/, as in নাতনি /nat̪ni/ "grand daughter", করাত /kɔrat̪/ "saw", etc.

ৎ (called খণ্ড-ত "broken tô") is always used syllable-finally and always pronounced as /t̪/. It is predominantly found in loan words from Sanskrit such as ভবিষ্যৎ /bʱobiʃːɔt̪/ "future", সত্যজিৎ /ʃot̪ːod͡ʒit̪/ "Satyajit (a proper name)", etc. It is also found in some onomatopoeic words (such as থপাৎ /t̪ʰopat̪/ "sound of something heavy that fell", মড়াৎ /mɔɽat̪/ "sound of something breaking", etc.), as the first member of some consonant conjuncts (such as ৎস tsô, ৎপ tpô, ৎক tkô, etc.), and in some foreign loanwords (e.g. নাৎসি /nat̪si/ "Nazi", জুজুৎসু /d͡ʒud͡ʒut̪su/ "Jujitsu", etc.) which contain the same conjuncts.

This is an over-production inconsistency, where the sound /t̪/ is realized by both ত and ৎ. This creates confusion among inexperienced writers of Bengali. There is no simple way of telling which symbol should be used. Usually, the contexts where ৎ is used need to be memorized, as these are less frequent.

শ, ষ and স

Three graphemes (শ talobbo shô "palatal s", ষ murdhonno shô "cerebral s", and স donto shô "dental s") are used to represent the voiceless palato-alveolar fricative [ʃ], as seen in their word-final pronunciations in ফিসফিস [pʰiʃpʰiʃ] "whisper", বিশ [biʃ] "twenty" and বিষ [biʃ] "poison". The grapheme স donto shô "dental s", however, does retain the voiceless alveolar fricative [s] sound when used as the first component in certain consonant conjuncts as in স্খলন [skʰɔlon] "fall", স্পন্দন [spɔndon] "beat", etc.

জ and য

There are two letters (জ and য) for the voiced postalveolar affricate [dʒ]. Compare জাল [dʒal] "net" and যাও [dʒao] "Go!".

What was once pronounced and written as a retroflex nasal ণ [ɳ] is now pronounced as an alveolar [n] (unless conjoined with another retroflex consonant such as ট, ঠ, ড and ঢ), although the spelling does not reflect this change.

Romanization

The romanization of Bengali is the representation of the Bengali language in the Latin script. While different standards for romanization have been proposed for Bengali, these have not been adopted with the degree of uniformity seen in languages such as Japanese or Sanskrit. Most standardized Bengali romanizations are adapted from standards proposed for Indic languages, and these models are compared below.

History

The Portuguese missionaries stationed in Bengal in the 16th century were the first people to employ the Latin alphabet in writing Bengali books, the most famous of which are the Crepar Xaxtrer Orth, Bhed and the Vocabolario em idioma Bengalla, e Portuguez dividido em duas partes, both written by Manuel da Assumpção. But the Portuguese-based romanization did not take root. In the late 18th century Augustin Aussant used a romanization scheme based on the French alphabet. At the same time, Nathaniel Brassey Halhed used a romanization scheme based on English for his Bengali grammar book. After Halhed, the renowned English philologist and oriental scholar Sir William Jones devised a romanization scheme for Bengali and for Indian languages in general, and published it in the Asiatick Researches journal in 1801. This scheme came to be known as the "Jonesian System" of romanization, and served as a model for the next century and a half.

Transliteration vs Transcription

The Romanization of a language written in a non-Roman script can be based on transliteration (orthographically accurate, i.e. the original spelling can be recovered) or transcription (phonetically accurate, i.e. the pronunciation can be reproduced). This distinction is important in Bengali as its orthography was adopted from Sanskrit, and ignores sound change processes of several millennia. To some degree, all writing systems differ from the way the language is pronounced, but this may be more extreme for languages like Bengali. For example, the three letters শ, ষ, and স had distinct pronunciations in Sanskrit, but over several centuries, the standard pronunciation of Bengali (usually modeled on the Nadia dialect) has lost these phonetic distinctions (all three are usually pronounced as IPA [ʃ]) while the spelling distinction nevertheless persists in orthography.

In written texts, it is easy to distinguish between homophones such as শাপ shap "curse" and সাপ shap "snake". Such a distinction could be particularly relevant in searching for the term in an encyclopedia, for example. However, the fact that the words sound identical means that they would be transcribed identically; thus, some important meaning distinctions cannot be rendered in a transcription model. Another issue with transcription systems is that cross-dialectal and cross-register differences are widespread, and thus the same word or lexeme may have many different transcriptions. Even simple words like মন "mind" may be pronounced "mon", "môn", or (in poetry) "mônô" (e.g. in the Indian national anthem, Jana Gana Mana).

Often, different phonemes (meaningfully different sounds) are represented by the same symbol or grapheme. Thus, the vowel এ can represent either [e] (এল elo [elo] "came") or [ɛ] (এক êk [ɛk] "one"). Occasionally, words written in the same way (homographs) may have different pronunciations for differing meanings: মত can mean "opinion" (môt) or "similar to" (môto). Thus, some important phonemic distinctions cannot be rendered in a transliteration model. In addition, when representing a Bengali word to allow speakers of other languages to pronounce it easily, it may be better to use a transcription, which does not include the silent letters and other idiosyncrasies (e.g. স্বাস্থ্য shastho or অজ্ঞান ôggên, whose spellings contain letters that are not pronounced as written) that make Bengali orthography so complicated.
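
The trade-off between the two models can be illustrated with a toy Python sketch. It covers only the letters of শাপ and সাপ, using the NLK and phonemic values listed for these two words in the example table of this article. The mapping tables and function names are hypothetical and are not a complete romanization scheme.

```python
# Toy tables covering only the letters needed for শাপ and সাপ (assumption).
TRANSLIT_CONSONANTS = {"শ": "ś", "স": "s", "প": "p"}   # letter-to-letter
TRANSLIT_VOWEL_SIGNS = {"া": "ā"}
TRANSCRIBE = {"শ": "sh", "স": "sh", "প": "p", "া": "a"}  # sound-based

def transliterate(word: str) -> str:
    """Spelling-preserving: each consonant keeps its own symbol, and the
    inherent vowel 'a' is written unless a vowel sign follows."""
    out = []
    for i, ch in enumerate(word):
        if ch in TRANSLIT_CONSONANTS:
            out.append(TRANSLIT_CONSONANTS[ch])
            following = word[i + 1] if i + 1 < len(word) else ""
            if following not in TRANSLIT_VOWEL_SIGNS:
                out.append("a")
        elif ch in TRANSLIT_VOWEL_SIGNS:
            out.append(TRANSLIT_VOWEL_SIGNS[ch])
        else:
            raise ValueError(f"letter not covered by this toy table: {ch!r}")
    return "".join(out)

def transcribe(word: str) -> str:
    """Sound-preserving: শ and স merge, and the silent final vowel is dropped."""
    return "".join(TRANSCRIBE[ch] for ch in word)

for word in ("শাপ", "সাপ"):
    print(word, transliterate(word), transcribe(word))
```

The two transliterations differ, so the original spelling can be recovered from them, while the two transcriptions are identical and the distinction between "curse" and "snake" is lost.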

Comparison of Romanizations

Comparisons of standard romanization schemes for Bengali are given in the table below. Two standards are commonly used for transliteration of Indic languages including Bengali. Many standards (e.g. NLK / ISO) use diacritic marks and permit case markings for proper nouns. Newer forms (e.g. Harvard-Kyoto) are more suited for ASCII-derivative keyboards, and use upper- and lower-case letters contrastively and forgo normal standards for English capitalization.
  • "NLK" stands for the diacritic-based letter-to-letter transliteration schemes, best represented by the National Library at Kolkata romanization or the ISO 15919, or IAST. This is the ISO standard, and it uses diacritic marks (e.g. ā) to reflect the additional characters and sounds of Bengali letters.

  • ITRANS is an ASCII representation for Sanskrit; it is one-to-many, i.e. there may be more than one way of transliterating characters, which can make internet searching more complicated. ITRANS representations forgo capitalization norms of English so as to be able to represent the characters using a normal ASCII keyboard.

  • "HK" stands for two other case-sensitive letter-to-letter transliteration schemes: the Harvard-Kyoto and XIAST schemes. These are similar to the ITRANS scheme, and use only one form for each character.

  • "XHK" stands for the case-sensitive letter-to-letter Extended Harvard-Kyoto transliteration. This adds some specific characters for handling Bengali text to IAST.

  • "Wiki" stands for a phonemic transcription-based romanization. It is a sound-preserving transcription based on what is perceived to be the standard pronunciation of the Bengali words, with no reference to how it is written in Bengali script. It uses diacritics often used by linguists specializing in Bengali (other than IPA), and is the transcription system used to represent Bengali sounds in Wikipedia articles.
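
The one-to-many property noted for ITRANS in the list above is the reason searches over ITRANS text may need a normalization step. The Python sketch below folds the doubled long-vowel spellings into their single-letter alternatives, using only the variant pairs A~aa, I~ii and U~uu given in the vowel table of this article. The function name normalize_itrans is hypothetical, and the replacement is deliberately naive: it would also merge two genuinely separate short vowels written next to each other.

```python
# Variant pairs taken from the ITRANS column of the vowel table (A~aa, I~ii, U~uu).
ITRANS_LONG_VOWEL_VARIANTS = {"aa": "A", "ii": "I", "uu": "U"}

def normalize_itrans(text: str) -> str:
    """Fold doubled long vowels into the single-capital spelling."""
    for doubled, single in ITRANS_LONG_VOWEL_VARIANTS.items():
        text = text.replace(doubled, single)
    return text

# Both spellings of the same word now compare equal.
print(normalize_itrans("saapa"), normalize_itrans("sApa"))
print(normalize_itrans("aatmahatyaa"))
```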

Examples

The following table includes examples of Bengali words Romanized using the various systems mentioned above.
Example words
In orthography Meaning NLK XHK ITRANS HK Wiki IPA
মন mind mana mana mana mana mon [mon]
সাপ snake sāpa sApa saapa sApa shap [ʃap]
শাপ curse śāpa zApa shaapa zApa shap [ʃap]
মত opinion mata mata mata mata môt [mɔt̪]
মত like mata mata mata mata môto [mɔt̪o]
তেল oil tēla tela tela tela tel [t̪el]
গেল went gēla gela gela gela gêlo [ɡɛlo]
জ্বর fever jvara jvara jvara jvara jôr [dʒɔr]
স্বাস্থ্য health svāsthya svAsthya svaasthya svAsthya shastho [ʃast̪ʰo]
বাংলাদেশ Bangladesh bAMlAdeza baa.mlaadesha bAMlAdeza Bangladesh [baŋlad̪eʃ]
ব্যঞ্জনধ্বনি consonant byañjanadhvani byaJjanadhvani bya~njanadhvani byaJjanadhvani bênjondhoni [bɛndʒond̪ʱoni]
আত্মহত্যা suicide ātmahatyā AtmahatyA aatmahatyaa AtmahatyA attõhotta [at̪ːõhot̪ːa]

Romanization Reference

The IPA (International Phonetic Alphabet) transcription is provided in the rightmost column, representing the most common pronunciation of the glyph in Standard Colloquial Bengali, alongside the various romanizations described above.
Vowels
Symbol NLK XHK ITRANS HK Wiki IPA
অ a a a a ô/o [ɔ]/[o]
আ ā ā A~aa A a [a]
ই i i i i i [i]
ঈ ī ī I~ii I i [i]
উ u u u u u [u]
ঊ ū ū U~uu U u [u]
ঋ RRi~R^i R ri [ri]
এ ē e e e ê/e [ɛ]/[e]
ঐ ai ai ai ai oi [oj]
ও ō o o o o [o]
ঔ au au au au ou [ow]
Consonants
Symbol NLK XHK ITRANS HK Wiki IPA
ক k k k k k [k]
খ kh kh kh kh kh [kʰ]
গ g g g g g [ɡ]
ঘ gh gh gh gh gh [ɡʱ]
ঙ ~N G ng [ŋ]
চ c c ch c ch [tʃ]
ছ ch ch Ch ch chh [tʃʰ]
জ j j j j j [dʒ]
ঝ jh jh jh jh jh [dʒʱ]
ঞ ñ ñ ~n J n [n]
ট T T ţ [ʈ]
ঠ Th Th ţh [ʈʰ]
ড D D đ [ɖ]
ড় .D P ŗ [ɽ]
ঢ Dh Dh đh [ɖʱ]
ঢ় .Dh Ph ŗ [ɽ]
ণ N N n [n]
ত t t t t t [t̪]
থ th th th th th [t̪ʰ]
দ d d d d d [d̪]
ধ dh dh dh dh dh [d̪ʱ]
ন n n n n n [n]
প p p p p p [p]
ফ ph ph ph ph ph [pʰ]
ব b b b b b [b]
ভ bh bh bh bh bh [bʱ]
ম m m m m m [m]
য y y y j [dʒ]
য় y Y Y e/- [e]/–
র r r r r r [r]
ল l l l l l [l]
শ ś ś sh z sh/s [ʃ]/[s]
ষ Sh S sh [ʃ]
স s s s s sh/s [ʃ]/[s]
হ h h h h h [ɦ]
EWLINE
Miscellaneous
Symbol NLK XHK ITRANS HK Wiki IPA
H H varies varies
.m M ng [ŋ]
.N ~ ~ [~] (nasalization)
y y y y varies varies
v v v v varies varies
ক্ষ x kS kkh/kh [kʰː]/[kʰ]
জ্ঞ GY jJ gg/g [ɡː]/[ɡ]
শ্র śr śr shr zr sr [sr]

Unicode

Bengali script was added to the Unicode Standard in October 1991 with the release of version 1.0.

The Unicode block for Bengali is U+0980 to U+09FF.
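As a rough illustration of working with this range, the following Python sketch lists the assigned code points in the block and tests whether a character falls inside it. The function names are illustrative and do not come from any standard. The set of assigned code points depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Bounds of the Bengali block as stated in the article.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF


def bengali_block():
    """Return (code point, character, name) for each assigned code point."""
    rows = []
    for cp in range(BENGALI_START, BENGALI_END + 1):
        ch = chr(cp)
        # Unassigned code points have no name; skip them.
        name = unicodedata.name(ch, None)
        if name is not None:
            rows.append((cp, ch, name))
    return rows


def is_bengali(ch):
    """True if the single character ch lies in the Bengali block."""
    return BENGALI_START <= ord(ch) <= BENGALI_END


if __name__ == "__main__":
    for cp, ch, name in bengali_block():
        print(f"U+{cp:04X} {ch} {name}")
```

A range check like this only identifies the block. It does not distinguish Bengali from Assamese text, since both are written with code points from the same block.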

Bijoy keyboard layout

The Bijoy keyboard layout was commercialized by Mostafa Jabbar as part of the Bengali software package Bijoy Ekushe.

InScript keyboard layout

The InScript keyboard layout was designed by the Indian government to standardize the input of Indic scripts.

Probhat keyboard layout

People used to typing Romanized forms of Bengali may find it easier to use a more phonetic layout such as the Probhat layout, which is one of several Bengali input methods available.

Grapheme frequency

According to Bengali linguist Munier Chowdhury, the following 9 graphemes are the most frequent in Bengali texts:
Grapheme Percentage
11.32
8.96
7.01
6.63
4.44
4.15
4.14
3.83
2.78
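A frequency table of this kind can be approximated for any sample text with a short Python sketch. This is not Chowdhury's method, which the article does not describe. The sketch counts individual Unicode code points in the Bengali block, which is only an approximation of graphemes, since conjuncts and vowel signs are separate code points. The function name and the `top` parameter are illustrative.

```python
from collections import Counter


def bengali_char_percentages(text, top=9):
    """Return the `top` most frequent Bengali code points with their
    share (in percent) of all Bengali code points in `text`."""
    # Keep only characters from the Bengali block (U+0980 to U+09FF).
    chars = [ch for ch in text if 0x0980 <= ord(ch) <= 0x09FF]
    total = len(chars)
    if total == 0:
        return []
    counts = Counter(chars)
    return [(ch, round(100 * n / total, 2)) for ch, n in counts.most_common(top)]


if __name__ == "__main__":
    sample = "সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে।"
    for ch, pct in bengali_char_percentages(sample):
        print(ch, pct)
```

Results from a short sample will differ from the percentages above, which refer to Bengali texts in general.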

Collating sequence

There is not yet a uniform standard collating sequence (the sorting order of graphemes to be used in dictionaries, indices, computer sorting programs, etc.) for Bengali graphemes. Experts in both India and Bangladesh are currently working towards a common solution for this problem.
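To show what a collating sequence means for a sorting program, the Python sketch below sorts words by a caller-supplied character order. The order string used in the example is deliberately arbitrary and hypothetical; it is not a proposed or existing standard. Characters missing from the order are placed after all listed characters, in code point order.

```python
def make_sort_key(order):
    """Build a sort key from `order`, a string listing characters in
    the desired collating sequence (hypothetical, supplied by the caller)."""
    rank = {ch: i for i, ch in enumerate(order)}
    fallback = len(order)

    def key(word):
        # Listed characters sort by their rank; unlisted ones sort last,
        # by code point.
        return [(rank[ch], 0) if ch in rank else (fallback, ord(ch)) for ch in word]

    return key


if __name__ == "__main__":
    words = ["আম", "কলম", "খাতা"]
    # Default string comparison uses Unicode code point order.
    print(sorted(words))
    # An arbitrary order, chosen only to show that the result changes.
    print(sorted(words, key=make_sort_key("খকআ")))
```

Without an agreed sequence, software tends to fall back on code point order or a locale-specific table, so two programs may sort the same word list differently.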

Article 1 of the Universal Declaration of Human Rights


Bengali in Eastern Nagari script

ধারা ১: সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত।



Bengali in Romanization
.



Bengali in IPA
d̪ʱara æk ʃɔmost̪o manuʃ ʃad̪ʱinbʱabe ʃɔman mɔrdʒad̪a eboŋ od̪ʱikar nie dʒɔnmoɡrohon kɔre. t̪ãd̪er bibek eboŋ bud̪ʱːi atʃʰe; ʃut̪oraŋ ʃɔkoleri æke ɔporer prot̪i bʱrat̪ːrit̪ːoʃulɔbʱ monobʱab nie atʃoron kɔra utʃit̪.



Gloss
Clause 1: All human free-manner-in equal dignity and right taken birth-take do. Their reason and intelligence exist; therefore everyone-indeed one another's towards brotherhood-ly attitude taken conduct do should.



Translation
Article 1: All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience. Therefore, they should act towards one another in a spirit of brotherhood.

Jana Gana Mana

The following is a sample text in Bengali script, from the song Jana Gana Mana (জন গণ মন Jôno Gôno Mono). The selection is a Bengali song, written in the Shadhubhasha (সাধুভাষা) style. The song was later adopted as the national anthem of India. It was written by Rabindranath Tagore (রবীন্দ্রনাথ ঠাকুর), who is widely regarded as the defining figure of Bengali literature.


জনগণমন-অধিনায়ক জয় হে ভারতভাগ্যবিধাতা!

পঞ্জাব সিন্ধু গুজরাট মরাঠা দ্রাবিড় উত্কল বঙ্গ

বিন্ধ্য হিমাচল যমুনা গঙ্গা উচ্ছলজলধিতরঙ্গ

তব শুভ নামে জাগে, তব শুভ আশিস মাগে,

গাহে তব জয়গাথা।

জনগণমঙ্গলদায়ক জয় হে ভারতভাগ্যবিধাতা!

জয় হে, জয় হে, জয় হে, জয় জয় জয়, জয় হে ॥





In Romanization:


Jônogônomono-odhinaeoko jôeô he Bharotobhaggobidhata!

Pônjabo Shindhu Gujoraţo Môraţha Drabiŗo Utkôlo Bônggo,

Bindho Himachôlo Jomuna Gôngga Uchchhôlojôlodhitoronggo,

Tôbo shubho name jage, tôbo shubho ashish mage,

Gahe tôbo jôeogatha.

Jônogônomonggolodaeoko jôeô he Bharotobhaggobidhata!

Jôeo he, jôeo he, jôeo he, jôeo jôeo jôeo, jôeo he!



The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.