Interword separation
In punctuation, a word divider is a glyph that separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other languages of Europe and the Mideast, the word divider is a blank space, or whitespace, a convention which is spreading, along with other aspects of European punctuation, to Asia and Africa. However, many languages of East Asia are written without word separation (Saenger 2000).
History
In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge was used.
As the alphabet spread throughout the ancient world, words were often run together without division, and this practice remains or remained until recently in much of South and Southeast Asia. However, not infrequently in inscriptions a vertical line, and in manuscripts a single (·), double (:), or triple interpunct (dot) was used to divide words. This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though there whitespace is gaining ground.
Scriptio continua
The early alphabetic writing systems of Mesopotamia, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called matres lectionis). Without some form of visible word dividers, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation became much less pressing. The earliest Greek inscriptions used interpuncts, as was common in the writing systems which preceded it, but soon the practice of scriptio continua, continuous writing in which all words ran together without separation, became common.
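The "puzzle" of reading unseparated text can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the function name `segmentations`, the toy `LEXICON`, and the English sample string are all hypothetical and do not come from any source cited here. It enumerates every way an unspaced string can be divided into known words.

```python
from functools import lru_cache

# Hypothetical toy lexicon (an assumption for illustration only).
LEXICON = frozenset({"god", "is", "no", "now", "nowhere", "here", "where"})


def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into words found in `lexicon`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Exactly one way to split the empty remainder: no further words.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Combine this word with every split of what follows it.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(words) for words in split_from(0)]


if __name__ == "__main__":
    for words in segmentations("godisnowhere"):
        print(" ".join(words))
```

With this lexicon the sample string has three valid readings ("god is no where", "god is now here", "god is nowhere"), which is the kind of ambiguity a visible word divider removes.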
Use of spaces in Medieval Latin
The interpunct died out in Latin only after the Classical period, sometime around the year 200 CE, as the Greek style of scriptio continua became fashionable. In the 7th century, Irish monks started using blank spaces and introduced their script to France. By the 8th or 9th century, spacing was being used fairly consistently across Europe (Knight 1996).
Types of word divider
None
Alphabetic writing without inter-word separation, known as scriptio continua, was used in Ancient Egyptian. It appeared in Post-classical Latin after several centuries of the use of the interpunct.
Traditionally, scriptio continua was used for the Indic alphabets of South and Southeast Asia and hangul of Korea, but spacing is now used with hangul and increasingly with the Indic alphabets.
Today Chinese and Japanese are the main scripts consistently written without punctuation to separate words. In Classical Chinese, a word and a character were almost the same thing, so that word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word or at least morpheme remains strong, and no need is felt for word separation apart from what characters already provide.
Vertical lines
Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This was the case for Biblical Hebrew (the paseq) and continues with many Indic scripts today.
Interpunct and double dot
As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions (Wingo 1972:16).
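In digital text, these dividers can be treated much like spaces. The following is a minimal Python sketch, assuming the single interpunct is encoded as U+00B7 MIDDLE DOT and the Ethiopic double dot as U+1361 ETHIOPIC WORDSPACE. The code points, the function name `split_words`, and the sample strings are assumptions added for illustration and are not taken from the text above.

```python
import re

# Assumed encodings of the dividers discussed above:
#   U+00B7 MIDDLE DOT          (single interpunct)
#   U+1361 ETHIOPIC WORDSPACE  (double dot resembling a colon)
DIVIDERS = "\u00b7\u1361"

# Split on any run of dividers and/or ordinary whitespace.
_SPLIT = re.compile("[" + re.escape(DIVIDERS) + r"\s]+")


def split_words(text):
    """Split `text` on interpuncts, Ethiopic wordspaces, and whitespace."""
    return [word for word in _SPLIT.split(text) if word]


if __name__ == "__main__":
    # Illustrative Latin phrase written with interpuncts.
    print(split_words("SENATVS\u00b7POPVLVSQVE\u00b7ROMANVS"))
```

Accepting whitespace alongside the older dividers reflects the mixed usage described above, where the space is gradually replacing the double dot.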
Different letter forms
In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.
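As a rough illustration of how distinct letter forms can signal a boundary, the Python sketch below breaks an unspaced Hebrew string after each word-final letter form. It assumes the five Hebrew final forms (final kaf, mem, nun, pe, and tsadi) at their Unicode code points. The function name and the sample string are hypothetical, and the approach finds only boundaries after words that happen to end in one of those letters, so it supplements spacing rather than replacing it.

```python
# Hebrew letters with a distinct word-final form (assumed code points):
# final kaf U+05DA, final mem U+05DD, final nun U+05DF,
# final pe U+05E3, final tsadi U+05E5.
FINAL_FORMS = frozenset("\u05da\u05dd\u05df\u05e3\u05e5")


def split_after_final_forms(text):
    """Break `text` after every letter written in a word-final form."""
    words = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in FINAL_FORMS:
            # A final form can only end a word, so close the word here.
            words.append("".join(current))
            current = []
    if current:
        # Trailing letters that did not end in a final form.
        words.append("".join(current))
    return words


if __name__ == "__main__":
    # Two illustrative words, each ending in final mem, written without a space.
    sample = "\u05e9\u05dc\u05d5\u05dd\u05e2\u05d5\u05dc\u05dd"
    print(split_after_final_forms(sample))
```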
Vertical arrangement
The Nastaʿlīq form of Arabic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaʿlīq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.
See also
- Whitespace (computer science)
- Sentence spacing
- Zero-width non-joiner
- Zero-width space