Scottish Corpus of Texts and Speech
The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English and varieties of Scots. SCOTS has been available online since November 2004, and can be freely searched and browsed. By the end of the project, in mid-2007, SCOTS aims to increase the size of the text collection to 4 million words.

The project is a joint venture of the Department of English Language and the STELLA project at the University of Glasgow. SCOTS is grant-funded by the Arts and Humanities Research Council.

Language Variety

SCOTS contains texts in Scottish English and varieties of broad Scots, including Doric, Lallans, urban varieties such as Glaswegian, and Insular Scots. The texts are spread both geographically and demographically. Each text is accompanied by extensive metadata, including information about the author (decade of birth, gender, occupation, birthplace and place of residence) and details about the text itself (publication information, audience, date and genre).

Genre and Mode

SCOTS is a multimedia corpus containing both written and spoken texts. Spoken texts are available as orthographic transcriptions, accompanied by the source audio or video files. SCOTS includes a large number of genres and text types, including prose fiction, poetry, business and personal correspondence, religious texts, parliamentary and administrative documents, emails, conversations and interviews.

Search and Analysis

SCOTS can be investigated in various ways, depending on the user’s interest. The corpus can be browsed, for example by author name or by date of text, and all texts can be downloaded in plain text format.

Transcriptions are synchronised with the audio and video files, which are streamed and may also be downloaded.

An Advanced Search facility allows the user to build more complex queries, choosing from all the fields available in the metadata. Geographical results are plotted on an interactive map, so that regional variation can be investigated.

Advanced Search results can also be viewed as a KWIC (key word in context) concordance, which can be reordered to highlight collocational patterns.
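As a rough illustration of what reordering a KWIC concordance involves, the Python sketch below builds keyword-in-context lines from a tokenised text and sorts them on the right-hand context, so that words which repeatedly follow the search term line up together. It is not the SCOTS implementation: the function names (`kwic`, `sort_by_right`, `render`), the simple whitespace tokenisation and the sample sentence are hypothetical.

```python
from typing import List, Tuple

# One concordance line: (left context, node word, right context).
Line = Tuple[str, str, str]


def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Line]:
    """Collect every occurrence of keyword with up to `width` tokens of context on each side."""
    target = keyword.lower()
    lines: List[Line] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:  # case-insensitive match on the node word
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sort_by_right(lines: List[Line]) -> List[Line]:
    """Reorder lines by their right-hand context, grouping recurring right collocates."""
    return sorted(lines, key=lambda line: line[2].lower())


def render(lines: List[Line], pad: int = 20) -> List[str]:
    """Format lines with the node word aligned in a central column."""
    return [f"{left:>{pad}}  {node}  {right}" for left, node, right in lines]


if __name__ == "__main__":
    # Hypothetical sample text, whitespace-tokenised for simplicity.
    tokens = "I saw the dog and the cat and the dog again".split()
    for row in render(sort_by_right(kwic(tokens, "the", width=2))):
        print(row)
```

Sorting on the left-hand context instead (for example on the reversed left tokens) would group the words that typically precede the search term; a real concordancer would also use proper tokenisation rather than `str.split`.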


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.