Lancaster-Oslo-Bergen Corpus
The Lancaster-Oslo-Bergen Corpus (often abbreviated as LOB Corpus) was compiled in the 1980s as a collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus, which Kucera and Francis compiled for American English in the 1960s.

Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK by British authors. Both corpora consist of 500 samples, each comprising about 2,000 words, distributed over the following genres:
Label  Text category                             Brown Corpus  LOB Corpus
A      Press: reportage                          44            44
B      Press: editorial                          27            27
C      Press: reviews                            17            17
D      Religion                                  17            17
E      Skills, trades and hobbies                36            38
F      Popular lore                              48            44
G      Belles lettres, biography, essays         75            77
H      Miscellaneous (documents, reports, etc.)  30            30
J      Learned and scientific writings           80            80
K      General fiction                           29            29
L      Mystery and detective fiction             24            24
M      Science fiction                           6             6
N      Adventure and western fiction             29            29
P      Romance and love story                    29            29
R      Humour                                    9             9
Total                                            500           500
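
The figures in the table can be checked with a few lines of Python. The following is a minimal sketch, not part of either corpus distribution: the names SAMPLES, WORDS_PER_SAMPLE, totals and differing_categories are illustrative, and the word count is only the rough estimate implied by "about 2,000 words" per sample.

```python
# Sample counts per text category, copied from the table above:
# label -> (Brown Corpus samples, LOB Corpus samples)
SAMPLES = {
    "A": (44, 44), "B": (27, 27), "C": (17, 17), "D": (17, 17),
    "E": (36, 38), "F": (48, 44), "G": (75, 77), "H": (30, 30),
    "J": (80, 80), "K": (29, 29), "L": (24, 24), "M": (6, 6),
    "N": (29, 29), "P": (29, 29), "R": (9, 9),
}

# Approximate sample length stated in the text (an estimate, not an exact count).
WORDS_PER_SAMPLE = 2000


def totals(samples):
    """Return (Brown total, LOB total) summed over all categories."""
    brown = sum(b for b, _ in samples.values())
    lob = sum(l for _, l in samples.values())
    return brown, lob


def differing_categories(samples):
    """Return the categories whose sample counts differ between the corpora."""
    return {label: (b, l) for label, (b, l) in samples.items() if b != l}


brown_total, lob_total = totals(SAMPLES)
approx_words = lob_total * WORDS_PER_SAMPLE
diffs = differing_categories(SAMPLES)

print("Brown samples:", brown_total)
print("LOB samples:", lob_total)
print("Approximate words per corpus:", approx_words)
print("Categories with different counts:", sorted(diffs))
```

Under these figures each corpus comes to roughly one million words, and only categories E, F and G differ in their number of samples.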

The corpus has also been part-of-speech tagged, i.e. a part-of-speech category has been assigned to every word.
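
As an illustration of what "a category assigned to every word" looks like as data, the sketch below parses a line of word/tag pairs into a list of records. It assumes a simple `word_TAG` notation with an underscore separator; the TaggedToken type, the parse_tagged function, the sample sentence and the tag labels shown are illustrative assumptions, not taken from the corpus files or their documentation.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One word together with its part-of-speech tag."""
    word: str
    tag: str


def parse_tagged(line: str, sep: str = "_") -> List[TaggedToken]:
    """Split a whitespace-separated line of word<sep>TAG items into tokens.

    The separator and overall layout are assumptions made for this sketch.
    """
    tokens = []
    for item in line.split():
        # Split on the last separator so words containing it stay intact.
        word, found, tag = item.rpartition(sep)
        if not found:
            raise ValueError(f"no tag separator in {item!r}")
        tokens.append(TaggedToken(word, tag))
    return tokens


# Hypothetical example line; the tags are placeholders for illustration.
example = "The_AT corpus_NN was_BEDZ tagged_VBN"
tagged = parse_tagged(example)
for token in tagged:
    print(token.word, token.tag)
```

Splitting on the last separator is a design choice for this sketch, so that a word which itself contains an underscore still yields a single word and a single tag.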
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.