National Corpus of Polish
The National Corpus of Polish (Polish: Narodowy Korpus Języka Polskiego, NKJP) is the largest and most important corpus of the Polish language. A linguistic corpus is a collection of texts in which one can find the typical uses of a word or phrase, together with its meaning and grammatical function.
Description
The National Corpus of Polish is a joint initiative of four institutions: the Institute of Computer Science at the Polish Academy of Sciences (coordinator), the Institute of Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and the Department of Computational and Corpus Linguistics at the University of Łódź. It is registered as a research and development project of the Ministry of Science and Higher Education.
The intended size of the whole National Corpus of Polish is 1 billion words, including a carefully balanced subcorpus of at least 300 million words.
As of September 2009, the demo version contains over 1,200 million words drawn from three component corpora: IPI PAN, PELCRA and PWN.
The corpus contains classic literature, daily newspapers, specialist periodicals and journals, transcripts of conversations, and a variety of short-lived and internet texts.
Search Engines
- PELCRA – searches 1,200 million words from three corpora: IPI PAN, PELCRA and PWN. It is easy to use, and results can be downloaded as spreadsheets. A dedicated query syntax also supports morphological and spelling expansion, alternative options within a single query, and flexible searches for phraseological units. PELCRA also offers visualization of the distribution of results across registers and generates time series for words, phrases and idioms.
- Poliqarp – searches for specific words or phrases. It can also find sequences specified with regular expressions, for example all phrases in the corpus that consist of a noun and an adjective, or all grammatical forms of a selected word (especially useful for studies of the Polish language). These operations run quickly both online and offline; a simple query takes no more than a few seconds. The program also offers many configuration options.
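The two kinds of search mentioned for Poliqarp (all grammatical forms of a word, and sequences such as a noun followed by an adjective) can be illustrated with a minimal Python sketch. This is not Poliqarp's implementation or query syntax: the `Token` class, the functions `forms_of` and `match_pos_sequence`, the part-of-speech labels and the five-word sample are all hypothetical and exist only to show the idea on annotated text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    orth: str  # surface form as it appears in the text
    base: str  # lemma (dictionary form)
    pos: str   # part-of-speech label (toy labels, not the NKJP tagset)


# Tiny hand-made sample for illustration; not taken from the corpus.
SAMPLE = [
    Token("Stary", "stary", "adj"),
    Token("kot", "kot", "noun"),
    Token("widzi", "widzieć", "verb"),
    Token("kota", "kot", "noun"),
    Token("czarnego", "czarny", "adj"),
]


def forms_of(tokens, lemma):
    """Return every surface form whose lemma equals `lemma`."""
    return [t.orth for t in tokens if t.base == lemma]


def match_pos_sequence(tokens, pattern):
    """Return all runs of adjacent tokens whose POS labels equal `pattern`."""
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(t.pos == p for t, p in zip(window, pattern)):
            hits.append(" ".join(t.orth for t in window))
    return hits


if __name__ == "__main__":
    print(forms_of(SAMPLE, "kot"))                      # forms of the lemma "kot"
    print(match_pos_sequence(SAMPLE, ["noun", "adj"]))  # noun followed by adjective
```

Both searches rely on each word being annotated with its lemma and part of speech, which is why a morphologically annotated corpus can answer questions that a plain text search cannot.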
History
The first corpus to emerge was developed by the Institute of the Polish Language, Polish Academy of Sciences (not publicly available), followed by the corpus of PWN publishers, then the corpus of the PELCRA group at the University of Łódź, and finally the corpus of the Institute of Computer Science, Polish Academy of Sciences. In 2006 all four teams joined forces, forming the Consortium for the National Corpus of Polish.
External links
Narodowy Korpus Języka Polskiego
Instytut Podstaw Informatyki Polskiej Akademii Nauk
Instytut Języka Polskiego Polskiej Akademii Nauk