LZWL
LZWL is a syllable-based variant of the character-based LZW compression algorithm (http://www.cs.vsb.cz/dateso/2005/slides/slides6.pps).
LZWL can work with syllables produced by any syllable decomposition algorithm. It can also be applied to words.
Syllables
According to Wiktionary, a syllable is defined as:
- A unit of human speech that is interpreted by the listener as a single sound, although syllables usually consist of one or more vowel sounds, either alone or combined with the sound of one or more consonants; a word consists of one or more syllables.
- The written representation of a given pronounced syllable.
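Because the decomposition into syllables is used here only for data compression, words do not always have to be decomposed into linguistically correct syllables. It is enough that the pieces of each word concatenate back to the original text.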
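To illustrate why linguistic correctness is not required, here is a minimal sketch of a deliberately naive splitter. The function name `naive_syllables` and the vowel set are assumptions made for this example; they are not taken from the LZWL source and do not correspond to any particular published decomposition algorithm. The only property that matters for compression is that joining the pieces restores the word.

```python
import re

# Hypothetical vowel set; "y" is treated as a vowel purely for illustration.
_PIECE = re.compile(r"[^aeiouy]*[aeiouy]+|[^aeiouy]+", re.IGNORECASE)


def naive_syllables(word: str) -> list[str]:
    """Split a word after every vowel group.

    The result is often linguistically wrong, but every character of the
    word lands in exactly one piece, so the split is always reversible.
    """
    return _PIECE.findall(word)


pieces = naive_syllables("compression")
print(pieces)                 # ['co', 'mpre', 'ssio', 'n']
print("".join(pieces))        # compression
```

A real syllable-based compressor would likely use a better decomposition, since the quality of the split affects how often syllables repeat, but not whether the text can be restored.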
Algorithm
In the character-based LZW algorithm, the dictionary is initialized with all characters of the alphabet. In each subsequent step, the encoder searches for the longest string S that is in the dictionary and matches a prefix of the not yet encoded part of the input. The number of phrase S is sent to the output, and a new phrase is added to the dictionary: the concatenation of S and the character that follows S in the file. The input position is then moved forward by the length of S.
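The character-based encoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `lzw_encode`, the explicit `alphabet` parameter, and the use of plain integers as phrase numbers are choices made for this example.

```python
def lzw_encode(text: str, alphabet: str) -> list[int]:
    """Encode text as a list of phrase numbers (character-based LZW sketch).

    Assumes every character of `text` occurs in `alphabet` and that
    `alphabet` contains no repeated characters.
    """
    # Initialization: one phrase per character of the alphabet.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Find the longest dictionary phrase S that is a prefix of the rest.
        # Growing S one character at a time is enough because every prefix
        # of a dictionary phrase is itself in the dictionary.
        s = text[pos]
        while pos + len(s) < len(text) and s + text[pos + len(s)] in dictionary:
            s += text[pos + len(s)]
        codes.append(dictionary[s])
        # New phrase: S plus the character that follows S in the input.
        if pos + len(s) < len(text):
            dictionary[s + text[pos + len(s)]] = len(dictionary)
        pos += len(s)
    return codes


print(lzw_encode("abababa", "ab"))  # [0, 1, 2, 4]
```

In the example, phrase 4 ("aba") is emitted in the step right after it was added to the dictionary, which is the situation the decoder has to treat specially.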
Decoding has only one special case to handle: the decoder can receive the number of a phrase that is not yet in its dictionary. In this case the phrase can be created by concatenating the previously decoded phrase with its own first character.
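A matching decoder sketch shows where this special case arises. The function name `lzw_decode` and the `alphabet` parameter are assumptions of this example. The sketch reads "the last added phrase" as the phrase decoded in the previous step, which is the usual LZW convention.

```python
def lzw_decode(codes: list[int], alphabet: str) -> str:
    """Decode a list of phrase numbers (character-based LZW sketch)."""
    # Same initialization as the encoder: one phrase per alphabet character.
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            phrase = dictionary[code]
        elif code == len(dictionary):
            # Special case: the encoder used a phrase it had only just added.
            # It must be the previous phrase followed by its own first character.
            phrase = prev + prev[0]
        else:
            raise ValueError(f"invalid phrase number: {code}")
        out.append(phrase)
        # The decoder adds each phrase one step later than the encoder did.
        dictionary[len(dictionary)] = prev + phrase[0]
        prev = phrase
    return "".join(out)


print(lzw_decode([0, 1, 2, 4], "ab"))  # abababa
```

Here code 4 arrives while the decoder's dictionary holds only phrases 0 to 3, so the phrase "aba" is rebuilt from the previous phrase "ab" and its first character "a".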
The syllable-based version works over an alphabet of syllables. In the initialization step, the dictionary is filled with the empty syllable and with short syllables taken from a database of frequent syllables. Finding the string S and coding its number works as in the character-based version, except that S is a string of syllables. The number of phrase S is encoded to the output. S may be the empty syllable.
If S is the empty syllable, the encoder reads one syllable K from the file and encodes it using a method for coding new syllables. Syllable K is then added to the dictionary. The input position is moved forward by the length of S or, when S is the empty syllable, by the length of K.
Adding a phrase to the dictionary differs from the character-based version. Let S1 denote the phrase from the previous step. If S and S1 are both non-empty, a new phrase is added to the dictionary, created by concatenating S1 with the first syllable of S. This solution has two advantages. First, no strings are created from syllables that appear only once. Second, the decoder can never receive the number of a phrase that is not in its dictionary.
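The syllable-based steps can be sketched as an encoder and decoder pair. This is an illustration under several assumptions, not the authors' implementation: the names `lzwl_encode` and `lzwl_decode` are hypothetical, the input is an already decomposed list of syllables, a new syllable K is written to the output as a literal string in place of a real method for coding new syllables, and a phrase that is already in the dictionary is not added again. The sketch takes S1 to be the phrase matched in the preceding step, the reading under which the decoder can build the same dictionary at the same time as the encoder.

```python
def lzwl_encode(syllables: list[str], frequent: list[str]) -> list:
    """Return phrase numbers; a 0 (empty syllable) is followed by a literal K."""
    dictionary = {(): 0}                       # phrase 0: the empty syllable
    for syl in frequent:                       # initial frequent syllables
        dictionary.setdefault((syl,), len(dictionary))
    out, prev, pos = [], (), 0                 # prev plays the role of S1
    while pos < len(syllables):
        s = ()                                 # longest match, possibly empty
        while pos + len(s) < len(syllables) and s + (syllables[pos + len(s)],) in dictionary:
            s += (syllables[pos + len(s)],)
        out.append(dictionary[s])
        if s and prev:                         # both non-empty: add S1 + first(S)
            dictionary.setdefault(prev + s[:1], len(dictionary))
        if not s:                              # unknown syllable K
            k = syllables[pos]
            out.append(k)                      # placeholder for new-syllable coding
            dictionary[(k,)] = len(dictionary)
            pos += 1
        else:
            pos += len(s)
        prev = s
    return out


def lzwl_decode(tokens: list, frequent: list[str]) -> list[str]:
    """Inverse of lzwl_encode; every phrase number is already known on arrival."""
    phrases = [()]
    for syl in frequent:
        if (syl,) not in phrases:
            phrases.append((syl,))
    out, prev = [], ()
    it = iter(tokens)
    for code in it:
        s = phrases[code]
        if s and prev and prev + s[:1] not in phrases:
            phrases.append(prev + s[:1])
        if not s:
            k = next(it)                       # literal syllable follows a 0
            phrases.append((k,))
            out.append(k)
        else:
            out.extend(s)
        prev = s
    return out


text = ["ba", "na", "na", "ba", "na", "na"]
tokens = lzwl_encode(text, frequent=["ba"])
print(tokens)                                  # [1, 0, 'na', 2, 1, 2, 2]
print(lzwl_decode(tokens, frequent=["ba"]) == text)  # True
```

Because a new phrase is built only from two phrases that have both been fully transmitted, the decoder never has to reconstruct a phrase it has not seen, unlike the character-based decoder.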