Match Rating Approach
The match rating approach (MRA) is a phonetic algorithm developed by Western Airlines in 1977 for the indexation and comparison of homophonous names.
The algorithm has a simple set of encoding rules but a lengthier set of comparison rules.
The main mechanism is the similarity comparison, which calculates the number of unmatched characters by comparing the strings from left to right and then from right to left, removing identical characters in each pass. The number of unmatched characters is subtracted from 6, and the result is compared to a minimum threshold. The minimum threshold is defined by table A (see Minimum threshold) and depends on the lengths of the strings.
The encoded name is known (perhaps incorrectly) as a personal numeric identifier (PNI). The PNI codex can never contain more than 6 characters, all of which are alphabetic.
Match rating approach performs well with names containing the letter "y" unlike the original flavour of the NYSIIS algorithm. For example, the surnames "Smith" and "Smyth" are successfully matched.
MRA does not perform well with encoded names that differ in length by more than 2.
Encoding rules
- Delete all vowels unless the vowel begins the word
- Remove the second consonant of any double consonants present
- Reduce codex to 6 letters by joining the first 3 and last 3 letters only
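The encoding rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `mra_encode` is hypothetical, and the sketch assumes that the vowels are A, E, I, O and U (so "y" is kept), that input is upper-cased and stripped of non-letters first, and that the rules are applied in the order listed, so consonants that become adjacent after vowel removal are also collapsed.

```python
VOWELS = set("AEIOU")  # assumption: "Y" is not treated as a vowel


def mra_encode(name: str) -> str:
    """Encode a name with the three encoding rules (illustrative sketch)."""
    # Assumed normalisation step: keep letters only, in upper case.
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    # Rule 1: delete all vowels unless the vowel begins the word.
    kept = [letters[0]] + [c for c in letters[1:] if c not in VOWELS]

    # Rule 2: remove the second letter of any doubled consonant.
    deduped = [kept[0]]
    for c in kept[1:]:
        if c != deduped[-1]:
            deduped.append(c)
    codex = "".join(deduped)

    # Rule 3: reduce to 6 letters by joining the first 3 and last 3.
    if len(codex) > 6:
        codex = codex[:3] + codex[-3:]
    return codex


for name in ("Byrne", "Boern", "Smith", "Smyth", "Catherine", "Kathryn"):
    print(name, mra_encode(name))
```

Under these assumptions, the sketch reproduces the codex values listed in the examples table of this article.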
Comparison rules
In this section, the words "string(s)" and "name(s)" mean "encoded string(s)" and "encoded name(s)".
- If the length difference between the encoded strings is 3 or greater, then no similarity comparison is done.
- Obtain the minimum rating value by calculating the length sum of the encoded strings and using table A
- Process the encoded strings from left to right and remove any identical characters found from both strings respectively.
- Process the unmatched characters from right to left and remove any identical characters found from both names respectively.
- Count the unmatched characters remaining in the longer string and subtract that number from 6. This is the similarity rating.
- If the similarity rating is equal to or greater than the minimum rating, then the match is considered good.
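The two removal passes and the final subtraction can be sketched in Python as below. The names `similarity_rating` and `_strip_matches` are hypothetical, and the sketch rests on two assumptions that the rules do not spell out: characters are "identical" when they are equal at the same position within a pass, and "the longer string" means the longer of the two remainders after both passes. The minimum rating lookup is left out, so the function returns only the similarity rating, or `None` when no comparison is done.

```python
from itertools import zip_longest


def _strip_matches(a, b):
    """Drop characters that are equal at the same position in a and b."""
    rest_a, rest_b = [], []
    for ca, cb in zip_longest(a, b):  # pads the shorter side with None
        if ca == cb:
            continue  # identical characters are removed from both
        if ca is not None:
            rest_a.append(ca)
        if cb is not None:
            rest_b.append(cb)
    return rest_a, rest_b


def similarity_rating(a: str, b: str):
    """Similarity rating of two encoded names, or None if not comparable."""
    # No comparison when the encoded lengths differ by 3 or more.
    if abs(len(a) - len(b)) >= 3:
        return None
    # Pass 1: left to right.
    rest_a, rest_b = _strip_matches(a, b)
    # Pass 2: right to left, on the characters left unmatched by pass 1.
    rest_a, rest_b = _strip_matches(rest_a[::-1], rest_b[::-1])
    # Subtract the unmatched count of the longer remainder from 6.
    return 6 - max(len(rest_a), len(rest_b))


print(similarity_rating("BYRN", "BRN"))      # 5
print(similarity_rating("SMTH", "SMYTH"))    # 5
print(similarity_rating("CTHRN", "KTHRYN"))  # 4
```

A match would then be considered good when the returned rating is not `None` and is at least the minimum rating for the two encoded lengths.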
Minimum threshold
The following table shows the mapping between the minimum rating and the string lengths.
Sum of Lengths | Minimum Rating |
---|---|
≤ 4 | 5 |
4 < sum ≤ 7 | 4 |
7 < sum ≤ 11 | 3 |
= 12 | 2 |
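The table can be expressed as a small lookup function. This is a sketch with a hypothetical name, `minimum_rating`. It assumes the argument is the sum of the two encoded lengths, which cannot exceed 12 because each codex holds at most 6 characters, and it raises an error for larger sums rather than guessing a value.

```python
def minimum_rating(length_sum: int) -> int:
    """Minimum rating for the sum of two encoded name lengths."""
    if length_sum <= 4:
        return 5
    if length_sum <= 7:
        return 4
    if length_sum <= 11:
        return 3
    if length_sum == 12:
        return 2
    # Sums above 12 are not covered by the table.
    raise ValueError(f"length sum {length_sum} is outside the table")


# Example: "BYRN" and "BRN" have lengths 4 and 3, so the sum is 7.
print(minimum_rating(len("BYRN") + len("BRN")))  # 4
```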
Match rating approach examples
The table below displays the output of the match rating approach algorithm for some common homophonous names. The minimum rating and the similarity comparison rating are computed for each pair of names, so both rows of a pair show the same values.
Name | MRA Codex | Minimum Rating | Similarity Comparison Rating |
---|---|---|---|
Byrne | BYRN | 4 | 5 |
Boern | BRN | 4 | 5 |
Smith | SMTH | 3 | 5 |
Smyth | SMYTH | 3 | 5 |
Catherine | CTHRN | 3 | 4 |
Kathryn | KTHRYN | 3 | 4 |
External references
- An Overview of The Issues Related to the use of Personal Identifiers, HSMD, Statistics Canada
- C# Implementation: http://sounditout.codeplex.com/