Fuzzy matching
Fuzzy matching is a technique used in computer-assisted translation and some other information technology applications such as record linkage. It works with matches that may be less than 100% perfect when finding correspondences between segments of a text and entries in a database of previous translations. It usually operates at sentence-level segments, but some translation technology allows matching at a phrasal level. It is used when the translator is working with translation memory (TM).

Background

When no exact match for the text being translated can be found in the TM database, the translator can search for a match that is less than exact. The translator sets the fuzzy match threshold to a percentage value below 100%, and the database then returns any stored segments whose similarity to the new text meets that threshold. The primary function of fuzzy matching is to assist the translator by speeding up the translation process; it is not designed to replace the human translator.
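As an illustration of how a match percentage might be derived, the sketch below scores two segments by word-level edit distance. This is an assumption made for the example: TM tools use their own scoring methods, which are often proprietary. The function names `levenshtein` and `match_percent` and the sample segments are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (here, lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]


def match_percent(new_segment, stored_segment):
    """Similarity as a percentage: 100 means the word sequences are identical.

    Assumed formula for this sketch: 1 - distance / length of longer segment.
    """
    a = new_segment.lower().split()
    b = stored_segment.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


# One word out of six differs, so the score is roughly 83%.
score = match_percent("The printer is out of paper.",
                      "The printer is out of ink.")
threshold = 75.0  # hypothetical value chosen by the translator
print(f"{score:.1f}% match, proposed: {score >= threshold}")
```

With a threshold of 75%, this pair would be offered as a fuzzy match; with a threshold of 90%, it would not.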

History

Because of the polymorphous and dynamic nature of language, particularly English (which accounts for 90% of all source texts undergoing translation in the localisation industry), methods are always being sought to make the translation process easier and faster. Since the late 1980s, translation memory tools have been developed to increase productivity and make the whole translation process faster for the translator.

In the 1990s, fuzzy matching became a prominent feature of TM tools. Despite concerns about the extra work involved in editing a fuzzy match "proposal", it remains popular and is now offered by most widely used TM tools.

Methodology

The TM tool searches the database to locate segments that are an approximate match for a segment in a new source text to be translated. The TM, in effect, "proposes" the match to the translator; it is then up to the translator to accept this proposal or to edit this proposal to more fully equate with the new source text that is undergoing translation. In this way, fuzzy matching can speed up the translation process and lead to increased productivity.
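The lookup described above can be sketched as follows. This is a minimal illustration, not the method of any particular TM tool: the TM is assumed to be a plain dictionary of source segments and their translations, similarity is taken from `difflib.SequenceMatcher` in the Python standard library, and the function name `propose` and the sample entries are hypothetical.

```python
from difflib import SequenceMatcher


def propose(tm, source, threshold=75.0):
    """Return (score, stored source, stored translation) tuples, best first.

    Only stored segments scoring at or above the threshold are proposed.
    """
    proposals = []
    for stored_source, stored_target in tm.items():
        # ratio() is a character-level similarity between 0.0 and 1.0.
        ratio = SequenceMatcher(None, source.lower(),
                                stored_source.lower()).ratio()
        score = 100.0 * ratio
        if score >= threshold:
            proposals.append((round(score, 1), stored_source, stored_target))
    proposals.sort(key=lambda p: p[0], reverse=True)
    return proposals


# A toy translation memory (English to French), assumed for illustration.
tm = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Restart the computer.": "Redémarrez l'ordinateur.",
}

for score, stored, translation in propose(tm, "Click the Cancel button."):
    print(f"{score}%  {stored}  ->  {translation}")
```

The proposal returned here still contains "Enregistrer" where the new sentence needs a translation of "Cancel", which is why the translator must edit a fuzzy match rather than accept it unchanged.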

This raises questions about the quality of the resulting translations. A translator under pressure to deliver on time may accept a fuzzy match proposal without checking its suitability and context. TM databases are built up from the input of many different translators working on a variety of texts, so there is a danger that sentences extracted from this word "tapestry" will form a stitched-together hodgepodge of styles, the antithesis of the consistency being sought; some critics have dubbed this "sentence salad". How far to trust the TM's proposals can therefore be a problem when trying to balance a faster translation process against the quality of the translation. Nevertheless, fuzzy matching remains an important part of the translator's toolkit.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 