Edit distance
Encyclopedia
In information theory
and computer science
, the edit distance between two strings of characters
generally refers to the Levenshtein distance
. However, according to Nico Jacobs, “The term ‘edit distance’ is sometimes used to refer to the distance in which insertions and deletions have equal cost and replacements have twice the cost of an insertion”.
It may also refer to the whole class of string metric
s that measure distance as the (weighted or unweighted) number of operations required to transform a string into another. There are several different ways to define an edit distance, depending on which edit operations are allowed: replace, delete, insert, transpose, and so on. There are algorithms to calculate its value under various definitions:
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...
and computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, the edit distance between two strings of characters
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
generally refers to the Levenshtein distance
Levenshtein distance
In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...
. However, according to Nico Jacobs, “The term ‘edit distance’ is sometimes used to refer to the distance in which insertions and deletions have equal cost and replacements have twice the cost of an insertion”.
It may also refer to the whole class of string metric
String metric
In mathematics, string metrics are a class of metric that measure similarity or dissimilarity between two text strings for approximate string matching or comparison and in fuzzy string searching. For example the strings "Sam" and "Samuel" can be considered to be similar...
s that measure distance as the (weighted or unweighted) number of operations required to transform a string into another. There are several different ways to define an edit distance, depending on which edit operations are allowed: replace, delete, insert, transpose, and so on. There are algorithms to calculate its value under various definitions:
- Hamming distanceHamming distanceIn information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different...
- Levenshtein distanceLevenshtein distanceIn information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...
(the most common definition, calculated by Hirschberg's algorithmHirschberg's algorithmHirschberg's algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein distance, defined to be the sum of the costs of insertions, replacements, deletions, and null...
or the Wagner–Fischer algorithm) - Damerau–Levenshtein distance
- Jaro–Winkler distance
See also
- Approximate string matchingApproximate string matchingIn computing, approximate string matching is the technique of finding strings that match a pattern approximately...
- Levenshtein distanceLevenshtein distanceIn information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...
- Longest common subsequence problemLongest common subsequence problemThe longest common subsequence problem is to find the longest subsequence common to all sequences in a set of sequences . Note that subsequence is different from a substring, see substring vs. subsequence...
- String-to-string correction problemString-to-string correction problemThe string-to-string correction problem refers to the minimum number of edit operations necessary to change one string into another. A single edit operation may be changing a single symbol of the string into another, deleting, or inserting a symbol...
- String metricString metricIn mathematics, string metrics are a class of metric that measure similarity or dissimilarity between two text strings for approximate string matching or comparison and in fuzzy string searching. For example the strings "Sam" and "Samuel" can be considered to be similar...
External links
- Text::WagnerFischer, a Perl implementation of the Wagner-Fischer edit distance