The Damerau - Levenshtein distance (named after the scientists Frederic Damerau and Vladimir Levenshtein ) is a measure of the difference of two lines of characters, defined as the minimum number of insertion, deletion, replacement and transposition (permutation of two adjacent characters) required to transfer one line to another. It is a modification of the Levenshtein distance : to the operations of inserting, deleting and replacing the characters defined in the Levenshtein distance, the operation of transposition (permutation) of characters is added.
Algorithm
Damerau - Levenshtein distance between two lines and determined by function as:
Where this is an indicator function equal to zero at and 1 otherwise.
Each recursive call corresponds to one of the following cases:
- corresponds to deleting a character (from a to b ),
- matches the insert (from a to b ),
- match or mismatch, depending on the match of characters,
- in the case of a permutation of two consecutive characters.
Implementations
- On python
1 def damerau_levenshtein_distance ( s1 , s2 ): 2 d = {} 3 lenstr1 = len ( s1 ) 4 lenstr2 = len ( s2 ) 5 for i in range ( - 1 , lenstr1 + 1 ): 6 d [( i , - 1 )] = i + 1 7 for j in range ( - 1 , lenstr2 + 1 ): 8 d [( - 1 , j )] = j + 1 9 10 for i in range ( lenstr1 ): 11 for j in range ( lenstr2 ): 12 if s1 [ i ] == s2 [ j ]: 13 cost = 0 14 else : 15 cost = 1 16 d [( i , j )] = min ( 17 d [( i - 1 , j )] + 1 , # deletion 18 d [( i , j - 1 )] + 1 , # insertion 19 d [( i - 1 , j - 1 )] + cost , # substitution 20 ) 21 if i and j and s1 [ i ] == s2 [ j - 1 ] and s1 [ i - 1 ] == s2 [ j ]: 22 d [( i , j )] = min ( d [( i , j )], d [ i - 2 , j - 2 ] + cost ) # transposition 23 24 return d [ lenstr1 - 1 , lenstr2 - 1 ]
- On Visual Basic for Applications (VBA)
1 Public Function WeightedDL ( source As String , target As String ) As Double 2 3 Dim deleteCost As Double 4 Dim insertCost As Double 5 Dim replaceCost As Double 6 Dim swapCost As Double 7 8 deleteCost = 1 9 insertCost = 1 10 replaceCost = 1 11 swapCost = 1 12 13 Dim i As Integer 14 Dim j As Integer 15 Dim k As Integer sixteen 17 If Len ( source ) = 0 Then 18 WeightedDL = Len ( target ) * insertCost 19 Exit Function 20 End If 21 22 If Len ( target ) = 0 Then 23 WeightedDL = Len ( source ) * deleteCost 24 Exit Function 25 End If 26 27 Dim table () As Double 28 ReDim table ( Len ( source ), Len ( target )) 29th 30 Dim sourceIndexByCharacter () As Variant 31 ReDim sourceIndexByCharacter ( 0 To 1 , 0 To Len ( source ) - 1 ) As Variant 32 33 If Left ( source , 1 ) <> Left ( target , 1 ) Then 34 table ( 0 , 0 ) = Application . Min ( replaceCost , ( deleteCost + insertCost )) 35 End If 36 37 sourceIndexByCharacter ( 0 , 0 ) = Left ( source , 1 ) 38 sourceIndexByCharacter ( 1 , 0 ) = 0 39 40 Dim deleteDistance As Double 41 Dim insertDistance As Double 42 Dim matchDistance As Double 43 44 For i = 1 To Len ( source ) - 1 45 46 deleteDistance = table ( i - 1 , 0 ) + deleteCost 47 insertDistance = (( i + 1 ) * deleteCost ) + insertCost 48 49 If Mid ( source , i + 1 , 1 ) = Left ( target , 1 ) Then 50 matchDistance = ( i * deleteCost ) + 0 51 Else 52 matchDistance = ( i * deleteCost ) + replaceCost 53 End If 54 55 table ( i , 0 ) = Application . Min ( Application . Min ( deleteDistance , insertDistance ), matchDistance ) 56 Next 57 58 For j = 1 To Len ( target ) - 1 59 60 deleteDistance = table ( 0 , j - 1 ) + insertCost 61 insertDistance = (( j + 1 ) * insertCost ) + deleteCost 62 63 If Left ( source , 1 ) = Mid ( target , j + 1 , 1 ) Then 64 matchDistance = ( j * insertCost ) + 0 65 Else 66 matchDistance = ( j * insertCost ) + replaceCost 67 End If 68 69 table ( 0 , j ) = Application . Min ( Application . Min ( deleteDistance , insertDistance ), matchDistance ) 70 Next 71 72 For i = 1 To Len ( source ) - 1 73 74 Dim maxSourceLetterMatchIndex As Integer 75 76 If Mid ( source , i + 1 , 1 ) = Left ( target , 1 ) Then 77 maxSourceLetterMatchIndex = 0 78 else 79 maxSourceLetterMatchIndex = - 1 80 End If 81 82 For j = 1 To Len ( target ) - 1 83 84 Dim candidateSwapIndex As Integer 85 candidateSwapIndex = - 1 86 87 For k = 0 To UBound ( sourceIndexByCharacter , 2 ) 88 If sourceIndexByCharacter ( 0 , k ) = Mid ( target , j + 1 , 1 ) Then candidateSwapIndex = sourceIndexByCharacter ( 1 , k ) 89 Next 90 91 Dim jSwap As Integer 92 jSwap = maxSourceLetterMatchIndex 93 94 deleteDistance = table ( i - 1 , j ) + deleteCost 95 insertDistance = table ( i , j - 1 ) + insertCost 96 matchDistance = table ( i - 1 , j - 1 ) 97 98 If Mid ( source , i + 1 , 1 ) <> Mid ( target , j + 1 , 1 ) Then 99 matchDistance = matchDistance + replaceCost 100 Else 101 maxSourceLetterMatchIndex = j 102 End If 103 104 Dim swapDistance As Double 105 106 If candidateSwapIndex <> - 1 And jSwap <> - 1 Then 107 108 Dim iSwap As Integer 109 iSwap = candidateSwapIndex 110 111 Dim preSwapCost 112 If iSwap = 0 And jSwap = 0 Then 113 preSwapCost = 0 114 Else 115 preSwapCost = table ( Application . Max ( 0 , iSwap - 1 ), Application . Max ( 0 , jSwap - 1 )) 116 End If 117 118 swapDistance = preSwapCost + (( i - iSwap - 1 ) * deleteCost ) + (( j - jSwap - 1 ) * insertCost ) + swapCost 119 120 else 121 swapDistance = 500 122 End If 123 124 table ( i , j ) = Application . Min ( Application . Min ( Application . Min ( deleteDistance , insertDistance ), _ 125 matchDistance ), swapDistance ) 126 127 Next 128 129 sourceIndexByCharacter ( 0 , i ) = Mid ( source , i + 1 , 1 ) 130 sourceIndexByCharacter ( 1 , i ) = i 131 132 Next 133 134 WeightedDL = table ( Len ( source ) - 1 , Len ( target ) - 1 ) 135 136 End Function
- In the Perl programming language as a module Text :: Levenshtein :: Damerau
- In the programming language PlPgSQL
- Additional module for MySQL
See also
- Levenshtein distance