Clever Geek Handbook
📜 ⬆️ ⬇️

Distance Damerau - Levenshtein

The Damerau - Levenshtein distance (named after the scientists Frederic Damerau and Vladimir Levenshtein ) is a measure of the difference of two lines of characters, defined as the minimum number of insertion, deletion, replacement and transposition (permutation of two adjacent characters) required to transfer one line to another. It is a modification of the Levenshtein distance : to the operations of inserting, deleting and replacing the characters defined in the Levenshtein distance, the operation of transposition (permutation) of characters is added.

Algorithm

Damerau - Levenshtein distance between two linesa {\ displaystyle a}   andb {\ displaystyle b}   determined by functionda,b(|a|,|b|) {\ displaystyle d_ {a, b} (| a |, | b |)}   as:

da,b(i,j)={max(i,j)ifmin(i,j)=0,min{da,b(i-one,j)+oneda,b(i,j-one)+oneda,b(i-one,j-one)+one(ai≠bj)da,b(i-2,j-2)+oneifi,j>oneandai=bj-oneandai-one=bjmin{da,b(i-one,j)+oneda,b(i,j-one)+oneda,b(i-one,j-one)+one(ai≠bj)otherwise{\ displaystyle \ qquad d_ {a, b} (i, j) = {\ begin {cases} \ max (i, j) & {\ text {if}} \ min (i, j) = 0, \\ \ min {\ begin {cases} d_ {a, b} (i-1, j) +1 \\ d_ {a, b} (i, j-1) +1 \\ d_ {a, b} (i -1, j-1) +1 _ {(a_ {i} \ neq b_ {j})} \\ d_ {a, b} (i-2, j-2) +1 \ end {cases}} & { \ text {if}} i, j> 1 {\ text {and}} a_ {i} = b_ {j-1} {\ text {and}} a_ {i-1} = b_ {j} \\\ min {\ begin {cases} d_ {a, b} (i-1, j) +1 \\ d_ {a, b} (i, j-1) +1 \\ d_ {a, b} (i- 1, j-1) +1 _ {(a_ {i} \ neq b_ {j})} \ end {cases}} & {\ text {otherwise,}} \ end {cases}}}  

Whereone(ai≠bj) {\ displaystyle 1 _ {(a_ {i} \ neq b_ {j})}}   this is an indicator function equal to zero atai=bj {\ displaystyle a_ {i} = b_ {j}}   and 1 otherwise.

Each recursive call corresponds to one of the following cases:

  • da,b(i-one,j)+one{\ displaystyle d_ {a, b} (i-1, j) +1}   corresponds to deleting a character (from a to b ),
  • da,b(i,j-one)+one{\ displaystyle d_ {a, b} (i, j-1) +1}   matches the insert (from a to b ),
  • da,b(i-one,j-one)+one(ai≠bj){\ displaystyle d_ {a, b} (i-1, j-1) +1 _ {(a_ {i} \ neq b_ {j})}}   match or mismatch, depending on the match of characters,
  • da,b(i-2,j-2)+one{\ displaystyle d_ {a, b} (i-2, j-2) +1}   in the case of a permutation of two consecutive characters.

Implementations

  • On python
      1 def damerau_levenshtein_distance ( s1 , s2 ):
     2 d = {}
     3 lenstr1 = len ( s1 )
     4 lenstr2 = len ( s2 )
     5 for i in range ( - 1 , lenstr1 + 1 ):
     6 d [( i , - 1 )] = i + 1
     7 for j in range ( - 1 , lenstr2 + 1 ):
     8 d [( - 1 , j )] = j + 1
     9 
     10 for i in range ( lenstr1 ):
     11 for j in range ( lenstr2 ):
     12 if s1 [ i ] == s2 [ j ]:
     13 cost = 0
     14 else :
     15 cost = 1
     16 d [( i , j )] = min (
     17 d [( i - 1 , j )] + 1 , # deletion
     18 d [( i , j - 1 )] + 1 , # insertion
     19 d [( i - 1 , j - 1 )] + cost , # substitution
     20 )
     21 if i and j and s1 [ i ] == s2 [ j - 1 ] and s1 [ i - 1 ] == s2 [ j ]:
     22 d [( i , j )] = min ( d [( i , j )], d [ i - 2 , j - 2 ] + cost ) # transposition
     23 
     24 return d [ lenstr1 - 1 , lenstr2 - 1 ]
    
  • On Visual Basic for Applications (VBA)
      1 Public Function WeightedDL ( source As String , target As String ) As Double
     2
     3 Dim deleteCost As Double
     4 Dim insertCost As Double
     5 Dim replaceCost As Double
     6 Dim swapCost As Double
     7
     8 deleteCost = 1
     9 insertCost = 1
     10 replaceCost = 1
     11 swapCost = 1
     12
     13 Dim i As Integer
     14 Dim j As Integer
     15 Dim k As Integer
     sixteen
     17 If Len ( source ) = 0 Then
     18 WeightedDL = Len ( target ) * insertCost
     19 Exit Function
     20 End If
     21
     22 If Len ( target ) = 0 Then
     23 WeightedDL = Len ( source ) * deleteCost
     24 Exit Function
     25 End If
     26
     27 Dim table () As Double
     28 ReDim table ( Len ( source ), Len ( target ))
     29th
     30 Dim sourceIndexByCharacter () As Variant
     31 ReDim sourceIndexByCharacter ( 0 To 1 , 0 To Len ( source ) - 1 ) As Variant
     32
     33 If Left ( source , 1 ) <> Left ( target , 1 ) Then
     34 table ( 0 , 0 ) = Application .  Min ( replaceCost , ( deleteCost + insertCost ))
     35 End If
     36
     37 sourceIndexByCharacter ( 0 , 0 ) = Left ( source , 1 )
     38 sourceIndexByCharacter ( 1 , 0 ) = 0
     39
     40 Dim deleteDistance As Double
     41 Dim insertDistance As Double
     42 Dim matchDistance As Double
     43
     44 For i = 1 To Len ( source ) - 1
     45
     46 deleteDistance = table ( i - 1 , 0 ) + deleteCost
     47 insertDistance = (( i + 1 ) * deleteCost ) + insertCost
     48
     49 If Mid ( source , i + 1 , 1 ) = Left ( target , 1 ) Then
     50 matchDistance = ( i * deleteCost ) + 0
     51 Else
     52 matchDistance = ( i * deleteCost ) + replaceCost
     53 End If
     54
     55 table ( i , 0 ) = Application .  Min ( Application . Min ( deleteDistance , insertDistance ), matchDistance )
     56 Next
     57
     58 For j = 1 To Len ( target ) - 1
     59
     60 deleteDistance = table ( 0 , j - 1 ) + insertCost
     61 insertDistance = (( j + 1 ) * insertCost ) + deleteCost
     62
     63 If Left ( source , 1 ) = Mid ( target , j + 1 , 1 ) Then
     64 matchDistance = ( j * insertCost ) + 0
     65 Else
     66 matchDistance = ( j * insertCost ) + replaceCost
     67 End If
     68
     69 table ( 0 , j ) = Application .  Min ( Application . Min ( deleteDistance , insertDistance ), matchDistance )
     70 Next
     71
     72 For i = 1 To Len ( source ) - 1
     73
     74 Dim maxSourceLetterMatchIndex As Integer
     75
     76 If Mid ( source , i + 1 , 1 ) = Left ( target , 1 ) Then
     77 maxSourceLetterMatchIndex = 0
     78 else
     79 maxSourceLetterMatchIndex = - 1
     80 End If
     81
     82 For j = 1 To Len ( target ) - 1
     83
     84 Dim candidateSwapIndex As Integer
     85 candidateSwapIndex = - 1
     86
     87 For k = 0 To UBound ( sourceIndexByCharacter , 2 )
     88 If sourceIndexByCharacter ( 0 , k ) = Mid ( target , j + 1 , 1 ) Then candidateSwapIndex = sourceIndexByCharacter ( 1 , k )
     89 Next
     90
     91 Dim jSwap As Integer
     92 jSwap = maxSourceLetterMatchIndex
     93
     94 deleteDistance = table ( i - 1 , j ) + deleteCost
     95 insertDistance = table ( i , j - 1 ) + insertCost
     96 matchDistance = table ( i - 1 , j - 1 )
     97
     98 If Mid ( source , i + 1 , 1 ) <> Mid ( target , j + 1 , 1 ) Then
     99 matchDistance = matchDistance + replaceCost
     100 Else
     101 maxSourceLetterMatchIndex = j
     102 End If
     103
     104 Dim swapDistance As Double
     105
     106 If candidateSwapIndex <> - 1 And jSwap <> - 1 Then
     107
     108 Dim iSwap As Integer
     109 iSwap = candidateSwapIndex
     110
     111 Dim preSwapCost
     112 If iSwap = 0 And jSwap = 0 Then
     113 preSwapCost = 0
     114 Else
     115 preSwapCost = table ( Application . Max ( 0 , iSwap - 1 ), Application . Max ( 0 , jSwap - 1 ))
     116 End If
     117
     118 swapDistance = preSwapCost + (( i - iSwap - 1 ) * deleteCost ) + (( j - jSwap - 1 ) * insertCost ) + swapCost
     119
     120 else
     121 swapDistance = 500
     122 End If
     123
     124 table ( i , j ) = Application .  Min ( Application . Min ( Application . Min ( deleteDistance , insertDistance ), _
     125 matchDistance ), swapDistance )
     126
     127 Next
     128
     129 sourceIndexByCharacter ( 0 , i ) = Mid ( source , i + 1 , 1 )
     130 sourceIndexByCharacter ( 1 , i ) = i
     131
     132 Next
     133
     134 WeightedDL = table ( Len ( source ) - 1 , Len ( target ) - 1 )
     135
     136 End Function
    
  • In the Perl programming language as a module Text :: Levenshtein :: Damerau
  • In the programming language PlPgSQL
  • Additional module for MySQL

See also

  • Levenshtein distance


Source - https://ru.wikipedia.org/w/index.php?title=Damerau__Distance___Levenshteina&oldid=100175296


More articles:

  • Sprenger, Alois
  • Allured
  • Rigert, David Adamovich
  • President of Brazil
  • Gums
  • Cunha, Roberto Emilio da
  • Sinners
  • Shatrov, Mikhail Filippovich
  • 2000 Grand Prix Motor Racing
  • Indian Space Research Organization

All articles

Clever Geek | 2019