Prefix function

Line prefix function $S$ ${\ displaystyle S}$ $S$ and position $i$ ${\ displaystyle i}$ $i$ in it is the length $k$ ${\ displaystyle k}$ $k$ largest substring prefix $S[1..i]$ ${\ displaystyle S [1..i]}$ ${\ displaystyle S [1..i]}$ , which is also the suffix of this substring. That is, at the beginning of a substring $S[1..i]$ ${\ displaystyle S [1..i]}$ ${\ displaystyle S [1..i]}$ long $i$ ${\ displaystyle i}$ $i$ need to find such a prefix of maximum length $k<i$ ${\ displaystyle k <i}$ ${\ displaystyle k <i}$ which would be the suffix of the given substring $\left(S[1..k]==S\left[(i-k+1)..i\right]\right)$ ${\ displaystyle \ left (S [1..k] == S \ left [(i-k + 1) .. i \ right] \ right)}$ ${\ displaystyle \ left (S [1..k] == S \ left [(i-k + 1) .. i \ right] \ right)}$ .

Designated $\pi (S,i)$ ${\ displaystyle \ pi (S, i)}$ ${\ displaystyle \ pi (S, i)}$ ; Where $S\in \Sigma ^{+}$ ${\ displaystyle S \ in \ Sigma ^ {+}}$ ${\ displaystyle S \ in \ Sigma ^ {+}}$ - line; $1\leqslant i<\left|S\right|$ ${\ displaystyle 1 \ leqslant i <\ left | S \ right |}$ ${\ displaystyle 1 \ leqslant i <\ left | S \ right |}$ Is the length of the substring in S. It is believed that $\pi (S,1)=0$ ${\ displaystyle \ pi (S, 1) = 0}$ ${\ displaystyle \ pi (S, 1) = 0}$ .

Often, the prefix function is defined in vector form:

Line prefix function $S\in \Sigma ^{+}$ ${\ displaystyle S \ in \ Sigma ^ {+}}$ ${\ displaystyle S \ in \ Sigma ^ {+}}$ there is a vector $\pi (S)\in \mathbb {Z} ^{\left|S\right|-1}$ ${\ displaystyle \ pi (S) \ in \ mathbb {Z} ^ {\ left | S \ right | -1}}$ ${\ displaystyle \ pi (S) \ in \ mathbb {Z} ^ {\ left | S \ right | -1}}$ , each $i$ ${\ displaystyle i}$ $i$ whose element is equal to $\pi (S,i)$ ${\ displaystyle \ pi (S, i)}$ ${\ displaystyle \ pi (S, i)}$ .

For example, for the string 'abcdabscabcdabia' the prefix function would be: π (abcdabscabcdabia) = '0000120012345601' .

This function is used, for example, in the Knut-Morris-Pratt algorithm . [one]

Computation Algorithm

Line characters are numbered from 1.

Let be $\pi (S,i)=k$ ${\ displaystyle \ pi (S, i) = k}$ $\pi (S,i)=k$ . Let's try to calculate the prefix function for $i+1$ ${\ displaystyle i + 1}$ $i+1$ .

If a $S[i+1]=S[k+1]$ ${\ displaystyle S [i + 1] = S [k + 1]}$ $S[i+1]=S[k+1]$ then naturally $\pi (S,i+1)=k+1$ ${\ displaystyle \ pi (S, i + 1) = k + 1}$ $\pi (S,i+1)=k+1$ . If not, try smaller suffixes. Iterate over all suffixes by linear search. You can use the already calculated values of the prefix function. You may notice that $S[1\ldots \pi (S,k)]$ ${\ displaystyle S [1 \ ldots \ pi (S, k)]}$ $S[1\ldots \pi (S,k)]$ will also be the suffix of the string $S[1\ldots i]$ ${\ displaystyle S [1 \ ldots i]}$ $S[1\ldots i]$ , because $k$ ${\ displaystyle k}$ $k$ - length of the prefix-suffix at this point. For anyone $j\in (k,i)$ ${\ displaystyle j \ in (k, i)}$ $j\in (k,i)$ line $S[1\ldots j]$ ${\ displaystyle S [1 \ ldots j]}$ $S[1\ldots j]$ the suffix will not be. Thus, the algorithm is obtained:

At $S[i+1]=S[k+1]$ ${\ displaystyle S [i + 1] = S [k + 1]}$ $S[i+1]=S[k+1]$ - put $\pi (S,i+1)=k+1$ ${\ displaystyle \ pi (S, i + 1) = k + 1}$ $\pi (S,i+1)=k+1$ .
Otherwise when $k=0$ ${\ displaystyle k = 0}$ $k=0$ - put $\pi (S,i+1)=0$ ${\ displaystyle \ pi (S, i + 1) = 0}$ $\pi (S,i+1)=0$ .
Otherwise, install $k:=\pi (S,k)$ ${\ displaystyle k: = \ pi (S, k)}$ $k:=\pi (S,k)$ and go to step 1.

For the string 'abcdabscabcdabia' calculation will be like this:

  'a'! = 'b' => π = 0;
 'a'! = 'c' => π = 0;
 'a'! = 'd' => π = 0;
 'a' == 'a' => π = π + 1 = 1;
 'b' == 'b' => π = π + 1 = 2;
 'c'! = 's' => π = 0;
 'a'! = 'c' => π = 0;
 'a' == 'a' => π = π + 1 = 1;
 'b' == 'b' => π = π + 1 = 2;
 'c' == 'c' => π = π + 1 = 3;
 'd' == 'd' => π = π + 1 = 4;
 'a' == 'a' => π = π + 1 = 5;
 'b' == 'b' => π = π + 1 = 6;
 's'! = 'i' => π = 0;
 'a' == 'a' => π = π + 1 = 1;

A slightly more interesting example is for the string 'abcdabcabcdabcdab' calculation would be like this:

  1 S [1] = 'a', k = π = 0;
 2 S [2] = 'b'! = S [k + 1] => k = π = 0;
 3 S [3] = 'c'! = S [1] => k = π = 0;
 4 S [4] = 'd'! = S [1] => k = π = 0;
 5 S [5] = 'a' == S [1] => k = π = 1;
 6 S [6] = 'b' == S [2] => k = π = 2;
 7 S [7] = 'c' == S [3] => k = π = 3;
 8 S [8] = 'a'! = S [4] => k: = π (S, 3) = 0, S [8] == S [1] => k = π = 1;
 9 S [9] = 'b' == S [2] => k = π = 2;
 10 S [10] = 'c' == S [3] => k = π = 3;
 11 S [11] = 'd' == S [4] => k = π = 4;
 12 S [12] = 'a' == S [5] => k = π = 5;
 13 S [13] = 'b' == S [6] => k = π = 6;
 14 S [14] = 'c' == S [7] => k = π = 7;
 15 S [15] = 'd'! = S [8] => k: = π (S, 7) = 3, S [15] == S [4] => k = π = 4;
 16 S [16] = 'a' == S [5] => k = π = 5;
 17 S [17] = 'b' == S [6] => k = π = 6;

And the result is: '00001231234567456' .

Speed

Despite the fact that paragraph 3 is an inner loop, the calculation time of the prefix function is estimated as $O(|S|)$ ${\ displaystyle O (| S |)}$ . Let us prove it.

Everything $i$ ${\ displaystyle i}$ are divided into:

Magnifying $k$ ${\ displaystyle k}$ per unit. The loop goes through one iteration.
Non-zero $k$ ${\ displaystyle k}$ . The loop also goes through one iteration. Cases 1 and 2 in total no more $\left|S\right|-1$ ${\ displaystyle \ left | S \ right | -1}$ pieces.
Not changing or decreasing positive $k$ ${\ displaystyle k}$ . Since inside the loop the value $k$ ${\ displaystyle k}$ can only decrease, and increase $k$ ${\ displaystyle k}$ only possible by one, then the total value $k$ ${\ displaystyle k}$ cannot decrease more than $\left|S\right|-2$ ${\ displaystyle \ left | S \ right | -2}$ times, which limits the number of operations of the internal cycle.

Total algorithm requires no more $2\left|S\right|$ ${\ displaystyle 2 \ left | S \ right |}$ iterations, which proves the order of speed $O(\left|S\right|)$ ${\ displaystyle O (\ left | S \ right |)}$ . The worst case for an algorithm is the case of processing a string of the form 'aa…ab' .

C ++ implementation example

  vector < int > compute_prefix_function ( const string & s ) 
 {
	 int len = s .  length ();
	 vector < int > p ( len );  // prefix function values
	                     // index of the vector corresponds to the number of the last character of the argument
	 p [ 0 ] = 0 ;  // for a prefix of zero and one character, the function is zero

     int k = 0 ;	
	 for ( int i = 1 ; i < len ; ++ i ) {	
		 while (( k > 0 ) && ( s [ k ] ! = s [ i ])) 
			 k = p [ k - 1 ]; 
		 if ( s [ k ] == s [ i ])
			 ++ k ;
		 p [ i ] = k ;
	 }
	 return p ;
 }

Delphi implementation example

  type Arr = array of LongInt ;
 function compute_prefix_function ( s : string ) : Arr ;
 var k , i , l : LongInt ;
 begin
   l : = Length ( s ) ;
   SetLength ( Result , 1 + l ) ;
   Result [ 0 ] : = 0 ;
   Result [ 1 ] : = 0 ;
   k : = 0 ;
   for i : = 2 to l do
   begin
     while ( k > 0 ) and ( s [ k + 1 ] <> s [ i ]) do
       k : = Result [ k ] ;
     if s [ k + 1 ] = s [ i ] then
       Inc ( k ) ;
     Result [ i ] : = k ;
   end ;
 end ;

Ruby implementation example

  def prefix s
   n = s .  length - 1
   p = Array .  new ( n , 0 )
   n  times do |  i |
     i + = 1
     j = p [ i - 1 ]
     while j > 0 && s [ i ] !  = s [ j ]
       j = p [ j - 1 ]
     end
     j + = 1 if s [ i ] == s [ j ]
     p [ i ] = j
   end
   return p
 end

Java implementation example

  public static List < Integer > compute_prefix_function ( String s ) {
     int len = s .  length ();
     List < Integer > p = new ArrayList <> ();  // prefix function values
     for ( int i = 0 ; i < len ; ++ i ) { // fill the sheet with zeros default
         p .  add ( 0 );
     }
                                          // sheet index corresponds to the number of the last character of the argument
     int k = 0 ;
     for ( int i = 1 ; i < len ; ++ i ) {
         while (( k > 0 ) && ( s . charAt ( k ) ! = s . charAt ( i )))
             k = p .  get ( k - 1 );
         if ( s . charAt ( k ) == s . charAt ( i ))
             ++ k ;
         p .  set ( i , k );
     }
     return p ;
 }

Python implementation example

  def prefix ( s ):
 v = [ 0 ] * len ( s )
 for i in range ( 1 , len ( s )):
 k = v [ i - 1 ]
 while k > 0 and s [ k ] ! = s [ i ]:
 k = v [ k - 1 ]
 if s [ k ] == s [ i ]:
 k = k + 1
 v [ i ] = k
 return v

Haskell implementation example

  prefix :: String -> [ Int ]
 prefix [] = []
 prefix str @ ( _ : t ) = 
     go 0 0 str t []
   where 
     go vlen _ [] _ v = reverse v
     go vlen k s @ ( si : ssi ) ( sk : ssk ) v 
         |  si == sk = go ( vlen + 1 ) ( k + 1 ) ssi ssk (( k + 1 ) : v )
         |  k == 0 = go ( vlen + 1 ) 0 ssi str ( 0 : v )
         |  otherwise = 
             let
                k ' = v !!  ( vlen - k )
             in
                go vlen k ' s ( drop k' str ) v