Disjoint set system

The system of disjoint sets ( English disjoint-set , or union – find data structure ) is a data structure that allows you to administer a set of elements, divided into disjoint subsets. Moreover, each subset is assigned its representative - an element of this subset. An abstract data structure is defined by many three operations: $\{\mathrm {Union} ,\mathrm {Find} ,\mathrm {MakeSet} \}$ ${\ displaystyle \ {\ mathrm {Union}, \ mathrm {Find}, \ mathrm {MakeSet} \}}$ ${\ displaystyle \ {\ mathrm {Union}, \ mathrm {Find}, \ mathrm {MakeSet} \}}$ .

It is used to store connected components in graphs , in particular, the Kraskal algorithm needs a similar data structure for effective implementation.

Content

Definition

Let be $S$ ${\ displaystyle S}$ finite set divided into disjoint subsets ( classes) $X_{i}$ ${\ displaystyle X_ {i}}$ :

S=X_{0}\cup X_{1}\cup X_{2}\cup \ldots \cup X_{k}:X_{i}\cap X_{j}=\varnothing \quad \forall i,j\in \lbrace 0,1,\ldots ,k\rbrace ,i\neq j

{\ displaystyle S = X_ {0} \ cup X_ {1} \ cup X_ {2} \ cup \ ldots \ cup X_ {k}: X_ {i} \ cap X_ {j} = \ varnothing \ quad \ forall i , j \ in \ lbrace 0,1, \ ldots, k \ rbrace, i \ neq j}

.

To each subset $X_{i}$ ${\ displaystyle X_ {i}}$ a representative is appointed $r_{i}\in X_{i}$ ${\ displaystyle r_ {i} \ in X_ {i}}$ . The corresponding system of disjoint sets supports the following operations:

$\mathrm {MakeSet} (x)$ ${\ displaystyle \ mathrm {MakeSet} (x)}$ : creates for an element $x$ ${\ displaystyle x}$ new subset. Assigns the same element as a representative of the created subset.
$\mathrm {Union} (r,s)$ ${\ displaystyle \ mathrm {Union} (r, s)}$ : combines both subsets belonging to representatives $r$ ${\ displaystyle r}$ and $s$ ${\ displaystyle s}$ , and appoints $r$ ${\ displaystyle r}$ representative of a new subset.
$\mathrm {Find} (x)$ ${\ displaystyle \ mathrm {Find} (x)}$ : defines for $x\in S$ ${\ displaystyle x \ in S}$ the subset to which the element belongs and returns its representative.

Algorithmic implementation

The trivial implementation preserves the belonging of elements from $S$ ${\ displaystyle S}$ and representatives $r_{i}$ ${\ displaystyle r_ {i}}$ in the index array . In practice, many trees are often used. This can significantly reduce the time required for the Find operation. In this case, the representative is written to the root of the tree, and the remaining elements of the class to the nodes below it.

$\mathrm {Union} (r,s)$ ${\ displaystyle \ mathrm {Union} (r, s)}$ : hangs the root of a lower tree under the root of a higher tree. If at the same time $r$ ${\ displaystyle r}$ becomes a descendant $s$ ${\ displaystyle s}$ , both nodes are swapped.
$\mathrm {Find} (x)$ ${\ displaystyle \ mathrm {Find} (x)}$ : goes from $x$ ${\ displaystyle x}$ to the root of the tree and returns it (the root in this case is a representative).

Heuristics

To speed up Union and Find operations, the Union -By-Size , Union-By-Height , Random-Union, and path compression heuristics can be used.

In Union-By-Size heuristic during operation $\mathrm {Union} (r,s)$ ${\ displaystyle \ mathrm {Union} (r, s)}$ the root of the smaller tree is hung under the root of the larger tree. Thanks to this approach, tree balancing is maintained. Depth of each subtree $T$ ${\ displaystyle T}$ cannot exceed the value $\log \left|T\right|$ ${\ displaystyle \ log \ left | T \ right |}$ . When using this heuristic, the Find operation time in the worst case increases with $O(\log n)$ ${\ displaystyle O (\ log n)}$ before $O(n)$ ${\ displaystyle O (n)}$ . For effective implementation, it is proposed to keep the number of nodes in the tree at the root.

The Union-By-Height heuristic is similar to Union-By-Size , but uses tree height instead of size.

The Random-Union heuristic uses the fact that you can not spend extra $O(n)$ ${\ displaystyle O (n)}$ memory to save the number of nodes in the tree: it is enough to choose a root randomly - such a solution gives a speed on random queries that is quite comparable with other implementations. Nevertheless, if there are many queries such as “combine a large set with a small one”, this heuristic improves the expectation (that is, the average operating time) by only two times, therefore it is not recommended to use it without the path compression heuristic.

Path compression heuristics used to speed up operation $\mathrm {Find} (x)$ ${\ displaystyle \ mathrm {Find} (x)}$ . With each new search, all the elements that are on the way from the root to the desired element are hung under the root of the tree. In this case, the Find operation will work on average $\alpha (n)$ ${\ displaystyle \ alpha (n)}$ where $\alpha$ ${\ displaystyle \ alpha}$ Is the inverse of the Ackerman function . This allows you to significantly speed up the work, as $\alpha$ ${\ displaystyle \ alpha}$ for all values used in practice, takes a value less than 5.

Implementation Example

C ++ implementation:

  const int MAXN = 1000 ;

 int p [ MAXN ], rank [ MAXN ];

 void MakeSet ( int x ) 
 {
     p [ x ] = x ;
     rank [ x ] = 0 ;
 }

 int Find ( int x ) 
 {
     return ( x == p [ x ] ? x : p [ x ] = Find ( p [ x ]) );
 }

 void Union ( int x , int y ) 
 {
     if ( ( x = Find ( x )) == ( y = Find ( y )) )
         return
	
     if ( rank [ x ] < rank [ y ] )
         p [ x ] = y ;
     else {
         p [ y ] = x ;
         if ( rank [ x ] == rank [ y ] )
             ++ rank [ x ];
     }
 }

Free Pascal implementation:

  1 const MAX_N = 1000 ;
 2
 3 var Parent , Rank : array [ 1 .. MAX_N ] of LongInt ;
 four
 5 procedure swap ( var x , y : LongInt ) ;
 6 var tmp : LongInt ;
 7 begin
 8 tmp : = x ; 
 9 x : = y ; 
 10 y : = tmp ;
 11 end ;
 12
 13 procedure MakeSet ( x : LongInt ) ;
 14 begin
 15 Parent [ x ] : = x ;
 16 Rank [ x ] : = 0 ;
 17 end ;
 18
 19 function Find ( x : LongInt ) : LongInt ;
 20 begin
 21 if ( Parent [ x ] <> x ) then
 22 Parent [ x ] : = Find ( Parent [ x ] ) ;
 23 Exit ( Parent [ x ] ) ;
 24 end ;
 25
 26 procedure Union ( x , y : LongInt ) ;
 27 begin
 28 x : = Find ( x ) ;
 29 y : = Find ( y ) ;
 30 if ( x = y ) then exit () ;
 31 if ( Rank [ x ] < Rank [ y ] ) then swap ( x , y ) ;
 32  
 33 Parent [ y ] : = x ;
 34 if ( Rank [ x ] = Rank [ y ] ) then
 35 inc ( Rank [ x ] ) ;
 36 end ;

Literature

Galler, Bernard A., and Michael J. Fisher. “An improved equivalence algorithm.” // Communications of the ACM , 7.5 (1964): 301-303. (eng.)
Tarjan, Robert E., and Jan Van Leeuwen. “Worst-case analysis of set union algorithms.” // Journal of the ACM 31.2 (1984): 245–281. (eng.)
Thomas Kormen et al. Algorithms: Construction and Analysis = Introduction to Algorithms. - 2nd ed. - M .: "Williams" , 2006. - S. 1296. - ISBN 0-07-013151-1 .

Links

Union-Find / Kevin Wayne, Pearson-Addison Wesley
Chapter 22: Data Structures For Disjoint Sets / Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest
Visualizer of the work of some data structures for disjoint sets / ITMO
Implementing disjoint sets in the C ++ library collection Boost , 2006