The system of disjoint sets ( English disjoint-set , or union – find data structure ) is a data structure that allows you to administer a set of elements, divided into disjoint subsets. Moreover, each subset is assigned its representative - an element of this subset. An abstract data structure is defined by many three operations: .
It is used to store connected components in graphs , in particular, the Kraskal algorithm needs a similar data structure for effective implementation.
Content
Definition
Let be finite set divided into disjoint subsets ( classes) :
- .
To each subset a representative is appointed . The corresponding system of disjoint sets supports the following operations:
- : creates for an element new subset. Assigns the same element as a representative of the created subset.
- : combines both subsets belonging to representatives and , and appoints representative of a new subset.
- : defines for the subset to which the element belongs and returns its representative.
Algorithmic implementation
The trivial implementation preserves the belonging of elements from and representatives in the index array . In practice, many trees are often used. This can significantly reduce the time required for the Find operation. In this case, the representative is written to the root of the tree, and the remaining elements of the class to the nodes below it.
- : hangs the root of a lower tree under the root of a higher tree. If at the same time becomes a descendant , both nodes are swapped.
- : goes from to the root of the tree and returns it (the root in this case is a representative).
Heuristics
To speed up Union and Find operations, the Union -By-Size , Union-By-Height , Random-Union, and path compression heuristics can be used.
In Union-By-Size heuristic during operation the root of the smaller tree is hung under the root of the larger tree. Thanks to this approach, tree balancing is maintained. Depth of each subtree cannot exceed the value . When using this heuristic, the Find operation time in the worst case increases with before . For effective implementation, it is proposed to keep the number of nodes in the tree at the root.
The Union-By-Height heuristic is similar to Union-By-Size , but uses tree height instead of size.
The Random-Union heuristic uses the fact that you can not spend extra memory to save the number of nodes in the tree: it is enough to choose a root randomly - such a solution gives a speed on random queries that is quite comparable with other implementations. Nevertheless, if there are many queries such as “combine a large set with a small one”, this heuristic improves the expectation (that is, the average operating time) by only two times, therefore it is not recommended to use it without the path compression heuristic.
Path compression heuristics used to speed up operation . With each new search, all the elements that are on the way from the root to the desired element are hung under the root of the tree. In this case, the Find operation will work on average where Is the inverse of the Ackerman function . This allows you to significantly speed up the work, as for all values used in practice, takes a value less than 5.
Implementation Example
C ++ implementation:
const int MAXN = 1000 ;
int p [ MAXN ], rank [ MAXN ];
void MakeSet ( int x )
{
p [ x ] = x ;
rank [ x ] = 0 ;
}
int Find ( int x )
{
return ( x == p [ x ] ? x : p [ x ] = Find ( p [ x ]) );
}
void Union ( int x , int y )
{
if ( ( x = Find ( x )) == ( y = Find ( y )) )
return
if ( rank [ x ] < rank [ y ] )
p [ x ] = y ;
else {
p [ y ] = x ;
if ( rank [ x ] == rank [ y ] )
++ rank [ x ];
}
}
Free Pascal implementation:
1 const MAX_N = 1000 ;
2
3 var Parent , Rank : array [ 1 .. MAX_N ] of LongInt ;
four
5 procedure swap ( var x , y : LongInt ) ;
6 var tmp : LongInt ;
7 begin
8 tmp : = x ;
9 x : = y ;
10 y : = tmp ;
11 end ;
12
13 procedure MakeSet ( x : LongInt ) ;
14 begin
15 Parent [ x ] : = x ;
16 Rank [ x ] : = 0 ;
17 end ;
18
19 function Find ( x : LongInt ) : LongInt ;
20 begin
21 if ( Parent [ x ] <> x ) then
22 Parent [ x ] : = Find ( Parent [ x ] ) ;
23 Exit ( Parent [ x ] ) ;
24 end ;
25
26 procedure Union ( x , y : LongInt ) ;
27 begin
28 x : = Find ( x ) ;
29 y : = Find ( y ) ;
30 if ( x = y ) then exit () ;
31 if ( Rank [ x ] < Rank [ y ] ) then swap ( x , y ) ;
32
33 Parent [ y ] : = x ;
34 if ( Rank [ x ] = Rank [ y ] ) then
35 inc ( Rank [ x ] ) ;
36 end ;
See also
- Forest of disjoint sets
Literature
- Galler, Bernard A., and Michael J. Fisher. “An improved equivalence algorithm.” // Communications of the ACM , 7.5 (1964): 301-303. (eng.)
- Tarjan, Robert E., and Jan Van Leeuwen. “Worst-case analysis of set union algorithms.” // Journal of the ACM 31.2 (1984): 245–281. (eng.)
- Thomas Kormen et al. Algorithms: Construction and Analysis = Introduction to Algorithms. - 2nd ed. - M .: "Williams" , 2006. - S. 1296. - ISBN 0-07-013151-1 .
Links
- Union-Find / Kevin Wayne, Pearson-Addison Wesley
- Chapter 22: Data Structures For Disjoint Sets / Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest
- Visualizer of the work of some data structures for disjoint sets / ITMO
- Implementing disjoint sets in the C ++ library collection Boost , 2006