Hopcroft - Carp Algorithm

Hopcroft - Carp Algorithm
Hopcroft - Carp Algorithm
Destination	Search for maximum matching
Data structure	Graph
Worst time
Memory overhead

The Hopcroft – Karp algorithm is an algorithm that takes a bipartite graph as an input and returns the maximum matching , that is, the largest set of edges, such that no two have a common vertex. The asymptotic behavior of the running time of the algorithm is $O(|E|{\sqrt {|V|}})$ ${\ displaystyle O (| E | {\ sqrt {| V |}})}$ ${\ displaystyle O (| E | {\ sqrt {| V |}})}$ in the worst case (here $E$ ${\ displaystyle E}$ $E$ Is the set of edges of the graph, and $V$ ${\ displaystyle V}$ $V$ - many of its peaks). For tight graphs , run time is limited $O(|V|^{2.5})$ ${\ displaystyle O (| V | ^ {2.5})}$ ${\ displaystyle O (| V | ^ {2.5})}$ , and for a random graph, the algorithm works in almost linear time.

The algorithm was created by John Hopcroft and Richard Karp in 1973 ^[1] . Like previously created algorithms (such as the Hungarian algorithm and the algorithm from Edmonds ^[2] , the Hopcroft – Karp algorithm in a cycle increases matching by finding increasing paths ( chains whose edges alternately belong to the matching and do not belong to it, and the first and last vertices do not belong to the matching; by alternating the matching along the chain, that is, removing the edges that were in the chain from the matching and adding those that were not in it, you can get a greater matching). As the path increases, the algorithm finds the maximum set of shortest magnifying paths. As a result, only $O({\sqrt {|V|}})$ ${\ displaystyle O ({\ sqrt {| V |}})}$ ${\ displaystyle O ({\ sqrt {| V |}})}$ iterations. The same idea is used in more complex algorithms to search for matching in non-bipartite graphs with the same asymptotic run time as the Hopcroft-Karp algorithm.

Task Description

The task of finding the greatest matching in a bipartite graph can be informally described as follows. There is a group of boys and girls. Some boys are familiar with some girls. It is required to make as many pairs as possible for the dance, consisting of a boy and a girl who are familiar with each other ^[3] .

Increasing Paths

A vertex that is not the end of any matching rib $M$ ${\ displaystyle M}$ $M$ is called the free vertex (for this matching). The algorithm is based on the concept of a magnifying path - a path that begins and ends at a free vertex, and inside the path edges that belong and do not belong to matching $M$ ${\ displaystyle M}$ alternate. It follows from the definition that all the vertices of such a path, except the first and last, must be non-free. The increasing path can consist of two free vertices and one edge between them (which does not lie in the matching).

If a $M$ ${\ displaystyle M}$ - matching and $P$ ${\ displaystyle P}$ - increasing path in $M$ ${\ displaystyle M}$ , then the symmetric difference of two sets of edges $M\oplus P$ ${\ displaystyle M \ oplus P}$ is a matching size $|M|+1$ ${\ displaystyle | M | +1}$ . Thus, finding increasing paths, you can increase the size of matching.

On the contrary let $M$ ${\ displaystyle M}$ not optimal and let $P$ ${\ displaystyle P}$ there is a symmetric difference $M\oplus M^{*}$ ${\ displaystyle M \ oplus M ^ {*}}$ where $M^{*}$ ${\ displaystyle M ^ {*}}$ - optimal matching. Insofar as $M$ ${\ displaystyle M}$ and $M^{*}$ ${\ displaystyle M ^ {*}}$ - matching, each vertex in $P$ ${\ displaystyle P}$ has a degree of not more than two. Means $P$ ${\ displaystyle P}$ must form many disjoint increasing paths and cycles or paths in which the edges from the matching are as many as not from it. Size difference $M$ ${\ displaystyle M}$ and $M^{*}$ ${\ displaystyle M ^ {*}}$ and there are a number of magnifying paths $P$ ${\ displaystyle P}$ . So, if there is a matching $M^{*}$ ${\ displaystyle M ^ {*}}$ greater than the current match $M$ ${\ displaystyle M}$ , there must also be a magnifying path. If the magnifying path does not exist, you can successfully abort the algorithm since $M$ ${\ displaystyle M}$ should be optimal ^[4] .

The increasing paths in matching problems are closely related to the increasing paths arising in the maximum flow problems , along the paths along which the flow between the source and the drain can be increased. You can reduce the problem of finding the greatest matching to the problem of finding the maximum flow ^[5] . The technique used in the Hopcroft – Karp algorithm can be generalized for an arbitrary transport network , which will lead to the Dinitz algorithm ^[6] .

Algorithm

An example of the algorithm: two phases were required.

Below is the structure of the algorithm:

Entrance : Bipartite Count

G(U\cup V,E)

{\ displaystyle G (U \ cup V, E)}

Output : Matching

M\subseteq E

{\ displaystyle M \ subseteq E}

M\leftarrow \emptyset

{\ displaystyle M \ leftarrow \ emptyset}

cycle

{\mathcal {P}}\leftarrow \{P_{1},P_{2},\dots ,P_{k}\}

{\ displaystyle {\ mathcal {P}} \ leftarrow \ {P_ {1}, P_ {2}, \ dots, P_ {k} \}}

maximum set of shortest increasing paths disjoint over vertices

M\leftarrow M\oplus (P_{1}\cup P_{2}\cup \dots \cup P_{k})

{\ displaystyle M \ leftarrow M \ oplus (P_ {1} \ cup P_ {2} \ cup \ dots \ cup P_ {k})}

until

{\mathcal {P}}\neq \emptyset

{\ displaystyle {\ mathcal {P}} \ neq \ emptyset}

In more detail, let them be given $U$ ${\ displaystyle U}$ and $V$ ${\ displaystyle V}$ - sets of vertices of a bipartite graph $G$ ${\ displaystyle G}$ , but $E$ ${\ displaystyle E}$ - the set of its edges connecting the vertices of $U$ ${\ displaystyle U}$ and $V$ ${\ displaystyle V}$ . Algorithm starting with an empty matching $M$ ${\ displaystyle M}$ , sequentially increases it. In each phase, the algorithm does the following:

Width search (BFS) splits the vertices of the graph into layers. BFS Starts with Many Free Tops $U$ ${\ displaystyle U}$ , which thereby form the first partition layer (thus, the first layer contains only ribs not covered by matching). At subsequent levels of the search, the algorithm adds vertices to a new level, alternating edges: it will alternately add vertices connected by an edge either in the matching or outside it, therefore, in the process of searching from the vertex from $U$ ${\ displaystyle U}$ we will always pass along the edges not from matching, but if the vertex from $V$ ${\ displaystyle V}$ - the opposite. Search aborts at level $k$ ${\ displaystyle k}$ as soon as at least one free peak from $V$ ${\ displaystyle V}$ .
All free peaks from $V$ ${\ displaystyle V}$ on this layer $k$ ${\ displaystyle k}$ denote as $F$ ${\ displaystyle F}$ . It turns out, top $v$ ${\ displaystyle v}$ belongs $F$ ${\ displaystyle F}$ if and only if the shortest lengthening path ends in it.
The algorithm finds the maximum set of paths disjoint over vertices of length $k$ ${\ displaystyle k}$ . This set can be found by DFS, which uses the previously found layering. The search can only go through the edges that lead to the unused vertices of the previous layer, and the path in the depth search tree should alternate relative to matching $M$ ${\ displaystyle M}$ . As soon as the increasing path enters one of the peaks $F$ ${\ displaystyle F}$ , you should start DFS from the next peak.
Each of the paths to be found is used to increase $M$ ${\ displaystyle M}$ .

The algorithm is interrupted when at any phase the BFS does not find any incremental paths (i.e. $F$ ${\ displaystyle F}$ empty).

Analysis

Each phase consists of one BFS and one DFS, so one phase is performed per $O(|E|)$ ${\ displaystyle O (| E |)}$ . Consequently, the first ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ phases in the graph with $|V|$ ${\ displaystyle | V |}$ peaks and $|E|$ ${\ displaystyle | E |}$ ribs have a cost $O(|E|{\sqrt {|V|}})$ ${\ displaystyle O (| E | {\ sqrt {| V |}})}$ . It can be shown that each phase increases the length of the shortest extension path by at least 1: the phase finds the maximum set of complementary paths of a given length, so that any remaining path should be longer. Therefore, after the first ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ the phases of the algorithm are completed, the shortest remaining incremental path has a length of at least ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ . However, the symmetric difference between the optimal matching and the current matching $M$ ${\ displaystyle M}$ found in previous phases, forms a lot of vertex-disjoint increasing paths and alternating cycles. If each path has a length of at least ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ maybe no more than ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ paths, and the size of the optimal matching may differ from the size $M$ ${\ displaystyle M}$ no more than ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ . Since each phase of the algorithm increases the matching size by at least 1, no more can happen ${\sqrt {|V|}}$ ${\ displaystyle {\ sqrt {| V |}}}$ phases.

Since in the worst case the algorithm has $2{\sqrt {|V|}}$ ${\ displaystyle 2 {\ sqrt {| V |}}}$ phases, the total operating time is $O(|E|{\sqrt {|V|}})$ ${\ displaystyle O (| E | {\ sqrt {| V |}})}$ in the worst case ^[1] .

However, in many cases, the algorithm can be much faster than the worst case estimate. For example, in the case of a sparse bipartite random graph in 2006 it was shown ^[7] (improving the previous result ^[8] ) that, with a high probability, all non-optimal matching have increasing paths of a logarithmic length. As a result, for such graphs, the number of iterations $O(\log |V|)$ ${\ displaystyle O (\ log | V |)}$ , and the running time of the algorithm is $O(|E|\log |V|)$ ${\ displaystyle O (| E | \ log | V |)}$ .

Comparison with other search algorithms for maximum matching

For sparse graphs, the Hopcroft – Karp algorithm has the best asymptotics in the worst case of all known algorithms, but for dense graphs, the newer algorithm ^[9] has a slightly better bound $O\left(n^{1.5}{\sqrt {\frac {m}{\log n}}}\right)$ ${\ displaystyle O \ left (n ^ {1.5} {\ sqrt {\ frac {m} {\ log n}}} \ right)}$ . This algorithm is based on the pre-flow push algorithm , and when the matching becomes close to optimal, it switches to the Hopcroft-Karp method.

Several authors conducted an experimental comparison of maximum matching algorithms. Their results showed that in the general case the Hopcroft – Karp algorithm is not as good in practice as in theory: it is superior in performance to both simple BFS and DFS strategies for searching for the increasing path, and algorithms based on the preflow method ^[10] .

Non-bipartite graphs

The same idea of finding the maximum set of shortest magnifying paths also works to find matches of maximum power in non-bipartite graphs, and for the same reasons, the algorithm will have no more $O({\sqrt {|V|}})$ ${\ displaystyle O ({\ sqrt {| V |}})}$ phases. However, for non-bipartite graphs, the search for extension paths in each phase is more complicated. Based on earlier work, Micali & Vazirani (1980 ) showed how to perform a phase in linear time, which led to the creation of an algorithm with the same upper bound as the Hopcroft-Karp algorithm for bipartite graphs. The Micali - Vazirani method is complex and the authors did not provide complete evidence of their results; subsequently, Peterson & Loui (1988 ) published a complete rationale for the Micali-Vazirani algorithm, and other algorithms were published: Gabow & Tarjan (1991 ) and Blum (2001 ). In 2012, Vazirani proposed a new, simplified proof of the Micali algorithm - Vazirani ^[11] .

Pseudo-code

Below is a pseudo-code of the algorithm, close to implementation in Java ^[12] .

  / * 
  G = U ∪ V ∪ {NIL}
  where U and V are partition of graph and NIL is a special null vertex
 * /
  
 function BFS ()
     for u in U
         if Pair_U [u] == NIL
             Dist [u] = 0
             Enqueue (Q, u)
         else
             Dist [u] = ∞
     Dist [NIL] = ∞
     while Empty (Q) == false
         u = Dequeue (Q)
         if Dist [u] <Dist [NIL] 
             for each v in Adj [u]
                 if Dist [Pair_V [v]] == ∞
                     Dist [Pair_V [v]] = Dist [u] + 1
                     Enqueue (Q, Pair_V [v])
     return Dist [NIL]! = ∞

 function DFS (u)
     if u! = NIL
         for each v in Adj [u]
             if Dist [Pair_V [v]] == Dist [u] + 1
                 if DFS (Pair_V [v]) == true
                     Pair_V [v] = u
                     Pair_U [u] = v
                     return true
         Dist [u] = ∞
         return false
     return true

 function hopcroft-karp
     for each u in U
         Pair_U [u] = NIL
     for each v in V
         Pair_V [v] = NIL
     matching = 0
     while BFS () == true
         for each u in U
             if Pair_U [u] == NIL
                 if DFS (u) == true
                     matching = matching + 1
     return matching

Explanation

Let the graph consist of fractions of U and V. The key idea is to add two dummy vertices on each side of the graph: uDummy, connected to all uncovered vertices from U, and vDummy, connected to all uncovered vertices from V. Now, if we run BFS from uDummy to vDummy, we get the shortest path between an uncovered vertex from U and an uncovered vertex from V. Due to the bipartite graph, the path will alternate between U and V. However, you need to make sure that going from V to U, we always choose an edge from matching . If there are no vertices left from the match, then we end up in vDummy. Guided by this criterion in the BFS process, in the end we get the shortest increasing path.

After the shortest increasing path is found, all paths that are longer than it must be ignored. The BFS marks the vertices whose distance from the source is 0. After executing the BFS, we can, starting from each vertex from U that is not in the matching, walking along a path in which the distance to the next vertex is greater than the distance to the previous one by 1. If we are at the end we arrive at vDummy, the distance to which is 1 greater than the distance to one of the vertices from V, along which you can walk along one of the shortest paths. In this case, we can go further and update the match for the peaks along the way. Note that each vertex V on the path, except the very last one, is already in matching. Therefore, updating the matching is equivalent to a symmetric difference (that is, removing those edges of the path that were in the matching and adding those that were not.

How to make sure that the increasing paths do not intersect at the tops? It is already provided. After the symmetric difference is fulfilled, none of the vertices of the path will be considered again, because Dist [Pair_V [v]] will not be equal to Dist [u] + 1 (but it will be exactly Dist [u]).

Why do we need the following lines?

  Dist [u] = ∞
 return false

When we cannot find any shortest incremental path from vertex u, DFS returns False. In this case, it will be good to mark these peaks so as not to visit them again. To mark them, we simply set Dist [u] to be infinite.

We do not need uDummy, because it is only to add all the vertices not from matching to the BFS queue. This can be done simply by initialization. vDummy can be added to U for convenience in many implementations, and initialize matching for all vertices from V with a pointer to vDummy. Therefore, if in the end the last vertex from U is not matching with any vertex from V, then the last vertex of our extension path will be vDummy. In the pseudocode above, vDummy is denoted by Nil.

Notes

↑ ¹ ² Hopcroft & Karp (1973 )
↑ Edmonds (1965 )
↑ Algorithms for finding the maximum matching in a bipartite graph (neopr.) .
↑ Edmonds, 1965 , p. 453.
↑ Ahuja, Magnanti & Orlin (1993 ), section 12.3, bipartite cardinality matching problem, pp. 469-470.
↑ Yefim Dinitz. Dinitz 'Algorithm: The Original Version and Even's Version // Theoretical Computer Science: Essays in Memory of Shimon Even / ed. Oded Goldreich, Arnold L. Rosenberg, and Alan L. Selman. - Springer, 2006. - P. 218–240. - ISBN 978-3540328803 .
↑ Bast et al. (2006 )
↑ Motwani (1994 )
↑ Alt et al. (1991 )
↑ Chang & McCormick (1990 ); Darby-Dowman (1980 ); Setubal (1993 ); Setubal (1996 ).
↑ Vazirani (2012 )
↑ Java Program to Implement Hopcroft Algorithm - Sanfoundry (Eng.) , Sanfoundry (November 20, 2013). Date of treatment April 6, 2017.

Literature

Ahuja, Ravindra K .; Magnanti, Thomas L. & Orlin, James B. (1993), Network Flows: Theory, Algorithms and Applications , Prentice-Hall .
Alt, H .; Blum, N .; Mehlhorn, K. & Paul, M. (1991), " Computing a maximum cardinality matching in a bipartite graph in time $\scriptstyle O\left(n^{1.5}{\sqrt {\frac {m}{\log n}}}\right)$ ${\ displaystyle \ scriptstyle O \ left (n ^ {1.5} {\ sqrt {\ frac {m} {\ log n}}} \ right)}$ ", Information Processing Letters T. 37 (4): 237–240 , DOI 10.1016 / 0020-0190 (91) 90195-N .
Bast, Holger; Mehlhorn, Kurt; Schafer, Guido & Tamaki, Hisao (2006), " Matching algorithms are fast in sparse random graphs ", Theory of Computing Systems T. 39 (1): 3-14 , DOI 10.1007 / s00224-005-1254-y .
Blum, Norbert (2001), A Simplified Realization of the Hopcroft-Karp Approach to Maximum Matching in General Graphs , Tech. Rep. 85232-CS, Computer Science Department, Univ. of Bonn , < http://theory.cs.uni-bonn.de/ftp/reports/cs-reports/2001/85232-CS.ps.gz > .
Chang, S. Frank & McCormick, S. Thomas (1990), A faster implementation of a bipartite cardinality matching algorithm , Tech. Rep. 90-MSC-005, Faculty of Commerce and Business Administration, Univ. of british columbia . As cited by Setubal (1996 ).
Darby-Dowman, Kenneth (1980), The exploitation of sparsity in large scale linear programming problems - Data structures and restructuring algorithms , Ph.D. thesis, Brunel University . As cited by Setubal (1996 ).
Edmonds, Jack (1965), " Paths, Trees and Flowers ", Canadian J. Math T. 17: 449-467 , DOI 10.4153 / CJM-1965-045-4 .
Gabow, Harold N. & Tarjan, Robert E. (1991), " Faster scaling algorithms for general graph matching problems ", Journal of the ACM T. 38 (4): 815–853 , DOI 10.1145 / 115234.115366 .
Hopcroft, John E. & Karp, Richard M. (1973), " An n ^5/2 algorithm for maximum matchings in bipartite graphs ", SIAM Journal on Computing T. 2 (4): 225–231 , DOI 10.1137 / 0202019 .
Micali, S. & Vazirani, VV (1980), "An $\scriptstyle O({\sqrt {|V|}}\cdot |E|)$ ${\ displaystyle \ scriptstyle O ({\ sqrt {| V |}} \ cdot | E |)}$ algorithm for finding maximum matching in general graphs ", Proc. 21st IEEE Symp. Foundations of Computer Science , pp. 17–27 , DOI 10.1109 / SFCS.1980.12 .
Peterson, Paul A. & Loui, Michael C. (1988), " The general maximum matching algorithm of Micali and Vazirani ", Algorithmica T. 3 (1-4): 511-533 , DOI 10.1007 / BF01762129 .
Motwani, Rajeev (1994), " Average-case analysis of algorithms for matchings and related problems ", Journal of the ACM T. 41 (6): 1329-1356 , DOI 10.1145 / 195613.195663 .
Setubal, João C. (1993), "New experimental results for bipartite matching", Proc. Netflow93 , Dept. of Informatics, Univ. of Pisa, p. 211–216 . As cited by Setubal (1996 ).
Setubal, João C. (1996), Sequential and parallel experimental results with bipartite matching algorithms , Tech. Rep. IC-96-09, Inst. of Computing, Univ. of Campinas , < http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.3539 > .
Vazirani, Vijay (2012), An Improved Definition of Blossoms and a Simpler Proof of the MV Matching Algorithm , CoRR abs / 1210.4594 .

[автоссылка1-1] ¹ ² Hopcroft & Karp (1973 )

[2] Edmonds (1965 )

[3] Algorithms for finding the maximum matching in a bipartite graph (neopr.) .

[_6fe130b295a438aa-4] Edmonds, 1965 , p. 453.

[5] Ahuja, Magnanti & Orlin (1993 ), section 12.3, bipartite cardinality matching problem, pp. 469-470.

[6] Yefim Dinitz. Dinitz 'Algorithm: The Original Version and Even's Version // Theoretical Computer Science: Essays in Memory of Shimon Even / ed. Oded Goldreich, Arnold L. Rosenberg, and Alan L. Selman. - Springer, 2006. - P. 218–240. - ISBN 978-3540328803 .

[7] Bast et al. (2006 )

[8] Motwani (1994 )

[9] Alt et al. (1991 )

[10] Chang & McCormick (1990 ); Darby-Dowman (1980 ); Setubal (1993 ); Setubal (1996 ).

[11] Vazirani (2012 )

[12] Java Program to Implement Hopcroft Algorithm - Sanfoundry (Eng.) , Sanfoundry (November 20, 2013). Date of treatment April 6, 2017.