The graph editing distance is the coefficient of similarity (or dissimilarity) between two graphs . The concept of distance graph editing was first formulated mathematically by Alberto Sanfeliu and King-San Fu in 1983 [1] . The main application of graph editing distances is in , such as robust pattern recognition in machine learning [2] .
The editing distance of a graph between two graphs is related to the between the lines . When interpreting stock as connected directed acyclic graphs with a maximum degree of two, classical definitions of editing distance, such as Levenshtein distance [3] , Hamming distance [4] and Jaro – Winkler distance , can be interpreted as graph editing distances between matching graphs. Similarly, the graph editing distance is a generalization of the tree editing distance between trees with roots [5] [6] [7] [8] .
Content
Formal Definitions and Properties
The mathematical definition of the graph editing distance depends on the definition of graphs for which the distance is determined. For example, it depends on whether the vertices and edges of the graph are labeled and how they are labeled , as well as whether the graph is oriented . In general, if you are given a set of graph editing operations (also known as elementary graph operations ), the graph editing distance between two graphs and written as can be defined as
- ,
Where means a set of conversion paths in ( isomorphic to a graph) , but equal to the cost of each editing operation .
A set of elementary operations on a graph usually includes:
- vertex insertion - one new marked vertex is inserted into the graph.
- deletion of a vertex - one (often unrelated to other) vertex is removed from the graph.
- vertex substitution - change the label (or color) of a given vertex.
- insert edge - a new color edge is inserted into the graph between a pair of vertices.
- edge removal - delete one edge between a pair of vertices.
- rib substitution - change the label (or color) of a given rib.
- deletion of a vertex - one (often unrelated to other) vertex is removed from the graph.
In addition, but more rarely, operations are included such as splitting an edge , in which a new vertex is inserted into the edge (which leads to the formation of two edges), and contraction of the edge , which removes a vertex of degree two between the edges (of the same color) with the union of two edges in one. Although such complex operations can be defined in terms of simpler transformations, their use allows a better parameterization of the price function when the operator is cheaper than the sum of its components.
Applications
Graph editing distance finds application in handwriting recognition [9] , fingerprint recognition [10] and chemoinformatics [11] .
Algorithms and Complexity
Exact algorithms for calculating the distance of editing a graph between a pair of graphs usually transform the problem into a task of finding the minimum path of transformations between two graphs. The calculation of the optimal editing path is reduced to finding the path or the shortest path problem , often implemented as an A * search algorithm .
In addition to exact algorithms, many effective approximation algorithms are known [12] [13] .
Despite the fact that the above-mentioned algorithms sometimes work well in practice, in the general case, the task of calculating the graph editing distance is NP-complete [14] (the proof of this is available in Section 2 on the website of Zeng et al. ) And is even difficult to approximate (formally, it is APX- hard [15] ).
Notes
- ↑ Sanfeliu, Fu, 1983 , p. 353–363.
- ↑ Gao, Xiao, Tao, Li, 2010 , p. 113-129.
- ↑ Levenshtein, 1965 , p. 845-848.
- ↑ Hamming, 1950 , p. 147-160.
- ↑ Shasha, Zhang, 1989 , p. 1245-1262.
- ↑ Zhang, 1996 , p. 205–222.
- ↑ Bille, 2005 , p. 22–34.
- ↑ Demaine, Mozes, Rossman, Weimann, 2010 , p. A2.
- ↑ Fischer, Suen, Frinken, Riesen, Bunke, 2013 , p. 194–203.
- ↑ Neuhaus, Bunke, 2005 , p. 191-200.
- ↑ Birchall, Gillet, Harper, Pickett, 2006 , p. 557-586.
- ↑ Neuhaus, Bunke, 2007 .
- ↑ Riesen, 2016 .
- ↑ Garey, Johnson, 1979 .
- ↑ Lin, 1994 , p. 74–82.
Literature
- Alberto Sanfeliu, King-Sun Fu. A distance measure between attributed relational graphs for pattern recognition // IEEE Transactions on Systems, Man and Cybernetics . - 1983 .-- T. 13 , no. 3 . - S. 353–363 . - DOI : 10.1109 / TSMC.1983.6313167 .
- Xinbo Gao, Bing Xiao, Dacheng Tao, Xuelong Li. A survey of graph edit distance // Pattern Analysis and Applications. - 2010 .-- T. 13 . - S. 113–129 . - DOI : 10.1007 / s10044-008-0141-y .
- Vladimir I. Levenshtein. Binary Codes with Correction of Dropouts, Inserts, and Symbol Substitutions // Reports of CCCP Academies of Sciences. - 1965. - T. 163 , no. 4 . - S. 845–848 .
- Richard W. Hamming. Error detecting and error correcting codes // Bell System Technical Journal . - 1950 .-- T. 29 , no. 2 . - S. 147–160 . - DOI : 10.1002 / j.1538-7305.1950.tb00463.x . Archived May 25, 2006.
- Shasha D., Zhang K. Simple fast algorithms for the editing distance between trees and related problems // SIAM J. Comput. . - 1989. - T. 18 , No. 6 . - S. 1245–1262 . - DOI : 10.1137 / 0218082 .
- Zhang K. A constrained edit distance between unordered labeled trees // Algorithmica . - 1996. - T. 15 , No. 3 . - S. 205–222 . - DOI : 10.1007 / BF01975866 .
- Bille P. A survey on tree edit distance and related problems // Theor. Comput. Sci. . - 2005 .-- T. 337 , no. 1-3 . - S. 22–34 . - DOI : 10.1016 / j.tcs.2004.12.12.030 .
- Erik D. Demaine, Shay Mozes, Benjamin Rossman, Oren Weimann. An optimal decomposition algorithm for tree edit distance // ACM Transactions on Algorithms . - 2010 .-- T. 6 , no. 1 . - S. A2 . - DOI : 10.1145 / 1644015.1644017 .
- Andreas Fischer, Ching Y. Suen, Volkmar Frinken, Kaspar Riesen, Horst Bunke. A Fast Matching Algorithm for Graph-Based Handwriting Recognition // Graph-Based Representations in Pattern Recognition. - 2013. - T. 7877. - S. 194–203. - ( Lecture Notes in Computer Science ). - ISBN 978-3-642-38220-8 . - DOI : 10.1007 / 978-3-642-38221-5_21 .
- Michel Neuhaus, Horst Bunke. A Graph Matching Based Approach to Fingerprint Classification using Directional Variance // Audio- and Video-Based Biometric Person Authentication. - 2005. - T. 3546. - S. 191–200. - ( Lecture Notes in Computer Science ). - ISBN 978-3-540-27887-0 . - DOI : 10.1007 / 11527923_20 .
- Kristian Birchall, Valerie J. Gillet, Gavin Harper, Stephen D. Pickett. Training Similarity Measures for Specific Activities: Application to Reduced Graphs // Journal of Chemical Information and Modeling . - 2006. - January ( t. 46 , No. 2 ). - S. 557-586 . - DOI : 10.1021 / ci050465e . - PMID 16562986 .
- Michel Neuhaus, Horst Bunke. Bridging the Gap between Graph Edit Distance and Kernel Machines. - World Scientific, 2007. - T. 68. - (Machine Perception and Artificial Intelligence). - ISBN 978-9812708175 .
- Kaspar Riesen. Structural Pattern Recognition with Graph Edit Distance: Approximation Algorithms and Applications. - Springer, 2016. - (Advances in Computer Vision and Pattern Recognition). - ISBN 978-3319272511 .
- Garey MR, Johnson DS Computers and Intractability: A Guide to the Theory of NP-Completeness. - WH Freeman and Company, 1979. - ISBN 978-0-7167-1045-5 .
- Chih-Long Lin. Hardness of approximating graph transformation problem // Algorithms and Computation / Ding-Zhu Du, Xiang-Sun Zhang. - Springer Berlin Heidelberg, 1994. - T. 834. - (Lecture Notes in Computer Science). - ISBN 9783540583257 . - DOI : 10.1007 / 3-540-58325-4_168 .