
Boosting

Boosting is an ensemble meta-algorithm of machine learning used primarily to reduce bias, and also variance [1], in supervised learning. It is also defined as a family of machine learning algorithms that convert weak learning algorithms into strong ones [2].

Boosting is based on a question posed by Kearns and Valiant (1988, 1989) [3] [4]: "Can a set of weak learners create a single strong learner?" A weak learner is defined as a classifier that is only slightly correlated with the true classification (it labels examples better than random guessing). In contrast, a strong learner is a classifier that is well correlated with the true classification.

Robert Schapire's affirmative answer to the question of Kearns and Valiant, given in a 1990 paper [5], had significant consequences for machine learning and statistics and led to the development of a wide range of boosting algorithms [6].

When first posed, the boosting question referred to the process of turning a weak learner into a strong one. Informally, it asks whether the existence of an efficient learning algorithm whose output hypothesis performs only slightly better than random guessing (i.e., a weak learner) implies the existence of an efficient algorithm that outputs a hypothesis of arbitrary accuracy (i.e., a strong learner) [3]. Algorithms that achieve this quickly became known simply as "boosting". Freund and Schapire's "arcing" (Adaptive Resampling and Combining) [7], as a general technique, is more or less synonymous with boosting [8].

Boosting Algorithms

While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively training weak classifiers and assembling them into a strong classifier. When they are added, they are typically weighted in some way that is related to their accuracy. After a weak classifier is added, the data weights are recalculated, a step known as re-weighting: misclassified inputs gain weight, while correctly classified examples lose weight [nb 1]. Thus, subsequent weak learners focus more on the examples that previous weak learners misclassified.
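
As a concrete illustration of the re-weighting step, here is a minimal sketch of the AdaBoost-style weight update (assuming labels in {-1, +1} and NumPy; the function name and the toy data are purely illustrative, and, as the note below mentions, some boosting algorithms update weights differently):

import numpy as np

def reweight(weights, y_true, y_pred, alpha):
    """AdaBoost-style re-weighting: examples the current weak classifier got wrong
    gain weight, correctly classified ones lose weight. Labels are assumed to be
    in {-1, +1}; alpha is the vote weight assigned to the weak classifier."""
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return new_weights / new_weights.sum()  # renormalize to a probability distribution

# Toy example: the 2nd and 4th examples were misclassified, so their weights grow.
w = np.full(4, 0.25)
print(reweight(w, np.array([1, 1, -1, -1]), np.array([1, -1, -1, 1]), alpha=0.5))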

There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority-gate formulation) [5] and Yoav Freund (boost by majority) [9], were not adaptive and could not take full advantage of the weak learners. Schapire and Freund then developed AdaBoost (Adaptive Boosting), an adaptive boosting algorithm that won the prestigious Gödel Prize.

Only algorithms that can be proved to be boosting algorithms in the probably approximately correct learning formulation can accurately be called boosting algorithms. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they too are sometimes incorrectly called boosting algorithms [9].

The main variation between the many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and historically the most significant, as it was the first algorithm that could adapt to the weak learners. It is often used as the introductory example of boosting in university machine learning courses [10]. There are many more recently developed algorithms, such as TotalBoost, BrownBoost, MadaBoost, and others. Many boosting algorithms fit into the AnyBoost framework [9], which shows that boosting performs gradient descent in function space using a convex cost function.
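
To illustrate this gradient-descent view, here is a minimal sketch using a convex squared-error loss, where each weak learner is fitted to the negative gradient of the loss (the residuals). This is ordinary gradient boosting of small regression trees, offered only as an illustration of the functional-gradient idea, not the AnyBoost pseudocode itself; scikit-learn trees and all parameter values are illustrative choices.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    """Functional gradient descent on the squared loss: each round fits a small
    tree to the current residuals (the negative gradient) and takes a short step."""
    pred = np.full(len(y), np.mean(y))   # start from a constant model
    trees = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residual)            # the weak learner approximates the gradient step
        pred = pred + lr * tree.predict(X)
        trees.append(tree)
    return np.mean(y), trees

def gb_predict(base, trees, X, lr=0.1):
    """Sum the base value and the (shrunken) contributions of all weak learners."""
    return base + lr * sum(tree.predict(X) for tree in trees)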

Object classification in computer vision

Given images containing various objects known in the world, a classifier can be trained on them to automatically classify the objects in future, unseen images. Simple classifiers built on a single image feature of an object usually turn out to be weak at classification. Using boosting methods for object classification is a way to combine weak classifiers in a special way to boost the overall classification ability.

Object classification task

Object classification is a typical task of computer vision that involves determining whether or not an image contains a certain category of objects. The idea is closely related to recognition, identification and detection. Object classification by detection usually includes feature extraction, training a classifier, and applying the classifier to new data. There are many ways to represent a category of objects, for example by shape analysis, bag-of-words models, or local descriptors such as SIFT, and so on. Examples of supervised classifiers are naive Bayes classifiers, support vector machines, mixtures of Gaussians, and neural networks. However, research has shown that object categories and their locations in images can also be discovered by unsupervised learning [11].

The status quo of object classification

Recognizing object categories in images is a difficult problem in computer vision, especially when the number of categories is large. This is a consequence of high intra-class variability and the need to generalize across different concepts within a class. Objects in the same category may look completely different. Even the same object may look different under different viewpoints, scale or illumination. Background clutter and partial occlusion also add difficulty to recognition [12]. Humans are able to recognize thousands of object types, whereas most existing object recognition systems are trained to recognize only a few, such as human faces, cars, simple objects, etc. [13]. Research into increasing the number of categories and supporting the incremental addition of new categories is very active and, although the general problem has not yet been solved, detectors for a large number of categories (up to hundreds and thousands [14]) have been developed. This is achieved, in particular, by feature sharing and boosting.

Boosting for binary classification

AdaBoost can be used for face detection as an example of binary classification. The two categories are faces and background. The general algorithm is as follows (a minimal code sketch is given after the list):

  1. Form a large set of simple features
  2. Initialize the weights of the training images
  3. For T rounds:
    1. Normalize the weights
    2. For each available feature from the set, train a classifier using that single feature and evaluate its training error
    3. Choose the classifier with the lowest error
    4. Update the weights of the training images: increase the weight if the image was misclassified, decrease it if it was classified correctly
  4. Form the final strong classifier as a linear combination of the T classifiers (a classifier's coefficient is larger if its training error is smaller)
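
The following is a minimal sketch of this loop, assuming each image has already been reduced to a vector of numeric features and that the weak classifier is a single-feature threshold; all names, constants and the choice of T are illustrative, and this is not the exact detector of Viola and Jones.

import numpy as np

def best_single_feature_classifier(F, y, w):
    """Return the single-feature threshold classifier with the lowest weighted error.
    F: (n_images, n_features) feature matrix, y: labels in {-1, +1}, w: image weights."""
    best = None
    for j in range(F.shape[1]):
        for thr in np.unique(F[:, j]):
            for sign in (1, -1):
                pred = np.where(sign * (F[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def boost_face_detector(F, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # step 2: initialize weights
    strong = []
    for _ in range(T):                             # step 3: T rounds
        w = w / w.sum()                            # 3.1 normalize the weights
        err, j, thr, sign = best_single_feature_classifier(F, y, w)  # 3.2-3.3
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # smaller error -> larger coefficient
        pred = np.where(sign * (F[:, j] - thr) >= 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)          # 3.4 raise weights of misclassified images
        strong.append((alpha, j, thr, sign))
    return strong                                  # step 4: linear combination of T classifiers

def classify(strong, F):
    """Sign of the weighted vote of the selected single-feature classifiers."""
    score = sum(a * np.where(s * (F[:, j] - t) >= 0, 1, -1) for a, j, t, s in strong)
    return np.sign(score)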

After boosting, a classifier constructed from 200 features can achieve a 95% detection rate with a false positive rate of 10⁻⁵ [15].

Another application of boosting for binary classification is a system that detects pedestrians using patterns of motion and appearance [16]. This work was the first to combine motion information and appearance information as features to detect a walking person. It takes an approach similar to the Viola-Jones object detection framework.

Boosting for multiclass classification

Compared with binary classification, multiclass classification looks for common features that can be shared across categories at the same time. These turn out to be more generic, edge-like features. During training, the classifiers for each category can be trained jointly. Compared with separate training, such joint training generalizes better, requires less training data, and needs fewer features to achieve the desired performance.

The basic operation of the algorithm is similar to the binary case. The difference is that a measure of the joint training error must be defined in advance. During each iteration, the algorithm chooses a classifier of a single feature (features that can be shared by more categories are encouraged). This can be done by converting the multiclass classification into a binary one (a set of categories versus the rest) [17], or by introducing a penalty for categories that do not have the feature recognized by the classifier [18].
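
A minimal sketch of the first option, the "set of categories versus the rest" reduction, using scikit-learn; this illustrates only the one-versus-rest reduction of multiclass boosting to binary boosting, not the shared-feature joint training of Torralba et al. discussed below, and the dataset and parameter values are illustrative.

from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

# Each class gets its own binary booster (that class versus all the rest);
# at prediction time the class whose booster gives the highest score wins.
X, y = load_digits(return_X_y=True)
clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=100))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))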

In the paper "Sharing visual features for multiclass and multiview object detection", A. Torralba and co-authors used GentleBoost for boosting and showed that, when training data are limited, learning with shared features works much better than learning without sharing. Also, for a given performance level, the total number of features required (and therefore the running time of the classifier) of the feature-sharing detectors grows approximately logarithmically with the number of classes, i.e., more slowly than the linear growth observed without sharing. Similar results are shown in the paper "Incremental learning of object detectors using a visual shape alphabet", although the authors used AdaBoost for boosting.

Convex and non-convex boosting algorithms

Boosting algorithms can be based on convex or non-convex optimization algorithms. Convex algorithms, such as AdaBoost and LogitBoost, can be "defeated" by random noise, in that they become unable to learn basic and learnable combinations of weak hypotheses [19] [20]. This limitation was pointed out by Long and Servedio in 2008. However, by 2009 several authors had demonstrated that boosting algorithms based on non-convex optimization, such as BrownBoost, can learn from noisy data sets and, in particular, can learn the underlying classifier of the Long-Servedio data set.

See also

  • AdaBoost
  • Random forest
  • Bagging
  • BrownBoost
  • Logistic regression
  • Maximum entropy methods
  • Artificial Neural Networks
  • Support vector machines
  • Cross-validation
  • Machine learning

Implementation

  • scikit-learn, an open source machine learning library for Python (a brief usage sketch is given after this list)
  • Orange, a free software package for data analysis, module Orange.ensemble
  • Weka is a machine learning toolkit containing a number of implementations of boosting algorithms, such as AdaBoost and LogitBoost.
  • The GBM (Generalized Boosted Regression Models) package in R implements extensions of Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting.
  • jboost; AdaBoost, LogitBoost, RobustBoost, Boostexter and alternating decision trees
  • The R package adabag: multiclass AdaBoost.M1, AdaBoost-SAMME and bagging algorithms
  • The xgboost package in R: an implementation of gradient boosting for linear and tree-based models.
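
For example, the scikit-learn boosters from this list can be used roughly as follows (the synthetic dataset and parameter values are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# A synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=100).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1).fit(X_train, y_train)

print("AdaBoost test accuracy:", ada.score(X_test, y_test))
print("Gradient boosting test accuracy:", gbm.score(X_test, y_test))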

Notes

  1. ↑ Some boosting-based classification algorithms actually decrease the weights of repeatedly misclassified examples; for example, boost by majority and BrownBoost.
  1. ↑ Breiman, 1996 .
  2. ↑ Zhi-Hua, 2012 , p. 23.
  3. ↑ 1 2 Kearns, 1988 .
  4. ↑ Kearns, Valiant, 1989 , p. 433–444.
  5. ↑ 1 2 Schapire, 1990 , p. 197–227.
  6. ↑ Breiman, 1998 , p. 801–849.
  7. ↑ Freund, Schapire, 1997 , p. 119-139.
  8. ↑ Leo Breiman ( Breiman 1998 ) writes: "The concept of weak learning was introduced by Kearns and Valiant ( Kearns, Valiant, 1988 ; Kearns, Valiant, 1989 ), who posed the question of whether weak and strong learnability are equivalent. The problem was called the boosting problem, since a solution must boost the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A boosting algorithm is a method that takes a weak learning method and converts it into a strong one. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting."
  9. ↑ 1 2 3 Mason, Baxter, Bartlett, Frean, 2000 , p. 512-518.
  10. ↑ Emer, Eric. Boosting (AdaBoost algorithm). MIT. Retrieved October 10, 2018.
  11. ↑ Sivic, Russell, Efros, Zisserman, Freeman, 2005 , p. 370-377.
  12. ↑ Opelt, Pinz, Fussenegger, Auer, 2006 , p. 416-431.
  13. ↑ Marszalek, Schmid, 2007 .
  14. ↑ Large Scale Visual Recognition Challenge (December 2017).
  15. ↑ Viola, Jones, 2001 .
  16. ↑ Viola, Jones, Snow, 2003 .
  17. ↑ Torralba, Murphy, Freeman, 2007 , p. 854-869.
  18. ↑ Opelt, Pinz, Zisserma, 2006 , p. 3-10.
  19. ↑ Long, Servedio, 2008 , p. 608-615.
  20. ↑ Long, Servedio, 2010 , p. 287–304.

Literature

  • Leo Breiman . Bias, Variance, And Arcing Classifiers // Technical Report. - 1996. Archived January 19, 2015. Excerpt: “Arcing [Boosting] is more successful than bagging in variance reduction”
  • Zhou Zhi-Hua. Ensemble Methods: Foundations and Algorithms. - 2012. - ISBN 978-1439830031. Excerpt: "The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners."
  • Michael Kearns. Thoughts on Hypothesis Boosting . - 1988. - (Unpublished manuscript (Machine Learning class project)).
  • Leo Breiman. Arcing Classifier (with Discussion and a Rejoinder by the Author) // Annals of Statistics. - 1998. - V. 26 , № 3 . - p. 801-849:.
  • Michael Kearns, Leslie Valiant. Cryptographic limitations on learning Boolean formulae and finite automata // Symposium on Theory of Computing. - ACM, 1989. - Vol. 21. - DOI: 10.1145/73007.73049.
  • Michael Kearns, Leslie Valiant . Learning Boolean Formulae or Finite Automata is as Hard as Factoring. Technical Report TR-14-88. - 1988.
    • The article was later reprinted in Journal of the Association for Computing Machinery, 41 (1): 67-95, January 1994
  • Robert E. Schapire. The Strength of Weak Learnability // Machine Learning. - Boston, MA: Kluwer Academic Publishers, 1990. - Vol. 5 , no. 2 - DOI : 10.1007 / bf00116037 . Archived October 10, 2012.
  • Leo Breiman . Arcing classifier (with discussion and a rejoinder by the author) // Ann. Stat .. - 1998. - V. 26 , no. 3 - DOI : 10.1214 / aos / 1024691079 . Excerpt: "Schapire (1990) proved that boosting is possible" (Page 823)
  • Yoav Freund, Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting . - 1997. - V. 55 , no. 1 .
  • Andreas Opelt, Axel Pinz, Michael Fussenegger, Peter Auer. Generic Object Recognition with Boosting // IEEE Trans Pattern Anal Mach Intell. - 2006. - T. 28 . - p . 416-31 . - ISSN 0162-8828 .
  • Marszalek M., Schmid C. Semantic Hierarchies for Visual Object Recognition . - 2007.
  • Viola P., Jones M., Snow D. Detecting Pedestrians Using Patterns of Motion and Appearance // ICCV. - 2003.
  • Torralba A., Murphy K. P., Freeman W. T. Sharing Visual Features for Multiclass and Multiview Object Detection // IEEE Transactions on PAMI. - 2007. - Vol. 29, no. 5. - DOI: 10.1109/TPAMI.2007.1055.
  • Andreas Opelt, Axel Pinz, Andrew Zisserma. Incremental learning of the object detectors using a visual shape alphabet // CVPR. - 2006. - p . 3-10 .
  • Long P., Servedio R. Random classification noise defeats all convex potential boosters // 25th International Conference on Machine Learning (ICML). - 2008. - p. 608-615.
  • Philip M. Long, Rocco A. Servedio. Random classification noise defeats all convex potential boosters // Machine Learning. - Springer US, 2010. - March ( v. 78 , issue 3 ). - p . 287–304 . - DOI : 10.1007 / s10994-009-5165-z .
  • Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean. Boosting Algorithms as Gradient Descent // Advances in Neural Information Processing Systems / SA Solla, TK Leen, K.-R. Muller. - MIT Press, 2000. - T. 12 .
  • Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman. Discovering objects and their location in images // ICCV 2005. Tenth IEEE International Conference on Computer Vision. - IEEE, 2005. - T. 1.
  • Paul Viola, Michael Jeffrey Jones. Robust Real-Time Object Detection // International Journal of Computer Vision. - 2001. - V. 57 , no. 2
  • Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting , Journal of Computer and System Sciences, 55 (1): 119-139
  • Robert E. Schapire and Yoram Singer (1999); Improved Boosting Algorithms Using Confidence-Rated Predictors , Machine Learning, 37 (3): 297-336

Links

  • Robert E. Schapire (2003); The Boosting Approach to Machine Learning: An Overview , Workshop on Nonlinear Estimation and Classification, MSRI
  • Zhou Zhi-Hua (2014) Boosting 25 years , CCL 2014 Keynote.
  • Zhihua Zhou. On the margin explanation of boosting algorithm // Proceedings of the 21st Annual Conference on Learning Theory (COLT'08). - 2008. - p. 479–490.
  • Zhihua Zhou. On the doubt about margin explanation of boosting // Artificial Intelligence. - 2013. - Vol. 203. - P. 1–18. - DOI: 10.1016/j.artint.2013.07.002. - arXiv: 1009.3613.