CatBoost is an open-source software library developed by Yandex. It implements a unique, patented algorithm for building machine learning models that uses one of the original gradient boosting schemes. The library's main API is implemented for Python, and there is also an implementation for the R programming language.
| CatBoost | |
|---|---|
| Type | Machine learning library |
| Developer | Yandex |
| Written in | C++, Python, JavaScript |
| Initial release | July 18, 2017 |
| Platform | Linux, macOS, Windows |
| Latest version | 0.9.1.1 (July 7, 2018) |
| Status | Active |
| License | Apache 2.0 |
| Website | catboost.yandex |
On July 18, 2017, Yandex released the CatBoost library as open source under the Apache 2.0 license [1] [2] [3]. CatBoost continues and further develops Matrixnet, a closed Yandex project.
History
Matrixnet
Yandex has developed Matrixnet, a closed machine learning system, since 2009 in order to apply gradient boosting in the company's internal projects, primarily to build the ranking formula of its search engine [4].
CatBoost
On July 18, 2017, Yandex published CatBoost on GitHub under the free Apache 2.0 license. CatBoost is a machine learning system that uses one of the original gradient boosting schemes. It is available for 64-bit Linux, macOS, and Windows. On macOS, Core ML, the framework Apple created for machine learning methods, is used to speed up its work.
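To make the term "gradient boosting" concrete, the general idea can be sketched in a few lines of plain Python. This is only an illustration of generic gradient boosting for regression with squared-error loss, using one-split decision trees (stumps) on a single numeric feature. It is not CatBoost's algorithm and does not use the CatBoost API; all function names and the toy data are hypothetical and chosen for this example.

```python
# Generic gradient boosting sketch (squared-error loss, decision stumps).
# Illustrative only: this is NOT CatBoost's algorithm or API.

def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    # Toy data: a noisy step function.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
    model = fit_boosting(xs, ys)
    print(predict(model, 2), predict(model, 7))
    print(mean_squared_error(model, xs, ys))
```

Each round adds a small correction fitted to the current errors, so the training error of the ensemble does not increase from one round to the next. Production libraries add much on top of this scheme, such as deeper trees, regularization, and handling of many features.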
Comparing CatBoost with similar machine learning systems from Google (TensorFlow) and Microsoft (LightGBM), Anna Veronika Dorogush, head of machine learning at Yandex, said that Google's TensorFlow solves a different class of problems by efficiently analyzing homogeneous data, for example images, while "CatBoost works with data of different nature and can be used in conjunction with TensorFlow and other machine learning algorithms depending on specific tasks." Compared with Microsoft's LightGBM, the Russian library wins in quality, as shown by a table of tests on benchmarks generally accepted in machine learning, but so far loses in speed, which Yandex promises to fix [5].
Application
CatBoost is used first of all in Yandex's own Internet services: to improve Yandex search results, to rank the recommendations feed, to calculate weather forecasts, and in other services where it has proved to be better than the previous technology, Matrixnet. The Yandex Data Factory team also uses the technology in its industrial solutions, in particular to optimize the consumption of raw materials and to predict production defects.
The European Organization for Nuclear Research (CERN) adopted CatBoost in research at the Large Hadron Collider (LHC), where it is used to combine information from various parts of the LHCb detector into the most accurate aggregated knowledge about a particle. By combining the data with CatBoost, scientists improved the quality of the final solution, and CatBoost's results were better than those obtained with other methods [6] [7].
Notes
1. Yandex has made publicly available a new machine learning library. Yandex (July 18, 2017). Retrieved June 8, 2018.
2. CatBoost - a new machine learning method from Yandex. Yandex (July 18, 2017). Retrieved June 8, 2018.
3. Yandex launches CatBoost machine learning technology. Habr (July 18, 2017). Retrieved June 8, 2018.
4. Technologies: MatrixNet. Yandex. Retrieved June 8, 2018.
5. Why did Yandex open access to its machine learning system. Forbes (July 19, 2017). Retrieved June 8, 2018.
6. Yandex posted an open access alternative to neural networks. CNews (July 18, 2017). Retrieved June 8, 2018.
7. Yandex introduced the new CatBoost machine learning method. 3DNews (July 18, 2017). Retrieved June 8, 2018.
Links
- catboost.yandex - official CatBoost website
- CatBoost Project on GitHub