Abstract
Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of elementary predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is basically achieved via an appropriate \(L^2\) penalization of the loss and strong convexity arguments.
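To fix ideas, the following is a minimal sketch, in Python, of gradient boosting viewed as functional gradient descent on an \(L^2\)-penalized squared loss, run for a fixed number of rounds rather than stopped early. It only illustrates the general mechanism described above, not the authors' exact procedure, and the values of n_rounds, nu (step size), gamma (penalty weight), and max_depth are assumptions made for the example.

# Minimal sketch (illustrative, not the authors' exact algorithm) of gradient
# boosting as functional gradient descent on the penalized squared loss
#   C(F) = E[(Y - F(X))^2] + gamma * E[F(X)^2],
# with regression trees as elementary predictors and no early stopping.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_rounds=500, nu=0.1, gamma=0.01, max_depth=2):
    f0 = y.mean()                          # F_0: best constant predictor
    F = np.full(len(y), f0)                # current predictions at the sample points
    trees = []
    for _ in range(n_rounds):              # run a fixed, large number of rounds
        # Negative functional gradient of (y - F)^2 + gamma * F^2 (up to a factor 2)
        # evaluated at the data: (y - F) - gamma * F.
        pseudo_residuals = (y - F) - gamma * F
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, pseudo_residuals)      # weak learner approximates the gradient direction
        F += nu * tree.predict(X)          # small step along that direction
        trees.append(tree)
    return f0, nu, trees

def predict_boosting(model, X):
    f0, nu, trees = model
    return f0 + nu * sum(tree.predict(X) for tree in trees)

With gamma = 0 the scheme reduces to plain \(L^2\) boosting; a strictly positive gamma makes the penalized risk strongly convex, which is the mechanism that, in the framework of the paper, allows the iterations to be run indefinitely without an early stopping strategy.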
Acknowledgements
We warmly thank the Editors and the two referees for their valuable comments and insightful suggestions, which led to a substantial improvement of the paper.
Copyright information
© 2021 Springer Nature Switzerland AG

Cite this chapter
Biau, G., & Cadre, B. (2021). Optimization by Gradient Boosting. In A. Daouia & A. Ruiz-Gazen (Eds.), Advances in Contemporary Statistics and Econometrics. Cham: Springer. https://doi.org/10.1007/978-3-030-73249-3_2
Print ISBN: 978-3-030-73248-6. Online ISBN: 978-3-030-73249-3.