Optimization by Gradient Boosting

Chapter in Advances in Contemporary Statistics and Econometrics

Abstract

Gradient boosting is a state-of-the-art prediction technique that sequentially builds a model as a linear combination of elementary predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. In the present paper, we provide a thorough analysis of two widespread versions of gradient boosting and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is achieved via an appropriate \(L^2\) penalization of the loss and strong convexity arguments.
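To make the penalized functional-gradient viewpoint concrete, here is a minimal sketch of an \(L^2\)-penalized, squared-error gradient boosting loop with regression trees as elementary predictors. The penalty weight `gamma`, the shrinkage factor `nu`, the squared-error loss, and the use of scikit-learn tree stumps are illustrative assumptions for this sketch, not the authors' exact construction.

```python
# Minimal sketch (not the authors' exact construction) of L2-penalized
# gradient boosting for regression with squared-error loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_iter=200, gamma=0.1, nu=0.1, max_depth=1):
    """Greedy functional gradient descent on the penalized empirical risk
    C_n(F) = (1/n) * sum_i (y_i - F(x_i))^2 + gamma * (1/n) * sum_i F(x_i)^2."""
    y = np.asarray(y, dtype=float)
    F = np.full(y.shape, y.mean())   # start from the constant predictor
    trees = []
    for _ in range(n_iter):
        # Negative functional gradient of C_n at the current predictor F,
        # evaluated at the sample points (up to a constant factor).
        pseudo_residuals = (y - F) - gamma * F
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, pseudo_residuals)
        trees.append(tree)
        F += nu * tree.predict(X)    # small (shrunken) step in the gradient direction
    return y.mean(), trees, nu

def predict_boosting(model, X):
    intercept, trees, nu = model
    return intercept + nu * sum(t.predict(X) for t in trees)
```

With `gamma = 0` this reduces to plain \(L^2\) boosting with shrinkage; a positive `gamma` pulls the aggregated predictor toward zero, which is the strong-convexity mechanism that, in the abstract's terms, replaces early stopping as the source of statistical regularization.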



Acknowledgements

We greatly thank the Editors and two referees for valuable comments and insightful suggestions, which led to a substantial improvement of the paper.

Author information

Correspondence to Gérard Biau.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 222 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Biau, G., Cadre, B. (2021). Optimization by Gradient Boosting. In: Daouia, A., Ruiz-Gazen, A. (eds) Advances in Contemporary Statistics and Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-73249-3_2

