
Robust Regression via Model Based Methods

  • Conference paper in Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Abstract

The mean squared error loss is widely used in many applications, including auto-encoders, multi-target regression, and matrix factorization, to name a few. Despite computational advantages due to its differentiability, it is not robust to outliers. In contrast, \(\ell _p\) norms are known to be robust, but cannot be optimized via, e.g., stochastic gradient descent, as they are non-differentiable. We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function and alternates between optimizing the model function and updating the solution. We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [48] to solve the inner optimization in MBO. We show that SADM converges with the rate \(O(\log T/T)\). Finally, we demonstrate experimentally (a) the robustness of \(\ell _p\) norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient methods on autoencoders and multi-target regression.
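As a minimal illustration of point (a) above — and not the paper's SADM/MBO algorithm — the sketch below fits a linear model on data with gross outliers, once with the ordinary least-squares (MSE) loss and once with an \(\ell _1\) loss minimized via iteratively reweighted least squares, a standard smooth surrogate chosen here for brevity. All variable names and the synthetic-data setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data y = X w* + small noise, with 5% gross outliers.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:10] += 50.0  # corrupt 10 of the 200 targets

# MSE fit: closed-form least squares, pulled off course by the outliers.
w_mse = np.linalg.lstsq(X, y, rcond=None)[0]

# l1 fit via iteratively reweighted least squares (IRLS):
# each pass solves a weighted least-squares problem with weights 1/|residual|.
w_l1 = np.zeros(d)
for _ in range(100):
    r = np.abs(y - X @ w_l1) + 1e-8        # residual magnitudes (eps avoids /0)
    W = 1.0 / r                             # downweight large residuals
    A = X.T @ (W[:, None] * X)
    b = X.T @ (W * y)
    w_l1 = np.linalg.solve(A, b)

err_mse = np.linalg.norm(w_mse - w_true)
err_l1 = np.linalg.norm(w_l1 - w_true)
print(f"MSE fit error: {err_mse:.3f}, l1 fit error: {err_l1:.3f}")
```

The \(\ell _1\) fit recovers the true weights far more accurately, because its penalty grows only linearly in the corrupted residuals; the paper's contribution is making such non-differentiable \(\ell _p\) objectives practical to optimize at scale.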

The authors gratefully acknowledge support from the National Science Foundation (Grants CCF-1750539, IIS-1741197, and CNS-1717213), DARPA (Grant HR0011-17-C-0050), and a research grant from American Tower Corp.


Notes

  1. https://github.com/neu-spiral/ModelBasedOptimization.

References

  1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  2. Baccini, A., Besse, P., Falguerolles, A.: A l1-norm PCA and a heuristic approach. Ordinal Symbolic Data Anal. 1(1), 359–368 (1996)

  3. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3), 265–274 (2009)

  4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  5. Croux, C., Filzmoser, P.: Robust factorization of a data matrix. In: COMPSTAT, pp. 245–250. Springer (1998). https://doi.org/10.1007/978-3-662-01131-7_29

  6. Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)

  7. Davis, D., Grimmer, B.: Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM J. Optim. 29(3), 1908–1930 (2019)

  8. Ding, C., Zhou, D., He, X., Zha, H.: R1-PCA: rotational invariant l1-norm principal component analysis for robust subspace factorization. In: ICML (2006)

  9. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)

  10. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Prog. 178(1), 503–558 (2019)

  11. Du, L., et al.: Robust multiple kernel k-means using l21-norm. In: IJCAI (2015)

  12. Duchi, J.C., Ruan, F.: Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4), 3229–3259 (2018)

  13. Eriksson, A., Van Den Hengel, A.: Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l1 norm. In: CVPR (2010)

  14. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)

  15. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7

  16. Gillis, N.: Nonnegative Matrix Factorization. SIAM - Society for Industrial and Applied Mathematics, Philadelphia (2020)

  17. Hosseini, S., Chapman, A., Mesbahi, M.: Online distributed ADMM via dual averaging. In: CDC (2014)

  18. Jiang, W., Gao, H., Chung, F.L., Huang, H.: The l2,1-norm stacked robust autoencoders for domain adaptation. In: AAAI (2016)

  19. Ke, Q., Kanade, T.: Robust l1 factorization in the presence of outliers and missing data by alternative convex programming. In: CVPR (2005)

  20. Kong, D., Ding, C., Huang, H.: Robust nonnegative matrix factorization using l21-norm. In: CIKM (2011)

  21. Kwak, N.: Principal component analysis based on l1-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)

  22. Le, H., Gillis, N., Patrinos, P.: Inertial block proximal methods for non-convex non-smooth optimization. In: ICML (2020)

  23. Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Prog. 158(1), 501–546 (2016)

  24. Li, X., Pang, Y., Yuan, Y.: l1-norm-based 2DPCA. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40(4), 1170–1175 (2010)

  25. Liu, J., Ye, J.: Efficient l1/lq norm regularization. arXiv preprint arXiv:1009.4766 (2010)

  26. Liu, Y., Shang, F., Cheng, J.: Accelerated variance reduced stochastic ADMM. In: AAAI (2017)

  27. Mai, V., Johansson, M.: Convergence of a stochastic gradient method with momentum for non-smooth non-convex optimization. In: ICML (2020)

  28. Mehta, J., Gupta, K., Gogna, A., Majumdar, A., Anand, S.: Stacked robust autoencoder for classification. In: NeurIPS (2016)

  29. Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb {R}^n\). J. Optim. Theor. Appl. 50(1), 1–6 (1986)

  30. Moharrer, A., Gao, J., Wang, S., Bento, J., Ioannidis, S.: Massively distributed graph distances. IEEE Trans. Sig. Inf. Process. Netw. 6, 667–683 (2020)

  31. Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S.: Robust regression via model based methods. arXiv preprint arXiv:2106.10759 (2021)

  32. Moreau, J.J.: Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 238–240 (1962)

  33. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)

  34. Nie, F., Huang, H., Cai, X., Ding, C.H.: Efficient and robust feature selection via joint l2,1-norms minimization. In: NIPS (2010)

  35. Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theor. Appl. 181(1), 244–278 (2019)

  36. Ochs, P., Malitsky, Y.: Model function based conditional gradient method with Armijo-like line search. In: ICML (2019)

  37. Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: ICML, pp. 80–88 (2013)

  38. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)

  39. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)

  40. Pesme, S., Flammarion, N.: Online robust regression via SGD on the l1 loss. In: NeurIPS (2020)

  41. Qian, M., Zhai, C.: Robust unsupervised feature selection. In: IJCAI (2013)

  42. Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)

  43. Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: ICML (2013)

  44. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)

  45. Vial, J.P.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2), 231–259 (1983)

  46. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)

  47. Waegeman, W., Dembczyński, K., Hüllermeier, E.: Multi-target prediction: a unifying view on problems and methods. Data Min. Knowl. Discovery 33(2), 293–324 (2019)

  48. Wang, H., Banerjee, A.: Online alternating direction method. In: ICML (2012)

  49. Zheng, S., Kwok, J.T.: Fast-and-light stochastic ADMM. In: IJCAI (2016)


Author information

Correspondence to Armin Moharrer.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S. (2021). Robust Regression via Model Based Methods. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_13


  • DOI: https://doi.org/10.1007/978-3-030-86523-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86522-1

  • Online ISBN: 978-3-030-86523-8

  • eBook Packages: Computer Science, Computer Science (R0)
