Abstract
The mean squared error loss is widely used in many applications, including auto-encoders, multi-target regression, and matrix factorization, to name a few. Despite computational advantages due to its differentiability, it is not robust to outliers. In contrast, \(\ell_p\) norms are known to be robust, but they cannot be optimized via, e.g., stochastic gradient descent, as they are non-differentiable. We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function and alternates between optimizing the model function and updating the solution. We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [48], to solve the inner optimization in MBO. We show that SADM converges at a rate of \(O(\log T/T)\). Finally, we demonstrate experimentally (a) the robustness of \(\ell_p\) norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient methods on autoencoders and multi-target regression.
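As a concrete illustration of the MBO loop described above (a minimal sketch, not the paper's implementation), the following Python snippet instantiates it for an \(\ell_1\)-loss regression \(f(w) = \|c(w)\|_1\): at each outer step it replaces the objective with a convex prox-linear model built around the current iterate, approximately minimizes that model, and updates the iterate. The inner solver here is plain subgradient descent for brevity, whereas the paper's SADM (a stochastic ADMM variant) plays that role; all names and parameters (prox_linear_step, rho, lr0) are illustrative assumptions, not the paper's API.

import numpy as np

def prox_linear_step(w, c, J, rho=1.0, inner_iters=300, lr0=0.05):
    # One MBO step: minimize the convex model
    #   ||c(w_t) + J(w_t)(v - w_t)||_1 + (rho/2)||v - w_t||_2^2   over v,
    # here via subgradient descent with a diminishing step size
    # (the paper uses SADM, a stochastic ADMM variant, instead).
    c_t, J_t = c(w), J(w)
    v = w.copy()
    for k in range(inner_iters):
        r = c_t + J_t @ (v - w)
        g = J_t.T @ np.sign(r) + rho * (v - w)  # a subgradient of the model
        v -= lr0 / np.sqrt(k + 1) * g
    return v

# Toy problem: robust linear regression, c(w) = X w - y, so J(w) = X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true
y[:10] += 50.0  # gross outliers; a squared-error fit would chase these

w = np.zeros(5)
for _ in range(30):  # outer MBO loop: build model, solve it, update iterate
    w = prox_linear_step(w, c=lambda v: X @ v - y, J=lambda v: X)
print("recovery error:", np.linalg.norm(w - w_true))

Because the \(\ell_1\) loss ignores the gross outliers, the recovered w stays close to w_true, whereas minimizing the squared error on the same data would not.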
The authors gratefully acknowledge support from the National Science Foundation (Grants CCF-1750539, IIS-1741197, and CNS-1717213), DARPA (Grant HR0011-17-C-0050), and a research grant from American Tower Corp.
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Baccini, A., Besse, P., Falguerolles, A.: A l1-norm PCA and a heuristic approach. Ordinal Symbolic Data Anal. 1(1), 359–368 (1996)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3), 265–274 (2009)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Croux, C., Filzmoser, P.: Robust factorization of a data matrix. In: COMPSTAT, pp. 245–250. Springer (1998). https://doi.org/10.1007/978-3-662-01131-7_29
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)
Davis, D., Grimmer, B.: Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM J. Optim. 29(3), 1908–1930 (2019)
Ding, C., Zhou, D., He, X., Zha, H.: R1-PCA: rotational invariant l1-norm principal component analysis for robust subspace factorization. In: ICML (2006)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Prog. 178(1), 503–558 (2019)
Du, L., et al.: Robust multiple kernel k-means using l21-norm. In: IJCAI (2015)
Duchi, J.C., Ruan, F.: Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4), 3229–3259 (2018)
Eriksson, A., Van Den Hengel, A.: Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l1 norm. In: CVPR (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)
Gillis, N.: Nonnegative Matrix Factorization. SIAM - Society for Industrial and Applied Mathematics, Philadelphia (2020)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
Hosseini, S., Chapman, A., Mesbahi, M.: Online distributed ADMM via dual averaging. In: CDC (2014)
Jiang, W., Gao, H., Chung, F.L., Huang, H.: The l2,1-norm stacked robust autoencoders for domain adaptation. In: AAAI (2016)
Ke, Q., Kanade, T.: Robust l1 factorization in the presence of outliers and missing data by alternative convex programming. In: CVPR (2005)
Kong, D., Ding, C., Huang, H.: Robust nonnegative matrix factorization using l21-norm. In: CIKM (2011)
Kwak, N.: Principal component analysis based on l1-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)
Le, H., Gillis, N., Patrinos, P.: Inertial block proximal methods for non-convex non-smooth optimization. In: ICML (2020)
Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Prog. 158(1), 501–546 (2016)
Li, X., Pang, Y., Yuan, Y.: l1-norm-based 2DPCA. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40(4), 1170–1175 (2010)
Liu, J., Ye, J.: Efficient l1/lq norm regularization. arXiv preprint arXiv:1009.4766 (2010)
Liu, Y., Shang, F., Cheng, J.: Accelerated variance reduced stochastic ADMM. In: AAAI (2017)
Mai, V., Johansson, M.: Convergence of a stochastic gradient method with momentum for non-smooth non-convex optimization. In: ICML (2020)
Mehta, J., Gupta, K., Gogna, A., Majumdar, A., Anand, S.: Stacked robust autoencoder for classification. In: NeurIPS (2016)
Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). J. Optim. Theor. Appl. 50(1), 1–6 (1986)
Moharrer, A., Gao, J., Wang, S., Bento, J., Ioannidis, S.: Massively distributed graph distances. IEEE Trans. Sig. Inf. Process. Netw. 6, 667–683 (2020)
Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S.: Robust regression via model based methods. arXiv preprint arXiv:2106.10759 (2021)
Moreau, J.J.: Décomposition orthogonale d’un espace hilbertien selon deux cônes mutuellement polaires. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 238–240 (1962)
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Nie, F., Huang, H., Cai, X., Ding, C.H.: Efficient and robust feature selection via joint l2,1-norms minimization. In: NIPS (2010)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theor. Appl. 181(1), 244–278 (2019)
Ochs, P., Malitsky, Y.: Model function based conditional gradient method with Armijo-like line search. In: Proceedings of the 36th International Conference on Machine Learning (2019)
Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: ICML, pp. 80–88 (2013)
Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Pesme, S., Flammarion, N.: Online robust regression via SGD on the l1 loss. In: NeurIPS (2020)
Qian, M., Zhai, C.: Robust unsupervised feature selection. In: IJCAI (2013)
Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)
Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: ICML (2013)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)
Vial, J.P.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2), 231–259 (1983)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Waegeman, W., Dembczyński, K., Hüllermeier, E.: Multi-target prediction: a unifying view on problems and methods. Data Min. Knowl. Discovery 33(2), 293–324 (2019)
Wang, H., Banerjee, A.: Online alternating direction method. In: ICML (2012)
Zheng, S., Kwok, J.T.: Fast-and-light stochastic ADMM. In: IJCAI, pp. 2407–2413 (2016)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S. (2021). Robust Regression via Model Based Methods. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_13
DOI: https://doi.org/10.1007/978-3-030-86523-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86522-1
Online ISBN: 978-3-030-86523-8