Abstract
The mean squared error loss is widely used in many applications, including auto-encoders, multi-target regression, and matrix factorization, to name a few. Despite computational advantages due to its differentiability, it is not robust to outliers. In contrast, \(\ell_p\) norms are known to be robust, but they cannot be optimized via, e.g., stochastic gradient descent, as they are non-differentiable. We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function and alternates between optimizing the model function and updating the solution. We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [48], to solve the inner optimization in MBO. We show that SADM converges at a rate of \(O(\log T/T)\). Finally, we demonstrate experimentally (a) the robustness of \(\ell_p\) norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient methods on autoencoders and multi-target regression.
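As a concrete illustration of the MBO loop described above (a minimal sketch, not the paper's implementation), the following Python snippet instantiates it for an \(\ell_1\)-loss regression \(f(w) = \|c(w)\|_1\): at each outer step it replaces the objective with a convex prox-linear model built around the current iterate, approximately minimizes that model, and updates the iterate. The inner solver here is plain subgradient descent for brevity, whereas the paper's SADM (a stochastic ADMM variant) plays that role; all names and parameters (prox_linear_step, rho, lr0) are illustrative assumptions, not the paper's API.

import numpy as np

def prox_linear_step(w, c, J, rho=1.0, inner_iters=300, lr0=0.05):
    # One MBO step: minimize the convex model
    #   ||c(w_t) + J(w_t)(v - w_t)||_1 + (rho/2)||v - w_t||_2^2   over v,
    # here via subgradient descent with a diminishing step size
    # (the paper uses SADM, a stochastic ADMM variant, instead).
    c_t, J_t = c(w), J(w)
    v = w.copy()
    for k in range(inner_iters):
        r = c_t + J_t @ (v - w)
        g = J_t.T @ np.sign(r) + rho * (v - w)  # a subgradient of the model
        v -= lr0 / np.sqrt(k + 1) * g
    return v

# Toy problem: robust linear regression, c(w) = X w - y, so J(w) = X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true
y[:10] += 50.0  # gross outliers; a squared-error fit would chase these

w = np.zeros(5)
for _ in range(30):  # outer MBO loop: build model, solve it, update iterate
    w = prox_linear_step(w, c=lambda v: X @ v - y, J=lambda v: X)
print("recovery error:", np.linalg.norm(w - w_true))

Because the \(\ell_1\) loss ignores the gross outliers, the recovered w stays close to w_true, whereas minimizing the squared error on the same data would not.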
The authors gratefully acknowledge support from the National Science Foundation (Grants CCF-1750539, IIS-1741197, and CNS-1717213), DARPA (Grant HR0011-17-C-0050), and a research grant from American Tower Corp.
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Baccini, A., Besse, P., Falguerolles, A.: A l1-norm PCA and a heuristic approach. Ordinal Symbolic Data Anal. 1(1), 359–368 (1996)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3), 265–274 (2009)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Croux, C., Filzmoser, P.: Robust factorization of a data matrix. In: COMPSTAT, pp. 245–250. Springer (1998). https://doi.org/10.1007/978-3-662-01131-7_29
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)
Davis, D., Grimmer, B.: Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM J. Optim. 29(3), 1908–1930 (2019)
Ding, C., Zhou, D., He, X., Zha, H.: R1-PCA: rotational invariant l1-norm principal component analysis for robust subspace factorization. In: ICML (2006)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Prog. 178(1), 503–558 (2019)
Du, L., et al.: Robust multiple kernel k-means using l21-norm. In: IJCAI (2015)
Duchi, J.C., Ruan, F.: Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4), 3229–3259 (2018)
Eriksson, A., Van Den Hengel, A.: Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l1 norm. In: CVPR (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)
Gillis, N.: Nonnegative Matrix Factorization. SIAM - Society for Industrial and Applied Mathematics, Philadelphia (2020)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
Hosseini, S., Chapman, A., Mesbahi, M.: Online distributed ADMM via dual averaging. In: CDC (2014)
Jiang, W., Gao, H., Chung, F.L., Huang, H.: The l2,1-norm stacked robust autoencoders for domain adaptation. In: AAAI (2016)
Ke, Q., Kanade, T.: Robust l1 factorization in the presence of outliers and missing data by alternative convex programming. In: CVPR (2005)
Kong, D., Ding, C., Huang, H.: Robust nonnegative matrix factorization using l21-norm. In: CIKM (2011)
Kwak, N.: Principal component analysis based on l1-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)
Le, H., Gillis, N., Patrinos, P.: Inertial block proximal methods for non-convex non-smooth optimization. In: ICML (2020)
Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Prog. 158(1), 501–546 (2016)
Li, X., Pang, Y., Yuan, Y.: l1-norm-based 2DPCA. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40(4), 1170–1175 (2010)
Liu, J., Ye, J.: Efficient l1/lq norm regularization. arXiv preprint arXiv:1009.4766 (2010)
Liu, Y., Shang, F., Cheng, J.: Accelerated variance reduced stochastic ADMM. In: AAAI (2017)
Mai, V., Johansson, M.: Convergence of a stochastic gradient method with momentum for non-smooth non-convex optimization. In: ICML (2020)
Mehta, J., Gupta, K., Gogna, A., Majumdar, A., Anand, S.: Stacked robust autoencoder for classification. In: NeurIPS (2016)
Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). J. Optim. Theor. Appl. 50(1), 1–6 (1986)
Moharrer, A., Gao, J., Wang, S., Bento, J., Ioannidis, S.: Massively distributed graph distances. IEEE Trans. Sig. Inf. Process. Netw. 6, 667–683 (2020)
Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S.: Robust regression via model based methods. arXiv preprint arXiv:2106.10759 (2021)
Moreau, J.J.: Décomposition orthogonale d’un espace hilbertien selon deux cônes mutuellement polaires. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 238–240 (1962)
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Nie, F., Huang, H., Cai, X., Ding, C.H.: Efficient and robust feature selection via joint l2,1-norms minimization. In: NIPS (2010)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theor. Appl. 181(1), 244–278 (2019)
Ochs, P., Malitsky, Y.: Model function based conditional gradient method with Armijo-like line search. In: Proceedings of the 36th International Conference on Machine Learning (2019)
Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: ICML, pp. 80–88 (2013)
Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Pesme, S., Flammarion, N.: Online robust regression via SGD on the l1 loss. In: NeurIPS (2020)
Qian, M., Zhai, C.: Robust unsupervised feature selection. In: IJCAI (2013)
Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)
Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: ICML (2013)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)
Vial, J.P.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2), 231–259 (1983)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Waegeman, W., Dembczyński, K., Hüllermeier, E.: Multi-target prediction: a unifying view on problems and methods. Data Min. Knowl. Discovery 33(2), 293–324 (2019)
Wang, H., Banerjee, A.: Online alternating direction method. In: ICML (2012)
Zheng, S., Kwok, J.T.: Fast-and-light stochastic ADMM. In: IJCAI, pp. 2407–2413 (2016)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Moharrer, A., Kamran, K., Yeh, E., Ioannidis, S. (2021). Robust Regression via Model Based Methods. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_13
DOI: https://doi.org/10.1007/978-3-030-86523-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86522-1
Online ISBN: 978-3-030-86523-8