Abstract
This work studies an optimization scheme for computing sparse approximate solutions of over-determined linear systems. Sparse Conjugate Directions Pursuit (SCDP) aims to construct a solution using only a small number of nonzero coefficients. The motivation for this work comes from machine learning, where sparse models typically generalize better, are fast to evaluate, and can be exploited to define scalable algorithms. The main idea is to iteratively build up a conjugate set of vectors of increasing cardinality, solving a small linear subsystem in each iteration. By exploiting the structure of this conjugate basis, an algorithm is obtained that (i) converges in at most D iterations for D-dimensional systems, (ii) has a computational complexity close to that of the classical conjugate gradient algorithm, and (iii) is especially efficient when a few iterations suffice to produce a good approximation. As an example, the application of SCDP to Fixed-Size Least Squares Support Vector Machines (FS-LSSVM) is discussed, resulting in a scheme that efficiently finds a good model size for the FS-LSSVM setting and scales to large machine learning tasks. The algorithm is empirically verified in a classification context. Further discussion covers algorithmic issues such as component selection criteria, computational analysis, the influence of additional hyper-parameters, and the determination of a suitable stopping criterion.
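To make the iterative scheme concrete, the following is a minimal Python/NumPy sketch of a greedy pursuit of the kind described above: at each iteration one component is selected, the active set grows by one, and the small least-squares subsystem restricted to the active columns is re-solved. The function name greedy_pursuit, the correlation-based selection rule, and the re-solve via np.linalg.lstsq are illustrative assumptions, not the published method; SCDP itself maintains and updates a conjugate basis so that each iteration is much cheaper than a full re-solve.

import numpy as np

def greedy_pursuit(A, b, max_nonzeros, tol=1e-8):
    """Greedy sparse approximation of A x ~= b (illustrative sketch only)."""
    d = A.shape[1]
    x = np.zeros(d)
    active = []                      # indices of the selected (nonzero) coefficients
    residual = b.copy()
    for _ in range(max_nonzeros):
        # Component selection: pick the column most correlated with the residual.
        scores = np.abs(A.T @ residual)
        scores[active] = -np.inf     # never reselect an already active column
        active.append(int(np.argmax(scores)))
        # Solve the small linear subsystem restricted to the active columns.
        coef, *_ = np.linalg.lstsq(A[:, active], b, rcond=None)
        x[:] = 0.0
        x[active] = coef
        residual = b - A @ x
        # Stopping criterion: residual small enough.
        if np.linalg.norm(residual) < tol:
            break
    return x, active

With k selected components this naive sketch performs on the order of k separate least-squares solves; avoiding that overhead through the conjugate-basis bookkeeping is what keeps the cost of SCDP close to that of the classical conjugate gradient algorithm.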
Additional information
Editors: Süreyya Özöğür-Akyüz, Devrim Unay, and Alex Smola.
Cite this article
Karsmakers, P., Pelckmans, K., De Brabanter, K. et al. Sparse conjugate directions pursuit with application to fixed-size kernel models. Mach Learn 85, 109–148 (2011). https://doi.org/10.1007/s10994-011-5253-8