Abstract
Matrix sketching is a technique for constructing compact summaries of large matrices. Frequent directions (FD) and its parameterized variant, \(\alpha \)-FD, are deterministic sketching techniques that come with theoretical guarantees and also work well in practice. An algorithm called the iterative singular value decomposition (iSVD) has been shown to outperform FD and \(\alpha \)-FD on several datasets, despite lacking theoretical guarantees. However, on datasets with major and sudden drift, iSVD performs poorly compared with the other algorithms. The \(\alpha \)-FD algorithm has better error guarantees and empirical performance than FD. However, it has two limitations, both of which reduce its empirical performance: the effective values of its parameter \(\alpha \) are restricted because they depend on the sketch size, and it applies a constant-factor reduction to selected squared singular values. In this paper, we present a modified parameterized FD algorithm, \(\beta \)-FD, designed to overcome these limitations of \(\alpha \)-FD while maintaining error guarantees similar to those of \(\alpha \)-FD. Empirical results on datasets with sudden and major drift, and on datasets with gradual and minor or no drift, indicate a trade-off between the errors on the two kinds of data across parameter values; for \(\beta \approx 28\), our algorithm has better overall error performance than \(\alpha \)-FD.
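To make the contrast between shrinkage and truncation concrete, the following is a minimal sketch of the standard FD update (Liberty 2013; Ghashami et al. 2016), written with NumPy. It streams rows into an \(\ell \times d\) sketch \(B\); when \(B\) is full, it subtracts the smallest squared singular value \(\delta \) from every squared singular value, which guarantees \(\Vert A^TA - B^TB\Vert _2 \le \Vert A\Vert _F^2/\ell \). The function name `frequent_directions` and the `shrink` flag are hypothetical, chosen for illustration. Setting `shrink=False` gives a simplified truncation-only rule in the spirit of iSVD (drop the weakest direction, leave the rest untouched); this is our assumption and not necessarily the exact iSVD variant evaluated in the paper. Neither \(\alpha \)-FD nor \(\beta \)-FD is implemented here.

```python
import numpy as np


def frequent_directions(A, ell, shrink=True):
    """Stream the rows of A (n x d) into an ell x d sketch B.

    shrink=True : standard Frequent Directions shrink step.
    shrink=False: hypothetical truncation-only variant in the spirit of
                  iSVD (only the smallest direction is discarded).
    """
    n, d = A.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= d")
    B = np.zeros((ell, d))
    next_zero = 0  # index of the next all-zero row of B
    for row in A:
        if next_zero == ell:
            # Sketch is full: rotate B onto its right singular vectors.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            if shrink:
                # FD: subtract the smallest squared singular value from all.
                delta = s[-1] ** 2
                s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            else:
                # Truncation only: zero the smallest singular value.
                s[-1] = 0.0
            # Singular values are sorted descending, so the last row is zero.
            B = s[:, None] * Vt
            next_zero = ell - 1
        B[next_zero] = row
        next_zero += 1
    return B
```

Both modes only ever remove energy from the sketch, so \(A^TA - B^TB\) stays positive semidefinite; only the shrink rule spreads that removal evenly enough to give the \(\Vert A\Vert _F^2/\ell \) bound. Comparing `np.linalg.norm(A.T @ A - B.T @ B, 2)` for the two modes on a stream whose distribution changes abruptly partway through is one way to observe how truncation can behave under drift.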

References
Achlioptas D, McSherry F (2007) Fast computation of low-rank matrix approximations. J ACM 54(2):9
Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL (2013) A public domain dataset for human activity recognition using smartphones. In: ESANN
Boutsidis C, Mahoney MW, Drineas P (2009) An improved approximation algorithm for the column subset selection problem. In: Proceedings of the twentieth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 968–977
Brand M (2002) Incremental singular value decomposition of uncertain data with missing values. In: European conference on computer vision. Springer, Berlin, pp 707–720
Buss S (2016) Connectus data set, University of Florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices/Buss/connectus.html
Clarkson KL, Woodruff DP (2013) Low rank approximation and regression in input sparsity time. In: Proceedings of the forty-fifth annual ACM symposium on theory of computing. ACM, New York, pp 81–90
Cuturi M (2011) Fast global alignment kernels. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 929–936
Desai A, Ghashami M, Phillips JM (2016) Improved practical matrix sketching with guarantees. IEEE Trans Knowl Data Eng 28(7):1678–1690
Drineas P, Kannan R, Mahoney MW (2006) Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J Comput 36(1):158–183
Ghashami M, Phillips JM (2014) Relative errors for deterministic low-rank matrix approximations. In: Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 707–717
Ghashami M, Desai A, Phillips JM (2014) Improved practical matrix sketching with guarantees. In: European symposium on algorithms. Springer, Berlin, pp 467–479
Ghashami M, Liberty E, Phillips JM, Woodruff DP (2016) Frequent directions: simple and deterministic matrix sketching. SIAM J Comput 45(5):1762–1792
Hall PM, Marshall AD, Martin RR (1998) Incremental eigenanalysis for classification. In: BMVC, vol 98. Citeseer, pp 286–295
Har-Peled S (2014) Low rank matrix approximation in linear time. arXiv preprint arXiv:1410.8802
Hoens TR, Chawla NV, Polikar R (2011) Heuristic updatable weighted random subspaces for non-stationary environments. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, Washington, pp 241–250
Katakis I, Tsoumakas G, Vlahavas IP (2008) An ensemble of classifiers for coping with recurring contexts in data streams. In: ECAI, pp 763–764
Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
LeCun Y, Cortes C (2009) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
Levey A, Lindenbaum M (2000) Sequential Karhunen–Loeve basis extraction and its application to images. IEEE Trans Image Process 9(8):1371–1374
Liberty E (2013) Simple and deterministic matrix sketching. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, London, pp 581–588
Mahoney MW (2011) Randomized algorithms for matrices and data. Found Trends Mach Learn 3(2):123–224
Nelson J, Nguyên HL (2013) OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings. In: 2013 IEEE 54th annual symposium on Foundations of Computer Science (FOCS). IEEE, Washington, pp 117–126
Sarlós T (2006) Improved approximation algorithms for large matrices via random projections. In: 2006 47th annual IEEE symposium on foundations of computer science (FOCS’06). IEEE, Washington, pp 143–152
Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354
Tsymbal A (2004) The problem of concept drift: definitions and related work. Computer Science Department, Trinity College Dublin, Dublin 106(2)
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD birds-200-2011 dataset. Tech. rep, California Institute of Technology
Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Woodruff DP (2014) Sketching as a tool for numerical linear algebra. Found Trends Theor Comput Sci 10(1–2):1–157
Acknowledgements
The authors gratefully acknowledge the financial support of the Visvesvaraya Ph.D. Scheme for Electronics and Information Technology, Ministry of Electronics and Information Technology (MeitY), Govt. of India.
Additional information
Responsible editor: Johannes Fürnkranz.
Cite this article
Francis, D.P., Raimond, K. An improvement of the parameterized frequent directions algorithm. Data Min Knowl Disc 32, 453–482 (2018). https://doi.org/10.1007/s10618-017-0542-x
DOI: https://doi.org/10.1007/s10618-017-0542-x