Abstract
Covariance matrix estimation is an important problem in statistics, with wide applications in finance, neuroscience, meteorology, oceanography, and other fields. However, when the data are high-dimensional and continuously generated and updated in a streaming fashion, covariance matrix estimation faces major challenges, including the curse of dimensionality and limited memory. Existing methods either assume sparsity, ignoring any possible common factors among the variables, or perform poorly when recovering the covariance matrix directly from sketched data. To address these issues, we propose KEEF (Knowledge-based Time and Memory Efficient Covariance Estimator in Factor Model). Our method leverages historical data to train a knowledge-based sketch matrix, which is used to accelerate the factor analysis of streaming data and to estimate the covariance matrix directly from the sketched data. We provide theoretical guarantees that show the advantages of our method in time complexity, space complexity, and accuracy. We conduct extensive experiments on synthetic and real-world data, comparing KEEF with several state-of-the-art methods and demonstrating its superior performance.
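The general idea of estimating a factor-structured covariance from sketched streaming data can be illustrated with a minimal sketch. This is not the KEEF algorithm, whose details the abstract does not give. As a stand-in for the knowledge-based sketch matrix, it assumes a projection built from the top-k eigenvectors of the historical sample covariance, and it assumes a low-rank-plus-diagonal covariance model. The names `learn_sketch` and `StreamingFactorCovariance` are hypothetical.

```python
import numpy as np


def learn_sketch(historical, k):
    """Return a p x k matrix with orthonormal columns.

    Assumption: the sketch is the top-k eigenvectors of the historical
    sample covariance (a PCA stand-in, not the paper's trained sketch).
    """
    centered = historical - historical.mean(axis=0)
    cov = centered.T @ centered / (historical.shape[0] - 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -k:]


class StreamingFactorCovariance:
    """Accumulates O(pk + k^2) statistics instead of a p x p matrix."""

    def __init__(self, sketch):
        self.S = sketch
        p, k = sketch.shape
        self.n = 0
        self.sum_z = np.zeros(k)         # running sum of sketched rows
        self.sum_zz = np.zeros((k, k))   # running sum of outer products
        self.sum_x = np.zeros(p)         # running per-variable sum
        self.sum_x2 = np.zeros(p)        # running per-variable sum of squares

    def update(self, batch):
        """Fold a batch of shape (n_batch, p) into the running statistics."""
        z = batch @ self.S
        self.n += batch.shape[0]
        self.sum_z += z.sum(axis=0)
        self.sum_zz += z.T @ z
        self.sum_x += batch.sum(axis=0)
        self.sum_x2 += (batch ** 2).sum(axis=0)

    def estimate(self):
        """Low-rank part from the sketch plus a diagonal residual."""
        n = self.n
        mean_z = self.sum_z / n
        cov_z = (self.sum_zz - n * np.outer(mean_z, mean_z)) / (n - 1)
        low_rank = self.S @ cov_z @ self.S.T
        var_x = (self.sum_x2 - n * (self.sum_x / n) ** 2) / (n - 1)
        residual = np.clip(var_x - np.diag(low_rank), 0.0, None)
        return low_rank + np.diag(residual)
```

In this sketch each incoming batch is projected once and then discarded, so the running state never holds a full p x p matrix; the dense estimate is only materialized when `estimate()` is called.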
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tan, X., Wang, Z., Wang, M., Shen, D., Chen, W., Wang, B. (2024). Large Covariance Estimation from Streaming Data with Knowledge-Based Sketch Matrix. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_32
DOI: https://doi.org/10.1007/978-981-97-5569-1_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1
eBook Packages: Computer Science, Computer Science (R0)