Abstract
Dense tensor decompositions have been widely used in many signal processing problems, including speech signal analysis, localization of signal sources, and many other communication applications. Computing these decompositions poses major computational challenges for the big datasets emerging in these domains. The CANDECOMP/PARAFAC (CP) and Tucker formulations are the prominent tensor decomposition schemes heavily used in these fields, and the algorithms for computing them rely on two core operations, tensor-times-matrix (TTM) and tensor-times-vector (TTV) multiplication, which are executed repeatedly within an iterative framework. Recently, efficient computational schemes based on a data structure called a dimension tree have been employed to significantly reduce the cost of these two operations by storing and reusing partial results that are shared across different iterations of these algorithms. This framework was introduced for sparse CP and Tucker decompositions in the literature, and a recent work investigates using an optimal binary dimension tree structure in computing dense Tucker decompositions. In this paper, we investigate finding an optimal dimension tree for both CP and Tucker decompositions. We show that finding an optimal dimension tree for an N-dimensional tensor is NP-hard for both decompositions, provide a faster exact algorithm that finds an optimal dimension tree in \(O(3^N)\) time using \(O(2^N)\) space for the Tucker case, and extend the algorithm to the CP decomposition with the same time and space complexities.
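The \(O(3^N)\)-time, \(O(2^N)\)-space bound quoted above is characteristic of a dynamic program over subsets of dimensions. The following is a minimal, hypothetical sketch of such a subset DP restricted to binary dimension trees; the function `node_cost` is an assumed black-box stand-in for a per-node cost model (it is not taken from the paper), and the paper's own algorithms, which also cover non-binary trees and the CP case, differ in their details.

```python
def optimal_binary_dimension_tree(num_dims, node_cost):
    """Sketch of a subset dynamic program over dimension sets.

    num_dims  -- number of tensor dimensions N
    node_cost -- assumed black-box: node_cost(mask) returns the cost charged
                 to a tree node covering the dimension set `mask`
                 (a bitmask over the N dimensions).
    Runs in O(3^N) time (summing 2^|S| over all subsets S) and O(2^N) space.
    """
    full = (1 << num_dims) - 1
    best = [float("inf")] * (full + 1)   # best[mask]: cheapest subtree cost
    split = [0] * (full + 1)             # split[mask]: chosen child mask

    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            best[mask] = node_cost(mask)  # single dimension: a leaf node
            continue
        # Enumerate every nonempty proper submask of `mask`; each split
        # yields two children covering `sub` and `mask ^ sub`.
        sub = (mask - 1) & mask
        while sub:
            cand = best[sub] + best[mask ^ sub]
            if cand < best[mask]:
                best[mask], split[mask] = cand, sub
            sub = (sub - 1) & mask
        best[mask] += node_cost(mask)
    return best[full], split


if __name__ == "__main__":
    # Toy usage with a made-up cost function (purely illustrative):
    cost, split = optimal_binary_dimension_tree(4, lambda m: bin(m).count("1"))
    print(cost)
```

A top-down variant memoized over masks is equally viable; the bottom-up form above simply makes the \(O(2^N)\) table explicit.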


Acknowledgements
This research was funded in part by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). The authors would like to thank Bora Uçar for several discussions. Finally, the authors would like to thank both anonymous reviewers for their comments and suggestions, which helped us improve this manuscript.
Appendix: Counter-Example: Tensor Having No Optimal BDT
Here, we provide a counter-example using a 6-dimensional tensor \(\varvec{\mathcal {X}} \in \mathbb {R}^{I_1 \times \dots \times I_6}\) to show that using a binary dimension tree (BDT) is not necessarily optimal for computing the Tucker decomposition. The first three and the last three dimensions of \(\varvec{\mathcal {X}}\) have identical sizes and ranks of approximation. Specifically, we let \(I_1 = I_2 = I_3 = k\) and \(R_1 = R_2 = R_3 = R < k\) in the first three dimensions, and \(I_4 = I_5 = I_6 = R_4 = R_5 = R_6 = c\) in the last three dimensions. We call the first three and the last three dimensions type-1 and type-2 dimensions, respectively. Note that \(\alpha = \alpha_1 = \alpha_2 = \alpha_3 < \alpha_4 = \alpha_5 = \alpha_6 = 1\), and similarly \(\beta_1 = \beta_2 = \beta_3 < \beta_4 = \beta_5 = \beta_6 = \infty\); therefore, type-1 dimensions are to be multiplied before type-2 dimensions according to Theorem 1. In Fig. 3, we provide a ternary dimension tree with a total cost of \((3 + 3\alpha + 3\alpha^2)k + 9\alpha^2 c\). We can choose c arbitrarily large so that the term \(9\alpha^2 c\) dominates the cost, and \(\alpha\) small enough so that \(\alpha^0 c \gg 9\alpha^2 c\) and \(\alpha^1 c \gg 9\alpha^2 c\). In this case, any BDT whose cost involves a term of order \(\alpha^0 c\) or \(\alpha^1 c\) cannot be optimal, as its cost exceeds that of the provided ternary tree.
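To make the required scale separation concrete, consider the following purely illustrative choice of parameters (the specific values are our own, not taken from the paper): with \(\alpha = 10^{-2}\),
\[
\alpha^{0}c = c, \qquad \alpha^{1}c = 10^{-2}c, \qquad 9\alpha^{2}c = 9\times 10^{-4}c, \qquad \alpha^{3}c = 10^{-6}c,
\]
so both \(\alpha^{0}c\) and \(\alpha^{1}c\) exceed \(9\alpha^{2}c\) by more than an order of magnitude, and choosing \(c \gg k\) makes the \((3 + 3\alpha + 3\alpha^2)k\) terms of the ternary tree negligible.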
We now show that any BDT for \(\varvec{\mathcal {X}}\) either has a cost with a term of order \(\alpha^0 c\) or \(\alpha^1 c\), or with a term \(9\alpha^2 c + d\alpha^3 c\) with \(d \ge 1\); hence, it cannot be optimal for a sufficiently large c and a sufficiently small \(\alpha\). We do this by exhaustively considering all possible BDTs and analyzing their costs, while aggressively pruning tree configurations that cannot provide optimality. Fortunately, most non-optimal configurations can easily be pruned due to symmetry (as we only have two types of dimensions), leaving us with only a handful of instances to consider. We begin by partitioning type-2 dimensions among the children \(t_1\) and \(t_2\) of the root. There are only two possibilities: \(4 \in \mu(t_1)\) and \(5, 6 \in \mu(t_2)\), or \(4, 5, 6 \in \mu(t_2)\). Since we have only three type-1 dimensions, in the former case either \(\mu(t_1)\) or \(\mu(t_2)\) has one or zero type-1 dimensions, while the other set has two or three of them. In this case, the TTM cost of the other vertex involves a term \(\alpha^0 c\) or \(\alpha^1 c\), which already renders this configuration non-optimal. Therefore, we only consider the partition \(4, 5, 6 \in \mu(t_2)\), which is the only one that can possibly provide an optimal solution. In this case, there are three possible configurations after partitioning type-1 dimensions: \(\mu(t_1) = \{ 1 \}\) and \(\mu(t_2) = \{ 2, 3, 4, 5, 6 \}\), \(\mu(t_1) = \{ 1, 2 \}\) and \(\mu(t_2) = \{ 3, 4, 5, 6 \}\), or \(\mu(t_1) = \{ 1, 2, 3 \}\) and \(\mu(t_2) = \{ 4, 5, 6 \}\). Note that in the second and third configurations, computing the TTM for \(t_1\) involves a term \(\alpha^1 c\) and \(\alpha^0 c\), respectively, which prevents optimality. Hence, in the rest of the discussion we only consider the first configuration, in which \(t_1\) incurs the TTM cost \(3\alpha^2 c\). We focus on the cost of the subtree rooted at \(t_2\), and count only the cost due to terms with the coefficient c. Note that if the remaining type-1 dimensions, namely 2 and 3, reside in the same child of \(t_2\), the other child of \(t_2\) incurs a cost of at least \(\alpha^1 c\); therefore, 2 and 3 must reside in different children of \(t_2\). In Fig. 4, we provide six such possibilities, all of which incur a cost of \(9\alpha^2 c + d\alpha^3 c\) with \(d \ge 1\). Therefore, we conclude that a BDT cannot be optimal for the given \(\varvec{\mathcal {X}}\) for sufficiently small \(\alpha\) and large c.
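Under the illustrative values above (\(\alpha = 10^{-2}\), c large), the case analysis gives the following gap, which is meant only to exhibit the orders of magnitude:
\[
\text{ternary tree: } 9\alpha^{2}c = 9\times 10^{-4}c
\qquad\text{vs.}\qquad
\text{any BDT: } \ge \min\bigl(\alpha^{1}c,\; 9\alpha^{2}c + \alpha^{3}c\bigr) = 9.01\times 10^{-4}c,
\]
so every BDT is strictly more expensive than the ternary tree once c is large enough for the c-terms to dominate.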