Abstract
We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary \(\mathsf {D}\) in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in \(\mathsf {D}\) is given as a fragment of T. This way, \(\mathsf {D}\) takes space proportional to the number of patterns \(d=|\mathsf {D}|\) rather than their total length, which could be \(\varTheta (n\cdot d)\). In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from \(\mathsf {D}\) in a fragment \(T[i \mathinner {.\,.}j]\) and reporting distinct patterns from \(\mathsf {D}\) that occur in \(T[i \mathinner {.\,.}j]\). We show how to construct, in \(O((n+d) \log ^{O(1)} n)\) time, a data structure that answers each of these queries in time \(O(\log ^{O(1)} n+| output |)\). The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight—up to subpolynomial factors—upper and lower bounds for the case of a dynamic dictionary.
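To make the three query types concrete, here is a minimal brute-force sketch in Python. It is not the data structure described above, which answers queries in polylogarithmic time. It only pins down the query semantics under assumed conventions: indices are 0-based and inclusive, and each dictionary pattern is a pair (a, b) denoting the fragment T[a..b]. All function names are hypothetical.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end) of a fragment of T, 0-based, inclusive


def report_occurrences(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Return (pattern index, start position) for every occurrence of a
    dictionary pattern that lies entirely inside T[i..j] (brute force)."""
    out = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]
        m = len(pattern)
        # An occurrence starting at s ends at s + m - 1, which must be <= j.
        for s in range(i, j - m + 2):
            if T[s:s + m] == pattern:
                out.append((k, s))
    return out


def count_occurrences(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Count all occurrences of dictionary patterns inside T[i..j]."""
    return len(report_occurrences(T, D, i, j))


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> List[str]:
    """Return the distinct dictionary patterns (as strings, sorted) that
    occur at least once inside T[i..j]."""
    hits = report_occurrences(T, D, i, j)
    return sorted({T[D[k][0]:D[k][1] + 1] for k, _ in hits})
```

For example, with T = "abaab" and D = [(0, 0), (0, 1), (2, 3)], the dictionary holds the patterns "a", "ab" and "aa" while storing only three index pairs, which illustrates why an internal dictionary takes space proportional to d rather than to the total pattern length.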
Notes
The \(\tilde{O}(\cdot )\) notation suppresses \(\log ^{O(1)} n\) factors.
References
Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855
Amir, A., Farach, M., Galil, Z., Giancarlo, R., Park, K.: Dynamic dictionary matching. J. Comput. Syst. Sci. 49(2), 208–222 (1994). https://doi.org/10.1016/S0022-0000(05)80047-9
Amir, A., Farach, M., Idury, R.M., La Poutré, J.A., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995). https://doi.org/10.1006/inco.1995.1090
Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algorithms 3(2), 19 (2007). https://doi.org/10.1145/1240233.1240242
Babenko, M., Gawrychowski, P., Kociumaka, T., Starikovskaya, T.: Wavelet trees meet suffix trees. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, SIAM, pp. 572–591 (2015). https://doi.org/10.1137/1.9781611973730.39
Bannai, H., I, T., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: The “runs” theorem. SIAM J. Comput. 46(5), 1501–1514 (2017)
Bannai, H., Inenaga, S., Köppl, D.: Computing all distinct squares in linear time for integer alphabets. In: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 22:1–22:18 (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.22
Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theoret. Comput. Sci. 321(1), 5–12 (2004). https://doi.org/10.1016/j.tcs.2003.05.002
Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57(2), 75–94 (2005). https://doi.org/10.1016/j.jalgor.2005.08.001
Bentley, J.L.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980). https://doi.org/10.1145/358841.358850
Chan, H.-L., Hon, W.-K., Lam, T.W., Sadakane, K.: Dynamic dictionary matching and compressed suffix trees. In: 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, SIAM, pp. 13–22 (2005). URL: http://dl.acm.org/citation.cfm?id=1070432.1070436
Chan, T.M., Pătraşcu, M.: Counting inversions, offline orthogonal range counting, and related problems. In: 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 161–173 (2010). https://doi.org/10.1137/1.9781611973075.15
Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Counting distinct patterns in internal dictionary matching. In: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 8:1–8:15 (2020)
Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J., Rytter, W., Waleń, T.: Internal Dictionary Matching. In: 30th International Symposium on Algorithms and Computation, ISAAC 2019, volume 149 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 22:1–22:17 (2019)
Clark, D.: Compact Pat trees. PhD thesis, University of Waterloo (1996). http://hdl.handle.net/10012/64
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: 36th Annual ACM Symposium on Theory of Computing, STOC 2004, ACM, pp. 91–100 (2004). https://doi.org/10.1145/1007352.1007374
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007). https://doi.org/10.1017/cbo9780511546853
Crochemore, M., Iliopoulos, C.S., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Extracting powers and periods in a word from its runs structure. Theoret. Comput. Sci. 521, 29–41 (2014). https://doi.org/10.1016/j.tcs.2013.11.018
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000). https://doi.org/10.1145/355541.355547
Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16, 109–114 (1965)
Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759
Gawrychowski, P., Landau, G.M., Mozes, S., Weimann, O.: The nearest colored node in a tree. Theoret. Comput. Sci. 710, 66–73 (2018). https://doi.org/10.1016/j.tcs.2017.08.021
Groult, R., Prieur, É., Richomme, G.: Counting distinct palindromes in a word in linear time. Inf. Process. Lett. 110(20), 908–912 (2010). https://doi.org/10.1016/j.ipl.2010.07.018
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984). https://doi.org/10.1137/0213024
He, M., Munro, J.I.: Space efficient data structures for dynamic orthogonal range counting. Comput. Geom. 47(2), 268–281 (2014). https://doi.org/10.1016/j.comgeo.2013.08.007
Henzinger, M., Krinninger, S., Nanongkai, D., Saranurak, T.: Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In: 47th Annual ACM Symposium on Theory of Computing, STOC 2015, ACM, pp. 21–30 (2015). https://doi.org/10.1145/2746539.2746609
Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science, FOCS 1989, IEEE Computer Society, pp. 549–554 (1989). https://doi.org/10.1109/SFCS.1989.63533
Jeż, A.: Faster fully compressed pattern matching by recompression. ACM Trans. Algorithms 11(3), 20:1–20:43 (2015). https://doi.org/10.1145/2631920
Jeż, A.: Recompression: A simple and powerful technique for word equations. J. ACM 63(1), 4:1–4:51 (2016). https://doi.org/10.1145/2743014
Keller, O., Kopelowitz, T., Feibish, S.L., Lewenstein, M.: Generalized substring compression. Theoret. Comput. Sci. 525, 42–54 (2014). https://doi.org/10.1016/j.tcs.2013.10.010
Kociumaka, T.: Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw (2018). https://mimuw.edu.pl/~kociumaka/files/phd.pdf
Kociumaka, T., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: A linear-time algorithm for seeds computation. ACM Trans. Algorithms 16(2), 27:1–27:23 (2020). https://doi.org/10.1145/3386369
Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, SIAM, pp. 532–551 (2015). https://doi.org/10.1137/1.9781611973730.36
Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science, FOCS 1999, IEEE Computer Society, pp. 596–604 (1999). https://doi.org/10.1109/SFFCS.1999.814634
Kopelowitz, T., Lewenstein, M., Porat, E.: Persistency in suffix trees with applications to string interval problems. In: 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011, volume 7024 of LNCS, Springer, pp. 67–80 (2011). https://doi.org/10.1007/978-3-642-24583-1_8
Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theoret. Comput. Sci. 387(3), 332–347 (2007). https://doi.org/10.1016/j.tcs.2007.07.013
Munro, J.I., Nekrich, Y., Vitter, J.S.: Fast construction of wavelet trees. Theoret. Comput. Sci. 638, 91–97 (2016). https://doi.org/10.1016/j.tcs.2015.11.011
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002, SIAM, pp. 657–666 (2002). URL: http://dl.acm.org/citation.cfm?id=545381.545469
Pătraşcu, M.: Unifying the landscape of cell-probe lower bounds. SIAM J. Comput. 40(3), 827–847 (2011). https://doi.org/10.1137/09075336X
Rubinchik, M., Shur, A.M.: Counting palindromes in substrings. In: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017, volume 10508 of LNCS, Springer, pp. 290–303 (2017). https://doi.org/10.1007/978-3-319-67428-5_25
I, T.: Longest common extensions with recompression. In: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 18:1–18:15 (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.18
Acknowledgements
Panagiotis Charalampopoulos and Manal Mohamed thank Solon Pissis for preliminary discussions.
Funding
Panagiotis Charalampopoulos was partially supported by ERC Grant TOTAL (No. 677651) under the EU’s Horizon 2020 Research and Innovation Programme. Tomasz Kociumaka was supported by ISF Grants No. 1278/16 and 1926/19, a BSF Grant No. 2018364, and an ERC grant MPM (No. 683064) under the EU’s Horizon 2020 Research and Innovation Programme. Jakub Radoszewski and Tomasz Waleń are supported by the Polish National Science Center, Grant No. 2018/31/D/ST6/03991.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version of this paper was presented at the 30th International Symposium on Algorithms and Computation (ISAAC 2019) [14].
About this article
Cite this article
Charalampopoulos, P., Kociumaka, T., Mohamed, M. et al. Internal Dictionary Matching. Algorithmica 83, 2142–2169 (2021). https://doi.org/10.1007/s00453-021-00821-y
DOI: https://doi.org/10.1007/s00453-021-00821-y