Internal Dictionary Matching

Algorithmica

Abstract

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary \(\mathsf {D}\) in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in \(\mathsf {D}\) is given as a fragment of T. This way, \(\mathsf {D}\) takes space proportional to the number of patterns \(d=|\mathsf {D}|\) rather than their total length, which could be \(\varTheta (n\cdot d)\). In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from \(\mathsf {D}\) in a fragment \(T[i \mathinner {.\,.}j]\) and reporting distinct patterns from \(\mathsf {D}\) that occur in \(T[i \mathinner {.\,.}j]\). We show how to construct, in \(O((n+d) \log ^{O(1)} n)\) time, a data structure that answers each of these queries in time \(O(\log ^{O(1)} n+| output |)\). The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight—up to subpolynomial factors—upper and lower bounds for the case of a dynamic dictionary.
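To make the three query types concrete, the following is a naive baseline sketch in Python. It is not the data structure described in the abstract and has none of its time guarantees; it only pins down what each query is expected to return. The function names, the 0-based inclusive `(start, end)` encoding of fragments, and the example string are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a fragment T[a..b] is stored as a 0-based inclusive pair (a, b).
Fragment = Tuple[int, int]


def report_occurrences(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Return (pattern index, start position) for each occurrence lying inside T[i..j]."""
    out = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]
        m = len(pattern)
        # An occurrence starting at s must end at s + m - 1 <= j.
        for s in range(i, j - m + 2):
            if T.startswith(pattern, s):
                out.append((k, s))
    return out


def count_occurrences(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Count all occurrences of dictionary patterns inside T[i..j]."""
    return len(report_occurrences(T, D, i, j))


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> List[str]:
    """Return the distinct dictionary patterns (as strings) occurring inside T[i..j]."""
    found = {T[D[k][0]:D[k][1] + 1] for k, _ in report_occurrences(T, D, i, j)}
    return sorted(found)


T = "abaababa"
D = [(0, 1), (0, 2), (2, 3)]  # the patterns "ab", "aba", "aa", each given as a fragment of T
print(report_occurrences(T, D, 1, 6))
print(count_occurrences(T, D, 1, 6))
print(report_distinct(T, D, 1, 6))
```

Each query in this sketch scans the fragment once per pattern, so it takes time roughly proportional to \(d\) times the fragment length (and more for long patterns), which is the cost the polylogarithmic query bounds above are meant to avoid.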

Notes

  1. The \(\tilde{O}(\cdot )\) notation suppresses \(\log ^{O(1)} n\) factors.
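As an illustration of this convention (a restatement of the abstract's bounds, not an additional result), the construction time can be written as

\[
O\big((n+d)\log ^{O(1)} n\big) = \tilde{O}(n+d),
\]

and the query time \(O(\log ^{O(1)} n+| output |)\) is in particular \(\tilde{O}(1+| output |)\).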

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855

  2. Amir, A., Farach, M., Galil, Z., Giancarlo, R., Park, K.: Dynamic dictionary matching. J. Comput. Syst. Sci. 49(2), 208–222 (1994). https://doi.org/10.1016/S0022-0000(05)80047-9

  3. Amir, A., Farach, M., Idury, R.M., La Poutré, J.A., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995). https://doi.org/10.1006/inco.1995.1090

  4. Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algorithms 3(2), 19 (2007). https://doi.org/10.1145/1240233.1240242

  5. Babenko, M., Gawrychowski, P., Kociumaka, T., Starikovskaya, T.: Wavelet trees meet suffix trees. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, SIAM, pp. 572–591 (2015). https://doi.org/10.1137/1.9781611973730.39

  6. Bannai, H., I, T., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: The “runs” theorem. SIAM J. Comput. 46(5), 1501–1514 (2017)

  7. Bannai, H., Inenaga, S., Köppl, D.: Computing all distinct squares in linear time for integer alphabets. In: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 22:1–22:18 (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.22

  8. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theoret. Comput. Sci. 321(1), 5–12 (2004). https://doi.org/10.1016/j.tcs.2003.05.002

  9. Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57(2), 75–94 (2005). https://doi.org/10.1016/j.jalgor.2005.08.001

  10. Bentley, J.L.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980). https://doi.org/10.1145/358841.358850

  11. Chan, H.-L., Hon, W.-K., Lam, T.W., Sadakane, K.: Dynamic dictionary matching and compressed suffix trees. In: 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, SIAM, pp. 13–22 (2005). URL: http://dl.acm.org/citation.cfm?id=1070432.1070436

  12. Chan, T.M., Pătraşcu, M.: Counting inversions, offline orthogonal range counting, and related problems. In: 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 161–173 (2010). https://doi.org/10.1137/1.9781611973075.15

  13. Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Counting distinct patterns in internal dictionary matching. In: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 8:1–8:15 (2020)

  14. Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J., Rytter, W., Waleń, T.: Internal dictionary matching. In: 30th International Symposium on Algorithms and Computation, ISAAC 2019, volume 149 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 22:1–22:17 (2019)

  15. Clark, D.: Compact Pat trees. PhD thesis, University of Waterloo (1996). http://hdl.handle.net/10012/64

  16. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: 36th Annual ACM Symposium on Theory of Computing, STOC 2004, pp. 91–100. ACM (2004). https://doi.org/10.1145/1007352.1007374

  17. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007). https://doi.org/10.1017/cbo9780511546853

  18. Crochemore, M., Iliopoulos, C.S., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Extracting powers and periods in a word from its runs structure. Theoret. Comput. Sci. 521, 29–41 (2014). https://doi.org/10.1016/j.tcs.2013.11.018

  19. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000). https://doi.org/10.1145/355541.355547

  20. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16, 109–114 (1965)

  21. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759

  22. Gawrychowski, P., Landau, G.M., Mozes, S., Weimann, O.: The nearest colored node in a tree. Theoret. Comput. Sci. 710, 66–73 (2018). https://doi.org/10.1016/j.tcs.2017.08.021

  23. Groult, R., Prieur, É., Richomme, G.: Counting distinct palindromes in a word in linear time. Inf. Process. Lett. 110(20), 908–912 (2010). https://doi.org/10.1016/j.ipl.2010.07.018

  24. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984). https://doi.org/10.1137/0213024

  25. He, M., Munro, J.I.: Space efficient data structures for dynamic orthogonal range counting. Comput. Geom. 47(2), 268–281 (2014). https://doi.org/10.1016/j.comgeo.2013.08.007

  26. Henzinger, M., Krinninger, S., Nanongkai, D., Saranurak, T.: Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In: 47th Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 21–30. ACM (2015). https://doi.org/10.1145/2746539.2746609

  27. Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science, FOCS 1989, IEEE Computer Society, pp. 549–554 (1989). https://doi.org/10.1109/SFCS.1989.63533

  28. Jeż, A.: Faster fully compressed pattern matching by recompression. ACM Trans. Algorithms 11(3), 20:1–20:43 (2015). https://doi.org/10.1145/2631920

  29. Jeż, A.: Recompression: A simple and powerful technique for word equations. J. ACM 63(1), 4:1–4:51 (2016). https://doi.org/10.1145/2743014

  30. Keller, O., Kopelowitz, T., Landau Feibish, S., Lewenstein, M.: Generalized substring compression. Theoret. Comput. Sci. 525, 42–54 (2014). https://doi.org/10.1016/j.tcs.2013.10.010

  31. Kociumaka, T.: Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw (2018). https://mimuw.edu.pl/~kociumaka/files/phd.pdf

  32. Kociumaka, T., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: A linear-time algorithm for seeds computation. ACM Trans. Algorithms 16(2), 27:1–27:23 (2020). https://doi.org/10.1145/3386369

  33. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, SIAM, pp. 532–551 (2015). https://doi.org/10.1137/1.9781611973730.36

  34. Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science, FOCS 1999, IEEE Computer Society, pp. 596–604 (1999). https://doi.org/10.1109/SFFCS.1999.814634

  35. Kopelowitz, T., Lewenstein, M., Porat, E.: Persistency in suffix trees with applications to string interval problems. In: 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011, volume 7024 of LNCS, Springer, pp. 67–80 (2011). https://doi.org/10.1007/978-3-642-24583-1_8

  36. Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theoret. Comput. Sci. 387(3), 332–347 (2007). https://doi.org/10.1016/j.tcs.2007.07.013

  37. Munro, J.I., Nekrich, Y., Vitter, J.S.: Fast construction of wavelet trees. Theoret. Comput. Sci. 638, 91–97 (2016). https://doi.org/10.1016/j.tcs.2015.11.011

  38. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002, SIAM, pp. 657–666 (2002). URL: http://dl.acm.org/citation.cfm?id=545381.545469

  39. Pătraşcu, M.: Unifying the landscape of cell-probe lower bounds. SIAM J. Comput. 40(3), 827–847 (2011). https://doi.org/10.1137/09075336X

  40. Rubinchik, M., Shur, A.M.: Counting palindromes in substrings. In: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017, volume 10508 of LNCS, Springer, pp. 290–303 (2017). https://doi.org/10.1007/978-3-319-67428-5_25

  41. I, T.: Longest common extensions with recompression. In: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, pp. 18:1–18:15 (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.18

Acknowledgements

Panagiotis Charalampopoulos and Manal Mohamed thank Solon Pissis for preliminary discussions.

Funding

Panagiotis Charalampopoulos was partially supported by ERC Grant TOTAL (No. 677651) under the EU’s Horizon 2020 Research and Innovation Programme. Tomasz Kociumaka was supported by ISF Grants No. 1278/16 and 1926/19, a BSF Grant No. 2018364, and an ERC grant MPM (No. 683064) under the EU’s Horizon 2020 Research and Innovation Programme. Jakub Radoszewski and Tomasz Waleń are supported by the Polish National Science Center, Grant No. 2018/31/D/ST6/03991.

Author information

Corresponding author

Correspondence to Panagiotis Charalampopoulos.

Additional information

A preliminary version of this paper was presented at the 30th International Symposium on Algorithms and Computation (ISAAC 2019) [14].

About this article

Cite this article

Charalampopoulos, P., Kociumaka, T., Mohamed, M. et al. Internal Dictionary Matching. Algorithmica 83, 2142–2169 (2021). https://doi.org/10.1007/s00453-021-00821-y

  • DOI: https://doi.org/10.1007/s00453-021-00821-y
