Abstract
Given a set \(S = \{s_1, s_2, \ldots , s_n\}\) of strings of equal length \(L\) and an integer \(d\), the closest string problem (CSP) requires the computation of a string \(s\) of length \(L\) such that \(d(s, s_i) \le d\) for each \(s_i \in S\), where \(d(s, s_i)\) is the Hamming distance between \(s\) and \(s_i\). The problem is NP-hard and has been extensively studied in the context of approximation algorithms and fixed-parameter algorithms. Fixed-parameter algorithms provide the most practical solutions to its real-life applications in bioinformatics. In this paper, we develop the first randomized fixed-parameter algorithms for CSP. The randomized algorithms are not only much simpler than their deterministic counterparts; their time complexities are also significantly better than those of the previously best known (deterministic) algorithms.
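To make the problem statement concrete, the following is a minimal sketch of the CSP definition only. It is not one of the randomized fixed-parameter algorithms developed in the paper. The function names (`hamming`, `is_center`, `brute_force_closest_string`) and the exhaustive search over an explicit alphabet are assumptions introduced purely for illustration; the search takes time exponential in \(L\) and is usable only on toy inputs.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s, strings, d):
    """Return True if s is within Hamming distance d of every input string."""
    return all(hamming(s, t) <= d for t in strings)


def brute_force_closest_string(strings, d, alphabet):
    """Illustrative exhaustive search (hypothetical helper, not from the paper).

    Tries every string of length L over `alphabet` and returns the first one
    that satisfies the CSP condition, or None if no such string exists.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if is_center(s, strings, d):
            return s
    return None


if __name__ == "__main__":
    S = ["ACGT", "ACGA", "TCGT"]
    print(is_center("ACGT", S, 1))                 # checks the definition directly
    print(brute_force_closest_string(S, 1, "ACGT"))
```

The check `is_center` runs in \(O(nL)\) time, so verifying a proposed solution is easy; the difficulty of CSP lies entirely in finding \(s\), which is what motivates parameterizing by \(d\).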

Acknowledgments
We thank the anonymous referees for very helpful comments. Zhi-Zhong Chen was supported in part by the Grant-in-Aid for Scientific Research of the Ministry of Education, Science, Sports and Culture of Japan, under Grant No. 24500023. Bin Ma was supported in part by Natural Sciences and Engineering Research Council of Canada (RGPIN 238748). Lusheng Wang was supported by a GRF grant from Hong Kong SAR government Project No. [CityU 123013] and a grant from National Foundation of China Project No. [61373048].
Additional information
A preliminary version of this paper appeared in the Proceedings of the 25th Annual Symposium on Combinatorial Pattern Matching, 2014.
Cite this article
Chen, ZZ., Ma, B. & Wang, L. Randomized Fixed-Parameter Algorithms for the Closest String Problem. Algorithmica 74, 466–484 (2016). https://doi.org/10.1007/s00453-014-9952-y