Abstract
Genomic micro-satellites are genomic regions that consist of short, repetitive DNA motifs. In contrast to unique genomic regions, micro-satellites exhibit high intrinsic polymorphism, which derives mainly from variability in length. Length distributions are widely used to represent this polymorphism. Recent studies report that the length distributions of some micro-satellites differ significantly between tumor tissue samples and normal samples, which has become a hot topic in cancer genomics. Several state-of-the-art approaches have been proposed to identify length distributions from sequencing data. However, the existing approaches can only handle micro-satellites shorter than one read length, which limits research on long micro-satellite events. In this article, we propose a probabilistic approach, implemented as ELMSI, that estimates the length distributions of micro-satellites longer than one read length. The core algorithm works on a set of mapped reads. It first clusters the reads and adopts a k-mer extension algorithm to detect the repeat unit and the breakpoints. Then, it runs an expectation maximization algorithm to approximate the true length distributions. According to the experiments, ELMSI is able to handle micro-satellites whose lengths range from shorter than one read length to the 10 kbp scale. In a series of comparison experiments that vary the number of micro-satellite regions, the read length, and the sequencing coverage, ELMSI outperforms MSIsensor in most cases.
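To make the notion of a length distribution concrete, the following minimal sketch covers only the simple case in which the whole micro-satellite fits inside a single read. It is not the ELMSI algorithm: it involves no read clustering, k-mer extension, or expectation maximization. The function names, the toy reads, and the rule that a repeat must be flanked on both sides within the read are all assumptions made for illustration.

```python
from collections import Counter


def longest_tandem_run(read, unit):
    """Return (copies, start) for the longest run of back-to-back `unit` copies."""
    best_copies, best_start = 0, -1
    k = len(unit)
    for i in range(len(read)):
        copies, j = 0, i
        # Count consecutive copies of the unit starting at position i.
        while read.startswith(unit, j):
            copies += 1
            j += k
        if copies > best_copies:
            best_copies, best_start = copies, i
    return best_copies, best_start


def length_distribution(reads, unit):
    """Estimate P(repeat count) from reads that fully contain the repeat.

    Assumption: a read "spans" the micro-satellite only if the repeat run
    touches neither end of the read; other reads are discarded because
    their repeat count may be truncated.
    """
    counts = Counter()
    for read in reads:
        copies, start = longest_tandem_run(read, unit)
        end = start + copies * len(unit)
        if copies > 0 and start > 0 and end < len(read):
            counts[copies] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: n / total for c, n in sorted(counts.items())}


# Hypothetical toy reads covering an (AC)n micro-satellite.
reads = ["GGACACACTT", "GGACACACTT", "TGACACACACTG", "ACACACGG"]
dist = length_distribution(reads, "AC")
print(dist)
```

The last toy read starts inside the repeat, so it is discarded. Reads like this one, which cover only part of a repeat, are the only evidence available once a micro-satellite exceeds one read length, and that is the case the abstract says existing approaches cannot handle.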
Acknowledgement
This work is supported by the National Science Foundation of China (Grant No: 31701150) and the Fundamental Research Funds for the Central Universities (CXTD2017003).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Feng, X., Hu, H., Zhao, Z., Zhang, X., Wang, J. (2018). Estimating the Length Distributions of Genomic Micro-satellites from Next Generation Sequencing Data. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_40
DOI: https://doi.org/10.1007/978-3-319-78723-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78722-0
Online ISBN: 978-3-319-78723-7
eBook Packages: Computer Science, Computer Science (R0)