Abstract
The development of high-throughput sequencing technology has produced a large volume of transcriptome data. Identification of long non-protein-coding RNAs (lncRNAs) is pervasive in transcriptome studies because of their important roles in biological processes. This paper proposes a computational method for identifying lncRNAs based on machine learning. The method first extracts features by traversing the transcript sequence with k-mers to obtain a large class of features, which is combined with GC content and sequence length. It then applies a variance test, tuned by grid search, to select among these three kinds of features, which reduces the data dimensionality and the computational load on the support vector machine used to build the recognition model. The final model shows a degree of stability and robustness. The method obtains 95.7% accuracy and an AUC of 0.99 on the test dataset, so it could be promising for identifying lncRNAs.
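The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the k values, the normalization, or the variance threshold, so the choices below (k = 1, 2, 3, frequency normalization, a threshold of 1e-3) and all function names are assumptions made for illustration. The support vector machine and its grid search are left out so that the example depends only on the Python standard library.

```python
from itertools import product
from statistics import pvariance

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Slide a window of width k along seq and return normalized k-mer counts."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def featurize(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies, GC content, and sequence length."""
    features = []
    for k in ks:  # assumed k values; the abstract does not state them
        features.extend(kmer_frequencies(seq, k))
    features.append(gc_content(seq))
    features.append(float(len(seq)))
    return features


def variance_filter(rows, threshold):
    """Return indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


# Toy sequences, made up for illustration only.
transcripts = ["ATGGCGTACGCTAGC", "AAAATTTTAAAATTTT", "GCGCGGCCGCGCATGC"]
X = [featurize(s) for s in transcripts]
keep = variance_filter(X, threshold=1e-3)  # assumed threshold
X_reduced = [[row[j] for j in keep] for row in X]
```

In a full pipeline, the reduced matrix would be passed to a support vector machine, with the variance threshold and the classifier parameters chosen together by grid search on held-out data.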
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61502243, 61502247, 61572263), the China Postdoctoral Science Foundation (2018M632349), the Zhejiang Engineering Research Center of Intelligent Medicine (2016E10011), and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province in China (No. 16KJD520003).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Ou, Y., Xu, Z., Gong, L. (2019). Identifying lncRNA Based on Support Vector Machine. In: Wang, H., Siuly, S., Zhou, R., Martin-Sanchez, F., Zhang, Y., Huang, Z. (eds) Health Information Science. HIS 2019. Lecture Notes in Computer Science(), vol 11837. Springer, Cham. https://doi.org/10.1007/978-3-030-32962-4_7
DOI: https://doi.org/10.1007/978-3-030-32962-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32961-7
Online ISBN: 978-3-030-32962-4
eBook Packages: Computer Science (R0)