Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers

Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka

doi:10.1007/s10822-016-9999-8

Published: 27 December 2016

Volume 31, pages 237–244 (2017)

Journal of Computer-Aided Molecular Design

Abstract

Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Like H-DROP, Fast H-DROP uses optimal features selected from a set of 3000 candidates by combining a random forest with a stepwise feature selection protocol. By generating the position-specific scoring matrices (PSSMs) from SWISS-PROT instead of the GenBank non-redundant (nr) database, we reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on a Linux server with 8 Xeon processors. The sensitivity and precision of Fast H-DROP, assessed by cross-validation, were 33.7% and 36.2%, respectively, only ~2% lower than those of H-DROP. This reduced computational time, achieved with only a marginal loss in prediction performance, makes Fast H-DROP more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
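
The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it computes the speed-up from the two stated runtimes and shows the standard definitions of sensitivity and precision. The function names and the example counts are hypothetical, and the abstract does not state how a correct linker prediction is counted, so the counts do not reproduce the reported 33.7% and 36.2%.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new runtime is than the old one."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true linkers that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted linkers that are true linkers: TP / (TP + FP)."""
    return tp / (tp + fp)


# Runtimes quoted in the abstract: 8.5 min versus 14 s per sequence.
ratio = speedup(8.5 * 60, 14)
print(f"speed-up: {ratio:.1f}x")

# Hypothetical counts, chosen only to illustrate the two formulas.
tp, fp, fn = 30, 50, 60
print(f"sensitivity: {sensitivity(tp, fn):.3f}")
print(f"precision:   {precision(tp, fp):.3f}")
```

The ratio of the two runtimes is about 36, which is consistent with the rounded "thirty times" in the title.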

Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (JSPS) postdoctoral fellowship to TR.

Author information

Teppei Ebina
Present address: Department of Physiology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan

Authors and Affiliations

Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
Tambi Richa, Soichiro Ide, Ryosuke Suzuki, Teppei Ebina & Yutaka Kuroda

Corresponding author

Correspondence to Yutaka Kuroda.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 640 KB)

About this article

Cite this article

Richa, T., Ide, S., Suzuki, R. et al. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 31, 237–244 (2017). https://doi.org/10.1007/s10822-016-9999-8

Received: 24 July 2016
Accepted: 10 December 2016
Published: 27 December 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10822-016-9999-8
