Abstract
Rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Like H-DROP, Fast H-DROP uses optimal features selected from a set of 3000 candidates by combining a random forest with a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on a Linux server with 8 Xeon processors by using SWISS-PROT instead of the GenBank non-redundant (nr) database to generate the position-specific scoring matrices (PSSMs). The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7% and 36.2%, respectively, only ~2% lower than those of H-DROP. The reduced computational time, achieved with little loss in prediction performance, makes Fast H-DROP more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
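As a rough illustration of the figures quoted above, the following sketch computes the speed-up from the two reported run times and shows the standard definitions of sensitivity and precision. The function names are hypothetical, and the abstract does not state how a predicted linker is counted as correct, so the true-positive, false-positive, and false-negative counts are left as inputs rather than filled in.

```python
def speedup(old_seconds, new_seconds):
    """Ratio of the old run time to the new run time."""
    return old_seconds / new_seconds


def sensitivity(tp, fn):
    """Fraction of true linkers that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predicted linkers that are true linkers: TP / (TP + FP)."""
    return tp / (tp + fp)


# Run times reported in the abstract, converted to seconds.
h_drop_seconds = 8.5 * 60      # 8.5 min per sequence (H-DROP)
fast_h_drop_seconds = 14       # 14 s per sequence (Fast H-DROP)

ratio = speedup(h_drop_seconds, fast_h_drop_seconds)
print(f"Speed-up: {ratio:.1f}x")
```

With the reported timings the ratio comes out at roughly 36, which is consistent with the "thirty times" acceleration stated in the abstract.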




Acknowledgements
This work was supported by the Japan Society for the Promotion of Science (JSPS) postdoctoral fellowship to TR.
Cite this article
Richa, T., Ide, S., Suzuki, R. et al. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 31, 237–244 (2017). https://doi.org/10.1007/s10822-016-9999-8
DOI: https://doi.org/10.1007/s10822-016-9999-8