NRichD


NAR Molecular Biology Database Collection entry number 1772
Richa Mudgal1, Sankaran Sandhya2, Gayatri Kumar3, Ramanathan Sowdhamini4, Nagasuma R Chandra2 and Narayanaswamy Srinivasan3
1 IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, India
2 Department of Biochemistry, Indian Institute of Science, Bangalore 560 012, India
3 Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
4 National Centre for Biological Sciences, Gandhi Krishi Vignan Kendra Campus, Bellary Road, Bangalore 560 065, India

Database Description

The efficiency of remote homology detection methods for proteins depends on how sequences are dispersed in protein sequence space and on the availability of intermediate sequences between two related protein families. In the absence of structural evidence and natural intermediate sequences, detecting distant evolutionary relationships is a challenging task. Large gaps in sequence space between related families can be bridged through the design of protein-like sequences [1, 2]. In our recent publication [1], we developed a computational algorithm to design protein-like intermediate sequences between related protein families. In total, 3,611,010 artificial sequences were designed between pairs of related protein families for 374 multi-membered SCOP folds (v1.75). When such computationally designed intermediate sequences are added to commonly employed databases, they enable the detection of remote relationships.

Through the NrichD database resource, we provide the designed sequences plugged into commonly employed structure and sequence databases [3, 4] so that users can perform homology searches. The enriched databases (SCOP-NrichD and Pfam-NrichD), their respective natural sequence databases (SCOP-DB and Pfam-DB) and the dataset of artificial sequences (AS-DB) can be freely downloaded from the website. Users can also perform jackhmmer [5] searches against the enriched databases through the web portal. Each search is additionally run against the corresponding natural sequence database to achieve maximum coverage. The intermediate sequences are annotated with their parent profiles, which makes iterative searches traceable and helps in fold recognition.

Another useful feature of the web server is the generation of sequences for, or between, related families. Users can specify SCOP domain families or provide multiple sequence alignments of the protein families and generate artificial sequences at different levels of divergence.
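For users who download the enriched databases and search them locally, the workflow described above can be sketched as follows. This is a minimal illustration, not the web portal's own pipeline: the file names, the iteration and E-value settings, and in particular the "AS_" identifier prefix used to recognise artificial sequences are assumptions made for the example and should be replaced with the conventions of the downloaded files. The sketch only uses the standard jackhmmer options -N, -E and --tblout and the standard whitespace-delimited tabular output, in which the first column is the target name and the fifth is the full-sequence E-value.

```python
# Hypothetical prefix marking designed (artificial) sequences in the
# enriched database; check the downloaded FASTA headers for the real one.
ARTIFICIAL_PREFIX = "AS_"


def build_jackhmmer_command(query_fasta, database_fasta, tblout_path,
                            iterations=5, evalue=1e-3):
    """Return the argument list for an iterative jackhmmer search."""
    return [
        "jackhmmer",
        "-N", str(iterations),     # maximum number of iterations
        "-E", str(evalue),         # reporting E-value threshold
        "--tblout", tblout_path,   # per-target tabular output file
        query_fasta,
        database_fasta,
    ]


def split_hits(tblout_lines, artificial_prefix=ARTIFICIAL_PREFIX):
    """Separate natural and artificial hits from jackhmmer --tblout lines.

    Returns two lists of (target_name, full_sequence_evalue) tuples.
    """
    natural, artificial = [], []
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if target.startswith(artificial_prefix):
            artificial.append((target, evalue))
        else:
            natural.append((target, evalue))
    return natural, artificial


if __name__ == "__main__":
    # Hypothetical file names for a locally downloaded enriched database.
    command = build_jackhmmer_command(
        "query.fasta", "SCOP-NrichD.fasta", "hits.tbl")
    print(" ".join(command))
```

If HMMER is installed, the returned argument list can be passed to subprocess.run(command, check=True), and the resulting table can be read with split_hits(open("hits.tbl")). Hits to artificial sequences can then be followed back to their annotated parent profiles, while the natural hits are the ones reported as homologues.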

Recent Developments

The NrichD database (version 1) comprises four major databases, namely SCOP(v1.75)-DB, SCOP(v1.75)-NrichD, Pfam(v27.0)-DB and Pfam(v27.0)-NrichD, and a dataset of computationally designed intermediate sequences (AS-DB, version 1). Updates to these computationally designed protein-like sequences will be released with every SCOP database update and will also extend to the SCOP-NrichD dataset. Likewise, the Pfam-DB and Pfam-NrichD datasets will be updated with each Pfam release.

Acknowledgements

The NrichD database is supported by the Mathematical Biology program of the Department of Science and Technology as well as by the Department of Biotechnology (Grant code: BT/01/COE/09/01), Government of India.

References

1. Mudgal, R., Sowdhamini, R., Chandra, N., Srinivasan, N. and Sandhya, S. (2014) Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. Journal of Molecular Biology, 426, 962-979.
2. Sandhya, S., Mudgal, R., Jayadev, C., Abhinandan, K.R., Sowdhamini, R. and Srinivasan, N. (2012) Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Molecular BioSystems, 8, 2076-2084.
3. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247, 536-540.
4. Sonnhammer, E.L., Eddy, S.R., Birney, E., Bateman, A. and Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Research, 26, 320-322.
5. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755-763.


Go to the abstract in the NAR 2015 Database Issue.