MulPSSM
NAR Molecular Biology Database Collection entry number 844
Gayatri R., Mohanty S., Mudgal R., Krishnadev O., and Srinivasan N.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
Contact ns@mbu.iisc.ernet.in
Database Description
Multiple sequence alignments of protein families are commonly represented as Position Specific Scoring Matrices (PSSMs) for the detection of remote homologues. A PSSM is generated with one of the sequences in the multiple sequence alignment serving as the reference. We have shown earlier that using multiple PSSMs for a single alignment, each built with a different family member as the reference, substantially improves the sensitivity of remote homology detection [1,2]. MulPSSM contains PSSMs for a large number of sequence and structural families of protein domains, with multiple PSSMs for every family [3]. The approach uses a clustering algorithm to identify the most distinct sequences in a family. Each of these distinct sequences is then used as the reference to generate one PSSM, giving multiple PSSMs per family.
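The selection of distinct reference sequences can be illustrated with a minimal sketch. The clustering algorithm actually used for MulPSSM is not specified in this entry, so the code below substitutes a simple greedy identity-threshold selection as an assumption. The function names, the `max_identity` parameter, and the toy alignment are all hypothetical and serve only as illustration.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def pick_references(alignment, max_identity=0.4):
    """Greedily keep sequences that are at most `max_identity` identical
    to every reference already chosen (a stand-in for the real clustering)."""
    references = []
    for name, seq in alignment.items():
        if all(pairwise_identity(seq, alignment[r]) <= max_identity
               for r in references):
            references.append(name)
    return references


# Toy aligned sequences (invented for illustration, equal length, '-' is a gap).
alignment = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",
    "seqC": "MRSG-LPEQR",
    "seqD": "-LSGWLPDNR",
}

refs = pick_references(alignment, max_identity=0.5)
print(refs)
```

Each name in `refs` would then serve as the reference for one PSSM of the family, so a family with several mutually distant members ends up with several PSSMs.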
Recent Developments
The current release of MulPSSM contains 304,570 PSSMs corresponding to 12,273 sequence-based families from Pfam (version 25.0) [4] and 14,235 PSSMs corresponding to 3,856 structural families in SCOP (version 1.75) [5]. An RPS-BLAST [6,7] interface allows a query sequence to be searched against the PSSMs of sequence families, structural families, or both. Results are presented using dynamic HTML. The RPS-BLAST results can also be obtained as a dendrogram, which helps users understand the relationships among the query and the various hits and makes it easier to distinguish hits that are closely related to the query from those that are remotely related. An analysis interface allows display and convenient navigation of alignments and domain hits.
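Because every family is represented by several PSSMs, a single query can hit the same family more than once. The sketch below shows one way such hits might be collapsed to the best-scoring PSSM per family. It assumes the 12-column tab-separated layout of BLAST+ tabular output (`-outfmt 6`) and a hypothetical subject identifier of the form `family|reference`; neither is confirmed as the format returned by the MulPSSM interface, and the rows are invented for illustration.

```python
def best_hit_per_family(tabular_text, max_evalue=1e-3):
    """Return {family: (reference, evalue, bitscore)} keeping the highest
    bit score per family among hits with evalue <= max_evalue."""
    best = {}
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject = fields[1]
        evalue = float(fields[10])    # column 11: E-value
        bitscore = float(fields[11])  # column 12: bit score
        if evalue > max_evalue:
            continue
        # Hypothetical identifier convention: "<family>|<reference sequence>"
        family, _, reference = subject.partition("|")
        if family not in best or bitscore > best[family][2]:
            best[family] = (reference, evalue, bitscore)
    return best


# Invented example rows in the assumed column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
rows = [
    ("query1", "FAM_A|ref1", "31.2", "250", "160", "5", "10", "255", "3", "248", "2e-30", "112.0"),
    ("query1", "FAM_A|ref2", "35.0", "248", "150", "4", "12", "255", "1", "246", "4e-36", "131.0"),
    ("query1", "FAM_B|ref1", "22.0", "90", "65", "3", "40", "128", "5", "92", "0.5", "35.0"),
    ("query1", "FAM_C|ref3", "28.4", "95", "62", "2", "300", "392", "2", "95", "1e-12", "60.5"),
]
example = "\n".join("\t".join(r) for r in rows)

hits = best_hit_per_family(example)
for family, (reference, evalue, bitscore) in hits.items():
    print(family, reference, evalue, bitscore)
```

Keeping the reference name alongside the family records which of the family's PSSMs detected the query, which is the information the multiple-PSSM approach adds over a single profile.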
Acknowledgements
This work is supported by the Department of Biotechnology, New Delhi.
References
1. Anand, B., Gowri, V.S., Srinivasan, N. (2005) Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics 21, 2821-2826.
2. Gowri, V.S., Tina, K.G., Krishnadev, O., Srinivasan, N. (2007) Strategies for the effective identification of remotely related sequences in multiple PSSM search approach. Proteins 67, 789-794.
3. Gowri, V.S., Krishnadev, O., Swamy, C.S., Srinivasan, N. (2006) MulPSSM: a database of multiple position-specific scoring matrices of protein domain families. Nucleic Acids Res. 34, D243-D246.
4. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., et al. (2004) The Pfam Protein Families Database. Nucleic Acids Res. 32, D138-D141.
5. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 29, 61-65.
6. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L., Altschul, S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15, 1000-1011.
7. Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y., Bryant, S.H. (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281-283.
Category: Protein sequence databases
Subcategory: Protein domain databases; protein classification
Go to the abstract in the NAR 2006 Database Issue.