MulPSSM
NAR Molecular Biology Database Collection entry number 844
Krishnadev O., Bhaskara R.M., Agarwal G., and Srinivasan N.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.
Contact ns@mbu.iisc.ernet.in
Database Description
Position Specific Scoring Matrices (PSSMs) are commonly used to represent multiple sequence alignments of protein families for the detection of remote homologues. A PSSM is generated with one of the sequences in the multiple sequence alignment as its reference. We have shown that using multiple PSSMs for a single alignment, each built with a different sequence of the family as reference, dramatically improves the sensitivity of remote homology detection (1,2). MulPSSM contains PSSMs for a large number of sequence and structural families of protein domains, with multiple PSSMs for every family (3). A clustering algorithm is used to identify the most distinct sequences in each family, and a PSSM is then generated with each of these distinct sequences as reference.
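The selection of distinct reference sequences can be illustrated with a minimal sketch. The description above does not name the clustering algorithm or its parameters, so the greedy identity-based selection, the 40% cutoff, and the names `percent_identity` and `pick_references` are all assumptions made for illustration only and are not the MulPSSM procedure.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pick_references(alignment: Dict[str, str], max_identity: float = 40.0) -> List[str]:
    """Greedily keep sequences that are no more than `max_identity` percent
    identical to every reference kept so far (hypothetical cutoff)."""
    references: List[str] = []
    for name, seq in alignment.items():
        if all(percent_identity(seq, alignment[r]) <= max_identity for r in references):
            references.append(name)
    return references


# Toy aligned family (invented sequences): two near-identical pairs.
toy_alignment = {
    "seqA": "MKV-LLAGT",
    "seqB": "MKVALLAGT",
    "seqC": "QRT-WWPHE",
    "seqD": "QRTSWWPHE",
}
print(pick_references(toy_alignment))  # ['seqA', 'seqC']
```

Each name returned would then serve as the reference sequence for one PSSM of the family, so this toy family would yield two PSSMs.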
Recent Developments
The current release of MulPSSM contains 40587 PSSMs corresponding to 10334 sequence families (4) and 37986 PSSMs corresponding to 3361 structural families (5). An RPS-BLAST (6,7) interface allows a query sequence to be searched against the PSSMs of sequence families, structural families, or both. Data are presented using dynamic HTML. An analysis interface allows display and convenient navigation of alignments and domain hits.
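Because every family is represented by several PSSMs, a single query can hit the same family more than once. A minimal sketch of collapsing such hits to one best hit per family follows. It assumes standard 12-column tabular BLAST output (subject ID in column 2, E-value in column 11) and a hypothetical subject naming scheme `FAMILY|REFERENCE`. The function name `best_hit_per_family`, the E-value cutoff, and the example rows are likewise invented for illustration and do not describe the MulPSSM interface.

```python
from typing import Dict, Iterable, Tuple


def best_hit_per_family(
    rows: Iterable[str], max_evalue: float = 1e-3
) -> Dict[str, Tuple[str, float]]:
    """Map each family to the (reference, E-value) of its lowest-E-value hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # hit is weaker than the (assumed) cutoff
        # Hypothetical naming scheme: "FAMILY|REFERENCE"
        family, _, reference = subject.partition("|")
        if family not in best or evalue < best[family][1]:
            best[family] = (reference, evalue)
    return best


# Invented example rows: two PSSMs of FAM1 and one PSSM of FAM2.
example_rows = [
    "query1\tFAM1|refA\t30.0\t100\t70\t0\t1\t100\t1\t100\t2e-05\t45.0",
    "query1\tFAM1|refB\t35.0\t100\t65\t0\t1\t100\t1\t100\t3e-12\t70.0",
    "query1\tFAM2|refC\t25.0\t80\t60\t0\t5\t84\t3\t82\t0.5\t25.0",
]
print(best_hit_per_family(example_rows))  # {'FAM1': ('refB', 3e-12)}
```

Keeping the lowest E-value per family means a family is reported if any one of its PSSMs detects the query, which is the practical benefit of searching with several references.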
Acknowledgements
OK and GA are supported by the Council of Scientific and Industrial Research, New Delhi. This work is supported by the Department of Biotechnology, New Delhi.
References
1. Anand, B., Gowri, V.S., and Srinivasan, N. (2005) Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics 21, 2821-2826.
2. Gowri, V.S., Tina, K.G., Krishnadev, O., and Srinivasan, N. (2007) Strategies for the effective identification of remotely related sequences in multiple PSSM search approach. Proteins 67, 789-794.
3. Gowri, V.S., Krishnadev, O., Swamy, C.S., and Srinivasan, N. (2006) MulPSSM: a database of multiple position-specific scoring matrices of protein domain families. Nucleic Acids Res. 34, D243-D246.
4. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., et al. (2004) The Pfam Protein Families Database. Nucleic Acids Res. 32, D138-D141.
5. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 29, 61-65.
6. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L., and Altschul, S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15, 1000-1011.
7. Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y., and Bryant, S.H. (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281-283.
Category: Protein sequence databases
Subcategory: Protein domain databases; protein classification
Go to the abstract in the NAR 2006 Database Issue.