During the course of evolution, protein sequences derived from a common ancestor diverge through mutations, insertions and deletions, gene duplication and recombination, giving rise to diverse families with no easily detectable sequence similarity. Such relationships are often revealed only once protein structures become available and can be compared. SUPFAM [1-3] is a database of potential superfamily relationships derived by identifying distant evolutionary relationships between protein sequence families (Pfam families) and structural families (SCOP) using a rigorous profile-profile comparison method, AlignHUSH [4]. The methodology exploits the evolutionary information inherent in the SCOP classification to identify related Pfam families.
The present SUPFAM database update (Release 6) has been derived using Pfam (version 27.0) [5] and the SCOP database (version 1.75) [6]. First, each Pfam family profile is searched against the SCOP family profiles using AlignHUSH to identify possible evolutionary relationships. In this step, 5017 Pfam families could be associated with SCOP superfamilies. Second, the remaining Pfam families are searched against a database of Pfam family profiles to identify Pfam families that could be indirectly related to a SCOP family. About 247 Pfam families were associated in this way with other Pfam families already mapped to a SCOP superfamily. Thus, the present database reports associations of 5295 Pfam families (out of 14831, about 36%) with a SCOP family. The SUPFAM database also contains clusters of Pfam families that could not be mapped to any structural superfamily but are found to be related to one another; these are grouped together as "Potentially New Superfamilies (PNSFs)". These PNSFs (126 in number) could provide an important source of targets for structural genomics initiatives.
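The two-step association described above can be illustrated with a minimal Python sketch. It is not the SUPFAM pipeline itself: the hit tuples, the family and superfamily labels, and the Z-score cutoff `Z_MIN` are all hypothetical, since the text does not state the actual threshold, and real hits would come from AlignHUSH output.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query family, target, Z-score)

Z_MIN = 5.0  # hypothetical cutoff; the actual threshold is not given in the text


def map_direct(pfam_scop_hits: Iterable[Hit], z_min: float = Z_MIN) -> Dict[str, str]:
    """Step 1: keep the best-scoring SCOP superfamily for each Pfam family."""
    best: Dict[str, Tuple[str, float]] = {}
    for pfam, superfamily, z in pfam_scop_hits:
        if z >= z_min and (pfam not in best or z > best[pfam][1]):
            best[pfam] = (superfamily, z)
    return {pfam: sf for pfam, (sf, _) in best.items()}


def map_indirect(pfam_pfam_hits: Iterable[Hit], direct: Dict[str, str],
                 z_min: float = Z_MIN) -> Dict[str, str]:
    """Step 2: an unmapped family inherits the superfamily of a directly mapped partner."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, partner, z in pfam_pfam_hits:
        # Skip families already mapped, partners without a mapping, and weak hits.
        if query in direct or partner not in direct or z < z_min:
            continue
        if query not in best or z > best[query][1]:
            best[query] = (direct[partner], z)
    return {query: sf for query, (sf, _) in best.items()}


def coverage_percent(mapped: int, total: int) -> float:
    """Percentage of Pfam families associated with a SCOP family."""
    return 100.0 * mapped / total


# Toy data with made-up labels (not real Pfam or SCOP identifiers).
scop_hits = [("PF_A", "sf_1", 8.2), ("PF_A", "sf_2", 6.0), ("PF_B", "sf_2", 3.1)]
pfam_hits = [("PF_B", "PF_A", 7.5), ("PF_C", "PF_B", 9.0)]

direct = map_direct(scop_hits)
indirect = map_indirect(pfam_hits, direct)
print(direct, indirect)
print(round(coverage_percent(5295, 14831), 1))
```

In the toy data, `PF_B` has no acceptable SCOP hit of its own but reaches `sf_1` through `PF_A`, while `PF_C` stays unmapped because its only partner is not directly mapped. The last line reproduces the roughly 36% coverage quoted above from the reported counts.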
The current update of SUPFAM uses functionally related families in Pfam (version 27.0) and structural families of SCOP (version 1.75). A rigorous profile-profile comparison method (AlignHUSH) was used to generate the database. In this update we applied a decision tree with stringent Z-score and length-coverage thresholds; the decision tree also removes ambiguous relationships automatically. In total, 5295 Pfam families are now mapped (directly or indirectly) to 1656 SCOP structural superfamilies. We have identified more than 250 relationships in addition to those in the previous release. We provide the details of these evolutionary connections along with their Z-scores, and specifically report the non-obvious indirect relationships derived from Pfam family connections. For each SCOP superfamily in our database, we provide a graphical representation of the families within that superfamily and their related Pfam families. We have also identified 126 potentially new functional superfamilies, comprising 572 Pfam families that could not be associated with any SCOP structural family using AlignHUSH. Details of 684 DUF/UPF connections to SCOP domain families and 131 DUF/UPF relationships with other Pfam families are also provided.
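As a rough illustration of how Z-score and length-coverage thresholds might gate relationships, and how unmapped families could then be grouped into potentially new superfamilies, consider the sketch below. Everything specific in it is an assumption: the cutoffs `Z_MIN` and `COV_MIN`, the `Hit` record, the family labels, and the use of connected components as the clustering rule are illustrative only, because the text does not specify them. The automatic removal of ambiguous relationships is not modelled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    query: str
    target: str
    zscore: float
    coverage: float  # aligned fraction of the profile length, between 0 and 1


Z_MIN = 5.0    # hypothetical Z-score cutoff
COV_MIN = 0.6  # hypothetical length-coverage cutoff


def passes_thresholds(hit: Hit, z_min: float = Z_MIN, cov_min: float = COV_MIN) -> bool:
    """A hit is accepted only if it clears both the Z-score and coverage cutoffs."""
    return hit.zscore >= z_min and hit.coverage >= cov_min


def cluster_unmapped(hits: Iterable[Hit], mapped: Set[str]) -> List[Set[str]]:
    """Group unmapped families linked by accepted hits (connected components)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        if not passes_thresholds(hit) or hit.query == hit.target:
            continue
        if hit.query in mapped or hit.target in mapped:
            continue  # families with a structural mapping are excluded
        graph[hit.query].add(hit.target)
        graph[hit.target].add(hit.query)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Toy data with made-up labels (not real Pfam identifiers).
hits = [
    Hit("PF_X", "PF_Y", 7.0, 0.8),
    Hit("PF_Y", "PF_Z", 6.1, 0.9),
    Hit("PF_W", "PF_X", 9.0, 0.3),  # high Z-score but poor coverage: rejected
    Hit("PF_Q", "PF_A", 8.0, 0.9),  # PF_A already has a structural mapping
]
clusters = cluster_unmapped(hits, mapped={"PF_A"})
print(clusters)
```

Connected components behave like single-linkage clustering, so a chain of pairwise relationships (`PF_X` to `PF_Y` to `PF_Z`) ends up in one group even without a direct `PF_X` to `PF_Z` hit. A stricter rule would yield smaller clusters.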
1. Pandit, S.B., Gosar, D., Abhiman, S., Sujatha, S., Dixit, S.S., Mhatre, N.S., Sowdhamini, R. and Srinivasan, N. (2002) SUPFAM - Database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. Nucleic Acids Res. 30, 289-293
2. Pandit, S.B., Bhadra, R., Gowri, V.S., Balaji, S., Anand, B. and Srinivasan, N. (2004) SUPFAM: A database of sequence superfamilies of protein domains. BMC Bioinformatics 5, 28
3. Namboori, S., Srinivasan, N. and Pandit, S.B. (2004) Recognition of remotely related structural homologues using sequence profiles of aligned homologous protein structures. In Silico Biol. 4, 445-460
4. Krishnadev, O. and Srinivasan, N. (2011) AlignHUSH: Alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics 12, 275
5. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., et al. (2004) The Pfam Protein Families Database. Nucleic Acids Res. 32, D138-D141
6. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540
Go to the article in the NAR Database issue.