NAR Molecular Biology Database Collection entry number 561
Natale, D.A.1, Arighi, C.1, Barker, W.C.2, Hu, Z.Z.1, Huang, H.1, Mazumder, R.1, Nikolskaya, A.N.1, Vasudevan, S.2, Vinayaka, C.R.2, Yeh, L.S.2, Wu, C.H.1
1Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057, USA
2National Biomedical Research Foundation, Washington, DC 20057, USA

Database Description

PIRSF is a network classification system that reflects the evolutionary relationships of full-length proteins and domains. The system accommodates a flexible number of levels, from superfamily to subfamily, that reflect varying degrees of sequence conservation. This allows improved protein annotation, more accurate extraction of conserved functional residues, and classification of distantly related orphan proteins. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). The PIRSF database consists of two data sets: preliminary clusters and curated families. PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports, which include graphical viewers for taxonomic distribution, domain architecture, family hierarchy, multiple alignment, and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain searches allow identification of evolutionarily related protein families that share domains or structural folds. The relationships between protein classification and curated family functions reveal functional convergence and functional divergence. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer.
PIRSF is accessible from the web site at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification, and the data sets can be downloaded at ftp://ftp.pir.georgetown.edu/pir_databases/pirsf/dagfiles/.
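
The homeomorphic criterion described above (a common domain architecture together with full-length similarity) can be illustrated with a minimal Python sketch. This is not the PIRSF classification procedure: the Protein record, the function names, and the 1.3 length-ratio cutoff are hypothetical and chosen only for illustration. Comparable sequence length is used here as a crude stand-in for full-length sequence similarity, and homology itself is not tested.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical minimal record; not a PIRSF data structure."""
    accession: str             # illustrative identifier
    length: int                # sequence length in residues
    domains: Tuple[str, ...]   # domain names ordered from N- to C-terminus


def same_domain_architecture(a: Protein, b: Protein) -> bool:
    # Same domains in the same order.
    return a.domains == b.domains


def similar_length(a: Protein, b: Protein, max_ratio: float = 1.3) -> bool:
    # Assumed proxy for full-length similarity: the longer sequence may
    # exceed the shorter by at most max_ratio (an illustrative cutoff).
    shorter, longer = sorted((a.length, b.length))
    return longer <= shorter * max_ratio


def looks_homeomorphic(a: Protein, b: Protein, max_ratio: float = 1.3) -> bool:
    # Both conditions must hold; establishing homology would additionally
    # require a sequence similarity search, which is omitted here.
    return same_domain_architecture(a, b) and similar_length(a, b, max_ratio)
```

Under these assumptions, two proteins of 300 and 320 residues with identical ordered domain lists pass the check, while a pair whose domains appear in a different order, or whose lengths differ threefold, does not.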


Supported by NIH grant U01-HG02712 and NSF grant DBI-0138188.


Wu, C.H., Nikolskaya, A., Huang, H., Yeh, L.-S., Natale, D., Vinayaka, C.R., Hu, Z., Mazumder, R., Kumar, S., Kourtesis, P., Ledley, R.S., Suzek, B.E., Arminski, L., Chen, Y., Zhang, J., Cardenas, J.L., Chung, S., Castro-Alvear, J., Dinkov, G. and Barker, W.C. (2004) PIRSF family classification system at the Protein Information Resource. Nucleic Acids Research, 32, D112-114.
