NAR Molecular Biology Database Collection entry number 208
Wu, C.H. (1), Huang, H. (1), Chen, Y. (2), Barker, W.C. (2)
(1) Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057
(2) National Biomedical Research Foundation, Washington, DC 20057

Database Description

The iProClass database provides an integrated view of protein information (Huang et al., 2003) and serves as a bioinformatics framework for data integration and associative analysis of proteins (Wu et al., 2004). iProClass presents value-added descriptions of all proteins in UniProtKB and contains comprehensive, up-to-date protein information derived from over 90 biological databases. Rich links to the underlying sources are provided with source attribution, hypertext links, and extracted summary information. The source databases cover protein sequence, family, function, pathway, protein-protein interaction, complex, post-translational modification, protein expression, structure and structural classification, gene and genome, gene expression, disease, ontology, literature, and taxonomy.

The iProClass protein summary report contains:
(i) General information: protein ID and name (with synonyms and alternative names), source organism taxonomy (with NCBI taxonomy ID, group, and lineage), and sequence annotations such as gene names, keywords, function, and complex;
(ii) Database cross-references: bibliography (with PubMed ID and a link to a bibliography information and submission page), gene and genome databases including RefSeq and Entrez Gene, Gene Ontology (with GO hierarchy and evidence tag), enzyme/function (with EC hierarchy, nomenclature, and reaction), pathway (with KEGG pathway name and a link to the pathway map), protein-protein interaction, structure (with PDB 3D structure image, matched residue range, and % sequence identity for all structures matched at >=30% identity), structural classes (with SCOP hierarchy for structures at >=90% identity), and sequence features and post-translational modifications (with residues or residue ranges);
(iii) Family classification: PIRSF family, InterPro family, Pfam domain (with residue range), PROSITE motif (with residue range), COG, and other classifications; and
(iv) Sequence display: graphical display of domains and motifs on the amino acid sequence.

iProClass is implemented in the Oracle 9i database management system, is updated biweekly, and is searchable by both sequence and text. The data integration in iProClass allows identification of interesting relationships between protein sequence, structure, and function. It supports analyses of proteins in a “systems biology” context and has led to novel functional inference for uncharacterized proteins in the absence of sequence homology (Huang et al., 2005).
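
To make the two identity cutoffs in the structure cross-references concrete, here is a minimal Python sketch of a summary record. The class names, field names, and sample identifiers are hypothetical and chosen only for illustration; they are not the iProClass schema or its Oracle table layout. Only the two thresholds (>=30% for listing a matched PDB structure, >=90% for showing its SCOP hierarchy) come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Cutoffs taken from the report description; everything else here is illustrative.
PDB_MIN_IDENTITY = 30.0   # structures are listed when matched at >=30% identity
SCOP_MIN_IDENTITY = 90.0  # SCOP hierarchy is shown for structures at >=90% identity


@dataclass
class StructureMatch:
    """One matched 3D structure (hypothetical layout)."""
    pdb_id: str
    residue_range: Tuple[int, int]  # (start, end) of the matched residues
    percent_identity: float


@dataclass
class ProteinSummary:
    """A small slice of a summary report (hypothetical layout)."""
    protein_id: str
    name: str
    taxonomy_id: int
    structure_matches: List[StructureMatch] = field(default_factory=list)

    def reported_structures(self) -> List[StructureMatch]:
        """Matches that meet the >=30% identity cutoff for the structure section."""
        return [m for m in self.structure_matches
                if m.percent_identity >= PDB_MIN_IDENTITY]

    def scop_eligible(self) -> List[StructureMatch]:
        """Matches that meet the >=90% identity cutoff for SCOP classification."""
        return [m for m in self.structure_matches
                if m.percent_identity >= SCOP_MIN_IDENTITY]


# Made-up example data; the IDs below are placeholders, not real entries.
summary = ProteinSummary(
    protein_id="EXAMPLE_1",
    name="hypothetical protein",
    taxonomy_id=9606,  # NCBI taxonomy ID for Homo sapiens
    structure_matches=[
        StructureMatch("pdbA", (1, 100), 25.0),
        StructureMatch("pdbB", (10, 200), 45.0),
        StructureMatch("pdbC", (1, 250), 95.0),
    ],
)

listed = [m.pdb_id for m in summary.reported_structures()]  # ["pdbB", "pdbC"]
with_scop = [m.pdb_id for m in summary.scop_eligible()]     # ["pdbC"]
```

Keeping the cutoffs as named constants means the 30% and 90% rules are stated once and applied the same way wherever matches are filtered.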

Recent Developments

iProClass supports an ID mapping service that maps gene and protein IDs (such as NCBI’s gi number and Entrez Gene ID) to UniProtKB identifiers. The service is accessible online and currently maps between UniProtKB identifiers and 30 other database identifiers. In the past year, iProClass has added links to more databases (such as MODBASE and IPI) and more executive summaries (such as pathway descriptions and up- or down-regulation of genes).
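
The idea behind such a mapping can be sketched as a two-way lookup table keyed on UniProtKB identifiers. The following Python example is a toy in-memory version under stated assumptions: the `IdMapper` class, its method names, and all identifiers shown are hypothetical, and it does not represent the iProClass service, its interface, or its data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IdMapper:
    """Toy two-way ID mapper (illustrative only; not the iProClass service)."""

    def __init__(self) -> None:
        # (database name, external ID) -> set of UniProtKB identifiers
        self._to_uniprot: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        # UniProtKB identifier -> {database name -> set of external IDs}
        self._from_uniprot: Dict[str, Dict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, uniprot_id: str, database: str, external_id: str) -> None:
        """Record one cross-reference in both directions."""
        self._to_uniprot[(database, external_id)].add(uniprot_id)
        self._from_uniprot[uniprot_id][database].add(external_id)

    def to_uniprot(self, database: str, external_id: str) -> List[str]:
        """Map an external ID to UniProtKB identifiers (empty list if unknown)."""
        return sorted(self._to_uniprot.get((database, external_id), set()))

    def from_uniprot(self, uniprot_id: str, database: str) -> List[str]:
        """Map a UniProtKB identifier to IDs in another database."""
        return sorted(self._from_uniprot.get(uniprot_id, {}).get(database, set()))


# Placeholder identifiers, made up for this example.
mapper = IdMapper()
mapper.add("UNIPROT_X1", "Entrez Gene", "gene-1001")
mapper.add("UNIPROT_X1", "gi", "gi-555")
mapper.add("UNIPROT_X2", "Entrez Gene", "gene-1001")  # one gene, two proteins

proteins = mapper.to_uniprot("Entrez Gene", "gene-1001")  # both UniProtKB entries
gi_ids = mapper.from_uniprot("UNIPROT_X1", "gi")          # ["gi-555"]
unknown = mapper.to_uniprot("gi", "gi-000")               # []
```

Storing sets on both sides allows for one-to-many relationships, such as a single gene ID that corresponds to several protein entries.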


Supported by NSF grants DBI-9974855 and DBI-0138188.


1. Huang, H., Barker, W.C., Chen, Y. and Wu, C.H. (2003) iProClass: An integrated database of protein family, function, and structure information. Nucleic Acids Research, 31, 390-392.
2. Wu, C.H., Huang, H., Nikolskaya, A., Hu, Z. and Barker, W.C. (2004) The iProClass integrated database for protein functional analysis. Computational Biology and Chemistry, 28, 87-96.
3. Huang, H., Nikolskaya, A.N., Vinayaka, C.R., Chung, S., Zhang, J. and Wu, C.H. (2005) Family classification and integrative associative analysis for protein functional annotation. Progress in Bioinformatics. Nova Science Publishers, Inc. (in press)

Go to the abstract in the NAR 2003 Database Issue.