NAR Molecular Biology Database Collection entry number 205
Petryszak, R., Binns, D., Fleischmann, W., Kersey, P., Apweiler, R.
EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Database Description

The CluSTr (Clusters of Swiss-Prot and TrEMBL proteins) database offers an automatic classification of UniProt Knowledgebase proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The analysis has been carried out at different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, PDB, SUPERFAMILY and PIR Superfamily. CluSTr is available online for querying and browsing.
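The idea of clustering at several similarity levels to obtain a hierarchy can be illustrated with a small sketch. It is not the CluSTr implementation: it assumes single-linkage clustering over precomputed pairwise scores, and the function names, protein identifiers, scores and thresholds are all hypothetical. Two proteins are linked when their score reaches the threshold, and raising the threshold splits clusters into nested sub-clusters.

```python
from collections import defaultdict


def cluster_at_threshold(proteins, scores, threshold):
    """Group proteins by single linkage (an assumed method, for illustration).

    `scores` maps a pair of protein identifiers to a similarity score;
    a pair is linked when its score is >= `threshold`.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    # Sorted lists give a deterministic, easily comparable result.
    return sorted(sorted(members) for members in groups.values())


def cluster_hierarchy(proteins, scores, thresholds):
    """Return the clusters found at each similarity level, loosest first."""
    return {t: cluster_at_threshold(proteins, scores, t) for t in sorted(thresholds)}


# Hypothetical identifiers and scores, invented for this example.
proteins = ["P1", "P2", "P3", "P4"]
scores = {
    ("P1", "P2"): 90,
    ("P1", "P3"): 35,
    ("P2", "P3"): 40,
    ("P3", "P4"): 15,
}
hierarchy = cluster_hierarchy(proteins, scores, thresholds=[20, 50])
# hierarchy[20] == [["P1", "P2", "P3"], ["P4"]]
# hierarchy[50] == [["P1", "P2"], ["P3"], ["P4"]]
```

Because a link that holds at a stricter threshold also holds at every looser one, each cluster at the higher level is contained in exactly one cluster at the lower level, which is what makes the result a hierarchy.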

Recent Developments

Currently, CluSTr contains information about proteins from 109 organisms with completely sequenced genomes. A hierarchy of clusters of related proteins (ordered by their degree of similarity) is available for the full data set. Additionally, hierarchies for the proteins of each individual species (and certain collections of species) are available. Proteomes represented in the database include 10 complete eukaryote proteomes (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Encephalitozoon cuniculi, Guillardia theta, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Rattus norvegicus).


The CluSTr project is funded by the TEMBLOR grant (QLRI-CT-2001-00015) of the European Commission under the RTD programme 'Quality of Life and Management of Living Resources'. The bulk of the protein similarity data in CluSTr has been produced at the SARA supercomputer centre (Amsterdam, The Netherlands) under the Protein World project, a collaboration between EBI, Gene-IT (Paris, France), Organon (Oss, The Netherlands) and the Netherlands Bioinformatics Centre (NBIC)/BioASP. The Protein World project has so far been funded by Organon and the Dutch National Computer Facility (NCF), and by NBIC since the end of 2004.


