
CluSTr


NAR Molecular Biology Database Collection entry number 205
Petryszak, R., Binns, D., Fleischmann, W., Kersey, P., Apweiler, R.
EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Database Description

The CluSTr (Clusters of Swiss-Prot and TrEMBL proteins) database provides an automatic classification of UniProt Knowledgebase (http://www.ebi.ac.uk/uniprot/) proteins into groups of related proteins. The clustering is based on an analysis of all pairwise comparisons between protein sequences. This analysis has been carried out at different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro (http://www.ebi.ac.uk/interpro/index.html), which integrates information on protein families, domains and functional sites from PROSITE (http://www.expasy.ch/prosite/), PRINTS (http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/), Pfam (http://www.sanger.ac.uk/Software/Pfam/index.shtml), ProDom (http://www.toulouse.inra.fr/prodom.html), SMART (http://smart.embl-heidelberg.de), TIGRFAMs (http://www.jcvi.org/cgi-bin/tigrfams/index.cgi), PDB (http://www.rcsb.org/pdb/), SUPERFAMILY (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/) and PIR Superfamily (http://pir.georgetown.edu/iproclass/). CluSTr is available for querying and browsing at http://www.ebi.ac.uk/clustr.
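This entry does not spell out the clustering algorithm. As an illustration only, the sketch below assumes single-linkage clustering over precomputed pairwise similarity scores (higher means more similar) and applies it at several score thresholds; the protein IDs, scores and thresholds are all hypothetical. It shows how clustering the same pairwise data at different similarity levels can yield a hierarchy of clusters.

```python
# Minimal sketch (assumption): single-linkage clustering of proteins at several
# similarity thresholds. Protein IDs, scores and thresholds are hypothetical.

def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def single_linkage(proteins, scores, threshold):
    """Join proteins whose pairwise score is >= threshold, transitively."""
    parent = {p: p for p in proteins}
    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters
    clusters = {}
    for p in proteins:
        clusters.setdefault(find(parent, p), set()).add(p)
    # Sorted tuples give a deterministic, comparable result.
    return sorted(tuple(sorted(c)) for c in clusters.values())

def hierarchy(proteins, scores, thresholds):
    """Cluster at each level, from the strictest (highest) threshold down."""
    return {t: single_linkage(proteins, scores, t)
            for t in sorted(thresholds, reverse=True)}

def is_nested(fine, coarse):
    """True if every cluster in `fine` lies inside one cluster of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)

# Hypothetical pairwise similarity scores (higher = more similar).
proteins = ["P1", "P2", "P3", "P4", "P5"]
scores = {("P1", "P2"): 50.0, ("P2", "P3"): 20.0,
          ("P3", "P4"): 12.0, ("P4", "P5"): 5.0}

levels = hierarchy(proteins, scores, [10, 15, 40])
for t, clusters in levels.items():
    print(t, clusters)
```

Because every pair that passes a strict threshold also passes a looser one, each cluster at a higher level falls entirely inside a single cluster at a lower level; that containment is what lets the levels be read as a hierarchy.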

Recent Developments

Currently, CluSTr contains information about proteins from 109 organisms with completely sequenced genomes. A hierarchy of clusters of related proteins, ordered by degree of similarity, is available for the full data set. Hierarchies are also available for the proteins of each individual species and for certain collections of species. The database includes 10 complete eukaryotic proteomes: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Encephalitozoon cuniculi, Guillardia theta, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Rattus norvegicus.
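A per-species hierarchy is not necessarily the full-set hierarchy with other species' proteins removed. The sketch below illustrates this under the assumption that clusters are formed by single linkage (this entry does not confirm the method): two hypothetical human proteins, H1 and H2, are joined in the full data set only through a hypothetical mouse protein, M1, so clustering the human proteins on their own keeps them apart. All IDs, scores and the threshold are invented for illustration.

```python
# Minimal sketch (assumption): single-linkage clustering restricted to a
# subset of proteins. Protein IDs, species tags and scores are hypothetical.

def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def single_linkage(proteins, scores, threshold):
    """Cluster `proteins`, using only pairs whose members are both in the set."""
    members = set(proteins)
    parent = {p: p for p in members}
    for (a, b), score in scores.items():
        if a in members and b in members and score >= threshold:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[ra] = rb
    clusters = {}
    for p in members:
        clusters.setdefault(find(parent, p), set()).add(p)
    return sorted(tuple(sorted(c)) for c in clusters.values())

species_of = {"H1": "Homo sapiens", "H2": "Homo sapiens", "M1": "Mus musculus"}
# H1 and H2 are each similar to M1 but only weakly similar to each other.
scores = {("H1", "M1"): 30.0, ("M1", "H2"): 25.0, ("H1", "H2"): 8.0}
THRESHOLD = 20.0

full = single_linkage(list(species_of), scores, THRESHOLD)
# Full-set clusters with non-human members dropped afterwards.
full_human_only = [tuple(p for p in c if species_of[p] == "Homo sapiens")
                   for c in full]
# Clusters computed from the human proteins alone.
human_ids = [p for p, s in species_of.items() if s == "Homo sapiens"]
human = single_linkage(human_ids, scores, THRESHOLD)

print(full)             # [('H1', 'H2', 'M1')]
print(full_human_only)  # [('H1', 'H2')]
print(human)            # [('H1',), ('H2',)]
```

Under this assumption, filtering the full hierarchy would wrongly keep H1 and H2 together, whereas recomputing the clusters on the species subset reflects only the similarities among that species' own proteins.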

Acknowledgements

The CluSTr project is funded by the TEMBLOR grant (QLRI-CT-2001-00015) of the European Commission under the RTD programme 'Quality of Life and Management of Living Resources'. The bulk of the protein similarity data in CluSTr was produced at the SARA supercomputer centre (Amsterdam, The Netherlands) under the Protein World project, a collaboration between the EBI, Gene-IT (Paris, France), Organon (Oss, The Netherlands) and the Netherlands Bioinformatics Centre (NBIC)/BioASP. The Protein World project has been funded by Organon and the Dutch National Computer Facility (NCF), and by NBIC since the end of 2004.

References

1. Kriventseva, E.V., Fleischmann, W. and Apweiler, R. (2001) CluSTr: a database of Clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res., 29, 33-36.
2. Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.
3. Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D.R. et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37-40.
4. Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999) The PROSITE database, its status in 1999. Nucleic Acids Res., 27, 215-219.
5. Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J.N. and Wright, W. (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res., 28, 225-227.
6. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L. and Sonnhammer, E.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263-266.
7. Corpet, F., Servant, F., Gouzy, J. and Kahn, D. (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res., 28, 267-269.
8. Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P. and Bork, P. (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res., 28, 229-232.
9. Holm, L. and Sander, C. (1999) Protein folds and families: sequence and structure alignments. Nucleic Acids Res., 27, 244-247.
10. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.


Go to the abstract in the NAR 2003 Database Issue.