NAR Molecular Biology Database Collection entry number 776
Apweiler R.1, Bairoch A.2 and Wu C.H.3,4
1 EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
2 Swiss Institute of Bioinformatics and Department of Structural Biology and Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
3 Department of Biochemistry and Molecular Biology, and 4 Protein Information Resource, Georgetown University Medical Center, Suite 1200, 3300 Whitehaven Street NW, Washington, DC 20007, USA

Database Description

The UniProt Reference Clusters (UniRef) are three separate datasets that compress sequence space at different resolutions by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless of source organism. UniRef100 provides the most comprehensive non-redundant coverage of the known protein sequence space: it includes not only all of UniProtKB but also splice variants that are not stored as separate entries in UniProtKB, as well as additional active sequences from UniParc. UniRef90 and UniRef50 provide a more even sampling of sequence space by reducing the number of closely related sequences, which speeds up sequence similarity searches and makes their results more informative. Compressing UniRef100 into UniRef90 and UniRef50 reduces the database size by approximately 40% and 65%, respectively.
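The idea of merging sequences at different identity thresholds can be illustrated with a small sketch. This is not the UniRef production procedure, which is not described here; it is a toy greedy clustering under stated assumptions. The `identity` function is a crude stand-in for alignment-based identity (matched residues divided by the length of the shorter sequence), and the names `identity`, `greedy_cluster`, `size_reduction` and the example sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity estimate (assumption): residues in common matching
    blocks divided by the length of the shorter sequence, so that an exact
    sub-sequence of a longer sequence scores 1.0."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(sequences: list[str], threshold: float) -> list[list[str]]:
    """Greedy incremental clustering: process sequences from longest to
    shortest; each sequence joins the first cluster whose representative
    (its first, longest member) meets the threshold, else starts a new one."""
    clusters: list[list[str]] = []
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def size_reduction(n_before: int, n_after: int) -> float:
    """Fractional reduction in the number of entries after clustering."""
    return 1 - n_after / n_before


# Made-up sequences, for illustration only.
EXAMPLE_SEQUENCES = [
    "MKTAYIAKQRQISFVKSHFSRQ",  # full-length sequence
    "AYIAKQRQISFVK",           # exact sub-sequence of the first
    "MKTAYIAKQRQISFVKSHFSRE",  # one substitution relative to the first
    "MKTAYIAKQRQIWWWWWWWWWW",  # shares only the first 12 residues
    "GGGGGGGGGGPPPPPPPPPPLL",  # unrelated
]

if __name__ == "__main__":
    baseline = len(greedy_cluster(EXAMPLE_SEQUENCES, 1.0))
    for threshold in (1.0, 0.9, 0.5):
        n = len(greedy_cluster(EXAMPLE_SEQUENCES, threshold))
        print(f"threshold {threshold:.0%}: {n} clusters, "
              f"reduction vs 100% level: {size_reduction(baseline, n):.0%}")
```

Lowering the threshold merges more sequences into each cluster, which is the sense in which the 90% and 50% levels sample sequence space more evenly. The reductions obtained on this toy set say nothing about the approximate 40% and 65% figures quoted above, which depend on the actual database content.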


UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U01 HG02712-01. Minor support for the EBI's involvement in UniProt comes from two European Union contracts, BioBabel (QLRT-2000-00981) and TEMBLOR (QLRI-2001-00015), and from the NIH grant 1R01HG02273-01. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the National Science Foundation (NSF) grants DBI-0138188 and ITR-0205470.


Wu, C., Bairoch, A., Apweiler, R., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Mazumder, R., O'Donovan, C., Redaschi, N. (2006). The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34: Database issue (in press).

See also the abstract in Bioinformatics, 2005, 22, 4133-4139: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/10/1282