The UniProt Reference Clusters (UniRef) are three separate datasets that compress sequence space at different resolutions. Each is built by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless of source organism. The UniRef100 database provides the most comprehensive non-redundant coverage of the known protein sequence space: it includes not only all of UniProtKB but also splice variants that are not given separate entries there, as well as additional active sequences from UniParc. The UniRef90 and UniRef50 databases provide a more even sampling of sequences by reducing the number of closely related sequences. This speeds up sequence similarity searches and makes their results more informative. Compressing UniRef100 into UniRef90 and UniRef50 reduces the size of the dataset by approximately 40% and 65%, respectively.
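The merging of identical sequences and sub-sequences described for UniRef100 can be illustrated with a minimal sketch. This is not the UniProt production procedure: the function name `merge_identical_and_subsequences`, the sequence identifiers, and the toy sequences are all hypothetical, and the sketch covers only exact identity and containment, not the >=90% or >=50% identity thresholds, which require an alignment-based identity measure.

```python
from typing import Dict, List


def merge_identical_and_subsequences(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sequences that are identical to, or contained in, a representative.

    Hypothetical helper for illustration only; source organism is ignored,
    as in the clustering described in the text.
    """
    # Process the longest sequences first so that a representative is always
    # at least as long as its members. Ties are broken by ID for determinism.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in ordered:
        for rep_id in clusters:
            # Exact identity and sub-sequence containment are both covered
            # by a substring test against the representative.
            if seq in seqs[rep_id]:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No existing representative contains this sequence.
            clusters[seq_id] = [seq_id]
    return clusters


# Toy input (invented sequences, assumed for the example).
example = {
    "human_P1": "MKTAYIAKQRQISFVK",
    "mouse_P1": "MKTAYIAKQRQISFVK",  # identical sequence, different organism
    "fragment": "AYIAKQRQ",          # sub-sequence of the entries above
    "unrelated": "MSLLTEVETPIRNEW",
}
clusters = merge_identical_and_subsequences(example)
```

In this toy case the four input entries collapse into two clusters: one holding the two identical sequences together with the fragment, and one holding the unrelated sequence. The all-against-representatives substring scan is quadratic and is chosen here for readability, not for database-scale use.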


UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U01 HG02712-01. Minor support for the EBI's involvement in UniProt comes from the two European Union contracts BioBabel (QLRT-2000-00981) and TEMBLOR (QLRI-2001-00015) and from the NIH grant 1R01HG02273-01. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the National Science Foundation (NSF) grants DBI-0138188 and ITR-0205470.


