NAR Molecular Biology Database Collection entry number 915
Liu H. (1), Hu Z.Z. (2) and Wu C.H. (2)
(1) University of Maryland at Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
(2) Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20057, USA

Database Description

BioThesaurus is a web-based system that maps a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase (UniProtKB). Currently covering more than two million protein sequences, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biology databases according to the database cross-references provided in iProClass (Wu et al., 2004). The BioThesaurus web site supports two kinds of query: retrieving the synonymous names of a given protein entry, and identifying the protein entries that share a given name. The BioThesaurus dataset can also be used for automatic protein named-entity recognition. It is updated monthly and is freely available for download.
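The two lookups described above (names for an entry, and entries for a name) can be illustrated with a small in-memory sketch. This is not the BioThesaurus implementation or its file format: the class NameThesaurus, the normalize helper, the accession strings and the (accession, name) record layout are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


class NameThesaurus:
    """Toy two-way index between entry accessions and names (hypothetical)."""

    def __init__(self, records):
        # accession -> names as originally written
        self.names_by_entry = defaultdict(set)
        # normalized name -> accessions carrying that name
        self.entries_by_name = defaultdict(set)
        for accession, name in records:
            self.names_by_entry[accession].add(name)
            self.entries_by_name[normalize(name)].add(accession)

    def synonyms(self, accession):
        """All names recorded for one entry."""
        return sorted(self.names_by_entry.get(accession, ()))

    def entries_sharing(self, name):
        """All entries that carry the given name (after normalization)."""
        return sorted(self.entries_by_name.get(normalize(name), ()))

    def ambiguous_names(self):
        """Normalized names that map to more than one entry."""
        return sorted(
            name for name, accs in self.entries_by_name.items() if len(accs) > 1
        )


# Made-up records: the accessions are placeholders, not real UniProtKB entries.
RECORDS = [
    ("ACC0001", "tumor protein p53"),
    ("ACC0001", "TP53"),
    ("ACC0002", "TP53"),
    ("ACC0002", "cellular tumor antigen p53"),
    ("ACC0003", "insulin"),
]

thesaurus = NameThesaurus(RECORDS)
print(thesaurus.synonyms("ACC0001"))
print(thesaurus.entries_sharing("tp53"))
print(thesaurus.ambiguous_names())
```

Keeping both directions indexed makes either query a single dictionary lookup. The ambiguous_names method lists the names shared by several entries, which are the cases a named-entity recognition system would need to disambiguate.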


The BioThesaurus project is supported by grants DBI-0138188 and ITR-0205470 from the National Science Foundation, and in part by grant U01-HG02712 from the National Institutes of Health, USA.


Liu H, Hu ZZ, Zhang J, Wu C. (2005) BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics. 2005 Nov 2.

Go to the abstract in Bioinformatics, 2005, 22, 103-105
