HUGE - Human Unidentified Gene-Encoded large proteins

NAR Molecular Biology Database Collection entry number 171

Database Description

HUGE is a database of large human proteins newly identified in the Kazusa cDNA project, which aims to predict the primary structure of proteins from the sequences of large human cDNAs (>4 kb). In particular, we have focused on brain-derived cDNA clones capable of encoding large proteins (>50 kDa). HUGE contains more than 1600 cDNA sequences, together with the results of computer analysis of the sequences at the DNA and amino acid levels, their chromosomal mapping and their expression profiles. Because the risk of cloning artifacts is high for long cDNAs, we examined the protein-coding potential of every sequence by GeneMark analysis to obtain accurate cDNA sequence data. Whenever GeneMark issued a warning of coding interruption, we performed additional experiments using the reverse transcription-coupled polymerase chain reaction (RT-PCR) method to detect cloning artifacts, and then corrected the sequences. Such revisions were made to 220 sequences, 13% of all cDNA sequences in HUGE. In addition to periodically increasing the number of cDNA entries, we have added detailed information on this revision process to HUGE. HUGE is available through the World Wide Web at

Subcategory: Human ORFs

