NAR Molecular Biology Database Collection entry number 911
Pruitt, Kim; Farrell, Catherine; O'Leary, Nuala; Harte, Rachel; Loveland, Jane; Wilming, Laurens; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen; Aken, Bronwen; Hiatt, Susan; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles; Brown, Garth; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian; Hunt, Toby; Webb, David; Tamez, Pamela; Rangwala, Sanjida; McGarvey, Kelly; Mudge, Jonathan; Pujar, Shashikant; Shkeda, Andrei; Gonzalez, Jose; Gilbert, James G. R.; Trevanion, Stephen; Baertsch, Robert; Harrow, Jennifer; Ostell, James; Haussler, David; Hubbard, Tim
1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
2 University of California, Santa Cruz, CBSE/ITI-UCSC, Ste 501 E2 Bld., Santa Cruz, CA 95064, USA
3 Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
4 European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK

Database Description

The Consensus CoDing Sequence (CCDS) project identifies the genomic positions of protein-coding regions of human and mouse genes that are annotated consistently by the collaborating groups and are supported by transcript evidence, canonical splice sites, and other quality assurance measures.
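One of the quality checks named above, the use of canonical splice sites, can be illustrated with a short Python sketch. It assumes that "canonical" means the common GT...AG intron rule (GT at the donor site, AG at the acceptor site) and that exons are given as 0-based, half-open intervals on the plus strand. The helper names `intron_sequences` and `has_canonical_splice_sites` are hypothetical; this is not the CCDS project's actual quality assurance code.

```python
from typing import List, Tuple

# Assumed representation: exons as 0-based, half-open (start, end)
# intervals on the plus strand, sorted by start coordinate.
Exon = Tuple[int, int]


def intron_sequences(genome: str, exons: List[Exon]) -> List[str]:
    """Return the intron sequences lying between consecutive exons."""
    return [
        genome[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def has_canonical_splice_sites(genome: str, exons: List[Exon]) -> bool:
    """True if every intron starts with GT and ends with AG (assumed rule).

    A single-exon structure has no introns and therefore passes trivially.
    """
    return all(
        len(intron) >= 4
        and intron.upper().startswith("GT")
        and intron.upper().endswith("AG")
        for intron in intron_sequences(genome, exons)
    )
```

For example, with the toy genome `"ATGAAA" + "GTCCCCAG" + "TTTTAA"` and exons `[(0, 6), (14, 20)]`, the single intron is `GTCCCCAG`, so the check passes; changing its first two bases to `GC` makes it fail under this strict GT...AG assumption.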

The CCDS set is calculated after a coordinated whole-genome annotation update by the collaborating annotation groups: NCBI, EBI, and WTSI. The CCDS set includes only those protein-coding annotations that agree precisely on the placement of every exon across the entire coding region. UCSC provides independent and intensive quality control procedures, augmented by work from the other groups. Once a coding gene structure is agreed upon, it is maintained in future releases unless there is good evidence to modify or remove it. The collaboration has improved the groups' independent annotation methods (both computational analysis and manual annotation), and we expect the CCDS set to continue growing as more gene structures and more organisms are included in the collection.
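The exact-agreement rule can be sketched as a set intersection: a coding structure qualifies only if every group submitted an identical one, so a single shifted exon boundary excludes it. This is a minimal Python illustration under assumed names (`consensus_cds`, the `CDS` tuple layout, and the group keys are all hypothetical). It does not model UCSC's quality control or the tracking of agreed structures across releases.

```python
from typing import Dict, Set, Tuple

# Assumed model: a CDS structure is (chromosome, strand, exons), where exons
# is a tuple of (start, end) coding intervals sorted by start coordinate.
CDS = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def consensus_cds(annotations: Dict[str, Set[CDS]]) -> Set[CDS]:
    """Keep only CDS structures that every group annotated identically."""
    groups = list(annotations.values())
    if not groups:
        return set()
    common = set(groups[0])
    for structures in groups[1:]:
        # Tuple equality means every exon boundary must match exactly.
        common &= structures
    return common
```

Because structures are compared as whole tuples, a model that differs from another group's only by a few bases at one exon edge is dropped from the consensus rather than partially merged.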

The long-term goal of the CCDS project is to converge toward a full set of standard gene annotations for finished, high-quality genomes. We anticipate that the CCDS set will become more complete through continued manual review, experimental validation of weakly supported genes, and further improvement of automatic annotation methods.

The CCDS web site provides information about CCDS and sequence identifiers, the CDS data as both coordinates and sequence, and links to genome browsers. The WTSI, UCSC, EBI and NCBI web sites indicate which genome annotations have CCDS status.
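The relationship between the coordinate and sequence forms of a CDS can be sketched as follows: coding intervals are spliced in genomic order, and minus-strand structures are reverse-complemented. The sketch assumes 0-based, half-open coordinates and a plain genome string; `cds_sequence` is a hypothetical helper, not part of any CCDS tool, and the coordinate convention of the actual CCDS files should be checked before reuse.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (case preserved, N kept as N).
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def cds_sequence(genome: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Splice 0-based, half-open coding intervals into a CDS string."""
    # Concatenate the coding intervals in ascending genomic order.
    spliced = "".join(genome[start:end] for start, end in sorted(exons))
    if strand == "-":
        # Minus-strand genes are read on the reverse complement.
        spliced = spliced.translate(_COMPLEMENT)[::-1]
    elif strand != "+":
        raise ValueError(f"unknown strand: {strand!r}")
    return spliced
```

For the plus-strand toy structure `"ATGAAAGTCCCCAGTTTTAA"` with exons `[(0, 6), (14, 20)]`, the spliced CDS is `ATGAAATTTTAA`; the same coding sequence placed on the minus strand would be stored as `TTAAAATTTCAT` and recovered by reverse-complementing.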

Recent Developments

The scope has been expanded to include mouse genome annotation, and an update for the human genome is underway.

Go to the article in the NAR Database issue.