NAR Molecular Biology Database Collection entry number 69
Rudd K.

Database Description

The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases and is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene13 release (October 2000) includes 4295 genes. EcoGene13 differs from the GenBank annotation of the complete genome sequence (accession U00096) in several ways, including (1) the revision of 717 predicted or confirmed gene start sites, (2) the correction or hypothetical reconstruction of 61 frameshifts caused by either sequence error or mutation, (3) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (4) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 810 proteins whose N-terminal amino acids have been verified by protein sequencing. The database provides 21,899 gene-citation cross-references to 8,552 literature citations, with links to Medline abstracts. EcoGene is accessible through a WWW interface: users can search and retrieve information from a set of 4295 individual EcoGene GenePage HTML files. Alternatively, users can download complete text-based genome-scale datasets, including DNA sequences, protein sequences, gene positions in the genome sequence, database cross-references and descriptive annotation. These datasets can be easily imported into database management systems and should facilitate various genome-scale computational and functional analysis projects. In addition, a database of intergenic DNA sequences is provided, including genome positions for a number of novel or previously characterized members of intergenic repeat DNA families.
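
As an illustration of importing a text-based dataset into a database management system, the following minimal Python sketch loads a tab-delimited gene position table into SQLite and runs a coordinate query. The column layout (name, left end, right end, strand), the table name gene_position, and the three sample rows are assumptions made for this example; they are placeholders and do not reproduce the actual EcoGene file format or any real EcoGene records.

```python
import csv
import io
import sqlite3

# Placeholder rows in a hypothetical tab-delimited layout:
# name, left_end, right_end, strand. These are NOT real EcoGene records.
SAMPLE_TSV = (
    "geneA\t100\t400\t+\n"
    "geneB\t650\t1200\t-\n"
    "geneC\t1500\t1800\t+\n"
)


def load_gene_positions(conn, handle):
    """Create a gene_position table and fill it from a tab-delimited stream."""
    conn.execute(
        "CREATE TABLE gene_position ("
        "name TEXT PRIMARY KEY, left_end INTEGER, right_end INTEGER, strand TEXT)"
    )
    rows = [
        (name, int(left), int(right), strand)
        for name, left, right, strand in csv.reader(handle, delimiter="\t")
    ]
    conn.executemany("INSERT INTO gene_position VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def genes_overlapping(conn, start, end):
    """Return names of genes whose interval overlaps [start, end], in map order."""
    cursor = conn.execute(
        "SELECT name FROM gene_position "
        "WHERE left_end <= ? AND right_end >= ? ORDER BY left_end",
        (end, start),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
loaded = load_gene_positions(conn, io.StringIO(SAMPLE_TSV))
hits = genes_overlapping(conn, 300, 700)
print(loaded, hits)
```

To work with a downloaded file instead of the inline sample, the io.StringIO object could be replaced by an open file handle, with the column unpacking adjusted to match the layout of the file actually retrieved.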

