European Nucleotide Archive

NAR Molecular Biology Database Collection entry number 2
Silvester, Nicole; Alako, Blaise; Amid, Clara; Cerdeño-Tárraga, Ana; Cleland, Iain; Gibson, Richard; Goodgame, Neil; Kay, Simon; Leinonen, Rasko; Li, Weizhong; Liu, Xin; Lopez, Rodrigo; Pakseresht, Nima; Reddy, Kethi; Plaister, Sheila; Radhakrishnan, Rajesh; Rosello, Marc; Senf, Alexander; Smirnov, Dimitriy; Ten Hoopen, Petra; Toribio, Ana; Vaughan, Daniel; Zalunin, Vadim; Cochrane, Guy
EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Database Description

The European Nucleotide Archive (ENA) provides a comprehensive repository for public nucleotide sequence data, attracting external users from a multitude of research disciplines and serving as underlying data infrastructure for services such as Ensembl, UniProt and ArrayExpress. The foundation for the ENA was the EMBL Data Library, which was established at EMBL Heidelberg in the early 1980s. While this component continues to be operated to this day, the mandate of the ENA has expanded enormously as sequencing technology has advanced and the range of applications of sequencing has grown. In recent years, for example, we have launched the Sequence Read Archive (for raw data from next-generation sequencing platforms) and have taken responsibility for the operation of the existing European Trace Archive (for raw capillary sequence data), which was previously operated by the Wellcome Trust Sanger Institute.
Broadly, ENA captures and presents the full range of sequencing information, from raw data, through assembly and mapping information that relates highly fragmented raw sequence reads to contigs and higher-order structures, to high-level interpretations of the function of parts of nucleic acid molecules in the form of functional annotation. The ENA achieves comprehensive coverage through partnership with the other global bioinformatics service providers, namely NCBI in the US and DDBJ in Japan. The longest-running ENA collaboration, the International Nucleotide Sequence Database Collaboration (INSDC), has been underway for over a quarter of a century and now serves as a model for data sharing in the life sciences.

Recent Developments

We have introduced important new functionality to ENA submission tools and pipelines. A template-based system has been launched that provides the user with a simple interface tailored to the type of annotation that they wish to report. For submissions of next-generation sequence data to the Sequence Read Archive, we have implemented an interactive submission form for the manual upload of small-scale datasets and a web service for automated submissions.
We have also focussed on data presentation and have launched a web browser that allows users to access all ENA data, from raw reads to annotations, in a single integrated system. The browser includes identifier-based retrieval and text search and will include comprehensive sequence similarity search early in 2010. Programmatic access is supported through RESTful services, and components can be embedded in third-party resources through HTTP and AJAX.
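As an illustration of identifier-based retrieval over a RESTful interface, the following minimal Python sketch builds a request URL from an accession and parses a FASTA response. The base URL, the path layout (format followed by accession) and the function names are assumptions made for this example and are not specified in this entry; the current ENA documentation should be consulted for the actual endpoints.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: this base URL and the "<format>/<accession>" path layout are
# illustrative only; they are not given in the database description.
BASE_URL = "https://www.ebi.ac.uk/ena/browser/api"


def build_record_url(accession, fmt="fasta", base_url=BASE_URL):
    """Build a retrieval URL for one accession (hypothetical path layout)."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    # Percent-encode both path segments so unusual characters stay safe.
    return f"{base_url.rstrip('/')}/{quote(fmt, safe='')}/{quote(accession, safe='')}"


def fetch_record(accession, fmt="fasta", timeout=30):
    """Fetch one record as text; performs a network request when called."""
    with urlopen(build_record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Under these assumptions, a call such as parse_fasta(fetch_record(accession)) would return the sequences for a given accession keyed by their FASTA headers. URL construction is kept separate from the network call so that the request can be inspected or redirected to a different service without changing the parsing code.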


ENA is supported by the European Molecular Biology Laboratory and the Wellcome Trust.

