NAR Molecular Biology Database Collection entry number 338
Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Garcia Lara, G., Oezcimen, A., Sansone, S., Rocca-Serra, P.
European Bioinformatics Institute, EMBL-EBI Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
ArrayExpress is a new public repository for microarray based gene expression data, which implements the Minimum Information About a Microarray Experiment (MIAME), a microarray data annotation standard, and the Microarray Gene Expression Markup Language (MAGE-ML), a data exchange format developed by the Microarray Gene Expression Data (MGED) society (http://www.mged.org) and the Object Management Group (OMG, http://www.omg.org). ArrayExpress has three major goals: (i) to serve the scientific community as a repository for data that support publications, (ii) to provide the community with easy access to high quality gene expression data in a standard format, and (iii) to facilitate the sharing of microarray designs and experimental protocols. ArrayExpress accepts three types of submissions: arrays, experiments, and protocols (including experimental and data processing protocols). Each of these can be submitted separately and is assigned a unique accession number, which can later be used as a reference, either within the database or externally. A journal publication may use ArrayExpress accession numbers to refer to its supporting data. There are two data submission routes to ArrayExpress: (i) directly via MAGE-ML files, or (ii) via a web-based submission interface, MIAMExpress. As generation of MAGE-ML format data requires both a local Laboratory Information Management System (LIMS) and informatics support, this route is best suited for projects that have the necessary infrastructure. Currently, a MIAME-compliant MAGE-ML based pipeline has been established with the Wellcome Trust Sanger Institute. Other similar pipelines, including ones from TIGR, Affymetrix, BASE, J-Express, and NCI, are under testing or construction. MIAMExpress is a web-based tool that allows users to annotate a submission either during the experiment or upon its completion.
The current MIAMExpress Version 1.0 is a generic annotation tool, suitable for annotation of any microarray gene expression experiment, irrespective of organism or type. To use MIAMExpress, users need only an internet browser. The user creates an account and is presented with a series of web forms, which include a combination of drop-down fields (with appropriate controlled vocabularies) and free format text fields, to annotate the experiment. Tab-delimited data files are uploaded from the user's local computer and linked to the experiment submission. Arrays and protocols can also be submitted via MIAMExpress and can be linked to multiple experiments. Help is available from the curation team throughout the submission, and contextual help is provided within the interface. Throughout the submission process the data are stored in a submission database; they are subsequently curated and then exported to ArrayExpress. ArrayExpress has been accepting data submissions since February 2002. With an increasing number of microarray vendors and laboratories adopting the MAGE-ML and MIAME standards, the volume of submissions to ArrayExpress is growing rapidly. Data access and retrieval are performed through a dedicated web interface allowing case-insensitive searches on fields such as Experiment, Species, Author, Organization, Array or Accession number. Relevant results may then be exported to Expression Profiler, the EBI web-based expression analysis tool. Finally, as the MAGE-ML standard spreads throughout the microarray community, ArrayExpress aims to become a cornerstone of microarray data exchange and mining.
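As an illustration of the case-insensitive field searches described above, the following is a minimal Python sketch and not the actual ArrayExpress implementation. The record class, the `search` helper, and the accession values such as `E-DEMO-1` are hypothetical placeholders introduced only for this example; the field names follow the searchable fields listed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    # Hypothetical record layout; fields mirror the searchable fields
    # named in the text (Experiment, Species, Author, Organization,
    # Array, Accession number).
    accession: str
    experiment: str
    species: str
    author: str
    organization: str
    array: str


def search(records, field, term):
    """Return the records whose `field` contains `term`, ignoring case."""
    needle = term.casefold()
    return [r for r in records if needle in getattr(r, field).casefold()]


# Made-up example data; these accession numbers are placeholders.
RECORDS = [
    ExperimentRecord("E-DEMO-1", "Yeast cell cycle time course",
                     "Saccharomyces cerevisiae", "Doe",
                     "Example Institute", "A-DEMO-1"),
    ExperimentRecord("E-DEMO-2", "Liver response to a toxic compound",
                     "Rattus norvegicus", "Roe",
                     "Example Lab", "A-DEMO-2"),
]

hits = search(RECORDS, "species", "SACCHAROMYCES")
print([r.accession for r in hits])
```

The sketch compares case-folded strings so that a query such as "SACCHAROMYCES" matches "Saccharomyces cerevisiae"; a production interface would more likely delegate this to the database query layer.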
ArrayExpress is an ongoing project, and current developments focus on improving the query interface to exploit the full power of the MAGE-OM model. In particular, gene-centric queries combining data from several experiments will provide cross-platform analysis possibilities. ArrayExpress will be fully integrated with the relevant databases at the EBI, and queries combining information from different databases will be possible. The ontology developed by the MGED Ontology Working Group will be incorporated into future ArrayExpress query interfaces where possible. Future releases of MIAMExpress will incorporate terms from the MGED ontology and will also be used as a source of terms for the ontology. In addition, MIAMExpress will provide interfaces specific to a species or research area (e.g., toxicogenomics), thus simplifying submissions of these data. Currently we are developing toxicogenomics-specific and plant-specific interfaces as part of collaborative projects. We intend to extend this to other areas, for example those required by model organisms, where existing ontologies or controlled vocabularies will be used within the interface. The infrastructure for data sharing is based on the adoption of the MAGE-ML data exchange format by the community, a process which is gathering momentum. In future, as microarray LIMS come to support MIAME and are able to export MAGE-ML, data submission to central repositories will become simpler. MAGE-ML is also an obvious candidate as a data exchange format between public repositories such as GEO at NCBI or CIBEX, currently under development at DDBJ. Moreover, the availability of common experimental and data processing protocols (described in a standard format) will encourage common laboratory practices. This, in turn, will serve to improve the comparability of datasets generated in different laboratories.
In addition to the software-related efforts described here, we are actively working with experimental centres and consortia to generate high quality MIAME-compliant data. Examples of these include the toxicogenomics project coordinated by ILSI (http://www.ilsi.org), which is producing cross-platform gene expression data on the effects of various toxic compounds, and the cancer profiling project by the International Genomics Consortium (IGC), which intends to screen thousands of tumour samples and deposit the data in ArrayExpress. The ArrayExpress team is interested in collaborating with all potential data providers and array manufacturers to establish direct MAGE-ML based pipelines for data and array design submissions to the database.
The ArrayExpress project is funded by EMBL, the European Commission (TEMBLOR grant), the EBI Industry Programme (Biostandards), and the International Life Sciences Institute (ILSI/HESI) toxicogenomics database grant. Initial funding was provided by Incyte and we particularly thank Lee Grower. The authors would like to thank Rob Andrews, Jurg Bahler and Kate Rice (Sanger Institute), John Quackenbush and Joe White (TIGR), Paul Spellman (University of California at Berkeley), and Steve Chervitz (Affymetrix), all of whom have generously provided their datasets and/or array designs in MAGE-ML format. We thank Tom Freeman (UK MRC-HGMP) for testing the MIAMExpress prototype. We acknowledge Jason Stewart (Open Informatics) for coordinating the development of the open source tools for processing MAGE-ML. We would also like to thank the MGED members and the entire EBI Microarray Informatics Team.
- Brazma,A., Hingamp,P., Quackenbush,J., Sherlock,G., Spellman,P., Stoeckert,C., Aach,J., Ansorge,W., Ball,C.A., Causton,H.C., Gaasterland,T., Glenisson,P., Holstege,F.C.P., Kim,I.F., Markowitz,V., Matese,J.C., Parkinson,H., Robinson,A., Sarkans,U., Schulze-Kremer,S., Stewart,J., Taylor,R., Vilo,J. and Vingron,M. (2001) Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nature Genetics, 29, 365-371.
- Spellman,P.T., Miller,M., Stewart,J., Troup,C., Sarkans,U., Chervitz,S., Bernhart,D., Sherlock,G., Ball,C., Lepage,M., Swiatek,M., Marks,W.L., Goncalves,J., Markel,S., Iordan,D., Shojatalab,M., Pizarro,A., White,J., Hubley,R., Deutsch,E., Senger,M., Aronow,B.J., Robinson,A., Bassett,D., Stoeckert Jr,C.J. and Brazma,A. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biology, 3(9), research0046.1-0046.9.
- Vilo,J., Kapushesky,M., Kemmeren,P., Sarkans,U. and Brazma,A. (expected 2003) Expression Profiler. In Parmigiani,G., Garrett,E.S., Irizarry,R. and Zeger,S.L. (eds.), The analysis of gene expression data: methods and software, in press, Springer-Verlag.
- Edgar,R., Domrachev,M. and Lash,A. (2002) Gene Expression Omnibus: NCBI gene expression and hybridisation array data repository. Nucleic Acids Res., 30(1), 207-210.
- Robinson,D.E., Pettit,S.D. and Morgan,D.G. (2002) Use of Genomics in Mechanism Based Risk Assessment. In Inoue,T., Pennie,W.D. (eds.), Toxicogenomics, Springer-Verlag, Tokyo, pp.194-203.
- Knight,J. (2001) Cancer comes under scrutiny in fresh genomics initiative. Nature, 410, 855.
Go to the abstract in the NAR 2009 Database Issue.