

NAR Molecular Biology Database Collection entry number 1381
Lynn M. Schriml1, Cesar Arze1, Suvarna Nadendla1, Anu Ganapathy1, Victor Felix1, Anup Mahurkar1, Katherine Phillippy2, Aaron Gussman1,2, Sam Angiuoli1, Elodie Ghedin3, Owen White1 and Neil Hall4
1Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore,
2National Center for Biotechnology Information, Bethesda, MD,
3University of Pittsburgh School of Medicine, Division of Infectious Diseases, Pittsburgh, PA, USA and
4University of Liverpool, School of Biological Sciences, UK

Database Description

The Gemina system identifies, standardizes and integrates the outbreak metadata for the breadth of NIAID category A–C viral and bacterial pathogens, thereby providing an investigative and surveillance tool describing the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] for each pathogen. The Gemina database (1) will provide a greater understanding of the interactions of viral and bacterial pathogens with their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development (2), metadata curation, representative genomic sequence identification and standards development. The Gemina web interface provides metadata selection and retrieval of a pathogen's Infection Systems (Pathogen, Host, Disease, Transmission Method and Anatomy) and Incidents (Location and Date), along with a host's Age and Gender. The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease. It is anchored on the taxonomic IDs of the pathogen and host, and serves to identify the breadth of hosts and diseases known for these pathogens, the extent of outbreak locations, and unique genomic regions with the DNA Signature Insignia Detection Tool (3).
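To make the relationship between Infection Systems and Incidents concrete, the following is a minimal Python sketch of how such records could be modelled and queried by taxonomic ID. It is not the Gemina schema: the class names, field names and helper functions are assumptions chosen for illustration, and the example records are invented rather than taken from the database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class InfectionSystem:
    """One pathogen/host/disease combination (hypothetical field names)."""
    pathogen_taxon_id: int    # taxonomic ID of the pathogen
    host_taxon_id: int        # taxonomic ID of the host
    disease: str
    transmission_method: str
    anatomy: str              # affected anatomical site


@dataclass(frozen=True)
class Incident:
    """A reported occurrence of an infection system: where, when, in whom."""
    system: InfectionSystem
    location: str
    reported_on: date
    host_age: Optional[int] = None
    host_gender: Optional[str] = None


def hosts_for_pathogen(incidents: Iterable[Incident], pathogen_taxon_id: int) -> Set[int]:
    """Collect the host taxon IDs recorded for one pathogen."""
    return {
        i.system.host_taxon_id
        for i in incidents
        if i.system.pathogen_taxon_id == pathogen_taxon_id
    }


def locations_for_pathogen(incidents: Iterable[Incident], pathogen_taxon_id: int) -> Set[str]:
    """Collect the outbreak locations recorded for one pathogen."""
    return {
        i.location
        for i in incidents
        if i.system.pathogen_taxon_id == pathogen_taxon_id
    }


# Invented example records, for illustration only.
# Taxon IDs: 1392 = Bacillus anthracis, 9606 = Homo sapiens, 9913 = Bos taurus.
human_case = InfectionSystem(1392, 9606, "anthrax", "inhalation", "lung")
cattle_case = InfectionSystem(1392, 9913, "anthrax", "ingestion", "intestine")

example_incidents = [
    Incident(human_case, "Location A", date(2001, 1, 1), host_age=40, host_gender="female"),
    Incident(cattle_case, "Location B", date(2005, 6, 15)),
]

print(hosts_for_pathogen(example_incidents, 1392))
print(locations_for_pathogen(example_incidents, 1392))
```

Keying both record types on taxon IDs rather than on organism names mirrors the anchoring described above: one pathogen ID is enough to gather every recorded host and location.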


The authors thank Matthew Davenport for his continued guidance and support of the Gemina project. They would like to thank Susan Bromberg and Mary Shimoyama at the Rat Genome Database for providing them with their disease ontology file. They are grateful to Michael Ashburner, Suzi Lewis, Warren Kibbe, Rex Chisholm, Norman Morrison, Dawn Field, Chris Mungall, Barry Smith, and the members of the OBO Foundry for their help in developing the Gemina ontology resources. They thank the following Gemina collaborators and colleagues for their continued support: Steven Salzberg, Adam Phillippy, Kunmi Ayanbule, Jay V. DePasse, Kumar Hari, Alan Goates, Ravi Jain, David Spiro, Naomi Sengamalay.
The authors thank the US Department of Homeland Security Science and Technology Directorate [W81XWH-05-2-005, NBCH2070002] for funding this work and the Institute for Genome Sciences for funding the open access charges.


1. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251-1255.
2. Mungall,C.J., Emmert,D.B. and FlyBase Consortium. (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23, 337-346.
3. Phillippy,A.M., Mason,J.A., Ayanbule,K., Sommer,D.D., Taviani,E., Huq,A., Colwell,R., Knight,I. and Salzberg,S. (2007) Comprehensive DNA signature discovery and validation. PLoS Comput. Biol., 18, 887-894.

Go to the abstract in the NAR 2010 Database Issue.