

NAR Molecular Biology Database Collection entry number 847
Chen F., Mackey A.J., Stoeckert C.J., Jr. and Roos D.S.
Departments of Biology and Chemistry, Center for Bioinformatics, and Penn Genomics Institute, University of Pennsylvania, Philadelphia PA 19104, USA

Database Description

The OrthoMCL database houses a comprehensive collection of ortholog groups automatically assigned by applying the OrthoMCL algorithm to a wide variety of complete genomes spanning the tree of life. Each ortholog group consists of orthologs from different species, combined with recent paralogs (in-paralogs) from the same species. The current release incorporates complete genomes for 55 species, including 16 bacterial and 4 archaeal species representing phylogenetically diverse lineages, and most of the currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, the microsporidian Encephalitozoon, Dictyostelium, Entamoeba), 4 plants/algae, and 7 species of apicomplexan parasites. A total of 511,797 proteins (81.6% of the total dataset) were clustered into 70,388 ortholog groups, and this ortholog dataset may be queried based on keyword descriptions, BLAST similarity, or specified protein names or group accession numbers. Ortholog groups exhibiting specific phyletic patterns may also be identified, using either a graphical interface or a text-based Phyletic Pattern Expression grammar. Such studies hold out great promise for drug target identification. OrthoMCL-DB will be updated twice a year, as additional genome sequencing projects progress to completion.
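To make the idea of a phyletic pattern query concrete, the following is a minimal Python sketch of selecting ortholog groups that are present in some taxa and absent from others, as one might do when looking for parasite-specific candidates. It does not reproduce the Phyletic Pattern Expression grammar or any OrthoMCL-DB interface; the group accessions, taxon labels, and the function name match_phyletic_pattern are all invented for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: group accession -> taxa with at least one member protein.
# These accessions and taxon labels are made up and are not OrthoMCL-DB identifiers.
GROUPS: Dict[str, Set[str]] = {
    "GROUP_A": {"parasite_1", "parasite_2", "human"},
    "GROUP_B": {"parasite_1", "parasite_2"},
    "GROUP_C": {"parasite_1", "plant_1"},
    "GROUP_D": {"human", "plant_1"},
}


def match_phyletic_pattern(
    groups: Dict[str, Set[str]],
    present: Iterable[str],
    absent: Iterable[str],
) -> List[str]:
    """Return accessions of groups containing every taxon in `present`
    and no taxon in `absent`, sorted alphabetically."""
    required = set(present)
    excluded = set(absent)
    return sorted(
        accession
        for accession, taxa in groups.items()
        if required <= taxa and not (excluded & taxa)
    )


# Groups shared by both (hypothetical) parasites but lacking a human member.
candidates = match_phyletic_pattern(
    GROUPS, present=["parasite_1", "parasite_2"], absent=["human"]
)
print(candidates)
```

A group conserved across parasites but missing from the host is the kind of pattern the description has in mind for drug target identification; a real query would run against the database's own group membership data rather than a hand-written dictionary.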


This work was supported by NIH grant R01-AI058515, with web-site implementation covered by NIAID contract HHSN266200400037C, supporting the ApiDB Bioinformatics Resource Center. We thank Drs. Li Li and Shailesh Date for helpful discussions, Lucia Peixoto for running MUSCLE software, and Leon Goldovsky (European Bioinformatics Institute) for providing a special version of BioLayout software. DSR is an Ellison Medical Foundation Scholar in Global Infectious Diseases.


1. Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res, 13, 2178-2189.

Go to the abstract in the NAR 2006 Database Issue.