NAR Molecular Biology Database Collection entry number 1068
Powell, Sean; Forslund, Kristoffer; Szklarczyk, Damian; Trachana, Kalliopi; Roth, Alexander; Huerta-Cepas, Jaime; Gabaldon, Toni; Rattei, Thomas; Creevey, Chris; Kuhn, Michael; Jensen, Lars; von Mering, Christian; Bork, Peer
1European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, 3University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, 4Biotechnology Center, TU Dresden, 01062 Dresden, Germany, 5Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, 6Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, 7Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada, 8University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and 9Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany

Database Description

Next-generation sequencing technologies are now generating vast amounts of sequence data. This has led to a dramatic increase in the number of predicted protein sequences, which serve as a starting point for structural, functional and phylogenomic studies. Such studies often require high-throughput comparative analyses to transfer information between organisms, for which the concept of orthology is crucial. The original definition by Fitch describes orthologs as genes that diverged through a speciation event, as opposed to paralogs, which diverged after a duplication event (1). eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) is a database of orthologous groups for the vast majority of genes in genomes sampled from across the whole tree of life. The database contains a hierarchy of orthologous groups to balance phylogenetic coverage and resolution, and provides automatic function annotation.
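
To make Fitch's distinction concrete, the following minimal Python sketch classifies a pair of genes by the event at their last common ancestor in a gene tree. It is an illustration only and not the eggNOG procedure: the `Node` class, the `relationship` function, the gene names and the toy tree are all hypothetical, and the event labels on internal nodes are assumed to be known in advance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node (hypothetical helper).

    Leaves carry a gene name; internal nodes carry an event label,
    either "speciation" or "duplication".
    """
    name: str = ""
    event: str = ""
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, gene: str) -> Optional[List[Node]]:
    """Return the nodes from `node` down to the leaf named `gene`, or None."""
    if not node.children:
        return [node] if node.name == gene else None
    for child in node.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [node] + sub
    return None


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("expected two distinct genes")
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        lca = x  # deepest node shared by both root-to-leaf paths
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy tree: an ancient duplication gives copies A and B,
# and each copy is later split by a human/mouse speciation.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("human_A"), Node("mouse_A")]),
    Node(event="speciation", children=[Node("human_B"), Node("mouse_B")]),
])

print(relationship(tree, "human_A", "mouse_A"))  # orthologs
print(relationship(tree, "human_A", "human_B"))  # paralogs
print(relationship(tree, "human_A", "mouse_B"))  # paralogs
```

In this toy tree, human_A and mouse_B are paralogs even though they come from different species, because their lineages separated at the duplication rather than at the speciation.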

Recent Developments

This section presents the new features of eggNOG v2. Our procedure (2) has been applied to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a twofold increase relative to the previous version. The pipeline yielded 224,847 orthologous groups (OGs), including 9,724 extended versions of the original COGs and KOGs. Altogether, the dataset covers 2,590,259 proteins, of which 2,242,035 (87%) were assigned to at least one of these orthologous groups.
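
The figures quoted above can be checked with a few lines of Python. This is only an arithmetic sketch using the numbers stated in the text; the variable names are illustrative.

```python
# Genome counts and protein totals as stated in the text.
genomes = {"bacteria": 529, "archaea": 46, "eukaryotes": 55}
total_genomes = sum(genomes.values())

total_proteins = 2_590_259
assigned_proteins = 2_242_035  # proteins in at least one orthologous group

coverage = assigned_proteins / total_proteins
unassigned_proteins = total_proteins - assigned_proteins

print(total_genomes)         # 630
print(f"{coverage:.1%}")     # 86.6%, reported as 87% in the text
print(unassigned_proteins)   # 348224
```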
To provide a higher resolution of orthologous groups in frequently used taxonomic groupings, we applied our procedure to several subsets of organisms separately. We updated the previously computed, more fine-grained NOGs at the level of fungi (fuNOGs), metazoans (meNOGs), insects (inNOGs), vertebrates (veNOGs) and mammals (maNOGs), and added groups for archaea (arNOGs), fishes (fiNOGs), rodents (roNOGs) and primates (prNOGs). An important feature of eggNOG is the functional annotation of the orthologous groups. Our original pipeline, which provides functional descriptions for the non-supervised orthologous groups (NOGs), is now complemented by an automatic inference of functional categories (FCs) taken from the COG database (3). The 25 FCs available from the COG resource have been widely used in comparative genomics studies and will enable higher-order analyses of OGs identified in any dataset.
To facilitate in-depth analysis of the orthologous relationships within the groups of proteins, we now provide pre-computed, high-quality multiple sequence alignments (MSAs) and maximum likelihood trees via the web interface.


1. Fitch, W.M. (1970) Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99-113
2. Jensen, L.J., Julien, P., Kuhn, M., von Mering, C., Muller, J., Doerks, T. and Bork, P. (2008) eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res. 36, D250-254
3. Tatusov, R.L., Koonin, E.V. and Lipman, D.J. (1997) A genomic perspective on protein families. Science 278, 631-637
