NAR Molecular Biology Database Collection entry number 878
DeSantis T.Z.1, Hugenholtz P.2, Larsen N.3, Rojas M.4, Brodie E.L.1, Keller K.5, Huber T.6, Dalevi D.7, Hu P.1 and Andersen G.L.1
1Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 70A-3317, Berkeley, CA 94720, USA
2Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive Bldg 400-404, Walnut Creek, CA 94598, USA
3Danish Genome Institute, Gustav Wieds vej 10 C, DK-8000 Aarhus C, Denmark
4Department of Bioinformatics, Baylor University, P.O. Box 97356, 1311 S. 5th St., Waco, TX 76798-7356, USA
5Physical Biosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 977-152, Berkeley, CA 94720, USA
6Departments of Biochemistry and Mathematics, The University of Queensland, Brisbane Qld 4072, Australia
7Department of Computer Science, Chalmers University of Technology, SE-412 96, Göteborg, Sweden

Database Description

An online full-length small-subunit (SSU) rRNA gene database called 'greengenes' has been established that keeps pace with public submissions of both archaeal and bacterial 16S rDNA sequences. It addresses a number of limitations currently associated with SSU rRNA records in the public databases by providing automated chimera screening, taxonomic placement of unclassified environmental sequences using multiple published taxonomies for each record, multiple standard alignments and uniform sequence-associated information curated from GenBank records. Evaluation of 75,055 rDNA records revealed putative chimeras in 5% (1,901 of 38,412) of environmental sequences and, surprisingly, 0.5% (201 of 36,643) of records derived from isolates. Greengenes also provides a suite of tools for manipulating sequences, including an alignment tool, and has been streamlined to interface with the widely used ARB program.
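
The chimera rates quoted above can be rechecked directly from the counts given in the text. The following is a minimal sketch only; the variable and function names are illustrative and are not part of greengenes.

```python
# Counts quoted in the database description (putative chimeras, records screened).
chimeric_env, total_env = 1_901, 38_412   # environmental sequences
chimeric_iso, total_iso = 201, 36_643     # records derived from isolates


def chimera_rate(chimeric: int, total: int) -> float:
    """Return the putative-chimera rate as a percentage of records screened."""
    return 100.0 * chimeric / total


env_rate = chimera_rate(chimeric_env, total_env)
iso_rate = chimera_rate(chimeric_iso, total_iso)

print(f"environmental sequences: {env_rate:.2f}%")
print(f"isolate-derived records: {iso_rate:.2f}%")
print(f"ratio (environmental / isolate): {env_rate / iso_rate:.1f}")
```

Rounded to the precision used in the text, these reproduce the quoted 5% and 0.5%.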

Recent Developments

New tools are available for beta testing, including advanced search and export features, a graphical alignment viewer, a sequencing read quality trimmer, a sequence classifier and a distance matrix generator. To improve the reliability of the inferred comprehensive 16S rDNA prokaryotic tree, RAxML-V has been implemented to estimate branch points by maximum likelihood. This should help reconcile the disparate phylogenetic trees maintained by separate curators worldwide.
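
To illustrate what a distance matrix over aligned sequences contains, the sketch below computes uncorrected pairwise distances (the proportion of mismatched positions, skipping any column where either sequence has a gap). This is an assumption made for illustration: the entry does not state which distance measure the greengenes tool uses, and the sequence names and toy alignment here are hypothetical.

```python
def p_distance(a: str, b: str, gap_chars: str = "-.") -> float:
    """Proportion of mismatches between two aligned sequences.

    Columns in which either sequence has a gap character are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else float("nan")


def distance_matrix(aligned: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Return sequence names and a symmetric matrix of pairwise p-distances."""
    names = list(aligned)
    matrix = [[p_distance(aligned[r], aligned[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical toy alignment (not real greengenes records).
toy_alignment = {
    "seqA": "ACGT-A",
    "seqB": "ACGTTA",
    "seqC": "AGGT-A",
}

names, matrix = distance_matrix(toy_alignment)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

Real analyses of 16S rDNA typically apply a substitution-model correction and a column mask before computing distances; both are omitted here to keep the sketch short.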


Acknowledgements

Computational support is provided through the Virtual Institute for Microbial Stress and Survival. This work was performed under the auspices of the US Department of Energy by the University of California, Lawrence Berkeley National Laboratory under Contract No. DE-AC03-76SF00098 and was funded in part by the Department of Homeland Security under grant number HSSCHQ04X00037 and in part by the Department of Energy Natural and Accelerated Bioremediation Program.


References

1. DeSantis, T.Z., Dubosarskiy, I., Murray, S.R., and Andersen, G.L. (2003) Comprehensive aligned sequence construction for automated design of effective probes (CASCADE-P) using 16S rDNA. Bioinformatics 19(12), 1461-1468.
