NAR Molecular Biology Database Collection entry number 971
Pavy N.1, Johnson J.2, Crow J.2, Paule C.2, Kunau T.2, MacKay J.1 and Retzel E.2
1ARBOREA and Canada Research Chair in Forest Genomics, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec G1K 7P4, Canada
2Center for Computational Genomics and Bioinformatics, University of Minnesota, 420 Delaware St. S.E., MMC 43, Minneapolis, MN 55455, USA

Database Description

ForestTreeDB is intended as a resource that centralizes large-scale EST sequencing results from several tree species. Our group at the Center for Computational Genomics and Bioinformatics (University of Minnesota) aims to contribute to the annotation of forest tree sequences through collaborations with groups involved in forestry research. The database will be continuously enriched with other sequence resources and new features. Its purpose is to make sequence annotation available to the wide community of biologists involved in tree research, and to provide a flexible interface for developing queries.
We have developed an annotation pipeline that makes use of several publicly available software tools and sequence repositories. We applied the annotation procedure to several EST collections obtained from conifer and poplar species. Unifying data from several EST projects, ForestTreeDB is dedicated to storing and handling these sequence and annotation data. The aim of this work was to produce an extensive EST database for tree species with links to other related plant resources.

ForestTreeDB currently encompasses 344,878 quality sequences from 68 libraries, derived from diverse organs of conifer and hybrid poplar trees. It uses the Nimbus data model to provide a hosting system for multiple projects, and object-relational mapping APIs in Java and Perl for data access within an Oracle database. The database is designed to be scalable, maintainable and extendable. Transcriptome builds, or unigene sets, are the focal point of the system. Several of the five current species-specific unigene sets were used to design microarrays and SNP resources. The ForestTreeDB web application supports multiple combinations of database queries. It presents the user with a list of discrete queries to retrieve and download large EST datasets or sequences from pre-compiled unigene assemblies. Functional annotation assignment is not trivial in conifers, which are distantly related to angiosperm model plants. Optimal annotations are achieved through database queries that integrate results from several procedures based on open-source tools. ForestTreeDB aims to facilitate sequence mining of coherent annotations in multiple species to support comparative genomic approaches.
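
The idea of a query that integrates results from several annotation procedures can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table names (unigene, annotation), the column names, the source labels (blastx, domain_search), the example rows and the thresholds are hypothetical and do not reflect the actual Nimbus data model or the ForestTreeDB schema. SQLite stands in for Oracle so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE unigene (id TEXT PRIMARY KEY, species TEXT);
        CREATE TABLE annotation (
            unigene_id  TEXT REFERENCES unigene(id),
            source      TEXT,   -- annotation procedure (hypothetical labels)
            description TEXT,
            evalue      REAL
        );
    """)
    # Made-up rows, for illustration only.
    conn.executemany("INSERT INTO unigene VALUES (?, ?)", [
        ("U1", "white spruce"),
        ("U2", "white spruce"),
        ("U3", "hybrid poplar"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
        ("U1", "blastx", "cellulose synthase", 1e-80),
        ("U1", "domain_search", "cellulose synthase", 1e-40),
        ("U2", "blastx", "unknown protein", 1e-6),
        ("U3", "blastx", "MYB transcription factor", 1e-50),
        ("U3", "domain_search", "MYB transcription factor", 1e-20),
    ])
    return conn


def coherent_annotations(conn, species, max_evalue=1e-10, min_sources=2):
    """Return (unigene id, description, number of agreeing sources) for
    unigenes whose description is supported by at least `min_sources`
    distinct procedures, each with an E-value at or below `max_evalue`."""
    return conn.execute(
        """
        SELECT u.id, a.description, COUNT(DISTINCT a.source)
        FROM unigene u
        JOIN annotation a ON a.unigene_id = u.id
        WHERE u.species = ? AND a.evalue <= ?
        GROUP BY u.id, a.description
        HAVING COUNT(DISTINCT a.source) >= ?
        ORDER BY u.id
        """,
        (species, max_evalue, min_sources),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in coherent_annotations(conn, "white spruce"):
        print(row)
```

Requiring agreement between independent procedures is only one possible way to define a coherent annotation; relaxing min_sources to 1 and raising max_evalue returns every annotated unigene for the species.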


This work was supported by Genome Québec and Genome Canada for the Arborea project (to JM), and by the National Science Foundation Plant Genome Research Program and the USDA Cooperative State Research, Education and Extension Service Plant Genome Program (to ER).


Pavy N., Paule C., Parsons L., Crow J., Morency M.J., Cooke J., Johnson JR., Noumen E., Guillet-Claude C., Butterfield Y., Barber S., Yang G., Liu J., Stott J., Kirkpatrick R., Siddiqui A., Holt R., Marra M., Séguin A., Retzel E., Bousquet J., MacKay J. (2005) Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics, 6:144.

Category: Plant databases
Subcategory: Other plants
