HGT-DB, Horizontal Gene Transfer-DataBase
NAR Molecular Biology Database Collection entry number 352
Garcia-Vallve, S., Guzman, E., Montero, M.A., Romeu, A.
Evolutionary Genomics Group, Biochemistry and Biotechnology Department, 'Rovira i Virgili' University, Tarragona
The HGT-DB is a genomic database that includes statistical parameters such as G+C content, codon usage and amino-acid usage for complete prokaryotic genomes, together with information about which genes deviate in these parameters. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviating genes may have been acquired by horizontal gene transfer (HGT). The methods used to decide whether a gene is extraneous in terms of G+C content or codon usage, and is therefore a candidate for having been acquired by HGT, are described in Garcia-Vallve et al. (2000).
The HGT-DB is organized by genome, i.e. every prokaryotic genome that has been completely sequenced forms a new entry. Different chromosomes from the same organism, and genomes from different strains of the same species, are found in different entries. The current version of the database contains 88 genomes, which are sorted alphabetically and classified taxonomically.
For each genome, the database provides statistical parameters for all the genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides correspondence analyses of the codon usage, lists of extraneous groups of genes in terms of G+C content, lists of putatively acquired genes, and a tab-delimited file with all the statistical calculations for each gene of a genome. The fields available for each gene in these files are: its position (coordinates, strand and length), gene name, function, the Cluster of Orthologous Groups (COG) it belongs to, total and positional G+C content, the Mahalanobis distance to the average codon usage, amino-acid content deviations (if any), and a prediction of whether the gene belongs to a region with a high or low G+C content or whether it has been acquired by HGT.
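The total and positional G+C statistics mentioned above can be illustrated with a minimal Python sketch. It is not the HGT-DB implementation: the function names, the toy sequences and the cut-off of k = 1.5 standard deviations are assumptions made for illustration only, and the actual criteria are those given in Garcia-Vallve et al. (2000).

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def positional_gc(cds):
    """G+C percentage at codon positions 1, 2 and 3 of a coding sequence."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]  # drop any incomplete trailing codon
    return tuple(gc_content(cds[i::3]) for i in range(3))


def flag_gc_deviants(genes, k=1.5):
    """Label each gene 'high', 'low' or 'typical' relative to the genome mean.

    genes: dict mapping gene name -> coding sequence.
    k: cut-off in standard deviations (illustrative assumption, not the
       documented HGT-DB setting).
    """
    values = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(list(values.values()))
    sigma = pstdev(list(values.values()))
    flags = {}
    for name, gc in values.items():
        if sigma > 0 and gc > mu + k * sigma:
            flags[name] = "high"
        elif sigma > 0 and gc < mu - k * sigma:
            flags[name] = "low"
        else:
            flags[name] = "typical"
    return flags


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy_genes = {f"gene{i}": "ATGC" for i in range(1, 6)}
    toy_genes["gene6"] = "GGCC"
    print(positional_gc("ATGGCC"))
    print(flag_gc_deviants(toy_genes))
```

A deviation in G+C content alone does not establish horizontal transfer; in the database it is one of several parameters considered for each gene.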
This information can also be accessed via a search engine that allows searches by gene name or keyword within a specific organism. When searching for a gene name, one can also view the upstream and downstream genes. With this information, researchers can examine the G+C content and codon usage of a gene when they find incongruences in sequence-based phylogenetic trees. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
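A gene-name lookup with neighbouring genes could also be reproduced offline against a downloaded tab-delimited file. The sketch below is only an assumption about how such a file might be processed: the column names (`gene`, `start`, `end`, `strand`, `gc_total`) and the sample rows are hypothetical and do not reflect the real HGT-DB file layout.

```python
import csv
import io

# Hypothetical table contents; names and values are invented for illustration.
SAMPLE_ROWS = [
    ["gene", "start", "end", "strand", "gc_total"],
    ["geneB", "1200", "2100", "+", "51.2"],
    ["geneA", "100", "1100", "+", "50.8"],
    ["geneD", "3300", "4000", "-", "50.1"],
    ["geneC", "2200", "3200", "-", "38.4"],
]
SAMPLE_TABLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def load_genes(handle):
    """Read a tab-delimited gene table and sort it by start coordinate."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: int(row["start"]))
    return rows


def find_with_neighbours(rows, name, flank=1):
    """Return (preceding genes, matching gene, following genes) or None.

    Neighbours are taken in coordinate order; strand is not considered.
    """
    for i, row in enumerate(rows):
        if row["gene"].lower() == name.lower():
            before = rows[max(0, i - flank): i]
            after = rows[i + 1: i + 1 + flank]
            return before, row, after
    return None


if __name__ == "__main__":
    genes = load_genes(io.StringIO(SAMPLE_TABLE))
    before, hit, after = find_with_neighbours(genes, "geneC")
    print([g["gene"] for g in before], hit["gene"], [g["gene"] for g in after])
```

Viewing a gene next to its neighbours in this way helps show whether an atypical G+C value is isolated or part of a larger atypical region.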
Recent developments of the database include a section containing the correspondence analysis of the relative synonymous codon usage for each genome, and filters that exclude highly expressed genes from the database predictions.
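For readers unfamiliar with relative synonymous codon usage (RSCU), the following sketch computes it under its usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and is an illustration, not the code used to build the database; the function name `rscu` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the sense codons into synonymous families, one per amino acid.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":  # stop codons are left out
        FAMILIES[_aa].append(_codon)


def rscu(cds):
    """Return a dict codon -> RSCU for amino acids present in the sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence: two GCT and one GCC codon (alanine).
    for codon, value in sorted(rscu("GCTGCTGCC").items()):
        print(codon, round(value, 3))
```

Single-codon amino acids (Met, Trp) always have an RSCU of 1 and carry no information about codon preference, so analyses often leave them out.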
Garcia-Vallve, S., Romeu, A. and Palau, J. (2000) Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res., 10, 1719-1725.
Category: Genomics Databases (non-vertebrate)
Subcategory: Prokaryotic genome databases
Go to the abstract in the NAR 2003 Database Issue.