NAR Molecular Biology Database Collection entry number 795
Blanco E.1,2, Farre D.1,2, Alba M.1, Messeguer X.2 and Guigo R.1
1Grup de Recerca en Informatica Biomedica, Institut Municipal d'Investigacio Medica / Universitat Pompeu Fabra / Centre de Regulacio Genomica. C/Doctor Aiguader 80, 08003 Barcelona, Spain
2Grup d'Algorismica i genetica. Departament de Llenguatges i Sistemes Informatics. Universitat Politecnica de Catalunya. C/Jordi Girona 1-3, 08034 Barcelona, Spain
Contact eblanco@imim.es

Database Description

ABS (Annotated Binding Sites) is a public database of experimentally verified orthologous transcription factor binding sites (TFBSs). Annotations have been collected from the literature and manually curated. For each gene, TFBSs conserved in orthologous sequences from at least two different species are required. Promoter sequences, as well as the original GenBank or RefSeq entries, are also supplied to resolve any future identification conflicts. As of this release, the 500 bp upstream of the annotated transcription start site (TSS) have always been extracted to build the collection of promoter sequences from human, mouse, rat and chicken. The final TSS annotation has been refined using the dbTSS database.
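As a minimal sketch of the extraction step, the Python function below returns the bases immediately upstream of a TSS on either strand. The name `upstream_promoter`, the 0-based coordinate convention, and the choice to exclude the TSS base from the window are assumptions made for illustration; this is not the documented ABS procedure.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_promoter(chrom_seq: str, tss: int, strand: str, length: int = 500) -> str:
    """Return up to `length` bases upstream of a TSS, read 5'->3' on the gene's strand.

    Hypothetical conventions: `tss` is the 0-based forward-strand index of the
    first transcribed base, and the TSS itself is excluded from the window.
    Windows are silently truncated at chromosome ends.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher forward-strand coordinates.
        return reverse_complement(chrom_seq[tss + 1:tss + 1 + length])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

For a minus-strand gene the window is reverse-complemented so that every promoter in the collection reads 5' to 3' in the direction of transcription.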

For each of the 650 annotated regulatory sites, the position, the motif and the sequence containing the site are available in a simple format. Cross-references to EntrezGene, PubMed and RefSeq are provided for each annotation. In addition to the experimental promoter annotations, predictions from popular collections of weight matrices are provided for each promoter sequence, together with global and local alignments and graphical dotplots. ABS is oriented towards the study of regulatory regions in the context of pattern discovery programs, and it therefore provides two applications to assist in their automatic training: CONSTRUCTOR and EVALUATOR.
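Weight-matrix predictions of this kind are generally obtained by scoring every window of a promoter against a log-odds matrix. The sketch below illustrates that generic technique only; the function names, the uniform background, the pseudocount and the threshold are hypothetical choices, it scans the forward strand only, and it does not reproduce the specific matrix collections or cutoffs used by ABS.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                    background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2(observed / background) scores."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Report (0-based start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[b] for column, b in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

For example, a matrix built from counts for the motif `TATA` reports a single hit at position 2 when scanning `ggTATAgg` with a threshold of 6.0.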

CONSTRUCTOR automatically generates artificial benchmarks by planting motifs in random sequences. The user can customize the nucleotide composition of the background sequences, the number of motifs planted, the subset of real sites to draw from, the density of motifs in each sequence, and the number and length of the sequences.
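A CONSTRUCTOR-style benchmark can be sketched as follows: draw a random background with a chosen nucleotide composition, then overwrite non-overlapping positions with real sites and record where they were planted. All names and parameters here (`make_benchmark`, `sites_per_sequence`, sampling sites with replacement, the fixed seed) are hypothetical and only approximate the options listed above.

```python
import random
from typing import Dict, List, Tuple

Annotation = Tuple[int, str]  # (0-based start, planted site)


def random_background(length: int, composition: Dict[str, float],
                      rng: random.Random) -> str:
    """Draw an i.i.d. background sequence with the given base frequencies."""
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def plant_sites(background: str, sites: List[str], rng: random.Random,
                max_tries: int = 1000) -> Tuple[str, List[Annotation]]:
    """Overwrite the background with each site at a random, non-overlapping position."""
    seq = list(background)
    occupied = [False] * len(seq)
    planted: List[Annotation] = []
    for site in sites:
        for _ in range(max_tries):
            start = rng.randrange(len(seq) - len(site) + 1)
            end = start + len(site)
            if not any(occupied[start:end]):
                seq[start:end] = site  # same length, so the sequence length is unchanged
                occupied[start:end] = [True] * len(site)
                planted.append((start, site))
                break
        else:
            raise ValueError(f"could not place {site!r} without overlapping another site")
    return "".join(seq), sorted(planted)


def make_benchmark(real_sites: List[str], n_sequences: int, seq_length: int,
                   sites_per_sequence: int, composition: Dict[str, float],
                   seed: int = 0) -> List[Tuple[str, List[Annotation]]]:
    """Hypothetical CONSTRUCTOR-like generator returning (sequence, annotations) pairs."""
    rng = random.Random(seed)  # fixed seed makes the benchmark reproducible
    benchmark = []
    for _ in range(n_sequences):
        background = random_background(seq_length, composition, rng)
        # Sample from the chosen subset of real sites, with replacement.
        chosen = rng.choices(real_sites, k=sites_per_sequence)
        benchmark.append(plant_sites(background, chosen, rng))
    return benchmark
```

Returning the planted positions alongside each sequence gives exactly the "real sites" that an evaluation step needs to compare predictions against.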

EVALUATOR uses standard accuracy measures to assess the correctness of the predictions submitted by the user against the real sites, which the user also submits. The output is a table of accuracy at both the nucleotide and the site level. The formal definitions of the measures are always included to facilitate interpretation.
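The sketch below computes typical accuracy measures for a single sequence, with sites given as hypothetical half-open intervals. At the nucleotide level it reports sensitivity Sn = TP/(TP+FN), positive predictive value PPV = TP/(TP+FP) (often called "specificity" in the gene-prediction literature) and the correlation coefficient CC. At the site level it assumes, as a hypothetical criterion, that a site counts as matched if it overlaps a site of the other set by at least one nucleotide. EVALUATOR's own definitions are the ones printed in its output and may differ.

```python
import math
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # hypothetical representation: half-open [start, end)


def _mask(intervals: List[Interval], length: int) -> List[bool]:
    """Mark every nucleotide covered by at least one interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def nucleotide_accuracy(real: List[Interval], predicted: List[Interval],
                        length: int) -> Dict[str, float]:
    """Sn, PPV and CC computed over individual nucleotides."""
    r, p = _mask(real, length), _mask(predicted, length)
    tp = sum(a and b for a, b in zip(r, p))
    fp = sum(b and not a for a, b in zip(r, p))
    fn = sum(a and not b for a, b in zip(r, p))
    tn = length - tp - fp - fn
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"Sn": sn, "PPV": ppv, "CC": cc}


def _overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def site_accuracy(real: List[Interval], predicted: List[Interval]) -> Dict[str, float]:
    """Sn and PPV over whole sites, using the assumed overlap criterion."""
    found = sum(any(_overlaps(r, q) for q in predicted) for r in real)
    correct = sum(any(_overlaps(q, r) for r in real) for q in predicted)
    sn = found / len(real) if real else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return {"Sn": sn, "PPV": ppv}
```

Reporting both levels is informative because a prediction can hit every real site (high site-level Sn) while still placing its boundaries poorly (lower nucleotide-level PPV).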


Funding for ABS is provided by grants BIO2000-1358-C02-02 and BIO2002-04426-C02-01, Ministerio de Ciencia y Tecnologia (Spain).

