NAR Molecular Biology Database Collection entry number 340
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A.E., and Wingender, E.
BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany

Database Description

The TRANSFAC® database models the interaction of eukaryotic transcription factors with their DNA-binding sites and the effect of this interaction on gene expression. At its core are three tables: FACTOR, SITE, and GENE. A link between a FACTOR and a SITE entry denotes an interaction (binding) between them. The experimental evidence for this interaction is recorded in the SITE entry. It consists of the method used to demonstrate binding (gel shift, footprinting analysis, ...) and the cell from which the factor was derived (factor source). A quality value is assigned on the basis of these two parameters, method and cell. It describes the confidence with which a binding activity could be attributed to a specific factor. Once a number of binding sites have been collected for a factor, the site sequences are aligned and nucleotide distribution matrices are derived (MATRIX). The tool Match™ uses these matrices to find potential binding sites in uncharacterized sequences. A second tool, Patch™, instead uses the individual sites (and IUPAC consensus sequences) stored in the SITE table. A new, third tool, P-Match™, now combines the strengths of the matrix-based and pattern-based approaches.
Binding sites are grouped into matrices to capture the common binding specificity of a given factor. The transcription factors themselves are classified according to their DNA-binding domains, both in the CLASS table and in a hierarchical factor classification tree. Beyond their binding properties, extensive information on the structure, function, and tissue specificity of the transcription factors is collected. Each GENE entry summarizes the sites in that gene and the factors binding to them, as well as the composite elements from TRANSCompel®. Some of the regulated genes themselves encode transcription factors. There are therefore links not only from factors via sites to target genes, but also from genes to the factors they encode, and vice versa.
Based on the links between factors, sites, and genes, "gene regulatory networks" can be retrieved or constructed. The GENE table connects not only the information in TRANSFAC® and TRANSCompel®, but also that in our other databases. These include HumanPSD™, S/MARtDB™, and TRANSPATH®, a database on signaling networks into which the factor-site-gene interactions of TRANSFAC® are fully integrated. Finally, the GENE entries serve as the major source of links to a growing number of external databases. Public versions of TRANSFAC® and the above-mentioned programs are freely accessible to research groups from non-profit organizations at
The professional version of TRANSFAC® contains more data and extended functionality. This includes integrated versions of Match® and Patch® and a tool for visualizing gene regulatory networks. It is available at
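The sketch below is hypothetical and does not reproduce TRANSFAC®'s own network tool. It illustrates how factor-site-gene links can compose into a network. It assumes three kinds of link: a factor binds a site, a site lies in a gene, and a gene encodes a factor. Each factor-site-gene path becomes a directed "regulates" edge from factor to gene, and each "encodes" link becomes an edge from gene to factor. A breadth-first traversal then lists everything downstream of a factor. All identifiers (`F1`, `S1`, `G1`, ...) and the helpers `build_network` and `downstream` are invented for illustration. They are not TRANSFAC® accession numbers or database fields.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Invented toy links, for illustration only.
FACTOR_BINDS_SITE = [("F1", "S1"), ("F1", "S2"), ("F2", "S3")]
SITE_IN_GENE = {"S1": "G1", "S2": "G2", "S3": "G3"}
GENE_ENCODES_FACTOR = {"G1": "F2"}


def build_network(binds: Iterable[Tuple[str, str]],
                  site_gene: Dict[str, str],
                  gene_factor: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collapse factor -> site -> gene paths and gene -> factor links into a digraph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for factor, site in binds:
        gene = site_gene.get(site)
        if gene is not None:
            graph[factor].add(gene)  # factor regulates gene via this site
    for gene, factor in gene_factor.items():
        graph[gene].add(factor)      # gene encodes factor
    return graph


def downstream(graph: Dict[str, Set[str]], start: str) -> List[str]:
    """Breadth-first list of all nodes reachable from start (start excluded)."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):  # sorted for deterministic order
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    net = build_network(FACTOR_BINDS_SITE, SITE_IN_GENE, GENE_ENCODES_FACTOR)
    print(downstream(net, "F1"))  # ['G1', 'G2', 'F2', 'G3']
```

Here `F1` regulates `G1`, which encodes `F2`, which in turn regulates `G3`. This is the kind of cascade that becomes visible only through the gene-to-factor links. Collapsing each path into one factor-to-gene edge discards which site mediates the regulation. A fuller model would keep sites as nodes or as edge labels.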

Recent Developments

The number of linked databases has been extended. GENE entries now include, among others, links to ENSEMBL, UniGene, EntrezGene, the proteome database HumanPSD™, and the promoter database TRANSPRO®. In addition, standard gene names from HGNC, MGI, and RGD are included for human, mouse, and rat genes, respectively, as well as standard ORF names for Saccharomyces cerevisiae. Using InterProScan, Pfam, SMART, and PROSITE domains have been assigned to the protein sequences of the transcription factors. Regarding data growth, there has been a general increase in data, mostly for vertebrates. Beyond this, two gains deserve particular mention. The first is Drosophila transcription factor binding sites, courtesy of the Drosophila DNase I footprint database. The second is Arabidopsis factors, courtesy of DATF, the Database of Arabidopsis Transcription Factors.

Acknowledgements

We thank Prof. Jingchu Luo, Dr. Anyuan Guo, and colleagues from the Center for Bioinformatics, Peking University, for providing the DATF data. We also thank Dr. Casey Bergman and colleagues from the Department of Genetics, University of Cambridge (U.K.), for the Drosophila footprint data. We further thank everyone who has contributed over the years to the development and curation of the described databases and associated tools. Parts of this work were funded by the following sources:
the German Ministry of Education and Research (BMBF), grant "Intergenomics" (031U210B);
BioRegioN GmbH and BMBF jointly, grant "BioProfil" (0313092);
the European Commission under FP6 "Life sciences, genomics and biotechnology for health", contract LSHG-CT-2004-503568 "COMBIO";
the European Commission under "Marie Curie research training networks", contract MRTN-CT-2004-512285 "TRANSISTOR".