TRANSFAC
TRANSFAC is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.
Introduction
The origin of the database was an early data collection published in 1988. The first version released under the name TRANSFAC was designed for local installation and was developed at the former German National Research Centre for Biotechnology (now the Helmholtz Centre for Infection Research). In one of the first publicly funded bioinformatics projects, launched in 1993, TRANSFAC developed into a resource available on the Internet.
In 1997, TRANSFAC was transferred to a newly established company, BIOBASE, in order to secure long-term financing of the database. Since then, the most up-to-date version has to be licensed, whereas older versions are free for non-commercial users.
Content and Features
The database is organized around the interaction between transcription factors (TFs) and their DNA binding sites (TFBS). TFs are described with regard to their structural and functional features, as extracted from the original scientific literature, and are classified into families, classes and superclasses according to the features of their DNA-binding domains. Binding of a TF to a genomic site is documented by specifying the location of the site, its sequence and the experimental method applied. All sites that refer to one TF, or to a group of closely related TFs, are aligned and used to construct a position-specific scoring matrix (PSSM), or count matrix. Many matrices of the TRANSFAC matrix library have been constructed by a team of curators; others were taken from scientific publications.
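The step from aligned sites to a count matrix and then to a scoring matrix can be illustrated with a minimal Python sketch. It assumes equal-length, already aligned sites, a pseudocount of 0.25 per base and a uniform background. The four example sites are hypothetical toy sequences rather than TRANSFAC records, and the log-odds scoring is a generic textbook scheme, not necessarily the one used to build TRANSFAC matrices.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each nucleotide at every position of equal-length, aligned sites."""
    length = len(sites[0])
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        site = site.upper()
        if len(site) != length or any(b not in BASES for b in site):
            raise ValueError(f"invalid or misaligned site: {site}")
        for i, base in enumerate(site):
            counts[i][base] += 1
    return counts

def log_odds_pssm(counts, pseudocount=0.25, background=None):
    """Turn a count matrix into log2-odds scores versus a background model.

    The pseudocount keeps unseen bases from producing log(0); the background
    defaults to a uniform 0.25 per base (an assumption of this sketch).
    """
    background = background or {b: 0.25 for b in BASES}
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm

# Hypothetical toy sites, not taken from TRANSFAC.
SITES = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGACTCA"]
counts = count_matrix(SITES)
pssm = log_odds_pssm(counts)
print(counts[0])  # -> {'A': 0, 'C': 0, 'G': 0, 'T': 4}
```

Positive scores mark bases that are over-represented at a position relative to the background; negative scores mark under-represented ones.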
Availability
Use of an older version of TRANSFAC is free of charge for non-profit users; access to the most up-to-date version requires a license.
Applications
The TRANSFAC database can be used as an encyclopedia of eukaryotic transcription factors. The target sequences and the regulated genes can be listed for each TF and used as training sets for new TFBS recognition algorithms. The TF classification makes it possible to analyze such data sets with respect to the properties of the DNA-binding domains. Another application is to retrieve all TFs that regulate a given gene or set of genes. In systems-biology studies, the TF-target gene relations documented in TRANSFAC have been used to construct and analyze transcriptional regulatory networks.
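Retrieving the regulators of a gene set and assembling a regulatory network both reduce to simple operations on TF-target pairs. The sketch below assumes the relations have already been exported as (TF, target) tuples; the identifiers TF_A, GENE_1 and so on are hypothetical placeholders, and no real TRANSFAC export format or API is implied.

```python
from collections import defaultdict

# Hypothetical TF -> target relations; placeholders, not TRANSFAC records.
RELATIONS = [
    ("TF_A", "GENE_1"),
    ("TF_A", "GENE_2"),
    ("TF_B", "GENE_2"),
    ("TF_C", "GENE_3"),
    ("TF_B", "TF_C"),  # a TF can itself be the target of another TF
]

def build_network(relations):
    """Return a directed regulatory graph as {regulator: set of targets}."""
    network = defaultdict(set)
    for tf, target in relations:
        network[tf].add(target)
    return network

def regulators_of(relations, genes):
    """Return every TF documented as regulating at least one gene in `genes`."""
    genes = set(genes)
    return {tf for tf, target in relations if target in genes}

network = build_network(RELATIONS)
print(sorted(regulators_of(RELATIONS, ["GENE_2", "GENE_3"])))  # -> ['TF_A', 'TF_B', 'TF_C']
```

Because TFs can appear as targets, the adjacency mapping already captures regulatory cascades (here TF_B regulates TF_C), which is what network analyses build on.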
By far the most frequent use of TRANSFAC is the computational prediction of potential transcription factor binding sites (TFBS). A number of tools perform this prediction using either the individual binding sites or the matrix library:
- Patch – analyzes sequence similarities with the binding sites documented in TRANSFAC; it is provided along with the database.
- SiteSeer – analyzes sequence similarities with the binding sites documented in TRANSFAC.
- Match – identifies potential TFBS using the matrix library; it is provided along with the database.
- TESS (Transcription Element Search System) – analyzes sequence similarities with TRANSFAC binding sites and also predicts potential binding sites using the matrix libraries of TRANSFAC and three other sources. TESS also provides a program for the identification of cis-regulatory modules (CRMs, characteristic combinations of TFBSs), which uses TRANSFAC matrices.
- PROMO – matrix-based prediction of TFBSs with the aid of the commercial database version
- TFM Explorer – identification of common potential TFBSs in a set of genes
- MotifMogul – matrix-based sequence analysis with a number of different algorithms
- ConTra – matrix-based sequence analysis in conserved promoter regions
- PMS (Poly Matrix Search) – matrix-based sequence analysis in conserved promoter regions
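Matrix-based tools such as Match slide a matrix along a sequence and report windows whose score exceeds a cut-off. The following sketch shows only that general idea: it uses a hypothetical 4-position log-odds matrix, scans both strands, and normalizes each window score to a 0 to 1 range between the lowest and highest achievable scores. The 0.8 threshold is an arbitrary assumption, and the scoring is a simplification rather than Match's own similarity score.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical 4-position log-odds matrix (one dict per position); illustrative only.
PSSM = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 1.8},
    {"A": -2.0, "C": -2.0, "G": 1.8, "T": -2.0},
    {"A": 1.8, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 1.8, "G": -2.0, "T": -2.0},
]

def score(window, pssm):
    """Sum the per-position scores of the bases in `window`."""
    return sum(col[b] for col, b in zip(pssm, window))

def relative_score(s, pssm):
    """Rescale a raw score to [0, 1] between the minimum and maximum possible scores."""
    lo = sum(min(col.values()) for col in pssm)
    hi = sum(max(col.values()) for col in pssm)
    return (s - lo) / (hi - lo)

def scan(sequence, pssm, threshold=0.8):
    """Return (position, strand, relative score) for windows at or above `threshold`."""
    sequence = sequence.upper()
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, w in (("+", window), ("-", reverse)):
            rel = relative_score(score(w, pssm), pssm)
            if rel >= threshold:
                hits.append((i, strand, round(rel, 3)))
    return hits

print(scan("CCTGACAAGTCAGG", PSSM))  # -> [(2, '+', 1.0), (8, '-', 1.0)]
```

Scanning the reverse complement matters because a binding site can lie on either DNA strand; here the second hit is a site on the minus strand.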
Comparison of matrices with the matrix library of TRANSFAC and other sources:
- T-Reg Comparator – compares individual matrices or groups of matrices with those of TRANSFAC or other libraries.
- MACO – matrix comparison with matrix libraries.
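Matrix comparison tools differ in the similarity measure they use. As one common choice (assumed here; not necessarily what T-Reg Comparator or MACO implement), two matrices can be compared by sliding the shorter one along the longer one without gaps, averaging the Pearson correlation of the aligned columns, and repeating the search against the reverse complement. The peaked example matrices are hypothetical.

```python
import math

# A matrix is a list of columns; each column holds probabilities in A, C, G, T order.

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def best_alignment(m1, m2):
    """Slide the shorter matrix along the longer one (no gaps, no overhang).

    Returns (best mean column correlation, offset of the shorter matrix in the longer).
    """
    short, longer = (m1, m2) if len(m1) <= len(m2) else (m2, m1)
    best = (-1.0, 0)
    for offset in range(len(longer) - len(short) + 1):
        pairs = zip(short, longer[offset:offset + len(short)])
        mean_r = sum(pearson(a, b) for a, b in pairs) / len(short)
        if mean_r > best[0]:
            best = (mean_r, offset)
    return best

def reverse_complement(matrix):
    """Reverse column order; reversing each ACGT column swaps A<->T and C<->G."""
    return [col[::-1] for col in matrix[::-1]]

def compare(m1, m2):
    """Best (score, offset, strand) over both orientations of m2."""
    fwd = best_alignment(m1, m2)
    rev = best_alignment(m1, reverse_complement(m2))
    return (*fwd, "+") if fwd[0] >= rev[0] else (*rev, "-")

def peak(base):
    """Hypothetical column: 0.7 for `base`, 0.1 for each other base."""
    return [0.7 if b == base else 0.1 for b in "ACGT"]

LIBRARY = [peak(b) for b in "TGAC"]
QUERY = [peak(b) for b in "GAC"]
r, offset, strand = compare(QUERY, LIBRARY)
print(round(r, 3), offset, strand)  # -> 1.0 1 +
```

A mean correlation of 1.0 means the query columns match the library matrix exactly at the reported offset; in practice a cut-off for calling two matrices similar would be chosen empirically.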
A number of servers provide genomic annotations computed with the aid of TRANSFAC. Others have used such analyses to infer target gene sets.
Similar Data Sources
The following resources offer content that is related to, or partially overlaps with, TRANSFAC:
- JASPAR – collection of transcription factor binding profiles (matrices) and sequence analysis program
- PLACE – cis-regulatory DNA elements in plants; until February 2007
- PlantCARE – cis-regulatory elements and transcription factors in plants (2002)
- PRODORIC – a concept similar to TRANSFAC, for prokaryotes
- RegulonDB – focus on the bacterium Escherichia coli
- SCPD – specific collection of data and tools for yeast (Saccharomyces cerevisiae) (1998)
- TFe – the transcription factor encyclopedia
- TRDD – Transcription Regulatory Regions Database, mainly about regulatory regions and TF-binding sites
External links
- History of the TRANSFAC database on the homepage of Edgar Wingender
- What is the TRANSFAC database? at Lane Medical Library, Stanford University School of Medicine