TRANSFAC - AbsoluteAstronomy.com

TRANSFAC is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.

Introduction

The origin of the database was an early data collection published in 1988. The first version released under the name TRANSFAC was developed at the former German National Research Centre for Biotechnology (now: Helmholtz Centre for Infection Research) and designed for local installation. In one of the first publicly funded bioinformatics projects, launched in 1993, TRANSFAC developed into a resource that became available on the Internet.

In 1997, TRANSFAC was transferred to a newly established company, BIOBASE, in order to secure long-term financing of the database. Since then, the most up-to-date version has to be licensed, whereas older versions are free for non-commercial users.

Content and Features

The content of the database is organized around the interaction between transcription factors (TFs) and their DNA binding sites (TFBS). TFs are described with regard to their structural and functional features, extracted from the original scientific literature. They are classified into families, classes and superclasses according to the features of their DNA binding domains.

Binding of a TF to a genomic site is documented by specifying the localization of the site, its sequence and the experimental method applied. All sites that refer to one TF, or a group of closely related TFs, are aligned and used to construct a position-specific scoring matrix (PSSM), or count matrix. Many matrices of the TRANSFAC matrix library have been constructed by a team of curators; others were taken from scientific publications.
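
The step from aligned sites to a matrix can be illustrated with a minimal Python sketch. It is not TRANSFAC's own procedure: the site sequences are made up, and the pseudocount, the uniform background of 0.25 and the function names count_matrix and log_odds_matrix are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        # Only A, C, G and T are handled; any other symbol raises KeyError.
        for pos, base in enumerate(site.upper()):
            counts[pos][base] += 1
    return counts

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background (assumed)."""
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pssm

# Hypothetical aligned binding sites, invented for this example.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
counts = count_matrix(sites)
pssm = log_odds_matrix(counts)
```

Here counts is the count matrix (one column of base counts per alignment position) and pssm holds the corresponding log-odds scores. The pseudocount keeps bases that were never observed at a position from receiving a score of minus infinity.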

Availability

The usage of an older version of TRANSFAC is free of charge for non-profit users. Access to the most up-to-date version requires a license.

Applications

The TRANSFAC database can be used as an encyclopedia of eukaryotic transcription factors. The target sequences and the regulated genes can be listed for each TF, and these lists can serve as training sets for new TFBS recognition algorithms. The TF classification makes it possible to analyze such data sets with regard to the properties of the DNA-binding domains. Another application is to retrieve all TFs that regulate a given gene or set of genes. In systems-biological studies, the TF-target gene relations documented in TRANSFAC have been used to construct and analyze transcription regulatory networks.

By far the most frequent use of TRANSFAC is the computational prediction of potential transcription factor binding sites (TFBS). A number of algorithms exist that use either the individual binding sites or the matrix library for this purpose:

Patch – analyzes sequence similarities with the binding sites documented in TRANSFAC; it is provided along with the database.
SiteSeer – analyzes sequence similarities with the binding sites documented in TRANSFAC.
Match – identifies potential TFBS using the matrix library; it is provided along with the database.
TESS (Transcription Element Search System) – analyzes sequence similarities with binding sites of TRANSFAC as well as potential binding sites using the matrix libraries of TRANSFAC and three other sources. TESS also provides a program for the identification of cis-regulatory modules (CRMs, characteristic combinations of TFBSs), which uses TRANSFAC matrices.
PROMO – matrix-based prediction of TFBSs with the aid of the commercial database version
TFM Explorer – identification of common potential TFBSs in a set of genes
MotifMogul – matrix-based sequence analysis with a number of different algorithms
ConTra – matrix-based sequence analysis in conserved promoter regions
PMS (Poly Matrix Search) – matrix-based sequence analysis in conserved promoter regions
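
The general idea behind the matrix-based tools listed above can be sketched as a sliding-window scan. The Python example below is a simplified illustration, not the algorithm of Match or any other listed program: the scoring matrix, the threshold and the function name scan_sequence are hypothetical, and only the forward strand is scanned.

```python
def scan_sequence(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        # The window score is the sum of the per-position scores of its bases.
        score = sum(column[base] for column, base in zip(pssm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical 4-column scoring matrix; the values are made up, not taken from TRANSFAC.
toy_pssm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 1.0, "T": -1.0},
]

hits = scan_sequence("CCTGACGGTGAGAA", toy_pssm, threshold=6.0)
# hits == [(2, "TGAC", 7.0), (8, "TGAG", 7.0)]
```

In practice the choice of threshold controls the trade-off between missed sites and false positives, and a full scan would also score the reverse complement of each window.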

Comparison of matrices with the matrix library of TRANSFAC and other sources:

T-Reg Comparator – compares individual matrices or groups of matrices with those of TRANSFAC or other libraries.
MACO (Poly Matrix Search) – matrix comparison with matrix libraries.
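
To give a sense of what comparing a matrix against a library involves, the following Python sketch ranks library matrices by a simple column-wise similarity. The measure (one minus the normalized Euclidean distance between base-frequency columns) is an assumption picked for simplicity and is not the method used by T-Reg Comparator or MACO; the matrices and their names are invented, and matrices must have equal width with no empty columns.

```python
import math

BASES = "ACGT"

def to_frequencies(counts):
    """Normalize each count column so that its values sum to 1."""
    freqs = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        freqs.append({b: column[b] / total for b in BASES})
    return freqs

def matrix_similarity(counts_a, counts_b):
    """Mean column similarity in [0, 1] for two equal-width count matrices."""
    if len(counts_a) != len(counts_b):
        raise ValueError("this sketch only compares matrices of equal width")
    fa, fb = to_frequencies(counts_a), to_frequencies(counts_b)
    total = 0.0
    for col_a, col_b in zip(fa, fb):
        dist = math.sqrt(sum((col_a[b] - col_b[b]) ** 2 for b in BASES))
        # sqrt(2) is the largest possible distance between two frequency columns.
        total += 1.0 - dist / math.sqrt(2)
    return total / len(fa)

# Hypothetical query and library matrices (two columns each, made-up counts).
query = [{"A": 0, "C": 0, "G": 0, "T": 4}, {"A": 0, "C": 0, "G": 4, "T": 0}]
library = {
    "toy_matrix_1": [{"A": 0, "C": 0, "G": 1, "T": 3}, {"A": 0, "C": 0, "G": 4, "T": 0}],
    "toy_matrix_2": [{"A": 4, "C": 0, "G": 0, "T": 0}, {"A": 0, "C": 4, "G": 0, "T": 0}],
}

best_match = max(library, key=lambda name: matrix_similarity(query, library[name]))
```

A practical comparison tool would additionally try different offsets and the reverse complement when aligning two matrices; this sketch omits both.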

A number of servers provide genomic annotations computed with the aid of TRANSFAC. Others have used such analyses to infer target gene sets.

Similar Data Sources

The following resources offer contents that are related to or partially overlapping with TRANSFAC:

JASPAR – collection of transcription factor binding profiles (matrices) and sequence analysis program
PLACE – cis-regulatory DNA elements in plants; until February 2007
PlantCARE – cis-regulatory elements and transcription factors in plants (2002)
PRODORIC – a concept similar to TRANSFAC, for prokaryotes
RegulonDB – focus on the bacterium Escherichia coli
SCPD – specific collection of data and tools for yeast (Saccharomyces cerevisiae) (1998)
TFe – the transcription factor encyclopedia
TRRD – Transcription Regulatory Regions Database, mainly about regulatory regions and TF-binding sites

External links

History of the TRANSFAC database on the homepage of Edgar Wingender
What is the TRANSFAC database? at Lane Medical Library, Stanford University School of Medicine

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.