Epigenomics
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell (Russell 2010 p. 217 & 230). Epigenetic modifications are reversible modifications on a cell’s DNA or histones that affect gene expression without altering the DNA sequence (Russell 2010 p. 475). Two of the best-characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as differentiation, development and tumorigenesis (Russell 2010 p. 597). The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays (Laird 2010).
Introduction to Epigenetics
The mechanisms governing phenotypic plasticity, or the capacity of a cell to change its state in response to stimuli, have long been the subject of research (Phenotypic plasticity 1). The traditional central dogma of biology states that the DNA of a cell is transcribed to RNA, which is translated to proteins, which perform cellular processes and functions (Crick 1970). A paradox exists, however, in that cells exhibit diverse responses to varying stimuli and that cells sharing identical sets of DNA, such as those in multicellular organisms, can have a variety of distinct functions and phenotypes (Bird 2002). Classical views have attributed phenotypic variation to differences in primary DNA structure, be it through aberrant mutation or an inherited sequence allele (Johannes 2008). However, while this explains some aspects of variation, it does not explain how tightly coordinated and regulated cellular responses, such as differentiation, are carried out.
A more likely source of cellular plasticity is the regulation of gene expression, such that while two cells may have nearly identical DNA, the differential expression of certain genes results in variation. Research has shown that cells are capable of regulating gene expression at several stages: mRNA transcription, processing and transport, as well as protein translation, post-translational processing and degradation. Regulatory proteins that bind to DNA, RNA, and/or proteins are key effectors in these processes and function by positively or negatively regulating specific protein levels and functions in a cell (Russell 2010 p. 518-19). While DNA-binding transcription factors provide a mechanism for specific control of cellular responses, a model in which DNA-binding transcription factors are the sole regulators of gene activity is also unlikely. For example, in a study of somatic-cell nuclear transfer, it was demonstrated that stable features of differentiation remain after the nucleus is transferred to a new cellular environment, suggesting that a stable and heritable mechanism of gene regulation was involved in the maintenance of the differentiated state in the absence of the DNA-binding transcription factors (Bird 2002).
With the finding that DNA methylation and histone modifications are stable, heritable, and also reversible processes that influence gene expression without altering DNA primary structure, a mechanism for the observed variability in cell gene expression was provided (Johannes 2008). These modifications were termed epigenetic, from epi “on top of” the genetic material “DNA” (Epigenetics 1). The mechanisms governing epigenetic modifications are complex, but through the advent of high-throughput sequencing technology they are now becoming better understood (Johannes 2008).
Epigenetics
Genomic modifications that alter gene expression, that cannot be attributed to modification of the primary DNA sequence, and that are heritable mitotically and meiotically are classified as epigenetic modifications. DNA methylation and histone modification are among the best characterized epigenetic processes (Russell 2010 p. 475).
DNA methylation
The first epigenetic modification to be characterized in depth was DNA methylation. As its name implies, DNA methylation is the process by which a methyl group is added to DNA. The enzymes responsible for catalyzing this reaction are the DNA methyltransferases (DNMTs). While DNA methylation is stable and heritable, it can be reversed by an antagonistic group of enzymes known as DNA de-methylases. In eukaryotes, methylation is most commonly found on the carbon 5 position of cytosine residues (5mC) adjacent to guanine, termed CpG dinucleotides (Russell 2010 p. 531-32; Laird 2010).
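As a rough illustration of what a CpG dinucleotide is at the sequence level, the following Python sketch lists the positions at which a cytosine is immediately followed by a guanine on one strand. The function name and the example sequence are hypothetical and chosen only for illustration; the sketch says nothing about whether a given site is actually methylated.

```python
def cpg_positions(seq):
    """Return 0-based indices of every cytosine that is directly
    followed by a guanine (a CpG dinucleotide) on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    # Made-up sequence, used only to exercise the function.
    example = "ATCGGCGTACG"
    print(cpg_positions(example))
```

Each reported index marks a cytosine that is a candidate for 5mC; determining its actual methylation state requires one of the assays described later in the article.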
DNA methylation patterns vary greatly between species and even within the same organism. The usage of methylation among animals is quite different, with vertebrates exhibiting the highest levels of 5mC and invertebrates more moderate levels of 5mC. Some organisms, like Caenorhabditis elegans, have not been demonstrated to have 5mC or a conventional DNA methyltransferase; this would suggest that mechanisms other than DNA methylation are also involved (Bird 2002).
Within an organism, DNA methylation levels can also vary throughout development and by region. For example, in mouse primordial germ cells, a genome-wide de-methylation event occurs; by implantation stage, methylation levels return to their prior somatic levels (Bird 2002). When DNA methylation occurs at promoter regions, the sites of transcription initiation, it has the effect of repressing gene expression. This is in contrast to unmethylated promoter regions, which are associated with actively expressed genes (Laird 2010).
The mechanism by which DNA methylation represses gene expression is a multi-step process. The distinction between methylated and unmethylated cytosine residues is carried out by specific DNA-binding proteins. Binding of these proteins recruits histone deacetylase (HDAC) enzymes, which initiate chromatin remodeling such that the DNA becomes less accessible to transcriptional machinery, such as RNA polymerase, effectively repressing gene expression (Russell 2010 p. 532-533).
Histone Modification
In eukaryotes, genomic DNA is coiled into protein-DNA complexes called chromatin. Histones, which are the most prevalent type of protein found in chromatin, function to condense the DNA; the net positive charge on histones facilitates their bonding with DNA, which is negatively charged. The basic and repeating units of chromatin, nucleosomes, consist of an octamer of histone proteins (two copies each of H2A, H2B, H3 and H4) and a 146 bp length of DNA wrapped around it. Nucleosomes and the DNA connecting them form a 10 nm diameter chromatin fiber, which can be further condensed (Barski et al. 2007; Kouzarides 2007).
Chromatin packaging of DNA varies depending on the cell cycle stage and by local DNA region (Russell 2010 p. 24-27). The degree to which chromatin is condensed is associated with a certain transcriptional state. Unpackaged or loose chromatin is more transcriptionally active than tightly packaged chromatin because it is more accessible to transcriptional machinery. By remodeling chromatin structure and changing the density of DNA packaging, gene expression can thus be modulated (Kouzarides 2007).
Chromatin remodeling occurs via post-translational modifications of the N-terminal tails of core histone proteins (Russell 2010 p. 529-30). The collective set of histone modifications in a given cell is known as the histone code. Many different types of histone modification are known, including acetylation, methylation, phosphorylation, ubiquitination, SUMOylation, ADP-ribosylation, deimination and proline isomerization; acetylation, methylation, phosphorylation and ubiquitination have been implicated in gene activation, whereas methylation, ubiquitination, SUMOylation, deimination and proline isomerization have been implicated in gene repression. Note that several modification types, including methylation, phosphorylation and ubiquitination, can be associated with different transcriptional states depending on the specific amino acid on the histone being modified. Furthermore, the DNA region where histone modification occurs can also elicit different effects; an example is methylation of histone H3 at lysine residue 36 (H3K36). When H3K36 methylation occurs in the coding sections of a gene, it is associated with gene activation, but the opposite is found when it is within the promoter region (Kouzarides 2007).
Histone modifications regulate gene expression by two mechanisms: by disrupting the contact between nucleosomes and by recruiting chromatin remodeling ATPases. An example of the first mechanism is the acetylation of lysine residues in the histone terminal tails, which is catalyzed by histone acetyltransferases (HATs). HATs are part of a multiprotein complex that is recruited to chromatin when activators bind to DNA binding sites. Acetylation effectively neutralizes the basic charge on lysine, which was involved in stabilizing chromatin through its affinity for negatively charged DNA. Acetylated histones therefore favor the dissociation of nucleosomes, and thus unwinding of chromatin can occur. Under a loose chromatin state, DNA is more accessible to transcriptional machinery and thus expression is activated. The process can be reversed through removal of histone acetyl groups by deacetylases (Kouzarides 2007; Russell 2010 p. 529-30).
The second process involves the recruitment of chromatin remodeling complexes by the binding of activator molecules to corresponding enhancer regions. The nucleosome remodeling complexes reposition nucleosomes by several mechanisms, enabling or disabling accessibility of transcriptional machinery to DNA. The SWI/SNF protein complex in yeast is one example of a chromatin remodeling complex that regulates the expression of many genes through chromatin remodeling (Kouzarides 2007; Russell 2010 p. 530).
Relation to other genomic fields
Epigenomics shares many commonalities with other genomics fields, in both methodology and in its abstract purpose. Epigenomics seeks to identify and characterize epigenetic modifications on a global level, similar to the study of the complete set of DNA in genomics or the complete set of proteins in a cell in proteomics (Russell 2010 p. 217 & 230). The logic behind performing epigenetic analysis on a global level is that inferences can be made about epigenetic modifications that might not otherwise be possible through analysis of specific loci (Barski et al. 2007; Russell 2010 p. 217). As in the other genomics fields, epigenomics relies heavily on bioinformatics, which combines the disciplines of biology, mathematics and computer science (Russell 2010 p. 218). However, while epigenetic modifications had been known and studied for decades, it is these advancements in bioinformatics technology that have allowed analyses on a global scale. Many current techniques still draw on older methods, often adapting them to genomic assays, as described in the next section.
Histone modification assays
The cellular processes of transcription, DNA replication and DNA repair involve the interaction between genomic DNA and nuclear proteins. It had been known that certain regions within chromatin were extremely susceptible to digestion by DNase I, which cleaves DNA with low sequence specificity. Such hypersensitive sites were thought to be transcriptionally active regions, as evidenced by their association with RNA polymerase and topoisomerases I and II (Gross 1988).
It is now known that regions sensitive to DNase I correspond to regions of chromatin with loose DNA-histone association. Hypersensitive sites most often represent promoter regions, which require DNA to be accessible for the DNA-binding transcriptional machinery to function (Russell 2010 p. 529).
ChIP-Chip and ChIP-Seq
Histone modification was first detected on a genome-wide level through the coupling of chromatin immunoprecipitation (ChIP) technology with DNA microarrays, termed ChIP-chip (Barski et al. 2007). However, instead of isolating a DNA-binding transcription factor or enhancer protein through chromatin immunoprecipitation, the proteins of interest are the modified histones themselves. First, histones are cross-linked to DNA in vivo through light chemical treatment (e.g., formaldehyde). The cells are next lysed, allowing the chromatin to be extracted and fragmented, either by sonication or by treatment with a non-specific nuclease (e.g., micrococcal nuclease). Modification-specific antibodies, in turn, are used to immunoprecipitate the DNA-histone complexes (Kouzarides 2007). Following immunoprecipitation, the DNA is purified from the histones, amplified via PCR and labeled with a fluorescent tag (e.g., Cy5, Cy3). The final step involves hybridization of labeled DNA, both immunoprecipitated and non-immunoprecipitated, onto a microarray containing immobilized gDNA. Analysis of the relative signal intensity allows the sites of histone modification to be determined (Gibson 2009 p. 229-230; Russell 2010 p. 532).
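The relative-signal comparison at the end of a ChIP-chip experiment can be sketched as a per-probe log2 ratio of the immunoprecipitated channel to the non-immunoprecipitated (input) channel. The Python sketch below is a simplified illustration under assumed conditions, not the analysis used in the cited studies: the probe names, intensity values, pseudocount and threshold are all hypothetical, and real pipelines also normalize between channels and assess statistical significance.

```python
import math


def log2_enrichment(ip, control, pseudocount=1.0):
    """Per-probe log2 ratio of IP signal to control (input) signal.

    `ip` and `control` map probe names to fluorescence intensities.
    The pseudocount (an assumption) avoids division by zero.
    """
    return {
        probe: math.log2((ip[probe] + pseudocount) / (control[probe] + pseudocount))
        for probe in ip
        if probe in control
    }


def enriched_probes(ip, control, threshold=1.0):
    """Probes whose log2 ratio meets an illustrative threshold."""
    ratios = log2_enrichment(ip, control)
    return sorted(p for p, r in ratios.items() if r >= threshold)


if __name__ == "__main__":
    # Made-up intensities for two hypothetical probes.
    ip_signal = {"probe_a": 7.0, "probe_b": 1.0}
    input_signal = {"probe_a": 1.0, "probe_b": 1.0}
    print(log2_enrichment(ip_signal, input_signal))
    print(enriched_probes(ip_signal, input_signal))
```

Probes with a high ratio correspond to genomic regions that were preferentially pulled down by the modification-specific antibody.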
ChIP-chip was used extensively to characterize the global histone modification patterns of yeast. From these studies, inferences on the function of histone modifications were made: transcriptional activation or repression was associated with certain histone modifications and with the region in which they occur. While this method was effective in providing near full coverage of the yeast epigenome, its use in larger genomes such as that of humans is limited (Kouzarides 2007; Barski et al. 2007).
In order to study histone modifications on a truly genome-wide level, other high-throughput methods were coupled with chromatin immunoprecipitation, namely serial analysis of gene expression (ChIP-SAGE), paired-end ditag sequencing (ChIP-PET) and, more recently, next-generation sequencing (ChIP-Seq). ChIP-Seq follows the same protocol for chromatin immunoprecipitation, but instead of amplification of purified DNA and hybridization to a microarray, the DNA fragments are directly sequenced using next-generation parallel re-sequencing. It has proven to be an effective method for analyzing global histone modification patterns and protein target sites, providing higher resolution than previous methods (Barski et al. 2007; Gibson 2009 p. 229-232).
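Once ChIP-Seq reads have been aligned to a reference genome, one simple way to locate candidate modification sites is to count reads in fixed-width windows and keep the windows with many reads. The Python sketch below illustrates that idea only; the window size, read threshold and read positions are hypothetical, and it is not the method of any particular study or tool. Practical analyses also compare against a control sample and model background read density.

```python
from collections import Counter


def window_counts(read_starts, window=200):
    """Count aligned read start positions per fixed-width window.

    Keys are window indices (position // window) on one chromosome.
    """
    return Counter(pos // window for pos in read_starts)


def candidate_windows(read_starts, window=200, min_reads=3):
    """Return (start, end) intervals of windows holding at least
    `min_reads` reads; both parameters are illustrative assumptions."""
    counts = window_counts(read_starts, window)
    return sorted(
        (w * window, (w + 1) * window)
        for w, count in counts.items()
        if count >= min_reads
    )


if __name__ == "__main__":
    # Made-up read start coordinates on a single chromosome.
    reads = [10, 50, 150, 420, 1000, 1010, 1100, 1199]
    print(candidate_windows(reads))
```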
DNA Methylation assays
Techniques for characterizing primary DNA sequences could not be directly applied to methylation assays. For example, when DNA was amplified by PCR or bacterial cloning techniques, the methylation pattern was not copied and thus the information was lost. The DNA hybridization technique used in DNA assays, in which radioactive probes were used to map and identify DNA sequences, could not be used to distinguish between methylated and non-methylated DNA (Laird 2010; Eads 2000).
Non genome-wide approaches
The earliest methylation detection assays used methylation-sensitive restriction endonucleases. Genomic DNA was digested with both methylation-sensitive and methylation-insensitive restriction enzymes recognizing the same restriction site. The idea was that whenever the site was methylated, only the methylation-insensitive enzyme could cleave at that position. By comparing the restriction fragment sizes generated by the methylation-sensitive enzyme with those generated by the methylation-insensitive enzyme, it was possible to determine the methylation pattern of the region. This analysis step was done by amplifying the restriction fragments via PCR, separating them by gel electrophoresis, and analyzing them by Southern blot with probes for the region of interest (Eads et al. 2000; Laird 2010).
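The comparison of the two digests can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not a model of the laboratory protocol: the shared recognition site is taken to be CCGG (the site of the well-known HpaII/MspI enzyme pair), fragments are assumed to be known in their order along the sequence, and the function names are hypothetical.

```python
import re
from itertools import accumulate

SITE = "CCGG"  # assumed shared recognition site (e.g. the HpaII / MspI pair)

def digest(seq, methylated_sites, sensitive):
    """Return fragment lengths, in order, after cutting seq at each SITE.

    methylated_sites: set of 0-based start positions of methylated sites.
    sensitive: if True, the enzyme cannot cut a methylated site.
    """
    cuts = []
    for match in re.finditer(SITE, seq):
        if sensitive and match.start() in methylated_sites:
            continue  # methylation blocks the sensitive enzyme
        cuts.append(match.start() + 1)  # cut after the first C: C^CGG
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def cut_positions(fragments):
    """Recover cut coordinates from ordered fragment lengths (assumption)."""
    return set(accumulate(fragments[:-1]))

def infer_methylated_cuts(sensitive_frags, insensitive_frags):
    """Cuts made only by the insensitive enzyme mark methylated sites."""
    return cut_positions(insensitive_frags) - cut_positions(sensitive_frags)

# Toy sequence with two CCGG sites; the second one (start 14) is methylated.
seq = "AAAA" + "CCGG" + "TTTTTT" + "CCGG" + "AA"
insensitive = digest(seq, {14}, sensitive=False)
sensitive = digest(seq, {14}, sensitive=True)
print(insensitive, sensitive, infer_methylated_cuts(sensitive, insensitive))
```

In this toy case the insensitive enzyme yields three fragments and the sensitive enzyme two; the larger fragment from the sensitive digest is what reveals the methylated site.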
This technique was used to compare the DNA methylation patterns in the human adult and hemoglobin gene loci. Different regions of the locus (gamma, delta and beta globin) were known to be expressed at different stages of development (Russell 2010 p. 552-3). Consistent with a role for DNA methylation in gene repression, regions associated with high levels of DNA methylation were not actively expressed (Van der Ploeg et al. 1980).
This method was limited and not suitable for studies of the global methylation pattern, or ‘methylome’. Even within specific loci it was not fully representative of the true methylation pattern, as only those restriction sites with corresponding methylation-sensitive and methylation-insensitive restriction assays could provide useful information. Further complications could arise when incomplete digestion of DNA by restriction enzymes generated false negative results (Laird 2010).
Genome wide approaches
DNA methylation profiling on a large scale was first made possible through the Restriction Landmark Genomic Scanning (RLGS) technique. Like the locus-specific DNA methylation assay, the technique identified methylated DNA via its digestion with methylation-sensitive enzymes. However, it was the use of two-dimensional gel electrophoresis that allowed methylation to be characterized on a broader scale (Laird 2010).
However, it was not until the advent of microarray and next-generation sequencing technology that truly high-resolution, genome-wide DNA methylation profiling became possible (Johannes 2008). As with RLGS, the endonuclease component is retained in the method, but it is coupled to new technologies. One such approach is differential methylation hybridization (DMH), in which one set of genomic DNA is digested with methylation-sensitive restriction enzymes and a parallel set of DNA is not digested. Both sets of DNA are subsequently amplified, each is labelled with a different fluorescent dye, and the two are used in two-colour array hybridization. The level of DNA methylation at a given locus is determined by the relative intensity ratio of the two dyes. Adapting next-generation sequencing to DNA methylation assays provides several advantages over array hybridization. Sequence-based technology provides higher resolution, including of allele-specific DNA methylation, can be performed on larger genomes, and does not require the creation of DNA microarrays, which need adjustments based on CpG density to function properly (Laird 2010).
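The dye-ratio readout of DMH can be sketched as a per-probe log ratio. The following Python example is illustrative only: the probe names and intensity values are made up, the pseudocount is an arbitrary choice, and it assumes that a methylated locus survives digestion and therefore gives a digested signal close to the undigested reference.

```python
import math

def dmh_log_ratios(digested, undigested, pseudocount=1.0):
    """Return log2(digested / undigested) for each probe present in both channels.

    digested, undigested: dicts mapping probe name -> fluorescence intensity.
    pseudocount: small constant added to both channels to avoid division by zero.
    """
    shared = digested.keys() & undigested.keys()
    return {
        probe: math.log2(
            (digested[probe] + pseudocount) / (undigested[probe] + pseudocount)
        )
        for probe in shared
    }

# Hypothetical intensities for two probes.
digested = {"probe_a": 999.0, "probe_b": 99.0}
undigested = {"probe_a": 999.0, "probe_b": 799.0}
ratios = dmh_log_ratios(digested, undigested)
for probe in sorted(ratios):
    print(probe, ratios[probe])
```

Under the stated assumption, a ratio near zero (probe_a) suggests a locus protected from digestion by methylation, while a strongly negative ratio (probe_b) suggests an unmethylated locus that was cut and so amplified poorly.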
Bisulfite sequencing
Bisulfite sequencing relies on the chemical conversion of unmethylated cytosines exclusively, such that methylated cytosines can be identified through standard DNA sequencing techniques. Sodium bisulfite and alkaline treatment does this by converting unmethylated cytosine residues into uracil, while leaving methylated cytosine unaltered. Subsequent amplification and sequencing of untreated DNA and sodium bisulfite treated DNA allows methylated sites to be identified. Bisulfite sequencing, like the traditional restriction-based methods, was historically limited to the methylation patterns of specific gene loci, until whole genome sequencing technologies became available. However, unlike traditional restriction-based methods, bisulfite sequencing provided resolution at the nucleotide level (Eads 2000; Laird 2010).
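The logic of reading methylation from a bisulfite-treated sequence can be shown with a short Python sketch. It is a toy model under simplifying assumptions: conversion is complete, only one strand is considered, uracil is read as thymine after amplification, and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment followed by amplification.

    Unmethylated C is converted to U and then read as T; methylated C
    (0-based positions listed in `methylated`) is left unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_methylation(untreated, treated):
    """Compare untreated and treated sequences at every cytosine.

    Returns a dict mapping position -> True (methylated) or False (unmethylated).
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(untreated, treated)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True   # cytosine survived treatment
        elif obs == "T":
            calls[i] = False  # cytosine was converted
    return calls

untreated = "ACGTCCGA"
treated = bisulfite_convert(untreated, methylated={1, 5})
print(treated)
print(call_methylation(untreated, treated))
```

A cytosine that is still read as C after treatment is called methylated, and one read as T is called unmethylated, which is why each cytosine can be assessed individually.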
Limitations of the bisulfite technique include the incomplete conversion of unmethylated cytosine to uracil, which is a source of false positives because unconverted cytosines are read as methylated. Further, bisulfite treatment causes DNA degradation and requires an additional purification step to remove the sodium bisulfite (Laird 2010).
Next-generation sequencing is well suited to complementing bisulfite sequencing in genome-wide methylation analysis. While this now allows methylation patterns to be determined at the highest resolution possible, the single-nucleotide level, challenges remain in the assembly step because of the reduced sequence complexity of bisulfite-treated DNA. Increases in read length seek to address this challenge, allowing whole genome shotgun bisulfite sequencing (WGBS) to be performed. The WGBS approach has already been implemented in Arabidopsis using an Illumina Genome Analyzer platform (Laird 2010).
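The loss of sequence complexity after bisulfite treatment can be illustrated by counting distinct subsequences before and after conversion. This Python sketch is a toy example: it assumes a fully unmethylated sequence (so every C is read as T), uses a made-up sequence, and the function names are hypothetical.

```python
def in_silico_convert(seq):
    """Replace every C with T, as for fully unmethylated, fully converted DNA."""
    return seq.replace("C", "T")

def distinct_kmers(seq, k):
    """Return the set of distinct length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

seq = "ACGTATGT"
before = distinct_kmers(seq, 4)
after = distinct_kmers(in_silico_convert(seq), 4)
print(len(before), len(after))
```

Here ACGT and ATGT are distinct in the original sequence but become identical after conversion, so a short read matching one can no longer be told apart from the other. Longer reads reduce such ambiguity because they are less likely to collapse onto the same converted sequence.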
See also
- Epigenetics
- Genomics
- Human Epigenome Project
- Epigenomics AG