Methylated DNA immunoprecipitation
Methylated DNA immunoprecipitation (MeDIP or mDIP) is a large-scale (chromosome- or genome-wide) technique used to enrich for methylated DNA sequences. It isolates methylated DNA fragments with an antibody raised against 5-methylcytosine (5mC). The technique was first described by Weber et al. and has helped pave the way for viable methylome-level assessment, because the purified fraction of methylated DNA can be used as input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Even so, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, methylation patterns vary from cell type to cell type.
Background
DNA methylation, the reversible methylation of cytosine at the 5 position of its pyrimidine ring by DNA methyltransferases, is a major epigenetic modification in multicellular organisms. In mammals, this modification occurs primarily at CpG sites, which in turn tend to cluster in regions called CpG islands. Some CpG islands overlap or lie close to the promoter regions of transcription start sites. The modification may also occur at other sites; at both kinds of site, methylation can repress gene expression by interfering with the binding of transcription factors or by modifying chromatin structure to a repressive state.
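CpG islands are usually identified computationally. The following is a minimal sketch that assumes the widely used Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names are hypothetical, and real island callers scan sliding windows across a genome rather than scoring one sequence.

```python
def cpg_stats(seq: str) -> dict:
    """GC fraction, CpG count and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed CpGs relative to the number expected from C and G composition alone.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"gc_fraction": gc_fraction, "cpg_count": cpg, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Score a single window against Gardiner-Garden and Frommer style thresholds."""
    stats = cpg_stats(seq)
    return (len(seq) >= min_len
            and stats["gc_fraction"] > min_gc
            and stats["obs_exp"] > min_obs_exp)


print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```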
Disease studies have largely driven efforts to understand the role of DNA methylation. Currently, the major research interest lies in investigating diseases such as cancer to identify regions of DNA that have undergone extensive methylation changes. The genes in these regions are of functional interest because they may offer a mechanistic explanation for the underlying causes of a disease. For instance, abnormal methylation in cancer cells was initially shown to be a mechanism through which tumor suppressor-like genes are silenced, although it was later observed that a much broader range of gene types is affected.
Other technologies
There are two approaches to methylation analysis: typing and profiling technologies. Typing technologies target a small number of loci across many samples and use techniques such as PCR, restriction enzymes and mass spectrometry. Profiling technologies such as MeDIP target genome- or methylome-wide assessment of methylation. Besides MeDIP, they include restriction landmark genomic scanning (RLGS) and bisulfite conversion-based methods, which rely on treating DNA with bisulfite to convert unmethylated cytosine residues to uracil while leaving methylated cytosines unchanged.
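Bisulfite conversion turns each unmethylated cytosine into uracil, which is read as thymine after PCR, while 5mC remains cytosine. Methylation status can therefore be read off by comparing converted reads with the reference. The following is a minimal single-strand sketch that assumes complete conversion and a hypothetical set of methylated positions. Real pipelines must also handle both strands, incomplete conversion and C-to-T polymorphisms.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate complete bisulfite conversion of one strand, as read after PCR.

    Unmethylated C is deaminated to U, which PCR copies as T; cytosines whose
    0-based positions are listed in `methylated` (5mC) are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, converted: str) -> dict[int, bool]:
    """Methylation call at each reference C: True if it still reads as C."""
    return {i: converted[i] == "C"
            for i, base in enumerate(reference.upper()) if base == "C"}


ref = "ACGTCCGA"
read = bisulfite_convert(ref, methylated={1})
print(read)                         # ACGTTTGA
print(call_methylation(ref, read))  # {1: True, 4: False, 5: False}
```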
Limitations of other technologies
Other methods for mapping and profiling the methylome have been effective but have limitations that can affect resolution, throughput or experimental variability. For instance, RLGS is limited by the number of restriction sites in the genome that the restriction enzyme can target; typically, at most ~4100 landmarks can be assessed. Bisulfite sequencing-based methods, despite possible single-nucleotide resolution, have a drawback: the conversion of unmethylated cytosine to uracil can be unstable. In addition, when bisulfite conversion is coupled with DNA microarrays to detect converted sites, the reduced sequence complexity of the converted DNA is a problem. Because converted DNA consists mostly of three bases, fewer unique probes are available, which makes it difficult to design microarrays that comprehensively profile the whole genome.
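The loss of complexity can be made concrete by counting distinct k-mers, a rough proxy for the number of candidate unique probes. The following is a minimal sketch on a synthetic random sequence. It assumes complete conversion and no methylation, so every C becomes T. Real genomes are not uniformly random, so the printed numbers are only illustrative.

```python
import random


def convert_unmethylated(seq: str) -> str:
    """Complete conversion with no methylation: every C reads as T after PCR."""
    return seq.replace("C", "T")


def distinct_kmers(seq: str, k: int) -> int:
    """Number of distinct k-mers in seq, a rough proxy for candidate unique probes."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic sequence is reproducible
    genome = "".join(rng.choice("ACGT") for _ in range(50_000))
    k = 10
    native = distinct_kmers(genome, k)
    converted = distinct_kmers(convert_unmethylated(genome), k)
    print(f"distinct {k}-mers: {native} native vs {converted} converted")
    # The converted strand uses only A, G and T, so at most 3**k k-mers can exist.
    print(f"theoretical maximum: 4**{k} = {4 ** k} vs 3**{k} = {3 ** k}")
```

Because conversion maps each k-mer to exactly one converted k-mer, the converted count can never exceed the native count. The gap widens as the alphabet effectively shrinks from four letters to three.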
Methods
The following sections outline MeDIP coupled with either high-resolution array hybridization or high-throughput sequencing. For each DNA detection method, they also briefly describe the post-laboratory processing and analysis. The raw data require different post-processing depending on the technology used to identify the methylated sequences, in the same way as data generated by ChIP-chip and ChIP-seq.
Methylated DNA immunoprecipitation (MeDIP)
Genomic DNA is extracted from the cells and purified. The purified DNA is then sonicated to shear it into random fragments; sonication is quick, simple and avoids restriction enzyme biases. The resulting fragments range from 300 to 1000 base pairs (bp) in length, although they are typically between 400 and 600 bp. Short fragments are important for obtaining adequate resolution, improving the efficiency of the downstream immunoprecipitation step and reducing fragment-length effects or biases. Fragment size also affects binding of the 5mC antibody, because the antibody needs more than a single 5mC for efficient binding. To further improve the binding affinity of the antibodies, the DNA fragments are denatured to produce single-stranded DNA. Following denaturation, the DNA is incubated with monoclonal 5mC antibodies. The classical immunoprecipitation technique is then applied: magnetic beads conjugated to anti-mouse IgG are used to bind the anti-5mC antibodies, and unbound DNA is removed in the supernatant. To purify the DNA, proteinase K is added to digest the antibodies and release the DNA, which can then be collected and prepared for DNA detection.
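This step imposes two constraints: fragments must fall within a size window, and each fragment needs more than one 5mC. Both can be expressed as a toy in silico filter. The following is a minimal sketch, not a model of antibody binding. The `Fragment` class, the function name and the `min_5mc=2` cutoff are hypothetical. The 300 to 1000 bp window follows the range given above.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def captured_fragments(fragments, methylated_positions,
                       min_len=300, max_len=1000, min_5mc=2):
    """Toy MeDIP filter: keep fragments inside the size window that carry at
    least `min_5mc` methylated cytosines (a hypothetical cutoff)."""
    meth = sorted(methylated_positions)
    kept = []
    for frag in fragments:
        if not min_len <= frag.length <= max_len:
            continue
        # Count 5mC positions p with start <= p < end using binary search.
        n_5mc = bisect_left(meth, frag.end) - bisect_left(meth, frag.start)
        if n_5mc >= min_5mc:
            kept.append((frag, n_5mc))
    return kept


frags = [Fragment(0, 500), Fragment(500, 700), Fragment(700, 1200)]
print(captured_fragments(frags, {100, 450, 600, 650, 800}))
# [(Fragment(start=0, end=500), 2)]
```

In this example, the second fragment is dropped for being too short, and the third is dropped because it carries only one 5mC.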
MeDIP and array-based hybridization (MeDIP-chip)
A fraction of the input DNA obtained after the sonication step above is labeled with cyanine-5 (Cy5; red) deoxycytidine triphosphate, while the methylated DNA enriched by immunoprecipitation is labeled with cyanine-3 (Cy3; green). The two labeled samples are cohybridized on a two-channel, high-density genomic microarray to probe for the presence and relative quantity of each sequence. The purpose of this comparison is to identify sequences whose signal differs significantly between the two channels, indicating that they are enriched in the methylated fraction.
Array-based identification of MeDIP sequences is limited by the array design: resolution is restricted to the probes on the array. As with most array technologies, additional standard signal-processing steps are required to correct for hybridization issues such as noise.
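The two-channel comparison is commonly summarized per probe as a log ratio of the enriched (Cy3) signal to the input (Cy5) signal. The following is a minimal sketch that assumes intensities have already been background-corrected. The median-centring and the 2-fold cutoff (log2 ratio of 1) are illustrative choices, not a published MeDIP-chip pipeline, which would typically use more careful normalization and statistics. The function names are hypothetical.

```python
import math
import statistics


def medip_log_ratios(cy3_ip, cy5_input, pseudocount=1.0):
    """Median-centred log2(IP / input) per probe from background-corrected intensities."""
    if len(cy3_ip) != len(cy5_input):
        raise ValueError("channels must have one intensity per probe")
    raw = [math.log2((ip + pseudocount) / (inp + pseudocount))
           for ip, inp in zip(cy3_ip, cy5_input)]
    # Global normalization: shift so the median probe has log ratio 0,
    # assuming most probes are not enriched for methylated DNA.
    centre = statistics.median(raw)
    return [r - centre for r in raw]


def enriched_probes(log_ratios, min_log2=1.0):
    """Indices of probes at least `min_log2` above the median (default: 2-fold)."""
    return [i for i, r in enumerate(log_ratios) if r >= min_log2]


ratios = medip_log_ratios([100, 100, 100, 800], [100, 100, 100, 100], pseudocount=0)
print(ratios)                   # [0.0, 0.0, 0.0, 3.0]
print(enriched_probes(ratios))  # [3]
```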
MeDIP and high-throughput sequencing (MeDIP-seq)
The MeDIP-seq approach couples MeDIP with next-generation, short-read sequencing technologies such as 454 Life Sciences, Illumina (Solexa) and SOLiD (Applied Biosystems). It was first described by Down et al. in 2008. High-throughput sequencing of the methylated DNA fragments produces a large number of short reads (36 to 50 bp, or about 400 bp, depending on the technology). The short reads are aligned to a reference genome using alignment software such as Mapping and Assembly with Quality (Maq), which uses a Bayesian approach, together with base and mapping qualities, to model error probabilities for the alignments. The reads can then be extended to represent the ~400 to 700 bp fragments from the sonication step, and the coverage of these extended reads can be used to estimate the methylation level of each region. A genome browser such as Ensembl can also be used to visualize the data.
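Each short read marks only one end of a longer sonicated fragment, so reads are commonly extended in the direction of their strand before coverage is computed. The following is a minimal sketch for a single chromosome that assumes reads have already been aligned (for example by Maq) and a fixed fragment length. The default of 500 bp is an illustrative value within the ~400 to 700 bp range above. The `Read` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int   # 0-based leftmost aligned base
    length: int
    strand: str  # "+" or "-"


def extend_read(read: Read, fragment_len: int) -> tuple[int, int]:
    """Half-open interval of the fragment a read is assumed to come from."""
    if read.strand == "+":
        # Plus-strand reads start at the fragment's left end; extend rightwards.
        return read.start, read.start + fragment_len
    # Minus-strand reads end at the fragment's right end; extend leftwards.
    end = read.start + read.length
    return max(0, end - fragment_len), end


def binned_coverage(reads, chrom_len, fragment_len=500, bin_size=100):
    """Number of extended fragments overlapping each bin of one chromosome."""
    counts = [0] * ((chrom_len + bin_size - 1) // bin_size)
    for read in reads:
        start, end = extend_read(read, fragment_len)
        end = min(end, chrom_len)
        if start >= end:
            continue
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            counts[b] += 1
    return counts


reads = [Read(0, 36, "+"), Read(964, 36, "-"), Read(450, 36, "+")]
print(binned_coverage(reads, chrom_len=1000))
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```

Binned counts like these are the raw signal that a density-aware model then has to interpret.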
Quantitative PCR can be used to validate the approach and assess the quality and accuracy of the data. A sequence from the MeDIP sample is compared against an unmethylated control sequence. The samples are then run on a gel and the band intensities are compared, with the relative intensity serving as a guide to enrichment. The results can also be compared with MeDIP-chip results to help determine the coverage needed.
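The comparison of band intensities reduces to a ratio. The following is a minimal sketch that assumes background-subtracted intensities have already been measured for each band. It also makes an added assumption not stated above: input (non-immunoprecipitated) DNA was amplified alongside the MeDIP sample, so both loci can be normalized to it. A value well above 1 suggests that the target is enriched relative to the unmethylated control. The function name is hypothetical.

```python
def relative_enrichment(target_ip, control_ip, target_input, control_input):
    """Enrichment of a target locus over an unmethylated control locus.

    Each argument is a background-subtracted band intensity. Dividing by the
    matching input lanes (an assumption: input DNA was amplified in parallel)
    corrects for differences in amplification efficiency between the two loci.
    """
    if min(target_input, control_input, control_ip) <= 0:
        raise ValueError("input and control intensities must be positive")
    return (target_ip / target_input) / (control_ip / control_input)


print(relative_enrichment(8000, 1000, 2000, 2000))  # 8.0
```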
Downstream bioinformatics analysis
DNA methylation level estimates from MeDIP data can be confounded by the varying density of methylated CpG sites across the genome, which is particularly problematic when analyzing CpG-poor (low-density) regions. One reason is that CpG density affects the efficiency of immunoprecipitation. In their study, Down et al. developed a tool, the Bayesian tool for methylation analysis (Batman), that estimates absolute methylation levels from MeDIP data by modeling the density of methylated CpG sites. The study reports that methylation levels can be estimated for ~90% of all CpG sites in promoters, gene-coding regions, islands and regulatory elements, almost 20 times better coverage than any previous method.
MeDIP-seq and MeDIP-chip are both genome-wide approaches with the common aim of functionally mapping the methylome. Once regions of DNA methylation are identified, a number of bioinformatics analyses can be applied to answer specific biological questions. One obvious step is to examine the genes contained in these regions and the functional significance of their repression. For example, silencing of tumour-suppressor genes in cancer can be attributed to DNA methylation. By identifying mutational events that lead to hypermethylation and subsequent repression of known tumour-suppressor genes, one can more specifically characterize the factors contributing to the disease. Alternatively, one can identify genes that are normally methylated but, as a result of some mutational event, are no longer silenced.
One can also investigate whether an epigenetic regulator such as DNA methyltransferase (DNMT) has been affected; in these cases, enrichment may be more limited.
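The density problem can be illustrated with a much cruder correction than Batman. Each window's coverage is divided by its CpG count, so that a CpG-rich window is not read as more methylated merely because it offers the antibody more targets. The following is a minimal sketch that assumes equal-length windows with precomputed read coverage. It is not Batman, which uses a Bayesian model of CpG density to estimate absolute methylation levels. The function name is hypothetical.

```python
def cpg_normalized_coverage(coverages, window_seqs, min_cpg=1):
    """MeDIP coverage per CpG for equal-length windows.

    Returns None for windows with fewer than `min_cpg` CpGs (never fewer than 1),
    where a per-CpG estimate is undefined or unstable.
    """
    threshold = max(min_cpg, 1)
    result = []
    for cov, seq in zip(coverages, window_seqs):
        n_cpg = seq.upper().count("CG")  # "CG" cannot overlap itself
        result.append(cov / n_cpg if n_cpg >= threshold else None)
    return result


print(cpg_normalized_coverage([10, 10, 5], ["CGCGAAAA", "CGAAAAAA", "AAAAAAAA"]))
# [5.0, 10.0, None]
```

In this example, the first two windows have identical coverage. After normalization, however, the CpG-poorer second window shows the higher per-CpG signal, which is the kind of distortion that density-aware models such as Batman are designed to handle more rigorously.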
Limitations of MeDIP
Limitations to note when using MeDIP include typical experimental factors, such as the quality and cross-reactivity of the 5mC antibodies used in the procedure. Furthermore, the DNA detection methods (array hybridization and high-throughput sequencing) have well-established limitations of their own. As mentioned above, array-based procedures in particular can analyze only the sequences covered by the specific array design used.
For MeDIP-seq, the usual limitations of high-throughput, next-generation sequencing apply. Alignment to repetitive regions of the genome is less accurate, so methylation analysis in those regions is less accurate too. Also, as mentioned above, a short read (e.g. 36 to 50 bp from an Illumina Genome Analyzer) represents only part of a sheared fragment when aligned to the genome. The exact methylation site can therefore fall anywhere within a window whose size depends on the fragment size. In this respect, bisulfite sequencing has much higher resolution (down to a single CpG site, i.e. single-nucleotide level). However, this level of resolution may not be required for most applications, because the methylation status of CpG sites less than 1000 bp apart has been shown to be significantly correlated.
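Whether single-CpG resolution matters for a given study can be checked on per-CpG data, for example from bisulfite sequencing, by asking how strongly the methylation levels of neighbouring CpGs co-vary as a function of distance. The following is a minimal sketch that assumes one sample's CpG positions and methylation levels on a single chromosome. The distance bins and the pair-count cutoff are illustrative, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; NaN when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_distance(sites, bin_size=250, max_dist=2000, min_pairs=3):
    """Correlation of methylation levels between CpG pairs, grouped by distance.

    `sites` holds (position, methylation_level) pairs for one chromosome of one
    sample. Returns {bin_start_bp: correlation} for bins with enough pairs.
    """
    sites = sorted(sites)
    pairs = defaultdict(lambda: ([], []))
    for i, (pos_i, m_i) in enumerate(sites):
        for pos_j, m_j in sites[i + 1:]:
            dist = pos_j - pos_i
            if dist > max_dist:
                break  # sites are sorted, so later ones are even farther away
            xs, ys = pairs[(dist // bin_size) * bin_size]
            xs.append(m_i)
            ys.append(m_j)
    return {d: pearson(xs, ys)
            for d, (xs, ys) in sorted(pairs.items()) if len(xs) >= min_pairs}


sites = [(0, 0.9), (100, 0.9), (10000, 0.1), (10100, 0.1),
         (20000, 0.9), (20100, 0.9), (30000, 0.1), (30100, 0.1)]
print(correlation_by_distance(sites))
```

High correlations in the short-distance bins would indicate that a fragment-level measurement such as MeDIP loses relatively little information compared with single-CpG resolution.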
Applications of MeDIP
- Weber et al. 2005 used MeDIP coupled with microarrays to determine that the inactive X chromosome in females is hypermethylated at a chromosome-wide level.
- Keshet et al. 2006 studied colon and prostate cancer cells using MeDIP-chip. They produced a genome-wide analysis of genes lying in hypermethylated regions and concluded that there is an instructive mechanism of de novo methylation in cancer cells.
- Zhang et al. 2006 obtained a high-resolution methylome map of Arabidopsis using MeDIP-chip.
- Novak et al. 2006 used MeDIP-chip to investigate methylation-associated silencing in human breast cancer and observed inactivation of the HOXA gene cluster.
See also
- Epigenetics
- Immunoprecipitation
- Methylome
- Restriction landmark genomic scanning
- Bisulfite sequencing