Genome-wide association study
In genetic epidemiology, a genome-wide association study (GWA study, or GWAS), also known as a whole genome association study (WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see whether any variant is associated with a trait. Such studies typically investigate single-nucleotide polymorphisms (SNPs), and the traits studied are typically major diseases. The first GWAS, published in 2005, compared 96 patients with age-related macular degeneration with 50 healthy controls. Today, hundreds or thousands of individuals are tested. Over 1,200 human GWASs have examined over 200 diseases and traits and found almost 4,000 SNP associations. A GWAS identifies SNPs and other DNA variants that are associated with a disease, but cannot on its own specify which genes are causal.

These studies normally compare the DNA of two groups of participants: people with the disease (cases) and similar people without it (controls). Each person gives a sample of cells, such as a swab of cells from the inside of the cheek. DNA is extracted from these cells and spread on gene chips (DNA microarrays), which can read millions of DNA sequences. The chip readouts are loaded into computers, where they are analyzed with bioinformatics techniques. Rather than reading the entire DNA sequence, these systems usually read SNPs, which are variations in single nucleotides. If a genetic variation is more frequent in people with the disease, it is said to be "associated" with the disease. The associated variations are then treated as pointers to the region of the human genome where the disease-causing problem is likely to reside. In contrast to methods that specifically test one or a few genetic regions, a GWAS investigates the entire genome. The two approaches are said to be candidate-driven and non-candidate-driven, respectively.
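
The comparison of variant frequencies between cases and controls can be illustrated with a small sketch. It assumes one common way of testing a single SNP, a Pearson chi-square test on a 2x2 table of allele counts; the function name and the counts are hypothetical and are not taken from any study described here.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in each cell if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p_value = allelic_chi_square(case_alt=600, case_ref=1400,
                                   control_alt=500, control_ref=1500)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

In a real GWAS a test of this kind would be repeated for every SNP on the chip, which is why the p-value needed to call an association is far stricter than in a single-test setting.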

Surprisingly, most of the SNP variations associated with disease are not in the regions of DNA that code for a protein. Instead, they are usually in the large non-coding regions on the chromosome between genes, or in the intron sequences that are spliced out of the RNA transcript before a protein is made. These are presumably sequences of DNA that control splicing or expression of genes.

Background

The human genome contains many millions of single-nucleotide polymorphisms, and thousands more variations in the number of copies of large and small segments of the genome (copy number variation). These variants may either directly cause changes in phenotype or tag nearby mutations that contain the key differences influencing individual variation and susceptibility to disease. GWA studies allow researchers to sample 500,000 or more SNPs from each subject in a study, capturing variation uniformly across the genome. To date, these studies have identified risk and protective factors for asthma, cancer, diabetes, heart disease, mental illness, and other human differences.

Most genetic variations are associated with the geographical and historical populations in which the mutations first arose. This ability of SNPs to tag surrounding blocks of ancient DNA (haplotypes) underlies the rationale for GWAS. Because of it, however, studies must take account of the geographical and racial background of participants, controlling for what is called population stratification. As the peoples of the world have migrated and inter-married over many generations, these geographical variations have also been broken down and mixed over time.
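
Why population stratification matters can be shown with a toy calculation. The sketch below assumes two hypothetical subpopulations in which the allele has no effect on disease (odds ratio of 1 within each), but which differ in both allele frequency and in the share of cases sampled. Pooling them produces a spurious association, while a stratified estimate (here the Mantel-Haenszel odds ratio, chosen for illustration and not named in the text above) removes it. All counts and function names are invented.

```python
def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio of carrying the allele in cases versus controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of 2x2 allele counts."""
    numerator = 0.0
    denominator = 0.0
    for case_alt, case_ref, control_alt, control_ref in strata:
        n = case_alt + case_ref + control_alt + control_ref
        numerator += case_alt * control_ref / n
        denominator += case_ref * control_alt / n
    return numerator / denominator

# Each tuple: (case_alt, case_ref, control_alt, control_ref), hypothetical counts.
strata = [
    (160, 640, 40, 160),   # subpopulation A: allele frequency 0.2, mostly cases
    (100, 100, 400, 400),  # subpopulation B: allele frequency 0.5, mostly controls
]

# Ignoring ancestry: add the counts of both subpopulations together.
pooled_counts = [sum(column) for column in zip(*strata)]
pooled_or = odds_ratio(*pooled_counts)

# Respecting ancestry: combine the within-stratum comparisons.
adjusted_or = mantel_haenszel_or(strata)

print(f"pooled OR = {pooled_or:.3f}, stratified OR = {adjusted_or:.3f}")
```

The pooled odds ratio departs from 1 only because the two groups were mixed, not because the allele influences the disease.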

Genes identified

In 2005, a GWAS found an association between age-related macular degeneration (ARMD) and a variation in the gene for complement factor H (CFH). Complement is a group of proteins that regulate inflammation. This association was unexpected from previous research in ARMD, and identified ARMD as an inflammatory process. Together with 4 other variants, this variation can predict half the risk of ARMD between siblings, and it is among the most successful examples of GWAS.

In 2007, a GWAS found an association between type 2 diabetes and several SNPs in the genes TCF7L2, SLC30A8 and others.

In 2007, the Wellcome Trust Case Control Consortium carried out genome-wide association studies for coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder, and hypertension. This study was successful in uncovering many new disease genes underlying these diseases.

Genes in many traditional genetic diseases, such as hemophilia, are always associated with the disease. Other genes are associated only with an increased risk. Disappointingly, most of the SNP variations found by GWAS are associated with only a small increased risk of the disease and have only a small predictive value. The median odds ratio per SNP is 1.33, with some variants carrying odds ratios above 3.0 and a few exceeding 12.0. A common pattern is that a few variants have a large effect, but most have small effects.
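
To see why an odds ratio of 1.33 carries little predictive value, it helps to convert odds ratios into absolute risks. The sketch below assumes a hypothetical baseline risk of 5% among people without the variant, and treats the odds ratio as applying to carriers versus non-carriers; both are illustrative assumptions, and the function name is invented.

```python
def risk_with_variant(baseline_risk, odds_ratio):
    """Absolute risk for carriers, given non-carrier risk and an odds ratio."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = baseline_odds * odds_ratio
    # Convert odds back to a probability.
    return carrier_odds / (1.0 + carrier_odds)

baseline = 0.05  # hypothetical: 5% of non-carriers develop the disease

# The three odds ratios mentioned in the text: median, large, and very large.
for odds_ratio_value in (1.33, 3.0, 12.0):
    risk = risk_with_variant(baseline, odds_ratio_value)
    print(f"odds ratio {odds_ratio_value:>5}: risk {baseline:.1%} -> {risk:.1%}")
```

Under these assumptions a median-effect SNP moves the risk by only a point or two, so most carriers still never develop the disease, whereas the rare large-effect variants shift it substantially.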

Clinical applications

One of the challenges for a successful GWAS in the future will be to apply the findings in a way that accelerates drug and diagnostics development. This includes better integration of genetic studies into the drug-development process and a focus on the role of genetic variation in maintaining health as a blueprint for designing new drugs and diagnostics. One such success is the identification of a genetic variant associated with response to anti-hepatitis C virus treatment. For genotype 1 hepatitis C treated with pegylated interferon-alpha-2a or pegylated interferon-alpha-2b (brand names Pegasys or PEG-Intron) combined with ribavirin, a GWAS has shown that genetic polymorphisms near the human IL28B gene, encoding interferon lambda 3, are associated with significant differences in response to the treatment. A later report demonstrated that the same genetic variants are also associated with the natural clearance of the genotype 1 hepatitis C virus.

Problems

GWA studies are necessarily hypothesis-free; that is, they search the entire genome for associations rather than focusing on small candidate areas. This aspect of GWA has attracted criticism as expensive "factory science". Robert Elston is a prominent proponent of linkage, although he does accept that association may occasionally be useful. Methodologically, the power of association to localize a mutation translates directly into the need for extremely dense searches. This led Pearson and Manolio to note that "the GWA approach can also be problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results". Alternative strategies such as linkage analysis act as systematic studies of variation, without needing variants at each region.
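
The scale of the false-positive problem can be made concrete with a little arithmetic. The sketch below assumes the tests are independent and that no SNP is truly associated, and it uses the Bonferroni correction as one simple, conservative remedy; the correction is an illustrative choice and is not prescribed by the sources quoted above. The SNP count reuses the 500,000 figure given under Background.

```python
def expected_false_positives(per_test_alpha, n_tests):
    """Expected number of null tests that pass the threshold by chance alone."""
    return per_test_alpha * n_tests

def bonferroni_threshold(family_alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

n_snps = 500_000  # number of SNPs tested per study, as in the Background section
alpha = 0.05      # conventional single-test significance level

chance_hits = expected_false_positives(alpha, n_snps)
per_snp_threshold = bonferroni_threshold(alpha, n_snps)

print(f"expected chance hits at p < {alpha}: {chance_hits:.0f}")
print(f"Bonferroni per-SNP threshold: {per_snp_threshold:.1e}")
```

Because neighbouring SNPs are correlated, the tests are not truly independent, so a Bonferroni threshold computed this way tends to be stricter than necessary.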

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.