International HapMap Project
The International HapMap Project is an organization that aims to develop a haplotype map (HapMap) of the human genome, which will describe the common patterns of human genetic variation. HapMap is a key resource for researchers to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available to researchers around the world.
The International HapMap Project is a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in Canada, China, Japan, Nigeria, the United Kingdom, and the United States. It officially started with a meeting on 27 to 29 October 2002, and was expected to take about three years. It comprises two phases; the complete data obtained in Phase I were published on 27 October 2005. The analysis of the Phase II dataset was published in October 2007. The Phase III dataset was released in spring 2009.
Background
Unlike the rarer Mendelian diseases, common diseases (such as diabetes, cancer, heart disease, stroke, depression and asthma) and individual responses to pharmacological agents are influenced by combinations of different genes and the environment. To find the genetic factors involved in these diseases, one could in principle obtain the complete genetic sequence of several individuals, some with the disease and some without, and then search for differences between the two sets of genomes. This approach is currently infeasible because of the cost of full genome sequencing. The HapMap project proposes a shortcut.
Although any two unrelated people share about 99.5% of their DNA sequence, some people may have an A at a particular site on a chromosome while others have a G instead. Such a site is known as a single nucleotide polymorphism (SNP), and each of the two possibilities is called an allele. The HapMap project focuses only on common SNPs, those where each allele occurs in at least 1% of the population.
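The 1% criterion can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `allele_frequencies` and `is_common_snp` and the sample data are hypothetical and do not come from the project, and the alleles are assumed to have been read from a sample of chromosomes at a single site.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return {allele: frequency} for the alleles observed at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_common_snp(alleles, threshold=0.01):
    """True if the site has exactly two alleles and each reaches the threshold."""
    freqs = allele_frequencies(alleles)
    return len(freqs) == 2 and min(freqs.values()) >= threshold

# Hypothetical sample of 200 chromosomes: 197 carry A and 3 carry G.
site = ["A"] * 197 + ["G"] * 3
print(allele_frequencies(site))
print(is_common_snp(site))
```

In this made-up sample the rarer allele G has a frequency of 3/200 = 1.5%, so the site would count as a common SNP under the 1% rule.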
Each person has two copies of all chromosomes, except the sex chromosomes in males. For each SNP, the combination of alleles a person has is called a genotype. Genotyping refers to uncovering what genotype a person has at a particular site. The HapMap project chose a sample of 269 individuals and selected several million well-defined SNPs, genotyped the individuals for these SNPs, and published the results.
The alleles of nearby SNPs on a single chromosome are correlated. This means that if the allele of one SNP for a given individual is known, the alleles of nearby SNPs can often be predicted. This is because each SNP arose in evolutionary history as a single point mutation, and was then passed down to descendants surrounded by other, earlier, point mutations. SNPs that are separated by a large distance are typically not very well correlated, because recombination occurs in each generation, mixing the allele sequences of the two chromosomes. A sequence of consecutive alleles on a particular chromosome is known as a haplotype.
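The correlation between two SNPs can be made concrete with a small Python sketch. The article does not name a particular statistic; the squared correlation r² used here is one common choice and is an assumption of this example, as are the function name `r_squared` and the six toy haplotypes.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between the alleles at positions i and j.

    `haplotypes` is a list of equal-length strings, one per chromosome.
    Each position is assumed to carry at most two alleles in the sample.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]  # reference allele at SNP i
    b = haplotypes[0][j]  # reference allele at SNP j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # how far the pair frequency departs from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

# Hypothetical haplotypes over three SNPs (one string per chromosome).
haps = ["AGC", "AGC", "AGT", "CTC", "CTT", "CTT"]
print(r_squared(haps, 0, 1))  # SNPs 0 and 1 always travel together
print(r_squared(haps, 0, 2))  # SNP 2 is only weakly related to SNP 0
```

In the toy data, knowing the allele at the first SNP fully determines the allele at the second, while the third SNP is predicted much less well, which is the situation the paragraph above describes for nearby versus distant sites.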
To find the genetic factors involved in a particular disease, one can proceed as follows. First a certain region of interest in the genome is identified, possibly from earlier inheritance studies. In this region one then locates a set of tag SNPs from the HapMap data; these are SNPs that are very well correlated with all the other SNPs in the region, so that knowledge of the alleles of the tag SNPs in an individual will determine the individual's haplotype with high probability. Next, one determines the genotype for these tag SNPs in several individuals, some with the disease and some without. By comparing the two groups, one can then determine the likely locations and haplotypes that are involved in the disease.
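One way to picture tag SNP selection is the greedy sketch below, written in Python. It is not the procedure used by the project or by any particular tool: the greedy strategy, the r² threshold of 0.8, the function names `r_squared` and `pick_tag_snps`, and the toy haplotypes are all assumptions made for illustration.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between the alleles at positions i and j."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

def pick_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose SNPs so that every SNP has r^2 >= threshold with a tag."""
    n_snps = len(haplotypes[0])
    # For each SNP, the set of SNPs it would capture (always including itself).
    captures = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(haplotypes, i, j) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps),
                   key=lambda i: (len(captures[i] & untagged), -i))
        tags.append(best)
        untagged -= captures[best]
    return tags

# Hypothetical region with four SNPs: 0, 1 and 3 are perfectly correlated.
haps = ["AGCA", "AGCA", "AGTA", "CTCG", "CTTG", "CTTG"]
print(pick_tag_snps(haps))
```

In this toy region, genotyping two SNPs instead of four would be enough to recover the others, which is the saving the tag SNP approach aims for.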
Samples used
Haplotypes are generally shared between populations, but their frequency can differ widely. Four populations were selected for inclusion in the HapMap: 30 adult-and-both-parents trios from Ibadan, Nigeria (YRI), 30 trios of U.S. residents of northern and western European ancestry (CEU), 44 unrelated individuals from Tokyo, Japan (JPT) and 45 unrelated Han Chinese individuals from Beijing, China (CHB). Although the haplotypes revealed from these populations should be useful for studying many other populations, parallel studies are currently examining the usefulness of including additional populations in the project.
All samples were collected through a community engagement process with appropriate informed consent. The community engagement process was designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.
Scientific strategy
For Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million SNPs were genotyped. The genotyping was carried out by 10 centres using five different genotyping technologies. Genotyping quality was assessed by using duplicate or related samples and by having periodic quality checks where centres had to genotype common sets of SNPs.
The Canadian team was led by Thomas J. Hudson at McGill University in Montreal and focused on chromosomes 2 and 4p. The Chinese team was led by Huanming Yang with centres in Beijing, Shanghai and Hong Kong and focused on chromosomes 3, 8p and 21. The Japanese team was led by Yusuke Nakamura at the University of Tokyo and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19. The British team was led by David R. Bentley at the Sanger Institute and focused on chromosomes 1, 6, 10, 13 and 20. There were four American genotyping centres: a team led by Mark Chee and Arnold Oliphant located at Illumina Inc. in San Diego (chromosomes 8q, 9, 18q, 22 and X), a team led by David Altshuler at the Broad Institute in Cambridge, Massachusetts (chromosomes 4q, 7q, 18p, Y and mitochondrion), a team led by Richard A. Gibbs at the Baylor College of Medicine in Houston (chromosome 12) and a team led by Pui-Yan Kwok at the University of California, San Francisco (chromosome 7p).
To obtain enough SNPs to create the Map, the Consortium had to fund a large re-sequencing project to discover millions of additional SNPs. These were submitted to the public dbSNP database. As a result, by August 2006, there were more than ten million SNPs in the database, more than 40% of which were known to be polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs were known and no more than 10% of them were known to be polymorphic.
During Phase II, more than two million additional SNPs were genotyped throughout the genome by the company Perlegen Sciences and 500,000 by the company Affymetrix.
Data access
All of the data generated by the project, including SNP frequencies, genotypes and haplotypes, were placed in the public domain and are available for download. The project website also contains a genome browser that lets users find SNPs in any region of interest, along with their allele frequencies and their association to nearby SNPs. A tool that can determine tag SNPs for a given region of interest is also provided. These data can also be directly accessed from the widely used Haploview program.
Publications
- International HapMap Consortium. (2003) The International HapMap Project. Nature 426(6968):789-96.
- International HapMap Consortium. (2004) Integrating ethics and science in the International HapMap Project. Nat Rev Genet. 5(6):467-75.
- International HapMap Consortium. (2005) A haplotype map of the human genome. Nature 437(7063):1299-320.
- International HapMap Consortium. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851-861.
- Deloukas P, Bentley D. (2004) The HapMap project and its application to genetic studies of drug response. Pharmacogenomics J. 4(2):88-90.
- Secko D. (2005) Phase I of the HapMap Complete. The Scientist (October 2005).
- Thorisson GA, Smith AV, Krishnan L, Stein LD. (2005) The International HapMap Project Web site. Genome Res. 15(11):1592-3.
- Terwilliger JD, Hiekkalinna T. (2006) An utter refutation of the 'Fundamental Theorem of the HapMap'. European Journal of Human Genetics 14:426-437.
See also
- Genealogical DNA test
- The 1000 Genomes Project
- Population groups in biomedicine
- Human Variome Project
- Human genetic variation