Human Genome Project
Encyclopedia
The Human Genome Project (HGP) is an international scientific research project with a primary goal of determining the sequence of chemical base pairs which make up DNA, and of identifying and mapping the approximately 20,000–25,000 genes of the human genome from both a physical and functional standpoint.
The project began in October 1990 and was initially headed by Ari Patrinos, head of the Office of Biological and Environmental Research in the U.S. Department of Energy's Office of Science. Francis Collins directed the National Institutes of Health National Human Genome Research Institute efforts. A working draft of the genome was announced in 2000 and a complete one in 2003, with further, more detailed analysis still being published. A parallel project was conducted outside of government by the Celera Corporation, which was formally launched in 1998. Most of the government-sponsored sequencing was performed in universities and research centres in the United States, the United Kingdom, Japan, France, and Germany. The mapping of human genes is an important step in the development of medicines and other aspects of health care.
While the objective of the Human Genome Project is to understand the genetic makeup of the human species, the project has also focused on several other nonhuman organisms such as E. coli, the fruit fly, and the laboratory mouse. It remains one of the largest single investigative projects in modern science.
The Human Genome Project originally aimed to map the nucleotides contained in a human haploid reference genome (more than three billion). Several groups have announced efforts to extend this to diploid human genomes, including the International HapMap Project, Applied Biosystems, Perlegen, Illumina, the J. Craig Venter Institute, the Personal Genome Project, and Roche-454.
The "genome" of any given individual (except for identical twins and cloned organisms) is unique; mapping "the human genome" involves sequencing multiple variations of each gene. The project did not study the entire DNA found in human cells; some heterochromatic areas (about 8% of the total genome) remain unsequenced.
History
The project began with the culmination of several years of work supported by the United States Department of Energy, in particular workshops in 1984 and 1986 and a subsequent initiative of the US Department of Energy. The initiative's 1987 report stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "knowledge of the human is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.
James D. Watson was head of the National Center for Human Genome Research at the National Institutes of Health (NIH) in the United States from 1988. Largely due to his disagreement with his boss, Bernadine Healy, over the issue of patenting genes, Watson was forced to resign in 1992. He was replaced by Francis Collins in April 1993, and the name of the Center was changed to the National Human Genome Research Institute (NHGRI) in 1997.
The $3-billion project was formally founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in the United Kingdom, France, Germany, Japan, China, and India.
Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by U.S. President Bill Clinton and the British Prime Minister Tony Blair on June 26, 2000). This first available rough draft assembly of the genome was completed by the Genome Bioinformatics Group at the University of California, Santa Cruz, primarily led by then graduate student Jim Kent. Ongoing sequencing led to the announcement of the essentially complete genome in April 2003, two years earlier than planned. In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome was published in the journal Nature.
The sequence data is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) houses the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the Genome Bioinformatics Group at the University of California, Santa Cruz, and Ensembl, present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyse the data, because the data itself is difficult to interpret without such programs.
The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.
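To give a feel for what "identifying boundaries" means in practice, the sketch below scans a DNA string for open reading frames: stretches that begin with the start codon ATG and run, in steps of three bases, to the first stop codon. This is a deliberately naive stand-in and not the statistical or grammar-based models described above; the function name `find_orfs` and the `min_codons` threshold are assumptions made for this illustration, and only the forward strand is examined.

```python
# Toy open-reading-frame scan (forward strand only).
# Illustrative only: real annotation pipelines use statistical models.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ATG...stop stretches, end exclusive."""
    orfs = []
    for frame in range(3):                      # three possible reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i + 3
                # advance codon by codon until a stop codon or the end
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                found_stop = j + 3 <= len(dna)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))     # include the stop codon
                    i = j + 3
                    continue
            i += 3
    return orfs

if __name__ == "__main__":
    sample = "CCATGAAATTTGGGTAACC"
    for start, end in find_orfs(sample):
        print(start, end, sample[start:end])
```

A scan like this reports candidate boundaries only; deciding which candidates are real genes is the part that needs the statistical models and expert review mentioned in the text.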
All humans have unique gene sequences. Therefore the data published by the HGP does not represent the exact sequence of every individual's genome. It is the combined "reference genome" of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying differences among individuals involves single-nucleotide polymorphisms and the HapMap.
1. There are approximately 30,000 genes in human beings, the same range as in mice and roundworms. Understanding how these genes express themselves will provide clues to how diseases are caused.
2. Between 1.1% and 1.4% of the genome sequence codes for proteins.
3. The human genome has significantly more segmental duplications (nearly identical, repeated sections of DNA) than other mammalian genomes. These sections may underlie the creation of new primate-specific genes.
4. At the time when the draft sequence was published, fewer than 7% of protein families appeared to be vertebrate-specific.
It was far too expensive at that time to think of sequencing patients’ whole genomes. So the National Institutes of Health embraced the idea for a "shortcut", which was to look just at sites on the genome where many people have a variant DNA unit. The theory behind the shortcut was that since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. (In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.)
The genome was broken into smaller pieces, approximately 150,000 base pairs in length. These pieces were then ligated into a type of vector known as "bacterial artificial chromosomes", or BACs, which are derived from bacterial chromosomes that have been genetically engineered. The vectors containing the genes can be inserted into bacteria, where they are copied by the bacterial DNA replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The assembled 150,000-base-pair pieces are then joined to reconstruct the chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing.
Funding came from the US government through the National Institutes of Health and from a UK charity, the Wellcome Trust, as well as numerous other groups from around the world. The funding supported a number of large sequencing centers, including those at the Whitehead Institute, the Sanger Centre, Washington University in St. Louis, and Baylor College of Medicine.
The Human Genome Project is considered a megaproject because the human genome has approximately 3.3 billion base pairs.
If the sequence obtained were to be stored in book form, and if each page contained 1000 base pairs and each book contained 1000 pages, then 3300 such books would be needed to store the complete genome. However, if expressed in units of computer data storage, 3.3 billion base pairs recorded at 2 bits per pair would equal about 825 million bytes (roughly 786 megabytes in binary units) of raw data. This is comparable to a fully loaded data CD.
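The arithmetic behind these figures can be checked with a short script. The constants are the ones quoted in the text; the 2-bit encoding table and the function names are assumptions chosen for this sketch (any assignment of the four bases to the values 0 to 3 would do).

```python
BASE_PAIRS = 3_300_000_000      # genome size quoted in the text
BASES_PER_PAGE = 1_000
PAGES_PER_BOOK = 1_000
BITS_PER_BASE = 2               # four possible bases fit in two bits

def books_needed(base_pairs):
    """Number of 1000-page books at 1000 bases per page (rounded up)."""
    per_book = BASES_PER_PAGE * PAGES_PER_BOOK
    return (base_pairs + per_book - 1) // per_book

def raw_bytes(base_pairs):
    """Bytes needed at 2 bits per base (rounded up to whole bytes)."""
    return (base_pairs * BITS_PER_BASE + 7) // 8

# Hypothetical encoding table, chosen for illustration only.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(bases):
    """Pack a string of bases into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        out.append(byte)
    return bytes(out)

if __name__ == "__main__":
    size = raw_bytes(BASE_PAIRS)
    print("books:", books_needed(BASE_PAIRS))
    print("bytes:", size)
    print("MiB:  ", size / 2**20)
    print("packed ACGT:", pack("ACGT").hex())
```

The 786 figure in the text corresponds to dividing the byte count by 2^20; in decimal megabytes the same data is 825 MB.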
In 1998, a parallel, privately funded effort was launched by Craig Venter and his firm Celera Genomics. Venter was a scientist at the NIH during the early 1990s when the project was initiated. The $300 million Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project.
Celera used a technique called whole genome shotgun sequencing, employing pairwise end sequencing, which had been used to sequence bacterial genomes of up to six million base pairs in length, but not for anything nearly as large as the three billion base pair human genome.
Celera initially announced that it would seek patent protection on "only 200–300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100–300 targets. The firm eventually filed preliminary ("place-holder") patent applications on 6,500 whole or partial genes.
Celera also promised to publish their findings in accordance with the terms of the 1996 "Bermuda Statement," by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, they would not permit free redistribution or scientific use of the data. The publicly funded competitor UC Santa Cruz was compelled to publish the first draft of the human genome before Celera for this reason. On July 7, 2000, the UCSC Genome Bioinformatics Group released a first working draft on the web. The scientific community downloaded one-half trillion bytes of information from the UCSC genome server in the first 24 hours of free and unrestricted access to the first ever assembled blueprint of our human species.
In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days. But the public release of the data ensured its fair use and availability to all mankind.
Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 83% of the genome (90% of the euchromatic regions, with 150,000 gaps and the order and orientation of many segments not yet established). In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and 2005, bringing coverage to about 92% of the sequence.
The competition proved to be very good for the project, spurring the public groups to modify their strategy in order to accelerate progress. The rivals at UC Santa Cruz initially agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into their genome, but forbade the public effort to use Celera data.
The HGP is the best known of many international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, plants, and many microbial organisms and parasites.
In 2004, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project ran as high as 2,000,000. The number continues to fluctuate, and it is now expected that it will take many years to agree on a precise value for the number of genes in the human genome.
In the public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of the many collected samples were processed as DNA resources. Thus the donor identities were protected, so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong. Much of the sequence (>70%) of the reference genome produced by the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11).
HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each), with each donor yielding a separate DNA library. One of these libraries (RP11) was used considerably more than others, due to quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) compared to female samples (which contain two X chromosomes). The other 22 chromosomes (the autosomes) are the same for both sexes.
Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or "haps"). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d'Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.
In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use.
On September 4, 2007, a team led by Craig Venter published his complete DNA sequence, unveiling the six-billion-nucleotide genome of a single individual for the first time.
Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, hemostasis disorders, cystic fibrosis, liver diseases and many others. Also, the etiologies for cancers, Alzheimer's disease and other areas of clinical interest are considered likely to benefit from genome information, which may lead in the long term to significant advances in their management.
There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed down their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, and diseases associated with this gene or other datatypes.
Further, deeper understanding of the disease processes at the level of molecular biology may determine new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.
The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.
The Human Genome Diversity Project (HGDP), spinoff research aimed at mapping the DNA that varies between human ethnic groups, which was rumored to have been halted, actually did continue and to date has yielded new conclusions. In the future, HGDP could possibly expose new data in disease surveillance, human development and anthropology. HGDP could unlock secrets behind and create new strategies for managing the vulnerability of ethnic groups to certain diseases (see race in biomedicine). It could also show how human populations have adapted to these vulnerabilities.
Debra Harry, Executive Director of the U.S. group Indigenous Peoples Council on Biocolonialism (IPCB), says that despite a decade of funding for the project's Ethical, Legal and Social Implications (ELSI) program, the burden of genetics education has fallen on the tribes themselves to understand the motives of the Human Genome Project and its potential impacts on their lives. Meanwhile, the government has been busily funding projects studying indigenous groups without any meaningful consultation with the groups. (See Biopiracy.)
The main criticism of ELSI is its failure to address the conditions raised by population-based research, especially with regard to unique processes for group decision-making and cultural worldviews. Genetic variation research such as the HGP is group population research, but most ethical guidelines, according to Harry, focus on individual rights instead of group rights. She says the research represents a clash of cultures: indigenous peoples' lives revolve around collectivity and group decision-making, whereas Western culture promotes individuality. Harry suggests that one of the challenges of ethical research is to include respect for collective review and decision-making, while also upholding the Western model of individual rights.
History
The project began with the culmination of several years of work supported by the United States Department of EnergyUnited States Department of Energy
The United States Department of Energy is a Cabinet-level department of the United States government concerned with the United States' policies regarding energy and safety in handling nuclear material...
, in particular workshops in 1984 and 1986 and a subsequent initiative
of the US Department of Energy. This 1987 report stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "knowledge of the human is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.
James D. Watson
James D. Watson
James Dewey Watson is an American molecular biologist, geneticist, and zoologist, best known as one of the co-discoverers of the structure of DNA in 1953 with Francis Crick...
was head of the National Center for Human Genome Research at the National Institutes of Health
National Institutes of Health
The National Institutes of Health are an agency of the United States Department of Health and Human Services and are the primary agency of the United States government responsible for biomedical and health-related research. Its science and engineering counterpart is the National Science Foundation...
(NIH) in the United States starting from 1988. Largely due to his disagreement with his boss, Bernadine Healy
Bernadine Healy
Bernadine Patricia Healy was an American physician, cardiologist, academic and a former head of the National Institutes of Health . She was a professor of medicine at Johns Hopkins University, professor and dean of the College of Medicine and Public Health at the Ohio State University, and served...
, over the issue of patenting genes, Watson was forced to resign in 1992. He was replaced by Francis Collins
Francis Collins (geneticist)
Francis Sellers Collins , is an American physician-geneticist, noted for his discoveries of disease genes and his leadership of the Human Genome Project . He currently serves as Director of the National Institutes of Health in Bethesda, Maryland. Prior to being appointed Director, he founded and...
in April 1993, and the name of the Center was changed to the National Human Genome Research Institute
National Human Genome Research Institute
The National Human Genome Research Institute is a division of the National Institutes of Health, located in Bethesda, Maryland.NHGRI began as the National Center for Human Genome Research , which was established in 1989 to carry out the role of the NIH in the International Human Genome Project...
(NHGRI) in 1997.
The $3-billion project
Research funding
Research funding is a term generally covering any funding for scientific research, in the areas of both "hard" science and technology and social science. The term often connotes funding obtained through a competitive process, in which potential research projects are evaluated and only the most...
was formally founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in the United Kingdom, France, Germany, Japan, China, and India.
Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by U.S. President Bill Clinton and the British Prime Minister Tony Blair on June 26, 2000). This first available rough draft assembly of the genome was completed by the Genome Bioinformatics Group at the University of California, Santa Cruz, primarily led by then graduate student Jim Kent. Ongoing sequencing led to the announcement of the essentially complete genome in April 2003, two years earlier than planned. In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome, chromosome 1, was published in the journal Nature.
State of completion
The Human Genome Project was completed in 2003, but research is still continuing.
Goals
The sequence of human DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the Genome Bioinformatics Group at the University of California, Santa Cruz, and Ensembl, present additional data and annotation, along with powerful tools for visualizing and searching it. Computer programs have been developed to analyse the data, because the data itself is difficult to interpret without such programs.
The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.
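To make the idea of locating feature boundaries in raw sequence concrete, here is a deliberately naive sketch. It is not the statistical or grammar-based approach described above; it only scans the forward strand for open reading frames (a start codon followed, in frame, by a stop codon). The function name `find_orfs` and the `min_codons` threshold are illustrative assumptions, not part of any annotation tool.

```python
# Naive open-reading-frame scan: a toy stand-in for real gene finders,
# which rely on statistical models rather than simple pattern matching.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the start codon plus the coding codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # an in-frame stop codon was found
                    if (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3))
                    i = j + 3
                    continue
            i += 3
    return sorted(orfs)


if __name__ == "__main__":
    for start, end in find_orfs("CCATGAAACCCGGGTAGTT"):
        print(start, end)
```

A real annotator would also examine the reverse strand, splice sites, and codon statistics; this sketch shows only the boundary-finding step in its simplest form.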
All humans have unique gene sequences. Therefore the data published by the HGP does not represent the exact sequence of every individual's genome. It is the combined "reference genome" of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying differences among individuals involves single-nucleotide polymorphisms and the HapMap.
Findings
Key findings of the draft (2001) and complete (2004) genome sequences include (http://www.genome.gov/12011238):
1. There are approximately 30,000 genes in human beings, the same range as in mice and roundworms. Understanding how these genes express themselves will provide clues to how diseases are caused.
2. Between 1.1% and 1.4% of the genome sequence codes for proteins.
3. The human genome has significantly more segmental duplications (nearly identical, repeated sections of DNA) than other mammalian genomes. These sections may underlie the creation of new primate-specific genes.
4. At the time when the draft sequence was published, less than 7% of protein families appeared to be vertebrate specific.
How it was accomplished
The Human Genome Project was started in 1989 with the goal of sequencing and identifying all three billion chemical units in the human genetic instruction set, finding the genetic roots of disease and then developing treatments. With the sequence in hand, the next step was to identify the genetic variants that increase the risk for common diseases like cancer and diabetes. It was far too expensive at that time to think of sequencing patients’ whole genomes, so the National Institutes of Health embraced the idea of a "shortcut", which was to look just at sites on the genome where many people have a variant DNA unit. The theory behind the shortcut was that since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. (In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.)
The genome was broken into smaller pieces, approximately 150,000 base pairs in length. These pieces were then ligated into a type of vector known as "bacterial artificial chromosomes", or BACs, which are derived from bacterial chromosomes that have been genetically engineered. The vectors containing the genes can be inserted into bacteria, where they are copied by the bacterial DNA replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The larger, 150,000-base-pair pieces are then put together to reconstruct the chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing.
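The hierarchical approach can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not the pipeline the sequencing centers used: the "genome" is a short random string, the large chunks are 200 characters rather than 150,000 base pairs and do not overlap, the reads are evenly tiled rather than randomly sampled, and assembly uses a simple greedy overlap merge. All function names and parameters are hypothetical.

```python
import random


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=10):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]
    return contigs


def hierarchical_shotgun(genome, chunk_size=200, read_len=50, step=25, seed=0):
    """Cut into mapped chunks, shotgun each chunk, then join in map order."""
    rng = random.Random(seed)
    assembled = {}
    for start in range(0, len(genome), chunk_size):
        chunk = genome[start:start + chunk_size]  # stands in for one BAC
        reads = [chunk[p:p + read_len]
                 for p in range(0, len(chunk) - read_len + 1, step)]
        rng.shuffle(reads)  # the assembler does not know the read order
        contigs = greedy_assemble(reads)
        if len(contigs) != 1:
            raise ValueError(f"chunk at {start} did not assemble cleanly")
        assembled[start] = contigs[0]
    # The recorded start positions play the role of the chromosome map.
    return "".join(assembled[s] for s in sorted(assembled))


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(600))
    print(hierarchical_shotgun(genome) == genome)
```

Because each chunk is assembled on its own, a repeated stretch of sequence can only confuse the assembler if both copies fall inside the same chunk, which is one commonly cited motivation for mapping the large pieces first.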
Funding came from the US government through the National Institutes of Health and from a UK charity organization, the Wellcome Trust, as well as numerous other groups from around the world. The funding supported a number of large sequencing centers, including those at the Whitehead Institute, the Sanger Centre, Washington University in St. Louis, and Baylor College of Medicine.
The Human Genome Project is considered a megaproject because the human genome has approximately 3.3 billion base pairs.
If the sequence obtained were to be stored in book form, with 1,000 base pairs per page and 1,000 pages per book, then 3,300 such books would be needed to store the complete genome. Expressed in units of computer data storage, however, 3.3 billion base pairs recorded at 2 bits per pair would equal 825 million bytes, or about 786 mebibytes, of raw data. This is comparable to a fully loaded CD.
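The arithmetic behind these figures can be checked with a short sketch. The constants come from the paragraph above; the function names and the particular 2-bit encoding (A=0, C=1, G=2, T=3) are illustrative assumptions. Note that the figure of 786 corresponds to binary megabytes (mebibytes, 2^20 bytes each); in decimal units the same data is 825 megabytes.

```python
# Constants taken from the text; names are illustrative only.
BASE_PAIRS = 3_300_000_000
BASES_PER_PAGE = 1_000
PAGES_PER_BOOK = 1_000
BITS_PER_BASE = 2  # four possible bases fit in two bits

# Hypothetical encoding chosen for this example.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def books_needed(base_pairs):
    """Number of books required, rounding up to a whole book."""
    per_book = BASES_PER_PAGE * PAGES_PER_BOOK
    return (base_pairs + per_book - 1) // per_book


def raw_bytes(base_pairs):
    """Bytes needed at 2 bits per base, rounding up to a whole byte."""
    return (base_pairs * BITS_PER_BASE + 7) // 8


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


if __name__ == "__main__":
    size = raw_bytes(BASE_PAIRS)
    print("books:", books_needed(BASE_PAIRS))
    print("bytes:", size)
    print("decimal megabytes:", size / 10**6)
    print("mebibytes:", round(size / 2**20, 1))
```

The `pack` helper shows why two bits suffice: each byte holds four bases, so a sequence of n bases occupies n/4 bytes, rounded up.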
Public versus private approaches
In 1998, a similar, privately funded quest was launched by the American researcher Craig Venter and his firm Celera Genomics. Venter was a scientist at the NIH during the early 1990s when the project was initiated. The $300,000,000 Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project.
Celera used a technique called whole genome shotgun sequencing, employing pairwise end sequencing, which had been used to sequence bacterial genomes of up to six million base pairs in length, but not for anything nearly as large as the three billion base pair human genome.
Celera initially announced that it would seek patent protection on "only 200–300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100–300 targets. The firm eventually filed preliminary ("place-holder") patent applications on 6,500 whole or partial genes.
Celera also promised to publish their findings in accordance with the terms of the 1996 "Bermuda Statement" by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, they would not permit free redistribution or scientific use of the data. For this reason, the publicly funded competitor at UC Santa Cruz was compelled to publish the first draft of the human genome before Celera. On July 7, 2000, the UCSC Genome Bioinformatics Group released a first working draft on the web. The scientific community downloaded one-half trillion bytes of information from the UCSC genome server in the first 24 hours of free and unrestricted access to the first ever assembled blueprint of our human species.
In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days. But the public release of the data ensured its fair use and availability to all mankind.
Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 83% of the genome (90% of the euchromatic regions, with 150,000 gaps and the order and orientation of many segments not yet established). In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and 2005, filling in approximately 92% of the sequence.
The competition proved to be very good for the project, spurring the public groups to modify their strategy in order to accelerate progress. The rivals at UC Santa Cruz initially agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into their genome, but forbade the public effort from using Celera data.
The HGP is the best known of many international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, plants, and many microbial organisms and parasites.
In 2004, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project ranged as high as 2,000,000. The number continues to fluctuate, and it is now expected to take many years to agree on a precise value for the number of genes in the human genome.
Genome donors
In the IHGSC international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of the many collected samples were processed as DNA resources. Thus the donor identities were protected so that neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong. Much of the sequence (>70%) of the reference genome produced by the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11).
HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each), each donor yielding a separate DNA library. One of these libraries (RP11) was used considerably more than the others, due to quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) compared to female samples (which contain two X chromosomes). The other 22 chromosomes (the autosomes) are the same for both sexes.
Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or "haps"). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d'Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.
In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use.
On September 4, 2007, a team led by Craig Venter published his complete DNA sequence, unveiling the six-billion-nucleotide genome of a single individual for the first time.
Benefits
The work on interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, hemostasis disorders, cystic fibrosis, liver diseases and many others. Also, research into the etiologies of cancers, Alzheimer's disease and other areas of clinical interest is considered likely to benefit from genome information, which may lead in the long term to significant advances in their management.
There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed down the search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes or to genes in mice, yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene, and other data types.
Further, deeper understanding of the disease processes at the level of molecular biology may determine new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.
The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.
The Human Genome Diversity Project (HGDP), spinoff research aimed at mapping the DNA that varies between human ethnic groups, was rumored to have been halted but actually did continue and to date has yielded new conclusions. In the future, HGDP could possibly expose new data in disease surveillance, human development and anthropology. HGDP could unlock the secrets behind the vulnerability of ethnic groups to certain diseases and create new strategies for managing it (see race in biomedicine). It could also show how human populations have adapted to these vulnerabilities.
Advantages of Human Genome Project:
- Knowledge of the effects of DNA variation among individuals can revolutionize the ways to diagnose, treat and even prevent a number of diseases that affect human beings.
- It provides clues to the understanding of human biology.
Ethical, legal and social issues
The project's goals included not only identifying all of the approximately 24,000 genes in the human genome, but also addressing the ethical, legal, and social issues (ELSI) that might arise from the availability of genetic information. Five percent of the annual budget was allocated to address the ELSI arising from the project. Debra Harry, Executive Director of the U.S. group Indigenous Peoples Council on Biocolonialism (IPCB), says that despite a decade of ELSI funding, the burden of genetics education has fallen on the tribes themselves to understand the motives of the Human Genome Project and its potential impacts on their lives. Meanwhile, the government has been busily funding projects studying indigenous groups without any meaningful consultation with the groups. (See Biopiracy.)
The main criticism of ELSI is the failure to address the conditions raised by population-based research, especially with regard to unique processes for group decision-making and cultural worldviews. Genetic variation research such as HGP is group population research, but most ethical guidelines, according to Harry, focus on individual rights instead of group rights. She says the research represents a clash of cultures: indigenous peoples' lives revolve around collectivity and group decision-making, whereas Western culture promotes individuality. Harry suggests that one of the challenges of ethical research is to include respect for collective review and decision-making, while also upholding the Western model of individual rights.
See also
- Chimpanzee Genome Project
- Craig Venter's Genome
- EuroPhysiome
- Gene patent
- Genome project
- Human Connectome Project
- Human Cytome Project
- Human genome
- Human microbiome project
- Human Variome Project
- HUGO Gene Nomenclature Committee
- International HapMap Project
- National Human Genome Research Institute
- Neanderthal Genome Project
- Personal Genome Project
- Sanger Institute
- The 1000 Genomes Project
- Genographic Project
Further reading
- Victor K. McElheny. Drawing the Map of Life: Inside the Human Genome Project (Basic Books; 2010) 361 pages. Examines the intellectual origins, history, and motivations of the project to map the human genome; draws on interviews with key figures.
External links
- Human Genome Project official information page
- Delaware Valley Personalized Medicine Project Uses data from the Human Genome Project to help make medicine personal
- National Human Genome Research Institute (NHGRI). NHGRI led the National Institutes of Health's (NIH's) contribution to the International Human Genome Project. This project, which had as its primary goal the sequencing of the three thousand million base pairs that make up the human genome, was successfully completed in April 2003.
- Human Genome News. Published from 1989 to 2002 by the US Department of Energy, this newsletter was a major communications method for coordination of the Human Genome Project. Complete online archives are available.
- Project Gutenberg hosts e-texts for Human Genome Project, titled Human Genome Project, Chromosome Number # (# denotes 01-22, X and Y). This information is raw sequence, released in November 2002; access to entry pages with download links is available through http://www.gutenberg.org/etext/3501 for Chromosome 1 sequentially to http://www.gutenberg.org/etext/3524 for the Y Chromosome. Note that this sequence might not be considered definitive due to ongoing revisions and refinements. In addition to the chromosome files, there is a supplementary information file dated March 2004 which contains additional sequence information.
- The HGP information pages. Department of Energy's portal to the international Human Genome Project, Microbial Genome Program, and Genomics:GTL systems biology for energy and environment.
- yourgenome.org: The Sanger Institute public information pages have general and detailed primers on DNA, genes and genomes, the Human Genome Project and science spotlights.
- Ensembl project, an automated annotation system and browser for the human genome
- UCSC genome browser. This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides a portal to the ENCODE project.
- Nature magazine's human genome gateway, including the HGP's paper on the draft genome sequence
- Wellcome Trust Human Genome website A free resource allowing you to explore the human genome, your health and your future.
- Learning about the Human Genome. Part 1: Challenge to Science Educators. ERIC Digest.
- Learning about the Human Genome. Part 2: Resources for Science Educators. ERIC Digest.
- Patenting Life by Merrill Goozner
- Prepared Statement of Craig Venter of Celera Venter discusses Celera's progress in deciphering the human genome sequence and its relationship to healthcare and to the federally funded Human Genome Project.
- Cracking the Code of Life. Companion website to the 2-hour NOVA program documenting the race to decode the genome, including the entire program hosted in 16 parts in either QuickTime or RealPlayer format.