UCSC Genome Browser
The University of California, Santa Cruz (UCSC) Genome Browser is an up-to-date source for genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.
History
Initially built in 2000 and still managed by Jim Kent, then a graduate student, and David Haussler, professor of Computer Science (now Biomolecular Engineering) at the University of California, Santa Cruz, the UCSC Genome Browser began as a resource for distributing the initial fruits of the Human Genome Project. Funded by the Howard Hughes Medical Institute and the National Human Genome Research Institute (NHGRI, one of the US National Institutes of Health), the browser offered a graphical display of the first full-chromosome draft assembly of the human genome sequence. Today the browser is used by geneticists, molecular biologists, and physicians, as well as by students and teachers of evolution, to access genomic information.
Genomes
In the years since its inception, the UCSC Browser has expanded to accommodate the genome sequences of all vertebrate species, and of selected invertebrates, for which high-coverage genomic sequence is available, now including 46 species. High coverage is needed so that overlaps between sequence reads can guide the construction of larger contiguous regions. Genomic sequences with lower coverage are included in the multiple-alignment tracks of some browsers (more on multiple-alignment tracks below), but the fragmented nature of these assemblies makes them unsuitable for building full-featured browsers. The species hosted with full-featured genome browsers are shown in the table.
| primates | non-primate mammals | non-mammal chordates | invertebrates |
|---|---|---|---|
| human | mouse | chicken | lancelet |
| chimpanzee | rat | zebra finch | sea squirt |
| orangutan | guinea pig | lizard | sea urchin |
| rhesus macaque | rabbit | frog (Xenopus tropicalis) | 11 Drosophila flies |
| marmoset | cat | zebrafish | mosquito |
| | dog | Tetraodon (pufferfish) | honey bee |
| | panda | Fugu (pufferfish) | C. elegans + 5 other worms |
| | horse | stickleback | Aplysia (sea hare) |
| | pig | medaka | yeast |
| | cow | lamprey | |
| | elephant | | |
| | opossum | | |
| | platypus | | |
Browser Functionality
The large amount of data about biological systems accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic display paradigm is to show the genome sequence along the horizontal axis, with graphical representations of the locations of mRNAs, gene predictions, and other features drawn against it. Blocks of color along the coordinate axis show where the various data types align. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data.
To find a specific gene or genomic region, the user may type in a gene name (e.g., BRCA1), an accession number for an RNA, the name of a cytological band (e.g., 20p13 for band 13 on the short arm of chr20), or a chromosomal position (e.g., chr17:38,450,000-38,531,000 for the region around the BRCA1 gene).
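Of the query types above, only the chromosomal position follows a fixed textual pattern; gene names, accessions, and band names are resolved by the Browser itself. As an illustration only, the following Python sketch splits a position string into its parts. The function `parse_position` and the `Position` type are hypothetical helpers, not part of the Browser, and the pattern assumes chromosome names that begin with `chr`.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    """A chromosomal position as typed into the position box."""
    chrom: str
    start: int
    end: int


# Assumed pattern: "chr" + name, colon, start, dash, end; commas are optional.
_POSITION_RE = re.compile(r"^(chr[\w.]+):(\d[\d,]*)-(\d[\d,]*)$")


def parse_position(text: str) -> Position:
    """Split e.g. 'chr17:38,450,000-38,531,000' into chromosome, start, end."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosomal position: {text!r}")
    chrom, start, end = match.groups()
    start_bp = int(start.replace(",", ""))
    end_bp = int(end.replace(",", ""))
    if start_bp > end_bp:
        raise ValueError(f"start comes after end: {text!r}")
    return Position(chrom, start_bp, end_bp)


if __name__ == "__main__":
    pos = parse_position("chr17:38,450,000-38,531,000")
    print(pos)  # Position(chrom='chr17', start=38450000, end=38531000)
```

Anything that does not match the pattern, such as BRCA1 or 20p13, raises ValueError, leaving those queries to the Browser's own search.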
Because the data are presented graphically, the browser can link each annotation to detailed information about it. For example, the gene details page of the UCSC Genes track provides many links to more specific information about the gene in other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt.
Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. Because the 55 million RNAs in GenBank are pre-aligned to each of the 81 genome assemblies (many of the 46 species have more than one assembly), the browser gives instant access to the alignment of any RNA to any of the hosted species. The juxtaposition of many types of data allows researchers to display exactly the combination of data that answers a specific question. A PDF/PostScript output function exports camera-ready images for publication in academic journals.
One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is the continuously variable nature of the display. Sequence of any size can be displayed with full annotation tracks, from a single DNA base up to an entire chromosome (human chr1 = 245 million bases, Mb). Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or hundreds of genes and any combination of the many annotations. A convenient drag-and-zoom feature lets the user select any region in the genome image and expand it to fill the full screen.
Researchers may also use the browser to display their own data via the Custom Tracks tool, which lets users upload a file of their own data and view it in the context of the reference genome assembly. Users may also work with the data hosted by UCSC: they can create subsets of their choosing with the Table Browser tool (for example, only the SNPs that change the amino acid sequence of a protein) and display that subset in the browser as a Custom Track. Any browser view created by a user, including one containing Custom Tracks, may be shared with other users via the Saved Sessions tool.
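The upload format is not described here, but Custom Tracks are commonly supplied as BED text headed by a track line. The Python sketch below assumes that format and shows the step that most often goes wrong: BED uses 0-based, half-open coordinates, whereas positions typed into the Browser are 1-based and inclusive, so each start must be reduced by one. The function `to_custom_track`, the track name, and the region label are hypothetical.

```python
from typing import Iterable, Tuple

# (chrom, first_base, last_base, label), using 1-based, inclusive coordinates
# as typed into the Browser's position box.
Region = Tuple[str, int, int, str]


def to_custom_track(regions: Iterable[Region], name: str, description: str) -> str:
    """Return the text of a minimal BED custom track: one track line plus BED4 rows."""
    lines = [f'track name="{name}" description="{description}" visibility=pack']
    for chrom, first, last, label in regions:
        if first < 1 or last < first:
            raise ValueError(f"invalid region {chrom}:{first}-{last}")
        # BED start is 0-based and BED end is exclusive, so only the start shifts.
        lines.append(f"{chrom}\t{first - 1}\t{last}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    track = to_custom_track(
        [("chr17", 38_450_000, 38_531_000, "regionA")],
        name="myRegions",
        description="Hypothetical regions of interest",
    )
    print(track, end="")
```

The visibility=pack setting in the track line uses one of the Browser's standard display modes (hide, dense, squish, pack, full); the resulting text could then be saved to a file and uploaded through the Custom Tracks tool.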
Variation data
Many types of variation data are also displayed. For example, the entire contents of each release of the dbSNP database from NCBI are mapped to human, mouse, and other genomes. This includes data from the 1000 Genomes Project as soon as they are released in dbSNP. Other types of variation data include copy-number variation (CNV) data and human population allele frequencies from the HapMap project.
The Genome Browser offers a unique set of comparative-genomic data for most of the species hosted on the site. The comparative alignments give a graphical view of the evolutionary relationships among species. This makes the browser useful both to researchers, who can visualize regions of conservation among a group of species and predict functional elements in uncharacterized DNA regions, and in the classroom, as a tool to illustrate one of the most compelling arguments for the evolution of species. The 44-way comparative track on the human assembly clearly shows that the farther back one goes in evolutionary time, the less sequence similarity remains, but functionally important regions of the genome (e.g., exons and control elements, but typically not introns) remain conserved much farther back in evolutionary time.
Analysis tools
More than simply a genome browser, the UCSC site hosts a set of genome analysis tools, including a full-featured graphical interface for mining the information in the browser database (the Table Browser) and a fast sequence alignment tool (BLAT) that is also useful for simply finding sequences in the massive sequence (human genome = 2.8 billion bases, Gb) of any of the featured genomes. A liftOver tool uses whole-genome alignments to convert genomic coordinates from one assembly to another or between species. The Genome Graphs tool lets users view all chromosomes at once and display the results of genome-wide association studies (GWAS). The Gene Sorter displays genes grouped by properties not linked to genome location, such as expression pattern in tissues.
Creating spreadsheet links to UCSC Genome Browser views
Many users of the Genome Browser gather their own data in Excel spreadsheets and would like to create links to the Browser from the data in the spreadsheet. For example, a clinical geneticist may have lists of regions that are duplicated or deleted in a patient, as determined by comparative genomic hybridization (CGH). These regions can serve as the source for browser views, giving access to each region with a single click. Click to download the spreadsheet:
ucscLinks.xls
Careful use of Excel's "copy" and "move" functions should allow the links on this sheet to be used without modification.
Customizing the links
The contents of the last cell in the image above (cell G22 in the actual spreadsheet) are as follows:
=HYPERLINK("http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position="&E22&"&dgv=pack&knownGene=pack&omimGene=pack","ucsc")
This example shows how to create a link that turns on specific tracks of interest. In this case, three tracks are explicitly turned on:
Database of Genomic Variants (table: dgv)
UCSC Genes (table: knownGene)
OMIM Genes (table: omimGene)
Each track is set to "pack" in the link as follows:
dgv=pack
knownGene=pack
omimGene=pack
Any track that has been open in a session will remain in the view when the new browser window opens.
A new track can be added using the tableName and a visibility of choice:
&snp131=dense
To add any other track, append tableName=visibility to the end of the URL, joined to it by an ampersand (&). The simplest way to learn the name of the table underlying a track is to mouse over the track in a Genome Browser image and read the URL shown at the bottom of the browser window. The table name appears in the URL as
g=tableName
Visibility options include:
hide
dense
squish
pack
full
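Links like the one produced by the HYPERLINK formula above can also be generated outside Excel. The Python sketch below assembles the same kind of hgTracks URL from a database name, a position, and tableName=visibility pairs, rejecting any visibility not in the list above, and also extracts the g=tableName value from a mouseover URL. The function names are hypothetical, and the sketch assumes the query parameters behave exactly as described in this section.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"
# The five visibility options listed above.
VISIBILITIES = {"hide", "dense", "squish", "pack", "full"}


def browser_link(db: str, position: str, tracks: Mapping[str, str]) -> str:
    """Build an hgTracks URL that opens `position` with each track at its visibility."""
    for table, vis in tracks.items():
        if vis not in VISIBILITIES:
            raise ValueError(f"unknown visibility {vis!r} for table {table!r}")
    params = {"db": db, "position": position, **tracks}
    # Leave ':' and ',' unescaped so the URL reads like the spreadsheet version.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


def table_from_url(url: str) -> Optional[str]:
    """Return the table name carried in a g= parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("g")
    return values[0] if values else None


if __name__ == "__main__":
    print(browser_link(
        "hg19",
        "chr17:38,450,000-38,531,000",
        {"dgv": "pack", "knownGene": "pack", "omimGene": "pack", "snp131": "dense"},
    ))
```

Comparing the printed URL with the one built by the formula in cell G22 is a quick check that the two approaches agree.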
Open Source / Mirrors
The UCSC Browser code base is open source for non-commercial use and is mirrored locally by many research groups, allowing private display of their own data in the context of the public data. The UCSC Browser is also mirrored at several locations worldwide, as shown in the table.
mirror sites |
---|
Medical College of Wisconsin |
Cornell University, NY |
Duke University, NC |
University of Copenhagen, Denmark |
Queensland Facility for Advanced Bioinformatics (QFAB), Australia |
The Browser code is also used in separate installations by the UCSC Malaria Genome Browser and the Archaea Browser.