Phylo (video game)
Encyclopedia
Phylo is an experimental video game about multiple sequence alignment
optimization. Developed by the McGill
Centre for Bioinformatics, it was originally released as a free Flash
game in November 2010. Designed as a game with a purpose, players solve pattern-matching puzzles that represent nucleotide sequences of different phylogenetic taxa to optimize alignments over a computer algorithm. By aligning together each nucleotide sequence, represented as differently coloured blocks, players attempt to create the highest point value score for each set of sequences by matching as many colours as possible and minimizing gaps.
The nucleotide sequences generated by Phylo are obtained from actual sequence data from the UCSC Genome Browser
. High-scoring player alignments are collected as data and sent back to the McGill Centre for Bioinformatics to be further evaluated with a stronger scoring algorithm. Those player alignments that score higher than the current computer-generated score will be re-introduced into the global alignment as an optimization.
is to determine the most likely nucleotide sequence of each species by comparing the sequences of children species with those of a most recent common ancestor
. Producing such an optimal multiple sequence alignment is usually determined with a dynamic programming algorithm that finds the most probable evolutionary outcome by minimizing the number of mutation
s required. These algorithms generate phylogenetic trees for each nucleotide in a sequence for each species, and determine the genetic sequence for a common ancestor by comparing the trees of the child species. The algorithms then score and sort the completed phylogenetic tree, and the alignment with the maximum parsimony
score is determined to be the optimal, and thus most evolutionarily likely, multiple sequence alignment. However, finding such an optimal alignment for a large number of sequences has been determined to be an NP-complete
problem.
Phylo uses human-based computation
to create an interactive genetic algorithm to solve the multiple sequence alignment problem instead. Generation of the ancestral sequences and parsimony scoring is still computed using a variation of the Fitch algorithm, but Phylo abstracts the genetic sequences obtained from the UCSC Genome Browser into a pattern-matching game, allowing human players to suggest the most likely alignment rather than algorithmically considering all possible trees.
Scoring of the sequence alignment is done by comparing each of the player-aligned sequences with an algorithm-determined ancestral sequence generated at each node. A colour match yields +1 to the score, a mismatch yields -1, an opening of a gap yields -5, and an extension of any existing gap yields -1. The sum of all comparisons is then determined every several seconds, which provides the final score for that player's alignment. For each puzzle, only a few sequences are initially available at the beginning of the game. A computer-determined par score must be beaten by the player before moving on to the next round and unlocking more sequences to match. A player wins and is allowed to submit their sequence alignment to the database by matching or surpassing the final par score generated by the computer for each puzzle.
Multiple sequence alignment
A multiple sequence alignment is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a lineage and are descended from a common ancestor...
optimization. Developed by the McGill
McGill University
Mohammed Fathy is a public research university located in Montreal, Quebec, Canada. The university bears the name of James McGill, a prominent Montreal merchant from Glasgow, Scotland, whose bequest formed the beginning of the university...
Centre for Bioinformatics, it was originally released as a free Flash
Adobe Flash
Adobe Flash is a multimedia platform used to add animation, video, and interactivity to web pages. Flash is frequently used for advertisements, games and flash animations for broadcast...
game in November 2010. Designed as a game with a purpose, players solve pattern-matching puzzles that represent nucleotide sequences of different phylogenetic taxa to optimize alignments over a computer algorithm. By aligning together each nucleotide sequence, represented as differently coloured blocks, players attempt to create the highest point value score for each set of sequences by matching as many colours as possible and minimizing gaps.
The nucleotide sequences generated by Phylo are obtained from actual sequence data from the UCSC Genome Browser
UCSC Genome Browser
The University of California, Santa Cruz is an up-to-date source for genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations...
. High-scoring player alignments are collected as data and sent back to the McGill Centre for Bioinformatics to be further evaluated with a stronger scoring algorithm. Those player alignments that score higher than the current computer-generated score will be re-introduced into the global alignment as an optimization.
Background
The goal of multiple sequence alignments in phylogeneticsPhylogenetics
In biology, phylogenetics is the study of evolutionary relatedness among groups of organisms , which is discovered through molecular sequencing data and morphological data matrices...
is to determine the most likely nucleotide sequence of each species by comparing the sequences of children species with those of a most recent common ancestor
Most recent common ancestor
In genetics, the most recent common ancestor of any set of organisms is the most recent individual from which all organisms in the group are directly descended...
. Producing such an optimal multiple sequence alignment is usually determined with a dynamic programming algorithm that finds the most probable evolutionary outcome by minimizing the number of mutation
Mutation
In molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus. They can be defined as sudden and spontaneous changes in the cell. Mutations are caused by radiation, viruses, transposons and mutagenic...
s required. These algorithms generate phylogenetic trees for each nucleotide in a sequence for each species, and determine the genetic sequence for a common ancestor by comparing the trees of the child species. The algorithms then score and sort the completed phylogenetic tree, and the alignment with the maximum parsimony
Maximum parsimony
Parsimony is a non-parametric statistical method commonly used in computational phylogenetics for estimating phylogenies. Under parsimony, the preferred phylogenetic tree is the tree that requires the least evolutionary change to explain some observed data....
score is determined to be the optimal, and thus most evolutionarily likely, multiple sequence alignment. However, finding such an optimal alignment for a large number of sequences has been determined to be an NP-complete
NP-complete
In computational complexity theory, the complexity class NP-complete is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard...
problem.
Phylo uses human-based computation
Human-based computation
Human-based computation is a computer science technique in which a computational process performs its function by outsourcing certain steps to humans...
to create an interactive genetic algorithm to solve the multiple sequence alignment problem instead. Generation of the ancestral sequences and parsimony scoring is still computed using a variation of the Fitch algorithm, but Phylo abstracts the genetic sequences obtained from the UCSC Genome Browser into a pattern-matching game, allowing human players to suggest the most likely alignment rather than algorithmically considering all possible trees.
Gameplay
Each puzzle in Phylo is categorized based on the number of total sequence fragments to be aligned and a disease that is associated with that fragment in humans. Once a puzzle is chosen, a few of the genetic sequence fragments for each species to be aligned, represented as coloured blocks, are each placed on a single row of a grid. Each nucleotide of a genetic sequence fragment is free to move along the grid. Players can then adjust the sequences as necessary in order to create the largest number of colour matches in each column between them, while minimizing the number of the gaps that appear.Scoring of the sequence alignment is done by comparing each of the player-aligned sequences with an algorithm-determined ancestral sequence generated at each node. A colour match yields +1 to the score, a mismatch yields -1, an opening of a gap yields -5, and an extension of any existing gap yields -1. The sum of all comparisons is then determined every several seconds, which provides the final score for that player's alignment. For each puzzle, only a few sequences are initially available at the beginning of the game. A computer-determined par score must be beaten by the player before moving on to the next round and unlocking more sequences to match. A player wins and is allowed to submit their sequence alignment to the database by matching or surpassing the final par score generated by the computer for each puzzle.
See also
- Citizen scienceCitizen scienceCitizen science is a term used for the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis...
- CrowdsourcingCrowdsourcingCrowdsourcing is the act of sourcing tasks traditionally performed by specific individuals to a group of people or community through an open call....
- Human-based computationHuman-based computationHuman-based computation is a computer science technique in which a computational process performs its function by outsourcing certain steps to humans...
- Computational phylogeneticsComputational phylogeneticsComputational phylogenetics is the application of computational algorithms, methods and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa...
- List of crowdsourcing projects