Rosetta@home
Rosetta@home is a distributed computing project for protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker laboratory at the University of Washington. Rosetta@home aims to predict protein–protein docking and to design new proteins with the help of about sixty thousand active volunteered computers, which together processed an average of 62 teraFLOPS as of October 18, 2011. Foldit, a Rosetta@home video game, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented towards basic research on improving the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease and other pathologies.
Like all BOINC projects, Rosetta@home uses idle computer processing resources from volunteers' computers to perform calculations on individual workunits. Completed results are sent to a central project server where they are validated and assimilated into project databases. The project is cross-platform and runs on a wide variety of hardware configurations. Users can view the progress of their individual protein structure prediction on the Rosetta@home screensaver.
In addition to disease-related research, the Rosetta@home network serves as a testing framework for new methods in structural bioinformatics. These new methods are then used in other Rosetta-based applications, like RosettaDock and the Human Proteome Folding Project, after being sufficiently developed and proven stable on Rosetta@home's large and diverse collection of volunteer computers. Two particularly important tests for the new methods developed in Rosetta@home are the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Critical Assessment of Prediction of Interactions (CAPRI) experiments, biennial experiments which evaluate the state of the art in protein structure prediction and protein–protein docking prediction, respectively. Rosetta@home consistently ranks among the foremost docking predictors and is one of the best tertiary structure predictors available.
Computing platform
Both the Rosetta@home application and the BOINC distributed computing platform are available for the Microsoft Windows, Linux and Macintosh platforms (BOINC also runs on several other platforms, e.g. FreeBSD). Participation in Rosetta@home requires a central processing unit (CPU) with a clock speed of at least 500 MHz, 200 megabytes of free disk space, 512 megabytes of physical memory, and Internet connectivity. As of May 4, 2010, the current version of the Rosetta application was 5.98, and the current version of the Rosetta Mini application was 2.14. The recommended BOINC program version at that time was 6.2.19. Standard HTTP (port 80) is used for communication between the user's BOINC client and the Rosetta@home servers at the University of Washington; HTTPS (port 443) is used during password exchange. Remote and local control of the BOINC client use ports 31416 and 1043, which may need to be unblocked if the client is behind a firewall.
Workunits containing data on individual proteins are distributed from servers located in the Baker lab at the University of Washington to volunteers' computers, which then calculate a structure prediction for the assigned protein. To avoid duplicate structure predictions on a given protein, each workunit is initialized with a random number seed. This gives each prediction a unique trajectory of descent along the protein's energy landscape. Protein structure predictions from Rosetta@home are approximations of a global minimum in a given protein's energy landscape. That global minimum represents the most energetically favorable conformation of the protein, i.e. its native state.
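To illustrate why per-workunit seeding matters, the following minimal Python sketch runs independent Metropolis Monte Carlo searches on a toy one-dimensional energy landscape. The landscape, step size, temperature and function names are hypothetical illustrations, not Rosetta's energy function or sampling protocol; the point is only that each seed produces a different but reproducible trajectory, and that the lowest-energy result across many trajectories approximates the global minimum.

```python
import math
import random


def energy(x: float) -> float:
    """Toy rugged energy landscape (hypothetical, not a Rosetta score):
    a broad well centered at x = 2 with ripples that create local minima."""
    return 0.1 * (x - 2.0) ** 2 + math.sin(3.0 * x)


def run_trajectory(seed: int, steps: int = 5000,
                   temperature: float = 0.5) -> tuple[float, float]:
    """Metropolis Monte Carlo search driven by one workunit's seed.

    Returns the lowest-energy position visited and its energy (a 'decoy')."""
    rng = random.Random(seed)        # independent generator per workunit
    x = rng.uniform(-10.0, 10.0)     # seed-dependent starting conformation
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.gauss(0.0, 0.5)  # random perturbation
        trial_e = energy(trial)
        # Metropolis criterion: accept downhill moves, sometimes uphill ones.
        if trial_e <= e or rng.random() < math.exp((e - trial_e) / temperature):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    decoys = [run_trajectory(seed) for seed in range(20)]
    best = min(decoys, key=lambda d: d[1])
    print("lowest-energy decoy over 20 seeds:", best)
```

Running the same seed twice reproduces the same decoy, while different seeds explore different regions of the landscape.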
A primary feature of the Rosetta@home graphical user interface (GUI) is a screensaver which shows a current workunit's progress during the simulated protein folding process. In the upper-left of the current screensaver, the target protein is shown adopting different shapes (conformations) in its search for the lowest energy structure. Depicted immediately to the right is the structure of the most recently accepted model. On the upper right, the lowest energy conformation of the current decoy is shown; below that is the true, or native, structure of the protein if it has already been determined. Three graphs are included in the screensaver. Near the middle, a graph of the accepted model's free energy is displayed, which fluctuates as the accepted model changes. A graph of the accepted model's root mean square deviation (RMSD), which measures how structurally similar the accepted model is to the native model, is shown at the far right. To the right of the accepted energy graph and below the RMSD graph, the results from these two functions are combined into an energy vs. RMSD plot as the model is progressively refined.
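As a rough illustration of the RMSD metric plotted by the screensaver, the Python sketch below computes the RMSD between a model and a native structure after optimal superposition with the Kabsch algorithm. It assumes both structures are NumPy arrays of corresponding atom coordinates (for example, Cα atoms) in the same order; the function name is hypothetical, and Rosetta's own RMSD routines and atom selections may differ.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, native: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    The model is translated and rotated onto the native structure with the
    Kabsch algorithm before the deviation is measured."""
    model = np.asarray(model, dtype=float)
    native = np.asarray(native, dtype=float)
    if model.shape != native.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of corresponding atoms")
    # Remove translation by centering both structures on their centroids.
    p = model - model.mean(axis=0)
    q = native - native.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so we get a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Each point on an energy vs. RMSD plot would then pair a model's energy with `kabsch_rmsd(model, native)`; a rigidly moved copy of the native structure scores an RMSD of essentially zero.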
Like all BOINC projects, Rosetta@home runs in the background of the user's computer using idle computer power, either at or before logging in to an account on the host operating system. Rosetta@home frees resources from the CPU as they are required by other applications so that normal computer usage is unaffected. To minimize power consumption or heat production from a computer running at sustained capacity, the maximum percentage of CPU resources that Rosetta@home is allowed to use can be specified through a user's account preferences. The times of day during which Rosetta@home is allowed to do work can also be adjusted, along with many other preferences, through a user's account settings.
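As a minimal sketch of how the two preferences described above could be applied, the Python below checks whether the current time falls inside an allowed daily window (including windows that wrap past midnight) and converts a maximum CPU percentage into run and pause intervals. The function names and the alternating run/pause model are illustrative assumptions, not BOINC's actual scheduler code.

```python
from datetime import time


def in_allowed_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in the daily work window [start, end).

    Windows may wrap past midnight (e.g. 22:00-06:00); start == end is
    treated as 'no restriction'."""
    if start == end:
        return True
    if start < end:
        return start <= now < end
    return now >= start or now < end  # window wraps around midnight


def duty_cycle(max_cpu_percent: float, period_s: float = 1.0) -> tuple[float, float]:
    """Split a throttling period into (run, pause) seconds so the computation
    uses at most `max_cpu_percent` of wall-clock time on average."""
    if not 0 < max_cpu_percent <= 100:
        raise ValueError("max_cpu_percent must be in (0, 100]")
    run = period_s * max_cpu_percent / 100.0
    return run, period_s - run
```

For example, a 50% limit over a 2-second period would alternate one second of computation with one second of pause.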
Rosetta, the software that runs on the Rosetta@home network, was rewritten in C++ to allow easier development than its original version, which was written in Fortran. This new version is object-oriented and was released on February 8, 2008. Development of the Rosetta code is done by Rosetta Commons. The software is freely licensed to the academic community and available to pharmaceutical companies for a fee.
, of many proteins that carry out functions within the cell. To better understand a protein's function and aid in rational drug design, scientists need to know the protein's three-dimensional tertiary structure.
Protein 3D structures are currently determined experimentally through X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The process is slow (it can take weeks or even months to figure out how to crystallize a protein for the first time) and costly (around US$100,000 per protein). The rate at which new sequences are discovered far exceeds the rate of structure determination: out of more than 7,400,000 protein sequences available in the NCBI non-redundant (nr) protein database, fewer than 52,000 proteins' 3D structures have been solved and deposited in the Protein Data Bank, the main repository for structural information on proteins. One of the main goals of Rosetta@home is to predict protein structures with the same accuracy as existing methods, but in a way that requires significantly less time and money. Rosetta@home also develops methods to determine the structure and docking of membrane proteins (e.g., GPCRs), which are exceptionally difficult to analyze with traditional techniques like X-ray crystallography and NMR spectroscopy, yet represent the majority of targets for modern drugs.
Progress in protein structure prediction is evaluated in the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, in which researchers from around the world attempt to derive a protein's structure from its amino acid sequence. High-scoring groups in this sometimes competitive experiment are considered the de facto standard-bearers of the state of the art in protein structure prediction. Rosetta, the program on which Rosetta@home is based, has been used since CASP5 in 2002. In the 2004 CASP6 experiment, Rosetta made history by being the first to produce a close to atomic-level resolution, ab initio protein structure prediction in its submitted model for CASP target T0281. Ab initio modeling is considered an especially difficult category of protein structure prediction, as it does not use information from structural homology and must rely on information from sequence homology and modeling physical interactions within the protein. Rosetta@home has been used in CASP since 2006; in CASP7 it was among the top predictors in every category of structure prediction. These high-quality predictions were enabled by the computing power made available by Rosetta@home volunteers. Increasing computational power allows Rosetta@home to sample more regions of conformation space (the possible shapes a protein can assume), whose size, according to Levinthal's paradox, is predicted to increase exponentially with protein length.
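The exponential growth of conformation space can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes, purely for illustration, that each residue can adopt one of three backbone states and that conformations could be visited at 10^13 per second; these are simplifications often used to present Levinthal's argument, not parameters used by Rosetta, and the function names are hypothetical.

```python
import math


def conformation_count(length: int, states_per_residue: int = 3) -> int:
    """Conformations available if every residue independently adopts one of
    `states_per_residue` backbone states (a crude Levinthal-style model)."""
    return states_per_residue ** length


def log10_enumeration_seconds(length: int, states_per_residue: int = 3,
                              samples_per_second: float = 1e13) -> float:
    """log10 of the time needed to visit every conformation once at a fixed
    sampling rate; working in log space keeps the numbers manageable."""
    return length * math.log10(states_per_residue) - math.log10(samples_per_second)


if __name__ == "__main__":
    for n in (50, 100, 150):
        print(f"{n} residues: ~10^{math.log10(conformation_count(n)):.1f} conformations, "
              f"~10^{log10_enumeration_seconds(n):.1f} s to enumerate")
```

Each additional residue multiplies the count by the number of states per residue, which is why exhaustive enumeration is hopeless and why sampling more trajectories with more computers pays off.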
Rosetta@home is also used in protein docking prediction, which determines the structure of multiple complexed proteins, or quaternary structure. This type of protein interaction affects many cellular functions, including antigen–antibody and enzyme–inhibitor binding and cellular import and export. Determining these interactions is critical for drug design. Rosetta is used in the Critical Assessment of Prediction of Interactions (CAPRI) experiment, which evaluates the state of the protein docking field much as CASP gauges progress in protein structure prediction. The computing power made available by Rosetta@home's project volunteers has been cited as a major factor in Rosetta's performance in CAPRI, where its docking predictions have been among the most accurate and complete.
In early 2008, Rosetta was used to computationally design a protein with a function never before observed in nature. This was inspired in part by the retraction of a high-profile paper from 2004 which originally described the computational design of a protein with improved enzymatic activity compared to its natural form. The 2008 research paper from David Baker's group describing how the protein was made, which cited Rosetta@home for the computational resources it made available, represented an important proof of concept for this protein design method. This type of protein design could have future applications in drug discovery, green chemistry, and bioremediation.
By taking hexapeptides (six-amino-acid-long fragments) of a protein of interest and selecting the lowest energy match to a structure similar to that of a known fibril-forming hexapeptide, RosettaDesign was able to identify peptides twice as likely to form fibrils as random proteins. Rosetta@home was used in the same study to predict structures for amyloid beta, a fibril-forming protein that has been postulated to cause Alzheimer's disease. Preliminary but as yet unpublished results have been produced on Rosetta-designed proteins that may prevent fibrils from forming, although it is unknown whether they can prevent the disease.
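The windowed hexapeptide scan described above can be sketched in Python. The scoring function here is a deliberately crude, hypothetical stand-in based on per-residue penalties; the study described used RosettaDesign to evaluate each hexapeptide against a fibril-like structural template, which this toy does not attempt to reproduce. The function names, penalty table and threshold are all assumptions for illustration.

```python
from typing import Callable


def hexapeptide_windows(sequence: str, width: int = 6) -> list[tuple[int, str]]:
    """All overlapping windows of `width` residues, with 0-based start index."""
    return [(i, sequence[i:i + width]) for i in range(len(sequence) - width + 1)]


def flag_fibril_prone(sequence: str, score: Callable[[str], float],
                      threshold: float) -> list[tuple[int, str, float]]:
    """Score every hexapeptide (lower = better fit to the fibril-like
    template) and return those below `threshold`, lowest energy first."""
    hits = [(i, hexa, score(hexa)) for i, hexa in hexapeptide_windows(sequence)]
    return sorted((h for h in hits if h[2] < threshold), key=lambda h: h[2])


# Hypothetical toy energy: rewards hydrophobic, beta-prone residues. A real
# study would thread each hexapeptide onto a template and use a full energy.
TOY_PENALTY = {aa: 1.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
TOY_PENALTY.update({"V": -1.0, "I": -1.0, "F": -0.8, "Y": -0.6, "L": -0.5})


def toy_score(hexa: str) -> float:
    return sum(TOY_PENALTY.get(aa, 1.0) for aa in hexa)
```

With this toy score, `flag_fibril_prone("GGGVIVIFYGGG", toy_score, 0.0)` ranks the window `VIVIFY` first, because it packs the most favorable residues into six positions.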
The computational model accurately predicted docking between LF and PA, helping to establish which domains of the respective proteins are involved in the LF–PA complex. This insight was eventually used in research resulting in improved anthrax vaccines.
(immunoglobulin G) and a surface protein expressed by herpes simplex virus 1 (HSV-1) which serves to degrade the antiviral antibody. The protein complex predicted by RosettaDock closely agreed with the particularly difficult-to-obtain experimental models, leading researchers to conclude that the docking method has potential in addressing some of the problems that X-ray crystallography has with modeling protein–protein interfaces.
(HIV).
initiative, Rosetta has been used to computationally design novel homing endonuclease proteins, which could eradicate Anopheles gambiae or otherwise render the mosquito unable to transmit malaria. The ability to model and specifically alter protein–DNA interactions, like those of homing endonucleases, gives computational protein design methods like Rosetta an important role in gene therapy (which includes possible cancer treatments).
, as it attempts to decipher the structural "meaning" of proteins' amino acid sequences. More than seven years after Rosetta's first appearance, the Rosetta@home project was released (i.e. announced as no longer beta) on October 6, 2005. Many of the graduate students and other researchers involved in Rosetta's initial development have since moved to other universities and research institutions, and subsequently enhanced different parts of the Rosetta project.
In 2002 RosettaDesign was used to design Top7, a 93-residue α/β protein with an overall fold never before recorded in nature. This new conformation was predicted by Rosetta to within 1.2 Å RMSD of the structure determined by X-ray crystallography, representing an unusually accurate structure prediction. Rosetta and RosettaDesign earned widespread recognition by being the first to design and accurately predict the structure of a novel protein of such length, as reflected by the 2002 paper describing the dual approach, which prompted two positive letters in the journal Science and has been cited by more than 240 other scientific articles. The visible product of that research, Top7, was featured as the Protein Data Bank's 'Molecule of the Month' in October 2006; a superposition of the respective cores (residues 60–79) of its predicted and X-ray crystal structures is featured in the Rosetta@home logo.
Brian Kuhlman, a former postdoctoral associate in David Baker's lab and now an associate professor at the University of North Carolina, Chapel Hill, offers RosettaDesign as an online service.
experiment in 2002 as the Baker laboratory's algorithm for protein–protein docking prediction. In that experiment, RosettaDock made a high-accuracy prediction for the docking between streptococcal pyogenic exotoxin A and a T cell-receptor β-chain, and a medium-accuracy prediction for a complex between porcine α-amylase and a camelid antibody. Although the RosettaDock method made only two acceptably accurate predictions out of seven possible, this was enough to rank it seventh out of nineteen prediction methods in the first CAPRI assessment.
Development of RosettaDock diverged into two branches for subsequent CAPRI rounds as Jeffrey Gray, who laid the groundwork for RosettaDock while at the University of Washington, continued working on the method in his new position at Johns Hopkins University. Members of the Baker laboratory further developed RosettaDock in Gray's absence. The two versions differed slightly in side-chain modeling, decoy selection and other areas. Despite these differences, both the Baker and Gray methods performed well in the second CAPRI assessment, placing fifth and seventh respectively out of 30 predictor groups. Jeffrey Gray's RosettaDock server is available as a free docking prediction service for non-commercial use.
In October 2006, RosettaDock was integrated into Rosetta@home. The method used a fast, crude docking phase that modeled only the protein backbone. This was followed by a slow full-atom refinement phase in which the orientation of the two interacting proteins relative to each other, and side-chain interactions at the protein–protein interface, were simultaneously optimized to find the lowest energy conformation. The vastly increased computational power afforded by the Rosetta@home network, in combination with revised "fold-tree" representations for backbone flexibility and loop modeling, helped RosettaDock place sixth out of 63 prediction groups in the third CAPRI assessment.
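The two-phase strategy described above (cheap global sampling, then expensive local refinement of the most promising poses) can be sketched in Python. This is a toy under stated assumptions: a pose is reduced to two translations and one rotation angle, and both scoring functions are made up; RosettaDock itself works with full 3D rigid-body transforms and Rosetta's own coarse and full-atom energy functions, and its refinement is considerably more sophisticated than the greedy moves used here. All names are hypothetical.

```python
import math
import random

Pose = tuple[float, float, float]  # (x offset, y offset, rotation angle in radians)


def coarse_score(pose: Pose) -> float:
    """Cheap, smooth toy score standing in for a backbone-only model."""
    x, y, a = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * (1 - math.cos(a - 0.8))


def fine_score(pose: Pose) -> float:
    """Costlier toy score standing in for full-atom refinement: the same
    funnel plus short-range ruggedness, as side-chain packing might add."""
    x, y, _ = pose
    return coarse_score(pose) + 0.05 * math.sin(10 * x) * math.sin(10 * y)


def dock(seed: int, n_coarse: int = 2000, keep: int = 10,
         refine_steps: int = 300) -> tuple[Pose, float]:
    rng = random.Random(seed)
    # Phase 1: fast, crude global sampling scored with the cheap function.
    starts = [(rng.uniform(-10, 10), rng.uniform(-10, 10),
               rng.uniform(-math.pi, math.pi)) for _ in range(n_coarse)]
    starts.sort(key=coarse_score)
    best_pose, best_e = None, math.inf
    # Phase 2: slow refinement of the best coarse poses with the fine score,
    # perturbing position and orientation together (greedy downhill moves).
    for pose in starts[:keep]:
        e = fine_score(pose)
        for _ in range(refine_steps):
            trial = tuple(v + rng.gauss(0, 0.05) for v in pose)
            trial_e = fine_score(trial)
            if trial_e < e:
                pose, e = trial, trial_e
        if e < best_e:
            best_pose, best_e = pose, e
    return best_pose, best_e
```

The design point is economic: the cheap score discards most of the search space, so the expensive score is only evaluated near a handful of promising poses.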
experiments since CASP5 in 2002, performing among the best in the automated server prediction category. Robetta has since competed in CASP6 and 7, where it did better than average among both automated server and human predictor groups.
In modeling protein structure as of CASP6, Robetta first searches for structural homologs using BLAST, PSI-BLAST, and 3D-Jury, then parses the target sequence into its individual domains, or independently folding units of proteins, by matching the sequence to structural families in the Pfam database. Domains with structural homologs then follow a "template-based model" (i.e., homology modeling) protocol. Here, the Baker laboratory's in-house alignment program, K*sync, produces a group of sequence homologs, and each of these is modeled by the Rosetta de novo method to produce a decoy (possible structure). The final structure prediction is selected by taking the lowest energy model as determined by a low-resolution Rosetta energy function. For domains that have no detected structural homologs, a de novo protocol is followed in which the lowest energy model from a set of generated decoys is selected as the final prediction. These domain predictions are then connected together to investigate inter-domain, tertiary-level interactions within the protein. Finally, side-chain contributions are modeled using a protocol for Monte Carlo conformational search.
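The per-domain control flow just described can be summarized in a short Python sketch. The decoy generators and the energy function are passed in as hypothetical callables; nothing here reproduces Robetta's actual code, its BLAST/PSI-BLAST/Pfam searches, K*sync alignments or Rosetta energy functions, and domain assembly and side-chain modeling are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Domain:
    name: str
    sequence: str
    template: Optional[str] = None  # detected structural homolog, if any


def predict_domain(domain: Domain,
                   template_decoys: Callable[[Domain], list],
                   de_novo_decoys: Callable[[Domain], list],
                   energy: Callable[[object], float]):
    """Route a domain to template-based or de novo modeling, then keep the
    lowest-energy decoy as that domain's prediction."""
    if domain.template is not None:
        decoys = template_decoys(domain)  # homology-modeling branch
    else:
        decoys = de_novo_decoys(domain)   # no detected structural homolog
    if not decoys:
        raise ValueError(f"no decoys generated for {domain.name}")
    return min(decoys, key=energy)


def predict_structure(domains: list[Domain], **kwargs) -> dict[str, object]:
    """Predict each domain independently; joining the domains and modeling
    side chains would follow as separate steps."""
    return {d.name: predict_domain(d, **kwargs) for d in domains}
```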
In CASP8, Robetta was augmented to use Rosetta's high resolution all-atom refinement method, the absence of which was cited as the main cause for Robetta being less accurate than the Rosetta@home network in CASP7.
program, the Baker lab publicly released Foldit, an online protein structure prediction game based on the Rosetta platform. As of September 25, 2008, Foldit had over 59,000 registered users. The game gives users a set of controls (e.g. "shake", "wiggle", "rebuild") to manipulate the backbone and amino acid side chains of the target protein into more energetically favorable conformations. Users can work on solutions individually as "soloists" or collectively as "evolvers", accruing points under either category as they improve their structure predictions. Users can also compete individually against other users through a "duel" feature, in which the player with the lowest energy structure after 20 moves wins.
is the only one not to use the BOINC platform.
Both Rosetta@home and Folding@home study protein misfolding diseases such as Alzheimer's disease, but Folding@home concentrates on them far more heavily.
Folding@home almost exclusively uses all-atom molecular dynamics models to understand how and why proteins fold (or potentially misfold, and subsequently aggregate to cause diseases).
In other words, Folding@home's strength is modeling the process of protein folding, while Rosetta@home's strength is computational protein design and prediction of protein structure and docking.
Some of Rosetta@home's results are used as the basis for some Folding@home projects. Rosetta provides the most likely structure, but it is not certain that this is the form the molecule actually takes, or whether that form is viable. Folding@home can then be used to verify Rosetta@home's results, and can also provide additional atomic-level information, as well as details of how the molecule changes shape.
The two projects also differ significantly in their computing power and host diversity. Averaging about 6,650 teraFLOPS from a host base of CPUs, GPUs and PS3s, Folding@home has about 107 times as much computing power as Rosetta@home.
(HPF), a subproject of World Community Grid, have used the Rosetta program to make structural and functional annotations of various genomes. Although he now uses it to create databases for biologists, Richard Bonneau, head scientist of the Human Proteome Folding Project, was active in the original development of Rosetta at David Baker's laboratory while obtaining his PhD. More information on the relationship between HPF1, HPF2 and Rosetta@home can be found on Richard Bonneau's website.
specializes in protein structure prediction. Predictor@home plans to develop new areas for its distributed computing platform in protein design and docking (using the CHARMM package for molecular dynamics), making it more similar to Rosetta@home. While Rosetta@home uses the Rosetta program for its structure prediction, Predictor@home uses the dTASSER methodology.
Other protein-related distributed computing projects on BOINC include QMC@home, Docking@home, POEM@home, SIMAP, and TANPAKU. RALPH@home, the Rosetta@home alpha project, which tests new application versions, workunits, and updates before they move on to Rosetta@home, runs on BOINC as well.
Users are granted BOINC credits as a measure of their contribution. The credit granted for each workunit is the number of decoys produced for that workunit multiplied by the average credit claimed per decoy by all computer hosts that returned results for that workunit. This custom system was designed to address significant differences between the credit granted to users of the standard BOINC client and of an optimized BOINC client, and credit differences between users running Rosetta@home on Windows and Linux operating systems. The amount of credit granted per second of CPU work is lower for Rosetta@home than for most other BOINC projects. Despite this disadvantage for BOINC users competing for rank, Rosetta@home is fifth out of over 40 BOINC projects in terms of total credit.
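The per-workunit credit rule can be written as a small function. The text leaves open exactly how the average is taken; this sketch assumes a pooled average (total credit claimed by all hosts on the workunit divided by the total decoys they returned), and the function name and data layout are hypothetical rather than Rosetta@home's server code.

```python
def granted_credit(decoys_produced: int, claims: list[tuple[float, int]]) -> float:
    """Credit granted to one host's result on a workunit.

    `claims` holds (claimed_credit, decoys) pairs from every host that
    returned results for the same workunit. The pooled average claimed
    credit per decoy is multiplied by this host's decoy count."""
    total_claimed = sum(credit for credit, _ in claims)
    total_decoys = sum(n for _, n in claims)
    if total_decoys == 0:
        return 0.0
    return decoys_produced * total_claimed / total_decoys
```

Because the average is shared by every host on the workunit, a single client that inflates its own claim would shift everyone's per-decoy rate only slightly, while a host that produces more decoys still earns proportionally more.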
Rosetta@home users who predict protein structures submitted for the CASP experiment are acknowledged in scientific publications regarding their results. Users who predict the lowest energy structure for a given workunit are featured on the Rosetta@home homepage as 'Predictor of the Day', along with any team of which they are a member. A 'User of the Day' is also chosen at random each day from users who have made a Rosetta@home profile and is featured on the homepage.
Online Rosetta services
Energy landscape
In physics, an energy landscape is a mapping of all possible conformations of a molecular entity, or the spatial positions of interacting molecules in a system, and their corresponding energy levels, typically Gibbs free energy, on a two- or three-dimensional Cartesian coordinate system.In...
. Protein structure predictions from Rosetta@home are approximations of a global minimum
Maxima and minima
In mathematics, the maximum and minimum of a function, known collectively as extrema , are the largest and smallest value that the function takes at a point either within a given neighborhood or on the function domain in its entirety .More generally, the...
in a given protein's energy landscape. That global minimum represents the most energetically favorable conformation of the protein, i.e. its native state
Native state
In biochemistry, the native state of a protein is its operative or functional form. While all protein molecules begin as simple unbranched chains of amino acids, once completed they assume highly specific three-dimensional shapes; that ultimate shape, known as tertiary structure, is the folded...
.
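The effect of giving each workunit its own seed can be illustrated with a toy model. The sketch below is not Rosetta's algorithm or energy function: it assumes a made-up one-dimensional energy landscape (`toy_energy`, hypothetical) with several local minima and searches it with a basic Metropolis Monte Carlo walk. Runs with different seeds start in different places and follow different trajectories; keeping the lowest-energy result across many independent runs approximates the global minimum, which is loosely the role the many volunteer workunits play.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical rugged 1-D energy landscape (not a real protein force field)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def monte_carlo_search(seed: int, steps: int = 5000, step_size: float = 0.5,
                       temperature: float = 0.3) -> tuple[float, float]:
    """Metropolis Monte Carlo search; returns (best_x, best_energy) for one seeded run."""
    rng = random.Random(seed)          # each "workunit" gets its own seed
    x = rng.uniform(-10.0, 10.0)       # seed-dependent starting conformation
    e = toy_energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = toy_energy(candidate)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def best_of_workunits(seeds: list[int]) -> tuple[float, float]:
    """Keep the lowest-energy result over many independently seeded runs."""
    return min((monte_carlo_search(s) for s in seeds), key=lambda r: r[1])


if __name__ == "__main__":
    for seed in range(3):
        print(seed, monte_carlo_search(seed))
    print("best:", best_of_workunits(list(range(20))))
```

Running the same seed twice reproduces the same trajectory, so distinct seeds are what keep independent workunits from duplicating each other's search.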
A primary feature of the Rosetta@home graphical user interface (GUI) is a screensaver which shows a current workunit's progress during the simulated protein folding process. In the upper-left of the current screensaver, the target protein is shown adopting different shapes (conformations) in its search for the lowest energy structure. Immediately to the right is the most recently accepted structure. On the upper right, the lowest energy conformation of the current decoy is shown; below that is the true, or native, structure of the protein if it has already been determined. Three graphs are included in the screensaver. Near the middle, a graph of the accepted model's free energy is displayed, which fluctuates as the accepted model changes. A graph of the accepted model's root mean square deviation (RMSD), which measures how structurally similar the accepted model is to the native model, is shown at far right. To the right of the accepted-energy graph and below the RMSD graph, the results from these two functions are combined into an energy vs. RMSD plot as the model is progressively refined.
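The RMSD shown in the screensaver compares the accepted model with the native structure. For N matched atom pairs it is sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair. The sketch below assumes the two structures are already superimposed and that atoms are listed in the same order; structure-comparison tools typically compute an optimal superposition first, which this sketch does not do. The coordinates are made up.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD between two equally long coordinate lists, compared exactly as given.

    Assumes atom i in `model` corresponds to atom i in `native` and that the
    structures are already superimposed; no alignment is performed here.
    """
    if len(model) != len(native) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Made-up coordinates in angstroms: only the last atom differs, by 2 Å.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 2.0, 0.0)]
print(rmsd(model, native))  # 1.0
```

Each point in the energy vs. RMSD plot pairs an accepted model's energy with a value computed this way; points toward the lower left are both low in energy and close to the native structure.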
Like all BOINC projects, Rosetta@home runs in the background of the user's computer using idle computer power, either at or before logging in to an account on the host operating system. Rosetta@home frees CPU resources as they are required by other applications so that normal computer usage is unaffected. To minimize power consumption or heat production from a computer running at sustained capacity, the maximum percentage of CPU resources that Rosetta@home is allowed to use can be specified through a user's account preferences. The times of day during which Rosetta@home is allowed to do work can also be adjusted, along with many other preferences, through a user's account settings.
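A CPU-percentage cap of this kind can be enforced by duty cycling: within each short period, the computation runs for the allowed fraction of the time and is paused for the rest. The sketch below assumes that approach and a one-second default period; `duty_cycle` and `throttled_run` are hypothetical helpers for illustration, not BOINC's implementation or API.

```python
import time
from typing import Callable


def duty_cycle(cpu_percent: float, period_s: float = 1.0) -> tuple[float, float]:
    """Split one period into (run_seconds, pause_seconds) for a given CPU-time cap."""
    if not 0.0 < cpu_percent <= 100.0:
        raise ValueError("cpu_percent must be in (0, 100]")
    run_s = period_s * cpu_percent / 100.0
    return run_s, period_s - run_s


def throttled_run(work_step: Callable[[], None], cpu_percent: float,
                  periods: int, period_s: float = 1.0) -> int:
    """Call work_step repeatedly, pausing each period to respect the cap.

    Returns the number of work steps performed.
    """
    run_s, pause_s = duty_cycle(cpu_percent, period_s)
    steps = 0
    for _ in range(periods):
        deadline = time.monotonic() + run_s
        while time.monotonic() < deadline:
            work_step()
            steps += 1
        if pause_s > 0:
            time.sleep(pause_s)  # yield the CPU for the rest of the period
    return steps


# At a 60% cap, each 1-second period is 0.6 s of work followed by a 0.4 s pause.
print(duty_cycle(60))
```

A shorter period smooths the load more finely but switches between running and pausing more often; the one-second default is only an illustrative choice.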
Rosetta, the software that runs on the Rosetta@home network, was rewritten in C++ to allow easier development than its original version, which was written in Fortran. This new version is object-oriented and was released on February 8, 2008. Development of the Rosetta code is done by Rosetta Commons. The software is freely licensed to the academic community and available to pharmaceutical companies for a fee.
Project significance
With the proliferation of genome sequencing projects, scientists can infer the amino acid sequence, or primary structure, of many proteins that carry out functions within the cell. To better understand a protein's function and aid in rational drug design, scientists need to know the protein's three-dimensional tertiary structure.
Protein 3D structures are currently determined experimentally through X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The process is slow (it can take weeks or even months to figure out how to crystallize a protein for the first time) and costly (around US$100,000 per protein). The rate at which new sequences are discovered far exceeds the rate of structure determination: out of more than 7,400,000 protein sequences available in the NCBI non-redundant (nr) protein database, fewer than 52,000 proteins' 3D structures have been solved and deposited in the Protein Data Bank, the main repository for structural information on proteins. One of the main goals of Rosetta@home is to predict protein structures with the same accuracy as existing methods, but in a way that requires significantly less time and money. Rosetta@home also develops methods to determine the structure and docking of membrane proteins (e.g., GPCRs), which are exceptionally difficult to analyze with traditional techniques like X-ray crystallography and NMR spectroscopy, yet represent the majority of targets for modern drugs.
Progress in protein structure prediction is evaluated in the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, in which researchers from around the world attempt to derive a protein's structure from its amino acid sequence. High-scoring groups in this sometimes competitive experiment are considered the de facto standard-bearers for the state of the art in protein structure prediction. Rosetta, the program on which Rosetta@home is based, has been used since CASP5 in 2002. In the 2004 CASP6 experiment, Rosetta became the first to produce a near-atomic-resolution ab initio protein structure prediction, in its submitted model for CASP target T0281. Ab initio modeling is considered an especially difficult category of protein structure prediction, as it does not use information from structural homology and must rely on information from sequence homology and on modeling physical interactions within the protein. Rosetta@home has been used in CASP since 2006, and was among the top predictors in every category of structure prediction in CASP7. These high-quality predictions were enabled by the computing power made available by Rosetta@home volunteers. Increasing computational power allows Rosetta@home to sample more regions of conformation space (the possible shapes a protein can assume), whose size, according to Levinthal's paradox, is predicted to increase exponentially with protein length.
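Levinthal's argument reduces to simple arithmetic. Assuming, purely for illustration, that every residue can independently adopt one of three backbone states (a common textbook simplification, not a Rosetta parameter), an n-residue chain has 3^n conformations, so each extra residue triples the search space and doubling the length squares it. The sampling rate below is likewise a hypothetical figure.

```python
import math


def conformation_count(residues: int, states_per_residue: int = 3) -> int:
    """Levinthal-style estimate: each residue independently takes one of k states."""
    if residues < 0 or states_per_residue < 1:
        raise ValueError("residues must be >= 0 and states_per_residue >= 1")
    return states_per_residue ** residues


for n in (10, 50, 100):
    count = conformation_count(n)
    print(f"{n} residues: ~10^{math.log10(count):.1f} conformations")
# 100 residues: ~10^47.7 conformations

rate = 1e13  # hypothetical conformations evaluated per second
years = conformation_count(100) / rate / (365.25 * 24 * 3600)
print(f"exhaustive search of 100 residues: ~10^{math.log10(years):.0f} years")
```

This is why Rosetta@home samples conformation space rather than enumerating it: more computing power buys more samples, not exhaustive coverage.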
Rosetta@home is also used in protein docking prediction, which determines the structure of multiple complexed proteins, or quaternary structure. This type of protein interaction affects many cellular functions, including antigen–antibody and enzyme–inhibitor binding and cellular import and export. Determining these interactions is critical for drug design. Rosetta is used in the Critical Assessment of Prediction of Interactions (CAPRI) experiment, which evaluates the state of the protein docking field much as CASP gauges progress in protein structure prediction. The computing power made available by Rosetta@home's project volunteers has been cited as a major factor in Rosetta's performance in CAPRI, where its docking predictions have been among the most accurate and complete.
In early 2008, Rosetta was used to computationally design a protein with a function never before observed in nature. This was inspired in part by the retraction of a high-profile paper from 2004 which originally described the computational design of a protein with improved enzymatic activity compared to its natural form. The 2008 research paper from David Baker's group describing how the protein was made, which cited Rosetta@home for the computational resources it made available, represented an important proof of concept for this protein design method. This type of protein design could have future applications in drug discovery, green chemistry, and bioremediation.
Disease-related research
In addition to basic research in predicting protein structure, docking and design, Rosetta@home is also used in immediate disease-related research. Numerous minor research projects are described in David Baker's Rosetta@home journal.
Alzheimer's disease
A component of the Rosetta software suite, RosettaDesign, was used to accurately predict which regions of amyloidogenic proteins were most likely to make amyloid-like fibrils. By taking hexapeptides (six-amino-acid-long fragments) of a protein of interest and selecting the lowest energy match to a structure similar to that of a known fibril-forming hexapeptide, RosettaDesign was able to identify peptides twice as likely to form fibrils as random proteins. Rosetta@home was used in the same study to predict structures for amyloid beta, a fibril-forming protein that has been postulated to cause Alzheimer's disease. Preliminary but as yet unpublished results have been produced on Rosetta-designed proteins that may prevent fibrils from forming, although it is unknown whether they can prevent the disease.
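The hexapeptide procedure amounts to a sliding-window scan: every six-residue window of a sequence is scored, and the best-scoring windows are flagged as likely fibril formers. In the study, each window was evaluated by how well (lowest energy) it matched a structure similar to a known fibril-forming hexapeptide; the sketch below substitutes a deliberately crude stand-in score (summed Kyte-Doolittle hydropathy, negated so lower is better) purely to show the scanning logic. The score function and the example sequence are hypothetical.

```python
from typing import Callable

# Kyte-Doolittle hydropathy values, used here ONLY as a hypothetical stand-in
# for RosettaDesign's structure-based energy; this is not the study's method.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hexapeptides(sequence: str, width: int = 6) -> list[tuple[int, str]]:
    """All overlapping windows of the given width as (start_index, peptide)."""
    return [(i, sequence[i:i + width]) for i in range(len(sequence) - width + 1)]


def stand_in_score(peptide: str) -> float:
    """Lower is treated as 'more fibril-prone' (hypothetical proxy)."""
    return -sum(KYTE_DOOLITTLE[aa] for aa in peptide)


def rank_windows(sequence: str,
                 score: Callable[[str], float] = stand_in_score,
                 top: int = 3) -> list[tuple[int, str, float]]:
    """Score every hexapeptide and return the `top` lowest-scoring ones."""
    scored = [(i, pep, score(pep)) for i, pep in hexapeptides(sequence)]
    return sorted(scored, key=lambda item: item[2])[:top]


# Made-up sequence with one strongly hydrophobic stretch.
print(rank_windows("KKKKKKVIVIVIKKKKKK", top=1))
```

Because the scorer is passed in as a parameter, the scanning logic stays the same if the crude proxy is replaced by a structure-based energy.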
Anthrax
Another component of Rosetta, RosettaDock, was used in conjunction with experimental methods to model interactions between the three proteins that make up anthrax toxin: lethal factor (LF), edema factor (EF) and protective antigen (PA). The computational model accurately predicted docking between LF and PA, helping to establish which domains of the respective proteins are involved in the LF–PA complex. This insight was eventually used in research resulting in improved anthrax vaccines.
Herpes simplex virus 1
RosettaDock was used to model docking between an antibody (immunoglobulin G) and a surface protein expressed by herpes simplex virus 1 (HSV-1), which serves to degrade the antiviral antibody. The protein complex predicted by RosettaDock closely agreed with the particularly difficult-to-obtain experimental models, leading researchers to conclude that the docking method has potential in addressing some of the problems that X-ray crystallography has with modeling protein–protein interfaces.
HIV
As part of research funded by a $19.4 million grant from the Bill and Melinda Gates Foundation, Rosetta@home has been used in designing multiple possible vaccines for human immunodeficiency virus (HIV).
Malaria
In research involved with the Grand Challenges in Global Health initiative, Rosetta has been used to computationally design novel homing endonuclease proteins, which could eradicate Anopheles gambiae or otherwise render the mosquito unable to transmit malaria. Being able to model and alter protein–DNA interactions specifically, like those of homing endonucleases, gives computational protein design methods like Rosetta an important role in gene therapy (which includes possible cancer treatments).
Development history and branches
Originally introduced by the Baker laboratory in 1998 as an ab initio approach to structure prediction, Rosetta has since branched into several development streams and distinct services. The Rosetta platform derives its name from the Rosetta Stone, as it attempts to decipher the structural "meaning" of proteins' amino acid sequences. More than seven years after Rosetta's first appearance, the Rosetta@home project was released (i.e. announced as no longer beta) on October 6, 2005. Many of the graduate students and other researchers involved in Rosetta's initial development have since moved to other universities and research institutions, and subsequently enhanced different parts of the Rosetta project.
RosettaDesign
RosettaDesign, a computational approach to protein design based on Rosetta, began in 2000 with a study in redesigning the folding pathway of Protein G. In 2002 RosettaDesign was used to design Top7, a 93-amino-acid α/β protein with an overall fold never before recorded in nature. This new conformation was predicted by Rosetta to within 1.2 Å RMSD of the structure determined by X-ray crystallography, an unusually accurate structure prediction. Rosetta and RosettaDesign earned widespread recognition by being the first to design and accurately predict the structure of a novel protein of such length: the 2002 paper describing the dual approach prompted two positive letters in the journal Science and has been cited by more than 240 other scientific articles. The visible product of that research, Top7, was featured as the Protein Data Bank's 'Molecule of the Month' in October 2006; a superposition of the respective cores (residues 60–79) of its predicted and X-ray crystal structures is featured in the Rosetta@home logo.
Brian Kuhlman, a former postdoctoral associate in David Baker's lab and now an associate professor at the University of North Carolina, Chapel Hill, offers RosettaDesign as an online service.
RosettaDock
RosettaDock was added to the Rosetta software suite during the first CAPRI experiment in 2002 as the Baker laboratory's algorithm for protein–protein docking prediction. In that experiment, RosettaDock made a high-accuracy prediction for the docking between streptococcal pyogenic exotoxin A and a T cell-receptor β-chain, and a medium-accuracy prediction for a complex between porcine α-amylase and a camelid antibody. Although the RosettaDock method made only two acceptably accurate predictions out of seven possible, this was enough to rank it seventh out of nineteen prediction methods in the first CAPRI assessment.
Development of RosettaDock diverged into two branches for subsequent CAPRI rounds as Jeffrey Gray, who laid the groundwork for RosettaDock while at the University of Washington, continued working on the method in his new position at Johns Hopkins University. Members of the Baker laboratory further developed RosettaDock in Gray's absence. The two versions differed slightly in side-chain modeling, decoy selection and other areas. Despite these differences, both the Baker and Gray methods performed well in the second CAPRI assessment, placing fifth and seventh respectively out of 30 predictor groups. Jeffrey Gray's RosettaDock server is available as a free docking prediction service for non-commercial use.
In October 2006, RosettaDock was integrated into Rosetta@home. The method used a fast, crude docking phase that modeled only the protein backbone. This was followed by a slow full-atom refinement phase in which the orientation of the two interacting proteins relative to each other, and side-chain interactions at the protein–protein interface, were simultaneously optimized to find the lowest energy conformation. The vastly increased computational power afforded by the Rosetta@home network, in combination with revised "fold-tree" representations for backbone flexibility and loop modeling, placed RosettaDock sixth out of 63 prediction groups in the third CAPRI assessment.
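The two-phase docking protocol follows a coarse-to-fine pattern that can be sketched generically: sample many rigid-body poses quickly with a cheap score, keep the best few, then refine only those with a more expensive score that adjusts position and orientation together. Everything below is hypothetical: the 2-D pose `(x, y, angle)`, the target pose, both score functions and all parameters are toy stand-ins for RosettaDock's backbone-level and full-atom energies, and the refinement is a simple greedy search rather than Rosetta's optimizer. Side-chain optimization is not modeled.

```python
import math
import random

Pose = tuple[float, float, float]  # toy rigid-body pose: (x, y, rotation in radians)

# Hypothetical "true" binding pose, used only to define the toy scores.
TARGET_X, TARGET_Y, TARGET_ANGLE = 2.0, -1.0, 0.5


def coarse_score(pose: Pose) -> float:
    """Cheap stand-in for a low-resolution score: ignores orientation entirely."""
    x, y, _ = pose
    return (x - TARGET_X) ** 2 + (y - TARGET_Y) ** 2


def fine_score(pose: Pose) -> float:
    """More detailed stand-in: adds an orientation penalty (always >= 0)."""
    return coarse_score(pose) + 2.0 * (1.0 - math.cos(pose[2] - TARGET_ANGLE))


def dock(seed: int = 0, n_coarse: int = 2000, keep: int = 10,
         refine_steps: int = 500) -> tuple[Pose, float]:
    rng = random.Random(seed)
    # Phase 1: broad, cheap sampling; keep only the best few poses.
    samples = [
        (rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform(-math.pi, math.pi))
        for _ in range(n_coarse)
    ]
    candidates = sorted(samples, key=coarse_score)[:keep]

    # Phase 2: refine each survivor with the expensive score, perturbing
    # position and orientation simultaneously; accept only improvements.
    best_pose, best_e = candidates[0], fine_score(candidates[0])
    for start in candidates:
        pose, e = start, fine_score(start)
        for _ in range(refine_steps):
            trial = (pose[0] + rng.gauss(0, 0.1),
                     pose[1] + rng.gauss(0, 0.1),
                     pose[2] + rng.gauss(0, 0.1))
            e_trial = fine_score(trial)
            if e_trial < e:
                pose, e = trial, e_trial
        if e < best_e:
            best_pose, best_e = pose, e
    return best_pose, best_e


print(dock(seed=1))
```

The point of the split is cost: the cheap score is evaluated thousands of times, while the expensive score is spent only on a handful of promising starting poses, which is what makes a slow refinement stage affordable.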
Robetta
The Robetta server is an automated protein structure prediction service offered by the Baker laboratory for non-commercial ab initio and comparative modeling. It has participated as an automated prediction server in the biennial CASP experiments since CASP5 in 2002, performing among the best in the automated server prediction category. Robetta has since competed in CASP6 and CASP7, where it did better than average among both automated server and human predictor groups.
As of CASP6, Robetta models protein structure by first searching for structural homologs using BLAST, PSI-BLAST, and 3D-Jury, then parsing the target sequence into its individual domains, or independently folding units of proteins, by matching the sequence to structural families in the Pfam database. Domains with structural homologs then follow a "template-based model" (i.e., homology modeling) protocol. Here, the Baker laboratory's in-house alignment program, K*sync, produces a group of sequence homologs, and each of these is modeled by the Rosetta de novo method to produce a decoy (a possible structure). The final structure prediction is the lowest-energy model as determined by a low-resolution Rosetta energy function. For domains with no detected structural homologs, a de novo protocol is followed in which the lowest-energy model from a set of generated decoys is selected as the final prediction. These domain predictions are then connected to investigate inter-domain, tertiary-level interactions within the protein. Finally, side-chain contributions are modeled using a Monte Carlo conformational search protocol.
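The per-domain branching, the lowest-energy decoy selection, and the final Monte Carlo side-chain step can be sketched as follows. This is a toy under loud assumptions: decoy energies are random numbers rather than outputs of BLAST, Pfam, K*sync or Rosetta; the side-chain energy (per-residue rotamer energies plus a penalty when neighbors pick the same rotamer index) is invented; and all function names are hypothetical. The search itself is a standard Metropolis Monte Carlo over discrete rotamer choices.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Decoy:
    label: str
    energy: float


# Hypothetical stand-ins for the real modeling steps: energies are random.
def template_based_decoys(domain, homologs, rng):
    """One decoy per aligned homolog (template-based branch)."""
    return [Decoy(f"{domain}:template:{h}", rng.uniform(-100.0, -50.0)) for h in homologs]


def de_novo_decoys(domain, n, rng):
    """A batch of de novo decoys for a domain with no detected homolog."""
    return [Decoy(f"{domain}:denovo:{i}", rng.uniform(-80.0, -20.0)) for i in range(n)]


def predict_domain(domain, homologs, rng, n_de_novo=50):
    """Pick the protocol by homolog availability, then keep the lowest-energy decoy."""
    if homologs:
        decoys = template_based_decoys(domain, homologs, rng)
    else:
        decoys = de_novo_decoys(domain, n_de_novo, rng)
    return min(decoys, key=lambda d: d.energy)


def packing_energy(state, one_body, clash):
    """Toy side-chain energy: rotamer energies plus a neighbor clash penalty."""
    e = sum(one_body[i][r] for i, r in enumerate(state))
    e += clash * sum(1 for a, b in zip(state, state[1:]) if a == b)
    return e


def monte_carlo_side_chains(one_body, clash, n_steps, temperature, rng):
    """Metropolis Monte Carlo: change one residue's rotamer per step."""
    state = [0] * len(one_body)
    energy = packing_energy(state, one_body, clash)
    best_state, best_energy = list(state), energy
    for _ in range(n_steps):
        i = rng.randrange(len(state))
        trial = list(state)
        trial[i] = rng.randrange(len(one_body[i]))
        e_trial = packing_energy(trial, one_body, clash)
        # Always accept downhill moves; accept uphill ones with Boltzmann probability.
        if e_trial <= energy or rng.random() < math.exp(-(e_trial - energy) / temperature):
            state, energy = trial, e_trial
            if energy < best_energy:
                best_state, best_energy = list(state), energy
    return best_state, best_energy
```

For example, `predict_domain("D1", ["1abc"], random.Random(0))` takes the template-based branch, while an empty homolog list falls through to the de novo branch; both return a single lowest-energy decoy.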
In CASP8, Robetta was augmented to use Rosetta's high-resolution all-atom refinement method, the absence of which had been cited as the main reason Robetta was less accurate than the Rosetta@home network in CASP7.
Foldit
On May 9, 2008, after Rosetta@home users suggested an interactive version of the distributed computing program, the Baker lab publicly released Foldit, an online protein structure prediction game based on the Rosetta platform. As of September 25, 2008, Foldit had over 59,000 registered users. The game gives users a set of controls (e.g. "shake", "wiggle", "rebuild") to manipulate the backbone and amino acid side chains of the target protein into more energetically favorable conformations. Users can work on solutions individually as "soloists" or collectively as "evolvers", accruing points under either category as they improve their structure predictions. Users can also compete individually against other users through a "duel" feature, in which the player with the lowest-energy structure after 20 moves wins.
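The duel rule (lowest energy after 20 moves wins) can be expressed as a small scoring function. This is a hypothetical sketch, not Foldit code: it assumes each player's history is a list of structure energies recorded after each move, that a player who made fewer than 20 moves is judged on their last move, and that an exact tie produces no winner; the source does not specify either of those last two cases.

```python
def duel_winner(energies_by_player, moves=20):
    """energies_by_player maps a player name to the structure energy after
    each of that player's moves. Only the first `moves` moves count, and the
    lowest energy at that point wins. Returns None on an exact tie
    (tie handling is an assumption)."""
    final = {}
    for player, energies in energies_by_player.items():
        if not energies:
            raise ValueError(f"{player} made no moves")
        # Energy after the last counted move (move 20, or earlier if fewer were made).
        final[player] = energies[:moves][-1]
    best = min(final.values())
    winners = [p for p, e in final.items() if e == best]
    return winners[0] if len(winners) == 1 else None
```

For instance, a player who keeps improving past move 20 gains nothing from the extra moves, because only the energy after the 20th move is compared.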
Comparison to similar distributed computing projects
There are several distributed computing projects whose study areas are similar to those of Rosetta@home but whose research approaches differ:
Folding@home
Of all the major distributed computing projects involved in protein research, Folding@home is the only one not to use the BOINC platform.
Both Rosetta@home and Folding@home study protein misfolding diseases such as Alzheimer's disease, but Folding@home concentrates on them much more heavily.
Folding@home almost exclusively uses all-atom molecular dynamics models to understand how and why proteins fold (or potentially misfold and subsequently aggregate to cause diseases).
In other words, Folding@home's strength is modeling the process of protein folding, while Rosetta@home's strength is computational protein design and prediction of protein structure and docking.
Some of Rosetta@home's results are used as the basis for some Folding@home projects. Rosetta provides the most likely structure, but it is not certain whether that is the form the molecule actually takes or whether it is viable. Folding@home can then be used to verify Rosetta@home's results, and can also provide additional atomic-level information as well as details of how the molecule changes shape.
The two projects also differ significantly in computing power and host diversity. Averaging about 6,650 teraFLOPS from a host base of CPUs, GPUs and PS3s, Folding@home has roughly 107 times more computing power than Rosetta@home.
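Using only the two averages quoted in this article (about 6,650 teraFLOPS for Folding@home and about 62 teraFLOPS for Rosetta@home, which may not have been measured on the same date), the ratio works out as:

```latex
\frac{6650\ \text{TFLOPS}}{62\ \text{TFLOPS}} \approx 107.3
```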
World Community Grid
Both Phase I and Phase II of the Human Proteome Folding Project (HPF), a subproject of World Community Grid, have used the Rosetta program to make structural and functional annotations of various genomes. Richard Bonneau, head scientist of the Human Proteome Folding Project, was active in the original development of Rosetta at David Baker's laboratory while obtaining his PhD, although he now uses it to create databases for biologists. More information on the relationship between HPF1, HPF2 and Rosetta@home can be found on Richard Bonneau's website.
Predictor@home
Like Rosetta@home, Predictor@home specializes in protein structure prediction. Predictor@home plans to expand its distributed computing platform into protein design and docking (using the CHARMM package for molecular dynamics), which would make it even more similar to Rosetta@home. While Rosetta@home uses the Rosetta program for its structure prediction, Predictor@home uses the dTASSER methodology.
Other protein-related distributed computing projects on BOINC include QMC@home, Docking@home, POEM@home, SIMAP, and TANPAKU. RALPH@home, the Rosetta@home alpha project, which tests new application versions, workunits, and updates before they move on to Rosetta@home, also runs on BOINC.
Volunteer contributions
Rosetta@home depends on computing power donated by individual project members for its research. As of October 18, 2011, about 40,000 users from 150 countries were active members of Rosetta@home, together contributing idle processor time from about 60,000 computers for a combined average performance of over 62 teraFLOPS.
Users are granted BOINC credits as a measure of their contribution. The credit granted for each workunit is the number of decoys produced for that workunit multiplied by the average claimed credit per decoy for the decoys submitted by all computer hosts for that workunit. This custom system was designed to address significant differences between the credit granted to users of the standard BOINC client and of an optimized BOINC client, as well as credit differences between users running Rosetta@home on the Windows and Linux operating systems. Rosetta@home grants less credit per second of CPU work than most other BOINC projects. Despite this disadvantage for BOINC users competing for rank, Rosetta@home is fifth out of over 40 BOINC projects in terms of total credit.
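The per-workunit credit rule can be sketched in a few lines. This is an illustration under one stated assumption: the "average claimed credit per decoy" is taken as a pooled average (total claimed credit divided by total decoys across all hosts for the workunit); averaging each host's per-decoy ratio instead would be an equally plausible reading. The function names are hypothetical, not BOINC or Rosetta@home code.

```python
def pooled_credit_per_decoy(claims):
    """claims: (claimed_credit, decoys) pairs, one per host result for the
    same workunit. Assumption: the average is pooled over all decoys."""
    total_credit = sum(credit for credit, _ in claims)
    total_decoys = sum(decoys for _, decoys in claims)
    if total_decoys == 0:
        raise ValueError("no decoys were submitted for this workunit")
    return total_credit / total_decoys


def granted_credit(decoys_produced, claims):
    """Credit for one host: its decoy count times the shared per-decoy average."""
    return decoys_produced * pooled_credit_per_decoy(claims)


# Two hosts each produce 10 decoys, but one (e.g. an optimized client)
# claims 50 credits and the other 30; both are granted 10 * 4.0 = 40.0.
claims = [(30.0, 10), (50.0, 10)]
print(granted_credit(10, claims))
```

The usage example shows why such a scheme damps client-to-client differences: a host that claims more than its peers is paid the shared per-decoy rate, so its reward depends on how many decoys it produces rather than on how much it claims.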
Rosetta@home users who predict protein structures submitted for the CASP experiment are acknowledged in scientific publications regarding the results. Users who predict the lowest-energy structure for a given workunit are featured on the Rosetta@home homepage as 'Predictor of the Day', along with any team of which they are a member. Each day, a 'User of the Day' is also chosen at random from users who have created a Rosetta@home profile and is featured on the homepage.
External links
- Rosetta@home Project website
- Baker Lab Baker Lab website
- David Baker's Rosetta@home journal
- BOINC Includes platform overview, as well as a guide for installing BOINC and attaching to Rosetta@home
- BOINCstats – Rosetta@home Detailed contribution statistics
- RALPH@home Website for Rosetta@home alpha testing project
- Rosetta@home video on YouTube Overview of Rosetta@home given by David Baker and lab members
- Rosetta Commons Academic collaborative for development of the Rosetta platform
- http://sites.google.com/site/kuhlmanlabwebpage/ Kuhlman lab webpage, home of RosettaDesign
Online Rosetta services
- Robetta Protein structure prediction server
- RosettaDesign Protein design server
- RosettaDock Protein–protein docking server