iPlant Collaborative
The iPlant Collaborative is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor."
The project develops computing systems and software that combine computing resources, like those of TeraGrid, and bioinformatics and computational biology software. Its goal is easier collaboration among researchers with improved data access and processing efficiency. Primarily centered in the United States, it collaborates internationally.
History
Biology is relying more and more on computers. Plant biology is changing with the rise of new technologies. With the advent of bioinformatics, computational biology, DNA sequencing, geographic information systems and other technologies, computers can greatly assist researchers who study plant life and look for solutions to challenges in medicine, biofuels, biodiversity, and agriculture, and to problems like drought tolerance, plant breeding, and sustainable farming. Many of these problems cross traditional disciplines, so facilitating collaboration between plant scientists of diverse backgrounds and specialties is necessary.
In 2006, the NSF solicited proposals to create "a new type of organization – a cyberinfrastructure collaborative for plant science" with a program titled "Plant Science Cyberinfrastructure Collaborative" (PSCIC) with Christopher Greer as program director. A proposal was accepted (adopting the convention of using the word "Collaborative" as a noun) and iPlant was officially created on February 1, 2008.
Funding was estimated at $10 million per year over five years.
Richard Jorgensen led the team through the proposal stage and was the principal investigator (PI) from 2008 to 2009. Gregory Andrews, Vicki Chandler, Sudha Ram and Lincoln Stein served as Co-Principal Investigators (Co-PIs) from 2008 to 2009. In late 2009, Stephen Goff was named PI and Daniel Stanzione was added as a Co-PI.
The iPlant project supports what has been called e-Science, which is a use of information systems technology that is being adopted by the research community in efforts such as the National Center for Ecological Analysis and Synthesis (NCEAS), ELIXIR, and the Bamboo Technology Project that started in September 2010. iPlant is "designed to create the foundation to support the computational needs of the research community and facilitate progress toward solutions of major problems in plant biology."
The project works as a collaboration. It seeks input from the wider plant science community on what to build.
Based on that input, it has enabled easier use of large data sets, created a community-driven research environment for sharing existing data collections within and between research areas, and shared data with provenance tracking.
One model studied for collaboration was Wikipedia.
Several more recent National Science Foundation awards mentioned iPlant explicitly in their descriptions, as either a design pattern to follow or a collaborator with whom the recipient will work.
Institutions
The primary institution for the iPlant project is the University of Arizona, located within the BIO5 Institute in Tucson. Since the project's inception in 2008, personnel have worked at other institutions including Cold Spring Harbor Laboratory, University of North Carolina, Wilmington, and the University of Texas at Austin in the Texas Advanced Computing Center.
Purdue University and Arizona State University were part of the original project group.
Other collaborating institutions that received support from iPlant for their work on a Grand Challenge in phylogenetics starting in March 2009 included Yale University, University of Florida, and the University of Pennsylvania.
A trait evolution group was led at the University of Tennessee.
A visualization project added Virginia Polytechnic Institute and State University (Virginia Tech).
The NSF requires that funding subcontracts stay within the United States, but international collaboration started in 2009 with the Technical University Munich and in 2010 with the University of Toronto.
East Main Educational Consulting provides external oversight, advice, and assistance.
Services
The iPlant project makes its cyberinfrastructure available in several different ways and offers services to make it accessible to its primary audience. The design was meant to grow in response to the needs of the research community it serves.
The Discovery Environment
The Discovery Environment integrates community-recommended software tools into a system that can handle terabytes of data, using high-performance supercomputers to complete analyses much more quickly. It has an interface designed to hide the complexity needed to do this from the end user. The goal was to make the cyberinfrastructure available to non-technical end users who are not as comfortable using a command-line interface.
iPlant Foundational APIs
A set of application programming interfaces (APIs) for developers allows access to iPlant services, including authentication, data management, and high-performance supercomputing resources, from custom, locally produced software.
Atmosphere
Atmosphere is a cloud computing platform that provides easy access to pre-configured, frequently used analysis routines, relevant algorithms, and data sets, and accommodates computationally and data-intensive bioinformatics tasks. It uses the Eucalyptus virtualization platform.
iPlant Semantic Web
The iPlant Semantic Web effort uses an iPlant-created architecture, protocol, and platform called the Simple Semantic Web Architecture and Protocol (SSWAP) for semantic web linking using a plant science focused ontology.
Taxonomic Name Resolution Service
The Taxonomic Name Resolution Service (TNRS) is a free utility for correcting and standardizing plant names. This is needed because plant names that are misspelled, out of date (because a newer synonym is preferred), or incomplete make it impossible to use computers to process large lists.
My-Plant
My-Plant.org is a social networking community for plant biologists, educators and others to come together to share information and research, collaborate, and track the latest developments in plant science. The My-Plant network uses the term "clades" for its user groups, organizing users in a manner similar to the phylogenetics of plants themselves. It was implemented using Drupal as its content management system.
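The kind of name standardization attributed to the Taxonomic Name Resolution Service earlier in this section can be illustrated with a small sketch. This is not the TNRS algorithm or its interface: the reference list, the synonym table, the function names and the similarity cutoff are all hypothetical, and the fuzzy matching simply uses Python's standard difflib module. It only shows why misspelled or outdated names need to be mapped to one accepted form before a computer can process a list.

```python
import difflib

# Hypothetical reference data, for illustration only (not taken from the TNRS).
ACCEPTED_NAMES = [
    "Zea mays",
    "Oryza sativa",
    "Solanum lycopersicum",
    "Arabidopsis thaliana",
]
# Outdated name -> currently preferred name (assumed example entry).
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def normalize(raw):
    """Trim whitespace and apply 'Genus species' capitalization."""
    parts = raw.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve_name(raw, cutoff=0.8):
    """Return the accepted name for `raw`, or None if nothing is close enough."""
    name = normalize(raw)
    candidates = ACCEPTED_NAMES + list(SYNONYMS)
    if name not in candidates:
        # Tolerate misspellings by picking the most similar known name.
        matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
        if not matches:
            return None
        name = matches[0]
    # Map an outdated synonym to its preferred replacement.
    return SYNONYMS.get(name, name)


if __name__ == "__main__":
    for raw in ["zea mais", "Lycopersicon esculentum", "  oryza SATIVA ", "Quercus alba"]:
        print(repr(raw), "->", resolve_name(raw))
```

Names that cannot be matched are returned as None rather than guessed, so a caller can flag them for manual review. Incomplete names, which the service is also described as handling, are not covered by this sketch.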
DNA Subway
The DNA Subway website uses a graphical user interface (GUI) to generate DNA sequence annotations, explore plant genomes for members of gene and transposon families, and conduct phylogenetic analyses. It makes high-level DNA analysis available to faculty and students by simplifying annotation and comparative genomics workflows. It was developed for iPlant by the Dolan DNA Learning Center.