Anduril (workflow engine)
Anduril is an open-source, component-based workflow framework for scientific data analysis developed at the Computational Systems Biology Laboratory, University of Helsinki.
Anduril is designed to enable systematic, flexible and efficient data analysis, particularly in the field of high-throughput experiments in biomedical research. The workflow system currently provides components for several types of analysis, such as sequencing, gene expression, SNP, ChIP-on-chip, comparative genomic hybridization and exon microarray analysis, as well as flow cytometry and cell imaging analysis.
Architecture and Features
A workflow is a series of processing steps connected together so that the output of one step is used as the input of another. Processing steps implement data analysis tasks such as data importing, statistical tests and report generation. In Anduril, processing steps are implemented using components, which are reusable pieces of executable code that can be written in any programming language. Components are wired together into a workflow, or component network, that is executed by the Anduril workflow engine. Workflows are configured in a dedicated scripting language, AndurilScript. Workflow configuration and execution can be done from Eclipse, a popular multipurpose integrated development environment, or from the command line.
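The idea of steps wired so that one step's output feeds the next can be illustrated with a short Python sketch. It is not Anduril's engine or API: the function `run_workflow`, the step names and the toy data are all assumptions made for illustration, and the only library used is Python's standard `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run every step after the steps it reads from have produced a result.

    steps:        maps a step name to a function that takes the results of
                  its upstream steps as keyword arguments.
    dependencies: maps a step name to the names of its upstream steps.
    """
    results = {}
    # static_order() yields each name only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](**upstream)
    return results


# Hypothetical three-step analysis: data import -> statistic -> report.
steps = {
    "load": lambda: [2.0, 4.0, 6.0],
    "mean": lambda load: sum(load) / len(load),
    "report": lambda mean: f"mean = {mean:.1f}",
}
dependencies = {"load": (), "mean": ("load",), "report": ("mean",)}

results = run_workflow(steps, dependencies)
print(results["report"])  # prints "mean = 4.0"
```

Ordering the steps by their dependencies, rather than by the order in which they were written, is what lets a workflow be described declaratively as a network.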
The core Anduril engine is written in Java and components are written in a variety of programming languages, including Java, R, MATLAB, Lua, Perl and Python. Components may also have dependencies on third-party libraries, such as Bioconductor. Components for cell imaging and microarray analysis are provided, but additional components can be implemented by users. The Anduril core has been tested on Linux and Windows.
AndurilScript language
A "Hello world" program in AndurilScript is simply:
std.echo("Hello world!")
Commenting follows the syntax of Java:
// A simple comment
/* Another simple comment */
/** A description that will be included in component description */
Components are called by assigning their calls to named component instances. A name cannot be re-used within a single workflow. Special input components bring external files into the script. The supported atomic types are integer, float, boolean and string, and types are inferred implicitly.
in1 = INPUT(path="myFile.csv")
constant1 = 1
componentInstance1 = MyComponent(inputPort1 = in1, inputParam1 = constant1)
Workflows are constructed by assigning the outputs of component instances to the inputs of subsequent components.
componentInstance2 = AnotherComponent(inputPort1 = componentInstance1.outputPort1)
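The two rules above, unique instance names and wiring through output ports, can be mimicked in a few lines of Python. This is a toy model rather than Anduril's API: the `Workflow` class, the port names and the behaviour of both components are hypothetical.

```python
class Workflow:
    """Toy registry of named component instances (not Anduril's real API)."""

    def __init__(self):
        self.instances = {}

    def add(self, name, component, **inputs):
        # Mirror the rule that a name cannot be re-used within one workflow.
        if name in self.instances:
            raise ValueError(f"instance name already used: {name}")
        self.instances[name] = component(**inputs)
        return self.instances[name]


def my_component(input_port1, input_param1):
    # Hypothetical component: scales every value by the parameter.
    return {"output_port1": [v * input_param1 for v in input_port1]}


def another_component(input_port1):
    # Hypothetical component: sums the values it receives.
    return {"output_port1": sum(input_port1)}


wf = Workflow()
c1 = wf.add("componentInstance1", my_component,
            input_port1=[1, 2, 3], input_param1=2)
# The output port of the first instance becomes the input of the second.
c2 = wf.add("componentInstance2", another_component,
            input_port1=c1["output_port1"])
print(c2["output_port1"])  # prints 12
```

Adding a second instance under the name `componentInstance1` would raise a `ValueError` in this sketch.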
Component instances can also be wrapped as functions.
function MyFunction(InType1 in1, ..., optional InTypeM inM,
                    ParType1 param1, ..., ParTypeP paramP=defaultP)
                    -> (OutType1 out1, ..., OutTypeN outN)
{
    ... statements ...
    return record(out1=x1, ..., outN=xN)
}
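As a rough Python analogue of the template above, the sketch below has one required input, one optional input, one parameter with a default value, and several named outputs returned together. The function `scaled_summary` and its arguments are invented for illustration and do not correspond to any Anduril component.

```python
def scaled_summary(values, weights=None, scale=1.0):
    """Hypothetical wrapper: required input, optional input, defaulted parameter."""
    if weights is None:
        # The optional input falls back to equal weights when it is omitted.
        weights = [1.0] * len(values)
    scaled = [v * scale for v in values]
    mean = sum(v * w for v, w in zip(scaled, weights)) / sum(weights)
    # Several named outputs are returned together, in the spirit of
    # `return record(out1=x1, ..., outN=xN)`.
    return {"scaled": scaled, "mean": mean}


result = scaled_summary([1.0, 2.0, 3.0], scale=2.0)
print(result["mean"])  # prints 4.0
```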
In addition to standard if-else and switch-case statements, AndurilScript also includes for-loops.
// Iterates over 1, 2, ..., 10
array = record()
for i: std.range(1, 10) {
    array[i] = SomeComponent(k=i)
}
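For readers more familiar with general-purpose languages, the loop above corresponds roughly to the following Python. The function `some_component` is a hypothetical stand-in for `SomeComponent`, and a dictionary plays the role of the record.

```python
def some_component(k):
    # Hypothetical stand-in for SomeComponent: squares its parameter.
    return {"k": k, "value": k * k}


# The comment in the AndurilScript example says std.range(1, 10) covers
# 1..10 inclusive, so the Python equivalent is range(1, 11).
array = {}
for i in range(1, 11):
    array[i] = some_component(k=i)

print(len(array))  # prints 10
```

Each iteration creates a separate instance stored under its own key, so later steps can refer to individual results such as `array[3]`.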
Extensibility
Anduril can be extended on multiple levels. Users can add new components to existing component bundles. If the new components carry out tasks that are not related to existing bundles, users can also create new bundles.
Moksiskaan
Moksiskaan is a data integration framework for cancer research and molecular biology. The framework provides a relational database that represents a graph of biological entities such as genes, proteins, drugs, pathways, diseases, biological processes, cellular components, and molecular functions. In addition, a wide set of analysis and access tools is built on top of this data. The great majority of these tools are implemented as Anduril components and functions.
Moksiskaan is used mainly to interpret lists of candidate genes obtained from genome-wide studies. Its tools can be used to generate graphs of biological entities related to the input genes. The exact form of these graphs may vary from drug target predictions to time series of signalling cascades. Some of the goals of these tools are closely related to those of Ingenuity Pathway Analysis (IPA).
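How a relational database can represent a graph of entities, and how related entities can be looked up for a list of input genes, is sketched below with Python's standard `sqlite3` module. The schema, the table and column names, and the entity names (`GENE_A`, `DRUG_X` and so on) are placeholders invented for this example. They are not Moksiskaan's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table for nodes, one for edges between them.
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
    CREATE TABLE link (source INTEGER REFERENCES entity(id),
                       target INTEGER REFERENCES entity(id),
                       relation TEXT);
""")
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", [
    (1, "GENE_A", "gene"),
    (2, "GENE_B", "gene"),
    (3, "DRUG_X", "drug"),
    (4, "PATHWAY_P", "pathway"),
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (3, 1, "targets"),
    (1, 4, "member_of"),
    (2, 4, "member_of"),
])


def related_entities(conn, gene_names):
    """Return (gene, relation, neighbour, kind) rows for the input genes."""
    marks = ",".join("?" * len(gene_names))
    query = f"""
        SELECT g.name, l.relation, n.name, n.kind
        FROM entity g
        JOIN link l ON g.id IN (l.source, l.target)
        JOIN entity n ON n.id IN (l.source, l.target) AND n.id != g.id
        WHERE g.name IN ({marks})
        ORDER BY g.name, n.name
    """
    return conn.execute(query, gene_names).fetchall()


rows = related_entities(conn, ["GENE_A"])
for row in rows:
    print(row)
```

Storing edges as rows keeps the graph queryable with ordinary SQL joins. A real system would also need to handle edge direction and multi-step traversal, which this sketch omits.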
See also
- Bioinformatics workflow management systems
- GenePattern
- Kepler scientific workflow system
- Taverna workbench
Further reading
- "Scientists develop new database that provides comprehensive view of Glioblastoma Multiforme genome in the Cancer Genome Atlas". Research Briefs, March 2011, by Catherine Evans.