Biological network inference
Biological network inference is the process of making inferences and predictions about biological networks.
Biological networks
Many types of biological networks exist. Few such networks are known in anything approaching their complete structure, even in the simplest bacteria. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a eukaryotic cell or bacterial organism at a given point in the future. Systems biology, in this sense, is still in its infancy. Prediction is the subject of dynamic modeling. This article focuses on a necessary prerequisite to dynamic modeling of a network: inference of the topology, that is, prediction of the "wiring diagram" of the network. More specifically, we focus here on inference of biological network structure using the growing sets of high-throughput expression data for genes, proteins, and metabolites.
Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that indicate causal influence. Such patterns of partial correlations found in the high-throughput data, possibly combined with other supplemental data on the genes or proteins in the proposed networks, or combined with other information on the organism, form the basis upon which such algorithms work.
Such algorithms can be of use in inferring the topology of any network where the change in state of one node can affect the state of other nodes.
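To make the idea of partial correlation concrete, the following minimal sketch computes a first-order partial correlation between two expression profiles while controlling for a third. It is an illustration under stated assumptions, not the method of any particular inference algorithm: the names gene_a, gene_b and regulator are hypothetical, and the expression values are made up so that both genes track the regulator but are otherwise unrelated.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical expression levels across four conditions (made-up numbers).
regulator = [0, 1, 2, 3]
gene_a = [1, 1, 3, 7]
gene_b = [2, 8, 4, 10]

raw = pearson(gene_a, gene_b)
partial = partial_correlation(gene_a, gene_b, regulator)
print(f"raw correlation gene_a~gene_b:       {raw:.3f}")
print(f"partial correlation given regulator: {partial:.3f}")
```

In this toy data set the raw correlation between the two genes is substantial, but it vanishes once the shared regulator is controlled for, which is the kind of pattern that argues against a direct edge between them. Real data sets involve many genes and noisy measurements, so higher-order partial correlations and significance testing would be needed in practice.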
Computational inference methods
In a topological sense, a network is a set of nodes and a set of directed or undirected edges between the nodes. Biological networks currently under study using such computational inference methods include:
1) Transcriptional regulatory networks. Genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an RNA or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene's product is an activator, then the gene is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms used to infer the topology take as primary input the data from a set of microarray runs measuring the mRNA expression levels of the genes under consideration for inclusion in the network.
As of 2007, the great bulk of high-throughput data being fed into correlation-based algorithms comes from microarray experiments, and such analysis is the most fruitful point of biological application for such algorithms. (This is reflected in the reference list at bottom, where almost all bioinformatic algorithm references are directed toward use of microarray data.) Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments. The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification, for example, to classify subtypes of cancer or to predict differential responses to a drug (pharmacogenomics). But to understand the relationships between the genes, that is, to define more precisely the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network. This can be done by using background literature, or information in public databases, combined with the clustering results. It can also be done by applying a correlation-based inference algorithm, as will be discussed below, an approach that is having increasing success as the available microarray data sets keep growing.
2) Signal transduction networks (very important in the biology of cancer). Proteins are the nodes and the edges are directed. Primary input into the inference algorithm would be data from a set of experiments measuring protein activation/inactivation (e.g., phosphorylation/dephosphorylation) across a set of proteins.
3) Metabolite networks. Metabolites are the nodes and the edges are directed. Primary input into an algorithm would be data from a set of experiments measuring metabolite levels.
4) Intraspecies or interspecies communication networks in microbial communities. Nodes are excreted organic compounds and the edges are directed. Input into an inference algorithm is data from a set of experiments measuring levels of excreted molecules.
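All four network types above share the same topological shape: nodes joined by directed edges, with transcriptional regulatory edges additionally carrying a sign (activation or inhibition). One possible way to hold such a topology in code is sketched below. The class name RegulatoryNetwork, its methods, and the gene names are hypothetical and chosen only for illustration; they are not taken from any published tool.

```python
from collections import defaultdict

class RegulatoryNetwork:
    """Directed graph whose edges carry a sign: +1 activation, -1 inhibition."""

    def __init__(self):
        # Maps each source node to a dict of {target node: sign}.
        self._out = defaultdict(dict)

    def add_edge(self, source, target, sign):
        """Record that `source` directly regulates `target` with the given sign."""
        if sign not in (+1, -1):
            raise ValueError("sign must be +1 (activator) or -1 (inhibitor)")
        self._out[source][target] = sign

    def targets_of(self, source):
        """Nodes directly regulated by `source`, with the sign of each edge."""
        return dict(self._out.get(source, {}))

    def regulators_of(self, target):
        """Nodes that directly regulate `target`, with the sign of each edge."""
        return {src: edges[target]
                for src, edges in self._out.items() if target in edges}

# Hypothetical three-gene network (made-up edges).
net = RegulatoryNetwork()
net.add_edge("geneA", "geneB", +1)   # geneA activates geneB
net.add_edge("geneC", "geneB", -1)   # geneC inhibits geneB
net.add_edge("geneB", "geneA", -1)   # geneB feeds back to inhibit geneA

print(net.regulators_of("geneB"))
print(net.targets_of("geneB"))
```

An inference algorithm would populate such a structure from data rather than by hand; for the unsigned network types (signal transduction, metabolite, communication), the sign could simply be dropped.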
Protein-protein interaction networks are also under very active study. However, reconstruction of these networks does not use correlation-based inference in the sense discussed for the networks already described (interaction does not necessarily imply a change in protein state), and a description of such interaction network reconstruction is left to other articles.