Cyclomatic complexity
Encyclopedia
Cyclomatic complexity is a software metric
(measurement). It was developed by Thomas J. McCabe, Sr. in 1976 and is used to indicate the complexity of a program. It directly measures the number of linearly independent paths through a program's source code
. The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test
.
Cyclomatic complexity is computed using the control flow graph
of the program: the nodes of the graph
correspond to indivisible groups of commands of a program, and a directed
edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions, modules
, methods
or classes
within a program.
One testing
strategy, called Basis Path Testing by McCabe who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.
through the source code
. For instance, if the source code contained no decision points such as IF statements or FOR loops, the complexity would be 1, since there is only a single path through the code. If the code had a single IF statement containing a single condition there would be two paths through the code, one path where the IF statement is evaluated as TRUE and one path where the IF statement is evaluated as FALSE.
Mathematically, the cyclomatic complexity of a structured program
Here "structured" means in particular "with a single exit (return statement
) per function". is defined with reference to the control flow graph
of the program, a directed graph
containing the basic block
s of the program, with an edge between two basic blocks if control may pass from the first to the second. The complexity M is then defined as:
where
An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is said to be strongly connected, and the cyclomatic complexity of the program is equal to the cyclomatic number of its graph (also known as the first Betti number), which is defined as:
This may be seen as calculating the number of linearly independent cycles that exist in the graph, i.e. those cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the entry point, there is at least one such cycle for each exit point.
For a single program (or subroutine or method), P is always equal to 1. Cyclomatic complexity may, however, be applied to several such programs or subprograms at the same time (e.g., to all of the methods in a class), and in these cases P will be equal to the number of programs in question, as each subprogram will appear as a disconnected subset of the graph.
It can be shown that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (i.e., 'if' statements or conditional loops) contained in that program plus one.
Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to:
where π is the number of decision points in the program, and s is the number of exit points.
) is one where every vertex is incident with an even number of edges; such subgraphs are unions of cycles and isolated vertices. In the following, even subgraphs will be identified with their edge sets, which is equivalent to only considering those even subgraphs which contain all vertices of the full graph.
The set of all even subgraphs of a graph is closed under symmetric difference
, and may thus be viewed as a vector space over GF(2)
; this vector space is called the cycle space of the graph. The cyclomatic number of the graph is defined as the dimension of this space. Since GF(2) has two elements, the cyclomatic number is also equal to the 2-logarithm of the number of elements in the cycle space.
A basis for the cycle space is easily constructed by first fixing a maximal spanning forest of the graph, and then considering the cycles formed by one edge not in the forest and the path in the forest connecting the endpoints of that edge; these cycles constitute a basis for the cycle space. Hence, the cyclomatic number also equals the number of edges not in a maximal spanning forest of a graph. Since the number of edges in a maximal spanning forest of a graph is equal to the number of vertices minus the number of components, the formula above for the cyclomatic number follows.
For the more topologically inclined, cyclomatic complexity can alternatively be defined as a relative Betti number
, the size of a relative homology
group:
which is read as “the first homology of the graph G, relative to the terminal nodes t”. This is a technical way of saying “the number of linearly independent paths through the flow graph from an entry to an exit”, where:
This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above.
Alternatively, one can compute this via absolute Betti number (absolute homology – not relative) by identifying (gluing together) all terminal nodes on a given component (or equivalently, draw paths connecting the exits to the entrance), in which case (calling the new, augmented graph , which is ), one obtains:
This corresponds to the characterization of cyclomatic complexity as “number of loops plus number of components”.
A better name for popular usage would be Conditional Complexity, as "it has been found to be more convenient to count conditions instead of predicates when calculating complexity".
Cyclometic complexity is also known as the number of FAN IN and FAN OUT in a program.
It is useful because of two properties of the cyclomatic complexity, M, for a specific module:
All three of the above numbers may be equal: branch coverage cyclomatic complexity number of paths.
For example, consider a program that consists of two sequential if-then-else statements.
In this example, two test cases are sufficient to achieve a complete branch coverage, while four are necessary for complete path coverage. The cyclomatic complexity of the program is 3 (as the strongly connected graph for the program contains 9 edges, 7 nodes and 1 connected component) (9-7+1).
In general, in order to fully test a module all execution paths through the module should be exercised. This implies a module with a high complexity number requires more testing effort than a module with a lower value since the higher complexity number indicates more pathways through the code. This also implies that a module with higher complexity is more difficult for a programmer to understand since the programmer must understand the different pathways and the results of those pathways.
Unfortunately, it is not always practical to test all possible paths through a program. Considering the example above, each time an additional if-then-else statement is added, the number of possible paths doubles. As the program grew in this fashion, it would quickly reach the point where testing all of the paths was impractical.
One common testing strategy, espoused for example by the NIST Structured Testing methodology, is to use the cyclomatic complexity of a module to determine the number of white-box tests that are required to obtain sufficient coverage of the module. In almost all cases, according to such a methodology, a module should have at least as many tests as its cyclomatic complexity; in most cases, this number of tests is adequate to exercise all the relevant paths of the function.
As an example of a function that requires more than simply branch coverage to test accurately, consider again the above function, but assume that to avoid a bug occurring, any code that calls either f1 or f3 must also call the other.This is a fairly common type of condition; consider the possibility that f1 allocates some resource which f3 releases. Assuming that the results of c1 and c2 are independent, that means that the function as presented above contains a bug. Branch coverage would allow us to test the method with just two tests, and one possible set of tests would be to test the following cases:
Neither of these cases exposes the bug. If, however, we use cyclomatic complexity to indicate the number of tests we require, the number increases to 3. We must therefore test one of the following paths:
Either of these tests will expose the bug.
(less than functional cohesion) than a module with lower complexity. The possible correlation between higher complexity measure with a lower level of cohesion is predicated on a module with more decision points generally implementing more than a single well defined function. A 2005 study showed stronger correlations between complexity metrics and an expert assessment of cohesion in the classes studied than the correlation between the expert's assessment and metrics designed to calculate cohesion.
However, studies that control for program size (i.e., comparing modules that have different complexities but similar size, typically measured in lines of code) are generally less conclusive, with many finding no significant correlation, while others do find correlation. Some researchers who have studied the area question the validity of the methods used by the studies finding no correlation.
Les Hatton
claimed recently (Keynote at TAIC-PART 2008, Windsor, UK, Sept 2008) that McCabe Cyclomatic Complexity has the same prediction ability as lines of code.
Software metric
A software metric is a measure of some property of a piece of software or its specifications. Since quantitative measurements are essential in all sciences, there is a continuous effort by computer science practitioners and theoreticians to bring similar approaches to software development...
(measurement). It was developed by Thomas J. McCabe, Sr. in 1976 and is used to indicate the complexity of a program. It directly measures the number of linearly independent paths through a program's source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
. The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test
Flesch-Kincaid Readability Test
The Flesch/Flesch–Kincaid readability tests are designed to indicate comprehension difficulty when reading a passage of contemporary academic English. There are two tests, the Flesch Reading Ease, and the Flesch–Kincaid Grade Level...
.
Cyclomatic complexity is computed using the control flow graph
Control flow graph
A control flow graph in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution.- Overview :...
of the program: the nodes of the graph
Graph (mathematics)
In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges...
correspond to indivisible groups of commands of a program, and a directed
Directed graph
A directed graph or digraph is a pair G= of:* a set V, whose elements are called vertices or nodes,...
edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions, modules
Modular programming
Modular programming is a software design technique that increases the extent to which software is composed of separate, interchangeable components called modules by breaking down program functions into modules, each of which accomplishes one function and contains everything necessary to accomplish...
, methods
Method (computer science)
In object-oriented programming, a method is a subroutine associated with a class. Methods define the behavior to be exhibited by instances of the associated class at program run time...
or classes
Class (computer science)
In object-oriented programming, a class is a construct that is used as a blueprint to create instances of itself – referred to as class instances, class objects, instance objects or simply objects. A class defines constituent members which enable these class instances to have state and behavior...
within a program.
One testing
Software testing
Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software...
strategy, called Basis Path Testing by McCabe who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.
Description
The cyclomatic complexity of a section of source code is the count of the number of linearly independent pathsControl flow
In computer science, control flow refers to the order in which the individual statements, instructions, or function calls of an imperative or a declarative program are executed or evaluated....
through the source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
. For instance, if the source code contained no decision points such as IF statements or FOR loops, the complexity would be 1, since there is only a single path through the code. If the code had a single IF statement containing a single condition there would be two paths through the code, one path where the IF statement is evaluated as TRUE and one path where the IF statement is evaluated as FALSE.
Mathematically, the cyclomatic complexity of a structured program
Structured programming
Structured programming is a programming paradigm aimed on improving the clarity, quality, and development time of a computer program by making extensive use of subroutines, block structures and for and while loops - in contrast to using simple tests and jumps such as the goto statement which could...
Here "structured" means in particular "with a single exit (return statement
Return statement
In computer programming, a return statement causes execution to leave the current subroutine and resume at the point in the code immediately after where the subroutine was called, known as its return address. The return address is saved, usually on the process's call stack, as part of the operation...
) per function". is defined with reference to the control flow graph
Control flow graph
A control flow graph in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution.- Overview :...
of the program, a directed graph
Directed graph
A directed graph or digraph is a pair G= of:* a set V, whose elements are called vertices or nodes,...
containing the basic block
Basic block
In computing, a basic block is a portion of the code within a program with certain desirable properties that make it highly amenable to analysis. Compilers usually decompose programs into their basic blocks as a first step in the analysis process...
s of the program, with an edge between two basic blocks if control may pass from the first to the second. The complexity M is then defined as:
- M = E − N + 2P
where
- E = the number of edges of the graph
- N = the number of nodes of the graph
- P = the number of connected componentsConnected component (graph theory)In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices. For example, the graph shown in the illustration on the right has three connected components...
An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is said to be strongly connected, and the cyclomatic complexity of the program is equal to the cyclomatic number of its graph (also known as the first Betti number), which is defined as:
- M = E − N + P
This may be seen as calculating the number of linearly independent cycles that exist in the graph, i.e. those cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the entry point, there is at least one such cycle for each exit point.
For a single program (or subroutine or method), P is always equal to 1. Cyclomatic complexity may, however, be applied to several such programs or subprograms at the same time (e.g., to all of the methods in a class), and in these cases P will be equal to the number of programs in question, as each subprogram will appear as a disconnected subset of the graph.
It can be shown that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (i.e., 'if' statements or conditional loops) contained in that program plus one.
Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to:
- π - s + 2
where π is the number of decision points in the program, and s is the number of exit points.
Formal definition
An even subgraph of a graph (also known as an Eulerian subgraphEulerian path
In graph theory, an Eulerian trail is a trail in a graph which visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is a Eulerian trail which starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of...
) is one where every vertex is incident with an even number of edges; such subgraphs are unions of cycles and isolated vertices. In the following, even subgraphs will be identified with their edge sets, which is equivalent to only considering those even subgraphs which contain all vertices of the full graph.
The set of all even subgraphs of a graph is closed under symmetric difference
Symmetric difference
In mathematics, the symmetric difference of two sets is the set of elements which are in either of the sets and not in their intersection. The symmetric difference of the sets A and B is commonly denoted by A\,\Delta\,B\,orA \ominus B....
, and may thus be viewed as a vector space over GF(2)
GF(2)
GF is the Galois field of two elements. It is the smallest finite field.- Definition :The two elements are nearly always called 0 and 1, being the additive and multiplicative identities, respectively...
; this vector space is called the cycle space of the graph. The cyclomatic number of the graph is defined as the dimension of this space. Since GF(2) has two elements, the cyclomatic number is also equal to the 2-logarithm of the number of elements in the cycle space.
A basis for the cycle space is easily constructed by first fixing a maximal spanning forest of the graph, and then considering the cycles formed by one edge not in the forest and the path in the forest connecting the endpoints of that edge; these cycles constitute a basis for the cycle space. Hence, the cyclomatic number also equals the number of edges not in a maximal spanning forest of a graph. Since the number of edges in a maximal spanning forest of a graph is equal to the number of vertices minus the number of components, the formula above for the cyclomatic number follows.
For the more topologically inclined, cyclomatic complexity can alternatively be defined as a relative Betti number
Betti number
In algebraic topology, a mathematical discipline, the Betti numbers can be used to distinguish topological spaces. Intuitively, the first Betti number of a space counts the maximum number of cuts that can be made without dividing the space into two pieces....
, the size of a relative homology
Relative homology
In algebraic topology, a branch of mathematics, the homology of a topological space relative to a subspace is a construction in singular homology, for pairs of spaces. The relative homology is useful and important in several ways...
group:
which is read as “the first homology of the graph G, relative to the terminal nodes t”. This is a technical way of saying “the number of linearly independent paths through the flow graph from an entry to an exit”, where:
- “linearly independent” corresponds to homology, and means one does not double-count backtracking;
- “paths” corresponds to first homology: a path is a 1-dimensional object;
- “relative” means the path must begin and end at an entry or exit point.
This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above.
Alternatively, one can compute this via absolute Betti number (absolute homology – not relative) by identifying (gluing together) all terminal nodes on a given component (or equivalently, draw paths connecting the exits to the entrance), in which case (calling the new, augmented graph , which is ), one obtains:
This corresponds to the characterization of cyclomatic complexity as “number of loops plus number of components”.
Etymology / Naming
The name Cyclomatic Complexity presents some confusion, as this metric does not only count cycles (loops) in the program. Instead, the name refers to the number of different cycles in the program control flow graph, after having added an imagined branch back from the exit node to the entry node.A better name for popular usage would be Conditional Complexity, as "it has been found to be more convenient to count conditions instead of predicates when calculating complexity".
Cyclometic complexity is also known as the number of FAN IN and FAN OUT in a program.
Limiting complexity during development
One of McCabe's original applications was to limit the complexity of routines during program development; he recommended that programmers should count the complexity of the modules they are developing, and split them into smaller modules whenever the cyclomatic complexity of the module exceeded 10. This practice was adopted by the NIST Structured Testing methodology, with an observation that since McCabe's original publication, the figure of 10 had received substantial corroborating evidence, but that in some circumstances it may be appropriate to relax the restriction and permit modules with a complexity as high as 15. As the methodology acknowledged that there were occasional reasons for going beyond the agreed-upon limit, it phrased its recommendation as: "For each module, either limit cyclomatic complexity to [the agreed-upon limit] or provide a written explanation of why the limit was exceeded."Implications for Software Testing
Another application of cyclomatic complexity is in determining the number of test cases that are necessary to achieve thorough test coverage of a particular module.It is useful because of two properties of the cyclomatic complexity, M, for a specific module:
- M is an upper bound for the number of test cases that are necessary to achieve a complete branch coverage.
- M is a lower bound for the number of paths through the control flow graph (CFG). Assuming each test case takes one path, the number of cases needed to achieve path coverage is equal to the number of paths that can actually be taken. But some paths may be impossible, so although the number of paths through the CFG is clearly an upper bound on the number of test cases needed for path coverage, this latter number (of possible paths) is sometimes less than M.
All three of the above numbers may be equal: branch coverage cyclomatic complexity number of paths.
For example, consider a program that consists of two sequential if-then-else statements.
In this example, two test cases are sufficient to achieve a complete branch coverage, while four are necessary for complete path coverage. The cyclomatic complexity of the program is 3 (as the strongly connected graph for the program contains 9 edges, 7 nodes and 1 connected component) (9-7+1).
In general, in order to fully test a module all execution paths through the module should be exercised. This implies a module with a high complexity number requires more testing effort than a module with a lower value since the higher complexity number indicates more pathways through the code. This also implies that a module with higher complexity is more difficult for a programmer to understand since the programmer must understand the different pathways and the results of those pathways.
Unfortunately, it is not always practical to test all possible paths through a program. Considering the example above, each time an additional if-then-else statement is added, the number of possible paths doubles. As the program grew in this fashion, it would quickly reach the point where testing all of the paths was impractical.
One common testing strategy, espoused for example by the NIST Structured Testing methodology, is to use the cyclomatic complexity of a module to determine the number of white-box tests that are required to obtain sufficient coverage of the module. In almost all cases, according to such a methodology, a module should have at least as many tests as its cyclomatic complexity; in most cases, this number of tests is adequate to exercise all the relevant paths of the function.
As an example of a function that requires more than simply branch coverage to test accurately, consider again the above function, but assume that to avoid a bug occurring, any code that calls either f1 or f3 must also call the other.This is a fairly common type of condition; consider the possibility that f1 allocates some resource which f3 releases. Assuming that the results of c1 and c2 are independent, that means that the function as presented above contains a bug. Branch coverage would allow us to test the method with just two tests, and one possible set of tests would be to test the following cases:
- c1 returns true and c2 returns true
- c1 returns false and c2 returns false
Neither of these cases exposes the bug. If, however, we use cyclomatic complexity to indicate the number of tests we require, the number increases to 3. We must therefore test one of the following paths:
- c1 returns true and c2 returns false
- c1 returns false and c2 returns true
Either of these tests will expose the bug.
Cohesion
One would also expect that a module with higher complexity would tend to have lower cohesionCohesion (computer science)
In computer programming, cohesion is a measure of how strongly-related each piece of functionality expressed by the source code of a software module is...
(less than functional cohesion) than a module with lower complexity. The possible correlation between higher complexity measure with a lower level of cohesion is predicated on a module with more decision points generally implementing more than a single well defined function. A 2005 study showed stronger correlations between complexity metrics and an expert assessment of cohesion in the classes studied than the correlation between the expert's assessment and metrics designed to calculate cohesion.
Correlation to number of defects
A number of studies have investigated cyclomatic complexity's correlation to the number of defects contained in a module. Most such studies find a strong positive correlation between cyclomatic complexity and defects: modules that have the highest complexity tend to also contain the most defects. For example, a 2008 study by metric-monitoring software supplier Enerjy analyzed classes of open-source Java applications and divided them into two sets based on how commonly faults were found in them. They found strong correlation between cyclomatic complexity and their faultiness, with classes with a combined complexity of 11 having a probability of being fault-prone of just 0.28, rising to 0.98 for classes with a complexity of 74.However, studies that control for program size (i.e., comparing modules that have different complexities but similar size, typically measured in lines of code) are generally less conclusive, with many finding no significant correlation, while others do find correlation. Some researchers who have studied the area question the validity of the methods used by the studies finding no correlation.
Les Hatton
Les Hatton
Les Hatton is a British-born computer scientist and mathematician most notable for his work in failures and vulnerabilities in software controlled systems....
claimed recently (Keynote at TAIC-PART 2008, Windsor, UK, Sept 2008) that McCabe Cyclomatic Complexity has the same prediction ability as lines of code.
See also
- ComplexityComplexityIn general usage, complexity tends to be used to characterize something with many parts in intricate arrangement. The study of these complex linkages is the main goal of complex systems theory. In science there are at this time a number of approaches to characterizing complexity, many of which are...
- Complexity trap
- Computer programComputer programA computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...
- Computer programmingComputer programmingComputer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...
- Control flowControl flowIn computer science, control flow refers to the order in which the individual statements, instructions, or function calls of an imperative or a declarative program are executed or evaluated....
- Decision-to-decision pathDecision-to-decision pathA decision-to-decision path, or DD-Path, is a path of execution that does not include any conditional nodes...
- Design predicatesDesign predicatesDesign predicates are a method invented by Thomas McCabe, to quantify the complexity of the integration of two units of software. Each of the four types of design predicates have an associated integration complexity rating...
- Essential complexityEssential complexityEssential complexity refers to a situation where all reasonable solutions to a problem must be complicated because the "simple" solutions would not adequately solve the problem...
- Halstead complexity measuresHalstead complexity measuresHalstead complexity measures are software metrics introduced by Maurice Howard Halstead in 1977 as part of his treatise on establishing an empirical science of software development....
- PanopticodePanopticodeThe Panopticode free software/open source project provides a standardized format for describing the structure of software projects and integrates software metrics from several tools into that format...
- Software engineeringSoftware engineeringSoftware Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software...
- Software testingSoftware testingSoftware testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software...
- Synchronization ComplexitySynchronization ComplexitySynchronization complexity is a quantified attribute of a characteristic of a concurrent software product. It measures the additional complexity incurred by the synchronization constructs used in the software, and does that by analyzing the software source code. It is essentially an extension of...
External links
- A Complexity Measure (McCabe's original paper in IEEE Transactions on Software Engineering Vol. 2, No. 4, p. 308 (1976))
- Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric NIST Special Publication 500-235
- The role of empiricism in improving the reliability of future software