CAL Actor Language
Encyclopedia
CAL is a high-level programming language for writing (dataflow) actors, which are stateful operators that transform input streams of data objects (tokens) into output streams. CAL has been compiled to a variety of target platforms, including single-core processors, multicore processors, and programmable hardware. It has been used in several application areas, including video and image processing and DSP. The MPEG Reconfigurable Video Coding
(RVC) working group has adopted CAL as part of their standardization efforts.
Another common reason for choosing dataflow is that the goal is an efficient parallel implementation which would be difficult or impossible to achieve using a sequential programming language. Sequential languages are notoriously difficult to parallelize in general, so efficient parallel implementations will usu- ally require significant guidance from the user. A CAL dataflow program provides simple, understandable, and powerful abstractions that allow the specification of as much or as little parallelism as is required, enabling tools to produce sophisticated implementations that exploit the concurrent structure of a computation.
When programming in dataflow, the programmer is typically constructing a concurrent description of a computational system, which is different from a common sequential program. Rather than being concerned with the step-by-step execution of an algorithm, a dataflow programmer builds a system of asynchronously communicating entities called actors. Much of the programming effort is directed toward finding a good factoring of the problem into actors, and toward engineering appropriate communication patterns among those actors.
Consequently, describing an actor involves describing its interface to the outside, the ports, the structure of its internal state, as well as the steps it can perform, what these steps do (in terms of token production and consumption, and the update of the actor state), and how to pick the step that the actor will perform next. This section discusses some of the constructs in the CAL language that deal with these issues.
The first line declares the actor name, followed by a list of parameters (which is empty, in this case), and the declaration of the input and output ports. The input ports are those in front of the > sign (here only one port named In), the output ports are those after it (in this case only one port named Out).
The second line defines an action. Actions are the beef of an actor—they describe the things that happen during a step that an actor takes. In fact, it is accurate to say that a step consists of executing an action. In general, actors may have any number of actions, but ID has only one.
Recall that when an actor takes a step, it may consume input tokens and produce output tokens. The action in ID demonstrates how to specify token consumption and production. The part in front of the >, which we call input patterns, again pertains to input ports, and it specifies how many tokens to consume from which ports and what to call those tokens in the rest of the action. There is one input pattern in this action, In: [a]. It says that one token is to be read (and consumed) from input port In, and that the token is to be called a in the rest of the action. Such an input pattern also defines a condition that must be met for this action to fire—if the required token is not present, this action will not be executed. Therefore, input patterns do the following:
The output side of an action is a little simpler—following the > sign, the output expressions simply define the number and values of the output tokens that will be produced on each output port by each firing of the action. In this case, Out:[a] says, that exactly one token will be generated at output port Out, and its value is a.
It is worth noting that although syntactically the use of a in the input pattern In:[a] looks the same as the one in the output expression Out:[a], their meanings are very different. In the input pattern, the name a is declared, it is introduced as the name of the token that is consumed whenever the action is fired. By contrast, the occurrence of a in the output expression uses that name.
It is permissible to omit the explicit naming of the port that an input pattern or output expression applies to if an action provides as many input patterns as there are input ports, or output expressions as there are output ports. In such a case, the patterns or expressions are matched by position against the port declarations. For instance, the following versions of ID are all equivalent to the original one above:
The next example, Add, shows an actor that has two input ports. Like ID, it also has a single action, but this time, the action reads one token from each of the input ports. The single output token produced by this action is the sum of the two input tokens:
Incidentally, this illustrates the difference between input patterns and output expressions—an expression such as a + b is a perfectly valid way of specifying the value of an output token inside an output expression, but it would be illegal in an input pattern.
Just as in the case of ID, we can write Add a little more concisely by omitting the ports in the description of the action:
One way of thinking about an actor is as an operator on streams of data — sequences of tokens enter it on its input ports, and sequences of tokens leave it on its output ports. When discussing the operation of an actor, it is often useful to look at it as an operator on streams. For instance, say we look at the Add actor, at a point in time when the tokens 5, 7, -3 are on its Input1 and 11, 7, and 0 are on its Input2, with no token so far produced at its Output. We could write this as Input1: [5, 7, -3], Input2: [11, 7, 0] > Output: [] or more concisely as [5, 7, -3], [11, 7, 0] > [] if the order of ports is understood, in the same way in which we elide the input patterns and output expressions.
Now we can look at a run of the Add actor by looking at how the sequences of tokens evolve as the actor makes its steps. Starting from the sequences above, this would look as follows:
Note how inputs are consumed from the front of the input sequences, and outputs are produced (and appended to them) at their end.
During a firing, actors can consume more than one token from any input port, and the can produce more than one output token. The following actor AddSeq consumes two tokens from its single input port and adds them:
A run of AddSeq could look like this:
The actor AddSub produces two output tokens—one the sum, the other the difference between its input tokens:
This might be a run of this actor:
Actors can have parameters. They act as constants during the actor execution, and are given a concrete value when an actor is instantiated as part of an actor network. The main purpose of actor parameters is to allow programmers to specify families of related actors, without having to duplicate a lot of code.
An instance of this actor with k=7 could have this run:
The first action consumes a token from Input1 and sends it to the output, the second does the same for Input2. Each for itself is very similar to the action in ID, in that they copy a token from an input port to an output port. However, both action copy tokens from different input ports to the same output port—and therein lies the rub.
To illustrate the problem, let us look at runs of this actor. This one is obvious:
And so is this one:
But what happens if there are tokens available at both input ports?
The issue here is that both actions have enough input tokens to fire, and the output will look different depending on which we choose. If we pick the first, we get
However, if we pick the second, we get
Clearly, it does make a difference which action is chosen, so the question is: What is the rule for determining which action gets to fire in such a case?
The answer is that there is no such rule. If more than one action satisfies all its firing conditions at any point in time, then the next action to fire is one of those actions, but the choice among them is not part of the actor specification. What this means is that the author of the actor has left this choice open, and that an implementation, or simulation, is free to pick whichever it deems best.
What we see here is called non-determinism — a non-deterministic actor is one that, for the same input sequences, allows more than one run and more than one possible output.5 Non-determinism can be very powerful when used appropriately, but it can also be a very troublesome source of errors. A particular concern is that non-determinism might be introduced into an actor inadvertently, i.e. the author thinks the actor is deterministic even though it isn’t. One of the key design goals of the CAL language was to allow the description of non-deterministic actors, while at the same time permitting tools to identify possible sources of non-determinism, so that they can warn the user about them.
A key consequence of a non-deterministic actor like NDMerge is that during an actual execution, its output may depend on the timing of its input. If both its input queues are empty, and NDMerge is waiting for input, then whatever input the next token arrives at may be the one that is copied next to the output.
Consequently, the scheduling of activities in the actor network, or the relative speeds of the actors feeding into an actor like NDMerge may affect the output of the system. This may, occasionally, by desirable, and at other times it may not. In any event, it is a property that one needs to be aware of.
One way to look at non-determinism of the kind that makes an actor dependent on the precise timing of token arrivals is that such an actor only appears to be non-deterministic if we look at it as an operator on streams, because that view abstracts from the temporal properties of the execution, and thus purposefully removes information that is used to determine the sequence in which actions fire. From the perspective of the CAL language, this is not entirely accurate, but even so, it is easy to write non-deterministic actors that would not be deterministic even if we knew everything about the timing of the tokens and the actor implementation—such as the following:
Admittedly, it may not immediately be obvious what this actor could be used for, but it is an illustration of the nature of non-determinism in dataflow.
The guard clause of an action contains a number of expressions that all need to be true in order for the action to be firable. For the first action to be firable, the incoming token needs to be greater or equal to zero, in which case it will be sent to output P. Otherwise that action cannot fire. Conversely, for the second action to be firable, the token needs to be less than zero, in which case it is sent to output N. A run of this actor might look like this:
There are three things of note about this actor. First, the way it is written, the guard conditions happen to be exhaustive — i.e. the guard conditions cover all possible input — assuming only real numbers (or integers) are seen at the input port, there will never be an input such that neither of the two guards is true. For instance, say we modified the first guard just slightly:
This actor will run into trouble if it ever encounters a zero token, because none of its actions will be able to fire on it. As a consequence, that token will never be consumed, and the actor will no longer be able to fire at all — it will be dead.
It’s not illegal to write actors that terminate on some input, and in fact it may be important to have a few of those in some systems.
But it is a pitfall that one needs to be aware of. Secondly, the guard conditions are also disjoint in addition to being exhaustive — i.e., none of the two guards are true at the same time. Modifying the second guard of Split a little, we get this actor:
Even though SplitND has only guarded actions, it is still non-deterministic, because for some input (zero), both actions can fire. In other words, in addition to the runs of Split, this actor also has, e.g., this run:
Finally, note that guard conditions can ”peek” at the incoming tokens without actually consuming them — if the guards happen to be false or the action is not fired for some other reason, and if the token is not consumed by another action, then it remains where it is, and will be available for the next firing. (Or it will remain there forever, as in the case of the zero token in front of SplitDead, which is never removed because the actor is dead.)
The Select actor below is another example of the use of guarded actions. It is similar to the NDMerge actor in the sense that it merges two streams (the ones arriving at its A and B input ports). However, it does so according to the (Boolean) values of the tokens arriving at its S input port.
A simple example of this is the Sum actor:
This actor maintains a variable in which it accumulates the sum of all tokens it has seen (and consumed). The declaration sum := 0; introduces the variable name and also assigns the variable an initial value. The action, in addition to consuming an input token and producing an output token, now also modifies the actor state by assigning a new value to the state variable. The next time this actor fires, the state variable will have that new, updated, value. An run of Sum might be this:
Note that the value that is produced by the output expression is the value of the state variable at the end of the action firing, i.e. after the variable has been updated. This is a general rule, and important to keep in mind: If state variables occur in output expressions, the value that they refer to is the value at the end of the action firing. If the action modified that state variable, then it is the new value that will be used.
Sometimes, one would like to use the old value, the one that was valid at the beginning of the action firing, before any possible updates might have happened. The old keyword can be used to identify that value. It may only be used with state variables:
This actor would have the following run:
Sometimes, state is used to control the selection of actions. Let us Recall the Select actor:
The way this actor is written, the selection of the next input token and the actual copying of the token to the output is one atomic step. Suppose we want to rewrite that actor to perform these two things in two distinct actions. The actor would then execute in two stages—in the first, it would wait for a token on input S. Once it read that token it would, depending on its value wait for a data token on either A or B. Once that arrived, it would copy it to the output, and go back to waiting for a token on S.
The following actor IterSelect is written in that way. Its state variable state is used to select the action that is waiting for input, depending on whether the variable is 0, 1, or 2. Initially, by making 0 the initial value of state, IterSelect waits for input on S, and then it proceeds as described above.
Note that Select and IterSelect are almost, but not entirely, equivalent. First of all, IterSelect makes twice as many steps in order to process the same number of tokens. Secondly, it actually reads, and therefore consumes, the S input token, irrespective of whether a matching data token is available on A or B.
Unlike the previous examples, this actor uses guards that depend on an actor state variable rather than on an input token. Of course, combinations are possible, as in this example:
This actor would have a run such as this:
Schedules
The IterSelect actor of the previous section illustrated the use of state to control the selection of actions. This is an extremely common thing to do in practice, and the CAL language provides special syntax for this purpose in the form of schedules. Conceptually, one can think of schedules as codifying a particular pattern of using a state variable—they do not add anything to the language in terms of expressiveness. The rationale for using schedules is twofold:
A version of IterSelect using a schedule looks like this:
First, let us look at the labels in front of the actions — readT, readF, copyA, and copyB. These are action tags, and are used to identify actions further down in the schedule. Then there is the schedule itself. Basically, it is a textual representation of a finite state machine, given as a list of possible state transitions. The states of that finite state machine are the first and the last identifiers in those transitions — in this case, init, waitA, waitB. Relating this back to the original version of IterSelect, these states are the possible values of the state variable, i.e. 0, 1, and 2. The initial state of the schedule is the one following schedule fsm — in this case, it is init.
Each state transition consists of three parts: the original state, a list of action tags, and the following state. For instance, in the transition init (readT) --> waitA; we have init as the original state, readT as the action tag, and waitA as the following state. The way to read this is that if the schedule is in state init and an action tagged with readT occurs, the schedule will subsequently be in state waitA.
One thing worth noting is that the number of actions has increased—instead of the original three, the new version with the schedule now has four actions. The reason is that an action can no longer directly assign the successor state, as it did in the original, where depending on the value of the token read state would be assigned either the value 1 or 2. In the version with a schedule, that state modification is implicit in the structure of the state machine, and it happens depending on which action fires. Accordingly, the condition that checks the value of the token has moved from within the body of the action to the guards of the two actions tagged readT and readF.
Let us do this again with a slightly smaller example, another actor that merges two streams.
Suppose we want to make sure that merging happens more deterministically than it did in NDMerge, i.e. we alternate between reading from the two inputs. That is what AlmostFairMerge does — it is not perfectly fair, as it is biased with respect to which input it starts reading from. But once it is running, it will strictly alternate between the two:
Obviously, this actor has two states, depending on which port it is waiting for input. A simple schedule can be used to express this logic much more succinctly:
This actor is clearly nondeterministic. As long as it has only input on one of its input ports, everthing is unambiguous. But, just like NDMerge, as soon as input is available on both input ports, it could fire either of its two actions, and there is nothing in that actor specification which would predispose it to choose one over the other.
Suppose now that this actor processes, e.g., audio data that continuously streams in on its In input port, and that this processing depends on the value of its state variable c—imagine c containing the setting of the volume dial. Every now and then, the user turns that dial, and a new value for c is sent to this actor. Clearly, it is not irrelevant in which order the two actions fire. In fact, we would like to make sure that the first action fires as soon as possible, so that the new user setting will take effect. More precisely, we would like to express the requirement that, should both actions be able to fire, the first one will be fired next.
Interestingly, none of the language constructs so far would allow us to do this. Unlike in this case of schedules, which could be regarded syntactic sugar
because they could be reduced to existing elements of the language (state variables, guards, and assignments), this situation does in fact require a true extension—action priorities.
The basic idea is to add a number of inequalities that relate actions with respect to their firing precedence.11 In our example, this leads to the following solution:
Just as in the case of schedules, we use action tags to identify actions that we want to refer to later on—this time within the priority inequality. The priority block contains only one such inequality, relating the action tagged config to the one tagged process, giving the former priority over the latter.
Of course, even this version is still very much timing-dependent. In this case, that need not be a problem, and in fact is probably a requirement for this actor to perform its function. But in general, it is important to understand that priorities, especially when used as in the previous example, need to be well- understood to yield the correct results. Especially when information about the timing of the communication within the network is vague, it is probably best to think of them as strong implementation directives.
Another group of basic expressions are variable references. Syntactically, a variable is any sequence of letters, digits, and the ” ” character that (a) does not start with a digit and (b) is not a keyword.
One important property of expressions is that they are guaranteed not to change variables (we also say they have no side effects)—consequently, within an expression, multiple references to the same variable will always yield the same result.
CAL provides operators of two kinds to build expressions: unary and binary. A unary operator in CAL is always a prefix operator, i.e. it appears before its single operand. A binary operator occurs between its two operands. These are examples of expressions using unary operators: -a, #s. The unary - operator negates the value of its operand, which must be a number (i.e. it must evaluate to a number). The unary # operator applies to lists (and other collections), and computes their size, i.e. the number of elements in them. (More on lists in section 3.1.3.)
These are examples of uses of binary operators: a + 1,a + b + c, and a + b * c. Of course, the usual rules of operator binding apply,so that the last expression can also be written a + (b * c).
There is also a conditional expression, which works much like the ?:-operator in C-like languages, albeit with a slightly different syntax. For example, one can write if a > b then 0 else 1 end where a > b is the condition, and 0 and 1 are the expressions that are evaluated in case the condition is true or false, respectively. Note that the conditional expression is different from operators not only in the number of expressions it contains (three instead of one or two), but also in the way it evaluates those expressions. If the condition is true, then only the then-branch expression matters for the result of the conditional expression, and therefore it is guaranteed to be defined even if the else-branch expression, for instance, is not. For example, if a = 0 then null else 1/a end will produce a defined value (null) if a is zero, even though the else-branch expression is undefined in that case.
may be arbitrary expressions:
[a, a + 1, a * a] With, say, a = 7, this expression would evaluate to a list of three elements 7, 8, and 49.
Lists can be built from existing lists using a construction called list comprehension.
It looks like this:
This results in a list with the elements 1, 4, 9, and 16. The expression in front of the colon, a * a, is an element expression. Because of the generator that follows the colon, it is evaluated for the variable a bound to each element of the generator list, in this case 1, 2, 3, and 4.
Comprehensions may contain more than one generator, as in this example:
In this case, the result list is constructed by binding the variables a and b to all combinations of values from the respective generator lists. The further to the right a generator is, the faster does its generator variable vary over the elements of the generator list. In the example above, the b generator is to the right of the a generator.
Consequently, after the first element, which is 2 * 7 = 14, the next element is obtained by taking the next element in the second generator, yielding 2 * 11 = 22, rather than 3 * 7 = 21. Consequently, the list resulting from evaluating the comprehension above contains the elements 14, 22, 21, 33, 35, 55 in that order.
Similarly, a list comprehension can contain more than one element expression. For example, [a,a*a: for a in [2,3,5]] results in a list containing 2, 4, 3, 9, 5, 25 in that order.
In order to extract a part of a collection such as a list, one needs to use an indexer. An indexer is an expression that contains (a) an expression computing a composite object, such as a list, and one or more expressions computing indices. The indices are identifying a location inside the composite object, at which the part of it that we want to use resides. In the case of lists, indices are the natural numbers from zero to the length of the list minus one. So for instance, if b is the list[1, 1, 2, 3, 5, 8],then the indexer b[4] would evaluate to 5. So does, by the way, the rather confusing-looking expression [1, 1, 2, 3, 5, 8][4]
Lists can be concatenated using the + operator, so for example, the expression
[1, 2, 3] + [4, 5] results in the list [1, 2, 3, 4, 5]. Concatenating a list with an empty list has no effect.
Here, double is the function name, x is a parameter, and the expression between the colon and the end is the function body.
One thing to note about functions is that they contain exactly one expression in their body. Because assignments are statements, no variables can be changed through the invocation of a function.
Functions may be recursive:
Functions defined in the same scope may be mutually recursive:
The evaluation of expressions containing function applications, like the evaluation of expressions in general, can easily take advantage of some fine-grained parallelism inherent in CAL — for instance, in the expression F(G(x), H(x, y)), the order in which G(x) and H(x, y) are evaluated does not change the result, and in fact they may be evaluated in parallel. This is a consequence of the absence of side-effects for CAL expressions.
Statements are executed in strict sequential order, and unless otherwise specified, the execution of statements proceeds in the order in which they appear in the program text, which means that any variable changes produced by a statement may affect the execution of subsequent statements.
All of these simply change the old value of a variable to a new one.
Often variables contain composite objects, for instance a list of things rather than, say, an integer. In such a case, it is often desirable to change only a part of the object, while leaving the rest of it as before. This can be achieved using a simple assignment as above, e.g. like this:
The right-hand side of this assignment computes a list that only differs from the list in m by one element: at position k, it has the value v. After assigning that list to m, the effect is the same as if we had modified the original value of m at position k. Clearly, that is a very roundabout way of achieving this, which is why there are indexed assignments to make this more concise. The assignment above is equivalent to the following indexed assignment: m[k] := v;
Unlike for conditional expressions, a conditional statement may omit the else-branch:
Loops are another way of controlling the flow of execution. The simplest one is the while-loop, which executes a piece of code over and over again as long as a specified condition remains true:
The above loop would iterate over the valid indices of the list in variable a, starting at 0 and continuing until it is no longer true that i < #a. The body of the loop adds the element a[i] to sum, and also increments the variable i itself.
Iterating over the elements of a collection is so common that there is a special construct for it, the foreach-loop. Using it, the loop above could also be written like this:
The part of this loop that directly follows the foreach keyword is a generator, much like those in list comprehensions. And like comprehensions, foreach-loops can have more than one generator:
Procedures are used to abstract and parameterize sequences of statements, just as functions are abstracting and parameterizing expressions. For instance,
Such a procedure can be invoked in the usual way:
Reconfigurable video coding
-Overview:The Reconfigurable Video Coding is an MPEG initiative to provide a innovative framework of video coding development. This framework offers a way to overcome the lack of interoperability between the many video codecs deployed in the market. Indeed, an RVC codec is described using the...
(RVC) working group has adopted CAL as part of their standardization efforts.
History and Introduction
The CAL Actor Language was developed in 2001 as part of the Ptolemy II project at University of California at Berkeley. CAL is a dataflow language geared towards a variety of application domains, such as multimedia processing, control systems, network processing etc. A good guide for whether dataflow might be a good choice for a given problem domains whether a description of the computation itself (as opposed to, say, the class structure or the use cases) starts with a diagram of blocks connected by arcs that denote the transmission of packets of information. If it is, chances are that this translates well into a dataflow program.Another common reason for choosing dataflow is that the goal is an efficient parallel implementation which would be difficult or impossible to achieve using a sequential programming language. Sequential languages are notoriously difficult to parallelize in general, so efficient parallel implementations will usu- ally require significant guidance from the user. A CAL dataflow program provides simple, understandable, and powerful abstractions that allow the specification of as much or as little parallelism as is required, enabling tools to produce sophisticated implementations that exploit the concurrent structure of a computation.
When programming in dataflow, the programmer is typically constructing a concurrent description of a computational system, which is different from a common sequential program. Rather than being concerned with the step-by-step execution of an algorithm, a dataflow programmer builds a system of asynchronously communicating entities called actors. Much of the programming effort is directed toward finding a good factoring of the problem into actors, and toward engineering appropriate communication patterns among those actors.
The Structure of Actors
Actors perform their computation in a sequence of steps we call firings. In each of those steps:- 1. the actor may consume tokens from its input ports,
- 2. it may modify its internal state,
- 3. it may produce tokens at its output ports.
Consequently, describing an actor involves describing its interface to the outside, the ports, the structure of its internal state, as well as the steps it can perform, what these steps do (in terms of token production and consumption, and the update of the actor state), and how to pick the step that the actor will perform next. This section discusses some of the constructs in the CAL language that deal with these issues.
A Very Simple Actor
One of the simplest actors that does anything at all is one that only copies a token from its input port to its output port. That is what the actor ID does:The first line declares the actor name, followed by a list of parameters (which is empty, in this case), and the declaration of the input and output ports. The input ports are those in front of the > sign (here only one port named In), the output ports are those after it (in this case only one port named Out).
The second line defines an action. Actions are the beef of an actor—they describe the things that happen during a step that an actor takes. In fact, it is accurate to say that a step consists of executing an action. In general, actors may have any number of actions, but ID has only one.
Recall that when an actor takes a step, it may consume input tokens and produce output tokens. The action in ID demonstrates how to specify token consumption and production. The part in front of the >, which we call input patterns, again pertains to input ports, and it specifies how many tokens to consume from which ports and what to call those tokens in the rest of the action. There is one input pattern in this action, In: [a]. It says that one token is to be read (and consumed) from input port In, and that the token is to be called a in the rest of the action. Such an input pattern also defines a condition that must be met for this action to fire—if the required token is not present, this action will not be executed. Therefore, input patterns do the following:
- They define the number of tokens (for each port) that will be consumed when the action is executed (fired).
- They declare the variable symbols by which tokens consumed by an action firing will be referred to within the action.
- They define a firing condition for the action, i.e. a condition that must be met for the action to be able to fire.
The output side of an action is a little simpler—following the > sign, the output expressions simply define the number and values of the output tokens that will be produced on each output port by each firing of the action. In this case, Out:[a] says, that exactly one token will be generated at output port Out, and its value is a.
It is worth noting that although syntactically the use of a in the input pattern In:[a] looks the same as the one in the output expression Out:[a], their meanings are very different. In the input pattern, the name a is declared, it is introduced as the name of the token that is consumed whenever the action is fired. By contrast, the occurrence of a in the output expression uses that name.
It is permissible to omit the explicit naming of the port that an input pattern or output expression applies to if an action provides as many input patterns as there are input ports, or output expressions as there are output ports. In such a case, the patterns or expressions are matched by position against the port declarations. For instance, the following versions of ID are all equivalent to the original one above:
The next example, Add, shows an actor that has two input ports. Like ID, it also has a single action, but this time, the action reads one token from each of the input ports. The single output token produced by this action is the sum of the two input tokens:
Incidentally, this illustrates the difference between input patterns and output expressions—an expression such as a + b is a perfectly valid way of specifying the value of an output token inside an output expression, but it would be illegal in an input pattern.
Just as in the case of ID, we can write Add a little more concisely by omitting the ports in the description of the action:
One way of thinking about an actor is as an operator on streams of data — sequences of tokens enter it on its input ports, and sequences of tokens leave it on its output ports. When discussing the operation of an actor, it is often useful to look at it as an operator on streams. For instance, say we look at the Add actor, at a point in time when the tokens 5, 7, -3 are on its Input1 and 11, 7, and 0 are on its Input2, with no token so far produced at its Output. We could write this as Input1: [5, 7, -3], Input2: [11, 7, 0] > Output: [] or more concisely as [5, 7, -3], [11, 7, 0] > [] if the order of ports is understood, in the same way in which we elide the input patterns and output expressions.
Now we can look at a run of the Add actor by looking at how the sequences of tokens evolve as the actor makes its steps. Starting from the sequences above, this would look as follows:
Note how inputs are consumed from the front of the input sequences, and outputs are produced (and appended to them) at their end.
During a firing, actors can consume more than one token from any input port, and the can produce more than one output token. The following actor AddSeq consumes two tokens from its single input port and adds them:
A run of AddSeq could look like this:
The actor AddSub produces two output tokens—one the sum, the other the difference between its input tokens:
This might be a run of this actor:
Actors can have parameters. They act as constants during the actor execution, and are given a concrete value when an actor is instantiated as part of an actor network. The main purpose of actor parameters is to allow programmers to specify families of related actors, without having to duplicate a lot of code.
An instance of this actor with k=7 could have this run:
Non determinism
Up to this point, all actors had a single action, although it was already mentioned that this need not be the case in general. Actors can have any number of actions, including none at all. The following actor, NDMerge, has two:The first action consumes a token from Input1 and sends it to the output, the second does the same for Input2. Each for itself is very similar to the action in ID, in that they copy a token from an input port to an output port. However, both action copy tokens from different input ports to the same output port—and therein lies the rub.
To illustrate the problem, let us look at runs of this actor. This one is obvious:
And so is this one:
But what happens if there are tokens available at both input ports?
The issue here is that both actions have enough input tokens to fire, and the output will look different depending on which we choose. If we pick the first, we get
However, if we pick the second, we get
Clearly, it does make a difference which action is chosen, so the question is: What is the rule for determining which action gets to fire in such a case?
The answer is that there is no such rule. If more than one action satisfies all its firing conditions at any point in time, then the next action to fire is one of those actions, but the choice among them is not part of the actor specification. What this means is that the author of the actor has left this choice open, and that an implementation, or simulation, is free to pick whichever it deems best.
What we see here is called non-determinism — a non-deterministic actor is one that, for the same input sequences, allows more than one run and more than one possible output.5 Non-determinism can be very powerful when used appropriately, but it can also be a very troublesome source of errors. A particular concern is that non-determinism might be introduced into an actor inadvertently, i.e. the author thinks the actor is deterministic even though it isn’t. One of the key design goals of the CAL language was to allow the description of non-deterministic actors, while at the same time permitting tools to identify possible sources of non-determinism, so that they can warn the user about them.
A key consequence of a non-deterministic actor like NDMerge is that during an actual execution, its output may depend on the timing of its input. If both its input queues are empty, and NDMerge is waiting for input, then whatever input the next token arrives at may be the one that is copied next to the output.
Consequently, the scheduling of activities in the actor network, or the relative speeds of the actors feeding into an actor like NDMerge may affect the output of the system. This may, occasionally, by desirable, and at other times it may not. In any event, it is a property that one needs to be aware of.
One way to look at non-determinism of the kind that makes an actor dependent on the precise timing of token arrivals is that such an actor only appears to be non-deterministic if we look at it as an operator on streams, because that view abstracts from the temporal properties of the execution, and thus purposefully removes information that is used to determine the sequence in which actions fire. From the perspective of the CAL language, this is not entirely accurate, but even so, it is easy to write non-deterministic actors that would not be deterministic even if we knew everything about the timing of the tokens and the actor implementation—such as the following:
Admittedly, it may not immediately be obvious what this actor could be used for, but it is an illustration of the nature of non-determinism in dataflow.
Guarded actions
So far, the only firing condition for actions was that there be sufficiently many tokens for them to consume, as specified in their input patterns. However, in many cases we want to specify additional criteria that need to be satisfied for an action to fire—conditions, for instance, that depend on the values of the tokens, or the state of the actor, or both. These conditions can be specified using guards, as for example in the Split actor:The guard clause of an action contains a number of expressions that all need to be true in order for the action to be firable. For the first action to be firable, the incoming token needs to be greater or equal to zero, in which case it will be sent to output P. Otherwise that action cannot fire. Conversely, for the second action to be firable, the token needs to be less than zero, in which case it is sent to output N. A run of this actor might look like this:
There are three things of note about this actor. First, the way it is written, the guard conditions happen to be exhaustive — i.e. the guard conditions cover all possible input — assuming only real numbers (or integers) are seen at the input port, there will never be an input such that neither of the two guards is true. For instance, say we modified the first guard just slightly:
This actor will run into trouble if it ever encounters a zero token, because none of its actions will be able to fire on it. As a consequence, that token will never be consumed, and the actor will no longer be able to fire at all — it will be dead.
It’s not illegal to write actors that terminate on some input, and in fact it may be important to have a few of those in some systems.
But it is a pitfall that one needs to be aware of. Secondly, the guard conditions are also disjoint in addition to being exhaustive — i.e., none of the two guards are true at the same time. Modifying the second guard of Split a little, we get this actor:
Even though SplitND has only guarded actions, it is still non-deterministic, because for some input (zero), both actions can fire. In other words, in addition to the runs of Split, this actor also has, e.g., this run:
Finally, note that guard conditions can ”peek” at the incoming tokens without actually consuming them — if the guards happen to be false or the action is not fired for some other reason, and if the token is not consumed by another action, then it remains where it is, and will be available for the next firing. (Or it will remain there forever, as in the case of the zero token in front of SplitDead, which is never removed because the actor is dead.)
The Select actor below is another example of the use of guarded actions. It is similar to the NDMerge actor in the sense that it merges two streams (the ones arriving at its A and B input ports). However, it does so according to the (Boolean) values of the tokens arriving at its S input port.
Actors with state
In all the actors so far, nothing an action firing did would in any way affect subsequent firings of actions of the same actor. Using state variables, action firings can leave information behind for subsequent firings of either the same or a different action of the same actor.A simple example of this is the Sum actor:
This actor maintains a variable in which it accumulates the sum of all tokens it has seen (and consumed). The declaration sum := 0; introduces the variable name and also assigns the variable an initial value. The action, in addition to consuming an input token and producing an output token, now also modifies the actor state by assigning a new value to the state variable. The next time this actor fires, the state variable will have that new, updated, value. An run of Sum might be this:
Note that the value that is produced by the output expression is the value of the state variable at the end of the action firing, i.e. after the variable has been updated. This is a general rule, and important to keep in mind: If state variables occur in output expressions, the value that they refer to is the value at the end of the action firing. If the action modified that state variable, then it is the new value that will be used.
Sometimes, one would like to use the old value, the one that was valid at the beginning of the action firing, before any possible updates might have happened. The old keyword can be used to identify that value. It may only be used with state variables:
This actor would have the following run:
Sometimes, state is used to control the selection of actions. Let us Recall the Select actor:
The way this actor is written, the selection of the next input token and the actual copying of the token to the output is one atomic step. Suppose we want to rewrite that actor to perform these two things in two distinct actions. The actor would then execute in two stages—in the first, it would wait for a token on input S. Once it read that token it would, depending on its value wait for a data token on either A or B. Once that arrived, it would copy it to the output, and go back to waiting for a token on S.
The following actor IterSelect is written in that way. Its state variable state is used to select the action that is waiting for input, depending on whether the variable is 0, 1, or 2. Initially, by making 0 the initial value of state, IterSelect waits for input on S, and then it proceeds as described above.
Note that Select and IterSelect are almost, but not entirely, equivalent. First of all, IterSelect makes twice as many steps in order to process the same number of tokens. Secondly, it actually reads, and therefore consumes, the S input token, irrespective of whether a matching data token is available on A or B.
Unlike the previous examples, this actor uses guards that depend on an actor state variable rather than on an input token. Of course, combinations are possible, as in this example:
This actor would have a run such as this:
Schedules
The IterSelect actor of the previous section illustrated the use of state to control the selection of actions. This is an extremely common thing to do in practice, and the CAL language provides special syntax for this purpose in the form of schedules. Conceptually, one can think of schedules as codifying a particular pattern of using a state variable—they do not add anything to the language in terms of expressiveness. The rationale for using schedules is twofold:
- They are usually easier to use and less error prone than using a state variable and lots of guards and assignments.
- Tools can use the information encoded in a schedule more easily, and thus recognize regularities in the actor that might help them to produce more efficient code, or perform other analyses that help in implementation and design.
A version of IterSelect using a schedule looks like this:
First, let us look at the labels in front of the actions — readT, readF, copyA, and copyB. These are action tags, and are used to identify actions further down in the schedule. Then there is the schedule itself. Basically, it is a textual representation of a finite state machine, given as a list of possible state transitions. The states of that finite state machine are the first and the last identifiers in those transitions — in this case, init, waitA, waitB. Relating this back to the original version of IterSelect, these states are the possible values of the state variable, i.e. 0, 1, and 2. The initial state of the schedule is the one following schedule fsm — in this case, it is init.
Each state transition consists of three parts: the original state, a list of action tags, and the following state. For instance, in the transition init (readT) --> waitA; we have init as the original state, readT as the action tag, and waitA as the following state. The way to read this is that if the schedule is in state init and an action tagged with readT occurs, the schedule will subsequently be in state waitA.
One thing worth noting is that the number of actions has increased—instead of the original three, the new version with the schedule now has four actions. The reason is that an action can no longer directly assign the successor state, as it did in the original, where depending on the value of the token read state would be assigned either the value 1 or 2. In the version with a schedule, that state modification is implicit in the structure of the state machine, and it happens depending on which action fires. Accordingly, the condition that checks the value of the token has moved from within the body of the action to the guards of the two actions tagged readT and readF.
Let us do this again with a slightly smaller example, another actor that merges two streams.
Suppose we want to make sure that merging happens more deterministically than it did in NDMerge, i.e. we alternate between reading from the two inputs. That is what AlmostFairMerge does — it is not perfectly fair, as it is biased with respect to which input it starts reading from. But once it is running, it will strictly alternate between the two:
Obviously, this actor has two states, depending on which port it is waiting for input. A simple schedule can be used to express this logic much more succinctly:
Priorities
Consider the following actor:This actor is clearly nondeterministic. As long as it has only input on one of its input ports, everthing is unambiguous. But, just like NDMerge, as soon as input is available on both input ports, it could fire either of its two actions, and there is nothing in that actor specification which would predispose it to choose one over the other.
Suppose now that this actor processes, e.g., audio data that continuously streams in on its In input port, and that this processing depends on the value of its state variable c—imagine c containing the setting of the volume dial. Every now and then, the user turns that dial, and a new value for c is sent to this actor. Clearly, it is not irrelevant in which order the two actions fire. In fact, we would like to make sure that the first action fires as soon as possible, so that the new user setting will take effect. More precisely, we would like to express the requirement that, should both actions be able to fire, the first one will be fired next.
Interestingly, none of the language constructs so far would allow us to do this. Unlike in this case of schedules, which could be regarded syntactic sugar
Syntactic sugar
Syntactic sugar is a computer science term that refers to syntax within a programming language that is designed to make things easier to read or to express....
because they could be reduced to existing elements of the language (state variables, guards, and assignments), this situation does in fact require a true extension—action priorities.
The basic idea is to add a number of inequalities that relate actions with respect to their firing precedence.11 In our example, this leads to the following solution:
Just as in the case of schedules, we use action tags to identify actions that we want to refer to later on—this time within the priority inequality. The priority block contains only one such inequality, relating the action tagged config to the one tagged process, giving the former priority over the latter.
Of course, even this version is still very much timing-dependent. In this case, that need not be a problem, and in fact is probably a requirement for this actor to perform its function. But in general, it is important to understand that priorities, especially when used as in the previous example, need to be well- understood to yield the correct results. Especially when information about the timing of the communication within the network is vague, it is probably best to think of them as strong implementation directives.
Statements and expressions
The previous chapter focused primarily on those constructs in CAL that are related to actor-specific concepts—token input and output, actions, controlling the action selection and so forth. This section discusses the more ”pedestrian” parts of CAL, the statements and expressions used to manipulate data objects and express (sequential) algorithms. This part of the language is similar to what can be found in many procedural programming languages (such as C, Pascal, Java, Ada, ...), so we will focus on areas that might be slightly different in CAL.Expressions
Unlike languages such as C, CAL makes a strong distinction between statements and expressions. They have very distinct roles, very distinct meanings, and they can never be used interchangeably. An expression in CAL is a piece of code whose sole purpose is to compute a value. We also say that an expression has a value, or that it evaluates to a value. For most expressions, the value that they evaluate to will depend on the values of one or more variables at the time when the expression is evaluated. Since variable values may change over time, the same expression may have different values when evaluated at different points in time.Atomic expressions
Probably the most fundamental expressions are constants. These are expressions whose values are guaranteed not to depend on any variables. Constants in CAL are the Boolean values true and false, numerical constants such as 11, -1, 3.14, and 1.3806503e-23, and strings enclosed in quotation marks like "abc", another string and"", as well as the null value null.Another group of basic expressions are variable references. Syntactically, a variable is any sequence of letters, digits, and the ” ” character that (a) does not start with a digit and (b) is not a keyword.
One important property of expressions is that they are guaranteed not to change variables (we also say they have no side effects)—consequently, within an expression, multiple references to the same variable will always yield the same result.
Simple composite expressions
CAL provides operators of two kinds to build expressions: unary and binary. A unary operator in CAL is always a prefix operator, i.e. it appears before its single operand. A binary operator occurs between its two operands. These are examples of expressions using unary operators: -a, #s. The unary - operator negates the value of its operand, which must be a number (i.e. it must evaluate to a number). The unary # operator applies to lists (and other collections), and computes their size, i.e. the number of elements in them. (More on lists in section 3.1.3.)
These are examples of uses of binary operators: a + 1,a + b + c, and a + b * c. Of course, the usual rules of operator binding apply,so that the last expression can also be written a + (b * c).
There is also a conditional expression, which works much like the ?:-operator in C-like languages, albeit with a slightly different syntax. For example, one can write if a > b then 0 else 1 end where a > b is the condition, and 0 and 1 are the expressions that are evaluated in case the condition is true or false, respectively. Note that the conditional expression is different from operators not only in the number of expressions it contains (three instead of one or two), but also in the way it evaluates those expressions. If the condition is true, then only the then-branch expression matters for the result of the conditional expression, and therefore it is guaranteed to be defined even if the else-branch expression, for instance, is not. For example, if a = 0 then null else 1/a end will produce a defined value (null) if a is zero, even though the else-branch expression is undefined in that case.
Lists
Collections are composite data objects built from a number of other objects. A common example of a collection is a list, which can be constructed like this: [1, 2, 3] This builds a list of three elements, the integers 1, 2, and 3. The expression [] results in the empty list. The elements in such a list expressionmay be arbitrary expressions:
[a, a + 1, a * a] With, say, a = 7, this expression would evaluate to a list of three elements 7, 8, and 49.
Lists can be built from existing lists using a construction called list comprehension.
It looks like this:
This results in a list with the elements 1, 4, 9, and 16. The expression in front of the colon, a * a, is an element expression. Because of the generator that follows the colon, it is evaluated for the variable a bound to each element of the generator list, in this case 1, 2, 3, and 4.
Comprehensions may contain more than one generator, as in this example:
In this case, the result list is constructed by binding the variables a and b to all combinations of values from the respective generator lists. The further to the right a generator is, the faster does its generator variable vary over the elements of the generator list. In the example above, the b generator is to the right of the a generator.
Consequently, after the first element, which is 2 * 7 = 14, the next element is obtained by taking the next element in the second generator, yielding 2 * 11 = 22, rather than 3 * 7 = 21. Consequently, the list resulting from evaluating the comprehension above contains the elements 14, 22, 21, 33, 35, 55 in that order.
Similarly, a list comprehension can contain more than one element expression. For example, [a,a*a: for a in [2,3,5]] results in a list containing 2, 4, 3, 9, 5, 25 in that order.
In order to extract a part of a collection such as a list, one needs to use an indexer. An indexer is an expression that contains (a) an expression computing a composite object, such as a list, and one or more expressions computing indices. The indices are identifying a location inside the composite object, at which the part of it that we want to use resides. In the case of lists, indices are the natural numbers from zero to the length of the list minus one. So for instance, if b is the list[1, 1, 2, 3, 5, 8],then the indexer b[4] would evaluate to 5. So does, by the way, the rather confusing-looking expression [1, 1, 2, 3, 5, 8][4]
List miscellanea
The function Integers takes two arguments and computes a list of all integers between them, inclusively, and in order. For instance, Integers(3, 7) results in the list [3, 4, 5, 6, 7]. If the second argument is greater than the first, the resulting list is empty. The .. operator serves as a short form of the Integers function — the term Integers(a, b)is equivalent to a .. b. The # operator is used to determine the size of a list, i.e. the number of elements init. For instance, #[1, 1, 2, 3, 5, 8] evaluates to 6.This can be used to to make sure that an index into a list is actually valid, and also to iterate over the elements of a list. For example, if a contains a list, then the following expression computes the reverse of that list:Lists can be concatenated using the + operator, so for example, the expression
[1, 2, 3] + [4, 5] results in the list [1, 2, 3, 4, 5]. Concatenating a list with an empty list has no effect.
Functions
Functions encapsulate expressions and allow the programmer to parameterize them. For instance,Here, double is the function name, x is a parameter, and the expression between the colon and the end is the function body.
One thing to note about functions is that they contain exactly one expression in their body. Because assignments are statements, no variables can be changed through the invocation of a function.
Functions may be recursive:
Functions defined in the same scope may be mutually recursive:
The evaluation of expressions containing function applications, like the evaluation of expressions in general, can easily take advantage of some fine-grained parallelism inherent in CAL — for instance, in the expression F(G(x), H(x, y)), the order in which G(x) and H(x, y) are evaluated does not change the result, and in fact they may be evaluated in parallel. This is a consequence of the absence of side-effects for CAL expressions.
Statements
In some ways, statements in CAL are just the opposite of expressions: they do not have a ”return value”, but they can change the values of variables. Indeed, changing the values of variables is the whole point of statements. That is what they do.Statements are executed in strict sequential order, and unless otherwise specified, the execution of statements proceeds in the order in which they appear in the program text, which means that any variable changes produced by a statement may affect the execution of subsequent statements.
Assignments
So in the same way that an expression can be characterized by describing the value that it evaluates to, a statement can be described by how it changes variables. The most fundamental statement is an assignment, and the simplest assignment looks like these:All of these simply change the old value of a variable to a new one.
Often variables contain composite objects, for instance a list of things rather than, say, an integer. In such a case, it is often desirable to change only a part of the object, while leaving the rest of it as before. This can be achieved using a simple assignment as above, e.g. like this:
The right-hand side of this assignment computes a list that only differs from the list in m by one element: at position k, it has the value v. After assigning that list to m, the effect is the same as if we had modified the original value of m at position k. Clearly, that is a very roundabout way of achieving this, which is why there are indexed assignments to make this more concise. The assignment above is equivalent to the following indexed assignment: m[k] := v;
Control flow
As in most other programming languages, there are constructs to control the order in which the statements within a program are executed. The most basic one is the conditional statement:Unlike for conditional expressions, a conditional statement may omit the else-branch:
Loops are another way of controlling the flow of execution. The simplest one is the while-loop, which executes a piece of code over and over again as long as a specified condition remains true:
The above loop would iterate over the valid indices of the list in variable a, starting at 0 and continuing until it is no longer true that i < #a. The body of the loop adds the element a[i] to sum, and also increments the variable i itself.
Iterating over the elements of a collection is so common that there is a special construct for it, the foreach-loop. Using it, the loop above could also be written like this:
The part of this loop that directly follows the foreach keyword is a generator, much like those in list comprehensions. And like comprehensions, foreach-loops can have more than one generator:
Procedures are used to abstract and parameterize sequences of statements, just as functions are abstracting and parameterizing expressions. For instance,
Such a procedure can be invoked in the usual way:
Action
- Input patterns: declaring variables
- Guard: specifying enabling conditions
- Output expressions: computing output tokens
- Body: modifying the actor state