Perl language structure
The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.
Arrays are expanded to their elements; hashes are expanded to a list of key/value pairs; and the whole lot is passed into the subroutine as one flat list of scalars.
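The flattening described above can be sketched as follows. This is a minimal illustration; count_args and the sample data are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# count_args is a hypothetical helper that reports how many scalars arrived.
sub count_args { return scalar @_ }

my @pair = ('a', 'b');
my %map  = (x => 1, y => 2);

my $from_array = count_args(@pair);             # the two elements
my $from_hash  = count_args(%map);              # two keys plus two values
my $mixed      = count_args('s', @pair, %map);  # everything as one flat list

print "$from_array $from_hash $mixed\n";
```

Because the callee only sees one flat list, it cannot tell where the array ended and the hash began; passing references is the usual way to keep them separate.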
Whatever arguments are passed are available to the subroutine in the special array @_.
Elements of @_ may be accessed directly by subscript, as in $_[0].
However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.
One common idiom is to assign @_ to a list of named variables.
This provides mnemonic parameter names and implements pass-by-value semantics.
Another idiom is to shift parameters off of @_.
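A short sketch of the three ways of reading @_ mentioned above. The subroutine names are illustrative only.

```perl
use strict;
use warnings;

# Direct access: $_[0] aliases the caller's variable, so this modifies it.
sub double_in_place { $_[0] *= 2; return }

# Copying @_ into named lexicals gives pass-by-value semantics.
sub area {
    my ($width, $height) = @_;
    return $width * $height;
}

# Shifting parameters off @_ one at a time.
sub greet {
    my $greeting = shift;
    my $name     = shift;
    return "$greeting, $name!";
}

my $value = 21;
double_in_place($value);          # $value itself is changed
my $area  = area(3, 4);
my $hello = greet('Hello', 'world');

print "$value $area $hello\n";
```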
Subroutines may assign
Subroutines may return values.
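If the subroutine does not exit via a return statement, it returns the value of the last expression evaluated within the subroutine body.
The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.
A subroutine can discover its calling context with the wantarray function.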
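The return rules above can be illustrated with a small sketch. The helper names (context_name, last_value, items) are hypothetical.

```perl
use strict;
use warnings;

# wantarray is true in list context, false but defined in scalar context,
# and undefined in void context.
sub context_name {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}

my @in_list   = context_name();
my $in_scalar = context_name();

# Without an explicit return, the value of the last expression is returned.
sub last_value { my $x = shift; $x + 1 }
my $implicit = last_value(1);

# The returned expression is evaluated in the caller's context.
sub items { my @items = (10, 20, 30); return @items }
my @all   = items();   # list context: the three elements
my $count = items();   # scalar context: an array yields its length

print "$in_list[0] $in_scalar $implicit @all $count\n";
```

The last two calls show the surprise mentioned above: the same return statement yields the elements in one call and their count in the other.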
Regular expressions
Perl includes a specialized syntax for writing regular expressions (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by Henry Spencer.
The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications, such as PHP, Ruby, Java, Microsoft's .NET Framework, and the Apache HTTP server, are now adopting Perl-compatible regular expressions over POSIX regular expressions.
Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than globs, and the syntax was designed so that an expression would resemble the text that it matches. This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.
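A match expression of the form $string =~ /pattern/ evaluates to true if and only if the string matches the pattern.
Another use of regular expressions is to specify delimiters for the split function.
Because the compact syntax of regular expressions can make them dense and cryptic, the /x modifier allows whitespace and comments inside a pattern so that it can be laid out more legibly.
Captured strings
Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the /e modifier.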
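A minimal sketch of the operations discussed in this section: matching, capturing with /x, substitution, split, and /e. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $text = 'Backup finished 2001-02-03 at noon';

# Match: true if and only if the string matches the pattern.
my $has_date = ($text =~ /\d{4}-\d{2}-\d{2}/) ? 1 : 0;

# Capture with /x, which permits whitespace and comments in the pattern.
my ($year, $month, $day) = $text =~ /
    (\d{4}) -    # year
    (\d{2}) -    # month
    (\d{2})      # day
/x;

# Substitute on a copy so that $text itself is left unchanged.
(my $redacted = $text) =~ s/\d/#/g;

# split with a regular expression as the delimiter.
my @fields = split /\s*,\s*/, 'red , green,blue';

# /e evaluates the replacement as code applied to the captured match.
(my $doubled = 'a1 b2 c3') =~ s/(\d)/$1 * 2/ge;

print "$has_date $year/$month/$day | $redacted | @fields | $doubled\n";
```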
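Object-oriented programming
There are many ways to write object-oriented code in Perl. The most basic is using "blessed" references. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined as a package whose constructor blesses a reference holding the point's coordinates.
This class can be used by invoking its constructor to create instances and then calling methods on those instances.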
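One possible sketch of such a point class, assuming a hash reference as the underlying storage. The package name Point, the method names new and distance, and the key names are choices made for this example rather than anything prescribed by Perl.

```perl
use strict;
use warnings;

package Point;

# Constructor: blesses a hash reference into the package.
sub new {
    my ($class, $x, $y) = @_;
    my $self = { 'x' => $x, 'y' => $y };
    return bless $self, $class;
}

# Method: Euclidean distance between two Point objects.
sub distance {
    my ($self, $other) = @_;
    my $dx = $self->{'x'} - $other->{'x'};
    my $dy = $self->{'y'} - $other->{'y'};
    return sqrt($dx ** 2 + $dy ** 2);
}

package main;

my $origin = Point->new(0, 0);
my $corner = Point->new(3, 4);
my $length = $origin->distance($corner);

print ref($corner), " distance: $length\n";
```

The arrow syntax passes the invocant (the class name or the object) as the first argument, which is why both subroutines unpack it first.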
Many modern Perl applications use the Moose object system. Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.
Moose classes:
Moose roles:
This is a class named
The structure of the Perl programming language encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "there's more than one way to do it". As a multi-paradigm, dynamically typed language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots, and is responsible for the size of the CPAN archive, a community-maintained repository of approximately 20,000 modules.
Basic syntax
In Perl, the minimal Hello world program is a single print statement with no terminating semicolon. It prints the string Hello, world! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash).
An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line, for example on Unix by passing it to perl with the -e switch.
The canonical form of the program is slightly more verbose: it begins with a comment line and ends its print statement with a semicolon.
The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it is called the shebang line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. (Note that, on Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter. In order to deal with such circumstances, perl detects the shebang line and parses it for switches.)
The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program, or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.
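A sketch of the canonical form described above. The interpreter path /usr/bin/perl is an assumption and differs between systems, and the variable $message is introduced here only to make the example easy to check.

```perl
#!/usr/bin/perl
# The shebang comment above lets Unix-like systems locate the interpreter.
my $message = "Hello, world!\n";   # "\n" is the newline escape sequence
print $message;                    # final statement: the semicolon is optional
```

On a Unix shell the same output can be produced without a file, along the lines of perl -e 'print "Hello, world!\n"'.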
Version 5.10 of Perl introduces a say function that implicitly appends a newline character to its output, making the minimal "Hello world" program even shorter. The function must be enabled first, for example with use v5.10 or use feature 'say'.
Data types
Perl has a number of fundamental data types. The most commonly used and discussed are scalars, arrays, hashes, filehandles, and subroutines:
Type | Sigil | Example | Description |
---|---|---|---|
Scalar | $ | $foo | A single value; it may be a number, a string, a filehandle, or a reference. |
Array | @ | @foo | An ordered collection of scalars. |
Hash | % | %foo | A map from strings to scalars; the strings are called keys, and the scalars are called values. Also known as an associative array. |
Filehandle | none | $foo or FOO | An opaque representation of an open file or other target for reading, writing, or both. |
Subroutine | & | &foo | A piece of code that may be passed arguments, be executed, and return data. |
Typeglob | * | *foo | The symbol table entry for all types with the name 'foo'. |
Scalar values
String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be interpolated) in the string. Enclosing a string in single quotes prevents variable interpolation. If $name is "Jim", print("My name is $name") will print "My name is Jim", but print('My name is $name') will print "My name is $name".
To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes. Strings can also be quoted with the q and qq quote-like operators. 'this' is identical to q(this) and "$this" is identical to qq($this).
Finally, multiline strings can be defined using here documents.
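The quoting rules above can be sketched as follows. The terminator name END_TEXT and the sample strings are arbitrary choices for this example.

```perl
use strict;
use warnings;

my $name = 'Jim';

my $interpolated = "My name is $name";   # double quotes interpolate
my $literal      = 'My name is $name';   # single quotes do not

# q() behaves like single quotes, qq() like double quotes.
my $with_q  = q(She said "hi" and 'bye');
my $with_qq = qq(Hello, "$name");

# Here document: everything up to the terminator line becomes one string.
my $multiline = <<"END_TEXT";
Dear $name,
this string spans two lines.
END_TEXT

print "$interpolated\n$literal\n$with_q\n$with_qq\n$multiline";
```

Quoting the terminator with double quotes makes the here document interpolate; single quotes around the terminator would suppress interpolation.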
Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. For example, if $n and $m hold strings that begin with the digits 3 and 2, print $n + $m treats both strings as numbers and prints 5. The values of the variables remain the same. Note that in Perl, + is always the numeric addition operator. The string concatenation operator is the period.
Functions are provided for the rounding of fractional values to integer values: int chops off the fractional part, rounding towards zero; POSIX::ceil and POSIX::floor round always up and always down, respectively. The number-to-string conversions of printf "%f" and sprintf "%f" round half to even (bankers' rounding).
Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl: the number 0 (including 0.0), the string '0', the empty string '', the empty list (), and undef.
All other values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is converted by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on trailing characters being ignored, because '0E0' is literally zero times ten to the power zero.
Evaluated boolean expressions are also scalar values. The documentation does not promise which particular value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The defined function determines whether a variable has any value set. In the above examples, defined($false) is true for every value except undef.
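A small sketch of the truth rules described above. The helper is_true and the sample values are illustrative assumptions.

```perl
use strict;
use warnings;

# is_true is a hypothetical helper reporting how a value behaves as a boolean.
sub is_true { my ($value) = @_; return $value ? 1 : 0 }

my @false_values = (0, 0.0, '0', '', undef);
my @true_values  = ('0 but true', '0E0', '00', ' ', 'false');

# grep in scalar context returns the number of matching elements.
my $false_count = grep { !is_true($_) } @false_values;
my $true_count  = grep {  is_true($_) } @true_values;

# "0 but true" and '0E0' are numerically zero, yet true as booleans.
my $zero_a = '0 but true' + 0;
my $zero_b = '0E0' + 0;

# defined() separates undef from the other false values.
my $defined_count = grep { defined } @false_values;

print "$false_count false, $true_count true, $defined_count defined\n";
```

Note that strings such as '00' and 'false' are true: only the empty string and the exact string '0' are false.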
If either 1 or 0 is specifically needed, an explicit conversion can be done using the conditional operator, as in $value ? 1 : 0.
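The conversions discussed in this section (string to number, rounding, and the explicit 1 or 0 conversion) can be sketched together. The sample values are invented, and the numeric warning is silenced only to keep the demonstration quiet.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my ($n, $m) = ('3 apples', '2 oranges');   # illustrative values

my $sum;
{
    # Strings with trailing non-numeric text raise a "numeric" warning
    # under "use warnings"; it is disabled here for the demonstration.
    no warnings 'numeric';
    $sum = $n + $m;           # + is always numeric addition
}
my $joined = $n . $m;         # . is string concatenation

my $truncated = int(-7.5);    # int rounds towards zero
my $up        = ceil(7.2);    # always rounds up
my $down      = floor(7.8);   # always rounds down

# Normalise a boolean result to exactly 1 or 0.
my $flag = ($sum > 4) ? 1 : 0;

print "$sum $joined $truncated $up $down $flag\n";
```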
Array values
An array value (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).
The qw quote-like operator allows the definition of a list of strings without typing quotes and commas. Almost any delimiter can be used instead of parentheses. For example, qw(red green blue) is equivalent to ('red', 'green', 'blue').
The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.
Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar sigil must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, $month[3] is "April" (the first element in an array has an index value of 0), and @month[4..6] is ("May", "June", "July").
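A brief sketch of list construction, indexing, and slicing as described above. The month list is built to match the $month[3] example in the text; the other names are arbitrary.

```perl
use strict;
use warnings;

# Two equivalent ways to build the same list of strings.
my @quoted     = ('January', 'February', 'March');
my @bare_words = qw(January February March);

# split builds a list from a delimited string.
my @month = split /,/, 'January,February,March,April,May,June,July';

my $fourth = $month[3];      # scalar sigil for a single element
my @slice  = @month[4..6];   # array sigil for a slice

my $count      = scalar(@month);   # number of elements
my $last_index = $#month;          # index of the last element

print "$fourth: @slice ($count elements, last index $last_index)\n";
```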
Hash values
Perl programmers may initialize a hash (or associative array) from a list of key/value pairs. If the keys are separated from the values with the => operator (sometimes called a fat comma), rather than a comma, they may be unquoted (barewords). For example, %favorite = ('joe', 'red', 'sam', 'blue') and %favorite = (joe => 'red', sam => 'blue') are equivalent.
Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar. For example, $favorite{joe} equals 'red'. A hash can also be initialized by setting its values individually, as in $favorite{joe} = 'red'.
Multiple elements may be accessed using the @ sigil instead (identifying the result as a list). For example, @favorite{'joe', 'sam'} equals ('red', 'blue').
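The hash operations above can be sketched as follows, reusing the %favorite example from the text. The extra key ann is an invented addition.

```perl
use strict;
use warnings;

# Equivalent initialisations: plain commas versus the fat comma.
my %with_commas = ('joe', 'red', 'sam', 'blue');
my %favorite    = (joe => 'red', sam => 'blue');

$favorite{ann} = 'green';               # set one value individually

my $joes   = $favorite{joe};            # scalar sigil: one value
my @colors = @favorite{'joe', 'sam'};   # array sigil: a hash slice

# keys returns the keys in an arbitrary order, so sort for stable output.
my @names = sort keys %favorite;

print "$joes; @colors; @names\n";
```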
Filehandles
Filehandles provide read and write access to resources. These are most often files on disk, but can also be a device, a pipe, or even a scalar value.
Originally, filehandles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set (autovivified) to a reference to an anonymous filehandle, in place of a named filehandle.
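A minimal sketch of lexical filehandles, using a scalar value as the resource so that no file on disk is needed. It assumes a perl built with the default PerlIO layer; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# The lexical scalar $in is autovivified into a filehandle. Passing a
# reference to a scalar opens the string itself instead of a file on disk.
open(my $in, '<', \$text) or die "cannot open in-memory handle: $!";
my @lines = <$in>;
close($in);

# Writing works the same way; output accumulates in $buffer.
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";
print {$out} "written via filehandle\n";
close($out);

print scalar(@lines), " lines read; buffer holds: $buffer";
```

Replacing the scalar reference with a filename gives the more common case of reading from or writing to a file on disk.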
Typeglob values
A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example, the assignment *this = *that makes every package variable named this an alias for the corresponding variable named that.
Array functions
The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the $# sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar(@array) and ($#array + 1) are equivalent.
Hash functions
There are a few functions that operate on entire hashes. The keys function takes a hash and returns the list of its keys. Similarly, the values function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.
Control structures
Perl has several kinds of control structures.
It has block-oriented control structures, similar to those in the C, JavaScript, and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:
label while ( cond ) { ... }
label while ( cond ) { ... } continue { ... }
label for ( init-expr ; cond-expr ; incr-expr ) { ... }
label foreach var ( list ) { ... }
label foreach var ( list ) { ... } continue { ... }
if ( cond ) { ... }
if ( cond ) { ... } else { ... }
if ( cond ) { ... } elsif ( cond ) { ... } else { ... }
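The forms listed above can be sketched in one small program. The label ROW and the data are invented for the example.

```perl
use strict;
use warnings;

my @found;

# A labelled outer loop lets next and last target it from the inner loop.
ROW: foreach my $row (1 .. 4) {
    for (my $col = 1; $col <= 4; $col++) {
        if ($col > $row)     { next ROW; }   # skip the rest of this row
        if ($row * $col > 6) { last ROW; }   # leave both loops entirely

        if ($row == $col) {
            push @found, "diag $row";
        }
        elsif ($col == 1) {
            push @found, "first $row";
        }
        else {
            push @found, "cell $row $col";
        }
    }
}

my $countdown = 3;
while ($countdown > 0) {
    push @found, "tick $countdown";
}
continue {
    $countdown--;                            # runs after every iteration
}

print join(',', @found), "\n";
```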
Where only a single statement is being controlled, statement modifiers provide a more-concise syntax:
statement if cond ;
statement unless cond ;
statement while cond ;
statement until cond ;
statement foreach list ;
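Each modifier above can be shown in a single line. This is a small sketch with invented variable names.

```perl
use strict;
use warnings;

my @log;
my $verbose = 1;
my $quiet   = 0;

push @log, 'verbose on' if $verbose;      # runs only when the condition is true
push @log, 'not quiet'  unless $quiet;    # runs only when the condition is false

my $n = 0;
$n++ while $n < 3;                        # repeats while the condition holds
$n-- until $n <= 1;                       # repeats until the condition holds

push @log, "item $_" foreach qw(a b);     # $_ is set to each element in turn

print join(', ', @log), " (n = $n)\n";
```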
Short-circuit logical operators are commonly used to affect control flow at the expression level:
expr and expr
expr && expr
expr or expr
expr || expr
(The "and" and "or" operators are similar to && and || but have lower precedence, which makes it easier to use them to control entire statements.)
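The precedence difference can be seen in a short sketch. The %config hash and its keys are hypothetical.

```perl
use strict;
use warnings;

my %config = (port => 0);

# || binds tighter than =, so the default is chosen before the assignment.
my $port = $config{port} || 8080;

# "or" binds looser than =, so the assignment happens first and "or"
# then controls the whole statement.
my $host;
my $used_fallback = 0;
$host = $config{host} or $used_fallback = 1;

# "and" and && run the right-hand side only when the left-hand side is true.
my @notes;
$port > 1024 and push @notes, 'unprivileged port';
$host && push @notes, 'host configured';

print "port $port, fallback $used_fallback, notes: @notes\n";
```

The same low precedence is what makes the familiar open(...) or die ... idiom work without extra parentheses around the whole statement.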
The flow control keywords next (corresponding to C's continue), last (corresponding to C's break), return, and redo are expressions, so they can be used with short-circuit operators.
Perl also has two implicit looping constructs, each of which has two forms:
results = grep { ... } list
results = grep expr, list
results = map { ... } list
results = map expr, list
grep returns all elements of list for which the controlled block or expression evaluates to true. map evaluates the controlled block or expression for each element of list and returns a list of the resulting values. These constructs enable a simple functional programming style.
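Both forms of each construct are shown in this small sketch, with invented data.

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

my @even    = grep { $_ % 2 == 0 } @numbers;   # block form
my @big     = grep $_ > 7, @numbers;           # expression form
my @squares = map { $_ * $_ } @even;           # block form
my @labels  = map "n$_", @big;                 # expression form

# The constructs compose: sum of the squares of the odd numbers.
my $sum = 0;
$sum += $_ for map { $_ * $_ } grep { $_ % 2 } @numbers;

print "@even | @big | @squares | @labels | $sum\n";
```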
Up until the 5.10.0 release, there was no switch statement in Perl 5. From 5.10.0 onward, a multi-way branch statement called given/when is available, which takes the following form:
use v5.10; # must be present to import the new 5.10 functions
given ( expr ) { when ( cond ) { ... } default { ... } }
Syntactically, this structure behaves similarly to switch statement
Switch statement
In computer programming, a switch, case, select or inspect statement is a type of selection control mechanism that exists in most imperative programming languages such as Pascal, Ada, C/C++, C#, Java, and so on. It is also included in several other types of languages...
s found in other languages, but with a few important differences. The largest is that unlike switch/case structures, given/when statements break execution after the first successful branch, rather than waiting for explicitly defined break commands. Conversely, explicit continues are instead necessary to emulate switch behavior.
For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on the Perl 6 (since renamed Raku) redesign. It is implemented using a source filter, so its use is unofficially discouraged.
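One way to get a switch-like effect from ordinary control structures is a single-iteration for loop combined with last. The sketch below assumes a hypothetical classify helper and made-up categories; it is only one of several possible idioms.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a string with a switch-like construct.
sub classify {
    my ($value) = @_;
    my $kind;
    # "for" aliases $_ to $value for exactly one iteration; "last" leaves
    # the loop, so only the first matching branch takes effect.
    for ($value) {
        if (/^\d+$/) { $kind = 'integer'; last; }
        if (/^\s*$/) { $kind = 'blank';   last; }
        $kind = 'other';   # plays the role of a default branch
    }
    return $kind;
}

print classify('42'), "\n";   # prints "integer"
```

Because each branch ends with last, there is no fall-through; omitting last from a branch would let later tests run as well.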
Perl includes a goto label statement, but it is rarely used. Situations where a goto is called for in other languages don't occur as often in Perl because of its breadth of flow control options.
There is also a goto &sub statement that performs a tail call. It terminates the current subroutine and immediately calls the specified sub. This is used in situations where a caller can perform more-efficient stack management than Perl itself (typically because no change to the current stack is required), and in deep recursion, tail calling can have substantial positive impact on performance because it avoids the overhead of scope/stack management on return.
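A small sketch of the goto &sub form follows. The sum_to subroutine and its accumulator argument are hypothetical, chosen only to show a call in tail position; the example makes no claim about measured speed.

```perl
use strict;
use warnings;

# Hypothetical tail-recursive sum of 1..$n, carried in an accumulator.
sub sum_to {
    my ($n, $acc) = @_;
    $acc //= 0;
    return $acc if $n <= 0;
    # Set up @_ for the callee, then replace the current call with it.
    @_ = ($n - 1, $acc + $n);
    goto &sum_to;
}

print sum_to(10), "\n";   # prints 55
```

The called subroutine receives whatever is in @_ at the time of the goto, which is why @_ is assigned explicitly just before the jump.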
Subroutines
Subroutines are defined with the sub keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand (&) before it. But using & without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using & with parentheses will bypass prototypes.
A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.
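The invocation forms can be sketched as follows. The subroutines show and outer are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine that reports the arguments it received.
sub show { return join ',', @_ }

sub outer {
    # Ampersand without parentheses: show receives outer's own @_.
    return &show;
}

my $plain  = show(1, 2);        # ordinary call with parentheses
my $amp    = &show(1, 2);       # ampersand form; a prototype would be bypassed
my $shared = outer('a', 'b');   # show sees ('a', 'b')

print "$plain $amp $shared\n";  # prints "1,2 1,2 a,b"
```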
The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.
Arrays are expanded to their elements; hashes are expanded to a list of key/value pairs; and the whole lot is passed into the subroutine as one flat list of scalars.
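The flattening described above can be observed by counting the scalars that arrive. This is a sketch; count_args is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper that reports how many scalars it received.
sub count_args { return scalar @_ }

my @list = (1, 2, 3);
my %hash = (a => 1, b => 2);

# The array contributes 3 scalars, the hash 4 (two key/value pairs),
# and the literal string 1: eight scalars in one flat list.
my $n = count_args(@list, %hash, 'x');

print "$n\n";   # prints 8
```

Inside the subroutine there is no way to tell where the array ended and the hash began; passing references is the usual way to keep such structures separate.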
Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliases to the actual arguments; changing an element of @_ changes the corresponding argument.
Elements of @_ may be accessed by subscripting it in the usual way. However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.
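A short sketch of this behavior, using a hypothetical double_in_place subroutine:

```perl
use strict;
use warnings;

# Hypothetical subroutine that modifies its first argument in place.
sub double_in_place {
    $_[0] *= 2;   # writes through to the caller's variable
    return;
}

my $value = 21;
double_in_place($value);

print "$value\n";   # prints 42
```

Subscripting @_ directly is what exposes the caller's variable; copying the arguments into lexical variables first avoids the side effect.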
One common idiom is to assign @_ to a list of named variables. This provides mnemonic parameter names and implements pass-by-value semantics. The my keyword indicates that the following variables are lexically scoped to the containing block.
Another idiom is to shift parameters off of @_. This is especially common when the subroutine takes only one argument or for handling the $self argument in object-oriented modules.
Subroutines may assign @_ to a hash to simulate named arguments; this is recommended in Perl Best Practices for subroutines that are likely to ever have more than three parameters.
Subroutines may return values.
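The three argument-handling idioms, each ending in an explicit return, might look like the sketch below. All subroutine names, parameter names, and the default value for unit are illustrative assumptions.

```perl
use strict;
use warnings;

# List assignment from @_: copies the arguments into named lexicals.
sub area {
    my ($width, $height) = @_;
    return $width * $height;
}

# shift with no operand removes and returns the first element of @_.
sub square {
    my $n = shift;
    return $n * $n;
}

# Assigning @_ to a hash simulates named arguments; listing a default
# first lets the caller override it.
sub describe {
    my %args = (unit => 'cm', @_);
    return "$args{width} x $args{height} $args{unit}";
}

my $area = area(3, 4);                          # 12
my $sq   = square(5);                           # 25
my $text = describe(width => 3, height => 4);   # "3 x 4 cm"
```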
If the subroutine does not exit via a return statement, then it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.
The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.
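The effect of calling context on a returned array can be sketched as follows; the items subroutine and its data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical subroutine that returns an array.
sub items {
    my @found = ('a', 'b', 'c');
    return @found;   # evaluated in the caller's context
}

my @list    = items();   # list context: ('a', 'b', 'c')
my $count   = items();   # scalar context: the array yields its length
my ($first) = items();   # the parentheses force list context: 'a'

print scalar(@list), " $count $first\n";   # prints "3 3 a"
```

The second assignment is the usual surprise: the caller receives the number of elements rather than the first or last element.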
A subroutine can discover its calling context with the wantarray function.
Regular expressions
The Perl language includes a specialized syntax for writing regular expressions (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by Henry Spencer.
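The capabilities mentioned above (matching, capture, and substitution, plus splitting on a pattern) can be sketched with Perl's built-in operators. The sample string and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $line = 'name=Perl, year=1987';

# Pattern match: true if the pattern occurs anywhere in the string.
my $has_year = ($line =~ /year=\d+/) ? 1 : 0;

# Capture: in list context a match returns the parenthesized groups.
my ($name, $year) = $line =~ /name=(\w+), year=(\d+)/;

# Substitution: copy the string, then rewrite the copy in place.
(my $masked = $line) =~ s/\d/#/g;

# Split: the pattern describes the delimiter between fields.
my @fields = split /,\s*/, $line;

print "$name $year\n";   # prints "Perl 1987"
print "$masked\n";       # prints "name=Perl, year=####"
```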
The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications, such as PHP, Ruby, Java, Microsoft's .NET Framework, and the Apache HTTP server, are now adopting Perl-compatible regular expressions over POSIX regular expressions.
Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than globs, and the syntax was designed so that an expression would resemble the text that it matches. This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.
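To illustrate the density, the sketch below writes one pattern twice: once in the usual compact form and once with Perl's /x modifier, which permits whitespace and comments inside the pattern. The date format and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Compact form: every character carries meaning.
my $dense = qr/^(\d{4})-(\d{2})-(\d{2})$/;

# The same pattern under /x, where unescaped whitespace is ignored
# and "#" starts a comment.
my $readable = qr/
    ^ (\d{4})   # year
    - (\d{2})   # month
    - (\d{2})   # day
    $
/x;

my @from_dense    = '2024-02-29' =~ $dense;
my @from_readable = '2024-02-29' =~ $readable;

print "@from_dense\n";   # prints "2024 02 29"
```

Both patterns match the same strings; only the layout differs.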
Uses
The m// (match) operator introduces a regular-expression match. (If it is delimited by slashes, the leading m may be omitted for brevity. If the m is present, other delimiters can be used in place of slashes.) In the simplest case, an expression such as $x =~ /abc/ evaluates to true if and only if the string $x matches the regular expression abc.
The s/// (substitute) operator, on the other hand, specifies a search-and-replace operation.
Another use of regular expressions is to specify delimiters for the split function. The split function creates a list of the parts of the string that are separated by matches of the regular expression. For example, with a line held in a variable such as $line, the statement @words = split /,/, $line divides the line into a list of its comma-separated parts and assigns this list to the @words array.
Modifiers
Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression; for example, /i makes a match case-insensitive and /g applies it globally.
Because the compact syntax of regular expressions can make them dense and cryptic, the /x modifier was added in Perl to help programmers write more-legible regular expressions. It allows programmers to place whitespace and comments inside regular expressions.
Capturing
Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are captured. Captured strings are assigned to the sequential built-in variables $1, $2, $3, ..., and in list context a list of captured strings is returned as the value of the match.
Captured strings $1, $2, $3, ... can be used later in the code.
Perl regular expressions also allow built-in or user-defined functions to be applied to the captured match, by using the /e modifier, which evaluates the replacement side of s/// as a Perl expression.
Objects
There are many ways to write object-oriented code in Perl. The most basic is using "blessed" references. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined as a Point package with a new constructor and a distance method.
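The original listing is not reproduced here. A minimal sketch of such a class follows, assuming a hash-based representation with hypothetical fields x and y, and a distance method that takes another Point.

```perl
package Point;
use strict;
use warnings;

# Constructor: bless a hash reference into the class.
sub new {
    my ($class, $x, $y) = @_;
    my $self = { x => $x, y => $y };
    return bless $self, $class;
}

# Euclidean distance from this point to another Point.
sub distance {
    my ($self, $other) = @_;
    my $dx = $self->{x} - $other->{x};
    my $dy = $self->{y} - $other->{y};
    return sqrt($dx * $dx + $dy * $dy);
}

package main;

my $origin = Point->new(0, 0);
my $p      = Point->new(3, 4);

print $p->distance($origin), "\n";   # prints 5
```

The bless call is the only step that makes the hash an object; method calls through the arrow operator then look up new and distance in the Point package.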
This class can be used by invoking new to construct instances, and invoking distance on those instances.
Many modern Perl applications use the Moose object system. Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.
Moose classes:
- A class has zero or more attributes.
- A class has zero or more methods.
- A class has zero or more superclasses (aka parent classes). A class inherits from its superclass(es).
- A class does zero or more roles, which add pre-defined functionality to the class without subclassing.
- A class has a constructor and a destructor.
- A class has a metaclass.
- A class has zero or more method modifiers. These modifiers can apply to its own methods, methods that are inherited from its ancestors, or methods that are provided by roles.
Moose roles:
- A role is something that a class does, somewhat like mixins or interfaces in other object-oriented programming languages. Unlike mixins and interfaces, roles can be applied to individual object instances.
- A role has zero or more attributes.
- A role has zero or more methods.
- A role has zero or more method modifiers.
- A role has zero or more required methods.
Examples
An example of a class written using the MooseX::Declare extension to Moose is a class named Point3D that extends another class named Point, explained in Moose examples. It adds to its base class a new attribute z, redefines the method set_to, and extends the method clear.