Preprocessor
In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
A common example from computer programming is the processing performed on source code before the next step of compilation. In some computer languages (e.g., C and PL/I) there is a phase of translation known as preprocessing.
Lexical preprocessors
Lexical preprocessors are the lowest level of preprocessors, insofar as they only require lexical analysis; that is, they operate on the source text, prior to any parsing, by performing simple substitution of tokenized character sequences for other tokenized character sequences, according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
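The three tasks named above can be illustrated with a minimal sketch in Python. It is a toy, not a model of any real preprocessor: the directive names are borrowed from C for familiarity, the `files` dictionary is a hypothetical stand-in for the file system, and macro bodies are not rescanned after substitution.

```python
import re

# Token pattern: identifiers, numbers, whitespace runs, or any single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d\w*|\s+|.")


def preprocess(text, files=None, macros=None):
    """Expand a toy subset of '#' directives and substitute macro tokens."""
    files = {} if files is None else files      # hypothetical in-memory "file system"
    macros = {} if macros is None else macros   # macro name -> replacement text
    active = [True]                             # stack for nested #ifdef blocks
    out = []
    for line in text.splitlines():
        words = line.split()
        if words and words[0].startswith("#"):
            directive = words[0]
            if directive == "#ifdef":
                active.append(active[-1] and words[1] in macros)
            elif directive == "#endif":
                active.pop()
            elif active[-1] and directive == "#define":
                macros[words[1]] = " ".join(words[2:])
            elif active[-1] and directive == "#include":
                # Textual inclusion: the included text shares the macro table.
                included = preprocess(files[words[1]], files, macros)
                out.extend(included.splitlines())
            continue  # directive lines never reach the output
        if active[-1]:
            # Replace whole tokens only, so SIZE does not match inside SIZES.
            out.append("".join(macros.get(tok, tok) for tok in TOKEN.findall(line)))
    return "\n".join(out)


source = "#define SIZE 10\n#ifdef DEBUG\nlog(SIZE);\n#endif\nint a[SIZE];"
print(preprocess(source))  # int a[10];
```

Because the substitution works on tokens and never parses the surrounding language, the sketch would rewrite `SIZE` just as readily inside text that is not a program at all.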
C preprocessor
The most common example of this is the C preprocessor, which takes lines beginning with '#' as directives. Because it knows nothing about the underlying language, its use has been criticized, and many of its features have been built directly into other languages. For example, macros have been replaced with aggressive inlining and templates, and includes with compile-time imports (this requires the preservation of type information in the object code, making this feature impossible to retrofit into a language); conditional compilation is effectively accomplished with if-then-else and dead code elimination in some languages.
Other lexical preprocessors
Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open source macro processor which operates on patterns of context.
Syntactic preprocessors
Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case with Lisp and OCaml. Some other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce.
Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a domain-specific programming language inside a general-purpose language.
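As a rough illustration of transforming a syntax tree according to a user-defined rule, the following sketch uses Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). Python is used only as a convenient stand-in here; it is not one of the languages discussed above, and the rule itself (rewriting `x ** 2` as `x * x` for plain variable names) is a made-up example.

```python
import ast


class SquareToProduct(ast.NodeTransformer):
    """User-defined rule: rewrite `name ** 2` as `name * name`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite sub-expressions first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)       # names only: safe to duplicate
                and isinstance(node.right, ast.Constant)
                and isinstance(node.right.value, int)
                and node.right.value == 2):
            duplicate = ast.Name(id=node.left.id, ctx=ast.Load())
            return ast.BinOp(left=node.left, op=ast.Mult(), right=duplicate)
        return node


def rewrite(source):
    """Parse the source, apply the rule to the tree, and print it back as text."""
    tree = SquareToProduct().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(rewrite("y = x ** 2 + 1"))  # y = x * x + 1
```

Unlike a lexical preprocessor, the rule sees the structure of the expression, so it can tell a variable from a compound sub-expression and leave `(a + b) ** 2` untouched.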
Customizing syntax
A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language. Programs may be written interchangeably using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.
Similarly, a number of programs written in OCaml customize the syntax of the language by the addition of new operators.
Extending a language
The best examples of language extension through macros are found in the Lisp family of languages. While the languages, by themselves, are simple dynamically typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although it bears noting that the "macro expansion" phase of compilation is handled by the compiler in Lisp. This can still be considered a form of preprocessing, since it takes place before other phases of compilation.
Similarly, statically checked, type-safe regular expressions or code generation may be added to the syntax and semantics of OCaml through macros, as well as micro-threads (also known as coroutines or fibers), monads or transparent XML manipulation.
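The idea of adding a new primitive by rewriting syntax can be sketched, by analogy only, with Python's `ast` module. The primitive `unless(cond, expr)` is hypothetical and exists in neither Lisp nor OCaml in this form. It is chosen because it cannot be written as an ordinary function: a function would always evaluate `expr`, whereas the rewritten code evaluates it only when `cond` is false.

```python
import ast


class ExpandUnless(ast.NodeTransformer):
    """Expand the hypothetical primitive unless(cond, expr) before execution."""

    def visit_Call(self, node):
        self.generic_visit(node)  # expand nested uses first
        if (isinstance(node.func, ast.Name) and node.func.id == "unless"
                and len(node.args) == 2 and not node.keywords):
            cond, expr = node.args
            # unless(cond, expr)  ->  expr if not cond else None
            return ast.IfExp(
                test=ast.UnaryOp(op=ast.Not(), operand=cond),
                body=expr,
                orelse=ast.Constant(value=None),
            )
        return node


def expand(source):
    """Return the source with every unless(...) call replaced by a conditional."""
    tree = ExpandUnless().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


namespace = {"d": 0}
exec(expand("r = unless(d == 0, 10 / d)"), namespace)
print(namespace["r"])  # None: the division by zero was never evaluated
```

Lisp macros achieve the same effect inside the language itself; this sketch merely runs a separate rewriting pass before the program is executed.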
Specializing a language
One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal domain-specific programming language. Typically, in a large Lisp-based project, a module may be written in a variety of such minilanguages, one perhaps using a SQL-based dialect of Lisp, another written in a dialect specialized for GUIs or pretty-printing, etc. Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like minilanguage to describe complex iteration, while still enabling the use of standard Lisp operators.
The MetaOCaml preprocessor/language provides similar features for external domain-specific programming languages. This preprocessor takes the description of the semantics of a language (i.e. an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language, and from that language, either to bytecode or to native code.
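The general idea of turning an interpreter into a compiler can be imitated by hand in a few lines of Python. This is not MetaOCaml and uses none of its constructs; the tiny expression language (tuples tagged `"num"`, `"var"`, `"add"`, `"mul"`) and all function names are assumptions made for the sketch. The generator follows the same traversal as the interpreter but emits target-language source instead of computing values.

```python
def interpret(expr, env):
    """Reference semantics: walk the expression each time it is evaluated."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "add":
        return interpret(expr[1], env) + interpret(expr[2], env)
    if tag == "mul":
        return interpret(expr[1], env) * interpret(expr[2], env)
    raise ValueError(f"unknown tag: {tag}")


def generate(expr):
    """Same traversal as interpret, but produces Python source text."""
    tag = expr[0]
    if tag == "num":
        return repr(expr[1])
    if tag == "var":
        return f"env[{expr[1]!r}]"
    if tag == "add":
        return f"({generate(expr[1])} + {generate(expr[2])})"
    if tag == "mul":
        return f"({generate(expr[1])} * {generate(expr[2])})"
    raise ValueError(f"unknown tag: {tag}")


def compile_expr(expr):
    """Compile the generated source once; the result no longer walks the tree."""
    source = f"lambda env: {generate(expr)}"
    return eval(compile(source, "<generated>", "eval"))


program = ("add", ("mul", ("num", 2), ("var", "x")), ("num", 1))
compiled = compile_expr(program)
print(generate(program))   # ((2 * env['x']) + 1)
print(compiled({"x": 5}))  # 11
```

The interpretive overhead (dispatching on tags) is paid once at generation time, and the host language's own compiler then handles the generated code.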
General purpose preprocessor
Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general purpose, meaning that it is not aimed at a specific usage or programming language, and is intended to be used for a wide variety of text processing tasks.
M4 is probably the most well-known example of such a general purpose preprocessor, although the C preprocessor is sometimes used in a non-C specific role. Examples:
- using the C preprocessor for JavaScript preprocessing.
- using M4 (see on-article example) or the C preprocessor as a template engine, for HTML generation.
- imake, a make interface using the C preprocessor, used in the X Window System but now deprecated in favour of automake.
- grompp, a preprocessor for simulation input files for GROMACS (a fast, free, open-source code for some problems in computational chemistry) which calls the system C preprocessor (or other preprocessor as determined by the simulation input file) to parse the topology, using mostly the #define and #include mechanisms to determine the effective topology at grompp run time.
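The template-engine use mentioned in the list can be sketched with a toy macro expander in Python. It does not reproduce the syntax or semantics of m4 or the C preprocessor: the macro table, the `$1` to `$9` placeholder convention, and the function name `expand` are all assumptions of the sketch. Arguments containing commas or parentheses are not supported, and a macro that expands to a call of itself would loop forever.

```python
import re

# A macro call: NAME(arg, arg, ...) with no nested parentheses in the arguments.
CALL = re.compile(r"\b([A-Za-z_]\w*)\(([^()]*)\)")


def expand(text, macros):
    """Expand macro calls repeatedly until the text stops changing."""

    def substitute(match):
        name, raw_args = match.group(1), match.group(2)
        if name not in macros:
            return match.group(0)  # not a macro: leave the text untouched
        args = [arg.strip() for arg in raw_args.split(",")]

        def placeholder(m):
            index = int(m.group(1)) - 1
            return args[index] if index < len(args) else ""

        return re.sub(r"\$([1-9])", placeholder, macros[name])

    previous = None
    while previous != text:  # innermost calls expand first, then enclosing ones
        previous, text = text, CALL.sub(substitute, text)
    return text


macros = {
    "LINK": '<a href="$1">$2</a>',
    "ITEM": "<li>$1</li>",
}
print(expand("ITEM(LINK(a.html, Home))", macros))
# <li><a href="a.html">Home</a></li>
```

Nothing in the expander is specific to HTML; the same macro table could just as well generate configuration files or source code, which is what makes such a tool general purpose.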
See also
- Directive (programming)
- Metaprogramming
- Macros
- Snippet management
- Template engine
- The C preprocessor
- The OCaml preprocessor-pretty-printer (Camlp4)
- The Windows software trace preprocessor
External links
- DSL Design in Lisp
- Programming from the bottom up
- The Generic PreProcessor
- Gema, the General Purpose Macro Processor
- The PIKT piktc text, script, and configuration file preprocessor
- pyexpander, a Python-based general purpose macro processor
- minimac, a minimalist macro processor
- Java Comment Preprocessor