Glasgow Haskell Compiler
The Glorious Glasgow Haskell Compilation System, more commonly known as the Glasgow Haskell Compiler or GHC, is an open source native code compiler for the functional programming language Haskell. The lead developers are Simon Peyton Jones and Simon Marlow. It is distributed along with the Haskell Platform.
History
GHC originally started in 1989 as a prototype, written in LML (Lazy ML) by Kevin Hammond at the University of Glasgow. Later that year, the prototype was completely rewritten in Haskell, except for its parser, by Cordelia Hall, Will Partain, and Simon Peyton Jones. Its first beta release was on April 1, 1991, and subsequent releases added a strictness analyzer as well as language extensions such as monadic I/O, mutable arrays, unboxed data types, concurrent and parallel programming models (such as software transactional memory and data parallelism) and a profiler.

Peyton Jones, as well as Simon Marlow, later moved to Microsoft Research in Cambridge, England, where they continue to be primarily responsible for developing GHC. GHC also contains code from more than sixty other contributors.

Since 2009, third-party contributions to GHC have been funded by the Industrial Haskell Group.
Architecture
GHC is itself written in Haskell (in a technique known as bootstrapping), but the runtime system for Haskell, essential to run programs, is written in C and C--.
GHC's front end, incorporating the lexer, parser and typechecker, is designed to preserve as much information about the source language as possible until after type inference is complete, toward the goal of providing clear error messages to users. After type checking, the Haskell code is desugared into a typed intermediate language known as "Core" (based on System F, extended with let and case expressions). Recently, Core was extended to support generalized algebraic datatypes in its type system, and is now based on an extension to System F known as System FC.
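To make the desugaring step concrete, the sketch below shows, in ordinary Haskell syntax rather than GHC's actual Core output, how surface features such as multiple equations and guards can be reduced to the explicit case and let expressions that Core provides; the function names are invented for the illustration, and real Core additionally carries full type information.

    -- Surface Haskell: two equations and a guard.
    safeDiv :: Int -> Int -> Maybe Int
    safeDiv _ 0 = Nothing
    safeDiv x y
      | x >= 0    = Just (x `div` y)
      | otherwise = Nothing

    -- A Core-like rendering of the same function: a single lambda,
    -- explicit case analysis, and a let-bound intermediate result.
    -- (Only a sketch; real Core has its own, fully typed syntax.)
    safeDivCore :: Int -> Int -> Maybe Int
    safeDivCore = \x y ->
      case y of
        0 -> Nothing
        _ -> case x >= 0 of
               True  -> let q = x `div` y in Just q
               False -> Nothing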
In the tradition of type-directed compilation, GHC's simplifier, or "middle end", where most of the optimizations implemented in GHC are performed, is structured as a series of source-to-source transformations on Core code. The analyses and transformations performed in this compiler stage include demand analysis (a generalization of strictness analysis), application of user-defined rewrite rules (including a set of rules included in GHC's standard libraries that performs foldr/build fusion), unfolding (called "inlining" in more traditional compilers), let-floating, an analysis that determines which function arguments can be unboxed, constructed product result analysis, specialization of overloaded functions, as well as a set of simpler local transformations such as constant folding and beta reduction.
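User-defined rewrite rules of the kind mentioned above are written with GHC's RULES pragma. The sketch below states the classic map-fusion identity and pairs it with an INLINE pragma of the sort that guides unfolding; the module and function names are illustrative rather than taken from GHC's libraries (which ship with their own, more elaborate rule sets).

    module FusionDemo where   -- illustrative module name

    -- If the simplifier ever sees two successive list traversals,
    -- this rule lets it rewrite them into a single traversal.
    {-# RULES
    "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
      #-}

    -- Asking GHC to unfold ("inline") a definition at its call sites
    -- often exposes further opportunities for rules to fire.
    {-# INLINE double #-}
    double :: [Int] -> [Int]
    double = map (* 2)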
The back end of the compiler transforms Core code into an internal representation of C--, via an intermediate language STG (short for "Spineless Tagless G-machine"). The C-- code can then take one of three routes: it is either printed as C code for compilation with GCC, converted directly into native machine code (the traditional "code generation" phase), or converted to LLVM virtual machine code for compilation with LLVM. In all three cases, the resultant native code is finally linked against the GHC runtime system to produce an executable.
Language
GHC complies with the language standard, called Haskell 98. It also supports many optional extensions to the Haskell standard: for example, the software transactional memory (STM) library, which allows for Composable Memory Transactions.
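A minimal sketch of what the STM library enables is shown below: two shared balances are updated inside a single atomic transaction, using the Control.Concurrent.STM module (the account names and amounts are purely illustrative).

    import Control.Concurrent.STM
      (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

    -- Move an amount between two shared balances. Because both writes
    -- occur inside one transaction, no other thread can observe a state
    -- where the money has left one account but not reached the other.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      fromBal <- readTVar from
      toBal   <- readTVar to
      writeTVar from (fromBal - amount)
      writeTVar to   (toBal + amount)

    main :: IO ()
    main = do
      alice <- newTVarIO 100             -- illustrative starting balances
      bob   <- newTVarIO 0
      atomically (transfer alice bob 30)
      print =<< atomically (readTVar bob)   -- prints 30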
Extensions to Haskell
A number of extensions to Haskell have been proposed. These extensions provide features not described in the language specification, or they redefine existing constructs. As such, each extension may not be supported by all Haskell implementations. There is an ongoing effort to describe extensions and select those which will be included in future versions of the language specification.

The extensions supported by the Glasgow Haskell Compiler include:
- Unboxed types and operations. These represent the primitive datatypes of the underlying hardware, without the indirection of a pointer to the heap or the possibility of deferred evaluation. Numerically intensive code can be significantly faster when coded using these types. (A sketch using these types appears after this list.)
- The ability to specify strict evaluation for a value, pattern binding, or datatype field.
- More convenient syntax for working with modules, patterns, list comprehensions, operators, records, and tuples.
- Syntactic sugar for computing with arrows and recursively-defined monadic values. Both of these concepts extend the monadic do-notation provided in standard Haskell.
- A significantly more powerful system of types and typeclasses, described below.
- Template Haskell, a system for compile-time metaprogramming. A programmer can write expressions that produce Haskell code in the form of an abstract syntax tree. These expressions are typechecked and evaluated at compile time; the generated code is then included as if it were written directly by the programmer. Together with the ability to reflect on definitions, this provides a powerful tool for further extensions to the language.
- Quasi-quotation, which allows the user to define new concrete syntax for expressions and patterns. Quasi-quotation is useful when a metaprogram written in Haskell manipulates code written in a language other than Haskell.
- Generic typeclasses, which specify functions solely in terms of the algebraic structure of the types they operate on.
- Parallel evaluation of expressions using multiple CPU cores. This does not require explicitly spawning threads. The distribution of work happens implicitly, based on annotations provided by the programmer.
- Compiler pragmas for directing optimizations such as inline expansion and specializing functions for particular types.
- Customizable rewrite rules. The programmer can provide rules describing how to replace one expression with an equivalent but more efficiently evaluated expression. These are used within core datastructure libraries to provide improved performance throughout application-level code.
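As a small illustration of the first item in the list above, the following sketch uses the MagicHash extension and the unboxed primitives re-exported by GHC.Exts; the function names are invented for the example.

    {-# LANGUAGE MagicHash #-}
    module UnboxedDemo where   -- illustrative module name

    import GHC.Exts (Int (I#), Int#, (*#), (+#))

    -- 'Int#' is a raw machine integer: no heap pointer, no laziness.
    -- Arithmetic on it compiles down to plain machine instructions.
    squarePlusOne# :: Int# -> Int#
    squarePlusOne# n = (n *# n) +# 1#

    -- An ordinary boxed 'Int' is literally the constructor 'I#' applied
    -- to an 'Int#', so the wrapper just unpacks, computes, and reboxes.
    squarePlusOne :: Int -> Int
    squarePlusOne (I# n) = I# (squarePlusOne# n)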
Type system extensions
An expressive static type system is one of the major defining features of Haskell. Accordingly, much of the work in extending the language has been directed towards types and type classes.
The Glasgow Haskell Compiler supports an extended type system based on the theoretical System FC. Major extensions to the type system include:
- Arbitrary-rank and impredicative polymorphism. Essentially, a polymorphic function or datatype constructor may require that one of its arguments is itself polymorphic. Impredicative polymorphism is now considered deprecated, and can be eliminated by declaring a new monomorphic datatype to wrap a polymorphic type.
- Generalized algebraic data types. Each constructor of a polymorphic datatype can encode information into the resulting type. A function which pattern-matches on this type can use the per-constructor type information to perform more specific operations on data. (A sketch appears after this list.)
- Existential types. These can be used to "bundle" some data together with operations on that data, in such a way that the operations can be used without exposing the specific type of the underlying data. Such a value is very similar to an object as found in object-oriented programming languages.
- Data types that do not actually contain any values. These can be useful to represent data in type-level metaprogramming.
- Type families: user-defined functions from types to types. Whereas parametric polymorphism provides the same structure for every type instantiation, type families provide ad-hoc polymorphism with implementations that can differ between instantiations. Use cases include content-aware optimizing containers and type-level metaprogramming.
- Implicit function parameters that have dynamic scope. These are represented in types in much the same way as type class constraints.
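As a sketch of the generalized algebraic data types described above, the following (invented) expression type lets each constructor refine the result type, so a single evaluator can return an Int or a Bool as appropriate, with the type checker guaranteeing that only well-formed expressions can be built.

    {-# LANGUAGE GADTs #-}
    module GadtDemo where   -- illustrative module name

    -- Each constructor records, in its return type, what kind of value
    -- the expression evaluates to.
    data Expr a where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool
      Add     :: Expr Int  -> Expr Int -> Expr Int
      If      :: Expr Bool -> Expr a   -> Expr a -> Expr a

    -- Matching on a constructor brings its type refinement into scope,
    -- so each branch may return a different concrete type.
    eval :: Expr a -> a
    eval (IntLit n)  = n
    eval (BoolLit b) = b
    eval (Add x y)   = eval x + eval y
    eval (If c t e)  = if eval c then eval t else eval e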
Extensions relating to type classes include:
- A type class may be parametrized on more than one type. Thus a type class can describe not only a set of types, but an n-ary relation on types. (A sketch combining this with functional dependencies appears after this list.)
- Functional dependencies, which constrain parts of that relation to be a mathematical function on types. That is, the constraint specifies that some type class parameter is completely determined once some other set of parameters is fixed. This guides the process of type inference in situations where otherwise there would be ambiguity.
- Significantly relaxed rules regarding the allowable shape of type class instances. When these are enabled in full, the type class system becomes a Turing-complete language for logic programming at compile time.
- Type families, as described above, may also be associated with a type class.
- The automatic generation of certain type class instances is extended in several ways. New type classes for generic programming and common recursion patterns are supported. Additionally, when a new type is declared as isomorphic to an existing type, any type class instance declared for the underlying type may be lifted to the new type "for free".
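A sketch combining the first two items above: a multi-parameter type class relating a container type to its element type, with a functional dependency recording that the container determines the element. The class and its methods are invented for the example; FlexibleInstances is needed for the list instance shown.

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances #-}
    module CollectionDemo where   -- illustrative module name

    -- The class describes a relation between a collection type 'c' and
    -- its element type 'e'; the dependency 'c -> e' says the collection
    -- type alone determines the element type, which removes ambiguity
    -- during type inference.
    class Collection c e | c -> e where
      empty  :: c
      insert :: e -> c -> c

    instance Collection [a] a where
      empty  = []
      insert = (:)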
Portability
Versions of GHC are available for several platforms, including Windows and most varieties of Unix (such as the numerous GNU/Linux flavors, FreeBSD, and Mac OS X). GHC has also been ported to several different processor architectures.