Data pointer
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. For high-level programming languages, pointers effectively take the place of general purpose registers in low-level languages such as assembly language or machine code, but may be in available memory. A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer. A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others. As an analogy, a page number in a book could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number.
Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables and tree structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.
Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables.
While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability where this is not possible. Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash. To ameliorate this potential problem, as a matter of type safety, pointers are considered a separate type to the type of data they point to, even if the underlying representation is the same. Other measures may also be taken.
A pointer is a kind of reference.
A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).
A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.
In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the first byte of a datum is considered the memory address (or base memory address) of the entire datum.
A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.
More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather 'low-level' concept.
References serve as a level of indirection: a pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.
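The definitions above can be illustrated with a minimal C sketch. The helper names store_through and load_through are hypothetical and chosen only for this illustration; each one dereferences a typed pointer to reach the datum it points to.

```c
#include <assert.h>
#include <stddef.h>

/* Writes `value` into the int that `target` points to (hypothetical helper). */
void store_through(int *target, int value)
{
    assert(target != NULL);   /* a null pointer refers to no object */
    *target = value;          /* dereference: access the pointed-to datum */
}

/* Reads the int that `source` points to (hypothetical helper). */
int load_through(const int *source)
{
    assert(source != NULL);
    return *source;
}
```

A caller would obtain the pointer with the address-of operator, for example store_through(&a, 8) for an int variable a. The pointer type int * tells the compiler that the datum at that address is an int.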
In data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).
A two-byte offset, containing a 16-bit unsigned integer, can be used to provide relative addressing for up to 64 kilobytes of a data structure. This can easily be extended to 128K, 256K or 512K if the address pointed to is forced to be on a half-word, word or double-word boundary (but this requires an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8 before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer a flat address space is preferred.
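As a rough sketch of the relative addressing just described, the following C function resolves a 16-bit offset against a base address. The name resolve_offset and its shift parameter are assumptions made for this illustration, not part of any standard interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolve a 16-bit relative offset against a base address.
 * `shift` is 0 for byte granularity, or 1, 2 or 3 when targets are aligned
 * on 2-, 4- or 8-byte boundaries (scaling the offset by 2, 4 or 8). */
void *resolve_offset(void *base, uint16_t offset, unsigned shift)
{
    assert(base != NULL && shift <= 3);
    return (unsigned char *)base + ((size_t)offset << shift);
}
```

With a shift of 2, a 16-bit offset can reach 65,536 four-byte words, that is, 256K of data, at the cost of one extra shift per access.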
A one-byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29'), can be used to point to an alternative integer value (or index) in an array (e.g. X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.
Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can, however, simply be indexes to other separate, but associated, tables comprising an array of the actual addresses or the addresses themselves (depending upon the programming language constructs available). They can also be used to point (back) to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.
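One possible shape for such a table in C is sketched below. The entry layout, the command codes and the handler functions are all invented for this example; the point is only that each entry holds a condition together with a pointer to the subroutine to run.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

/* Each entry pairs a condition (a command code) with the entry point of
 * the subroutine to execute when that condition matches. */
struct control_entry {
    char       code;
    handler_fn handler;
};

static const struct control_entry control_table[] = {
    { '+', add_one },
    { '*', twice   },
    { '-', negate  },
};

/* Scan the table for `code` and call the matching handler through its
 * pointer; returns 0 and leaves *result untouched when nothing matches. */
int dispatch(char code, int arg, int *result)
{
    assert(result != NULL);
    size_t n = sizeof control_table / sizeof control_table[0];
    for (size_t i = 0; i < n; i++) {
        if (control_table[i].code == code) {
            *result = control_table[i].handler(arg);
            return 1;
        }
    }
    return 0;
}
```

Adding a new behaviour then means adding a table entry rather than another branch in the control flow.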
Pointers are a thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word, effectively transforming all of memory into a very large array. Then, if we have an address, the system provides an operation to retrieve the value stored in the memory unit at that address (usually utilizing the machine's general purpose registers).
In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or because the architecture does not support such addresses. The first case may, in certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bits long and addresses only extend to 48 bits. There, pointers must conform to certain rules (canonical addresses), so if a noncanonical pointer is dereferenced, the processor raises a general protection fault.
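The canonical-address rule can be expressed as a small check. The sketch below assumes the 48-bit scheme mentioned above, in which bits 47 through 63 of an address must all be equal; the function name is_canonical48 is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when bits 47..63 of `addr` are all zeros or all ones, i.e. the
 * address is canonical under a 48-bit addressing scheme. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits from bit 47 upward */
    return top == 0 || top == 0x1FFFF;  /* all clear or all set */
}
```

Under this assumption, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000 is not.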
On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MiB of physical memory, could access up to 1 GiB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KiB in one data structure cumbersome. Some restrictions of ANSI pointer arithmetic may have been due to the segmented memory models of this processor family.
In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.
Pointers are directly supported in languages such as C, C++, Pascal, and most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as in passing data between different parts of a program.
In functional programming languages that rely heavily on lists, pointers and references are managed abstractly by the language using internal constructs like cons.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation, which involves constructing a pointer to the desired data element in the array. If the data elements in the array have lengths that are divisible by powers of two, this arithmetic is usually much more efficient. Padding is frequently used as a mechanism for ensuring this is the case, despite the increased memory requirement. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
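The address calculation for an array lookup can be written out by hand as a short C sketch. The function name element_address is an assumption for this example; a C compiler performs the same computation implicitly for an expression such as &a[i].

```c
#include <assert.h>
#include <stddef.h>

/* Address calculation for element `index` of an array that starts at
 * `base` and whose elements are `elem_size` bytes each:
 * address = base + index * elem_size. */
void *element_address(void *base, size_t elem_size, size_t index)
{
    assert(base != NULL);
    return (unsigned char *)base + index * elem_size;
}
```

When elem_size is a power of two, the multiplication can be carried out as a left shift, which is one reason padded, power-of-two element sizes tend to be cheaper to index.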
Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
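Both uses can be sketched in C. The functions swap_ints and divide below are hypothetical examples: the first modifies the caller's variables through pointers, and the second returns two results through pointer parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Swaps the two ints whose addresses the caller passed in. */
void swap_ints(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Returns two results through out-parameters; the return value reports
 * success (a zero denominator is rejected). */
int divide(int numerator, int denominator, int *quotient, int *remainder)
{
    assert(quotient != NULL && remainder != NULL);
    if (denominator == 0)
        return 0;
    *quotient  = numerator / denominator;
    *remainder = numerator % denominator;
    return 1;
}
```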
Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).
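A minimal sketch of this allocate-then-release discipline in C follows, using the standard malloc and free functions. The wrapper names make_sequence and release_sequence are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates `count` ints and fills them with 0, 1, 2, ...
 * Returns NULL if the allocation fails; the caller owns the block. */
int *make_sequence(size_t count)
{
    int *block = malloc(count * sizeof *block);
    if (block == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        block[i] = (int)i;
    return block;
}

/* Frees the block and clears the caller's pointer so that it does not
 * keep referring to released memory. */
void release_sequence(int **block)
{
    assert(block != NULL);
    free(*block);   /* free(NULL) is a no-op */
    *block = NULL;
}
```

Every successful call to make_sequence should be paired with one call to release_sequence; omitting the release is the kind of leak described above.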
This declares
This is usually stated more succinctly as '
Because the C language does not specify an implicit initialization for objects of automatic storage duration, care should often be taken to ensure that the address to which
Dereferencing a null pointer in C produces undefined behavior, which could be catastrophic. However, most implementations simply halt execution of the program in question, usually with a segmentation fault.
However, initializing pointers unnecessarily could hinder program analyses, thereby hiding bugs.
In any case, once a pointer has been declared, the next logical step is for it to point at something:
This assigns the value of
This means take the contents of
If
This example may be more clear if memory is examined directly.
Assume that
(The NULL pointer shown here is 0x00000000.)
By assigning the address of
yields the following memory values:
Then by dereferencing
the computer will take the contents of
Clearly, accessing
This allocates a block of five integers and names the block
which returns a consecutive block of memory of no less than the requested size that can be used as an array.
While most operators on arrays and pointers are equivalent, it is important to note that the
Default values of an array can be declared like:
If you assume that
, like the addresses):
Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.
The syntax for C with pointers is:
The last example is how to access the contents of
E.g.
in C.
Note that this pointer-recursive definition is essentially the same as the reference-recursive definition from the Haskell programming language:
data Link a = Nil
| Cons a (Link a)
cell of type
The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
code:
The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.
The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function
Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal
address in this example the constant
In the mid 80s, using the BIOS to access the video capabilities of PCs was slow. Applications that were display-intensive typically used to access CGA video memory directly by casting the hexadecimal constant
code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:
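The original listing is not reproduced here. As a hedged sketch of the technique, the C code below packs a character and a colour into a 16-bit cell and stores it at a row and column of an 80-column text buffer. It writes to an ordinary array passed in by the caller rather than to real video memory; on period hardware the pointer would instead be obtained by casting the video memory address (0xB8000 for CGA colour text modes, stated here as an assumption about the environment). The function names and the TEXT_COLUMNS and TEXT_ROWS constants are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TEXT_COLUMNS = 80, TEXT_ROWS = 25 };

/* Packs one text cell: character code in the low byte, colour attribute
 * (background in bits 4-6, foreground in bits 0-3) in the high byte. */
uint16_t make_cell(unsigned char ch, unsigned foreground, unsigned background)
{
    unsigned attribute = ((background & 0x7u) << 4) | (foreground & 0xFu);
    return (uint16_t)((attribute << 8) | ch);
}

/* Stores a cell at (row, column) in a buffer laid out like the text screen.
 * Here `screen` is any array of TEXT_ROWS * TEXT_COLUMNS cells. */
void put_cell(volatile uint16_t *screen, unsigned row, unsigned column,
              uint16_t cell)
{
    assert(screen != NULL && row < TEXT_ROWS && column < TEXT_COLUMNS);
    screen[row * TEXT_COLUMNS + column] = cell;
}
```

With bright white as colour 15 and blue as colour 1, the letter 'A' at row 5, column 2 becomes the value 0x1F41 stored at cell index 5 * 80 + 2.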
Typed pointers and casting
In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.
For example, in C
The following would yield a compiler warning of "assignment from incompatible pointer type" under GCC
because
To suppress the compiler warning, it must be made explicit that you do indeed wish to make the assignment by typecasting it
which says to cast the integer pointer of
A 2005 draft of the C standard requires that casting a pointer derived from one type to one of another type should maintain the alignment correctness for both types (6.3.2.3 Pointers, par. 7):
In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This allows us to easily compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if the
Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.
Making pointers safer
Because a pointer allows a program to attempt to access an object that may not be defined, pointers can be the source of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls.
One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.
A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault or storage violation.
In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may be then reallocated and overwritten by unrelated code, unknown to the earlier code. It is claimed that languages with garbage collection prevent this type of error (because deallocation is performed automatically) but the pointer itself is not removed by the garbage collector and it may point to irrelevant and unpredictable data if re-used at any time after it has been deallocated.
Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.
Null pointer
A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
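Both conventions appear in the short C sketch below, in which a null next member marks the end of a list and a null return value signals that a search failed. The struct node type and the find_node function are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int          value;
    struct node *next;   /* NULL marks the end of the list */
};

/* Returns the first node holding `wanted`, or NULL to signal "not found". */
struct node *find_node(struct node *head, int wanted)
{
    for (struct node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->value == wanted)
            return cur;
    }
    return NULL;
}
```

Callers are expected to compare the result against NULL before dereferencing it.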
Null pointers are often considered similar to null values in relational databases, but they have somewhat different semantics. A null pointer in most programming languages means "no value", while a null value in a relational database means "unknown value". This leads to an important difference in practice: two null pointers are considered equal in most programming languages, but two null values in a relational database are not (since they represent unknown values, it is unknown whether they are equal).
In some programming language environments (at least one proprietary Lisp implementation, for example), the value used as the null pointer (called
In C, two null pointers of any type are guaranteed to compare equal. The macro
, and it is where the interrupt table is stored), but modern operating systems usually map virtual address spaces in such a way that accessing address zero is forbidden.
In C++, while the
A null pointer should not be confused with an uninitialized pointer: A null pointer is guaranteed to compare unequal to any pointer that points to a valid object. However, depending on the language and implementation, an uninitialized pointer has either an indeterminate (random or meaningless) value or a specific value that is not necessarily any kind of null pointer constant.
The null reference was invented by C.A.R. Hoare in 1965 as part of the Algol W language. Hoare later (2009) described his invention as a "billion-dollar mistake".
Because a null pointer does not point to a meaningful object, an attempt to dereference a null pointer usually causes a run-time error:
In languages with a tagged architecture, a possibly-null pointer can be replaced with a tagged union which enforces explicit handling of the exceptional case; in fact, a possibly-null pointer can be seen as a tagged pointer with a computed tag.
Autorelative pointer
The term autorelative pointer may refer to a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the structure itself, then the structure may be relocated in memory without having to update the value of that member.
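A rough C sketch of the idea follows. The struct record type, its members and the two functions are assumptions made for this example: the name_offset member stores the distance from its own address to the name field, so a byte-for-byte copy of the structure still resolves to its own name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A record whose `name_offset` member locates `name` relative to the
 * address of `name_offset` itself, rather than by absolute address. */
struct record {
    ptrdiff_t name_offset;
    char      name[16];
};

/* Copy `text` into the record and point the self-relative member at it. */
void record_init(struct record *r, const char *text)
{
    assert(r != NULL && text != NULL);
    strncpy(r->name, text, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    r->name_offset = (char *)r->name - (char *)&r->name_offset;
}

/* Resolve the offset against wherever the member currently lives. */
char *record_name(struct record *r)
{
    assert(r != NULL);
    return (char *)&r->name_offset + r->name_offset;
}
```

Had the member held an absolute address instead, a copy made with memcpy would still point into the original structure.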
The cited patent also uses the term self-relative pointer to mean the same thing. However, the meaning of that term has been used in other ways:
Based pointer
Double indirection
In some languages a pointer can reference another pointer, requiring two dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:
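The original listing is not reproduced here; the following is a minimal sketch of such a definition, together with an insertion routine that uses a pointer to a pointer. The names struct element and insert_sorted are assumptions for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct element {
    struct element *next;   /* NULL at the end of the list */
    int             value;
};

/* Insert `item` into the sorted list at *head, before the first element
 * whose value is not smaller. `p` always addresses the pointer that would
 * have to change (either *head or some element's `next`), so inserting at
 * the front needs no special case. */
void insert_sorted(struct element **head, struct element *item)
{
    assert(head != NULL && item != NULL);
    struct element **p = head;
    while (*p != NULL && (*p)->value < item->value)
        p = &(*p)->next;
    item->next = *p;
    *p = item;
}
```

A caller keeps a struct element *head initialised to NULL and passes &head, so the routine can replace the first element when necessary.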
This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list,
In this case, if the value of
Wild pointers
Wild pointers are pointers that have not been initialized (that is, a wild pointer has not had any address assigned to it) and may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.
The following example code shows a wild pointer:
Here,
.
Wild branch
Where a pointer is used as the address of the entry point to a program or start of a subroutine and is also either uninitialized or corrupted, if a call or jump is nevertheless made to this address, a "wild branch" is said to have occurred. The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.
Simulation using an array index
It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.
Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory).
Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31-bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.
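The technique is shown here in C purely for consistency with the other examples, although it is mainly of interest in languages without pointers. In this sketch the array memory stands in for the address space, an index stands in for a pointer, and the names sim_store and sim_load are hypothetical. The size is kept small for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { MEMORY_SIZE = 256 };

/* The array plays the role of memory; a size_t index plays the pointer. */
static unsigned char memory[MEMORY_SIZE];

/* "Dereference" for writing, with a bounds check. Returns 0 on a bad index. */
int sim_store(size_t index, unsigned char value)
{
    if (index >= MEMORY_SIZE)
        return 0;
    memory[index] = value;
    return 1;
}

/* "Dereference" for reading; stores the byte in *out. Returns 0 on a bad index. */
int sim_load(size_t index, unsigned char *out)
{
    assert(out != NULL);
    if (index >= MEMORY_SIZE)
        return 0;
    *out = memory[index];
    return 1;
}
```

Simulated pointer arithmetic is then ordinary integer arithmetic on the index, and the bounds check rejects any index outside the array.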
It is even theoretically possible, using the above technique together with a suitable instruction set simulator, to simulate any machine code or the intermediate (byte code) of any processor/language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and act upon entirely within the memory of the same array.
If necessary, to completely avoid buffer overflow problems, bounds checking can usually be performed by the compiler (or, if not, hand coded in the simulator).
Quotations
to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package
had support for STRPTR to return the address of a string, and for VARPTR to return the address of a variable. Visual Basic 5 also had support for OBJPTR to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.
Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have exhaustive pointer implementations, however. In FreeBASIC, arithmetic on
In C and C++, pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types, although the behavior is implementation-defined. A special pointer type called the "void pointer" allows pointing to any variable type, but is limited by the fact that it cannot be dereferenced directly. The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the
C++
fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. The C++ standard library
also provides
which can be used in some situations as a safe alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.
Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), though many non-segmented architectures will allow for more lenient arithmetic. Adding or subtracting from a pointer moves it by a multiple of the size of the datatype it points to. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers, which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address cannot be added to, although gcc and other compilers will perform byte arithmetic on void pointers as a non-standard extension.
Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. (though the
While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more criticism.
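The scaling behaviour of C pointer arithmetic described in this section can be seen in a small sketch. The functions sum_ints and stride_in_bytes are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sums `count` ints by walking a pointer from the first element to one
 * past the last; each `++` advances by sizeof(int) bytes, not one byte. */
long sum_ints(const int *first, size_t count)
{
    assert(first != NULL);
    long total = 0;
    for (const int *p = first; p != first + count; ++p)
        total += *p;
    return total;
}

/* Distance in bytes between an element and its successor. */
size_t stride_in_bytes(const int *element)
{
    assert(element != NULL);
    return (size_t)((const char *)(element + 1) - (const char *)element);
}
```

The expression first + count counts in elements, while the cast to const char * in stride_in_bytes exposes the underlying byte distance, which equals sizeof(int).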
The
K&R
C used
C++ does not allow the implicit conversion of
In C++, there is no
The syntax is essentially the same as in C++, and the address pointed can be either managed
or unmanaged
memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the
from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.
An exception to this is from using the
The .NET framework includes many classes and methods in the
types and pointers (for example,
.
programming language supports pointers to variables. Primitive or group (record) data objects declared within the
Memory space for each pointed-to data object is typically allocated dynamically using external
statements or via embedded extended language constructs such as
or
statements.
Extended versions of COBOL also provide pointer variables declared with
Some extended versions of COBOL also provide
.
, which are typed and do not allow any form of pointer arithmetic. The ECMA
standard for Eiffel includes an "attached type" mechanism that claims to guarantee void safety
.
introduced a strongly typed pointer capability. Fortran pointers contain more than just a simple memory address. They also encapsulate the lower and upper bounds of array dimensions, strides (for example, to support arbitrary array sections), and other metadata. An association operator,
Fortran-2003 adds support for procedure pointers. Also, as part of the C Interoperability feature, Fortran-2003 supports intrinsic functions for converting C-style pointers into Fortran pointers and back.
has pointers. Its declaration syntax is equivalent to that of C, but written the other way around, ending with the type. Unlike C, Go has garbage collection, and disallows pointer arithmetic. Reference types, like in C++, do not exist. Some built-in types, like maps and channels, are boxed (i.e. internally they are pointers to mutable structures), and are initialized using the
is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula-2 (such as Modula-3
) include garbage collection.
and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3
, garbage collection is a part of the language specification.
or C
. It also removes some risks caused by dangling pointers, but the ability to dynamically let go of referenced space by using the
) means that the risk of dangling pointers has not been entirely eliminated.
However, in some commercial and open source Pascal (or derivatives) compiler implementations —like Free Pascal
, Turbo Pascal
or the Object Pascal
in Embarcadero Delphi— a pointer is allowed to reference standard static or local variables and can be cast from one pointer type to another. Moreover pointer arithmetic is unrestricted: adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the
Perl supports pointers, although rarely used, in the form of the pack and unpack functions. These are intended only for simple interactions with compiled OS libraries. In all other cases, Perl uses references, which are typed and do not allow any form of pointer arithmetic. They are used to construct complex data structures.
See also
External links
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. For high-level programming languages, pointers effectively take the place of general purpose registers in low-level languages such as assembly language or machine code, but may be in available memory. A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer. A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others. As an analogy, a page number in a book could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number.
Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables and tree structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.
Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables.
While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability where this is not possible. Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address can cause a program to crash. To ameliorate this potential problem, as a matter of type safety, pointers are considered a separate type from the type of data they point to, even if the underlying representation is the same. Other measures may also be taken.
Formal description
In computer science, a pointer is a kind of reference.
A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).
A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.
In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the first byte of a datum is considered the memory address (or base memory address) of the entire datum.
A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.
More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather 'low-level' concept.
References serve as a level of indirection: a pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.
Use in data structures
When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).
A two-byte offset, containing a 16-bit unsigned integer, can be used to provide relative addressing for up to 64 kilobytes of a data structure. This can easily be extended to 128K, 256K or 512K if the address pointed to is forced to be on a half-word, word or double-word boundary (but requiring an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8 before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer a flat address space is preferred.
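The scaled-offset scheme can be sketched in C as follows. This is only an illustration under stated assumptions: the function names resolve_offset and max_reach are hypothetical, and the shift argument is assumed to be the base-2 logarithm of the alignment (0 for bytes, 1 for half-words, 2 for words, 3 for double-words).

```c
#include <stddef.h>
#include <stdint.h>

/* Resolve a 16-bit relative offset against a base address.
   shift is log2 of the alignment: 0 = byte, 1 = half-word,
   2 = word, 3 = double-word (hypothetical convention). */
void *resolve_offset(void *base, uint16_t offset, unsigned shift)
{
    /* The "shift left" step scales the offset by 1, 2, 4 or 8. */
    size_t byte_offset = (size_t)offset << shift;

    /* The one additional arithmetic operation: add the base. */
    return (unsigned char *)base + byte_offset;
}

/* Largest byte offset that a 16-bit offset can reach for a given shift. */
size_t max_reach(unsigned shift)
{
    return (size_t)UINT16_MAX << shift;
}
```

With a shift of 0 the reach stays just under 64K; each extra bit of shift doubles it, at the cost of requiring every target to sit on the corresponding boundary.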
A one-byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29'), can be used to point to an alternative integer value (or index) in an array (e.g. X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.
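A minimal sketch of this kind of translation in C is given below. The class names, the translate array and the helper functions are all hypothetical, chosen only to show a character code being used directly as an index and the resulting small index then selecting an address.

```c
#include <stddef.h>

/* Hypothetical character classes used as the "sequential index". */
enum { CLASS_OTHER = 0, CLASS_DIGIT = 1, CLASS_LETTER = 2 };

/* One entry per possible byte value; zero-initialized to CLASS_OTHER. */
static unsigned char translate[256];

/* The second step: the small index selects an address (here, a string). */
static const char *const class_names[] = { "other", "digit", "letter" };

void init_translate(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    for (int c = '0'; c <= '9'; c++)
        translate[c] = CLASS_DIGIT;
    for (const char *p = letters; *p != '\0'; p++)
        translate[(unsigned char)*p] = CLASS_LETTER;
}

/* The character's own code is the one-byte offset into the array. */
unsigned char classify(unsigned char ch)
{
    return translate[ch];
}

const char *class_name(unsigned char ch)
{
    return class_names[translate[ch]];
}
```

After init_translate() has run, classify('7') yields CLASS_DIGIT and classify(')') yields CLASS_OTHER, each with a single indexed load.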
Use in control tables
Control tables, which are used to control program flow, usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can, however, simply be indexes into other separate but associated tables comprising an array of the actual addresses, or they can be the addresses themselves (depending upon the programming language constructs available). They can also be used to point (back) to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.
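As a rough illustration, the following C sketch builds a small control table in which each entry holds a subroutine entry point (a function pointer) and the entry number to visit next. The table contents, the struct layout and the name run_table are hypothetical and not taken from any particular system.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int negate(int x)  { return -x; }
static int twice(int x)   { return x * 2; }

struct table_entry {
    handler_fn handler;  /* entry point of the subroutine to execute */
    int next;            /* entry number to go to next, or -1 to stop */
};

static const struct table_entry control_table[] = {
    { add_one,  2 },     /* entry 0: then skip forward over entry 1 */
    { negate,  -1 },     /* entry 1: not reached in this table      */
    { twice,   -1 },     /* entry 2: last step                      */
};

/* Interpret the table, starting at entry 0. */
int run_table(int value)
{
    int i = 0;
    while (i >= 0) {
        /* The entry number becomes an address by simple arithmetic. */
        const struct table_entry *e = &control_table[i];
        value = e->handler(value);
        i = e->next;
    }
    return value;
}
```

Here run_table(3) calls add_one and then twice, so it returns 8; changing only the data in control_table changes the flow without touching run_table.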
Architectural roots
Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word, effectively transforming all of memory into a very large array. Then, if we have an address, the system provides an operation to retrieve the value stored in the memory unit at that address (usually utilizing the machine's general purpose registers).
In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or because the architecture does not support such addresses. The first case may, in certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bits long and addresses only extend to 48 bits. There, pointers must conform to certain rules (canonical addresses), so if a noncanonical pointer is dereferenced, the processor raises a general protection fault.
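The canonical-address rule can be sketched in C. The check below assumes the 48-bit implementation described above, in which bits 63 through 48 must all be copies of bit 47; the function name is_canonical_48 is hypothetical, and the test is a software model of the rule rather than anything the processor exposes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the 48-bit canonical form: bits 63..47 must be
   either all zero or all one (17 bits in total). */
bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;
    return upper == 0 || upper == UINT64_C(0x1FFFF);
}
```

Under this model 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000, the first address past the lower half, is not.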
On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MiB of physical memory, could access up to 1 GiB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KiB in one data structure cumbersome. Some restrictions of ANSI pointer arithmetic may have been due to the segmented memory models of this processor family.
In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.
Uses
Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, and most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as in passing data between different parts of a program.
In functional programming languages that rely heavily on lists, pointers and references are managed abstractly by the language using internal constructs like cons.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation, which involves constructing a pointer to the desired data element in the array. If the length of each data element is a power of two, this arithmetic is usually much more efficient, because the multiplication by the element length can be done with a shift. Padding is frequently used as a mechanism for ensuring this is the case, despite the increased memory requirement. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
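The address calculation can be written out explicitly in C. The two helper functions below are hypothetical names for illustration: the first uses a general multiplication, the second assumes the element length is a power of two and replaces the multiplication with a shift. An optimizing compiler will normally make this substitution on its own.

```c
#include <stddef.h>

/* General case: base + index * element length. */
void *element_address(void *base, size_t index, size_t elem_size)
{
    return (unsigned char *)base + index * elem_size;
}

/* Power-of-two case: elem_size == (1 << log2_size), so the
   multiplication can be replaced by a left shift. */
void *element_address_pow2(void *base, size_t index, unsigned log2_size)
{
    return (unsigned char *)base + (index << log2_size);
}
```

For an array of 4-byte elements, element_address(a, 5, 4) and element_address_pow2(a, 5, 2) both yield the address of a[5].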
Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
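Both uses can be shown with a short C sketch. The function names swap_ints and divide are hypothetical; the second returns a status code and delivers its two results through pointer parameters.

```c
/* The caller sees the modification because the function
   writes through the pointers it was given. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* "Returns" two values through out-parameters.
   The return value is 0 on success, -1 on division by zero. */
int divide(int num, int den, int *quotient, int *remainder)
{
    if (den == 0)
        return -1;
    *quotient = num / den;
    *remainder = num % den;
    return 0;
}
```

A caller passes the addresses of its own variables, for example divide(17, 5, &q, &r), after which q is 3 and r is 2.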
Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).
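In C, this pattern typically uses the standard library functions malloc and free. The sketch below is a minimal example; the function name make_squares is hypothetical, and the caller is assumed to take ownership of the returned block.

```c
#include <stdlib.h>

/* Allocate an array of n ints and fill it with squares.
   Returns NULL if the allocation fails. The caller owns the
   block and must pass the same pointer to free() when done. */
int *make_squares(size_t n)
{
    int *values = malloc(n * sizeof *values);
    if (values == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        values[i] = (int)(i * i);
    return values;
}
```

A caller would write int *s = make_squares(5); use s[0] through s[4]; and then call free(s). Dropping the pointer without calling free is what produces the leak described above.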
C pointers
The basic syntax to define a pointer is:
int *ptr;
This declares ptr as the identifier of an object of the following type:
- pointer that points to an object of type int
This is usually stated more succinctly as 'ptr is a pointer to int.'
Because the C language does not specify an implicit initialization for objects of automatic storage duration, care should often be taken to ensure that the address to which ptr points is valid; this is why it is sometimes suggested that a pointer be explicitly initialized to the null pointer value, which is traditionally specified in C with the standardized macro NULL:
int *ptr = NULL;
Dereferencing a null pointer in C produces undefined behavior, which could be catastrophic. However, most implementations simply halt execution of the program in question, usually with a segmentation fault.
However, initializing pointers unnecessarily could hinder program analyses, thereby hiding bugs.
In any case, once a pointer has been declared, the next logical step is for it to point at something:
int a = 5;
int *ptr = NULL;
ptr = &a;
This sets the value of ptr to the address of a. For example, if a is stored at memory location 0x8130, then the value of ptr will be 0x8130 after the assignment. To dereference the pointer, an asterisk is used again:
*ptr = 8;
This means take the contents of ptr (which is 0x8130), "locate" that address in memory and set its value to 8. If a is later accessed again, its new value will be 8.
This example may be clearer if memory is examined directly.
Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32 bits wide. The following is what would be in memory after a has been initialized to 5 and ptr to NULL:
Address | Contents |
---|---|
0x8130 | 0x00000005 |
0x8134 | 0x00000000 |
(The NULL pointer shown here is 0x00000000.)
Assigning the address of a to ptr:
ptr = &a;
yields the following memory values:
Address | Contents |
---|---|
0x8130 | 0x00000005 |
0x8134 | 0x00008130 |
Then, by dereferencing ptr by coding:
*ptr = 8;
the computer will take the contents of ptr (which is 0x8130), 'locate' that address, and assign 8 to that location, yielding the following memory:
Address | Contents |
---|---|
0x8130 | 0x00000008 |
0x8134 | 0x00008130 |
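The three memory states tabulated above can be reproduced with a short C sketch. The function name assign_through_pointer is hypothetical, and the concrete addresses 0x8130 and 0x8134 are the text's assumptions, not something the code controls.

```c
#include <stddef.h>

/* Mirrors the walk-through: a starts at 5, ptr starts null,
 * ptr is pointed at a, and 8 is stored through ptr. */
int assign_through_pointer(void)
{
    int a = 5;
    int *ptr = NULL;

    ptr = &a;    /* ptr now holds the address of a (0x8130 in the tables) */
    *ptr = 8;    /* store 8 at the address held in ptr */

    return a;    /* a was modified by way of ptr */
}
```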
Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.
C arrays
In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that array[i] be equivalent to *(array + i). Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps), and the syntax for accessing arrays is identical to that which can be used to dereference pointers. For example, declaring an array named array with five int elements allocates a block of five integers and names the block array, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc, which returns a consecutive block of memory of no less than the requested size that can be used as an array.
While most operators on arrays and pointers are equivalent, it is important to note that the sizeof operator will differ. For a five-element int array array and a pointer ptr to its first element, sizeof(array) will evaluate to 5*sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.
Default values of an array can be declared with an initializer list, for example the five values 2, 4, 3, 1, and 5.
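A brief sketch of the sizeof difference, assuming an array initialized with the five values used in the memory layout that follows. The function name compare_sizes is hypothetical.

```c
#include <stddef.h>

/* Sets *array_bytes and *pointer_bytes to the two sizeof results. */
void compare_sizes(size_t *array_bytes, size_t *pointer_bytes)
{
    int array[5] = { 2, 4, 3, 1, 5 };  /* five ints with default values */
    int *ptr = array;                  /* array converts to &array[0] */

    *array_bytes = sizeof(array);      /* whole array: 5 * sizeof(int) */
    *pointer_bytes = sizeof(ptr);      /* just the pointer: sizeof(int *) */
}
```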
If array is assumed to be located in memory starting at address 0x1000 on a 32-bit little-endian machine, then memory will contain the following (values are in hexadecimal, like the addresses):
Address | 0 | 1 | 2 | 3 |
---|---|---|---|---|
1000 | 2 | 0 | 0 | 0 |
1004 | 4 | 0 | 0 | 0 |
1008 | 3 | 0 | 0 | 0 |
100C | 1 | 0 | 0 | 0 |
1010 | 5 | 0 | 0 | 0 |
Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.
The syntax for C with pointers is:
- array means 0x1000
- array+1 means 0x1004 (note that the "+1" really means to add one times the size of an int (4 bytes), not literally "plus one")
- *array means to dereference the contents of array. Considering the contents as a memory address (0x1000), look up the value at that location (0x0002).
- array[i] means element number i, 0-based, of array, which is translated into *(array + i)
The last example is how to access the contents of array. Breaking it down:
- array + i is the memory location of the (i+1)th element of array
- *(array + i) takes that memory address and dereferences it to access the value.
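The equivalences listed above can be checked with a small C sketch. The function name indexing_matches_arithmetic is hypothetical; the array contents follow the example values in the text.

```c
#include <stddef.h>

/* Returns 1 if indexing and pointer arithmetic agree for every element. */
int indexing_matches_arithmetic(void)
{
    int array[5] = { 2, 4, 3, 1, 5 };
    size_t i;

    for (i = 0; i < 5; i++) {
        if (array[i] != *(array + i)) {   /* same element, two spellings */
            return 0;
        }
        if (&array[i] != array + i) {     /* same address as well */
            return 0;
        }
    }
    /* array + 1 is one int past array, not one byte past it */
    return (char *)(array + 1) - (char *)array == (ptrdiff_t)sizeof(int);
}
```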
E.g. array[3] is synonymous with *(array+3), meaning *(0x1000 + 3*sizeof(int)), which says "dereference the value stored at 0x100C", in this case 0x0001.
C linked list
A linked list can be defined in C as a structure that holds a datum together with a pointer to the next structure of the same type, with the empty list represented by NULL or some other sentinel value.
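One possible C definition of such a linked list is sketched here. The names EMPTY_LIST, struct link and list_length are illustrative assumptions, as is the choice of a void * datum.

```c
#include <stddef.h>

/* The empty list is represented by a sentinel value, here NULL. */
#define EMPTY_LIST NULL

struct link {
    void        *data;  /* data carried by this link */
    struct link *next;  /* next link, or EMPTY_LIST if there is none */
};

/* Counts the links reachable from list. */
size_t list_length(const struct link *list)
{
    size_t n = 0;

    while (list != EMPTY_LIST) {
        n++;
        list = list->next;
    }
    return n;
}
```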
Note that this pointer-recursive definition is essentially the same as the reference-recursive definition from the Haskell programming language:
data Link a = Nil
            | Cons a (Link a)
Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.
The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
Pass-by-address using pointers
Pointers can be used to pass variables by their address, allowing their value to be changed. For example, in C or C++ a function that receives an int by value modifies only its own copy, whereas a function that receives a pointer to the int can change the caller's variable through that pointer.
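A minimal sketch of the distinction, written in C (the same code is also valid C++). The function names not_alter and alter are illustrative.

```c
/* Receives a copy of n; the caller's variable is unchanged. */
void not_alter(int n)
{
    n = 360;
    (void)n;   /* silence unused-value warnings */
}

/* Receives the address of the caller's variable and writes through it. */
void alter(int *n)
{
    *n = 120;
}
```

Calling not_alter(x) leaves x as it was, while alter(&x) sets x to 120.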
Dynamic memory allocation
Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.
In C, structure objects can be dynamically allocated and then referenced through the returned pointer. The standard C library provides the function malloc for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.
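The following sketch shows one way a structure object might be allocated with malloc and later released with free. The struct item layout and the helper names make_item and destroy_item are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

struct item {
    int    id;
    char   name[32];
    double cost;
};

/* Allocates and initializes an item; returns NULL if allocation fails. */
struct item *make_item(int id, const char *name, double cost)
{
    struct item *it = malloc(sizeof *it);

    if (it == NULL) {
        return NULL;
    }
    it->id = id;
    strncpy(it->name, name, sizeof it->name - 1);
    it->name[sizeof it->name - 1] = '\0';   /* always terminate the copy */
    it->cost = cost;
    return it;
}

/* Returns the item's memory to the heap; a null argument is allowed. */
void destroy_item(struct item *it)
{
    free(it);
}
```

Every successful make_item call should eventually be paired with exactly one destroy_item call on the returned pointer.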
Dynamically allocated memory objects must be explicitly deallocated, i.e., returned to the heap or free store. The standard C library provides the function free for deallocating a previously allocated memory block and returning it to the heap.
Memory-mapped hardware
On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.
Assigning addresses to pointers is an invaluable tool when programming microcontrollers. A simple example is declaring a pointer of type int and initialising it to a hexadecimal address, such as the constant 0x7FFF, by casting the constant to the pointer type.
In the mid 80s, using the BIOS to access the video capabilities of PCs was slow. Display-intensive applications typically accessed CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would store the 16-bit value 0x1F00 | 'A' in the array element for that row and column.
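The character-cell write described above can be sketched in C as follows. So that the example runs on a hosted system, an ordinary array (fake_video_memory, an assumption) stands in for the memory at 0xB8000; the helper names put_cell and write_letter_a and the 25-row screen height are likewise illustrative.

```c
#include <stdint.h>

#define TEXT_COLUMNS 80
#define TEXT_ROWS    25

/* Stand-in for video memory so the sketch runs on a hosted system.
 * On the original hardware the base would instead be obtained by
 * casting 0xB8000 to the same pointer-to-array type. */
static uint16_t fake_video_memory[TEXT_ROWS][TEXT_COLUMNS];

/* Writes one character cell; row and column are 1-based as in the text. */
void put_cell(uint16_t (*screen)[TEXT_COLUMNS], int row, int column,
              uint8_t attribute, char ch)
{
    screen[row - 1][column - 1] =
        (uint16_t)((attribute << 8) | (uint8_t)ch);
}

/* Attribute 0x1F: blue background (1), bright white foreground (F). */
uint16_t write_letter_a(void)
{
    put_cell(fake_video_memory, 5, 2, 0x1F, 'A');
    return fake_video_memory[4][1];
}
```

On real memory-mapped hardware the pointer would normally also be qualified volatile so that the compiler does not optimize the stores away.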
Typed pointers and casting
In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.
For example, in C, given the declarations int *money; and char *bags;, money would be an integer pointer and bags would be a char pointer. The assignment bags = money; would yield a compiler warning of "assignment from incompatible pointer type" under GCC because money and bags were declared with different types. To suppress the compiler warning, it must be made explicit that the assignment is intended by typecasting it, as in bags = (char *)money;, which says to cast the integer pointer of money to a char pointer and assign to bags.
A 2005 draft of the C standard (6.3.2.3 Pointers, par. 7) allows a pointer to one object type to be converted to a pointer to a different object type, but states that the behavior is undefined if the resulting pointer is not correctly aligned for the pointed-to type.
In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This makes it easy to compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if the money array starts at 0x2000 and sizeof(int) is 4 bytes whereas sizeof(char) is 1 byte, then (money+1) will point to 0x2004 but (bags+1) will point to 0x2001. Other risks of casting include loss of data when "wide" data is written to "narrow" locations (e.g. bags[0]=65537;), unexpected results when bit-shifting values, and comparison problems, especially with signed vs unsigned values.
Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.
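The effect of the pointed-to type on pointer arithmetic can be sketched in C as follows. The function names int_step and char_step are hypothetical; money and bags follow the names used in the text.

```c
#include <stddef.h>

/* Byte distance between money + 1 and money for an int pointer. */
ptrdiff_t int_step(void)
{
    int money[4] = { 0 };

    return (char *)(money + 1) - (char *)money;
}

/* Byte distance between bags + 1 and bags after casting to char *. */
ptrdiff_t char_step(void)
{
    int money[4] = { 0 };
    char *bags = (char *)money;   /* explicit cast, as in the text */

    return (bags + 1) - bags;
}
```

On an implementation where sizeof(int) is 4, int_step yields 4 and char_step yields 1, matching the 0x2004 and 0x2001 addresses in the example.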
Making pointers safer
Because a pointer allows a program to attempt to access an object that may not be defined, pointers can be the source of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls.
One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.
A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault or storage violation.
In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may then be reallocated and overwritten by unrelated code, unknown to the earlier code. Languages with garbage collection prevent this type of error because deallocation is performed automatically, and only once no references to the object remain.
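In C, one common mitigation (a convention, not a language guarantee) is to reset a pointer to NULL at the moment its memory is freed. The helper name free_and_clear in this sketch is hypothetical.

```c
#include <stdlib.h>

/* Frees *pp and resets the caller's pointer so it cannot dangle. */
void free_and_clear(int **pp)
{
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a no-op, so this is safe to repeat */
        *pp = NULL;
    }
}
```

This only protects the one pointer variable passed in; any other copies of the same address still dangle after the call.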
Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.
Null pointer
A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
Null pointers are often considered similar to null values in relational databases, but they have somewhat different semantics. A null pointer in most programming languages means "no value", while a null value in a relational database means "unknown value". This leads to an important difference in practice: two null pointers are considered equal in most programming languages, but two null values in a relational database are not (since they represent unknown values, it is unknown whether they are equal).
In some programming language environments (at least one proprietary Lisp implementation, for example), the value used as the null pointer (called nil in Lisp) may actually be a pointer to a block of internal data useful to the implementation (but not explicitly reachable from user programs), thus allowing the same register to be used as a useful constant and a quick way of accessing implementation internals. This is known as the nil vector.
In C, two null pointers of any type are guaranteed to compare equal. The macro NULL is defined as an implementation-defined null pointer constant, which in C99 can be portably expressed as the integer value 0 converted implicitly or explicitly to the type void*. Note, though, that the physical address zero is often directly accessible by hardware (for instance, it is directly accessible in x86 real mode, and it is where the interrupt table is stored), but modern operating systems usually map virtual address spaces in such a way that accessing address zero is forbidden.
In C++, while the NULL macro was inherited from C, the integer literal for zero has been traditionally preferred to represent a null pointer constant. However, C++11 has introduced an explicit nullptr constant to be used instead.
A null pointer should not be confused with an uninitialized pointer: a null pointer is guaranteed to compare unequal to any pointer that points to a valid object. However, depending on the language and implementation, an uninitialized pointer has either an indeterminate (random or meaningless) value or a specific value that is not necessarily any kind of null pointer constant.
The null reference was invented by C.A.R. Hoare in 1965 as part of the ALGOL W language. Hoare later (2009) described his invention as a "billion-dollar mistake".
Because a null pointer does not point to a meaningful object, an attempt to dereference a null pointer usually causes a run-time error:
- In C, the behavior of dereferencing a null pointer is undefined. Many implementations cause such code to result in the program being halted with a segmentation fault, because the null pointer representation is chosen to be an address that is never allocated by the system for storing objects. However, this behavior is not universal.
- In Java, access to a null reference triggers a NullPointerException, which can be caught by error handling code, but the preferred practice is to ensure that such exceptions never occur.
- In Objective-C, messages may be sent to a nil object (which is essentially a null pointer) without causing the program to be interrupted; the message is simply ignored, and the return value (if any) is nil or 0, depending on the type.
In languages with a tagged architecture, a possibly-null pointer can be replaced with a tagged union which enforces explicit handling of the exceptional case; in fact, a possibly-null pointer can be seen as a tagged pointer with a computed tag.
Autorelative pointer
The term autorelative pointer may refer to a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the structure itself, then the structure may be relocated in memory without having to update the value of that member.
The cited patent also uses the term self-relative pointer to mean the same thing. However, that term has also been used in other ways:
- It is often used to mean an offset from the address of a structure rather than from the address of the pointer itself.
- It has been used to mean a pointer containing its own address, which can be useful for reconstructing in any arbitrary region of memory a collection of data structures that point to each other.
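The first sense of the term, an offset measured from the pointer member's own address, can be sketched in C as follows. The struct record layout and the helper names record_init and record_target are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct record {
    ptrdiff_t self_offset;  /* byte offset from this member to payload */
    int       padding[3];
    int       payload;
};

/* Stores the distance from the offset member itself to the payload. */
void record_init(struct record *r, int value)
{
    r->payload = value;
    r->self_offset = (char *)&r->payload - (char *)&r->self_offset;
}

/* Resolves the autorelative pointer relative to its own address. */
int *record_target(struct record *r)
{
    return (int *)((char *)&r->self_offset + r->self_offset);
}
```

Because the stored value is relative to the member's own location, a byte-for-byte copy of the structure (for example with memcpy) still resolves to the payload of the copy rather than that of the original.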
Based pointer
A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.
Double indirection
In some languages a pointer can reference another pointer, requiring two dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list.
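A sketch of such an element type, together with a sorted insertion that takes the address of the list head (a pointer to a pointer). The names struct element and insert are illustrative.

```c
#include <stddef.h>

struct element {
    struct element *next;
    int             value;
};

/* Inserts item into the ascending list whose head pointer is *head.
 * p always points at the pointer that should be redirected to item,
 * whether that is the caller's head or some element's next member. */
void insert(struct element **head, struct element *item)
{
    struct element **p;

    for (p = head; *p != NULL; p = &(*p)->next) {
        if (item->value <= (*p)->value) {
            break;
        }
    }
    item->next = *p;
    *p = item;
}
```

A caller holding struct element *head = NULL; would invoke it as insert(&head, item);.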
Such an implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list, head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection (passing the address of head to the insertion function) allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list. In a sorted insertion written this way, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.
Wild pointers
Wild pointers are pointers that have not been initialized (that is, a wild pointer has not had any address assigned to it) and may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.
For example, if a char pointer p2 is declared without an initializer, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b' can corrupt an unknown area of memory or trigger a segmentation fault.
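The situation can be sketched in C as follows. The offending store is left commented out so that the example remains well defined; the function name write_through_initialized_pointer and the variable storage are hypothetical.

```c
#include <stddef.h>

/* Returns the character written through an initialized pointer. */
char write_through_initialized_pointer(void)
{
    char storage = 'a';
    char *p2;            /* wild: no address has been assigned yet */

    /* *p2 = 'b';           would be undefined behavior at this point */

    p2 = &storage;       /* now p2 refers to a valid object */
    *p2 = 'b';           /* well-defined: modifies storage */
    return storage;
}
```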
Wild branch
Where a pointer is used as the address of the entry point to a program or start of a subroutine and is also either uninitialized or corrupted, if a call or jump is nevertheless made to this address, a "wild branch" is said to have occurred. The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.
Simulation using an array index
It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.
Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array), and any index to it can be thought of as equivalent to a general purpose register in assembly language (one that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory).
Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31-bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.
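The idea can be illustrated in C itself, treating a byte array as the simulated memory and a plain index as the simulated pointer. The array name memory, its reduced size, and the helpers load, store and copy_string are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MEMORY_SIZE 1024u   /* small stand-in for the 16 MB example */

static unsigned char memory[MEMORY_SIZE];

/* "Dereference" helpers: the index plays the role of the pointer. */
unsigned char load(size_t index)
{
    assert(index < MEMORY_SIZE);   /* hand-coded bounds check */
    return memory[index];
}

void store(size_t index, unsigned char value)
{
    assert(index < MEMORY_SIZE);
    memory[index] = value;
}

/* Copies a NUL-terminated string using only index arithmetic. */
size_t copy_string(size_t dst, size_t src)
{
    size_t n = 0;

    while (load(src + n) != 0) {
        store(dst + n, load(src + n));
        n++;
    }
    store(dst + n, 0);
    return n;
}
```

Adding to or subtracting from an index corresponds to pointer arithmetic, and the bounds checks in load and store show where a language without pointers can reject out-of-range accesses.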
It is even theoretically possible, using the above technique together with a suitable instruction set simulator, to simulate any machine code or the intermediate code (byte code) of any processor or language in another language that does not support pointers at all (for example Java or JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and execute entirely within the memory of the same array.
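A toy version of such a simulator might look like the sketch below. The four-instruction machine (OP_HALT, OP_LOAD, OP_ADD, OP_STORE) and its two-byte instruction format are invented purely for illustration and do not correspond to any real processor; the point is only that program and data live in the same byte array and every address is an index into it.

```c
#include <stdint.h>

/* Hypothetical 4-instruction machine, invented for this sketch. */
enum {
    OP_HALT  = 0,  /* stop                           */
    OP_LOAD  = 1,  /* LOAD addr : acc = mem[addr]    */
    OP_ADD   = 2,  /* ADD addr  : acc += mem[addr]   */
    OP_STORE = 3   /* STORE addr: mem[addr] = acc    */
};

enum { SIM_MEM = 256 };

/* Run the program stored at mem[0..]. Returns 0 when OP_HALT is reached,
   or -1 on an unknown opcode or when the program counter leaves the array. */
static int sim_run(uint8_t mem[SIM_MEM])
{
    uint8_t  acc = 0;   /* single accumulator register */
    unsigned pc  = 0;   /* program counter: an index, not a real pointer */

    for (;;) {
        if (pc >= SIM_MEM)
            return -1;
        uint8_t op = mem[pc];
        if (op == OP_HALT)
            return 0;
        if (pc + 1 >= SIM_MEM)
            return -1;
        uint8_t addr = mem[pc + 1];   /* a uint8_t is always below SIM_MEM */
        switch (op) {
        case OP_LOAD:  acc = mem[addr];                  break;
        case OP_ADD:   acc = (uint8_t)(acc + mem[addr]); break;
        case OP_STORE: mem[addr] = acc;                  break;
        default:       return -1;
        }
        pc += 2;   /* every non-halt instruction is opcode + operand */
    }
}
```

Loading the bytes for LOAD 100, ADD 101, STORE 102, HALT at the start of the array, with the operands 2 and 3 at indexes 100 and 101, leaves their sum at index 102 after sim_run returns.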
If necessary, to completely avoid buffer overflow problems, bounds checking can usually be enabled in the compiler (or, if not, hand-coded in the simulator).
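Where the compiler offers no such option, hand-coded bounds checking can be as simple as routing every access through a pair of checked accessors. The sketch below assumes a hypothetical fixed-size array named arena; the function names are likewise invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { ARENA_SIZE = 64 };
static uint8_t arena[ARENA_SIZE];

/* Checked read: writes the byte to *out and returns true only if index is in range. */
static bool checked_load(size_t index, uint8_t *out)
{
    if (index >= ARENA_SIZE)
        return false;
    *out = arena[index];
    return true;
}

/* Checked write: returns false, leaving the array untouched, if index is out of range. */
static bool checked_store(size_t index, uint8_t value)
{
    if (index >= ARENA_SIZE)
        return false;
    arena[index] = value;
    return true;
}
```

Because the index is unsigned, a single comparison against ARENA_SIZE also rejects values that would have been negative before conversion.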
Support in various programming languages
Ada
Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.
BASIC
Several old versions of BASIC had support for STRPTR to return the address of a string, and for VARPTR to return the address of a variable. Visual Basic 5 also had support for OBJPTR to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.
Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have exhaustive pointer implementations, however. In FreeBASIC, arithmetic on ANY pointers (equivalent to C's void*) is performed as though the pointed-to type were one byte wide. As in C, ANY pointers cannot be dereferenced. Also, casting between ANY and any other type's pointers will not generate any warnings.
C and C++
In C and C++ pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types, although the behavior is implementation-defined. A special pointer type called the “void pointer” allows pointing to any variable type, but is limited by the fact that it cannot be dereferenced directly. The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the uintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it.
C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. The C++ standard library also provides auto_ptr (deprecated since C++11 in favor of unique_ptr), a sort of smart pointer which can be used in some situations as a safe alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.
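The following C sketch illustrates two of the points above: a void pointer can carry the address of any object but must be converted before use, and a pointer can be round-tripped through uintptr_t. The helper names swap_bytes and round_trip are invented for the example, and the code assumes an implementation that provides the optional uintptr_t type.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic swap through void pointers. The caller supplies the element size,
   because a void* carries no type information and cannot be dereferenced. */
static void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;   /* implicit void* conversion: valid in C, an error in C++ */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}

/* Convert a pointer to an integer and back. The standard guarantees that a
   void* converted to uintptr_t and back compares equal to the original. */
static int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

Calling swap_bytes(&x, &y, sizeof x) on two int variables exchanges their values, and round_trip(&x) yields a pointer equal to &x. What the intermediate integer value looks like is implementation-defined.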
Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), though many non-segmented architectures will allow for more lenient arithmetic. Adding or subtracting from a pointer moves it by a multiple of the size of the datatype it points to. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer by 4 bytes. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers, which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed-to address cannot be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension. To work "directly" with bytes, programmers usually cast pointers to BYTE*, or to unsigned char* if BYTE is not defined in the standard library used.
Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes (though because char is defined as always having a size of one byte, the element offset of char pointer arithmetic is in practice equal to a byte offset). In particular, the C definition explicitly declares that the syntax a[n], which is the n-th element of the array a, is equivalent to *(a+n), which is the content of the element pointed to by a+n. This implies that n[a] is equivalent to a[n], and one can write, e.g., a[3] or 3[a] equally well to access the fourth element of an array a.
While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more criticism.
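The element-based stepping and the byte-based view of pointer arithmetic described in this section can be contrasted in a short C sketch. The function names sum_ints and byte_distance are invented for the example.

```c
#include <stddef.h>

/* Sum n ints by stepping a pointer. Each p++ advances by sizeof(int) bytes,
   and first + n is the one-past-the-end position the standard permits. */
static int sum_ints(const int *first, size_t n)
{
    int total = 0;
    for (const int *p = first; p != first + n; p++)
        total += *p;
    return total;
}

/* Distance between two elements of the same array measured in bytes,
   obtained by viewing both addresses as unsigned char pointers. */
static ptrdiff_t byte_distance(const int *from, const int *to)
{
    return (const unsigned char *)to - (const unsigned char *)from;
}
```

For an array int a[4], the expressions a[3], *(a + 3) and 3[a] all name the same element, &a[1] - &a[0] is 1 element, and byte_distance(&a[0], &a[1]) is sizeof(int) bytes.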
The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store the address of an object of any data type, and, in C, is implicitly converted to any other pointer type on assignment, but it must be explicitly cast if dereferenced inline.
K&R C used char* for the “type-agnostic pointer” purpose (before ANSI C).
C++ does not allow the implicit conversion of void* to other pointer types, not even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other ill-advised casts.
In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.
C#
In the C# programming language, pointers are supported only under certain conditions: any block of code including pointers must be marked with the unsafe keyword. Such blocks usually require higher security permissions than pointerless code to be allowed to run.
The syntax is essentially the same as in C++, and the address pointed to can be in either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed-to object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.
An exception to this is the IntPtr structure, which is a safe managed equivalent to int* and does not require unsafe code. This type is often returned when using methods from the System.Runtime.InteropServices namespace.
The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void *) to allow communication with unmanaged code.
COBOL
The COBOL programming language supports pointers to variables. Primitive or group (record) data objects declared within the LINKAGE SECTION of a program are inherently pointer-based, with space for the address of the data object (typically a single memory word) allocated as the actual (implicit hidden) data item. In program source code, these variables are used just like any other WORKING-STORAGE variable, but their contents are implicitly accessed indirectly through their pointers.
Memory space for each pointed-to data object is typically allocated dynamically using external CALL statements or via embedded extended language constructs such as EXEC CICS or EXEC SQL statements.
Extended versions of COBOL also provide pointer variables declared with USAGE IS POINTER clauses. The values of such pointer variables are established and modified using SET and SET ADDRESS statements.
Some extended versions of COBOL also provide PROCEDURE-POINTER variables, which are capable of storing the addresses of executable code.
D
The D programming language is a derivative of C and C++ which fully supports C pointers and C typecasting.
Eiffel
The Eiffel object-oriented language supports pointers in the form of references, which are typed and do not allow any form of pointer arithmetic. The ECMA standard for Eiffel includes an "attached type" mechanism that claims to guarantee void safety.
Fortran
Fortran-90 introduced a strongly typed pointer capability. Fortran pointers contain more than just a simple memory address. They also encapsulate the lower and upper bounds of array dimensions, strides (for example, to support arbitrary array sections), and other metadata. An association operator, =>, is used to associate a POINTER with a variable which has a TARGET attribute. The Fortran-90 ALLOCATE statement may also be used to associate a pointer with a block of memory; pointers associated in this way can be used, for example, to define and create a linked list structure.
Fortran-2003 adds support for procedure pointers. Also, as part of the C Interoperability feature, Fortran-2003 supports intrinsic functions for converting C-style pointers into Fortran pointers and back.
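To give a feel for a pointer that carries bounds and stride along with an address, the following C sketch defines a hypothetical one-dimensional descriptor. The struct array_view and its helpers are invented for illustration only and are not the layout used by any particular Fortran compiler; the section built by view_section is similar in spirit to associating a Fortran pointer with an array section such as a(1:6:2).

```c
#include <stddef.h>

/* Hypothetical descriptor, loosely modeled on the metadata listed above. */
struct array_view {
    double   *base;    /* address of the element at index `lower` */
    ptrdiff_t lower;   /* lowest valid index */
    ptrdiff_t upper;   /* highest valid index */
    ptrdiff_t stride;  /* distance between consecutive elements, in elements */
};

/* Return the address of element i, or NULL if i is outside [lower, upper]. */
static double *view_at(const struct array_view *v, ptrdiff_t i)
{
    if (i < v->lower || i > v->upper)
        return NULL;
    return v->base + (i - v->lower) * v->stride;
}

/* Build a view of every step-th element of a[0..n-1], indexed from 1.
   Assumes n >= 0 and step > 0. */
static struct array_view view_section(double *a, ptrdiff_t n, ptrdiff_t step)
{
    struct array_view v = { a, 1, (n + step - 1) / step, step };
    return v;
}
```

With a six-element array and a step of 2, the view has indexes 1 to 3 that map to the first, third and fifth elements of the underlying array, and any other index is rejected.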
Go
Go has pointers. Its declaration syntax is equivalent to that of C, but written the other way around, ending with the type. Unlike C, Go has garbage collection, and disallows pointer arithmetic. Reference types, like in C++, do not exist. Some built-in types, like maps and channels, are boxed (i.e. internally they are pointers to mutable structures), and are initialized using the make function. As a different approach (from reference types) to unified syntax between pointers and non-pointers, the arrow (->) operator has been dropped: it is possible to use the dot operator directly on a pointer to a data type to access a field or method of the dereferenced value, as if the dot operator were used on the underlying data type. This, however, only works with one level of indirection.
Modula-2
Pointers are implemented very much as in Pascal, as are VAR parameters in procedure calls. Modula-2 is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula-2 (such as Modula-3) include garbage collection.
Oberon
Much as with Modula-2, pointers are available. There are still fewer ways to evade the type system and so Oberon and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3, garbage collection is a part of the language specification.
Pascal
Unlike many languages that feature pointers, standard ISO Pascal only allows pointers to reference dynamically created variables that are anonymous and does not allow them to reference standard static or local variables. It does not have pointer arithmetic. Pointers also must have an associated type and a pointer to one type is not compatible with a pointer to another type (e.g. a pointer to a char is not compatible with a pointer to an integer). This helps eliminate the type security issues inherent with other pointer implementations, particularly those used for PL/I or C. It also removes some risks caused by dangling pointers, but the ability to dynamically let go of referenced space by using the dispose standard procedure (which has the same effect as the free library function found in C) means that the risk of dangling pointers has not been entirely eliminated.
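The same hazard exists with free in C, and a common (partial) mitigation there is to clear the pointer at the moment the storage is released. The sketch below uses invented helper names, make_int and release_int, to show the idea.

```c
#include <stdlib.h>

/* Allocate one int on the heap; returns NULL if allocation fails. */
static int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Free the object and clear the caller's pointer, so later code can test
   for NULL instead of dereferencing storage that has been released. */
static void release_int(int **pp)
{
    free(*pp);    /* free(NULL) is a no-op, so a repeated release is harmless */
    *pp = NULL;
}
```

This only protects the one variable passed to release_int. Any other copy of the same address still dangles after the call, which is why the risk cannot be removed entirely by convention alone.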
However, in some commercial and open source Pascal (or derivative) compiler implementations, such as Free Pascal, Turbo Pascal or the Object Pascal in Embarcadero Delphi, a pointer is allowed to reference standard static or local variables and can be cast from one pointer type to another. Moreover, pointer arithmetic is unrestricted: adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the Inc or Dec standard procedures with it moves the pointer by the size of the datatype it is declared to point to.
Perl
The Perl programming language supports pointers, although they are rarely used, in the form of the pack and unpack functions. These are intended only for simple interactions with compiled OS libraries. In all other cases, Perl uses references, which are typed and do not allow any form of pointer arithmetic. They are used to construct complex data structures.
See also
- Address constant
- Bounded pointer
- Buffer overflow
- Function pointer
- Hazard pointer
- Opaque pointer
- Pointer swizzling
- Reference (computer science)
- Static code analysis
- Storage violation
- Variable (programming)
External links
- Pointers and Memory Introduction to pointers – Stanford Computer Science Education Library
- 0pointer.de A terse list of minimum length source codes that dereference a null pointer in several different programming languages
- "The C book" – containing pointer examples in ANSI C.