Symbol table
Encyclopedia
In computer science
, a symbol table is a data structure
used by a language translator such as a compiler
or interpreter
, where each identifier
in a program's source code
is associated with information relating to its declaration or appearance in the source, such as its type
, scope
level and sometimes its location
.
implementation. A compiler may use one large symbol table for all symbols or use separated, hierarchical symbol tables for different scopes
.
will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.
A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example, during an interactive debugging session
, or as a resource for formatting a diagnostic report during or after execution
of a program.
While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped
or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
At the time of accessing variables and allocating memory dynamically, a compiler should perform many works and as such the extended stack model requires the symbol table.
nm
utility. There is one data symbol, (noted by the "D" type), and many functions (self defined as well as from the standard library). The first column is where the symbol is located in the memory, the second is "The symbol type" and the third is the name of the symbol. By passing suitable parameters, the symbol table was made to sort on basis of address.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a symbol table is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
used by a language translator such as a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
or interpreter
Interpreter (computing)
In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...
, where each identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...
in a program's source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
is associated with information relating to its declaration or appearance in the source, such as its type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...
, scope
Scope (programming)
In computer programming, scope is an enclosing context where values and expressions are associated. Various programming languages have various types of scopes. The type of scope determines what kind of entities it can contain and how it affects them—or semantics...
level and sometimes its location
Memory address
A digital computer's memory, more specifically main memory, consists of many memory locations, each having a memory address, a number, analogous to a street address, at which computer programs store and retrieve, machine code or data. Most application programs do not directly read and write to...
.
Implementation
A common implementation technique is to use a hash tableHash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
implementation. A compiler may use one large symbol table for all symbols or use separated, hierarchical symbol tables for different scopes
Scope (programming)
In computer programming, scope is an enclosing context where values and expressions are associated. Various programming languages have various types of scopes. The type of scope determines what kind of entities it can contain and how it affects them—or semantics...
.
Uses
An object fileObject file
An object file is a file containing relocatable format machine code that is usually not directly executable. Object files are produced by an assembler, compiler, or other language translator, and used as input to the linker....
will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.
A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example, during an interactive debugging session
Debugger
A debugger or debugging tool is a computer program that is used to test and debug other programs . The code to be examined might alternatively be running on an instruction set simulator , a technique that allows great power in its ability to halt when specific conditions are encountered but which...
, or as a resource for formatting a diagnostic report during or after execution
Execution (computers)
Execution in computer and software engineering is the process by which a computer or a virtual machine carries out the instructions of a computer program. The instructions in the program trigger sequences of simple actions on the executing machine...
of a program.
While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped
Strip (Unix)
In Unix and Unix-like operating systems, the strip program removes unnecessary information from executable binary programs and object files, thus potentially resulting in better performance and sometimes significantly less disk space usage...
or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
At the time of accessing variables and allocating memory dynamically, a compiler should perform many works and as such the extended stack model requires the symbol table.
Example
The symbol table of a small program is listed below. The table itself was generated using the GNU binutils'GNU Binary Utilities
The GNU Binary Utilities, or binutils, comprise a collection of programming tools for the manipulation of object code in various object file formats. The current versions were originally written by programmers at Cygnus Solutions using the Binary File Descriptor library...
nm
Nm (Unix)
The nm command ships with a number of later versions of Unix and similar operating systems. nm is used to examine binary files and to display the contents of those files, or meta information stored in them, specifically the symbol table...
utility. There is one data symbol, (noted by the "D" type), and many functions (self defined as well as from the standard library). The first column is where the symbol is located in the memory, the second is "The symbol type" and the third is the name of the symbol. By passing suitable parameters, the symbol table was made to sort on basis of address.
Address | Type | Name |
---|---|---|
00000020 | a | T_BIT |
00000040 | a | F_BIT |
00000080 | a | I_BIT |
20000004 | t | irqvec |
20000008 | t | fiqvec |
2000000c | t | InitReset |
20000018 | T | _main |
20000024 | t | End |
20000030 | T | AT91F_US3_CfgPIO_useB |
2000005c | t | AT91F_PIO_CfgPeriph |
200000b0 | T | main |
20000120 | T | AT91F_DBGU_Printk |
20000190 | t | AT91F_US_TxReady |
200001c0 | t | AT91F_US_PutChar |
200001f8 | T | AT91F_SpuriousHandler |
20000214 | T | AT91F_DataAbort |
20000230 | T | AT91F_FetchAbort |
2000024c | T | AT91F_Undef |
20000268 | T | AT91F_UndefHandler |
20000284 | T | AT91F_LowLevelInit |
200002e0 | t | AT91F_DBGU_CfgPIO |
2000030c | t | AT91F_PIO_CfgPeriph |
20000360 | t | AT91F_US_Configure |
200003dc | t | AT91F_US_SetBaudrate |
2000041c | t | AT91F_US_Baudrate |
200004ec | t | AT91F_US_SetTimeguard |
2000051c | t | AT91F_PDC_Open |
2000059c | t | AT91F_PDC_DisableRx |
200005c8 | t | AT91F_PDC_DisableTx |
200005f4 | t | AT91F_PDC_SetNextTx |
20000638 | t | AT91F_PDC_SetNextRx |
2000067c | t | AT91F_PDC_SetTx |
200006c0 | t | AT91F_PDC_SetRx |
20000704 | t | AT91F_PDC_EnableRx |
20000730 | t | AT91F_PDC_EnableTx |
2000075c | t | AT91F_US_EnableTx |
20000788 | T | __aeabi_uidiv |
20000788 | T | __udivsi3 |
20000884 | T | __aeabi_uidivmod |
2000089c | T | __aeabi_idiv0 |
2000089c | T | __aeabi_ldiv0 |
2000089c | T | __div0 |
200009a0 | D | _data |
200009a0 | A | _etext |
200009a0 | D | holaamigosh |
200009a4 | A | __bss_end__ |
200009a4 | A | __bss_start |
200009a4 | A | __bss_start__ |
200009a4 | A | _edata |
200009a4 | A | _end |