Bytecode
Bytecode, also known as p-code (portable code), is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code. Since instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware instructions; virtual stack machines are the most common, but virtual register machines have also been built. Different parts may often be stored in separate files, similar to object modules, but dynamically loaded during execution.
The name bytecode stems from instruction sets which have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode may be output by programming language implementations to ease interpretation, or they may be used to reduce hardware and operating system dependence by allowing the same code to run on different platforms. Bytecode may be either directly executed on a virtual machine (i.e. an interpreter) or further compiled into machine code for better performance.
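The "one-byte opcode followed by optional parameters" layout can be sketched in Python as follows. The instruction set here (PUSH, ADD, MUL, HALT), its opcode numbers, and the one-byte operand width are all hypothetical, invented purely for illustration; they do not correspond to any real virtual machine.

```python
# Hypothetical instruction set: opcode numbers and operand counts are
# made up for illustration and match no real virtual machine.
OPCODES = {
    0x01: ("PUSH", 1),  # one-byte opcode followed by a one-byte operand
    0x02: ("ADD", 0),   # no operands
    0x03: ("MUL", 0),
    0xFF: ("HALT", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Decode a byte string into readable instruction text."""
    lines = []
    pc = 0  # offset of the next opcode byte
    while pc < len(code):
        name, n_operands = OPCODES[code[pc]]
        operands = code[pc + 1 : pc + 1 + n_operands]
        lines.append(" ".join([name, *map(str, operands)]))
        pc += 1 + n_operands
    return lines


program = bytes([0x01, 2, 0x01, 3, 0x02, 0xFF])  # PUSH 2, PUSH 3, ADD, HALT
print(disassemble(program))
```

Because each opcode fixes how many operand bytes follow it, the decoder can step through the stream without any delimiters, which is one reason such encodings are compact.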
Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.
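As one concrete illustration of "numeric codes, constants, and references", CPython exposes the result of compiling source text as a code object. The sketch below uses the built-in `compile` function and the standard `dis` module; the source string and the `<example>` file name are arbitrary choices, and the exact instructions printed differ between CPython versions.

```python
import dis

source = 'greeting = "hello"\nprint(greeting)\n'

# Parse and analyse the source text, producing a code object.
code = compile(source, "<example>", "exec")

print(type(code.co_code))  # the raw instruction bytes
print(code.co_consts)      # constants referenced by the instructions
print(code.co_names)       # names referenced by the instructions

# Human-readable listing; the exact opcodes vary between CPython versions.
dis.dis(code)
```

The instructions refer to constants and names by index into these tables rather than by repeating the text, so the parsing work is done once, ahead of execution.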
Execution
A bytecode program may be executed by decoding and directly executing its instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators or "just-in-time" (JIT) compilers, translate bytecode into machine language as necessary at runtime. This ties the virtual machine to a particular hardware platform, but the bytecode itself remains portable. For example, Java and Smalltalk code is typically stored in bytecode format and then JIT-compiled to machine code before execution. Compilation introduces a delay before the program runs, but it improves execution speed considerably compared to direct interpretation of the source code, often by orders of magnitude.
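The "one instruction at a time" style of execution can be sketched as a small stack-machine interpreter in Python. The opcodes below are hypothetical and invented for this sketch; a JIT compiler would instead translate the same byte stream into native machine code, which is not shown here.

```python
# Hypothetical opcodes, invented for this sketch.
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def run(code: bytes) -> int:
    """Interpret a bytecode program one instruction at a time."""
    stack = []
    pc = 0  # program counter: index of the next opcode
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # operand byte follows the opcode
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")


# (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run(program))  # 20
```

Nothing in the loop depends on the host processor, which is why an interpreter of this kind is easy to port: only the interpreter itself has to be rebuilt for a new platform.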
Because of this performance advantage, many language implementations today execute a program in two phases: first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are therefore virtual machines for Java, Python, PHP, Forth, and Tcl. The implementations of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.
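The two strategies can be contrasted in a few lines of Python. This is only an illustration of the general idea, not a description of how Perl or Ruby 1.8 are implemented: the `walk` function and the `BINARY_OPS` table are hypothetical helpers written for this sketch, while `ast.parse`, `compile`, and `eval` are standard Python facilities.

```python
import ast
import operator

# Binary operators this sketch understands; anything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def walk(node: ast.AST) -> int:
    """Evaluate an arithmetic expression by walking its syntax tree."""
    if isinstance(node, ast.Expression):
        return walk(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


source = "(2 + 3) * 4"

# Approach 1: walk the abstract syntax tree directly.
tree = ast.parse(source, mode="eval")
tree_result = walk(tree)

# Approach 2: compile to bytecode first, then hand it to the virtual machine.
code = compile(source, "<example>", "eval")
vm_result = eval(code)

print(tree_result, vm_result)
```

Both approaches produce the same value; they differ in what is traversed at run time, a tree of nodes in the first case and a flat instruction sequence in the second.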
Examples
- ActionScript executes in the ActionScript Virtual Machine (AVM), which is part of Flash Player and AIR. ActionScript code is typically transformed into bytecode format by a compiler. Examples of compilers include the one built in to Adobe Flash Professional and the one that is built in to Adobe Flash Builder and available in the Adobe Flex SDK.
- Adobe Flash objects
- BANCStar, originally bytecode for an interface-building tool but used as a language in its own right.
- Byte Code Engineering Library
- C to Java Virtual Machine compilers
- CLISP implementation of Common Lisp compiles only to bytecode
- CMUCL and Scieneer Common Lisp implementations of Common Lisp can compile either to bytecode or to native code; bytecode is much more compact
- Dalvik bytecode, designed for the Android platform, is executed by the Dalvik virtual machine.
- EiffelStudio for the Eiffel programming language
- Emacs is a text editor with a majority of its functionality implemented by its specific dialect of Lisp. These features are compiled into bytecode. This architecture allows users to customize the editor with a high-level language, which after compilation into bytecode yields reasonable performance.
- Embeddable Common Lisp implementation of Common Lisp can compile to bytecode or C code
- Ericsson implementation of Erlang uses BEAM bytecodes
- Icon and Unicon programming languages
- Infocom used the Z-machine to make its software applications more portable.
- Java bytecode, which is executed by the Java Virtual Machine
- ASM
- BCEL
- Javassist
- JMangler
- LLVM, a modular bytecode compiler and virtual machine
- Lua, using a register-based virtual machine, can also precompile its scripts with luac, so that small, fast systems need not include the compiler.
- m-code of the MATLAB programming language
- Managed code such as Microsoft .NET Common Intermediate Language, executed by the .NET Common Language Runtime (CLR)
- O-code of the BCPL programming language
- Objective Caml (OCaml) programming language optionally compiles to a compact bytecode form
- p-code of the UCSD Pascal implementation of the Pascal programming language
- Parrot virtual machine
- The R environment for statistical computing offers a byte code compiler through the compiler package, standard as of R version 2.13.0. It is possible to compile this version of R so that the base and recommended packages take advantage of this.
- Scheme 48 implementation of Scheme using a bytecode interpreter
- Bytecodes of many implementations of the Smalltalk programming language
- The SPIN interpreter built into the Parallax Propeller microcontroller
- SWEET16
- Visual FoxPro compiles to bytecode
- YARV and Rubinius for Ruby.