Polymorphic code
In computer terminology, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code (its semantics) will not change at all. This technique is sometimes used by computer viruses, shellcodes and computer worms to hide their presence.
Encryption is the most common method to hide code. With encryption, the main body of the code (also called its payload) is encrypted and will appear meaningless. For the code to function as before, a decryption function is added to the code. When the code is executed, this function reads the payload and decrypts it before executing it in turn.
Encryption alone is not polymorphism. To gain polymorphic behavior, the encryptor/decryptor pair is mutated with each copy of the code. This allows many different versions of the same code, all of which function identically.
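The encryption step described above can be sketched in a few lines of Python. This is a minimal illustration on an inert byte string, not code from any real sample: the helper name `xor_bytes`, the single-byte keys and the stand-in payload are all assumptions made for the example. It shows only the encryption half (different keys give different stored bytes); it does not mutate the decryptor, so by the definition above it is not polymorphic.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


# Inert stand-in for a payload; in this sketch it is just text.
payload = b"the same payload in every copy"

# Two "copies" stored under different keys look different when stored...
copy_a = xor_bytes(payload, 0x5A)
copy_b = xor_bytes(payload, 0xC3)

# ...but applying the same key again restores the identical payload,
# because (x ^ k) ^ k == x for every byte x and key k.
restored_a = xor_bytes(copy_a, 0x5A)
restored_b = xor_bytes(copy_b, 0xC3)

print(copy_a != copy_b)                     # True
print(restored_a == restored_b == payload)  # True
```

XOR is used here because the same operation both encodes and decodes, which keeps the sketch short; it is not a secure cipher.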
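Malicious code
Most anti-virus software and intrusion detection systems (IDS) attempt to locate malicious code by searching through computer files and data packets sent over a computer network. If the security software finds patterns that correspond to known computer viruses or worms, it takes appropriate steps to neutralize the threat. Polymorphic algorithms make it difficult for such software to recognise the offending code because it constantly mutates.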
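Why a fixed pattern stops matching once the bytes change can be seen in a small defensive sketch. Everything here is hypothetical: the `scan` function, the `SIGNATURES` table and the `MARKER-1234` pattern are invented for illustration, and real scanners use far richer matching than a substring test. Python 3.9 or later is assumed for the built-in generic type hints.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


def scan(blob: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures that occur as substrings of blob."""
    return [name for name, pattern in signatures.items() if pattern in blob]


# Hypothetical signature database: name -> byte pattern.
SIGNATURES = {"Demo.Marker": b"MARKER-1234"}

plain_file = b"header " + b"MARKER-1234" + b" trailer"
encoded_file = xor_bytes(plain_file, 0x21)  # same content, different bytes

print(scan(plain_file, SIGNATURES))    # ['Demo.Marker']
print(scan(encoded_file, SIGNATURES))  # []
```

The second scan finds nothing even though the underlying content is unchanged, which is the gap that the pattern analysis and emulation described in this article try to close.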
Malicious programmers have sought to protect their encrypted code from this virus-scanning strategy by rewriting the unencrypted decryption engine (and the resulting encrypted payload) each time the virus or worm is propagated. Anti-virus software uses sophisticated pattern analysis to find underlying patterns within the different mutations of the decryption engine, in hopes of reliably detecting such malware.
Emulation may be used to defeat polymorphic obfuscation by letting the malware demangle itself in a virtual environment before utilising other methods, such as traditional signature scanning. Such a virtual environment is sometimes called a sandbox. Polymorphism does not protect the virus against such emulation if the decrypted payload remains the same regardless of variation in the decryption algorithm. Metamorphic code techniques may be used to complicate detection further, as the virus may execute without ever having identifiable code blocks in memory that remain constant from infection to infection.
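The idea of scanning after the sample has decoded itself can be modelled as a toy in Python. This is a sketch under strong assumptions: a real emulator executes machine code in an isolated environment, whereas here each "sample" is just stored bytes paired with a Python function standing in for its decoder. The names `make_xor_sample`, `make_add_sample`, `scan_static`, `scan_after_emulation` and the `SIGNATURE` pattern are all hypothetical.

```python
from typing import Callable, Tuple

SIGNATURE = b"MARKER-1234"  # hypothetical pattern for the decoded payload

Sample = Tuple[bytes, Callable[[bytes], bytes]]


def make_xor_sample(payload: bytes, key: int) -> Sample:
    """Store payload XOR-encoded, together with its matching decoder."""
    stored = bytes(b ^ key for b in payload)
    return stored, lambda data: bytes(b ^ key for b in data)


def make_add_sample(payload: bytes, delta: int) -> Sample:
    """Store payload with delta added to each byte (mod 256), plus decoder."""
    stored = bytes((b + delta) % 256 for b in payload)
    return stored, lambda data: bytes((b - delta) % 256 for b in data)


def scan_static(stored: bytes) -> bool:
    """Signature scan on the bytes exactly as stored."""
    return SIGNATURE in stored


def scan_after_emulation(stored: bytes, decoder: Callable[[bytes], bytes]) -> bool:
    """Run the sample's own decoder on a copy, then scan the output."""
    decoded = decoder(bytes(stored))
    return SIGNATURE in decoded


payload = b"xx MARKER-1234 xx"
samples = [
    make_xor_sample(payload, 0x11),
    make_xor_sample(payload, 0x7E),
    make_add_sample(payload, 42),
]

print([scan_static(s) for s, _ in samples])              # [False, False, False]
print([scan_after_emulation(s, d) for s, d in samples])  # [True, True, True]
```

The three samples use different decoders, yet all are caught after decoding because the decoded payload is constant. That is the weakness of plain polymorphism noted above, and the one metamorphic techniques aim to remove.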
The first known polymorphic virus was written by Mark Washburn. The virus, called 1260, was written in 1990. A better-known polymorphic virus was created in 1992 by the hacker Dark Avenger (a pseudonym) as a means of avoiding pattern recognition from antivirus software. A common and very virulent polymorphic virus is the file infector Virut.
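Example
This example is not truly polymorphic code, but it introduces encryption with the XOR operator. An algorithm that uses, for example, the variables A and B but not the variable C could stay intact even if you added lots of code that changed the contents of the variable C.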
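Encrypted:
    lots of encrypted code
    ...
Decryption_Code:
    C = C + 1                  ; junk: C is never used by the decryptor
    A = Encrypted              ; A points to the start of the encrypted code
Loop:
    B = *A                     ; load the next encrypted value
    C = 3214 * A               ; junk
    B = B XOR CryptoKey        ; decrypt it
    *A = B                     ; store it back in place
    C = 1                      ; junk
    C = A + B                  ; junk
    A = A + 1                  ; advance to the next value
    GOTO Loop IF NOT A = Decryption_Code
    C = C^2                    ; junk
    GOTO Encrypted             ; run the decrypted payload
CryptoKey:
    some_random_number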
The encrypted code is the payload. To make different versions of the code, the garbage lines that manipulate C are changed in each copy. The code inside "Encrypted" ("lots of encrypted code") can search the code between Decryption_Code and CryptoKey and replace each algorithm it finds with new code that does the same thing. Usually the coder uses a zero key (for example, A xor 0 = A) for the first generation of the virus, which makes development easier because with this key the code is not encrypted. The coder then implements an incremental key algorithm or a random one.
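Two points from this example, that statements touching only C cannot change the result and that a zero key leaves data unchanged, can be checked with a short Python sketch. The functions `decode_v1` and `decode_v2` are hypothetical illustrations operating on an inert byte string; the junk statements are arbitrary and loosely echo the pseudocode.

```python
def decode_v1(data: bytes, key: int) -> bytes:
    """XOR-decode data; statements on c are junk and never affect the output."""
    out = bytearray()
    c = 0
    for b in data:
        c = c + 1         # junk
        out.append(b ^ key)
        c = 3214 * c      # junk
    return bytes(out)


def decode_v2(data: bytes, key: int) -> bytes:
    """Same algorithm as decode_v1 with different junk statements."""
    out = bytearray()
    c = 7
    for b in data:
        c = (c * c) % 251  # different junk
        out.append(b ^ key)
    c = 1                  # junk
    return bytes(out)


data = b"example bytes"

# Different source text, identical behaviour.
print(decode_v1(data, 0x3C) == decode_v2(data, 0x3C))  # True

# Zero key: x ^ 0 == x, so the data passes through unchanged.
print(decode_v1(data, 0) == data)                      # True
```

Because the two functions differ textually but compute the same result, a byte pattern taken from one would not necessarily match the other.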
See also
- Timeline of notable computer viruses and worms
- Metamorphic code
- Self-modifying code
- Alphanumeric code
- Shellcode
- Software cracking
- Security cracking
- Obfuscated code
- Oligomorphic code