Machine Check Exception
A Machine Check Exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects a hardware problem.

Microsoft Windows displays the error using the blue screen of death, which contains the error message (the parameters inside the brackets vary):
STOP: 0x0000009C (0x00000004, 0x00000000, 0xB2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"
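
As an illustration, the STOP line above can be split into its stop code and four parameters with a short script. This is a minimal sketch, not an official tool: the function names are hypothetical, and the idea that parameters 3 and 4 are the high and low halves of one 64-bit status value is an assumption based on how this stop code is commonly documented for processors with machine-check banks.

```python
import re

# Matches "STOP: 0x... (p1, p2, p3, p4)" as shown on the blue screen.
STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop_line(line):
    """Return (stop_code, [parameters]) as integers."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("not a STOP line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def combine_status(params):
    """Assumption: parameters 3 and 4 are the high and low 32 bits
    of a single 64-bit status value."""
    return (params[2] << 32) | params[3]


line = ('STOP: 0x0000009C (0x00000004, 0x00000000, 0xB2000000, 0x00020151) '
        '"MACHINE_CHECK_EXCEPTION"')
code, params = parse_stop_line(line)
print(hex(code), [hex(p) for p in params])
print(f"{combine_status(params):#018x}")
```

Under that assumption, the sample message would carry the 64-bit value 0xb200000000020151; the first parameter would then be a bank number, but this depends on the processor and should be checked against the vendor documentation.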

On Linux, a process (such as klogd) writes a message to the kernel log and/or the console screen (usually only to the console when the error is non-recoverable and the machine crashes as a result):
CPU 0: Machine Check Exception: 0000000000000004
Bank 2: f200200000000863
Kernel panic: CPU context corrupt
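
The two hexadecimal values in this log can be broken into flag bits by hand. The sketch below assumes that the first value is the global machine-check status register and that the "Bank 2" value is that bank's 64-bit status register, using the bit positions given in Intel's machine-check architecture documentation. The function and table names are hypothetical, and other processors may lay out the bits differently.

```python
# Assumed bit positions (Intel machine-check architecture).
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_FLAGS = {
    63: "VAL",    # register contains valid information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def set_flags(value, names):
    """List the names of all flag bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(status):
    """Split a 64-bit bank status value into flags and error codes."""
    return {
        "flags": set_flags(status, MCI_FLAGS),
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


mcg = int("0000000000000004", 16)
bank2 = int("f200200000000863", 16)
print(set_flags(mcg, MCG_FLAGS))
print(decode_bank_status(bank2))
```

If these assumptions hold, the PCC flag in the bank status corresponds to the "CPU context corrupt" text of the kernel panic line.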


The error usually results from the failure or overstressing of hardware components, in cases where the fault cannot be identified more specifically by a different error message. Diagnosing the error can be difficult, although Intel Pentium processors generate more specific codes, which can be decoded with the manufacturer's help.

MCEs require a restart of the system before users can continue normal operation: they often indicate a long-term problem of a general nature.

Problem types

Most of these errors relate specifically to the Pentium processor family. Similar errors may occur on other processors and will cause similar problems.

Some of the main hardware problems that cause MCEs include:
  • System bus errors (errors in communication between the processor and the motherboard).
  • Memory errors, which may include parity / error correction code (ECC) problems. Error checking ensures that data is stored correctly in the RAM; if information is corrupted, then random errors occur.
  • Cache errors in the processor; the cache stores important data and code. If this is corrupted, errors often occur.
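
The problem types above roughly correspond to classes of 16-bit machine-check error codes reported by the processor. The following is a rough sketch of such a classification, assuming the compound error code bit patterns from Intel's machine-check architecture documentation; the function name is hypothetical, the table is incomplete, and other processor families may use different encodings.

```python
def classify_mca_error_code(code):
    """Classify a 16-bit machine-check error code.

    Assumed Intel compound-code patterns (F is a filtering bit, ignored):
      000F 1PPT RRRR IILL  bus and interconnect errors
      000F 0001 RRRR TTLL  cache hierarchy errors
      000F 0000 1MMM CCCC  memory controller errors
      000F 0000 0001 TTLL  TLB errors
      000F 0000 0000 11LL  generic cache hierarchy errors
    """
    code &= 0xFFFF
    if (code & 0xE800) == 0x0800:
        return "bus/interconnect"
    if (code & 0xEF00) == 0x0100:
        return "cache hierarchy"
    if (code & 0xEF80) == 0x0080:
        return "memory controller"
    if (code & 0xEFF0) == 0x0010:
        return "TLB"
    if (code & 0xEFFC) == 0x000C:
        return "generic cache hierarchy"
    return "other"


# Low 16 bits of the sample values in the Linux and Windows messages.
for sample in (0x0863, 0x0151):
    print(hex(sample), classify_mca_error_code(sample))
```

The two sample inputs are the low 16 bits of the Bank 2 value in the Linux log and of the last parameter in the Windows STOP message; whether those fields really hold an error code of this form depends on the processor that produced them.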

Causes

Common causes of MCEs include overheating and incorrect hardware installation. Some specific user-induced causes include:
  • overclocking (which normally increases heat output)
  • poorly fitted heatsinks or computer fans (the same problem can happen with excessive dust in the CPU fan)
  • an overloaded internal or external power supply (fixable by upgrading)


Computer software can also cause MCEs (normally by corrupting data which programs read or write). For example, software that reads from or writes to non-existent memory regions can confuse the processor and/or the system bus.

Decoding MCEs

As noted previously, decoding MCE errors can prove difficult. Normally the manufacturer (especially processor manufacturers) will be able to provide information about specific codes. Consult the Intel 64 and IA-32 Architectures Software Developer's Manual Chapter 15 (Machine-Check Architecture), or the Microsoft KB Article on Windows Exceptions.

Programs to Decode MCEs

mcat: A Windows command-line program from AMD to decode MCEs from AMD K8, Family 0x10 and 0x11 processors
mcelog: A Linux daemon by Andi Kleen to handle MCEs for modern x86 processors. mcelog can also decode machine checks.
parsemce: A Linux program by Dave Jones to decode MCEs from AMD K7 processors
mced: A Linux program by Tim Hockin to gather MCEs from the kernel and alert interested applications. Unlike the other programs, it is a daemon (it is always running), so it can receive MCE notifications as soon as the kernel finds them. It does not try to interpret the MCE data; it only alerts other applications.

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 