Interrupt storm
Encyclopedia
In operating system
s, an interrupt storm is an event during which a processor receives an inordinate number of interrupt
s that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.
processing is typically a non-preemptible task in time-sharing
operating system
s, an interrupt storm will cause low perceived system responsiveness, or even appear to be a complete system freeze. This state is commonly known as live lock. In such a state, the system is spending so much time processing interrupts that it is not completing any other work. Therefore, it does not appear to be processing anything at all, because of a lack of output to the user, the network, or otherwise. An interrupt storm is sometimes mistaken for thrashing, since they both have similar symptoms, but different causes.
An interrupt storm can have many different causes, including misconfigured or faulty hardware devices, faulty device drivers, or flaws in the operating system. Most modern hardware implement methods for reducing or eliminating the possibility of an interrupt storm. For example, many Ethernet
controllers implement interrupt "rate limiting", which causes the controller to wait a programmable minimum amount of time between each interrupt it generates.
The most common interrupt storm is a faulty driver
under an APIC
(Advanced Programmable Interrupt Controller) where a device "behind" another signals an interrupt to the APIC. The OS then asks each driver on that interrupt if it was from its hardware. Faulty drivers may always claim "yes", but then proceed no further as the hardware attached actually did not interrupt. The device which originally interrupted did not get its interrupt serviced, so interrupts again and the cycle begins anew. Many operating systems, e.g. Linux
, lock dead under an interrupt storm; others have mechanisms to avoid it. This was (and remains) a problem on the SoundBlaster Live! series of sound card
s on some motherboard
s; only a kernel debugger
can break the storm by unloading the faulty driver.
Many OSes implement a polling
mode that disables interrupts for devices which generate too many interrupts. In this mode, the OS periodically queries the hardware for pending tasks. As the number of interrupts increase and the efficiency of an interrupt mode diminishes, an OS may change the interrupting device from an interrupt mode to a polling mode. Likewise, as the polling mode becomes less efficient than the interrupt mode, the OS will switch the device back to the interrupt mode. The implementation of interrupt rate limiting in hardware almost negates the need for such polling modes.
controller with interrupt rate limiting will buffer the packets it receives from the network in between each interrupt. If the rate is set too high, the controller's buffer will overflow, and packets will be dropped. The rate must take into account how fast the buffer may fill between interrupts, and the interrupt latency
between the interrupt and the transfer of the buffer to the system.
detects interrupt storms and masks problematic interrupt for some time.
An example of a hardware-based approach is the one used by NAPI, where the system (driver) starts in interrupt enabled state. The Interrupt handler
then disables the interrupt and lets a thread/task handle the event(s) and then task polls the device, processing some number of events and enabling the interrupt.
Another interesting approach using hardware support is one where the device generates interrupt when the event queue state changes from "empty" to "not empty". Then, if there are no free DMA descriptors at the RX FIFO tail, the device drops the event. The event is then added to the tail and the FIFO entry is marked as occupied. If at that point entry (tail−1) is free (cleared), an interrupt will be generated (level interrupt) and the tail pointer will be incremented. If the hardware requires the interrupt be acknowledged, the CPU (interrupt handler) will do that, handle the valid DMA descriptors at the head, and return from the interrupt.
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
s, an interrupt storm is an event during which a processor receives an inordinate number of interrupt
Interrupt
In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
s that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.
Background
Because interruptInterrupt
In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
processing is typically a non-preemptible task in time-sharing
Time-sharing
Time-sharing is the sharing of a computing resource among many users by means of multiprogramming and multi-tasking. Its introduction in the 1960s, and emergence as the prominent model of computing in the 1970s, represents a major technological shift in the history of computing.By allowing a large...
operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
s, an interrupt storm will cause low perceived system responsiveness, or even appear to be a complete system freeze. This state is commonly known as live lock. In such a state, the system is spending so much time processing interrupts that it is not completing any other work. Therefore, it does not appear to be processing anything at all, because of a lack of output to the user, the network, or otherwise. An interrupt storm is sometimes mistaken for thrashing, since they both have similar symptoms, but different causes.
An interrupt storm can have many different causes, including misconfigured or faulty hardware devices, faulty device drivers, or flaws in the operating system. Most modern hardware implement methods for reducing or eliminating the possibility of an interrupt storm. For example, many Ethernet
Ethernet
Ethernet is a family of computer networking technologies for local area networks commercially introduced in 1980. Standardized in IEEE 802.3, Ethernet has largely replaced competing wired LAN technologies....
controllers implement interrupt "rate limiting", which causes the controller to wait a programmable minimum amount of time between each interrupt it generates.
The most common interrupt storm is a faulty driver
Device driver
In computing, a device driver or software driver is a computer program allowing higher-level computer programs to interact with a hardware device....
under an APIC
Advanced Programmable Interrupt Controller
In computing, an Advanced Programmable Interrupt Controller is a more complex Programmable Interrupt Controller than Intel's original types such as the 8259A...
(Advanced Programmable Interrupt Controller) where a device "behind" another signals an interrupt to the APIC. The OS then asks each driver on that interrupt if it was from its hardware. Faulty drivers may always claim "yes", but then proceed no further as the hardware attached actually did not interrupt. The device which originally interrupted did not get its interrupt serviced, so interrupts again and the cycle begins anew. Many operating systems, e.g. Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
, lock dead under an interrupt storm; others have mechanisms to avoid it. This was (and remains) a problem on the SoundBlaster Live! series of sound card
Sound card
A sound card is an internal computer expansion card that facilitates the input and output of audio signals to and from a computer under control of computer programs. The term sound card is also applied to external audio interfaces that use software to generate sound, as opposed to using hardware...
s on some motherboard
Motherboard
In personal computers, a motherboard is the central printed circuit board in many modern computers and holds many of the crucial components of the system, providing connectors for other peripherals. The motherboard is sometimes alternatively known as the mainboard, system board, or, on Apple...
s; only a kernel debugger
Debugger
A debugger or debugging tool is a computer program that is used to test and debug other programs . The code to be examined might alternatively be running on an instruction set simulator , a technique that allows great power in its ability to halt when specific conditions are encountered but which...
can break the storm by unloading the faulty driver.
Many OSes implement a polling
Polling (computer science)
Polling, or polled operation, in computer science, refers to actively sampling the status of an external device by a client program as a synchronous activity. Polling is most often used in terms of input/output , and is also referred to as polled or software driven .Polling is sometimes used...
mode that disables interrupts for devices which generate too many interrupts. In this mode, the OS periodically queries the hardware for pending tasks. As the number of interrupts increase and the efficiency of an interrupt mode diminishes, an OS may change the interrupting device from an interrupt mode to a polling mode. Likewise, as the polling mode becomes less efficient than the interrupt mode, the OS will switch the device back to the interrupt mode. The implementation of interrupt rate limiting in hardware almost negates the need for such polling modes.
History
Perhaps the first interrupt storm occurred during the Apollo 11's lunar descent in 1969.Considerations
Interrupt rate limiting must be carefully configured for optimum results. For example, an EthernetEthernet
Ethernet is a family of computer networking technologies for local area networks commercially introduced in 1980. Standardized in IEEE 802.3, Ethernet has largely replaced competing wired LAN technologies....
controller with interrupt rate limiting will buffer the packets it receives from the network in between each interrupt. If the rate is set too high, the controller's buffer will overflow, and packets will be dropped. The rate must take into account how fast the buffer may fill between interrupts, and the interrupt latency
Interrupt latency
In real-time operating systems, interrupt latency is the time between the generation of an interrupt by a device and the servicing of the device which generated the interrupt. For many operating systems, devices are serviced as soon as the device's interrupt handler is executed...
between the interrupt and the transfer of the buffer to the system.
Interrupt mitigating
There are hardware-based and software-based approaches to the problem. For an example of a software-based approach, FreeBSDFreeBSD
FreeBSD is a free Unix-like operating system descended from AT&T UNIX via BSD UNIX. Although for legal reasons FreeBSD cannot be called “UNIX”, as the direct descendant of BSD UNIX , FreeBSD’s internals and system APIs are UNIX-compliant...
detects interrupt storms and masks problematic interrupt for some time.
An example of a hardware-based approach is the one used by NAPI, where the system (driver) starts in interrupt enabled state. The Interrupt handler
Interrupt handler
An interrupt handler, also known as an interrupt service routine , is a callback subroutine in microcontroller firmware, operating system or device driver whose execution is triggered by the reception of an interrupt...
then disables the interrupt and lets a thread/task handle the event(s) and then task polls the device, processing some number of events and enabling the interrupt.
Another interesting approach using hardware support is one where the device generates interrupt when the event queue state changes from "empty" to "not empty". Then, if there are no free DMA descriptors at the RX FIFO tail, the device drops the event. The event is then added to the tail and the FIFO entry is marked as occupied. If at that point entry (tail−1) is free (cleared), an interrupt will be generated (level interrupt) and the tail pointer will be incremented. If the hardware requires the interrupt be acknowledged, the CPU (interrupt handler) will do that, handle the valid DMA descriptors at the head, and return from the interrupt.
See also
- Broadcast radiationBroadcast radiationBroadcast radiation is the accumulation of broadcast and multicast traffic on a computer network. Extreme amounts of broadcast traffic constitute a broadcast storm. A broadcast storm can consume sufficient network resources so as to render the network unable to transport normal traffic.-Causes:Most...
- Inter-processor interruptInter-Processor InterruptAn inter-processor interrupt is a special type of interrupt by which one processor may interrupt another processor in a multiprocessor system. IPIs are typically used to implement a cache coherency synchronization point.- Windows :...
(IPI) - Non-maskable interruptNon-Maskable interruptA non-maskable interrupt is a computer processor interrupt that cannot be ignored by standard interrupt masking techniques in the system. It is typically used to signal attention for non-recoverable hardware errors...
(NMI) - Programmable Interrupt ControllerProgrammable Interrupt ControllerIn computing, a programmable interrupt controller is a device that is used to combine several sources of interrupt onto one or more CPU lines, while allowing priority levels to be assigned to its interrupt outputs. When the device has multiple interrupt outputs to assert, it will assert them in...
(PIC)