TCP congestion avoidance algorithm
Encyclopedia
Transmission Control Protocol
(TCP) uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start
in order to achieve congestion avoidance.
The TCP congestion avoidance algorithm is the primary basis for congestion control in the Internet.
and the city of Reno, Nevada
). The “Tahoe” algorithm first appeared in 4.3BSD-Tahoe (which was made to support the CCI Power 6/32 “Tahoe” minicomputer), and was made available to non-AT&T licensees as part of the “4.3BSD Networking Release 1”; this ensured its wide distribution and implementation. Improvements, described below, were made in 4.3BSD-Reno and subsequently released to the public as “Networking Release 2” and later 4.4BSD-Lite. The “TCP Foo” names for the algorithms appear to have originated in a 1996 paper by Kevin Fall and Sally Floyd.
, limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP's sliding window
used for flow control. TCP uses a mechanism called slow start
to increase the congestion window after a connection is initialized and after a timeout. It starts with a window of two times the maximum segment size
(MSS). Although the initial rate is low, the rate of increase is very rapid: for every packet acknowledged, the congestion window increases by 1 MSS so that the congestion window effectively doubles for every round trip time
(RTT). When the congestion window exceeds a threshold ssthresh the algorithm enters a new state, called congestion avoidance. In some implementations (e.g., Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.
Congestion avoidance: As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time. When a packet is lost, the likelihood of duplicate ACKs being received is very high (it's possible though unlikely that the stream just underwent extreme packet reordering, which would also prompt duplicate ACKs). The behavior of Tahoe and Reno differ in how they detect and react to packet loss:
Fast Recovery. (Reno Only) In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.
Both algorithms reduce congestion window to 1 MSS on a timeout event.
researchers Larry Peterson and Lawrence Brakmo
introduced TCP Vegas, in which timeouts were set and round-trip delays were measured for every packet in the transmit buffer. In addition, TCP Vegas uses additive increases in the congestion window. This variant was not widely deployed outside Peterson's laboratory.
However, TCP Vegas was deployed as default congestion control method for DD-WRT firmwares v24 SP2.
Because the timeout timer is reset whenever there is progress in the transmit buffer, this allows New Reno to fill large holes, or multiple holes, in the sequence space - much like TCP SACK. Because New Reno can send new packets at the end of the congestion window during fast recovery, high throughput is maintained during the hole-filling process, even when there are multiple holes, of multiple packets each. When TCP enters fast recovery it records the highest outstanding unacknowledged packet sequence number. When this sequence number is acknowledged, TCP returns to the congestion avoidance state.
A problem occurs with New Reno when there are no packet losses but instead, packets are reordered by more than 3 packet sequence numbers. When this happens, New Reno mistakenly enters fast recovery, but when the reordered packet is delivered, ACK sequence-number progress occurs and from there until the end of fast recovery, every bit of sequence-number progress produces a duplicate and needless retransmission that is immediately ACKed.
New Reno performs as well as SACK at low packet error rates, and substantially outperforms Reno at high error rates.
is an implementation of
TCP
with an optimized congestion control algorithm for high speed networks with high latency (called LFN, long fat networks, in RFC 1072). BIC is used by default in Linux kernel
s 2.6.8 through 2.6.18.
is a less aggressive and more systematic derivative of BIC, in which the window is a cubic function of time since the last congestion event, with the inflection point set to the window prior to the event. CUBIC is used by default in Linux kernel
s since version 2.6.19.
implementation of TCP which maintains two different congestion windows simultaneously, with the goal of achieving good performance on LFNs while not impairing fairness
. It has been widely deployed with Microsoft Windows Vista
and Windows Server 2008 and has been ported to older Microsoft Windows versions as well as Linux
.
TCP New Reno is the most commonly implemented algorithm, SACK support is very common and is an extension to Reno/New Reno. Most others are competing proposals which still need evaluation. Starting with 2.6.8 the Linux kernel switched the default implementation from Reno to BIC
. The default implementation was again changed to CUBIC in the 2.6.19 version.
When the per-flow product of bandwidth and latency increases, regardless of the queuing scheme, TCP becomes inefficient and prone to instability. This becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links.
TCP Interactive (iTCP) allows applications to subscribe to TCP events and respond accordingly enabling various functional extensions to TCP from outside TCP layer. Most TCP congestion schemes work internally. iTCP additionally enables advanced applications to directly participate in congestion control such as to control the source generation rate.
Zeta-TCP
detects the congestions from both the latency and loss rate measures, and applies different CWND backoff strategies based on the likelihood of the congestions to maximize the goodput. It also has a couple of other improvements to accurately detect the packet losses, avoiding RTO retransmission; and accelerate/control the inbound (download) traffic.
Transmission Control Protocol
The Transmission Control Protocol is one of the core protocols of the Internet Protocol Suite. TCP is one of the two original components of the suite, complementing the Internet Protocol , and therefore the entire suite is commonly referred to as TCP/IP...
(TCP) uses a network congestion avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start
Slow-start
Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network...
in order to achieve congestion avoidance.
The TCP congestion avoidance algorithm is the primary basis for congestion control in the Internet.
Naming history
Two such variations are those offered by TCP Tahoe and Reno. The two algorithms were retrospectively named after the 4.3BSD operating system in which each first appeared (which were themselves named after Lake TahoeLake Tahoe
Lake Tahoe is a large freshwater lake in the Sierra Nevada of the United States. At a surface elevation of , it is located along the border between California and Nevada, west of Carson City. Lake Tahoe is the largest alpine lake in North America. Its depth is , making it the USA's second-deepest...
and the city of Reno, Nevada
Reno, Nevada
Reno is the county seat of Washoe County, Nevada, United States. The city has a population of about 220,500 and is the most populous Nevada city outside of the Las Vegas metropolitan area...
). The “Tahoe” algorithm first appeared in 4.3BSD-Tahoe (which was made to support the CCI Power 6/32 “Tahoe” minicomputer), and was made available to non-AT&T licensees as part of the “4.3BSD Networking Release 1”; this ensured its wide distribution and implementation. Improvements, described below, were made in 4.3BSD-Reno and subsequently released to the public as “Networking Release 2” and later 4.4BSD-Lite. The “TCP Foo” names for the algorithms appear to have originated in a 1996 paper by Kevin Fall and Sally Floyd.
TCP Tahoe and Reno
To avoid congestion collapse, TCP uses a multi-faceted congestion control strategy. For each connection, TCP maintains a congestion windowCongestion window
In Transmission Control Protocol , the congestion window is one of the factors that determines the number of bytes that can be outstanding at any time. This is not to be confused with the TCP window size which is maintained by the receiver. This is a means of stopping the link between two places...
, limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP's sliding window
Sliding Window Protocol
A sliding window protocol is a feature of packet-based data transmission protocols. Sliding window protocols are used where reliable in-order delivery of packets is required, such as in the Data Link Layer as well as in the Transmission Control Protocol .Conceptually, each portion of the...
used for flow control. TCP uses a mechanism called slow start
Slow-start
Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network...
to increase the congestion window after a connection is initialized and after a timeout. It starts with a window of two times the maximum segment size
Maximum segment size
The maximum segment size is a parameter of the TCP protocol that specifies the largest amount of data, specified in octets, that a computer or communications device can receive in a single TCP segment, and therefore in a single IP datagram. It does not count the TCP header or the IP header...
(MSS). Although the initial rate is low, the rate of increase is very rapid: for every packet acknowledged, the congestion window increases by 1 MSS so that the congestion window effectively doubles for every round trip time
Round-trip delay time
In telecommunications, the round-trip delay time or round-trip time is the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgment of that signal to be received...
(RTT). When the congestion window exceeds a threshold ssthresh the algorithm enters a new state, called congestion avoidance. In some implementations (e.g., Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.
Congestion avoidance: As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time. When a packet is lost, the likelihood of duplicate ACKs being received is very high (it's possible though unlikely that the stream just underwent extreme packet reordering, which would also prompt duplicate ACKs). The behavior of Tahoe and Reno differ in how they detect and react to packet loss:
- Tahoe: Triple duplicate ACKS are treated the same as a timeout. Tahoe will perform "fast retransmitFast retransmitFast Retransmit is an enhancement to TCP which reduces the time a sender waits before retransmitting a lost segment.A TCP sender uses a timer to recognize lost segments...
", reduce congestion window to 1 MSS, and reset to slow-start state. - Reno: If three duplicate ACKs are received (i.e., four ACKs acknowledging the same packet, which are not piggybacked on data, and do not change the receiver's advertised window), Reno will halve the congestion window, perform a fast retransmit, and enter a phase called Fast Recovery. If an ACK times out, slow start is used as it is with Tahoe.
Fast Recovery. (Reno Only) In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.
Both algorithms reduce congestion window to 1 MSS on a timeout event.
TCP Vegas
Until the mid 1990s, all of TCP's set timeouts and measured round-trip delays were based upon only the last transmitted packet in the transmit buffer. University of ArizonaUniversity of Arizona
The University of Arizona is a land-grant and space-grant public institution of higher education and research located in Tucson, Arizona, United States. The University of Arizona was the first university in the state of Arizona, founded in 1885...
researchers Larry Peterson and Lawrence Brakmo
Lawrence Brakmo
Lawrence Brakmo is currently a member of technical staff at Google. Previously Brakmo was a researcher and project manager at NTT DoCoMo USA Labs. Before that he was affiliated with the Western Research Lab of Digital Equipment Corporation/Compaq/Hewlett-Packard. Brakmo received his Ph.D...
introduced TCP Vegas, in which timeouts were set and round-trip delays were measured for every packet in the transmit buffer. In addition, TCP Vegas uses additive increases in the congestion window. This variant was not widely deployed outside Peterson's laboratory.
However, TCP Vegas was deployed as default congestion control method for DD-WRT firmwares v24 SP2.
TCP New Reno
TCP New Reno improves retransmission during the fast recovery phase of TCP Reno. During fast recovery, for every duplicate ACK that is returned to TCP New Reno, a new unsent packet from the end of the congestion window is sent, to keep the transmit window full. For every ACK that makes partial progress in the sequence space, the sender assumes that the ACK points to a new hole, and the next packet beyond the ACKed sequence number is sent.Because the timeout timer is reset whenever there is progress in the transmit buffer, this allows New Reno to fill large holes, or multiple holes, in the sequence space - much like TCP SACK. Because New Reno can send new packets at the end of the congestion window during fast recovery, high throughput is maintained during the hole-filling process, even when there are multiple holes, of multiple packets each. When TCP enters fast recovery it records the highest outstanding unacknowledged packet sequence number. When this sequence number is acknowledged, TCP returns to the congestion avoidance state.
A problem occurs with New Reno when there are no packet losses but instead, packets are reordered by more than 3 packet sequence numbers. When this happens, New Reno mistakenly enters fast recovery, but when the reordered packet is delivered, ACK sequence-number progress occurs and from there until the end of fast recovery, every bit of sequence-number progress produces a duplicate and needless retransmission that is immediately ACKed.
New Reno performs as well as SACK at low packet error rates, and substantially outperforms Reno at high error rates.
TCP Hybla
TCP Hybla aims to eliminate penalization of TCP connections that incorporate a high-latency terrestrial or satellite radio link, due to their longer round trip times. It stems from an analytical evaluation of the congestion window dynamics, which suggests the necessary modifications to remove the performance dependence on RTT.TCP BIC
Binary Increase Congestion controlBIC TCP
BIC TCP is an implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency: so-called "long fat networks".BIC has a unique congestion window algorithm...
is an implementation of
TCP
Transmission Control Protocol
The Transmission Control Protocol is one of the core protocols of the Internet Protocol Suite. TCP is one of the two original components of the suite, complementing the Internet Protocol , and therefore the entire suite is commonly referred to as TCP/IP...
with an optimized congestion control algorithm for high speed networks with high latency (called LFN, long fat networks, in RFC 1072). BIC is used by default in Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....
s 2.6.8 through 2.6.18.
TCP CUBIC
CUBICCUBIC TCP
CUBIC is an implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency ....
is a less aggressive and more systematic derivative of BIC, in which the window is a cubic function of time since the last congestion event, with the inflection point set to the window prior to the event. CUBIC is used by default in Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....
s since version 2.6.19.
Compound TCP
Compound TCP is a MicrosoftMicrosoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...
implementation of TCP which maintains two different congestion windows simultaneously, with the goal of achieving good performance on LFNs while not impairing fairness
Fairness measure
Fairness measures or metrics are used in network engineering to determine whether users or applications are receiving a fair share of system resources. There are several mathematical and conceptual definitions of fairness.-TCP fairness:...
. It has been widely deployed with Microsoft Windows Vista
Windows Vista
Windows Vista is an operating system released in several variations developed by Microsoft for use on personal computers, including home and business desktops, laptops, tablet PCs, and media center PCs...
and Windows Server 2008 and has been ported to older Microsoft Windows versions as well as Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
.
Other TCP congestion avoidance algorithms
- FAST TCPFAST TCPFAST TCP is a TCP congestion avoidance algorithm especially targeted at long-distance, high latency links, developed at the , California Institute of Technology and now being commercialized by...
- H-TCPH-TCPH-TCP is another implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency...
- Data Center TCPData Center TCPData Center TCP utilizes Explicit Congestion Notification to enhance the Transmission Control Protocol congestion control algorithm. It is used in data center networks. Whereas the standard TCP congestion control algorithm is only able to detect the presence of congestion, DCTCP, using ECN, is...
- High Speed TCP
- HSTCP-LP
- TCP-IllinoisTCP-IllinoisTCP-Illinois is a variant of TCP congestion control protocol, developed at the University of Illinois at Urbana-Champaign. It is especially targeted at high-speed, long-distance networks...
- TCP-LP
- TCP SACK
- Scalable TCP
- TCP Veno
- WestwoodTCP WestwoodTCP Westwood , is a sender-side-only modification to TCP New Reno that is intended to better handle large bandwidth-delay product paths , with potential packet loss due to transmission or other errors , and with dynamic load .TCP Westwood relies on mining the ACK stream for information to help it...
- Westwood+TCP Westwood plusTCP Westwood+ is a sender-side only modification of the TCP Reno protocol stack that optimizes the performance of TCP congestion control over both wireline and wireless networks. TCP Westwood+ is based on end-to-end bandwidth estimation to set congestion window and slow start threshold after a...
- XCP
- YeAH-TCP
- TCP-FIT
- Congestion Avoidance with Normalized Interval of Time (CANIT)
TCP New Reno is the most commonly implemented algorithm, SACK support is very common and is an extension to Reno/New Reno. Most others are competing proposals which still need evaluation. Starting with 2.6.8 the Linux kernel switched the default implementation from Reno to BIC
BIC TCP
BIC TCP is an implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency: so-called "long fat networks".BIC has a unique congestion window algorithm...
. The default implementation was again changed to CUBIC in the 2.6.19 version.
When the per-flow product of bandwidth and latency increases, regardless of the queuing scheme, TCP becomes inefficient and prone to instability. This becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links.
TCP Interactive (iTCP) allows applications to subscribe to TCP events and respond accordingly enabling various functional extensions to TCP from outside TCP layer. Most TCP congestion schemes work internally. iTCP additionally enables advanced applications to directly participate in congestion control such as to control the source generation rate.
Zeta-TCP
Zeta-TCP
Zeta-TCP refers to a set of proprietary TCP algorithms targeted to improve the end-to-end performance of TCP, regardless of whether the peer is Zeta-TCP or any other TCP protocol stack, in other words, to be compatible with the existing TCP algorithms...
detects the congestions from both the latency and loss rate measures, and applies different CWND backoff strategies based on the likelihood of the congestions to maximize the goodput. It also has a couple of other improvements to accurately detect the packet losses, avoiding RTO retransmission; and accelerate/control the inbound (download) traffic.
See also
- Transmission Control Protocol#Development
- Congestion avoidance
- BufferbloatBufferbloatBufferbloat is a phenomenon in a packet-switched computer network whereby excess buffering of packets inside the network causes high latency and jitter, as well as reducing the overall network throughput...