Known problems in TCP algorithms in Linux

David X. Wei

Netlab @ Caltech

Jun 2006

Several problems in the Linux congestion control algorithms have been found with NS-2 TCP Linux.

For all problems that are found by NS-2 TCP Linux, including those that are resolved, see here.

This document lists only the problems that still exist. The NS-2 TCP Linux release patch includes the original Linux files and hence carries these bugs. They are documented here so that users can take them into account before drawing conclusions about a particular congestion control algorithm: we want to separate bugs in implementation from problems in algorithm design.

I will try to update the patch as Linux rolls out new versions with bug fixes.

If you find any other implementation problems, you are welcome to report them to me. Thank you.



There are two known problems with the Vegas implementation in Linux

Setting of Slow-Start-Threshold

Linux Vegas can be very unfair among different Vegas flows, due to a potential bug in how the slow-start threshold (ssthresh) is set.
The following figures (Red: an unfairly large flow; Green: an unfairly small flow) show the problem:

Congestion window trajectory
Rate trajectory

This scenario runs 100 TCP-Vegas flows, with alpha=beta=20 and a hardcoded baseRTT of 100ms, so that the unfairness is not due to wrong baseRTT estimation.
The bottleneck capacity is 1.2Gbps and the edge capacity is 1.5Gbps. Bottleneck buffer is 5000 packets. Round trip propagation delay is 100ms.
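
As a rough yardstick for the windows reported below, the fair Vegas window in this scenario can be estimated as the per-flow share of the bandwidth-delay product plus the alpha packets each flow keeps queued. The sketch below does that arithmetic. The packet size is not stated in this document, so the 1500-byte value in the usage note is purely an assumption, and the function names are made up for illustration.

```c
#include <assert.h>

/* Packets that fit in the pipe: capacity * RTT / packet size.
 * capacity_mbps and rtt_ms are kept as integers to avoid rounding. */
static long long bdp_packets(long long capacity_mbps, long long rtt_ms,
                             long long pkt_bytes)
{
    assert(pkt_bytes > 0);
    long long bits = capacity_mbps * 1000 * rtt_ms;  /* Mbit/s * ms = 1000 bits */
    return bits / (8 * pkt_bytes);
}

/* Estimated equilibrium window of one Vegas flow when all flows share
 * the bottleneck equally and each keeps `alpha` packets in the queue. */
static long long vegas_fair_cwnd(long long capacity_mbps, long long rtt_ms,
                                 long long pkt_bytes, long long flows,
                                 long long alpha)
{
    assert(flows > 0);
    return bdp_packets(capacity_mbps, rtt_ms, pkt_bytes) / flows + alpha;
}
```

With the scenario's 1.2Gbps bottleneck, 100ms delay, 100 flows and alpha=20, and an assumed packet size of 1500 bytes, `vegas_fair_cwnd(1200, 100, 1500, 100, 20)` gives 120 packets per flow. The total standing queue would be 100 x 20 = 2000 packets, which fits in the 5000-packet buffer. A different packet size changes these numbers proportionally.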

Flow 0 keeps a huge congestion window (1800+ pkt) and grabs most of the bandwidth.
Flow 99 (and many other flows, most of which have a window of 100 pkt) gets a very small share of the bandwidth.

This is due to a potential problem in setting the slow start threshold. In the Linux source code, ssthresh is set to 2 only when "cwnd is no greater than ssthresh" and "the delay is too large (compared to gamma)".
This code can be abstracted as:

if (one RTT finishes) {
   if (cwnd <= ssthresh) {   /* slow start */
     if (diff > gamma) {
       set ssthresh to be 2;
     }
   } else {                  /* congestion avoidance */
     ...(No change on ssthresh)
   }
} else {
   if (cwnd <= ssthresh) tcp_slow_start(tp);
}

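The code works perfectly well if there is no loss.
But if a loss happens before the delay is detected, ssthresh is set by the Reno loss recovery algorithm to cwnd/2. The flow then exits slow start with a relatively large ssthresh (much larger than 2).
If ssthresh is higher than what the fair cwnd should be, this flow becomes lucky: each time congestion avoidance reduces cwnd to ssthresh, the slow-start branch raises it again on the next ACK, so cwnd is already above ssthresh when the once-per-RTT check runs and ssthresh is never reset to 2.

This loop keeps the cwnd at ssthresh+1.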
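
The loop can be reproduced with a small toy model of the abstracted code above. This is a sketch and not the kernel code. It assumes one ACK per packet in the window, a fixed measured RTT, diff computed as cwnd*(rtt-baseRTT)/rtt, a congestion-avoidance step of one packet per RTT, and gamma=1. The struct and function names are made up for illustration.

```c
#include <assert.h>

enum { ALPHA = 20, BETA = 20, GAMMA = 1 };  /* GAMMA = 1 is an assumption */

/* Toy per-flow state; not the kernel's struct tcp_sock. */
struct flow {
    int cwnd;
    int ssthresh;
};

/* One ACK, following the abstracted logic with the problem in place. */
static void vegas_on_ack(struct flow *f, int rtt_finished, int diff)
{
    if (rtt_finished) {
        if (f->cwnd <= f->ssthresh) {       /* slow start */
            if (diff > GAMMA)
                f->ssthresh = 2;
        } else {                            /* congestion avoidance */
            if (diff > BETA)
                f->cwnd--;
            else if (diff < ALPHA)
                f->cwnd++;
            /* ssthresh is not touched in this branch */
        }
    } else if (f->cwnd <= f->ssthresh) {
        f->cwnd++;                          /* stands in for tcp_slow_start() */
    }
}

/* Run n_rtts round trips with a constant measured RTT (toy assumption). */
static void run_rtts(struct flow *f, int n_rtts, int base_rtt_ms, int rtt_ms)
{
    assert(rtt_ms >= base_rtt_ms && rtt_ms > 0);
    for (int r = 0; r < n_rtts; r++) {
        int acks = f->cwnd;                 /* one ACK per packet in flight */
        for (int i = 0; i < acks; i++) {
            int diff = f->cwnd * (rtt_ms - base_rtt_ms) / rtt_ms;
            vegas_on_ack(f, i == acks - 1, diff);
        }
    }
}
```

In this model, a flow that leaves loss recovery with cwnd = ssthresh = 1800 and then sees a 125ms RTT against a 100ms baseRTT has a diff of 360, far above beta. Even so, after `run_rtts(&f, 1000, 100, 125)` its ssthresh is still 1800 and its cwnd still alternates between 1800 and 1801. A flow that detects the delay during slow start without any loss instead has its ssthresh reset to 2 and settles at a much smaller window.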

A fix can be made by moving the "if (diff > gamma) ssthresh=2" part out of the "if (cwnd <= ssthresh)" block so that ssthresh is always reset to 2 when diff is large.
The patch (against net/ipv4/tcp_vegas.c in the Linux kernel) can be found here.
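
The effect of the fix can be sketched in the same abstracted form. The code below is not the patch itself; it is a toy model that assumes one ACK per packet in the window, a fixed measured RTT, diff computed as cwnd*(rtt-baseRTT)/rtt, a congestion-avoidance step of one packet per RTT, and gamma=1. All names are made up for illustration.

```c
#include <assert.h>

enum { ALPHA = 20, BETA = 20, GAMMA = 1 };  /* GAMMA = 1 is an assumption */

/* Toy per-flow state; not the kernel's struct tcp_sock. */
struct flow {
    int cwnd;
    int ssthresh;
};

/* One ACK, with the ssthresh reset moved out of the slow-start test. */
static void vegas_on_ack_fixed(struct flow *f, int rtt_finished, int diff)
{
    if (rtt_finished) {
        if (diff > GAMMA)
            f->ssthresh = 2;                /* now reset in every state */
        if (f->cwnd > f->ssthresh) {        /* congestion avoidance */
            if (diff > BETA)
                f->cwnd--;
            else if (diff < ALPHA)
                f->cwnd++;
        }
    } else if (f->cwnd <= f->ssthresh) {
        f->cwnd++;                          /* stands in for tcp_slow_start() */
    }
}

/* Run n_rtts round trips with a constant measured RTT (toy assumption). */
static void run_rtts_fixed(struct flow *f, int n_rtts,
                           int base_rtt_ms, int rtt_ms)
{
    assert(rtt_ms >= base_rtt_ms && rtt_ms > 0);
    for (int r = 0; r < n_rtts; r++) {
        int acks = f->cwnd;                 /* one ACK per packet in flight */
        for (int i = 0; i < acks; i++) {
            int diff = f->cwnd * (rtt_ms - base_rtt_ms) / rtt_ms;
            vegas_on_ack_fixed(f, i == acks - 1, diff);
        }
    }
}
```

Starting from cwnd = ssthresh = 1800 with a 125ms RTT against a 100ms baseRTT, the first once-per-RTT check resets ssthresh to 2, so slow start can no longer push cwnd back up. The window then shrinks by one packet per RTT until diff falls to beta, which in this toy model happens at 104 packets. In a real network the RTT would also change as the flow backs off, so the exact numbers here are only illustrative.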