Known problems in TCP algorithms in Linux 2.6.16.3

David X. Wei

Netlab @ Caltech

Jun 2006

Using NS-2 TCP Linux, we found several problems in the Linux congestion control algorithms in kernel release 2.6.16.3.

The NS-2 TCP Linux release patch includes the original Linux source files and therefore carries these bugs/problems. They are documented here so that users can take them into account before drawing conclusions about a particular congestion control algorithm -- we want to separate bugs in implementation from problems in algorithm design.

I will try to update the patch as Linux rolls out new versions with bug fixes.

If you find any other implementation problems, you are welcome to report them to me. Thank you.

HighSpeed TCP

(These three problems with HighSpeed TCP have been fixed in the latest Linux kernel 2.6.19.2.)

There are three bugs in the HighSpeed TCP implementation in the patch, inherited from the Linux source code. These problems can be seen in the following example (with the script and config file):

Congestion window trajectory
Rate trajectory

This scenario has 32 flows running on a 64ms RTT path, with a bottleneck capacity of 100Mbps and a buffer size of 220pkt. We show the trajectories of 4 flows. Three problems can be found:

Vegas

There are two known problems with the Vegas implementation in Linux 2.6.16.3.

Setting of Slow-Start-Threshold

Linux Vegas can be very unfair among different Vegas flows, due to a bug in the setting of the slow start threshold.
The following figures (Red: an unfairly large flow; Green: an unfairly small flow) show the problem:

Congestion window trajectory
Rate trajectory

This scenario runs 100 TCP-Vegas flows, with alpha=beta=20 and a hardcoded baseRTT of 100ms, so that the unfairness is not due to wrong baseRTT estimation.
The bottleneck capacity is 1.2Gbps and the edge capacity is 1.5Gbps. Bottleneck buffer is 5000 packets. Round trip propagation delay is 100ms.

Flow 0 keeps a huge congestion window (1800+ pkt) and grabs most of the bandwidth.
Flow 99 (and many other flows, most of which get a window of about 100 pkt) gets a very small share of the bandwidth.

This is due to a problem in the setting of the slow start threshold. In the Linux source code, ssthresh is set to 2 only when "cwnd is no greater than ssthresh" and "the delay is too large (compared to gamma)".
This code can be abstracted as:

if (one RTT finishes) {
    if (cwnd <= ssthresh) {
        if (diff > gamma) {
            set ssthresh to 2;
            ...
        }
        tcp_slow_start(tp);
    } else { /* congestion avoidance */
        ... (no change to ssthresh)
    }
} else { /* any other ack */
    if (cwnd <= ssthresh)
        tcp_slow_start(tp);
}

The code works fine as long as there is no loss.
But if a loss happens before a large delay is detected, ssthresh is set by the Reno loss recovery algorithm to cwnd/2, and the flow exits slow start with a relatively large ssthresh (much larger than 2).
If this ssthresh is higher than what the fair cwnd should be, the flow becomes lucky: whenever congestion avoidance brings cwnd down to ssthresh, the very next ack (which almost never coincides with the end of an RTT) runs tcp_slow_start and pushes cwnd back to ssthresh+1. By the time the per-RTT check runs, cwnd is above ssthresh again, so the "diff > gamma" branch that would reset ssthresh to 2 is never reached.

This loop keeps cwnd at ssthresh+1.

A fix can be made by moving the "if (diff > gamma) ssthresh = 2" part out of the "if (cwnd <= ssthresh)" branch, so that ssthresh is always reset to 2 when diff is large.
The patch (against net/ipv4/tcp_vegas.c in Linux kernel 2.6.19.2) can be found here.
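The lucky-flow loop and the effect of the fix can be sketched with a toy per-ack simulation. Everything here is illustrative, not kernel code: the diff model (each flow's perceived backlog is max(0, cwnd - fair_cwnd)), the parameter values, and the post-loss initial state are all invented for the sketch, and the real kernel also shrinks cwnd when it resets ssthresh. The sketch only contrasts where the buggy placement of the "diff > gamma" check leaves cwnd and ssthresh versus the fixed placement:

```python
ALPHA, BETA, GAMMA = 1, 3, 1   # illustrative Vegas parameters (in packets)

def simulate(fair_cwnd, cwnd, ssthresh, num_acks, fixed):
    """Process num_acks ack events; every cwnd-th ack ends one RTT.

    fixed=False keeps the 'diff > gamma' ssthresh reset inside the
    'cwnd <= ssthresh' branch (the 2.6.16.3 placement); fixed=True
    moves it out, so ssthresh is reset whenever diff is large.
    """
    acks_this_rtt = 0
    for _ in range(num_acks):
        acks_this_rtt += 1
        # Toy model of Vegas diff: packets this flow keeps queued.
        diff = max(0, cwnd - fair_cwnd)
        if acks_this_rtt >= cwnd:          # one RTT finishes
            acks_this_rtt = 0
            if fixed and diff > GAMMA:
                ssthresh = 2               # fix: reset unconditionally
            if cwnd <= ssthresh:
                if not fixed and diff > GAMMA:
                    ssthresh = 2           # bug: only reachable here
                cwnd += 1                  # tcp_slow_start
            else:                          # Vegas congestion avoidance
                if diff > BETA:
                    cwnd -= 1
                elif diff < ALPHA:
                    cwnd += 1
        elif cwnd <= ssthresh:
            cwnd += 1                      # per-ack slow start
    return cwnd, ssthresh

# Post-loss state: Reno loss recovery halved cwnd, leaving ssthresh = 400
# while the fair window is only 100.
print(simulate(100, 401, 400, 200_000, fixed=False))  # cwnd pinned near ssthresh+1
print(simulate(100, 401, 400, 200_000, fixed=True))   # ssthresh drops to 2, cwnd falls
```

With the buggy placement, the per-RTT check always observes cwnd = ssthresh+1 and never resets ssthresh; with the fix, ssthresh drops to 2 on the first over-delay RTT and congestion avoidance walks cwnd down to its fair region.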

Setting of alpha

(Linux kernel 2.6.20 will change the default alpha value to 2.)
Linux Vegas may not achieve full utilization when delayed ack is used in some situations. The following figures (Red: from NS-2 TCP Linux; Green: from dummynet; Blue: from NS-2 TCP-Vegas) show the problem:

Congestion window trajectory
Rate trajectory

This scenario runs a single Vegas flow on a 64ms path with a bottleneck capacity of 100Mbps, an edge capacity of 1Gbps, and a buffer size of 220pkt.

The problem is caused by the combination of three factors: integer computation in Linux, the use of delayed ack, and the Vegas alpha parameter value of 1.
With delayed ack, Vegas gets only one RTT sample for every two data packets. When an acknowledgment comes back, two data packets are acknowledged, and two new packets are sent into the network back-to-back. The second packet sees almost one packet worth of queueing delay generated by the first packet (more precisely, a delay of 1pkt/100Mbps - 1pkt/1Gbps, almost equal to 1pkt/100Mbps). When the second packet arrives at the receiver, the receiver generates an acknowledgment (the ack for the first packet is delayed). Hence, the sender only ever sees the acknowledgment for the second packet, which carries almost one packet worth of queueing delay. Since alpha equals 1 in Vegas, this one packet worth of queueing delay stops the congestion window from growing at some point before there is truly persistent queueing delay at the bottleneck.

Setting alpha to 2 (and changing gamma and beta correspondingly) solves the problem.