A bug in TCP/Sack1 that leads to long finish times for short TCP flows

(Part of the NS-2 enhancement project)

David X. Wei

Netlab @ Caltech

Sep 2006

Update: this bugfix has been incorporated into the NS-2 codebase since Dec 2006.

This bug was found by Dr. Dohy Hong and his students (excerpt from Dr. Hong's observation). It can be reproduced in NS-2.29 with his script (I slightly changed the monitoring part to include measurements of the duplicate ACK count). This bug may lead to a misunderstanding (under-estimation) of the performance of TCP with finite flows.

The Fix

A patch can fix this problem. The patch is against NS-2.29. Some modification may be necessary for other versions of NS.

The problem

The problem is generic, but the example shown uses a single TCP/Sack1 flow on a single bottleneck link.

As shown in the figure below (generated by a gnuplot script), at around the 2598th second there is a loss event that halves the cwnd. At this time, all the data packets have been sent out (the flow should complete after the lost packets are retransmitted). However, no lost packets are retransmitted even though the number of duplicate ACKs is as high as 40. Instead, TCP/Sack1 waits until the timeout to send the lost packet.

The bug in the code

The bug is due to the send_much function in TCP/Sack1, which exits whenever there are no new data packets from the application, without checking whether any lost packets are available for retransmission. This bug may lead to a misunderstanding of the performance of TCP with finite flows.
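The early exit described above can be illustrated with a toy model. This is only a sketch and not the NS-2 source or the patch: the struct ToySackSender, its fields, and the buggy flag are all hypothetical names made up for illustration, and the window is reduced to a simple per-call packet budget.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model of a SACK sender. Every name here is hypothetical and
// does not correspond to the real NS-2 TCP/Sack1 implementation.
struct ToySackSender {
    int next_new_seq;    // next never-sent sequence number
    int app_limit;       // one past the last packet supplied by the application
    int window;          // packets this call may send (stand-in for cwnd minus pipe)
    std::set<int> lost;  // packets the scoreboard has marked as lost
};

// Returns the sequence numbers transmitted by one call.
// buggy == true  : leave the loop as soon as there is no new application
//                  data, without looking at the lost set.
// buggy == false : retransmit lost packets first, then send new data.
std::vector<int> send_much(ToySackSender& s, bool buggy) {
    assert(s.window >= 0);
    std::vector<int> sent;
    while (static_cast<int>(sent.size()) < s.window) {
        const bool has_new_data = s.next_new_seq < s.app_limit;
        if (buggy && !has_new_data) {
            break;  // early exit: pending retransmissions are never considered
        }
        if (!s.lost.empty()) {
            const int seq = *s.lost.begin();  // oldest lost packet first
            s.lost.erase(s.lost.begin());
            sent.push_back(seq);
        } else if (has_new_data) {
            sent.push_back(s.next_new_seq++);
        } else {
            break;  // nothing to retransmit and nothing new to send
        }
    }
    return sent;
}
```

With all application data already sent (next_new_seq equal to app_limit) and two packets marked lost, the buggy variant returns without sending anything, so in this model the flow could only recover through a timeout. The other variant retransmits both lost packets within the window.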