A mini-tutorial for TCP-Linux in NS-2

(Part of the NS-2 enhancement project)

David X. Wei

Netlab @ Caltech

May 2006

This tutorial is for people who want to use TCP-Linux in NS-2 simulations. For information on how to install TCP-Linux into NS-2, see the TCP-Linux website. For general NS-2 tutorials, see the NS-2 website.

Table of Contents:
  1. Change your existing NS-2 simulation script to use TCP-Linux
  2. Develop your own congestion control algorithm with TCP-Linux
  3. Port a new congestion control algorithm from Linux to TCP-Linux
  4. Q&A
  5. Acknowledgment

Change your existing NS-2 simulation script to use TCP-Linux

It is simple to change an existing NS-2 simulation script to use TCP-Linux. Three changes are required, and a fourth is optional:
  1. Change the TCP agent to "Agent/TCP/Linux".
  2. Make sure the TCP sink has Sack1 support, that is, use either Agent/TCPSink/Sack1 or Agent/TCPSink/Sack1/DelAck. Currently, TCP-Linux does not support receivers without SACK.
  3. Add a TCP command "select_ca <the name of your congestion control algorithm>".
  4. (Optional) Delete any assignment of windowOption_. Once a congestion control algorithm is selected, the value of windowOption_ has no effect, but deleting the assignment avoids confusing others who read the script.
The following two scripts show an example. The first is a simple NS-2 simulation script that uses Sack1 as the TCP sender. The second is the corresponding script that uses TCP-Linux as the TCP sender. Only two lines differ: the agent type changes from Agent/TCP/Sack1 to Agent/TCP/Linux, and the windowOption_ assignment is replaced by the select_ca command. The sink already uses Agent/TCPSink/Sack1, so it needs no change.

WARNING: You also need to set the "window_" option of the TCP agent to a value large enough to see the performance difference. "window_" is the upper bound of the congestion window in TCP. It is 20 by default.

A script for TCP-Sack1
#Create a simulator object
set ns [new Simulator]

#Create two nodes and a link
set bs [$ns node]
set br [$ns node]
$ns duplex-link $bs $br 100Mb 10ms DropTail

#setup sender side
set tcp [new Agent/TCP/Sack1]
$tcp set timestamps_ true
$tcp set windowOption_ 8
$ns attach-agent $bs $tcp

#set up receiver side
set sink [new Agent/TCPSink/Sack1]
$sink set ts_echo_rfc1323_ true
$ns attach-agent $br $sink

#logical connection
$ns connect $tcp $sink

#Setup a FTP over TCP connection
set ftp [new Application/FTP]
$ftp attach-agent $tcp
$ftp set type_ FTP

#Schedule the life of the FTP
$ns at 0 "$ftp start"
$ns at 10 "$ftp stop"

#Schedule the stop of the simulation
$ns at 11 "exit 0"

#Start the simulation
$ns run

A script for TCP-Linux
#Create a simulator object
set ns [new Simulator]

#Create two nodes and a link
set bs [$ns node]
set br [$ns node]
$ns duplex-link $bs $br 100Mb 10ms DropTail

#setup sender side   
set tcp [new Agent/TCP/Linux]
$tcp set timestamps_ true
$ns at 0 "$tcp select_ca highspeed"
$ns attach-agent $bs $tcp

#set up receiver side
set sink [new Agent/TCPSink/Sack1]
$sink set ts_echo_rfc1323_ true
$ns attach-agent $br $sink

#logical connection
$ns connect $tcp $sink

#Setup a FTP over TCP connection
set ftp [new Application/FTP]
$ftp attach-agent $tcp
$ftp set type_ FTP

#Schedule the life of the FTP
$ns at 0 "$ftp start"
$ns at 10 "$ftp stop"

#Schedule the stop of the simulation
$ns at 11 "exit 0"

#Start the simulation
$ns run

Develop your own congestion control algorithm with TCP-Linux

Here we explain the very basic concepts which are enough for developing simple loss-based algorithms.

An example: the implementation of a very simple Reno

The following listing gives the implementation of a very simple Reno. (In fact, it is a FACK, since TCP-Linux takes care of all loss detection, loss recovery, and rate-halving.)
Naive Reno (u32 in the code is equivalent to unsigned int)
/* This is a very naive Reno implementation, shown as an example of how to develop a new congestion control algorithm with TCP-Linux. */
/* This file should be copied to the tcp/linux/ directory. */
/* To have this file compiled, add an entry "tcp/linux/<NameOfThisFile>.o" to the Makefile. */

/* These two header files link your implementation to TCP-Linux. */

#include "ns-linux-c.h"
#include "ns-linux-util.h"

/* This is equivalent to opencwnd in other NS-2 TCP implementations. */
/* This function increases the congestion window for each acknowledgment. */

void tcp_naive_reno_cong_avoid(struct tcp_sk *tp, u32 ack, u32 rtt, u32 in_flight, int flag)
{
    if (tp->snd_cwnd < tp->snd_ssthresh) {
        tp->snd_cwnd++;
    } else {
        if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
            if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                tp->snd_cwnd++;
            tp->snd_cwnd_cnt = 0;
        } else {
            tp->snd_cwnd_cnt++;
        }
    }
}

/* This function returns the slow-start threshold after a loss.*/
/* ssthreshold should be half of the congestion window after a loss */

u32 tcp_naive_reno_ssthresh(struct tcp_sk *tp)
{
        return max(tp->snd_cwnd >> 1U, 2U);
}

/* This function returns the congestion window after a loss -- it is called AFTER the function ssthresh (above)  */
/* Congestion window should be equal to the slow start threshold (after slow start threshold set to half of cwnd before loss). */

u32 tcp_naive_reno_min_cwnd(struct tcp_sk *tp)
{
        return tp->snd_ssthresh;
}

/* a constant record for this congestion control algorithm */
/* The record should be hooked in tcp/linux/ns-linux-util.h and tcp/tcp-linux.cc */
struct tcp_congestion_ops naive_reno = {
        .name           = "naive_reno",
        .ssthresh       = tcp_naive_reno_ssthresh,
        .cong_avoid     = tcp_naive_reno_cong_avoid,
        .min_cwnd       = tcp_naive_reno_min_cwnd
};
As in the example above, an implementation includes three parts:
  1. The header files to link the implementation to TCP-Linux;
  2. Implementation of (at least) the three congestion control functions defined in struct tcp_congestion_ops: cong_avoid, ssthresh and min_cwnd;
  3. A static record of struct tcp_congestion_ops that stores the function pointers and the algorithm's name. (In the example, the algorithm is named "naive_reno".)
After copying the file to tcp/linux/, changing the Makefile, exporting the algorithm (changes in tcp/linux/ns-linux-util.h), and registering the algorithm (changes in tcp/tcp-linux.cc), you can run the algorithm by adding "select_ca naive_reno" to your Tcl script.
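
To see how the naive Reno window grows, the same logic can be exercised outside NS-2. The following is only a sketch under stated assumptions: struct mock_tcp_sk, mock_cong_avoid, mock_ssthresh, and feed_acks are hypothetical stand-ins written for this illustration, not part of TCP-Linux. Only the field names and the growth rule are taken from the example above.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields of struct tcp_sk used below.
 * The real structure is defined in tcp/linux/ns-linux-util.h. */
struct mock_tcp_sk {
    unsigned int snd_cwnd;
    unsigned int snd_ssthresh;
    unsigned int snd_cwnd_cnt;
    unsigned int snd_cwnd_clamp;
};

/* Same growth rule as tcp_naive_reno_cong_avoid in the example above. */
static void mock_cong_avoid(struct mock_tcp_sk *tp)
{
    if (tp->snd_cwnd < tp->snd_ssthresh) {
        tp->snd_cwnd++;                  /* slow start: +1 per ACK */
    } else if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
        if (tp->snd_cwnd < tp->snd_cwnd_clamp)
            tp->snd_cwnd++;              /* congestion avoidance step */
        tp->snd_cwnd_cnt = 0;
    } else {
        tp->snd_cwnd_cnt++;              /* count ACKs toward the next step */
    }
}

/* Same rule as tcp_naive_reno_ssthresh: half the window, at least 2. */
static unsigned int mock_ssthresh(const struct mock_tcp_sk *tp)
{
    unsigned int half = tp->snd_cwnd >> 1;
    return half > 2U ? half : 2U;
}

/* Feed n_acks acknowledgments and return the resulting window. */
static unsigned int feed_acks(struct mock_tcp_sk *tp, unsigned int n_acks)
{
    assert(tp != 0);
    for (unsigned int i = 0; i < n_acks; i++)
        mock_cong_avoid(tp);
    return tp->snd_cwnd;
}
```

Starting from snd_cwnd = 2 and snd_ssthresh = 4, two acknowledgments bring the window to 4 in slow start. In this sketch the window then needs snd_cwnd + 1 further acknowledgments for each increment, because the counter is compared before it is incremented.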

To develop your algorithm seriously, read the following details.

To fully understand the process, readers are expected to know C programming. For more complicated algorithms, readers are encouraged to read the Linux kernel documentation: Documentation/networking/tcp.txt (in any Linux kernel source tree of version 2.6.13 or above).

Data structure interface

TCP-Linux exposes several important Linux TCP variables to NS-2 (in the tcp_sk structure in tcp/linux/ns-linux-util.h in the NS-2 code patched with TCP-Linux), as listed in the following table. Most of these variables are read-only; the writable ones are snd_ssthresh, snd_cwnd, snd_cwnd_cnt, and icsk_ca_priv.
| Variable name | Type (32 bit unless noted) | Meaning | Equivalent in existing NS-2 TCP |
| snd_nxt | unsigned | The sequence number of the next byte that TCP is going to send | t_seqno_*size_ |
| snd_una | unsigned | The sequence number of the next byte for which TCP is waiting for an acknowledgment | (highest_ack_+1)*size_ |
| mss_cache | unsigned | The size of a packet | size_ |
| srtt | unsigned | 8 times the smoothed RTT | t_srtt_ |
| rx_opt.rcv_tsecr | unsigned | Value of the timestamp echoed by the last acknowledgment | ts_echo_ |
| rx_opt.saw_tstamp | bool | Whether a timestamp was seen in the last acknowledgment | !hdr_flags::access(pkt)->no_ts_ |
| snd_ssthresh | unsigned | Slow-start threshold | ssthresh_ |
| snd_cwnd | unsigned | Congestion window | trunc(cwnd_) |
| snd_cwnd_cnt | unsigned (16 bit) | Fraction of the congestion window that has not yet accumulated to 1 | trunc(cwnd_*cwnd_)%cwnd_ |
| snd_cwnd_clamp | unsigned (16 bit) | Upper bound of the congestion window | wnd_ |
| snd_cwnd_stamp | unsigned | The last time the congestion window was changed (to detect idling and other situations) | n/a |
| bytes_acked | unsigned | The number of bytes acknowledged by the last acknowledgment (for ABC) | n/a |
| icsk_ca_state | unsigned (8 bit) | The current congestion control state (see the list of states after this table) | n/a |
| icsk_ca_priv | unsigned[16] | Private data of the congestion control algorithm for this flow | n/a |
| icsk_ca_ops | struct tcp_congestion_ops* | A pointer to the congestion control algorithm structure for this flow | n/a |

icsk_ca_state can be one of the following:
  TCP_CA_Open: normal state
  TCP_CA_Recovery: loss recovery after a fast retransmission
  TCP_CA_Loss: loss recovery after a timeout
  (The following two states are effective in Linux but not in TCP-Linux.)
  TCP_CA_Disorder: duplicate acknowledgments detected, but the threshold has not been reached, so TCP assumes that packet reordering is happening
  TCP_CA_CWR: the congestion window is decreasing (after local congestion in the NIC, ECN, etc.)
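
As an illustration of the icsk_ca_priv entry above, the following sketch shows one common way to keep per-flow private state in the 16-word area. It is a minimal example under stated assumptions: struct mock_tcp_sk, struct demo_priv, and all demo_* functions are hypothetical names made up for this illustration. Only the icsk_ca_priv field name and its size come from the table.

```c
#include <assert.h>
#include <string.h>

#define MOCK_CA_PRIV_WORDS 16   /* size of icsk_ca_priv per the table above */

/* Hypothetical stand-in for struct tcp_sk; only two fields are mocked. */
struct mock_tcp_sk {
    unsigned int snd_cwnd;
    unsigned int icsk_ca_priv[MOCK_CA_PRIV_WORDS];
};

/* Hypothetical private state of an algorithm, overlaid on icsk_ca_priv. */
struct demo_priv {
    unsigned int loss_events;     /* number of losses seen by this flow */
    unsigned int max_cwnd_seen;   /* largest window observed so far */
};

/* Refuse to compile if the private state outgrows the reserved area. */
_Static_assert(sizeof(struct demo_priv) <=
               MOCK_CA_PRIV_WORDS * sizeof(unsigned int),
               "demo_priv must fit in icsk_ca_priv");

/* View the private area as the algorithm's own structure. */
static struct demo_priv *demo_priv_of(struct mock_tcp_sk *tp)
{
    assert(tp != 0);
    return (struct demo_priv *)tp->icsk_ca_priv;
}

/* Would be called from the init hook: clear the private area. */
static void demo_init(struct mock_tcp_sk *tp)
{
    memset(tp->icsk_ca_priv, 0, sizeof(tp->icsk_ca_priv));
}

/* Would be called from cong_avoid: remember the largest window. */
static void demo_on_ack(struct mock_tcp_sk *tp)
{
    struct demo_priv *p = demo_priv_of(tp);
    if (tp->snd_cwnd > p->max_cwnd_seen)
        p->max_cwnd_seen = tp->snd_cwnd;
}

/* Would be called from ssthresh: count the loss event. */
static void demo_on_loss(struct mock_tcp_sk *tp)
{
    demo_priv_of(tp)->loss_events++;
}
```

The compile-time size check is a design choice of this sketch: it catches a private structure that no longer fits in the 16 words before any simulation runs.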

Congestion control algorithm interface

The congestion control algorithm interface is described in struct tcp_congestion_ops, which is a structure of function call pointers.
The structure is defined as below (in tcp/linux/ns-linux-util.h in the NS-2 code patched with TCP-Linux):
struct tcp_congestion_ops {
    char name[16];

    void (*cong_avoid)(struct tcp_sk *sk, unsigned int ack, unsigned int rtt, unsigned int in_flight, bool good_ack);
    unsigned int (*ssthresh)(struct tcp_sk *sk);
    unsigned int (*min_cwnd)(struct tcp_sk *sk);

    unsigned int (*undo_cwnd)(struct tcp_sk *sk);
    void (*rtt_sample)(struct tcp_sock *sk, unsigned int usrtt);
    void (*set_state)(struct tcp_sock *sk, unsigned int newstate);
    void (*cwnd_event)(struct tcp_sock *sk, enum tcp_ca_event ev);
    void (*pkts_acked)(struct tcp_sock *sk, unsigned int num_acked);

    void (*init)(struct tcp_sock *sk);
    void (*release)(struct tcp_sock *sk);
};
name[16] is the name of the TCP congestion control algorithm. This is the name used by the "select_ca" command in the Tcl script.
The sk argument is always a pointer to the TCP data structure of the flow.
The first three functions (cong_avoid, ssthresh, and min_cwnd) are REQUIRED. The others are optional. All of them are explained in the table below:

Function name, followed by its explanation:
cong_avoid
This function is called every time an acknowledgment is received and the congestion window can be increased. This is equivalent to opencwnd in tcp.cc.
ack is the number of bytes that are acknowledged in the latest acknowledgment;
rtt is the RTT measured by the latest acknowledgment;
in_flight is the number of packets in flight before the latest acknowledgment;
good_ack indicates whether the current situation is normal (no duplicate ack, no loss, and no SACK).
ssthresh
This function is called when the TCP flow detects a loss.
It returns the slow start threshold of a flow, after a packet loss is detected.
min_cwnd
This function is called when the TCP flow detects a loss.
It returns the congestion window of a flow after a packet loss is detected. When a loss is detected, min_cwnd is called after ssthresh. For many algorithms, the returned value equals ssthresh, but some algorithms might return a value smaller than ssthresh. In that case, there will be a slow start after loss recovery.
undo_cwnd
This function returns the congestion window of a flow after a false loss detection (due to a false timeout or packet reordering) is confirmed. It is not effective in the current version of TCP-Linux.
rtt_sample
This function is called when a new RTT sample is obtained. It is mainly used by delay-based congestion control algorithms which usually need accurate timestamps.
usrtt is the RTT value in microsecond (us) unit.
set_state
This function is called when the congestion state of the TCP flow changes.
newstate is the state code for the state that TCP is about to enter. The possible states are listed in the data structure interface table.
The call notifies the congestion control algorithm and is used by some algorithms that turn off their special control during loss recovery.
cwnd_event
This function is called when there is an event that might be of interest to the congestion control algorithm.
ev is the congestion event code. The possible events are:
CA_EVENT_FAST_ACK: An acknowledgment in sequence is received;
CA_EVENT_SLOW_ACK: An acknowledgment not in sequence is received;
CA_EVENT_TX_START: first transmission when no packet is in flight
CA_EVENT_CWND_RESTART: congestion window is restarted
CA_EVENT_COMPLETE_CWR: congestion window recovery is finished.
CA_EVENT_FRTO: fast recovery timeout happens
CA_EVENT_LOSS: retransmission timeout happens
pkts_acked
This function is called when an acknowledgment acknowledges some new packets.
num_acked is the number of packets acknowledged by this acknowledgment.
init
This function is called after the first acknowledgment is received and before the congestion control algorithm is called for the first time.
If the congestion control algorithm has private data, it should initialize its private data here.
release
This function is called when the flow finishes. If the congestion control algorithm has allocated memory beyond the 16 unsigned int of icsk_ca_priv, it should free that memory here to avoid a memory leak.
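
The set_state entry above mentions algorithms that turn off their special control during loss recovery. The following sketch shows what such a hook could look like. Everything in it is an assumption made for illustration: struct mock_tcp_sk, enum mock_ca_state, the "boost" behavior, and the demo_* functions are hypothetical, and the enum values are not the real TCP_CA_* codes.

```c
#include <assert.h>

/* Hypothetical state codes; the real TCP_CA_* values are not shown here. */
enum mock_ca_state { MOCK_CA_OPEN, MOCK_CA_RECOVERY, MOCK_CA_LOSS };

/* Hypothetical stand-in for the fields of struct tcp_sk used below. */
struct mock_tcp_sk {
    unsigned int snd_cwnd;
    unsigned int snd_cwnd_clamp;
    unsigned int icsk_ca_priv[16];
};

#define PRIV_BOOST_ON 0   /* index of the on/off flag inside icsk_ca_priv */

/* init hook: start with the special (boosted) increase enabled. */
static void demo_init(struct mock_tcp_sk *tp)
{
    assert(tp != 0);
    tp->icsk_ca_priv[PRIV_BOOST_ON] = 1;
}

/* set_state hook: keep the boost only while the flow is in the normal state. */
static void demo_set_state(struct mock_tcp_sk *tp, enum mock_ca_state newstate)
{
    tp->icsk_ca_priv[PRIV_BOOST_ON] = (newstate == MOCK_CA_OPEN);
}

/* cong_avoid hook: grow by 2 per call when boosted, by 1 otherwise,
 * and never exceed the clamp. */
static void demo_cong_avoid(struct mock_tcp_sk *tp)
{
    unsigned int step = tp->icsk_ca_priv[PRIV_BOOST_ON] ? 2U : 1U;
    tp->snd_cwnd += step;
    if (tp->snd_cwnd > tp->snd_cwnd_clamp)
        tp->snd_cwnd = tp->snd_cwnd_clamp;
}
```

Keeping the flag in icsk_ca_priv rather than in a global variable keeps the on/off decision per flow, so concurrent flows in one simulation do not affect each other.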

The process to implement a new (and simple) congestion control algorithm

  1. Understand the data structure interface and congestion control interface
  2. Give a name for your congestion control algorithm -- this name will be used in the "select_ca" command.
  3. Implement at least the three required congestion control functions (cong_avoid, ssthresh, and min_cwnd) in the congestion control interface
  4. Create a constant struct tcp_congestion_ops YourCongestionControlStructure {...} with the name, cong_avoid, ssthresh, and min_cwnd (and any other congestion control functions you implemented) filled in.
  5. Include two header files: "ns-linux-util.h" and "ns-linux-c.h".
  6. Copy your file (mytcpfile.c) to the tcp/linux/ directory.
  7. Add an entry to the Makefile: tcp/linux/mytcpfile.o, so that your file gets compiled.
  8. Add a record to tcp/linux/ns-linux-util.h to export your congestion control structure to NS-2:
    extern struct tcp_congestion_ops YourCongestionControlStructure;
  9. Add a record (&YourCongestionControlStructure) to struct tcp_congestion_ops* static_list[] in tcp/tcp-linux.cc to register your algorithm, and increase CONGESTION_CONTROL_OPS_NUM by 1.
  10. Compile, run, and check the simulation results.
You might encounter the following problem in the last step:
  1. If your algorithm has variables whose names are identical to those in other existing algorithms, you will see a duplicate definition error. In this case, rename the duplicated variables.
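
The export and registration steps above can be pictured with a small stand-alone model. This is a sketch only: struct mock_ca_ops, half_cwnd, my_algo, and find_ca are hypothetical names, and the lookup loop is an assumption about how a name passed to "select_ca" might be matched, not the actual code in tcp/tcp-linux.cc. The names static_list and CONGESTION_CONTROL_OPS_NUM are taken from the steps above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct tcp_congestion_ops. */
struct mock_ca_ops {
    char name[16];
    unsigned int (*ssthresh)(unsigned int cwnd);
};

/* Halve the window, but never go below 2 packets. */
static unsigned int half_cwnd(unsigned int cwnd)
{
    unsigned int half = cwnd >> 1;
    return half > 2U ? half : 2U;
}

/* Two algorithm records; "my_algo" plays the role of the newly added one. */
static struct mock_ca_ops naive_reno = { .name = "naive_reno", .ssthresh = half_cwnd };
static struct mock_ca_ops my_algo    = { .name = "my_algo",    .ssthresh = half_cwnd };

/* Adding an algorithm means one more list entry and a count raised by 1. */
#define CONGESTION_CONTROL_OPS_NUM 2
static struct mock_ca_ops *static_list[CONGESTION_CONTROL_OPS_NUM] = {
    &naive_reno,
    &my_algo,
};

/* Assumed lookup: return the record with the given name, or NULL. */
static struct mock_ca_ops *find_ca(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < CONGESTION_CONTROL_OPS_NUM; i++)
        if (strcmp(static_list[i]->name, name) == 0)
            return static_list[i];
    return NULL;
}
```

In this model, forgetting to raise CONGESTION_CONTROL_OPS_NUM would either fail to compile (too many initializers) or leave the new record unreachable, which is why the count and the list are changed together.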

Port a new congestion control algorithm from Linux to TCP-Linux

Porting a congestion control algorithm from Linux to TCP-Linux is relatively easy. Take the following steps:
  1. Copy the Linux module files of the new congestion control algorithm to the tcp/linux/ directory in your NS-2 code base.
  2. Add a record to tcp/linux/ns-linux-util.h to export the congestion control structure's name to NS-2:
    extern struct tcp_congestion_ops YourCongestionControlStructure;
  3. Add a record (&YourCongestionControlStructure) to struct tcp_congestion_ops* static_list[] in tcp/tcp-linux.cc to register your algorithm, and increase CONGESTION_CONTROL_OPS_NUM by 1.
  4. Add an item to the Makefile so that your code gets compiled:
    tcp/linux/YourCode.o
  5. Compile, run, and compare the simulation results with the results of Linux experiments.

You might encounter one of the following problems in the last step:

  1. If your algorithm requires access to many new fields in the Linux TCP structure, you might need to add more fields to struct tcp_sk in tcp/linux/ns-linux-util.h;
  2. If your algorithm has variables whose names are identical to those in other existing algorithms, you will see a duplicate definition error. In this case, rename the duplicated variables;
  3. If your algorithm performs some actions in its module register function, let the module's init function call the register function, since this NS-2 implementation does not call the module register function.
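
One possible reading of item 3 is sketched below, assuming that "init function" refers to the init hook of struct tcp_congestion_ops. All names (struct mock_tcp_sk, demo_register, demo_init, demo_scale, demo_scaled_cwnd) are hypothetical, and the run-once guard is an addition of this sketch rather than something the TCP-Linux documentation prescribes.

```c
#include <assert.h>

/* Hypothetical stand-in for struct tcp_sk. */
struct mock_tcp_sk {
    unsigned int snd_cwnd;
};

/* Module-wide parameter that the Linux module would compute at load time. */
static unsigned int demo_scale;            /* stays 0 until registration runs */
static unsigned int demo_register_calls;   /* how many times registration ran */

/* Stands in for the Linux module register function, which NS-2 never calls. */
static int demo_register(void)
{
    demo_scale = 1U << 10;   /* illustrative precomputed constant */
    demo_register_calls++;
    return 0;
}

/* init hook: run the registration work once, then set up the flow. */
static void demo_init(struct mock_tcp_sk *tp)
{
    assert(tp != 0);
    if (demo_register_calls == 0)
        demo_register();
    tp->snd_cwnd = 2;
}

/* Example consumer of the module-wide parameter. */
static unsigned int demo_scaled_cwnd(const struct mock_tcp_sk *tp)
{
    return tp->snd_cwnd * demo_scale;
}
```

Without the call from the init hook, demo_scale would stay 0 in this model and every flow would compute with an uninitialized parameter, which is the kind of silent error item 3 warns about.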

Q&A

1. What happens if the select_ca command selects a non-existent congestion control algorithm (e.g. "highsped" by a typo)?

TCP-Linux will first display an error message to the screen: Error: do not find highsped as a congestion control algorithm. Then, TCP-Linux calls the default congestion control algorithm in tcp.cc. (And in this case, the value of windowOption_ is effective.)

2. I found some fairness problem with HighSpeed TCP.

Please check the known Linux bugs page to make sure it is really a problem of the algorithm, not a bug in the Linux implementation.

3. I found that TCP Vegas does not fully utilize the capacity.

Please check the known Linux bugs page to make sure it is really a problem of the algorithm, not a bug in the Linux implementation.

Acknowledgment

This work is inspired and greatly helped by Prof. Pei Cao at Stanford and by Prof. Steven Low at Caltech. Many thanks to them!