A mini-tutorial for TCP-Linux in NS-2

(Part of the NS-2 enhancement project)

David X. Wei

Netlab @ Caltech

Initial Draft: May 2006;
Revision 1 for parameter tunings: Sep 2007.

This tutorial is for people who want to use TCP-Linux in NS-2 simulations. For information on how to install TCP-Linux into NS-2, see the TCP-Linux website. For general NS-2 tutorials, see the NS-2 website.

Table of Contents:
  1. Change your existing NS-2 simulation script to use TCP-Linux with default parameters
  2. Change parameters of Linux congestion control modules in TCP-Linux simulations
  3. Change parameters of Linux system in TCP-Linux simulations
  4. Update the TCP congestion control module source code with a newer Linux kernel
  5. Develop your own congestion control algorithm with TCP-Linux
  6. Q&A
  7. Acknowledgment

Change your existing NS-2 simulation script to use TCP-Linux with default parameters

It is very simple to change an existing NS-2 simulation script to use TCP-Linux. Three changes are required, plus one optional clean-up:
  1. Change the TCP agent to "Agent/TCP/Linux".
  2. Make sure the TCP sink supports SACK, that is, use either Agent/TCPSink/Sack1 or Agent/TCPSink/Sack1/DelAck. Currently, TCP-Linux does not support receivers without SACK. (More precisely, results without SACK have not been validated against emulation results.)
  3. Add the TCP command "select_ca <the name of your congestion control algorithm>".
  4. (Optional) Delete any assignment of windowOption_. This is not required: once a congestion control algorithm is selected, windowOption_ has no effect. Deleting the assignment simply avoids confusing others who read the script.
The following example shows two scripts. The first is a simple NS-2 simulation script that uses Sack1 as the TCP sender; the second is the corresponding script that uses TCP-Linux as the TCP sender. Only two lines differ: "set tcp [new Agent/TCP/Sack1]" becomes "set tcp [new Agent/TCP/Linux]", and "$tcp set windowOption_ 8" is replaced by "$ns at 0 "$tcp select_ca highspeed"". With these two changes, TCP-Linux takes effect.

WARNING: You also need to set the "window_" option of the TCP agent large enough to see a performance difference. "window_" is the upper bound of the congestion window in TCP, and it is 20 (packets) by default.
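A quick way to judge whether window_ is "large enough" is to compare it with the bandwidth-delay product of the path. The C sketch below is a back-of-the-envelope estimate, assuming 1000-byte packets and an RTT of about 20 ms (twice the 10 ms one-way delay of the example link), with queueing and transmission delay ignored; bdp_packets and window_limited_bps are hypothetical helper names, not part of NS-2 or TCP-Linux.

```c
/* Back-of-the-envelope check of whether window_ can fill a path.
 * Assumptions (not from NS-2 or TCP-Linux): fixed packet size in bytes,
 * RTT approximated as twice the one-way link delay, no queueing. */

/* Bandwidth-delay product, in packets. */
double bdp_packets(double link_bps, double rtt_sec, double pkt_bytes)
{
    return link_bps * rtt_sec / (pkt_bytes * 8.0);
}

/* Throughput ceiling (bit/s) imposed by a window of window_pkts packets. */
double window_limited_bps(double window_pkts, double rtt_sec, double pkt_bytes)
{
    return window_pkts * pkt_bytes * 8.0 / rtt_sec;
}
```

For a 100 Mb/s link, bdp_packets(100e6, 0.02, 1000) is about 250 packets, while window_limited_bps(20, 0.02, 1000) is about 8 Mb/s. Under these assumptions, a sender left at the default window_ of 20 would be capped far below the link rate, whichever congestion control algorithm it uses.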

A script for TCP-Sack1:
#Create a simulator object
set ns [new Simulator]

#Create two nodes and a link
set bs [$ns node]
set br [$ns node]
$ns duplex-link $bs $br 100Mb 10ms DropTail

#setup sender side
set tcp [new Agent/TCP/Sack1]
$tcp set timestamps_ true
$tcp set windowOption_ 8
$ns attach-agent $bs $tcp

#set up receiver side
set sink [new Agent/TCPSink/Sack1]
$sink set ts_echo_rfc1323_ true
$ns attach-agent $br $sink

#logical connection
$ns connect $tcp $sink

#Setup a FTP over TCP connection
set ftp [new Application/FTP]
$ftp attach-agent $tcp
$ftp set type_ FTP

#Schedule the life of the FTP
$ns at 0 "$ftp start"
$ns at 10 "$ftp stop"

#Schedule the stop of the simulation
$ns at 11 "exit 0"

#Start the simulation
$ns run

A script for TCP-Linux using Highspeed TCP (hstcp):
#Create a simulator object
set ns [new Simulator]

#Create two nodes and a link
set bs [$ns node]
set br [$ns node]
$ns duplex-link $bs $br 100Mb 10ms DropTail

#setup sender side   
set tcp [new Agent/TCP/Linux]
$tcp set timestamps_ true
$ns at 0 "$tcp select_ca highspeed"
$ns attach-agent $bs $tcp

#set up receiver side
set sink [new Agent/TCPSink/Sack1]
$sink set ts_echo_rfc1323_ true
$ns attach-agent $br $sink

#logical connection
$ns connect $tcp $sink

#Setup a FTP over TCP connection
set ftp [new Application/FTP]
$ftp attach-agent $tcp
$ftp set type_ FTP

#Schedule the life of the FTP
$ns at 0 "$ftp start"
$ns at 10 "$ftp stop"

#Schedule the stop of the simulation
$ns at 11 "exit 0"

#Start the simulation
$ns run

Change parameters of Linux congestion control modules in TCP-Linux simulations

Linux kernel congestion control modules may have module parameters, for example, the alpha, beta and gamma parameters in TCP Vegas. All the connections in a Linux system share the same parameter values.
TCP-Linux supports changing congestion control parameters either system-wide (as Linux does) or on a per-connection basis.
The command set_ca_default_param changes the default value of a parameter system-wide. As in Linux, this is efficient and adds no simulation overhead. However, once a default value is changed, all connections that use the default value are affected. (Connections that have set their own local parameter values are not affected.)
The command set_ca_param changes a parameter value for a single flow only. Other flows running the same congestion control algorithm keep using either their own local values (if they have called set_ca_param) or the default value. This implementation adds simulation overhead for every packet processed, so the simulation runs slower when many flows set local parameters.
For the best simulation performance, set the default values to the values used by the majority of flows (these flows then need no local values), and set local values only for the flows that cannot use the defaults.
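The lookup rule described above (a local value wins if the flow has one, otherwise the shared default applies) can be modeled in a few lines of C. This is a hypothetical model only, not the TCP-Linux implementation: param_default, flow_param and the helper functions are illustrative names.

```c
/* Hypothetical model of default vs. local parameter values.
 * NOT the TCP-Linux code; names and layout are illustrative. */

struct param_default {
    int value;                 /* shared by all flows (system-wide) */
};

struct flow_param {
    int has_local;             /* 1 once this flow has set a local value */
    int local_value;           /* per-flow override */
};

/* Models set_ca_default_param: affects every flow without a local value. */
void set_default_param(struct param_default *d, int v)
{
    d->value = v;
}

/* Models set_ca_param: affects this flow only. */
void set_local_param(struct flow_param *f, int v)
{
    f->has_local = 1;
    f->local_value = v;
}

/* The value a given flow actually uses. */
int effective_param(const struct param_default *d, const struct flow_param *f)
{
    return f->has_local ? f->local_value : d->value;
}
```

In this toy model, changing the default after a flow has set a local value does not change what that flow sees. The extra per-lookup branch only hints at why per-flow values cost something; the text above states that set_ca_param adds per-packet overhead but not how TCP-Linux implements it.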

Changing Default Parameters

If all TCP-Linux flows in the simulation use the same parameter values, use the "set_ca_default_param" command to change the default parameters at any time. When any flow calls this command, all other flows see the change too.

The format of the command is:
<tcp instance> set_ca_default_param <algorithm name> <parameter name> <new value>

To print the current default value of a parameter, use the command get_ca_default_param:
<tcp instance> get_ca_default_param <algorithm name> <parameter name>

Changing Local Parameters on a per-flow basis

If a particular flow needs a different value for a parameter, use the "set_ca_param" command to change that flow's local value at any time. This command may slow down the simulation.

The format of the command is:
<tcp instance> set_ca_param <algorithm name> <parameter name> <new value>

To print the current local value of a parameter, use the command get_ca_param:
<tcp instance> get_ca_param <algorithm name> <parameter name>

Example

The following table shows an example of changing parameters in TCP Vegas.
At 3 sec, the Vegas parameters alpha and beta are both changed to 40. A value of 40 is equivalent to 20 packets, because Vegas uses the last bit of these parameters to preserve fractional precision.
Note that this is a global change to the Vegas parameters: every TCP-Linux flow running Vegas without per-connection parameters is affected. In the following example, only tcp(1) changes the default parameters, yet tcp(2) sees the new values too.
At 6 sec, tcp(3) changes its local alpha and beta to 20 (equivalent to 10 packets). Because its alpha is smaller than that of the other flows, tcp(3) gets lower throughput from 6 sec onward.
At the bottleneck queue, the queue length will be around 9 packets from 0 to 3 seconds, around 60 packets from 3 to 6 seconds, and around 50 packets from 6 to 10 seconds.
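The 60- and 50-packet figures follow from simple arithmetic. The C sketch below assumes that, when alpha equals beta, each Vegas flow keeps roughly alpha packets queued at the bottleneck, so the backlogs of the flows add up, and that a stored parameter value p means p / 2 packets. PARAM_SHIFT, param_to_packets and expected_queue are hypothetical names. The 0 to 3 second interval uses the default parameters, where alpha and beta differ, so it is not covered by this sketch.

```c
/* Hypothetical sketch of the Vegas queue arithmetic.
 * Assumption: parameters carry one fractional bit, so value p = p/2 packets. */
#define PARAM_SHIFT 1

unsigned param_to_packets(unsigned p)
{
    return p >> PARAM_SHIFT;   /* drop the fractional bit */
}

/* Approximate bottleneck backlog when alpha == beta for every flow:
 * each flow keeps about alpha packets queued, and the backlogs add up. */
unsigned expected_queue(const unsigned *alpha, int nflows)
{
    unsigned total = 0;
    for (int i = 0; i < nflows; i++)
        total += param_to_packets(alpha[i]);
    return total;
}
```

With alpha = {40, 40, 40} (3 to 6 s) the estimate is 20 + 20 + 20 = 60 packets; after tcp(3) switches to 20, alpha = {40, 40, 20} gives 20 + 20 + 10 = 50 packets.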

A script for TCP-Linux using TCP-Vegas (vegas)
#Create a simulator object
set ns [new Simulator]

#Create a bottleneck link.
set router_snd [$ns node]
set router_rcv [$ns node]
$ns duplex-link $router_snd $router_rcv 10Mb 10ms DropTail
$ns queue-limit $router_snd $router_rcv 10000
# Create three flows sharing the bottleneck link
for {set i 1} {$i <=3} {incr i 1} {
  # Create the sending and receiving nodes.
  set bs($i) [$ns node]
  $ns duplex-link $bs($i) $router_snd 100Mb 1ms DropTail
  set br($i) [$ns node]
  $ns duplex-link $router_rcv $br($i) 100Mb 1ms DropTail
  #setup sender side
  set tcp($i) [new Agent/TCP/Linux]
  $tcp($i) set timestamps_ true
  $tcp($i) set window_ 100000
  $ns at 0 "$tcp($i) select_ca vegas"
  $ns attach-agent $bs($i) $tcp($i)

  #set up receiver side
  set sink($i) [new Agent/TCPSink/Sack1]
  $sink($i) set ts_echo_rfc1323_ true
  $ns attach-agent $br($i) $sink($i)

  #logical connection
  $ns connect $tcp($i) $sink($i)

  #Setup a FTP over TCP connection
  set ftp($i) [new Application/FTP]
  $ftp($i) attach-agent $tcp($i)
  $ftp($i) set type_ FTP

  #Schedule the life of the FTP
  $ns at 0 "$ftp($i) start"
  $ns at 10 "$ftp($i) stop"
}

# Change default parameters: all TCP-Linux flows without local values will see the changes!
$ns at 3 "$tcp(1) set_ca_default_param vegas alpha 40"
$ns at 3 "$tcp(1) set_ca_default_param vegas beta 40"

# Confirm the changes by printing the parameter values (optional). Note that tcp(2) sees the new default values even though the change was made by tcp(1).
$ns at 3 "$tcp(2) get_ca_default_param vegas alpha"
$ns at 3 "$tcp(2) get_ca_default_param vegas beta"


# change local parameters, only tcp(3) is affected.
$ns at 6 "$tcp(3) set_ca_param vegas alpha 20"
$ns at 6 "$tcp(3) set_ca_param vegas beta 20"

# confirm the changes by printing the parameter values (optional)
$ns at 6 "$tcp(3) get_ca_param vegas alpha"
$ns at 6 "$tcp(3) get_ca_param vegas beta"

#Schedule the stop of the simulation
$ns at 11 "exit 0"

#Start the simulation
$ns run

Change parameters of Linux system in TCP-Linux simulations

The TCP-Linux patch also lets a simulation change Linux parameters outside the congestion control modules, in the same way as module parameters. The Linux system is treated as a special module named "linux", so set_ca_default_param, get_ca_default_param, set_ca_param and get_ca_param can also tune Linux parameters. For example:
$tcp set_ca_default_param linux sysctl_tcp_abc 0
turns off ABC in the simulated Linux system.

All the Linux system variables are listed in tcp/linux/ns-linux-param.c. The following table summarizes all the parameters currently available:
Variable name | Default value | Description
sysctl_tcp_abc | 1 | 0: turn off Appropriate Byte Counting (ABC); 1: turn on ABC. Turn it on for faster cwnd growth in bulk transfers.
tcp_max_burst | 3 | The maximum number of packets that can be sent back-to-back during loss recovery, i.e., the maximum burst size.
debug_level | 1 | The verbosity of debug messages. 0: print everything, including INFO; 1: print ERROR and NOTICE; 2: print ERROR only.
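Why ABC speeds up cwnd growth can be seen with a little arithmetic on one round trip in congestion avoidance. The C sketch below is a simplified illustration of the general idea of byte counting (RFC 3465), not of the exact Linux sysctl_tcp_abc code path. It assumes the receiver sends one ACK per pkts_per_ack packets (2 with delayed ACKs) and that cwnd stays constant during the round; growth_per_ack and growth_abc are hypothetical names.

```c
/* Congestion-avoidance growth over one RTT, in packets.
 * Simplified illustration; assumptions: one ACK per pkts_per_ack
 * packets, constant cwnd during the round. */

/* ACK counting: +1 packet after cwnd ACKs (like snd_cwnd_cnt in Reno). */
double growth_per_ack(unsigned cwnd, unsigned pkts_per_ack)
{
    unsigned acks = cwnd / pkts_per_ack;          /* ACKs arriving in one RTT */
    return (double)acks / cwnd;
}

/* Byte counting: +1 packet after cwnd packets' worth of bytes are acked. */
double growth_abc(unsigned cwnd, unsigned pkts_per_ack, unsigned mss)
{
    unsigned long bytes =
        (unsigned long)(cwnd / pkts_per_ack) * pkts_per_ack * mss;
    return (double)bytes / ((double)cwnd * mss);
}
```

With cwnd = 100 and delayed ACKs (pkts_per_ack = 2), ACK counting grows the window by only 0.5 packets per RTT, while byte counting grows it by a full packet, which matches the "faster cwnd growth in bulk transfers" noted in the table.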

Update the TCP congestion control module source code with a newer Linux kernel

Take the following steps:
  1. Download the latest Linux kernel source code (e.g. from kernel.org). (For simplicity, assume you place the kernel source code in the /tmp/kernel_src/ directory.)
  2. Go to the TCP-Linux directory of your NS-2 tree: cd <NS2-Directory>/tcp/linux
  3. Run sh migrate.sh <path of the Linux kernel> <directory name to save the source code in>.
    For example,
    sh migrate.sh /tmp/kernel_src/ 2.6.25
    copies all the relevant files (usually the tcp_* files in the net/ipv4 directory) from /tmp/kernel_src/ to <NS2-Directory>/tcp/linux/2.6.25/, removes the old <NS2-Directory>/tcp/linux/src/, and creates <NS2-Directory>/tcp/linux/src/ as a symbolic link to <NS2-Directory>/tcp/linux/2.6.25/.
  4. Compile, run, and compare the simulation results with the results of Linux experiments.

You might encounter one of the following problems in the last step:

  1. If the new kernel source code adds congestion control algorithms in new files, add an entry for each new file to the Makefile so that the compiler builds it:
    tcp/linux/src/<new code name>.o
  2. If your algorithm needs access to new fields of the Linux TCP structure, you might need to add more fields to struct tcp_sock in tcp/linux/ns-linux-util.h.

Develop a new congestion control algorithm with TCP-Linux

Here we explain the basic concepts, which are enough for developing simple loss-based algorithms.

An example: the implementation of a very simple Reno

The following code is the implementation of a very simple Reno. (In fact, it behaves as FACK, since TCP-Linux takes care of all loss detection, loss recovery and rate-halving.)
This Reno implementation has two parameters: the additive-increase (AI) parameter alpha and the multiplicative-decrease (MD) parameter beta. Every RTT without loss, the congestion window is increased by alpha packets. On every loss event, the congestion window is reduced by cwnd/beta. By default, alpha=1 and beta=2, as in standard Reno.
Naive Reno (u32 in the code is equivalent to unsigned int)
/* This is a very naive Reno implementation, shown as an example of how to develop a new congestion control algorithm with TCP-Linux. */
/* This file should be copied to the tcp/linux/ directory. */
/* For the compiler to build this file, an entry "tcp/linux/<NameOfThisFile>.o" should be added to the Makefile. */

/* This definition tells the compiler the name of this protocol */
#define NS_PROTOCOL "tcp_naive_reno.c"

/* These two header files link your implementation to TCP-Linux */
#include "ns-linux-c.h"
#include "ns-linux-util.h"

/* Define a parameter alpha for AI parameter */
static int alpha = 1;
/* Declare alpha as a parameter */
module_param(alpha, int, 0644);
/* Declare the explanation for alpha*/
MODULE_PARM_DESC(alpha, "AI increment size of window (in unit of pkt/round trip time)");

/* Define a parameter beta for the MD parameter */
static int beta = 2;
/* Declare beta as a parameter */
module_param(beta, int, 0644);
/* Declare the explanation for beta*/
MODULE_PARM_DESC(beta, "MD decrement portion of window: every loss the window is reduced by a proportion of 1/beta");

/* This is equivalent to opencwnd in other NS-2 TCP implementations. */
/* This function increases the congestion window on each acknowledgment. */

void tcp_naive_reno_cong_avoid(struct tcp_sock *tp, u32 ack, u32 rtt, u32 in_flight, int flag)
{
    if (tp->snd_cwnd < tp->snd_ssthresh) {
        /* Slow start: one packet per ACK, but never beyond the clamp */
        if (tp->snd_cwnd < tp->snd_cwnd_clamp)
            tp->snd_cwnd++;
    } else {
        /* Congestion avoidance: add alpha once a full window of ACKs is counted */
        if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
            tp->snd_cwnd += alpha;
            tp->snd_cwnd_cnt = 0;
            if (tp->snd_cwnd > tp->snd_cwnd_clamp)
               tp->snd_cwnd = tp->snd_cwnd_clamp;
        } else {
            tp->snd_cwnd_cnt++;
        }
    }
}

/* This function returns the slow-start threshold after a loss. */
/* ssthresh becomes cwnd - cwnd/beta (half of cwnd when beta = 2), but never less than 2 packets. */

u32 tcp_naive_reno_ssthresh(struct tcp_sock *tp)
{
        u32 reduction = tp->snd_cwnd / beta;
        return max(tp->snd_cwnd - reduction, 2U);
}

/* This function returns the congestion window after a loss; it is called AFTER ssthresh (above). */
/* The congestion window is set equal to the new slow-start threshold computed by ssthresh. */

u32 tcp_naive_reno_min_cwnd(struct tcp_sock *tp)
{
        return tp->snd_ssthresh;
}

/* A static record for this congestion control algorithm */
static struct tcp_congestion_ops tcp_naive_reno = {
        .name           = "naive_reno",
        .ssthresh       = tcp_naive_reno_ssthresh,
        .cong_avoid     = tcp_naive_reno_cong_avoid,
        .min_cwnd       = tcp_naive_reno_min_cwnd
};

/* Define an initialization function */
int tcp_naive_reno_register(void)
{
       tcp_register_congestion_control(&tcp_naive_reno);
       return 0;
}
/* declare the initialization function */
module_init(tcp_naive_reno_register);
As in the example above, an implementation includes five parts:
  1. The header files to link the implementation to TCP-Linux;
  2. Optionally, definitions and declarations of parameters. Parameters must be defined static, because different modules might use the same variable names for different parameters;
  3. Implementations of (at least) the three congestion control functions defined in struct tcp_congestion_ops: cong_avoid, ssthresh and min_cwnd;
  4. A static record of struct tcp_congestion_ops that stores the function pointers and the algorithm's name (in the example, the algorithm is named "naive_reno");
  5. A module initialization function that calls tcp_register_congestion_control to register the module.
After copying the file to tcp/linux and adding it to the Makefile, you can run the algorithm by adding "select_ca naive_reno" to your Tcl script.
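To check the counting logic of the naive Reno example without building NS-2, here is a hypothetical stand-alone harness in C. It copies the cong_avoid and ssthresh logic (including a clamp check in slow start) onto a minimal mock_tcp_sock that holds only the four fields the example uses; mock_tcp_sock, naive_cong_avoid, naive_ssthresh and feed_acks are illustrative names, not TCP-Linux APIs.

```c
/* Hypothetical harness replaying the naive Reno logic on a mock socket.
 * Not part of TCP-Linux; only the fields used by the example are modeled. */
typedef unsigned int u32;

struct mock_tcp_sock {
    u32 snd_cwnd, snd_ssthresh, snd_cwnd_cnt, snd_cwnd_clamp;
};

static int alpha = 1;   /* AI: packets added per window of ACKs */
static int beta = 2;    /* MD: cwnd is reduced by cwnd/beta on loss */

static void naive_cong_avoid(struct mock_tcp_sock *tp)
{
    if (tp->snd_cwnd < tp->snd_ssthresh) {
        if (tp->snd_cwnd < tp->snd_cwnd_clamp)
            tp->snd_cwnd++;                   /* slow start: +1 per ACK */
    } else if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
        tp->snd_cwnd += alpha;                /* a window of ACKs counted */
        tp->snd_cwnd_cnt = 0;
        if (tp->snd_cwnd > tp->snd_cwnd_clamp)
            tp->snd_cwnd = tp->snd_cwnd_clamp;
    } else {
        tp->snd_cwnd_cnt++;                   /* count this ACK */
    }
}

static u32 naive_ssthresh(const struct mock_tcp_sock *tp)
{
    u32 reduction = tp->snd_cwnd / beta;
    u32 w = tp->snd_cwnd - reduction;
    return w > 2U ? w : 2U;                   /* never below 2 packets */
}

/* Deliver n acknowledgments to the flow. */
static void feed_acks(struct mock_tcp_sock *tp, int n)
{
    for (int i = 0; i < n; i++)
        naive_cong_avoid(tp);
}
```

Starting in congestion avoidance at cwnd = 10, the window reaches 11 only on the 11th ACK, because the counter must first reach snd_cwnd; this is roughly one alpha increase per round trip. With beta = 2, a loss at cwnd = 100 gives ssthresh = 50.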

To develop your algorithm seriously, read the details below.

To fully understand the process, readers are expected to know C programming. For more complicated algorithms, readers are encouraged to read the Linux kernel document Documentation/networking/tcp.txt (in the source code of any Linux kernel, version 2.6.13 or above).

Data structure interface

TCP-Linux exposes several important variables of Linux TCP to NS-2 (in the tcp_sock structure of tcp/linux/ns-linux-util.h in NS-2 code patched with TCP-Linux), as listed in the following table. Most of these variables are read-only; the exceptions, which the congestion control algorithm may modify, are snd_ssthresh, snd_cwnd, snd_cwnd_cnt and icsk_ca_priv.
Variable name | Type (32-bit unless noted) | Meaning | Equivalent in existing NS-2 TCP
snd_nxt | unsigned | The sequence number of the next byte that TCP is going to send | t_seqno_*size_
snd_una | unsigned | The sequence number of the next byte that TCP is waiting to have acknowledged | (highest_ack_+1)*size_
mss_cache | unsigned | The size of a packet | size_
srtt | unsigned | 8 times the smoothed RTT | t_srtt_
rx_opt.rcv_tsecr | unsigned | The timestamp value echoed by the last acknowledgment | ts_echo_
rx_opt.saw_tstamp | bool | Whether a timestamp was seen in the last acknowledgment | !hdr_flags::access(pkt)->no_ts_
snd_ssthresh | unsigned | Slow-start threshold | ssthresh_
snd_cwnd | unsigned | Congestion window | trunc(cwnd_)
snd_cwnd_cnt | unsigned (16-bit) | Fraction of the congestion window that has not yet accumulated to 1, counted in ACKs | (cwnd_-trunc(cwnd_))*trunc(cwnd_)
snd_cwnd_clamp | unsigned (16-bit) | Upper bound of the congestion window | wnd_
snd_cwnd_stamp | unsigned | The last time the congestion window was changed (to detect idling and other situations) | n/a
bytes_acked | unsigned | The number of bytes acknowledged by the last acknowledgment (for ABC) | n/a
icsk_ca_state | unsigned (8-bit) | The current congestion control state (see the list below) | n/a
icsk_ca_priv | unsigned[16] | Private data of the congestion control algorithm for this flow | n/a
icsk_ca_ops | struct tcp_congestion_ops* | A pointer to the congestion control algorithm structure of this flow | n/a

Possible values of icsk_ca_state:
TCP_CA_Open: normal state
TCP_CA_Recovery: loss recovery after a fast retransmit
TCP_CA_Loss: loss recovery after a timeout
(The following two states are not used in TCP-Linux, although they are used in Linux.)
TCP_CA_Disorder: duplicate ACKs were detected, but not enough to reach the threshold, so TCP assumes that packet reordering is happening.
TCP_CA_CWR: the congestion window is being reduced (after local congestion in the NIC, an ECN signal, etc.).
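Two of the fields above are stored in scaled or split form. The C sketch below shows how one might turn them back into familiar quantities. It assumes, as in the naive Reno example, that snd_cwnd_cnt counts ACKs toward the next +1 of snd_cwnd (so the fractional window is snd_cwnd_cnt / snd_cwnd), and that srtt holds 8 times the smoothed RTT in whatever time unit TCP-Linux uses; effective_cwnd and smoothed_rtt are hypothetical helper names.

```c
/* Hypothetical helpers for reading Linux-style TCP fields.
 * Assumptions: snd_cwnd_cnt counts ACKs toward the next cwnd increment,
 * and srtt is stored as 8 x the smoothed RTT. */
typedef unsigned int u32;

/* Fractional congestion window, comparable to NS-2's cwnd_.
 * snd_cwnd is assumed to be nonzero. */
double effective_cwnd(u32 snd_cwnd, u32 snd_cwnd_cnt)
{
    return snd_cwnd + (double)snd_cwnd_cnt / snd_cwnd;
}

/* Smoothed RTT, in the same time unit as srtt itself. */
u32 smoothed_rtt(u32 srtt)
{
    return srtt >> 3;   /* undo the x8 scaling */
}
```

For example, snd_cwnd = 10 with snd_cwnd_cnt = 5 corresponds to an NS-2 style cwnd_ of 10.5, and srtt = 800 corresponds to a smoothed RTT of 100 units.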

Congestion control algorithm interface

The congestion control algorithm interface is described in struct tcp_congestion_ops, which is a structure of function call pointers.
The structure is defined as below (in tcp/linux/ns-linux-util.h in the NS-2 code patched with TCP-Linux):
struct tcp_congestion_ops {
    char name[16];

    void (*cong_avoid)(struct tcp_sock *sk, unsigned int ack, unsigned int rtt, unsigned int in_flight, int good_ack);
    unsigned int (*ssthresh)(struct tcp_sock *sk);
    unsigned int (*min_cwnd)(struct tcp_sock *sk);

    unsigned int (*undo_cwnd)(struct tcp_sock *sk);
    void (*rtt_sample)(struct tcp_sock *sk, unsigned int usrtt);
    void (*set_state)(struct tcp_sock *sk, unsigned int newstate);
    void (*cwnd_event)(struct tcp_sock *sk, enum tcp_ca_event ev);
    void (*pkts_acked)(struct tcp_sock *sk, unsigned int num_acked, ktime_t last);

    void (*init)(struct tcp_sock *sk);
    void (*release)(struct tcp_sock *sk);
};
name[16] is the name of the TCP congestion control algorithm. This is the name used by the "select_ca" command in the Tcl script.
sk (of type struct tcp_sock *) is always the pointer to the TCP data structure of the flow.
The three functions cong_avoid, ssthresh and min_cwnd are REQUIRED to be implemented; the others are optional. They are explained in the table below:

Function name: explanation
cong_avoid: Called every time an acknowledgment is received and the congestion window can be increased. This is equivalent to opencwnd in tcp.cc.
  ack is the number of bytes acknowledged by the latest acknowledgment;
  rtt is the RTT measured by the latest acknowledgment;
  in_flight is the number of packets in flight before the latest acknowledgment;
  good_ack indicates whether the current situation is normal (no duplicate ACK, no loss and no SACK): 1 for normal, 0 for dubious.
ssthresh: Called when the TCP flow detects a loss.
  It returns the slow-start threshold of the flow after the loss is detected.
min_cwnd: Called when the TCP flow detects a loss, after ssthresh.
  It returns the congestion window of the flow after the loss is detected. For many algorithms this equals ssthresh, but some algorithms might set min_cwnd smaller than ssthresh; in that case, there will be a slow start after loss recovery.
undo_cwnd: Returns the congestion window of the flow after a false loss detection (due to a spurious timeout or packet reordering) is confirmed. This function has no effect in the current version of TCP-Linux.
rtt_sample: Called when a new RTT sample is obtained. It is mainly used by delay-based congestion control algorithms, which usually need accurate timestamps.
  usrtt is the RTT in microseconds (us).
set_state: Called when the congestion state of the TCP flow changes.
  newstate is the code of the state TCP is entering; the possible states are listed in the data structure interface table.
  It notifies the congestion control algorithm, and is used by some algorithms that turn off their special control during loss recovery.
cwnd_event: Called when an event of interest to the congestion control algorithm occurs.
  ev is the congestion event code. The possible events are:
  CA_EVENT_FAST_ACK: an in-sequence acknowledgment is received;
  CA_EVENT_SLOW_ACK: an acknowledgment that is not in sequence is received;
  CA_EVENT_TX_START: first transmission when no packet is in flight;
  CA_EVENT_CWND_RESTART: the congestion window is restarted;
  CA_EVENT_COMPLETE_CWR: congestion window reduction is finished;
  CA_EVENT_FRTO: a fast recovery timeout happens;
  CA_EVENT_LOSS: a retransmission timeout happens.
pkts_acked: Called when an acknowledgment acknowledges some new packets.
  num_acked is the number of packets acknowledged by this acknowledgment.
  last is the time (in microseconds) when the latest acknowledged packet was sent. A value of 0 means no timestamp measurement was collected for this packet.
init: Called after the first acknowledgment is received and before the congestion control algorithm is called for the first time.
  If the congestion control algorithm has private data, it should initialize that data here.
release: Called when the flow finishes. If the congestion control algorithm has allocated memory beyond the 16 unsigned ints of icsk_ca_priv, it should free that memory here to avoid a memory leak.
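The init, rtt_sample and icsk_ca_priv descriptions fit together as follows. This is a minimal sketch under stated assumptions: it uses a hypothetical mock_tcp_sock that holds only a 16-word icsk_ca_priv array, and a hypothetical min_rtt_ca state that tracks the smallest RTT sample, the kind of quantity a delay-based algorithm might keep. In a real module the functions would take struct tcp_sock * and be referenced from struct tcp_congestion_ops.

```c
/* Hypothetical sketch: per-flow private state stored in icsk_ca_priv.
 * mock_tcp_sock and min_rtt_ca are illustrative, not TCP-Linux types. */

struct mock_tcp_sock {
    unsigned icsk_ca_priv[16];   /* private area for the algorithm */
};

/* Algorithm-private state; it must fit in icsk_ca_priv. */
struct min_rtt_ca {
    unsigned base_rtt_us;        /* smallest RTT sample seen so far */
    unsigned samples;            /* number of RTT samples received */
};

_Static_assert(sizeof(struct min_rtt_ca) <=
               sizeof(((struct mock_tcp_sock *)0)->icsk_ca_priv),
               "private state must fit in icsk_ca_priv");

/* View the private area as this algorithm's state. */
static struct min_rtt_ca *ca_of(struct mock_tcp_sock *sk)
{
    return (struct min_rtt_ca *)sk->icsk_ca_priv;
}

/* init: reset the private state before the algorithm is first used. */
static void min_rtt_init(struct mock_tcp_sock *sk)
{
    struct min_rtt_ca *ca = ca_of(sk);
    ca->base_rtt_us = 0xffffffffU;   /* "no sample yet" */
    ca->samples = 0;
}

/* rtt_sample: called with each new RTT sample, in microseconds. */
static void min_rtt_sample(struct mock_tcp_sock *sk, unsigned usrtt)
{
    struct min_rtt_ca *ca = ca_of(sk);
    if (usrtt < ca->base_rtt_us)
        ca->base_rtt_us = usrtt;
    ca->samples++;
}
```

Because the state lives inside icsk_ca_priv, nothing is allocated, so this sketch would need no release function; the static assertion fails the build if the state ever outgrows the 16-word area.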

The process to implement a new (and simple) congestion control algorithm

  1. Understand the data structure interface and congestion control interface
  2. Give a name for your congestion control algorithm -- this name will be used in the "select_ca" command.
  3. Implement at least the three required congestion control functions (cong_avoid, ssthresh, and min_cwnd) in the congestion control interface
  4. Create a static struct tcp_congestion_ops YourCongestionControlStructure {...} with name, cong_avoid, ssthresh and min_cwnd (and any other congestion control functions you implemented) filled in
  5. Include the two header files "ns-linux-util.h" and "ns-linux-c.h".
  6. Copy your file (mytcpfile.c) to the tcp/linux/ directory
  7. Add an entry tcp/linux/mytcpfile.o to the Makefile so that the compiler builds your file
  8. Compile, run and check the simulation results

Q&A

1. What happens if the select_ca command selects a non-existent congestion control algorithm (e.g. "highsped", a typo)?

TCP-Linux first prints an error message to the screen: Error: do not find highsped as a congestion control algorithm. Then, TCP-Linux falls back to the default congestion control algorithm in tcp.cc. (In this case, the value of windowOption_ takes effect.)
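Conceptually, select_ca is a lookup by name among the registered algorithms, with a fallback when nothing matches. The C sketch below is a hypothetical model, not the TCP-Linux source: mock_ca_ops, register_ca and select_ca are illustrative names, and returning NULL stands for "fall back to the default congestion control in tcp.cc".

```c
/* Hypothetical model of name-based algorithm selection with fallback.
 * Not the TCP-Linux source; all names here are illustrative. */
#include <stdio.h>
#include <string.h>

struct mock_ca_ops {
    const char *name;            /* the name used by select_ca */
};

#define MAX_CA 8
static const struct mock_ca_ops *registry[MAX_CA];
static int n_registered = 0;

/* Plays the role of tcp_register_congestion_control. */
int register_ca(const struct mock_ca_ops *ops)
{
    if (n_registered >= MAX_CA)
        return -1;               /* registry full */
    registry[n_registered++] = ops;
    return 0;
}

/* Returns the matching algorithm, or NULL meaning "use the default". */
const struct mock_ca_ops *select_ca(const char *name)
{
    for (int i = 0; i < n_registered; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    fprintf(stderr, "Error: do not find %s as a congestion control algorithm\n", name);
    return NULL;
}
```

In this model a typo such as "highsped" simply fails the string comparison against every registered name, which is why the answer above sees the error message followed by the default behavior.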

2. I found some fairness issue in TCP Vegas.

Please check the known Linux bugs page to make sure that it is really a problem of the algorithm, not a bug in the Linux implementation.

Acknowledgment

This work is inspired and greatly helped by Prof. Pei Cao at Stanford and by Prof. Steven Low at Caltech. Many thanks to them!