[Ns-developers] datagram socket tx buffer back pressure
Mathieu Lacage
mathieu.lacage at sophia.inria.fr
Tue Mar 23 08:00:11 PDT 2010
hi,
1. The problem
--------------
Both Packet and Udp sockets suffer from a similar problem in ns-3: if
your application generates a lot of traffic without its own congestion
control, the egress NetDevice drops every packet that arrives once its
tx queue is full, and the application has no way of knowing when this
happens. I would like to fix this (mostly because the datagram sockets
in ns-3-simu need this information to block the udp traffic generation
applications).
2. The linux solution
---------------------
I spent a while looking at how the linux kernel handles this problem.
The mechanism they have implemented keeps track of which sockets
allocate which buffers and makes the sockets block when they have
allocated too many buffers and unblock when the buffers they have
allocated have been either dropped or transmitted. They do this by
setting a 'destructor' (a function pointer) in each skb which is invoked
when the skb refcount reaches zero. This destructor wakes up the
associated socket tx path if it is blocked.
Note that this scheme works only if the tx socket buffer size is smaller
than the maximum size of the egress device queue. I guess we can live
with this if the linux people can live with it.
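To make the accounting concrete, here is a rough sketch of the idea in plain
C++. It is not kernel code: Buffer and TxAccounting are hypothetical names, a
std::shared_ptr stands in for the skb refcount, and a std::function stands in
for the destructor function pointer. Where the kernel would block the sender,
this sketch simply returns a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical stand-in for an skb: a size plus a destructor hook.
struct Buffer
{
  explicit Buffer (std::size_t size) : bytes (size) {}
  // Runs when the last reference goes away (buffer transmitted or dropped).
  ~Buffer () { if (destructor) destructor (bytes); }
  std::size_t bytes;
  std::function<void (std::size_t)> destructor;
};

// Hypothetical socket-side accounting with a fixed tx budget in bytes.
// Assumption: the TxAccounting object outlives every buffer it allocated.
class TxAccounting
{
public:
  explicit TxAccounting (std::size_t limit) : m_limit (limit), m_used (0) {}

  // Returns a null pointer when the budget is exhausted.
  std::shared_ptr<Buffer> Allocate (std::size_t size)
  {
    if (m_used + size > m_limit)
      {
        return nullptr;
      }
    std::shared_ptr<Buffer> buffer = std::make_shared<Buffer> (size);
    // The hook gives the bytes back to this socket on destruction.
    buffer->destructor = [this] (std::size_t n) { m_used -= n; };
    m_used += size;
    return buffer;
  }

  std::size_t Used () const { return m_used; }

private:
  std::size_t m_limit;
  std::size_t m_used;
};
```

With a 1000-byte budget, a second 600-byte allocation fails until the first
buffer has been released by whoever holds the last reference to it.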
3. Proposal for an ns-3 solution
--------------------------------
Because the ns-3 sockets are not synchronous, we are not going to make
them block: we must make the send functions fail instead with a magic
errno to indicate the condition that it's not possible to allocate a new
buffer in this socket. I would like to propose making the Send functions
set ENOBUFS, EAGAIN, or EWOULDBLOCK. From the sendmsg manpage:
    EAGAIN or EWOULDBLOCK
        The socket is marked non-blocking and the requested
        operation would block.
    ENOBUFS
        The output queue for a network interface was full.
        This generally indicates that the interface has stopped
        sending, but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just
        silently dropped when a device queue overflows.)
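The failing send described above could look roughly like the following.
This is only a sketch: SketchSocket, its byte budget and its Release method
are hypothetical and are not the ns-3 Socket API. The error codes are the
POSIX constants from <cerrno>, and ENOBUFS is picked arbitrarily among the
candidates.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical datagram socket with a tx budget counted in bytes.
class SketchSocket
{
public:
  explicit SketchSocket (std::size_t txLimit)
    : m_txLimit (txLimit), m_txUsed (0), m_errno (0) {}

  // Never blocks: returns the number of bytes accepted, or -1 with
  // the error code set when the budget would be exceeded.
  int Send (std::size_t size)
  {
    if (m_txUsed + size > m_txLimit)
      {
        m_errno = ENOBUFS; // EAGAIN / EWOULDBLOCK would be the alternative
        return -1;
      }
    m_txUsed += size;
    m_errno = 0;
    return static_cast<int> (size);
  }

  // To be called once a buffer of this size was transmitted or dropped.
  void Release (std::size_t size) { m_txUsed -= size; }

  int GetErrno () const { return m_errno; }

private:
  std::size_t m_txLimit;
  std::size_t m_txUsed;
  int m_errno;
};
```

An application would check the return value of Send, look at the error code,
and retry later once some of its earlier bytes have been released.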
Finally, we need to make sure that we decrement the total number of
bytes allocated whenever an associated buffer is transmitted or dropped
by the egress NetDevice.
Proposed API:
The easiest ad hoc way to implement the above would be something along
the lines of:
class Packet
{
public:
  enum NotificationType {
    DEQUEUED,
    // XXX: maybe other types ?
  };
  // the first Callback template argument is the return type.
  void SetNotificationListener (Callback<void, Ptr<Packet>, enum
                                NotificationType> callback);
  // clear listener once it's been notified.
  void NotifyDequeued (void);
};
The udp socket could call SetNotificationListener on the packet before
sending it down, and the NetDevices would be responsible for calling
NotifyDequeued when the relevant event happens.
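A self-contained sketch of how these pieces could fit together, outside of
ns-3: SketchPacket, SketchDevice and SendTracked are hypothetical names,
std::function replaces the ns-3 Callback, and the device queue is counted in
packets for simplicity. The device notifies on both transmit and drop so that
the sender gets its bytes back either way.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified packet carrying one listener.
class SketchPacket
{
public:
  enum NotificationType { DEQUEUED };
  using Listener = std::function<void (const SketchPacket &, NotificationType)>;

  explicit SketchPacket (std::size_t size) : m_size (size) {}
  std::size_t GetSize () const { return m_size; }
  void SetNotificationListener (Listener listener) { m_listener = std::move (listener); }

  // Clears the listener before invoking it, so it fires at most once.
  void NotifyDequeued ()
  {
    Listener listener = std::move (m_listener);
    m_listener = nullptr;
    if (listener) listener (*this, DEQUEUED);
  }

private:
  std::size_t m_size;
  Listener m_listener;
};

// Hypothetical device with a bounded tx queue.
class SketchDevice
{
public:
  explicit SketchDevice (std::size_t maxPackets) : m_maxPackets (maxPackets) {}

  void Send (std::shared_ptr<SketchPacket> packet)
  {
    if (m_queue.size () >= m_maxPackets)
      {
        packet->NotifyDequeued (); // dropped: the sender still gets notified
        return;
      }
    m_queue.push_back (std::move (packet));
  }

  void TransmitOne ()
  {
    if (m_queue.empty ()) return;
    std::shared_ptr<SketchPacket> packet = std::move (m_queue.front ());
    m_queue.pop_front ();
    packet->NotifyDequeued (); // transmitted
  }

private:
  std::size_t m_maxPackets;
  std::deque<std::shared_ptr<SketchPacket>> m_queue;
};

// Socket side: charge the packet, then register for the notification.
// Assumption: txUsed outlives every packet that refers to it.
void SendTracked (SketchDevice &device, std::size_t &txUsed, std::size_t size)
{
  std::shared_ptr<SketchPacket> packet = std::make_shared<SketchPacket> (size);
  txUsed += size;
  packet->SetNotificationListener (
    [&txUsed] (const SketchPacket &p, SketchPacket::NotificationType) {
      txUsed -= p.GetSize ();
    });
  device.Send (std::move (packet));
}
```

Note that the listener here captures the counter by reference, which is
exactly the kind of dangling reference that becomes a problem if the sender
goes away first.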
Potential problem: lifetime management of the callback. What happens if
the socket which sent the packet is closed before the NetDevice calls
NotifyDequeued, and the NetDevice then calls it anyway?
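One conceivable way to make a late notification harmless, shown here in
standard C++ rather than with ns-3 types, is to let the listener hold only a
weak reference to the socket state. This is an illustration of the lifetime
issue, not part of the proposal; SocketState and MakeListener are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical socket state that listeners refer to.
struct SocketState
{
  std::size_t txUsed = 0;
};

// Builds a listener which holds only a weak reference to the socket.
// The listener returns true if the socket was still alive.
std::function<bool (std::size_t)>
MakeListener (const std::shared_ptr<SocketState> &socket)
{
  std::weak_ptr<SocketState> weak = socket;
  return [weak] (std::size_t size) {
    if (std::shared_ptr<SocketState> alive = weak.lock ())
      {
        alive->txUsed -= size;
        return true;
      }
    // The socket was closed and destroyed: ignore the notification.
    return false;
  };
}
```

The cost is that the socket state has to be owned through a shared pointer,
and that every notification pays for the lock () call.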
Other options which suck:
a. introduce Packet::SetDestroyListener and make Packet::~Packet call it
automatically. This sucks because we would have to be very careful in
all our tx paths to destroy the packets just at the right time. It also
sucks because it's going to be really hard to trace all the locations
where the destructor could be called and to make sure it's called
exactly where it needs to be called and nowhere else.
b. use packet tags and store the listener in a tag. This sucks because
tags can't really store pointers (they can but it's hard to get right)
and because that would force us to make sure everyone uses the same tag.
What's the point of using a tag if everyone in all devices needs to know
about it? At this point, it's part of the mandatory API, so it's better
to at least make this very clear by putting it explicitly in the packet
class API.
c. use existing trace sources/sinks in NetDevice. This sucks because, by
definition, traces are not supposed to have any side effect other than
maybe generating some logging.
To summarize, I don't like any of the above solutions very much but I
feel that the first one is at least conceptually simple and reasonably
easy to maintain. Better ideas would be welcome.
Mathieu