[Ns-developers] ns-3-dev performance

mathieu lacage Mathieu.Lacage at sophia.inria.fr
Sat Mar 3 04:47:56 PST 2007


On Sat, 2007-03-03 at 12:33 +0000, Gustavo Carneiro wrote:
>   The other day I did a small experiment with the ns-3-dev branch: I changed
> the serial netdevice example to make it create 10000 pairs of nodes, each
> pair exchanging 500-byte UDP packets every 50 ms, for 30 seconds.  It took
> about 2 hours and 10 minutes to run (on a 1600 MHz Core 2 Duo) and used a
> maximum of 35 MiB of virtual memory.
> 
>   The conclusion I draw from this experiment is that, while ns-3 is heading
> towards amazing memory performance, the processing time could probably be
> improved.  Since the memory consumption is much lower than what we usually
> need, maybe there are speed/memory tradeoffs we can take advantage of?
> 
>   One such tradeoff that occurred to me was that right now "real" packets
> are being generated by serialization of protocol headers.  Maybe if we could

Well, guesses are mostly useless if you want to talk about performance:
you need actual profiling results. Did you profile this simulation
experiment? If so, what does the profile look like?
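For scale, the numbers quoted above work out roughly as follows, assuming each pair sends one packet every 50 ms (the message does not say whether both nodes of a pair send; if they do, the packet count doubles and the per-packet time halves):

```latex
\begin{aligned}
N &= 10\,000 \times \frac{30\ \text{s}}{50\ \text{ms}} = 10\,000 \times 600 = 6 \times 10^{6}\ \text{packets} \\
\frac{T}{N} &= \frac{2\ \text{h}\ 10\ \text{min}}{N} = \frac{7800\ \text{s}}{6 \times 10^{6}} = 1.3\ \text{ms per packet}
\end{aligned}
```

That figure is the whole-simulation cost per packet (event scheduling, queues, the UDP/IP stack and so on), so only a profile can say how much of it is header serialization.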

> have packets be a sequence of headers instead of a sequence of bytes, similar
> to what is done in ns-2, we could reduce the processing time of packets.  The

The short answer is that you might gain something, but my own
experiments show that you won't get more than a 20% CPU improvement
by using memcpy instead of the full-blown
serialization/deserialization code. You can try it yourself by
changing the serialization routines to do a correspondingly-sized
memcpy. It should not be too hard :)
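A minimal sketch of that experiment, assuming a hypothetical 8-byte header (UdpLikeHeader, SerializeFields and SerializeMemcpy are made-up names, not ns-3 API). The "full-blown" variant writes each field in network byte order; the memcpy variant copies the same number of bytes in host order with no per-field work, so it is only a stand-in for timing, not a wire-compatible format.

```cpp
#include <cassert>   // assert() for the round-trip checks in the usage example
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte header, loosely modelled on a UDP header.
struct UdpLikeHeader {
  std::uint16_t srcPort;
  std::uint16_t dstPort;
  std::uint16_t length;
  std::uint16_t checksum;
};
static_assert(sizeof(UdpLikeHeader) == 8, "expected no padding");

// Write one 16-bit field in network (big-endian) byte order.
static std::uint8_t *WriteU16(std::uint8_t *p, std::uint16_t v) {
  p[0] = static_cast<std::uint8_t>(v >> 8);
  p[1] = static_cast<std::uint8_t>(v & 0xff);
  return p + 2;
}

// Read one big-endian 16-bit field.
static std::uint16_t ReadU16(const std::uint8_t *p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// "Full-blown" style: field-by-field serialization with byte-order conversion.
std::size_t SerializeFields(const UdpLikeHeader &h, std::uint8_t *buf) {
  std::uint8_t *p = buf;
  p = WriteU16(p, h.srcPort);
  p = WriteU16(p, h.dstPort);
  p = WriteU16(p, h.length);
  p = WriteU16(p, h.checksum);
  return static_cast<std::size_t>(p - buf);
}

std::size_t DeserializeFields(UdpLikeHeader &h, const std::uint8_t *buf) {
  h.srcPort = ReadU16(buf);
  h.dstPort = ReadU16(buf + 2);
  h.length = ReadU16(buf + 4);
  h.checksum = ReadU16(buf + 6);
  return 8;
}

// Benchmark stand-in: one memcpy of the same size, host byte order.
std::size_t SerializeMemcpy(const UdpLikeHeader &h, std::uint8_t *buf) {
  std::memcpy(buf, &h, sizeof h);
  return sizeof h;
}

std::size_t DeserializeMemcpy(UdpLikeHeader &h, const std::uint8_t *buf) {
  std::memcpy(&h, buf, sizeof h);
  return sizeof h;
}
```

Timing a few million calls to each pair of functions on your own machine gives an upper bound of the kind described above; the actual ratio depends entirely on the compiler, flags and CPU.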

> packet could automatically detect if it is being captured to a pcap trace
> file and serialize the headers only if needed.  What do people think of this
> idea?
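The proposal above could be sketched roughly like this. It is a hypothetical illustration, not ns-3 code: Header, TagHeader and LazyPacket are made-up names, and the 2-byte tag header and zero-filled payload are assumptions chosen to keep it short. Headers are kept as objects, and bytes are produced (and cached) only when something such as a pcap writer calls ToBytes().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical header interface: each header knows how to append its own bytes.
class Header {
 public:
  virtual ~Header() = default;
  virtual void AppendBytes(std::vector<std::uint8_t> &out) const = 0;
};

// Hypothetical 2-byte header, for illustration only.
class TagHeader : public Header {
 public:
  explicit TagHeader(std::uint16_t tag) : tag_(tag) {}
  std::uint16_t tag() const { return tag_; }
  void AppendBytes(std::vector<std::uint8_t> &out) const override {
    out.push_back(static_cast<std::uint8_t>(tag_ >> 8));
    out.push_back(static_cast<std::uint8_t>(tag_ & 0xff));
  }

 private:
  std::uint16_t tag_;
};

// Packet as a list of header objects plus a payload size.
class LazyPacket {
 public:
  explicit LazyPacket(std::size_t payloadSize) : payloadSize_(payloadSize) {}

  // The newest header becomes the outermost one, as in a protocol stack.
  void AddHeader(std::shared_ptr<const Header> h) {
    headers_.insert(headers_.begin(), std::move(h));
    ++generation_;  // invalidate any cached bytes
  }

  std::shared_ptr<const Header> RemoveHeader() {
    assert(!headers_.empty());
    std::shared_ptr<const Header> h = headers_.front();
    headers_.erase(headers_.begin());
    ++generation_;
    return h;
  }

  // Serialize on demand; reuse the cached bytes until the header list changes.
  const std::vector<std::uint8_t> &ToBytes() const {
    if (cachedGeneration_ != generation_) {
      cache_.clear();
      for (const auto &h : headers_) h->AppendBytes(cache_);
      cache_.resize(cache_.size() + payloadSize_, 0);  // zero-filled payload
      cachedGeneration_ = generation_;
      ++serializations_;
    }
    return cache_;
  }

  int serializations() const { return serializations_; }

 private:
  std::vector<std::shared_ptr<const Header>> headers_;
  std::size_t payloadSize_;
  std::uint64_t generation_ = 1;
  mutable std::uint64_t cachedGeneration_ = 0;
  mutable std::vector<std::uint8_t> cache_;
  mutable int serializations_ = 0;
};
```

Note that this structure has no natural way to cut a packet at an arbitrary byte offset, which is exactly what fragmentation needs.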

I think that the potential performance improvement (20% on raw packet
serialization/deserialization according to my tests, but other independent
tests would be welcome) is not worth the increase in complexity. Another
clear use case for real bytes is correctly supporting complex
fragmentation/reassembly scenarios, which can become quite complicated
if you do not manipulate the real packet.
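To illustrate why real bytes keep fragmentation simple, here is a minimal sketch on a plain byte buffer. Fragment, FragmentBytes and ReassembleBytes are hypothetical names, not ns-3 API, and real IP fragmentation has more rules (8-byte offset units, per-fragment headers) that are left out here. Because the packet is just bytes, it can be cut at any offset, even in the middle of a header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical fragment: one byte range of the original packet.
struct Fragment {
  std::size_t offset;               // position of this chunk in the original packet
  std::vector<std::uint8_t> bytes;  // the chunk itself
  bool moreFragments;               // false only on the last fragment
};

// Split a serialized packet into chunks of at most maxChunk bytes.
std::vector<Fragment> FragmentBytes(const std::vector<std::uint8_t> &pkt,
                                    std::size_t maxChunk) {
  std::vector<Fragment> out;
  for (std::size_t off = 0; off < pkt.size(); off += maxChunk) {
    std::size_t end = std::min(pkt.size(), off + maxChunk);
    auto first = pkt.begin() + static_cast<std::ptrdiff_t>(off);
    auto last = pkt.begin() + static_cast<std::ptrdiff_t>(end);
    out.push_back({off, std::vector<std::uint8_t>(first, last), end < pkt.size()});
  }
  return out;
}

// Reassemble fragments that may arrive out of order. Returns false if a
// fragment is missing, overlapping, or the last fragment never arrived.
bool ReassembleBytes(std::vector<Fragment> frags, std::vector<std::uint8_t> &pkt) {
  std::sort(frags.begin(), frags.end(),
            [](const Fragment &a, const Fragment &b) { return a.offset < b.offset; });
  pkt.clear();
  for (const Fragment &f : frags) {
    if (f.offset != pkt.size()) return false;  // gap or overlap
    pkt.insert(pkt.end(), f.bytes.begin(), f.bytes.end());
  }
  return !frags.empty() && !frags.back().moreFragments;
}
```

With a packet made of header objects, the same split would first have to decide what happens to a header object that straddles a fragment boundary.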

All in all, I think that this design is about tradeoffs. Given the list
of requirements, you won't be able to solve every problem perfectly at
once, so you have to make tradeoffs. I believe that giving up 20% CPU (in
the theoretical case of a simulation that does _only_ packet
serialization/deserialization, which will never happen in practice) is a
decent tradeoff.

But I would be interested in seeing your benchmark code or real
profiling results (I suggest oprofile for the profiling part). Real data
is always interesting.

Mathieu


