[Ns-developers] A Reason for Slowness of NS3

Mathieu Lacage mathieu.lacage at sophia.inria.fr
Fri Feb 13 21:57:28 PST 2009


On Fri, 2009-02-13 at 14:58 -0500, Adrian Sai-Wah TAM wrote:
> Hi,
> 
> Recently I did some profiling of NS3. I compiled everything with
> the optimized build and measured the time on my laptop running
> Debian. The results are below.

cool.

> 
> $ time LD_LIBRARY_PATH=~/ns-3-dev/build/optimized ./csma-star --SchedulerType=ns3::MapScheduler
> real    0m2.805s
> user    0m2.692s
> sys     0m0.028s
> $ time LD_LIBRARY_PATH=~/ns-3-dev/build/optimized ./csma-star --SchedulerType=ns3::Ns2CalendarScheduler
> real    0m2.863s
> user    0m2.740s
> sys     0m0.040s

It seems consistent with every other benchmark result I have seen.

> That does not look bad. However, when I compiled everything into a single
> binary (i.e., not using libns3.so) with the "-pg -ggdb" options and ran it
> through gprof, I found the following:
> 
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  1.81      1.60     0.12 14590560     0.00     0.00  ns3::TypeId::TypeId(ns3::TypeId const&)
>  1.66      1.83     0.11 15729313     0.00     0.00  ns3::TypeId::~TypeId()

You might know this already but, just to make sure no one gets infected
by this strange belief, the time shown here is wholly irrelevant:
logic-heavy C++ programs such as ns-3, with lots of small functions and
templates, should be profiled with statistical profilers such as oprofile
or sysprof, not with profilers such as gprof (unless, of course, you use
gprof only as a tool to measure the number of function calls).

> What caught my eye is the number of calls. I studied the code and

Indeed, this is a relevant measure.

> found that TypeId is used in many places, and whenever a function
> returns a TypeId or takes a TypeId as a *read only* parameter, it is
> passed by value, as follows:
> 
> TypeId GetTypeId() const;
> bool function(TypeId in);
> TypeId tid = GetTypeId();
> 
> I spent a day changing all such functions into:
> 
> const TypeId& GetTypeId() const;
> bool function(const TypeId& in);
> const TypeId& tid = GetTypeId();
> 
> That is, I tried to avoid invoking the copy constructor as much as
> possible by using references. After that, the regression tests still
> pass and the new running times are as follows:
> 
> $ time LD_LIBRARY_PATH=~/ns-3-dev/build/optimized ./csma-star --SchedulerType=ns3::MapScheduler
> real    0m2.788s
> user    0m2.660s
> sys     0m0.052s
> $ time LD_LIBRARY_PATH=~/ns-3-dev/build/optimized ./csma-star --SchedulerType=ns3::Ns2CalendarScheduler
> real    0m2.858s
> user    0m2.748s
> sys     0m0.024s
> 
> Not a huge improvement, but observable. And this is what I see in the gprof output:
> 
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  0.49      3.32     0.03  2503869     0.00     0.00  ns3::TypeId::~TypeId()
>  0.33      3.85     0.02  1365059     0.00     0.00  ns3::TypeId::TypeId(ns3::TypeId const&)
> 
> which is roughly an order of magnitude fewer calls.
> 
> This suggests that the performance of NS3 can be improved by avoiding
> copy constructors wherever a reference can be used instead. TypeId is
> not the only case. I can provide a diff of my changes, to see whether
> other people agree this is necessary to make the code nicer.

Before I comment on this specific change, let me point out one
thing which I think is very important: you have been using an optimized
PIC shared library build. As I explained before on this mailing-list,
this results in about a 40% wall-clock pessimization on every benchmark
due to the use of an extra indirection jump for almost every non-static
function call. If you can manage to build a static non-pic version of
ns-3, you will see most likely a _very_ different performance profile
with _very_ different functions on top. 

Now, what I am trying to get at is that spending time figuring out
how we can enable static builds as widely as possible, or possibly
using techniques such as the one which came up the last time we talked
about it on this mailing-list (see
http://mailman.isi.edu/pipermail/ns-developers/2009-January/005182.html
and
http://mailman.isi.edu/pipermail/ns-developers/2009-January/005186.html),
would be a much more useful contribution than trying to shave off less
than 1% with micro-optimizations.

Anyway, specific comments below:

1) this kind of less than 1% performance improvement is within the noise
of measurement (could you show the average relative performance
improvement, together with its variance, over 20 runs, as proof that
you are not just measuring noise?)

2) returning a _reference_ from a function? Holy crap, never, thank
you.

3) transforming by-value input arguments into const-reference input
arguments? Maybe.
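For point 3), here is a minimal C++ sketch (not ns-3 code) of why the by-value signature costs one copy-constructor call per call while the const-reference signature costs none. `CountedId` is a hypothetical stand-in for `ns3::TypeId` that only counts its copies, and `IsEvenByValue`, `IsEvenByRef`, `CopiesFor` and `CheckCopyCounts` are illustrative names, not ns-3 APIs. It demonstrates call counts only, which is what gprof measured above, not how much time those copies actually cost.

```cpp
#include <cassert>

// Hypothetical stand-in for ns3::TypeId: a small value type whose copy
// constructor increments a static counter so that copies can be observed.
class CountedId
{
public:
  explicit CountedId (unsigned id) : m_id (id) {}
  CountedId (const CountedId &o) : m_id (o.m_id) { ++s_copies; }
  unsigned GetId (void) const { return m_id; }
  static unsigned long s_copies;  // number of copy constructions so far
private:
  unsigned m_id;
};
unsigned long CountedId::s_copies = 0;

// By-value parameter: the argument is copy-constructed on every call.
bool IsEvenByValue (CountedId in) { return in.GetId () % 2 == 0; }

// Const-reference parameter: binds to the caller's object, no copy made.
bool IsEvenByRef (const CountedId &in) { return in.GetId () % 2 == 0; }

// Calls f n times on the same object and returns how many copies were made.
template <typename F>
unsigned long CopiesFor (F f, unsigned n)
{
  CountedId id (42);
  CountedId::s_copies = 0;
  for (unsigned i = 0; i < n; ++i)
    {
      (void) f (id);
    }
  return CountedId::s_copies;
}

// Sanity check: by-value copies once per call, const-reference never does.
void CheckCopyCounts (void)
{
  assert (CopiesFor (IsEvenByValue, 1000) == 1000);
  assert (CopiesFor (IsEvenByRef, 1000) == 0);
}
```

`CopiesFor (IsEvenByValue, 1000)` returns 1000, whereas `CopiesFor (IsEvenByRef, 1000)` returns 0. Whether removing those copies matters for wall-clock time is exactly the question point 1) asks to be settled by measurement.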

To summarize, I would be fine with a patch which did 3). 2) is not going
to happen. 1) would be great. But, really, trying to figure out how to
solve the static non-PIC build problem would probably be a much better
way to save CO2 emissions (or get more stuff produced with the same
amount of emission).
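Point 1) asks for the average relative improvement and its variance over 20 runs. Below is a minimal C++ sketch of that computation. It assumes the "before" and "after" timings (e.g. the `real` times reported by `time`) are paired run by run, for instance by running the unpatched and patched binaries alternately; `Summary`, `Summarize` and `RelativeImprovements` are hypothetical helper names, not part of ns-3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean and sample variance (n - 1 denominator) of a set of values.
struct Summary
{
  double mean;
  double variance;
};

Summary Summarize (const std::vector<double> &xs)
{
  assert (xs.size () >= 2);  // a sample variance needs at least two runs
  double sum = 0.0;
  for (double x : xs)
    {
      sum += x;
    }
  double mean = sum / xs.size ();
  double squares = 0.0;
  for (double x : xs)
    {
      squares += (x - mean) * (x - mean);
    }
  return Summary {mean, squares / (xs.size () - 1)};
}

// Per-run relative improvement (before - after) / before for paired runs;
// a positive value means the patched binary was faster in that pair.
std::vector<double> RelativeImprovements (const std::vector<double> &before,
                                          const std::vector<double> &after)
{
  assert (before.size () == after.size ());
  std::vector<double> r;
  r.reserve (before.size ());
  for (std::size_t i = 0; i < before.size (); ++i)
    {
      r.push_back ((before[i] - after[i]) / before[i]);
    }
  return r;
}
```

Calling `Summarize (RelativeImprovements (before, after))` on 20 pairs yields the requested mean and variance. As a rough rule of thumb, if the mean is not clearly larger than its standard error, sqrt(variance / 20), the improvement is hard to distinguish from noise.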

thanks for looking into this issue,

regards,
Mathieu


