[Ns-developers] A Reason for Slowness of NS3
Mathieu Lacage
mathieu.lacage at sophia.inria.fr
Sun Feb 15 00:47:33 PST 2009
On Sat, 2009-02-14 at 16:41 -0500, Adrian Sai-Wah TAM wrote:
> > Before I get to commenting on this specific change, let me point out one
> > thing which I think is very important: you have been using an optimized
> > PIC shared library build. As I explained before on this mailing-list,
> > this results in about a 40% wall-clock pessimization on every benchmark
> > due to the use of an extra indirection jump for almost every non-static
> > function call. If you can manage to build a static non-pic version of
> > ns-3, you will most likely see a _very_ different performance profile
> > with _very_ different functions on top.
> Yes, I know. Actually I have a handful of ways to optimize NS3 a
> little bit. Of course, I am looking for the possibility to boost the
> performance by 50% or more, but what I posted about the >10M calls to
> the copy constructor is just something that sounded strange to me at
> first glance.
I understand this.
> > Now, what I am trying to get to is that spending time on trying to
> > figure out how we can enable static builds as widely as possible, or
> > possibly instead use techniques such as the one which came up last time
> > we talked about it on this mailing-list (see
> > http://mailman.isi.edu/pipermail/ns-developers/2009-January/005182.html
> > and
> > http://mailman.isi.edu/pipermail/ns-developers/2009-January/005186.html)
> > would be a much more useful contribution than trying to shave off less
> > than 1% with micro-optimizations.
> Agreed. But why can't we just, say, add -d static to waf configure
> alongside -d debug and -d optimized? This sounds natural and
> consistent to me.
Yes, someone just needs to make it work and patches to do so are
pre-approved. Once this is done, if you feel concerned about the kind of
micro-optimization you are looking at, and if you can show measurable
and repeatable improvements without uglifying the code too much, then,
yes, I am all for these changes. But trying to do this now, _before_
dealing with the lower-hanging fruit, makes little sense because the
non-pic change will most likely greatly reduce the overhead of very
small function calls.
> > 1) this kind of less than 1% performance improvement is within the noise
> > of measurement (could you show the average relative perf improvement
> > together with the variance for 20 runs to show proof that you are not
> > playing with noise?)
> I didn't do this and I agree with you, a 1% improvement is not a big
> deal. But what I want to raise is that many other copy constructors
> besides TypeId's are called where a reference could do the job.
> Since allocating and releasing memory isn't cheap, why don't we scan
> the code once to see if we can change something to make it more
> economical?
Because the kind of changes you propose are not always a win on every
hardware architecture and compiler, and they don't really improve the
readability of the code.
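A minimal sketch of the kind of summary asked for in point 1 above: the
mean and sample variance of the wall-clock times for each build, and the
relative improvement between them. RunStats, Summarize,
RelativeImprovement and LooksSignificant are hypothetical names (not part
of ns-3), and the "two standard errors" threshold is only a rough rule of
thumb assumed here for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ns-3: summary of a set of timings.
struct RunStats
{
  double mean;     // average wall-clock time, in seconds
  double variance; // unbiased sample variance, in seconds^2
};

// Compute mean and sample variance of the measured run times.
RunStats Summarize (const std::vector<double> &seconds)
{
  assert (seconds.size () >= 2); // variance needs at least two runs
  double sum = 0.0;
  for (std::size_t i = 0; i < seconds.size (); ++i)
    {
      sum += seconds[i];
    }
  const double mean = sum / seconds.size ();
  double squares = 0.0;
  for (std::size_t i = 0; i < seconds.size (); ++i)
    {
      squares += (seconds[i] - mean) * (seconds[i] - mean);
    }
  RunStats stats = { mean, squares / (seconds.size () - 1) };
  return stats;
}

// Relative improvement of the patched build, e.g. 0.01 for 1%.
double RelativeImprovement (const RunStats &baseline, const RunStats &patched)
{
  return (baseline.mean - patched.mean) / baseline.mean;
}

// Crude noise check (an assumption, not a rigorous test): is the
// difference of the means larger than two standard errors, given
// 'runs' measurements for each build?
bool LooksSignificant (const RunStats &baseline, const RunStats &patched,
                       std::size_t runs)
{
  const double stdErr = std::sqrt ((baseline.variance + patched.variance) / runs);
  return std::fabs (baseline.mean - patched.mean) > 2.0 * stdErr;
}
```

With 20 runs per build and a variance of about 1 s^2, a 1% change on a
10 s benchmark (0.1 s) is well below the threshold of roughly 0.63 s, so
it could not be told apart from noise.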
> > 2) returning a _reference_ from a function? Holy crap, never, thank
> > you.
> True, it sounds weird. But look at this:
> static TypeId Class::GetTypeId()
> {
> static TypeId tid = ....;
> return tid;
> }
> Shouldn't this frequently-called thing be something read-only? If so,
> I think returning a const reference is good. Of course, just returning
> a reference is dangerous, but making it const should be fine. (I believe)
It would be correct but, as I said, this kind of change does not improve
the readability of the code and that is my main concern: a less than 1%
improvement is not sufficient to justify such a decrease in code
readability when the goal is to try to keep this code running 10 years
from now, when those who wrote the code in the first place are dead or
gone, and when the hardware architectures we use are very different from
now.
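For reference, the two variants under discussion look roughly like the
sketch below. FakeTypeId is a hypothetical stand-in for TypeId (the real
class is not reproduced here), and the copy counter exists only to make
the difference visible; it says nothing about how much wall-clock time
either variant costs on a given compiler:

```cpp
#include <cassert>

// Hypothetical stand-in for TypeId, instrumented to count copies.
class FakeTypeId
{
public:
  explicit FakeTypeId (unsigned uid) : m_uid (uid) {}
  FakeTypeId (const FakeTypeId &o) : m_uid (o.m_uid) { ++s_copies; }
  unsigned GetUid () const { return m_uid; }
  static unsigned long s_copies; // number of copy-constructor calls
private:
  unsigned m_uid;
};
unsigned long FakeTypeId::s_copies = 0;

// Current style: the function-local static is copied on every call.
struct ByValue
{
  static FakeTypeId GetTypeId ()
  {
    static FakeTypeId tid (1);
    return tid; // copy: 'tid' is not a local automatic object
  }
};

// Proposed style: callers get read-only access to the same object.
struct ByConstRef
{
  static const FakeTypeId &GetTypeId ()
  {
    static FakeTypeId tid (2);
    return tid; // no copy; safe because 'tid' outlives the call
  }
};
```

Calling ByValue::GetTypeId ().GetUid () runs the copy constructor once
per call, while the ByConstRef version never does. The reference is only
safe here because tid has static storage duration; the same signature on
a function returning a local automatic object would dangle.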
Let's say that you could get a 10% wall-clock time improvement with very
many similar modifications which sprinkle our code with returned const
refs everywhere (I don't believe you could achieve this, but feel free
to prove me wrong :). Even that would be a very hard sell from a
maintenance point of view. I would bet whatever you want that 10 years
from now, compilers and hardware will have changed so much that this
kind of technique will be useless or, worse, actively harmful, and the
new kids graduating won't understand why we were so short-sighted as to
take such short-term gains over long-term simplicity. (In 1 or 2 years,
LTO will be available for gcc, and that new feature on its own will
change everything we take for granted about the optimization of C++
programs.)
Now, to get back to the real issue, I would try to investigate why
GetTypeId is called so often instead of blindly modifying the code. In
this case, although I did not profile the code, I would bet that this
function is called from Object::GetObject, and that GetObject is called
from the ipv4 stack, which does not cache its output.
Object::GetObject is a linear search within the list of aggregated
objects. As such, using it from a performance-sensitive code path does
not make much sense: I would bet that if you bothered with caching
the output of GetObject from within the ipv4 stack, you would get much
better micro-benchmark results than what you have now. Of course, that
would mean increasing your memory footprint to cache the pointer values
so, again, is it going to be worth the change? Only the ipv4
maintainers can say so :)
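The caching idea can be sketched with a toy model. None of the classes
below are ns-3 code: ToyNode, ToyIpv4 and friends are hypothetical, the
lookup uses dynamic_cast as a simplification, and the step counter is
only there to show how many list entries each variant visits:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyObject { virtual ~ToyObject () {} };
struct ToyArp : ToyObject {};
struct ToyRouting : ToyObject
{
  ToyRouting () : routed (0) {}
  void Route () { ++routed; }
  unsigned long routed;
};

// Toy aggregate: GetObject is a linear search, as described above.
class ToyNode
{
public:
  ToyNode () : searchSteps (0) {}
  void Aggregate (ToyObject *o) { m_aggregates.push_back (o); }
  template <typename T> T *GetObject ()
  {
    for (std::size_t i = 0; i < m_aggregates.size (); ++i)
      {
        ++searchSteps; // one list entry visited
        if (T *found = dynamic_cast<T *> (m_aggregates[i]))
          {
            return found;
          }
      }
    return 0;
  }
  unsigned long searchSteps;
private:
  std::vector<ToyObject *> m_aggregates;
};

class ToyIpv4
{
public:
  explicit ToyIpv4 (ToyNode *node) : m_node (node), m_routing (0) {}
  // Searches the aggregate list on every packet.
  void SendUncached ()
  {
    ToyRouting *routing = m_node->GetObject<ToyRouting> ();
    assert (routing != 0);
    routing->Route ();
  }
  // Searches once, then reuses the pointer (one extra pointer of memory).
  void SendCached ()
  {
    if (m_routing == 0)
      {
        m_routing = m_node->GetObject<ToyRouting> ();
        assert (m_routing != 0);
      }
    m_routing->Route ();
  }
private:
  ToyNode *m_node;
  ToyRouting *m_routing; // cached lookup result
};
```

The cached variant assumes the aggregated object is not replaced or
destroyed while the pointer is held, which is part of what the
maintainers would have to weigh.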
regards,
Mathieu