[Ns-developers] merging multithreaded simulation core incrementally

Mathieu Lacage mathieu.lacage at sophia.inria.fr
Wed Jul 15 02:55:42 PDT 2009


On Wed, 2009-07-15 at 10:39 +0100, Gustavo Carneiro wrote:

>         
>         3) make simulator.h threadsafe: this requires the addition of
>         assembly atomic operations (stolen from glib) to implement
>         RefCountBaseThreadSafe and use it from EventImpl, as well as
>         the use of a mutex lock in DefaultSimulatorImpl, similarly to
>         RealtimeSimulatorImpl. A nice side-effect of this new feature
>         is that it would allow getting rid of some ugly code in the
>         emu and tap-bridge devices. The performance cost of that
>         change is around 10% in micro-benchmarks for Scheduling
>         operations, most of which is coming from the mutex lock in
>         DefaultSimulatorImpl. I have not measured any impact on
>         non-micro benchmarks.
> 
> 10% in micro-benchmarks is OK; it will probably mean less than 1% in
> real world benchmarks.

I am not sure: some of my tests still show a cost on the order of 5% on
some non-micro benchmarks. I still need to finish running them.
> 
> However, we should point out that the penalty will be huge for non-GCC
> or non-x86 builds, thereby greatly reducing the portability of NS-3. 

The glib atomics are portable across many systems (alpha, arm, ppc,
etc.) so I see no reason why we could not do at least as well. Of
course, if users who don't use gcc wish to contribute patches to make
ns-3 run faster without gcc, I see no reason not to take their patches.
I wonder if icc supports gcc asm syntax. I should point out, though,
that none of our explicitly supported platforms includes a non-gcc
compiler.
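
To make the atomic refcounting idea concrete, here is a minimal sketch of
what a thread-safe refcount base could look like. It is not the actual
patch: ToyRefCountTs and DemoConcurrentRefs are hypothetical names, and
std::atomic is used as a portable stand-in for the glib-derived assembly
atomics mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the proposed RefCountBaseThreadSafe.
// std::atomic replaces the hand-written assembly atomics (assumption).
class ToyRefCountTs
{
public:
  ToyRefCountTs () : m_count (1) {}
  ToyRefCountTs (const ToyRefCountTs &) = delete;
  ToyRefCountTs &operator= (const ToyRefCountTs &) = delete;
  virtual ~ToyRefCountTs () {}

  void Ref () const
  {
    // Taking a reference needs no ordering with other memory accesses.
    m_count.fetch_add (1, std::memory_order_relaxed);
  }

  void Unref () const
  {
    // acq_rel: the thread that drops the last reference must observe
    // every write made by the other owners before deleting the object.
    uint32_t previous = m_count.fetch_sub (1, std::memory_order_acq_rel);
    assert (previous > 0);
    if (previous == 1)
      {
        delete this;
      }
  }

  uint32_t GetReferenceCount () const
  {
    return m_count.load (std::memory_order_acquire);
  }

private:
  mutable std::atomic<uint32_t> m_count;
};

// Usage: several threads take and drop references concurrently.
// Returns the count observed once all threads have finished.
uint32_t DemoConcurrentRefs (unsigned threads, unsigned iterations)
{
  ToyRefCountTs *object = new ToyRefCountTs ();
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < threads; ++t)
    {
      workers.emplace_back ([object, iterations] () {
        for (unsigned i = 0; i < iterations; ++i)
          {
            object->Ref ();
            object->Unref ();
          }
      });
    }
  for (std::thread &worker : workers)
    {
      worker.join ();
    }
  uint32_t remaining = object->GetReferenceCount ();
  object->Unref (); // drops the last reference and deletes the object
  return remaining;
}
```

With a plain integer counter, the concurrent increments and decrements in
this demo could be lost; with an atomic counter every Ref is matched by
its Unref and the creator's single reference is all that remains.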

>  Should we be sacrificing portability of NS-3 for the sake of parallel
> execution?

Certainly not.
> 
> Or, put another way, how much do you think parallel execution will
> really gain us, in terms of speed, and at what cost in terms of
> complexity?  Personally, I already have been doing parallel
> simulations in multicore systems for ages, it is called multiple
> processes.  The only downside of multiple processes is that you
> require more RAM to run multiple simulations in parallel than you
> would need to run a single simulation faster.

Yes, but the two are different use cases.

> Isn't it possible to just create an alternative scheduler which is
> thread safe and leave the default scheduler alone, just like the
> realtime scheduler is thread safe but does not interfere with the
> simple default one?

I think that it would make sense to make simulator.h threadsafe,
independently from whether or not we get parallel multithreaded
simulations.
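
As a rough illustration of the mutex approach described in (3), the
sketch below guards an event list with a single lock so that Schedule can
be called from any thread. ToySimulator and DemoThreadedSchedule are
hypothetical names; this is not the DefaultSimulatorImpl code, only a
simplified model of the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy simulator (not ns-3 code): one mutex guards the
// event list so that Schedule may be called from any thread.
class ToySimulator
{
public:
  void Schedule (uint64_t time, std::function<void ()> event)
  {
    assert (event); // an empty std::function would throw when invoked
    std::lock_guard<std::mutex> lock (m_mutex);
    m_events.emplace (time, std::move (event));
  }

  // Runs events in timestamp order; returns how many were executed.
  uint64_t Run ()
  {
    uint64_t executed = 0;
    for (;;)
      {
        std::function<void ()> event;
        {
          std::lock_guard<std::mutex> lock (m_mutex);
          if (m_events.empty ())
            {
              return executed;
            }
          auto first = m_events.begin ();
          event = std::move (first->second);
          m_events.erase (first);
        }
        // The lock is released before invoking the event, so an event
        // may itself call Schedule without deadlocking.
        event ();
        ++executed;
      }
  }

private:
  std::mutex m_mutex;
  std::multimap<uint64_t, std::function<void ()>> m_events;
};

// Usage: several threads schedule events, then one thread runs them.
uint64_t DemoThreadedSchedule (unsigned threads, unsigned eventsPerThread)
{
  ToySimulator simulator;
  std::vector<std::thread> producers;
  for (unsigned t = 0; t < threads; ++t)
    {
      producers.emplace_back ([&simulator, t, eventsPerThread] () {
        for (unsigned i = 0; i < eventsPerThread; ++i)
          {
            simulator.Schedule (t * 1000 + i, [] () {});
          }
      });
    }
  for (std::thread &producer : producers)
    {
      producer.join ();
    }
  return simulator.Run ();
}
```

Every Schedule call pays for one lock and unlock even when only a single
thread is running, which is the kind of overhead the micro-benchmark
figure above refers to.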

>         
>         4) various changes to the simulator.h API to make it nicer to
>         the non-default SimulatorImpl subclasses:
>          - make Simulator::SetScheduler take an ObjectFactory rather
>            than a Ptr<Scheduler>
>          - kill Simulator::RunOneEvent, IsFinished, and Next
>          - introduce Simulator::SetMonitoringCallback(Callback<void> cb);
>         
>         5) changes to various APIs in src/core to make them threadsafe
>         or make it easier to do threadsafe things. I have not yet
>         investigated the cpu runtime cost of these features (they have
>         no memory cost). This is really about:
>          - use RefCountBase where users use their own refcounting
>            implementation.
>          - replace RefCountBase by RefCountBaseTs where needed
>          - make Object::Ref/Unref threadsafe.
>          - make AsciiWriter threadsafe
>          - remove Packet::PeekData and use Packet::CopyData instead
>            where needed. This will make it easier to use a non-COW
>            implementation of Packet if needed in the multithreaded
>            case.
> 
> Again, if you add thread safety to all these classes you are
> sacrificing simplicity and performance of non-threaded simulations for
> the sake of parallel scheduler.  I do not know whether there is an
> alternative, but we should think carefully before going down this
> path.

I believe that this comment does not apply to (4) above. Some items in
(5) are just nice cleanups (make sure we don't re-implement refcounting
in too many places). Others are more intrusive and I agree that we need
to be careful. 

Things I think are ok:
 - use RefCountBase where users use their own refcounting implementation
(this requires some changes to the RefCountBase API to avoid perf
regressions but I have this locally so, no big deal).
 - remove Packet::PeekData and use Packet::CopyData instead where
needed. I think that this is a nice way to improve encapsulation of the
Packet class without any perf cost (all current users of PeekData
perform a copy in one way or another. Sometimes, it's well hidden, but
the copy is there so I think that it's nice to make it explicit).
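
The encapsulation point can be shown with a toy class. This is only a
sketch: ToyPacket and DemoCopyIsIndependent are hypothetical and the
CopyData signature here is an assumption for illustration, not a
statement about the real Packet API. The caller receives its own copy of
the bytes rather than a pointer into the packet's internal buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy packet (not the ns-3 Packet class). It never hands
// out a pointer to its internal storage; callers must copy bytes out.
class ToyPacket
{
public:
  ToyPacket (const uint8_t *data, uint32_t size)
    : m_bytes (data, data + size)
  {
  }

  uint32_t GetSize () const
  {
    return static_cast<uint32_t> (m_bytes.size ());
  }

  // Copies at most 'size' bytes into 'buffer'; returns the number copied.
  uint32_t CopyData (uint8_t *buffer, uint32_t size) const
  {
    assert (buffer != nullptr || size == 0);
    uint32_t count = std::min (size, GetSize ());
    std::copy (m_bytes.begin (), m_bytes.begin () + count, buffer);
    return count;
  }

private:
  std::vector<uint8_t> m_bytes; // the storage layout stays private
};

// Usage: writing to the caller's copy does not affect the packet.
bool DemoCopyIsIndependent ()
{
  const uint8_t payload[] = {1, 2, 3, 4};
  ToyPacket packet (payload, sizeof (payload));

  uint8_t first[8] = {0};
  uint32_t copied = packet.CopyData (first, sizeof (first));
  first[0] = 99; // modifies only the caller's buffer

  uint8_t second[4] = {0};
  packet.CopyData (second, sizeof (second));
  return copied == 4 && second[0] == 1;
}
```

Because no caller holds a pointer into the packet, the internal storage
could later be swapped (for example, to a non-COW buffer) without
touching user code.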

The others are much less useful without the multithreaded simulation
core, so I would refrain from merging them before it lands. I guess I
mentioned them here merely to give a taste of what needs to be done for
the final merge. I will take back:
 - replace RefCountBase by RefCountBaseTs where needed
 - make Object::Ref/Unref threadsafe.
 - make AsciiWriter threadsafe
>         
Mathieu



