[Ns-developers] merging multithreaded simulation core incrementally
Mathieu Lacage
mathieu.lacage at sophia.inria.fr
Wed Jul 15 02:55:42 PDT 2009
On Wed, 2009-07-15 at 10:39 +0100, Gustavo Carneiro wrote:
>
> 3) make simulator.h threadsafe: this requires the addition of assembly
> atomic operations (stolen from glib) to implement RefCountBaseThreadSafe
> and use it from EventImpl, as well as the use of a mutex lock in
> DefaultSimulatorImpl, similarly to RealtimeSimulatorImpl. A nice
> side-effect of this new feature is that it would allow getting rid of
> some ugly code in the emu and tap-bridge devices. The performance cost
> of that change is around 10% in micro-benchmarks for scheduling
> operations, most of which is coming from the mutex lock in
> DefaultSimulatorImpl. I have not measured any impact on non-micro
> benchmarks.
>
> 10% in micro-benchmarks is OK; it will probably mean less than 1% in
> real world benchmarks.
I am not sure: some of my tests still show a cost on the order of 5% on
some non-micro benchmarks. I still need to finish running them.
>
> However, we should point out that the penalty will be huge for non-GCC
> or non-x86 builds, thereby greatly reducing the portability of NS-3.
The glib atomics are portable across many systems (alpha, arm, ppc,
etc.), so I see no reason why we could not do at least as well. Of
course, if users who don't use gcc wish to contribute patches to make
ns-3 run faster without gcc, I see no reason not to take their patches.
I wonder whether icc supports gcc asm syntax. I should point out, though,
that none of our explicitly supported platforms includes a non-gcc
compiler.
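
For illustration, here is a minimal sketch of what a thread-safe reference count could look like. It is not the RefCountBaseThreadSafe code discussed above: the class name RefCountedTs and the helper StressRefUnref are hypothetical, and the sketch assumes a compiler with std::atomic (C++11) instead of the gcc inline assembly borrowed from glib.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the thread-safe refcount idea; not the ns-3 class.
// Assumption: std::atomic replaces the glib-style assembly atomics.
class RefCountedTs
{
public:
  RefCountedTs () : m_count (1) {}
  virtual ~RefCountedTs () {}

  void Ref () const
  {
    // Taking a reference only needs atomicity, not ordering.
    m_count.fetch_add (1, std::memory_order_relaxed);
  }

  void Unref () const
  {
    // acq_rel makes writes done before the last Unref visible to the deleter.
    if (m_count.fetch_sub (1, std::memory_order_acq_rel) == 1)
      {
        delete this;
      }
  }

  unsigned GetCount () const
  {
    return m_count.load (std::memory_order_acquire);
  }

private:
  mutable std::atomic<unsigned> m_count;
};

// Several threads take and drop references concurrently.
// Returns the count seen afterwards, before the creator's reference is dropped.
unsigned StressRefUnref (unsigned nThreads, unsigned nIterations)
{
  RefCountedTs *obj = new RefCountedTs ();
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < nThreads; ++t)
    {
      threads.emplace_back ([obj, nIterations] () {
        for (unsigned i = 0; i < nIterations; ++i)
          {
            obj->Ref ();
            obj->Unref ();
          }
      });
    }
  for (auto &th : threads)
    {
      th.join ();
    }
  unsigned remaining = obj->GetCount ();
  assert (remaining >= 1);
  obj->Unref (); // drops the creator's reference and deletes obj
  return remaining;
}
```

With a plain (non-atomic) counter the same stress loop could lose updates, which is the failure mode the atomic operations are meant to rule out.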
> Should we be sacrificing portability of NS-3 for the sake of parallel
> execution?
Certainly not.
>
> Or, put another way, how much do you think parallel execution will
> really gain us, in terms of speed, and at what cost in terms of
> complexity? Personally, I already have been doing parallel
> simulations in multicore systems for ages, it is called multiple
> processes. The only downside of multiple processes is that you
> require more RAM to run multiple simulations in parallel than you
> would need to run a single simulation faster.
Yes, but the two are different use cases.
> Isn't it possible to just create an alternative scheduler which is
> thread safe and leave the default scheduler alone, just like the
> realtime scheduler is thread safe but does not interfere with the
> simple default one?
I think that it would make sense to make simulator.h threadsafe,
regardless of whether or not we get parallel multithreaded
simulations.
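
To make the mutex idea concrete, here is a rough sketch of an event list that other threads can schedule into while a single thread runs the events. TinySimulator and CountEventsFromThreads are hypothetical names made up for this example, and the code is not taken from DefaultSimulatorImpl or RealtimeSimulatorImpl; it only shows where the lock sits.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical miniature simulator core; names are illustrative, not ns-3's.
class TinySimulator
{
public:
  TinySimulator () : m_now (0) {}

  // May be called from any thread: the lock protects the event list and m_now.
  void Schedule (uint64_t delay, std::function<void ()> fn)
  {
    std::lock_guard<std::mutex> lock (m_mutex);
    m_events.emplace (m_now + delay, std::move (fn));
  }

  // Called from the single simulation thread.
  void Run ()
  {
    for (;;)
      {
        std::function<void ()> fn;
        {
          std::lock_guard<std::mutex> lock (m_mutex);
          if (m_events.empty ())
            {
              return;
            }
          auto it = m_events.begin ();
          assert (it->first >= m_now); // simulation time never goes backwards
          m_now = it->first;
          fn = std::move (it->second);
          m_events.erase (it);
        }
        fn (); // run with the lock released, so an event may call Schedule
      }
  }

private:
  std::mutex m_mutex;
  std::multimap<uint64_t, std::function<void ()>> m_events;
  uint64_t m_now;
};

// Several producer threads schedule events, then one thread runs them all.
// Returns the number of events that were executed.
int CountEventsFromThreads (int nThreads, int perThread)
{
  TinySimulator sim;
  int executed = 0; // only touched by events, which all run on this thread
  std::vector<std::thread> producers;
  for (int t = 0; t < nThreads; ++t)
    {
      producers.emplace_back ([&sim, &executed, perThread] () {
        for (int i = 0; i < perThread; ++i)
          {
            sim.Schedule (static_cast<uint64_t> (i), [&executed] () { ++executed; });
          }
      });
    }
  for (auto &th : producers)
    {
      th.join ();
    }
  sim.Run ();
  return executed;
}
```

Every Schedule call pays for one lock and unlock even in a single-threaded run, which is the kind of overhead the micro-benchmark numbers above are about.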
>
> 4) various changes to the simulator.h API to make it nicer to the
> non-default SimulatorImpl subclasses:
> - make Simulator::SetScheduler take an ObjectFactory rather than a
>   Ptr<Scheduler>
> - kill Simulator::RunOneEvent, IsFinished, and Next
> - introduce Simulator::SetMonitoringCallback(Callback<void> cb);
>
> 5) changes to various APIs in src/core to make them threadsafe or make
> it easier to do threadsafe things. I have not yet investigated the cpu
> runtime cost of these features (they have no memory cost). This is
> really about:
> - use RefCountBase where users use their own refcounting
>   implementation.
> - replace RefCountBase by RefCountBaseTs where needed
> - make Object::Ref/Unref threadsafe.
> - make AsciiWriter threadsafe
> - remove Packet::PeekData and use Packet::CopyData instead where
>   needed. This will make it easier to use a non-COW implementation of
>   Packet if needed in the multithreaded case.
>
> Again, if you add thread safety to all these classes you are
> sacrificing simplicity and performance of non-threaded simulations for
> the sake of a parallel scheduler. I do not know whether there is an
> alternative, but we should think carefully before going down this
> path.
I believe that this comment does not apply to (4) above. Some items in
(5) are just nice cleanups (make sure we don't re-implement refcounting
in too many places). Others are more intrusive and I agree that we need
to be careful.
Things I think are ok:
- use RefCountBase where users use their own refcounting implementation
(this requires some changes to the RefCountBase API to avoid perf
regressions but I have this locally so, no big deal).
- remove Packet::PeekData and use Packet::CopyData instead where
needed. I think that this is a nice way to improve encapsulation of the
Packet class without any perf cost (all current users of PeekData
perform a copy in one way or another. Sometimes, it's well hidden, but
the copy is there so I think that it's nice to make it explicit).
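
As a rough illustration of the copy-out pattern described in the second item, here is a small sketch. MiniPacket and ExtractPayload are hypothetical names invented for this example; this is not the ns-3 Packet class, and the signatures are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet type; not the ns-3 Packet class.
class MiniPacket
{
public:
  MiniPacket (const uint8_t *data, std::size_t size)
    : m_bytes (data, data + size)
  {}

  std::size_t GetSize () const { return m_bytes.size (); }

  // Copy at most 'size' bytes into 'buffer'; returns the number of bytes copied.
  // No pointer into the internal storage ever escapes.
  std::size_t CopyData (uint8_t *buffer, std::size_t size) const
  {
    std::size_t n = std::min (size, m_bytes.size ());
    assert (buffer != nullptr || n == 0);
    std::copy (m_bytes.begin (), m_bytes.begin () + n, buffer);
    return n;
  }

private:
  // The storage could become a list of fragments without touching callers,
  // because callers never see a raw pointer to it.
  std::vector<uint8_t> m_bytes;
};

// Caller-side pattern: size a buffer, then copy the payload out explicitly.
std::vector<uint8_t> ExtractPayload (const MiniPacket &p)
{
  std::vector<uint8_t> out (p.GetSize ());
  p.CopyData (out.data (), out.size ());
  return out;
}
```

A PeekData-style accessor would instead hand out a pointer to contiguous internal bytes, which ties the class to one storage layout; with the copy-out form the copy is simply visible at the call site.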
The others are much less useful without the multithreaded simulation
core, so I would refrain from merging them before then. I guess I
mentioned them here merely to give a taste of what needs to be done for
the final merge. I will take back:
- replace RefCountBase by RefCountBaseTs where needed
- make Object::Ref/Unref threadsafe.
- make AsciiWriter threadsafe
>
Mathieu