[Ns-developers] GSOC - Parallel Simulations | First ideas

Hagen Paul Pfeifer hagen at jauu.net
Thu Mar 27 15:35:15 PDT 2008


* George Riley | 2008-03-27 10:24:17 [-0400]:

> Threads can be a good approach for tightly-coupled environments as you
> mention.  We have to be careful to avoid the need for "interlocking" (mutex
> protection) of state variables however.  Are you imagining a single event
> list shared by all threads, or multiple lists (one per thread)?  How will
> you determine which node objects are "assigned" to each thread?  Or will this
> be dynamic?

Good question! I am currently studying all the available documents on the
parallelization of simulations. At the moment I prefer the one-list-per-thread
model (because this model is the likely candidate for distribution across
machines as well). But this isn't fixed yet!
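A minimal C++ sketch of the one-list-per-thread model, assuming each thread owns its event list outright. The names `Event`, `EventList` and `Schedule` are hypothetical and not part of ns-3; the point is only that a list touched by a single thread needs no locking. Events destined for nodes owned by another thread would still have to be handed over through some cross-thread channel, which is where synchronization comes back in.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical event record: a timestamp plus a sequence number that breaks
// ties deterministically (of two events with equal time, the one scheduled
// first runs first).
struct Event {
  double time;        // simulation time in seconds (assumed unit)
  std::uint64_t seq;  // insertion order
  std::function<void()> action;
};

// Orders the priority queue so that the earliest event is on top.
struct Later {
  bool operator()(const Event& a, const Event& b) const {
    if (a.time != b.time) return a.time > b.time;
    return a.seq > b.seq;
  }
};

// An event list owned by exactly one thread. Since no other thread touches
// it, local scheduling needs no mutex.
class EventList {
 public:
  void Schedule(double time, std::function<void()> action) {
    queue_.push(Event{time, nextSeq_++, std::move(action)});
  }

  bool Empty() const { return queue_.empty(); }

  double NextTime() const {
    assert(!queue_.empty());
    return queue_.top().time;
  }

  // Removes the earliest event, runs it and returns its timestamp.
  double RunNext() {
    assert(!queue_.empty());
    Event ev = queue_.top();
    queue_.pop();
    ev.action();
    return ev.time;
  }

 private:
  std::priority_queue<Event, std::vector<Event>, Later> queue_;
  std::uint64_t nextSeq_ = 0;
};
```

Each worker thread would create one `EventList`, fill it with events for the nodes assigned to it, and drain it in timestamp order.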

> MPI is the clear winner for distributed (across a network of workstations),
> but I am not sure it's best for a multi-threaded approach.  However there is
> an advantage towards a single approach for both (tightly-coupled and
> distributed).

Of course! I consider MPI a technique for distribution across machines
("node spanning"), not for distribution across the CPUs of one machine. Still,
the final algorithm should be able to distribute the workload via some
well-known technique. As I mentioned some time ago, I focus on distribution
within a single machine, but keep in mind that the algorithm should also
scale to large setups!

> I would prefer we don't design something requiring the locking primitives.
> This is much too invasive, in the extreme requiring every state variable
> access to be locked.

Right, that is also my goal! I have been hoping for a week now that this is possible! ;)

> Ideally, someone using ns3 for simple serial simulations can ignore
> completely everything related to distributed sim; since the distributed sims
> will in practice comprise a small fraction of use cases, we don't want this
> to be intrusive.

ACK! (Hence the additional "parallelization layer".) But if the algorithm
works well, we should use all cores! Over the next couple of years,
performance gains will come from parallelization, and NS3 should keep up with
current processor design.

> Getting into the details of cache line behavior is, as you likely know, very
> complex and platform dependent.  If we optimize for one processor, we might
> end up sub-optimal (or even poor) for another.  I would prefer we focus on
> the design and implementation of the distributed simulation independently of
> this. We can of course look into cache optimization at a later time.

ACK, but there are a couple of best practices that apply to all architectures!
A rookie can make a lot of trivial mistakes, and they should be avoided.
To clarify: I don't mean over-optimization or processor-specific hacks, but
rather a well-defined overall algorithm with few locking primitives, used only
where necessary.
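One architecture-independent example of such a best practice is avoiding false sharing: data written by different threads should not sit on the same cache line. The sketch below assumes a 64-byte cache line (common, but not universal) and C++17 for over-aligned allocation inside `std::vector`; `PaddedCounter` and `CountEventsInParallel` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is typical for current x86 and many ARM
// cores, but it is not guaranteed everywhere.
constexpr std::size_t kCacheLine = 64;

// Per-thread counter padded to its own cache line, so two threads updating
// neighbouring counters do not keep invalidating each other's line.
struct alignas(kCacheLine) PaddedCounter {
  std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each worker writes only to its own slot; the slots are summed after
// join(), so no lock is needed.
std::uint64_t CountEventsInParallel(unsigned threads, std::uint64_t perThread) {
  assert(threads > 0 && "need at least one worker");
  std::vector<PaddedCounter> counters(threads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < threads; ++t) {
    workers.emplace_back([&counters, t, perThread] {
      for (std::uint64_t i = 0; i < perThread; ++i) ++counters[t].value;
    });
  }
  for (auto& w : workers) w.join();
  std::uint64_t total = 0;
  for (const auto& c : counters) total += c.value;
  return total;
}
```

The same idea applies to per-thread event lists or statistics: keep each thread's hot data on its own cache line rather than tuning for one specific processor.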

> We need to decide on how to do "time management".  As you should know, any
> conservative distributed simulation needs some way to determine if a given
> event is "safe" to process.  There are a number of approaches to this.  See
> "Fujimoto: Parallel and Distributed Simulation Systems".

My daily subway (underground) literature! ;)
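A conservative scheme in the style of Chandy/Misra/Bryant (one of the approaches Fujimoto describes) can decide safety from per-channel clocks. This is a sketch under that assumption, not a chosen design, and `ConservativeLp` is a hypothetical name. Each incoming channel remembers the timestamp of the last real or null message received on it, and messages on a channel are assumed to arrive in non-decreasing timestamp order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical bookkeeping for one logical process (LP). Because messages on
// each input channel arrive in timestamp order, no future message on that
// channel can carry a timestamp smaller than the channel clock.
class ConservativeLp {
 public:
  explicit ConservativeLp(std::size_t inputChannels)
      : channelClock_(inputChannels, 0.0) {}

  // Record a message (or null message) arriving on channel `ch`.
  void OnMessage(std::size_t ch, double timestamp) {
    assert(ch < channelClock_.size());
    assert(timestamp >= channelClock_[ch]);  // FIFO, non-decreasing
    channelClock_[ch] = timestamp;
  }

  // Lower bound on the timestamp of anything that may still arrive.
  double SafeTime() const {
    if (channelClock_.empty()) return std::numeric_limits<double>::infinity();
    return *std::min_element(channelClock_.begin(), channelClock_.end());
  }

  // A local event is safe if nothing can still arrive before it. Ties at
  // exactly SafeTime() need a deterministic tie-breaking rule; this sketch
  // simply treats them as safe.
  bool IsSafe(double eventTime) const { return eventTime <= SafeTime(); }

 private:
  std::vector<double> channelClock_;
};
```

An LP whose next local event is not safe blocks until a (possibly null) message advances the slowest channel clock.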

> We need some way (perhaps even manual by the ns3 user) to map the topology
> (all nodes/interfaces/links, etc) to individual processors/systems.  How
> this is done has a tremendous effect on overall performance. People have
> looked at various graph partitioning algorithms but there is no "one size
> fits all" solution.  At a minimum we need an ns3-user specified mapping,
> either with an external file, or in the C++ code itself.

Mhh, intruding on the user should be avoided; that's a requirement I set for
myself! But maybe there are situations where such a hint is an advantage.
I am not sure at the moment. To be discussed!
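A sketch of the user-specified mapping read from an external file. The file format (`<nodeId> <rank>` per line, `#` starting a comment) and the function name `ReadNodeMapping` are assumptions made for illustration; the same map could just as well be filled directly in the C++ simulation script.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical mapping file format (not an existing ns-3 format):
//
//   # node  rank
//   0       0
//   1       0
//   2       1
std::map<std::uint32_t, std::uint32_t> ReadNodeMapping(std::istream& in) {
  std::map<std::uint32_t, std::uint32_t> mapping;
  std::string line;
  while (std::getline(in, line)) {
    line = line.substr(0, line.find('#'));  // strip comments
    std::istringstream fields(line);
    std::uint32_t node = 0;
    std::uint32_t rank = 0;
    if (!(fields >> node)) continue;  // blank or comment-only line
    if (!(fields >> rank)) throw std::runtime_error("missing rank: " + line);
    if (!mapping.emplace(node, rank).second) {
      throw std::runtime_error("node mapped twice: " + std::to_string(node));
    }
  }
  return mapping;
}
```

Reading from a `std::istream` keeps the parser usable with an `std::ifstream` in a real script and with an in-memory string in tests.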

> You can safely assume that the only simulation data flowing from one
> processor to another is a packet (except for overhead messages related to
> time synchronization).  We need to pay particular attention to the packet
> serialization (converting a packet class object to a serial stream of
> bytes) and reconstruction at the receiving end.  The packet payload data is
> essentially serialized for free, but the meta-data and packet tags I'm not
> sure about.  This will need to be handled correctly.

Hopefully. But at the moment I am not sure whether other factors come into play ...
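To illustrate what has to be handled beyond the payload, here is a sketch of a byte-level encoding for a packet whose metadata is reduced to a unique id plus a list of opaque tags. `WirePacket`, `Tag` and the byte layout are hypothetical and far simpler than real ns-3 packet metadata; the sketch only shows the length-prefixed, fixed-byte-order style such an encoding needs so the receiver can rebuild the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical wire representation; not the ns-3 Packet API.
struct Tag {
  std::uint32_t typeId;            // tells the receiver how to rebuild the tag
  std::vector<std::uint8_t> data;  // tag-specific serialized bytes
};

struct WirePacket {
  std::uint64_t uid = 0;
  std::vector<std::uint8_t> payload;
  std::vector<Tag> tags;
};

namespace {
// Fixed little-endian encoding, independent of the host byte order.
void PutUint(std::vector<std::uint8_t>& out, std::uint64_t v, int bytes) {
  for (int i = 0; i < bytes; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Length-prefixed byte string.
void PutBytes(std::vector<std::uint8_t>& out, const std::vector<std::uint8_t>& b) {
  PutUint(out, b.size(), 4);
  out.insert(out.end(), b.begin(), b.end());
}

class Reader {
 public:
  explicit Reader(const std::vector<std::uint8_t>& buf) : buf_(buf) {}
  std::uint64_t GetUint(int bytes) {
    Need(static_cast<std::size_t>(bytes));
    std::uint64_t v = 0;
    for (int i = 0; i < bytes; ++i) v |= std::uint64_t{buf_[pos_++]} << (8 * i);
    return v;
  }
  std::vector<std::uint8_t> GetBytes() {
    auto n = static_cast<std::size_t>(GetUint(4));
    Need(n);
    auto first = buf_.begin() + static_cast<std::ptrdiff_t>(pos_);
    pos_ += n;
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(n));
  }
  bool AtEnd() const { return pos_ == buf_.size(); }

 private:
  void Need(std::size_t n) const {
    if (buf_.size() - pos_ < n) throw std::runtime_error("truncated packet");
  }
  const std::vector<std::uint8_t>& buf_;
  std::size_t pos_ = 0;
};
}  // namespace

std::vector<std::uint8_t> Serialize(const WirePacket& p) {
  std::vector<std::uint8_t> out;
  PutUint(out, p.uid, 8);
  PutBytes(out, p.payload);
  PutUint(out, p.tags.size(), 4);
  for (const Tag& t : p.tags) {
    PutUint(out, t.typeId, 4);
    PutBytes(out, t.data);
  }
  return out;
}

WirePacket Deserialize(const std::vector<std::uint8_t>& buf) {
  Reader r(buf);
  WirePacket p;
  p.uid = r.GetUint(8);
  p.payload = r.GetBytes();
  std::uint64_t tagCount = r.GetUint(4);
  for (std::uint64_t i = 0; i < tagCount; ++i) {
    Tag t;
    t.typeId = static_cast<std::uint32_t>(r.GetUint(4));
    t.data = r.GetBytes();
    p.tags.push_back(std::move(t));
  }
  if (!r.AtEnd()) throw std::runtime_error("trailing bytes");
  return p;
}
```

The harder part in practice is the `typeId`: the receiver needs a registry that maps it back to a tag class able to deserialize `data`.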

> The notion of "lookahead" is well known and understood within
> the distributed simulation community.  We need a way to specify the
> lookahead value (automatically preferably) and to utilize this in the
> time synchronization protocol.

Right! This leads us to the conservative and optimistic approaches described
by Fujimoto.
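One common way to obtain lookahead automatically in network simulation is to take the smallest propagation delay among links whose endpoints lie on different processors, since a packet crossing such a link cannot arrive earlier than that delay. A sketch under that assumption (`Link` and `ComputeLookahead` are hypothetical names), reusing a node-to-rank map like the one George describes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

// Hypothetical description of a point-to-point link.
struct Link {
  std::uint32_t nodeA;
  std::uint32_t nodeB;
  double delay;  // propagation delay in seconds
};

// Global lookahead = smallest delay of any link that crosses a partition
// boundary. Links inside one partition do not constrain synchronization.
double ComputeLookahead(const std::vector<Link>& links,
                        const std::map<std::uint32_t, std::uint32_t>& rankOf) {
  double lookahead = std::numeric_limits<double>::infinity();
  for (const Link& l : links) {
    assert(l.delay >= 0.0);
    if (rankOf.at(l.nodeA) != rankOf.at(l.nodeB)) {
      lookahead = std::min(lookahead, l.delay);
    }
  }
  return lookahead;  // infinity: the partitions never exchange packets
}
```

If a boundary-crossing link has zero delay, the computed lookahead is zero, which is exactly the case where conservative synchronization makes little or no progress; per-channel lookahead values would be a possible refinement.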

> George

Thank you very much, George! I really enjoyed reading your annotations!

Best regards and have a nice weekend, Hagen


-- 
Hagen Paul Pfeifer <hagen at jauu.net>  ||  http://jauu.net/
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
Always in motion, the future is. 
