[Ns-developers] GSOC - Parallel Simulations | First ideas
Daniel Mahrenholz
mahrenho at gmail.com
Tue Mar 25 08:40:56 PDT 2008
Maybe you can benefit from the work we did some time ago with ns-2 to
parallelize wireless simulations
(http://ivs.cs.uni-magdeburg.de/EuK/lehre/diplomarbeiten/ivanov_masterthesis.pdf).
The results were not that encouraging, at least for a conservative
simulation of a wireless system. But the thesis includes some important
lessons learned and a lot of related work.
Daniel.
On Tue, Mar 25, 2008 at 1:20 AM, Hagen Paul Pfeifer <hagen at jauu.net> wrote:
> Hello NS3 Dev,
>
> here is my _open_ approach to the area of parallel simulations:
>
>
>
> o Investigate and compare current parallelization techniques:
>
> - Threads (namely POSIX threads)
>
> o Enables the use of CMP/SMP cores as well as the nodes of other
> "physically distributed systems" (clusters). Of course, threads by
> themselves are bound to a single shared-memory machine, but an abstraction
> layer based on thread logic can enable this in a later increment (see the
> next paragraphs for an in-depth description). That's a big advantage
> compared to TBB and other micro-level optimizations.
>
> o Fits more naturally into the concept of simulator parallelization,
> because many processes can be considered standalone (dependencies are
> considered separately at the end of this document).
>
> o Figure out how MPI (Message Passing Interface) can be utilized and
> combined with the threading approach, maybe as a starting point for data
> distribution. These are first shots and _could_ change in the design
> phase ... :)
>
> o POSIX threads are well known and applied in a wide range of areas -
> community support is assured, an aspect not to be undervalued!
>
> - TBB/OpenMP
>
> o Limited to CMP/SMP Systems
> o Simpler approach (easier to implement).
> o "Micro approach", whereas threads reflect more of an
> "overall approach" and permit fine-grained parallelization with
> dedicated locking primitives (reader/writer locks). TBB, on the other
> hand, treats the whole structure as one major blob with additional
> concurrency. IMHO threads are the more suitable approach for our field of
> application.
>
>
> There are more issues with the underlying technique, like platform
> support, number of users (expert knowledge), future outlook, et cetera.
> But I lean toward POSIX threads - they seem to deliver the highest
> potential!
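
The reader/writer locking mentioned above can be sketched in a few lines. This is only an illustration: it uses C++17 std::shared_mutex and std::thread as stand-ins for pthread_rwlock_t and POSIX threads, and RouteTable and CountConsistentReads are hypothetical names, not ns-3 code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared table (not an ns-3 class) guarded by a reader/writer lock.
class RouteTable {
public:
  void Set(const std::string& dst, int nextHop) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: one writer
    routes_[dst] = nextHop;
  }
  int Get(const std::string& dst) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: many readers
    auto it = routes_.find(dst);
    return it == routes_.end() ? -1 : it->second;
  }

private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, int> routes_;
};

// Runs `readers` reader threads against one writer thread and returns how
// many lookups of "a" saw the expected value 7.
int CountConsistentReads(int readers, int lookups) {
  RouteTable table;
  table.Set("a", 7);
  std::vector<int> hits(readers, 0);
  std::vector<std::thread> threads;
  for (int r = 0; r < readers; ++r) {
    threads.emplace_back([&table, &hits, r, lookups] {
      for (int i = 0; i < lookups; ++i) {
        if (table.Get("a") == 7) ++hits[r];  // each reader owns hits[r]
      }
    });
  }
  // The writer only touches a different key, so "a" must stay at 7.
  std::thread writer([&table, lookups] {
    for (int i = 0; i < lookups; ++i) table.Set("b", i);
  });
  for (auto& t : threads) t.join();
  writer.join();
  int total = 0;
  for (int h : hits) total += h;
  return total;
}
```

Readers take the lock in shared mode and can run concurrently; only the writer needs exclusive access. Whether this pays off depends on the read/write ratio of the structure being protected.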
>
> o Proposed Architecture (still first shots, of course ;):
>
> Add an additional parallelization abstraction layer (with well-defined
> interfaces). This opens several possibilities: enable/disable the whole
> parallelization via a compile-time switch ("mark" this feature as
> experimental in the beginning), make the implementation replaceable, and
> extend the whole subsystem with additional distribution (cluster
> functionality) without major code substitution in the ns-3 core.
>
> A clean but powerful parallelization layer is the goal, to separate
> implementation issues from the core ns3 logic! At the very least, the newly
> added functionality should _interfere_ with the common functionality as
> little as possible to prevent "simulation result anomalies".
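
A minimal sketch of such a layer, under stated assumptions: Executor, SequentialExecutor, ThreadedExecutor, SumOfSquares and the SIM_ENABLE_PARALLEL macro are all hypothetical names invented for illustration, not existing ns-3 interfaces, and std::thread stands in for POSIX threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical interface: the core only ever talks to Executor.
class Executor {
public:
  virtual ~Executor() = default;
  // Runs every task and returns once all of them have finished.
  virtual void RunAll(const std::vector<std::function<void()>>& tasks) = 0;
};

// Reference behaviour: plain sequential execution.
class SequentialExecutor : public Executor {
public:
  void RunAll(const std::vector<std::function<void()>>& tasks) override {
    for (const auto& task : tasks) task();
  }
};

// Experimental behaviour: one thread per task (no pooling, for brevity).
class ThreadedExecutor : public Executor {
public:
  void RunAll(const std::vector<std::function<void()>>& tasks) override {
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());
    for (const auto& task : tasks) workers.emplace_back(task);
    for (auto& w : workers) w.join();
  }
};

// Compile-time switch (assumed macro name): parallel only when asked for.
#ifdef SIM_ENABLE_PARALLEL
using DefaultExecutor = ThreadedExecutor;
#else
using DefaultExecutor = SequentialExecutor;
#endif

// Example workload: every task writes only its own slot, so the tasks are
// independent and both executors must produce the same sum.
long SumOfSquares(Executor& exec, std::size_t n) {
  std::vector<long> partial(n, 0);
  std::vector<std::function<void()>> tasks;
  for (std::size_t i = 0; i < n; ++i) {
    tasks.push_back([&partial, i] { partial[i] = static_cast<long>(i * i); });
  }
  exec.RunAll(tasks);
  long sum = 0;
  for (long v : partial) sum += v;
  return sum;
}
```

Comparing the result of both executors on the same workload is one simple way to check for "simulation result anomalies"; a cluster backend could later be added as a third Executor without touching the callers.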
>
>
> o In the first part, a profiling analysis (code coverage test, oprofile, et cetera)
> to detect CPU hogs and understand where processing power is consumed. Use this
> information to find approaches to split the workload into several pieces.
> Currently my knowledge in this sector is limited, and you may be able to help
> me find where the major CPU hogs are.
>
> - Spot causal dependencies within the model, like global variables. Dig into
> major components (wireless node dependencies such as radio
> interference, et cetera). Categorize these into several groups to spot
> the parallelization possibilities.
>
> - The parallelization trade-off should be determined in order to predict
> the overall newly introduced overhead.
>
> - Study already published literature in the field of simulator
> parallelization. (That's my everyday job currently - I am quite a novice in
> the field of simulator parallelization and there are quite a lot of
> publications out there, if you also consider some "border literature".)
>
> - Dig into the implementation details of other distributed systems with
> a similar algorithm scheme - don't reinvent the wheel.
>
> o Data Dependencies and CPU Characteristics.
>
> A major challenge is inter-thread dependencies, which here means data
> dependencies. Detecting them is a major part of the design phase,
> because they affect performance-critical issues: they have an impact on
> overall performance, and knowing them prevents unnecessary,
> uncoordinated synchronization between the newly introduced threads.
>
> Furthermore, CPU cache line thrashing and CPU design should also be
> considered. There are a bunch of parallelization approaches (implemented
> with TBB, for example) that reduce the overall performance compared to
> "unoptimized" code. The algorithm should therefore also take CPU
> architecture issues into account!
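
One common source of cache line thrashing is false sharing: per-thread data that happens to sit on the same cache line. A minimal sketch of the usual padding remedy follows, assuming a 64-byte cache line (the real size is CPU-specific); kCacheLine, PaddedCounter and CountInParallel are hypothetical names, and std::thread stands in for POSIX threads.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// alignas pads every counter to a full cache line, so counters owned by
// different threads do not share a line.
struct alignas(kCacheLine) PaddedCounter {
  long value = 0;
};

// Every thread increments only its own counter; the result is the sum.
long CountInParallel(std::size_t threads, long iterations) {
  std::vector<PaddedCounter> counters(threads);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    workers.emplace_back([&counters, t, iterations] {
      for (long i = 0; i < iterations; ++i) ++counters[t].value;
    });
  }
  for (auto& w : workers) w.join();
  long total = 0;
  for (const auto& c : counters) total += c.value;
  return total;
}
```

The padding changes only the memory layout, not the result. Whether it actually helps has to be measured on the target machine, which is the point of the profiling step above.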
>
>
> o Procedure/Agenda
>
> 1. Dig into the source
> 2. Analyze possible parallelization areas (tasks 1 and 2 can be done
> in parallel ;)
> 3. Study the literature and other applications with similar behavior
> 4. Start Programming (and documentation ;)
> 5. Build adequate unit test cases to verify simulator results (it is almost
> always a good feeling to be backed up by strong unit tests! ;-)
>
>
>
> This is a GSOC proposal, most likely not that stable yet (more alpha, like
> ns3), but an attempt! ;)
>
>
> Best regards, HGN
>
>
> --
> Hagen Paul Pfeifer <hagen at jauu.net> || http://jauu.net/
> Telephone: +49 174 5455209 || Key Id: 0x98350C22
> Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
> Always in motion, the future is.
>