[Ns-developers] GSOC - Parallel Simulations | First ideas

Hagen Paul Pfeifer hagen at jauu.net
Mon Mar 24 17:20:14 PDT 2008


Hello NS3 Dev,
 
here is my _open_ approach to the area of parallel simulations:
 
 
 
o Investigate and compare current parallelization techniques:
 
  - Threads (namely POSIX threads)

    o Makes it possible to utilize CMP/SMP cores as well as other
      "physically distributed systems" nodes (cluster).  Of course, threads
      by themselves are bound to a CPU, but an abstraction layer based on
      thread logic can enable this in a later increment (see the next
      paragraphs for an in-depth description). That's a big advantage
      compared to TBB and other micro-level optimizations.

    o Fits more naturally into the concept of simulator parallelization,
      because many processes can be considered standalone (dependencies are
      considered separately at the end of this document).

    o Figure out how MPI (Message Passing Interface) can be utilized and
      combined with the threading approach, maybe as a starting point for
      data distribution. These are the first shots and _could_ be changed in
      the design phase ... :)

    o POSIX threads are well known, well regarded, and applied in a wide
      area. Community support is assured, an aspect not to be undervalued!

  - TBB/OpenMP

    o Limited to CMP/SMP Systems
    o Simpler approach (easier to implement).
    o "Micro approach", whereas threads reflect more of an "overall
      approach" and permit fine-grained parallelization with dedicated
      locking primitives (reader/writer locks).  TBB, on the other hand,
      treats the whole structure as one major blob with additional
      concurrency. IMHO threads are the more suitable approach for our field
      of application.


    There are more issues with the underlying technique, like platform
    support, number of users (expert knowledge), future outlook, et cetera,
    but I lean toward POSIX threads - they seem to deliver the highest
    potential!
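To make the "dedicated locking primitives (reader/writer locks)" point a bit more concrete, here is a minimal sketch using the POSIX pthread_rwlock_* calls. SharedRouteTable, ReaderMain and RunReadersAndWriter are hypothetical names invented for this illustration and are not part of ns-3. The idea is only that many reader threads can look up data concurrently while a writer gets exclusive access.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical shared structure (not an ns-3 class): many readers, few writers.
class SharedRouteTable {
public:
    SharedRouteTable() {
        int rc = pthread_rwlock_init(&lock_, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    ~SharedRouteTable() { pthread_rwlock_destroy(&lock_); }
    SharedRouteTable(const SharedRouteTable&) = delete;
    SharedRouteTable& operator=(const SharedRouteTable&) = delete;

    // Writers take the lock exclusively.
    void Set(int dest, int nextHop) {
        pthread_rwlock_wrlock(&lock_);
        routes_[dest] = nextHop;
        pthread_rwlock_unlock(&lock_);
    }

    // Readers share the lock, so lookups can proceed in parallel.
    bool Lookup(int dest, int* nextHop) const {
        pthread_rwlock_rdlock(&lock_);
        std::map<int, int>::const_iterator it = routes_.find(dest);
        bool found = (it != routes_.end());
        if (found) *nextHop = it->second;
        pthread_rwlock_unlock(&lock_);
        return found;
    }

private:
    mutable pthread_rwlock_t lock_;
    std::map<int, int> routes_;
};

struct ReaderArgs {
    const SharedRouteTable* table;
    int lookups;
    int hits;
};

// Thread entry point: looks up destinations 0..7 round-robin and counts hits.
void* ReaderMain(void* p) {
    ReaderArgs* a = static_cast<ReaderArgs*>(p);
    for (int i = 0; i < a->lookups; ++i) {
        int hop = 0;
        if (a->table->Lookup(i % 8, &hop)) ++a->hits;
    }
    return nullptr;
}

// Starts `readers` reader threads while the calling thread keeps rewriting
// the four existing routes. Returns the total number of successful lookups.
int RunReadersAndWriter(int readers, int lookups) {
    SharedRouteTable table;
    for (int d = 0; d < 4; ++d) table.Set(d, 100 + d);  // keys 0..3 exist

    std::vector<pthread_t> threads(readers);
    std::vector<ReaderArgs> args(readers);
    std::vector<char> started(readers, 0);
    for (int i = 0; i < readers; ++i) {
        args[i].table = &table;
        args[i].lookups = lookups;
        args[i].hits = 0;
        if (pthread_create(&threads[i], nullptr, ReaderMain, &args[i]) == 0) {
            started[i] = 1;
        } else {
            ReaderMain(&args[i]);  // fall back to running inline
        }
    }
    for (int round = 0; round < 100; ++round) {
        for (int d = 0; d < 4; ++d) table.Set(d, round);  // values change, keys do not
    }
    int total = 0;
    for (int i = 0; i < readers; ++i) {
        if (started[i]) pthread_join(threads[i], nullptr);
        total += args[i].hits;
    }
    return total;
}
```

Because the writer only changes values and never adds or removes keys, the hit count does not depend on thread scheduling. Whether a reader/writer lock actually beats a plain mutex depends on the read/write ratio and would have to be measured.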

o Proposed Architecture (still first shots, of course ;):

  Add an additional parallelization abstraction layer (with well-defined
  interfaces). This opens up several possibilities: enable/disable the
  whole parallelization via a compile-time switch ("mark" this feature as
  experimental in the beginning), make it replaceable, and extend the whole
  subsystem with additional distribution (cluster functionality) without
  major code substitution in the ns-3 core.

  The goal is a clean but powerful parallelization layer that separates
  implementation issues from the core ns3 logic! At the very least, the newly
  added functionality should _interfere_ with the common functionality as
  little as possible to prevent "simulation result anomalies".
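A rough sketch of how such an abstraction layer with a compile-time switch could look. Everything here is an assumption made for illustration: the names Task, Executor, SequentialExecutor, PthreadExecutor, DefaultExecutor and the macro SIM_ENABLE_PARALLEL are hypothetical and do not exist in ns-3. The point is only that the core talks to one interface and the backend behind it can be swapped.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unit of work: a function pointer plus its argument.
struct Task {
    void (*run)(void*);
    void* arg;
};

// The interface the simulator core would see (hypothetical).
class Executor {
public:
    virtual ~Executor() {}
    virtual void RunAll(std::vector<Task>& tasks) = 0;
};

// Backend 1: no parallelization at all, tasks run in order.
class SequentialExecutor : public Executor {
public:
    void RunAll(std::vector<Task>& tasks) override {
        for (std::size_t i = 0; i < tasks.size(); ++i) tasks[i].run(tasks[i].arg);
    }
};

// pthread entry point that forwards to the task.
void* RunTaskTrampoline(void* p) {
    Task* t = static_cast<Task*>(p);
    t->run(t->arg);
    return nullptr;
}

// Backend 2: one POSIX thread per task (deliberately naive, no thread pool).
class PthreadExecutor : public Executor {
public:
    void RunAll(std::vector<Task>& tasks) override {
        std::vector<pthread_t> threads(tasks.size());
        std::vector<char> started(tasks.size(), 0);
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (pthread_create(&threads[i], nullptr, RunTaskTrampoline, &tasks[i]) == 0) {
                started[i] = 1;
            } else {
                tasks[i].run(tasks[i].arg);  // fall back to running inline
            }
        }
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (!started[i]) continue;
            int rc = pthread_join(threads[i], nullptr);
            assert(rc == 0);
            (void)rc;
        }
    }
};

// Compile-time switch (hypothetical macro name): the rest of the code only
// ever names DefaultExecutor.
#ifdef SIM_ENABLE_PARALLEL
typedef PthreadExecutor DefaultExecutor;
#else
typedef SequentialExecutor DefaultExecutor;
#endif

// Small example workload: each task writes only to its own slot, so no
// locking is needed.
struct SquareJob {
    long in;
    long out;
};

void Square(void* p) {
    SquareJob* j = static_cast<SquareJob*>(p);
    j->out = j->in * j->in;
}

long SumOfSquares(Executor& exec, int n) {
    std::vector<SquareJob> jobs(n);
    std::vector<Task> tasks(n);
    for (int i = 0; i < n; ++i) {
        jobs[i].in = i + 1;
        jobs[i].out = 0;
        tasks[i].run = &Square;
        tasks[i].arg = &jobs[i];
    }
    exec.RunAll(tasks);
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += jobs[i].out;
    return sum;
}
```

Both backends must give identical results for the same input, which is exactly the kind of check that guards against "simulation result anomalies". A cluster backend (MPI, for example) would be a third class behind the same interface.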


o In the first part, a profiling analysis (code coverage test, oprofile, et cetera)
  to detect CPU hogs and understand where processing power is consumed. Use this
  information to find approaches to split the workload into several pieces.
  Currently my knowledge in this sector is limited, so pointers to where
  the major CPU hogs are would help me.

  - Spot causal dependencies within the model, like global variables. Dig into
    major components (wireless node dependencies, radio interference,
    et cetera). Categorize these into several groups to spot the
    parallelization possibilities.

  - The parallelization trade-off should be determined to predict the
    overall newly introduced overhead.

  - Study already published literature within the sector of parallelization of
    simulators. (That's my everyday job currently - I am quite a novice in
    the field of simulator parallelization and there are quite a lot of
    publications out there, if you also consider some "border-literature".)

  - Dig into the implementation details of other distributed systems with
    a similar algorithm scheme - don't reinvent the wheel a second time.
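For the trade-off point above, a back-of-the-envelope estimate can already tell whether parallelization is worth it. The sketch below is Amdahl's law with one extra term that I add as an assumption: a constant synchronization overhead, expressed as a fraction of the serial runtime. The function name EstimatedSpeedup and the overhead model are purely illustrative and real numbers have to come from profiling.

```cpp
#include <cassert>

// Estimated speedup over the serial run.
//   p        : fraction of the runtime that can be parallelized, in [0, 1]
//   n        : number of threads, >= 1
//   overhead : newly introduced synchronization cost as a fraction of the
//              serial runtime (assumed constant; a simplification)
// Normalized parallel runtime = (1 - p) + p / n + overhead
double EstimatedSpeedup(double p, int n, double overhead) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    assert(overhead >= 0.0);
    double parallelTime = (1.0 - p) + p / n + overhead;
    return 1.0 / parallelTime;
}
```

With p = 0.5 and two threads, an overhead of 0.25 already eats the whole gain (speedup 1.0), and with only p = 0.2 parallelizable the same kind of overhead makes the "parallel" version slower than the serial one.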

o Data Dependencies and CPU Characteristics.

  A major challenge is inter-thread dependencies. Dependencies here means
  data dependencies. Detecting these dependencies is a major part of the
  design phase, because they affect performance-critical issues that impact
  the overall performance, and knowing them prevents unnecessary,
  uncoordinated synchronization between the newly introduced threads.

  Furthermore, CPU cache line thrashing and CPU design should also be
  considered. There are a bunch of parallelization approaches (implemented
  with TBB, for example) that reduce the overall performance compared to
  "unoptimized" code. The algorithm should therefore also take CPU
  architecture issues into account!
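One well-known instance of cache line thrashing is false sharing: per-thread data that happens to sit in the same cache line, so the cores keep invalidating each other's copies although no data is logically shared. A minimal sketch of the usual countermeasure, padding each thread's data to its own cache line, follows. The 64-byte line size is an assumption (common on current x86 CPUs, but not universal), and PaddedCounter, CountMain and CountInParallel are hypothetical names.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines. Check the target CPU before relying on it.
const std::size_t kCacheLine = 64;
const int kThreads = 4;

// alignas forces every counter onto its own cache line, so two threads
// never write to the same line.
struct alignas(64) PaddedCounter {
    long value;
};
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");

struct WorkerArgs {
    PaddedCounter* counter;
    long iterations;
};

// Thread entry point: each thread touches only its own counter.
void* CountMain(void* p) {
    WorkerArgs* a = static_cast<WorkerArgs*>(p);
    for (long i = 0; i < a->iterations; ++i) a->counter->value += 1;
    return nullptr;
}

// Runs kThreads workers and returns the sum of their private counters.
long CountInParallel(long iterations) {
    PaddedCounter counters[kThreads] = {};  // zero-initialized, on the stack
    WorkerArgs args[kThreads];
    pthread_t threads[kThreads];
    bool started[kThreads];
    for (int i = 0; i < kThreads; ++i) {
        args[i].counter = &counters[i];
        args[i].iterations = iterations;
        started[i] = (pthread_create(&threads[i], nullptr, CountMain, &args[i]) == 0);
        if (!started[i]) CountMain(&args[i]);  // fall back to running inline
    }
    long total = 0;
    for (int i = 0; i < kThreads; ++i) {
        if (started[i]) pthread_join(threads[i], nullptr);
        total += counters[i].value;
    }
    return total;
}
```

This only shows the data layout. To see the actual slowdown one would time it against an unpadded `long counters[kThreads]` with compiler optimizations kept from collapsing the loop, and the result will differ from CPU to CPU.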


o Procedure/Agenda

  1. Dig into the source
  2. Analyze possible parallelization areas (tasks 1 and 2 can be done
     in parallel ;)
  3. Study the literature and other applications with similar behavior
  4. Start Programming (and documentation ;)
  5. Build adequate unit test cases to verify simulator results (it is almost
     always a good feeling to be backed up by strong unit tests! ;-)



This is a GSOC proposal, most likely not that stable (more alpha, like ns3), but
an attempt! ;)


Best regards, HGN


-- 
Hagen Paul Pfeifer <hagen at jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
Always in motion, the future is. 

