[Ns-developers] [ns3] Statistical framework (draft)

Mathieu Lacage mathieu.lacage at sophia.inria.fr
Tue May 6 11:29:25 PDT 2008


hi joe, all,

On Thu, 2008-05-01 at 18:57 -0400, Joseph Kopena wrote:

> - There's a stat framework interface.  Its job is to manage all the
>   statistics produced in a run, and store it in some form such that it
>   may be easily aggregated with data from other runs.
> 
>   + Two standard implementations of this would be to collect the data
>     into XML logs, and another to collect into a DB.
> 
> - Stat objects are created to track and produce different statistics.
>   For example, you might have one that simply tracks min and max,
>   another for mean, one to count & average, another for confidence
>   interval, some tracking timing, etc.
>   + Stat objects have two basic methods: One through which the
>     framework may poll the object for its data to store, and another
>     through which the data is updated.  Both have to be fairly
>     generic, which will take some thought.
> 
> - To use the framework, sim user code has to connect "events" to the
>   stats.  You would instantiate your stat framework instance, then
>   instantiate your stat objects.  Then you would connect them to code
>   reporting events.  For example, in a custom application you might
>   pass in a stat object and just have the app call the update method
>   directly.  For an internal event, you would connect a relevant trace
>   source to a stat object you've created, i.e. passing in the update
>   method as the callback to use when the trace occurs.

The above is in line with what I had in mind so, I don't have much to
add to the overall picture :)

A while ago, I wrote the attached document to try to get started on the
low-level aspect of this project but I never got to the point where I
liked its content sufficiently to share it on the list. Since I won't
have time to finish it anytime soon, I thought I should send it here for
consideration: someone might find it useful.

regards,
Mathieu
-------------- next part --------------
0) Rationale
------------

When you run an event-driven simulation, you could imagine logging every
event which happens for offline analysis. Doing this for every kind of
event of interest is, however, not practically feasible given the amount
of data all simulation events would represent, even for very small
simulations. This leads us to the first reason why we would like to
introduce a statistics package in ns-3: we want to perform some
not-too-complicated statistical calculations from within the simulator to
decrease the amount of event data which must be permanently serialized on
disk for offline analysis.

Another reason for adding some basic statistical support to the simulator
is to use online statistics to drive the behavior of the simulation: rather
than specify that the simulation should run for x seconds, the user should
be able to say that the simulation should run for an unspecified amount of 
time, until the mean or the standard deviation of a variable is bounded within
a specified confidence interval with a specified probability. It should also
be possible to control a set of simulation scenarios and average measures
over these scenarios until a certain statistical condition is reached.

1) Use-cases
------------

Once we have set up the tracing system to report events to the user, we need
to decide the kind of treatment we want to apply to these events:

i) get rid of the first X events or all the events generated during the
   initial period T. This is typically done to try to mitigate the effect
   of the simulation initial conditions on the statistics extracted from
   the simulation

ii) calculate the number of times a specific event happens. For example,
   I might want to calculate the number of packet drops which occurred
   during the simulation

iii) accumulate the value reported by each event. For example, I might
   want to calculate the total number of bytes received on a specific node.

iv) keep track with a moving window of the event values reported during
   a fixed time period. This is typically done to calculate an
   "instantaneous" average over this window.

v) calculate an average/quantiles/confidence interval over a set of 
   event values. For example, I might use iv) to store the number of
   bytes for each packet received and calculate the resulting moving 
   average and confidence interval to get the average size of each 
   packet received.

vi) calculate a time average over a set of event values over a specific
   time period. iv) would be used in conjunction with this to report an
   "instantaneous" throughput value. It could also be used to calculate
   the jitter if the average is applied to the variation of the 
   transmission delay.

vii) generate a histogram of the distribution of the values of a 
   variable. For example, this might be very useful if I want to look
   at the distribution of the inter-arrival times at the entry of
   a queue to validate that it looks like a Poisson process.
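
As an illustration of use-cases iv) and vi), here is a minimal sketch of a
moving time window. The class name WindowedAverage and its methods are
hypothetical and not part of ns-3, and time is represented as a plain double
in seconds rather than an ns-3 Time to keep the example self-contained. Memory
use is bounded by the number of events falling within one window, not by the
length of the run.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical helper (not an ns-3 class): keeps only the samples reported
// during the last `period` seconds of simulated time.
class WindowedAverage
{
public:
  explicit WindowedAverage (double period) : m_period (period), m_sum (0.0) {}

  // Record value v reported at time `now`. Times must be non-decreasing.
  void Add (double now, double v)
  {
    m_samples.push_back (std::make_pair (now, v));
    m_sum += v;
    Expire (now);
  }

  // Mean of the samples in (now - period, now], or 0 if the window is empty.
  double GetMean (double now)
  {
    Expire (now);
    return m_samples.empty () ? 0.0 : m_sum / m_samples.size ();
  }

  // Sum of the window divided by its duration, e.g. bytes per second
  // when the reported values are packet sizes.
  double GetRate (double now)
  {
    Expire (now);
    return m_sum / m_period;
  }

  std::size_t GetCount (void) const { return m_samples.size (); }

private:
  // Drop the samples which have fallen out of the window.
  void Expire (double now)
  {
    while (!m_samples.empty () && m_samples.front ().first <= now - m_period)
      {
        m_sum -= m_samples.front ().second;
        m_samples.pop_front ();
      }
    if (m_samples.empty ())
      {
        m_sum = 0.0; // avoid accumulating rounding errors
      }
  }

  double m_period;
  double m_sum;
  std::deque<std::pair<double, double> > m_samples;
};
```

Calling Add from a packet-reception callback with the packet size, and
GetRate at regular intervals, would give the "instantaneous" throughput
mentioned in vi).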

2) Implementation considerations
--------------------------------

A key implementation issue to consider is that of memory usage: we 
mentioned above that the main rationale for building this statistical
framework is to avoid having to store very large amounts of data
for offline analysis. Similarly, we cannot really afford to store
in memory all the data of interest to perform single-pass statistical
calculations: we could provide naive implementations of most
classic statistical tools but these will not scale beyond the simplest
simulations. While it is quite trivial to avoid this problem when the
goal is to calculate a mean or a variance, avoiding having to store a
large number of variables is harder when the goal is to generate a
histogram or a set of quantiles.
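
To make the "trivial" case concrete, here is a minimal sketch of a
constant-memory mean and variance calculation using Welford's update. The
choice of Welford's method is an assumption of this example (the text above
does not name an algorithm), and the class name RunningStats is hypothetical.
Only three numbers are kept, whatever the number of samples.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: incremental mean and variance (Welford's method).
class RunningStats
{
public:
  RunningStats () : m_n (0), m_mean (0.0), m_m2 (0.0) {}

  void Add (double v)
  {
    ++m_n;
    double delta = v - m_mean;
    m_mean += delta / m_n;
    // m_m2 accumulates the sum of squared deviations from the current mean.
    m_m2 += delta * (v - m_mean);
  }

  std::uint64_t GetCount (void) const { return m_n; }
  double GetMean (void) const { return m_mean; }

  // Unbiased sample variance; 0 when fewer than two samples were added.
  double GetVariance (void) const
  {
    return m_n > 1 ? m_m2 / (m_n - 1) : 0.0;
  }

  double GetStandardDeviation (void) const
  {
    return std::sqrt (GetVariance ());
  }

private:
  std::uint64_t m_n;
  double m_mean;
  double m_m2;
};
```

This update avoids the cancellation problems of the naive
"sum of squares minus square of sum" formula when the variance is small
compared to the mean.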

This memory problem has been tackled by a number of algorithms:
  a) "The P2 algorithm for dynamic calculation of quantiles and
     histograms without storing observations", by Jain and Chlamtac
     http://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
  b) "K-Split on-line density estimation for simulation result
     collection", by Andras Varga
  c) "Simulation-based estimation of quantiles", by E. Jack Chen and
     W. David Kelton.
     http://www.informs-cs.org/wsc99papers/059.PDF
  d) "Sequential procedure for simultaneous estimation of several
     percentiles" by KEE Raatikainen, and "Sequential Estimation of
     Quantiles" by JR Lee, D McNickle, K Pawlikowski. Both improve
     on a) by providing an estimate of the quantile error, which can
     be used to stop simulations.
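
To give an idea of what a) involves, here is a minimal sketch of the P2
algorithm for a single quantile, written from the description in the paper by
Jain and Chlamtac. The class name P2Quantile is hypothetical, and the handling
of the first four samples (returning an exact value from the stored samples)
is a simplification of this example. The estimator keeps five markers, so its
memory use is constant.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch of the P2 estimator for one quantile p, 0 < p < 1.
class P2Quantile
{
public:
  explicit P2Quantile (double p) : m_p (p), m_count (0) {}

  void Add (double x)
  {
    if (m_count < 5)
      {
        // The first five samples initialize the marker heights.
        m_q[m_count++] = x;
        if (m_count == 5)
          {
            std::sort (m_q, m_q + 5);
            for (int i = 0; i < 5; ++i) { m_n[i] = i + 1; }
            m_np[0] = 1; m_np[1] = 1 + 2 * m_p; m_np[2] = 1 + 4 * m_p;
            m_np[3] = 3 + 2 * m_p; m_np[4] = 5;
            m_dn[0] = 0; m_dn[1] = m_p / 2; m_dn[2] = m_p;
            m_dn[3] = (1 + m_p) / 2; m_dn[4] = 1;
          }
        return;
      }
    ++m_count;
    // Find the cell k such that q[k] <= x < q[k+1], extending the extremes.
    int k = 0;
    if (x < m_q[0]) { m_q[0] = x; }
    else if (x >= m_q[4]) { m_q[4] = x; k = 3; }
    else { while (x >= m_q[k + 1]) { ++k; } }
    for (int i = k + 1; i < 5; ++i) { m_n[i] += 1; }
    for (int i = 0; i < 5; ++i) { m_np[i] += m_dn[i]; }
    // Move the three middle markers towards their desired positions.
    for (int i = 1; i <= 3; ++i)
      {
        double d = m_np[i] - m_n[i];
        if ((d >= 1 && m_n[i + 1] - m_n[i] > 1)
            || (d <= -1 && m_n[i - 1] - m_n[i] < -1))
          {
            int s = d > 0 ? 1 : -1;
            double qn = Parabolic (i, s);
            if (!(m_q[i - 1] < qn && qn < m_q[i + 1])) { qn = Linear (i, s); }
            m_q[i] = qn;
            m_n[i] += s;
          }
      }
  }

  // Current estimate of the p-quantile (0 if no sample was added).
  double Get (void) const
  {
    if (m_count >= 5) { return m_q[2]; }
    if (m_count == 0) { return 0.0; }
    double tmp[5];
    std::copy (m_q, m_q + m_count, tmp);
    std::sort (tmp, tmp + m_count);
    return tmp[static_cast<std::size_t> (m_p * (m_count - 1))];
  }

private:
  // Piecewise-parabolic prediction of the new height of marker i.
  double Parabolic (int i, int s) const
  {
    return m_q[i] + s / (m_n[i + 1] - m_n[i - 1])
      * ((m_n[i] - m_n[i - 1] + s) * (m_q[i + 1] - m_q[i]) / (m_n[i + 1] - m_n[i])
         + (m_n[i + 1] - m_n[i] - s) * (m_q[i] - m_q[i - 1]) / (m_n[i] - m_n[i - 1]));
  }
  // Linear fallback used when the parabolic prediction is out of order.
  double Linear (int i, int s) const
  {
    return m_q[i] + s * (m_q[i + s] - m_q[i]) / (m_n[i + s] - m_n[i]);
  }

  double m_p;
  std::size_t m_count;
  double m_q[5];  // marker heights
  double m_n[5];  // actual marker positions
  double m_np[5]; // desired marker positions
  double m_dn[5]; // increments of the desired positions
};
```

The result is an approximation: the middle marker converges towards the
requested quantile, but no error bound is given, which is what the papers
listed in d) address.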

3) Other simulators
-------------------

Omnet++ provides a rather complete statistical package which includes:
  - mean and variance calculation
  - fixed-size+variable-size histograms using a classic algorithm which
    requires all measures to be present in memory at the same time
  - quantile values through the P2 algorithm
  - an implementation of the ksplit algorithm for density estimation

ns-2 has a number of external statistical packages:
  - ns2measure: (http://info.iet.unipi.it/~cng/ns2measure/) focuses on the
    task of making it easy to specify a simulation termination condition
    with a statistical condition rather than a time-based condition.
  - ns-2/akaroa-2: (http://www-tkn.ee.tu-berlin.de/research/ns-2_akaroa-2/ns.html)
    seems to be doing what ns2measure does with less emphasis on the analysis
    part and more emphasis on the parallelization of the independent runs.
  - ns2graph: (http://sourceforge.net/projects/ns2graph/) this project is
    focused on generating trace files from ns-2 simulations and automating
    the task of generating graphs from trace files together with very basic
    statistics.

4) A proposal for ns-3
----------------------

I think that the Omnet++ package provides the right kind of functionality
for ns-3, so we should focus on:

1) Provide a set of classes to incrementally calculate statistical properties:
  - average
  - standard deviation and confidence intervals
  - quantiles
  - variable-sized bin histograms through a P2-like implementation
  - variable-sized bin histograms through a ksplit-like implementation

2) Provide a simple way to hook the simulation stop event to the standard
deviation and confidence interval calculations.
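
A minimal sketch of what such a stopping condition could look like is given
below. The function names, the minSamples guard and the use of a normal
approximation (z = 1.96 for a 95% interval) are assumptions of this example,
not part of the proposal. It also treats the samples as independent, which is
often not true for simulation output, so the resulting interval is likely to
be optimistic.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: half-width of the confidence interval around the
// mean of n samples, using a normal approximation.
double ConfidenceHalfWidth (std::uint64_t n, double stddev, double z = 1.96)
{
  return z * stddev / std::sqrt (static_cast<double> (n));
}

// Hypothetical helper: true once the half-width of the interval is no larger
// than relPrecision times the absolute value of the mean.
bool PrecisionReached (std::uint64_t n, double mean, double stddev,
                       double relPrecision, std::uint64_t minSamples = 30,
                       double z = 1.96)
{
  if (n < minSamples)
    {
      return false; // too few samples for the approximation to make sense
    }
  return ConfidenceHalfWidth (n, stddev, z) <= relPrecision * std::fabs (mean);
}
```

In ns-3, this check could be evaluated each time a sample is added, calling
Simulator::Stop () once it returns true.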

More specifically, the proposal would be to define:

class Mean
{
public:
  void Add (double v);
  double GetMean (void) const;
  double GetStandardDeviation (void) const;
  double GetVariance (void) const;
  double GetConfidenceInterval (void) const;
};

class MeanOverPeriod
{
public:
  MeanOverPeriod (Time period);
  void Add (double v);
  double GetMean (void) const;
  double GetStandardDeviation (void) const;
  double GetVariance (void) const;
  double GetConfidenceInterval (void) const;
};

// use P2 algorithm to keep track of quantiles
class Quantiles
{
public:
  // n: number of quantiles to measure.
  Quantiles (uint32_t n);
  void Add (double v);
  // return quantile number i where i < n
  double Get (uint32_t i) const;
};

// use P2 algorithm
class Histogram
{
public:
  struct Box {
    uint32_t n;
    double min;
    double max;
  };
  // n: number of boxes in histogram. bounds memory usage.
  Histogram (uint32_t n);
  void Add (double v);
  // return box value where i < n.
  Box GetBox (uint32_t i) const;
};

// use k-split algorithm
class KSplitHistogram
{
public:
  struct Box {
    uint32_t n;
    double min;
    double max;
  };
  KSplitHistogram ();
  void Add (double v);
  uint32_t GetBoxN (void) const;
  Box GetBox (uint32_t i) const;
};

