[Ns-developers] [ns3] Statistical framework (draft)

Mathieu Lacage mathieu.lacage at sophia.inria.fr
Fri May 16 10:06:11 PDT 2008


On Thu, 2008-05-15 at 14:17 +0200, Vincent Gauthier wrote:
> Hi Joseph, everybody,
> 
> I read the draft that Mathieu sent us last week, and I started a
> rough implementation of the P^2 algorithm. The first step could be the

That is mighty cool! If you can post it somewhere public, that would
be most helpful (hint: http://freehg.org/ provides free-as-in-beer
hosting of Mercurial repos).

> definition of a set of methods for the statistical container that we
> could enhance over time (the ones that have been defined before:
> mean, median (P^2 algo), counter, and confidence intervals). Once
> this step is over, the next one will be to efficiently link these
> containers to some variables inside the code (callbacks) and define
> how to update stats containers efficiently. Maybe adding a method to

Yes. We will likely need some glue between the trace source, which is a
model-specific trace event generator, and the statistics class, which
knows nothing about the model's trace events but knows how to calculate
statistics. That glue must convert each model-specific trace event into
something the statistics classes can count.

For now, the simplest way to implement that glue is a user-provided
ad hoc callback which receives the model-specific trace event and calls
a statistics object. Figuring out a better (more automatic) way to do
this would be nice, but I am not sure that it is feasible.
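
To make the glue idea concrete, here is a minimal sketch in Python
rather than ns-3 C++. Every name in it (PacketDropEvent, TraceSource,
RunningMean, on_packet_drop) is hypothetical and chosen for
illustration only; none of them is an existing ns-3 API.

```python
from dataclasses import dataclass


@dataclass
class PacketDropEvent:
    """Hypothetical model-specific trace event (not an ns-3 type)."""
    node_id: int
    packet_size: int  # bytes
    time: float       # simulation time in seconds


class RunningMean:
    """Statistics object: it sees plain numbers and knows nothing of the model."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value

    def mean(self):
        return self.total / self.count if self.count else 0.0


class TraceSource:
    """Hypothetical trace source: hands model-specific events to callbacks."""

    def __init__(self):
        self._callbacks = []

    def connect(self, callback):
        self._callbacks.append(callback)

    def fire(self, event):
        for callback in self._callbacks:
            callback(event)


dropped_size = RunningMean()


def on_packet_drop(event):
    # The ad hoc glue: extract the one number of interest from the event
    # and hand it to the model-agnostic statistics object.
    dropped_size.update(event.packet_size)


source = TraceSource()
source.connect(on_packet_drop)
for size in (100, 200, 600):
    source.fire(PacketDropEvent(node_id=1, packet_size=size, time=0.0))

print(dropped_size.count, dropped_size.mean())
```

The user writes only on_packet_drop; the trace source and the
statistics object stay unaware of each other.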

> sample the updates to the stats container over time instead of keeping
> track of each change of a variable. That would certainly improve the
> simulator's performance (but also lead to errors if the sampling
> interval is overestimated).

Sampling is another issue: my math background is dim and distant, but I
am not even sure that sampling whenever the variable value changes is
the right thing to do from a mathematical perspective. I actually
doubt that there is a single right way to sample the data: it will
probably depend on the type of data, the kind of statistics calculated
and the kind of use made of these statistics.
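
As a small illustration of why the sampling rule matters, the sketch
below computes a mean from the same trace in two ways: one sample per
change, and one weighted by how long each value was held. The trace
and the function names are made up for the example.

```python
def mean_per_change(changes):
    """Average of the recorded values, one sample per change."""
    values = [value for _, value in changes]
    return sum(values) / len(values)


def mean_time_weighted(changes, end_time):
    """Average weighted by how long each value was held."""
    area = 0.0
    following = changes[1:] + [(end_time, None)]
    for (start, value), (stop, _) in zip(changes, following):
        area += value * (stop - start)
    return area / (end_time - changes[0][0])


# Hypothetical queue-length trace: (time in seconds, new value).
queue_length = [(0.0, 0), (1.0, 10), (2.0, 0)]

print(mean_per_change(queue_length))
print(mean_time_weighted(queue_length, 10.0))
```

The queue holds 10 items for one second out of ten, so the
time-weighted mean is 1.0, while the per-change mean is 10/3. Which
one is "right" depends on what the statistic is used for.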

> I don't have a lot to add to what has been said in the thread about
> the stats framework. It seems that the main concern about the stats
> that Tom and Mathieu have highlighted (on the mailing list and in the
> draft) is to provide confidence intervals, CDFs and PDFs to the

Yes, there is one part of the statistical framework (let's call it the
low-level part) which is about calculating statistics: this part is
relatively well defined, it is the part I focused on in my first post,
and it is the part you seem to have started working on.

> users but also for simulation matters like stopping the simulation or
> starting the measurements (when the simulation has reached a steady

That is another component of the statistical framework: providing a way
to estimate confidence intervals and act upon them. I have a very poor
handle on these issues but I think that:
  1) we need to make sure that every statistic we calculate can also
provide a confidence interval for its own value.
  2) we need to figure out whether it is possible to implement generic
control logic based on these confidence intervals to control the
simulation end time.

I think that 1) is very important and should be kept in mind while
implementing the low-level part. 2) might be very tricky: reviewing
existing simulators to see what they do for this would be helpful.
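
A rough sketch of how 1) and 2) could fit together, assuming
independent samples and a normal approximation (both are strong
assumptions for simulation output, which is often correlated). The
class and function names are hypothetical, not part of any existing
simulator.

```python
import math
import random


class MeanWithConfidence:
    """Running mean and variance (Welford's method) plus a CI half-width."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def half_width(self, z=1.96):
        """Half-width of an approximate 95% interval for the mean.

        Assumes independent samples and a normal approximation.
        """
        if self.count < 2:
            return math.inf
        variance = self._m2 / (self.count - 1)
        return z * math.sqrt(variance / self.count)


def run_until_precise(next_sample, target_half_width,
                      min_samples=30, max_samples=100_000):
    """Pull samples until the interval is tight enough or a cap is hit."""
    stat = MeanWithConfidence()
    while stat.count < max_samples:
        stat.update(next_sample())
        if stat.count >= min_samples and stat.half_width() <= target_half_width:
            break
    return stat


# Stand-in for a simulation producing one observation per call.
rng = random.Random(42)
result = run_until_precise(lambda: rng.gauss(5.0, 2.0), target_half_width=0.1)
print(result.count, round(result.mean, 2), round(result.half_width(), 3))
```

The min_samples guard avoids stopping on a lucky early estimate; a
real control logic would also need to deal with the warm-up period and
with correlated samples.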

> state). The second point that has been highlighted is to provide an
> efficient way to gather and store the statistical results in a
> database-like container that enables us to give enhanced output to
> the users through XML files and so on (that is your point, Joseph). The

Yes. Joe seems mostly focused on that latter point: abstracting the
storage of simulation data and statistics. It would probably be helpful
to try to review the state of the art in other simulators concerning
that issue.

> last point is to provide a set of statistical containers available
> through callback methods directly accessible anywhere inside the
> simulator. These containers should provide methods to perform the
> calculation of the stats. It would also be nice if we gave the user,
> under certain conditions, access to raw data on demand (samples of
> something over time), because certain kinds of variables can only be
> analysed this way.

I think that what you describe above is a sort of "generic" glue, where
glue is what I alluded to at the beginning of this email. If not, would
you mind elaborating on what you described above?

> My major concern is what the policy should be for linking a variable
> in the code to a stats container. To me, the less often you update
> the stats container, the better the performance will be (even with the
> P^2 algo, if you want the median of, for example, something in the MAC
> layer of a wireless network for all the nodes, it would probably slow
> down the simulation; where should the threshold be?).

I think that different simulations will have different performance
characteristics. It is easy to build a simulation where a value almost
never changes compared to the update interval, which makes the
"update interval" solution potentially much slower because it schedules
many extra simulation events. Is it possible to use a self-adaptive
update interval? Probably, but is it worth the extra implementation
complexity? If so, what would be the impact on the confidence
intervals? For now, I would first focus on implementing something which
performs a statistics update upon every variable change because that is
the easiest thing to do.
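
To illustrate the trade-off, here is a toy comparison on a made-up
trace where the value changes only three times in 100 seconds. The
helper names and the numbers are assumptions for the example; this is
not a performance measurement.

```python
import bisect


def value_at(changes, time):
    """Value held at `time`, given a sorted list of (time, new_value)."""
    times = [t for t, _ in changes]
    index = bisect.bisect_right(times, time) - 1
    return changes[index][1]


def periodic_samples(changes, end_time, interval):
    """Read the variable every `interval` seconds: one event per sample."""
    count = int(round(end_time / interval))
    return [value_at(changes, k * interval) for k in range(count)]


# Hypothetical trace: a short spike to 10 between t=50.2s and t=50.7s.
changes = [(0.0, 0), (50.2, 10), (50.7, 0)]

on_change_updates = len(changes)                  # one update per change
fine = periodic_samples(changes, 100.0, 0.5)      # many extra events
coarse = periodic_samples(changes, 100.0, 10.0)   # few events, spike missed

print(on_change_updates, len(fine), max(fine), len(coarse), max(coarse))
```

Here the update-on-change policy costs 3 updates, the fine interval
costs 200 events to see the spike, and the coarse interval costs 10
events and never sees it.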

Now, to go back to the other issues which were raised:
  a) data storage abstraction
  b) simulation control based on confidence intervals
  c) multiple simulation run management

For c), I am almost certain that there are existing tools which handle
that, and I think that omnetpp is actually using one of them (can't
remember the name offhand), so it would be nice if someone could
review these existing tools and see what we can learn from them.

Basically, for a), b) and c), we really need to sit down and:
  - think seriously about use-cases
  - review existing tools and solutions

regards,
Mathieu


