[Ns-developers] [ns3] Statistical framework (draft)
Vincent Gauthier
vglist at mac.com
Tue Apr 29 06:39:10 PDT 2008
Hi Joseph,
On 28 Apr 08, at 16:20, Joseph Kopena wrote:
> On Fri, Apr 25, 2008 at 7:39 AM, Vincent Gauthier <vglist at mac.com> wrote:
>> I am proposing to start a statistical framework implementation. The aim
>> of this framework is to provide easy access to all the predefined
>> variables in each layer of the simulator, offering users a set of the
>> most commonly used methods for statistical analysis (means, medians,
>> confidence intervals) and giving users a friendly way to analyze the
>> outputs.
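As a rough illustration of the basic methods listed above, here is a minimal C++ sketch (C++ because that is what ns-3 is written in). `Summary` and `Summarize` are hypothetical names, not part of any ns-3 API. The confidence interval uses a normal approximation (z = 1.96), which is an assumption that is only reasonable for fairly large samples; a Student-t quantile would be safer for a handful of runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a set of samples (not an ns-3 type).
struct Summary {
  double mean;
  double median;
  double ciHalfWidth;  // half-width of an approximate 95% confidence interval
};

// Mean, median and approximate 95% CI for the mean.
// Assumption: normal approximation with z = 1.96.
Summary Summarize(std::vector<double> samples) {
  assert(samples.size() >= 2);  // the sample variance needs at least two values
  const std::size_t n = samples.size();

  double sum = 0.0;
  for (double x : samples) sum += x;
  const double mean = sum / n;

  // Unbiased sample variance (divide by n - 1).
  double sq = 0.0;
  for (double x : samples) sq += (x - mean) * (x - mean);
  const double variance = sq / (n - 1);

  // Median: sort our copy (taken by value) and pick the middle element(s).
  std::sort(samples.begin(), samples.end());
  const double median = (n % 2 == 1)
      ? samples[n / 2]
      : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);

  const double halfWidth = 1.96 * std::sqrt(variance / n);
  return {mean, median, halfWidth};
}
```

For example, `Summarize({1.0, 2.0, 3.0, 4.0})` gives a mean and a median of 2.5.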
>
> Hi Vincent,
>
> This all sounds good to me, and it's definitely a great thing to work
> on. I could certainly use it! I'm also interested in helping with
> this task.
Great, thx
> Comments so far:
>
> - How are you envisioning the framework being bound to the variables?
> My conception was that the bulk of the statistics framework would be a
> generic box collecting input and producing numbers, without
> particularly caring what those inputs are. Sim writers can then
> easily use it for their own metrics, an important goal, as well as
> connecting to "internal" simulation data, such as buffer drops, frames
> generated, etc.
>
> For example, in simulations I wrote a little bit ago, I wanted to stop
> logging everything as that was my biggest slowdown and I did not need
> all the data. Instead, I made a simple (global) object that keeps a
> set of counts, keyed by a tag. For events I care about, I write some
> code updating the count for that tag. For example, "frame" gets incremented
> by 1 every time a frame is sent by a netdevice, "bytes" gets updated with
> the bytes for each frame, "registration" whenever a service is heard
> by a broker, etc. At the end of the simulation, the stat counter
> dumps all the counts for all the fields, which is recorded and then
> collated by a set of scripts to aggregate multiple trials at multiple
> node densities, etc. That's obviously a very simple "stats
> framework," but that's the basic
> approach I've been thinking of. The key point is not binding to
> predefined variables, and only collecting for things you're interested
> in. The big next steps from that would be to utilize the tracing
> framework instead of some ad hoc approach, and have more options than
> simply counting (timing, means, etc).
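The tag-keyed counter described above could look roughly like the sketch below. This is not Joseph's actual code: `StatCounter`, `Update`, `Get`, `Dump` and `ToString` are hypothetical names, and the output format (one "tag count" line per field) is an assumption chosen so that simple scripts can collate it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical tag-keyed counter (not an ns-3 class): events of interest
// bump a named count, and everything is dumped once at the end of the run.
class StatCounter {
 public:
  // Add `amount` to the count for `tag`; unseen tags start at zero.
  void Update(const std::string& tag, std::uint64_t amount = 1) {
    assert(!tag.empty());
    counts_[tag] += amount;
  }

  // Current count for `tag`, or 0 if it was never updated.
  std::uint64_t Get(const std::string& tag) const {
    auto it = counts_.find(tag);
    return it == counts_.end() ? 0 : it->second;
  }

  // One "tag count" line per field, in tag order (std::map is sorted),
  // which line-oriented scripts can collate across runs.
  void Dump(std::ostream& os) const {
    for (const auto& [tag, count] : counts_) {
      os << tag << ' ' << count << '\n';
    }
  }

  std::string ToString() const {
    std::ostringstream os;
    Dump(os);
    return os.str();
  }

 private:
  std::map<std::string, std::uint64_t> counts_;
};
```

A netdevice hook would call `counter.Update("frame")` and `counter.Update("bytes", size)` for each frame sent, and `counter.Dump(std::cout)` at the end of the run.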
Yes, that is exactly what I had in mind, but I would also add features
to handle data that cannot be aggregated, for example the TCP window
size over time, or the throughput of one device over time. We have to
give the user all the samples, but this feature should only be enabled
if the user requests it, to avoid too much overhead.
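For the non-aggregated case (a TCP window or a device's throughput over time), one way to keep the overhead opt-in is a recorder that is disabled by default and returns immediately when nobody asked for the samples. A sketch under that assumption: `TimeSeries` and its methods are hypothetical, and time is plain seconds rather than an ns-3 `Time` object so the example stays self-contained.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical recorder of (time, value) samples for quantities that should
// not be aggregated. Disabled by default, so the per-sample cost is only paid
// when the user explicitly enables it.
class TimeSeries {
 public:
  void Enable(bool on) { enabled_ = on; }

  // Called by the instrumented code each time the quantity changes.
  void Record(double timeSeconds, double value) {
    if (!enabled_) return;  // cheap early exit when nobody is listening
    // Samples are expected in non-decreasing time order.
    assert(samples_.empty() || timeSeconds >= samples_.back().first);
    samples_.emplace_back(timeSeconds, value);
  }

  const std::vector<std::pair<double, double>>& Samples() const {
    return samples_;
  }

 private:
  bool enabled_ = false;
  std::vector<std::pair<double, double>> samples_;
};
```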
> In all, it sounds similar to what you've described, I'm just not real
> clear on how you're seeing variables get defined, etc.
I was thinking of some built-in classes wrapping the application,
transport and device layers (and so on) that help the people who
implement these layers define for themselves the variables that could
be useful as statistics. The framework would then provide everything
needed to move this raw data to the user, with all the post-processing
built into the framework (I gave more detail in my previous response
to Tom).
> - I was going to say that actual statistic calculation in many cases
> could be done offline by scripts included in the stats package, but I
> like Tom's idea of being able to stop simulations based on hitting
> some condition.
Yes, it is definitely a good idea to include.
> That could also be used to provide output to a
> loosely coupled realtime GUI visualization, but I personally am more
> interested in output for papers than GUIs.
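Tom's stop condition could, for instance, be expressed as a precision target on a running estimate. The sketch below keeps an online mean and variance (Welford's algorithm) and reports when the approximate 95% confidence half-width drops below a chosen fraction of the mean. The class name, the relative-precision rule and the z = 1.96 normal approximation are assumptions, and the rule also assumes the samples are roughly independent. In an ns-3 script, a true result could be used to call `Simulator::Stop()`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stopping rule: online mean/variance via Welford's algorithm,
// stop once 1.96 * sqrt(var / n) <= relPrecision * |mean|.
class StoppingRule {
 public:
  StoppingRule(double relPrecision, std::size_t minSamples)
      : relPrecision_(relPrecision), minSamples_(minSamples) {
    assert(relPrecision > 0.0 && minSamples >= 2);
  }

  void Add(double x) {
    ++n_;
    const double delta = x - mean_;
    mean_ += delta / n_;
    m2_ += delta * (x - mean_);  // uses the updated mean, as in Welford
  }

  double Mean() const { return mean_; }

  bool ShouldStop() const {
    if (n_ < minSamples_) return false;  // never stop on too few samples
    const double variance = m2_ / (n_ - 1);
    const double halfWidth = 1.96 * std::sqrt(variance / n_);
    return halfWidth <= relPrecision_ * std::abs(mean_);
  }

 private:
  double relPrecision_;
  std::size_t minSamples_;
  std::size_t n_ = 0;
  double mean_ = 0.0;
  double m2_ = 0.0;
};
```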
>
> - A large part of the statistical package that would greatly benefit
> users is management of data over multiple runs---both repeated trials,
> as well as changing setups. In those sims I was talking about above,
> all that stat data simply goes to text files. A bunch of perl scripts
> (yes, yes, moving on...) manage running the simulation, looping
> through repeats, changing command line variables, etc, and producing
> stat files under a given naming convention. Other scripts then use
> that naming convention to collect all the data and collate it into
> gnuplot ready form, produce confidence intervals, etc. Something like
> that would be a workable approach. XML output would be trivial to
> add, and slightly more formal. Database integration to do that
> storage, collection, and manipulation would also be a good idea, which
> has been discussed for ns-3 before. In my experience, this data
> management is the most cumbersome part of working with simulation
> statistics, so it bears at least as much focus as the "stat framework"
> inside the simulation.
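The collation step described above (many trials per node density, reduced to gnuplot-ready rows with confidence intervals) might be sketched as follows. The function name and the three-column layout are assumptions; the layout is chosen because gnuplot's `plot 'file' using 1:2:3 with yerrorbars` reads x, y and the y error from those columns. The interval again uses a normal approximation.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical collation: for each value of a swept parameter (e.g. node
// density), reduce the per-trial results to one row
//   <parameter> <mean> <approx. 95% CI half-width>
std::string CollateForGnuplot(
    const std::map<int, std::vector<double>>& trialsByParam) {
  std::ostringstream out;
  for (const auto& [param, trials] : trialsByParam) {
    assert(trials.size() >= 2);  // need at least two trials for a variance
    double sum = 0.0;
    for (double x : trials) sum += x;
    const double mean = sum / trials.size();

    double sq = 0.0;
    for (double x : trials) sq += (x - mean) * (x - mean);
    // sqrt(sample variance / n), scaled by z = 1.96 (normal approximation).
    const double halfWidth =
        1.96 * std::sqrt(sq / (trials.size() - 1) / trials.size());

    out << param << ' ' << mean << ' ' << halfWidth << '\n';
  }
  return out.str();
}
```

For two trials of 2.0 and 4.0 at a density of 20, the row is `20 3 1.96`.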
I haven't thought yet about how to handle multiple runs in the
statistical package, but gathering all the statistical information
coming from the different layers into a single repository could help
build a workable approach that handles multiple runs. It would also
make it easier to push the data afterwards into a more formal data
management system (database, XML file, simple trace file, etc.). In
that case, though, as you mention, the naming convention should be
carefully defined in the framework itself to enable future integration
with the data management system.
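To make the naming-convention point concrete, here is one possible scheme fixed inside the framework, with a matching parser that a data management tool could use to recover run metadata from a file name alone. The `<experiment>_<param>-<value>_run<index>.stats` pattern, the `RunId` struct and both functions are hypothetical; this is just one of many workable conventions.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical run identifier encoded in every stat file name.
struct RunId {
  std::string experiment;
  std::string param;
  int value;
  int run;
};

// Builds "<experiment>_<param>-<value>_run<index>.stats".
std::string MakeFileName(const RunId& id) {
  // '_' and '-' are field separators, so they must not appear inside fields.
  assert(id.experiment.find('_') == std::string::npos);
  assert(id.param.find_first_of("_-") == std::string::npos);
  assert(id.value >= 0 && id.run >= 0);
  std::ostringstream os;
  os << id.experiment << '_' << id.param << '-' << id.value << "_run" << id.run
     << ".stats";
  return os.str();
}

// Inverse of MakeFileName; returns std::nullopt for names that don't match.
std::optional<RunId> ParseFileName(const std::string& name) {
  static const std::regex pattern(R"(([^_]+)_([^_-]+)-(\d+)_run(\d+)\.stats)");
  std::smatch m;
  if (!std::regex_match(name, m, pattern)) return std::nullopt;
  return RunId{m[1].str(), m[2].str(), std::stoi(m[3].str()),
               std::stoi(m[4].str())};
}
```

`MakeFileName({"discovery", "nodes", 50, 3})` yields `discovery_nodes-50_run3.stats`, and `ParseFileName` turns that name back into the same fields.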
Regards,
Vincent