[Ns-developers] ns-3 Testing
craigdo@ee.washington.edu
Fri May 8 15:43:50 PDT 2009
Having taken the ns-3 test philosophy discussion up a level, here's where I am headed on the ns-3 testing front.
Rather than characterize things as unit tests, integration tests, system tests, etc., I have focused on what the tests actually need
to do. At the highest level, a test will execute some code and then check that the code does what it is supposed to. Assertions
are easy. It is generating the test vectors and checking the results that can cause grief.
I have come up with four basic kinds of tests based on the kind of data that is generated and how it is reduced or not reduced.
Here are examples of each of those broad kinds of tests.
1) Simple tests.
This is the kind of test illustrated in the google test manual and the kind of thing we have in our current unit tests. Basically
you instantiate an object, call a method on it and check the result. It can all be reduced to a simple assertion, really:
ASSERT (object.Method (12345) == blah);
This is no big deal, and there are frameworks galore that make this easier to one extent or another and let you orchestrate large
numbers of tests and print pretty results.
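
A minimal sketch of this kind of test follows. Saturator, TestResults and RunSimpleTests are hypothetical names invented for illustration; they are not part of ns-3, google test or test.h. The TestResults struct only stands in for the bookkeeping a real framework would provide.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical object under test (not an ns-3 class): clamps a value to 16 bits.
class Saturator
{
public:
  uint32_t Clamp (uint32_t value) const
  {
    return value > 65535 ? 65535 : value;
  }
};

// Minimal bookkeeping of the sort a test framework provides (illustrative only).
struct TestResults
{
  uint32_t passed = 0;
  std::vector<std::string> failures;

  // Count a passing check, or remember the description of a failing one.
  void Check (bool condition, const std::string &description)
  {
    if (condition)
      {
        ++passed;
      }
    else
      {
        failures.push_back (description);
      }
  }

  // Print a one-line summary followed by one line per failure.
  void Print (std::ostream &os) const
  {
    os << passed << " passed, " << failures.size () << " failed" << std::endl;
    for (const std::string &f : failures)
      {
        os << "  FAIL: " << f << std::endl;
      }
  }
};

// Instantiate the object, call a method on it and check the result.
TestResults RunSimpleTests ()
{
  Saturator object;
  TestResults results;
  results.Check (object.Clamp (12345) == 12345, "value below the limit is unchanged");
  results.Check (object.Clamp (70000) == 65535, "value above the limit saturates");
  return results;
}
```

Calling RunSimpleTests ().Print (std::cout) from a test driver would report the two checks.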
2) RNG tests.
The RNG tests are an example of a test that has a very simple stimulus vector and very simple output. You call a method and get
back a number. However there's an additional step needed. In order to make sure that the object does what it needs to do, you need
to reduce the output data before you make a very simple comparison. This is something that could be quite easily handled in the
unit tests by calling a test routine and asserting that the output of that routine is "correct."
Basically, you provide a stimulus -- call Random() a zillion times. You record the results by adding them to a histogram. You
reduce the data by doing a chi-squared test against a known distribution and then come up with a number. You then pass-fail based
on the number.
This fits in with the kind of tests done using our unit test framework, with google test and many others. It seems that adding
classes to accumulate and reduce data to our unit test framework would be in order to more easily enable this kind of test.
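
The stimulus, record, reduce and pass-fail steps above can be sketched as follows. This is only an illustration under stated assumptions: std::mt19937 stands in for the generator under test, the Histogram class and the function names are hypothetical, and the reference distribution is assumed to be uniform over 16 bins. The critical value 30.578 is the tabulated chi-squared value for 15 degrees of freedom at the 1% significance level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Record: accumulate samples into a fixed number of bins (hypothetical helper).
class Histogram
{
public:
  explicit Histogram (std::size_t nBins) : m_counts (nBins, 0), m_total (0)
  {
    assert (nBins > 0);
  }
  void Add (std::size_t bin)
  {
    ++m_counts.at (bin);
    ++m_total;
  }
  const std::vector<uint64_t> &Counts () const { return m_counts; }
  uint64_t Total () const { return m_total; }

private:
  std::vector<uint64_t> m_counts;
  uint64_t m_total;
};

// Reduce: Pearson chi-squared statistic against a uniform distribution.
double ChiSquaredUniform (const Histogram &h)
{
  if (h.Total () == 0)
    {
      return 0.0; // nothing recorded, nothing to compare
    }
  const double expected = static_cast<double> (h.Total ()) / h.Counts ().size ();
  double chi2 = 0.0;
  for (uint64_t observed : h.Counts ())
    {
      const double diff = static_cast<double> (observed) - expected;
      chi2 += diff * diff / expected;
    }
  return chi2;
}

// Stimulus: call the generator many times; the top 4 bits select one of 16 bins.
Histogram SampleGenerator (uint32_t seed, std::size_t nSamples)
{
  std::mt19937 generator (seed); // stand-in for the RNG under test
  Histogram h (16);
  for (std::size_t i = 0; i < nSamples; ++i)
    {
      h.Add (static_cast<std::size_t> (generator () >> 28));
    }
  return h;
}

// Pass-fail: compare the reduced number with a tabulated critical value.
bool RngTestPasses (uint32_t seed, std::size_t nSamples)
{
  const double critical = 30.578; // chi-squared, 15 degrees of freedom, 1% level
  return ChiSquaredUniform (SampleGenerator (seed, nSamples)) < critical;
}
```

Because the test is statistical, a correct generator will still fail about one run in a hundred at this significance level, so a fixed seed (or a looser threshold) is probably what you want in a nightly build.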
3) TCP State Transition Tests
This is a much more involved kind of test. The goal would be to isolate TCP as much as possible from the rest of the system, drive
it with test vectors generated by a known good TCP and watch traces of internal state variables. For example, one could take ns-3
TCP and hook it up to a Linux TCP via NSC. You could wire the bottom of ns-3 TCP to the bottom of NSC TCP and drive both via
sockets. The goal would be to hook trace events from the ns-3 tcp state machine and then run the test.
Basically you provide a stimulus while running a simulation, in the form of socket calls to the TCP modules, and record the traced
state changes in some kind of vector of (timestamp, state change) pairs. At the end of the simulation, you run through the
accumulated state changes and manually compare them to what you expect -- for example, congestion window 1, 1, 2, 4, 8, 16, etc.
This kind of test uses other pieces of the system (NSC TCP) to generate complex test vectors and the results are simple enough to be
dealt with by hand.
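
A rough sketch of the record-and-compare part is shown below. Everything in it is hypothetical: RunToySlowStart is a toy stand-in that doubles a window once per round trip and is neither ns-3 TCP nor NSC, the std::function callback only imitates hooking a trace source, and the expected sequence is the one the toy model produces rather than real TCP behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// One traced event: when it happened and the new congestion window.
struct StateChange
{
  double time;
  uint32_t cwnd;
};

// Accumulates traced state changes while the "simulation" runs.
class StateRecorder
{
public:
  void Record (double time, uint32_t cwnd)
  {
    m_changes.push_back ({time, cwnd});
  }
  const std::vector<StateChange> &Changes () const { return m_changes; }

  // Drop the timestamps so the values can be compared with a hand-written list.
  std::vector<uint32_t> Values () const
  {
    std::vector<uint32_t> values;
    for (const StateChange &c : m_changes)
      {
        values.push_back (c.cwnd);
      }
    return values;
  }

private:
  std::vector<StateChange> m_changes;
};

// Toy stand-in for the TCP under test: reports its window, then doubles it,
// once per round trip.
void RunToySlowStart (uint32_t rounds, double rtt,
                      const std::function<void (double, uint32_t)> &trace)
{
  uint32_t cwnd = 1;
  for (uint32_t i = 0; i < rounds; ++i)
    {
      trace (i * rtt, cwnd);
      cwnd *= 2;
    }
}

// Run the stimulus, then compare the accumulated changes with expectations.
bool CheckSlowStart ()
{
  StateRecorder recorder;
  RunToySlowStart (5, 0.1,
                   [&recorder] (double t, uint32_t w) { recorder.Record (t, w); });
  const std::vector<uint32_t> expected = {1, 2, 4, 8, 16};
  return recorder.Values () == expected;
}
```

The point of keeping the timestamps in the recorder is that a stricter test could also check when each change happened, not only the sequence of values.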
Here you basically run a simulation script and use tools to accumulate results and then assert that the results are what you
expected. With another level of indirection this could fit in with the kind of tests done using our unit test framework, but could
also span modules.
In the case of NSC TCP and ns-3 TCP, both live in the same module so it makes sense to have the tests live in, for example, the
internet-stack module. Even though the goal is really a unit test of ns-3 TCP it is really also an integration test of NSC TCP and
ns-3 TCP and relies on NSC TCP being "known good."
This kind of test could just as easily involve separate modules, in which case the tests would need to live outside
the internet-stack directory. This is what the "valver" directory (that Tom and Mathieu wanted to rename "tests") was for.
4) Detailed TCP operation Tests.
This is even more involved than the last kind of test. Here you might want to use NSC to generate the test vectors but might
want to look at exactly what the TCP under test actually generates -- headers. We want this to be TCP-only, so we will be looking
at TCP headers. It is fairly easy to accumulate a vector of TCP headers, but the hard problem is, what do you compare this vector
of headers against?
This is where I think trace files are the only practical answer. However, the trace files are *not* pcap traces with all kinds of
additional noise in them. They would be trace files of TCP headers perhaps written by a TCPHeaderTraceWriter. In this case, you
would generate reference traces that contained the expected responses of the TCP protocol. Hand crafting reference traces on the
fly is just too painful to contemplate. Providing tools to capture responses seems the only reasonable way to do it.
An extension of this would be to use the NSC TCP to generate a set of "stimulus traces." For example, if you saved the output of
the NSC TCP (a set of packets consisting only of the TCP headers and data) into a trace file and "played" that trace file back to
generate the stimulus vectors it would simplify the stimulus side and you could remove the NSC TCP dependency thereby sharpening the
test to check ns-3 TCP only. This would be the ultimate test environment IMO.
So to work in this environment, it would be good to provide ways to capture and play back trace files (again, *not* pcap trace files
but protocol trace files specific to the protocol under test); and to capture and check results against specified files.
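
To make the capture, play back and check steps concrete, here is a minimal sketch. The HeaderRecord struct, the one-record-per-line text format and all function names are assumptions made for illustration; no such trace format or TCPHeaderTraceWriter exists yet, and a real record would carry more header fields and the data. String streams stand in for trace files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical protocol-specific trace record: a few TCP header fields only.
struct HeaderRecord
{
  uint16_t srcPort;
  uint16_t dstPort;
  uint32_t seq;
  uint32_t ack;
  uint16_t flags; // wider than the real field so it streams as a number
};

bool operator== (const HeaderRecord &a, const HeaderRecord &b)
{
  return a.srcPort == b.srcPort && a.dstPort == b.dstPort && a.seq == b.seq
         && a.ack == b.ack && a.flags == b.flags;
}

// Capture: write one record per line as whitespace-separated numbers.
void WriteTrace (std::ostream &os, const std::vector<HeaderRecord> &records)
{
  for (const HeaderRecord &r : records)
    {
      os << r.srcPort << ' ' << r.dstPort << ' ' << r.seq << ' ' << r.ack << ' '
         << r.flags << '\n';
    }
}

// Play back: read records until the stream ends or a line fails to parse.
std::vector<HeaderRecord> ReadTrace (std::istream &is)
{
  std::vector<HeaderRecord> records;
  HeaderRecord r;
  while (is >> r.srcPort >> r.dstPort >> r.seq >> r.ack >> r.flags)
    {
      records.push_back (r);
    }
  return records;
}

// Check: index of the first record that differs, or -1 if the traces match.
// A length mismatch is reported at the index where the shorter trace ends.
long FirstMismatch (const std::vector<HeaderRecord> &reference,
                    const std::vector<HeaderRecord> &actual)
{
  const std::size_t n = std::min (reference.size (), actual.size ());
  for (std::size_t i = 0; i < n; ++i)
    {
      if (!(reference[i] == actual[i]))
        {
          return static_cast<long> (i);
        }
    }
  return reference.size () == actual.size () ? -1 : static_cast<long> (n);
}
```

A test would load a stored reference trace with ReadTrace, collect the headers emitted by the TCP under test into a vector, and fail if FirstMismatch returns anything other than -1. The same ReadTrace call could feed a stored stimulus trace to the TCP under test in place of a live NSC TCP.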
---------- Summary --------
I think these examples can help lead to a high-level architecture (ignoring details like google test or ns-3 unit test for the
moment):
We have a need for unit-test-like things that live in the files along with the things they test. This is basically what we have
now, but we should add data accumulation and reduction features to make more complicated tests easier.
We have a need for dedicated tests that live in the same directory as the modules they test. This would compose different objects
defined in the modules together and use the same features from the unit test framework that allow for data accumulation, reduction,
comparison and assertion to do the job.
We have a need for dedicated tests that live outside the directories of the modules they test. This would be the same thing as
described above; we just need to provide a home for them.
We need to provide ways to generate, store, compare and play back reference traces. These mechanisms should be available in tests
irrespective of where they live in the system (i.e., you should be able to use reference traces in complicated unit tests testing
one file in a module, separate files and also inter-module tests).
We should provide for a way to do simple smoke testing. For example, does example.cc compile and run, even though we might not
actually check any results?
The act of doing verification and validation should result in tests that can be automatically run in our nightly builds. Where they
are located isn't really important.
I think there needs to be a test framework that supports the kinds of tests described above. I think this test framework needs to
be usable in a unit test, but also as a stand-alone script.
I think there needs to be a way to communicate to the regression system that some set of unit tests needs to be run, and that some
set of scripts should be run.
I think that the python file approach is a good one. Perhaps all of the unit tests can be run as they are and there is a python
file created for each stand-alone test program that holds meta-information such as whether or not trace files are needed and where
to find them.
If we can agree on a high-level approach then I think we can talk about whether or not to use google test or test.h as a basis, and
exactly how to go about writing each individual test.
There was an additional need, articulated by Tom, for a way to get pretty graphs out of the system for use in presentations. I
think we could easily add support for gnuplotting the contents of histograms, or for gnuplotting the distributions in a distribution
test and generate plots like I did for the RNG tests.
-- Craig