[Ns-developers] ns-3 Testing (was RE: Request for Merge -- Validation)
craigdo@ee.washington.edu
craigdo at ee.washington.edu
Fri May 8 11:58:12 PDT 2009
Okay, it will be good to have a discussion about how to proceed with ns-3
testing. I'll just forget about the little patch and try to organize our
thoughts on this subject. I'll try and write something concrete on the wiki
as this discussion proceeds.
> I'm reading another mailing list this morning (quagga-dev)
> where the lead maintainer is complaining that only the
> maintainers write tests and no one is submitting tests
> with their patches.
I think this is the elephant in the room that nobody is really addressing.
How you arrange macros to assert one way or the other, or how you
instantiate your test harnesses isn't going to address this. The problem is
that tests of complicated models are hard to do and will occupy tremendous
amounts of time to do right. Typical test frameworks just address a tiny
part of the problem IMO.
The problem that is usually solved is on the order of making the following
easier:
instantiate object
make method call with data x
make method call and check return value == y
For something like this, there are many relatively simple unit test
frameworks available. We have one, GoogleTest is another. Python did one.
I'm not very concerned about this level of test. For statistical-based
tests, we can add distribution tests to the framework. To get pretty
histograms, we can add gnuplot support, etc. I think this is what Mathieu
was talking about earlier.
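As a rough illustration of that simple level of test, here is a minimal
sketch in plain C++ with no test framework. The ByteCounter class and the
TestByteCounter function are hypothetical names made up for this example;
they are not part of ns-3 or of any framework mentioned above.

```cpp
#include <cstdint>

// Hypothetical object under test: accumulates the sizes handed to it.
class ByteCounter
{
public:
  void Add (uint32_t bytes) { m_total += bytes; }
  uint64_t GetTotal () const { return m_total; }

private:
  uint64_t m_total = 0;
};

// Follows the three steps listed above.
bool
TestByteCounter ()
{
  ByteCounter counter;                 // instantiate object
  counter.Add (512);                   // make method call with data x
  counter.Add (1024);
  return counter.GetTotal () == 1536;  // make method call and check the result == y
}
```

A framework mostly adds registration and reporting around a function like
TestByteCounter; the check itself stays this small.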
I submit we have a different, much more complicated story to deal with as
well -- more like how do you test a windowing system.
For a low-level test of a protocol object, what we need to do is more along
the lines of:
instantiate objects and scaffolding to isolate objects under test from
system
run simulator
create a packet
inject packet into bottom of protocol
look for callouts from protocol objects with response packets OR
look at packets output from trace hook OR
look at state transitions from trace hooks OR
watch callbacks and compare data.
I've prototyped something that does this with test vectors implemented as
scheduled simulator events, and with response vectors saved as a list of
different kinds of responses. That is pretty easy to do. What is very hard
to do is to create stimulus packets and provide response packets to compare
with.
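To make the shape of that idea concrete, here is a minimal self-contained
sketch. It is not the prototype described above. ToyScheduler is a stand-in
event queue (it is not the ns-3 Simulator API), ToyResponder is a made-up
protocol object that answers string "packets", and the Response struct is an
assumed way of recording what came out and when.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy event queue: runs events in time order (insertion order for ties).
class ToyScheduler
{
public:
  void Schedule (uint64_t time, std::function<void ()> event)
  {
    m_events.emplace (time, std::move (event));
  }
  void Run ()
  {
    while (!m_events.empty ())
      {
        auto it = m_events.begin ();
        m_now = it->first;
        std::function<void ()> event = std::move (it->second);
        m_events.erase (it);
        event ();
      }
  }
  uint64_t Now () const { return m_now; }

private:
  std::multimap<uint64_t, std::function<void ()>> m_events;
  uint64_t m_now = 0;
};

// One recorded response: when it was emitted and what it was.
struct Response
{
  uint64_t time;
  std::string what;
};

// Hypothetical protocol object under test, isolated behind a send callback.
class ToyResponder
{
public:
  using SendCallback = std::function<void (const std::string &)>;
  explicit ToyResponder (SendCallback send) : m_send (std::move (send)) {}
  void Receive (const std::string &packet)
  {
    if (packet == "SYN") { m_send ("SYN-ACK"); }
    else if (packet == "ACK") { m_established = true; }
    else if (packet == "DATA" && m_established) { m_send ("ACK"); }
  }

private:
  SendCallback m_send;
  bool m_established = false;
};

// Test vectors are scheduled events; response vectors are collected in a list.
std::vector<Response>
RunBasicConnectionVectors ()
{
  ToyScheduler sim;
  std::vector<Response> responses;
  ToyResponder protocol ([&] (const std::string &p) {
    responses.push_back ({sim.Now (), p});
  });
  sim.Schedule (10, [&] { protocol.Receive ("SYN"); });
  sim.Schedule (20, [&] { protocol.Receive ("ACK"); });
  sim.Schedule (30, [&] { protocol.Receive ("DATA"); });
  sim.Run ();
  return responses;
}
```

The scaffolding really is the easy part here. The strings stand in for the
stimulus and response packets, which are the hard part in a real test.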
The easy way to create stimulus packets is to let the other half of the
protocol create them for you. For example, with ns-3 TCP you might want to
let some Linux TCP create the packets for you; and then vector them over to
the ns-3 TCP implementation and see what happens. The easy way to provide
some reference bits to compare the ns-3 behavior to is to just examine the
responses closely, compare them to spec, and then save them. This should
sound familiar since it's exactly what the trace-based comparison test does.
I think the challenge for us is going to be to make an easy way to create
packets. It is insanely hard to hand craft them bit-by-bit, so I think
you're going to have to at least use headers. This means that you are going
to have to verify your header code bit-by-bit in a unit test, but can then
use it in a test like this.
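A bit-by-bit header check could look roughly like the sketch below. ToyHeader
and its five-byte big-endian layout are invented for illustration; this is
not the ns-3 TcpHeader class and not the real TCP wire format.

```cpp
#include <array>
#include <cstdint>

// Hypothetical header: two 16-bit ports and one flags byte.
struct ToyHeader
{
  uint16_t srcPort;
  uint16_t dstPort;
  uint8_t flags;
};

// Serialize in network (big-endian) byte order.
std::array<uint8_t, 5>
Serialize (const ToyHeader &h)
{
  return {{static_cast<uint8_t> (h.srcPort >> 8),
           static_cast<uint8_t> (h.srcPort & 0xff),
           static_cast<uint8_t> (h.dstPort >> 8),
           static_cast<uint8_t> (h.dstPort & 0xff),
           h.flags}};
}

// Compare the serialized bytes against a hand-checked reference.
bool
TestToyHeaderSerialization ()
{
  const ToyHeader h = {0x1234, 0x0050, 0x12};
  const std::array<uint8_t, 5> expected = {{0x12, 0x34, 0x00, 0x50, 0x12}};
  return Serialize (h) == expected;
}
```

Once a test like this passes, the header code can be trusted to build
stimulus packets instead of crafting every byte by hand.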
The problem is that the required headers are not generic -- they are closely
tied to the protocol under test. I think you'll end up coding something
like (this is just off the top of my head, I do not claim it is an extensive
or even correct TCP test; just an example of what would need to be done)
Ptr<Packet> p = TcpTest::SynPacket (...)
Ptr<Packet> p = TcpTest::SynAckPacket (...)
Ptr<Packet> p = TcpTest::AckPacket (...)
Ptr<Packet> p = TcpTest::RstPacket (...)
Ptr<Packet> p = TcpTest::DataPacket (...)
Where you can use some of these as stimulus packets and others as response
references. I haven't figured out a way to make this any easier in the
generic case than it sounds.
The test vectors are separate functions that are scheduled in the simulator
(you have to do this for TCP since part of what it does is to
Simulator::Schedule timeouts). So part of the script is going to look
something like:
ScheduleBasicConnectionTestVectors ()
{
  Simulator::Schedule (method to open a socket);
  Simulator::Schedule (method to inject syn packet into protocol);
  Simulator::Schedule (method to inject ack packet);
  Simulator::Schedule (method to inject data packet);
  Simulator::Schedule (method to inject data packet);
  Simulator::Schedule (method to inject rst packet);
  ...
}
You'll need to save the responses somehow (perhaps in a queue of responses)
and then write code to do the comparisons with expected data we generate
somehow.
VerifyBasicConnectionResponseVectors ()
{
  TcpTest::VerifySynAck (...);
  TcpTest::VerifyDataDelivered (...);
  TcpTest::VerifyAckWithWindow (...);
}
This is going to be insanely difficult and I think selling this to the
average developer will be pretty much hopeless. Maybe someone with a vested
interest in showing the world that his or her TCP model is *perfect* would
do something like this, but it's a *lot* of effort.
The immediately obvious ways to make this easier are:
Let another model do the packet generation.
Save responses in some format in a reference run and then use them as
golden responses to compare against.
Of course, you begin competing with trace-based tests which do just that and
are super-easy to do. Validation consists of comparing the trace files with
what the spec says should happen. Test coverage can be increased by writing
more scripts. Of course, as your black boxes get bigger and bigger, it is
harder and harder to check specific things deep down in the internals of
your model. That is the bottom line tradeoff.
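For comparison, the core of a golden-response check is tiny. The sketch below
assumes a trace is simply a list of text lines held in memory; the function
name FirstTraceMismatch is made up for this example and says nothing about how
the existing ns-3 trace comparison is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first line where the two traces differ,
// or -1 if they are identical. A missing or extra line counts as a
// difference at the index where the shorter trace ends.
long
FirstTraceMismatch (const std::vector<std::string> &reference,
                    const std::vector<std::string> &actual)
{
  const std::size_t common = std::min (reference.size (), actual.size ());
  for (std::size_t i = 0; i < common; ++i)
    {
      if (reference[i] != actual[i])
        {
          return static_cast<long> (i);
        }
    }
  if (reference.size () != actual.size ())
    {
      return static_cast<long> (common);
    }
  return -1;
}
```

Reporting the first differing line at least points at where the behavior
changed, though deciding whether the reference itself matches the spec is
still manual work.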
I don't want to make this email too long, so I'll wrap up for now.
There are reasons for wanting to isolate your model from the rest of the
world and stimulate it with patterns you generate and compare responses to
patterns you derive from specs and generate. Primarily, your tests don't
depend on other things in the system that can change out from under you and
you have super-fine-grain control on your test and response vectors. The
problem is that it is insanely hard to put in the support you need to do
this. I think real people will want to depend on other models to help with
the task.
Something like Linux TCP, driven via socket calls, can generate test
vectors if you assume that Linux TCP is known-good, but how do
you capture response vectors? It seems to me that you either end up with
some kind of binary file full of expected values, or you generate them on
the fly. If you don't care about extraneous stuff (ARP exchanges for
example) you just end up with what we have now. We can reduce trace file
size by saving only what is needed, but this still means saving some large
data files somewhere (cf. ns-3-dev-ref-traces). We could reduce that data
by providing ways to, for example, take a CRC of the packet or header and
save that (four bytes vs. 1024 bytes) -- but the test writer will have to do
the data reduction and somehow tell the test program to use it.
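The CRC idea can be sketched as follows. This uses the common reflected
CRC-32 (polynomial 0xEDB88320), which is one possible choice and not something
the framework prescribes. The names Crc32 and MatchesReference are
hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF).
uint32_t
Crc32 (const uint8_t *data, std::size_t size)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < size; ++i)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; ++bit)
        {
          // Shift right; XOR in the polynomial when the low bit was set.
          crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
  return crc ^ 0xFFFFFFFFu;
}

// Compare a packet against a stored four-byte reference instead of the full bytes.
bool
MatchesReference (const std::vector<uint8_t> &packet, uint32_t referenceCrc)
{
  return Crc32 (packet.data (), packet.size ()) == referenceCrc;
}
```

With this variant the ASCII string "123456789" reduces to the well-known
check value 0xCBF43926. A mismatch only says that something in the packet
changed, not which field or byte.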
Anyway, I can imagine putting in support for fine-grained tests like this,
but it certainly isn't going to make writing tests easy. The hard work in
writing tests is in, well, writing the tests; and I don't think there will be a
rush of people who want to take this on.
> I'm reading another mailing list this morning (quagga-dev)
> where the lead maintainer is complaining that only the
> maintainers write tests and no one is submitting tests
> with their patches.
The only way to get people to write tests is to make it insanely easy for
them and not insanely hard. Making it easy means letting them take some
result from their real-world work and just stick it in the system as a test;
and we should be grateful we got that.
I'm coming to the conclusion that the most important case will be trace
comparison tests since they are so easy; and that we could perhaps spend our
time in making finer grained trace files or CRC-like data reduction. This
will mean work for the contributor and I expect the only people who will
have the motivation and energy to do this will be maintainers, but they will
also have limited patience.
I'm afraid that reality will raise its ugly head here and that we may end up
with trace-based tests for models of any serious complexity no matter what
we think is right and good. We can improve the kinds of traces and only
store what is important or provide some relatively easy way to reduce trace
files to CRC if a tester writes the code to make that possible. CRC does
have the problem of making the root problem impossible to see easily.
I think when you put this all together it means that we have non-trace-based
simple tests probably limited to simple models and unit tests. We may just
end up with reference trace-based tests for protocol unit tests for
complicated models (or system tests). The questions of whether to use different
frameworks, where the tests go and how we make response vectors remain.
-- Craig