[Ns-developers] A question about downcast netdevice pointer.

craigdo@ee.washington.edu craigdo at ee.washington.edu
Sat Sep 27 12:21:39 PDT 2008


> Craig, consider this:
> 
> class Csma : NetDevice;
> class Wifi : NetDevice;
> 
> Now you create a mutant netdevice object instance with Csma 
> aggregated:
> 
> Ptr<Wifi> wifi = CreateObject<Wifi> ();  // 0x123 address
> Ptr<Csma> csma = CreateObject<Csma> ();  // 0x456 address
> wifi->AggregateObject (csma);
> 
> Now, given the 'wifi' pointer, what should 
> wifi->GetObject<NetDevice> ()
> return?  0x456 or 0x123?  They are both NetDevices...  

This is a great example to sharpen what I'm talking about.  I think this
should be a programming error to do this for exactly the reason you give
above.  But allow me to use this same idea to construct a slightly more real
example to examine in some detail.  Hopefully this will clarify what I am
talking about.

Let's say, for the sake of argument, that I don't like the ARP
implementation in ns-3 for some reason.  I decide that it needs to be
extended or changed in some way.  A simple way to do this would be to make
ArpL3Protocol::Lookup virtual, inherit from ArpL3Protocol and create a
CraigArpL3Protocol that did the lookup differently.  Let's say I want to add
a knob to control the behavior so I define a TypeId ns3::CraigArpL3Protocol
and add some attributes or methods to the class.  I go into the
internet-stack.cc and add a function AddCraigArpStack to be symmetrical and
then go to the InternetStackHelper and add an if .. else to call
AddCraigArpStack in some cases.  

What I expect to have happened, is that I have substituted my ARP for the
ns-3 ARP.  What I have really done is to have introduced a *very* hard to
understand bug, and ns-3 will have happily accepted what I just did.
Really, I don't think there are more than two or three ns-3 developers that
can predict how this will behave and why; and I know that I have to think
very deeply about the situation to try and predict what will happen.  This
is bad (TM).  I don't think it is very arguable that it would be fairly
simple to catch this problem if you treat the objects involved as having
presented different discoverable Object interfaces to the system when, in
fact, they really did.  Let me explain ...

Here's what happened.  ArpL3Protocol inherits from Object and therefore can
be treated as a discoverable interface.  By that I mean that
node->GetObject<ArpL3Protocol>() will work.  CraigArpL3Protocol inherits
from ArpL3Protocol and has a TypeId so node->GetObject<CraigArpL3Protocol>()
will work.  What I did by adding the if .. else in the InternetStackHelper
was to create the same situation you illustrated above.  The system accepted
the situation because the "last" TypeIds of the two objects (ArpL3Protocol
and CraigArpL3Prorocol) were different.  The system makes this a problem
because it walks the inheritance tree in GetObject but doesn't test for
duplicates in the inheritance tree in AggregateObject -- because the two
cases are looked at as (very) subtly different.

I'll end up with an aggregation that looks like the illustration below
(ignoring the Object parts) and is actually implemented as a circular list.
This is perfectly legal in our existing system since we really do have that
dynamnic_cast versus Aggregate fuzziness we are focusing on.  

  +-> Ptr<Node> -> Ptr<ArpL3Protocol> -> Ptr<ArpL3Protocol +
CraigArpL3Protocol> -+
  |
|
 
+---------------------------------------------------------------------------
----+

My question is:  "is there an easily understandable behavior for this
construction that you can explain to a normal person in a few sentences."  I
submit the answer is no, and you really have to understand how the data
structures that implement the aggregation work, how they interact with the
TypeId data structures, how the actual Objects were implemented, from which
object a GetObject search was started, and the search algorithm to
completely understand how this aggregation will behave.  This is even beyond
the issue of "once you have a pointer to an object, should you use a cast to
move up and down its inheritance hierarchy."  This tells me we have a real
conceptual problem here.  You need to be able to document the behavior of
the API without saying read the implementation ;-)

Let me point out just a few of the (IMO) bizarre aspects of this situation
(but feel free to read object.cc on your own and write some code):

If you arranged the Aggregation order such that the original ArpL3Protocol
was aggregated to the node first, a GetObject<ArpL3Protocol> will return a
pointer to the original ARP Object.  A GetObject<CraigArpL3Protocol> will
return a pointer to an entirely different Object, the second
CraigArpL3Protocol.  A GetObject<ArpL3Protocol> on the returned
CraigArpL3Protocol will return a pointer to the ArpL3Protocol associated
with the Craig object, not the original ArpL3Protocol object because of the
search order change.  The end result is that it is quite easy to have
different pieces of the system believing they are working on the same ARP
object when, in reality, there are two completely different objects
underneath.  Now, if you arranged the aggregation order such that the
CraigArpL3Protocol object was aggregated first, you may get an entirely
different behavior.  Do you begin to see the litany of entirely different
behaviors that could be exhibited based on different implementations?

This could all be prevented by looking at the inheritance hierarchy of the
aggregated objects as the collection of Objects that they are really
presenting themselves to the system as being; and not trying to look at the
situation as inheritance from one angle and object aggregation from another.
I think this duality is the cause of the problems.

If the situation where there are multiple ArpL3Protocol objects accessible
in the aggregation was detected and flagged, the situation above wouldn't
have happened.  Seeing and preventing this is all much more obvious if you
start thinking about the ArpL3Protocol + CraigsArpL3Protocol situation as
two Objects; and as a corollary, it makes sense to look at the NetDevice +
CsmaNetDevice as two Objects accessible via GetObject and *not* as a
separate orthogonal situation where dynamic_cast makes more sense.

It is clear to me, at least, that the whole situation is much more easily
understood and explained if you explicitly *don't* get dynamic_cast involved
in the presence of aggregatable Objects.  So I think we should make the
situation Gustavo outlines above explicitly a programming error and
encourage people to look at collections of Objects as aggregations even if
the implementation is via inheritance.  You don't actually lose anything;
it's mostly a change of conceptual model.

> I know  this example doesn't make much sense, 
> but I am still convinced GetObject () is wrong 
> for type casting.

I think that this is a slippery slope that can get you into trouble if what
you are type casting are ns-3 Objects.  I think if Objects are involved you
should use GetObject.  If Objects are not involved you should use
dynamic_cast.  I think your suggestion for a DynamicCast template would be
useful and correct for Ptr-to-things that are not Objects.

> If I understand what you're saying, you think it would be simpler if
> developers pretended NetDevice and CsmaNetDevice were two 
> separate objects
> which are aggregated.  I emphatically disagree.  Developers 
> *must know* that
> NetDevice is a simple base class for CsmaNetDevice, not just another
> object.  If the developer was not supposed to know it then he would be
> writing lots of GetObject () calls before calling any 
> NetDevice method;
> clearly silly and a runtime penalty.  

No.  The NetDevice Object "interface" is a generic object and has methods
defined by the class declaration:

  N1
  N2
  etc

The CsmaNetDevice Object has *additional* methods for its "interface":

  N1
  N2
  C1
  C2

If you have a NetDevice wand need to call a "C" method, you do
GetObject<CsmaNetDevice> you get all of the methods you need.  Why would you
GetObject back and forth any more than you would dynamic_cast back and forth
to call NetDevice->N1 when you have a CsmaNetDevice->N1?

>And it would be more 
> complicated than
> it needs to: inheritance is a concept all C++ programmers 
> already know and
> are used to, there's no point in taking it away from them.

I'm not talking about taking inheritance away from anybody.  I'm saying that
adding Object to the mix actually does add something to the mix that needs
to be considered.  Using these Objects as if there was only inheritance in
play is, IMO, dangerous and you can fairly easily shoot yourself in the foot
in a very hard to understand way.  I think the only good way to create an
API model that is understandable by mere mortals (without having to take in
a lot of underlying implementation detail) is to *not* treat inheritance of
Objects differently than a generic aggregation of objects.  

Outside of adding the one test I spoke about in AggregateObject, and
preferring GetObject to dynamic cast, it is really only a conceptual
(mindset) change I'm talking about.  _There_are_no_API_changes_.  I don't
think anybody loses *anything* and it makes understanding the *whole* system
easier once you answer for yourself the question that started this whole
exchange: how can there appear to be two Objects aggregated when only one
C++ object is involved ...  

-- Craig

[ ... ]




More information about the Ns-developers mailing list