Re: Some transport comments on the CAPWAP protocol
From: Pat Calhoun (pacalhou) (pcalhouncisco.com)
Date: Mon, 23 Jul 2007 07:14:24 -0700 (PDT)
My apologies for the latency in my response, but here are my 2 cents on
the issues raised.

> One comment I had was in regard to the lock-step protocol you have
> defined for transfer of firmware. First of all, I wish that back when
> you started this work you had made it simpler for yourselves by using
> an off-the-shelf transport protocol suitable for bulk transfer. But as
> these decisions were made long ago, let's just ensure that your
> protocol has considered all the necessary issues. I do hope that it
> will deliver enough performance for you to be a workable solution.
> Therefore, I do have a question: what range of RTT do you expect
> between the AC and the WTP? I ask because the RTT has a major
> performance impact.
My current experience is that the majority of WTPs will be deployed in
the enterprise. These enterprise networks are typically over-provisioned
and RTT is very low (under 10 ms). That said, there are some deployments
of WTPs in branch offices, connected to the campus via either a private
or a public network, where RTT can range from 10 to 100 ms (the high end
is rare, as it impacts other enterprise applications as well).
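To see why RTT dominates a lock-step transfer, a back-of-the-envelope
estimate helps. The sketch below assumes exactly one block per round
trip and ignores processing time, serialization delay and
retransmissions; the 8 MiB image and 1024-byte block size are
hypothetical, not figures from the CAPWAP specification.

```python
def lockstep_transfer_time(image_bytes: int, block_bytes: int, rtt_s: float) -> float:
    """Stop-and-wait estimate: one block per round trip, no losses (an assumption)."""
    blocks = -(-image_bytes // block_bytes)  # ceiling division
    return blocks * rtt_s


# Hypothetical 8 MiB image sent in 1024-byte blocks.
for rtt_ms in (1, 10, 100):
    t = lockstep_transfer_time(8 * 1024 * 1024, 1024, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> {t:7.1f} s")
```

Under those assumptions the loop reports roughly 8 s, 82 s and 819 s,
so transfer time grows linearly with RTT across the range described
above.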

> My next question was whether the protocol addresses all the issues
> raised in the security considerations for TFTP, which after all is a
> similar protocol with regard to the firmware transfer functions. So can
> someone please enlighten me by answering with regard to these points
> raised in section 5 of RFC 3617:
> 
>     Use of TFTP has been historically limited to those devices where a
>     more full protocol stack is impractical due to either memory or CPU
>     constraints.  While this still may be the case with a toaster, it is
>     unlikely to be the case for even the simplest piece of network
>     support hardware, such as simple routers or switches.  There are a
>     myriad of reasons to use some protocol other than TFTP, only a few of
>     which are listed below.
Well, the good thing is that most governmental bodies restrict the
transmit power on most APs, limiting their use as toasters.

>     TFTP has no mechanism for access control within the protocol, and
>     there is no protection from a man in the middle attack.
>     Implementations are left to their own devices in this area.  Because
>     TFTP has no way to determine file sizes in advance, implementations
>     should be prepared to properly check the bounds of transfers so that
>     neither memory nor disk limitations are exceeded.

There are a couple of issues listed here, so I will tackle them one at a
time.
1. Man in the Middle. This is less an issue with the file transfer
itself than a question of whether the CAPWAP protocol is susceptible to
man-in-the-middle attacks. Please note that CAPWAP uses the DTLS
protocol, and therefore benefits from all of the security properties
that protocol provides. Further, CAPWAP specifically includes additional
authorization of peers: it takes the credentials exchanged during the
key exchange and validates that the peer is an authorized CAPWAP peer.
2. File Size. During the download phase, the Image Information message
element, which includes the file size, is carried in the Image Data
Response command. In the CAPWAP protocol, the image is transferred from
the AC to the WTP, and the AC provides this information to the WTP.
Since the WTP initiates the image transfer, if it determines that it
does not have sufficient storage space, it simply skips the transfer.
I would point you to section 9 of the current spec, which includes
diagrams depicting the message exchanges and the information included.
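On the WTP side, the size check and the bounds checking RFC 3617 asks
for might look like the following minimal sketch. `ImageReceiver`,
`should_download` and `on_block` are hypothetical names, and the amount
of free storage is assumed to be known to the WTP.

```python
class ImageReceiver:
    """Hypothetical WTP-side receiver bounded by the advertised image size."""

    def __init__(self, advertised_size: int, free_bytes: int) -> None:
        self.advertised_size = advertised_size  # from the Image Information element
        self.free_bytes = free_bytes            # assumed known to the WTP
        self.buffer = bytearray()

    def should_download(self) -> bool:
        # Skip the transfer up front if the image cannot fit.
        return self.advertised_size <= self.free_bytes

    def on_block(self, block: bytes) -> None:
        # Never accept more data than was advertised, so a faulty or
        # hostile sender cannot exhaust memory or disk.
        if len(self.buffer) + len(block) > self.advertised_size:
            raise ValueError("block exceeds advertised image size")
        self.buffer.extend(block)

    def complete(self) -> bool:
        return len(self.buffer) == self.advertised_size


rx = ImageReceiver(advertised_size=6, free_bytes=1024)
if rx.should_download():
    rx.on_block(b"abc")
    rx.on_block(b"def")
print(rx.complete())  # True
```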

> 
>     TFTP is not well suited to large files for the following reasons.
>     TFTP has no inherent integrity check.  There is no way to determine
>     what one side sent is what the other received.  There is no way to
>     restart TFTP transfers from anywhere other than the beginning.  TFTP
>     is a lock step protocol.  Only one packet may be in flight at any one
>     time.  There is no slow start or smart backoff mechanism in TFTP, but
>     very simple timeouts.
Correct. First, I would note that we are not using TFTP, so many of
these issues do not apply to the CAPWAP protocol. Let me address them
one at a time.
1. Lock Step. The CAPWAP Control Protocol is in fact a lock-step
protocol, and therefore only one message is in flight at any given time.
The AC sends a portion of the image, which the WTP acknowledges, and
that acknowledgement triggers the transfer of the next block. There is
no need to go back to the beginning, because each block is
integrity-checked (see the next sub-point): if a block were corrupted,
the WTP would reply with a negative response indicating the packet was
invalid (or simply not respond, which causes a retry). Given that this
is used to download firmware to the WTP, and that this can be done while
the WTP is providing service, I do not see this as much of an issue.
2. No Checksum. As late as the -05 draft, the CAPWAP Image Data message
element included a checksum. However, there was a request from the
working group to eliminate this, which had rough consensus, for the
following reason:
    > 6) The Image Data Message element has several problems. These
    >     include:
    >       1) it has a checksum that is used to determine if the
    >          block of image data has been modified. However, the
    >          DTLS session provides information to determine if
    >          a message has been modified. Also, the algorithm
    >          for the checksum is not specified.
    >          (This can be resolved by removing from each message
    >          element, and providing for a digest, such as MD5,
    >          for the entire file.)
Subsequently, the checksum field was removed, and draft -06 no longer
included this field.
3. Exponential backoff. Support for exponential backoff was added in
draft -06, tracked as issue 251. The resulting text is:
    4.4.3.  Retransmissions
    [...]
       After transmitting a Request message, the RetransmitInterval (see
       Section 4.6) timer and MaxRetransmit (see Section 4.7) variable are
       used to determine if the original Request message needs to be
       retransmitted.  The RetransmitInterval timer is used the first time
       the Request is retransmitted.  The timer is then doubled every
       subsequent time the same Request message is retransmitted, up to
       MaxRetransmit but no more than half the EchoInterval timer (see
       Section 4.6.5).  Response messages are not subject to these timers.
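The three points above can be tied together in a small simulation:
blocks go out strictly one at a time, a lost or rejected block is
retransmitted after a timer that follows one reading of the quoted
4.4.3 text, and the reassembled image can be checked with a whole-file
MD5 digest of the kind the issue proposes. This is a sketch, not the
CAPWAP state machine: `transmit` is a hypothetical hook standing in for
the network, timers are simulated by adding up wait times rather than
sleeping, the parameter defaults are made up, and how a digest would be
carried between AC and WTP is not stated here.

```python
import hashlib
from typing import Callable, List, Sequence


def retransmit_schedule(retransmit_interval: float, max_retransmit: int,
                        echo_interval: float) -> List[float]:
    """Timer armed before each retransmission of one Request.

    Interpretation (an assumption): the first retransmission waits
    RetransmitInterval, each later one doubles it, at most MaxRetransmit
    retransmissions occur, and no wait exceeds EchoInterval / 2.
    """
    cap = echo_interval / 2
    waits = []
    timer = retransmit_interval
    for _ in range(max_retransmit):
        waits.append(min(timer, cap))
        timer *= 2
    return waits


def send_lockstep(blocks: Sequence[bytes],
                  transmit: Callable[[int, bytes], bool],
                  retransmit_interval: float = 3.0,
                  max_retransmit: int = 5,
                  echo_interval: float = 30.0) -> float:
    """Send one block at a time; return total simulated seconds spent waiting.

    `transmit(seq, block)` is hypothetical: True means the block was
    acknowledged, False means a negative response or no response.
    """
    waited = 0.0
    waits = retransmit_schedule(retransmit_interval, max_retransmit, echo_interval)
    for seq, block in enumerate(blocks):
        if transmit(seq, block):
            continue  # acknowledged: only now move on to the next block
        for wait in waits:
            waited += wait  # timer expires, then the same block is resent
            if transmit(seq, block):
                break
        else:
            raise TimeoutError(f"block {seq} not acknowledged")
    return waited


def image_digest(blocks: Sequence[bytes]) -> str:
    """MD5 over the whole image, folded in block by block as it arrives."""
    h = hashlib.md5()
    for block in blocks:
        h.update(block)
    return h.hexdigest()


print(retransmit_schedule(3.0, 5, 30.0))  # [3.0, 6.0, 12.0, 15.0, 15.0]

losses = {1: 2}  # hypothetical: block 1 is lost twice before getting through


def flaky(seq: int, block: bytes) -> bool:
    if losses.get(seq, 0) > 0:
        losses[seq] -= 1
        return False
    return True


image = [b"blk0", b"blk1", b"blk2"]
print(send_lockstep(image, flaky))  # 9.0 (waited 3.0 + 6.0)
print(image_digest(image) == hashlib.md5(b"".join(image)).hexdigest())  # True
```

The digest only catches accidental corruption of the reassembled image;
protection against modification in transit still comes from DTLS, as
the quoted issue points out.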

>     TFTP is not well suited to file transfers across administrative
>     domains.  For one thing, TFTP utilizes UDP, and many NATs will not
>     either support or allow TFTP transfers.  More likely firewalls will
>     prohibit transfers.
Honestly, I think NATs these days are perfectly capable of transporting
UDP, albeit with some restrictions. Most notably, they use idle timers
to expire UDP bindings. It is for this reason that the CAPWAP protocol
includes a keepalive mechanism, which ensures that the NAT keeps the
UDP binding alive. Note that this pertains to the CAPWAP protocol as a
whole, and not simply to the image download mechanism.

>     There are no caching semantics within TFTP.  There is no safe way to
>     cache information using the TFTP protocol.
I do not understand what this means, sorry.

> The next comment was to ensure that your protocol has considered all
> the issues raised by the transport area's UDP guidelines document that
> we are developing. Please consider the issues mentioned; we would
> greatly appreciate any feedback on the document itself.
>
> http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-udp-guidelines-02.txt

OK, well, there are a myriad of issues here, so I will address each one
in turn.
3.1 Congestion Control
As described above, the CAPWAP control plane does include exponential
backoff, which provides congestion control. However, the CAPWAP data
plane does not include congestion control, as it is simply a tunnel
between the WTP and the AC, similar to GRE. I believe that attempting to
add congestion control to the data plane would introduce another set of
issues, and the overwhelming majority of CAPWAP deployments simply do
not call for it.

3.2 Message Size Guidelines
Well... This is an interesting guideline, and I question why it exists.
As background information, the CAPWAP protocol was based on LWAPP, and
it shares LWAPP's PMTU of 1468 bytes, which is basically the Ethernet
MTU minus the various headers. I would note that today there are well
over 4 million WTPs running the LWAPP protocol, connected everywhere
from the campus to the branch to the home office. The only issue we have
ever seen was with PPPoE, which required us to reduce our MTU size. We
have customers using all kinds of private and public networks, and this
has never been a problem for any of our deployments.

So while I am sure the authors of the draft had a good reason for
recommending the message size, I wonder why it never caused any issues
in any of our LWAPP deployments.
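The PPPoE experience can be reproduced with simple header arithmetic.
The sketch below treats 1468 as the UDP payload size, which is an
assumption (the message does not say which layer that figure counts),
and uses standard sizes: a 20-byte IPv4 header without options, an
8-byte UDP header, and 8 bytes of PPPoE/PPP overhead, which yields the
familiar 1492-byte PPPoE MTU.

```python
IPV4_HEADER = 20    # bytes, no IP options
UDP_HEADER = 8
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol field
PPPOE_MTU = ETHERNET_MTU - PPPOE_OVERHEAD  # 1492


def fits_without_fragmentation(udp_payload: int, link_mtu: int) -> bool:
    """True if an IPv4/UDP datagram with this payload fits in one link frame."""
    return IPV4_HEADER + UDP_HEADER + udp_payload <= link_mtu


print(fits_without_fragmentation(1468, ETHERNET_MTU))  # True  (1496 <= 1500)
print(fits_without_fragmentation(1468, PPPOE_MTU))     # False (1496 > 1492)
```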

3.3 Reliability Guidelines
The CAPWAP control protocol has inherent reliability through a
request/response scheme. Messages are also numbered so that any
retransmissions can be recognized, and the specification includes
specific text on how retransmissions are treated. The data plane does
not include reliability.
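The retransmission handling can be sketched on the receiving side: a
duplicate Request (same sequence number as the last one processed) is
answered with the cached Response instead of being executed twice. The
class and method names are hypothetical, and only the most recent
exchange is cached, which is enough for a lock-step protocol with one
message in flight.

```python
from typing import Callable, Optional, Tuple


class RequestDeduplicator:
    """Hypothetical receiver that answers retransmitted Requests from cache."""

    def __init__(self, handler: Callable[[bytes], bytes]) -> None:
        self.handler = handler
        self.last: Optional[Tuple[int, bytes]] = None  # (seq, cached Response)
        self.processed = 0

    def on_request(self, seq: int, body: bytes) -> bytes:
        if self.last is not None and self.last[0] == seq:
            return self.last[1]  # duplicate: resend the Response, do not re-execute
        response = self.handler(body)
        self.processed += 1
        self.last = (seq, response)
        return response


d = RequestDeduplicator(lambda body: body.upper())
d.on_request(7, b"hello")
d.on_request(7, b"hello")  # retransmission of the same Request
print(d.processed)  # 1
```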

3.4 Checksum Guidelines
This is discussed below as a separate point.

3.5 Middlebox Guidelines
As discussed above, the CAPWAP protocol includes a keepalive mechanism,
which is sent periodically to ensure that the UDP flow is kept alive.
Furthermore, to ensure that the data plane UDP session is kept alive at
the NAT, a keepalive mechanism also exists in the data plane. I note
that the guidelines state that keepalives should be sent no more
frequently than every two minutes, which in my personal experience is on
the high side. The CAPWAP protocol uses a configurable timer, whose
default value is 30 seconds, which lets us cope with the realities of
life: there are middleboxes out there that will in fact violate the
2-minute recommendation (and, as for any vendor, interoperability with
the network and reducing support calls are key).
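The keepalive idea can be sketched as an idle timer: whenever a flow
has been silent for the configured interval (30 seconds by default, per
the text above), a keepalive goes out so the NAT binding is refreshed.
Whether CAPWAP suppresses keepalives while other traffic is flowing is
an assumption here, as is the function name `keepalive_schedule`.

```python
from typing import Iterable, List


def keepalive_schedule(sent_times: Iterable[float], end: float,
                       interval: float = 30.0) -> List[float]:
    """Times at which a keepalive would be sent between 0 and `end`.

    Assumption: a keepalive is emitted only after `interval` seconds with
    no other packet on the flow (idle-based refresh of the NAT binding).
    """
    events = sorted(sent_times)
    keepalives: List[float] = []
    i = 0
    last = 0.0
    t = last + interval
    while t <= end:
        if i < len(events) and events[i] < t:
            # Real traffic went out before the deadline and already refreshed
            # the binding; restart the idle timer from that packet.
            last = events[i]
            i += 1
        else:
            keepalives.append(t)
            last = t
        t = last + interval
    return keepalives


print(keepalive_schedule([10, 20], end=100))  # [50.0, 80.0]
```

With the default 30-second timer the binding is refreshed well inside a
2-minute NAT timeout, at the cost of a few extra small packets on an
otherwise idle flow.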

> I also raised a comment about your specifying that the UDP checksum is
> turned off. I wonder if there is any real reason for doing this? I did
> understand that the usage of DTLS provides all the integrity checks you
> need for the individual packets. However, I would like you to consider
> the fact that for IPv6 you will be required to have it on. Also,
> turning it off prevents a port or device that receives a packet with a
> bit error from discarding it as an obvious stray, and instead forces
> DTLS to check it, when the "for free" UDP checksum would take care of
> it.
There are two types of packets in CAPWAP: control and data. With very
few exceptions, namely packets sent prior to the establishment of the
DTLS session, control packets are integrity-protected as a result of
DTLS. However, this is not the case for data packets, as the use of
DTLS on the data path is optional.

Recall that the primary function of the AC on the data plane is to
convert packets from the CAPWAP tunneled format to the 802.3 format.
Most ACs designed today need to do this at very high speeds, and in many
cases this is done entirely in hardware. When we discussed this during
the protocol design, there was overwhelming consensus that computing
checksums in hardware at high speeds was expensive, which led to the
checksum being disabled.

Again, vendors that care would use DTLS, which in itself significantly
increases the cost of the hardware. As to your point on IPv6, it is for
this reason that we recommended the use of UDP-Lite for IPv6.
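To make the per-packet work concrete, here is a sketch of the standard
UDP checksum over IPv4 (the Internet ones'-complement sum across a
pseudo-header, the UDP header and the whole payload) next to the
UDP-Lite variant over IPv6, whose checksum covers only the first
`coverage` bytes of the datagram. Function names are illustrative;
addresses in the usage lines are made up, and 5247 is the IANA-assigned
CAPWAP data port.

```python
import ipaddress
import struct

UDP_HEADER_LEN = 8


def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def udp_ipv4_checksum(src: str, dst: str, sport: int, dport: int,
                      payload: bytes) -> int:
    length = UDP_HEADER_LEN + len(payload)
    pseudo = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 17, length))  # zero, protocol 17, UDP length
    header = struct.pack("!HHHH", sport, dport, length, 0)  # checksum field zeroed
    csum = internet_checksum(pseudo + header + payload)  # every payload byte is summed
    return csum or 0xFFFF  # 0 on the wire means "no checksum" over IPv4


def udplite_ipv6_checksum(src: str, dst: str, sport: int, dport: int,
                          coverage: int, payload: bytes) -> int:
    """UDP-Lite over IPv6: only the first `coverage` bytes are protected."""
    length = UDP_HEADER_LEN + len(payload)
    if coverage == 0:
        covered = length  # coverage 0 means the whole datagram
    elif UDP_HEADER_LEN <= coverage <= length:
        covered = coverage
    else:
        raise ValueError("invalid checksum coverage")
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", length, 136))  # length, zeros, next header 136
    header = struct.pack("!HHHH", sport, dport, coverage, 0)
    csum = internet_checksum(pseudo + (header + payload)[:covered])
    return csum or 0xFFFF


print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
# With coverage 8 only the header is summed, so the payload is not touched.
print(udplite_ipv6_checksum("2001:db8::1", "2001:db8::2", 5247, 5247, 8, b"abcd")
      == udplite_ipv6_checksum("2001:db8::1", "2001:db8::2", 5247, 5247, 8, b"wxyz"))  # True
```

Header-only coverage keeps the per-packet cost independent of payload
size while still catching corrupted addresses and ports, which is the
trade-off that makes UDP-Lite attractive here.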

PatC
