Re: [Issue 248] Comments on EAP state machine v4
From: Florent Bersani (florent.bersanird.francetelecom.fr)
Date: Fri, 25 Jun 2004 10:52:37 -0400 (EDT)
Some more thoughts on these issues in-line.

Many thanks for taking the time to read them and to write a detailed answer!

Florent, who agrees that some issues are uninteresting (at least for me) and too meticulous (but when specifying a state machine, it seems to me a good idea, isn't it ;-))

Nick Petroni wrote:

Here are some comments on Issue 248.
...


Comment #5 - Technical

That's just a triviality about, for instance, the peer state machine:
thanks to EAP idleWhile, the method does not have to set timers (EAP
takes care of it). However, in case the method wants to implement a "bad
packet received" counter (e.g. it is waiting for a packet and to provide
DoS resilience it wants to allow receiving a limited number of "bad
packets" before the right one - instead of going automatically to
failure), it has to do so by itself (and typically will use altReject if
it wants to fail before the timeout). This is not an issue but perhaps it
could be worth discussing the usefulness of such a behavior for EAP
methods (see e.g. RFC 3748 section 7.5 "Whether a MIC validation
failure is considered a fatal error or not is determined by the EAP
method specification") and that it can indeed be implemented in the EAP
state machines (with a little asymmetry between the timer implemented
within EAP and the "bad packet received" counter implemented within the
method). I guess this comment is a way for me to express that I
wholeheartedly agree with point 3 Joe made in issue 203 (in other
words, the intertwining of EAP and EAP methods borders on a layer violation).


I'm sorry, I don't quite understand how you would like to resolve this
issue. I *think* you want more guidelines for how a method could implement
certain types of error processing for things like possible DoS attacks
etc. Is this correct?


To clarify what I meant, this is related to comment #6 of this thread and comment #1 of my mail on EAP (see http://mail.frascone.com/pipermail/public/eap/2004-June/002531.html).

In short, since altAccept and altReject are set by the lower layer and not by the EAP method (*), there is no way for the method to fail more quickly than a timeout, IINM.

Indeed, suppose we are in a method exchange (any resemblance to EAP-PSK is purely incidental ;-)) where the peer waits for the server to authenticate to it, that is, the peer waits for a MAC=MIC produced by the server. The peer receives request #1 containing such a MAC=MIC. As this MAC=MIC is invalid, the peer discards it thanks to m.check and waits for another request.

The question was: in case the peer considers this reception of an invalid MAC a critical failure and wants to fail immediately, i.e. without processing any other packet (despite the DoS issues associated with this behavior), how does he (***) do it? My answer from the EAP state machine figure 3 of the strawman4 version is that he can't! To avoid processing any new requests, he updates his m.check function to answer TRUE whatever the input (or sets methodState to DONE) and waits for the timeout; that's his best option. This comes from an examination of the transition conditions that lead to SUCCESS or FAILURE and the fact that the method does not control portEnabled or eapRestart (the lower layer does). I think that not allowing the method to quickly transition to SUCCESS or FAILURE is a bad restriction. What do you think?

Actually, the question was originally formulated in a slightly different scenario: the peer receives request #1 with a bogus MAC=MIC but he is willing to allow reception of 3 bogus MAC=MIC before failing. He ignores it thanks to m.check() and increments his BadPacketCounter, which is internal to the method. He receives request #2 with a bogus MAC=MIC. He ignores it thanks to m.check() and increments his BadPacketCounter. He receives request #3 with a bogus MAC=MIC and then wants to fail quickly: what does he do? In fact, I was also complaining that the BadPacketCounter lies within the method's internal state, while the timeout timer is handled by EAP (this is probably a matter of taste... and such issues arise when one does not adopt a neat black box interface between layers, which is IMO the case for EAP and EAP methods, hence the reference to Joe and Issue 203).
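
The three-bogus-packets scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the string values for methodState and decision, and the threshold constant are all hypothetical and are not taken from the draft. It shows the counter living inside the method, and that the most the method can do on the third bad packet is record its verdict and ignore further requests until EAP's idleWhile timer expires.

```python
from dataclasses import dataclass

MAX_BAD_PACKETS = 3  # assumed threshold, taken from the "3 bogus MAC=MIC" scenario


@dataclass
class PeerMethodSketch:
    """Hypothetical peer-side EAP method; not the draft's actual interface."""
    method_state: str = "CONT"      # CONT, MAY_CONT or DONE
    decision: str = "FAIL"          # FAIL, COND_SUCC or UNCOND_SUCC
    bad_packet_counter: int = 0     # internal to the method, unlike idleWhile

    def check(self, mic_is_valid: bool) -> bool:
        """Mimics m.check(): return True when the request must be ignored."""
        if self.method_state == "DONE":
            return True             # the method has given up: ignore everything
        if mic_is_valid:
            return False            # a valid request is processed normally
        self.bad_packet_counter += 1
        if self.bad_packet_counter >= MAX_BAD_PACKETS:
            # The method cannot set altReject itself, so it can only record
            # its verdict and let EAP's idleWhile timer run out.
            self.method_state = "DONE"
            self.decision = "FAIL"
        return True                 # a bogus request is always discarded


method = PeerMethodSketch()
ignored = [method.check(mic_is_valid=False) for _ in range(3)]
print(ignored, method.method_state, method.decision, method.bad_packet_counter)
```

Nothing in this sketch can force an immediate transition to FAILURE, which is exactly the restriction being questioned.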

To keep on expanding my thoughts on the problem, the only parameters the method has, IINM, to influence the success or failure of EAP are the methodState (****) and decision (*****) variables. It is a pity that decision is always associated with idleWhile in the transitions to SUCCESS or FAILURE, don't you think? This surely has to do with the choice to keep the EAP Success and Failure packets although they are useless (except for backwards compatibility, which however could certainly be provided without endangering the new generation) and, worse, harmful.

(*) Here there is a possible confusion between alternate result indications - a possibly unprotected indication given by the lower layer that the data link went down or up (**) - and protected result indications - cryptographically protected result indications given by the method. At least, it confused me ;-)

(**) BTW do we have an example of indication of the data link that it is up? (I see examples of link down, e.g. a dial-up NAS hangs up)
I do not think that your DHCP example works, as it is, IIRC, the peer that first sends the DHCP request. So to understand from this that the authentication succeeded, the peer would have to keep trying to send DHCP requests while authenticating and understand it has succeeded when it gets a response. Am I right? Perhaps something around DNA?


(***) BTW, would a native English speaker or an educated foreigner tell me what is the correct pronoun to use when talking of the peer: it or he or both?

(****) BTW, I think I have two other issues here:
1) allowMethod is not defined in the document IINM but is used in figure 3
2) I did not find a place in RFC 3748 saying that it is forbidden to have multiple round trips of the Identity method. If this is the case, the state machine reflects this... sadly, this leads to an unnecessary (& stupid & not dramatic) DoS attack: the attacker keeps sending EAP Identity requests and the peer may keep replying to these requests (and discarding the valid requests of the server). I thought about this when talking about the methodState variable and noticing that there was a difference between Identity/Notification (which are from a theoretical POV two methods) and the other methods. Namely, while the identity and notification methods do not impact the methodState variable nor the decision one...
I'll post these issues in a separate mail for those who won't have read this lengthy one (I very much understand them ;-))


(*****) BTW, the only way the protected result indication is conveyed is, IINM, the decision variable (see also comment #6):
1) if the peer has a protected success indication, he sets decision to UNCOND_SUCC provided he wants to use the access he has been granted
2) if the peer has a protected failure indication, he sets decision to FAIL
3) if the peer hasn't any protected result indication and is willing to use the access if he is granted it, he sets decision to COND_SUCC
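
The three cases above can be written as a small Python function. This is a sketch, not text from the draft: the function name and the string values for the protected indication are hypothetical, and the branch for a peer that does not want to use the access is an assumption, since the list above leaves that case open.

```python
from typing import Optional


def choose_decision(protected_result: Optional[str], wants_access: bool) -> str:
    """Map a protected result indication to the decision variable.

    protected_result is "success", "failure" or None (no protected
    indication); these string values are assumptions made for this sketch.
    """
    if protected_result == "failure":
        return "FAIL"            # case 2
    if not wants_access:
        return "FAIL"            # assumption: not covered by the list above
    if protected_result == "success":
        return "UNCOND_SUCC"     # case 1
    return "COND_SUCC"           # case 3


for indication in ("success", "failure", None):
    print(indication, choose_decision(indication, wants_access=True))
```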


Comment #6 - Technical

This is about DONE, CONT and MAY_CONT/UNCOND_SUCC, COND_SUCC and FAIL.

While I do not doubt that there are good technical reasons to use these
variables (rather than simply CONT and DONE) and that the EAP state
machine does not claim to be THE way to implement EAP (in its
introduction "The State Machine and associated model are informative
only. Implementations may achieve the same results using different
methods"), I think that giving briefly the rationales behind this choice
(which is not explicit in section 4.2 IMHO) would help the reader. In
particular, giving an example of MAY_CONT's usefulness.

About the decision variable, here also an explanation of the design
(maybe with an example) could help. Indeed, it seems to me that not all
pairs (state, decision) are acceptable so state/decision are not totally
independent. Here again, giving an example why COND_SUCC was introduced
could help.

This seems mostly editorial, but perhaps not: you are asking for more text
to describe these states, right? I can see why this might be of some use
for someone without the context of the EAP WG discussions (i.e. the
typical reader of this document).

Not only text to describe these variables but some hints to the reason why these variables were chosen.

I think this concern is also related to the conditions in the state
machine that allow the peer to transition to success or failure. They do
not appear to be either trivial or symmetric. The newbie that I
unfortunately am needs much more time to (fully) understand them than any
other transition condition in the state machine. Bernard, for instance,
asked about these Success/Failure transitions in Issue 229. For
instance, I am wondering, how the condition "altAccept && methodState !=
CONT && decision == FAIL" may occur.


This is the condition where a so-called "alternate indication" of success
is received, but the method *might* be done and has not yet concluded it
succeeded on its own. There are three competing issues here:
 1. We received an outside "alternate" indication that EAP is finished
    (e.g. DHCP request or similar)
 2. The method has not explicitly claimed it is still continuing, so it
    is either "possibly done" (methodState == MAY_CONT) or it is
    definitely done (methodState == DONE). Either way, it is possible that
    the method could be finished so we can't just ignore the indication
 3. we have no indication from the method of success. EAP should never go
    to the SUCCESS state without getting notification that the method
    succeeded (or otherwise is willing to accept the connection)

Although I disagree stricto sensu with this example (the peer will not receive a DHCP request, IINM, it will send one), I essentially agree with you: I confused alternate lower layer indications and protected method indications.
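
To make the discussed condition concrete, here is a minimal Python sketch that evaluates "altAccept && methodState != CONT && decision == FAIL" over all combinations of methodState and decision. The function name is hypothetical, and the sketch assumes, following the explanation quoted above, that the condition guards a transition to FAILURE.

```python
def alt_accept_leads_to_failure(alt_accept: bool, method_state: str, decision: str) -> bool:
    """The quoted condition: altAccept && methodState != CONT && decision == FAIL."""
    return alt_accept and method_state != "CONT" and decision == "FAIL"


# Enumerate the (methodState, decision) pairs that satisfy the condition
# once the lower layer has raised altAccept.
matching = [
    (method_state, decision)
    for method_state in ("CONT", "MAY_CONT", "DONE")
    for decision in ("FAIL", "COND_SUCC", "UNCOND_SUCC")
    if alt_accept_leads_to_failure(True, method_state, decision)
]
print(matching)
```

Only the pairs where the method might be finished and has not signalled success match, which corresponds to points 2 and 3 of the explanation.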

Also in section 4.2 I tend to feel dizzy with some text in the paragraph
methodState=DONE: "If both (a) the server has informed us that it will
allow access and the next packet will be EAP Success, and (b) we're
willing to use this access, set decision=UNCOND_SUCC." I guess that
condition (a) should rather be formulated in terms of altAccept,
shouldn't it? Indeed while IIRC RFC 3748 mandates (in section 4.2 "The


No, I think there are two things being confused here. altAccept is NOT to
be used by methods. altAccept corresponds to so-called "alternate
indications" found in RFC2284 (and now RFC3748). (a) is meant to show the
case where IN THE METHOD the peer has been made aware that the
authenticator is happy with the state of authentication and will allow
access. Previously, it was possible to hijack EAP sessions by sending an
EAP Success before the authentication was complete or DoS by sending EAP
Failure. Now, the method has a way to say "I know we succeeded, so
any Failure packet is false" or "if you don't get a Success, it's still
ok."

I agree: I confused alternate lower layer indications and protected method indications. Thanks for your very pedagogical correction :-)

Comment #10 - Technical

Why include a separate TIMEOUT_FAILURE State? Why not use the FAILURE state?


This is because the symptom is that a request has been sent, but no
response received. My recollection is that it was discussed and determined
that EAP should not send a Failure in response to this symptom, it should
simply stop trying.

Thx for the explanation (and sorry I missed it in the archives). Why not include it in the document?

Comment #12 - Technical

This one is stupid but what happens, according to Figure 4, when the
standalone authenticator fails directly, i.e. starts by INITIALIZE,
transitions to SELECT_ACTION where Policy.getDecision replies FAILURE
and thus transitions to FAILURE - in the FAILURE state, I bet there is
some problem with eapReqData = buildFailure(currentId) since currentId=NONE
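
The problem can be illustrated with a short Python sketch of what a failure builder has to do. The function below is a hypothetical stand-in for buildFailure(), with None modelling currentId=NONE. The only facts taken from RFC 3748 are the packet layout (Code, Identifier, Length) and the Failure code value 4.

```python
import struct
from typing import Optional

EAP_CODE_FAILURE = 4  # RFC 3748 Code value for Failure


def build_failure(current_id: Optional[int]) -> bytes:
    """Hypothetical stand-in for buildFailure(); None models currentId=NONE."""
    if current_id is None:
        raise ValueError("currentId is NONE: no Identifier available for the Failure")
    # Code (1 byte), Identifier (1 byte), Length (2 bytes, network byte order).
    return struct.pack("!BBH", EAP_CODE_FAILURE, current_id, 4)


print(build_failure(7).hex())
try:
    build_failure(None)
except ValueError as err:
    print("cannot build Failure:", err)
```

An implementation has to pick some Identifier value before it can emit the packet, which the INITIALIZE, SELECT_ACTION, FAILURE path never does.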


This is a problem with Canned Success/Failure in general since only
Requests modify the Identifier field. IEEE 802-1X-REV states the following
in 8.2.4.1.3:
 txCannedFail. An EAPOL frame of type EAP-Packet, containing an EAP
               Failure packet constructed by the Authenticator, is
               transmitted to the Supplicant. In the case that
               no EAP communication was taking place on the port, then
               any value of Id may be used in the identifier field of the
               EAP frame. In the case that there was an EAP communication
               taking place on the port, then the value of the Identifier
               field in the EAP packet is set to a value that is different
               from the last delivered EAPOL frame of type EAP-Packet.

We could add similar text if you like.

It is not stricto sensu a canned success/failure IMO.

It's just that, although RFC 3748 IMHO prevents a stupid server from directly starting an authentication by sending a failure (not very explicitly; perhaps see the beginning of section 2 of RFC 3748, which doesn't sound very normative, or section 4.2 of RFC 3748), figure 4 relies on the policy (Policy.getDecision) to avoid this behavior... and I think it is worth stating that the authenticator's policy must comply with this!

I have trouble understanding the .1X text: does it authorize a valid authenticator to directly send a failure message ("in the case that no EAP communication was taking place on the port, then any value of Id may be used in the identifier field of the EAP frame")? Could you please clarify it for my poor brain?

Comment #14 - Technical

I am totally novice to DoS (I found a lot of papers on the subject, for
instance related to IKE - I plan to read them soon :-)) so this point is
probably not very important (my understanding is that one of the
difficulties with DoS is to understand what is really relevant and what
rather belongs to the .11 microwave oven attack; another one could be
setting the trade-off between DoS resilience and "efficiency").

It just seems to me that Figure 4 prevents the standalone authenticator
from ignoring (bogus) NAKs. Indeed, let us consider a corporate WLAN
deployment where exactly one EAP method is allowed - so that no valid
user will ever NAK. In this setting, there is no point in processing the
NAK, possibly losing the valid user's response if the attacker's NAK
arrived first and starting all over. I did not find text on this in RFC
3748 (the text I found was about preventing NAKs when a response to a
method has already been received) which is not our case here.


Using forged Naks as a DoS is a known attack against EAP AFAIK. The DoS
attack is mitigated by the case you mention (not allowing Naks after a
non-Nak response has been received).

This is not the case I mention!!
I precisely mention the case where the server has sent his initial request and an attacker replies with a NAK before the peer replies with the correct response. In this scenario, IINM, although the server may be sure that a valid peer won't NAK (a policy decision), figure 4 mandates that it process the NAK. I find that this unnecessarily opens a door to a stupid & useless & inefficient DoS (and when one can close such doors without much trouble, I recommend closing them, even though the DoS attack is not dramatic). So my proposed resolution is to allow the policy to set methodState directly to CONTINUE (while still authorizing it to set the state to PROPOSED) in the PROPOSE_METHOD state. What do you think?
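
A rough Python sketch of the proposed resolution follows. Everything in it is hypothetical: the function names, the policy flag, and in particular the rule that a NAK is honoured only while methodState is PROPOSED, which is one reading of the proposal rather than a quote from figure 4.

```python
def propose_method_state(policy_expects_naks: bool) -> str:
    """Sketch of the proposal: policy picks the methodState set in PROPOSE_METHOD."""
    return "PROPOSED" if policy_expects_naks else "CONTINUE"


def handle_response(method_state: str, resp_method: str, current_method: str) -> str:
    """Hypothetical dispatch of a received response; returns the action taken."""
    if resp_method == "NAK":
        # Assumption: a NAK is honoured only while the method is merely PROPOSED.
        return "process_nak" if method_state == "PROPOSED" else "discard"
    if resp_method == current_method:
        return "process_response"
    return "discard"


# Single-method deployment: policy knows that no valid peer will ever NAK.
state = propose_method_state(policy_expects_naks=False)
print(handle_response(state, "NAK", "PSK"))   # the forged NAK
print(handle_response(state, "PSK", "PSK"))   # the valid peer's response
```

With the state set to CONTINUE from the start, the attacker's NAK is dropped and the valid peer's response is still processed when it arrives.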


Comment #17 - Technical

I fail to understand the transition in Figure 7 from
INITIALIZE_PASSTHROUGH to AAA_IDLE when currentId==None, given that
AAA_IDLE sets aaaEapResp=TRUE


I think the point here is that aaaEapResp is telling the lower layer that
it's time for that layer to do something. In this case, no packets have
been sent from anyone so the lower layer will see aaaEapRespData is NONE
and know to start. At least I *think* this is what's going on. Others may
be able to correct me.

Well, according to the definition of aaaEapResp (section 7.1.1 "Set to TRUE in lower layer, FALSE in authenticator state machine. Indicates an EAP response is available for processing."), aaaEapResp is *not* a vague flag telling the lower layer to do something but a precise flag telling it that a response is available for processing.
So I still hold onto this issue. It arises when an authenticator boots and decides directly that it will be pass-through whatever the id of the peer (this behavior appears to be allowed by the state machine). I think it needs a fix.
Perhaps I'm off my rocker, but then I would like to be enlightened (if somebody thinks I am able to understand the explanation ;-))

