Bug #99571 for Net-DNS: AXFR BADSIG failures

Sat Oct 18 08:31:52 2014 tlhackque [...] yahoo.com - Ticket created

Subject:

AXFR BADSIG failures

Message body is not shown because it is too large.

Sat Oct 18 08:50:43 2014 tlhackque [...] yahoo.com - Correspondence added

Subject:	AXFR BADSIG failures - short form
From:	tlhackque [...] yahoo.com

On Sat Oct 18 08:31:52 2014, tlhackque wrote: Show quoted text

> I am seeing BADSIG failures from AXFR using TSIG. > > It appears that the validation code for TSIG is not verifying the data > received from the wire, but is attempting to reconstruct it from the > decoded data. This is problematic because while the result will be > semantically equivalent, it need not be bit-for-bit identical. In > this case, due to compression. > > The code needs to be changed to preserve the original data received > from the wire, make the RFC2848 changes (adjust arcount, delete TSIG, > change msid) and verify that. Reconstructing the decoded packet will > not work reliably.

(All the detail that I included made the initial report 'too large to display'; it's not spam can can be downloaded and viewed.) It's a rather nasty bug.

Sat Oct 18 20:17:48 2014 rwfranks [...] acm.org - Correspondence added

From:

rwfranks [...] acm.org

On Sat Oct 18 08:50:43 2014, tlhackque wrote: Show quoted text

> On Sat Oct 18 08:31:52 2014, tlhackque wrote:

> > I am seeing BADSIG failures from AXFR using TSIG. > > > > It appears that the validation code for TSIG is not verifying the > > data > > received from the wire, but is attempting to reconstruct it from the > > decoded data. This is problematic because while the result will be > > semantically equivalent, it need not be bit-for-bit identical.

I was hoping you would send me something more directly machine-readable to decode. But no matter, I can see what is driving this and have reconstructed an example which fails in the way that you observe.

Sat Oct 18 20:17:49 2014 The RT System itself - Status changed from 'new' to 'open'

Sun Oct 19 06:32:07 2014 tlhackque [...] yahoo.com - Correspondence added

From:

tlhackque [...] yahoo.com

On Sat Oct 18 20:17:48 2014, rwfranks@acm.org wrote: Show quoted text

> On Sat Oct 18 08:50:43 2014, tlhackque wrote:

> > On Sat Oct 18 08:31:52 2014, tlhackque wrote:

> > > I am seeing BADSIG failures from AXFR using TSIG. > > > > > > It appears that the validation code for TSIG is not verifying the > > > data > > > received from the wire, but is attempting to reconstruct it from > > > the > > > decoded data. This is problematic because while the result will be > > > semantically equivalent, it need not be bit-for-bit identical.

> > I was hoping you would send me something more directly machine- > readable to decode. But no matter, I can see what is driving this and > have reconstructed an example which fails in the way that you observe.

Thanks for looking at this. I included the zone file, the data received from the wire, and the data reconstructed by Net::DNS. All are in the debug output. For a set of local reasons (switches, software versions), I couldn't get you a pcap file. In any case, decoders tend to hide the problem because the interpreted output is identical. It should be easy to take my zone data and serve it from BIND - that should exactly reproduce my problem case. It's independent of the signing key. Otherwise, I'm not sure what you'd like. It would be simple enough to take the wire data dump and encode it into any binary format that you need... Or I guess I could write a binary file instead of the ASCII dump. BTW, I don't think that this issue isn't restricted to axfr() - the same thing can happen with normal queries. I thought about this some on a walk yesterday; probably the thing to do is to keep the wire binary data on a per-record basis as it is decoded. That will allow easy removal of the TSIG record. Unfortunately, this means double the memory usage. Perhaps this can be done selectively, e.g. note when a request is signed at send() time; if it is, the response may be signed (except for a couple of error conditions, MUST be), so we can save the wire data only when the request was signed. It's an error to receive a signed response to an unsigned request, so this is safe. Let me know if there's anything more I can do to help. Thanks again.

Sun Oct 19 09:39:00 2014 rwfranks [...] acm.org - Correspondence added

From:

rwfranks [...] acm.org

On Sun Oct 19 06:32:07 2014, tlhackque wrote: Show quoted text

> It should be easy to take my zone data and serve it from BIND - that > should exactly reproduce my problem case. It's independent of the > signing key.

With the benefit of 20-20 hindsight, that is probably what we should have done at the outset. Now proposing to capture the raw packet at decode() time, and release it at verify() time. Automating TSIG verification on queries then becomes quite attractive from the storage management point of view. Thanks for your help

Mon Oct 20 05:41:48 2014 tlhackque [...] yahoo.com - Correspondence added

From:

tlhackque [...] yahoo.com

On Sun Oct 19 09:39:00 2014, rwfranks@acm.org wrote: Show quoted text

> On Sun Oct 19 06:32:07 2014, tlhackque wrote: >

Show quoted text

> Now proposing to capture the raw packet at decode() time, and release > it at verify() time. Automating TSIG verification on queries then > becomes quite attractive from the storage management point of view. >

That sounds good. As I noted, since the raw data is only needed for TSIG, no need to capture it unless the request was signed. That will save storage in many applications, where TSIG isn't used. I do like the idea of automating verification for non-axfr queries too. I'm uncomfortable with the current scheme's inconsistency - automatic for axfr and manual for others. Plus the RFCs are pretty strict about when to verify(). There are some corner cases involving errors - and it would be simpler for clients if those decisions were made in Net::DNS rather than having the client have to code for them. Releasing the wire data in verify() does require some extra work: you have to save the verify result in case anyone calls verify() more than once on a response. (May not seem reasonable, but I can imagine code paths that fall into it. Plus it enables the next suggestion.) For people who want fine control, I suggest an $resolver->verify_tsig(0) to turn automatic verification off. (Since the RFC behavior is to do it, the default should be on.) In that case, you would still do the verification on receipt and save the result, but not signal or return an error unless verify() is called. With this approach, you'd always release the storage and still have a compatible API for any current users. Thanks for taking this on so promptly, especially on a weekend.

Mon Oct 20 06:08:52 2014 tlhackque [...] yahoo.com - Correspondence added

Subject:	SIG?
From:	tlhackque [...] yahoo.com

I don't use it, but I noticed that the code checks for SIG(0) as well. It probably has the same issue of requiring raw data... I haven't investigated, but thought I should mention it...

Mon Oct 20 13:32:51 2014 tlhackque [...] yahoo.com - Correspondence added

Subject:	AXFR BADSIG failures - simplification
From:	tlhackque [...] yahoo.com

Another walk produced a simplification that you might consider. By always computing the MAC in receive, it's not necessary to save the wire data for verify(). Here's how: Dependencies: RFC2845: 4.2, 4.3 An unsigned request may not produce a signed response. (A signed response is a FORMERR, and discarded. 4.6) A signed request will produce a response with a TSIG. (Otherwise, FORMERR and discarded. 4.6). The response must be signed, EXCEPT if a BADSIG (4.5.3) or BADKEY (4.5.1). RFC2931: A packet may be either TSIG or SIG(0) signed. Not both. And not more than one xSIG record for transaction signature. (2.1, 3.1, appendix, 2845 3.2) There SHOULD be configuration option for whether or not to validate (2.4) The xSIG record used for transaction signature must be the last record of a packet. (3.2, 2845 3.2) Observation: Given the above, the verification result can always be computed at the end of decoding a received message. The only data that decode() needs to save is the offset of the last record decoded (which will be the xSIG if one is present). On return from decode, we still have the data received from the wire in the original buffer. If the packet is signed (per rules above): The header is reconstructed (or modified; the offsets are fixed) to decrement arcount and fixup the ID. The MAC is computed on the prefix + reconstructed (or modified) header + the wire data up to the saved offset. This excludes the SIG record from the MAC. 'prefix' depends on the sig type - TSIG is request MAC; SIG(0) is rdata + query or rdata The resulting MAC is saved. The wire data can be discarded. Any error is saved (whether or not signed). verify() uses the saved MAC & error state. It should be called automatically, but errors aren't reported if the config option [$resolver->autoverify(0)] has explicitly turned it off. It can be called multiple times since it does not need the wire data. So a user of the old API (which required an explicit call to verify()) will simply get the status. This will be a cheap call since the MAC is available. The doc for verify() gets updated to indicate that calling it is implicit, unless $resolver->auto_verify(0) is called. For axfr multi-packet with auto_verify(0), verify() after axfr() can return the verification result. But the internal calls won't signal an error if auto_verify(0) was set. I think this avoids the storage penalty, makes verification automatic for a simple client interface, and allows a client that wants to manually verify to do so - even with axfr(). The only penalty is that the MAC is computed for a signed message even if auto_verify(0) is set and verify() is never called by the user. Since this is an unusual case, it seems acceptable. (auto_verify(1) is the default, and it should be turned off only if a server is broken, Net::DNS is broken, or the user is doing something strange, like DNS testing.)

Mon Oct 20 15:09:55 2014 rwfranks [...] acm.org - Correspondence added

From:

rwfranks [...] acm.org

On Mon Oct 20 13:32:51 2014, tlhackque wrote: Show quoted text

> Another walk produced a simplification that you might consider. > > By always computing the MAC in receive, it's not necessary to save the > wire data for verify().

Not so fast! The decoder _cannot_ compute the MAC on a response packet because that part of the code does not know anything about the query which produced the response, or the previous packet in a multi-packet response. Computing the MAC requires knowledge of the request MAC or prior MAC. If you look how the axfr() routine does the validation, you will note that the required state information is carried forward by creating a clone of the TSIG which is returned by verify() and fed to the next verify() call. This code was not so much written as evolved in response to discussion and bug reports. There is a trail of abject failure going back years. I have just read RFC2931 again, and it seems to me SIG(0) ought to follow a similar pattern. Not least because Packet has no need to know if it verifies TSIG or SIG(0)

Tue Oct 21 06:12:15 2014 tlhackque [...] yahoo.com - Correspondence added

From:

tlhackque [...] yahoo.com

On Mon Oct 20 15:09:55 2014, rwfranks@acm.org wrote: Show quoted text

> On Mon Oct 20 13:32:51 2014, tlhackque wrote:

> > Another walk produced a simplification that you might consider. > > > > By always computing the MAC in receive, it's not necessary to save > > the > > wire data for verify().

> > Not so fast! > > The decoder _cannot_ compute the MAC on a response packet because that > part of the code does not know anything about the query which produced > the response, or the previous packet in a multi-packet response. >

I said 'receive', not 'decode' for computing the MAC. Meaning the caller of decode after decode returns. Actually, the caller of Packet->(new, create -- I don't have the code handy.) The wire data is still there; the decoded version lets you determine if there's a SIG/error. All the decoder needs to do is save/return the offset of the last record it decodes. It doesn't need to know what that record is, or even what section it is in - the caller just needs the offset so if *it* decides to 'remove' the last record, it knows where that record starts in the undecoded bytestream. This suffices because *if* the record will be 'removed' (by length or substr) for the MAC computation, it MUST be the last record. It's just the end marker for the wire data going into MAC. The logic to use the offset/compute the MAC can be a call into xSIG:: that provides the wire data buffer, decoded packet & last record decoded offset. Those classes do know what goes into their MAC, and can keep whatever context they need in private data attached to their record and/or stashed in the resolver by the caller. E.g. xSIG::process_transaction always gets called if the last decoded record is an xSIG. The TSIG/SIG class can easily figure out if the MAC is needed using the decoded Packet & compute it using the wire data, saving the results for verify. (Including checking that arcount >1 & error status, etc.) It is true that the request packet is required. But I'll try not to think more about it on my next walk and leave the fix in your hands. I've probably reached the point of offering diminishing returns. Show quoted text

> Computing the MAC requires knowledge of the request MAC or prior MAC. > If you look how the axfr() routine does the validation, you will note > that the required state information is carried forward by creating a > clone of the TSIG which is returned by verify() and fed to the next > verify() call. > > This code was not so much written as evolved in response to discussion > and bug reports.

That does tend to happen with long-lived software. It's unfortunate that any approach to this bug requires re-architecting the receive/verification paths. Of course, less re-architecting can patch it together at the price of efficiency and maintainability. My observation was that saving/copying the wire data can be avoided with the right architecture. Since axfr() data can be (very) large, and other queries very frequent it seemed worth thinking about. As you're doing the work, it's your call. I hope that my comments have been useful. Show quoted text

> There is a trail of abject failure going back years.

Sigh. I didn't pay attention to this until axfr() broke. I'm just a user. Aside from learning from the failures, then only thing to do is to stay positive and look forward. Hopefully your current pass will be the one that finally succeeds! Show quoted text

> > I have just read RFC2931 again, and it seems to me SIG(0) ought to > follow a similar pattern. Not least because Packet has no need to know > if it verifies TSIG or SIG(0)

Agree on both counts. Thanks again for plugging away at this. I'll stop meddling and look forward to a release with whatever fix you decide upon.

Sat Oct 25 17:42:07 2014 tlhackque [...] yahoo.com - Correspondence added

Subject:	More AXFR BADSIG failures
From:	tlhackque [...] yahoo.com

I have looked at the code used to patch this, and although I haven't verified a reproducer, there still appears to be a bug. In RR/TSIG:decode_rdata, you compute the offset of the TSIG record in the wire data as: my $eom = $offset - Net::DNS::RR->RRFIXEDSZ - length $self->{owner}->encode(); Again, you are assuming that encode() will reproduce the wire input. This is not valid. I believe that if you have a compressible key name - e.g. my-key.example.net., where example.net. is used earlier in the packet, this can result in the wrong offset if Net::DNS encodes the name differently from the server. So the only place this can be fixed is in Packet::decode - as I've noted before. To be a bit more explicit, I'll outline two approaches: The simplest takes advantage of the fact that the offset of a record in wire data is only interesting for signatures, and that the signature record is always last. The more architecturally pure version saves the offset of every record - 'just in case'. Rough code - in the eval { of Packet::decode(): # question/zone section my $hash = {}; my $record; + my $last_rec = $offset; while ( $qd-- ) { + $last_rec = $offset; ( $record, $offset ) = decode Net::DNS::Question( $data, $offset, $hash ); CORE::push( @{$self->{question}}, $record ); } # RR sections while ( $an-- ) { + $last_rec = $offset; ( $record, $offset ) = decode Net::DNS::RR( $data, $offset, $hash ); CORE::push( @{$self->{answer}}, $record ); } while ( $ns-- ) { + $last_rec = $offset; ( $record, $offset ) = decode Net::DNS::RR( $data, $offset, $hash ); CORE::push( @{$self->{authority}}, $record ); } while ( $ar-- ) { + $last_rec = $offset; ( $record, $offset ) = decode Net::DNS::RR( $data, $offset, $hash ); CORE::push( @{$self->{additional}}, $record ); } + $self->{last_rec} = $last_rec; return $self; }; ... +# Use to obtain offset of last record in packet's wire data when computing signature +# This is the first byte of the last record decoded. When a signature is computed, +# this byte and all following are excluded from the data to be signed. +# This offset INCLUDES the packet header. + +sub last_record_offset { + my $self = shift; + croak('No saved wire data') unless exists $self->{last_rec}; + return $self->{last_rec}; +} +# Use to obtain offset of data following header in packet's wire data +# The data always follows the packet header. +sub packet_data_offset { + return HEADER_LENGTH; +} Back in TSIG, you need to get the packet into decode_rdata, and change: my $eom = $offset - Net::DNS::RR->RRFIXEDSZ - length $self->{owner}->encode(); to my $eom = $packet->last_record_offset - $packet->packet_data_offset; I don't recommend the 'pure' version, but for that, change: + $last_rec = $offset; to + CORE::push @{$self->{last_rec}}, $offset and + return $self->{last_rec}; to + return $self->{last_rec}[shift]; and in TSIG: my $eom = $packet->last_record_offset(-1) - $packet->packet_data_offset;

Sun Oct 26 14:43:36 2014 rwfranks [...] acm.org - Correspondence added

From:

rwfranks [...] acm.org

On Sat Oct 25 17:42:07 2014, tlhackque wrote: Show quoted text

> I have looked at the code used to patch this, and although I haven't > verified a reproducer

because there are none. The owner field is always transmitted uncompressed. Just one of the topics where RFC2845 is a bit woolly. You could have discovered this for yourself by actually doing the experiment instead of imagining bugs which are not there.

Mon Oct 27 06:11:11 2014 NLNETLABS [...] cpan.org - Correspondence added

Thank you tlhackque, This issue is resolved. You can review the changes in the repository ( http://www.net-dns.org/svn/net-dns/trunk/ ) or via the new release candidate tarball ( http://www.net-dns.org/download/Net-DNS-0.80_3.tar.gz ).

Mon Oct 27 06:11:16 2014 NLNETLABS [...] cpan.org - Status changed from 'open' to 'resolved'