Skip Menu |

This queue is for tickets about the Net-DNS CPAN distribution.

Report information
The Basics
Id: 122586
Status: resolved
Priority: 0/
Queue: Net-DNS

People
Owner: Nobody in particular
Requestors: bbkr [...] post.pl
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.09
Fixed in: (no value)



Subject: Persistent UDP reports false timeouts.
Scenario. This is time dependent so I cannot provide code that will allow to reproduce it with 100% reliability: my $resolver = Net::DNS::Resolver->new( 'udp_timeout' => 2, 'persistent_udp' => 1, 'retrans' => 1, 'retry' => 1 ); $resolver->search( 'a.com', 'MX' ); # domain that triggers timeout $resolver->search( 'b.com', 'MX' ); # valid domain but gets error "FAIL query timed out" From my code understanding: 1. request is sent to NS about a.com 2. timeout is triggered 3. request is sent to NS about b.com 4. NS socket is added to select 5. NS socket has answer (from domain a.com!!) 6. NS socket is removed from select 7. response gets parsed by _decode_reply() 8. it gets rejected because header IDs does not match 9 control goes back to can_read(), but socket was removed from select in 6 10. socket read loop exits, no sockets to read from 11. domain b.com gets "query timed out" false error 12. true b.com answer is not read from socket and may cause exactly the same issue with next query Fix: Maybe failed header ID (in 8) should put socket back to select monitoring pool when persistent UDP is enabled?
From: rwfranks [...] acm.org
On Mon Jul 24 11:34:57 2017, bbkr@post.pl wrote: Show quoted text
> Scenario. This is time dependent so I cannot provide code that will > allow to reproduce it with 100% reliability:
We can not reproduce it with more than 0% reliability unless you provide an example complete with query and nameserver addresses.
From: bbkr [...] post.pl
Please run following script: cat domains_test.txt | perl dns_test.pl Test file contains tons of bad domains interleaved with 'gmail.com'. You should quickly ran into following scenario: ===================================================== gmail.com ;; search( gmail.com MX ) ;; udp send [31.11.202.254]:53 ;; answer from [31.11.202.254] 28 bytes ;; HEADER SECTION ;; id = 41913 ;; qr = 1 aa = 0 tc = 0 rd = 1 opcode = QUERY ;; ra = 1 z = 0 ad = 0 cd = 0 rcode = SERVFAIL ;; qdcount = 1 ancount = 0 nscount = 0 arcount = 0 ;; do = 0 ;; QUESTION SECTION (1 record) ;; 040307.com. IN MX ;; ANSWER SECTION (0 records) ;; AUTHORITY SECTION (0 records) ;; ADDITIONAL SECTION (0 records) ;; query timed out ===================================================== So the question about gmail.com gets some answer from previous questions. Library rejects this message and has no more sockets to monitor, so it gives false "query timed out" result. Tested on perl 5.24.1 + Net::DNS 1.09 and also on perl 5.24.2 + Net::DNS 1.11. Please tell me if that was enough to reproduce issue on your side.
Subject: domains_test.txt

Message body is not shown because it is too large.

Subject: dns_test.pl
use v5.24; use Net::DNS; use Data::Dumper; my $resolver = Net::DNS::Resolver->new( 'nameservers' => ['31.11.202.254'], 'udp_timeout' => 1, 'persistent_udp' => 1, 'retrans' => 1, 'retry' => 1 ); $resolver->searchlist(); $resolver->{'debug'} = 1; while (<>) { chomp; my $domain = $_; say "=" x 128; say $domain; $resolver->search( $domain, 'MX' ); my $status = $resolver->errorstring; die if $domain eq 'gmail.com' and $status =~ /query timed out/; }
From: rwfranks [...] acm.org
On Mon Jul 24 19:57:53 2017, bbkr@post.pl wrote: Show quoted text
> Please run following script: > > cat domains_test.txt | perl dns_test.pl
Thanks Show quoted text
> Tested on perl 5.24.1 + Net::DNS 1.09 and also on perl 5.24.2 + > Net::DNS 1.11.
Which answers the obvious question. Show quoted text
> Please tell me if that was enough to reproduce issue on your side.
Your script does time out, as you say; but I am not certain that I am seeing evidence of transactions getting mis-matched in the way you described. UDP time out is not a rare occurrence, which is why all the retry logic is there. Also, my result differs from yours: ra = 0 [1] rcode = REFUSED [SERVFAIL] so I am not confident that we really are both doing the same experiment. gmail.com ;; search( gmail.com MX ) ;; udp send [31.11.202.254]:53 ;; answer from [31.11.202.254] 27 bytes ;; HEADER SECTION ;; id = 60230 ;; qr = 1 aa = 0 tc = 0 rd = 1 opcode = QUERY ;; ra = 0 z = 0 ad = 0 cd = 0 rcode = REFUSED ;; qdcount = 1 ancount = 0 nscount = 0 arcount = 0 ;; do = 0 ;; QUESTION SECTION (1 record) ;; gmail.com. IN MX ;; ANSWER SECTION (0 records) ;; AUTHORITY SECTION (0 records) ;; ADDITIONAL SECTION (0 records) I will try to add some instrumentation inside Net::DNS to report ID mis-matches as they occur. May take some time to get to the bottom of this!
From: bbkr [...] post.pl
Show quoted text
> Your script does time out, as you say; but I am not certain that I am > seeing evidence of transactions getting mis-matched in the way you > described. UDP time out is not a rare occurrence, which is why all > the retry logic is there.
In my case I cannot reproduce request-response mismatch on Docker (dunno why). But on macOS (10.10.5) + perlbrew or Linux (debian Jessie) + perlbrew it pops up usually within first minute of test script running against different nameservers maintained by ISPs in my country. Can you try some NSes close to you or adjust timeout to 2s? I know UDP timeouts may happen, however in this particular case that I've pasted above this is fake error - there was no timeout on gmail.com - overdue message was consumed and socket monitoring ended prematurely.
From: rwfranks [...] acm.org
On Tue Jul 25 09:36:46 2017, bbkr@post.pl wrote: Show quoted text
> Can you try some NSes close to you or adjust timeout to 2s?
Using my ISP's nameserver, script got through 90% of list before a timeout occurred. Show quoted text
> I know UDP timeouts may happen, however in this particular case that > I've pasted above this is fake error - there was no timeout on > gmail.com - overdue message was consumed and socket monitoring ended > prematurely.
Your script does not test for that condition, so we will need to devise a better test. More immediately, we need to explain why I get ra = 0 and rcode = REFUSED whereas your earlier posting had ra = 1 and rcode = SERVFAIL. I would like to see the result you get from: echo 'gmail.com' | perl dns_test.pl
From: bbkr [...] post.pl
Show quoted text
> I would like to see the result you get from: > > echo 'gmail.com' | perl dns_test.pl
;; search( gmail.com MX ) ;; udp send [31.11.202.254]:53 ;; answer from [31.11.202.254] 366 bytes ;; HEADER SECTION ;; id = 21477 ;; qr = 1 aa = 0 tc = 0 rd = 1 opcode = QUERY ;; ra = 1 z = 0 ad = 0 cd = 0 rcode = NOERROR ;; qdcount = 1 ancount = 5 nscount = 4 arcount = 9 ;; do = 0 ;; QUESTION SECTION (1 record) ;; gmail.com. IN MX ;; ANSWER SECTION (5 records) gmail.com. 2004 IN MX 20 alt2.gmail-smtp-in.l.google.com. gmail.com. 2004 IN MX 30 alt3.gmail-smtp-in.l.google.com. gmail.com. 2004 IN MX 5 gmail-smtp-in.l.google.com. gmail.com. 2004 IN MX 10 alt1.gmail-smtp-in.l.google.com. gmail.com. 2004 IN MX 40 alt4.gmail-smtp-in.l.google.com. ;; AUTHORITY SECTION (4 records) gmail.com. 86384 IN NS ns4.google.com. gmail.com. 86384 IN NS ns3.google.com. gmail.com. 86384 IN NS ns2.google.com. gmail.com. 86384 IN NS ns1.google.com. ;; ADDITIONAL SECTION (9 records) alt3.gmail-smtp-in.l.google.com. 204 IN A 74.125.30.26 gmail-smtp-in.l.google.com. 188 IN A 173.194.222.26 alt1.gmail-smtp-in.l.google.com. 220 IN A 74.125.23.26 alt4.gmail-smtp-in.l.google.com. 195 IN A 64.233.182.27 alt2.gmail-smtp-in.l.google.com. 213 IN A 74.125.28.27 ns2.google.com. 86384 IN A 216.239.34.10 ns1.google.com. 86384 IN A 216.239.32.10 ns3.google.com. 86384 IN A 216.239.36.10 ns4.google.com. 86384 IN A 216.239.38.10
From: rwfranks [...] acm.org
On Tue Jul 25 13:01:17 2017, bbkr@post.pl wrote: Show quoted text
> > I would like to see the result you get from: > > > > echo 'gmail.com' | perl dns_test.pl
> > > > ;; search( gmail.com MX ) > > ;; udp send [31.11.202.254]:53 > > ;; answer from [31.11.202.254] 366 bytes > ;; HEADER SECTION > ;; id = 21477 > ;; qr = 1 aa = 0 tc = 0 rd = 1 opcode = QUERY > ;; ra = 1 z = 0 ad = 0 cd = 0 rcode = NOERROR
====== So I think you need to explain why this differs from my result above. My instrumented version ran to completion without ID mismatch and without a timeout. Total of 61104 queries, all refused.
From: bbkr [...] post.pl
Show quoted text
> So I think you need to explain why this differs from my result above. > > My instrumented version ran to completion without ID mismatch and > without a timeout. Total of 61104 queries, all refused.
Maybe NS hardcoded in test file blocks requests from outside of certain IP range? I'm not administrating it so I cannot tell for sure. As for test that you ran against your local NS, the one that timeouted on 90% - did you get request-response mismatch there? I'll try to reproduce mismatch using Net::DNS::Nameserver to get more reliable test case.
From: bbkr [...] post.pl
I was able to reproduce mismatch locally. On one console please run server and on second console new client: cat domains_test.txt | perl dns_test2.pl Request/response mismatch is reproducible with 100% rate on first pair of domains in my case: =====================================================000gmail.com ;; search( 000gmail.com A ) ;; udp send [127.0.0.1]:15353 ;; query timed out =====================================================gmail.com ;; search( gmail.com A ) ;; udp send [127.0.0.1]:15353 ;; answer from [127.0.0.1] 46 bytes ;; HEADER SECTION ;; id = 155 ;; qr = 1 aa = 1 tc = 0 rd = 1 opcode = QUERY ;; ra = 0 z = 0 ad = 0 cd = 0 rcode = NOERROR ;; qdcount = 1 ancount = 1 nscount = 0 arcount = 0 ;; do = 0 ;; QUESTION SECTION (1 record) ;; 000gmail.com. IN A ;; ANSWER SECTION (1 record) 000gmail.com. 3600 IN A 1.1.1.1 ;; AUTHORITY SECTION (0 records) ;; ADDITIONAL SECTION (0 records) ;; query timed out ===================================================== So gmail.com gets response from 000gmail.com and claims false timeout.
Subject: dns_test2.pl
use v5.24; use Net::DNS; use Data::Dumper; my $resolver = Net::DNS::Resolver->new( 'nameservers' => ['127.0.0.1'], 'udp_timeout' => 1, 'persistent_udp' => 1, 'retrans' => 1, 'retry' => 1 ); $resolver->searchlist(); $resolver->port(15353); $resolver->{'debug'} = 1; while (<>) { chomp; my $domain = $_; say "=" x 128; say $domain; $resolver->search( $domain, 'A' ); my $status = $resolver->errorstring; die if $domain eq 'gmail.com' and $status =~ /query timed out/; }
Subject: dns_server.pl
use v5.24; use strict; use warnings; use Net::DNS::Nameserver; my $nameserver = Net::DNS::Nameserver->new( 'LocalAddr' => [ '127.0.0.1' ], 'LocalPort' => 15353, 'ReplyHandler' => \&reply_handler, ) or die; sub reply_handler { my ( $qname, $qclass, $qtype, $peerhost, $query, $conn ) = @_; my ( $rcode, @ans, @auth, @add ); sleep 1 unless $qname eq 'gmail.com'; my $rr = Net::DNS::RR->new("$qname 3600 $qclass $qtype 1.1.1.1"); return ( 'NOERROR', [ $rr ], [], [], { 'aa' => 1 }, {} ); } $nameserver->main_loop;
From: rwfranks [...] acm.org
On Tue Jul 25 15:13:59 2017, bbkr@post.pl wrote: Show quoted text
> > As for test that you ran against your local NS, the one that timeouted > on 90% - did you get request-response mismatch there?
No, just timed out.
From: rwfranks [...] acm.org
On Tue Jul 25 15:38:47 2017, bbkr@post.pl wrote: Show quoted text
> I was able to reproduce mismatch locally. On one console please run > server and on second console new client:
Thanks. Now I have a repeatable example to work on.
From: rwfranks [...] acm.org
Fixed in Net::DNS 1.12
Resolved in 1.12