
This queue is for tickets about the Net-DNS CPAN distribution.

Report information
The Basics
Id: 122352
Status: resolved
Priority: 0/
Queue: Net-DNS

People
Owner: Nobody in particular
Requestors: nick [...] cpanel.net
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: UDP bgsend retries a TCP send (blocking) instead of a non-blocking send when igntc is 0 and can stall forever
->bgsend calls ->_bgsend_udp, which calls ->send and gets $ans->header->tc. So it calls _send_tcp, and that won't have blocking(0) set on the socket, so it will stall forever if it doesn't get a response.
On Mon Jul 03 18:49:08 2017, bdraco wrote:
> ->bgsend calls ->_bgsend_udp
>
> which calls
>
> ->send and gets $ans->header->tc
>
> So it calls _send_tcp and it won't have blocking(0) set on the socket
> so it will stall forever if it doesn't get a response.
This code is in Net::DNS::Resolver::Base
Subject: Re: [rt.cpan.org #122352] UDP bgsend retries a TCP send (blocking) instead of a non-blocking send when igntc is 0 and can stall forever
Date: Mon, 3 Jul 2017 17:49:39 -0600
To: bug-Net-DNS [...] rt.cpan.org
From: Rob Brown <hookbot [...] gmail.com>
Is it possible to avoid calling _send_tcp when using bgsend?

On Mon, Jul 3, 2017 at 5:22 PM, J. Nick Koston via RT <bug-Net-DNS@rt.cpan.org> wrote:
> This code is in Net::DNS::Resolver::Base
The workaround is to set 'igntc' to prevent _send_tcp from being called, then do the tc check manually and retry.

On Mon Jul 03 19:49:51 2017, hookbot@gmail.com wrote:
> Is it possible to avoid calling _send_tcp when using bgsend?
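The workaround described above could look roughly like the following. This is a minimal sketch, not the reporter's actual code: the helper names `query_with_manual_tc_retry` and `_bg_query`, the polling interval, and the caller-supplied deadline are assumptions made for illustration. The resolver methods it calls (`igntc`, `usevc`, `bgsend`, `bgbusy`, `bgread`) and `$packet->header->tc` are existing Net::DNS calls, and `$resolver` is expected to be a Net::DNS::Resolver object. The sketch always tries UDP first and restores the caller's `usevc` setting afterwards.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper (not part of Net::DNS): run one background query and
# give up after $deadline_secs seconds instead of waiting indefinitely.
sub _bg_query {
    my ( $resolver, $deadline_secs, @query ) = @_;

    my $handle = $resolver->bgsend(@query) or return;
    my $give_up = time() + $deadline_secs;

    while ( $resolver->bgbusy($handle) ) {
        return if time() > $give_up;    # our own deadline
        sleep 0.005;                    # limit CPU burn while polling
    }
    return $resolver->bgread($handle);
}

# Hypothetical helper: set igntc so the resolver does not retry truncated
# answers itself, check the TC bit manually, and retry once over TCP.
sub query_with_manual_tc_retry {
    my ( $resolver, $deadline_secs, @query ) = @_;

    my $saved_usevc = $resolver->usevc;
    $resolver->igntc(1);

    my $packet;
    for my $use_tcp ( 0, 1 ) {          # UDP first, then TCP if truncated
        $resolver->usevc($use_tcp);
        $packet = _bg_query( $resolver, $deadline_secs, @query );
        last unless $packet && $packet->header->tc;
    }

    $resolver->usevc($saved_usevc);     # leave the resolver as we found it
    return $packet;
}
```

A call might look like `query_with_manual_tc_retry( $resolver, 10, qw(net-dns.org DNSKEY IN) )`, which returns a packet object or undef if nothing arrived before the deadline.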
From: rwfranks [...] acm.org
What version of Net::DNS are you using?

Have you tried using the latest version (1.11)?

Also, $resolver->tcp_timeout needs to be set to a reasonable value for what you are trying to do. (The default is _very_ long; there is no one value that will satisfy everybody.)
Subject: Re: [rt.cpan.org #122352] UDP bgsend retries a TCP send (blocking) instead of a non-blocking send when igntc is 0 and can stall forever
Date: Tue, 4 Jul 2017 10:13:23 -0600
To: bug-Net-DNS [...] rt.cpan.org
From: Rob Brown <bbb [...] cpan.org>
The tcp_timeout() setting does appear to be honored correctly even when using bgsend().

On Tue, Jul 4, 2017 at 8:52 AM, Dick Franks via RT <bug-Net-DNS@rt.cpan.org> wrote:
> What version of Net::DNS are you using?
>
> Have you tried using latest version (1.11)?
>
> Also $resolver->tcp_timeout needs to be set to a reasonable value for what
> you are trying to do. (The default is _very_ long; there is no one value
> that will satisfy everybody).
Even with 1.11 it never times out because it's doing the call with a blocking file handler and recvfrom() never returns.

    strace -p 515453
    Process 515453 attached
    recvfrom(9,

    ...hang....

    #0 0x00007f5f6a22dc83 in __recvfrom_nocancel () from /lib64/libpthread.so.0
    #1 0x00007f5f6a52bd53 in Perl_pp_sysread () from /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/5.24.1/x86_64-linux-64int/CORE/libperl.so
    #2 0x00007f5f6a4ec715 in Perl_runops_standard () from /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/5.24.1/x86_64-linux-64int/CORE/libperl.so
    #3 0x00007f5f6a48dc3d in perl_run () from /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/5.24.1/x86_64-linux-64int/CORE/libperl.so
    #4 0x0000000000475493 in main ()
From: rwfranks [...] acm.org
On Tue Jul 04 15:07:29 2017, bdraco wrote:
> Even with 1.11 it never times out because it's doing the call with a
> blocking file handler and recvfrom() never returns.
You claim that bgsend() blocks at the TCP socket->send()

    $ cat test.pl
    #!/usr/bin/perl
    #
    use 5.24.1;
    use Net::DNS 1.11;

    my $resolver = new Net::DNS::Resolver(
        debug       => 0,
        nameserver  => '185.49.140.63',
        udp_timeout => 10,
        tcp_timeout => 10,
        usevc       => 0,
    );

    my $handle = $resolver->bgsend(qw(net-dns.org DNSKEY IN));

    while ( $resolver->bgbusy($handle) ) {
        print "$handle not blocked: socktype ", $handle->socktype(), "\n";
        select( undef, undef, undef, 0.005 );    # limit CPU burn
    }

    my $packet = $resolver->bgread($handle);
    print 'answer from ', $resolver->answerfrom(), "\n" if $packet;

    exit;


    $ perl -w test.pl
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fac718) not blocked: socktype 2
    IO::Socket::IP=GLOB(0x8fad1e8) not blocked: socktype 1
    IO::Socket::IP=GLOB(0x8fad1e8) not blocked: socktype 1
    IO::Socket::IP=GLOB(0x8fad1e8) not blocked: socktype 1
    IO::Socket::IP=GLOB(0x8fad1e8) not blocked: socktype 1
    IO::Socket::IP=GLOB(0x8fad1e8) not blocked: socktype 1
    answer from 185.49.140.63

This does not appear to be supported by experimental evidence.

If this test is not repeatable on your system, the local TCP probably does not support non-blocking sockets.

If the test gives a similar result, the problem lies in your application code.

If you disagree, please provide a _small_ counter-example which supports your argument.
This is looking more like an underlying system problem as it only appears to happen on RHEL 6/CentOS 6 systems.

I've also validated that the application code never does a blocking send (added a die if we try, to be sure):

    {
        no warnings 'redefine';
        *Net::DNS::Resolver::Base::send      = sub { die "Ensure we never do a foreground send because it can block forever"; };
        *Net::DNS::Resolver::Base::_send_tcp = sub { die "Ensure we never do a foreground send because it can block forever"; };
        *Net::DNS::Resolver::Base::_send_udp = sub { die "Ensure we never do a foreground send because it can block forever"; };
    }

I'm continuing to dig in on this and will update this case when I have more information.

On Wed Jul 05 12:19:49 2017, rwfranks@acm.org wrote:
> If you disagree, please provide a _small_ counter-example which
> supports your argument.
It's looking more like a kernel bug, as the problem only happens if the file handle was previously used to read netlink data from the kernel (even after close).

    303262 recvmsg(10, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\36V^Y\236\240\4\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
    303262 close(10) = 0

    --- Used for netlink above ----
    --- Now 10 is reused for TCP ---

    303262 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
    303262 ioctl(10, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffcbd3b7280) = -1 EINVAL (Invalid argument)
    303262 lseek(10, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    303262 ioctl(10, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffcbd3b7280) = -1 EINVAL (Invalid argument)
    303262 lseek(10, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    303262 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
    303262 bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    303262 fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
    303262 fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    303262 connect(10, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("121.51.128.164")}, 16) = -1 EINPROGRESS (Operation now in progress)
    303262 select(16, NULL, [10], [10], {3, 0}) = 1 (left {1, 777870})
    303262 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
    303262 fcntl(10, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
    303262 fcntl(10, F_SETFL, O_RDWR) = 0
    303262 getpeername(10, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("121.51.128.164")}, [16]) = 0
    303262 getpeername(10, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("121.51.128.164")}, [16]) = 0
    303262 sendto(10, "\0/~,\0\0\0\1\0\0\0\0\0\1\7webmail\6szsqcn\3com\0\0\1\0\1\0\0)\4\0\0\0\0\0\0\0", 49, 0, NULL, 0) = 49

    ---- SNIP ----

    303262 select(16, [8 9 10 11 12], NULL, NULL, {3, 0}) = 5 (in [8 9 10 11 12], left {2, 999995})
    303262 select(16, [10], NULL, NULL, {0, 200000}) = 1 (in [10], left {0, 199998})
    303262 select(16, [10], NULL, NULL, {0, 200000}) = 1 (in [10], left {0, 199998})
    303262 select(16, [10], NULL, NULL, {0, 0}) = 1 (in [10], left {0, 0})
    303262 recvfrom(10, "\0|", 2, 0, {sa_family=0x7465 /* AF_??? */, sa_data="h0:cp15\0\0\0@\0\0\0"}, [0]) = 2
    303262 recvfrom(10,    <== STALL
From: rwfranks [...] acm.org
On Thu Jul 06 11:42:22 2017, bdraco wrote:
> On Wed Jul 05 13:51:58 2017, bdraco wrote:
> > This is looking more like an underlying system problem as it only
> > appears to happen on RHEL 6/CentOS 6 systems.
Are we absolved from blame?
On Thu Jul 06 12:46:25 2017, rwfranks@acm.org wrote:
> Are we absolved from blame?
Changing _read_tcp to accept $flags and passing in MSG_DONTWAIT from _bgread seems to make the problem go away.

Is it possible to get a fragmented packet here where the size is readable but the rest is still coming down the wire?

    #
    # Usage: $data = _read_tcp($socket);
    #
    sub _read_tcp {
        my($socket, $flags) = @_;

        my ( $s1, $s2 );
        $socket->recv( $s1, 2, $flags );                 # one lump
        $socket->recv( $s2, 2 - length $s1, $flags );    # or two?
        my $size = unpack 'n', pack( 'a*a*@2', $s1, $s2 );

        my $buffer = '';
        while ( ( my $read = length $buffer ) < $size ) {

            # During some of my tests recv() returned undef even
            # though there was no error. Checking the amount
            # of data read appears to work around that problem.

            my $recv_buf;
            $socket->recv( $recv_buf, $size - $read, $flags );

            $buffer .= $recv_buf || last;
        }
        return $buffer;
    }
It looks like this is to blame: https://rt.cpan.org/Public/Bug/Display.html?id=112334

I'm attempting to verify now.
CC: Rob Brown <bbb [...] cpan.org>
Subject: Re: [rt.cpan.org #122352] UDP bgsend retries a TCP send (blocking) instead of a non-blocking send when igntc is 0 and can stall forever
Date: Thu, 6 Jul 2017 14:51:56 -0600
To: bug-Net-DNS [...] rt.cpan.org
From: Rob Brown <bbb [...] cpan.org>
Nick,

Does this problem with Net::DNS still occur with IO::Socket::IP 0.39? Or only with IO::Socket::IP 0.37?

On Thu, Jul 6, 2017 at 2:42 PM, J. Nick Koston via RT <bug-Net-DNS@rt.cpan.org> wrote:
> It looks like this is to blame
>
> https://rt.cpan.org/Public/Bug/Display.html?id=112334
>
> I'm attempting to verify now.
The problem appears to be solved with 0.39. I'm letting this run over the weekend to ensure there are no more stalls. I'll update next week to confirm.

On Thu Jul 06 16:52:09 2017, BBB wrote:
> Nick,
>
> Does this problem with Net::DNS still occur with IO::Socket::IP 0.39? Or
> only with IO::Socket::IP 0.37?
Subject: Re: [rt.cpan.org #122352] UDP bgsend retries a TCP send (blocking) instead of a non-blocking send when igntc is 0 and can stall forever
Date: Fri, 07 Jul 2017 14:30:49 +0000
To: bug-Net-DNS [...] rt.cpan.org
From: Rob Brown <bbb [...] cpan.org>
Well, I don't have any experience with Netlink, so I wouldn't even know how to test this.

On Fri, Jul 7, 2017 at 8:28 AM J. Nick Koston via RT <bug-Net-DNS@rt.cpan.org> wrote:
> The problem appears to be solved with 0.39. I'm letting this run over
> the weekend to ensure there are no more stalls. I'll update next week to
> confirm.
From: rwfranks [...] acm.org
On Thu Jul 06 14:33:24 2017, bdraco wrote:
> Changing _read_tcp to accept $flags and passing in MSG_DONTWAIT from
> _bgread seems to make the problem go away.

That may well be so, but socket->recv() options are system-specific and inherently non-portable.

> Is it possible to get a fragmented packet here where the size is
> readable but the rest is still coming down the wire?

Worse, we have seen examples where the fragmentation boundary occurred between the length bytes!

I suspect this happens when a firewall dribbles out bytes slowly to keep the connection alive while it inspects the packet contents.
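DNS over TCP prefixes each message with a two-byte, big-endian length field (RFC 1035, section 4.2.2), so a reader that wants to survive the split described above has to buffer bytes until both the prefix and the body are complete. A minimal sketch of that framing step, independent of Net::DNS; the function name `extract_dns_tcp_message` is hypothetical, and this is not how `_read_tcp` is implemented:

```perl
use strict;
use warnings;

# Hypothetical helper: take one complete length-prefixed message out of an
# accumulation buffer. Returns the message body, or undef if more bytes are
# needed. The buffer is modified in place: consumed bytes are removed and any
# bytes belonging to a following message are kept.
sub extract_dns_tcp_message {
    my ($buffer_ref) = @_;

    # The two length bytes may themselves arrive separately.
    return if length($$buffer_ref) < 2;

    my $size = unpack 'n', $$buffer_ref;    # 16-bit big-endian length
    return if length($$buffer_ref) < 2 + $size;    # body still in flight

    my $message = substr $$buffer_ref, 2, $size;
    substr( $$buffer_ref, 0, 2 + $size ) = '';
    return $message;
}
```

The caller appends whatever each read returns to the buffer and calls the function again; it yields nothing until the final byte of the message has arrived, however the bytes were split on the wire.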
On Fri Jul 07 12:43:06 2017, rwfranks@acm.org wrote:
> Worse, we have seen examples where the fragmentation boundary occurred
> between the length bytes!
>
> I suspect this happens when a firewall dribbles out bytes slowly to
> keep the connection alive while it inspects the packet contents.
You could have a separate handler for non-blocking reads over TCP. Read in as much as you can in an IO::Select can_read-checked loop, and then stop once you have a full response or a timeout.

As for the hanging, it appears switching to IO::Socket::IP 0.39 has solved this. I'm going to check again on Monday to be sure; however, there have been no stalls since upgrading IO::Socket::IP.
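The suggestion above, a can_read-checked loop that stops on a full response or a timeout, could be sketched as follows. This is an illustration under stated assumptions, not code from Net::DNS: the function name `read_tcp_message_with_deadline` and its interface are hypothetical, and it relies only on core modules (IO::Select, Time::HiRes) plus the `blocking` method that IO::Handle provides.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper, not part of Net::DNS: read one length-prefixed DNS
# message from a TCP socket, waiting at most $timeout seconds in total.
# Returns the message body, or undef on timeout, EOF or read error.
# Note: the socket is left in non-blocking mode.
sub read_tcp_message_with_deadline {
    my ( $socket, $timeout ) = @_;

    $socket->blocking(0);    # a short read must never block
    my $select   = IO::Select->new($socket);
    my $deadline = time() + $timeout;
    my $buffer   = '';

    while (1) {
        my $have = length $buffer;
        my $size = $have >= 2 ? unpack( 'n', $buffer ) : undef;

        # Full response: 2-byte length prefix plus $size bytes of body.
        return substr( $buffer, 2, $size )
            if defined $size && $have >= 2 + $size;

        my $remaining = $deadline - time();
        return if $remaining <= 0;                    # timed out
        next unless $select->can_read($remaining);    # re-check the deadline

        # Ask only for what this message still needs, so bytes belonging
        # to a following message stay in the socket.
        my $want = defined $size ? 2 + $size - $have : 2 - $have;
        my $got = sysread( $socket, $buffer, $want, $have );

        next if !defined $got && ( $!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR} );
        return unless $got;    # undef = error, 0 = EOF
    }
}
```

Because the deadline is recomputed on every pass, a peer that dribbles out one byte at a time cannot extend the wait beyond `$timeout`. As noted later in the thread, something like this would still need to be integrated with bgbusy() to be useful to callers.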
From: rwfranks [...] acm.org
On Sat Jul 08 18:54:04 2017, bdraco wrote:
> You could have a separate handler for non-blocking reads over tcp.
> Read in as much as you can in an IO::Select can_read checked loop and
> then stop once you have a full response or timeout.

This would need to be integrated with bgbusy() to be effective from a user POV. Unless this can be shown to affect a significant number of users, the additional complication would far outweigh the possible benefit.

> As for the hanging, it appears switching to IO::Socket::IP 0.39 has
> solved this. I'm going to check again on monday to be sure, however
> no stalls since upgrading IO::Socket::IP.

I am assuming 0.38 (bundled with 5.26.0) would be equally effective. The change between 0.38 and 0.39 seems to involve support for disabling V6_ONLY and appears not to be relevant here.
0.38 does appear to resolve the problem as well. I believe IO::Socket::IP is the root cause here. I think this bug can be closed out.

On Sun Jul 09 06:51:24 2017, rwfranks@acm.org wrote:
> I am assuming 0.38 (bundled with 5.26.0) would be equally effective.
> Change between 0.38 and 0.39 seems to involve support for disabling
> V6_ONLY and appears not to be relevant here.
From: rwfranks [...] acm.org
On Tue Jul 11 18:06:54 2017, bdraco wrote:
> 0.38 does appear to resolve the problem as well. I believe
> IO::Socket::IP is the root cause here. I think this bug can be
> closed out.
Thanks
Problem lies somewhere else