This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 111537
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: stefan.enzinger [...] student.tugraz.at
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: getprint -> high cpu usage
Date: Fri, 29 Jan 2016 06:37:41 +0100
To: bug-libwww-perl [...] rt.cpan.org
From: Stefan Enzinger <stefan.enzinger [...] student.tugraz.at>
Hello, thanks for this great project!

My system:
ubuntu 14.04.3

This is perl 5, version 18, subversion 2 (v5.18.2) built for
i686-linux-gnu-thread-multi-64int
(with 41 registered patches, see perl -V for more detail)

Linux lendtv 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:58:41 UTC
2016 i686 athlon i686 GNU/Linux

------------

I'm new to perl and especially libwww-perl.

Just calling getprint() for a huge (1GB+) file results in high CPU usage.
Is this known behaviour and are there any workarounds?

My ultimate goal is to mimic a proxy by requesting an http:// file from a
LAN resource and providing it to the www.
(For use in mythweb in case you've heard of it)

This code reproduces high CPU usage when called in my shell:
* A file is downloaded from the www (linux iso)
* redirect output to /dev/null and watch output of "top"

----------------

#!/usr/bin/perl

use LWP::Simple qw(!head);

my $dlurl ="http://mirror.inode.at/grml//grml96-full_2014.11.iso";
my $name = "filename.iso";

# Retrieve header to determine size and type
my ($type, $size) = LWP::Simple::head("$dlurl");

unless ($type and $size){
    print header(),
        "file could not be retrieved from remote backend";
    exit 0;
}

# Set the new header
# print header(-type => $type,
#     -Content_length => $size,
#     -Content_disposition => " attachment; filename=\"$name\"",
# );

# Passthrough the requested file
my $status = getprint("$dlurl");

exit 0;

On 2016-01-29 00:38:08, stefan.enzinger@student.tugraz.at wrote:
> Just calling getprint() for huge (1GB+ file) results in high cpu usage.
> Is this known behaviour and are there any workarounds?

You should probably work with callbacks (:content_cb). See
https://metacpan.org/pod/LWP::UserAgent#ua-get-url

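A minimal sketch of what the callback approach could look like, assuming the documented `:content_cb` option of `LWP::UserAgent->get`. The helper name `stream_url` is hypothetical and not part of LWP; it only illustrates handing each chunk to a caller-supplied sink instead of buffering the whole body in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: stream the body of $url chunk by chunk.
# Each chunk is passed to the code reference $sink as it arrives.
# Returns the HTTP::Response object (its content stays empty).
sub stream_url {
    my ($url, $sink) = @_;
    my $ua = LWP::UserAgent->new;
    return $ua->get(
        $url,
        ':content_cb' => sub {
            my ($chunk, $response, $protocol) = @_;
            $sink->($chunk);
        },
    );
}

# Usage: perl stream.pl URL > output-file
if (@ARGV) {
    binmode STDOUT;
    my $res = stream_url($ARGV[0], sub { print STDOUT $_[0] });
    die $res->status_line, "\n" unless $res->is_success;
}
```

Passing a code reference as the sink keeps the sketch independent of where the data ends up (STDOUT, a file, a socket).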
Subject: Re: [rt.cpan.org #111537] getprint -> high cpu usage
Date: Mon, 8 Feb 2016 04:13:41 +0100
To: bug-libwww-perl [...] rt.cpan.org
From: Stefan Enzinger <stefan.enzinger [...] student.tugraz.at>
On 2016-01-29 21:14, Slaven_Rezic via RT wrote:
> Probably you should work with callbacks (:content_cb). See
> https://metacpan.org/pod/LWP::UserAgent#ua-get-url

I looked in the source and getprint actually works with callbacks:
https://github.com/libwww-perl/libwww-perl/blob/c1006d428cef2290d1b41e4a9cb7b2981262caf6/lib/LWP/Simple.pm#L61

The solution was to increase :read_size_hint. The default seemed to be 4096. Small changes didn't make much difference, so I set it to 4*1024*1024 and that seems to fix it. Throughput doubled (fast ethernet) and CPU usage dropped.

I'd be glad though if anybody could suggest a good value for :read_size_hint for 3-10GB files.

Thanks for the hint.

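The change described above might look roughly like the following sketch, which assumes the `:content_cb` and `:read_size_hint` options of `LWP::UserAgent->get`. The function name `getprint_with_hint` is hypothetical (LWP::Simple's own getprint takes no size argument), and the 4 MiB value is only the one reported in this ticket, not a recommendation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical getprint-like helper: stream $url to the filehandle $out,
# asking LWP to read in blocks of roughly $hint bytes.
# Returns the HTTP status code of the response.
sub getprint_with_hint {
    my ($url, $out, $hint) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get(
        $url,
        ':content_cb'     => sub { print {$out} $_[0] },
        ':read_size_hint' => $hint,
    );
    return $res->code;
}

# Usage: perl getprint-hint.pl URL > output-file
if (@ARGV) {
    binmode STDOUT;
    my $code = getprint_with_hint($ARGV[0], \*STDOUT, 4 * 1024 * 1024);
    warn "request returned status $code\n" unless $code == 200;
}
```

The hint is only a suggestion to the protocol module; the chunks actually delivered to the callback may be smaller.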
On 2016-02-07 22:13:58, stefan.enzinger@student.tugraz.at wrote:
> The solution was to increase :read_size_hint. The default seemed to be
> at 4096. Small changes didn't make much difference, so I set it to
> 4*1024*1024 and that seems to fix it. Throughput doubled (fast
> ethernet) and cpu usage dropped.
>
> I'd be glad though if anybody could suggest a good value for
> :read_size_hint for 3-10GB files

I can imagine that the best value for :read_size_hint depends on many factors: the actual hardware, OS, perl version, network protocol ...

Attached is a benchmark script which fetches a large local file using LWP and prints the CPU times for this task. In my setup it seems that read_size_hint=>1MB gives the best results. With higher values the system CPU time goes up (but this may be caused by using a file:// URL). With lower values the user CPU time goes up because of perl interpreter overhead.

Regards,
Slaven

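One way to make the interpreter overhead visible is to count how often the content callback runs. If every read returned a full block, a 1140644368-byte file would need about 278,000 callback invocations at a hint of 4096 but only about 1,100 at 1 MiB. The sketch below assumes the `:content_cb` and `:read_size_hint` options of `LWP::UserAgent->get`; the name `count_callbacks` is hypothetical. Reads from a network socket may return less than the hint, so real counts can be higher.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url, discard the body, and return how many
# times LWP invoked the content callback for a given read size hint.
sub count_callbacks {
    my ($url, $hint) = @_;
    my $calls = 0;
    my $res = LWP::UserAgent->new->get(
        $url,
        ':content_cb'     => sub { $calls++ },
        ':read_size_hint' => $hint,
    );
    die $res->status_line, "\n" unless $res->is_success;
    return $calls;
}

# Usage: perl count-callbacks.pl URL
if (@ARGV) {
    my $url = shift;
    printf "read_size_hint %8d: %d callbacks\n", $_, count_callbacks($url, $_)
        for 4096, 64 * 1024, 1024 * 1024;
}
```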
Subject: lwp-readsizehint-bench.pl
#!/usr/bin/perl -w

use strict;
use autodie;

use HTTP::Request;
use LWP::UserAgent;

my $file = shift or die "Test file?";
my $file_size = -s $file;
print STDERR "Size of file: $file_size, perl: $], LWP: $LWP::VERSION, OS: $^O\n";

my $url = "file://$file";

my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $url);
my $callback = sub { print $_[0] };

my $ignore = 1;
for my $sizehint (1024*1024, # ignored, just for warmup
                  1024*1024*1024, 50*1024*1024, 1024*1024, 64*1024, 4096) {
    my(undef,undef,$cuser0,$csystem0) = times;
    my $pid = fork;
    if ($pid == 0) {
        open STDOUT, '>', '/dev/null';
        $ua->request($req, $callback, $sizehint);
        exit;
    }
    waitpid $pid, 0;
    if ($ignore) { $ignore = 0; next }
    my(undef,undef,$cuser1,$csystem1) = times;
    my $delta_cuser = ($cuser1-$cuser0);
    my $delta_csystem = ($csystem1-$csystem0);
    printf STDERR "read_size_hint: %10d, child user time: %7.4f, child system time: %7.4f, sum: %7.4f\n",
        $sizehint, $delta_cuser, $delta_csystem, ($delta_cuser+$delta_csystem);
}

__END__

=head1 RESULTS

Test file: 1.1GB

    $ perl5.23.7 lwp-readsizehint-bench.pl SDV_0228.MP4
    Size of file: 1140644368, perl: 5.023007, LWP: 6.15, OS: freebsd
    read_size_hint: 1073741824, child user time:  0.1562, child system time:  0.8828, sum:  1.0391
    read_size_hint:   52428800, child user time:  0.1406, child system time:  0.8906, sum:  1.0312
    read_size_hint:    1048576, child user time:  0.2266, child system time:  0.6016, sum:  0.8281
    read_size_hint:      65536, child user time:  0.4844, child system time:  0.3672, sum:  0.8516
    read_size_hint:       4096, child user time:  4.2812, child system time:  1.1719, sum:  5.4531

    $ perl5.22.1 lwp-readsizehint-bench.pl SDV_0228.MP4
    Size of file: 1140644368, perl: 5.022001, LWP: 6.15, OS: freebsd
    read_size_hint: 1073741824, child user time:  0.1406, child system time:  0.6953, sum:  0.8359
    read_size_hint:   52428800, child user time:  0.1094, child system time:  0.8203, sum:  0.9297
    read_size_hint:    1048576, child user time:  0.1406, child system time:  0.4219, sum:  0.5625
    read_size_hint:      65536, child user time:  0.3359, child system time:  0.3594, sum:  0.6953
    read_size_hint:       4096, child user time:  3.3828, child system time:  0.8125, sum:  4.1953

    $ perl5.18.4 lwp-readsizehint-bench.pl SDV_0228.MP4
    Size of file: 1140644368, perl: 5.018004, LWP: 6.08, OS: freebsd
    read_size_hint: 1073741824, child user time:  0.1250, child system time:  0.8125, sum:  0.9375
    read_size_hint:   52428800, child user time:  0.1484, child system time:  0.8125, sum:  0.9609
    read_size_hint:    1048576, child user time:  0.1250, child system time:  0.5000, sum:  0.6250
    read_size_hint:      65536, child user time:  0.4219, child system time:  0.3203, sum:  0.7422
    read_size_hint:       4096, child user time:  4.1641, child system time:  1.0156, sum:  5.1797

    $ perl5.12.4 lwp-readsizehint-bench.pl SDV_0228.MP4
    Size of file: 1140644368, perl: 5.012004, LWP: 6.08, OS: freebsd
    read_size_hint: 1073741824, child user time:  0.1875, child system time:  0.9219, sum:  1.1094
    read_size_hint:   52428800, child user time:  0.1641, child system time:  0.9844, sum:  1.1484
    read_size_hint:    1048576, child user time:  0.1641, child system time:  0.6250, sum:  0.7891
    read_size_hint:      65536, child user time:  0.4531, child system time:  0.5078, sum:  0.9609
    read_size_hint:       4096, child user time:  4.6484, child system time:  1.1406, sum:  5.7891

Test file: 6.8GB

    $ perl5.22.1 lwp-readsizehint-bench.pl SDV_0228.MP4.dv
    Size of file: 6852096000, perl: 5.022001, LWP: 6.15, OS: freebsd
    read_size_hint: 1073741824, child user time:  0.8516, child system time:  7.1406, sum:  7.9922
    read_size_hint:   52428800, child user time:  0.9453, child system time:  8.4766, sum:  9.4219
    read_size_hint:    1048576, child user time:  1.0000, child system time:  4.8516, sum:  5.8516
    read_size_hint:      65536, child user time:  3.3906, child system time:  3.7266, sum:  7.1172
    read_size_hint:       4096, child user time: 29.9531, child system time:  8.3594, sum: 38.3125

=cut