Subject: memory inefficiencies on http transport
Large XML responses quickly grow about tenfold in memory because they
are passed by value through the various layers from transport to
deserialization. Here is a summary of where memory use increases in the
module code, annotated with the virtual memory usage of the perl
process, for an XML SOAP response 27,285,916 bytes long.
------------- begin of debug session -------------
# 16Mb perl
SOAP::Transport::HTTP::Client::send_receive(/usr/lib/perl5/site_perl/5.8.8/SOAP/Transport/HTTP.pm:199):
199: $self->http_response($self->SUPER::request($self->http_request));
# 44Mb perl
200: SOAP::Trace::transport($self->http_response);
# 44Mb perl
201: SOAP::Trace::debug($self->http_response->as_string);
# 139Mb perl
233: ? die "Can't understand returned Content-Encoding (@{[$self->http_response->content_encoding]})\n"
234: : $self->http_response->content;
# 188Mb perl
235: $self->http_response->content_type =~ m!^multipart/!i ?
236: join("\n", $self->http_response->headers_as_string, $content)
237: : $content;
# 212Mb perl
SOAP::Lite::call(/usr/lib/perl5/site_perl/5.8.8/SOAP/Lite.pm:3382):
3382: return $response if $self->outputxml;
# 236Mb perl
------------- end of debug session -------------
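The growth in the trace above is consistent with each layer taking its
own copy of the body. A minimal sketch of the difference between handing
a large string around by value and by reference, in plain Perl with no
SOAP::Lite involved. The sub names and the 1,000,000-byte payload are
made up for illustration, and perls from 5.20 on defer some of these
copies (copy-on-write), so the effect is strongest on older perls such
as the 5.8.8 used here:

```perl
use strict;
use warnings;

# Stand-in for a large response body (hypothetical size).
my $body = 'x' x 1_000_000;

# By value: the sub takes a private copy of its argument and then builds
# yet another full-size string to return, similar in spirit to layers
# that call content() or as_string() and pass the result along.
sub wrap_by_value {
    my ($content) = @_;
    return "header\n" . $content;
}

# By reference: only a reference is passed; the body is not duplicated.
sub length_by_ref {
    my ($content_ref) = @_;
    return length $$content_ref;
}

my $wrapped = wrap_by_value($body);    # a second full-size string now exists
my $len     = length_by_ref(\$body);   # no new full-size string

print length($wrapped), " ", $len, "\n";
```

HTTP::Message also offers content_ref(), which returns a reference to
the body instead of a copy; whether the layers above could use it
without other changes is not something I have checked.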
When SOAP::Lite starts deserializing the response, usage progressively
climbs past 450Mb; the system thrashes to a slow death and never
finishes processing.
My workaround was to write a SAX-style user agent, plugged in through
the package variable $SOAP::Transport::HTTP::Client::USERAGENT_CLASS.
This is possible because the variable is only evaluated in
SOAP::Transport::HTTP::Client::new():
------------- begin of UASAX.pm -------------
package UASAX;
use strict;
use warnings;
use XML::Parser;
use SOAP::Transport::HTTP;
use vars qw(@ISA);

# Inherit from the user-agent class SOAP::Lite is configured to use,
# then register this package in its place.
eval("require $SOAP::Transport::HTTP::Client::USERAGENT_CLASS")
    or die "Could not load UserAgent: $@";
push @ISA, $SOAP::Transport::HTTP::Client::USERAGENT_CLASS;
$SOAP::Transport::HTTP::Client::USERAGENT_CLASS = __PACKAGE__;

# example settings
our $CAPTURE = '^(?:element1|element2|element3)$';  # elements whose text is stored
our $EXPAND  = '^(?:subelement1|subelement2)$';     # elements linked by href

sub request
{
    # $arg (the caller's content handler, if any) is replaced by the
    # parsing callback below.
    my ($self, $request, $arg, @other) = @_;
    my $base_parser = XML::Parser->new( NoExpand => 1, Handlers => {
        Init => sub {
            my $self = shift;
            # print "xml parsing started\n";
            $self->{response} = [];
        },
        Final => sub {
            # print "xml parsing ended\n";
        },
        Start => sub {
            my $self = shift;
            $self->{element} = shift;
            my %attr = @_;
            # a multiRef element starts a new record; its index is the
            # id attribute minus its two-character prefix
            $self->{id} = substr $attr{id}, 2
                if $self->{element} eq 'multiRef';
            # link the record named by href (minus its three-character
            # prefix) into the current record
            $self->{response}->[$self->{id}]->{$self->{element}} =
                $self->{response}->[substr $attr{href}, 3] ||= {}
                if $EXPAND and $self->{element} =~ /$EXPAND/;
        },
        Char => sub {
            my $self = shift;
            # store the text of captured elements (if a text node is
            # delivered in several pieces, only the last piece is kept)
            $self->{response}->[$self->{id}]->{$self->{element}} = shift
                if $CAPTURE and $self->{element} =~ /$CAPTURE/;
        },
        End => sub {
            my $self = shift;
            undef $self->{element};
        },
    } );
    # incremental parser, kept on the request so the callback can reach it
    $request->{parser} = $base_parser->parse_start();
    my $response = $self->SUPER::request($request, sub {
        my ($chunk, $response, $protocol) = @_;
        # print length($chunk), "\n";
        $response->request->{parser}->parse_more($chunk);
        # $response->{total} += length $chunk;
        # open O, ">> data.xml" and print O $chunk;
    }, @other);
    $request->{parser}->parse_done();
    # hand the parsed structure back in place of the raw XML
    $response->content( $request->{parser}->{response} );
    $response
}
1;
------------- end of UASAX.pm -------------
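The core of UASAX.pm is feeding the parser one chunk at a time instead
of buffering the whole document. A minimal standalone sketch of that
idea, using only XML::Parser's parse_start / parse_more / parse_done and
no HTTP at all. The document, the element name "item" and the 8-byte
chunk size are made up for illustration, and the Char handler appends
rather than assigns because expat may deliver one text node in several
calls:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical element whose text should be captured.
my $capture = qr/^item$/;

my (@items, $current);
my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $element) = @_;
        $current = $element;
        push @items, '' if $element =~ $capture;   # open a new slot
    },
    Char => sub {
        my ($expat, $text) = @_;
        # Append: text split across chunks arrives in several calls.
        $items[-1] .= $text
            if defined $current and $current =~ $capture;
    },
    End => sub { undef $current },
});

# parse_start returns an incremental (XML::Parser::ExpatNB) parser.
my $nb = $parser->parse_start;

# Simulate network chunks by slicing a small document into 8-byte pieces.
my $xml = '<list><item>alpha</item><item>beta</item></list>';
for (my $pos = 0; $pos < length $xml; $pos += 8) {
    $nb->parse_more(substr $xml, $pos, 8);
}
$nb->parse_done;

print join(',', @items), "\n";
```

Only the captured text is kept in memory; the raw XML of each chunk can
be discarded as soon as parse_more returns.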
The call to the service stays the same for me, with two small
changes, commented below:
------------- begin of the SOAP call changes -------------
use UASAX; ### replace LWP::UserAgent on SOAP::Transport::HTTP
my $response = eval {
    my $service = SOAP::Lite
        ->uri ("urn:MyService")
        ->proxy ("http://myserver.com/MyObject");
    $service->outputxml (1); ### avoid reprocessing response
    $service->mycall()
};
------------- end of the SOAP call changes -------------
I don't know how much my solution helps with the general problem, but
I hope it sheds some light on the heavy memory use of the UserAgent
path. The worst part is that most of this memory is not freed after the
communication, even after undefining the soap service and response
references, possibly because of global variables or, who knows, cyclic
references.
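To illustrate the cyclic-reference possibility (this is a guess about
the cause, not something I have traced in SOAP::Lite): a minimal sketch
of how two structures that point at each other survive after the last
outside reference goes away, and how Scalar::Util's weaken() breaks the
cycle. The Node class and the counters are made up for the example:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

{
    # Hypothetical class that counts how many of its objects are freed.
    package Node;
    sub new     { return bless {}, shift }
    sub DESTROY { $destroyed++ }
}

# Case 1: parent and child hold strong references to each other, so
# their reference counts never reach zero when the scope ends.
{
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child}  = $child;
    $child->{parent}  = $parent;
}
my $after_cycle = $destroyed;   # nothing was freed

# Case 2: same structure, but the back-reference is weakened, so both
# objects are freed as soon as the scope ends.
{
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child}  = $child;
    $child->{parent}  = $parent;
    weaken($child->{parent});
}
my $after_weak = $destroyed;    # both objects from case 2 were freed

print "$after_cycle $after_weak\n";
```

Even when memory is freed this way, perl generally keeps it for reuse
rather than returning it to the operating system, so the process size
may not shrink either way.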