Subject: | warn if parsehttp() has problems fetching URL |
When calling parsehttp() (internally, via XML::FeedPP), I mistakenly passed it a bad URL (one that
returned a 404) and got this error, which originated in XML::TreePP:
Invalid tag sequence: <hr></body> at [Perl path]/XML/FeedPP.pm line 549.
This happened because XML::TreePP went ahead and tried to parse the HTML body of the Apache 404
page. It took me a while to realize that it was choking on that error page and not on a real XML feed.
The attached patch simply warns the next silly programmer who tries to give bad URLs to
parsehttp().
parsehttp() hands the request off to either parsehttp_lwp() or parsehttp_lite(), so the patch adds a warning to each.
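
For anyone skimming, here is a minimal standalone sketch of the check the patch adds. The helper name warn_unless_2xx() and the example.com URLs are made up for illustration and are not part of XML::TreePP. It uses the same m/^2../ test as the patch so that it runs without any network access.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::TreePP): applies the same test as
# the patch. Returns 1 for a 2xx status, otherwise warns and returns 0.
sub warn_unless_2xx {
    my ( $code, $url ) = @_;
    if ( $code !~ m/^2../ ) {
        warn("$code status returned by $url\n");
        return 0;
    }
    return 1;
}

# A 200 passes silently; a 404 produces a warning on STDERR.
warn_unless_2xx( 200, 'http://example.com/feed.xml' );
warn_unless_2xx( 404, 'http://example.com/missing.xml' );
```

The warning does not stop the parse from failing afterwards, but it puts the HTTP status in front of the "Invalid tag sequence" error, which is the hint I was missing.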
------
XML-TreePP-0.41
Perl 5.14.2
Linux [hostname] 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64
x86_64 x86_64 GNU/Linux
Subject: | TreePP.patch |
--- TreePP.pm.orig 2013-02-13 14:44:14.494335340 -0500
+++ TreePP.pm 2013-02-13 15:28:11.474306967 -0500
@@ -649,6 +649,9 @@
     $req->content($body) if defined $body;
     my $res = $ua->request($req);
     my $code = $res->code();
+    if ( $code !~ m/^2../ ) {
+        warn("$code status returned by $url\n");
+    }
     my $text;
     if ( $res->can( 'decoded_content' )) {
         $text = $res->decoded_content( charset => 'none' );
@@ -681,6 +684,9 @@
     }
     $http->{content} = $body if defined $body;
     my $code = $http->request($url) or return;
+    if ( $code !~ m/^2../ ) {
+        warn("$code status returned from $url\n");
+    }
     my $text = $http->body();
     my $tree = $self->parse( \$text );
     wantarray ? ( $tree, $text, $code ) : $tree;