Subject: don't call decoded_content if content is already unicode encoded
When a URI or HTTP::Response object is passed as an argument to scrape(), $stuff->decoded_content is called unconditionally to obtain the $html content. If $stuff->content is already UTF-8 encoded, plain $stuff->content should be used instead; otherwise "Wide character" errors may follow.
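To illustrate the difference between the two accessors, here is a minimal sketch. It assumes a hand-built HTTP::Response, so no network access is needed; the sample text and the variable names are made up for illustration and are not taken from Web::Scraper. It only shows that content returns UTF-8 octets while decoded_content returns a Perl character string, and that printing the latter to a handle without an encoding layer triggers the "Wide character" warning.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical sample data: a character string with a code point above 0xFF.
my $text  = "caf\x{e9} \x{263a}";
my $bytes = encode('UTF-8', $text);    # octets as they would arrive on the wire

# Build the response by hand instead of fetching a real page.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    $bytes,
);

my $raw     = $res->content;            # raw octets, still UTF-8 encoded
my $decoded = $res->decoded_content;    # decoded into a character string

printf "charset: %s\n", $res->content_charset // '(unknown)';
printf "content:         %d octets\n",     length $raw;
printf "decoded_content: %d characters\n", length $decoded;

# Print the character string to a handle that has no encoding layer
# and collect whatever warnings perl emits while doing so.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $buf = '';
    open my $fh, '>', \$buf or die "open: $!";
    print {$fh} $decoded;
    close $fh;
}
print "warning: $_" for @warnings;
```

If the output handle is under the caller's control, adding an encoding layer (for example binmode($fh, ':encoding(UTF-8)')) is the usual way to print a character string without this warning.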
Here's a patch ($VERSION = '0.37'):
diff --git a/lib/Web/Scraper.pm b/lib/Web/Scraper.pm
index aca019c..7ad9b7f 100644
--- a/lib/Web/Scraper.pm
+++ b/lib/Web/Scraper.pm
@@ -64,7 +64,10 @@ sub scrape {
         return $self->scrape($res, $stuff->as_string);
     } elsif (blessed($stuff) && $stuff->isa('HTTP::Response')) {
         if ($stuff->is_success) {
-            $html = $stuff->decoded_content;
+            $html =
+              $stuff->content_charset =~ /utf\-8/i
+              ? $stuff->content
+              : $stuff->decoded_content;
         } else {
             croak "GET " . $stuff->request->uri . " failed: ", $stuff->status_line;
         }