Subject: Incorrect Content-Length header when posting UTF8 data
Using LWP::Protocol::http.pm with the following CVS version string: http.pm,v 1.63 2001/12/14 19:33:52 gisle
Perl version: v5.6.1 built for sun4-solaris
OS version: SunOS amsisfds01 5.7 Generic_106541-19 sun4u sparc SUNW,Ultra-80
I believe there is a problem with the following code snippet:
---
if (defined($$content_ref) && length($$content_ref)) {
    $has_content++;
    if (!defined($clen) || $clen ne length($$content_ref)) {
        if (defined $clen) {
            warn "Content-Length header value was wrong, fixed";
            hlist_remove(\@h, 'Content-Length');
        }
        push(@h, 'Content-Length' => length($$content_ref));
    }
}
---
This code overrides any user-supplied Content-Length header that differs from the value returned by length($$content_ref), replacing it with that value. The problem I am having when using LWP::UserAgent to POST UTF8-encoded data is that my correct Content-Length header value is being "fixed" and replaced by an incorrect value.
I believe the incorrect value is generated by Perl's "length" function, which in this circumstance is counting characters, not bytes. However, the HTTP protocol expects the number of *bytes* being posted (at least the webserver I am posting to does).
There doesn't seem to be any way for me to avoid this problem except by actually commenting out the relevant lines of code cited above.
The issue with length can be demonstrated with the following code:
---
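use charnames ':full';
# EURO SIGN is one character but three bytes in UTF-8; "1" is one byte.
my $utf8_string = "\N{EURO SIGN}1";
# length() counts characters here.
print "length = ", length($utf8_string), "\n";
# Count bytes by unpacking into unsigned char values (as on the Perl 5.6.1 above).
print "bytes = ", scalar(@{[unpack("C*", $utf8_string)]}), "\n";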
---
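A possible caller-side workaround, sketched under the assumption of a Perl that ships the Encode module (5.8 or later, so not the 5.6.1 install described above): encode the text to UTF-8 octets before handing it to LWP, so that length() on the content already counts bytes and agrees with the Content-Length header. The variable names here are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);    # assumption: Encode is installed (core since Perl 5.8)

my $text = "\x{20AC}1";               # EURO SIGN followed by "1", as characters
my $body = encode("UTF-8", $text);    # the same data as a string of octets

# On the character string, length() counts characters;
# on the encoded string, it counts the octets that would go on the wire.
print "characters = ", length($text), "\n";
print "octets     = ", length($body), "\n";
```

With $body used as the request content, the check quoted above should see a length equal to the byte count and leave a correct Content-Length header alone.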
I think the bytes::length function (provided by the "bytes" pragma) would be useful as part of a solution, but I haven't been able to write a working patch.
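To illustrate the idea, here is a minimal standalone sketch, not a tested patch against http.pm. The helper name content_length_for is hypothetical, and it assumes that the octets of Perl's internal representation are what actually gets written to the socket, which may not hold for every string.

```perl
use strict;
use warnings;
use bytes ();    # loads bytes::length() without enabling byte semantics in this file

# Hypothetical helper (not part of LWP): given a reference to the content and
# the user-supplied Content-Length (or undef), return the value to send.
sub content_length_for {
    my ($content_ref, $clen) = @_;
    return unless defined($$content_ref) && length($$content_ref);

    # Count octets rather than characters.
    my $octets = bytes::length($$content_ref);

    # Warn only when the supplied header disagrees with the octet count.
    if (defined($clen) && $clen ne $octets) {
        warn "Content-Length header value was wrong, fixed";
    }
    return $octets;
}

my $content = "\x{20AC}1";    # EURO SIGN followed by "1"
my $fixed   = content_length_for(\$content, 4);
print "Content-Length: $fixed\n";
```

In this sketch a user-supplied value of 4 for the two-character string is accepted without a warning, whereas the character-based comparison in the current code would replace it with 2.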