
This queue is for tickets about the HTTP-Message CPAN distribution.

Report information
The Basics
Id: 82963
Status: resolved
Priority: 0/
Queue: HTTP-Message

People
Owner: Nobody in particular
Requestors: MAUKE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 6.06
Fixed in: (no value)



Subject: ->decoded_content should decode application/json, etc
Currently $response->decoded_content will decode the bytes of e.g. "Content-type: text/json; charset=UTF-8" messages because it knows "text/*" is ... text. It would be nice if this could be extended to also decode the text for content-types such as "application/json; charset=UTF-8", "application/javascript; charset=ISO-8859-15", etc.
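The distinction described above can be observed directly with HTTP::Headers. This is a minimal sketch: `is_text` is a hypothetical helper written for illustration, and only `content_is_text` is the library's own method.

```perl
use strict;
use warnings;
use HTTP::Headers;

# Hypothetical helper: report whether HTTP::Headers classifies a
# Content-Type value as text (it checks for a "text/" media type).
sub is_text {
    my ($content_type) = @_;
    my $h = HTTP::Headers->new(Content_Type => $content_type);
    return $h->content_is_text ? 1 : 0;
}

# Same payload format, same charset, different media type.
for my $ct ('text/json; charset=UTF-8', 'application/json; charset=UTF-8') {
    printf "%-35s content_is_text=%d\n", $ct, is_text($ct);
}
```

The charset parameter plays no part in this classification, which is why the application/json case falls through.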
On Fri Jan 25 17:13:06 2013, MAUKE wrote:
> Currently $response->decoded_content will decode the bytes of e.g.
> "Content-type: text/json; charset=UTF-8" messages because it knows
> "text/*" is ... text.
>
> It would be nice if this could be extended to also decode the text for
> content-types such as "application/json; charset=UTF-8",
> "application/javascript; charset=ISO-8859-15", etc.
Bump. I just ran into the same issue after a few hours. In HTTP::Headers->content_is_text, shouldn't the presence of a charset in the Content-Type imply that the content is characters, i.e. text?
Seconded. When I say decode, I know what I am doing, but currently there is no way to force it:

    $response->decoded_content(charset => 'utf-8')

Adding (charset_strict => 1, raise_error => 1) doesn't help. What's more, the content type I get is:

    Content-Type: application/json; charset=UTF-8

Maybe content_is_text() should return true if a charset is present in the Content-Type header?
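Until the library changes, one possible workaround is to do the charset step by hand. The sketch below assumes a hypothetical helper, `force_decoded_content`, that is not part of HTTP::Message: it lets decoded_content undo any Content-Encoding with charset decoding suppressed (`charset => 'none'`), then applies Encode to whatever charset the Content-Type declares.

```perl
use strict;
use warnings;
use Encode ();
use HTTP::Response;

# Hypothetical helper (not part of HTTP::Message): decode the body with the
# charset declared in Content-Type, regardless of the media type.
sub force_decoded_content {
    my ($res) = @_;

    # Undo Content-Encoding (gzip, deflate, ...) but skip charset decoding.
    my $bytes = $res->decoded_content(charset => 'none');
    return unless defined $bytes;

    # No declared charset: hand back the raw bytes unchanged.
    my $charset = $res->content_type_charset or return $bytes;

    # Note: Encode::decode croaks if it does not know the encoding name.
    return Encode::decode($charset, $bytes);
}

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'application/json; charset=UTF-8' ],
    qq({"name":"caf\xC3\xA9"}),    # UTF-8 encoded bytes
);

my $text = force_decoded_content($res);
printf "bytes=%d characters=%d\n", length($res->content), length($text);
```

Callers who want failures on malformed input could pass a CHECK argument such as Encode::FB_CROAK to Encode::decode; that choice is left out here to keep the sketch short.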
Third. Currently, the code says:

    if ($self->content_is_text || (my $is_xml = $self->content_is_xml)) {

Examples where LWP currently breaks include:

    application/json
    application/yaml
    application/x-yaml
    application/pdf
    application/* (that isn't +xml)

The Content-Type really shouldn't matter. If the Content-Type is "pork/beans; charset=UTF-8", it should still be decoded. If the remote agent sent a charset, it's telling us that it encoded the data with that character set. We shouldn't care whether the data inside the onion is text, audio, application-specific, some proprietary format, whatever.

Please remove this 'if' line. It's a pretty intelligent interface, so it would be a waste of code to have other folks design their own decoding interface just because of this restriction.
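A less drastic variant of this request would be to widen the condition rather than remove it. The predicate below is only an illustration of that idea, not the library's code: `wants_charset_decoding` is a hypothetical name, and it treats any declared charset as a request for decoding in addition to the existing text and XML checks.

```perl
use strict;
use warnings;
use HTTP::Headers;

# Hypothetical predicate: decode when the media type is text or XML,
# or when the sender declared a charset at all.
sub wants_charset_decoding {
    my ($h) = @_;
    return 1 if $h->content_is_text || $h->content_is_xml;
    return $h->content_type_charset ? 1 : 0;
}

for my $ct ('pork/beans; charset=UTF-8', 'application/pdf', 'text/plain') {
    my $h = HTTP::Headers->new(Content_Type => $ct);
    printf "%-28s decode=%d\n", $ct, wants_charset_decoding($h);
}
```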