
This queue is for tickets about the HTML-FormatExternal CPAN distribution.

Report information
The Basics
Id: 103135
Status: resolved
Priority: 0/
Queue: HTML-FormatExternal

People
Owner: Nobody in particular
Requestors: olaf [...] wundersolutions.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 22
Fixed in: (no value)



Subject: warnings on printing wide characters
The handling of encoding has been commented out: https://metacpan.org/source/KRYDE/HTML-FormatExternal-22/lib/HTML/FormatExternal.pm#L69 So input_charset seems to be ignored for the string which is passed to format_string(). I worked around this in my current code by calling Encode::encode( 'UTF-8', $string) before passing data to format_string().
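For readers following along, the workaround described above can be sketched roughly as follows. This is only an illustration using the core Encode module; the sample HTML string is made up, and the format_string() call is left as a comment because it needs the distribution and an external program such as lynx to be installed.

```perl
use strict;
use warnings;
use Encode ();

# A wide (character) string, e.g. HTML that has already been decoded.
# The content is an invented sample, not taken from the ticket.
my $html = "<p>caf\x{E9} \x{263A}</p>";

# Workaround from the report: hand the formatter UTF-8 bytes, not characters.
my $bytes = Encode::encode('UTF-8', $html);

# utf8::is_utf8() reports Perl's internal "wide" flag; it is off for the
# byte string that encode() returns.
printf "input is wide: %s\n", utf8::is_utf8($html)  ? 'yes' : 'no';
printf "bytes is wide: %s\n", utf8::is_utf8($bytes) ? 'yes' : 'no';
printf "byte length:   %d\n", length $bytes;

# The bytes would then be passed on with the charset named explicitly, e.g.
#   HTML::FormatText::Lynx->format_string($bytes, input_charset => 'UTF-8');
```

Printing $bytes to a filehandle without an encoding layer should then not trigger the "Wide character in print" warning, since it contains only octets.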
Subject: Re: [rt.cpan.org #103135] warnings on printing wide characters
Date: Sat, 28 Mar 2015 13:55:21 +1100
To: "Olaf Alders via RT" <bug-HTML-FormatExternal [...] rt.cpan.org>
From: Kevin Ryde <user42_kevin [...] yahoo.com.au>
"Olaf Alders via RT" <bug-HTML-FormatExternal@rt.cpan.org> writes:
> > The handling of encoding has been commented out
Yes, as the comment says "secret experimental" but I can't remember why I didn't enable it. Does the idea look about right? I think I intended output_charset=>'wide' to get back wide chars, but maybe that could be the default if the input is wide. My own uses of the code so far have been byte strings (mime email message parts). The amount of charsets and non-ascii varies among the programs of course.
> Encode::encode( 'UTF-8', $string) before passing data to format_string().
Should say that as input_charset I think, to tell the program what it's getting.
On Fri Mar 27 22:59:15 2015, user42_kevin@yahoo.com.au wrote:
> "Olaf Alders via RT" <bug-HTML-FormatExternal@rt.cpan.org> writes:
> > The handling of encoding has been commented out
> Yes, as the comment says "secret experimental" but I can't remember why I didn't enable it. Does the idea look about right? I think I intended output_charset=>'wide' to get back wide chars, but maybe that could be the default if the input is wide.
> My own uses of the code so far have been byte strings (mime email message parts). The amount of charsets and non-ascii varies among the programs of course.
> > Encode::encode( 'UTF-8', $string) before passing data to format_string().
> Should say that as input_charset I think, to tell the program what it's getting.
I just had a chat with Dave Rolsky about this and his suggestion was that the commented code in the Zen formatter looks to be a good way to handle this: https://metacpan.org/source/KRYDE/HTML-FormatExternal-22/lib/HTML/FormatText/Zen.pm#L46 So, I think it would be helpful to use the input_charset to do something like what I'm doing: Encode::encode( $input_charset, $string). From where I sit, I don't know why you'd want a different output charset. I'd be happy getting it back in the same charset, just with the formatting. Was there a particular use case for wanting a different output charset?
Subject: Re: [rt.cpan.org #103135] warnings on printing wide characters
Date: Mon, 13 Apr 2015 12:43:47 +1000
To: "Olaf Alders via RT" <bug-HTML-FormatExternal [...] rt.cpan.org>
From: Kevin Ryde <user42_kevin [...] yahoo.com.au>
"Olaf Alders via RT" <bug-HTML-FormatExternal@rt.cpan.org> writes:
> I just had a chat with Dave Rolsky about this and his suggestion was that the commented code in the Zen formatter looks to be a good way to handle this:
The validate bit? Or convert? One thing I was wary of is I didn't want to hard-code too much knowledge about what the respective programs could or couldn't do, as you never know when they might grow etc. Of course it's not helpful for the module to let bad things happen if there's a way to do it right. One possibility on the input side would be to entitize any non-ascii when unsure.
> Was there a particular use case for wanting a different output charset?
Yes, I use it that way. In my rss2leafnode I get html from an rss feed or http fetch in whatever charset the server gives, and I always output utf-8 for the resulting generated news message. (Could have left the charset unchanged for the output perhaps, but I also intermingle little bits of further text.) I made a start enabling some wide bits. I propose to have wide input make wide output by default, plus an output_wide=> option to force it.
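The proposal above (wide input gives wide output by default, with an option to force it) might look something like the following sketch. It is not the module's actual code: format_transparent() and the tag-stripping stand-in for an external program are hypothetical, and byte input is assumed to be UTF-8 when wide output is forced.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical wrapper, not part of HTML::FormatExternal.
# $formatter is any code ref that takes bytes and returns bytes.
sub format_transparent {
    my ($formatter, $html, %opt) = @_;

    # Treat a string with Perl's internal utf8 flag set as "wide" input.
    my $input_is_wide = utf8::is_utf8($html);

    # Default: wide output only when the input was wide; output_wide overrides.
    my $want_wide = exists $opt{output_wide} ? $opt{output_wide} : $input_is_wide;

    # External programs see bytes, so encode wide input first.
    my $bytes_in  = $input_is_wide ? Encode::encode('UTF-8', $html) : $html;
    my $bytes_out = $formatter->($bytes_in);

    # Assumption: the formatter's output is UTF-8 when decoding is requested.
    return $want_wide ? Encode::decode('UTF-8', $bytes_out) : $bytes_out;
}

# Stand-in for an external program: strips tags and works on bytes only.
my $strip_tags = sub { my ($b) = @_; $b =~ s/<[^>]*>//g; return $b };

my $wide_text = format_transparent($strip_tags, "<p>\x{263A} ok</p>");
my $byte_text = format_transparent($strip_tags, "<p>ok</p>");
```

Here $wide_text comes back as characters because the input was wide, while $byte_text stays a byte string; passing output_wide => 1 would request characters in both cases.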
Subject: Re: [rt.cpan.org #103135] warnings on printing wide characters
Date: Sat, 25 Apr 2015 12:43:47 +1000
To: bug-HTML-FormatExternal [...] rt.cpan.org
From: Kevin Ryde <user42_kevin [...] yahoo.com.au>
I uploaded a version 23 with some wide char support. It should be transparent wide input -> wide output.
Subject: Re: [rt.cpan.org #103135] warnings on printing wide characters
Date: Sat, 25 Apr 2015 18:39:47 +1000
To: bug-HTML-FormatExternal [...] rt.cpan.org
From: Kevin Ryde <user42_kevin [...] yahoo.com.au>
I wrote:
> > I uploaded a version 23 with some wide char support.
(The PAUSE is having some trouble, but it's at my web page.)
On Sat Apr 25 04:41:29 2015, user42_kevin@yahoo.com.au wrote:
> I wrote:
> > I uploaded a version 23 with some wide char support.
> (The PAUSE is having some trouble, but it's at my web page.)
Thanks so much for this! I just tested the latest version and it looks to be doing the right thing out of the box. I really appreciate your efforts. Olaf
Subject: Re: [rt.cpan.org #103135] warnings on printing wide characters
Date: Tue, 28 Apr 2015 17:06:27 +1000
To: "Olaf Alders via RT" <bug-HTML-FormatExternal [...] rt.cpan.org>
From: Kevin Ryde <user42_kevin [...] yahoo.com.au>
"Olaf Alders via RT" <bug-HTML-FormatExternal@rt.cpan.org> writes:
> > doing the right thing out of the box.
Glad you like it. I'm going to use it wide too since I realized in my code (rss2leafnode) I'd made two branches one wide and one byte depending on the formatter class ... :) (Oh, and it's up on cpan now by a different upload.) -- "Never get out of the boat."