Show quoted text> If I understand correctly, all these complex tests that try to figure
> out whether the output filter needs or needs not be applied are there
> only to avoid you printing the XML declaration. Which you don't need
> if you're working in unicode. So I am a bit hesitant to go much
> further down that road for what I see as a not-so-compelling reason.
Oh, if you're only concerned about my personal problem, sure. As long
as I can universally set :utf8 on my filehandles and stop thinking about
them, I'd be perfectly happy. I went to the trouble of trying to
improve your patch because after our previous conversation on the other
bug report I thought that _you_ would be very unhappy to discover that
what you had done changed the output between previous versions of
XML::Twig and your latest version.
I figured that the general goal here is to have XML::Twig with
"output_encoding => utf-8" produce the correct output whether or not it
is printing to a handle with the binmode layer set or not. You appear
to be correct that there is no way to detect the presence of a binmode
layer -- but you *can* force it, with minimal side effects, so that's
what I did.
Show quoted text> Unicode is tricky in regular Perl. If you print a string to a
> filehandle that is not open as :utf8, then it will be converted to
> latin-1 (in order for older code to still work the same). Which is
> what will happen here, whether the print is done by XML::Twig or by
> a regular print. That's consistent.
You're aware that versions of XML::Twig prior to 20081014 *don't*
convert to latin-1 when printing to a filehandle without a :utf8 layer
set, right? Previously, if you set output_encoding => utf-8, and then
printed to a filehandle with no output layer set, you got utf-8 output
(you can see this by running the unit test I provided both against the
current version and any prior version, including the stable version --
the 'twig output via ">"' tests pass).
Show quoted text> Bottom line, if you're working with utf8 and your filehandle is not
> open as :utf-8, you are just asking for trouble.
Well, this is what I ordinarily would have thought, yes, but I'm a
little surprised to hear you take that stance given your strong reaction
to breaking backwards-compatibility in my other bug report. Proceeding
with the 20081014 code as is will break the output of every program
relying on output_encoding=>utf-8 to generate utf-8 output when printing
to a filehandle with no layer.
Show quoted text> In short, you should open all of your output files with a :utf8 layer,
> and the code should work, and maybe drop the output_encoding option,
> which was created to allow output in non-utf8, IIRC in the days
> before 5.8, when encoding conversions weren't as easy as specifying
> the encoding layer when opening the file. I might add something about
> it, or even write a piece in the docs about how to handle encodings.
This is a perfectly fine solution by me, and in fact, it tends towards
the way that I would have been inclined to solve the problem, because it
seems like the Perl Way to handle things, and because
XML::Twig::Elt->print is looking like an increasingly sticky problem to
solve. I've already comment-tagged all of the spots in my current
project where Twig is printing to a filehandle just to make sure that I
didn't accidentally :utf8 them in the future, so it wouldn't take me at
all long to force them back to :utf8 and declare a dependency on Twig >=
3.34, or whatnot, but again, I'm a little surprised to hear you talking
this way given your dedication to not changing existing behaviour.
Maybe I shouldn't be trying to talk you out of it, though, since it
would make my life much easier. :P
If you drop the output_encoding option entirely, however, please at
least replace it with 'encoding' or some other way to have set_encoding
called automatically, so you can still request that XML declarations be
printed via a single new().
Show quoted text> I will look at your patch, but it doesn't seem to take into account
> the keep_encoding option, that keeps the original encoding of the
> XML, and which is a feature that's (still) quite popular. So I have
> to test it some more.
Hm, this is true of the commented-out XML::Twig::Elt->print changes.
That's hard to fix, unless there's some way of finding out whether
keep_encoding was set on the twig containing the element from inside the
element. It looks like you'd have to have every element inherit the
keep_encoding attribute from its parent twig somehow.
The XML::Twig->print section, however, assumes that if someone set
output_encoding => utf-8 that they wouldn't also set keep_encoding and
vice versa (the patch only triggers if output_encoding matches utf-8).
Are you expecting that someone would try to use both output_encoding =>
utf-8 and keep_encoding => 1 at the same time? I've been playing around
with this a bit today across but I made an interesting discovery when I
went to update the patch to explicitly recognize keep_encoding: it
doesn't matter if my code triggers or not -- if you set both
output_encoding => utf-8 and keep_encoding => 1, data from another
codepage, set either via set_text or imported via an external file, the
text ends up not only utf-8 but *double-encoded* utf-8 even when the
output filehandle has no layer. This holds all the way back to Twig
3.32. If you were intending keep_encoding to continue to function with
output_encoding set, there's a deeper issue than what we've been working
on. Personally, I think it would not be unwarranted to croak (or at
least carp a warning) if someone set both of those options at the same
time during a new().
I'm attaching the latest version of my patch, which makes explicit the
XML::Twig->print handling of keep_encoding, and adds a note about
keep_encoding to the commented-out addition in XML::Twig::Elt->print.
I'm also attaching a new test file that explicitly tests for breakage
using keep_encoding => 1, and also has a TODO test covering when
keep_encoding => 1 and output_encoding = utf-8 are simultaneously set,
just as an informative reference.