
This queue is for tickets about the XML-Atom CPAN distribution.

Report information
The Basics
Id: 61637
Status: resolved
Priority: 0/
Queue: XML-Atom

People
Owner: Nobody in particular
Requestors: SHLOMIF [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.37
Fixed in: (no value)



Subject: Content body with Newlines (and tabs/etc.) is treated as unprintable
Due to a change in the behaviour of \p{IsPrint} and \P{IsPrint} between perl-5.10.x and perl-5.12.x, \p{IsPrint} now excludes newlines, tabs, etc. As a result, when the ::Content class is supplied with a "Body =>" parameter that contains them, the body is converted to Base64, which makes it unusable.

I don't have a ready self-contained test case, but I will be able to write one tomorrow based on this XML-Feed bug report:

https://rt.cpan.org/Public/Bug/Display.html?id=44899

Here is the IRC conversation on #p5p about it:

{{{{{{{{{{{{
<rindolf> Hi all. ("\n" =~ m/\P{IsPrint}/) is false on perl-5.10.1 (Mandriva 2010.1) and true on perl-5.12.2 (Mandriva Cooker). Why was the behaviour changed and what is the correct one? Thinking that newline is unprintable breaks XML-Atom.
* Zefram recalls something about sticking strictly to the Unicode class definitions for \p{}
<Zefram> see L<perl5120delta/Unicode overhaul>
<rafl> Zefram: someone doing it, and how "easy" it turned out to be :)
<Zefram> ah, right
<Zefram> I've got a list of others to do
<leont> Zefram++
<leont> I think this is going to be my favorite feature of 5.14
<Zefram> I found with this one that although it was very easy to do parse_stmtseq it's rather more difficult to use it to parse a custom type of block
<Zefram> block_start and block_end need to go into the API, but there's some other lexer magic around braces too
<rafl> i'm not quite decided on what my favourite feature is going to be, given that they're not even all written yet, but this is definitely a big one :)
<vincent> that's pretty cool
* rafl will upgrade Digest-MD5 and then apply with the apitest move
<rindolf> Zefram: I don't see anything in perl5120delta there.
<Zefram> # "\p{Print}" no longer matches the line control characters: Tab, LF, CR, FF, VT, and NEL. This brings it in line with standards and the documentation.
<rindolf> Zefram: well, I do, but I don't know how it's called.
<rindolf> Zefram: ah.
<rindolf> Zefram: hmmm....
<rindolf> Zefram: so XML-Atom is buggy.
<Zefram> officially yes
<Zefram> if you want to include characters that Unicode doesn't regard as printable, you'll need to do it explicitly
<Zefram> if you're including all of the characters that got removed from \p{Print}, then the fixed code will work on both old and new Perl versions
<Zefram> but actually, in an XML context I suspect that you don't want to defer to Unicode at all
<Zefram> more likely you actually want one of the character classes defined in the XML spec
<rindolf> Zefram: thanks, I'll deal with it tomorrow.
<rindolf> It's getting late here.
}}}}}}}}}}}}

The suggested solution is to include all these characters in the regex.

Regards,

-- Shlomi Fish
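A minimal sketch of the suggested fix, assuming the printability check is a single regex applied to the body. The helper name body_needs_base64 is hypothetical and is not taken from the XML-Atom source; the explicitly listed characters are the ones quoted from perl5120delta in the log above (Tab, LF, CR, FF, VT and NEL).

```perl
use strict;
use warnings;

# Hypothetical helper (not XML::Atom's actual code): returns 1 when the
# body contains a character that is neither matched by \p{Print} nor one
# of the line control characters that perl 5.12 removed from \p{Print}.
sub body_needs_base64 {
    my ($body) = @_;

    # \t = Tab, \n = LF, \r = CR, \f = FF, \x0B = VT, \x{85} = NEL.
    # Listing them explicitly should give the same answer on perl 5.10
    # and on perl 5.12, as Zefram points out.
    return $body =~ /[^\p{Print}\t\n\r\f\x0B\x{85}]/ ? 1 : 0;
}

# Illustrative inputs only.
for my $body ("line one\nline two\twith a tab\r\n", "nul\x00byte") {
    printf "%s\n", body_needs_base64($body) ? "base64" : "plain text";
}
```

As Zefram notes, an XML-specific character class may be the better long-term choice. For instance, FF and VT are not legal characters in XML 1.0, so a check based on the XML spec would still treat them as non-text.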
Fixed in 0.38