Subject: Split header lines are joined incorrectly
Encode::MIME::Header uses the following regex to join split lines:
$str =~ s/(:?\r|\n|\r\n)[ \t]//gos;
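This is b0rked in two ways. First, "(:?" does not open a non-capturing
group (that would be "(?:"); it opens a capturing group whose first
alternative is an optional colon followed by CR. A colon directly before
a bare CR line break is therefore removed along with it. This probably
turns up very seldom though :)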
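To illustrate the first problem, here is a minimal sketch. The input string is made up for the demonstration (a colon right before a bare CR line break), and the "fixed" pattern only corrects the group syntax, nothing else:

```perl
use strict;
use warnings;

# Hypothetical input: a colon directly before a bare CR line break.
my $str = "Subject: foo:\r bar";

# Pattern as found in the module: "(:?" is a capturing group that
# starts with an optional colon, so the colon is eaten as well.
(my $buggy = $str) =~ s/(:?\r|\n|\r\n)[ \t]//g;

# Same pattern with a real non-capturing group "(?:": the colon survives.
(my $fixed = $str) =~ s/(?:\r|\n|\r\n)[ \t]//g;

print "buggy: [$buggy]\n";   # buggy: [Subject: foobar]
print "fixed: [$fixed]\n";   # fixed: [Subject: foo:bar]
```

With a CRLF line break the colon is not affected, because the first alternative only matches a colon that is immediately followed by CR and then the whitespace character.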
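The more severe bug that bit me today is that the pattern also consumes
the whitespace character after the line break while the replacement part
is empty, so the two halves of a folded line get joined with no
whitespace between them. RFC 2047 is an extension of RFC 822, so I
suppose section 2.2.3 of RFC 2822 (the successor of RFC 822) is
authoritative here: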
    The process of moving from this folded multiple-line representation
    of a header field to its single line representation is called
    "unfolding". Unfolding is accomplished by simply removing any CRLF
    that is immediately followed by WSP.
So to be strictly conforming, the expression should be
$str =~ s/(?:\r|\n|\r\n)(?=[ \t])//gs;
(I think it's fair not to include the more exotic WS characters in \s)
However, CRLF followed by a TAB or multiple spaces is often used for
lines that were originally split on a single space, so the following
would probably come closer to what most people would expect (this is how
mutt does it BTW, I haven't checked any other MUAs though):
$str =~ s/(?:\r|\n|\r\n)[ \t]+/ /gs;
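For comparison, here is a small sketch that runs all three expressions from this report on the same folded header. The sample header text is hypothetical (CRLF followed by a TAB); the expressions are copied unchanged from above:

```perl
use strict;
use warnings;

# Hypothetical folded header: the line break is followed by a TAB.
my $folded = "Subject: Split header lines\r\n\tare joined incorrectly";

# Current behaviour: line break and whitespace character are both dropped.
(my $current = $folded) =~ s/(:?\r|\n|\r\n)[ \t]//gos;

# Strict unfolding: only the line break is removed, the TAB stays.
(my $strict = $folded) =~ s/(?:\r|\n|\r\n)(?=[ \t])//gs;

# Lenient unfolding: line break plus whitespace run become one space.
(my $lenient = $folded) =~ s/(?:\r|\n|\r\n)[ \t]+/ /gs;

for my $result ($current, $strict, $lenient) {
    (my $shown = $result) =~ s/\t/\\t/g;   # make the TAB visible
    print "[$shown]\n";
}
# [Subject: Split header linesare joined incorrectly]
# [Subject: Split header lines\tare joined incorrectly]
# [Subject: Split header lines are joined incorrectly]
```

The first line of output shows the bug ("linesare"), the second is what the RFC text asks for, and the third is the mutt-like result.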