Bug #26358 for Data-ICal: two problems in line folding

Sat Apr 14 18:11:03 2007 mail [...] kamishima.net - Ticket created

Subject:	two problems in line folding

According to the RFC, the length of lines is limited to 75 octets:
---------------------------------------------------------------------
Lines of text SHOULD NOT be longer than 75 octets, excluding the line
break. Long content lines SHOULD be split into a multiple line
representations using a line "folding" technique.

However, your routine in "Property.pm" generates lines with 76 octets.
---------------------------------------------------------------------
while ( $string =~ /(.{76})/ ) {
    $string =~ s/(.{75})(.)/$1\n $2/;
}
---------------------------------------------------------------------

Further, when lines are folded, multi-byte characters can be broken.
In UTF-8 encoded text, a character can be composed of two or more
bytes.
(Wikipedia http://en.wikipedia.org/wiki/UTF-8)

However, your routine can fold lines at positions that are not
character boundaries. This problem is fatal for users of non-Western
languages.

If "use utf8;" is not specified, and the iCalendar files are encoded in
UTF-8, the following code can be used to fold lines at character
boundaries. However, if "use utf8;" is used, or another charset is
used, this routine would fail.

sub foldLines {
    my($s) = @_;
    my($r, $l);

    # Short lines need no folding; just terminate them with CRLF.
    return($s . "\x0d\x0a") if (length($s) <= 75);

    $r = substr($s, 0, 1);
    $s = substr($s, 1);

    while(length($s) >= 75) {
        $l = 74;

        # Back up while the octet at the cut is a UTF-8 continuation byte (10xxxxxx).
        while(($l >= 0) && ((0xc0 & ord(substr($s, $l, 1))) == 0x80)) {
            $l--;
        }

        $r = $r . substr($s, 0, $l) . "\x0d\x0a ";
        $s = substr($s, $l);
    }

    return($r . $s . "\x0d\x0a");
}
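The character-boundary rule described in this report can also be written as a single loop. The following is a minimal sketch under stated assumptions, not the code that Data::ICal ships: `fold_octets` is a hypothetical name, and the input is assumed to be a byte string holding UTF-8 octets (no "use utf8;", not a decoded character string).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::ICal.
# Assumes $octets is a byte string of UTF-8 octets without line breaks.
sub fold_octets {
    my ($octets) = @_;
    my $limit = 75;    # octets allowed on the first line
    my @lines;

    while (length($octets) > $limit) {
        my $cut = $limit;

        # If the octet at the cut is a continuation byte (10xxxxxx),
        # cutting here would split a character, so back up to its lead byte.
        $cut-- while $cut > 0
            && (ord(substr($octets, $cut, 1)) & 0xC0) == 0x80;

        # Malformed input (only continuation bytes): fall back to a hard cut.
        $cut = $limit if $cut == 0;

        push @lines, substr($octets, 0, $cut);
        $octets = substr($octets, $cut);
        $limit  = 74;  # continuation lines also carry one leading space
    }
    push @lines, $octets;

    # Join with CRLF + SPACE and terminate the last line with CRLF.
    return join("\x0d\x0a ", @lines) . "\x0d\x0a";
}
```

Removing every CRLF that is followed by a space from the result should give back the input plus the final CRLF, which is a convenient property to check in a test.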

Sun Apr 15 13:12:09 2007 jesse [...] fsck.com - Correspondence added

Subject:	Re: [rt.cpan.org #26358] two problems in line folding
Date:	Sun, 15 Apr 2007 13:11:31 -0400
To:	bug-Data-ICal [...] rt.cpan.org
From:	Jesse Vincent <jesse [...] fsck.com>

Hiya,

Thanks very much for the bug report and sample "good" wrap library. Is there any chance I can talk you into writing a failing test to make sure that we never break this again?

Best,
Jesse

On Apr 14, 2007, at 6:11 PM, via RT wrote:

> Sat Apr 14 18:11:03 2007: Request 26358 was acted upon.
> Transaction: Ticket created by ShimaShima
> Queue: Data-ICal
> Subject: two problems in line folding
> Broken in: (no value)
> Severity: Important
> Owner: Nobody
> Requestors: mail@kamishima.net
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=26358 >


Sun Apr 15 13:12:14 2007 The RT System itself - Status changed from 'new' to 'open'

Mon Apr 16 14:46:46 2007 mail [...] kamishima.net - Correspondence added

From:	mail [...] kamishima.net

Thank you for your kind response.

I reported two bugs, but one of them was my misunderstanding. The current code correctly folds lines at 75 octets. Pardon me.

I attached sample code for the other bug: lines may not be folded at character boundaries. If multi-byte characters are broken, Sunbird fails to read such files. Apple iCal and Google Calendar do not have this problem. However, as specified in RFC 2445
---------------------------------------------------------
That is, a long line can be split between any two characters by inserting a CRLF immediately followed by a single linear white space character (i.e., SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9).
---------------------------------------------------------
I think that lines should be split between two characters, not two octets.

> Thanks very much for the bug report and sample "good" wrap library.

My sample code may fail if "use utf8;" is specified. In this case, "use bytes;" will be required at the head of the folding routine. However, "use bytes;" is not supported if perl's version is older than 5.6.

This is sample code whose output is incorrectly folded by Data::ICal-0.13. Because this code includes non-ASCII characters, I also attached it as a file. To open the file, please use a text editor that can read UTF-8 encoded text.
---------------------------------------------------------------------
#!/usr/bin/perl

use strict;
use Data::ICal;

my($desc, $name);

my $cal = Data::ICal->new();

# two bytes characters test
#
# this string consists of "X-WR-CALDESC:" + 59 digits + 5 greek letters +
# 65 digits + 5 greek letters.
# Five greek letters are (alpha: ce b1) (beta: ce b2) (gamma: ce b3)
# (delta: ce b4) (epsilon: ce b5)
# These are all composed of two octets.
#
# If this string is folded between 75th and 76th octets, first "beta" is
# divided between 1st and 2nd octets.
# If the string is folded between 149th and 150th octets, the second line is
# folded between "alpha" and "beta" correctly.
#
# This string should be folded after 74th (after "alpha") and 148th
# (after "5") octets.

$desc = '12345678901234567890123456789012345678901234567890123456789' .
        'αβγδε' .
        '12345678901234567890123456789012345678901234567890123456789012345' .
        'αβγδε';

# three bytes characters test
#
# this string consists of "X-WR-CALNAME:" + 58 digits +
# 5 Japanese hiragana letters + 58 digits + 5 hiragana letters + 61 digits +
# 5 hiragana letters
# Five Japanese hiragana letters are (A: e3 81 82) (I: e3 81 84)
# (U: e3 81 86) (E: e3 81 88) (O: e3 81 8a)
# These are all composed of three octets
#
# If this string is folded between 75th and 76th octets, the first "I" is
# divided between 1st and 2nd octets.
# If this string is folded between 149th and 150th octets, the second "I" is
# divided between 2nd and 3rd octets.
# If the string is folded between 223rd and 224th octets, the third line is
# folded between "A" and "I" correctly.
#
# This string should be folded after 74th (after "A"), 148th
# (after "A"), and 222nd (after "1") octets.

$name = '1234567890123456789012345678901234567890123456789012345678' .
        'あいうえお' .
        '1234567890123456789012345678901234567890123456789012345678' .
        'あいうえお' .
        '1234567890123456789012345678901234567890123456789012345678901' .
        'あいうえお';

$cal->add_properties(
    "x-wr-calname" => $name,
    "x-wr-caldesc" => $desc,
);

# Print a 75-column ruler so that the folded line lengths are easy to compare.
print "123456789012345678901234567890123456789012345678901234567890123456789012345\n";
print $cal->as_string;
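The sample script prints the folded output for inspection by eye. As a possible starting point for the failing test requested earlier in this ticket, the same conditions can be checked mechanically. This is a sketch only: `folding_problems` is a hypothetical helper, not part of Data::ICal, and it assumes the calendar text is available as a byte string of UTF-8 octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical checker, not part of Data::ICal.
# Takes iCalendar text as UTF-8 octets and returns a list of problems:
# physical lines longer than 75 octets, or lines that are not valid
# UTF-8 on their own (which happens when a fold splits a character).
sub folding_problems {
    my ($octets) = @_;
    my @problems;
    my $n = 0;

    for my $line (split /\x0d?\x0a/, $octets) {
        $n++;
        push @problems, "line $n is longer than 75 octets"
            if length($line) > 75;

        # decode() may modify its argument when a CHECK value is given,
        # so work on a copy.
        my $copy = $line;
        my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
        push @problems, "line $n splits a multi-byte character"
            unless $ok;
    }
    return @problems;
}
```

In a test file, the list returned for the octets produced by `$cal->as_string` would be expected to be empty. Whether `as_string` returns octets or a decoded character string is an assumption to verify against the installed Data::ICal version; a character string would need to be encoded to UTF-8 first.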


Thu Jul 09 23:31:52 2009 cpan [...] chmrr.net - Status changed from 'open' to 'resolved'