Subject: Treatment of <br/> tags
Date: Wed, 21 Apr 2010 11:10:08 +1000
To: bug-html-wikiconverter-markdown [...] rt.cpan.org
From: Nick Andrew <nick [...] nick-andrew.net>
Hi,
I'm trying to do bidirectional conversion between HTML and Markdown, and
it's working well except in the case of hard line breaks.
Given this input HTML document (ignore leading tabs):
<p>This is a line, with line breaks<br />
2nd line<br />
3rd line<br />
4th line</p>
Running it through HTML::WikiConverter::Markdown, we get this back:
This is a line, with line breaks<br /> 2nd line<br /> 3rd line<br /> 4th line
Now "<br/>" means a hard line break, so it doesn't make sense to continue
the converted output on the same line.
In Markdown, hard breaks can also be specified with two trailing spaces.
Markdown itself seems to be roughly doing the right thing:
$ echo -e "line 1  \nline 2  \nline 3  \n" | markdown
<p>line 1 <br />
line 2 <br />
line 3 </p>
But the line breaks don't survive the round-trip conversion:
$ echo -e "line 1  \nline 2  \nline 3  \n" | markdown | html-to-markdown.pl
line 1 <br /> line 2 <br /> line 3
html-to-markdown.pl is a small script which just calls
HTML::WikiConverter->new(dialect => 'Markdown') to filter STDIN to STDOUT.
I tried hacking the handling of the 'br' tag in HTML::WikiConverter::Markdown
myself so that it emits "<br />\n", but each following line then starts with
a leading space, like this:
This is more text, with line breaks <br />
 2nd line <br />
 3rd line <br />
 4th line
The leading space isn't good.
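As a stopgap outside the module, the layout I'm after could be approximated by post-processing the converter's stock single-line output. This is only a minimal sketch, assuming a POSIX awk is available; break_after_br is a hypothetical helper name, not part of HTML::WikiConverter, and it does not fix the 'br' rule itself:

```shell
#!/bin/sh
# Hypothetical helper (not part of HTML::WikiConverter): reads converter
# output on stdin and ends the line after every hard break.
break_after_br() {
    awk '{
        # Normalise each break tag to "<br />", end the line there, and
        # swallow any spaces or tabs after it so the next line has no
        # leading space.
        gsub(/<br[ \t]*\/?>[ \t]*/, "<br />\n")
        # If the break was already the last thing on the line, avoid
        # emitting an extra blank line.
        sub(/\n$/, "")
        print
    }'
}
```

It would sit at the end of the pipeline, for example `markdown | html-to-markdown.pl | break_after_br`. Whitespace before each tag is left untouched.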
Nick.
--
PGP Key ID = 0x418487E7 http://www.nick-andrew.net/
PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7