
This queue is for tickets about the HTML-WikiConverter-MediaWiki CPAN distribution.

Report information
The Basics
Id: 46453
Status: open
Worked: 1 hour (60 min)
Priority: 0/
Queue: HTML-WikiConverter-MediaWiki

People
Owner: diberri [...] cpan.org
Requestors: jidanni [...] jidanni.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: triggering <nowiki> too often
Date: Thu, 28 May 2009 19:38:50 +0800
To: bug-html-wikiconverter-mediawiki [...] rt.cpan.org
From: jidanni [...] jidanni.org
my $wc = new HTML::WikiConverter( dialect => 'MediaWiki' );
$_="<em>x</em>:bla";print $wc->html2wiki( $_ );
triggers <nowiki> even though not needed. Not bad. All I ended up using was

s@<nowiki>:</nowiki>@:@;
s@<blockquote>\K\n\n@@g;
s@\n\n(</blockquote>)@$1@g;

to suit my taste.

By the way, one worry is

<li><p>...<blockquote><p>...

the second <p> gets blown away. But you can't just add a newline, else you'll break out of the * or # generated. In this case you might want to keep the second <p>...</p>'s. I wonder what happens with deeper nesting. (Actually I don't wonder, as hopefully my conversion task will be long finished by the time you read this.)

$ apt-cache policy $@
libhtml-wikiconverter-mediawiki-perl:
  Installed: 0.58-1
Howdy,
> my $wc = new HTML::WikiConverter( dialect => 'MediaWiki' );
> $_="<em>x</em>:bla";print $wc->html2wiki( $_ );
> triggers <nowiki> even though not needed.
Yeah. The problem is that nowiki tags are applied in a rather mind-numbingly simplistic (i.e., dumb) manner. There is a set of patterns that, if matched, trigger the nowiki tag. These include:

qr/^\:/, qr/^\*/, qr/^\#/, etc.

Note the caret, which signifies that these should only match at the beginning of a string. Intuitively this makes sense: a colon, for example, should only be given a nowiki tag if it occurs at the beginning of a line. Hence the caret.

The trouble is that "beginning of a string" is not synonymous with "beginning of a line". Because of the way the code applies these patterns, the caret in practice marks the beginning of a text node. Your markup, "<em>x</em>:bla", gets parsed into something like this:

<p><em><~text>x</~text></em><~text>:bla</~text></p>

The corresponding MediaWiki markup should be

''x'':bla

without the nowiki tag, because the colon doesn't occur at the beginning of a line. But presently, H::WC applies a nowiki because it finds in that parsed HTML a text node that begins with a colon. And then you have this mess:

''x''<nowiki>:bla</nowiki>

as you've already noticed.
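A minimal Perl sketch of the distinction described above, under stated assumptions: it is not H::WC's code, the pattern list is a simplified stand-in for the one quoted, and the fragment list is a hand-written approximation of how "<em>x</em>:bla" is walked. The helper names needs_nowiki, escape_per_node and escape_per_line are hypothetical. Testing the caret against each text node in isolation reproduces the unwanted nowiki; testing it only when the node starts a line of output does not.

```perl
use strict;
use warnings;

# Simplified stand-in for the block-start patterns quoted above
# (hypothetical subset; the module's real list is longer).
my @block_start = ( qr/^:/, qr/^\*/, qr/^#/ );

# True if any block-start pattern matches the given text.
sub needs_nowiki {
    my ($text) = @_;
    return scalar grep { $text =~ $_ } @block_start;
}

# Hand-written approximation of walking "<em>x</em>:bla":
# each fragment is [ $string, $is_text_node ].
my @fragments = ( [ "''", 0 ], [ 'x', 1 ], [ "''", 0 ], [ ':bla', 1 ] );

# Naive: the caret is tested against every text node on its own.
sub escape_per_node {
    my $out = '';
    for my $f (@_) {
        my ( $s, $is_text ) = @$f;
        $s = "<nowiki>$s</nowiki>" if $is_text && needs_nowiki($s);
        $out .= $s;
    }
    return $out;
}

# Line-aware: escape a text node only if nothing precedes it on the
# current output line (start of output, or right after a newline).
sub escape_per_line {
    my $out = '';
    for my $f (@_) {
        my ( $s, $is_text ) = @$f;
        my $at_line_start = ( $out eq '' || $out =~ /\n\z/ );
        $s = "<nowiki>$s</nowiki>"
            if $is_text && $at_line_start && needs_nowiki($s);
        $out .= $s;
    }
    return $out;
}

print escape_per_node(@fragments), "\n";   # ''x''<nowiki>:bla</nowiki>
print escape_per_line(@fragments), "\n";   # ''x'':bla
```

The line-aware version still escapes a colon that really does begin a line: escape_per_line( [ ':bla', 1 ] ) returns <nowiki>:bla</nowiki>.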
> Not bad. All I ended up using was
> s@<nowiki>:</nowiki>@:@;
> s@<blockquote>\K\n\n@@g;
> s@\n\n(</blockquote>)@$1@g;
> to suit my taste.
It's a bit of a hack, albeit a necessary one. The real answer is for me to fix the way the nowiki tag gets applied. Doing that requires a bit of work. I'll see what I can do.
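For 0.58 output, the reporter's post-processing can be wrapped in a small Perl sub; this is a sketch, not part of the module. Assumptions: tidy_wiki is a hypothetical name, /g has been added to the first substitution so that every lone escaped colon is unwrapped (the original replaces only the first), and the sample input is invented to exercise all three substitutions. The \K escape needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use v5.10;   # \K in the regexes below needs Perl 5.10 or later

# Post-process html2wiki() output with the reporter's substitutions.
# tidy_wiki is a hypothetical helper; /g was added to the first s@@@.
sub tidy_wiki {
    my ($wiki) = @_;
    $wiki =~ s@<nowiki>:</nowiki>@:@g;      # unwrap a lone escaped colon
    $wiki =~ s@<blockquote>\K\n\n@@g;       # drop blank line after <blockquote>
    $wiki =~ s@\n\n(</blockquote>)@$1@g;    # drop blank line before </blockquote>
    return $wiki;
}

my $sample = "''x''<nowiki>:</nowiki>bla\n<blockquote>\n\nquoted\n\n</blockquote>";
print tidy_wiki($sample), "\n";
# ''x'':bla
# <blockquote>quoted</blockquote>
```

Because the first substitution looks only at the tagged text, it also unwraps a colon that genuinely starts a line, where the nowiki is needed.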
> By the way, one worry is
> <li><p>...<blockquote><p>...
> the second <p> gets blown away. But you can't just add a newline, else
> you'll break out of the * or # generated. In this case you might want to
> keep the second <p>...</p>'s. I wonder what happens with deeper nesting.
> (Actually I don't wonder, as hopefully my conversion task will be long
> finished by the time you read this.)
How does this relate to nowiki tags?

Cheers,
Dave
On Thu May 28 23:39:57 2009, DIBERRI wrote:
> The real answer is for me
> to fix the way the nowiki tag gets applied. And to do that requires a
> bit of work. I'll see what I can do.
This is fixed in H::WC::MediaWiki 0.59, which I've just uploaded to CPAN.

Cheers,
Dave
Subject: Re: [rt.cpan.org #46453] triggering <nowiki> too often
Date: Sat, 30 May 2009 02:07:32 +0800
To: bug-html-wikiconverter-mediawiki [...] rt.cpan.org
From: jidanni [...] jidanni.org
>> By the way, one worry is
>> <li><p>...<blockquote><p>...
>> the second <p> gets blown away. But you can't just add a newline, else
>> you'll break out of the * or # generated. In this case you might want to
>> keep the second <p>...</p>'s. I wonder what happens with deeper nesting.
>> (Actually I don't wonder, as hopefully my conversion task will be long
>> finished by the time you read this.)
> How does this relate to nowiki tags?
It doesn't. It was a by-the-way.