
This queue is for tickets about the HTML-WikiConverter CPAN distribution.

Report information
The Basics
Id: 40845
Status: resolved
Worked: 40 min
Priority: 0/
Queue: HTML-WikiConverter

People
Owner: diberri [...] cpan.org
Requestors: diberri [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: 0.64



Subject: Various conversion issues (originally ticket #39490)
Note: This bug was originally filed as ticket #39490, but I mistakenly merged it with another ticket (#37911) on 2008-11-11. I am resubmitting the bug report here so that it is kept separate and the other ticket can be closed. The original bug report follows:

-- David Iberri

Original report:

HTML::WikiConverter is mostly working for me, but I'm seeing some conversion issues. Any clues or help would be appreciated...

Here are some input and output pages, for reference:

    http://cfcl/smi_xmw/mediawiki-1.13.1/index.php/Test_1 (in)
    http://cfcl/smi_xmw/mediawiki-1.13.1/index.php/Test_1_o (out)

HTML character entities are (needlessly) transformed and broken:

    Prot&eacute;g&eacute; -> Prot<C3><A9>g<C3><A9>

Links are getting broken. For example:

    <a href='#Adding_a_...'>adding a ...</a>

becomes

    * [#Adding_a_... adding a ...]

rather than

    * [[#Adding_a_...|adding a ...]]

Also:

    <a href='class_browser.html'>Class Browser</a>

becomes:

    [class_browser.html Class Browser]

rather than

    [[class_browser.html|Class Browser]]

Finally, there is a weird combination effect, where:

    <a href='replace_parent.html' title='Replacing ...'>Prev</a>

becomes:

    [replace_parent.html Prev]<C2><A0>

-r
-- 
http://www.cfcl.com/rdm         Rich Morin
http://www.cfcl.com/rdm/resume  rdm@cfcl.com
http://www.cfcl.com/rdm/weblog  +1 650-873-7841

Technical editing and writing, programming, and web development
Howdy,

It's been a while since this bug was filed, so my apologies if this is no longer relevant. I hope you find it useful, however.
> <a href='class_browser.html'>Class Browser</a>
>
> becomes:
>
> [class_browser.html Class Browser]
>
> rather than
>
> [[class_browser.html|Class Browser]]
This relates to the often-confusing base_uri and wiki_uri attributes, for which I apologize. Most html2wiki conversions should use both of these attributes. For example, if your wiki's Main Page exists at http://www.mywiki.com/main_page.html, you might set up your HTML::WikiConverter like this:

    use HTML::WikiConverter;
    my $wc = new HTML::WikiConverter(
      dialect  => 'MediaWiki',
      base_uri => 'http://www.mywiki.com/',
      wiki_uri => 'http://www.mywiki.com/'
    );

    # That way, this:
    print $wc->html2wiki( html => q{<a href='class_browser.html'>Class Browser</a>} );

becomes:

    [[class browser.html|Class Browser]]
> Links are getting broken. For example:
>
> <a href='#Adding_a_...'>adding a ...</a>
>
> becomes
>
> * [#Adding_a_... adding a ...]
>
> rather than
>
> * [[#Adding_a_...|adding a ...]]
This can also be resolved by setting wiki_uri and base_uri.
> Finally, there is a weird combination effect, where:
>
> <a href='replace_parent.html' title='Replacing ...'>Prev</a>
>
> becomes:
>
> [replace_parent.html Prev]<C2><A0>
I can't reproduce this error. For me, this:

    use HTML::WikiConverter;
    my $wc = new HTML::WikiConverter( dialect => 'MediaWiki' );
    print $wc->html2wiki( html => qq{<a href="replace_parent.html" title="Replacing">Prev</a>} );

becomes:

    [replace_parent.html Prev]

And with the proper base_uri and wiki_uri attributes, it would more appropriately become:

    [[replace parent.html|Prev]]
These links aren't working for me (I expect because this bug report is so old :-). If you can get them working again or point me to some alternatives, that'd be great. Otherwise, no worries.
> HTML character entities are (needlessly) transformed and broken:
>
> Prot&eacute;g&eacute; -> Prot<C3><A9>g<C3><A9>
The encoding issue is a major one. I haven't looked into it fully yet, but I have noticed some problems with how HTML entities are inappropriately transformed during the conversion process.

Cheers,
David
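The <C3><A9> and <C2><A0> sequences in the original report are consistent with entities being decoded to Unicode characters and then written out as UTF-8 bytes that the viewer displayed raw instead of as characters. In UTF-8, é (U+00E9, from &eacute;) is C3 A9 and a non-breaking space (U+00A0, from &nbsp;) is C2 A0. The following sketch uses only the core Encode module to reproduce both byte sequences. It is an illustration, not HTML::WikiConverter's code. The two-entry entity table and the helper names decode_entities_min and show_bytes are hypothetical. The trailing &nbsp; after "Prev" is an assumption about the input page, which can no longer be checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical, minimal entity table: only the two entities relevant to
# this report. A real converter would use a complete table.
my %entity = ( eacute => "\x{E9}", nbsp => "\x{A0}" );

# Decode the named entities in %entity; leave any other entity untouched.
sub decode_entities_min {
    my ($html) = @_;
    $html =~ s/&(\w+);/exists $entity{$1} ? $entity{$1} : "&$1;"/ge;
    return $html;
}

# Encode the text as UTF-8, then show ASCII bytes as-is and every other
# byte as <XX>, mimicking a viewer that does not decode UTF-8.
sub show_bytes {
    my ($text) = @_;
    my $octets = encode( 'UTF-8', $text );
    return join '',
        map { ( $_ < 0x80 ) ? chr($_) : sprintf( '<%02X>', $_ ) }
        unpack( 'C*', $octets );
}

print show_bytes( decode_entities_min('Prot&eacute;g&eacute;') ), "\n";   # prints Prot<C3><A9>g<C3><A9>
print show_bytes( decode_entities_min('Prev&nbsp;') ), "\n";              # prints Prev<C2><A0>
```

Displayed as UTF-8 instead, the same octets read "Protégé". The sketch shows only what the byte sequences mean. It does not establish which stage of the conversion lost track of the encoding.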
I'm posting to this bug report as a follow-up to Debian bug report #506584, since the bug reported there is part of this one.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=506584
> (...)
> > <a href='class_browser.html'>Class Browser</a>
> >
> > becomes:
> >
> > [class_browser.html Class Browser]
> >
> > rather than
> >
> > [[class_browser.html|Class Browser]]
>
> This relates to the often-confusing base_uri and wiki_uri attributes,
> for which I apologize. Most html2wiki conversions should be using both
> of these attributes. For example, if your wiki's Main Page exists at
> http://www.mywiki.com/main_page.html, then you might set up your
> HTML::WikiConverter something like this:
>
>     use HTML::WikiConverter;
>     my $wc = new HTML::WikiConverter(
>       dialect  => 'MediaWiki',
>       base_uri => 'http://www.mywiki.com/',
>       wiki_uri => 'http://www.mywiki.com/'
>     );
>
>     # That way, this:
>     print $wc->html2wiki( html => q{<a href='class_browser.html'>Class Browser</a>} );
>
> becomes:
>
>     [[class browser.html|Class Browser]]
Please note in your documentation that, at least from what I can observe, the user must include the full wiki URI in the links. What do I mean by this? Our bug submitter mentions he tried this to test:

    $ echo '<a href="#bla">ooo</a>'|html2wiki --dialect MediaWiki
    [#bla ooo]

That should be

    [[#bla|ooo]]

Now, by specifying every switch fully, I get the right result:

    $ echo '<a href="http://foo.org/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=http://foo.org/wiki/
    [[#bla|ooo]]

However, this requires quite a bit of duplication. For instance, I cannot use a relative wiki URI:

    $ echo '<a href="http://foo.org/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=/wiki/
    [http://foo.org/wiki/#bla ooo]

Nor can I use a relative path:

    $ echo '<a href="/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=/wiki/
    [http://foo.org/wiki/#bla ooo]

Anyway, thanks for at least documenting this far enough for me to notify the user of the workaround :)
On Mon Nov 24 15:03:26 2008, GWOLF wrote:
> Please note in your documentation that -at least from what I can
> observe- the user must include the full wiki URI in the links. What do I
> mean by this? Our bug submitter mentions he tried this to test:
>
>     $ echo '<a href="#bla">ooo</a>'|html2wiki --dialect MediaWiki
>     [#bla ooo]
>
> That should be
>     [[#bla|ooo]]
>
> Now, by specifying every switch fully, I get the right result:
>
>     $ echo '<a href="http://foo.org/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=http://foo.org/wiki/
>     [[#bla|ooo]]
>
> However, this requires quite a bit of duplication. For instance, I
> cannot use a relative wiki URI:
>
>     $ echo '<a href="http://foo.org/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=/wiki/
>     [http://foo.org/wiki/#bla ooo]
>
> Nor use a relative path:
>
>     echo '<a href="/wiki/#bla">ooo</a>'|html2wiki --dialect MediaWiki --base-uri=http://foo.org --wiki-uri=/wiki/
>     [http://foo.org/wiki/#bla ooo]
>
> Anyway - thanks for at least documenting this far enough for me to
> notify the user for the workaround :)
Gunnar, thank you for following up here and for the help you provided at [1]. These issues should be resolved in more recent versions of HTML::WikiConverter. Specifically, the --wiki-uri option now accepts a relative path, provided that --base-uri has been given an absolute URI. That is, the following now works:

    echo '<a href="/wiki/#bla">ooo</a>' | \
      html2wiki \
        --dialect MediaWiki \
        --base-uri=http://foo.org \
        --wiki-uri=/wiki/

    [[#bla|ooo]]

Hope that helps.

-- 
David Iberri

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=506584