
This queue is for tickets about the HTML-WikiConverter CPAN distribution.

Report information
The Basics
Id: 17330
Status: resolved
Worked: 30 min
Priority: 0/
Queue: HTML-WikiConverter

People
Owner: Nobody in particular
Requestors: brianwc [...] berkeley.edu
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.50
Fixed in: (no value)



Subject: HTML to MediaWiki Bug/Feature-Request
I am converting the HTML created by PhpWiki into MediaWiki format using the online converter. The problem is that the wiki links in PhpWiki's HTML carry a long alphanumeric session-ID string, which the converter preserves in the MediaWiki output. I don't want that string preserved, as it is obviously session-specific and incomprehensible to MediaWiki.

How to duplicate this error:

1. Go to the online converter at http://diberri.dyndns.org/html2wiki.html and enter http://wiki.boalt.org/index.php?pagename=BoaltGroupWiki in the "Fetch URL" field.
2. Enter http://wiki.boalt.org/index.php?pagename= in the "URL for wiki links" field.
3. Ensure that the MediaWiki dialect is selected and click "Convert HTML to wiki markup."

Note all the &PHPSESSID=[really long alpha-numeric string here] strings that appear in the output. Those should all be dropped.
On Sat Jan 28 00:12:51 2006, guest wrote:
> I am converting the HTML created by PhpWiki into MediaWiki format using
> the online converter. The problem is that the wiki links in PhpWiki's
> HTML carry a long alphanumeric session-ID string, which the converter
> preserves in the MediaWiki output. I don't want that string preserved,
> as it is obviously session-specific and incomprehensible to MediaWiki.
I can duplicate the bug exactly; thanks for the comprehensive test case. I've committed a patch to fix this in 0.51, which allows regular expressions to be passed to the 'wiki_uri' parameter (which corresponds to the "URL for wiki links" textbox on the web interface). You will be able to say things like this:

  my $wc = new HTML::WikiConverter(
    dialect  => 'MediaWiki',
    wiki_uri => qr~pagename=([^&]+)~
  );

This will extract the wiki page name from the URL, excluding the extraneous query string parameters.

I'm also considering a new parameter, 'wiki_page_extractor' (or something better?), which would accept a coderef to override the default wiki page extractor that uses the 'wiki_uri' parameter the way 0.50 does. The coderef would take an HTML::WikiConverter object and a URI object (with the URI::QueryParam methods), and would extract and return the wiki page name from the URI:

  sub _extract_page_name { pop->query_param( 'pagename' ) }

  my $wc = new HTML::WikiConverter(
    dialect             => 'MediaWiki',
    wiki_page_extractor => \&_extract_page_name
  );

It gets the job done, but I'm not completely thrilled with this interface. Any suggestions?

-- David Iberri
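For concreteness, here is a minimal sketch of how the regexp form should behave against the reporter's URLs, assuming the 0.51 behavior described above (the PHPSESSID value is made up for illustration):

  use HTML::WikiConverter;

  # Sketch, assuming the 0.51 regexp support described above: the capture
  # group supplies the wiki page name, so the session parameter is dropped.
  my $wc = new HTML::WikiConverter(
    dialect  => 'MediaWiki',
    wiki_uri => qr~pagename=([^&]+)~
  );

  # A PhpWiki-style link such as
  #   index.php?pagename=BoaltGroupWiki&PHPSESSID=abc123  (session id hypothetical)
  # should then convert to the wiki link [[BoaltGroupWiki]].
  print $wc->html2wiki(
    '<a href="http://wiki.boalt.org/index.php?pagename=BoaltGroupWiki&amp;PHPSESSID=abc123">BoaltGroupWiki</a>'
  ), "\n";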
On Sat Jan 28 01:22:58 2006, DIBERRI wrote:
> I'm also considering a new parameter, 'wiki_page_extractor' (or
> something better?), which would accept a coderef to override the default
> wiki page extractor that uses the 'wiki_uri' parameter the way 0.50
> does. The coderef would take an HTML::WikiConverter object and a URI
> object (with the URI::QueryParam methods), and would extract and return
> the wiki page name from the URI:
>
>   sub _extract_page_name { pop->query_param( 'pagename' ) }
>
>   my $wc = new HTML::WikiConverter(
>     dialect             => 'MediaWiki',
>     wiki_page_extractor => \&_extract_page_name
>   );
>
> It gets the job done, but I'm not completely thrilled with this
> interface. Any suggestions?
This API ships with 0.51, but I don't like it. 0.52 will therefore get rid of the wiki_page_extractor attribute and consolidate it into the wiki_uri attribute. wiki_uri can now take either a scalar or a reference to an array of scalars; each scalar can be a string URI prefix, a regexp, or a coderef. URI prefixes will be treated as before. A regexp can be used to match a wiki URI and capture the page title. A coderef will be used just as wiki_page_extractor is described above. More details will be in the docs for 0.52.

-- David Iberri
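A minimal sketch of the consolidated interface, written against this ticket's description of 0.52 rather than the released docs:

  # Sketch: wiki_uri accepting an array ref that mixes all three forms
  # described above. Assumes the 0.52 behavior outlined in this ticket.
  my $wc = new HTML::WikiConverter(
    dialect  => 'MediaWiki',
    wiki_uri => [
      'http://wiki.boalt.org/index.php?pagename=',   # URI prefix, treated as before
      qr~pagename=([^&]+)~,                          # regexp capturing the page title
      sub { pop->query_param( 'pagename' ) },        # coderef, as wiki_page_extractor above
    ]
  );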