On Thu Feb 19 15:00:19 2015, grtodd@gmail.com wrote:
> MetaCPAN's "Recent" feed works well and Perlanet creates links like
> the following for aggregation:
>
> http://metacpan.org/release/DWHEELER/Pod-Simple-3.29_6
>
> The "News" feed at MetaCPAN (https://metacpan.org/feed/news), however,
> uses URLs and links like this (with anchors):
>
> https://metacpan.org/news#sslimprovements
>
> which are lowercase with white space removed (note the "#"). When
> Perlanet tries to create an aggregation from this feed, it URL-encodes
> "#" as %23 and the resulting links look like:
>
> http://metacpan.org/news%23SSL%20improvements
>
> and thus break: once "#" is URL-encoded, it is no longer seen as the
> start of an anchor but as a literal character in the path.
>
> I have no idea if this is a Perlanet bug or not, nor how or where to
> fix it. There may be some sort of discrepancy between the RDF/Atom
> feed describing the page and the actual source of the page.
>
> A workaround might be to add "/" to the end of the URL, which causes
> "%23" to be seen as an anchor. For example:
>
> http://metacpan.org/news/%23SSL%20improvements
>
> does find the page - if not the actual anchor location. Or perhaps
> adjusting settings when the HTML::Scrubber object is created - but I
> haven't investigated further.
Hi,
It looks like there are a few things going on here.
Firstly, there's no problem with the feed handling. If you're generating a feed file and you look at the URLs that are in that, then you'll see that they are correct.
Secondly, MetaCPAN are creating invalid URLs. They all contain spaces, and spaces aren't allowed in URLs. They should be encoded as %20 (a '+' only stands for a space in query strings). The URL you give as an example (https://metacpan.org/news#sslimprovements) doesn't exist in their feed. It's actually "http://metacpan.org/news#SSL improvements".
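To illustrate the encoding point, here is a minimal Python sketch. Python is used only for illustration, and `news_link` is a hypothetical helper, not part of Perlanet or MetaCPAN. It builds a link to a heading and percent-encodes the space in the fragment, which gives the form the feed ought to publish:

```python
from urllib.parse import quote, urlsplit

def news_link(base, heading):
    """Hypothetical helper: link to a heading, percent-encoding the fragment (space -> %20)."""
    return f"{base}#{quote(heading)}"

link = news_link("https://metacpan.org/news", "SSL improvements")
print(link)                     # https://metacpan.org/news#SSL%20improvements
print(urlsplit(link).fragment)  # SSL%20improvements
```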
Thirdly, MetaCPAN are creating URLs that contain fragments which link to <a> elements that don't exist. If they publish a URL like https://metacpan.org/news#sslimprovements, then you'd expect to find an element like <a name="sslimprovements"> (or an element with id="sslimprovements"). That doesn't exist in the HTML source.
So, even if Perlanet worked as expected, your links wouldn't work because the MetaCPAN site is broken. I'll see if I can submit a patch to them to fix those issues.
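One rough way to check whether a fragment has anything to point at is to collect the id attributes, plus the name attributes on <a> elements, from the HTML. Below is a minimal Python sketch that assumes the page markup is already in a string. `AnchorCollector` and `fragment_exists` are hypothetical helpers, and the sample markup is made up, not MetaCPAN's real page. For simplicity it compares the fragment as given and does not percent-decode it first.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect every id attribute, plus name attributes on <a> elements."""
    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:  # attribute present without a value
                continue
            if key == "id" or (key == "name" and tag == "a"):
                self.targets.add(value)

def fragment_exists(html, fragment):
    """Return True if some element in `html` is a valid target for `fragment`."""
    collector = AnchorCollector()
    collector.feed(html)
    return fragment in collector.targets

# Made-up markup for illustration only: the heading has no id, so the fragment has no target.
page = '<h2>SSL improvements</h2><p>...</p><a name="other-news"></a>'
print(fragment_exists(page, "sslimprovements"))  # False
print(fragment_exists(page, "other-news"))       # True
```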
But there is still a problem with the page that Perlanet is generating for you. I don't think that it should change '#' to '%23'. That's happening because in the sample TT file which I provide (and which, I assume, you copied) I use the 'uri' filter to clean up URLs for display.
A quick fix would be to remove the 'uri' filter. But I need to think about what other effects that might have. I think it's good practice to have it there (in most cases).
It might be a bug in TT's 'uri' filter. It might need to add '#' to the list of characters that it doesn't touch.
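The effect of treating '#' as a safe character can be sketched with Python's `urllib.parse.quote`. This is for illustration only: it is not TT's 'uri' filter, and `encode_link` is a hypothetical helper. Escaping everything except '/' and ':' reproduces the broken link from the report. Adding '#' to the safe set keeps '#' as the fragment delimiter.

```python
from urllib.parse import quote, urlsplit

# The raw link as it appears in the MetaCPAN news feed (with an unencoded space).
FEED_LINK = "http://metacpan.org/news#SSL improvements"

def encode_link(url, safe):
    """Hypothetical helper: percent-encode every character not in `safe`.
    Letters, digits and '_.-~' are never encoded by quote()."""
    return quote(url, safe=safe)

# Escaping '#' turns the fragment delimiter into part of the path.
broken = encode_link(FEED_LINK, safe="/:")
# Leaving '#' alone keeps the fragment separate; only the space is encoded.
fixed = encode_link(FEED_LINK, safe="/:#")

for link in (broken, fixed):
    parts = urlsplit(link)
    print(f"{link}\n  path={parts.path!r} fragment={parts.fragment!r}")
```

The first link comes out as http://metacpan.org/news%23SSL%20improvements with an empty fragment. The second keeps /news as the path and SSL%20improvements as the fragment.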
Thanks for the report.
Cheers,
Dave...