This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 62860
Status: rejected
Priority: 0
Queue: URI

People
Owner: Nobody in particular
Requestors: JEB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 1.35
  • 1.56
Fixed in: (no value)



Hi, the following seems to fail to parse the query string:

    my $url_raw = "http://www.howardforums.com/#9733;-The-WIND-Mobile-Sticky-FAQ-★?s=1075be0490f4912a4f9ee738c1a4a243&p=13343898";
    my $url_object = URI->new( $url_raw );
    print $url_object->query; # prints ""

I would have expected "s=1075be0490f4912a4f9ee738c1a4a243&p=13343898".
The issue appears to be the Unicode character reference "#9733;" (BLACK STAR, http://www.fileformat.info/info/unicode/char/2605/index.htm).
I think you are confusing HTML escaping with URI syntax. Values taken from HTML documents need to be passed through HTML::Entities' decode_entities() function before they are handed to URI. Otherwise the '#' signals the start of the fragment part of the URI.
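
The difference can be sketched as follows. This is a minimal illustration, not the reporter's actual code. It assumes the link was copied from HTML source where the star appeared as the numeric character reference `&#9733;` and the query separator as `&amp;`. The shortened URL and the variable names are hypothetical.

```perl
use strict;
use warnings;
use URI;
use HTML::Entities qw(decode_entities);

# Hypothetical href as it might appear in HTML source (assumption:
# the star is written as "&#9733;" and "&" as "&amp;").
my $href = 'http://www.howardforums.com/&#9733;-The-WIND-Mobile-Sticky-FAQ'
         . '?s=1075be0490f4912a4f9ee738c1a4a243&amp;p=13343898';

# Without decoding, the first '#' starts the fragment, so everything
# after it (including the '?...' part) belongs to the fragment and
# there is no query component.
my $raw_uri = URI->new($href);
print "raw query:     ", (defined $raw_uri->query ? $raw_uri->query : '(none)'), "\n";

# Decode the HTML entities first, then parse the result as a URI.
my $decoded_uri = URI->new( decode_entities($href) );
print "decoded query: ", $decoded_uri->query, "\n";
```

With the entities decoded before parsing, the `?` is no longer swallowed by the fragment, and `query` returns the `s=...&p=...` string the reporter expected.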