Subject: URI::host may return tainted data when called for the first time
I'm trying to track down an obscure taint mode issue, probably in URI.
In short, for a URI object that had an explicit port in the string it
was constructed from, the first time $uri->host() is called it sometimes
(!) returns tainted data, and non-tainted when called again after that.
When it happens, it happens even if the string the URI was constructed
from was not tainted.
I've tracked it deep down in URI::_server::host. The line that causes
it is:
$old =~ s/:\d+$//; # remove the port
If I replace the \d with [[:digit:]] or [0-9] or \w or . , the problem
does not occur.
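To compare the pattern variants in isolation, something like the following sketch may help. It only uses core modules; strip_port() is a hypothetical helper written for this message, not part of URI, and the host string is just one of the examples from this report. On its own it may well show nothing tainted, since the problem has so far not been reproduced in a small script.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Run as: perl -T this_script.pl
# Without -T, tainted() always returns false and the check tells us nothing.

# Hypothetical helper: strip a trailing ":port" using the given pattern and
# return the result together with its taint status (1 or 0).
sub strip_port {
    my ($authority, $pattern) = @_;
    (my $host = $authority) =~ s/$pattern//;
    return ($host, tainted($host) ? 1 : 0);
}

# The pattern from URI::_server::host, plus two of the variants that
# avoid the problem in the validator.
my @patterns = (
    [ '\d'          => qr/:\d+$/ ],
    [ '[[:digit:]]' => qr/:[[:digit:]]+$/ ],
    [ '[0-9]'       => qr/:[0-9]+$/ ],
);

printf "taint mode: %s\n", ${^TAINT} ? "on" : "off";
for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    my ($host, $is_tainted) = strip_port("validator.nu:80", $re);
    printf "%-12s host=%s tainted=%d\n", $label, $host, $is_tainted;
}
```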
Longer explanation:
This occurs with the W3C Markup Validator when validating some HTML 5
documents and when configured to POST the HTML 5 document to an external
HTML 5 validator using a URI that has a port.
The issue is that for some retrieved documents, the POST fails with "500
Insecure dependency in connect while running with -T switch".
I am very confused about this, because it happens only for *some*
validated documents, not all. I don't see how the contents of the
validated (internally POSTed to the HTML 5 validator URL) documents
would be relevant. They're all POSTed to the same URL/host/port.
I have also failed to create a small reproducer, but using this version
of the markup validator:
http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/check?rev=1.749&content-type=text/x-cvsweb-markup
, configured to POST the HTML 5 markup to, for example,
http://qa-dev.w3.org:8888/html5/ or http://validator.nu:80/ , and
validating the document at http://htmlex.met.cz/ the problem occurs on
two different systems I have access to. As said, it does not happen
with all documents; for example, when validating the content at
http://validator.nu/ instead of http://htmlex.met.cz/ it does not
happen. Also it does not happen if the HTML 5 validator where the
content is POSTed is configured to be http://validator.nu/ (without the
:80 in the URL).
I have found a couple of different workarounds. I can't tell exactly
why either of them works around the issue, but they do:
a) Using $url->query("out=xml") instead of $url->query_form(out =>
"xml") in html5_validate() (around line 1167) in the validator code (see
above dev.w3.org URL).
b) Placing a throwaway $uri->host() call after the query_form in the
validator code in html5_validate() (again, see above dev.w3.org URL).
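For reference, the two workarounds look roughly like this outside the validator. This is a sketch, not the actual html5_validate() code: $html5_url is an assumed stand-in for $CFG->{External}->{HTML5}, and the script needs the URI module installed. Run it with perl -T to get a meaningful taint check.

```perl
use strict;
use warnings;
use URI;
use Scalar::Util qw(tainted);

# Assumed stand-in for $CFG->{External}->{HTML5}; note the explicit port.
my $html5_url = "http://validator.nu:80/";

# Workaround a): set the query string directly instead of via query_form().
my $uri_a = URI->new($html5_url);
$uri_a->query("out=xml");

# Workaround b): keep query_form(), then call host() once and discard
# the result before the URI is used for the request.
my $uri_b = URI->new($html5_url);
$uri_b->query_form(out => "xml");
$uri_b->host();    # throwaway call

# Report the host and its taint status for both variants.
for my $uri ($uri_a, $uri_b) {
    my $host = $uri->host();
    printf "%s host=%s tainted=%d\n", $uri, $host, tainted($host) ? 1 : 0;
}
```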
The string $CFG->{External}->{HTML5} from which the URI object is
created is not tainted.