
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 46099
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: bdfoy [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Make iframe parsing configurable
Since the latest versions of HTML::Parser do not parse the content of iframes, some of my applications using HTML::SimpleLinkExtor have broken. The text between the iframe tags is what the browser displays and is usually more HTML, and I need to be able to extract any links in that text. I'd like to at least be able to turn on parsing for iframes, even if it is off by default.
On Fri May 15 02:15:45 2009, BDFOY wrote:
> Since the latest versions of HTML::Parser do not parse the content of
> iframes, some of my applications using HTML::SimpleLinkExtor have
> broken. The text between the iframe tags is what the browser displays
> and is usually more HTML, and I need to be able to extract any links in
> that text.
Browsers that support iframes are supposed to ignore everything inside the iframe. They are supposed to render the HTML found at the 'src' location.
> I'd like to at least be able to turn on parsing for iframes, even if it
> is off by default.
I see the point if you need to emulate the behaviour of very old browsers. A workaround is to invoke a subparser on the iframe content text. I'll see if I find an easier way to do this.
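The subparser workaround could look roughly like the sketch below. It is only an illustration, not code from the distribution: `extract_links` is a hypothetical helper name, and the sketch assumes the installed HTML::Parser hands the iframe content to the text handler as raw text (the `unbroken_text` option is set so that it arrives in one piece). On versions that parse iframe content normally, the start handler sees the inner tags directly and the recursive call finds nothing extra.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect href values from <a> tags, re-parsing
# iframe content when the parser reports it as literal text.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $in_iframe = 0;    # nesting depth of open <iframe> elements

    my $parser = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each text run as a single event
        start_h => [ sub {
            my ($tag, $attr) = @_;
            $in_iframe++ if $tag eq 'iframe';
            push @links, $attr->{href}
                if $tag eq 'a' && defined $attr->{href};
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            $in_iframe-- if $tag eq 'iframe' && $in_iframe > 0;
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            return unless $in_iframe;
            # Subparser: run the same extraction on the raw iframe text.
            push @links, extract_links($text);
        }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}
```

Called on a whole document, `extract_links($html)` returns the links found outside iframes together with any found in the iframe fallback content.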
The TODO file has this entry:

  - make literal tags configurable. The current list is hardcoded to be "script", "style", "title", "iframe", "textarea", "xmp", and "plaintext".

which would be my preferred way to fix this.
Making literal tags configurable would also be useful for those doing JavaScript templates with <script type="text/html"> tags.
From: andrew [...] pimlott.net
On Sat Jun 20 05:17:40 2009, GAAS wrote:
> > I'd like to at least be able to turn on parsing for iframes, even if it
> > is off by default.
>
> I see the point if you need to emulate the behaviour of very old browsers.
What is the point of not parsing the content of iframes? I can't find any justification, and it seems at odds both with the spec and user expectations. Removing this special case would make HTML::Parser simpler and more uniform.

Andrew
I explained the point just above the text you quoted. What's "the spec" you're referring to?