
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 9676
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: pronovic [...] debian.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 3.40
  • 3.41
  • 3.42
  • 3.43
  • 3.44
  • 3.45
Fixed in: (no value)



Subject: Problem when <script> tag contains quoted newline
Hi Gisle -

This bug duplicates the email I sent you on 09 January regarding Debian bug #289141:

   http://bugs.debian.org/289141

This bug has to do with parsing <script> tags. It seems that recent changes (in 3.40, I think) have changed the behavior for script tags which contain quoted newlines, i.e.

   <script>addrow( "a field", "another field", "<b>hello</b> this is a long field
   field continued", "final field" ) </script>

I've dug through the differences between 3.38 and 3.40, and I've found the two lines of code which cause the behavior change. I've attached a patch (against 3.45) which "fixes" the behavior change, and doesn't break any existing regression tests. However, I'm not sure whether I should apply it because it looks like you might have intended to change the behavior.
Index: hparser.c
===================================================================
RCS file: /opt/public/cvs/debian/libhtml-parser-perl/hparser.c,v
retrieving revision 1.1.1.12
diff -u -r1.1.1.12 hparser.c
--- hparser.c	28 Dec 2004 13:47:44 -0000	1.1.1.12
+++ hparser.c	9 Jan 2005 18:17:50 -0000
@@ -1520,8 +1520,6 @@
 		escape_next = 1;
 	    else if (inside_quote && *s == inside_quote)
 		inside_quote = 0;
-	    else if (*s == '\r' || *s == '\n')
-		inside_quote = 0;
 	    else if (!inside_quote && (*s == '"' || *s == '\''))
 		inside_quote = *s;
 	}
This was behaviour that I picked up by reading the KHTML sources to figure out how they did it. Before I change this again I would like to do some research to figure out what MSIE and Firefox do in this situation. Are multiline quoted strings actually allowed in JavaScript?
It could anyway make sense to add a boolean option that simply disables the whole code that tries to parse quotes. KHTML also tracked quotes so that a quoted ">" or "?>" would not terminate a processing instruction. I'm guessing this has something to do with parsing of unprocessed PHP.
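To make the behaviour under discussion concrete, here is a minimal JavaScript sketch of a quote-tracking scan for the end of a script element. It is only modeled on the fragment visible in the patch above and is not the actual hparser.c code; the function name `findScriptEnd` and the options `trackQuotes` and `newlineEndsQuote` are hypothetical and do not exist in HTML::Parser.

```javascript
// Hypothetical sketch, loosely modeled on the patch fragment above.
// None of these names are part of HTML::Parser.
// Returns the index of the first "</script" seen outside a quoted string,
// or -1 if no such end tag is found.
function findScriptEnd(text, options = {}) {
  const { trackQuotes = true, newlineEndsQuote = true } = options;
  let insideQuote = null;  // the quote character we are inside, or null
  let escapeNext = false;  // true for the character right after a backslash

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (!insideQuote && text.slice(i, i + 8).toLowerCase() === "</script") {
      return i;
    }
    if (!trackQuotes) continue;  // the "disable quote parsing" option

    if (escapeNext) {
      escapeNext = false;
    } else if (c === "\\") {
      escapeNext = true;
    } else if (insideQuote && c === insideQuote) {
      insideQuote = null;
    } else if (newlineEndsQuote && (c === "\r" || c === "\n")) {
      insideQuote = null;  // the two lines the patch removes
    } else if (!insideQuote && (c === '"' || c === "'")) {
      insideQuote = c;
    }
  }
  return -1;
}

// A script body with a newline inside a quoted string, as in the report.
const body =
  'addrow( "a field", "a long field\nfield continued", "final field" ) </script>';

// With the newline reset, quote state gets out of step and the end tag is missed.
console.log(findScriptEnd(body) === -1);
// Without the reset, or with quote tracking off, the end tag is found.
console.log(findScriptEnd(body, { newlineEndsQuote: false }) === body.indexOf("</script>"));
console.log(findScriptEnd(body, { trackQuotes: false }) === body.indexOf("</script>"));
```

In this sketch the newline reset closes the open quote early, so the string's real closing quote is read as an opening one and every later quote is misread, which leaves the end tag "inside" a string.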
From: pronovic [...] debian.org
Somehow, this reply that I sent to comment-HTML-Parser@rt.cpan.org never showed up in the bug (is it supposed to?), so I'm putting it in by hand now.

----------------------

On Thu, Jan 13, 2005 at 10:07:17PM -0600, Kenneth Pronovici wrote:
> > Full context and any attached attachments can be found at:
> > <URL: http://rt.cpan.org/NoAuth/Bug.html?id=9676 >
> >
> > This was behaviour that I picked up by reading the KHTML sources
> > to figure out how they did it. Before I change this again I would
> > like to do some research to figure out what MSIE and FireFox do in
> > this situation. Is multiline quoted strings actually allowed
> > in JavaScript?
>
> I figured you had a good reason for the change. I'll do some research
> and see if I can figure out whether these strings are actually allowed.
Ok, I've done some digging in Google and Google Groups. I've found two pages that directly discuss string syntax within <script> tags. Both say that a line break is not allowed within a string literal.

   http://academ.hvcc.edu/~kantopet/javascript/index.php?page=js+syntax&parent=core+javascript

   "You also cannot have line-breaks inside text strings literals. If you need to run a text string across multiple lines, you should break the string into multiple tokens and use a concatenation operator to string it together."

   http://www.netmechanic.com/news/vol4/javascript_no23.htm

   "...JavaScript interprets the line breaks to mean that you're trying to close the string improperly."

Besides these, there are a lot of conversations on comp.lang.javascript helping newbies debug exactly this problem (often errors about an unterminated string literal).

I also found Douglas Crockford's online Javascript validator:

   http://www.crockford.com/javascript/jslint.html

Even with the "strict line ending" option unchecked, the validator does not allow line breaks within literals.

I tend to think that your implementation is correct, and I don't think you'd want to support obviously invalid syntax unless MSIE and/or Firefox do (and maybe not even then).

I have pretty much zero experience with Javascript. However, I worked up this minimal sample page:

   <html>
   <head><title>Test Javascript Page</title></head>
   <body>
   <script language="javascript">
   document.write("Short string<br>");
   document.write("Longer...........................string.<br>");
   /*document.write("Split
   string.");*/
   </script>
   </body>
   </html>

The first two document.write() lines should be valid. The third is the questionable string literal containing a line break.

I've tested this so far in recent versions of Mozilla, Firefox, Epiphany, Kazehakase and Konqueror on my Debian box. All of them render the page properly with the split string commented out and render nothing (failure?) with the split string left in.

I can't test MSIE on this box, unfortunately. Anyway, unless MSIE surprises me, I don't think you really need to change HTML::Parser.

KEN

--
Kenneth J. Pronovici <pronovic@debian.org>
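As a small illustration of the rule quoted above, the following JavaScript sketch checks whether a piece of source text parses. It relies only on the standard `Function` constructor; the helper name `parses` and the sample strings are hypothetical. A raw line break inside a quoted literal is rejected, while splitting the string into two tokens joined with `+` is accepted.

```javascript
// Hypothetical helper: returns true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);  // compiles the body without running it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;  // anything else is unexpected
  }
}

// The "\n" escapes below put a real line break into the source text
// that is handed to the parser.
const broken = 'return "Split\nstring.";';         // newline inside the literal
const joined = 'return "Split " +\n  "string.";';  // two literals concatenated

console.log(parses(broken));  // unterminated string literal: does not parse
console.log(parses(joined));  // parses
console.log(new Function(joined)());
```

This only shows what the language grammar accepts; it says nothing about how any particular browser recovers from the error when rendering a page.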
The whole quote quirk stuff is gone in 3.57.