Bug #73068 for WWW-Class: Regexp-based HTML parsing is sucky.

RT for rt.cpan.org

This queue is for tickets about the WWW-Class CPAN distribution.

Report information

The Basics

Id:	73068
Status:	new
Priority:	0/
Queue:	WWW-Class

People

Owner:	Nobody in particular
Requestors:	perl [...] toby.ink
Cc:
AdminCc:

Bug Information

Severity:	(no value)
Broken in:	(no value)
Fixed in:	(no value)

History

Wed Dec 07 08:02:18 2011 perl [...] toby.ink - Ticket created

Subject:	Regexp-based HTML parsing is sucky.

Not specifically your implementation of it, but the entire idea of parsing HTML using regexps is broken.

Particular problems in your implementation...

It can't identify the following <title> element:

    <title lang="en">Hello World</title>

It can't find this link:

    <a href="https://metacpan.org/">CPAN</a>
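
For comparison, here is a minimal sketch of the same two lookups done with a real HTML parser instead of regexps. It is written in Python with the standard library's html.parser rather than in Perl, purely as an illustration. The names TitleAndLinkExtractor and extract are hypothetical and are not part of WWW-Class.

```python
from html.parser import HTMLParser


class TitleAndLinkExtractor(HTMLParser):
    """Collects the <title> text and every <a href="..."> value."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names and passes attributes as
        # (name, value) pairs, so extra attributes such as lang="en"
        # do not stop the element from being recognised.
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract(html):
    """Hypothetical helper: return (title, list_of_hrefs) for a document."""
    parser = TitleAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.links
```

Calling extract() on a document containing the two samples above should pick up both the title and the link, because the parser tokenises tags and attributes itself and does not depend on the exact spelling of the markup.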