Subject: HTML::ExtractMain - problems finding relevant content
Date: Tue, 8 Feb 2011 12:24:09 +0000
To: bug-html-extractmain [...] rt.cpan.org
From: Carla Teixeira Lopes <carla.lopes [...] fe.up.pt>
Hi,
I'm using HTML::ExtractMain to extract the main content of web pages,
and it fails on pages where I would expect no problems. For example,
when I pass it the contents of the webpage
http://www.carlalopes.com/research.html, it cannot find any relevant
content, even though this page is very "clean" and well-formed.
I also tried the same page with the Readability application, online at
http://lab.arc90.com/experiments/readability/, and it returns no
error.
Any idea why this happens?
Thanks,
Carla Teixeira Lopes