Bug #13748 for WWW-UsePerl-Journal: Fails tests with new use.perl layout

Mon Jul 18 08:13:51 2005 Guest - Ticket created

Subject:

Fails tests with new use.perl layout

The new use.perl XHTML layout breaks the tests. The attached patch fixes them.

--- WWW-UsePerl-Journal-0.12/lib/WWW/UsePerl/Journal.pm	Sun Apr 17 07:20:19 2005
+++ WWW-UsePerl-Journal-0.12-new/lib/WWW/UsePerl/Journal.pm	Mon Jul 18 13:07:16 2005
@@ -80,7 +80,7 @@
         "/journal.pl?op=list&uid=$uid")->content;
     die "Cannot connect to " . UP_URL unless $content;
 
-    $content =~ m#<HTML><HEAD><TITLE>Journal of (.*?) \(\d+\)</TITLE>#
+    $content =~ m#<title>Journal of (.*?) \(\d+\)</title>#
         or die "$uid does not exist";
     $1;
 }
@@ -122,7 +122,25 @@
 
     my @entries;
 
-# Sample of this on 04/10/2002
+# Sample of this on 18/07/2005
+# <div class="search-results">
+# <h4>
+# <a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
+# </h4>
+# <div class="data">
+# On Saturday July 16, @01:44AM
+# </div>
+# <div class="intro">
+# Losing money internationally
+#I deal with banks regularly, and while issues...
+# </div>
+# <div class="author">
+# Author: <a href="//use.perl.org/~pjf/">pjf</a>
+# </div>
+#
+#</div>
+
+# Old sample from 04/10/2002
 #<B><A HREF="//use.perl.org/~davorg/journal/8165">Buy More Books</A></B><BR>
 # <FONT SIZE="-1">On 2002.10.04 6:24</FONT><BR>
 # Yesterday I got my royalty statement for sales of Data Munging with Perl in the...<BR>
@@ -133,19 +151,11 @@
 # <P>
 
     while ( $content =~ m#
-        <B><A\s*HREF="$site/~(\w+)/journal/(\d+)">(.+?)</A></B><BR>
-        \s*
-        <FONT\s*SIZE="-1">On\s*(.+?)</FONT><BR>
-        \s*
-        .+?<BR>
-        \s*
-        <FONT\s*SIZE="-1">
-        \s*
-        Author:\s*<A\s*HREF="$site/~(\w+)/">(\w+)</A>
-        \s*
-        </FONT>
+        <h4>\s*<a\s*href="$site/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
         \s*
-        <P>
+        <div\sclass="data">\s*On\s*(.+?)\s+</div>
+        .+?
+        Author:\s*<a\s*href="$site/~(\w+)/">(\w+)</a>
         #migxs ) {
         die "$5 is not $6" if $5 ne $6;
         my $time = Time::Piece->strptime($4, '%Y.%m.%d %H:%M');
@@ -176,10 +186,7 @@
     s/^.*\Q\E//sm;
 
     $content =~
-        s/<A HREF=\"$site\/search\.pl\?threshold=0&op=journals
-            &sort=1&start=30">Next 30 matches>
-            <\/A>\s*<P>\s*
-            .*$//sm;
+        s/<div class="pagination.*$//sm;
 
     return $content;
 }
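To see what the patched regexes expect, here is a minimal standalone sketch that runs the new entry-matching pattern and the pagination cleanup against a trimmed copy of the 18/07/2005 sample quoted in the patch. It is not the module's code: `parse_entries` is a hypothetical wrapper, `$site` is assumed to be `//use.perl.org` to match the sample links, the trailing `pagination` div is a made-up placeholder, and `\Q...\E` is added around `$site` so its dots match literally.

```perl
use strict;
use warnings;

# Assumption: base URL as it appears in the sample links.
my $site = '//use.perl.org';

# Trimmed copy of the 18/07/2005 sample; the pagination div is a placeholder.
my $html = <<'HTML';
<div class="search-results">
<h4>
<a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
</h4>
<div class="data">
On Saturday July 16, @01:44AM
</div>
<div class="intro">
Losing money internationally
I deal with banks regularly, and while issues...
</div>
<div class="author">
Author: <a href="//use.perl.org/~pjf/">pjf</a>
</div>
</div>
<div class="pagination">placeholder</div>
HTML

# Hypothetical wrapper around the patched while-loop regex.
sub parse_entries {
    my ($content, $site) = @_;
    my @entries;
    while ( $content =~ m#
            <h4>\s*<a\s*href="\Q$site\E/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
            \s*
            <div\sclass="data">\s*On\s*(.+?)\s+</div>
            .+?
            Author:\s*<a\s*href="\Q$site\E/~(\w+)/">(\w+)</a>
            #migxs ) {
        # Same sanity check as the patch: the uid in the author link
        # must match the link text.
        die "$5 is not $6" if $5 ne $6;
        push @entries, { user => $1, id => $2, title => $3, date => $4 };
    }
    return @entries;
}

# Strip everything from the pagination block onwards, as the patch does.
(my $content = $html) =~ s/<div class="pagination.*$//sm;

for my $e ( parse_entries($content, $site) ) {
    # prints: pjf #25733: Losing money internationally (Saturday July 16, @01:44AM)
    print "$e->{user} #$e->{id}: $e->{title} ($e->{date})\n";
}
```

Keeping the `die "$5 is not $6"` check means an entry whose author link disagrees with its link text stops the scan instead of being silently skipped.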

Tue Jul 26 02:25:53 2005 BARBIE [...] cpan.org - Correspondence added

I have gone through the screen-scraping regexes extensively and converted them. Several changes are still to be done, but this is the first working version. Fixed in 0.13.
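One spot the patch leaves on the old format is date parsing: the new `data` div reads `On Saturday July 16, @01:44AM`, but the unchanged `Time::Piece->strptime($4, '%Y.%m.%d %H:%M')` line still expects a 2002-style value such as `2002.10.04 6:24`. The following is a hedged sketch of a replacement parser; `parse_listing_date` is hypothetical, and because the new layout shows no year, the caller has to supply one (2005 below is assumed from the sample's date).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::UsePerl::Journal: parse the new
# listing date, e.g. "Saturday July 16, @01:44AM". The layout shows no
# year, so the caller must supply one (an assumption).
my %MONTH = (
    january   => 1, february => 2,  march    => 3,  april    => 4,
    may       => 5, june     => 6,  july     => 7,  august   => 8,
    september => 9, october  => 10, november => 11, december => 12,
);

sub parse_listing_date {
    my ($text, $year) = @_;

    # Expected shape: "<weekday> <month> <day>, @<hh>:<mm><AM|PM>"
    my ($mon_name, $day, $hour, $min, $ampm) =
        $text =~ /^\w+\s+(\w+)\s+(\d{1,2}),\s*\@(\d{1,2}):(\d{2})\s*([AP]M)$/i
        or return;
    my $mon = $MONTH{ lc $mon_name } or return;

    # Convert the 12-hour clock: 12:xxAM is midnight, 12:xxPM is noon.
    $hour = 0 if $hour == 12;
    $hour += 12 if uc($ampm) eq 'PM';

    return sprintf '%04d-%02d-%02d %02d:%02d', $year, $mon, $day, $hour, $min;
}

my $when = parse_listing_date('Saturday July 16, @01:44AM', 2005);
print "$when\n";    # prints 2005-07-16 01:44
```

Returning a plain string keeps the sketch dependency-free; passing it to `Time::Piece->strptime($when, '%Y-%m-%d %H:%M')` would yield the kind of object the module already builds.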

Tue Jul 26 02:25:54 2005 BARBIE [...] cpan.org - Status changed from 'new' to 'resolved'

Tue Jul 26 02:25:54 2005 BARBIE [...] cpan.org - Given to BARBIE