
This queue is for tickets about the WWW-UsePerl-Journal CPAN distribution.

Report information
The Basics
Id: 13748
Status: resolved
Priority: 0/
Queue: WWW-UsePerl-Journal

People
Owner: BARBIE [...] cpan.org
Requestors: simonw [...] digitalcraftsmen.net
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.12
Fixed in: (no value)



Subject: Fails tests with new use.perl layout
The new use.perl XHTML layout breaks the tests. The attached patch updates the screen-scraping regexes in Journal.pm so that the tests pass again.
--- WWW-UsePerl-Journal-0.12/lib/WWW/UsePerl/Journal.pm Sun Apr 17 07:20:19 2005
+++ WWW-UsePerl-Journal-0.12-new/lib/WWW/UsePerl/Journal.pm Mon Jul 18 13:07:16 2005
@@ -80,7 +80,7 @@
 "/journal.pl?op=list&uid=$uid")->content;
 die "Cannot connect to " . UP_URL unless $content;

- $content =~ m#<HTML><HEAD><TITLE>Journal of (.*?) \(\d+\)</TITLE>#
+ $content =~ m#<title>Journal of (.*?) \(\d+\)</title>#
 or die "$uid does not exist";
 $1;
 }
@@ -122,7 +122,25 @@

 my @entries;

-# Sample of this on 04/10/2002
+# Sample of this on 18/07/2005
+# <div class="search-results">
+# <h4>
+# <a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
+# </h4>
+# <div class="data">
+# On Saturday July 16, @01:44AM
+# </div>
+# <div class="intro">
+# Losing money internationally
+#I deal with banks regularly, and while issues...
+# </div>
+# <div class="author">
+# Author: <a href="//use.perl.org/~pjf/">pjf</a>
+# </div>
+#
+#</div>
+
+# Old sample from 04/10/2002
 #<B><A HREF="//use.perl.org/~davorg/journal/8165">Buy More Books</A></B><BR>
 # <FONT SIZE="-1">On 2002.10.04 6:24</FONT><BR>
 # Yesterday I got my royalty statement for sales of Data Munging with Perl in the...<BR>
@@ -133,19 +151,11 @@
 # <P>

 while ( $content =~ m#
- <B><A\s*HREF="$site/~(\w+)/journal/(\d+)">(.+?)</A></B><BR>
- \s*
- <FONT\s*SIZE="-1">On\s*(.+?)</FONT><BR>
- \s*
- .+?<BR>
- \s*
- <FONT\s*SIZE="-1">
- \s*
- Author:\s*<A\s*HREF="$site/~(\w+)/">(\w+)</A>
- \s*
- </FONT>
+ <h4>\s*<a\s*href="$site/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
 \s*
- <P>
+ <div\sclass="data">\s*On\s*(.+?)\s+</div>
+ .+?
+ Author:\s*<a\s*href="$site/~(\w+)/">(\w+)</a>
 #migxs ) {
 die "$5 is not $6" if $5 ne $6;
 my $time = Time::Piece->strptime($4, '%Y.%m.%d %H:%M');
@@ -176,10 +186,7 @@
 s/^.*\Q<!-- start template: ID 251, journalsearch;search;default -->\E//sm;

 $content =~
- s/<A HREF=\"$site\/search\.pl\?threshold=0&op=journals
- &sort=1&amp;start=30">Next 30 matches&gt;
- <\/A>\s*<P>\s*
- <!-- end template: ID 251, journalsearch;search;default -->.*$//sm;
+ s/<div class="pagination.*$//sm;

 return $content;
 }
I have gone through the screen-scraping regexes extensively and converted them. Several changes are still to be done, but this is the first working version. Fixed in 0.13.
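
For anyone checking the converted entry regex by hand, here is a minimal standalone sketch that runs the pattern from the patch against the 18/07/2005 sample quoted in its comment block. The value of $site, the indentation of the sample, and the hash keys are assumptions made for illustration, not taken from the module. Note that the captured date ("Saturday July 16, @01:44AM") no longer has the '%Y.%m.%d %H:%M' shape passed to strptime in the surrounding code, so the sketch only prints it and does not try to parse it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: $site is the scheme-less prefix seen in the sample markup.
my $site = '//use.perl.org';

# Sample entry copied from the comment block in the patch (new layout).
# The indentation here is illustrative only.
my $content = <<'HTML';
<div class="search-results">
  <h4>
    <a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
  </h4>
  <div class="data">
    On Saturday July 16, @01:44AM
  </div>
  <div class="intro">
    Losing money internationally
I deal with banks regularly, and while issues...
  </div>
  <div class="author">
    Author: <a href="//use.perl.org/~pjf/">pjf</a>
  </div>

</div>
HTML

my @entries;

# Same pattern as in the patch: /x ignores the layout whitespace,
# /s lets .+? skip across the "intro" block to the "Author:" line.
while ( $content =~ m#
        <h4>\s*<a\s*href="$site/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
        \s*
        <div\sclass="data">\s*On\s*(.+?)\s+</div>
        .+?
        Author:\s*<a\s*href="$site/~(\w+)/">(\w+)</a>
    #migxs ) {
    # Sanity check kept from the original code: both links name the same user.
    die "$5 is not $6" if $5 ne $6;

    # Hypothetical record layout, for illustration only.
    push @entries, { user => $1, id => $2, subject => $3, date => $4 };
}

for my $e (@entries) {
    print "$e->{id} by $e->{user}: $e->{subject} ($e->{date})\n";
}
# prints: 25733 by pjf: Losing money internationally (Saturday July 16, @01:44AM)
```

If this prints nothing against a freshly fetched page, the layout has probably changed again and the pattern needs another pass.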