Never mind. It was pilot error. It looks like the sitemap.gz file gets appended
to, and this is why I was seeing dups.
Bryn
On Nov 8, 2007 3:38 PM, Bugs in WWW-Google-SiteMap via RT
<bug-WWW-Google-SiteMap@rt.cpan.org> wrote:
> Greetings,
>
> This message has been automatically generated in response to the
> creation of a trouble ticket regarding:
> "WWW-Google-SiteMap bug: duplicate URLs",
> a summary of which appears below.
>
> There is no need to reply to this message right now. Your ticket has been
> assigned an ID of [rt.cpan.org #30592]. Your ticket is accessible
> on the web at:
>
>
> http://rt.cpan.org/Ticket/Display.html?id=30592
>
> Please include the string:
>
> [rt.cpan.org #30592]
>
> in the subject line of all future correspondence about this issue. To do so,
> you may reply to this message.
>
> Thank you,
> bug-WWW-Google-SiteMap@rt.cpan.org
>
> -------------------------------------------------------------------------
> I get duplicate URLs in my sitemap. Here is an easy fix to urls() in
> WWW-Google-SiteMap-1.09/lib/WWW/Google/SiteMap.pm
>
> Maybe the bug is in the crawler part, but this is an easy fix.
>
> Bryn
>
>
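> sub urls {
>     my $self = shift;
>     # Setter: replace the stored URL list if arguments were passed.
>     $self->{urls} = \@_ if @_;
>     # Keep only URL objects with a defined loc, and only the first
>     # occurrence of each loc (%hist counts how often a loc was seen).
>     my %hist;
>     my @urls = grep { ref($_) && defined $_->loc && !$hist{ $_->loc }++ }
>         @{ $self->{urls} };
>     return wantarray ? @urls : \@urls;
> }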
>
>
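The filter in the patch can be exercised on its own. The following is a minimal sketch, not part of the module: `Demo::URL` and `dedup_urls` are hypothetical names, and `Demo::URL` only mimics the `loc` accessor that the patch assumes the real URL objects provide.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sitemap URL object: it only provides the
# loc() accessor that the urls() filter relies on. This is NOT the real
# WWW::Google::SiteMap URL class.
package Demo::URL;

sub new {
    my ($class, %args) = @_;
    return bless { loc => $args{loc} }, $class;
}

sub loc { return $_[0]->{loc} }

package main;

# Same filter as the urls() patch: drop non-references and entries with an
# undefined loc, and keep only the first object seen for each loc.
sub dedup_urls {
    my %hist;
    return grep { ref($_) && defined $_->loc && !$hist{ $_->loc }++ } @_;
}

my @input = (
    Demo::URL->new(loc => 'http://example.com/'),
    Demo::URL->new(loc => 'http://example.com/a.html'),
    Demo::URL->new(loc => 'http://example.com/'),    # duplicate loc: dropped
    Demo::URL->new(loc => undef),                    # no loc: dropped
    'not an object',                                 # not a ref: dropped
);

my @unique = dedup_urls(@input);
print join("\n", map { $_->loc } @unique), "\n";
```

As in the patch, the first object for each loc is kept and the relative order of the surviving URLs is preserved.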