Skip Menu |

This queue is for tickets about the Mail-Thread CPAN distribution.

Report information
The Basics
Id: 5025
Status: new
Priority: 0/
Queue: Mail-Thread

People
Owner: Nobody in particular
Requestors: jonas [...] truls.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.41
Fixed in: (no value)



Subject: Two bugs in "_group_set_bysubject"
Two bugs found in Mail::Thread's "_group_set_bysubject". I've attached a patch with my "fixes". 1: "_group_set_bysubject" would crash when all messages in a thread didn't have the same subject. This happened because when populating a hash with subjects, only the "topmost" subject for each container in the rootset was included. This lead to "_group_set_bysubject" trying to work on undefined objects because it fetches objects from the poorly populated subject hash. My fix is simply a recursive collection of *all* subjects into the hash instead of just the topmost ones. To do this, I created Mail::Thread::Container->_populate_hash_with_subjects() wich expects a hash reference as parameter. 2: "_group_set_bysubject" sometimes misplaces containers. This happens when it is "removing the "second" message from the root set". I really don't know why this happens, so I put an ugly workaround int the code. Actually fixing the bug would of course be better, but I don't know what it is. 3: Not a bug at all. Just a change that suits me. I've made Mail::Thread::Container->simple_subject() be a bit more brutal in order to be able to thread by subject for mailing lists and messges created by software where the "Re:" has been localized (for example, some swedish translations of Lookout Express uses "Sv:" instead of "Re:". Notes: I started programming perl just a few months ago. My fixes seems to work for me, but I might well have introduced memory leaks in my workaround for bug 2 above. I haven't really come to grips with when perls GC works and when it doesn't yet. A, yes. Including info that should be included: Distribution: Mail::Thread 2.41 installed from CPAN Perl Version: 5.8.2 OS: FreeBSD 4.9 RELEASE, Generic i386 Regards /Jonas Eckerman
--- Thread.pm Sun Jan 25 23:08:32 2004 +++ Thread.pm.jonas Sun Jan 25 23:08:21 2004 @@ -66,32 +66,45 @@ my $self = shift; my $root = $self->_container_class->new( 'fakeroot' ); $root->set_children( $self->rootset ); - my %subject; - for (my $walk = $root->child; $walk; $walk = $walk->next) { - my $sub = $walk->topmost->simple_subject or next; - # Add this container to the hash if: - # - There is no container in the hash with this subject, or - # - This one is a dummy container and the old one is not: the dummy - # one is more interesting as a root, so put it in the hash instead. - # - The container in the table has a "Re:" version of this subject, - # and this container has a non-"Re:" version of this subject. - # The non-re version is the more interesting of the two. - - my $old = $subject{$sub}; - if (!$old || - (!$walk->message && !$old->message) || - ($old->message && $old->isreply && - $walk->message && !$walk->isreply)) { - $subject{$sub} = $walk; - } - } + + # Poupulate the hash with *all* subjects! + # /Jonas Eckerman, 2004-01-25 + $root->_populate_hash_with_subjects(\%subject); + + # This code took for granted that all posts in the same thread + # has the same subject. It didn't live in reality! + # /Jonas Eckerman, 2004-01-25 + #for (my $walk = $root->child; $walk; $walk = $walk->next) { + # my $sub = $walk->topmost->simple_subject or next; + # # Add this container to the hash if: + # # - There is no container in the hash with this subject, or + # # - This one is a dummy container and the old one is not: the dummy + # # one is more interesting as a root, so put it in the hash instead. + # # - The container in the table has a "Re:" version of this subject, + # # and this container has a non-"Re:" version of this subject. + # # The non-re version is the more interesting of the two. + # + # my $old = $subject{$sub}; + # if (!$old || + # (!$walk->message && !$old->message) || + # ($old->message && $old->isreply && + # $walk->message && !$walk->isreply)) { + # $subject{$sub} = $walk; + # } + #} + return unless %subject; # %subject is now populated with one entry for each subject which # occurs in the root set. Now iterate over the root set, and # gather together the difference. + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + my @all = (); + $root->_collect_to_list(\@all); + my ($prev, $walk, $rest); for ($walk = $root->child, $rest = eval{ $walk->next }; $walk; @@ -99,12 +112,14 @@ my $subj = $walk->topmost->simple_subject or next; my $old = $subject{$subj}; next if $old == $walk; - # Remove the "second" message from the root set - if (!$prev) { $root->child( $walk->next ) } - else { $prev->next( $walk->next ) } + if (!$prev) { + $root->child( $walk->next ) + } + else { + $prev->next( $walk->next ) + } $walk->next(undef); - if (!$old->message && !$walk->message) { # They're both dummies; merge them. $old->add_child( $_ ) for $walk->children; @@ -126,16 +141,51 @@ $new->add_child( $_ ) for $old->children; $old->add_child( $walk ); $old->add_child( $new ); + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + push @all, $new; } # we've done a merge, so keep the same `prev' next time around. $walk = $prev; } + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + my %left = (); + $root->_collect_to_hash(\%left); + foreach my $cont (@all) { + if ($cont->message && !$left{$cont}) { + $cont->child( undef ); + $root->add_child( $cont ); + } + } + # repopulate the rootset from our fake one @{$self->{rootset}} = $root->children; $root->remove_child($_) for $self->rootset; } +# Collect all containers with messages to a list +sub _collect_to_list { + my $self = shift; + my @list = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_to_list(\@list); + } + return @list; +} + +# Collect all containers with messages to a list +sub _collect_to_hash { + my $self = shift; + my %hash = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_to_hash(\%hash); + } + return %hash; + +} + sub thread { my $self = shift; $self->_setup(); @@ -317,7 +367,6 @@ sub subject { $_[0]->header("subject") } sub header { eval { my $s = $_[0]->message->head->get( $_[1] ) || ''; chomp $s; $s; } } - sub topmost { my $self = shift; @@ -335,11 +384,63 @@ $subject =~ m{^re:\s+}i; } +# Collect containers with messages to an array +sub _collect_to_list { + my $self = shift; + my $list = shift; + while ($self) { + push @$list, $self if ($self->message); + _collect_to_list($self->child,$list); + $self = $self->next; + } +} + +# Collect containers with messages to a hash +sub _collect_to_hash { + my $self = shift; + my $hash = shift; + while ($self) { + ${$hash}{$self} = 1 if ($self->message); + _collect_to_hash($self->child,$hash); + $self = $self->next; + } +} + +# Poupulate the hash with *all* subjects! +# /Jonas Eckerman, 2004-01-25 +sub _populate_hash_with_subjects { + my $self = shift; + return if (!$self); + my $hash = shift; + my $subj; + while ($self) { + $subj = $self->simple_subject; + if ($subj) { + my $old = ${$hash}{$subj}; + if (!$old || + (!$self->message && !$old->message) || + ($old->message && $old->isreply && + $self->message && !$self->isreply)) { + ${$hash}{$subj} = $self; + } + } + _populate_hash_with_subjects($self->child,$hash); + $self = $self->next; + } +} + +# re: changed because I want this to work with +# stupidly localized software as well as mailinglists... +# /Jonas Eckerman, 2004-01-25 sub simple_subject { my $self = shift; my $subject = $self->subject; - $subject =~ s/^re:\s+//gi; - $subject; + $subject =~ s/^\s+//; + 1 while $subject =~ s/^([a-z]{1,3}|\[[-a-z]+\])(\s*\[\d*\])?[:-;]\s*//gi; + $subject =~ s/\s+/ /g; + $subject =~ s/\s+$//; + #$subject =~ s/^re:\s+//gi; + return lc($subject); } sub add_child {
From: jonas [...] truls.org
I have attached a diff between the code I'm using in a production system and version 2.41. The diff seems to work fine against 2.5 as well. This patch fixes the problems Mail::Thread has with subjects, and should fix some memory leaks as well. It's not much different from the previuous patch, but this time it's code that has been working perfectly fine on a server here for the last 6 months. Regards /Jonas
--- Thread.pm.org Sun Jan 25 18:38:31 2004 +++ Thread.pm Mon Jan 26 18:11:58 2004 @@ -66,32 +66,45 @@ my $self = shift; my $root = $self->_container_class->new( 'fakeroot' ); $root->set_children( $self->rootset ); - my %subject; - for (my $walk = $root->child; $walk; $walk = $walk->next) { - my $sub = $walk->topmost->simple_subject or next; - # Add this container to the hash if: - # - There is no container in the hash with this subject, or - # - This one is a dummy container and the old one is not: the dummy - # one is more interesting as a root, so put it in the hash instead. - # - The container in the table has a "Re:" version of this subject, - # and this container has a non-"Re:" version of this subject. - # The non-re version is the more interesting of the two. - - my $old = $subject{$sub}; - if (!$old || - (!$walk->message && !$old->message) || - ($old->message && $old->isreply && - $walk->message && !$walk->isreply)) { - $subject{$sub} = $walk; - } - } + + # Poupulate the hash with *all* subjects! + # /Jonas Eckerman, 2004-01-25 + $root->_populate_hash_with_subjects(\%subject); + + # This code took for granted that all posts in the same thread + # has the same subject. It didn't live in reality! + # /Jonas Eckerman, 2004-01-25 + #for (my $walk = $root->child; $walk; $walk = $walk->next) { + # my $sub = $walk->topmost->simple_subject or next; + # # Add this container to the hash if: + # # - There is no container in the hash with this subject, or + # # - This one is a dummy container and the old one is not: the dummy + # # one is more interesting as a root, so put it in the hash instead. + # # - The container in the table has a "Re:" version of this subject, + # # and this container has a non-"Re:" version of this subject. + # # The non-re version is the more interesting of the two. + # + # my $old = $subject{$sub}; + # if (!$old || + # (!$walk->message && !$old->message) || + # ($old->message && $old->isreply && + # $walk->message && !$walk->isreply)) { + # $subject{$sub} = $walk; + # } + #} + return unless %subject; # %subject is now populated with one entry for each subject which # occurs in the root set. Now iterate over the root set, and # gather together the difference. + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + my @all = (); + $root->_collect_to_list(\@all); + my ($prev, $walk, $rest); for ($walk = $root->child, $rest = eval{ $walk->next }; $walk; @@ -99,12 +112,14 @@ my $subj = $walk->topmost->simple_subject or next; my $old = $subject{$subj}; next if $old == $walk; - # Remove the "second" message from the root set - if (!$prev) { $root->child( $walk->next ) } - else { $prev->next( $walk->next ) } + if (!$prev) { + $root->child( $walk->next ) + } + else { + $prev->next( $walk->next ) + } $walk->next(undef); - if (!$old->message && !$walk->message) { # They're both dummies; merge them. $old->add_child( $_ ) for $walk->children; @@ -126,16 +141,76 @@ $new->add_child( $_ ) for $old->children; $old->add_child( $walk ); $old->add_child( $new ); + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + push @all, $new; } # we've done a merge, so keep the same `prev' next time around. $walk = $prev; } + # Work around _group_set_bysubject tendency to misplace containers... + # /Jonas Eckerman 2004-01-25 + my %left = (); + $root->_collect_to_hash(\%left); + foreach my $cont (@all) { + if ($cont->message && !$left{$cont}) { + $cont->child( undef ); + $cont->next( undef ); + $root->add_child( $cont ); + $left{$cont} = 1; + } + } + # repopulate the rootset from our fake one @{$self->{rootset}} = $root->children; $root->remove_child($_) for $self->rootset; } +# Collect all message-ids to list +sub _collect_msgid_to_list { + my $self = shift; + my $all = shift; + my @list = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_msgid_to_list(\@list,$all); + } + return @list; +} + +# Collect all message-ids to a hash +sub _collect_msgid_to_hash { + my $self = shift; + my $all = shift; + my %hash = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_msgid_to_hash(\%hash,$all); + } + return %hash; +} + +# Collect all containers (with messages unless $all) to a list +sub _collect_to_list { + my $self = shift; + my $all = shift; + my @list = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_to_list(\@list,$all); + } + return @list; +} + +# Collect all containers (with messages unless $all) to a hash +sub _collect_to_hash { + my $self = shift; + my $all = shift; + my %hash = (); + foreach my $cont (@{$self->{rootset}}) { + $cont->_collect_to_hash(\%hash,$all); + } + return %hash; +} + sub thread { my $self = shift; $self->_setup(); @@ -304,6 +379,25 @@ $root->remove_child($_) for @kids; } +# We don't like all the circular references eating +# memory until something bad happens. +# Jonas Eckerman, 2004-01-26 +sub DESTROY { + my $self = shift; + my @conts = $self->_collect_to_list(1); + delete $self->{seen} if ($self->{seen}); + delete $self->{id_table} if ($self->{id_table}); + delete $self->{messages}; + delete $self->{rootset}; + for (my $i=0; $i<@conts; $i++) { + $conts[$i]->child( undef ); + $conts[$i]->parent( undef ); + $conts[$i]->next( undef ); + $conts[$i] = undef; + } + @conts = (); +} + package Mail::Thread::Container; use Carp qw(carp confess croak cluck); @@ -317,7 +411,6 @@ sub subject { $_[0]->header("subject") } sub header { eval { my $s = $_[0]->message->head->get( $_[1] ) || ''; chomp $s; $s; } } - sub topmost { my $self = shift; @@ -332,14 +425,92 @@ sub isreply { my $self = shift; my $subject = $self->subject or return; - $subject =~ m{^re:\s+}i; + $subject =~ m{^\w\w\w?:\s+}i; } +# Collect messaage-ids to array +sub _collect_msgid_to_list { + my $self = shift; + my $list = shift; + my $all = shift; + while ($self) { + push @$list, $self->messageid if ($self->messageid && ($all || $self->message)); + _collect_msgid_to_list($self->child,$list); + $self = $self->next; + } +} + +# Collect message-ids to hash +sub _collect_msgid_to_hash { + my $self = shift; + my $hash = shift; + my $all = shift; + while ($self) { + ${$hash}{$self->messageid} = 1 if ($self->messageid && ($all || $self->message)); + _collect_msgid_to_hash($self->child,$hash); + $self = $self->next; + } +} + +# Collect containers (with messages unless $all) to an array +sub _collect_to_list { + my $self = shift; + my $list = shift; + my $all = shift; + while ($self) { + push @$list, $self if ($all || $self->message); + _collect_to_list($self->child,$list); + $self = $self->next; + } +} + +# Collect containers (with messages unless $all) to a hash +sub _collect_to_hash { + my $self = shift; + my $hash = shift; + my $all = shift; + while ($self) { + ${$hash}{$self} = 1 if ($all || $self->message); + _collect_to_hash($self->child,$hash); + $self = $self->next; + } +} + +# Poupulate the hash with *all* subjects! +# /Jonas Eckerman, 2004-01-25 +sub _populate_hash_with_subjects { + my $self = shift; + return if (!$self); + my $hash = shift; + my $subj; + while ($self) { + $subj = $self->simple_subject; + if ($subj) { + my $old = ${$hash}{$subj}; + if (!$old || + (!$self->message && !$old->message) || + ($old->message && $old->isreply && + $self->message && !$self->isreply)) { + ${$hash}{$subj} = $self; + } + } + _populate_hash_with_subjects($self->child,$hash); + $self = $self->next; + } +} + +# re: changed because I want this to work with +# stupidly localized software as well as mailinglists... +# /Jonas Eckerman, 2004-01-25 sub simple_subject { my $self = shift; my $subject = $self->subject; - $subject =~ s/^re:\s+//gi; - $subject; + $subject =~ s/^\s+//; + 1 while $subject =~ s/^([a-z]{2,3}|\[[-a-z]+\])(\s*\[\d*\])?[:-;]\s*//gi; + $subject =~ s/\s+/ /g; + $subject =~ s/\s+$//; + #$subject =~ s/^re:\s+//gi; + return lc($subject); } sub add_child { @@ -481,6 +652,17 @@ } return unless $after; while (@visited) { $after->(@{ pop @visited }) } +} + +# We don't like all the circular references eating +# memory until something bad happens. +# Jonas Eckerman, 2004-01-26 +sub DESTROY { + my $self = shift; + $self->child( undef ); + $self->parent( undef ); + $self->next( undef ); + $self->message ( undef ); } 1;