
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 61507
Status: open
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: chip [...] pobox.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.70
Fixed in: (no value)



Subject: Memory leak after enabling full recovery and suppressing everything
In production code at Topsy (my day job), I changed many usages of XML::LibXML that looked roughly like:

    my $xml = XML::LibXML->new();
    $xml->recover(1);
    my $dom = $xml->parse_html_string($self->content);

to:

    my $xml = XML::LibXML->new();
    my $dom = $xml->parse_html_string($self->content, { recover => 2, suppress_errors => 1, suppress_warnings => 1 });

After this change, the programs in question started leaking memory like a sieve. After we reverted the change, the leak stopped. This symptom occurred with both versions 1.69 and 1.70.

What more detail or diagnostics would you like?
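A reproduction of the report above might look like the following sketch. It is not taken from the ticket: the malformed HTML sample, the iteration count, and the `vm_size_kb` helper are assumptions made for illustration, and the memory reading only works where `/proc` is available (for example Linux). Whether it actually shows growth depends on the XML::LibXML and libxml2 versions installed.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: current virtual memory size in kB, read from /proc.
# Returns undef on systems without /proc/<pid>/status.
sub vm_size_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
    }
    return;
}

# Deliberately malformed HTML so that the parser has errors to recover from.
my $html = '<html><body><p>unclosed <b>markup<bogus><p>more</body>';

# The option set from the report.
my %opts = (recover => 2, suppress_errors => 1, suppress_warnings => 1);

my $before = vm_size_kb();
for (1 .. 10_000) {
    my $parser = XML::LibXML->new();
    # eval guards against a parse that dies; the document is dropped each time.
    my $dom = eval { $parser->parse_html_string($html, \%opts) };
}
my $after = vm_size_kb();

if (defined $before && defined $after) {
    printf "VmSize grew by %d kB over 10000 parses\n", $after - $before;
}
else {
    print "No /proc memory reading available on this platform\n";
}
```

Running the same loop with `$parser->recover(1)` and no option hash gives a baseline to compare against; steady growth in only one of the two variants would point at the option handling.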
Subject: Re: [rt.cpan.org #61507] Memory leak after enabling full recovery and suppressing everything
Date: Tue, 21 Sep 2010 07:51:03 +0200
To: "bug-XML-LibXML [...] rt.cpan.org" <bug-XML-LibXML [...] rt.cpan.org>
From: Christian Glahn <christian.glahn [...] lo-f.at>
Hi Chip,

Thanks for reporting.

A minimal test case would speed up the process and ensure that related bugs are not reintroduced in a later version.

There is a special test suite for memory problems in the distribution. Please use this as a reference for providing details and diagnostics through a test case.

Cheers
Christian

On 21 Sep 2010, at 04:15, "Chip Salzenberg via RT" <bug-XML-LibXML@rt.cpan.org> wrote:
> Mon Sep 20 22:15:19 2010: Request 61507 was acted upon.
> Transaction: Ticket created by CHIPS
> Queue: XML-LibXML
> Subject: Memory leak after enabling full recovery and suppressing everything
> Broken in: 1.70
> Severity: (no value)
> Owner: Nobody
> Requestors: chip@pobox.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=61507 >
Subject: Re: [rt.cpan.org #61507] Memory leak after enabling full recovery and suppressing everything
Date: Thu, 23 Sep 2010 18:13:47 -0700
To: bug-XML-LibXML [...] rt.cpan.org
From: Chip Salzenberg <chip [...] pobox.com>
I've found the leak. Do you use git or svn? That would simplify my patch generation process.
Hi,

you can use

    svn co svn://axkit.org/XML-LibXML/trunk

(see http://cpansearch.perl.org/src/PAJAS/XML-LibXML-1.70/README), but it is readonly. Please attach a patch to this thread.

- Petr
Subject: Re: [rt.cpan.org #61507] Memory leak after enabling full recovery and suppressing everything
Date: Sun, 26 Sep 2010 23:39:19 -0700
To: bug-XML-LibXML [...] rt.cpan.org
From: Chip Salzenberg <chip [...] pobox.com>
OK, here's the patch. The basic bug is that REPORT_ERROR and company can end up dying, which leaves various reference-counted RETVALs uncollected. This patch fixes the RETVALs that are SVs, but there are some other libxml data structure return values that this patch does not fix.

PS: I wasn't consistent in my tab usage. Mea culpa

On 9/26/2010 12:31 PM, Petr Pajas via RT wrote:
> Hi,
>
> you can use
>
>     svn co svn://axkit.org/XML-LibXML/trunk
>
> (see http://cpansearch.perl.org/src/PAJAS/XML-LibXML-1.70/README), but it is readonly.
> Please attach a patch to this thread.
>
> - Petr

Message body is not shown because sender requested not to inline it.

Hi Chip!

Thanks for your patch, but can we also have a test added to it, in t/11memory.t? See the Mercurial repository at:

https://bitbucket.org/shlomif/perl-xml-libxml/overview

Also, please don't increment the version in all the .pm files - make the patch single-purposed. See the HACKING.txt file there for some information regarding coding style/etc.

Looking forward to your next patch.

Regards,

-- Shlomi Fish
Subject: Re: [rt.cpan.org #61507] Memory leak after enabling full recovery and suppressing everything
Date: Tue, 05 Jul 2011 14:28:40 -0700
To: bug-XML-LibXML [...] rt.cpan.org
From: Chip Salzenberg <chip [...] pobox.com>
On 7/5/2011 6:38 AM, Shlomi Fish via RT wrote:
> Hi Chip!
>
> Thanks for your patch, but can we also have a test added to it [...]
No, I'm sorry, I won't be contributing further. I'm stuck using XML::LibXML 1.69 because all subsequent versions have been unusable, and I expect eventually I'll be forced to stop using XML::LibXML entirely due to bit rot. If you'd be willing to revert back to 1.69 and start applying patches more carefully, I might consider it.
Hi Chip,

I apologise for the late response.

On Tue Jul 05 17:29:02 2011, CHIPS wrote:
> No, I'm sorry, I won't be contributing further. I'm stuck using XML::LibXML 1.69 because all subsequent versions have been unusable, and I expect eventually I'll be forced to stop using XML::LibXML entirely due to bit rot.

Can you provide tests in which versions higher than 1.69 fail, and are being unusable in this respect? 1.70 was released while Petr Pajas was still the maintainer, and that's roughly what we started with.

> If you'd be willing to revert back to 1.69 and start applying patches more carefully, I might consider it.

Well, a lot of work was done since 1.69 in bug fixing, automated tests, cleanups, documentation, and to a lesser extent - feature additions. We cannot afford to throw it all away now. If you can take the BitBucket version as a starting point and proceed from there, that would be the best solution.

Regards,

-- Shlomi Fish
Subject: Re: [rt.cpan.org #61507] Memory leak after enabling full recovery and suppressing everything
Date: Fri, 08 Jul 2011 08:24:48 -0700
To: bug-XML-LibXML [...] rt.cpan.org
From: Chip Salzenberg <chip [...] pobox.com>
On 7/8/2011 4:45 AM, Shlomi Fish via RT wrote:
> Can you provide tests in which versions higher than 1.69 fail, and are being unusable in this respect? 1.70 was released while Petr Pajas was still the maintainer, and that's roughly what we started with.

If you scrape a lot of pages with 1.70, the failures are mostly memory leaks, a metric ton of 'em - enough that we can't use it. I think we had some crashes too, but I'm vague on that.

> Well, a lot of work was done since 1.69 in bug fixing, automated tests, cleanups, documentation, and to a lesser extent - feature additions. We cannot afford to throw it all away now. If you can take the BitBucket version as a starting point and proceed from there, that would be the best solution.

Could you replay your improvements from a base of 1.69 rather than from 1.70+? Probably not... Ah well. Maybe someday I'll have the spare time to try again.
That would be pretty hard for us because there were many such improvements. So we'd appreciate a test against an up-to-date XML-LibXML.

Regards,

-- Shlomi Fish
I'm experiencing a problem with the XML::LibXML test suite that is clearly related to this bug report, so rather than create a new one, I'm updating this one. If you think this requires a new bug report, lemme know...

I've just built XML::LibXML 1.88 for 5.10.1, 5.12.4, and 5.14.2, on both 32 and 64 bit AIX 6, Solaris 10, and RHEL 5, against libxml2-2.7.8.

The last 2 tests in t/12html.t, which make the API calls that Chip reported as having memory leaks, cause the script to fail due to out of memory errors on the 64 bit Solaris 10 and AIX 6 platforms. They do NOT fail for any of the 32 bit builds, and they do not fail for 64 bit RHEL 5. The errors are identical for all perl versions.

This leads me to conclude that the problem is in libxml2 itself, and NOT in this code. So, yeah -- this bug report needs to be made to the libxml2 folks as well.

For now, I've patched t/12html.t locally and forced it to skip those steps, and I'll warn my users to watch out for using that particular API style. I don't know if you want to patch the distribution as a workaround or not, so I haven't provided a patch here.

I hope this additional data brings some clarity to the problem.
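A local workaround of the kind described above could be sketched with a Test::More SKIP block. This is not the actual patch, which was not attached to the ticket: the platform check and the two placeholder tests are assumptions for illustration, standing in for the last two tests of t/12html.t.

```perl
use strict;
use warnings;
use Config;
use Test::More tests => 2;

# Assumption: the affected builds are 64-bit perls on Solaris or AIX,
# as reported above. $Config{ptrsize} is 8 on a 64-bit perl.
my $is_64bit = ($Config{ptrsize} || 0) == 8;
my $affected = $is_64bit && $^O =~ /^(?:solaris|aix)$/ ? 1 : 0;

SKIP: {
    # Skip both tests on the platforms where they run out of memory.
    skip 'parsing with recover/suppress_* options runs out of memory here', 2
        if $affected;

    # Placeholders: the real t/12html.t tests would go here.
    pass('placeholder for the first affected test');
    pass('placeholder for the second affected test');
}
```

Skipping by platform keeps the planned test count stable, so the rest of the file still reports normally on the builds that are not affected.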
Hi Philip,

On Tue Nov 08 17:35:29 2011, WPMOORE wrote:
> I'm experiencing a problem with the XML::LibXML test suite that is clearly related to this bug report, so rather than create a new one, I'm updating this one. If you think this requires a new bug report, lemme know...
>
> I've just build XML::LibXML 1.88 for 5.10.1, 5.12.4, and 5.14.2, on both 32 and 64 bit AIX 6, Solaris 10, and RHEL 5, against libxml2-2.7.8.
>
> The last 2 tests in t/12html.t, which make the API calls that Chip reported as having memory leaks, cause the script to fail due to out of memory errors on the 64 bit Solaris 10 and AIX 6 platforms. They do NOT fail for any of the 32 bit builds, and they do not fail for 64 bit RHEL 5. The errors are identical for all perl versions.

This sounds like this problem:

https://rt.cpan.org/Ticket/Display.html?id=77340

This was fixed in commit 35fba7a70067f4f9393e26d9d478e761fe65b47d in the Mercurial repository, and will be released as part of XML-LibXML-1.99 later on.

Can you see if on the 64-bit AIX/Solaris platforms, this patch solves the problem you have encountered?

Regards,

-- Shlomi Fish