Bug #116085 for Text-CSV_XS: memory leak when using offset parameter to getline_all()

Tue Jul 12 13:24:30 2016 jason.mccarty [...] grantstreet.com - Ticket created

Subject:	memory leak when using offset parameter to getline_all()
Date:	Tue, 12 Jul 2016 13:24:14 -0400
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	Jason McCarty <jason.mccarty [...] grantstreet.com>

Hi,

Consider the following script:

    use Text::CSV_XS;
    open my $fh, "<", "file.csv";
    my $csv = Text::CSV_XS->new;
    $csv->getline_all($fh, 1000000, 0);

where file.csv is a million-line plus CSV. When I vary the offset parameter, maximum resident memory appears to change proportionally. For example, I get max RSS=24700KB when the offset is 100000, and max RSS=177528KB when the offset is 1000000.

I believe this is a bug, because the following script accomplishes the same thing, and memory usage doesn't change when changing the offset.

    use Text::CSV_XS;
    open my $fh, "<", "file.csv";
    my $csv = Text::CSV_XS->new;
    for (my $i = 0; $i < 100000; $i++) {
        $csv->getline_all($fh, 0, 0);
    }

Unfortunately, this is also about 14 percent slower, so I'd prefer to use the first version.

Thanks,
--
Jason McCarty
Grant Street Group
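The ticket does not say how the max RSS figures were obtained. One common way to collect such numbers on Linux is sketched below, assuming GNU time is installed as /usr/bin/time. The helper name max_rss_kb and the script name getline_all.pl (a copy of the first script that reads the offset from its command line) are hypothetical and not part of the report.

```shell
#!/bin/sh
# Sketch only: assumes GNU time at /usr/bin/time (-f and -o are GNU options).
# max_rss_kb is a hypothetical helper, not part of Text::CSV_XS or the ticket.

# Run a command and print its peak resident set size in kilobytes.
max_rss_kb () {
    out=$(mktemp) || return 1
    # %M is GNU time's "maximum resident set size" in KB; -o keeps the
    # measurement separate from whatever the command itself prints.
    /usr/bin/time -f '%M' -o "$out" "$@" >/dev/null 2>&1
    # If the command fails, GNU time writes a status message first,
    # so only the last line holds the number.
    tail -n 1 "$out"
    rm -f "$out"
}

# Example (getline_all.pl is a hypothetical script that takes the offset
# as its first argument):
#   for offset in 100000 1000000; do
#       printf '%s\t' "$offset"; max_rss_kb perl getline_all.pl "$offset"
#   done
```

Running the same script at several offsets and comparing the printed values would show whether peak memory grows with the offset, as described above.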

Wed Jul 13 02:21:37 2016 HMBRAND [...] cpan.org - Correspondence added

I can use some help here. If I have a smaller file, like the 10000 lines I use for my perl6 timings, with this script:

--8<---
use 5.18.2;
use warnings;

use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
open my $fh, "<", "/tmp/hello.csv";

my $r = $csv->getline_all ($fh, 9000, 0);
-->8---

hello.csv is taken from the CSV game https://bitbucket.org/ewanhiggs/csv-game, only with fewer lines

sh$ for i in $(seq 1 10000); do echo 'hello,","," ",world,"!"'; done > /tmp/hello.csv
sh$ time perl csv.pl < /tmp/hello.csv

$ export PERL_DESTRUCT_LEVEL=2 PERL_DL_NONLAZY=1

$ valgrind \
    --suppressions=sandbox/perl.supp \
    --leak-check=yes \
    --leak-resolution=high \
    --show-reachable=yes \
    --num-callers=50 \
    --log-fd=3 \
    /pro/bin/perl \
    -MPerl::Destruct::Level=level,2 \
    sandbox/rt116085.pl \
    3>valgrind.log

That ends with

==9925== LEAK SUMMARY:
==9925==    definitely lost: 0 bytes in 0 blocks
==9925==    indirectly lost: 0 bytes in 0 blocks
==9925==      possibly lost: 0 bytes in 0 blocks
==9925==    still reachable: 4,592 bytes in 13 blocks
==9925==         suppressed: 0 bytes in 0 blocks
==9925==
==9925== For counts of detected and suppressed errors, rerun with: -v
==9925== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind's manual (http://valgrind.org/docs/manual/faq.html#faq.deflost) tells me not to worry:

"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.

A simple

$ valgrind perl sandbox/rt116085.pl
==17010== Memcheck, a memory error detector
==17010== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17010== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17010== Command: perl sandbox/rt116085.pl
==17010==
==17010==
==17010== HEAP SUMMARY:
==17010==     in use at exit: 2,421,076 bytes in 8,501 blocks
==17010==   total heap usage: 28,427 allocs, 19,926 frees, 4,887,993 bytes allocated
==17010==
==17010== LEAK SUMMARY:
==17010==    definitely lost: 26,820 bytes in 19 blocks
==17010==    indirectly lost: 63,954 bytes in 27 blocks
==17010==      possibly lost: 2,327,235 bytes in 8,446 blocks
==17010==    still reachable: 3,067 bytes in 9 blocks
==17010==                       of which reachable via heuristic:
==17010==                         newarray           : 11,288 bytes in 350 blocks
==17010==         suppressed: 0 bytes in 0 blocks
==17010== Rerun with --leak-check=full to see details of leaked memory
==17010==
==17010== For counts of detected and suppressed errors, rerun with: -v
==17010== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Which might indicate what you are experiencing, but I cannot put my finger on the root cause

$ make leakcheck
PERL_DESTRUCT_LEVEL=2 PERL_DL_NONLAZY=1 valgrind --suppressions=sandbox/perl.supp --leak-check=yes --leak-resolution=high --show-reachable=yes --num-callers=50 --log-fd=3 "/pro/bin/perl" "-MPerl::Destruct::Level=level,2" "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t 3>valgrind.log
t/00_pod.t ........ ok
t/01_pod.t ........ ok
t/10_base.t ....... ok
t/12_acc.t ........ ok
t/15_flags.t ...... ok
t/20_file.t ....... ok
t/21_lexicalio.t .. ok
t/22_scalario.t ... ok
t/30_types.t ...... ok
t/40_misc.t ....... ok
t/41_null.t ....... ok
t/45_eol.t ........ ok
t/46_eol_si.t ..... ok
t/50_utf8.t ....... ok
t/51_utf8.t ....... ok
t/55_combi.t ...... ok
t/60_samples.t .... ok
t/65_allow.t ...... ok
t/70_rt.t ......... ok
t/75_hashref.t .... ok
t/76_magic.t ...... ok
t/77_getall.t ..... ok
t/78_fragment.t ... ok
t/79_callbacks.t .. ok
t/80_diag.t ....... ok
t/81_subclass.t ... ok
t/85_util.t ....... ok
t/90_csv.t ........ ok
t/91_csv_cb.t ..... ok
All tests successful.
Files=29, Tests=50529, 82 wallclock secs (71.33 usr 0.38 sys + 42.43 cusr 0.55 csys = 114.69 CPU)
Result: PASS

==17030==    by 0x506DE5: Perl_pp_require (in /pro/bin/perl)
==17030==    by 0x4BE535: Perl_runops_standard (in /pro/bin/perl)
==17030==    by 0x446A68: perl_run (in /pro/bin/perl)
==17030==    by 0x420A58: main (in /pro/bin/perl)
==17030==
==17030== LEAK SUMMARY:
==17030==    definitely lost: 603 bytes in 9 blocks
==17030==    indirectly lost: 0 bytes in 0 blocks
==17030==      possibly lost: 0 bytes in 0 blocks
==17030==    still reachable: 12,168 bytes in 34 blocks
==17030==         suppressed: 0 bytes in 0 blocks
==17030==
==17030== For counts of detected and suppressed errors, rerun with: -v
==17030== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Shows me that all leaks are in perl itself (or I misinterpret that data, which is quite possible).

This is perl 5, version 24, subversion 0 (v5.24.0) built for x86_64-linux-thread-multi-ld
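To compare several runs like these without reading each summary by eye, the byte counts can be pulled out of a valgrind log with standard tools. This is a minimal sketch, assuming the log keeps valgrind's usual layout of one "==PID==" record per line. The helper name leak_bytes is hypothetical.

```shell
#!/bin/sh
# Sketch only: leak_bytes is a hypothetical helper, not part of valgrind.

# Print the byte count of one LEAK SUMMARY category for every process
# recorded in a valgrind log, one number per line, without thousands
# separators.
#   $1: category as printed by valgrind, e.g. "definitely lost"
#   $2: path to the log file
leak_bytes () {
    category=$1
    log=$2
    # Match lines such as "==17010==    definitely lost: 26,820 bytes in 19 blocks"
    # and keep only the number, then strip the commas.
    sed -n "s/^==[0-9]*== *${category}: \([0-9,]*\) bytes.*/\1/p" "$log" | tr -d ','
}

# Example:
#   leak_bytes "possibly lost" valgrind.log
```

Plain numbers are easier to sort or compare across perl builds than the formatted summaries.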

Wed Jul 13 02:21:37 2016 The RT System itself - Status changed from 'new' to 'open'

Thu Jul 14 08:59:09 2016 jason.mccarty [...] grantstreet.com - Correspondence added

Subject:	Re: [rt.cpan.org #116085] memory leak when using offset parameter to getline_all()
Date:	Thu, 14 Jul 2016 08:58:51 -0400
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	Jason McCarty <jason.mccarty [...] grantstreet.com>

I spent a few hours tracing through the code last night, but I couldn't identify the source either. The two lines valgrind identifies as the largest sources of possible leaks are the following:

    "NewField;" in cx_Parse, line 1148
    "hv_store (hv, "_RECNO", 6, newSViv (++csv.recno), 0);" in cx_c_xsParse, line 1727

The first allocation I can actually eliminate using bind_columns without much effect on RSS, and the second one seems like an unlikely candidate.

On Wed, Jul 13, 2016 at 2:21 AM, H.Merijn Brand via RT <bug-Text-CSV_XS@rt.cpan.org> wrote:


--
Jason McCarty
Grant Street Group

Sat Nov 12 08:04:18 2016 HMBRAND [...] cpan.org - Correspondence added

I am sorry and happy at the same time, but after a long session with expert help we have come to the conclusion that this leak is a leak inside threaded perl CORE that has been fixed in the development tree

-------- ------------------------------ ------------------------------------------------------------------------------------------
5.24.0   /pro/bin/perl                  This is perl 5, version 24, subversion 0 (v5.24.0) built for x86_64-linux-thread-multi-ld
-------- ------------------------------ ------------------------------------------------------------------------------------------
   definitely lost: 94,662 bytes in 43 blocks
   indirectly lost: 69,850 bytes in 33 blocks
     possibly lost: 18,218,580 bytes in 62,166 blocks
   still reachable: 13,601 bytes in 37 blocks
        suppressed: 0 bytes in 0 blocks
ERROR SUMMARY: 2394 errors from 2394 contexts (suppressed: 0 from 0)
-------- ------------------------------ ------------------------------------------------------------------------------------------
5.25.7   /pro/bin/perl5.25.7            This is perl 5, version 25, subversion 7 (v5.25.7 (v5.25.6-220-g67bdb7a)) built for x86_64-linux-thread-multi-ld
-------- ------------------------------ ------------------------------------------------------------------------------------------
   definitely lost: 0 bytes in 0 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 1,521 bytes in 5 blocks
        suppressed: 0 bytes in 0 blocks
ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

This is no bug in Text::CSV_XS, so I have no other option than to reject this ticket. Still, I thank you for pointing us to this erroneous situation. Greatly appreciated.
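Since the conclusion hinges on the perl version and on whether the build is threaded, it can help to check both for the perl that actually runs the script. This is a small sketch using perl's core Config module. The helper name perl_build_info is hypothetical. The ticket only establishes that the threaded 5.24.0 build leaked and that 5.25.7 did not, so the output identifies the build rather than deciding whether it is affected.

```shell
#!/bin/sh
# Sketch only: perl_build_info is a hypothetical helper name.

# Print the version of the perl found in PATH and whether it was built
# with ithreads (the "thread-multi" part of the architecture name above).
perl_build_info () {
    # $^V is the running perl's version; %vd prints it as e.g. 5.24.0.
    # $Config{useithreads} is set only for threaded builds.
    perl -MConfig -e 'printf "version=%vd threaded=%s\n", $^V, ($Config{useithreads} ? "yes" : "no")'
}

# Example:
#   perl_build_info
```

Replacing `perl` in the function with a full path such as /pro/bin/perl5.25.7 would check a specific build instead of the one in PATH.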

Sat Nov 12 08:04:18 2016 HMBRAND [...] cpan.org - Status changed from 'open' to 'rejected'

Mon Nov 14 09:23:03 2016 jason.mccarty [...] grantstreet.com - Correspondence added

Subject:	Re: [rt.cpan.org #116085] memory leak when using offset parameter to getline_all()
Date:	Mon, 14 Nov 2016 09:22:49 -0500
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	Jason McCarty <jason.mccarty [...] grantstreet.com>

I'm happy to hear that it's fixed in newer perl. Thank you for investigating.
