This queue is for tickets about the Text-Context CPAN distribution.

Report information
The Basics
Id: 11989
Status: open
Priority: 0/
Queue: Text-Context

People
Owner: Nobody in particular
Requestors: smithm [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Thu, 24 Mar 2005 12:43:15 +0000
From: Michael Smith <smithm [...] gmail.com>
To: bug-Text-Context [...] rt.cpan.org
Subject: Performance probs with Text::Context
Hello there,

Many thanks for the work on Text::Context - it's really useful. However, I occasionally have performance issues with it that very much seem to depend on the string I'm producing a snippet from.

I attach an example with a string of only around 5K that takes 30 sec on my (admittedly not very fast) server. Generally it produces the snippet in a split second, so I guess there must be something in this particular string.

The profiler gives the following output:

Total Elapsed Time = 31.13159 Seconds
  User+System Time = 30.98159 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 77.3   23.95 23.955    467   0.0513 0.0513  Text::Context::EitherSide::as_sparse_list
 19.1   5.931 29.886    467   0.0127 0.0640  Text::Context::EitherSide::as_list
 1.89   0.585 30.823      1   0.5853 30.823  Text::Context::Para::slim
 0.90   0.279 30.157    467   0.0006 0.0646  Text::Context::EitherSide::as_string
 0.23   0.070  0.065    467   0.0001 0.0001  Text::Context::EitherSide::new
 0.23   0.070  0.148      6   0.0116 0.0247  Text::Context::Para::BEGIN
 0.13   0.040  0.050      3   0.0133 0.0166  Text::Context::BEGIN
 0.10   0.030  0.030      2   0.0150 0.0150  DynaLoader::BEGIN
 0.10   0.030 30.238    467   0.0001 0.0647  Text::Context::EitherSide::get_context
 0.06   0.020  0.019      6   0.0033 0.0032  Text::Context::EitherSide::BEGIN
 0.03   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.03   0.010  0.010      5   0.0020 0.0020  vars::import
 0.03   0.010  0.010      5   0.0020 0.0020  Exporter::import
 0.03   0.010  0.010      2   0.0050 0.0050  Text::Context::_set_intersection
 0.03   0.010  0.060      1   0.0100 0.0598  main::BEGIN

Any ideas as to what might be the issue with this particular string? I'm continuing to look at it but any suggestions would be appreciated.

Many thanks
Michael Smith

Message body is not shown because sender requested not to inline it.

On 2005-03-24 07:50:49, smithm@gmail.com wrote:
> However I occasionally have some performance issues with it, that very
> much seem to depend on the string that I'm producing a snippet from.
It's been quite a while since this was reported, but I have a similar problem. Are you still watching this bug, Michael? And did you get any further with the diagnosis?

I've downloaded the test program and can confirm that I get slow performance from it too. I modified the test program to use the data that is causing me a problem. Sadly, it ran quickly! My problem is with as_html rather than as_text, so I added a call to that and it still ran quickly. It still runs slowly in my real application, but I don't yet know what the difference is.

Unfortunately, I can't upload my data, since it contains personal information about real people. I can say that my search string is a five-letter word that occurs just once in the data, and that my data is just a bit longer than Michael's test data.

I'll carry on investigating, but I thought I'd see if anybody else had any ideas.
I wrote:
> I'll carry on investigating, but I thought I'd see if anybody else had
> any ideas.
I think I've found the cause, which is lack of paragraphs. Text::Context uses the pattern /\n\n/ to split the text into paragraphs. Michael's test file doesn't contain paragraphs; the text just runs on. My test file does contain paragraphs but somehow they become separated by the DOS-like "\r\n\r\n" sequence. (It's puzzling, because the original file doesn't contain that sequence and I'm on Linux, but I expect I will figure that out eventually).

Text::Context doesn't cope very gracefully. In my case, there are 1339 words in the file. T::C::paras is called once and calls T::C::Para::slim once, but it calls T::C::EitherSide::get_context 664 times (1339/2) and down the line the inner loop (for my $subword) of T::C::EitherSide::as_sparse_list is run 889096 times.

So the moral of the story is to make sure the text contains 'reasonable' paragraphs.

I suggest a documentation patch adding something like

=head2 CAVEAT

Text::Context splits the text into paragraphs. It can be very slow if a long text does not contain paragraphs.
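
The diagnosis above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not use or reproduce Text::Context's code: `split_paragraphs` and `normalize_newlines` are hypothetical helpers, and the split on "\n\n" only mimics the /\n\n/ pattern described in this ticket. It shows that text whose paragraphs are separated by "\r\n\r\n" is treated as one long paragraph, and that normalizing line endings first restores the paragraph boundaries. It also checks that the reported inner-loop count equals 664 x 1339, which is consistent with work that grows roughly with the square of the paragraph length.

```python
import re


def split_paragraphs(text):
    # Hypothetical stand-in for the /\n\n/ split described in the ticket;
    # this is NOT Text::Context's actual implementation.
    return text.split("\n\n")


def normalize_newlines(text):
    # Convert DOS (CRLF) and lone CR line endings to plain LF.
    return re.sub(r"\r\n?", "\n", text)


# Sample text with DOS-style paragraph separators (made up for this sketch).
dos_text = "First paragraph.\r\n\r\nSecond paragraph.\r\n\r\nThird paragraph."

# "\r\n\r\n" never contains two adjacent "\n", so nothing is split.
paras_raw = split_paragraphs(dos_text)

# After normalization the separators become "\n\n" and the split works.
paras_fixed = split_paragraphs(normalize_newlines(dos_text))

print(len(paras_raw), len(paras_fixed))

# Figures reported in the ticket: 664 get_context calls over 1339 words.
inner_iterations = 664 * 1339
print(inner_iterations)
```

In a Perl application, the equivalent normalization step would be a substitution such as `$text =~ s/\r\n?/\n/g;` applied before the text is handed to Text::Context. Whether that fully resolves the slowdown in a given application is an assumption that would need to be confirmed by profiling.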
Subject: Re: [rt.cpan.org #11989] Performance probs with Text::Context
Date: Thu, 27 Oct 2011 17:13:36 +0100
To: bug-Text-Context [...] rt.cpan.org
From: Michael Smith <smithm [...] gmail.com>
Hi Dave,

Thanks for the emails. To be honest this was a long time ago, in fact it was a couple of jobs ago .. so whilst I am still using perl I'm sadly not using this module anymore.

But thanks for looking in to it! :)

Kind regards
Michael

On Thu, Oct 27, 2011 at 3:09 PM, Dave Howorth via RT <bug-Text-Context@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=11989 >
>
> I wrote:
> > I'll carry on investigating, but I thought I'd see if anybody else had
> > any ideas.
>
> I think I've found the cause, which is lack of paragraphs. Text::Context
> uses the pattern /\n\n/ to split the text into paragraphs. Michael's
> test file doesn't contain paragraphs; the text just runs on. My test
> file does contain paragraphs but somehow they become separated by the
> DOS-like "\r\n\r\n" sequence. (It's puzzling, because the original file
> doesn't contain that sequence and I'm on Linux, but I expect I will
> figure that out eventually).
>
> Text::Context doesn't cope very gracefully. In my case, there are 1339
> words in the file. T::C::paras is called once and calls T::C::Para::slim
> once, but it calls T::C::EitherSide::get_context 664 times (1339/2) and
> down the line the inner loop (for my $subword) of
> T::C::EitherSide::as_sparse_list is run 889096 times.
>
> So the moral of the story is to make sure the text contains 'reasonable'
> paragraphs.
>
> I suggest a documentation patch adding something like
>
> =head2 CAVEAT
>
> Text::Context splits the text into paragraphs. It can be very slow if a
> long text does not contain paragraphs.
>