Bug #105144 for KinoSearch1: \C is deprecated in regex

Wed Jun 10 01:37:45 2015 SREZIC [...] cpan.org - Ticket created

Subject:

\C is deprecated in regex

See subject. perldelta.pod in perl 5.22.0 says:

| New Warnings
| * \C is deprecated in regex
|
| (D deprecated) The "/\C/" character class was deprecated in v5.20,
| and now emits a warning. It is intended that it will become an error
| in v5.24. This character class matches a single byte even if it
| appears within a multi-byte character, breaks encapsulation, and can
| corrupt UTF-8 strings.

It seems that this regexp construct is used in KinoSearch1. See http://www.cpantesters.org/cpan/report/cdf8b2d8-0af3-11e5-b53d-b00fe0bfc7aa for a sample report containing this warning.
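For illustration only, here is a minimal sketch of the byte-versus-character distinction that the perldelta text describes. The sample string is hypothetical and not taken from KinoSearch1. The sketch avoids \C and cuts an encoded copy of the string at a byte offset that falls inside a multi-byte character.

```perl
use strict;
use warnings;
use utf8;    # the string literal below contains non-ASCII characters

# Hypothetical sample: 5 characters, two of which take 2 bytes in UTF-8.
my $text = "Písař";

my $chars = length $text;       # length in characters

my $octets = $text;
utf8::encode($octets);          # in place: characters -> UTF-8 octets
my $bytes = length $octets;     # length in bytes

# Cutting after 2 bytes lands in the middle of the 2-byte "í".
my $cut = substr $octets, 0, 2;

# utf8::decode returns false when its argument is not well-formed UTF-8.
my $well_formed = utf8::decode($cut);

printf "chars=%d bytes=%d cut is %s\n",
    $chars, $bytes, $well_formed ? "valid UTF-8" : "broken UTF-8";
```

A regex atom that consumes exactly one byte can stop at the same kind of offset, which is presumably what the "can corrupt UTF-8 strings" remark refers to.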

Mon Aug 31 01:15:09 2015 SREZIC [...] cpan.org - Correspondence added


The deprecation warning turned into an error with perl 5.23.x:

\C no longer supported in regex; marked by <-- HERE in m/ \A ( \ <-- HERE C{0,66}? \.\s+ ) / at /tmpfs/.cpan-build/2015083100/KinoSearch1-1.01-bbRNrq/blib/lib/KinoSearch1/Highlight/Highlighter.pm line 87.
# Looks like you planned 9 tests but ran 3.
# Looks like your test exited with 9 just after 3.
t/303-highlighter.t ........... Dubious, test returned 9 (wstat 2304, 0x900)
Failed 6/9 subtests

Thu May 19 11:10:32 2016 ppisar [...] redhat.com - Correspondence added

From:

ppisar [...] redhat.com


Attached patch fixes it.

Subject:

KinoSearch1-1.01-Do-not-use-C-in-regexps.patch

From 90b55f6267fa139df653147a106c8a58925fd451 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= <ppisar@redhat.com>
Date: Thu, 19 May 2016 17:02:21 +0200
Subject: [PATCH] Do not use \C in regexps
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Perl 5.24.0 removed support for \C (bytes positions). This patch
rewrites the tests for the ungreedy sequence of bytes with a maximum
size.

CPAN RT#105144

Signed-off-by: Petr Písař <ppisar@redhat.com>
---
 lib/KinoSearch1/Highlight/Highlighter.pm | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/KinoSearch1/Highlight/Highlighter.pm b/lib/KinoSearch1/Highlight/Highlighter.pm
index bb8f910..50faca7 100644
--- a/lib/KinoSearch1/Highlight/Highlighter.pm
+++ b/lib/KinoSearch1/Highlight/Highlighter.pm
@@ -84,32 +84,39 @@ sub generate_excerpt {
     $text = bytes::substr( $text, $top );
 
     # try to start the excerpt at a sentence boundary
-    if ($text =~ s/
+    if ($text =~ /
                 \A
                 (
-                \C{0,$limit}?
+                (.*?)
                 \.\s+
                 )
-                //xsm
+                /xsm
+        and bytes::length($2) <= $limit
         )
     {
-        $top += bytes::length($1);
+        my $bytes_length = bytes::length($1);
+        $text = bytes::substr($text, $bytes_length);
+        $top += $bytes_length;
     }
     # no sentence boundary, so we'll need an ellipsis
     else {
         # skip past possible partial tokens, prepend an ellipsis
-        if ($text =~ s/
+        if ($text =~ /
                 \A
                 (
-                \C{0,$limit}?  # don't go outside the window
+                (.*?)          # don't go outside the window
                 $token_re      # match possible partial token
                 .*?            # ... and any junk following that token
                 )
                 (?=$token_re)  # just before the start of a full token...
-                /... /xsm      # ... insert an ellipsis
+                /xsm
+            and bytes::length($2) <= $limit # don't go outside the window
             )
         {
-            $top += bytes::length($1);
+            my $bytes_length = bytes::length($1);
+            # ... insert an ellipsis
+            $text = '... ' . bytes::substr($text, $bytes_length);
+            $top += $bytes_length;
             $top -= 4    # three dots and a space
         }
     }
-- 
2.5.5
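The approach in the patch can be tried in isolation. The following is only a sketch of the first branch: the helper name skip_to_sentence_start and the sample input are hypothetical and not part of KinoSearch1. It captures a non-greedy prefix with (.*?) and then checks its byte length against the limit, instead of bounding the match with \C{0,$limit}?.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length and bytes::substr without enabling byte semantics

# Hypothetical helper (not in KinoSearch1): if a sentence boundary occurs
# within $limit bytes of the start, drop everything up to and including it.
# Returns the remaining text and the number of bytes skipped.
sub skip_to_sentence_start {
    my ( $text, $limit ) = @_;

    if ( $text =~ / \A ( (.*?) \.\s+ ) /xsm
        and bytes::length($2) <= $limit )
    {
        my $bytes_length = bytes::length($1);
        return ( bytes::substr( $text, $bytes_length ), $bytes_length );
    }

    # No boundary inside the window: leave the text untouched.
    return ( $text, 0 );
}

# ASCII-only sample, so byte and character offsets coincide here.
my ( $rest, $skipped ) =
    skip_to_sentence_start( "end of one. Start of next.", 66 );
print "skipped $skipped bytes, rest: $rest\n";
```

Because (.*?) is non-greedy, the first successful match has the shortest possible prefix, so checking its length afterwards rejects the same inputs that the bounded quantifier would have rejected.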

Thu May 19 11:10:32 2016 The RT System itself - Status changed from 'new' to 'open'