
This queue is for tickets about the Pod-Spell-CommonMistakes CPAN distribution.

Report information
The Basics
Id: 61505
Status: open
Priority: 0/
Queue: Pod-Spell-CommonMistakes

People
Owner: Nobody in particular
Requestors: user42 [...] zip.com.au
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: some more common mistakes words
Date: Tue, 21 Sep 2010 11:24:29 +1000
To: bug-Pod-Spell-CommonMistakes [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
These are a few words I grep for myself as common spelling mistakes. I see the wordlist has commited/comited, but not omit.

Is it feasible to notice a doubling like "the the"? There's lots of those which might be bad, but "the" is the most common I make.

"note sure" is another phrase mistake I've made, if that's detectable. I don't think there's any normal sort of wording where it might arise :-).

refering
writeable
nineth
ommited
omited
requrie
existant
explict
agument
destionation
the the
note sure
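On the doubled-word question, here is a minimal sketch of how such a check might look when run over plain text. It is not part of Pod::Spell::CommonMistakes; the name `find_doubled_words` is hypothetical, and the sketch assumes the text has not had anything stripped out of it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any CPAN module:
# return (lower-cased) every word that appears twice in a row in $text.
sub find_doubled_words {
    my ($text) = @_;
    my @found;
    # \b(\w+)\s+\1\b matches a whole word, some whitespace (including a
    # newline), then the same whole word again. /i lets "The the" count.
    while ( $text =~ /\b(\w+)\s+\1\b/gi ) {
        push @found, lc $1;
    }
    return @found;
}

my @hits = find_doubled_words("Pass the the handle, then call it.\nThe the end.");
print join( ",", @hits ), "\n";
```

This flags every doubling, so legitimate ones such as "that that" would also be reported; a real check would probably want an allow-list of the definitely bad ones.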
Hello,

Thanks for using this module and pointing out some more common mistakes! I normally just pull the wordlist from the Debian Lintian repo as mentioned in the POD. I've made some internal changes so I can add more words from people like you :)

In the future, if you find more misspellings, could you please report it to the Lintian team? That would benefit everyone, not just CPAN users :)

I'll upload an updated distribution to CPAN shortly with your wordlist. However, I don't have the tuits to add multi-word detection for cases like "the the". Patches welcome :)

-- ~Apocalypse
Subject: Re: [rt.cpan.org #61505] some more common mistakes words
Date: Wed, 02 Mar 2011 09:35:17 +1100
To: bug-Pod-Spell-CommonMistakes [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
"Apocalypse via RT" <bug-Pod-Spell-CommonMistakes@rt.cpan.org> writes:
>
> In the future, if you find more misspellings, could you please report
> it to the Lintian team? That would benefit everyone, not just CPAN users :)
I suppose. I've never had it hit anything for me though :-)
> However, I don't have the tuits to add multi-word detection for cases
> like "the the". Patches welcome :)
I got as far as the bit below before realizing the way the Pod::Spell parser strips "separator" things like blocks of C<> or whatnot means the words adjacent in the $words list were not necessarily adjacent in the original input. Maybe if asked nicely it could slip in "" empty strings or something for such things, or end-of-sentence if that can be spotted easily enough, etc.
--- WordList.pm.orig 2011-02-26 16:24:43.000000000 +1100
+++ WordList.pm 2011-02-26 18:11:30.000000000 +1100
@@ -610,6 +610,15 @@
   "yur" => "your",
 );

+my %common_pairs = (
+  # word pairs which are almost certainly wrong
+  "note sure" => "not sure",
+
+  # not all doublings are bad, try the definitely or almost certainly bad ones
+  "the the" => "the",
+  "then then" => "then the", # wild guess at the intention
+);
+
 sub _check_common {
   my $words = shift;

@@ -620,20 +629,41 @@
   foreach my $w ( @$words ) {
     my $lcw = lc( $w );
     if ( exists $common{ $lcw } ) {
-      # Determine what kind of correction we need
-      if ( $w =~ /^[A-Z]+$/ ) {
-        $err{ $w } = uc( $common{ $lcw } );
-      } elsif ( $w =~ /^[A-Z]/ ) {
-        $err{ $w } = ucfirst( $common{ $lcw } );
-      } else {
-        $err{ $w } = $common{ $lcw };
-      }
+      $err{ $w } = _similar_capitalization($common{$lcw}, $w);
     }
   }

+  # FIXME: This is not enough. Pod::Spell parser strips various
+  # verbatims, C<> and bits like "and/or", so things adjacent in
+  # $words are not adjacent in the original. A simple check like the
+  # following gets false positives on for instance
+  #     Can be combined with the C<CORE::time> and/or the current
+  #
+  # foreach my $i (0 .. $#$words-1) {
+  #   my $pair = lc("$words->[$i] $words->[$i+1]");
+  #   if (exists $common_pairs{$pair}) {
+  #     $err{$pair} = _similar_capitalization($common_pairs{$pair}, $pair);
+  #   }
+  # }
+
   return \%err;
 }

+# Return $repl with capitalization similar to $orig.
+# $repl is lower case and is munged according to orig,
+# ORIG -> REPL, Orig -> Repl, otherwise lowercase repl unchanged.
+# (Is this available from a generic sort of module?)
+sub _similar_capitalization {
+  my ($repl, $orig) = @_;
+  if ( $orig =~ /^[A-Z]+$/ ) {
+    return uc($repl);
+  } elsif ( $orig =~ /^[A-Z]/ ) {
+    return ucfirst($repl);
+  } else {
+    return $repl;
+  }
+}
+
 1;
 __END__
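To illustrate the "slip in empty strings" idea, here is a rough standalone sketch of the pair check from the patch, extended to skip over placeholders. It assumes, hypothetically, that the parser could be made to emit "" wherever it strips a C<> block, verbatim text or similar; Pod::Spell is not known to do that today. The names `check_pairs` and `similar_capitalization` are stand-ins for illustration, the latter mirroring `_similar_capitalization` from the patch.

```perl
use strict;
use warnings;

# Two of the pairs from the patch above.
my %common_pairs = (
    "note sure" => "not sure",
    "the the"   => "the",
);

# Same logic as _similar_capitalization in the patch:
# ORIG -> REPL, Orig -> Repl, otherwise $repl unchanged.
sub similar_capitalization {
    my ( $repl, $orig ) = @_;
    return uc($repl)      if $orig =~ /^[A-Z]+$/;
    return ucfirst($repl) if $orig =~ /^[A-Z]/;
    return $repl;
}

# ASSUMPTION: $words contains "" wherever the parser stripped something,
# so an empty string means its neighbours were not adjacent in the POD.
sub check_pairs {
    my ($words) = @_;
    my %err;
    foreach my $i ( 0 .. $#$words - 1 ) {
        my ( $first, $second ) = @{$words}[ $i, $i + 1 ];
        next if $first eq "" or $second eq "";
        my $pair = "$first $second";
        if ( exists $common_pairs{ lc $pair } ) {
            $err{$pair} = similar_capitalization( $common_pairs{ lc $pair }, $pair );
        }
    }
    return \%err;
}

# The false-positive case from the FIXME: with a placeholder standing in
# for the stripped C<CORE::time> and "and/or", the two "the"s stay apart.
my $err = check_pairs( [ qw(Can be combined with the), "", qw(the current) ] );
print scalar( keys %$err ), " problem(s) found\n";

# Genuinely adjacent words are still reported.
$err = check_pairs( [ qw(I am note sure about The the result) ] );
print "$_ => $err->{$_}\n" for sort keys %$err;
```

Since a pair contains a space, an all-capitals pair never matches `/^[A-Z]+$/` and only gets its first letter capitalized; that is the same behaviour the commented-out loop in the patch would have.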
On Mon Sep 20 21:24:56 2010, user42@zip.com.au wrote:
> writeable
This is not a spelling mistake; it's a valid alternate spelling, and the one I would use in preference to 'writable'. (My en_uk spelling checker agrees!)

Please revert this change.

Andrew
Subject: Re: [rt.cpan.org #61505] some more common mistakes words
Date: Wed, 31 Aug 2011 09:03:26 +1000
To: bug-Pod-Spell-CommonMistakes [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
"Andrew Pam via RT" <bug-Pod-Spell-CommonMistakes@rt.cpan.org> writes:
>
> This is not a spelling mistake; it's a valid alternate spelling, and the
> one I would use in preference to 'writable'. (My en_uk spelling checker
> agrees!)
I believe it's a mistake, just one very easily made in computer jargon where it seems more regular, but it's usual to drop the e from such forms. (It's definitely writable not writeable in the freely available 1913 Websters, and rumour has it likewise in the shorter Oxford.)
Hello,

I see your point! However, I think the addition is valid for the general case. What I would suggest is that you add your spelling as a stopword or skipword or whatever in the Pod testing framework you are using. For example:

use Test::Spelling;
add_stopwords(qw( writeable ));
all_pod_files_spelling_ok();

use Pod::Spell;

=pod

=for stopwords writeable

=cut

Does that make sense? Thanks again for your report :)

On Mon Aug 29 00:38:25 2011, xanni wrote:
> On Mon Sep 20 21:24:56 2010, user42@zip.com.au wrote:
> > writeable
>
> This is not a spelling mistake; it's a valid alternate spelling, and the
> one I would use in preference to 'writable'. (My en_uk spelling checker
> agrees!)
>
> Please revert this change.
>
> Andrew
>
-- ~Apocalypse