
This queue is for tickets about the Text-Hunspell CPAN distribution.

Report information
The Basics
Id: 92820
Status: open
Priority: 0/
Queue: Text-Hunspell

People
Owner: cosimo [...] cpan.org
Requestors: JMERELO [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Problems with words with a tilde?
Don't know really what's the problem. I'm checking a text from the command line that includes Spanish words such as "móvil" or "así" and it's OK. But it fails from the "check" function, for instance, here:

  DB<9> p $speller->check('móvil')
0

  DB<10> p $speller->check('más')
0

I'll try and add additional tests to see what's the problem, but meanwhile it's kind of a nuisance since we really use _lots_ of words with that stuff...
Thanks, will have a look at this.
On Fri 07 Feb 2014 14:08:20, JMERELO wrote:
> Don't know really what's the problem. I'm checking a text from the
> command line that includes Spanish words such as "móvil" or "así" and
> it's OK. But it fails from the "check" function, for instance, here:
>
> DB<9> p $speller->check('móvil')
> 0
>
> DB<10> p $speller->check('más')
> 0
>
> I'll try and add additional tests to see what's the problem, but
> meanwhile it's kind of a nuisance since we really use _lots_ of words
> with that stuff...
Very late reply, but I'm looking at this now. Could you send me the dictionary files you are using, or maybe a smaller dictionary file that I can use to replicate the issue? Meanwhile, I can build myself one.
On Thu May 29 10:47:39 2014, COSIMO wrote:
> Very late reply, but I'm looking at this now.
>
> Could you send me the dictionary files you are using, or maybe a
> smaller dictionary file that I can use to replicate the issue?
I was using the standard Spanish dictionary from the Ubuntu repos. JJ
On Thu 29 May 2014 10:53:43, JMERELO wrote:
> I was using the standard Spanish dictionary from the Ubuntu repos.
Ok, I tried building a simple test case using the Spanish dictionary I found here:

https://github.com/SublimeText/Dictionaries/blob/master/Spanish.dic

with the following code:

https://github.com/cosimo/perl5-text-hunspell/commit/b787fae2340787d69e62fd19c55e9788417aa027

and it's passing for me with the current code...
Can you try with the latest commit from the github repository, to see if that test fails for you (t/09-rt92820.t)?
On Thu May 29 11:04:53 2014, COSIMO wrote:
> Ok, I tried building a simple test case using the spanish dictionary I
> found here:
>
> https://github.com/SublimeText/Dictionaries/blob/master/Spanish.dic
>
> with the following code:
>
> https://github.com/cosimo/perl5-text-hunspell/commit/b787fae2340787d69e62fd19c55e9788417aa027
>
> and it's passing for me with the current code...
> Can you try with the latest commit from the github repository, to see
> if that test fails for you (t/09-rt92820.t)?
Working without a problem:

  DB<1> p $speller->check('qué')
1

  DB<2> p $speller->check('más')
1

  DB<3> p $speller->check('móvil')
-

So, fixed. Thanks!
I'm getting this error again using the default dictionaries that come with Ubuntu in the package "myspell_es". I'm not sure what's going on here: hunspell from the command line has no trouble, while the module fails with what I think are the same dictionaries. I'll try several dictionary files to see which one is correct.
On Sat Aug 09 14:50:49 2014, JMERELO wrote:
> I'm getting this error again using the default dictionaries that come
> with Ubuntu in the package "myspell_es". I'm not sure what's going on
> here, but using hunspell from the command line has no trouble while
> using the ones I think are the same dictionaries with the module do
> not. I'll try with several dictionary files to see which one is
> correct.
In fact, hunspell is using the dicts in /usr/share/hunspell. I have checked them and the only thing I see is that they are in Latin-1 instead of UTF-8. That might be a problem with those Spanish words, which would not be the same byte for byte.
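To illustrate the byte-for-byte difference mentioned above, here is a minimal sketch using only the core Encode module. It assumes the script source is saved as UTF-8 (hence `use utf8`); the `hex_dump` helper is a hypothetical name introduced for this illustration and is not part of Text::Hunspell.

```perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8 text
use Encode qw(encode);

my $word = 'cómo';

# The same word serialised in the two encodings under discussion.
my $utf8_bytes   = encode( 'UTF-8',      $word );
my $latin1_bytes = encode( 'ISO-8859-1', $word );

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump { join ' ', map { sprintf '%02x', ord } split //, $_[0] }

print 'UTF-8:   ', hex_dump($utf8_bytes),   "\n";
print 'Latin-1: ', hex_dump($latin1_bytes), "\n";
```

The accented letter takes two bytes in UTF-8 and one in Latin-1, so a byte-wise lookup of a UTF-8 word in a Latin-1 dictionary would plausibly miss.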
The program I attach fails on the accented words:

'cuando' found in the dictionary
'cómo' not found in the dictionary!
'dónde' not found in the dictionary!
'que' found in the dictionary
Subject: test-hunspell.pl
#!/usr/bin/perl

use Text::Hunspell;

# You can use relative or absolute paths.
my $speller = Text::Hunspell->new(
    "/usr/share/hunspell/es.aff",    # Hunspell affix file
    "/usr/share/hunspell/es.dic"     # Hunspell dictionary file
);

die unless $speller;

# Check a word against the dictionary
for my $word ( qw( cuando cómo dónde que) ) {
    print $speller->check($word)
        ? "'$word' found in the dictionary\n"
        : "'$word' not found in the dictionary!\n";
}
In fact, the program I attach here works. Not sure whether this would then be a code bug or a documentation bug. Maybe just advise users that words should be encoded in the same charset as the dictionary they will be using, which seems to be latin1 for all default dicts in hunspell.
Subject: test-hunspell.pl
#!/usr/bin/perl

use Text::Hunspell;
use Encode::Encoder qw(encoder);

# You can use relative or absolute paths.
my $speller = Text::Hunspell->new(
    "/usr/share/hunspell/es.aff",    # Hunspell affix file
    "/usr/share/hunspell/es.dic"     # Hunspell dictionary file
);

die unless $speller;

# Check a word against the dictionary
for my $word ( qw( cuando cómo dónde que) ) {
    print $speller->check(encoder($word)->latin1)
        ? "'$word' found in the dictionary\n"
        : "'$word' not found in the dictionary!\n";
}
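A possible generalisation of the workaround above, as a sketch only: rather than hard-coding latin1, read the charset from the `SET` line of the .aff file and encode each word to match. The helper names `aff_encoding` and `encode_for_dictionary` are hypothetical and not part of Text::Hunspell. The fallback to ISO-8859-1 when no `SET` line is present is an assumption of this sketch. The demo writes a throwaway .aff file so it runs without Hunspell installed.

```perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8 text
use Encode qw(encode find_encoding);
use File::Temp qw(tempfile);

# Hypothetical helper: return the Encode name for the charset declared on
# the "SET" line of a Hunspell .aff file. Falls back to ISO-8859-1 when no
# SET line is found (an assumption of this sketch).
sub aff_encoding {
    my ($aff_path) = @_;
    open my $fh, '<:raw', $aff_path or die "Cannot open $aff_path: $!";
    my $name = 'ISO-8859-1';
    while ( my $line = <$fh> ) {
        # Tolerate an optional UTF-8 byte order mark before SET.
        if ( $line =~ /^(?:\xEF\xBB\xBF)?\s*SET\s+(\S+)/ ) {
            $name = $1;
            last;
        }
    }
    close $fh;
    $name =~ s/^ISO(?=\d)/ISO-/i;    # "ISO8859-1" -> "ISO-8859-1"
    my $enc = find_encoding($name) or die "Unknown encoding '$name'";
    return $enc->name;
}

# Hypothetical helper: encode a Perl character string into the
# dictionary's byte encoding.
sub encode_for_dictionary {
    my ( $word, $encoding ) = @_;
    return encode( $encoding, $word );
}

# Demo with a temporary .aff file containing only a SET line.
my ( $tmp_fh, $tmp_path ) = tempfile( SUFFIX => '.aff', UNLINK => 1 );
print {$tmp_fh} "SET ISO8859-1\n";
close $tmp_fh;

my $encoding = aff_encoding($tmp_path);
my $bytes    = encode_for_dictionary( 'dónde', $encoding );
printf "%s -> %d bytes\n", $encoding, length $bytes;
```

With a real dictionary, the call would presumably look like `$speller->check( encode_for_dictionary( $word, aff_encoding("/usr/share/hunspell/es.aff") ) )`, with `$word` holding decoded character data.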