This queue is for tickets about the Lingua-StarDict-Gen CPAN distribution.

Report information
The Basics
Id: 62157
Status: open
Priority: 0/
Queue: Lingua-StarDict-Gen

People
Owner: Nobody in particular
Requestors: n [...] shaplov.ru
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Lingua::Stardict::Gen generates wrong .idx file
Date: Fri, 15 Oct 2010 10:27:36 +0400
To: bug-Lingua-StarDict-Gen [...] rt.cpan.org
From: Nikolay Shaplov <n [...] shaplov.ru>
See the two attached examples:

In the dictionary generated by wrong_dict1.pl you won't be able to view any of the aaa** articles.

In the dictionary generated by wrong_dic21.pl you will not see B00.

That's because of

    my @keys =();
    { no locale;
      @keys = sort (keys %{$hash});
    }

which sorts the list differently from the way StarDict expects.

I've reimplemented in pure Perl all the functions that StarDict uses for comparing entry titles; they are in the attached sort_functions file.

After including these functions in your module, the sorting code should be something like this:

    my @keys =();
    @keys = sort {stardict_strcmp($a,$b)} (keys %{$hash});

You may also be interested in looking at
svn://svn.nataraj.su/wiktionary-export/trunk/StarDict

This is a StarDict writer module I wrote, inspired by Lingua::Stardict::Gen (I needed some capabilities that Lingua::Stardict::Gen did not provide). I've made some code improvements, such as changing { use bytes; length($string) } into bytes::length($string), and some others. Maybe that would also be useful...
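The attached sort_functions file is not reproduced on this page, so the following is only a minimal sketch of the kind of comparison the report describes. It assumes that StarDict orders entry titles by an ASCII-case-insensitive comparison first and breaks ties with a plain byte-wise comparison. The helper `ascii_lc` and the body of `stardict_strcmp` are illustrative assumptions, not the code from the attachment.

```perl
use strict;
use warnings;

# Assumption: only ASCII A-Z is folded; every other byte is left untouched.
sub ascii_lc {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;
    return $s;
}

# Hypothetical stand-in for stardict_strcmp() from the attachment:
# compare ignoring ASCII case, then break ties with a plain comparison.
sub stardict_strcmp {
    my ($s1, $s2) = @_;
    my $r = ascii_lc($s1) cmp ascii_lc($s2);
    return $r != 0 ? $r : $s1 cmp $s2;
}

my $hash = { map { $_ => 1 } qw(B00 a00 b00 A00 aaa Aab) };

# Default sort: plain string order, upper case before lower case.
my @plain = sort keys %{$hash};

# Sort with the StarDict-style comparison.
my @stardict = sort { stardict_strcmp($a, $b) } keys %{$hash};

print "plain:    @plain\n";
print "stardict: @stardict\n";
```

With these sample keys the plain sort gives `A00 Aab B00 a00 aaa b00`, while the StarDict-style sort gives `A00 a00 aaa Aab B00 b00`. If StarDict looks words up assuming the second order, an index written in the first order would plausibly hide entries such as `aaa` or `B00`.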
Download sort_functions
application/octet-stream 999b
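The `bytes::length` change mentioned in the report can be illustrated with a short sketch. It assumes the point of the original `{ use bytes; length($string) }` block was to obtain a length in bytes rather than in characters, presumably because the dictionary files record sizes in bytes. The variable names are hypothetical.

```perl
use strict;
use warnings;
use utf8;        # the source literal below contains a non-ASCII character
use bytes ();    # load the module without enabling byte semantics here

my $word = "Álvaro";    # 6 characters, 7 bytes when stored as UTF-8

# Character length under normal semantics.
my $chars = length($word);

# Byte length, using a lexically scoped "use bytes" block.
my $bytes_block = do { use bytes; length($word) };

# Byte length, calling the function directly without the pragma block.
my $bytes_call = bytes::length($word);

print "chars=$chars block=$bytes_block call=$bytes_call\n";
```

Both byte-oriented forms return the same value; the direct call simply avoids the extra block.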


Subject: Re: [rt.cpan.org #62157] Lingua::Stardict::Gen generates wrong .idx file
Date: Thu, 21 Oct 2010 09:04:18 +0100
To: bug-Lingua-StarDict-Gen [...] rt.cpan.org
From: Jose Joao Dias de Almeida <jj [...] di.uminho.pt>
Nikolay,

Thank you very much for everything.
I have tried several sorts and never got it right.
I still have some doubts about the sort used by StarDict, but your version works fine for all the cases I have (as far as I can tell).

In several languages, upper-case handling is more complex than
" $s1=~s/([A-Z])/lc($1)/ge;"
for example, words containing accented letters (Álvaro) or Russian names.

Apparently this is working fine even for those situations. Is this correct?

(I will upload a new version to CPAN very soon.)

Best regards,
JJoao
Subject: Re: [rt.cpan.org #62157] Lingua::Stardict::Gen generates wrong .idx file
Date: Thu, 21 Oct 2010 12:14:54 +0400
To: bug-Lingua-StarDict-Gen [...] rt.cpan.org
From: Nikolay Shaplov <n [...] shaplov.ru>
On Thu, 21 Oct 2010 04:04:34 -0400, "Jose Joao Dias de Almeida via RT" <bug-Lingua-StarDict-Gen@rt.cpan.org> wrote:
> Nikolay,
> Thank you very much for everything.
> I have tried several sorts and never got it right.
> I still have some doubts about the sort used by StarDict, but your
> version works fine for all the cases I have (as far as I can tell).
>
> In several languages, upper-case handling is more complex
> than " $s1=~s/([A-Z])/lc($1)/ge;"
> for example, words containing accented letters (Álvaro) or Russian
> names.
StarDict does not sort such letters case-sensitively. I've reimplemented the string comparison function _exactly_ as it is written in StarDict: I found the place in the source code and wrote an exact pure-Perl implementation.
> Apparently this is working fine even for those situations. Is this
> correct?
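To make the question about accented and Russian names concrete, here is a small sketch of what the regex quoted in this thread does to such words. It only demonstrates the behaviour of ASCII-only folding in Perl; whether StarDict itself treats these letters the same way is the claim made in the reply, not something this example verifies. The helper name `ascii_lc` is hypothetical.

```perl
use strict;
use warnings;
use utf8;    # the source literals below contain non-ASCII characters

# ASCII-only folding, equivalent in effect to s/([A-Z])/lc($1)/ge.
sub ascii_lc {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;
    return $s;
}

# An ASCII name, an accented name, and a Cyrillic name.
my @words  = ("Alvaro", "Álvaro", "Иван");
my @folded = map { ascii_lc($_) } @words;

binmode STDOUT, ':encoding(UTF-8)';
print join(", ", @folded), "\n";
```

Only the plain ASCII `A` is lowered; `Á` and the Cyrillic capital are left as they are, so under this folding such words are compared by their original characters.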