On 2016-01-09 22:08:38, PERLANCAR wrote:
> On Sat, 9 Jan 2016 07:46:30 GMT, SREZIC wrote:
> > On 2016-01-08 22:33:31, PERLANCAR wrote:
> > > On Fri, 8 Jan 2016 22:15:03 GMT, SREZIC wrote:
> > > > It would be nice if there were a dataset with characters >=
> > > > \x{0100}. It would be especially interesting to see whether the
> > > > participants support Unicode at all.
> > >
> > > Yes, it would. BTW, can you think of a pair of Unicode strings that
> > > would give a different distance when fed to a Unicode-supporting
> > > vs. a non-Unicode-supporting module?
> >
> > It seems that Text::LevenshteinXS does not support Unicode correctly.
> > The correct answer would be 1 here:
> >
> > $ perl5.22.1 -MText::LevenshteinXS=distance -e 'warn distance("Euro", "\x{20ac}uro")'
> > 3 at -e line 1.
> > $ perl5.22.1 -MText::Levenshtein::XS=distance -e 'warn distance("Euro", "\x{20ac}uro")'
> > 1 at -e line 1.
>
> Thanks, added.
Well, I have to reopen this ticket. I would have expected the euro + Text::LevenshteinXS row not to appear in the table at all: this module returns the wrong result, so its benchmark numbers are misleading. I see that the expected result is already included in the scenario description. How about checking whether the actual result matches the expected one and marking the row specially if it does not? For example, it could look like this (the wrong result moved to the top, with no benchmark numbers shown):
+-----+-------------------------------------------------------------------------------+----------+----------+---------+---------+
| seq | name                                                                          | rate     | time     | errors  | samples |
+-----+-------------------------------------------------------------------------------+----------+----------+---------+---------+
|   3 | {dataset=>"euro",participant=>"Text::LevenshteinXS::distance"}                | -- wrong result --                      |
|   4 | {dataset=>"euro",participant=>"Text::Levenshtein::Damerau::PP::pp_edistance"} | 1.71e+04 |   58.4μs | 1.1e-07 |      20 |
|   1 | {dataset=>"euro",participant=>"Text::Levenshtein::fastdistance"}              |    18669 | 53.565μs | 9.6e-10 |      20 |
|   0 | {dataset=>"euro",participant=>"PERLANCAR::Text::Levenshtein::editdist"}       | 3.13e+04 |   31.9μs | 1.3e-08 |      20 |
|   5 | {dataset=>"euro",participant=>"Text::Levenshtein::Damerau::XS::xs_edistance"} |  3.6e+05 |    2.8μs |   1e-08 |      20 |
|   2 | {dataset=>"euro",participant=>"Text::Levenshtein::XS::distance"}              |  3.8e+05 |   2.63μs | 4.2e-09 |      20 |
+-----+-------------------------------------------------------------------------------+----------+----------+---------+---------+
Or just remove Text::LevenshteinXS completely from the participants.
BTW, the expected result seems to be wrong: for the euro dataset it is listed as 2, but it should be 1.
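To illustrate why the two modules disagree, here is a minimal pure-Perl sketch. It assumes (not verified against the module's source) that Text::LevenshteinXS compares the UTF-8 bytes of its arguments rather than characters. Under that assumption, U+20AC becomes the three bytes E2 82 AC, and the distance between "Euro" and "\x{20ac}uro" comes out as 3 instead of 1. The `levenshtein` function and the participant names `char_wise` and `byte_wise` are hypothetical. The loop also sketches the proposed check: it prints `-- wrong result --` instead of the number when a participant's result differs from the expected one.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use List::Util qw(min);

# Hypothetical helper: textbook dynamic-programming Levenshtein distance,
# keeping only the previous row. It splits its arguments into Perl
# characters, so a decoded "\x{20ac}" counts as one character.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. @t);    # distances from the empty prefix of $s
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Two hypothetical participants: one works on characters, the other on
# UTF-8 bytes (assumed here to mimic a non-Unicode-aware module).
my %participants = (
    char_wise => sub { levenshtein($_[0], $_[1]) },
    byte_wise => sub { levenshtein(encode_utf8($_[0]), encode_utf8($_[1])) },
);

# The "euro" dataset together with its expected result.
my ($word1, $word2, $expected) = ("Euro", "\x{20ac}uro", 1);

for my $name (sort keys %participants) {
    my $got = $participants{$name}->($word1, $word2);
    # Only show the number if it matches the expected result.
    my $shown = $got == $expected ? $got : "-- wrong result -- (got $got)";
    printf "%-9s %s\n", $name, $shown;
}
```

Running it prints `byte_wise -- wrong result -- (got 3)` followed by `char_wise 1`, which mirrors the two one-liners quoted above. In a real benchmark, a check like this would run once per dataset/participant pair before timing starts, so a wrong participant never gets benchmark numbers at all.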