
This queue is for tickets about the Lingua-EN-Numbers CPAN distribution.

Report information
The Basics
Id: 118691
Status: open
Priority: 0/
Queue: Lingua-EN-Numbers

People
Owner: NEILB [...] cpan.org
Requestors: TIMB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: num2en("00") returns "-zero" plus an undef warning
This is due to the following code in _int2en:

    return $D{$1 . '0'} . '-' . $D{$2};

and %D not having an entry for "00".

It seems reasonable for any number of 0's to be mapped to "zero"s, so "00" -> "zero-zero", "000" -> "zero-zero-zero".

p.s. Thanks for the code. Very handy for my current work.
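For illustration, a minimal sketch of how that return expression produces "-zero". Only the return line is quoted from the ticket; the cut-down %D table, the regex, and the sub name two_digits_sketch are assumptions made for the example, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical, cut-down stand-in for the module's %D lookup table.
# Note that there is no entry for "00".
my %D = (0 => 'zero', 7 => 'seven', 20 => 'twenty');

# Assumed shape of the two-digit branch; only the return expression
# is taken from the ticket.
sub two_digits_sketch {
    my ($x) = @_;
    return unless $x =~ m/^(\d)(\d)$/;
    # For "00" or "07", $1 . '0' is "00", so the first lookup is undef:
    # it concatenates as an empty string and triggers an
    # "uninitialized value" warning.
    return $D{$1 . '0'} . '-' . $D{$2};
}
```

Under these assumptions, two_digits_sketch('27') gives "twenty-seven", while two_digits_sketch('00') gives "-zero" and emits the warning.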
Yep, seems reasonable, I'll do a release in the next day or so. Cheers, Neil
Sat down to look at this again, thinking "ah yeah, multiple leading zeroes should be compressed down to a single zero", and discovered that wasn't what you had suggested. And after thinking about it, it isn't clear what the right thing to do is.

Consider the following cases.

    00.1  I think a person would say "nought point one" or "zero point one".
    007   Ok, this is an intentionally funny case, but here people would say "oh oh seven" or "zero zero seven".
    0700  Here I think someone might say "oh seven hundred"

So then I thought about what exactly is this module doing? Converting *numbers* (not digit strings, for example) into words.

So I think:

    00.1 should be treated as 0.1
    007 should be treated as 7
    0700 should be treated as 700

What do you think?
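The "treat it as a number" reading could be sketched as a normalization step that runs before conversion. This is only an illustration: the sub name strip_leading_zeros_sketch is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: drop redundant leading zeros from a numeric
# string, keeping one zero when it is the only integer digit.
sub strip_leading_zeros_sketch {
    my ($s) = @_;
    # Remove leading zeros only while another digit follows, so that
    # "0" and "0.1" keep their single zero. An optional minus sign
    # is preserved.
    $s =~ s/^(-?)0+(?=\d)/$1/;
    return $s;
}
```

With this, "00.1" becomes "0.1", "007" becomes "7", and "0700" becomes "700". Working on the string rather than adding 0 avoids any floating-point rounding for long digit strings.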
It hangs on the definition of "numbers", and I don't think there would be one solution that would suit all cases.

More generally I'd suggest that the module is for converting *number-like strings* into a corresponding sequence of words that aims to match *what a human would say when reading that string*. In my case I'm using it to normalize transcripts so I can compare them. Some transcripts are written by humans, and others by software, all interpreting the same audio.

I can see your point that the "num" in num2en suggests that the argument should be numeric (i.e. IV/NV) and arbitrary strings could be assumed to be converted to a number first, e.g. via += 0. If you take that approach then I think there's a clear need for an extra sub that takes a "number-like string" instead. That sub, or perhaps num2en with an extra param, could do the rough equivalent of

    if (m/^0/) {
        print "zero " while s/^0//; # handle leading zeros
        $spell_out_each_digit = 1;  # new feature :)
    }

So 0700 would be "zero seven zero zero" not "zero seven hundred".

Tim.
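A rough, self-contained sketch of the "number-like string" idea: spell out each digit when the string starts with a zero. The sub name digits2en_sketch and the @DIGIT table are hypothetical and are not part of Lingua::EN::Numbers.

```perl
use strict;
use warnings;

# Words for the digits 0..9, indexed by digit.
my @DIGIT = qw(zero one two three four five six seven eight nine);

# Hypothetical helper: for an all-digit string with a leading zero,
# return the digit-by-digit reading. For anything else return nothing,
# so the caller can fall back to the normal number conversion.
sub digits2en_sketch {
    my ($s) = @_;
    return unless $s =~ m/^0\d*$/;
    return join ' ', map { $DIGIT[$_] } split //, $s;
}
```

Under these assumptions "0700" gives "zero seven zero zero", and a string such as "18" returns nothing and would be handed to num2en as usual.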
Another example I just encountered: times/durations like "18:07". Split on word boundaries that's "18" then "07". The "18" returns "eighteen". The "07" returns "-seven" plus an undef warning. Returning "eighteen" then "zero-seven" would be fine.