Thanks for the prompt reply. I'm working on a method to detect "partial duplicates" in the output of relational queries that perform a series of left outer joins. These queries can produce output such as the following (upper-case A, B, C are column names; lower-case letters are example data; note that in query results produced using DBI, NULL is represented as undef):
A B C
a b c
a b NULL
a NULL c
The 2nd and 3rd rows are redundant with the 1st and are what I'm calling "partial duplicates".
There is an obvious brute force solution: sort the rows by number of NULLs (already done in the example), then compare each row to the ones below it. I'm trying to devise a more efficient solution.
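A minimal sketch of the brute-force approach, written in Python purely for illustration (the module itself is Perl/DBI). None stands in for NULL/undef, and the function names are hypothetical. Rows are sorted by NULL count, and each row is kept only if no already-kept row (which has fewer NULLs) agrees with it on all of its non-NULL columns. It also drops exact duplicates, and it is quadratic in the number of rows, which is the cost a more efficient approach would try to avoid.

```python
from typing import Optional, Sequence

# None stands in for SQL NULL (undef in the DBI results).
Row = Sequence[Optional[str]]


def subsumes(full: Row, partial: Row) -> bool:
    """True if every non-NULL value in `partial` matches `full` in the same column."""
    return all(p is None or p == f for f, p in zip(full, partial))


def remove_partial_duplicates(rows: list[Row]) -> list[Row]:
    # Fewest NULLs first: a row can only be subsumed by one with fewer NULLs
    # (or by an identical row).
    ordered = sorted(rows, key=lambda r: sum(v is None for v in r))
    kept: list[Row] = []
    for row in ordered:
        # Brute force: compare against every row kept so far.
        if not any(subsumes(k, row) for k in kept):
            kept.append(row)
    return kept


rows = [("a", "b", "c"), ("a", "b", None), ("a", None, "c")]
print(remove_partial_duplicates(rows))  # [('a', 'b', 'c')]
```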
My initial thought (thoroughly wrong, as it turned out) was to store the rows in a Trie as a first processing step. My current thought (too early to tell whether it's equally wrong :)) is that I can decompose the result into sub-tables in which the NULLs are properly nested, and store the non-redundant non-NULL prefixes in a set of Tries. For the example, the decomposition is simple. Instead of one table, there are two:
A B
a b
a b
a NULL
A C
a c
a NULL
a c
Each sub-table contains either rows without NULLs or rows in which the NULLs come at the end. I believe I can generate these decompositions in all cases that will come up in my module because of the way the database is organized.
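One possible reading of the sub-table idea, sketched in Python under the stated assumption that NULLs only ever trail within a sub-table. Each row's non-NULL prefix is inserted into a trie, modelled here as nested dicts rather than the Trie module itself. A prefix that stops at an interior node is a proper prefix of some longer row, so it is redundant; only root-to-leaf paths survive. All helper names are hypothetical.

```python
def non_null_prefix(row):
    """Leading non-NULL values of a row; assumes NULLs only occur at the end."""
    prefix = []
    for value in row:
        if value is None:
            break
        prefix.append(value)
    return tuple(prefix)


def build_trie(prefixes):
    """Nested-dict trie: each key is a column value, each value a child node."""
    root = {}
    for prefix in prefixes:
        node = root
        for value in prefix:
            node = node.setdefault(value, {})
    return root


def leaf_paths(node, path=()):
    """Yield every root-to-leaf path; these are the maximal (non-redundant) prefixes."""
    if not node:
        yield path
        return
    for value, child in node.items():
        yield from leaf_paths(child, path + (value,))


def reduce_subtable(rows):
    """Drop rows whose non-NULL prefix is a prefix of (or equal to) another row's."""
    if not rows:
        return []
    width = len(rows[0])
    trie = build_trie(non_null_prefix(r) for r in rows)
    # Pad each surviving prefix back out to full width with NULLs.
    return [p + (None,) * (width - len(p)) for p in leaf_paths(trie)]


print(reduce_subtable([("a", "b"), ("a", "b"), ("a", None)]))  # [('a', 'b')]
print(reduce_subtable([("a", "c"), ("a", None), ("a", "c")]))  # [('a', 'c')]
```

Applied to the two example sub-tables, each collapses to a single row, and each trie insertion costs time proportional to the row width rather than to the number of rows already seen.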
So, for now, I don't need the ability to store embedded undefs. If I did, I would almost certainly want undef to be distinguishable from an empty string, since the two mean different things (a NULL versus an actual empty value).
BTW, if you're interested in how this plays out, I'm happy to keep you informed. I'm currently emulating the aspects of Trie that I need using hashes (the usual trick of joining array values into strings with $;) and doing lookups with grep. This is an easy way for me to learn whether my basic idea works. If it does, I'll reimplement with Trie.
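For comparison, the hash-based emulation might look roughly like this Python sketch (names hypothetical). Values are joined with the separator Perl uses by default for $; (chr 28), and a prefix is treated as redundant when some other stored key extends it, found by a linear scan in the spirit of grep. The trailing separator in the check keeps ("a",) from matching a key such as ("ab",). Every lookup scans all keys, which is the cost a real trie would remove.

```python
SUBSEP = "\x1c"  # same as Perl's default $; (chr 28); assumes values never contain it


def key_of(values):
    """Join a tuple of column values into one hash key, like join($;, @values)."""
    return SUBSEP.join(values)


def keep_maximal(prefixes):
    """Keep prefixes that no other stored key extends (grep-style linear scan).

    Assumes every prefix is non-empty.
    """
    stored = {key_of(p): p for p in prefixes}  # the dict also drops exact duplicates
    return [
        p
        for k, p in stored.items()
        if not any(s.startswith(k + SUBSEP) for s in stored)
    ]


print(keep_maximal([("a", "b"), ("a", "b"), ("a",)]))  # [('a', 'b')]
print(keep_maximal([("a",), ("ab",)]))  # [('a',), ('ab',)]
```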
Thanks again for the response and thanks for sharing your excellent module.
Best,
Nat
On Jun 19, 2013, at 12:03 PM, AVIF via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=86242 >
>
> On Wed Jun 19 12:15:58 2013, NATG wrote:
>> Never mind. I didn't think it through before submitting the bug
>> report. I don't need this feature after all. Sorry for the
>> misdirection.
>>
>> -Nat
>
> It might make sense as a feature, though. I haven't run a test (yet), but I think passing in undef as an array value will be essentially the same as passing in an empty string. Because entries in the Trie end up stringified, there isn't a good way to distinguish between these two cases without using a special "marker" like the one we use for "end of word" right now. That isn't impossible, but it is kind of ugly.
>
> What use case did you have for needing to distinguish between those two cases?
>
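The stringification issue raised in the quoted reply can be illustrated with a short Python sketch. Perl stringifies undef as an empty string, so the first helper mimics that explicitly (Python's own str(None) would give "None"). The second helper shows the marker idea: undef is encoded as a dedicated value, analogous to the existing end-of-word marker. The marker string and both helper names are hypothetical, and the approach assumes real data never contains the marker.

```python
SUBSEP = "\x1c"  # Perl's default $; (chr 28)
UNDEF_MARKER = "\x00UNDEF\x00"  # hypothetical marker; assumes it never occurs in real data


def perl_like_key(values):
    """Join values the way Perl would stringify them: undef becomes ""."""
    return SUBSEP.join("" if v is None else v for v in values)


def marked_key(values):
    """Join values, encoding undef as a dedicated marker so it differs from ""."""
    return SUBSEP.join(UNDEF_MARKER if v is None else v for v in values)


# With Perl-style stringification, NULL and the empty string collide:
print(perl_like_key(["a", None]) == perl_like_key(["a", ""]))  # True
# With the marker, they stay distinct:
print(marked_key(["a", None]) == marked_key(["a", ""]))  # False
```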