Bug #86242 for Tree-Trie: Using undef when adding array of strings

Tue Jun 18 14:55:59 2013 NATG [...] cpan.org - Ticket created

CC:	natg [...] shore.net
Subject:	Using undef when adding array of strings

Greetings-

Great module! I'm trying out Tree::Trie for possible use in my Data::Babel module to solve a bug involving output rows that are partial duplicates. For this purpose, it would be convenient to allow undefs in arrays being added to the trie. The use-case is obvious, but to be precise:

    use Tree::Trie;
    $tree = new Tree::Trie;
    $tree->add(['abc', undef, 'def']);

Thanks,
Nat Goodman

Wed Jun 19 12:15:58 2013 NATG [...] cpan.org - Correspondence added

RT-Send-CC:	natg [...] shore.net

Never mind. I didn't think it through before submitting the bug report. I don't need this feature after all. Sorry for the misdirection.

-Nat

Wed Jun 19 12:15:59 2013 NATG [...] cpan.org - Status changed from 'new' to 'rejected'

Wed Jun 19 15:03:37 2013 AVIF [...] cpan.org - Correspondence added

On Wed Jun 19 12:15:58 2013, NATG wrote:

> Never mind. I didn't think it through before submitting the bug
> report. I don't need this feature after all. Sorry for the
> misdirection.
>
> -Nat

It might make sense as a feature, though. I haven't run a test (yet), but I think passing in undef as an array value will be essentially the same as passing in an empty string. Because entries in the Trie end up stringified, there isn't a good way to distinguish between these two cases without using a special "marker" like we use for the "end of word" right now. That isn't impossible, but it is kind of ugly.

What use case did you have for needing to distinguish between those two cases?
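A minimal sketch of the stringification point, assuming a plain Perl hash as a stand-in for one level of a trie with string keys. This is an illustration only, not Tree::Trie's actual internals, and the names `%node` and `@parts` are made up for the example:

```perl
use strict;
use warnings;

my %node;                               # hypothetical stand-in for one trie level
my @parts = ('abc', undef, 'def');

{
    no warnings 'uninitialized';        # undef used as a hash key would warn otherwise
    $node{$_}++ for @parts, '';         # insert undef and '' as separate entries
}

my @keys = sort keys %node;             # undef has collapsed into the '' key
print scalar(@keys), " distinct keys\n";
print "entries stored under '': $node{''}\n";
```

Four values go in, but only three keys come out, because undef and the empty string land on the same key. Telling them apart would need some out-of-band marker, which is the "ugly" option mentioned above.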

Wed Jun 19 18:13:25 2013 natg [...] shore.net - Correspondence added

Subject:	Re: [rt.cpan.org #86242] Using undef when adding array of strings
Date:	Wed, 19 Jun 2013 15:13:04 -0700
To:	bug-Tree-Trie [...] rt.cpan.org
From:	"Nathan (Nat) Goodman" <ngoodman [...] systemsbiology.org>

Thanks for the prompt reply.

I'm working on a method to detect "partial duplicates" in the output of relational queries that do a series of left outer joins. These queries can produce outputs such as the following (upper case A, B, C are column names; lower case are example data; note that in query results produced using DBI, NULL is represented as undef):

A    B     C
a    b     c
a    b     NULL
a    NULL  c

The 2nd and 3rd rows are redundant with the 1st and are what I'm calling "partial duplicates". There is an obvious brute force solution: sort the rows by number of NULLs (already done in the example), then compare each row to the ones below it. I'm trying to devise a more efficient solution.

My initial thought (thoroughly wrong as it turned out) was that I would store the rows in a Trie as a first processing step. My current thought (too early to tell if it's equally wrong :)) is that I can decompose the answer into sub-tables in which the NULLs are properly nested, and store the non-redundant non-NULL prefixes in a set of Tries.

For the example, the decomposition is simple. Instead of one table, there are two:

A    B
a    b
a    b
a    NULL

A    C
a    c
a    NULL
a    c

Each sub-table contains either rows without NULLs or rows in which the NULLs come at the end. I believe I can generate these decompositions in all cases that will come up in my module because of the way the database is organized.

So, for now, I don't need the ability to store embedded undefs. If I did need to do so, I would almost certainly want undef to be distinguishable from an empty string, since they are different...

BTW if you're interested in how this plays out, I'm happy to keep you informed. I'm currently emulating those aspects of Trie that I need using hashes (the usual trick of joining array values into strings with $;), and doing lookups with grep. This is an easy way for me to learn whether my basic idea works. If the idea works, I'll reimplement with Trie.
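The brute force approach described above can be sketched roughly as follows. This is only an illustration under assumptions: rows are array refs with NULL represented as undef (as DBI returns them), and the helper names `null_count` and `is_partial_duplicate` are hypothetical, not part of Data::Babel or Tree::Trie:

```perl
use strict;
use warnings;

# Count the NULL (undef) columns in a row.
sub null_count {
    my ($row) = @_;
    return scalar grep { !defined $_ } @$row;
}

# True if $row is subsumed by $other: every defined value in $row matches
# $other, and $other has a value in at least one column where $row is NULL.
sub is_partial_duplicate {
    my ($row, $other) = @_;
    my $other_has_more = 0;
    for my $i (0 .. $#$row) {
        if (defined $row->[$i]) {
            return 0 unless defined($other->[$i]) && $other->[$i] eq $row->[$i];
        }
        elsif (defined $other->[$i]) {
            $other_has_more = 1;
        }
    }
    return $other_has_more;
}

my @rows = (
    ['a', 'b',   'c'],
    ['a', 'b',   undef],
    ['a', undef, 'c'],
);

# Sort by number of NULLs (fewest first), then compare each row to the ones below it.
my @sorted    = sort { null_count($a) <=> null_count($b) } @rows;
my @redundant = (0) x @sorted;
for my $i (0 .. $#sorted) {
    for my $j ($i + 1 .. $#sorted) {
        $redundant[$j] = 1 if is_partial_duplicate($sorted[$j], $sorted[$i]);
    }
}

my @kept = map { $sorted[$_] } grep { !$redundant[$_] } 0 .. $#sorted;
print "kept ", scalar(@kept), " of ", scalar(@rows), " rows\n";
```

For the three example rows, only the first survives. The nested loop is quadratic in the number of rows, which is the cost a more efficient, prefix-based approach would try to avoid.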
Thanks again for the response and thanks for sharing your excellent module.

Best,
Nat

On Jun 19, 2013, at 12:03 PM, AVIF via RT wrote:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=86242 >
>
> On Wed Jun 19 12:15:58 2013, NATG wrote:
>> Never mind. I didn't think it through before submitting the bug
>> report. I don't need this feature after all. Sorry for the
>> misdirection.
>>
>> -Nat
>
> It might make sense as a feature, though. I haven't run a test (yet), but I think passing in undef as an array value will be essentially the same as passing in an empty string. Because entries in the Trie end up stringified, there isn't a good way to distinguish between these two cases without using a special "marker" like we use for the "end of word" right now. That isn't impossible, but it is kind of ugly.
>
> What use case did you have for needing to distinguish between those two cases?