Bug #33348 for XML-Bare: Need a "compact" option

Sun Feb 17 14:18:54 2008 darnold [...] presicient.com - Ticket created

Subject:

Need a "compact" option

I'm attempting to use XML::Bare to parse XML before conversion to JSON. While I appreciate the performance of XML::Bare, the resulting output is overly dense/verbose, causing the resulting JSON to double in size compared to, e.g., XML::Simple. I realize that being able to reconstruct the XML with attributes vs. values vs. comments is important, but for simple apps, it would be useful to provide an option to remove the "value" key from all the leaf nodes, simply pull the value up to the parent key, and discard the comments. Attributes can be treated the same as values.

Yes, it will break some XML docs and lose the ability to reconstruct the original XML. But for many apps, it will produce a much more manageable Perl structure.
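The requested "compact" transformation can be sketched as a recursive walk over the parse tree. This is a Python illustration, not XML::Bare code: the tree shape (text under a "value" key, comments under "comment", parser bookkeeping keys prefixed with "_", attributes stored like child elements) is an assumption, and the function name compact is hypothetical.

```python
from typing import Any


def compact(node: Any) -> Any:
    """Collapse a verbose parse tree into plain nested values.

    Assumed (hypothetical) input shape:
      - an element is a dict; its text lives under the key "value"
      - comments live under the key "comment" and are discarded
      - keys starting with "_" are parser bookkeeping and are discarded
      - repeated elements are lists of element dicts
      - attributes look like child elements, so they are treated as values
    """
    if isinstance(node, list):
        return [compact(item) for item in node]
    if not isinstance(node, dict):
        return node  # already a plain scalar

    children = {
        key: compact(child)
        for key, child in node.items()
        if key not in ("value", "comment") and not key.startswith("_")
    }
    if not children:
        # Leaf node: pull its text value up to replace the node itself.
        return node.get("value", "")
    # Element with children: any text alongside them is dropped (lossy by design).
    return children


tree = {"xml": {"name": {"value": "Bob"}, "comment": " note ", "id": {"value": "7", "_att": 1}}}
print(compact(tree))
```

As the request itself says, the result is lossy: comments, mixed content, and the attribute/element distinction cannot be recovered from it.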

Tue Mar 18 12:01:07 2008 cpan [...] codechild.com - Correspondence added

On Sun Feb 17 14:18:54 2008, DARNOLD wrote:

> I'm attempting to use XML::Bare to parse XML before conversion to JSON.
> While I appreciate the performance of XML::Bare, the resulting output is
> overly dense/verbose, causing the resulting JSON to double in size
> compared to e.g., XML::Simple. I realize that being able to reconstruct
> the XML with attributes vs. values vs. comments is important, but for
> simple apps, it would be useful to provide an option to remove the
> "value" key from all the leaf nodes, and simply pull up the value to the
> parent key, and discard the comments. Attributes can be treated the same
> as values.
>
> Yes, it will break some XML docs, and lose the ability to reconstruct
> the original XML. But for many apps, it will produce a much more
> manageable Perl structure.

I actually saw some posts of yours on the net recently mentioning this...

I think what you are asking for is a version of the module without the 'value' key, to allow immediate traversal, assuming you know what you are doing. I have implemented that to some degree via a 'simplify' function in the latest version, but I think you would be better off with a version of my module targeted at exactly that.

If I have time and motivation during the evenings or weekends, I will go ahead and create an alternate module for you.

I assume from what you are saying that you will not need attributes, mixed XML, or comment nodes; you just want direct values only.

Any recommendations for what I should call it? I would put it under XML::Bare, e.g., XML::Bare::Less, XML::Bare::Direct, or XML::Bare::Basic.

Tue Mar 18 12:01:10 2008 The RT System itself - Status changed from 'new' to 'open'

Tue Mar 18 12:35:50 2008 darnold [...] presicient.com - Correspondence added

On Tue Mar 18 12:01:07 2008, CODECHILD wrote:

> Actually recently saw some posts of your on the net mentioning this...
>
> I think what you are asking for is for a version of the module without
> the 'value', to just allow immediate traversal assuming you know what
> you are doing. I have implemented that to some degree via a 'simplify'
> function in the latest version, but I think you would be better off with
> a version of my module targeted to exactly that.
>
> If I have time and motivation during the evenings or weekends I will go
> ahead and create an alternate module for you.
>
> I assume by what you are saying you will not need attributes, mixed xml,
> or comment nodes; you just want direct values only.
>
> Any recommendation on an idea of what I should call it? I would put it
> under XML::Bare. Such as XML::Bare::Less, or XML::Bare::Direct, or
> XML::Bare::Basic.

Actually, I emailed a tarball to you (to cpan@codechild.com) with an implementation (plus JSON support, and some fixes for possible XS memory leaks). I'm attaching it to this post, so hopefully you'll get it. (Maybe that email address isn't valid?)

No idea what to call it; maybe XML::Bare::ToJSON if you decide to include the JSON stuff.

Anyway, even with my hacks, it's very fast (I mean *really* fast: when testing my hacks, I thought I had a bug that bypassed parsing, cuz it couldn't possibly be that fast 8^), and it should use less memory. As you may have seen at http://www.perlmonks.com/?node_id=668445, it's now my favorite XML processor!

Attachment: XML-Bare-0.99.tar.gz (application/x-tar, 29.3k)


Tue Mar 18 13:16:26 2008 cpan [...] codechild.com - Correspondence added

On Tue Mar 18 12:35:50 2008, DARNOLD wrote:

> On Tue Mar 18 12:01:07 2008, CODECHILD wrote:
> > Actually recently saw some posts of your on the net mentioning this...
> >
> > I think what you are asking for is for a version of the module without
> > the 'value', to just allow immediate traversal assuming you know what
> > you are doing. I have implemented that to some degree via a 'simplify'
> > function in the latest version, but I think you would be better off with
> > a version of my module targeted to exactly that.
> >
> > If I have time and motivation during the evenings or weekends I will go
> > ahead and create an alternate module for you.
> >
> > I assume by what you are saying you will not need attributes, mixed xml,
> > or comment nodes; you just want direct values only.
> >
> > Any recommendation on an idea of what I should call it? I would put it
> > under XML::Bare. Such as XML::Bare::Less, or XML::Bare::Direct, or
> > XML::Bare::Basic.
>
> Actually, I emailed a tarball to you (to cpan@codechild.com) with an
> implementation (plus JSON support, and some fixes for possible XS memory
> leaks). I'm attaching it to this post, so hopefully you'll get it.
> (Maybe that email address isn't valid ?)
>
> No idea what to call it, maybe XML::Bare::ToJSON if you decide to
> include the JSON stuff.
>
> Anyway, even with my hacks, its very fast (I mean *really* fast - when
> testing my hacks, I thought I had a bug that bypassed parsing cuz it
> couldn't possibly be that fast 8^), and should use less memory.
> As you may have seen at http://www.perlmonks.com/?node_id=668445,
> its now my favorite XML processor!

The email is valid; it just gets hit with so much spam that I rarely, if ever, go through it all. If something obvious catches my eye I notice it; otherwise it is pretty much a junk bin. I will go back and look for it.

It is, as you say, stupendously fast. Even so, for your version I would like to alter the parser itself so that it does not even bother storing unneeded data (so that the code is a tad cleaner and people can tinker with it more easily). I will check out the changes you made first.

As far as memory goes, I wish there were an easy way to test how much memory is used, but I have yet to figure out a reliable way to compare it against other parsers. Memory usage should be as low as possible, because I reference the XML file in memory directly and do not duplicate strings.

Be careful with the functions that 'avoid leaks', because the way I have things written, I try not to allocate any memory via the XS code; I just make blind pointers that I destroy on my own.

Internally I am up to version 0.29, by the way, so there are some other things I may integrate with your changes to bring it all up to date.
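The idea of referencing the XML text in memory rather than duplicating strings can be illustrated with offsets into one shared buffer. This is a hypothetical Python sketch, not the XML::Bare C code: TextRef and find_text are invented names, and find_text only handles a simple, un-nested <tag>text</tag> with no attributes.

```python
from dataclasses import dataclass


@dataclass
class TextRef:
    """Hypothetical zero-copy node text: offsets into a shared document buffer."""
    buf: memoryview
    start: int
    end: int

    def view(self) -> memoryview:
        # Slicing a memoryview does not copy the underlying bytes.
        return self.buf[self.start:self.end]

    def text(self) -> str:
        # A str is materialized (copied) only when the caller actually asks for it.
        return bytes(self.view()).decode("utf-8")


def find_text(doc: bytes, tag: bytes) -> TextRef:
    """Locate the text of the first <tag>...</tag> (no nesting, no attributes)."""
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    start = doc.index(open_tag) + len(open_tag)
    end = doc.index(close_tag, start)
    return TextRef(memoryview(doc), start, end)


doc = b"<xml><name>Bob</name></xml>"
ref = find_text(doc, b"name")
print(ref.start, ref.end, ref.text())
```

The trade-off is that the original document buffer must stay alive for as long as any node refers into it.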

Tue Mar 18 20:18:58 2008 darnold [...] presicient.com - Correspondence added

Subject:	Re: [rt.cpan.org #33348] Need a "compact" option
Date:	Tue, 18 Mar 2008 17:18:29 -0700
To:	bug-XML-Bare [...] rt.cpan.org
From:	Dean Arnold <darnold [...] presicient.com>

David Helkowski via RT wrote:

> <URL: http://rt.cpan.org/Ticket/Display.html?id=33348 >
>
> On Tue Mar 18 12:35:50 2008, DARNOLD wrote:
>> On Tue Mar 18 12:01:07 2008, CODECHILD wrote:
>>>
>>> Any recommendation on an idea of what I should call it? I would put it
>>> under XML::Bare. Such as XML::Bare::Less, or XML::Bare::Direct, or
>>> XML::Bare::Basic.

How about XML::Bare::Naked?

(Sorry, I couldn't resist...)

- Dean

Wed Mar 19 08:56:26 2008 cpan [...] codechild.com - Correspondence added

On Tue Mar 18 20:18:58 2008, DARNOLD wrote:

> David Helkowski via RT wrote:
> > <URL: http://rt.cpan.org/Ticket/Display.html?id=33348 >
> >
> > On Tue Mar 18 12:35:50 2008, DARNOLD wrote:
> >> On Tue Mar 18 12:01:07 2008, CODECHILD wrote:
> >>>
> >>> Any recommendation on an idea of what I should call it? I would put it
> >>> under XML::Bare. Such as XML::Bare::Less, or XML::Bare::Direct, or
> >>> XML::Bare::Basic.
>
> How about XML::Bare::Naked ?
>
> (Sorry, I couldn't resist...)
>
> - Dean

I really like XML::Bare::Stripped... but it has the same problem.

Wed Jul 02 14:38:36 2008 cpan [...] codechild.com - Correspondence added

Version 0.30 now has a simple() function that can be used to generate a structure similar to what you are describing. Note that it is an alternate parser in C, not a hack like 0.271. If you are using 0.271, please switch to 0.30 and the way it works, because 0.271 conflicts with their function naming: I use forcearray for a different purpose than 0.271 does. (I was using that name before 0.271 ever existed...)

On Sun Feb 17 14:18:54 2008, DARNOLD wrote:

> I'm attempting to use XML::Bare to parse XML before conversion to JSON.
> While I appreciate the performance of XML::Bare, the resulting output is
> overly dense/verbose, causing the resulting JSON to double in size
> compared to e.g., XML::Simple. I realize that being able to reconstruct
> the XML with attributes vs. values vs. comments is important, but for
> simple apps, it would be useful to provide an option to remove the
> "value" key from all the leaf nodes, and simply pull up the value to the
> parent key, and discard the comments. Attributes can be treated the same
> as values.
>
> Yes, it will break some XML docs, and lose the ability to reconstruct
> the original XML. But for many apps, it will produce a much more
> manageable Perl structure.

Wed Jul 02 14:38:38 2008 cpan [...] codechild.com - Status changed from 'open' to 'resolved'

Thu Jul 03 09:05:28 2008 cpan [...] codechild.com - Fixed in 0.30 added