Bug #102659 for XML-Bare: ForceArray feature request

Mon Mar 09 15:20:40 2015 bpkroth [...] gmail.com - Ticket created

Subject:	ForceArray feature request
Date:	Mon, 9 Mar 2015 14:20:28 -0500
To:	bug-XML-Bare [...] rt.cpan.org
From:	Brian Kroth <bpkroth [...] gmail.com>

Hi, I was looking at using XML::Bare as a replacement for XML::Simple, but I found that it's currently lacking the ability to return a fully arrayified structure. Instead, I have to modify my code to call forcearray() on everything just in case. It'd be cool if this was done automatically based on something like a ForceArray() option to either XMLin() or simple(). Thanks, Brian


Mon Mar 09 16:57:30 2015 cpan [...] codechild.com - Correspondence added

On Mon Mar 09 15:20:40 2015, bpkroth@gmail.com wrote:

> Hi, I was looking at using XML::Bare as a replacement for XML::Simple,
> but I found that it's currently lacking the ability to return a fully
> arrayified structure.

Are you suggesting to always return an array for all nodes, or to have a ForceArray option specifying the names of nodes to force? XML structures typically only have specific portions that need to always be arrays. Most parts do not need to be forced to be an array. That is why the set of names for the ForceArray option in XML::Simple exists. I don't like that option, because it ignores where those nodes are found. Realistically, you should know what paths in the XML are arrays, and be able to easily turn them into arrays before handling them. That is, it is an elementary task to create a function that accepts a bunch of XML paths, and then traverses the XML running forcearray on each.

> Instead, I have to modify my code to call
> forcearray() on everything just in case.

Do you really mean "everything"?

    <xml>
      <person>
        <age>24</age>
      </person>
    </xml>

You want age to be turned into an array also?

> It'd be cool if this was done
> automatically based on something like a ForceArray() option to either
> XMLin() or simple().
>
> Thanks,
> Brian

If you really want to have the exact options that XML::Simple provides, it is possible to use the parser in XML::Bare together with XML::Simple itself by way of XML::Bare::SAX::Parser. Note that I haven't updated it in a while and it is some number of versions behind, but it is a potential option for you.

Another thing to consider is that the parser itself is in the process of being rewritten. The new core of the parser can be found at https://github.com/nanoscopic/xml-bare. My intention in the long term is to integrate this parser into a new Perl module, and use tied arrays/hashes, so that you can transparently treat nodes like either arrays or hashes.

I have also considered making it possible for nodes to automatically be made into arrays based on the "xml bare schema" (XBS) that you are using. That is, if the schema denotes that a node can occur more than once, that node will automatically be made into an array. This option would be the cleanest and easiest way to do what you want. You would of course have to write a trivial XBS that matches your XML for this to work.

For now, that is going to be the solution. I will make a new version that does so.

Basically this will happen:

XML:
    <xml><person><age>24</age></person></xml>

XBS:
    <xml><person*/></xml>
or
    <xml><person+/></xml>

Using XBS like that will let the parser know that "person" nodes occur either 0 or more times (*), or at least once (+), and should automatically be made into arrays.
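To make the schema idea concrete, here is a minimal pure-Perl sketch of schema-driven arrayification applied to an already parsed tree. It is not XML::Bare's XBS implementation or API: the function name `apply_schema` and the use of a nested hash with `*` / `+` suffixes on key names as a stand-in for an XBS document are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XBS: a nested hash whose keys may end in
# '*' (zero or more) or '+' (one or more) to mark repeatable nodes.
# This is NOT XML::Bare's schema format, only an illustration of the idea.
sub apply_schema {
    my ( $node, $schema ) = @_;
    return $node unless ref($node) eq 'HASH' && ref($schema) eq 'HASH';

    for my $spec ( keys %$schema ) {
        # Strip the repeat marker to get the plain node name.
        ( my $name = $spec ) =~ s/[*+]\z//;
        my $repeatable = $spec ne $name;
        next unless exists $node->{$name};

        # Wrap a lone node in an array if the schema says it may repeat.
        if ( $repeatable && ref( $node->{$name} ) ne 'ARRAY' ) {
            $node->{$name} = [ $node->{$name} ];
        }

        # Descend into each child with the matching sub-schema.
        my $val      = $node->{$name};
        my @children = ref($val) eq 'ARRAY' ? @$val : ($val);
        apply_schema( $_, $schema->{$spec} ) for @children;
    }
    return $node;
}

my $tree   = { xml => { person => { age => '24' } } };
my $schema = { xml => { 'person*' => {} } };
apply_schema( $tree, $schema );
# $tree->{xml}{person} is now an array reference holding one hash;
# "age" is left alone because the schema does not mark it as repeatable.
```

Because only nodes marked in the schema are wrapped, the position of a node matters, which addresses the objection to name-only matching raised earlier in this reply.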

Mon Mar 09 16:57:30 2015 The RT System itself - Status changed from 'new' to 'open'

Mon Mar 09 16:57:37 2015 cpan [...] codechild.com - Taken

Tue Mar 10 00:20:39 2015 bpkroth [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #102659] ForceArray feature request
Date:	Mon, 9 Mar 2015 23:20:23 -0500
To:	David Helkowski via RT <bug-XML-Bare [...] rt.cpan.org>
From:	Brian Kroth <bpkroth [...] gmail.com>

David Helkowski via RT <bug-XML-Bare@rt.cpan.org> 2015-03-09 16:57:

><URL: https://rt.cpan.org/Ticket/Display.html?id=102659 >
>
>On Mon Mar 09 15:20:40 2015, bpkroth@gmail.com wrote:

>> Hi, I was looking at using XML::Bare as a replacement for XML::Simple,
>> but I found that it's currently lacking the ability to return a fully
>> arrayified structure.

>Are you suggesting to always return an array for all nodes, or to have a ForceArray option specifying the names of nodes to force?

Both, I guess. I think XML::Simple lets ForceArray take either 1 or an array of element names. I've used both in the past. Sometimes I know up front which nodes will have multiples in them, but sometimes I don't and I just want to handle all of them in a generic loop.

>XML structures typically only have specific portions that need to always be arrays. Most parts do not need to be forced to be an array. That is why the set of names for the ForceArray option in XML::Simple exists.

True, but there are cases where you may get an array in one case and just a hash in another, depending upon the XML input you're given. For instance (a really simple and small example from a much larger thing I was working on earlier today):

    <addresses>
      <address>
        <mac>00:de:ad:be:ef:00</mac>
      </address>
    </addresses>

results in something like

    $tree = {
      addresses => {
        address => {
          mac => {
            content => '00:de:ad:be:ef:00',
          },
        },
      },
    };

whereas this:

    <addresses>
      <address>
        <mac>00:de:ad:be:ef:00</mac>
      </address>
      <address>
        <mac>00:00:00:c0:ff:ee</mac>
      </address>
    </addresses>

results in something like:

    $tree = {
      addresses => {
        address => [
          {
            mac => {
              content => '00:de:ad:be:ef:00',
            },
          },
          {
            mac => {
              content => '00:00:00:c0:ff:ee',
            },
          },
        ],
      },
    };

Now, when I'm iterating through this, I need to test at each level (or just certain levels if I have enough foreknowledge) whether the result is a hash reference or an array reference. It'd be much more convenient to just always assume that you're given an array reference and iterate through each element, even if the array holds only a single element.
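Until the parser can do this itself, the per-level check can be hidden behind a small normalizing helper. The following is a minimal sketch in plain Perl, assuming tree shapes like the two shown above; `as_list` and `macs` are hypothetical names and are not part of XML::Bare.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Bare): return a node's entries as a
# list, whether the parser produced a single hash or an array of hashes.
sub as_list {
    my ($node) = @_;
    return ()     unless defined $node;
    return @$node if ref($node) eq 'ARRAY';
    return ($node);
}

# The two tree shapes from the example: one <address> versus two.
my $single = {
    addresses => {
        address => { mac => { content => '00:de:ad:be:ef:00' } },
    },
};
my $multi = {
    addresses => {
        address => [
            { mac => { content => '00:de:ad:be:ef:00' } },
            { mac => { content => '00:00:00:c0:ff:ee' } },
        ],
    },
};

# The same loop now works for both shapes.
sub macs {
    my ($tree) = @_;
    return map { $_->{mac}{content} } as_list( $tree->{addresses}{address} );
}

print join( ', ', macs($_) ), "\n" for $single, $multi;
```

This keeps the parsed tree untouched and only normalizes at the point of iteration, at the cost of having to remember to call the helper at every level that might repeat.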

>I don't like that option, because it ignores where those nodes are found. Realistically, you should know what paths in the XML are arrays, and be able to easily turn them into arrays before handling them.

Perhaps if you know all of the XML you're going to be handling up front, but that may not always be the case.

>That is, it is an elementary task to create a function that accepts a bunch of XML paths, and then traverses the XML running forcearray on each.

>> Instead, I have to modify my code to call
>> forcearray() on everything just in case.

>Do you really mean "everything"?

Well, in the absence of other knowledge from something like a schema or DTD (which I don't always have), yes.

><xml>
>  <person>
>    <age>24</age>
>  </person>
></xml>
>
>You want age to be turned into an array also?

>> It'd be cool if this was done
>> automatically based on something like a ForceArray() option to either
>> XMLin() or simple().
>>
>> Thanks,
>> Brian

>If you really want to have the exact options that XML::Simple provides, it is possible to use the parser in XML::Bare together with XML::Simple itself by way of using XML::Bare::SAX::Parser. Note I haven't updated it in a while and it is some number of versions behind; but that is a potential option for you.
>
>Another thing to consider is that the parser itself is in the process of being rewritten. The new core of the parser can be found at https://github.com/nanoscopic/xml-bare. My intention in the long term is to integrate this parser into a new perl module, and use tied arrays/hashes, so that you can transparently treat nodes like either arrays or hashes.

Neat.

>I have also considered making it possible for nodes to automatically be made into arrays based on the "xml bare schema" that you are using. That is, if you denote that a node can have more than one of it, that it will automatically be made into an array. This option would be the cleanest and easiest way to do what you want. You would of course have to write a trivial XBS that matches your XML for this to work.

Ah, like I described above. Cool.

>For now, that is going to be the solution. I will make a new version that does so.
>
>Basically this will happen:
>
>XML:
><xml><person><age>24</age></person></xml>
>
>XBS:
><xml><person*/></xml>
>or
><xml><person+/></xml>
>
>Using XBS like that will let the parser know that "person" nodes either have 0 or more, or at least 1, and should automatically be made into arrays.

Yeah, the trouble is, if you're parsing output from another source that you don't have full knowledge or control over, you're left to sort of guess what sort of garbage XML they might spit out at you. Thanks for your efforts. Cheers, Brian


Tue Mar 10 11:18:34 2015 cpan [...] codechild.com - Correspondence added

See attached. I created a function that does as you want. Enjoy.

Example output from the examples within, for people who would like to see it without having to run it:

    $VAR1 = {
      'xml' => {
        'person' => [
          {
            'age' => '25'
          }
        ]
      }
    };
    $VAR1 = {
      'xml' => {
        'do_not_touch' => {
          'person' => {
            'name' => 'Bob'
          }
        },
        'person' => [
          {
            'age' => '25'
          }
        ]
      }
    };
    $VAR1 = {
      'a' => [
        {
          'b' => [
            '2'
          ],
          'c' => [
            '3'
          ]
        }
      ]
    };

Subject:

fa_with_examples.pl

use strict;
use warnings;
use XML::Bare qw/forcearray/;
use Data::Dumper;

example_1();
example_2();
example_3();

# Force every node named "person" into an array, wherever it appears.
sub example_1 {
    my ( $ob, $xml ) = XML::Bare->simple( text => "
        <xml>
            <person>
                <age>25</age>
            </person>
        </xml>
    " );
    XMLSimpleForceArray( $xml, nodenames => [ qw/person/ ] );
    print Dumper( $xml );
}

# Force only the node at path xml.person; the nested person is left alone.
sub example_2 {
    my ( $ob, $xml ) = XML::Bare->simple( text => "
        <xml>
            <person>
                <age>25</age>
            </person>
            <do_not_touch>
                <person>
                    <name>Bob</name>
                </person>
            </do_not_touch>
        </xml>
    " );
    XMLSimpleForceArray( $xml, dotpaths => [ qw/xml.person/ ] );
    print Dumper( $xml );
}

# Force every node into an array.
sub example_3 {
    my ( $ob, $xml ) = XML::Bare->simple( text => "
        <a c=3>
            <b>2</b>
        </a>
    " );
    XMLSimpleForceArray( $xml, forceall => 1 );
    print Dumper( $xml );
}

# Options: nodenames  => [ names ]          force nodes by name
#          dotpaths   => [ 'a.b.c' ]        force nodes by dotted path
#          arraypaths => [ [ 'a', 'b' ] ]   force nodes by path given as arrays
#          forceall   => 1                  force every node
sub XMLSimpleForceArray {
    my $xml = shift;
    my $ops = { @_ };

    if( $ops->{'nodenames'} ) {
        my $nodenames = $ops->{'nodenames'};
        my $namehash = {};
        for my $nodename ( @$nodenames ) {
            $namehash->{ $nodename } = 1;
        }
        XSFA_recurse( $xml, $namehash );
    }

    # Dotted paths are converted to array paths and handled below.
    if( $ops->{'dotpaths'} ) {
        my $arrpaths = [];
        my $dotpaths = $ops->{'dotpaths'};
        for my $dotpath ( @$dotpaths ) {
            my @arr = split( '\.', $dotpath );
            push( @$arrpaths, \@arr );
        }
        $ops->{'arraypaths'} = $arrpaths;
        delete $ops->{'dotpaths'};
    }

    if( $ops->{'arraypaths'} ) {
        my $nodepaths = $ops->{'arraypaths'};
        my $pathhash = {};
        for my $nodepath ( @$nodepaths ) {
            my $pathtext = join( '--', @$nodepath );
            $pathhash->{ $pathtext } = 1;
        }
        XSFA_path_recurse( $xml, '', $pathhash );
    }

    if( $ops->{'forceall'} ) {
        XSFA_all_recurse( $xml );
    }
}

sub XSFA_all_recurse {
    my ( $xml ) = @_;
    my $ref = ref( $xml );
    return if( !$ref );
    if( $ref eq 'ARRAY' ) {
        for my $item ( @$xml ) {
            # Recurse into each item (the original passed $xml here, which would never terminate).
            XSFA_all_recurse( $item );
        }
    }
    elsif( $ref eq 'HASH' ) {
        for my $key ( keys %$xml ) {
            my $val = $xml->{ $key };
            $xml->{ $key } = [ $val ];
            XSFA_all_recurse( $val );
        }
    }
    else {
        die "Error";
    }
}

sub XSFA_path_recurse {
    my ( $xml, $path, $pathhash ) = @_;
    my $ref = ref( $xml );
    return if( !$ref );
    if( $ref eq 'ARRAY' ) {
        for my $item ( @$xml ) {
            XSFA_path_recurse( $item, $path, $pathhash );
        }
    }
    elsif( $ref eq 'HASH' ) {
        for my $key ( keys %$xml ) {
            my $val = $xml->{ $key };
            # Paths are tracked as "--"-joined strings, e.g. "xml--person".
            my $p1 = $path ? "$path--$key" : $key;
            if( $pathhash->{ $p1 } ) {
                $xml->{ $key } = [ $val ];
            }
            XSFA_path_recurse( $val, $p1, $pathhash );
        }
    }
    else {
        die "Error";
    }
}

sub XSFA_recurse {
    my ( $xml, $namehash ) = @_;
    my $ref = ref( $xml );
    return if( !$ref );
    if( $ref eq 'ARRAY' ) {
        for my $item ( @$xml ) {
            XSFA_recurse( $item, $namehash );
        }
    }
    elsif( $ref eq 'HASH' ) {
        for my $key ( keys %$xml ) {
            my $val = $xml->{ $key };
            if( $namehash->{ $key } ) {
                $xml->{ $key } = [ $val ];
            }
            XSFA_recurse( $val, $namehash );
        }
    }
    else {
        die "Error";
    }
}