And after uploading the module I discovered that I had broken one of my
cardinal rules: always run tests *after* making changes (in this case
documentation) and *before* uploading. Serves me right that the
uploaded module won't compile. Here's a corrected version.
Miko
On Sat Dec 25 11:21:51 2010, miko@idocs.com wrote:
Show quoted text> Thank you for HTML::TreeBuilder. It's a great module! Very useful.
>
> I'd like to submit some suggested revisions to HTML::TreeBuilder. I
> needed to get these revisions into production, so instead of hacking
> HTML::TreeBuilder I created a module that subclasses
> HTML::TreeBuilder. I'll use that module if and until these ideas are
> incorporated.
>
> The subclssing module, HTML::TreeEvolve, is attached. See the
> documentation in the module.
>
> -Miko
=head1 NAME
HTML::TreeEvolve -- Additions I'd like to see in HTML::TreeBuilder
This module adds or modifies some methods to HTML::TreeBuilder and
HTML::TreeBuilder::Element that I think would be useful. I needed to get
these changes into production, so instead of hacking HTML::TreeBuilder,
resulting in code that can't be updated through the open-source process,
I created this module which subclasses HTML::TreeBuilder. I'm submitting
this module to the developed in the hope that the changes will be
incorporated into the standard distribution.
This module is meant as a temporary fix, not a permanent production module.
=head1 SYNOPSIS
# Instantiate tree object. See documentation for some of the
# additions/changes to HTML::TreeBuilder->new().
my $tree = HTML::TreeEvolve->new();
# set the class used for new element
$tree->element_class(HTML::MyClass);
# get the root element
$root = $node->root();
# Output HTML, but only the guts. Include end tags where they were
# originally sent, even for tags with optional enders.
$tree->html(guts_only=>1);
=head1 NEW METHODS
=cut
###############################################################################
# HTML::TreeEvolve::Element
#
package HTML::TreeEvolve::Element;
use strict;
use Debug::ShowStuff ':all';
use MemHandle;
use base 'HTML::Element';
#------------------------------------------------------------------------------
# $node->html
#
=head2 $node->html
Returns an the HTML code to rebuild the tree. This method is meant as a
*sort-of* replacement for $node->as_HTML(). It accomplishes the same basic
task: producing HTML. It does not have options for indenting the output
or setting which characters get changed to HTML entitity references. It
does provide the option to only output the "guts" of the tree by using the
guts_only option. It also provides a different interface for setting
options. Instead of using positional based options, it uses an options
hash (%opts).
I'd like to suggest that as_HTML be deprecated in favor of a method like
this that would provide a better platform for continued development.
Using an options hash means that it's always easy to add another option
should it be desired. A positional set of options, where every new option
requires adding to the list of arguments being sent, makes it increasingly
difficult to use as features are added.
It would not even be necessary to actually have two duplicate modules.
The code from as_HTML could be modified to use an options hash and renamed
html (or some other name, I'm not actually thrilled with "html"... perhaps
"as_HTML2"?). Then another as_HTML method could be provided that simply
accepts positional options then calls $tree->html() using optional arguments.
Easy peasy.
=cut
sub html {
my ($node, %opts) = @_;
my ($top, $mem, $optionalEndTag, $empty_element_map, $in_guts);
# determine if this is the outermost node in the recursion
$top = ! $opts{'recursed'};
# memory handle for storing up HTML
$mem = $opts{'memhandle'} || MemHandle->new('');
# hash of tags that are allowed to be without matching end tags
$optionalEndTag ||= \%HTML::Element::optionalEndTag;
# hash of elements that are always empty
$empty_element_map = \%HTML::Element::emptyElement;
# If output should only be of guts, determine if we're
# in the guts yet. We're in the guts if the caller has
# already determined that we're already in the guts, or
# if this ISN'T an implicit tag. If output should be of
# everything, then we'll just act like we're in the guts.
if ($opts{'in_guts'})
{ $in_guts = 1 }
elsif ($opts{'guts_only'})
{ $in_guts = ! $node->implicit }
else
{ $in_guts = 1 }
# add start tag
if ($in_guts)
{ print $mem $node->starttag() }
# if this isn't an empty element, recurse through children
unless ($empty_element_map->{$node->tag}) {
# loop through children
foreach my $child ($node->content_list) {
# recurse if it's an object
if (ref $child) {
$child->html(
memhandle => $mem,
recursed => 1,
guts_only => $opts{'guts_only'},
in_guts => $in_guts,
);
}
# else it's text, so just output it
else {
print $mem HTML::Entities::encode_entities($child);
}
}
}
# add end tag if this tag does the end tag thing
if ($in_guts) {
if (
(! $optionalEndTag->{$node->tag}) ||
$node->{'_explicit_end_tag'}
) {
print $mem $node->endtag();
}
}
# if top node, return Memhandle string
if ($top)
{ return $mem->mem }
}
#
# html
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# $tree->element_class
#
=head2 $tree->element_class
Sets/returns the class which will be used for new elements. This method did
not seem to be present or called in the current release of HTML::TreeBuilder,
although it is mentioned in the documentation.
The docmentation says to subclass this method but as far as I can tell no
check is ever made if that subclassed method is ever called. Feel free to
show me where I'm wrong. It's happened once or twice or 1,000,000 times
before.
Unlike in HTML::TreeBuilder's documentation, this method does not require
subclassing, although it could be if the programmer wants to. This method
allowed the programmer to set _element_class by sending an optional second
parameter which should be the name of the class for new elements. That value
is stored in _element_class, which is what HTML::TreeBuilder seems to use in
the current version.
Just in case someone wants to send an example of the class instead of the
name of the class, you can send an existing object and the ref of that object
will be used. Not sure why that would ever be necessary but it's cool. Cool
counts.
The current class is always returned.
=cut
sub element_class {
my $node = shift;
# set class if it was sent
if (@_) {
my $class = $_[0];
if (ref $class)
{ $class = ref $class }
# store class
$node->{'_element_class'} = $class;
}
# return class
return $node->{'_element_class'};
}
#
# element_class
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# $node->root
#
=head2 $node->element_class
Returns the root node. If the calling object is itself the root node then
returns itself.
=cut
sub root {
my ($node) = @_;
my @ancestors = $node->lineage();
# if any ancestors, return the first one
if (@ancestors)
{ return $ancestors[-1] }
# else return this node
return $node;
}
#
# root
#------------------------------------------------------------------------------
#
# HTML::TreeEvolve::Element
###############################################################################
###############################################################################
# HTML::TreeEvolve
#
package HTML::TreeEvolve;
use strict;
use Debug::ShowStuff ':all';
use String::Util ':all';
use Carp 'croak';
use UNIVERSAL qw[isa can];
# extend
use base 'HTML::TreeBuilder';
use base 'HTML::TreeEvolve::Element';
#------------------------------------------------------------------------------
# new
#
=head2 HTML::TreeEvolve->new()
This subclassed "new" method provides some functionality I'd like to see in
HTML::TreeBuilder->new(). See notes in code.
=cut
sub new {
my $class = shift;
my $tree = $class->SUPER::new();
my ($el_class);
# default HTML::TreeEvolve::Element class to the called
# class plus '::Element'. Not sure this is a good idea for everyone,
# but for now using it for my code.
$el_class = $tree->element_class($class . '::Element');
# KLUDGE: rebless existing elements
# When a tree is first created using HTML::TreeBuilder ()i.e. before it
# even gets to this point in the code), it already has <head> and <body>
# elements, which are blessed as HTML::Element. This loop reblesses them
# as the class returns by $tree->element_class.
foreach my $node ($tree->descendents) {
bless $node, $el_class;
}
# return
return $tree;
}
#
# new
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# end
#
=head2 $tree->end()
Overriding but calling HTML::TreeBuilder $tree->end() method. This method
determines if an explicit end tag (e.g. </p>) was sent. If it was then that
fact is stored in the tag object. When the $node->html() method is called,
tags that have explicit enders -- even tags that don't require them --
will have the end tag added to the code.
=cut
sub end {
my ($tree, $tag, @args) = @_;
my (@rv);
# get return value
@rv = $tree->SUPER::end($tag, @args);
# If the opening tag can be found, not that is has an explicit
# closing tag.
TO_CLOSE:
foreach my $node (@rv) {
if (
(isa $node, 'HTML::Element') &&
($node->tag eq $tag)
) {
$node->{'_explicit_end_tag'} = 1;
last TO_CLOSE;
}
}
# return super call's return values
return @rv;
}
#
# end
#------------------------------------------------------------------------------
#
# HTML::TreeEvolve
###############################################################################
# return true
1;
__END__
=head1 TERMS AND CONDITIONS
Copyright (c) 2010 by Miko O'Sullivan. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same terms
as Perl itself. This software comes with B<NO WARRANTY> of any kind.
=head1 AUTHOR
Miko O'Sullivan
F<miko@idocs.com>
=head1 VERSION
=over
=item Version 0.10 December 25, 2010
Initial release. Yes, this is what I did on Christas day -- while my wife
was working. Don't be sad for me, we had a great Christmas.
=back
=cut