I just checked Wikipedia: it lists chr(8211) as "en dash", so that agrees with HTML::Entities.
I can tell you that both entities I mentioned, '&#8209;' and '&ndash;', when processed by HTML::TreeBuilder and then by HTML::PrettyPrinter, get converted to three bytes (shown here in octal): for '#8209' the result is [342 200 221], and for 'ndash' it is [342 200 223].
Somehow, these represent decimal 8209 and 8211, I guess. (I don't get how, but that's beside the point.)
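For what it's worth, those three-byte sequences look like the UTF-8 encodings of the two code points. Here is a minimal sketch to check that, assuming only the core Encode module; the helper name utf8_octal is made up for illustration and is not part of HTML::Tree or HTML::Entities.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of a code point
# as a space-separated string of octal values.
sub utf8_octal {
    my ($codepoint) = @_;
    my $bytes = encode('UTF-8', chr($codepoint));
    return join ' ', map { sprintf '%o', $_ } unpack 'C*', $bytes;
}

# 8209 is the non-breaking hyphen, 8211 is the en dash.
printf "%d => [%s]\n", $_, utf8_octal($_) for 8209, 8211;
```

This should print [342 200 221] for 8209 and [342 200 223] for 8211, which matches the bytes reported above.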
The reason I got interested in &#8209; is that I have a Microsoft Word document that uses that entity for the minus in "N-1" and the hyphen in "x-ray".
Alexander Danel
# -------------------------------------
On Sat Mar 08 18:42:18 2014, alexander.danel@gmail.com wrote:
> OK, well, once again, I may need to withdraw my ticket minutes after
> creating it.
>
> I seem to be having two issues:
>
> (1) There might be a problem in HTML::Entities, rather than
> HTML::Tree. I just looked into the "Entities.pm" file, and found
> this:
>
> 'ndash;' => chr(8211),
>
> This should probably say "8209", not "8211".
>
> (2) There doesn't seem to be an easy way to tell PrettyPrint that ALL
> entities should be converted. (Any advice?)
>
> Sorry to be causing trouble, but I am trying to be helpful, not trying
> to be a pain.
>
> Alexander Danel
> # -----------------------------------
> On Sat Mar 08 18:03:43 2014, alexander.danel@gmail.com wrote:
> > The entity 'ndash' converts to three octal bytes: 342, 200, 223.
> > This
> > is also true for the entity '#8209'. When sent through
> > HTML::PrettyPrinter->format() this causes the warning "Wide character
> > in print...", and the result is incorrect.
> >
> > It seems to me these entities should be converted to the single
> > character \o{226}, which is decimal 150; which is "en dash".
> >
> > Setting "$root->no_expand_entities(1);" is not helpful; the entity
> > stays unexpanded, then PrettyPrinter does not recognize that it is an
> > entity and converts the leading '&' into "&amp;" for every entity in
> > the document. (And, I don't want to turn off conversion, because
> > there might be real ampersands.)
> >
> > My work-around will be to convert these ndash entities into normal
> > hyphen characters.
> >
> > I am working in a CygWin environment.
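
One possible reading of the "Wide character in print" warning quoted above: chr(8211) is a single character, but its code point is above 255, so print complains when the output handle has no encoding layer. The sketch below assumes only the core Encode module and says nothing about HTML::PrettyPrinter's internals; the helper name octal_bytes is hypothetical. It shows the three UTF-8 bytes next to the single Windows-1252 byte \o{226} (presumably the one expected in the original report), plus one way to avoid the warning, assuming UTF-8 output is acceptable.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: format a byte string as space-separated octal values.
sub octal_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%o', $_ } unpack 'C*', $bytes;
}

my $en_dash = chr(8211);                    # one character, code point > 255
my $utf8    = encode('UTF-8',  $en_dash);   # three bytes
my $cp1252  = encode('cp1252', $en_dash);   # one byte in Windows-1252

print 'UTF-8:  ', octal_bytes($utf8),   "\n";
print 'cp1252: ', octal_bytes($cp1252), "\n";

# Assumption: the terminal or output file expects UTF-8. With this layer
# on the handle, printing the character itself should no longer warn.
binmode STDOUT, ':encoding(UTF-8)';
print "N${en_dash}1\n";
```

The first two lines of output should read "UTF-8:  342 200 223" and "cp1252: 226". Which encoding is right depends on what the consumer of the output expects, so the binmode line is a choice, not a requirement.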