Subject: Digest made by Digest::Perl::MD4 != digest made by Digest::MD4 (hint: C version seems correct)
I just ran the example program from the documentation on a file and was very surprised that the digest differed from the one generated by the MD4 implementation in librsync. librsync's result, however, agrees with Digest::MD4.
So I assume this module (Digest::Perl::MD4) is wrong. The file tested is attached.
librsync and Digest::MD4 give me "1e2a2f3abdab44a6e917c0e3ccf2ad13"
Digest::Perl::MD4 gives me "42858a2e3414fce8397cb94a9811b5eb"
Sorry for not providing more info (I will gladly do so if asked), but I'm not really using this module (except to test some librsync-related stuff).
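For anyone who wants to reproduce the comparison, a minimal sketch follows. It is not the example program from the documentation. It assumes that whichever of the two modules is installed provides an md4_hex function (it is looked up with can, so a missing module is simply skipped), and compare_digests, slurp_raw and available_md4 are names made up for illustration. The file is read through the :raw layer so that every implementation is handed exactly the same bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group implementation names by the hex digest each one returns for $bytes.
sub compare_digests {
    my ($bytes, %impl) = @_;
    my %by_digest;
    for my $name (sort keys %impl) {
        push @{ $by_digest{ $impl{$name}->($bytes) } }, $name;
    }
    return \%by_digest;
}

# Read a file as raw bytes, with no encoding layers applied.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Collect whichever implementations are installed.
# Assumption: each module defines a function named md4_hex.
sub available_md4 {
    my %impl;
    for my $module (qw(Digest::MD4 Digest::Perl::MD4)) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        next unless eval { require $file; 1 };
        my $code = $module->can('md4_hex') or next;
        $impl{$module} = $code;
    }
    return %impl;
}

if (@ARGV) {
    my $result = compare_digests(slurp_raw($ARGV[0]), available_md4());
    print "$_  ", join(', ', @{ $result->{$_} }), "\n" for sort keys %$result;
}
```

Run it with the attached file as the only argument. Implementations that agree are listed on the same line, so a mismatch shows up as more than one line of output.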
d.
#!/usr/local/bin/perl
use utf8;
print "foo";
#Arabic renderer in four lines of Perl
#
# * To: Unicode List <unicode@unicode.org>
# * Subject: Arabic renderer in four lines of Perl
# * From: Roman Czyborra <czyborra@cs.tu-berlin.de>
# * Date: Thu, 18 Jun 1998 12:12:20 +0000 (UTC)
# * Accept-Charset: *
# * Accept-Language: de,en,nl,ru,pl
# * cc: Larry Wall <lwall@perl.org>, Kaleb Keithley <kaleb@opengroup.org>, Gaspar Sinai <gsinai@gol.com>, recode-forum@iro.umontreal.ca, emacs-unicode@gnu.org, a2ps@inf.enst.fr
# * Distribution: world
# * Followup-To: poster
# * Link: <http://czyborra.com/>
# * Newsgroups: comp.software.arabic, comp.lang.perl.misc, netscape.public.mozilla.i18n
# * Organization: =?UTF-8?Q?Technische_Universit=C3=A4t_Berlin?=
# * PGP-Fingerprint: 2708E38751D3FB90456AC169A49BE6E6 (1024/87329995)
# * Resent-Date: Thu, 18 Jun 1998 19:36:13 +0200 (MET DST)
# * Resent-From: a2ps@email.enst.fr
# * Resent-Message-ID: <"i8ZRJ.A.aWC.MAVi1"@ulysse>
# * Resent-Sender: a2ps-request@email.enst.fr
# * User-Agent: Pine/3.96 (private offline X notebook; Linux 2.0.32 i586)
# * Xref: ubu.enst.fr mail.list.a2ps:146
# arabjoin - a simple filter to render Arabic text
# © 1998-06-18 roman@czyborra.com
# Freeware license at http://czyborra.com/
# Latest version at http://czyborra.com/unicode/
# PostScript printout at http://czyborra.com/unicode/arabjoin.ps.gz
# This filter takes Arabic text (encoded in UTF-8 using the Unicode
# characters from the U+0600 Arabic block in logical order) as input
# and performs Arabic glyph joining on it and outputs a UTF-8 octet
# stream that is no longer logically arranged but in a visual order
# which gives readable results when formatted with a simple Unicode
# renderer like Yudit that does not handle Arabic differently yet
# but simply outputs all glyphs in left-to-right order.
# This little script also demonstrates that Arabic rendering is not
# that complicated after all (it makes you wonder why some software
# companies are still asking hundreds of dollars from poor students
# who just want to print their Arabic texts) and that even Perl 4 can
# handle Unicode text in UTF-8 without any nifty new add-ons.
# Usage examples:
# echo "أهلاً بالعالم!" | arabjoin
# prints !ﻢﻟﺎﻌﻟﺎﺑ ًﻼﻫﺃ
# which is the Arabic version of "Hello world!"
# | recode ISO-8859-6..UTF-8 | arabjoin | uniprint -f cyberbit.ttf
# prints an Arabic mail of charset=iso-8859-6-i on your printer
# | arabjoin | xviewer yudit
# delegates an Arabic UTF-8 message to a better viewer
# ftp://sunsite.unc.edu/pub/Linux/apps/editors/X/ has uniprint in yudit-1.0
# ftp://ftp.iro.umontreal.ca/pub/contrib/pinard/pretest/ has recode-3.4g
# http://czyborra.com/unicode/ has arabjoin
# http://czyborra.com/unix/ has xviewer
# http://www.bitstream.com/cyberbit.htm or
# ftp://ccic.ifcss.org/pub/software/fonts/unicode/ms-win/ or
# ftp://ftp.irdu.nus.sg/pub/language/bitstream/ has cyberbit.ttf
# This is how we do it: First we learn the presentation forms of each
# Arabic letter from the end of this script:
while(<DATA>)
{
($char, $_) = /^(\S+)\s+(\S+)/;
($isolated{$char},$final{$char},$medial{$char},$initial{$char}) =
/([\xC0-\xFF][\x80-\xBF]+)/g;
}
# Then learn the (incomplete set of) transparent characters:
foreach $char (split (" ", "
ààààààð
ààààààààá â ã ä ç è ê ë ì ÃÂ"))
{
$transparent{$char}=1;
}
# Finally we can process our text:
while (<>)
{
s/\n$//; # chop off the end of the line so it won't jump upfront
@uchar = # UTF-8 character chunks
/([\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+)/g;
# We walk through the line of text and do contextual analysis:
for ($i = $[; $i <= $#uchar; $i = $j)
{
for ($b=$uchar[$j=$i]; $transparent{$c=$uchar[++$j]};){};
# The following assignment is the heart of the algorithm.
# It reduces the Arabic joining algorithm described on
# pages 6-24 to 6-26 of the Arabic character block description
# in the Unicode 2.0 Standard to four lines of Perl:
$uchar[$i] = $a && $final{$c} && $medial{$b}
|| $final{$c} && $initial{$b}
|| $a && $final{$b}
|| $isolated{$b}
|| $b;
$a = $initial{$b} && $final{$c};
}
# Until the Unicode Consortium publishes its Unicode Technical
# Report #9 (Bidirectional Algorithm Reference Implementation)
# at http://www.unicode.org/unicode/reports/techreports.html
# let us oversimplify things a bit and reverse everything:
$_= join ('', reverse @uchar);
# The following 8 obligatory LAM+ALEF ligatures are encoded in the
# U+FE70 Arabic Presentation Forms-B block in Unicode's
# compatibility zone:
s/ﺂﻟ/ﻵ/g;
s/ﺂﻠ/ﻶ/g;
s/ﺄﻟ/ﻷ/g;
s/ﺄﻠ/ﻸ/g;
s/ﺈﻟ/ﻹ/g;
s/ﺈﻠ/ﻺ/g;
s/ﺎﻟ/ﻻ/g;
s/ﺎﻠ/ﻼ/g;
# Bitstream's Cyberbit font offers 57 of the other 466 optional
# ligatures in the U+FB50 Arabic Presentation Forms-A block:
s/ﻢïºÂ/ï°Â/g;
s/ﻲï»Â/ï°²/g;
s/ïºÂï»Â/ï°¿/g;
s/ﺢï»Â/ï±Â/g;
s/ﺦï»Â/ï±Â/g;
s/ﻢï»Â/ï±Â/g;
s/ï»°ï»Â/ï±Â/g;
s/ﻲï»Â/ï±Â/g;
s/ﻢﻧ/ï±Â/g;
s/ÃÂÃÂ/ï±Â/g;
s/ÃÂÃÂ/ï±Â/g;
s/ÃÂÃÂ/ï± /g;
s/ÃÂÃÂ/ﱡ/g;
s/ÃÂÃÂ/ï±¢/g;
s/ﺮïºÂ/ﱪ/g;
s/ﻦïºÂ/ï±Â/g;
s/ﻲïºÂ/ﱯ/g;
s/ﺮïºÂ/ï±°/g;
s/ﻦïºÂ/ï±³/g;
s/ﻲïºÂ/ï±µ/g;
s/ﻲﻨ/ï²Â/g;
s/ﺮﻴ/ï²Â/g;
s/ﻦﻴ/ï²Â/g;
s/ﺠïºÂ/ï²Â/g;
s/ﺤïºÂ/ï²Â/g;
s/ﺨïºÂ/ï²Â/g;
s/ﻤïºÂ/ï²Â/g;
s/ﺠïºÂ/ﲡ/g;
s/ﺤïºÂ/ï²¢/g;
s/ﺨïºÂ/ï²£/g;
s/ﻤïºÂ/ﲤ/g;
s/ﻤïºÂ/ﲦ/g;
s/ﻤïºÂ/ﲨ/g;
s/ﻤﺣ/ﲪ/g;
s/ﻤﺧ/ﲬ/g;
s/ﻤﺳ/ﲰ/g;
s/ﺠï»Â/ï³Â/g;
s/ﺤï»Â/ï³Â/g;
s/ﺨï»Â/ï³Â/g;
s/ﻤï»Â/ï³Â/g;
s/ﻬï»Â/ï³Â/g;
s/ﺠﻣ/ï³Â/g;
s/ﺤﻣ/ï³Â/g;
s/ﺨﻣ/ï³Â/g;
s/ﻤﻣ/ï³Â/g;
s/ﺠﻧ/ï³Â/g;
s/ﺤﻧ/ï³Â/g;
s/ﺨﻧ/ï³Â/g;
s/ﻤﻧ/ï³Â/g;
s/ﺠﻳ/ï³Â/g;
s/ﺤﻳ/ï³Â/g;
s/ﺨﻳ/ï³Â/g;
s/ﻤﻳ/ï³Â/g;
s/ﺤﻤï»Â/ï¶Â/g;
s/ﻪﻠï»ÂïºÂ/ï·²/g;
s/ﻢﻠﺳï»Â/ﻪﻴﻠï»Â/g;
s/ﻪï»ÂïºÂï» ïºÂ/ï»ÂïºÂ/g;
print "$_\n";
}
# The following table lists the presentation variants of each
# character. Each value from the U+0600 block means that the
# necessary glyph variant has not been assigned a code in Unicode's
# U+FA00 compatibility zone. You may want to insert your private
# glyphs or approximation glyphs for them:
__END__
ء ﺀ
آ ﺁﺂ
أ ﺃﺄ
ؤ ﺅﺆ
إ ﺇﺈ
ئ ﺉﺊﺌﺋ
ا ﺍﺎ
ب ﺏﺐﺒﺑ
ة ﺓﺔ
ت ﺕﺖﺘﺗ
ث ﺙﺚﺜﺛ
ج ﺝﺞﺠﺟ
ح ﺡﺢﺤﺣ
خ ﺥﺦﺨﺧ
د ﺩﺪ
ذ ﺫﺬ
ر ﺭﺮ
ز ﺯﺰ
س ﺱﺲﺴﺳ
ش ﺵﺶﺸﺷ
ص ﺹﺺﺼﺻ
ض ﺽﺾﻀﺿ
ط ﻁﻂﻄﻃ
ظ ﻅﻆﻈﻇ
ع ﻉﻊﻌﻋ
غ ﻍﻎﻐﻏ
ـ ــــ
ف ﻑﻒﻔﻓ
ق ﻕﻖﻘﻗ
ك ﻙﻚﻜﻛ
ل ﻝﻞﻠﻟ
م ﻡﻢﻤﻣ
ن ﻥﻦﻨﻧ
ه ﻩﻪﻬﻫ
و ﻭﻮ
ى ﻯﻰ // ﯩﯨ
ي ﻱﻲﻴﻳ
ñ ï // ïÂÂ
ò òò
ó óó
ô ô
õ õõ
ö öö
÷ ï¯Â÷
ø øøøø
ù ï¦ï§ï©ï¨
ú ïÂÂïÂÂï¡ïÂÂ
û ïÂÂïÂÂïÂÂïÂÂ
ü üüüü
ý ýýýý
þ ïÂÂïÂÂïÂÂïÂÂ
ÿ ï¢ï£ïÂ¥ï¤
àïÂÂïÂÂïÂÂïÂÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
àï¶ï·ï¹ï¸
àï²ï³ïµï´
ÃÂ
ÃÂ
ÃÂ
ÃÂ
ÃÂ
àïºïÂȕ½ï¼
àï¾ï¿ï®Âï®Â
àï®Âï®Â
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
àï®Âï®Â
àï®Âï®Â
àï®Âï®Â
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
àï®Âï®Â
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
àï®Âï®Â
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂ ÃÂ ÃÂ ÃÂ
á áááá
â ââââ
ã ãããã
ä ïªï«ïÂÂï¬
ÃÂ¥ ÃÂ¥ÃÂ¥ÃÂ¥ÃÂ¥
æ ï®ï¯ï±ï°
ç çççç
è èèèè
é ï®Âï®Âï®Âï®Â
ê êêêê
ë ëëëë
ì ìììì
àï¯Âï¯Âï¯Âï¯Â
î îîîî
ï ï®Âï®Âï®Âï®Â
ð ðððð
ñ ï®Âï®Âï®Âï®Â
ò òòòò
ó ï®Âï®Âï®Âï®Â
ô ôôôô
õ õõõõ
ö öööö
÷ ÷÷÷÷
ú ï®Âï®Âúú
û ﮠﮡﮣﮢ
ü üüüü
ý ýýýý
þ ﮪﮫï®Âﮬ
àﮤﮥ
àﮦﮧﮩﮨ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ
ﯠﯡ
àï¯Âï¯Â
àï¯Âï¯Â
àï¯Âï¯Â
àﯢﯣ
ÃÂ ÃÂÃÂ
àï¯Âï¯Â
àﯼﯽﯿﯾ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
àﯤﯥﯧﯦ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ
ÃÂ
ÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
ÃÂ ÃÂÃÂÃÂÃÂ
àﮮﮯ
àﮰﮱ
ÃÂ ÃÂ
â âÂÂâÂÂâÂÂâÂÂ
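The contextual analysis that the script calls the heart of the algorithm can be restated as a small standalone sketch. This is not the script's own code: it works on Unicode characters rather than UTF-8 octets, knows only four letters, and leaves out the transparent characters, the reversal into visual order and the ligatures. The names %forms and shape are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Presentation forms for a few letters only: [isolated, final, medial, initial].
# A letter that never joins to the following letter has no medial or initial form.
my %forms = (
    "\x{0627}" => ["\x{FE8D}", "\x{FE8E}"],                          # ALEF
    "\x{0628}" => ["\x{FE8F}", "\x{FE90}", "\x{FE92}", "\x{FE91}"],  # BEH
    "\x{0644}" => ["\x{FEDD}", "\x{FEDE}", "\x{FEE0}", "\x{FEDF}"],  # LAM
    "\x{0645}" => ["\x{FEE1}", "\x{FEE2}", "\x{FEE4}", "\x{FEE3}"],  # MEEM
);

# Replace each known letter by its contextual form; text stays in logical order.
sub shape {
    my @in = split //, shift;
    my @out;
    my $joined = 0;    # true when the previous letter connects to this one
    for my $i (0 .. $#in) {
        my $cur = $forms{ $in[$i] };
        if (!$cur) {   # unknown character: copy it and break the join
            push @out, $in[$i];
            $joined = 0;
            next;
        }
        my $next = $i < $#in ? $forms{ $in[ $i + 1 ] } : undef;
        # This letter joins forward if it has an initial form
        # and the next letter has a final form.
        my $forward = defined $cur->[3] && $next && defined $next->[1];
        my $idx = $joined ? ($forward ? 2 : 1) : ($forward ? 3 : 0);
        push @out, $cur->[$idx] // $cur->[0];
        $joined = $forward;
    }
    return join '', @out;
}

# BEH ALEF BEH: initial BEH, final ALEF, then an isolated BEH,
# because ALEF does not join to the letter after it.
print join(' ', map { sprintf 'U+%04X', ord } split //, shape("\x{0628}\x{0627}\x{0628}")), "\n";
# prints U+FE91 U+FE8E U+FE8F
```

The four outcomes (medial, initial, final, isolated) correspond to the four alternatives in the script's assignment to $uchar[$i], and $joined plays the role of $a.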