On Mon Jun 06 15:23:32 2016, gnu.oracle@gmail.com wrote:
> Well, as far as I can see, using the "%02d" format instead of just "%d" is
> a mistake. The main idea of libpuzzle is to get the narrowest possible
> fingerprint of an image, and that is defeated when extra zeros are added
> even though only one digit [0-4] is used. So please fix it if you agree.
>
Thanks! Yes, I noticed this before merging to master yesterday.
https://github.com/estrabd/Image-Libpuzzle/commit/c93502487b8190199fac91b5546ecc5e190b1ec9
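For anyone reading along, here is a minimal sketch of the difference. It is not copied from the commit. The sample bytes and the helper names sig_unsigned and sig_signed are made up for illustration; only pack, unpack and sprintf are real Perl built-ins here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a cvec: signed bytes in -2..2.
my $cvec = pack("c*", -2, -1, 0, 1, 2);

# Old behaviour: unsigned unpack with "%02d"; -2 and -1 come out as 254 and 255.
sub sig_unsigned {
    my ($bytes) = @_;
    return join "", map { sprintf("%02d", $_) } unpack("C*", $bytes);
}

# Signed unpack shifted by +2, so every value is a single digit in 0..4.
sub sig_signed {
    my ($bytes) = @_;
    return join "", map { sprintf("%d", $_ + 2) } unpack("c*", $bytes);
}

print sig_unsigned($cvec), "\n";  # 254255000102
print sig_signed($cvec), "\n";    # 01234
```

With "C*" the two negative values become three-digit numbers, which would explain why the string length varied between images.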
> Next, I can comment on my attempt to narrow the image fingerprint using an
> alphabet (letters), as I mentioned before. Obviously this trick makes the
> image signature half as long as the one from signature_as_char_string, which
> is good for storing it in a database, and it still allows signatures to be
> compared effectively using byte-by-byte comparison or Text::Levenshtein.
> But such an alphabet format is not as effective as the [0-4] signature
> format when indexing libpuzzle signatures as words for a quick check (see
> http://stackoverflow.com/questions/9703762/libpuzzle-indexing-millions-of-pictures
> for the idea of indexing). One cannot index half of a letter (4 bits), just
> a single letter (4*2 bits). That's the difference. Additionally, Levenshtein
> distances would look different but would still be meaningful.
>
> You can safely ignore my last paragraph if you are not planning to
> implement this kind of "signature compression".
While I do not require compression, if you have a way to compress in a manner that would allow for indexing/comparing (like in the millions of images post), I would most definitely welcome a pull request.
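To make the indexing idea concrete, here is a rough sketch of cutting a [0-4] digit signature into position-prefixed words, as I understand the approach from the linked post. The function name signature_words, the "_" separator, and the word length and word count are all assumptions chosen for illustration, not anything the module provides.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a [0-4] digit signature into overlapping words,
# each prefixed with its position so that only aligned words can match.
sub signature_words {
    my ($sig, $word_len, $max_words) = @_;
    my @words;
    for my $pos (0 .. $max_words - 1) {
        # Stop once a full word no longer fits in the signature.
        last if $pos + $word_len > length $sig;
        push @words, $pos . "_" . substr($sig, $pos, $word_len);
    }
    return @words;
}

my @words = signature_words("0123443210", 4, 3);
print join(" ", @words), "\n";  # 0_0123 1_1234 2_2344
```

Each word could then be stored in an indexed column, so that candidate images are found by exact word matches before the full signatures are compared.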
Off topic, but I filed an issue upstream (libpuzzle itself) regarding inconsistencies observed when comparing scaled images:
https://github.com/jedisct1/libpuzzle/issues/16
Thanks again for this report. If you're so inclined, would you agree that this issue has been resolved?
I'll try to get a new release out soon if so.
Cheers,
Brett
>
> 2016-05-26 0:41 GMT+03:00 B. D. Estrade via RT <bug-Image-Libpuzzle@rt.cpan.org>:
>
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=113716 >
> >
> > Also, don't worry about creating a separate issue in GitHub. RT is fine.
> >
> > On Wed May 25 17:40:05 2016, ESTRABD wrote:
> > > No problem =) you waited 6 weeks for a response, I can't expect you to
> > > jump on a change.
> > >
> > > I think based on your code, that I fixed unpack as you had described.
> > > I am happy to add in the code you have that creates an A-Z
> > > representation if you find that it works for you.
> > >
> > > Please let me know what you find. I will be afk until next week, then
> > > I will start looking at how this change affects indexing and will
> > > likely push out another release to CPAN.
> > >
> > > Cheers,
> > > Brett
> > >
> > > On Wed May 25 16:15:21 2016, gnu.oracle@gmail.com wrote:
> > > > Thank you for the answer. Sorry, I am too lazy to use GitHub, but I will
> > > > check your changes in a few days.
> > > >
> > > > Personally, I like using your module like this:
> > > >
> > > > my @letters = split(//, "ABCDEFGHIJKLMNOPQRSTUWXYZ?");
> > > > my %letter_hash;
> > > > my $hash_ind = 0;
> > > >
> > > > # Map each pair of values in 0..4 (packed as two nibbles) to one letter.
> > > > for (my $i = 0; $i <= 4; $i++) {
> > > >     for (my $j = 0; $j <= 4; $j++) {
> > > >         $letter_hash{ ($i << 4) + $j } = $letters[$hash_ind++];
> > > >     }
> > > > }
> > > >
> > > > sub signature_as_char_string2 {
> > > >     # Read the cvec as signed bytes and shift -2..2 to 0..4.
> > > >     my @signature = map { $_ + 2 } unpack("c*", $_[0]);
> > > >     my $octets = "";
> > > >     # Encode consecutive pairs; a trailing unpaired value is dropped.
> > > >     for (my $i = 0; $i < $#signature; $i += 2) {
> > > >         my $ind = ($signature[$i] << 4) + $signature[$i + 1];
> > > >         $octets .= $letter_hash{$ind};
> > > >     }
> > > >     return $octets;
> > > > }
> > > >
> > > > my $str = signature_as_char_string2($pic->fill_cvec_from_file($file));
> > > >
> > > >
> > > > As a result I get nice letter strings of the same length, like this:
> > > >
> > > >
> > AJYSABEERXZXTZJZUKBKJIQQWSIIBGPWDJFKDXAAAFAAYUJEUPTSGQFGSTYSGAAFYODPZZTZZWYZYUYZBPPMBGIGSQYNSIHIQWDDFUJSIAGWQAUWECPKSNNQIPGGIDGIAPSJEPYTZGGPFBTYUJBFFSGSQFBZZYRGABAWEDRYZTZTWZZSUZTBFFGGQJIWQQIJOIMYWCJKKDSAABFABYUJDPPSSIPFGSOYSAAAAHJDPZZOTZXYZYUYZBFFAGQIGQYLWLWDEGZAFPEJEPSQ
> > > >
> > > > I do not yet know whether these strings are very useful for comparing
> > > > images. I will try to investigate it further.
> > > >
> > > >
> > > > 2016-05-25 21:12 GMT+03:00 B. D. Estrade via RT <bug-Image-Libpuzzle@rt.cpan.org>:
> > > >
> > > > > <URL: https://rt.cpan.org/Ticket/Display.html?id=113716 >
> > > > >
> > > > > Hi, after thinking more about your report I came up with this change. I
> > > > > think it is what you meant:
> > > > >
> > > > > https://github.com/estrabd/Image-Libpuzzle/commit/43cdae1ed5fe6990900256cca05ccf5b026aeea0
> > > > >
> > > > > Can you please review/test that and provide me with some feedback? If it
> > > > > is correct, I will push out a new version to CPAN with the fix.
> > > > >
> > > > > Thank you for your report.
> > > > >
> > > > > On Tue Apr 12 16:09:24 2016, gnu.oracle@gmail.com wrote:
> > > > > > Hi!
> > > > > >
> > > > > > I found your library useful, but noticed that the
> > > > > > 'signature_as_char_string' method is not correct. It treats the cvec
> > > > > > as UNSIGNED char because you use unpack("C*"):
> > > > > >
> > > > > > # from lib/Image/Libpuzzle.pm
> > > > > >
> > > > > > # uses unpack as bin to char and $self accessor to get signature
> > > > > > # directly from the internal cvec
> > > > > > sub signature_as_char_string {
> > > > > >     my $self = shift;
> > > > > >     my @sig = unpack("C*", $self->get_signature());
> > > > > >     my $sig = q{};
> > > > > >     foreach my $i (@sig) {
> > > > > >         $sig .= sprintf("%02d", $i);
> > > > > >     }
> > > > > >     return $sig;
> > > > > > }
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > but cvec is an array of SIGNED bytes having values between -2 and 2
> > > > > > (5 possible values: -2,-1,0,1,2) -- see the original typedef from
> > > > > > puzzle.h:
> > > > > >
> > > > > > typedef struct PuzzleCvec_ {
> > > > > >     size_t sizeof_vec;
> > > > > >     signed char *vec;
> > > > > > } PuzzleCvec;
> > > > > >
> > > > > >
> > > > > > As a result, signature_as_char_string yields chars in the range
> > > > > > ['0'..'5'] (6 possible values). And what is probably worse -- its
> > > > > > character output length varies from one image to another (does
> > > > > > printf("%02d") not work as expected?), though binary cvecs all have
> > > > > > the same length. This makes character string cvecs (and ngrams made
> > > > > > from such char cvecs) probably unusable for image indexing. At the
> > > > > > very least, using them would not be the correct way of indexing
> > > > > > images.
> > > > > >
> > > > > > I know there could be cases when interpreting SIGNED bytes as UNSIGNED
> > > > > > makes sense. But I think this time you are wrong. At least the char
> > > > > > cvec length should not vary, but it changes from one image to another
> > > > > > (check length($it)).
> > > > > >
> > > > > > A better idea might be to interpret cvecs as SIGNED numbers but add
> > > > > > +2 to all of them. Then we get a range of ['0'..'4'], which fits in
> > > > > > only one digit, not two.
> > > > > >
> > > > > > The best idea would probably be to use characters other than the
> > > > > > digits 0-4 to encode cvecs (A-Z, a-z, etc.). Then a word INDEX composed
> > > > > > from ngrams would be significantly better.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> >
> >
> >
> >