
Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Apache-Tika CPAN distribution.

Report information
The Basics
Id: 118433
Status: rejected
Priority: 0
Queue: Apache-Tika

People
Owner: Nobody in particular
Requestors: yahavamsi [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: CommonsDigester calculates wrong hashes on large files
Date: Wed, 19 Oct 2016 14:35:37 +0300
To: bug-Apache-Tika [...] rt.cpan.org
From: Yahav Amsalem <yahavamsi [...] gmail.com>
Hi,

I would like to report the following bug:

When passing more than one algorithm to the CommonsDigester constructor and then trying to digest a file larger than 7.5 MB, the hash calculation is wrong for every algorithm except the first.

The following code reproduces the bug:

    // The file used was a simple plain text file larger than 7.5 MB
    File file = new File("c:\\testLargeFile.txt");

    BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));

    Metadata metadata = new Metadata();

    CommonsDigester digester = new CommonsDigester(20000000,
            CommonsDigester.DigestAlgorithm.MD5,
            CommonsDigester.DigestAlgorithm.SHA1,
            CommonsDigester.DigestAlgorithm.SHA256);

    digester.digest(bufferedInputStream, metadata, null);

    // Prints the correct MD5 but wrong SHA1 and SHA256
    System.out.println(metadata);

Initial direction: from a little research, it seems that the inner buffered stream being used is not reset to position 0 after the first algorithm.

If there are any further questions I would be happy to provide more details.

Thanks,

Yahav Amsalem
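As a workaround while such a bug stands, the same three hashes can be computed in a single pass over the stream with plain java.security.DigestInputStream wrappers, which sidesteps any rewinding of the source entirely. This is a minimal sketch, not Tika's API: the file path mirrors the report above, while the class name MultiDigest and the buffer size are illustrative.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class MultiDigest {
        public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

            // Chain one DigestInputStream per algorithm so a single read pass
            // updates every digest; no reset of the source stream is required.
            try (InputStream in =
                    new DigestInputStream(
                        new DigestInputStream(
                            new DigestInputStream(
                                new BufferedInputStream(new FileInputStream("c:\\testLargeFile.txt")),
                                md5),
                            sha1),
                        sha256)) {
                byte[] buffer = new byte[8192];
                while (in.read(buffer) != -1) {
                    // Draining the stream is enough; each wrapper updates its digest.
                }
            }

            System.out.println("MD5:     " + toHex(md5.digest()));
            System.out.println("SHA1:    " + toHex(sha1.digest()));
            System.out.println("SHA256:  " + toHex(sha256.digest()));
        }

        // Renders a digest as the usual lowercase hex string.
        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
    }

Chaining the wrappers means every byte read from the outermost stream updates each digest exactly once, so the file is traversed a single time no matter how many algorithms are requested.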
Hi,

this bug tracker is for the Apache::Tika CPAN Perl module; you sent the bug report to the wrong bug tracker. Maybe https://issues.apache.org/jira/browse/DIGESTER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel is the right place.

Gerard Ribugent
Subject: Re: [rt.cpan.org #118433] CommonsDigester calculates wrong hashes on large files
Date: Thu, 20 Oct 2016 14:54:30 +0300
To: bug-Apache-Tika [...] rt.cpan.org
From: Yahav Amsalem <yahavamsi [...] gmail.com>
Hi,

I noticed that after I sent the mail, but thought it might be of help to you too.

Anyway, thanks for the kind reply,

Yahav Amsalem