Subject: CommonsDigester calculates wrong hashes on large files
Date: Wed, 19 Oct 2016 14:35:37 +0300
To: bug-Apache-Tika [...] rt.cpan.org
From: Yahav Amsalem <yahavamsi [...] gmail.com>
Hi,
I would like to report the following bug:
When more than one algorithm is passed to the CommonsDigester constructor
and a file larger than 7.5 MB is then digested, the hash is calculated
incorrectly for every algorithm except the first.
The following code reproduces the bug:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.utils.CommonsDigester;

// The file that was used was a simple plain text file with size > 7.5 MB
File file = new File("c:\\testLargeFile.txt");
BufferedInputStream bufferedInputStream =
        new BufferedInputStream(new FileInputStream(file));
Metadata metadata = new Metadata();
CommonsDigester digester = new CommonsDigester(20000000,
        CommonsDigester.DigestAlgorithm.MD5,
        CommonsDigester.DigestAlgorithm.SHA1,
        CommonsDigester.DigestAlgorithm.SHA256);
digester.digest(bufferedInputStream, metadata, null);
// Will print the correct MD5 but wrong SHA1 and SHA256
System.out.println(metadata);
Initial direction: from a little research, it seems that the inner buffered
stream that is used is not reset to position 0 after the first algorithm
finishes, so each subsequent algorithm digests the stream from the wrong
position.
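
For reference while investigating, here is a minimal sketch of the
mark()/reset() rewind that digesting one stream with several algorithms
seems to require. This is plain Java against java.security.MessageDigest,
not Tika's actual implementation, and the digestAll helper and its
markLimit parameter are made-up names for illustration only:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultiPassDigest {

    // Digests one stream with several algorithms by rewinding it between
    // passes. markLimit must be at least the stream length, otherwise
    // reset() throws IOException after a full read.
    static byte[][] digestAll(InputStream in, int markLimit, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        BufferedInputStream bis = new BufferedInputStream(in);
        byte[][] results = new byte[algorithms.length][];
        byte[] buf = new byte[8192];
        for (int i = 0; i < algorithms.length; i++) {
            bis.mark(markLimit);            // remember position 0 for this pass
            MessageDigest md = MessageDigest.getInstance(algorithms[i]);
            int n;
            while ((n = bis.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            results[i] = md.digest();
            bis.reset();                    // rewind to position 0 for the next pass
        }
        return results;
    }
}

For example, digestAll(new FileInputStream(file), 20000000, "MD5", "SHA-1",
"SHA-256") should produce three correct digests, because the stream is
rewound before each pass rather than only before the first.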
If there are any further questions, I would be happy to provide more details.
Thanks,
Yahav Amsalem