File Checksums


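Duplicate File Detective compares the contents of files by computing a checksum of each file's contents. A checksum is a fixed-size numerical representation of the file's contents, derived through a series of mathematical computations - a process known as hashing. Two files with different checksums are certain to have different contents, while two files with the same checksum are very likely to be identical.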

 

Duplicate File Detective offers a range of hashing algorithms:

 

CRC32 - A quick, 32-bit checksum.
ADLER32 - Another quick, 32-bit checksum, similar in accuracy to CRC32.
MD5 - A very accurate, but slower, 128-bit checksum.
SHA1 - An even more accurate, but slower, 160-bit checksum.
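
For readers who want to see what these four checksums look like outside the application, the following is a minimal sketch using only the Python standard library (zlib and hashlib). It is not Duplicate File Detective's own implementation; the function name file_checksums and the chunk_size parameter are hypothetical names chosen for illustration.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=65536):
    """Return CRC32, ADLER32, MD5 and SHA1 checksums of a file as hex strings.

    Hypothetical helper for illustration only. The file is read in chunks
    so that large files do not have to fit in memory.
    """
    crc = 0      # zlib's default starting value for CRC32
    adler = 1    # zlib's default starting value for Adler-32
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()

    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            adler = zlib.adler32(chunk, adler)
            md5.update(chunk)
            sha1.update(chunk)

    return {
        "CRC32": format(crc & 0xFFFFFFFF, "08x"),      # 32 bits = 8 hex digits
        "ADLER32": format(adler & 0xFFFFFFFF, "08x"),  # 32 bits = 8 hex digits
        "MD5": md5.hexdigest(),                        # 128 bits = 32 hex digits
        "SHA1": sha1.hexdigest(),                      # 160 bits = 40 hex digits
    }
```

Calling file_checksums on two files and comparing the returned values mirrors, in spirit, a checksum-based comparison: different values mean the files differ, and equal values mean they are probably identical.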

 

Generally speaking, the "stronger" the hashing method, the more likely it is that two files with matching checksums are actually identical. Stronger hashing algorithms are also generally a bit slower than weaker ones.
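
To put rough numbers on the difference in strength, the sketch below estimates the chance that at least two different files in a collection share a checksum purely by accident. It uses the standard "birthday" approximation and assumes, as a simplification, that checksum values are spread uniformly at random; real CRC32 and ADLER32 values are not perfectly uniform, so treat the results as optimistic estimates. The function name collision_probability is hypothetical.

```python
import math


def collision_probability(file_count: int, bits: int) -> float:
    """Approximate chance of at least one accidental checksum match.

    Birthday approximation: p = 1 - exp(-n(n-1)/2 / 2**bits),
    assuming checksum values are uniformly distributed (a simplification).
    """
    pairs = file_count * (file_count - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** bits)


# Checksum sizes taken from the algorithm list above.
for name, bits in [("CRC32", 32), ("ADLER32", 32), ("MD5", 128), ("SHA1", 160)]:
    p = collision_probability(1_000_000, bits)
    print(f"{name:8s} {bits:3d} bits  p(accidental match among 1,000,000 files) ~ {p:.3g}")
```

Under this assumption, a 32-bit checksum is almost certain to produce at least one accidental match in a collection of a million distinct files, while for the 128-bit and 160-bit checksums the probability is vanishingly small.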

 

Note that stronger file content hashing algorithms such as MD5 and SHA1 are extremely unlikely to produce false positives (i.e. mistakenly identify two files as being identical to one another when they are actually different). Even the smallest difference in file contents will (with overwhelming probability) result in a completely different hash, due to a cryptographic property known as the avalanche effect. If you must be absolutely certain that two files are identical, use the byte-for-byte content match confirmation, which validates file comparisons at the binary level.
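
The two ideas in the paragraph above can be illustrated with a short Python sketch using only the standard library. The first part changes a single character of a sample input and counts how many bits of the SHA1 hash differ. The second part shows one possible way to perform a byte-for-byte comparison; it is an illustration of the concept, not the application's actual confirmation routine. The names differing_bits and files_identical are hypothetical.

```python
import hashlib
import os


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def files_identical(path_a, path_b, chunk_size=65536) -> bool:
    """Byte-for-byte comparison of two regular files (illustrative sketch)."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached the end with no difference
                return True


# Avalanche effect: the two inputs differ in one character only.
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"

digest_a = hashlib.sha1(original).digest()
digest_b = hashlib.sha1(modified).digest()

print(hashlib.sha1(original).hexdigest())
print(hashlib.sha1(modified).hexdigest())
print(differing_bits(digest_a, digest_b), "of 160 bits differ")
```

For a hash with good avalanche behavior, roughly half of the output bits are expected to change for any small change in the input, which is why near-identical files do not end up with near-identical checksums.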

 

Tip: Duplicate File Detective provides a File Hash Calculator feature that you can use to experiment with the computation of file checksums.

 


 

See also:

File Matching