File Matching

File Matching options (also accessible via the Tools | Project Settings menu option) represent one of the more advanced aspects of Duplicate File Detective's operation.

 

File Comparison Options

 

File Matching defines how Duplicate File Detective compares files to determine whether they are duplicates of one another.

 

Match file names - Compares the filename portion of each file's path.
    Ignore file extensions - Compares filenames without regard to the file extension.
Match file sizes - Compares the exact size of files.
    Match file contents - Uses one of the comparison hash types discussed below to represent and compare file contents.
Byte-for-byte content match confirmation - Confirms that matches identified by content hashing are identical at the byte level.
Match last modified date / time - Compares the last modified date and time of files.
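
The way these options combine can be pictured as building a comparison key for each file, where two files are candidate duplicates only if their keys are equal. The following Python sketch is illustrative only and is not Duplicate File Detective's actual implementation; the function names, the parameter names, and the choice to compare modification times at whole-second resolution are all assumptions.

```python
import os
from collections import defaultdict


def match_key(path, match_names=True, ignore_extensions=False,
              match_sizes=True, match_mtime=False):
    """Build the tuple that two files must share to be candidate duplicates.

    Hypothetical helper: the parameters mirror the options listed above.
    """
    info = os.stat(path)
    key = []
    if match_names:
        name = os.path.basename(path)
        if ignore_extensions:
            name = os.path.splitext(name)[0]  # "report.txt" -> "report"
        key.append(name)
    if match_sizes:
        key.append(info.st_size)
    if match_mtime:
        # Assumption: whole-second resolution is sufficient.
        key.append(int(info.st_mtime))
    return tuple(key)


def group_candidates(paths, **options):
    """Group paths by comparison key and keep groups with two or more files."""
    groups = defaultdict(list)
    for path in paths:
        groups[match_key(path, **options)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Enabling more options makes the key stricter, so fewer files end up in the same group.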

 

The File Matching options window also lets you configure the type of file hash used for content comparison operations. A file hash is a fixed-size checksum value, derived from a mathematical formula, that represents the contents of the file as a whole. Stronger file hash algorithms produce checksums that are less likely to collide (that is, to be the same for two different files) than weaker ones, and are therefore more likely to identify duplicate files correctly. Generally, the stronger the file hashing algorithm, the longer it takes to produce a file checksum.
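
As a rough illustration of what "a checksum that represents the contents of the file as a whole" means, the sketch below computes a file hash with Python's standard hashlib module, reading the file in chunks so that large files do not have to fit in memory. The function name and the chunk size are assumptions made for this example, and it does not reflect how Duplicate File Detective computes hashes internally.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file's contents (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Two files with equal digests are, with very high probability, identical in content; two files with different digests are certainly different.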

 

Note: the Match file sizes option must be enabled before the Match file contents option can be enabled, because two files cannot have identical contents unless they are the same size.
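
The size requirement also works as a cheap pre-filter: files whose size is unique cannot have a content match, so they never need to be hashed. The Python sketch below shows that idea under stated assumptions. The function name is hypothetical, SHA1 is chosen arbitrarily, and each file is read in one call for brevity.

```python
import hashlib
import os
from collections import defaultdict


def find_content_duplicates(paths):
    """Return groups of paths with the same size and the same SHA1 digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: contents cannot match, so skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            by_hash[digest].append(path)
        duplicates.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicates
```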

 

File Content Hash Types

 

Duplicate File Detective supports the following file comparison hash types:

 

CRC32 - A quick, 32-bit checksum.
ADLER32 - Another 32-bit checksum, similar in accuracy to CRC32.
MD5 - A very accurate, slower 128-bit checksum.
SHA1 - Even more accurate, slower 160-bit checksum.
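
To make the size difference between these hash types concrete, the following sketch computes all four over the same bytes using Python's standard zlib and hashlib modules. The function name is an assumption, and the code only illustrates the algorithms themselves, not how Duplicate File Detective invokes them.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Return hex checksums of `data` for the four hash types listed above."""
    return {
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),      # 32 bits
        "ADLER32": format(zlib.adler32(data) & 0xFFFFFFFF, "08x"),  # 32 bits
        "MD5": hashlib.md5(data).hexdigest(),                       # 128 bits
        "SHA1": hashlib.sha1(data).hexdigest(),                     # 160 bits
    }


if __name__ == "__main__":
    for name, value in checksums(b"example file contents").items():
        print(f"{name:8} {len(value) * 4:3d} bits  {value}")
```

A 32-bit checksum has only about 4.3 billion possible values, so accidental collisions become plausible across very large file sets; the 128-bit and 160-bit digests make accidental collisions vanishingly unlikely.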

 

Note that stronger file content hashing algorithms such as MD5 and SHA1 are extremely unlikely to produce false positives (i.e. mistakenly identify two files as being identical to one another when they are actually different). Even the smallest difference in file contents will, with overwhelming probability, result in a completely different hash due to a cryptographic property known as the avalanche effect. If you must be absolutely certain that two files are identical, use the byte-for-byte content match confirmation option, which validates file comparisons at the binary level.
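
Both ideas in the paragraph above can be sketched briefly in Python. The first function counts how many of the 160 SHA1 output bits differ between two inputs, which is one way to observe the avalanche effect. The second performs a byte-level comparison using the standard filecmp module. The function names are hypothetical, and this is not a description of Duplicate File Detective's internals.

```python
import filecmp
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the SHA1 output bits that differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha1(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


def confirmed_identical(path_a, path_b) -> bool:
    """Compare two files byte for byte.

    shallow=False makes filecmp compare file contents instead of
    relying only on the os.stat() signatures (type, size, mtime).
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For inputs that differ by a single character, differing_bits typically reports a value in the neighborhood of half the 160 output bits, while identical inputs always yield 0.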

 

File Matching options are project-specific, and are saved and loaded on a per-project basis.

 


 

See also:

File Checksums