The Comparison Options window provides granular control over how Duplicate File Detective compares files. The window contains three tabs, as described below.
General Tab
| • | Compare file names - Names must match in order for files to be considered duplicates. |
| o | Remove whitespace and special characters during comparison - Disregards all non-alphanumeric characters during file name comparisons (e.g. "_test.txt" and "test.txt" would be considered the same). |
| • | Compare file extensions - Extensions must match in order for files to be considered duplicates. |
| • | Compare file sizes - Sizes must match in order for files to be considered duplicates. |
| o | Compare file contents - The contents of files must match in order for files to be considered duplicates. |
| ▪ | Byte-for-byte content match confirmation - Confirms that matches identified by content hashing are identical at the byte level. |
| • | Compare last modified date / time - File modified date/time stamps must match in order for files to be considered duplicates. |
| • | Compare music tags - The audio tags that you specify (see below) must match in order for files to be considered duplicates. |
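As an illustration of the "Remove whitespace and special characters during comparison" option, here is a minimal Python sketch of how such a name normalization might work. The function name `normalize_name` is hypothetical, and the choice to leave the extension untouched is an assumption; this is not the application's actual implementation.

```python
import os


def normalize_name(filename: str) -> str:
    """Drop every non-alphanumeric character from the name part of a file name.

    Assumption: the extension is left as-is, since extensions have their own
    comparison option.
    """
    stem, ext = os.path.splitext(filename)
    cleaned = "".join(ch for ch in stem if ch.isalnum())
    return cleaned + ext


# The example from the option description: both names normalize to the same value.
print(normalize_name("_test.txt") == normalize_name("test.txt"))
```

Under this sketch, names such as "my file (1).txt" and "myfile1.txt" would also be treated as the same name.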
Other options:
| • | Hard linked files should not be compared - When engaged, ensures that files hard linked to one another are not compared. |
| • | When hashing zip files, enumerate and hash the files they contain - Zip files (those with a .zip extension) often contain metadata that prevents them from responding well to normal file content comparisons. Use of this option will cause the archived contents (i.e. the individual files) to be hashed independently of the zip file that contains them, improving comparison potential. |
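The idea behind the zip option can be sketched in Python as follows. This is only an approximation under stated assumptions: the function name `zip_content_digest`, the use of SHA-256, and the way member names and hashes are combined are all illustrative choices, not a description of how Duplicate File Detective does it.

```python
import hashlib
import zipfile


def zip_content_digest(zip_source) -> str:
    """Hash the archived files rather than the raw bytes of the zip container.

    Timestamps, compression settings, and other archive metadata are ignored;
    only member names and their uncompressed contents contribute.
    """
    outer = hashlib.sha256()
    with zipfile.ZipFile(zip_source) as archive:
        members = sorted(
            (info for info in archive.infolist() if not info.is_dir()),
            key=lambda info: info.filename,
        )
        for info in members:
            inner = hashlib.sha256()
            with archive.open(info) as member:
                # Read in chunks so large members are not loaded into memory at once.
                for chunk in iter(lambda: member.read(65536), b""):
                    inner.update(chunk)
            outer.update(info.filename.encode("utf-8"))
            outer.update(b"\0")
            outer.update(inner.digest())
    return outer.hexdigest()
```

Two archives that hold the same files but were created at different times differ as raw byte streams, yet produce the same digest here.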
Notes:
| • | Files will be considered duplicates of one another only when all the chosen comparison options match. |
| • | Byte-for-byte content matching will slow the overall duplicate search process considerably, and is rarely necessary (see file hashing notes below). |
| • | When using file content comparison, combining it with other match options (such as file name and/or extension) will often improve performance by reducing the number of files that need to be hashed. |
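To show why combining match options reduces hashing work, the following Python sketch groups files by size first and hashes only files that share a size with at least one other file. The function names `sha256_of_file` and `find_duplicates` are hypothetical, and SHA-256 is an arbitrary choice for the example.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path: str) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return groups of paths whose size and content hash both match."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # A file with a unique size is never hashed.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of_file(path)].append(path)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

The cheap comparison (size) acts as a filter, so the expensive one (content hashing) runs on far fewer files. The same principle applies to name and extension matching.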
Hashing Tab
When "Compare file contents" is selected on the General Tab of the Comparison Options window (see above), this tab can be used to specify precisely which hashing method is used to generate file content checksums.
A file hash is a numerical checksum value, derived through a mathematical formula, that represents the contents of a file as a whole. In general, stronger hash algorithms produce checksums that are less likely to collide (that is, to yield the same value for two different files), and are therefore more reliable at identifying true duplicates. The trade-off is speed: the stronger the hashing algorithm, the longer it generally takes to produce a file checksum.
Duplicate File Detective supports the following file comparison hash types:
| • | CRC32 - A quick, 32-bit checksum. |
| • | ADLER32 - Another 32-bit checksum, similar in accuracy to CRC32. |
| • | MD5 - A very accurate, slower 128-bit checksum. |
| • | SHA1 - Even more accurate, slower 160-bit checksum. |
| • | SHA256 - Even more accurate, slower 256-bit checksum. |
| • | SHA512 - Even more accurate, slower 512-bit checksum. |
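For reference, each of the hash types listed above can be computed with Python's standard `zlib` and `hashlib` modules. The sketch below is purely illustrative (the helper name `checksums` is hypothetical) and says nothing about how Duplicate File Detective computes them internally.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Compute each listed checksum type for a block of data, as hex strings."""
    return {
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "ADLER32": format(zlib.adler32(data) & 0xFFFFFFFF, "08x"),
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
        "SHA512": hashlib.sha512(data).hexdigest(),
    }


# Each hex digit carries 4 bits, so the printed sizes match the list above.
for name, value in checksums(b"example file contents").items():
    print(f"{name}: {len(value) * 4} bits")
```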
Stronger file content hashing algorithms (such as SHA1 and SHA256) are very unlikely to produce false positives (i.e. mistakenly identify two files as being identical to one another when they are actually different). Even the smallest differences in file contents will (with overwhelming probability) result in completely different hashes due to a cryptographic concept known as the avalanche effect. If you must be absolutely certain that two files are identical, use the byte-for-byte content match confirmation, which validates file comparisons at the binary level.
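Both points can be illustrated with a short Python sketch. The function names `differing_bits` and `identical_bytes` are hypothetical; the first measures how many bits of a SHA-256 hash change between two inputs, and the second uses the standard library's `filecmp.cmp` with `shallow=False` as one possible way to perform a byte-level comparison.

```python
import filecmp
import hashlib


def differing_bits(data_a: bytes, data_b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    hash_a = int.from_bytes(hashlib.sha256(data_a).digest(), "big")
    hash_b = int.from_bytes(hashlib.sha256(data_b).digest(), "big")
    return bin(hash_a ^ hash_b).count("1")


def identical_bytes(path_a: str, path_b: str) -> bool:
    """Compare two files by content; shallow=False forces reading the file data."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# The two inputs differ by a single bit ("d" vs. "e"), yet roughly half of
# the hash bits are expected to change.
print(differing_bits(b"hello world", b"hello worle"))
```

A byte-level comparison has to read both files in full, which is why it is slower than comparing two precomputed checksums.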
File Matching options are project-specific, and are saved and loaded on a per-project basis.
Music Tags Tab
Many types of audio files (including MP3, WMA, OGG, ASF, etc.) contain special data fields called tags. Tags were designed to store additional information about an audio file, such as the track title, artist, album name, genre, and more.
Audio tags can also be useful when searching for duplicate songs. File content comparison (through hashing or byte-by-byte analysis) is often ineffective at detecting duplicate audio files because their contents naturally tend to vary depending upon how (and when) the audio itself was captured - meaning that the files themselves are often not truly identical. However, we can compare audio tags to great effect - for example, if two music files have identical title and artist tags, they are very likely to be the same song.
When "Compare music tags" is selected on the General Tab of the Comparison Options window (see above), this tab is used to specify precisely which tags are used to compare audio files. Duplicate File Detective supports a core set of audio tags which have been broadly adopted within the music industry (including artist, title, album, track, etc.). Audio tag comparisons are always performed in a case-insensitive manner (i.e. upper and lower case are ignored).
Duplicate File Detective supports extraction and comparison of audio tags from the following music file formats: MP3, Ogg Vorbis, FLAC, MPC, Speex, WavPack, TrueAudio, WAV, AIFF, MP4 and ASF. Supported audio file extensions: .mp3, .ogg, .flac, .oga, .mpc, .wv, .spx, .tta, .m4a, .m4b, .m4p, .3g2, .mp4, .wma, .asf, .aif, .aiff, and .wav.
Important: Audio files are not required to contain tag data (although most do), and this duplicate detection method will not work with files that lack it.
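The tag comparison described in this section might be sketched in Python as follows. The sketch assumes tags have already been extracted into plain dictionaries (tag reading itself is out of scope here), and the function name `tags_match` and its `fields` parameter are hypothetical.

```python
def tags_match(tags_a: dict, tags_b: dict, fields=("artist", "title")) -> bool:
    """Return True when every selected tag is present in both files and
    matches when upper and lower case are ignored."""
    for field in fields:
        value_a = tags_a.get(field)
        value_b = tags_b.get(field)
        if not value_a or not value_b:
            return False  # A file without the tag cannot be matched this way.
        if value_a.casefold() != value_b.casefold():
            return False
    return True


first = {"artist": "Miles Davis", "title": "So What", "album": "Kind of Blue"}
second = {"artist": "MILES DAVIS", "title": "so what"}
print(tags_match(first, second))
```

Only the selected fields take part in the comparison, so the missing album tag in the second dictionary does not matter unless "album" is added to `fields`.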