Finding Duplicate Files with Content Hashing
For this reason, any duplicate file search that goes beyond a quick exploratory run will usually need to compare file contents. In Duplicate File Detective, file contents can be matched with or without regard to other file attributes such as name or modification date.
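Duplicate File Detective compares file contents through a process known as file hashing. File hashing generates a small, fixed-size key value (a "digital fingerprint") from the much larger contents of a file. The key is not strictly unique, since many possible files map to each value, but a good hash function makes it very unlikely that two different files share one.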
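As a rough illustration of the fingerprinting step, the Python sketch below streams a file through a hash function in fixed-size chunks, so even very large files never need to fit in memory. It uses the standard `hashlib` module; the function name `file_fingerprint` and the 64 KiB chunk size are illustrative choices, not details of how Duplicate File Detective itself is implemented.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path, algorithm="sha1", chunk_size=64 * 1024):
    """Return the hex digest of a file's contents (hypothetical helper).

    The file is read in chunks of chunk_size bytes, so memory use stays
    constant no matter how large the file is.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage: two files with different names but identical contents
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "report.txt")
    b = os.path.join(tmp, "report (copy).txt")
    for p in (a, b):
        with open(p, "wb") as f:
            f.write(b"quarterly figures\n")
    print(file_fingerprint(a) == file_fingerprint(b))  # True
```

Because only the bytes are hashed, two files with different names but identical contents produce the same fingerprint, which is what allows a content scan to ignore names entirely.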
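Duplicate File Detective can hash file contents using a variety of algorithms, including CRC32, ADLER32, MD5, and SHA1. The first two are non-cryptographic checksums that generate 32-bit values, while the latter two are cryptographic hash functions that generate 128-bit and 160-bit values, respectively. Generally speaking, the larger the hash value, the more accurate the file content comparison. With only about 4.3 billion possible 32-bit values, accidental matches become likely once a scan covers tens of thousands of files, whereas the chance that two different files accidentally produce the same 128-bit or 160-bit digital fingerprint is vanishingly small. (MD5 and SHA1 collisions can be constructed deliberately, but ordinary files do not collide by chance.)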
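The sketch below, again in Python and assuming nothing about Duplicate File Detective's internals, computes all four fingerprint types with the standard `zlib` and `hashlib` modules and prints their widths. It also applies the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), to estimate how likely at least one accidental match is among n distinct files for a b-bit hash. That estimate assumes hash values behave like uniformly random numbers; it says nothing about collisions crafted on purpose. The function names `digests` and `collision_probability` are hypothetical.

```python
import hashlib
import math
import zlib


def digests(data: bytes) -> dict:
    """Return CRC32, ADLER32, MD5 and SHA1 fingerprints of one buffer.

    zlib returns CRC32 and Adler-32 as unsigned integers, so they are
    converted to 4-byte strings to make all four results comparable.
    """
    return {
        "CRC32": zlib.crc32(data).to_bytes(4, "big"),
        "ADLER32": zlib.adler32(data).to_bytes(4, "big"),
        "MD5": hashlib.md5(data).digest(),
        "SHA1": hashlib.sha1(data).digest(),
    }


def collision_probability(n_files: int, bits: int) -> float:
    """Birthday-bound estimate that at least two of n_files distinct files
    share a fingerprint, assuming uniformly distributed hash values."""
    pairs = n_files * (n_files - 1) / 2
    # 1 - exp(-x), computed with expm1 so tiny probabilities keep their precision
    return -math.expm1(-pairs / 2.0 ** bits)


for name, value in digests(b"example file contents").items():
    print(f"{name:8}{len(value) * 8:4} bits  {value.hex()}")

for bits in (32, 128, 160):
    print(f"{bits:3}-bit hash, 100,000 files: {collision_probability(100_000, bits):.3g}")
```

For 100,000 files the 32-bit estimate comes out at roughly 0.69, while the 128-bit and 160-bit estimates are on the order of 10^-29 and 10^-39, which is the practical reason the stronger hashes are preferred for content matching.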
Duplicate File Detective allows you to mix and match file content matching techniques to suit your specific requirements, but the built-in Project Wizard creates a reasonable set of defaults based upon your general objectives:
- Quick duplicate file scan - Matches files by name and size alone. It runs very quickly because no file contents are read, and provides a fair degree of accuracy.
- Checksum duplicate file scan - Matches files by name, size, and CRC content hash. Still fairly quick, and additionally checks that files sharing a name and size also have matching contents.
- Strong checksum duplicate file scan - Matches files by contents alone, using the stronger MD5 hash. This finds duplicate files regardless of other attributes such as name or modification date.
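The three project types differ mainly in which key two files must share to count as duplicates. The Python sketch below models that idea by grouping paths under a per-strategy key: name and size for a quick scan, name, size and CRC32 for a checksum scan, and the MD5 digest alone for a strong checksum scan. The strategy names, the helper functions (`crc32_of`, `md5_of`, `find_duplicates`, `demo`), and the assumption that the wizard's "CRC" means CRC32 are all hypothetical; this is not Duplicate File Detective's actual code.

```python
import hashlib
import os
import tempfile
import zlib
from collections import defaultdict


def _read_chunks(path, chunk_size=64 * 1024):
    """Yield a file's bytes in fixed-size chunks."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")


def crc32_of(path):
    crc = 0
    for chunk in _read_chunks(path):
        crc = zlib.crc32(chunk, crc)  # running CRC32 across chunks
    return crc


def md5_of(path):
    h = hashlib.md5()
    for chunk in _read_chunks(path):
        h.update(chunk)
    return h.hexdigest()


# Hypothetical strategy table; keys loosely mirror the Project Wizard's types.
# Each entry maps a path to the key that duplicate files must share.
STRATEGIES = {
    "quick": lambda p: (os.path.basename(p), os.path.getsize(p)),
    "checksum": lambda p: (os.path.basename(p), os.path.getsize(p), crc32_of(p)),
    "strong": lambda p: md5_of(p),
}


def find_duplicates(paths, strategy="quick"):
    """Group paths by the strategy's key; return groups of two or more."""
    key_of = STRATEGIES[strategy]
    groups = defaultdict(list)
    for p in paths:
        groups[key_of(p)].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def demo(tmp):
    """Create four small files under tmp and run every strategy on them."""
    files = {"a/report.txt": b"alpha", "b/report.txt": b"alpha",
             "c/report.txt": b"bravo", "d/notes.txt": b"alpha"}
    paths = []
    for rel, data in files.items():
        p = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(p), exist_ok=True)
        with open(p, "wb") as f:
            f.write(data)
        paths.append(p)
    return {name: [[os.path.relpath(p, tmp) for p in g]
                   for g in find_duplicates(paths, name)]
            for name in STRATEGIES}


with tempfile.TemporaryDirectory() as tmp:
    for name, groups in demo(tmp).items():
        print(name, groups)
```

In the demo, the quick scan also groups c/report.txt, which has the same name and size but different contents; the checksum scan drops it; and the strong scan adds d/notes.txt despite its different name. A practical tool would typically hash only files whose sizes already match, since files of different sizes cannot have identical contents.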