File Names

During the duplicate file search process, every file's name is examined and compared against the settings defined in the File names section of the Search Filtering docking panel.

 

File name masks can be either inclusive or exclusive (see below for details).

 

File name masks can be defined using one of two syntaxes: wildcard patterns (the default) or more advanced regular expressions. Use the "Use regular expression matching" switch in the File names section of the Search Filtering docking panel to change between these two modes.

 

Wildcard Patterns

 

When operating in wildcard pattern mode, each file name mask is a pattern of characters, and multiple masks are separated by semicolons. The wildcard characters are '?', which matches a single instance of any character, and '*', which matches any sequence of characters. Any other (non-wildcard) character matches itself.

 

Further, any file name mask can be preceded by a tilde character ( '~' ), which specifies that the mask is exclusionary. If a file name matches an exclusionary mask, the file will always be skipped.

 

Wildcard Pattern Examples

 

Here are a few examples of wildcard file name masks:

 

mypicture.bmp - This mask has no wildcard characters, and is therefore a literal match. Only files named "mypicture.bmp" will be included in the duplicate comparison process.
*.bmp - This mask uses the asterisk ('*') character to include any files with a ".bmp" extension in their names.
*.bmp;*.gif;*.jpg - This is a compound mask, with individual entries separated by semicolons. This mask will match any ".bmp", ".gif", or ".jpg" files it encounters.
~family*;*.bmp - The first element of this compound mask is preceded by a tilde ('~'), which means that it will exclude any files whose names start with "family". The second element then includes any remaining files with a ".bmp" extension in their names.
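
The mask rules described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the application's actual implementation. The function names (wildcard_to_regex, file_passes) are hypothetical, and three behaviors are assumptions rather than documented facts: matching is case-insensitive, '*' may match an empty run of characters, and a compound mask containing only exclusionary entries includes every file that is not excluded.

```python
import re


def wildcard_to_regex(mask):
    """Translate one wildcard mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")  # any run of characters (assumed to include none)
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # any other character matches itself
    # Case-insensitive matching is an assumption, not documented behavior.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def file_passes(name, compound_mask):
    """Return True if the file name survives the compound mask."""
    include, exclude = [], []
    for mask in compound_mask.split(";"):
        mask = mask.strip()
        if not mask:
            continue
        if mask.startswith("~"):
            exclude.append(wildcard_to_regex(mask[1:]))  # exclusionary mask
        else:
            include.append(wildcard_to_regex(mask))

    # A match against any exclusionary mask always skips the file.
    if any(p.fullmatch(name) for p in exclude):
        return False
    # Assumption: with no inclusive masks, everything else is included.
    if not include:
        return True
    return any(p.fullmatch(name) for p in include)


for name in ("family01.bmp", "beach.bmp", "beach.txt"):
    print(name, file_passes(name, "~family*;*.bmp"))
```

With the compound mask ~family*;*.bmp, the sketch skips "family01.bmp", keeps "beach.bmp", and skips "beach.txt" because it matches no inclusive mask.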

 

The File names section of the Search Filtering docking panel also includes a Presets button, which gives you quick access to a range of built-in masks (called File Groups). These can help you get started quickly, and can also be customized to suit your needs.

 

Important Note: When using wildcards in file name masks, keep in mind that a mask of '*.*' is subtly different from '*'. The former requires the file name to contain a dot (.), while the latter does not. In other words, the '*.*' pattern will not match file names that have no extension. To match these, use a single asterisk ('*') instead.
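
The difference between the two masks can be seen by writing each one as an equivalent regular expression. The Python snippet below is a minimal sketch for illustration only. The translations (STAR_DOT_STAR, STAR) and the helper name matches are assumptions made for this example, not part of the application.

```python
import re

# Hypothetical regular-expression equivalents of the two wildcard masks.
STAR_DOT_STAR = re.compile(r".*\..*", re.DOTALL)  # '*.*' requires a literal dot
STAR = re.compile(r".*", re.DOTALL)               # '*' matches any name at all


def matches(pattern, name):
    """Return True if the whole file name matches the pattern."""
    return pattern.fullmatch(name) is not None


for name in ("report.txt", "README", "Makefile"):
    print(name, matches(STAR_DOT_STAR, name), matches(STAR, name))
```

Only "report.txt" matches both patterns. "README" and "Makefile" contain no dot, so they match only the single-asterisk pattern.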

 

Regular Expressions

 

As noted above, filename masks can also be defined using powerful regular expression syntax. Regular expressions are formulas that can be used to match strings of text that follow some pattern. They allow their users to succinctly express a set of character matching rules that would otherwise require a large number of switches and logical operations.

 

This help file does not provide an in-depth tutorial on writing regular expressions, because many such tutorials are freely available on the web (enter "regular expressions" into your favorite search engine to find them).

 

Please keep in mind, however, that regular expression syntax differs subtly between the engines that various applications employ. The tables below provide an overview of the regular expression metacharacters and abbreviations supported by Duplicate File Detective.

 

Metacharacter        Meaning

.        Matches any single character.

[ ]        Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").

^        If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").

-        In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").

?        Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").

+        Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "666", and so on).

*        Indicates that the preceding expression matches zero or more times.

??, +?, *?        Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions which match as much as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>".

( )        Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (such as "1" or "1,23,456").

\        Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see table below). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".

$        At the end of a regular expression, this character matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.

|        Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").

!        Negation operator: the expression following ! does not match the input. Example: a!b matches "a" not followed by "b".
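
Several of the examples in the table above can be tried out in Python. Note that Python's re module is a different regular expression engine from the one used by Duplicate File Detective, so this sketch covers only constructs that behave the same way in both. The negation operator (!), the match group syntax, and several abbreviations have no identical counterpart in Python and are deliberately left out.

```python
import re

text = "<abc><def>"

# Greedy quantifier: matches as much as possible.
greedy = re.match(r"<.*>", text).group(0)

# Non-greedy quantifier: matches as little as possible.
lazy = re.match(r"<.*?>", text).group(0)

# Optional second digit: accepts one-digit or two-digit input only.
one_or_two_digits = [
    s for s in ("2", "12", "123") if re.fullmatch(r"[0-9][0-9]?", s)
]

# Grouping: a comma-separated list of numbers.
is_number_list = re.fullmatch(r"(\d+,)*\d+", "1,23,456") is not None

# Negated character class: any character except "a", "b", and "c".
not_abc = re.fullmatch(r"[^abc]", "d") is not None

# End-of-input anchor: a digit at the very end of the name.
ends_with_digit = re.search(r"[0-9]$", "photo7") is not None

print(greedy)             # <abc><def>
print(lazy)               # <abc>
print(one_or_two_digits)  # ['2', '12']
```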

 

Abbreviations

 

\a        Any alphanumeric character: ([a-zA-Z0-9])

\b        White space (blank): ([ \\t])

\c        Any alphabetic character: ([a-zA-Z])

\d        Any decimal digit: ([0-9])

\h        Any hexadecimal digit: ([0-9a-fA-F])

\n        Newline: (\r|(\r?\n))

\q        A quoted string: (\"[^\"]*\")|(\'[^\']*\')

\w        A simple word: ([a-zA-Z]+)

\z        An integer: ([0-9]+)
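
Because each abbreviation is shorthand for an ordinary expression, a pattern that uses them can be expanded by simple substitution. The Python sketch below illustrates the idea using the definitions listed above. It is not how the application processes patterns internally. The names ABBREVIATIONS and expand_abbreviations are hypothetical, and the white space definition is assumed to mean "space or tab".

```python
import re

# Expansions copied from the abbreviation list (the \b entry is assumed
# to mean a space or a tab character).
ABBREVIATIONS = {
    "a": r"([a-zA-Z0-9])",
    "b": r"([ \t])",
    "c": r"([a-zA-Z])",
    "d": r"([0-9])",
    "h": r"([0-9a-fA-F])",
    "n": r"(\r|(\r?\n))",
    "q": "(\"[^\"]*\"|'[^']*')",
    "w": r"([a-zA-Z]+)",
    "z": r"([0-9]+)",
}


def expand_abbreviations(pattern):
    """Replace each known backslash abbreviation with its full expression."""

    def substitute(match):
        letter = match.group(1)
        # Unknown escapes (such as \. or \+) are left exactly as written.
        return ABBREVIATIONS.get(letter, match.group(0))

    return re.sub(r"\\(.)", substitute, pattern, flags=re.DOTALL)


expanded = expand_abbreviations(r"IMG_\z\.jpg")
print(expanded)
print(re.fullmatch(expanded, "IMG_0042.jpg") is not None)
```

Here the \z abbreviation expands to ([0-9]+), so the resulting pattern matches names such as "IMG_0042.jpg" but not "IMG_.jpg".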

 

Excluding Protected File Types

 

By default, Duplicate File Detective will exclude protected file types from the duplicate search process. To manage protected file types, navigate to the Protection tab of the Preferences window.