Filename Masks


As the scan for duplicate files progresses, each filename encountered is compared against the current project's filename masks. If a file's name does not match the masks, that file is not included in the duplicate comparison process.

 

Filename masks can be defined using one of two syntaxes: basic wildcard patterns or more advanced regular expressions. Use the "Use regular expressions" switch in the filename masks section of the search filtering panel to switch between these two modes.

 

Wildcard Patterns

 

In wildcard pattern mode, a filename mask is a pattern of characters, and multiple masks are separated by semicolons. The wildcard characters are '?' and '*', which match one instance and multiple instances of any character, respectively. Any other (non-wildcard) character matches itself.

 

Further, any filename mask can be preceded by a tilde character ( '~' ), which specifies that the mask is exclusionary. If a filename matches an exclusionary mask, the file will always be skipped.

 

Wildcard Pattern Examples

 

Here are a few examples of wildcard filename masks:

 

mypicture.bmp - This mask has no wildcard characters, and is therefore a literal match. Only files named "mypicture.bmp" will be included in the duplicate comparison process.
*.bmp - This mask uses the asterisk ('*') character to include any files with a ".bmp" extension in their names.
*.bmp;*.gif;*.jpg - This is a compound mask, with individual entries separated by semicolons. This mask will match any ".bmp", ".gif", or ".jpg" files it encounters.
~family*;*.bmp - The first element of this compound mask is preceded by a tilde ('~'), which means that it will exclude any files whose names start with "family". The second element then includes any remaining files with a ".bmp" extension in their names.
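
The matching rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the application's actual implementation. The function names are hypothetical, and three details are assumptions that the text above does not confirm: that '*' may also match zero characters, that matching ignores case, and that a mask list containing only exclusionary masks lets every other file through.

```python
import re

def wildcard_to_regex(mask: str) -> re.Pattern:
    """Translate one wildcard mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # assumption: zero or more characters
        else:
            parts.append(re.escape(ch))  # any other character matches itself
    # Assumption: case-insensitive matching, as is common for Windows filenames.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def is_included(filename: str, masks: str) -> bool:
    """Return True if filename passes a semicolon-separated mask list."""
    include, exclude = [], []
    for mask in masks.split(";"):
        mask = mask.strip()
        if not mask:
            continue
        if mask.startswith("~"):
            exclude.append(wildcard_to_regex(mask[1:]))  # exclusionary mask
        else:
            include.append(wildcard_to_regex(mask))
    # A file that matches any exclusionary mask is always skipped.
    if any(p.fullmatch(filename) for p in exclude):
        return False
    # Assumption: with no inclusion masks, every non-excluded file passes.
    return not include or any(p.fullmatch(filename) for p in include)
```

For example, is_included("family01.bmp", "~family*;*.bmp") is False because the exclusionary mask wins, while is_included("beach.bmp", "~family*;*.bmp") is True.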

 

The filename masks section of the Search Filtering docking pane also includes a Presets button, which you can click for easy access to a range of built-in masks. These can help you to get started quickly. Presets can also be customized to suit your needs.

 

Important Note: When using wildcards in filename masks, keep in mind that a mask of '*.*' is subtly different from just '*'. The former requires that the filename contain a dot (.), while the latter does not. In other words, the *.* pattern will not match filenames that have no extension. To match these as well, use a single asterisk (*) instead.
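
The difference between the two masks can be demonstrated with Python's standard fnmatch module, used here only as a stand-in for the application's own wildcard matcher (it happens to treat '*.*' in the same way the note describes). The sample filenames are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames: one of them has no extension.
names = ["report.txt", "README", "archive.tar.gz"]

# '*.*' requires a literal dot somewhere in the name.
with_dot = [n for n in names if fnmatchcase(n, "*.*")]

# '*' places no requirement on the name at all.
everything = [n for n in names if fnmatchcase(n, "*")]

print(with_dot)
print(everything)
```

Here "README" appears only in the second list, because it contains no dot.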

 

Regular Expressions

 

As noted above, filename masks can also be defined using powerful regular expression syntax. Regular expressions are formulas that can be used to match strings of text that follow some pattern. They allow their users to succinctly express a set of character matching rules that would otherwise require a large number of switches and logical operations.

 

This help file does not provide an in-depth tutorial on writing regular expressions, because many such tutorials are freely available on the web (enter "regular expressions" into your favorite search engine to find them).

 

Please keep in mind, however, that there are subtle differences between the regular expression engines that various applications employ. The tables below provide an overview of the regular expression metacharacters and abbreviations supported by Duplicate File Detective.

 

Metacharacter        Meaning

.        Matches any single character.

[ ]        Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").

^        If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").

-        In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").

?        Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").

+        Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "666", and so on).

*        Indicates that the preceding expression matches zero or more times.

??, +?, *?        Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions which match as much as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>".

( )        Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (such as "1" or "1,23,456").

\        Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see table below). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".

$        At the end of a regular expression, this character matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.

|        Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").

!        Negation operator: the expression following ! does not match the input. Example: a!b matches "a" not followed by "b".
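
Several of the metacharacters above behave the same way in most regular expression engines, so their examples can be tried out in Python's standard re module. This is only a sketch for experimenting. Python's engine is not the one used by Duplicate File Detective, and items such as the ! negation operator and the abbreviations (\a, \z, and so on) do not carry over to Python.

```python
import re

# Greedy versus non-greedy matching on the input from the table.
text = "<abc><def>"
greedy = re.match(r"<.*>", text).group()   # matches as much as possible
lazy = re.match(r"<.*?>", text).group()    # matches as little as possible

# '?' makes the preceding expression optional: one or two digits.
one_or_two_digits = re.compile(r"[0-9][0-9]?")

# Grouping: a comma-separated list of numbers.
number_list = re.compile(r"(\d+,)*\d+")

# '^' and '$' anchor a match to the start and end of the input.
starts_with_abc = re.compile(r"^[abc]")
ends_with_digit = re.compile(r"[0-9]$")

print(greedy, lazy)
print(bool(number_list.fullmatch("1,23,456")))
print(bool(starts_with_abc.search("banana")))
print(bool(ends_with_digit.search("file7")))
```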

 

Abbreviations

 

\a        Any alphanumeric character: ([a-zA-Z0-9])

\b        White space (blank): ([ \\t])

\c        Any alphabetic character: ([a-zA-Z])

\d        Any decimal digit: ([0-9])

\h        Any hexadecimal digit: ([0-9a-fA-F])

\n        Newline: (\r|(\r?\n))

\q        A quoted string: (\"[^\"]*\")|(\'[^\']*\')

\w        A simple word: ([a-zA-Z]+)

\z        An integer: ([0-9]+)
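
Because these abbreviations are specific to this regular expression dialect, a mask that uses them cannot be pasted directly into other tools. The following Python sketch expands each abbreviation into the equivalent expression listed above, so that a mask can be tried out in Python's re module. The function name expand_abbreviations is hypothetical, the outer parentheses around the \q expansion are an addition that keeps its alternation self-contained, and the sketch does not handle abbreviations that appear inside a character class.

```python
import re

# Expansions copied from the abbreviation table above.
ABBREVIATIONS = {
    "a": r"([a-zA-Z0-9])",
    "b": r"([ \t])",
    "c": r"([a-zA-Z])",
    "d": r"([0-9])",
    "h": r"([0-9a-fA-F])",
    "n": r"(\r|(\r?\n))",
    "q": r'''(("[^"]*")|('[^']*'))''',  # outer group added for safe embedding
    "w": r"([a-zA-Z]+)",
    "z": r"([0-9]+)",
}

def expand_abbreviations(pattern: str) -> str:
    """Replace each backslash abbreviation with its expanded expression."""
    def replace(match: re.Match) -> str:
        # Escapes that are not abbreviations (such as \. or \\) are kept as-is.
        return ABBREVIATIONS.get(match.group(1), match.group(0))
    # Consume the pattern one escape pair at a time, so that an escaped
    # backslash followed by a letter is not mistaken for an abbreviation.
    return re.sub(r"\\(.)", replace, pattern)

expanded = expand_abbreviations(r"IMG_\z\.jpg")
print(expanded)
print(bool(re.fullmatch(expanded, "IMG_0042.jpg")))
```

With this helper, a mask such as IMG_\z\.jpg becomes IMG_([0-9]+)\.jpg, which matches "IMG_0042.jpg" but not "IMG_.jpg".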