FDF

Section: User Manuals (1)
Updated: March 2009

NAME

fdf - Utility to discover double files

SYNOPSIS

fdf [-0zfSkvDq] [-o FILE] [-rsxcp] [-l RANGE] { FILE | - } [...]

DESCRIPTION

fdf is a program that scans a set of input files for duplicates and outputs blocks of paths of files which are recognized as having the same content (and which potentially also match other parameters, see the section SPECIFIC OPTIONS below).

Each input may be preceded by a list of command-line options which apply only to that input.

GENERAL OPTIONS

-o FILE: redirect the output of all double files into the specified file, possibly overwriting an existing one
-0: expect file-names supplied via standard input to be separated by 0x00 bytes instead of newlines (0x0a), which is the standard behaviour. This switch is useful when the list of files to scan is produced by e.g. find(1) -print0
-z: separate the output by 0x00-bytes instead of newline. This may be useful if the output is fed to xargs(1) -0
-f: file-names of checked files must be the same for the files to be considered equal
-S: switches fdf to reverse mode, meaning that only those files are output which either don't match all "outer" criteria for double files (file-size and possibly the same basename(1), see switch -f for details), or match them but differ in content from the others, and which therefore have no doubles in the input-lists specified
-k: abort as soon as any error occurs, independently of whether the subsequent processing of files could be affected or not (the standard behaviour just issues a warning in case of a file which cannot be read)
-v: enables verbose output of the operations performed
-D: enables debug output. This is generally not wanted and heavily degrades performance
-q: disable any diagnostic output (standard behaviour is to issue warnings on standard error output when a file cannot be read or is discarded due to other reasons)
-h: prints a help message and quits the program
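
The reason for the -0 and -z switches is that a newline may legally appear inside a file-name, whereas a 0x00 byte may not. The following Python sketch illustrates the framing only; the helper names nul_separated and split_nul are hypothetical and not part of fdf.

```python
import os


def nul_separated(paths):
    """Encode paths as a 0x00-terminated list, as find(1) -print0 would emit it."""
    return b"".join(os.fsencode(path) + b"\0" for path in paths)


def split_nul(data):
    """Split a 0x00-separated byte string back into paths, skipping empty records."""
    return [os.fsdecode(record) for record in data.split(b"\0") if record]


# A name containing a newline survives the round trip unchanged.
names = ["plain.txt", "with\nnewline.txt"]
assert split_nul(nul_separated(names)) == names
```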

SPECIFIC OPTIONS

-r

disables recursive scanning of any directories encountered in the input-list. This switch is especially useful when the input is a file-list which contains directories, e.g. one generated by find(1)

-s

enables following symbolic links, see symlink(7) for details. This mode cannot yet handle recursive symbolic links, as no further sanitizing of the input is built in

-x

discards any files which do not lie on the same file-system as the parent input-file. This switch only has an effect in recursive mode, see -r above

-l RANGE

specifies a range for the allowed file-size of files to be processed. Any other files are discarded silently. A valid RANGE is defined by the following syntax: [SIZE]-[SIZE] where SIZE is a positive integral number which may optionally be suffixed by 'b' for bytes, 'k' for kilo-bytes or 'm', 'g', etc. following this scheme.

SIZE may be omitted and is then assumed to be 0 for the first SIZE and infinity for the second SIZE
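
As an illustration of the RANGE syntax, here is a minimal Python sketch of a parser. It is not taken from the fdf sources. It assumes that each suffix step multiplies by 1024, that only the suffixes 'b', 'k', 'm', 'g' and 't' exist, and that both bounds are inclusive; the manual states none of these, and the function names are hypothetical.

```python
import math
import re

# Assumption: each suffix step is a factor of 1024; the manual does not give the base.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
SIZE_RE = re.compile(r"(\d+)([bkmgt]?)")


def parse_size(text, default):
    """Parse one SIZE; an omitted (empty) SIZE yields the given default."""
    if text == "":
        return default
    match = SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid SIZE: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit or "b"]


def parse_range(text):
    """Parse '[SIZE]-[SIZE]' into a (low, high) pair of byte counts."""
    low, separator, high = text.partition("-")
    if separator == "":
        raise ValueError(f"invalid RANGE: {text!r}")
    return parse_size(low, 0), parse_size(high, math.inf)


def in_range(size, bounds):
    """Return True if size lies within bounds (assumed inclusive at both ends)."""
    low, high = bounds
    return low <= size <= high
```

Under these assumptions, parse_range("4k-") accepts every file of at least 4096 bytes, and parse_range("-1m") every file of at most 1048576 bytes.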

-c

enables scanning of the contents of symbolic links. This means that the destination of a symbolic link is treated as its file-content, so that a symlink and a file (or another symlink) are considered equal if the destination of the former matches the content (or destination) of the latter.

-p

this switch marks the subsequent input-specification as preferred. If a block of doubles contains files from both a preferred input and an input not marked as preferred, only the files from inputs marked with -p are output

If all doubles of a block belong to preferred inputs, or all belong to non-preferred inputs, all double files are output
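
The selection rule for -p can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the code fdf uses; the function name filter_preferred and the (path, is_preferred) tuples are hypothetical.

```python
def filter_preferred(block):
    """Select the entries to output for one block of equal files.

    block is a list of (path, is_preferred) tuples.
    """
    preferred = [entry for entry in block if entry[1]]
    # Mixed block: suppress the entries from non-preferred inputs.
    if preferred and len(preferred) < len(block):
        return preferred
    # All preferred or none preferred: output every double.
    return list(block)
```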

OUTPUT FORMAT

The output of fdf is sectioned into blocks which are each separated from one another by a newline character (0x0a hexadecimal), or 0x00 in case -z was specified. The entries of each block are separated by just one such delimiter and their contents are considered equal. This is determined by an algorithm that runs once over each file, so that the run-time complexity of this (by far the most costly) part of the program is linear, O(n), in the size of the inputs.
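
To illustrate how doubles can be found while reading each file only once, here is a minimal Python sketch. It groups files by size first (one of the "outer" criteria) and then by a SHA-256 digest of the content. The digest is a technique swapped in for this illustration; this manual does not say how fdf itself compares contents, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Read the file once, in chunks, and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def find_doubles(paths):
    """Return blocks (lists of paths) of files with equal size and equal digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    blocks = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a file with a unique size cannot have a double
        by_digest = defaultdict(list)
        for path in candidates:
            by_digest[file_digest(path)].append(path)
        blocks.extend(group for group in by_digest.values() if len(group) > 1)
    return blocks
```

Every file is read at most once, so the work grows linearly with the total number of bytes in the inputs.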

One entry of each block represents one file considered equal to the others of the block. Each entry is separated into 5 parts by TAB characters (0x09):

Input#

the number of the input-specification on the command-line from which this file descended (in recursive mode), beginning with 1

Pref

the preferred-status of the input this file descended from:

'.': the input was not marked preferred
'p': the input was marked as preferred, but no files from non-preferred inputs have been found to be equal to the files of this block
'P': the input was marked as preferred and there have been files from other inputs which weren't marked as such and have therefore been suppressed in the output

Chunk#

the number of the file in this block of equal files

Size

the size of the file in human-readable format

Path

the path of the file
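
A consumer of this format could be sketched in Python as follows. The sketch assumes that an empty record (two consecutive delimiters) marks the boundary between blocks and that Input# and Chunk# are decimal integers; neither detail is spelled out above. The names Entry and parse_output are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    input_no: int  # Input#
    pref: str      # Pref: '.', 'p' or 'P'
    chunk_no: int  # Chunk#
    size: str      # Size, kept as the human-readable string
    path: str      # Path


def parse_output(text, delimiter="\n"):
    """Split fdf output into blocks of entries; use delimiter="\\0" for -z output."""
    blocks, current = [], []
    for record in text.split(delimiter):
        if record == "":
            # Assumption: an empty record ends the current block.
            if current:
                blocks.append(current)
                current = []
            continue
        # Split on the first four TABs only, so a TAB inside the path is preserved.
        input_no, pref, chunk_no, size, path = record.split("\t", 4)
        current.append(Entry(int(input_no), pref, int(chunk_no), size, path))
    if current:
        blocks.append(current)
    return blocks
```

Keeping Size as a string avoids guessing at the exact human-readable notation.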

DIAGNOSTICS

The following diagnostics may be issued on stderr:

won't scan

symbolic link: The file won't be processed because the default behaviour is to ignore symbolic links. This can be changed via the -s and/or -c switches.
directory: The directory won't be processed because it is not a regular file (see option -r for details)
socket or block device or character device or named pipe or unknown file-type (meaning that lstat(2) couldn't retrieve information about the file): The file won't be scanned because no logic has been implemented to handle these file-types.
not on same device as initial file: The file won't be processed because the command-line option -x has been passed for the corresponding input, so that fdf won't cross device-borders.

no files to check

No input files have been given, or the ones given are not regular files

Many more errors may occur, such as memory allocation failures, errors reading a specific file due to I/O-errors, missing read-permissions or even errors during memory mapping of files (see mmap(2) for details).

These errors won't be covered here, as they should only occur when the system running fdf is misconfigured or seriously limited in its abilities. Either prevents this program from running, and there is not much to be done about it.

However, if such errors occur without apparent reason, I'd really like to be informed about it to be able to improve the program.

The same goes for bugs or misspellings (as I'm not a native English speaker).

AUTHOR

fdf was written by Franz Brauße <dev -at- karlchenofhell.org>
