FDF
Section: User Manuals (1)
Updated: March 2009
NAME
fdf - utility to find duplicate files
SYNOPSIS
fdf [-0zfSkvDq] [-o FILE] [-rsxcp] [-l RANGE] { FILE | - } [...]
DESCRIPTION
fdf
is a program that scans a set of input files for duplicates and outputs
blocks of paths of files which are recognized as containing the same content
(and which may also have to match other parameters, see the section
SPECIFIC OPTIONS
below).
Each input may
be preceded by a list of command-line options which are only valid for that
input file.
GENERAL OPTIONS
- -o FILE
-
redirect the list of duplicate files into the specified file, possibly
overwriting an existing one
- -0
-
expect file names supplied via standard input to be separated by 0x00 bytes
instead of newlines (0x0a), which is the default behaviour.
This switch is useful when the list of files to scan is provided by, e.g.,
find(1)
-print0
- -z
-
separate the output by 0x00-bytes instead of newline. This may be useful if
the output is fed to
xargs(1)
-0
- -f
-
file-names of checked files must be the same for the files to be considered
equal
- -S
-
switches
fdf
to reverse mode, meaning that only those files are output which either don't
match all "outer" criteria for duplicates (file size and possibly the same
basename(1),
see switch
-f
for details), or match them but differ in content from the others, and
therefore have no duplicates in the input lists specified
- -k
-
abort as soon as any error occurs, independently of whether the subsequent
processing of files could be affected or not (the standard behaviour just
issues a warning in case of a file which cannot be read)
- -v
-
enables verbose output of the operations performed
- -D
-
enables debug output. This is generally not wanted and heavily degrades
performance
- -q
-
disable any diagnostic output (standard behaviour is to issue warnings on
standard error output when a file cannot be read or is discarded due to other
reasons)
- -h
-
prints a help message and quits the program
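The reason for the 0x00 delimiters of
-0
and
-z
is that a path may legally contain a newline but never a 0x00 byte. A minimal
Python sketch of that framing follows; the helper names join_nul and split_nul
are hypothetical and not part of fdf.

```python
import os


def join_nul(paths):
    """Encode paths as a NUL-terminated byte stream (the layout of find -print0)."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def split_nul(data):
    """Decode a NUL-delimited byte stream back into a list of paths."""
    # The trailing NUL terminates the last record, so drop the empty tail.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


names = ["plain.txt", "with\nnewline.txt"]

# NUL framing round-trips both names unchanged.
print(split_nul(join_nul(names)) == names)   # prints True

# Newline framing splits the second name in two: 3 records for 2 files.
print(len("\n".join(names).split("\n")))     # prints 3
```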
SPECIFIC OPTIONS
- -r
-
disables recursive scanning of any directories encountered in the input list.
This switch is especially useful when the input is a file list that contains
directories, e.g. one generated by
find(1)
- -s
-
enables following symbolic links, see
symlink(7)
for details. This mode cannot yet handle recursive symbolic links as no
further sanitizing of the input is yet built in
- -x
-
discards any files which do not lie on the same file-system as the parent
input-file. This switch only has an effect in recursive mode, see
-r
above
- -l RANGE
-
specifies a range for the allowed file size of files to be processed. Any other
files are discarded silently. A valid
RANGE
is defined by the following syntax:
[SIZE]-[SIZE]
where
SIZE
is a positive integer which may optionally be suffixed by 'b'
for bytes, 'k' for kilobytes or 'm', 'g', etc. following this scheme.
SIZE
may be omitted and is then assumed to be 0 for the first
SIZE
and infinity for the second
SIZE
- -c
-
enables scanning of the contents of symbolic links. This means that the
destination of a symbolic link is treated as its file content, so that
a symlink and a file (or another symlink) are considered equal if the
destination of the former matches the content (or destination) of the latter.
- -p
-
this switch marks the subsequent input specification as preferred. If a block
of duplicates contains files from both a preferred input and an input not
marked as preferred, only the files from inputs marked with
-p
are output.
If all duplicates of a block belong to preferred inputs, or all belong to
non-preferred inputs, every file of the block is output
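The
RANGE
syntax of
-l
can be illustrated with a small Python sketch. It is not taken from the fdf
sources and rests on several assumptions: each suffix step multiplies by 1024
(the manual does not say whether 1024 or 1000 is used), only the suffixes
'b', 'k', 'm' and 'g' are handled, and both bounds are treated as inclusive.
The names parse_range and in_range are hypothetical.

```python
import math
import re

# Assumption: each suffix step multiplies by 1024; fdf may use 1000 instead
# and may accept more suffixes than the four listed here.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_SIZE = r"(\d+)([bkmg]?)"
_RANGE = re.compile(rf"(?:{_SIZE})?-(?:{_SIZE})?")


def parse_range(text):
    """Parse '[SIZE]-[SIZE]' into (low, high) in bytes; high may be math.inf."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid RANGE: {text!r}")
    lo_num, lo_unit, hi_num, hi_unit = match.groups()
    # An omitted first SIZE means 0, an omitted second SIZE means infinity.
    low = int(lo_num) * _UNITS[lo_unit] if lo_num else 0
    high = int(hi_num) * _UNITS[hi_unit] if hi_num else math.inf
    return low, high


def in_range(size, bounds):
    """Return True if size lies within bounds (assumed inclusive on both ends)."""
    low, high = bounds
    return low <= size <= high


print(parse_range("1k-2m"))                   # prints (1024, 2097152)
print(in_range(10, parse_range("1k-")))       # prints False
```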
OUTPUT FORMAT
The output of
fdf
is divided into blocks which are separated from one another by a newline
character (0x0a hexadecimal), or 0x00 in case
-z
was specified. The entries of each block are separated by just one such
delimiter and their contents are considered equal. This is determined by an
algorithm that reads each file only once, so that the run-time complexity of
this (by far the most costly) part of the program is linear,
O(n),
in the total size of the inputs.
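How fdf compares contents internally is not described in this manual. As an
illustration only, the following Python sketch shows one common way to find
duplicates while reading each file exactly once: group the files by size, then
by a SHA-256 digest of their content. The hashing step is a swapped-in
technique, not necessarily what fdf does, and the name find_duplicates is
hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_duplicates(paths, chunk_size=1 << 20):
    """Return groups of paths whose contents have the same size and SHA-256."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    blocks = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Each file is read exactly once, front to back.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            by_digest[digest.digest()].append(path)
        blocks.extend(g for g in by_digest.values() if len(g) >= 2)
    return blocks


# Small demonstration on throw-away files.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"diff", "d.txt": b"longer"}
    files = []
    for name, data in contents.items():
        files.append(os.path.join(tmp, name))
        with open(files[-1], "wb") as handle:
            handle.write(data)
    demo_blocks = [sorted(os.path.basename(p) for p in b) for b in find_duplicates(files)]

print(demo_blocks)   # prints [['a.txt', 'b.txt']]
```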
One entry of each block represents one file considered equal to the others of
the block. Each entry is separated into 5 parts by TAB characters (0x09):
- Input#
-
the number of the input specification on the command line from which this file
descends (in recursive mode), counting from 1
- Pref
-
the preferred-status of the input this file descended from:
-
- '.'
-
the input was not marked preferred
- 'p'
-
the input was marked as preferred, but no files from non-preferred inputs have
been found to be equal to the files of this block
- 'P'
-
the input was marked as preferred and there have been files from other inputs
which weren't marked as such and have therefore been suppressed in the output
- Chunk#
-
the number of the file in this block of equal files
- Size
-
the size of the file in human-readable format
- Path
-
the path of the file
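The format above can be read back with a few lines of Python. This is a sketch
under assumptions: an empty record (two consecutive delimiters) is taken to
mark the boundary between blocks, the Size field is kept as an opaque string,
and the sample text is made up rather than captured from a real fdf run. The
names Entry and parse_output are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    input_no: int   # Input#
    pref: str       # '.', 'p' or 'P'
    chunk_no: int   # Chunk#
    size: str       # human-readable size, kept verbatim
    path: str       # Path


def parse_output(text, delimiter="\n"):
    """Split fdf output into blocks, each a list of Entry records."""
    blocks, current = [], []
    for record in text.split(delimiter):
        if not record:
            # Assumption: an empty record marks the boundary between blocks.
            if current:
                blocks.append(current)
                current = []
            continue
        # maxsplit=4 keeps any TAB characters inside the path intact.
        input_no, pref, chunk_no, size, path = record.split("\t", 4)
        current.append(Entry(int(input_no), pref, int(chunk_no), size, path))
    if current:
        blocks.append(current)
    return blocks


# Made-up sample, not captured from a real fdf run.
sample = (
    "1\tP\t1\t4.0k\t/data/a.txt\n"
    "1\tP\t2\t4.0k\t/data/copy/a.txt\n"
    "\n"
    "2\t.\t1\t12\t/tmp/x\n"
    "3\t.\t2\t12\t/tmp/y\n"
)
blocks = parse_output(sample)
for block in blocks:
    print([entry.path for entry in block])
```

For output produced with
-z
the same function would be called with delimiter="\0", which also keeps paths
containing newlines in one piece.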
DIAGNOSTICS
The following diagnostics may be issued on stderr:
won't scan
-
symbolic link
-
The file won't be processed because the default behaviour is to ignore symbolic
links. This can be changed via the
-s
and/or
-c
switches.
directory
-
The directory won't be processed because it is not a regular file (see option
-r
for details)
socket
or
block device
or
character device
or
named pipe
or
unknown file-type
(meaning that
lstat(2)
couldn't retrieve information about the file)
-
The file won't be scanned because no logic has been implemented to handle these
file-types.
not on same device as initial file
-
The file won't be processed because the command-line option
-x
has been passed for the corresponding input, so that
fdf
won't cross device-borders.
no files to check
-
No input files have been given, or the ones given are not regular files.
Many more errors may occur, such as memory allocation
failure, failure to read a specific file due to I/O errors, missing
read permissions or even errors while memory-mapping files (see
mmap(2)
for details).
These errors are not covered here as they should only occur
when the system running
fdf
is either misconfigured or seriously limited in its abilities, which in turn
prevents this program from running, and there is not much to be done about it.
However, if such errors occur without apparent reason, I'd really like to be
informed about it to be able to improve the program.
The same goes for bugs or misspellings (as I'm not a native English speaker).
AUTHOR
fdf
was written by Franz Brauße <dev -at- karlchenofhell.org>