[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
sort
: Sort text files
sort
sorts, merges, or compares all the lines from the given
files, or standard input if none are given or for a file of
`-'. By default, sort
writes the results to standard
output. Synopsis:
sort [option]... [file]... |
sort
has three modes of operation: sort (the default), merge,
and check for sortedness. The following options change the operation
mode:
A pair of lines is compared as follows: if any key fields have been
specified, sort
compares each pair of fields, in the order
specified on the command line, according to the associated ordering
options, until a difference is found or no fields are left.
Unless otherwise specified, all comparisons use the character
collating sequence specified by the LC_COLLATE
locale.
If any of the global options `Mbdfinr' are given but no key fields
are specified, sort
compares the entire lines according to the
global options.
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), sort
compares the entire
lines. The last resort comparison
honors the `-r' global option. The `-s' (stable) option
disables this last-resort comparison so that lines in which all fields
compare equal are left in their original relative order. If no fields
or global options are specified, `-s' has no effect.
GNU sort
(as specified for all GNU utilities) has no limits on
input line length or restrictions on bytes allowed within lines. In
addition, if the final byte of an input file is not a newline, GNU
sort
silently supplies one. A line's trailing newline is part of
the line for comparison purposes; for example, with no options in an
ASCII locale, a line starting with a tab sorts before an empty line
because tab precedes newline in the ASCII collating sequence.
Upon any error, sort
exits with a status of `2'.
If the environment variable TMPDIR
is set, sort
uses its
value as the directory for temporary files instead of `/tmp'. The
`-T tempdir' option in turn overrides the environment
variable.
The following options affect the ordering of output lines. They may be
specified globally or as part of a specific key field. If no key
fields are specified, global options apply to comparison of entire
lines; otherwise the global options are inherited by key fields that do
not specify any special options of their own. The `-b', `-d',
`-f' and `-i' options classify characters according to
the LC_CTYPE
locale.
strtod
to convert
a prefix of each line to a double-precision floating point number.
This allows floating point numbers to be specified in scientific notation,
like 1.0e-34
and 10e100
.
Do not report overflow, underflow, or conversion errors.
Use the following collating sequence:
Use this option only if there is no alternative; it is much slower than `-n' and it can lose information when converting to floating point.
LC_TIME
locale
determines the month spellings.
LC_NUMERIC
locale specifies the radix character and thousands separator.
sort -n
uses what might be considered an unconventional method
to compare strings representing floating point numbers. Rather than
first converting each string to the C double
type and then
comparing those values, sort aligns the radix characters in the two
strings and compares the strings a character at a time. One benefit
of using this approach is its speed. In practice this is much more
efficient than performing the two corresponding string-to-double (or even
string-to-integer) conversions and then comparing doubles. In addition,
there is no corresponding loss of precision. Converting each string to
double
before comparison would limit precision to about 16 digits
on most systems.
Neither a leading `+' nor exponential notation is recognized. To compare such strings numerically, use the `-g' option.
Other options are:
sort
copies
it to a temporary file before sorting and writing the output to
output-file.
sort
breaks it
into fields ` foo' and ` bar'. The field separator is
not considered to be part of either the field preceding or the field
following.
In addition, when GNU sort
is invoked with exactly one argument,
options `--help' and `--version' are recognized. See section 2. Common options.
Historical (BSD and System V) implementations of sort
have
differed in their interpretation of some options, particularly
`-b', `-f', and `-n'. GNU sort follows the POSIX
behavior, which is usually (but not always!) like the System V behavior.
According to POSIX, `-n' no longer implies `-b'. For
consistency, `-M' has been changed in the same way. This may
affect the meaning of character positions in field specifications in
obscure cases. The only fix is to add an explicit `-b'.
A position in a sort field specified with the `-k' or `+' option has the form `f.c', where f is the number of the field to use and c is the number of the first character from the beginning of the field (for `+pos') or from the end of the previous field (for `-pos'). If the `.c' is omitted, it is taken to be the first character in the field. If the `-b' option was specified, the `.c' part of a field specification is counted from the first nonblank character of the field (for `+pos') or from the first nonblank character following the previous field (for `-pos').
A sort key option may also have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the `+pos' and `-pos' parts of a field specification, and if it is inherited from the global options it will be attached to both. Keys may span multiple fields.
Here are some examples to illustrate various combinations of options. In them, the POSIX `-k' option is used to specify sort keys rather than the obsolete `+pos1-pos2' syntax.
sort -nr |
Sort alphabetically, omitting the first and second fields. This uses a single key composed of the characters beginning at the start of field three and extending to the end of each line.
sort -k3 |
sort -t : -k 2,2n -k 5.3,5.4 |
Note that if you had written `-k 2' instead of `-k 2,2' `sort' would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
Also note that the `n' modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify `-k 2n,2' or `-k 2n,2n'. All modifiers except `b' apply to the associated field, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
sort -t : -k 5b,5 -k 3,3n /etc/passwd |
An alternative is to use the global numeric modifier `-n'.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd |
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append |
The use of `-print0', `-z', and `-0' in this case mean that pathnames that contain Line Feed characters will not get broken up by the sort operation.
Finally, to ignore both leading and trailing white space, you could have applied the `b' modifier to the field-end specifier for the first key,
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd |
or by using the global `-b' modifier instead of `-n' and an explicit `n' with the second key specifier.
sort -t : -b -k 5,5 -k 3,3n /etc/passwd |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |