Limitations of Usual Tools
The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.
awk
Don't leave white space before the parentheses in user function calls; GNU awk rejects it:

    $ gawk 'function die () { print "Aaaaarg!" }
            BEGIN { die () }'
    gawk: cmd. line:2:         BEGIN { die () }
    gawk: cmd. line:2:                      ^ parse error
    $ gawk 'function die () { print "Aaaaarg!" }
            BEGIN { die() }'
    Aaaaarg!

If you want your program to be deterministic, don't depend on the order in which for (i in arr) traverses an array:

    $ cat for.awk
    END {
      arr["foo"] = 1
      arr["bar"] = 1
      for (i in arr)
        print i
    }
    $ gawk -f for.awk </dev/null
    foo
    bar
    $ nawk -f for.awk </dev/null
    bar
    foo

Some AWK implementations, such as HP-UX 11.0's native one, have regex engines that mishandle inner anchors:

    $ echo xfoo | $AWK '/foo|^bar/ { print }'
    $ echo bar | $AWK '/foo|^bar/ { print }'
    bar
    $ echo xfoo | $AWK '/^bar|foo/ { print }'
    xfoo
    $ echo bar | $AWK '/^bar|foo/ { print }'
    bar

Either do not depend on such patterns (i.e., use /^(.*foo|bar)/), or use a simple test to reject such AWK implementations.
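A simple test of the kind mentioned above could look like the following sketch. It assumes the candidate command is held in a variable named AWK (as in the transcript above); the result variable awk_inner_anchors is a hypothetical name chosen for illustration.

```shell
# AWK is assumed to hold the candidate awk command; default to plain awk.
AWK=${AWK-awk}

# A sound regex engine matches "xfoo" against /foo|^bar/ and prints it;
# a fragile one, as described above, prints nothing.
if test "`echo xfoo | $AWK '/foo|^bar/ { print }'`" = xfoo; then
  awk_inner_anchors=ok
else
  awk_inner_anchors=broken
fi
echo "inner anchors: $awk_inner_anchors"
```

A configure-style script could then refuse the candidate, or try the next one, when the result is broken.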
cat
Don't rely on any option. The option -v, which displays non-printing characters, seems portable, though.
cc
When a compilation such as cc foo.c -o foo fails, some compilers (such as CDS on Reliant UNIX) leave a foo.o.
HP-UX cc doesn't accept .S files to preprocess and assemble. cc -c foo.S will appear to succeed, but in fact does nothing.
cmp
cmp performs a raw data comparison of two files, while diff compares two text files. Therefore, if you might compare DOS files, even if only checking whether two files are different, use diff to avoid spurious differences due to differences of newline encoding.
cp
Some cp implementations do not support -f, although the mv from the same system does. It's possible to deduce why mv and cp differ with respect to -f. mv prompts by default before overwriting a read-only file; cp does not. Therefore, mv requires a -f option, but cp does not. mv and cp behave differently with respect to read-only files because the simplest form of cp cannot overwrite a read-only file, but the simplest form of mv can. This is because cp opens the target for write access, whereas mv simply calls link (or, in newer systems, rename).
date
Some versions of date do not recognize special % directives, and unfortunately, instead of complaining, they just pass them through, and exit with success:

    $ uname -a
    OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
    $ date "+%s"
    %s

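Because such a date exits with success, the exit status says nothing; the output itself has to be inspected. The following is a minimal sketch of that idea for the %s directive shown above. The variable names now and date_has_epoch are hypothetical.

```shell
# Ask for seconds since the epoch; an old date may just echo "%s" back.
now=`date "+%s" 2>/dev/null`

# Accept the answer only if it is a non-empty string of digits.
case $now in
  '' | *[!0-9]*) date_has_epoch=no ;;
  *)             date_has_epoch=yes ;;
esac
echo "date understands %s: $date_has_epoch"
```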
diff
The option -u is nonportable.
Some implementations, such as Tru64's, fail when comparing to /dev/null. Use an empty file instead.
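As an illustration of the advice above, the sketch below compares a file against an explicitly created empty file rather than /dev/null. The scratch directory name and the variable file_is_empty are assumptions made for the example.

```shell
# Work in a private scratch directory (name is illustrative only).
dir=${TMPDIR-/tmp}/difftest$$
mkdir "$dir" || exit 1

echo 'some text' > "$dir/file"
# An explicitly created empty file stands in for /dev/null.
: > "$dir/empty"

if diff "$dir/empty" "$dir/file" >/dev/null 2>&1; then
  file_is_empty=yes
else
  file_is_empty=no
fi
rm -rf "$dir"
echo "file is empty: $file_is_empty"
```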
dirname
Not all hosts have a working dirname, and you should instead use AS_DIRNAME (see Programming in M4sh). For example:

    dir=`dirname "$file"`       # This is not portable.
    dir=`AS_DIRNAME(["$file"])` # This is more portable.

This handles a few subtleties in the standard way required by POSIX. For example, under UN*X, should dirname //1 give /? Paul Eggert answers:

    No, under some older flavors of Unix, leading // is a special path name: it refers to a "super-root" and is used to access other machines' files. Leading ///, ////, etc. are equivalent to /; but leading // is special. I think this tradition started with Apollo Domain/OS, an OS that is still in use on some older hosts.

    POSIX allows but does not require the special treatment for //. It says that the behavior of dirname on path names of the form //([^/]+/*)? is implementation defined. In these cases, GNU dirname returns /, but it's more portable to return // as this works even on those older flavors of Unix.
egrep
The empty alternative is not portable; use ? instead. For instance with Digital Unix v5.0:

    > printf "foo\n|foo\n" | egrep '^(|foo|bar)$'
    |foo
    > printf "bar\nbar|\n" | egrep '^(foo|bar|)$'
    bar|
    > printf "foo\nfoo|\n|bar\nbar\n" | egrep '^(foo||bar)$'
    foo
    |bar

egrep also suffers the limitations of grep.
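The rewrite suggested above can be sketched as follows: the pattern ^(|foo|bar)$ becomes ^(foo|bar)?$, which matches the same lines without an empty alternative. The EGREP variable and the grep -E probe are assumptions added so the example also runs on hosts that no longer ship egrep; match_count is a hypothetical name.

```shell
# Pick an extended-regex grep; the "grep -E" probe is an assumption of
# this sketch, not something the text above prescribes.
if echo a | (grep -E '(a|b)') >/dev/null 2>&1; then
  EGREP='grep -E'
else
  EGREP=egrep
fi

# Lines "foo", "" and "bar" should match; "|foo" should not.
match_count=`printf 'foo\n|foo\n\nbar\n' | $EGREP -c '^(foo|bar)?$'`
echo "matching lines: $match_count"
```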
expr
No expr keyword starts with x, so use expr x"word" : 'xregex' to keep expr from misinterpreting word.
Don't use length, substr, match and index.
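To make the x prefix concrete, here is a small sketch in which the word being tested happens to be length, one of the keywords listed above. The variable names word and starts_with_len are hypothetical.

```shell
# "length" alone could be taken for the expr keyword; "xlength" cannot.
word=length

# Prefix both the string and the regex with x, as recommended above.
if expr x"$word" : 'xlen' >/dev/null; then
  starts_with_len=yes
else
  starts_with_len=no
fi
echo "starts with len: $starts_with_len"
```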
expr (|)
You can use |. Although POSIX does require that expr '' return the empty string, it does not specify the result when you | together the empty string (or zero) with the empty string. For example:

    expr '' \| ''

GNU/Linux and POSIX.2-1992 return the empty string for this case, but traditional Unix returns 0 (Solaris is one such example). In the latest POSIX draft, the specification has been changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem does arise when the empty string results from a computation, as in:

    expr bar : foo \| foo : bar

Avoid this portability problem by avoiding the empty string.
expr (:)
Don't use \?, \+ and \| in patterns; they are not supported on Solaris.
The POSIX.2-1992 standard is ambiguous as to whether expr a : b (and expr 'a' : '\(b\)') output 0 or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns 0.
You may believe that one means to get a uniform behavior would be to use the empty string as a default value:

    expr a : b \| ''

Unfortunately this behaves exactly as the original expression; see the expr (|) entry for more information.
expr (:)
Older expr implementations (e.g. SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on echo|sed if expr fails.
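One way to arrange the echo|sed fallback is sketched below, extracting the directory part of a path. The path and the variable names file and dir are made up for the example. Capturing the result in a variable, rather than letting both commands print, means only one of the two answers is ever kept.

```shell
# Illustrative input only.
file=/usr/local/lib/libfoo.a

# Try expr first; if it fails, redo the same match with sed.
# sed -n with the p flag prints only when the substitution succeeded.
dir=`expr x"$file" : 'x\(.*\)/[^/]*$' 2>/dev/null` ||
dir=`echo x"$file" | sed -n 's,^x\(.*\)/[^/]*$,\1,p'`
echo "directory part: $dir"
```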
Don't leave, there is some more!
The QNX 4.25 expr, in addition to preferring 0 to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!

    $ val=`expr 'a' : 'a'`; echo "$?: $val"
    0: 1
    $ val=`expr 'a' : 'b'`; echo "$?: $val"
    1: 0
    $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
    1: a
    $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
    1: 0

In practice this can be a big problem if you are ready to catch failures of expr programs with some other method (such as using sed), since you may get the result twice. For instance

    $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'

will output a on most hosts, but aa on QNX 4.25. A simple workaround consists of testing expr and using a variable set to expr or to false according to the result.
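The workaround just described might be sketched like this. The variable name as_expr is an assumption chosen for the example; when it is set to false, the sed branch is the only one that produces output.

```shell
# Probe expr once: a sane expr exits 0 here, QNX 4.25's exits 1.
if expr a : '\(a\)' >/dev/null 2>&1; then
  as_expr=expr
else
  as_expr=false   # false ignores its arguments and always fails
fi

# Exactly one of the two commands now prints the result.
val=`$as_expr a : '\(a\)' || echo a | sed 's/^\(a\)$/\1/'`
echo "result: $val"
```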
find
The option -maxdepth seems to be GNU specific. Tru64 v5.1, NetBSD 1.5 and Solaris 2.5 find commands do not understand it.
The replacement of {} is guaranteed only if the argument is exactly {}, not if it's only a part of an argument. For instance on DU, and HP-UX 10.20 and HP-UX 11:

    $ touch foo
    $ find . -name foo -exec echo "{}-{}" \;
    {}-{}

while GNU find reports ./foo-./foo.
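One possible way to get the ./foo-./foo result while keeping {} as a whole argument is to hand the file name to a small sh -c script, as in the sketch below. This technique is an assumption of the example rather than something the text above recommends, and the scratch directory and variable names are hypothetical.

```shell
# Work in a private scratch directory (name is illustrative only).
dir=${TMPDIR-/tmp}/findtest$$
mkdir "$dir" || exit 1
: > "$dir/foo"

# {} appears alone as an argument; the inner script sees it as $1.
result=`cd "$dir" && find . -name foo -exec sh -c 'echo "$1-$1"' sh {} \;`
rm -rf "$dir"
echo "$result"
```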
grep
Don't use grep -s to suppress output, because grep -s on System V does not suppress output, only error messages. Instead, redirect the standard output and standard error (in case the file doesn't exist) of grep to /dev/null. Check the exit status of grep to determine whether it found a match.
Don't use multiple regexps with -e, as some grep implementations will only honor the last pattern (e.g., IRIX 6.5 and Solaris 2.5.1). Anyway, Stardent Vistra SVR4 grep lacks -e... Instead, use alternation and egrep.
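The redirect-and-check-status idiom described above can be sketched as follows. The scratch file name and the variables found and found_missing are hypothetical.

```shell
# Illustrative scratch file.
tmp=${TMPDIR-/tmp}/greptest$$
echo 'hello world' > "$tmp"

# Discard both output streams and rely on the exit status alone.
if grep hello "$tmp" >/dev/null 2>&1; then
  found=yes
else
  found=no
fi

# A nonexistent file also yields a quiet failure.
if grep hello "$tmp.missing" >/dev/null 2>&1; then
  found_missing=yes
else
  found_missing=no
fi
rm -f "$tmp"
echo "existing file: $found, missing file: $found_missing"
```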
ln
Don't rely on ln having a -f option. Symbolic links are not available on old systems; use ln as a fallback.
For versions of DJGPP before 2.04, ln emulates soft links for executables by generating a stub that in turn calls the real program. This feature also works with nonexistent files like in the Unix spec. So ln -s file link will generate link.exe, which will attempt to call file.exe if run. But this feature only works for executables, so cp -p is used instead for these systems. DJGPP versions 2.04 and later have full symlink support.
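The fallback chain described above (ln -s, then ln, then cp -p) could be probed roughly as follows. This is only a sketch: the variable name LN_S and the scratch directory are assumptions, and it does not detect the DJGPP stub case, which would need an extra check for a generated .exe file.

```shell
# Work in a private scratch directory (name is illustrative only).
dir=${TMPDIR-/tmp}/lntest$$
mkdir "$dir" || exit 1
echo data > "$dir/file"

# Prefer symbolic links, then hard links, then a plain copy.
if (cd "$dir" && ln -s file sym) 2>/dev/null; then
  LN_S='ln -s'
elif (cd "$dir" && ln file hard) 2>/dev/null; then
  LN_S=ln
else
  LN_S='cp -p'
fi
rm -rf "$dir"
echo "link command: $LN_S"
```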
mv
The only portable options are -f and -i.
Moving individual files between file systems is portable (it was in V6), but it is not always atomic: when doing mv new existing, there's a critical section where neither the old nor the new version of existing actually exists.
Moving directories across mount points is not portable; use cp and rm.
Moving/Deleting open files isn't portable. The following can't be done on DOS/WIN32:

    exec > foo
    mv foo bar

nor can

    exec > foo
    rm -f foo

sed
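Patterns should not include the separator (unless escaped), even as part of a character class. Some sed implementations will reject s/[^/]*$//: use s,[^/]*$,,.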
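As a short illustration of the comma-separated form, the sketch below strips the last component of a path. The path and the variable name base_dir are made up for the example.

```shell
# With "," as the separator, "/" needs no escaping inside the bracket
# expression, so the pattern stays acceptable to stricter seds.
base_dir=`echo /usr/local/lib/libfoo.a | sed 's,[^/]*$,,'`
echo "$base_dir"
```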
Sed scripts should not use branch labels longer than 8 characters and should not contain comments.
Don't include extra ;, as some sed implementations, such as NetBSD 1.4.2's, try to interpret the second as a command:

    $ echo a | sed 's/x/x/;;s/x/x/'
    sed: 1: "s/x/x/;;s/x/x/": invalid command code ;

Input lines should be of reasonable length, since some sed implementations have an input buffer limited to 4000 bytes.
Alternation, \|, is common but POSIX.2 does not require its support, so it should be avoided in portable scripts. Solaris 8 sed does not support alternation; e.g. sed '/a\|b/d' deletes only lines that contain the literal string a|b.
Anchors (^ and $) inside groups are not portable.
Nested parenthesization in patterns (e.g., \(\(a*\)b*\)) is quite portable to modern hosts, but is not supported by some older sed implementations like SVR3.
Of course the option -e is portable, but it is not needed. No valid Sed program can start with a dash, so it does not help disambiguate. Its sole use is to help enforce indentation, as in:

    sed -e instruction-1 \
        -e instruction-2

as opposed to

    sed instruction-1;instruction-2

Contrary to yet another urban legend, you may portably use & in the replacement part of the s command to mean "what was matched". All descendants of Bell Labs' V7 sed (at least; we don't have first-hand experience with older seds) have supported it.
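For instance, the following sketch uses & to wrap the matched text in brackets. The variable name bracketed is hypothetical.

```shell
# & stands for the text matched by the pattern, here the trailing "oo".
bracketed=`echo foo | sed 's/o*$/[&]/'`
echo "$bracketed"
```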
sed (t)
Some old systems have sed implementations that "forget" to reset their t flag when starting a new cycle. For instance on MIPS RISC/OS, and on IRIX 5.3, if you run the following sed script (the trailing labels are not actually part of the text):

    s/keep me/kept/g  # a
    t end             # b
    s/.*/deleted/g    # c
    : end             # d

on

    delete me         # 1
    delete me         # 2
    keep me           # 3
    delete me         # 4

you get

    deleted
    delete me
    kept
    deleted

instead of

    deleted
    deleted
    kept
    deleted

Why? When processing line 1, a fails to match, so b does not jump; c then matches and sets the t flag, and the output is produced. When processing line 2, the t flag is still set (this is the bug). Line a fails to match, but sed is not supposed to clear the t flag when a substitution fails. Line b sees that the flag is set, therefore it clears it, and jumps to d, hence you get delete me instead of deleted. When processing line 3, the t flag is clear, a matches, so the flag is set, hence b clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.
There are two things one should remember about t in sed.
Firstly, always remember that t jumps if some substitution succeeded, not only the immediately preceding substitution; therefore, always use a fake t clear; : clear to reset the t flag where needed.
Secondly, you cannot rely on sed to clear the flag at each new cycle.
One portable implementation of the script above is:

    t clear
    : clear
    s/keep me/kept/g
    t end
    s/.*/deleted/g
    : end

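To try the portable script on the four sample lines, one could feed it to sed as separate -e fragments, as in this sketch. Splitting the script across -e options and the variable name result are choices made for the example only.

```shell
# Each -e fragment is one line of the portable script above.
result=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' |
  sed -e 't clear' \
      -e ': clear' \
      -e 's/keep me/kept/g' \
      -e 't end' \
      -e 's/.*/deleted/g' \
      -e ': end'`
echo "$result"
```

On a sed that resets the t flag correctly, the leading t clear is a harmless no-op, so the output matches the expected listing either way.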
touch
On some old systems, touch or any command that results in an empty file does not update the timestamps, so use a command like echo as a workaround.
GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume.
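The echo workaround mentioned above amounts to writing a non-empty stamp file instead of touching an empty one. A minimal sketch, with a made-up file name and variable:

```shell
# Illustrative stamp file; writing content avoids the empty-file case.
stamp=${TMPDIR-/tmp}/stamp$$
echo timestamp > "$stamp"

# test -s succeeds only for a file that exists and is not empty.
if test -s "$stamp"; then
  stamp_nonempty=yes
else
  stamp_nonempty=no
fi
rm -f "$stamp"
echo "stamp file non-empty: $stamp_nonempty"
```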