sentence-end
The symbol sentence-end is bound to the pattern that marks the end of a sentence. What should this regular expression be?
Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:
[.?!]
However, we do not want forward-sentence merely to jump to a period, a question mark, or an exclamation mark, because such a character might be used in the middle of a sentence. A period, for example, is used after abbreviations. So other information is needed.
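A small sketch may make the problem concrete. The sample text is made up for illustration; string-match returns the position of the first match, counting from zero.

```elisp
;; The bare character set stops at the period after "Mr", which marks
;; an abbreviation rather than the end of the sentence.
(let ((text "Mr. Smith arrived.  He sat down."))
  (string-match "[.?!]" text))        ; => 2
```

The real end of the first sentence is the period after "arrived", so the character set on its own is not enough.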
According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives.
This group of alternatives will look like this:
\\($\\|	\\|  \\)
       ^   ^^
      TAB  SPC
Here, $ indicates the end of the line, and I have pointed out where the tab and two spaces are inserted in the expression. Both are inserted by putting the actual characters into the expression.
Two backslashes, \\, are required before the parentheses and vertical bars: the first backslash quotes the following backslash in Emacs; and the second indicates that the following character, the parenthesis or the vertical bar, is special.
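The group of alternatives can be tried on its own. The following is only a sketch: my-sentence-gap is a hypothetical name, and the tab is written with the escape "\t" instead of the literal character so that it stays visible on the page. Both spellings produce the same string.

```elisp
;; `my-sentence-gap' is a hypothetical name used only for this sketch.
;; "\t" is Lisp read syntax for a tab; it yields the same string as
;; typing the actual tab character between the vertical bars.
(defvar my-sentence-gap "\\($\\|\t\\|  \\)"
  "Match the end of a line, a tab, or two spaces.")

;; Each doubled backslash is a single backslash in the string itself,
;; so "\\(" holds just two characters: a backslash and a parenthesis.
(length "\\(")                                ; => 2

(string-match my-sentence-gap "end.  Next")   ; => 4, the two spaces
(string-match my-sentence-gap "end.\tNext")   ; => 4, the tab
(string-match my-sentence-gap "end. next")    ; => 9, only `$' at the end
```

In the last call there is neither a tab nor a pair of spaces, so the only alternative that can match is $, at the very end of the string.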
Also, the end of a sentence may be followed by any number of carriage returns (that is, the newline characters you insert by typing <RET>), which the pattern matches like this:
[
]*
Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the <RET> is repeated zero or more times.
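Because a literal line break inside a pattern is easy to overlook, here is a brief sketch comparing it with the escape "\n", which is the read syntax for the same character. The sample text is invented for illustration.

```elisp
;; A newline typed inside the quotes and the escape "\n" read as the
;; same one-character string, so both spellings give the same pattern.
(string= "[
]*" "[\n]*")                          ; => t

;; The pattern consumes both newlines at the start of this string.
(let ((text "\n\nNext paragraph."))
  (string-match "[\n]*" text)
  (match-end 0))                      ; => 2
```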
But a sentence end does not consist only of a period, a question mark or an exclamation mark followed by appropriate space: a closing quotation mark or a closing brace of some kind may precede the space. Indeed more than one such mark or brace may precede the space. These require an expression that looks like this:
[]\"')}]*
In this expression, the first ] is the first character in the character set, where it stands for itself rather than closing the set; the second character is ", which is preceded by a \ to tell Emacs the " is not special. The last three characters are ', ), and }.
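As a rough check of this character set, the sketch below applies it to a made-up string that begins with three closing marks. The names closers and text are local to the example.

```elisp
;; The string starts with a double quote, a parenthesis, and a square
;; bracket, followed by two spaces.  The set matches all three marks
;; and stops at the first space.
(let ((closers "[]\"')}]*")
      (text "\")]  rest"))
  (string-match closers text)
  (match-end 0))                      ; => 3
```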
All this suggests what the regular expression pattern for matching the end of a sentence should be; and, indeed, if we evaluate sentence-end we find that it returns the following value:
sentence-end => "[.?!][]\"')}]*\\($\\|	\\|  \\)[
]*"
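To see the pieces working together, the pattern can be rebuilt from its four parts and tried on a sample string. This is a sketch under assumptions: my-sentence-end is a hypothetical name, the tab and newline are written as "\t" and "\n" for visibility, and the sample text is invented. Depending on your version of Emacs, evaluating sentence-end itself may not return exactly the string shown above, which is another reason to use a separate variable here.

```elisp
;; `my-sentence-end' is a hypothetical copy of the pattern, assembled
;; from the parts discussed in this section.
(defvar my-sentence-end
  (concat "[.?!]"               ; terminating punctuation
          "[]\"')}]*"           ; any closing quotes or brackets
          "\\($\\|\t\\|  \\)"   ; end of line, tab, or two spaces
          "[\n]*")              ; any following newlines
  "Sketch of a sentence-end pattern built from its parts.")

;; The period after "Mr" is followed by a single space, so it is
;; skipped.  The first match is the period after "Smith" together
;; with the two spaces that follow it.
(let ((text "He met Mr. Smith.  \"Hello!\"  she said."))
  (string-match my-sentence-end text)
  (list (match-beginning 0) (match-end 0)))   ; => (16 19)
```

Writing the pattern with concat keeps each part on its own line with a comment, which is easier to read than one long string, although the resulting value is the same.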