The sendmail program views the text that makes up rules and addresses as being composed of individual tokens. Rules are tokenized - divided up into individual parts - while the configuration file is being read and while they are being normalized. Addresses are tokenized at another time (as we'll show later), but the process is the same for both.
The text our.domain, for example, is composed of three tokens: our, a dot, and domain. These 10 characters are divided into tokens by the list of separation characters defined by the OperatorChars (pre-V8.7 $o) option (see Section 34.8.45, OperatorChars or $o):
Do.:%@!^=/[]                  prior to V8.7
O OperatorChars=.:%@!^/[]     V8.7 and above
When any of these separation characters is recognized in text, it is considered an individual token. Any leftover text between separation characters is then combined into the remaining tokens.
xxx@yyy;zzz becomes xxx @ yyy;zzz
The @ is defined to be a token, but the ; is not. Therefore, the text is divided into three tokens. However, in addition to the characters in the OperatorChars (pre-V8.7 $o) option, sendmail defines 10 tokenizing characters internally (in parseaddr.c):
()<>,;\"\r\n
These two lists are combined into one master list that is used for all tokenizing. The above example, when divided by using this master list, becomes five tokens instead of just three:
xxx@yyy;zzz becomes xxx @ yyy ; zzz
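The splitting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not sendmail's actual code: the function name tokenize and the constant names are hypothetical, and the separator lists are copied from the text (the V8.7 OperatorChars value and the 10 internal characters).

```python
# Assumption: the V8.7 OperatorChars value shown in the text.
OPERATOR_CHARS = ".:%@!^/[]"
# The 10 internal tokenizing characters the text attributes to parseaddr.c:
# ( ) < > , ; backslash, double quote, carriage return, newline.
INTERNAL_CHARS = "()<>,;\\\"\r\n"
MASTER_LIST = OPERATOR_CHARS + INTERNAL_CHARS


def tokenize(text, separators):
    """Make each separator its own token; combine leftover text into tokens."""
    tokens = []
    current = ""
    for ch in text:
        if ch in separators:
            if current:               # finish the text token in progress
                tokens.append(current)
                current = ""
            tokens.append(ch)         # the separator is a token by itself
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(tokenize("xxx@yyy;zzz", OPERATOR_CHARS))  # ['xxx', '@', 'yyy;zzz']
print(tokenize("xxx@yyy;zzz", MASTER_LIST))     # ['xxx', '@', 'yyy', ';', 'zzz']
```

The first call uses only the OperatorChars list and yields three tokens; the second uses the combined master list and yields five.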
In rules, quotation marks can be used to override the meaning of tokenizing characters defined in the master list. For example,
"xxx@yyy";zzz becomes "xxx@yyy" ; zzz
Here, three tokens are produced, because the @ appears inside quotation marks. Note that the quotation marks are retained.
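The effect of quotation marks can be sketched the same way. The following is a simplified model, not sendmail's parser: the name tokenize_quoted is hypothetical, backslash escapes inside quotes are ignored, and the quotation mark is treated only as a toggle that is kept in the token.

```python
# Assumption: master list from the text, minus the double quote, which this
# sketch handles as a quoting toggle rather than as a separator.
SEPARATORS = ".:%@!^/[]" + "()<>,;\\\r\n"


def tokenize_quoted(text, separators=SEPARATORS):
    """Split on separators, except where they appear inside double quotes."""
    tokens = []
    current = ""
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch             # the quotation marks are retained
        elif ch in separators and not in_quotes:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch             # includes separators inside quotes
    if current:
        tokens.append(current)
    return tokens


print(tokenize_quoted('"xxx@yyy";zzz'))  # ['"xxx@yyy"', ';', 'zzz']
```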
Because the configuration file is read sequentially from start to finish, the OperatorChars (pre-V8.7 $o) option should be defined before any rules are declared. But note that beginning with V8.7 sendmail, omission of this option causes the separation characters to default to:
. : % @ ! ^ / [ ]
As we progress into the details of rules, you will see that certain characters become operators when prefixed with a $ character. Operators cause sendmail to perform actions, such as looking for a match ($* is a wildcard operator) or replacing tokens with others by position ($1 is a replacement operator).
For tokenizing purposes, operators always divide one token from another, just as the characters in the master list do. For example:
xxx$*zzz becomes xxx $* zzz
The space character is special for two reasons. First, although the space character is not in the master list, it always separates one token from another:
xxx zzz becomes xxx zzz
Second, although the space character separates tokens, it is not itself a token. That is, in the above example the seven characters on the left (the seventh is the space in the middle) become two tokens of three letters each, not three tokens. Therefore the space character can be used inside the LHS or RHS of rules for improved clarity but does not itself become a token or change the meaning of the rule.
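Operators and spaces can be added to the same kind of sketch. This is an illustrative model under stated assumptions, not sendmail's implementation: the name tokenize_rule is hypothetical, every operator is assumed to be exactly a $ plus one following character (as with $* and $1), and quoting is ignored.

```python
# Assumption: master list as given in the text (V8.7 OperatorChars plus the
# 10 internal characters).
MASTER_LIST = ".:%@!^/[]" + "()<>,;\\\"\r\n"


def tokenize_rule(text):
    """Tokenize rule text: operators and separators are tokens, spaces are not."""
    tokens = []
    pending = []                      # characters of the text token being built

    def flush():
        if pending:
            tokens.append("".join(pending))
            pending.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "$" and i + 1 < len(text):
            flush()
            tokens.append(text[i:i + 2])   # operator: $ plus one character
            i += 2
            continue
        if ch == " ":
            flush()                   # a space ends a token but is discarded
        elif ch in MASTER_LIST:
            flush()
            tokens.append(ch)
        else:
            pending.append(ch)
        i += 1
    flush()
    return tokens


print(tokenize_rule("xxx$*zzz"))  # ['xxx', '$*', 'zzz']
print(tokenize_rule("xxx zzz"))   # ['xxx', 'zzz']
```

In this model, "$*@$+" and "$* @ $+" produce the same three tokens, which is why spaces can be added to the LHS or RHS for readability.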
After an address has passed through all the rules (and has been modified by rewriting), the tokens that form it are pasted back together to form a single string. The pasting process is very straightforward in that it mirrors the tokenizing process:
xxx @ yyy becomes xxx@yyy
The only exception to this straightforward pasting process occurs when two adjoining tokens are both simple text. Simple text is anything other than the separation characters (defined by the OperatorChars (pre-V8.7 $o) option, see Section 34.8.45, and internally by sendmail) or the operators (characters prefixed by a $ character). The xxx and yyy above are both simple text.
When two tokens of simple text are pasted together, the character defined by the BlankSub (B) option (see Section 34.8.5, BlankSub (B)) is inserted between them. [4] Usually, that option is defined as a dot, so two tokens of simple text would have a dot inserted between them when they are joined:
xxx yyy becomes xxx.yyy
[4] In the old days (RFC733), usernames to the left of the @ could contain spaces. But UNIX also uses spaces as command-line argument separators, so option B was introduced.
Note that the improper use of a space character in the LHS or RHS of rules can lead to addresses that have a dot (or other character) inserted where one was not intended.
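The pasting step, including the BlankSub exception, can be sketched as follows. This is a simplified illustration rather than sendmail's code: the names paste and is_simple_text are hypothetical, and the blank_sub parameter stands in for the BlankSub (B) option, assumed here to be a dot.

```python
# Assumption: master list as given in the text (V8.7 OperatorChars plus the
# 10 internal characters).
MASTER_LIST = ".:%@!^/[]" + "()<>,;\\\"\r\n"


def is_simple_text(token):
    """Simple text is neither a separation character nor a $ operator."""
    is_separator = len(token) == 1 and token in MASTER_LIST
    return not is_separator and not token.startswith("$")


def paste(tokens, blank_sub="."):
    """Join tokens, inserting blank_sub between adjoining simple-text tokens."""
    parts = []
    previous = None
    for token in tokens:
        if previous is not None and is_simple_text(previous) and is_simple_text(token):
            parts.append(blank_sub)
        parts.append(token)
        previous = token
    return "".join(parts)


print(paste(["xxx", "@", "yyy"]))  # xxx@yyy
print(paste(["xxx", "yyy"]))       # xxx.yyy
```

The second call shows how a stray space in a rule, which leaves two simple-text tokens side by side, ends up as an unintended dot in the rewritten address.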