
2.2 The Format of PO Files

A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

 
white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.

Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those with some white space immediately following the #, which are created and maintained exclusively by the translator, and those with some non-white character just after the #, which are created and maintained automatically by GNU gettext tools. All comments, of either kind, are optional.
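As an illustration of the distinction just described, here is a minimal sketch in Python that classifies one PO file line by the character following the #. The function name classify_comment and the returned labels are hypothetical, chosen only for this example. Treating a bare # as a translator comment is an assumption made here, not something the format description states.

```python
def classify_comment(line):
    """Return the kind of a PO comment line, or None if it is not a comment.

    Hypothetical helper: the labels are illustrative, not gettext names.
    """
    if not line.startswith("#"):
        return None
    rest = line[1:]
    # White space after '#' marks a translator comment.
    # Assumption: a bare '#' is treated the same way.
    if rest == "" or rest[0].isspace():
        return "translator"
    # A non-white character after '#' marks a comment managed by the tools.
    kinds = {".": "automatic", ":": "reference", ",": "flag"}
    return kinds.get(rest[0], "other-automatic")


for sample in ["#  translator-comments", "#. automatic-comments",
               "#: reference...", "#, flag...", "msgid untranslated-string"]:
    print(repr(sample), "->", classify_comment(sample))
```

The lookup table covers only the three markers shown in the schematic entry above; any other non-white character falls into a catch-all label.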

After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.

The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is delete them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.

The comment lines beginning with `#,' are special because they are not completely ignored by the programs as comments generally are. The comma-separated list of flags is used by the msgfmt program to give the user better diagnostic messages. Currently, two forms of flags are defined:

fuzzy
This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore). Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this flag when it has combined a msgid and a msgstr only through fuzzy search. See section 6.3 Fuzzy Entries.

c-format
no-c-format
These flags should not be added by a human. Instead, only the xgettext program adds them. In an automated PO file processing system as proposed here, the user's changes would be thrown away again as soon as the xgettext program generates a new template file.

If the c-format flag is given for a string, the msgfmt program does some more tests to check the validity of the translation. See section 8.1 Invoking the msgfmt Program.
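To make the comma-separated flag list concrete, the following Python sketch splits a `#,' line into its flags. The names parse_flags and is_fuzzy are hypothetical helpers for this illustration only; they are not part of any GNU gettext tool.

```python
def parse_flags(line):
    """Return the list of flags on a '#,' comment line (empty if none).

    Hypothetical helper for illustration.
    """
    if not line.startswith("#,"):
        return []
    # Flags are separated by commas; surrounding white space is dropped.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


def is_fuzzy(line):
    """Tell whether a flag line marks its entry as fuzzy."""
    return "fuzzy" in parse_flags(line)


print(parse_flags("#, fuzzy, c-format"))
print(is_fuzzy("#, no-c-format"))
```

A program that wants to skip translations still awaiting review could use a test like is_fuzzy on the flag line of each entry.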

A different kind of entry is used for translations which involve plural forms.

 
white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n
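The indexed msgstr lines of such an entry can be gathered by their index, as in the Python sketch below. The function name plural_translations and the sample entry (including its French strings) are made up for this illustration. The sketch assumes each string fits on a single line and leaves escape sequences undecoded.

```python
import re

# Matches lines of the form: msgstr[N] "text"
MSGSTR_INDEXED = re.compile(r'^msgstr\[(\d+)\]\s+"(.*)"\s*$')


def plural_translations(entry_lines):
    """Map each index N of msgstr[N] to its (still escaped) string.

    Hypothetical helper; continuation lines are not handled.
    """
    result = {}
    for line in entry_lines:
        match = MSGSTR_INDEXED.match(line)
        if match:
            result[int(match.group(1))] = match.group(2)
    return result


# Made-up sample entry, for illustration only.
sample = [
    'msgid "%d file"',
    'msgid_plural "%d files"',
    'msgstr[0] "%d fichier"',
    'msgstr[1] "%d fichiers"',
]
print(plural_translations(sample))
```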

Sometimes a few lines, usually white space or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and PO mode is unable to take action on those lines. By using the PO mode function M-x po-normalize, the translator may get rid of those spurious lines. See section 2.5 Normalizing Strings in Entries.

The remainder of this section may be safely skipped by those using PO mode, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those not having Emacs handy should read on carefully.

Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:

 
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"

In this example, the empty string is used on the first line, to allow better alignment of the H from the word `Here' over the f from the word `for'. The msgid keyword is thus followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with `Here' was moved up to the first line, right after msgid. It was not really necessary either to switch between the two last quoted strings immediately after the newline `\n'; the switch could have occurred after any other character. We just did it this way because it is neater.

One should carefully distinguish between end of lines marked as `\n' inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no effect on the represented string.
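The difference between the two kinds of line ends can be seen by computing the string that the multi-line example above represents. The following Python sketch concatenates the quoted pieces and decodes a few common escapes. The names unescape and po_string_value are hypothetical, and only the escapes \n, \t, \" and \\ are handled, which is a simplification of the full C syntax.

```python
# Only a small subset of C escape sequences is decoded (an assumption
# made to keep the sketch short).
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(body):
    """Decode the backslash escapes listed in ESCAPES; leave others as is."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def po_string_value(lines):
    """Concatenate the quoted pieces found on consecutive PO file lines."""
    pieces = []
    for line in lines:
        first = line.find('"')
        last = line.rfind('"')
        if first == -1 or last <= first:
            continue  # no quoted string on this line
        pieces.append(unescape(line[first + 1:last]))
    return "".join(pieces)


example = [
    r'msgid ""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
value = po_string_value(example)
print(value.count("\n"))  # newlines that come from `\n' escapes only
```

The three PO file lines contribute no newline by themselves; the resulting string contains exactly the two newlines written as `\n' inside the quotes.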

Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with `#' and extend until the end of the PO file line. Comments written by translators should have the initial `#' immediately followed by some white space. If the `#' is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.



This document was generated on May 6, 2002 using texi2html