Thursday, March 19, 2009

Locales in Linux / Debian

I was updating one of my Debian-based servers recently and installing some new software using Debian's highly wonderful 'aptitude' utility.

I kept getting the same warning message, and I was getting it an awful lot. Perl was complaining every time it was invoked by the Debian package management system. Here's the sort of output you'd get from perl:

# perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = ...
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Note: I've actually recreated this output after the fact so I'm not sure if the LANG setting shown above was what I saw originally. Typing the command
  locale
would give a similar error:
  locale: Cannot set LC_CTYPE to default locale: No such file or directory
  locale: Cannot set LC_MESSAGES to default locale: No such file or directory
  locale: Cannot set LC_ALL to default locale: No such file or directory
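
One way to see what is going on here is to compare the locale named in LANG against the list printed by `locale -a`. What follows is a minimal sketch, not part of the original troubleshooting. The function names `normalize_locale` and `locale_in_list` are made up for illustration, and the normalisation assumes that `locale -a` reports the codeset as `utf8` where LANG says `UTF-8`, which is what glibc systems usually do.

```shell
#!/bin/sh
# Normalise a locale name so that "en_AU.UTF-8" and "en_AU.utf8" compare
# equal: lowercase everything and drop hyphens.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the locale named in $1 appears in the list read from stdin
# (one locale name per line, as printed by "locale -a").
locale_in_list() {
    want=$(normalize_locale "$1")
    found=1
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$want" ]; then
            found=0
        fi
    done
    return "$found"
}

# Report on the current LANG, but only if the locale utility is available.
if command -v locale >/dev/null 2>&1; then
    if locale -a 2>/dev/null | locale_in_list "${LANG:-POSIX}"; then
        echo "${LANG:-POSIX}: installed"
    else
        echo "${LANG:-POSIX}: not installed"
    fi
fi
```

If this reports "not installed" for the value of LANG, that would be consistent with the warnings shown above.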

It took a while to figure out the picture here. For a while I fumbled around with commands like locale and localedef, and read the man pages for locale and setlocale, all the while descending further into a fog of confusion. It would be fair to say my mind grows weak at the thought of i18n, l10n and character encodings. But persist I did. Finally, by some stroke of luck that I can't remember, I came across the man page for

  locale-gen
The sun started shining again and things started to make sense between me and my Debian system once more.

So, what locale-gen says is this:

   This manual page documents briefly the locale-gen command.

   By default, the locale package which provides the base support for
   localisation of libc-based programs does not contain usable
   localisation files for every supported language. This limitation has
   became necessary because of the substantial size of such files and the
   large number of languages supported by libc. As a result, Debian uses a
   special mechanism where we prepare the actual localisation files on the
   target host and distribute only the templates for them.

   locale-gen is a program that reads the file /etc/locale.gen and invokes
   localedef for the chosen localisation profiles. Run locale-gen after
   you have modified the /etc/locale.gen file.

That was certainly more promising. I had been fumbling around with localedef

  localedef --help
which prints the default paths used by localedef. These "default" paths were listed like this:
  locale path    : /usr/lib/locale:/usr/share/i18n

It wasn't terribly clear what all this meant until I started reading up on locale-gen. My /usr/lib/locale was empty or had been emptied, and /usr/share/i18n was packed to the gills with every lang/encoding template-thingumy-jig you could poke a stick at.

Next stop:

  man locale.gen
which said this:
   The file /etc/locale.gen lists the locales that are to be generated by
   the locale-gen command.

   Each line is of the form:

   <locale> <charset>

   where <locale> is one of the locales given in /usr/share/i18n/locales
   and <charset> is one of the character sets listed in
   /usr/share/i18n/charmaps

   The locale-gen command will generate all the locales, placing them in
   /usr/lib/locale.
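
The format described in the man page can be checked mechanically before running locale-gen. The sketch below is an illustration only: `check_entry` and `check_file` are hypothetical helper names, and it assumes that charmaps may be stored compressed (for example `UTF-8.gz`), as Debian usually ships them. It also strips a `.codeset` suffix from the locale name, since Debian's stock file uses entries such as `en_AU.UTF-8 UTF-8`. Locale names with an `@modifier` after the codeset are not handled.

```shell
#!/bin/sh
# check_entry LOCALE CHARSET [I18N_DIR]
# Succeed if LOCALE and CHARSET both have a template under I18N_DIR.
check_entry() {
    i18n_dir=${3:-/usr/share/i18n}
    base=${1%%.*}    # "en_AU.UTF-8" -> "en_AU"
    if [ -z "$2" ]; then
        echo "missing charset for locale: $1" >&2
        return 1
    fi
    if [ ! -e "$i18n_dir/locales/$base" ]; then
        echo "unknown locale: $1" >&2
        return 1
    fi
    # Assumption: charmaps may be gzip-compressed on disk.
    if [ ! -e "$i18n_dir/charmaps/$2" ] && [ ! -e "$i18n_dir/charmaps/$2.gz" ]; then
        echo "unknown charset: $2" >&2
        return 1
    fi
    return 0
}

# check_file FILE [I18N_DIR]
# Check every entry in FILE, skipping blank lines and comments.
check_file() {
    status=0
    while read -r loc charset _rest; do
        case $loc in
            ''|'#'*) continue ;;
        esac
        check_entry "$loc" "$charset" "${2:-}" || status=1
    done < "$1"
    return "$status"
}
```

Running `check_file /etc/locale.gen` would then print a complaint for each entry that has no matching template, and stay silent otherwise.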

First thing, I had to create /etc/locale.gen which I did with the following content:

en_AU UTF-8
en_AU ISO-8859-1
I'm using AU because I'm Australian. I've included ISO-8859-1 (also known as Latin-1) because when I first went through this process I only included UTF-8, and I got funny characters and line break failures in man pages. It looks like my aging version of Debian had configured groff, or whatever it used for man pages, to output in Latin-1 and not UTF-8. (Note: when using the POSIX setting with no locales installed, see below, I didn't get this problem or any of the above warnings for that matter.)
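
If the file is maintained by a script rather than by hand, the entries can be added idempotently. This is a rough sketch under assumptions: `add_locale_entry` is a hypothetical helper, and the file path is passed in so the function can be tried on a scratch file before touching /etc/locale.gen.

```shell
#!/bin/sh
# add_locale_entry FILE ENTRY
# Append ENTRY (e.g. "en_AU UTF-8") to FILE unless an identical line
# is already present. Creates FILE if it does not exist.
add_locale_entry() {
    [ -e "$1" ] || : > "$1"
    if ! grep -qxF -- "$2" "$1"; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Example, to be run as root on a Debian system. It is left commented
# out so that the sketch has no side effects:
# add_locale_entry /etc/locale.gen 'en_AU UTF-8'
# add_locale_entry /etc/locale.gen 'en_AU ISO-8859-1'
# locale-gen
```

Because identical lines are never added twice, the function can be called on every run of a provisioning script without growing the file.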

Then I ran

  locale-gen
which generated /usr/lib/locale/locale-archive. I'm not entirely sure this has solved the problem, but it seems my locale can now be set properly:
# locale
LANG=en_AU.UTF-8
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=

It seems that the locale error messages could have been removed by a much shorter route.

  export LC_ALL=POSIX
I also noticed that unsetting a variable like LANG, which might have been set to a locale not installed in /usr/lib/locale, also cleared up the error messages.
  unset LANG
In both of the above cases, "POSIX" ends up as the effective value for all the LC_* categories. Despite this, it still seems to make sense to me to install both the UTF-8 and Latin-1 locales explicitly.
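
The reason both shortcuts silence the warnings is the order in which the locale variables are consulted: LC_ALL wins if it is set and non-empty, then the individual LC_* variable, then LANG, and if none of them is set the default "POSIX" (also called "C") locale applies. The following sketch models only that precedence. `effective_locale` is a hypothetical name, and the function does not check whether the resulting locale is actually installed.

```shell
#!/bin/sh
# effective_locale CATEGORY
# Print the locale name that would apply to CATEGORY (e.g. LC_CTYPE),
# based purely on the LC_ALL / LC_* / LANG environment variables.
effective_locale() {
    # Accept only names like LC_CTYPE, so the eval below is safe.
    case $1 in
        ''|*[!A-Z_]*) echo "not a locale category: $1" >&2; return 1 ;;
        LC_*) ;;
        *) echo "not a locale category: $1" >&2; return 1 ;;
    esac
    eval "category_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'POSIX\n'
    fi
}
```

For example, with LC_ALL=POSIX exported, `effective_locale LC_CTYPE` prints POSIX no matter what LANG says, which matches the first shortcut. With everything unset it also prints POSIX, which matches the second.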

By the way, if you ever have annoying control characters in manpages and don't have the time to straighten the system out, try this:

  man some_man_page | col -b | less