Thursday, March 19, 2009

Locales in Linux / Debian

I was updating one of my debian-based servers recently and installing some new software using Debian's highly wonderful 'aptitude' utility.

And I kept getting a warning message, and getting it an awful lot. Perl was complaining every time it was invoked by the Debian package management system. Here's the sort of output you'd get from perl:

# perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = ...
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Note: I've actually recreated this output after the fact so I'm not sure if the LANG setting shown above was what I saw originally. Typing the command
  locale
would give a similar error:
  locale: Cannot set LC_CTYPE to default locale: No such file or directory
  locale: Cannot set LC_MESSAGES to default locale: No such file or directory
  locale: Cannot set LC_ALL to default locale: No such file or directory
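A quick way to confirm that these warnings come from a locale that was never generated is to compare LANG against the output of `locale -a`. What follows is only a sketch: normalise_locale and locale_is_installed are made-up helper names, and the normalisation step (lowercase the codeset and drop punctuation, so that en_AU.UTF-8 becomes en_AU.utf8) is my assumption about how glibc prints locale names. It ignores @modifier suffixes altogether.

```shell
# Hypothetical helper: rewrite a locale name the way `locale -a` usually
# prints it on glibc systems, e.g. en_AU.UTF-8 -> en_AU.utf8 (assumption).
normalise_locale() {
    name=$1
    case $name in
        *.*)
            lang=${name%%.*}        # part before the first dot
            codeset=${name#*.}      # part after the first dot
            codeset=$(printf '%s' "$codeset" \
                | tr '[:upper:]' '[:lower:]' \
                | sed 's/[^a-z0-9]//g')
            printf '%s.%s\n' "$lang" "$codeset"
            ;;
        *)
            printf '%s\n' "$name"   # names like C or POSIX pass through
            ;;
    esac
}

# Hypothetical helper: succeed if the named locale is listed by `locale -a`.
locale_is_installed() {
    wanted=$(normalise_locale "$1")
    locale -a 2>/dev/null | grep -qxF -- "$wanted"
}

# Report on whatever LANG currently holds (falls back to C if LANG is unset).
if locale_is_installed "${LANG:-C}"; then
    echo "${LANG:-C} is available"
else
    echo "${LANG:-C} has not been generated on this system"
fi
```

If the second message appears, the setting in LANG points at a locale the system cannot load, which is the situation perl was warning about.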

It took a while to figure out the picture here. For a while I fumbled around with commands like locale and localedef, and read the man pages for locale and setlocale, all the while descending further into a fog of confusion. It would be fair to say my mind grows weak at the thought of i18n, l10n and character encodings. But persist I did. Finally, by some stroke of luck that I can't remember, I came across the man page for

  locale-gen
The sun started shining again and things started to make sense between me and my Debian system once more.

So, what locale-gen says is this:

   This manual page documents briefly the locale-gen command.

   By default, the locale package which provides the base support for
   localisation of libc-based programs does not contain usable
   localisation files for every supported language. This limitation has
   became necessary because of the substantial size of such files and the
   large number of languages supported by libc. As a result, Debian uses
   a special mechanism where we prepare the actual localisation files on
   the target host and distribute only the templates for them.

   locale-gen is a program that reads the file /etc/locale.gen and invokes
   localedef for the chosen localisation profiles.  Run  locale-gen  after
   you have modified the /etc/locale.gen file.

That was certainly more promising. I had been fumbling around with localedef

  localedef --help
which prints the default paths used by localedef. These "default" paths were listed like this:
  locale path    : /usr/lib/locale:/usr/share/i18n

It wasn't terribly clear what all this meant until I started reading up on locale-gen. My /usr/lib/locale was empty or had been emptied, and /usr/share/i18n was packed to the gills with every lang/encoding template-thingumy-jig you could poke a stick at.

Next stop:

  man locale.gen
which said this:
   The  file /etc/locale.gen lists the locales that are to be generated by
   the locale-gen command.

   Each line is of the form:

   <locale> <charset>

   where <locale> is one of the locales given  in  /usr/share/i18n/locales
   and   <charset>  is   one   of   the   character   sets   listed   in
   /usr/share/i18n/charmaps

   The locale-gen command will generate all the locales, placing  them  in
   /usr/lib/locale.
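To make the file format concrete, here is a small dry-run sketch that reads a locale.gen-style file and prints each locale and charset pair it finds. parse_locale_gen is a hypothetical helper of my own, not part of Debian, and treating lines that start with # as comments is an assumption on my part. It generates nothing; the real work is done by locale-gen calling localedef.

```shell
# Hypothetical helper: list the <locale> <charset> pairs found in a
# locale.gen-style file. Dry run only, nothing is written anywhere.
parse_locale_gen() {
    while read -r locale charset || [ -n "$locale" ]; do
        case $locale in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        printf 'locale=%s charset=%s\n' "$locale" "$charset"
    done < "$1"
}

# Try it on a scratch file so the real /etc/locale.gen is left alone.
sample=$(mktemp)
printf '%s\n' '# my locales' '' 'en_AU UTF-8' 'en_AU ISO-8859-1' > "$sample"
parse_locale_gen "$sample"
rm -f "$sample"
```

Each printed pair corresponds to one template under /usr/share/i18n/locales and one character set under /usr/share/i18n/charmaps, as described in the man page above.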

First thing, I had to create /etc/locale.gen which I did with the following content:

en_AU UTF-8
en_AU ISO-8859-1
I'm using AU because I'm Australian. I've included ISO-8859-1 (also known as Latin-1) because when I first went through this process I only included UTF-8, and I ended up with funny characters and line break failures in man pages. It looks like my aging version of Debian had configured groff, or whatever it used for manpages, to output in Latin-1 and not UTF-8. (Note: when using the POSIX setting with no locales installed, as described below, I didn't get this problem or any of the above warnings for that matter.)

Then I ran

  locale-gen
which generated /usr/lib/locale/locale-archive. I'm not entirely sure this has solved the problem, but it seems my locale can now be set properly:
# locale
LANG=en_AU.UTF-8
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=

It seems that the locale error messages could have been removed by a much shorter route.

  export LC_ALL=POSIX
I also noticed that unsetting a variable like LANG, which might have been set to a locale not installed in /usr/lib/locale, also cleared up the error messages.
  unset LANG
In both of the above cases, "POSIX" is the value used for all LC and LANG settings. Despite this, it still seems to make sense to me to install both the UTF-8 and Latin-1 locales explicitly.
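The reason both shortcuts work comes down to the order in which the variables are consulted: LC_ALL first, then the individual LC_* variable, then LANG, and finally the built-in POSIX default. Here is a rough sketch of that lookup. effective_locale is a name I made up, and it only mimics the lookup order in shell; it does not ask libc what it would really do.

```shell
# Hypothetical helper: print the locale that would apply to one category,
# following the lookup order LC_ALL, then LC_<category>, then LANG.
effective_locale() {
    category=$1                        # e.g. LC_CTYPE or LC_MESSAGES
    eval "specific=\${$category:-}"    # value of the category's own variable
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'POSIX\n'               # nothing set: the default locale
    fi
}

# With nothing set, the category falls back to POSIX, as after `unset LANG`.
( unset LC_ALL LC_MESSAGES LANG; effective_locale LC_MESSAGES )

# LC_ALL overrides everything else, as with `export LC_ALL=POSIX`.
( LC_ALL=POSIX; LANG=C; effective_locale LC_MESSAGES )
```

Both calls print POSIX. In neither case is a locale looked up in /usr/lib/locale, which is why the warnings go away.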

By the way, if you ever have annoying control characters in manpages and don't have the time to straighten the system out, try this:

  man some_man_page | col -b | less

Saturday, March 7, 2009

Living on the Commandline - The Beginning

I first tried out linux in 2001. I think it was RedHat 7.1. The copy I had came with a magazine. I had brought it home after work and then proceeded to load it up on an old beige box - probably an old pentium II if I recall. Much later, probably in the early hours of the morning - something I still remember now to this day - I saw linux boot up for the first time; I distinctly remember loading up gnome for the first time and watching this foreign and exotic and distinctly non-windows graphical system load itself up on my screen.

I also remember turning the machine off at the switch. After all, that's what you did with dos right? Later, I learnt you had to do: "shutdown -h now" or something similar.

There was something seriously cool and special about doing this. This software was magic. It was free software; written by people who thought it was cool to build something like an alternative desktop system; or who thought it necessary to reverse engineer all the little utilities and foundational libraries that went into defining a unix system; or, most importantly, who thought it cool to actually build a new operating system (technically what is called the "linux kernel" here) which rock solidly ran all of the above.

It was like opening up a whole new universe. I've never looked back.

One of the powerful things about linux and unix in general is its commandline or shell. There is a philosophy and an ethos behind it, magically summed up in 'The Unix Programming Environment' and other such books.

I've struggled to clearly lay out why I think the commandline is so important and I suspect I shall fail here also. Anyone who runs a server (especially a non-Windows server) would understand. It is the power of knowing the name of the thing you want to access and being able to use it in ways that perhaps the original author did not conceive or intend; being able to combine it with other similar commands in a high-level program called a "shell script"; being able to invoke it at a future date in the early hours of the morning whilst you're safely tucked in bed... whatever. Knowing the "name" of a thing allows you to build and do your own things.

GUI (graphical user interface) programs and the commandline don't necessarily compete or intersect, at least insofar as the GUI is used for desktop applications. No one would want to run Excel or Illustrator in some piecemeal way from the commandline via a series of written commands (although they might have reason to run an excel-like application in a terminal where the shell normally lives). [And of course, Excel has its own built-in "shell" of sorts, visual basic for applications, which allows you to script excel actions.]

Nonetheless, having only graphical applications at your disposal is a bit like being confined to a padded cell; a pretty cell with shiny, different coloured walls and flashing lights; but, a cell all the same - constructed for your imprisonment. Having the command line - knowing the "names" of things - allows you to drop out of this confined space into a more open world where you can build almost anything you want.

A well designed gui of course is a great thing indeed; such a thing is unfairly likened to a padded cell. But a mediocre or poor gui... well, it doesn't bear mentioning. Perhaps the thing I object to, is using a gui for everything; that is a great weakness; it limits your capabilities; it reduces your ability to extend yourself into the world and fully control it. I need to add that for normal everyday users, this power is not needed; I'm referring more to people who use computers to run or build services and programs.

Well, I've ranted enough. But now that I've said that, I'm going to post some follow-ups on some of the things I've learnt to do with the commandline.