Problems working with filenames containing special characters

  • 7000353
  • 12-May-2008
  • 17-Jul-2017

Environment

Novell Open Enterprise Server (Linux based)
Novell SUSE Linux Enterprise Desktop 10
Novell SUSE Linux Enterprise Server 10
Novell SUSE Linux Enterprise Server 9
Novell Linux Desktop 9
OpenOffice.org

Situation

Some applications, for instance GNOME applications or OpenOffice.org, have problems working with filenames that contain characters outside the ASCII set, such as accented characters, Cyrillic characters, Chinese ideograms, Japanese (Kanji, Hiragana, Katakana) characters, or Korean (Hangul, Hanja) characters.

Resolution

Use a Unicode environment, i.e. a UTF-8 locale with filenames encoded in UTF-8 (preferred), or set up the environment so that applications assume a legacy character encoding for filenames (workaround).

Additional Information

Unix filesystems do not store information about which character encoding (or "code page") is used for filenames; a filename is stored as a plain sequence of bytes. Thus, if characters outside the ASCII range are to be used in filenames, they should be encoded using a Unicode encoding suitable for filenames, i.e. UTF-8, in order to avoid interpretation issues between users with different locale settings.
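The interpretation issue can be illustrated with a short Python sketch. The sample filename and the choice of ISO-8859-1 as the legacy encoding are assumptions made for illustration only; the same effect occurs with any legacy encoding.

```python
# Sample name "cafe.txt" with an accented e (U+00E9); illustrative only.
name = "caf\u00e9.txt"

# The same name as bytes, the way a filesystem would store it.
latin1_bytes = name.encode("iso-8859-1")  # legacy single-byte encoding
utf8_bytes = name.encode("utf-8")         # Unicode encoding


def interpret(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes as a program assuming `encoding` would."""
    return raw.decode(encoding, errors="replace")


# A program assuming UTF-8 meets an invalid byte in the legacy name.
seen_by_utf8_program = interpret(latin1_bytes, "utf-8")
# A program assuming ISO-8859-1 shows two wrong characters for the UTF-8 name.
seen_by_latin1_program = interpret(utf8_bytes, "iso-8859-1")

# ascii() keeps the output printable on any terminal.
print(ascii(seen_by_utf8_program))
print(ascii(seen_by_latin1_program))
```

Because the filesystem records only the bytes, neither program can tell which interpretation the creator of the file intended.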

Background reading

Internationalisation and localisation are complex topics. Refer to the Wikipedia article Internationalization and localization for background. Wikipedia also has an article category Character encoding, which includes articles on Unicode and UTF-8 as well as on various legacy character encodings still in common use.

Converting the encoding of file names to UTF-8

The convmv package provides a way to re-encode filenames to UTF-8. Refer to the openSUSE support database article Converting Files or File Names to UTF-8 Encoding for details.
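To show what re-encoding a filename involves, the following Python sketch converts names from a legacy encoding to UTF-8. It is not how convmv is implemented, and the function names (reencode_name, plan_renames, apply_renames) are hypothetical. It assumes that all legacy names in the directory use the same source encoding, and it treats any name that already decodes as UTF-8 as converted; this heuristic can be wrong for legacy names that happen to form valid UTF-8.

```python
import os


def reencode_name(raw: bytes, source_encoding: str) -> bytes:
    """Return the UTF-8 form of a filename stored in source_encoding.

    Names that are already valid UTF-8 (including plain ASCII) are
    returned unchanged, so a second run does not double-encode them.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


def plan_renames(directory: bytes, source_encoding: str):
    """List (old_path, new_path) pairs for entries whose names would change."""
    plan = []
    # Passing bytes to os.listdir returns the raw, undecoded names.
    for entry in os.listdir(directory):
        new_name = reencode_name(entry, source_encoding)
        if new_name != entry:
            plan.append((os.path.join(directory, entry),
                         os.path.join(directory, new_name)))
    return plan


def apply_renames(plan):
    """Perform the planned renames, skipping any that would overwrite a file."""
    for old_path, new_path in plan:
        if os.path.exists(new_path):
            continue
        os.rename(old_path, new_path)
```

Reviewing the output of plan_renames before calling apply_renames keeps the conversion a dry run until the result looks right. Back up the data before renaming anything.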

Using legacy character encodings with Gtk/GNOME applications

GNOME applications and Gtk applications in general build on a common low-level library, GLib. By default, GLib assumes that the encoding used for filenames is UTF-8. To change this, the environment variable G_FILENAME_ENCODING can be used. When set to @locale, it will cause GLib to assume file names are encoded using the encoding of the current locale. When set to a character set name, e.g. Big5, that character set will be assumed.
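As a minimal sketch of this workaround, the Python snippet below starts a program with G_FILENAME_ENCODING set for that process only. The helper names glib_env and launch are hypothetical, and gedit in the usage comment is only an example of a GLib-based application. The variable can equally be set in the shell or session environment.

```python
import os
import subprocess


def glib_env(filename_encoding: str, base=None) -> dict:
    """Copy an environment and set G_FILENAME_ENCODING in the copy."""
    env = dict(os.environ if base is None else base)
    env["G_FILENAME_ENCODING"] = filename_encoding
    return env


def launch(command, filename_encoding="@locale"):
    """Start a GLib-based program with the chosen filename encoding."""
    return subprocess.Popen(command, env=glib_env(filename_encoding))


# Usage (not executed here):
#   launch(["gedit"])           # filenames use the current locale's encoding
#   launch(["gedit"], "Big5")   # filenames are assumed to be Big5
```

Setting the variable per process limits the workaround to the applications that need it and leaves the rest of the session unchanged.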

Refer to the GLib Reference Manual's section Running and debugging GLib Applications for full details.

Change Log

2017-07-17 jreuter  removed dead link