FTP handling filenames with international characters: Tips for migrating from NetWare to OES Linux

  • 7007779
  • 04-Feb-2011
  • 22-Jan-2013

Environment

Novell Open Enterprise Server 11 (OES 11)
Novell Open Enterprise Server 2 (OES 2)

Situation

Migrating from NetWare FTP to OES Linux FTP brings new opportunities, and possibly new challenges, for those who use FTP to transfer files whose names contain international characters, often referred to as "special characters" or "extended ASCII." This is especially tricky if a variety of code pages are in use throughout an organization. This document gives some tips on what to expect in this area when migrating.

Resolution

Overview
 
NetWare FTP, like many traditional FTP servers, was not very flexible with code page handling.  The original FTP specification had no provision for code page selection or negotiation.  Newer methods have been proposed but are not necessarily fully standardized yet, or implemented in every FTP server or FTP client.
 
In essence, NetWare FTP would handle communication of filenames to and from FTP clients as if they were always made up of characters which exist in NetWare's DOS code page.  For example, the DOS code page on a default English NetWare server is 437.  This was not always a good assumption, and could result in some characters being mistranslated or not translatable at all.
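To illustrate the kind of mistranslation described above, the following minimal Python sketch (not part of NetWare or pure-ftpd; the function names are hypothetical) shows what happens when a client encodes a filename in one code page and the server decodes it as if it were code page 437. It also shows a character that code page 437 cannot represent at all.

```python
def misread(name: str, client_cp: str, server_cp: str) -> str:
    """Return the name a server sees if the client encodes with client_cp
    but the server assumes server_cp when decoding the bytes."""
    return name.encode(client_cp).decode(server_cp)


def representable(name: str, codepage: str) -> bool:
    """Return True if every character of name exists in the given code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# A Windows GUI client (code page 1252) sends "é" as byte 0xE9;
# in code page 437, byte 0xE9 is the Greek letter theta.
print(misread("café.txt", "cp1252", "cp437"))  # cafΘ.txt

# The euro sign has no slot in code page 437, so it cannot be translated.
print(representable("€uro.txt", "cp437"))      # False
```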
 
Some NetWare users have grown accustomed to the way NetWare does this and have built their processes around this behavior. They may not want it to change when they migrate to OES Linux, even though the old method was not perfect. Other NetWare users have long been frustrated with this limited functionality and will be glad to know that pure-ftpd on OES Linux offers new capabilities. Solutions for both needs are discussed below.
 
 
Necessary pure-ftpd version
 
The first step toward either solution is to install a pure-ftpd package recent enough to provide the necessary functionality. OES 2 SP3 already includes a pure-ftpd package with the right capabilities. For OES 2 SP2, the updated pure-ftpd is available in the Online Updates channel. The recommended minimum package is pure-ftpd 1.0.22-0.14.1.
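If a quick scripted check of the installed package is useful, the following Python sketch compares a version-release string (for example, the output of `rpm -q --queryformat '%{VERSION}-%{RELEASE}' pure-ftpd`) against the recommended minimum. This is a simplified assumption, not RPM's own comparison algorithm: it only handles purely numeric, dot-separated components, and the helper names are hypothetical.

```python
import subprocess


def parse_version_release(s: str) -> tuple:
    """Split '1.0.22-0.14.1' into ((1, 0, 22), (0, 14, 1)).
    Simplified: assumes every component is a plain integer."""
    def to_ints(part: str) -> tuple:
        return tuple(int(p) for p in part.split(".")) if part else ()

    version, _, release = s.partition("-")
    return to_ints(version), to_ints(release)


def meets_minimum(installed: str, minimum: str = "1.0.22-0.14.1") -> bool:
    """True if installed is at least the recommended minimum."""
    return parse_version_release(installed) >= parse_version_release(minimum)


def installed_version() -> str:
    """Query the installed pure-ftpd package (requires rpm on the host)."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", "pure-ftpd"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

On the OES server, `meets_minimum(installed_version())` would then report whether an update is needed. Packages with non-numeric version parts should be checked by hand.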
 
 
To preserve the code page behavior of NetWare FTP when migrating to OES Linux
 
If the goal is for OES Linux's "Novell FTP" (aka LUM-enabled pure-ftpd) to handle these filenames the same way NetWare FTP did, two additional steps are needed.
 
1.  Use the "language" command at the server console of the NetWare FTP host to verify which DOS code page is in use. The output will look like this:
Current NLM language is (4) English
Current DOS code page is (437)
 
2.  Edit /etc/pure-ftpd/pure-ftpd.conf. Find the ClientCharset line and set it to:
ClientCharset  437
(or whatever code page number the NetWare FTP server was using)
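When several NetWare servers are being migrated, the code page number can be pulled out of the saved "language" console output rather than read by eye. The following is a minimal Python sketch; the function name is hypothetical, and it assumes the output wording matches the sample shown in step 1.

```python
import re


def dos_codepage_from_console(output: str) -> str:
    """Extract the DOS code page number from NetWare 'language' output.
    Assumes a line of the form 'Current DOS code page is (437)'."""
    match = re.search(r"Current DOS code page is \((\d+)\)", output)
    if match is None:
        raise ValueError("no 'Current DOS code page' line found")
    return match.group(1)


sample = "Current NLM language is (4) English\nCurrent DOS code page is (437)\n"
print(f"ClientCharset  {dos_codepage_from_console(sample)}")
```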
 
Setting FileSystemCharset is not recommended. Leaving it commented out typically lets the system default to the correct value.
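For administrators who script their server builds, the edit from step 2 can be applied to the text of pure-ftpd.conf as sketched below. This is a hypothetical helper, not a Novell tool: it assumes the file contains either an active or a commented-out "ClientCharset" line (appending one if not), and it deliberately leaves any FileSystemCharset lines untouched.

```python
import re


def set_client_charset(conf_text: str, codepage: str) -> str:
    """Return conf_text with an active 'ClientCharset <codepage>' line.

    An existing (possibly commented-out) ClientCharset line is replaced in
    place; otherwise the setting is appended. FileSystemCharset is not touched.
    """
    pattern = re.compile(r"^[ \t]*#?[ \t]*ClientCharset\b.*$", re.MULTILINE)
    new_line = f"ClientCharset  {codepage}"
    if pattern.search(conf_text):
        # A function replacement avoids any backslash interpretation.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```

The updated text would then be written back to /etc/pure-ftpd/pure-ftpd.conf (after keeping a backup copy) before restarting the service.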
 
As always when changing this file, put the changes into effect with the command:  rcpure-ftpd restart
 
 
To take advantage of new code page functionality on OES Linux
 
The best way to preserve international characters correctly is to use an FTP client that complies with RFC 2640 (Internationalization of the File Transfer Protocol), which provides for UTF-8 filenames. The version of pure-ftpd supplied in OES 2 SP3 (or SP2 plus Online Updates) is compliant with this method. Using such a client should result in proper translation and storage of filenames from a variety of code pages.
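Under RFC 2640, a server that supports UTF-8 filenames lists "UTF8" in its response to the FEAT command. The following Python sketch checks for that using the standard ftplib module; the host name and credentials are placeholders, and the helper names are hypothetical. Passing `encoding="utf-8"` to `ftplib.FTP` requires Python 3.9 or later.

```python
from ftplib import FTP


def feat_advertises_utf8(feat_response: str) -> bool:
    """True if a multi-line FEAT reply lists the RFC 2640 'UTF8' feature.
    Feature lines sit between the '211-' opening and '211 ' closing lines."""
    for line in feat_response.splitlines()[1:-1]:
        tokens = line.split()
        if tokens and tokens[0].upper() == "UTF8":
            return True
    return False


def server_supports_utf8(host: str, user: str, password: str) -> bool:
    """Log in and ask the server for its feature list (placeholder values)."""
    with FTP(host, encoding="utf-8") as ftp:
        ftp.login(user, password)
        return feat_advertises_utf8(ftp.sendcmd("FEAT"))
```

For example, `server_supports_utf8("oes-server.example.com", "user", "secret")` run against an updated pure-ftpd would be expected to return True; a server that does not implement FEAT raises `ftplib.error_perm` instead.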
 
For situations where FTP clients that do not support UTF-8 must still be used, the ClientCharset setting in /etc/pure-ftpd/pure-ftpd.conf can help. It tells pure-ftpd which code page to assume for non-UTF-8 clients.

Determining the best code page to select can be tricky, because the code pages used by clients vary widely. Even on one machine, different FTP programs may use different code pages. For example, a standard English installation of Windows has at least two code pages configured: code page 1252, which most GUI applications use, and the older code page 437, which the command-line FTP client uses. Other operating systems such as Linux and UNIX may use still different code pages, even for English.

Finding the best setting may therefore be a matter of trial and error, and may differ from one environment to another. A helpful starting point is code page 437, since that is also what NetWare FTP used. (See the section above on preserving NetWare FTP behavior for more information.) After noting the behavior under this setting, other settings can be tried and the results compared.
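The trial-and-error comparison can be previewed offline. The following Python sketch is a simplified model, not pure-ftpd's actual code: it assumes the server simply decodes the client's bytes using the ClientCharset code page, and it compares two candidate settings against the two Windows clients mentioned above. Python names the code pages "cp437" and "cp1252", whereas the pure-ftpd.conf examples in this document use the bare number.

```python
# Hypothetical client mix; adjust to the clients actually in use.
CLIENTS = {
    "Windows command-line ftp": "cp437",
    "Windows GUI client": "cp1252",
}


def server_view(name: str, client_cp: str, client_charset: str) -> str:
    """Model what the server decodes when the client sends name in
    client_cp and ClientCharset is set to client_charset."""
    return name.encode(client_cp).decode(client_charset)


def compare_settings(name: str, settings=("cp437", "cp1252")) -> None:
    """Print, for each candidate setting, how each client's name arrives."""
    for setting in settings:
        for label, cp in CLIENTS.items():
            seen = server_view(name, cp, setting)
            status = "ok" if seen == name else "garbled"
            print(f"ClientCharset {setting}: {label}: {seen!r} ({status})")


compare_settings("café.txt")
```

Under either setting, one of the two Windows clients sees a garbled name, which is why a single ClientCharset value is a compromise and UTF-8 capable clients are preferred.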
 
Setting FileSystemCharset is not recommended. Leaving it commented out typically lets the system default to the correct value.
 
As always when changing this file, put the changes into effect with the command:  rcpure-ftpd restart