Linux Access Gateway does not submit correct Chinese characters in Form Fill pages

  • 3675821
  • 04-Jul-2007
  • 26-Apr-2012

Environment


Novell Access Management 3 Netware Access Gateway
Novell Access Management 3 Linux Access Gateway
Novell Access Management 3 Access Administration
Novell Access Management 3 Linux Novell Identity Server
Access Manager 3 Support Pack 1 beta 1 applied

Situation

A Linux Access Gateway protects a web application whose forms-based login uses a Chinese name as the username. When the Access Manager Form Fill feature was configured for this application to inject LDAP attributes, it failed to pass the Chinese characters from eDirectory. Changing the injected data to come from the Shared Secret store also failed: it returned other unreadable characters instead of the expected Chinese characters.

The character set enabled on the back-end web server was GB2312, which matched the encoding enabled in the browser.

Resolution

Two things must be done to resolve this issue:

1. Apply the Access Manager 3 SP1 Release Candidate build (3.0.1-171) or later to the Access Manager components; and
2. Make sure that the UTF-8 character set is enabled on the back-end web server and in the browser. Access Manager works only with UTF-8 encoded characters, as this is the default format of data stored in the configuration store and on eDirectory LDAP servers.
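The article does not say which back-end web server was involved, so the following minimal Python sketch uses the standard library's http.server purely as a stand-in. It shows the places a back-end page can declare UTF-8: the charset parameter of the Content-Type header, a meta tag, and the form's accept-charset attribute. The page content and the names LOGIN_PAGE, build_login_response and LoginPageHandler are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

# Hypothetical login page: it declares UTF-8 in the meta tag and on the form.
LOGIN_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Login</title>
</head>
<body>
<form action="/login" method="post" accept-charset="UTF-8">
<input name="username">
<input name="password" type="password">
<input type="submit" value="Log in">
</form>
</body>
</html>
"""


def build_login_response() -> Tuple[Dict[str, str], bytes]:
    """Encode the page as UTF-8 and declare that charset in the HTTP header too."""
    body = LOGIN_PAGE.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


class LoginPageHandler(BaseHTTPRequestHandler):
    """Serves the UTF-8 login page for any GET request."""

    def do_GET(self) -> None:
        headers, body = build_login_response()
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass LoginPageHandler to http.server.HTTPServer and call serve_forever(). The browser's own encoding setting must still be UTF-8 as well.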

Additional Information

GB2312 characters ARE representable in UTF-8. All data stored in the Form Fill store, and all data transferred to and from it, must be UTF-8 for LDAP compatibility. The GB2312 format is described at:

http://www.herongyang.com/gb2312/overview.html
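A minimal Python sketch illustrating that GB2312 is representable in UTF-8: decoding GB2312 bytes with the right codec and re-encoding them as UTF-8 is lossless, while reading the same bytes as if they were UTF-8 produces unreadable characters of the kind described in the Situation section. The sample name 张伟 is a hypothetical value, not data from the original case.

```python
# Hypothetical sample username; any GB2312 Chinese name behaves the same way.
name = "张伟"

# What a browser set to GB2312 would submit: two bytes per Chinese character.
gb2312_bytes = name.encode("gb2312")

# Lossless conversion: decode with the correct charset, then re-encode as UTF-8
# (three bytes per character for these code points).
utf8_bytes = gb2312_bytes.decode("gb2312").encode("utf-8")

# Misinterpretation: treating GB2312 bytes as UTF-8 yields unreadable text.
garbled = gb2312_bytes.decode("utf-8", errors="replace")

print(len(gb2312_bytes), len(utf8_bytes))  # 4 6
print(utf8_bytes.decode("utf-8") == name)  # True
print(garbled == name)                     # False
```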

The main reason UTF-8 is needed everywhere is that an HTTP POST does not carry any hint about which character set the browser used to encode the data it sends back.
Although the web server may declare GB2312, it does so only in the response that delivers the page to the browser. Because Form Fill is stateless, the POST coming back to the NetWare or Linux Access Gateway (NAG/LAG) does not contain any of that information. This is more or less a shortcoming of the HTML 4 specification of the accept-charset attribute: http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset.
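A minimal Python sketch of this ambiguity, assuming (as described above) that the POST body carries no charset hint: the same urlencoded body decodes correctly only if the receiver guesses the charset the browser actually used. The field name username and the value 张伟 are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical login form field and value.
fields = {"username": "张伟"}

# A browser that rendered the page as GB2312 percent-encodes the GB2312 bytes.
# Nothing in the resulting body names the charset that was used.
post_body = urlencode(fields, encoding="gb2312")

# The receiver has to guess. Guessing UTF-8 (parse_qs's default) garbles the
# value; only the charset the browser actually used recovers it.
as_utf8 = parse_qs(post_body)["username"][0]
as_gb2312 = parse_qs(post_body, encoding="gb2312")["username"][0]

print(post_body)
print(as_utf8 == fields["username"])    # False
print(as_gb2312 == fields["username"])  # True
```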

The problem, then, is that the browser determines the charset from the following sources (the last one present wins):

1) Browser settings
2) The charset parameter of the HTTP Content-Type header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4)
3) A meta tag in the page
4) The accept-charset attribute of the form
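A minimal Python sketch of resolving a page's charset from these sources, applied in the listed order so that the last one found wins. It assumes the browser default (item 1) is supplied by the caller, since it cannot be read from the page itself; the helper names charset_from_content_type, CharsetHints and resolve_charset are hypothetical, and only the first entry of a multi-valued accept-charset is used.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class CharsetHints(HTMLParser):
    """Collects the first meta-tag charset and the first form accept-charset."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_charset: Optional[str] = None
        self.form_charset: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and self.meta_charset is None:
            if "charset" in a:  # <meta charset="...">
                self.meta_charset = a["charset"].strip().lower() or None
            elif a.get("http-equiv", "").lower() == "content-type":
                # <meta http-equiv="Content-Type" content="text/html; charset=...">
                self.meta_charset = charset_from_content_type(a.get("content"))
        elif tag == "form" and self.form_charset is None:
            # accept-charset may list several charsets; this sketch keeps the first.
            tokens = a.get("accept-charset", "").replace(",", " ").split()
            if tokens:
                self.form_charset = tokens[0].lower()


def resolve_charset(
    browser_default: Optional[str],
    content_type_header: Optional[str],
    page_html: str,
) -> Optional[str]:
    """Apply the four sources in the listed order; the last one present wins."""
    hints = CharsetHints()
    hints.feed(page_html)
    hints.close()
    candidates = [
        browser_default,                                 # 1) browser setting
        charset_from_content_type(content_type_header),  # 2) HTTP charset value
        hints.meta_charset,                              # 3) meta tag
        hints.form_charset,                              # 4) form accept-charset
    ]
    resolved = None
    for candidate in candidates:
        if candidate:
            resolved = candidate.lower()
    return resolved
```

For example, with a browser default of "utf-8", a Content-Type header of plain "text/html", and a page whose meta tag declares GB2312 but whose form has no accept-charset, resolve_charset returns "gb2312".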

The issue is that this information is present in the page containing the form, but NOT in the POST data. Because Form Fill is stateless, it does not know this information when the data comes back.

Hopefully the JavaScript can determine the browser setting; all the other information could essentially be decoded and inserted (somehow) while the page is parsed for Form Fill.
The NAG will very likely never be able to do any character set translation, as it lacks sufficient library support. The LAG has ICU (International Components for Unicode), which should have all the necessary translation tables.
If we cannot determine the browser setting and there are no headers, we may simply have to assume UTF-8 in that specific case (or provide a service-based setting); all other cases can be determined by parsing the headers or the document on the way out.
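A minimal Python sketch of the fallback proposed here: use the charset detected from the headers or page when there is one, otherwise a per-service setting, otherwise UTF-8, and re-encode the posted values as UTF-8 before they reach LDAP or the Form Fill store. The names decode_post_to_utf8, detected_charset, service_charset and FALLBACK_CHARSET are hypothetical; this is an illustration of the idea, not the gateway's code.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlencode

# Hypothetical last-resort default, following the suggestion to assume UTF-8.
FALLBACK_CHARSET = "utf-8"


def decode_post_to_utf8(
    post_body: str,
    detected_charset: Optional[str],
    service_charset: Optional[str] = None,
) -> Dict[str, List[bytes]]:
    """Decode a urlencoded POST body and return every value as UTF-8 bytes.

    detected_charset: charset found by parsing the headers or page on the way out.
    service_charset: hypothetical per-service setting, used only if detection failed.
    """
    charset = detected_charset or service_charset or FALLBACK_CHARSET
    fields = parse_qs(post_body, encoding=charset)
    # Re-encode as UTF-8, the format LDAP and the Form Fill store expect.
    return {name: [v.encode("utf-8") for v in values] for name, values in fields.items()}


# Usage: a body posted by a browser using GB2312, decoded with the detected charset.
body = urlencode({"username": "张伟"}, encoding="gb2312")
print(decode_post_to_utf8(body, "gb2312")["username"][0].decode("utf-8"))  # 张伟
```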