[Unison-hackers] implementing UTF-16 filesystems support

vouillon Jerome.Vouillon at pps.jussieu.fr
Mon Dec 10 12:40:44 EST 2007


Hello,

On Mon, Dec 10, 2007 at 09:07:02AM +0100, Lucas B. Cohen wrote:
> I've read pretty much every thread on unison-users that contains the keyword
> 'unicode' or 'utf'. I would really like to get Unison to operate between
> Unix and NT machines, and I am willing to spend the necessary time to
> achieve this.
[...]
> At this point I'm not quite sure what exactly I would be trying to do. In
> July 2004, Jérôme Vouillon mentioned that Unison "should eventually switch
> to the Unicode API (or maybe allow to choose between the two APIs)".
> But wouldn't getting Unison to access NT filesystems through the Windows
> UTF-16 API cause it to behave even worse, not matching any 8-bit ASCII
> character read from a UTF-8 encoded Unix filesystem with its corresponding
> dual-byte UTF-16 counterpart?

Indeed, one would need to convert between UTF-16 and UTF-8.  But this
is not a problem as the transcoding is precisely defined and
invertible.  Besides, this can be done at a low level, in a few
functions that access the filesystem, and the rest of Unison will only
see UTF-8 strings.
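
A minimal sketch of such a boundary layer, written in Python for
brevity rather than in Unison's OCaml.  The function names are
hypothetical and not part of Unison; the sketch assumes well-formed
UTF-16LE names, as a wide-character Windows API would return them.

```python
# Hypothetical boundary helpers (illustrative names, not Unison's API).
# Assumption: file names are well-formed UTF-16LE / UTF-8; the strict
# codecs raise UnicodeDecodeError on malformed input.

def utf16_to_utf8(raw: bytes) -> bytes:
    """Convert a UTF-16LE file name read from the filesystem to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(name: bytes) -> bytes:
    """Convert an internal UTF-8 file name back to UTF-16LE."""
    return name.decode("utf-8").encode("utf-16-le")


# Round trip: the name coming from the filesystem is recovered exactly,
# including a character outside the Basic Multilingual Plane.
wide = "Jérôme-𝄞.txt".encode("utf-16-le")
internal = utf16_to_utf8(wide)      # what the rest of the program sees
assert utf8_to_utf16(internal) == wide
```

Only these two functions know about UTF-16; everything above them
handles UTF-8 byte strings.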

> I believe Benjamin Pierce's wish was to keep Unison free from character
> encoding issues and rely on the underlying libraries to handle them. But in
> such a case, I don't see how Unison could spare some character encoding
> awareness.

Unison has to be somewhat aware of character encodings, as Mac and
Windows filesystems are case-insensitive.  What we want to avoid is
transcoding between arbitrary character sets, which would be a
nightmare to implement correctly.
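
To illustrate why case-insensitivity requires knowing the encoding,
here is a small Python sketch.  The helper names are hypothetical, and
str.casefold() is only a stand-in: real filesystems apply their own
case-mapping rules, which this does not reproduce.

```python
# Hypothetical helpers (illustrative only, not Unison's code).

def same_name_bytes(a: bytes, b: bytes) -> bool:
    """Encoding-unaware comparison: bytes.lower() only maps ASCII letters."""
    return a.lower() == b.lower()


def same_name_utf8(a: bytes, b: bytes) -> bool:
    """Encoding-aware comparison: decode UTF-8, then compare case-folded text.
    Assumption: casefold() approximates the filesystem's case rules."""
    return a.decode("utf-8").casefold() == b.decode("utf-8").casefold()


upper = "ÉTÉ.txt".encode("utf-8")
lower = "été.txt".encode("utf-8")

# The byte-level check misses the match because the accented letters
# are multi-byte sequences that ASCII lowercasing leaves untouched.
assert not same_name_bytes(upper, lower)
assert same_name_utf8(upper, lower)
```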

-- Jerome

