From mli6 at free.fr Mon Dec 10 03:07:02 2007
From: mli6 at free.fr (Lucas B. Cohen)
Date: Mon, 10 Dec 2007 09:07:02 +0100
Subject: [Unison-hackers] implementing UTF-16 filesystems support
Message-ID: <024a01c83b03$a587f2f0$6400a8c0@amherst>

Hello hackers,

I've read pretty much every thread on unison-users that contains the keyword 'unicode' or 'utf'. I would really like to get Unison to operate between Unix and NT machines, and I am willing to spend the necessary time to achieve this. I've begun learning the rudiments of Objective CAML, and I'm able to understand about 70% of the statements (phrases?) in Unison's sources, which I've familiarized myself with.

At this point I'm not quite sure what exactly I would be trying to do. In July 2004, Jérôme Vouillon mentioned that Unison "should eventually switch to the Unicode API (or maybe allow to choose between the two APIs)".

But wouldn't getting Unison to access NT filesystems through the Windows UTF-16 API cause it to behave even worse, not matching any 8-bit ASCII character read from a UTF-8 encoded Unix filesystem with its corresponding dual-byte UTF-16 counterpart?

I believe Benjamin Pierce's wish was to keep Unison free from character encoding issues and rely on the underlying libraries to handle them. But in such a case, I don't see how Unison could spare some character encoding awareness.

Thank you for your consideration,
Lucas B. Cohen

From Jerome.Vouillon at pps.jussieu.fr Mon Dec 10 12:40:44 2007
From: Jerome.Vouillon at pps.jussieu.fr (vouillon)
Date: Mon, 10 Dec 2007 18:40:44 +0100
Subject: [Unison-hackers] implementing UTF-16 filesystems support
In-Reply-To: <024a01c83b03$a587f2f0$6400a8c0@amherst>
References: <024a01c83b03$a587f2f0$6400a8c0@amherst>
Message-ID: <20071210174044.GA23380@pps.jussieu.fr>

Hello,

On Mon, Dec 10, 2007 at 09:07:02AM +0100, Lucas B. Cohen wrote:
> I've read pretty much every thread on unison-users that contains the keyword
> 'unicode' or 'utf'. I would really like to get Unison to operate between
> Unix and NT machines, and I am willing to spend the necessary time to
> achieve this.
[...]
> At this point I'm not quite sure what exactly I would be trying to do. In
> July 2004, Jérôme Vouillon mentioned that Unison "should eventually switch
> to the Unicode API (or maybe allow to choose between the two APIs)".
> But wouldn't getting Unison to access NT filesystems through the Windows
> UTF-16 API cause it to behave even worse, not matching any 8-bit ASCII
> character read from a UTF-8 encoded Unix filesystem with its corresponding
> dual-byte UTF-16 counterpart?

Indeed, one would need to convert between UTF-16 and UTF-8. But this is not a problem, as the transcoding is precisely defined and invertible. Besides, this can be done at a low level, in a few functions that access the filesystem, and the rest of Unison will only see UTF-8 strings.

> I believe Benjamin Pierce's wish was to keep Unison free from character
> encoding issues and rely on the underlying libraries to handle them. But in
> such a case, I don't see how Unison could spare some character encoding
> awareness.

Unison has to be somewhat aware of character encodings, as Mac and Windows filesystems are case-insensitive. What we want to avoid is transcoding between arbitrary character sets, which would be a nightmare to implement right.

-- Jerome

From mli6 at free.fr Wed Dec 12 01:39:47 2007
From: mli6 at free.fr (Lucas B. Cohen)
Date: Wed, 12 Dec 2007 07:39:47 +0100
Subject: [Unison-hackers] building the GTK2 UI on Cygwin
In-Reply-To: <20071210174044.GA23380@pps.jussieu.fr>
References: <024a01c83b03$a587f2f0$6400a8c0@amherst> <20071210174044.GA23380@pps.jussieu.fr>
Message-ID: <004601c83c89$c9c3fd30$6400a8c0@amherst>

> From: unison-hackers-bounces at lists.seas.upenn.edu [mailto:unison-hackers-bounces at lists.seas.upenn.edu] On behalf of vouillon

> On Mon, Dec 10, 2007 at 09:07:02AM +0100, Lucas B. Cohen wrote:
> > I've read pretty much every thread on unison-users that contains the keyword
> > 'unicode' or 'utf'. I would really like to get Unison to operate between
> > Unix and NT machines, and I am willing to spend the necessary time to
> > achieve this.
> > At this point I'm not quite sure what exactly I would be trying to do. In
> > July 2004, Jérôme Vouillon mentioned that Unison "should eventually switch
> > to the Unicode API (or maybe allow to choose between the two APIs)".
> > But wouldn't getting Unison to access NT filesystems through the Windows
> > UTF-16 API cause it to behave even worse, not matching any 8-bit ASCII
> > character read from a UTF-8 encoded Unix filesystem with its corresponding
> > dual-byte UTF-16 counterpart?
>
> Indeed, one would need to convert between UTF-16 and UTF-8. But this
> is not a problem as the transcoding is precisely defined and
> invertible. Besides, this can be done at a low level, in a few
> functions that access the filesystem, and the rest of Unison will only
> see UTF-8 strings.

In fact, this has already been done by Hisao Suzuki in his 'UTF-8 Cygwin' project. His work can be used when Unison is compiled in a Cygwin environment, with the Unix version of OCaml. The filesystem-related calls are made by the Unix module instead of the win32unix substitute, and are thus handled by the Cygwin emulation.

Building the text version of Unison is pretty straightforward; however, I was not able to run the GTK2 one. Compiling it works, but the program crashes immediately at runtime with the following error:

  Uncaught exception Gtk.Error("GtkMain.init: initialization failed\nml_gtk_init: initialization failed").

This happens with OCaml 3.08.0 and 3.10.0, and with versions 2.4.0-2 or 20060908-1 of lablgtk2.

Is there a structural reason why GTK2/lablgtk2 cannot function together? I noticed the Cygwin distribution ships OCaml and lablgtk2 binaries, but only the text version of Unison.

Jacques: thanks for your answer.
LBC

From mli6 at free.fr Wed Dec 12 02:31:33 2007
From: mli6 at free.fr (Lucas B. Cohen)
Date: Wed, 12 Dec 2007 08:31:33 +0100
Subject: [Unison-hackers] building the GTK2 UI on Cygwin
In-Reply-To: <004601c83c89$c9c3fd30$6400a8c0@amherst>
References: <024a01c83b03$a587f2f0$6400a8c0@amherst><20071210174044.GA23380@pps.jussieu.fr> <004601c83c89$c9c3fd30$6400a8c0@amherst>
Message-ID: <005e01c83c91$0585fa10$6400a8c0@amherst>

> From: unison-hackers-bounces at lists.seas.upenn.edu [mailto:unison-hackers-bounces at lists.seas.upenn.edu] On behalf of Lucas B. Cohen

> Jacques: thanks for your answer.

Jérôme, not Jacques! My apologies.

From andrex at alumni.utexas.net Wed Dec 12 03:47:04 2007
From: andrex at alumni.utexas.net (Andrew Schulman)
Date: Wed, 12 Dec 2007 03:47:04 -0500
Subject: [Unison-hackers] building the GTK2 UI on Cygwin
References: <024a01c83b03$a587f2f0$6400a8c0@amherst> <20071210174044.GA23380@pps.jussieu.fr> <004601c83c89$c9c3fd30$6400a8c0@amherst>
Message-ID:

> Building the text version of Unison is pretty straightforward, however I was not
> able to run the GTK2 one. Compiling it works, but the program crashes immediately
> at runtime with the following error: Uncaught exception Gtk.Error("GtkMain.init:
> initialization failed\nml_gtk_init: initialization failed").

Right. IIRC this error happens in Cygwin with all Unison versions prior to 2.27.something. However, a little while back I did get around to trying again with the most recent Unison, and the problem had gone away.

> Is there a structural reason why GTK2/lablgtk2 cannot function together? I
> noticed the Cygwin distribution ships OCaml and lablgtk2 binaries, but only the
> text version of Unison.

The uncaught exception always prevented me from packaging the GUI version for Cygwin in the past. Now that the problem has finally gone away, I haven't gotten back to packaging it. Probably next month I'll try again.

I use the text version of Unison myself, so it's not top priority for me to get the GUI working. Also, there's some packaging complexity, because of all the different versions. But I'll put it back on my to-do list. Not this month though.

Andrew.

From teller at csail.mit.edu Wed Dec 12 07:16:25 2007
From: teller at csail.mit.edu (Seth Teller)
Date: Wed, 12 Dec 2007 07:16:25 -0500
Subject: [Unison-hackers] multi-hour delay when sync'ing moderate-size file systems with many path elements
In-Reply-To:
References: <024a01c83b03$a587f2f0$6400a8c0@amherst> <20071210174044.GA23380@pps.jussieu.fr> <004601c83c89$c9c3fd30$6400a8c0@amherst>
Message-ID: <475FD119.8010103@csail.mit.edu>

hello folks,

when using unison to sync moderately-large file systems (several thousand files, some of them large) with roughly a dozen unison path elements, i observe a very long delay between unison's search for updates and the actual data transfer/sync step.

the delay does not seem to be due to network bandwidth, since there is a fast network between the two machines. it may be due to CPU limitations on either end or to disk bandwidth limits.

in any event, the delay is so long that often the update fails because my kerberos tickets expire, and the processes involved lose the access permissions needed to finish the job.

i have already tried using -fastcheck, which doesn't help.

if i reduce the unison path to one element, and rerun, usually things work. but this is a pain because it requires manually editing the unison profile, and restarting, many times.

can unison be forced to run in a mode in which it analyzes and synchronizes one path element at a time, rather than analyzing all of them, then resolving all of them? this would be an improvement, in that i could just run it repeatedly with a full path, and let it make a few hours of progress each time.

can anyone suggest any other workarounds, or even suggest a bug fix?
setting the output to verbose generates so much spurious output that it hasn't helped me understand where the problem lies.

thanks,

seth teller

From Jerome.Vouillon at pps.jussieu.fr Wed Dec 12 08:16:45 2007
From: Jerome.Vouillon at pps.jussieu.fr (vouillon)
Date: Wed, 12 Dec 2007 14:16:45 +0100
Subject: [Unison-hackers] multi-hour delay when sync'ing moderate-size file systems with many path elements
In-Reply-To: <475FD119.8010103@csail.mit.edu>
References: <024a01c83b03$a587f2f0$6400a8c0@amherst> <20071210174044.GA23380@pps.jussieu.fr> <004601c83c89$c9c3fd30$6400a8c0@amherst> <475FD119.8010103@csail.mit.edu>
Message-ID: <20071212131645.GA1731@pps.jussieu.fr>

Hello,

On Wed, Dec 12, 2007 at 07:16:25AM -0500, Seth Teller wrote:
> when using unison to sync moderately-large file systems (several
> thousand files, some of them large) with roughly a dozen unison
> path elements, i observe a very long delay between unison's search
> for updates, and the actual data transfer/sync step.
>
> the delay does not seem to be due to network bandwidth, since
> there is a fast network between the two machines. it may be
> due to CPU limitations on either end or to disk bandwidth
> limits.

That may be due to network latency too.

> in any event, the delay is so long that often the update fails
> because my kerberos tickets expire, and the processes involved
> lose the access permissions needed to finish the job.
>
> i have already tried using -fastcheck, which doesn't help.

Can you try using both the "-fastcheck" and "-pretendwin" options? It is possible that you are using a filesystem that does not support inode numbers. The "-pretendwin" option tells Unison to ignore them when checking whether a file has been modified.

You can check whether the "-fastcheck" option is working properly by running Unison with the "-debug verbose" option.
There should be few lines like this one:

  [verbose] Double-check possibly updated file

-- Jerome

From mli6 at free.fr Fri Dec 14 00:06:33 2007
From: mli6 at free.fr (Lucas B. Cohen)
Date: Fri, 14 Dec 2007 06:06:33 +0100
Subject: [Unison-hackers] building the GTK2 UI on Cygwin
In-Reply-To:
References: <024a01c83b03$a587f2f0$6400a8c0@amherst><20071210174044.GA23380@pps.jussieu.fr><004601c83c89$c9c3fd30$6400a8c0@amherst>
Message-ID: <07e001c83e0f$18c14510$6400a8c0@amherst>

> bounces at lists.seas.upenn.edu] On behalf of Andrew Schulman
> Sent: Wednesday, December 12, 2007 09:47
>
> > Building the text version of Unison is pretty straightforward, however I was not
> > able to run the GTK2 one. Compiling it works, but the program crashes immediately
> > at runtime with the following error: Uncaught exception Gtk.Error("GtkMain.init:
> > initialization failed\nml_gtk_init: initialization failed").
>
> Right. IIRC this error happens in Cygwin with all Unison versions prior to
> 2.27.something. However, a little while back I did get around to trying again
> with the most recent Unison, and the problem had gone away.

That's encouraging; however, I was not able to get any of the versions provided on the Unison website to run. I especially concentrated on 2.27.48 and 2.28.23, trying the two different versions of lablgtk2 provided by Cygwin, OCaml compilers v3.08.0 and 3.10.0, and having OSCOMP set to 'cygwingnuc' or unset. Cygwin itself was at version 1.25.

The INSTALL.win32-cygwin-gnuc file in the 2.28.23 tarball does not mention anything about the GTK2 interface, but only the GTK one. Could that be the one you managed to use? I was not able to build that one either, because of what seems to be discussed in [1].

> > Is there a structural reason why GTK2/lablgtk2 cannot function together? I
> > noticed the Cygwin distribution ships OCaml and lablgtk2 binaries, but only
> > the text version of Unison.
>
> The uncaught exception always prevented me from packaging the GUI version for
> Cygwin in the past. Now that the problem has finally gone away, I haven't
> gotten back to packaging it. Probably next month I'll try again.
>
> I use the text version of Unison myself, so it's not top priority for me to get
> the GUI working. Also, there's some packaging complexity, because of all the
> different versions. But I'll put it back on my to-do list. Not this month
> though.

I appreciate it, and look forward to hearing about the results. Until then, happy holidays.

Lucas

[1] http://www.nabble.com/Missing-gdkx.h-in-gtk2-resolved-tt13621950.html

From unison at greggman.com Thu Dec 20 16:45:27 2007
From: unison at greggman.com (Gregg Tavares)
Date: Thu, 20 Dec 2007 13:45:27 -0800 (PST)
Subject: [Unison-hackers] Character Encoding issues in filenames
Message-ID: <400637.8020.qm@web45403.mail.sp1.yahoo.com>

Sorry if I'm a noob and this has been covered.

I recently tried to use unison to sync files with Japanese filenames between Linux (fc6) and XP and it didn't work. I attempted to look into it, hoping I could fix it and contribute. I think I know what the problem is, but unfortunately I couldn't think of an easy way to fix it. I thought I'd post about it and maybe another developer will have some ideas.

I'm running XP as the client, Linux as the server, and I'm in text mode. Analysing the output, it's pretty clear that unison is getting UTF-8 filenames from Linux but jis or iso-2022-jp on Windows.

This is a problem on the Windows side. Windows has 2 sets of APIs for most functions: the wide-character 16-bit Unicode versions and the multibyte 8-bit localized versions. Ocaml is using the multibyte versions (which are the default in Windows and the only ones you can pass 8-bit strings to). Those multibyte versions always return/accept strings in the locale of the OS. (My OS is set to Japanese as its non-Unicode locale.)

I was hoping I could just find a way to set the locale to UTF-8, but searching the net it sounds like Microsoft got rid of that ability. If I could do that, then I could set the locale to UTF-8 either using a shell around unison or inside unison itself, which would make the multibyte API functions accept / return UTF-8 and everything would work.

Since apparently that is not possible in Windows, the only other solutions I can think of are:

#1) somehow get ocaml or an extension library that will convert UTF-8 to/from UTF-16 and call the wide-character Win32 API functions inside unison

This seems unlikely.

#2) have unison call the conversion functions to convert the UTF-8 filenames passed in from/to Linux to/from the local encoding in the correct places, and vice versa

The problem with this method is that any filename sent from Linux that has a character that doesn't appear in the current encoding set in Windows will screw up.

Anyway, that appears to be the issue. I hope someone finds a solution. As it is, unison will not sync between windows and linux (or windows and osx) for many foreign characters :-(

-Gregg Tavares
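
The trade-off between options #1 and #2 above, and the earlier remark in this thread that UTF-8/UTF-16 transcoding is "precisely defined and invertible", can be illustrated with a minimal sketch. It is written in Python rather than OCaml purely for brevity. All function names are hypothetical and are not part of Unison or of any Windows API, and "cp932" is assumed here to stand in for the Japanese non-Unicode locale; the sketch also only considers well-formed Unicode names.

```python
# Illustrative sketch only: these helpers are hypothetical, not Unison code.

def utf8_to_utf16le(name_utf8: bytes) -> bytes:
    """Option #1: transcode a UTF-8 filename to UTF-16 (little-endian)."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(name_utf16: bytes) -> bytes:
    """Inverse of utf8_to_utf16le for well-formed input."""
    return name_utf16.decode("utf-16-le").encode("utf-8")


def utf8_to_codepage(name_utf8: bytes, codepage: str) -> bytes:
    """Option #2: transcode to a legacy 8-bit/multibyte code page.

    Raises UnicodeEncodeError when a character has no mapping in
    the target code page.
    """
    return name_utf8.decode("utf-8").encode(codepage)


def roundtrips_via_utf16(name_utf8: bytes) -> bool:
    """True if UTF-8 -> UTF-16 -> UTF-8 gives back the same bytes."""
    return utf16le_to_utf8(utf8_to_utf16le(name_utf8)) == name_utf8


def representable_in(name_utf8: bytes, codepage: str) -> bool:
    """True if every character of the name exists in the code page."""
    try:
        utf8_to_codepage(name_utf8, codepage)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    japanese = "日本語.txt".encode("utf-8")
    korean = "한국어.txt".encode("utf-8")

    # The same name has different bytes in UTF-8 and in the code page,
    # so comparing raw bytes across the two machines cannot match.
    print(japanese != utf8_to_codepage(japanese, "cp932"))

    # Option #1 is lossless for both names.
    print(roundtrips_via_utf16(japanese), roundtrips_via_utf16(korean))

    # Option #2 works only for names the local code page can express.
    print(representable_in(japanese, "cp932"), representable_in(korean, "cp932"))
```

Under these assumptions, the UTF-16 route never loses information, while the code-page route fails for any filename containing characters outside the Windows machine's configured locale, which is the failure mode described under #2.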