[Unison-hackers] MAC OS issue ... only with the GUI version

Benjamin Pierce bcpierce at cis.upenn.edu
Mon Jan 8 09:07:30 EST 2007


> I don't use the OS X GUIs (I prefer text-only, as it's much faster  
> to fire off), but I am a Cocoa programmer. What's the problem right  
> now, and what do you think it would take to fix?

It would be great if you could take a look!  Here's the current state  
of play:

* A longish time ago, Trevor Jim implemented a basic OS X GUI in a
combination of Objective-C and OCaml.  It was kind of bare-bones, but
it worked fine.

* A few months ago, Ben Willmore enormously improved the GUI in many,  
many ways.  Unfortunately, somewhere along the line, a bug was  
introduced that would cause intermittent crashes, especially with  
large syncs.  Ben, Trevor, and I discussed the problem and decided  
that the likely culprit was the fact that some multithreading had  
been introduced on the GUI side (to make it more responsive) but that  
somehow locks were not being held properly when calling into the  
OCaml part, leading to sometimes-fatal races.  But Ben did not  
succeed in tracking down the details.  Ben's last message on the  
subject is appended.

* A couple of months ago, Trevor proposed this:

> So on the theory that threads are the problem (race conditions)
> this might be relevant:
> http://groups.google.com/group/fa.caml/browse_thread/thread/684a6f1647fdbe3/e4ad7e1e8fca5edb?lnk=gst&q=threads&rnum=1#e4ad7e1e8fca5edb
> We are probably lacking the required locks in C code that
> gets invoked by the GUI to call back into ocaml.
> -Trevor

* Specifically, he proposed the following simple strategy:

> Somewhere in the .m files declare a lock:
>
>     #include <pthread.h>
>     pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>
> Then, before each callback (e.g., caml_callback, caml_callback_exn):
>
>     pthread_mutex_lock(&lock);
>
> And after each callback:
>
>     pthread_mutex_unlock(&lock);

* A couple of weeks ago, I implemented this suggestion -- see the  
file uimac/main.m (all the changes I made are in that file).   
Unfortunately, as we're seeing, this change seems to have made the  
GUI unresponsive sometimes *and* not to have fixed the underlying  
problem.
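
For reference, Trevor's strategy can be sketched as a single wrapper that every callback goes through, so the lock and unlock calls are always paired. This is only a minimal illustration and is not taken from uimac/main.m: fake_callback and shared_counter are hypothetical stand-ins for caml_callback and the OCaml-side state, and run_demo is an invented driver.

```c
#include <assert.h>
#include <pthread.h>

/* One process-wide lock, as in Trevor's suggestion. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for OCaml-side state that must not be
   touched by two threads at once. */
static long shared_counter = 0;

/* Hypothetical stand-in for caml_callback(); it is NOT the OCaml API. */
static long fake_callback(long arg)
{
    shared_counter += arg;
    return shared_counter;
}

/* Every call goes through this wrapper, so lock and unlock stay paired. */
static long locked_callback(long arg)
{
    pthread_mutex_lock(&lock);
    long result = fake_callback(arg);
    pthread_mutex_unlock(&lock);
    return result;
}

#define CALLS_PER_THREAD 100000

/* Thread body: repeatedly "calls back" through the locked wrapper. */
static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        locked_callback(1);
    return NULL;
}

/* Run two threads that both call back; returns the final counter. */
long run_demo(void)
{
    pthread_t a, b;
    shared_counter = 0;
    int rc_a = pthread_create(&a, NULL, worker, NULL);
    int rc_b = pthread_create(&b, NULL, worker, NULL);
    assert(rc_a == 0 && rc_b == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

Because every increment happens with the mutex held, run_demo() returns 2 * CALLS_PER_THREAD on every run. The sketch only shows the lock/unlock pairing; whether one mutex around each callback is enough for the real OCaml runtime is exactly the open question here.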

That brings us to the present.  Any thoughts you might have would be  
most welcome.

Regards,

     - Benjamin



---------- Forwarded message ----------
From: Ben Willmore <ben at opendarwin.org>
Date: 23-Apr-2006 19:15
Subject: Re: [Unison-hackers] mutex
To: Unison hackers <unison-hackers at lists.seas.upenn.edu>


Well, I'm stumped.  Here's the problem in detail:

If thread A is running unisonSynchronize (uimacbridge.ml), and thread
B calls unisonRiToDetails (also uimacbridge.ml), a crash will
sometimes occur.  [Removing references to the stateItem in
unisonRiToDetails fixes it.]  My assumption was that both threads were
trying to access the same stateItem simultaneously.  Either this is
not the case, or I don't understand unisonSynchronize as well as I
think I do.

Attached is a patch that demonstrates it. It adds mutex locking in
unisonRiToDetails and unisonSynchronize, and some NSLog (print)
statements.

Steps to reproduce the bug:
1. Start the Mac GUI from the command line.
2. Choose two roots with a large number of different small files (I use
   the script below to make 500).
3. Scroll down and select the last reconItem, using the scroll bar, *not*
   the down arrow (which would fetch all the details in advance).
4. Hit Go.
5. Hold down the up arrow.
You will see:
* Usually, Unison will crash with a bus error (sometimes a segfault,
  occasionally something else).
* The mutexes have no effect in preventing this (though tests show they
  are locking correctly).
* The print statements show something like:

...
2006-04-23 11:46:44.836 Unison[8722] dS: SyncEnd: file34
2006-04-23 11:46:44.837 Unison[8722] dS: SyncStart: file340
2006-04-23 11:46:44.837 Unison[8722] dS: TransStart: file340
2006-04-23 11:46:44.838 Unison[8722] dS: TransStart: file340           file340
2006-04-23 11:46:44.841 Unison[8722] dS: SyncEnd: file340
2006-04-23 11:46:44.842 Unison[8722] dS: SyncStart: file341
2006-04-23 11:46:44.843 Unison[8722] dS: TransStart: file341
2006-04-23 11:46:44.843 Unison[8722] dS: TransStart: file341           file341
2006-04-23 11:46:44.861 Unison[8722] dS: Details: file9
2006-04-23 11:46:44.908 Unison[8722] dS: Details: file89
2006-04-23 11:46:45.011 Unison[8722] dS: Details: file88
2006-04-23 11:46:45.057 Unison[8722] dS: Details: file87
2006-04-23 11:46:45.066 Unison[8722] dS: Details: file86
2006-04-23 11:46:45.113 Unison[8722] dS: Details: file85
2006-04-23 11:46:45.160 Unison[8722] dS: Details: file84
2006-04-23 11:46:45.209 Unison[8722] dS: Details: file83
2006-04-23 11:46:45.262 Unison[8722] dS: Details: file82
2006-04-23 11:46:45.297 Unison[8722] dS: Details: file81
2006-04-23 11:46:45.327 Unison[8722] dS: Details: file80
2006-04-23 11:46:45.362 Unison[8722] dS: Details: file8
2006-04-23 11:46:45.396 Unison[8722] dS: Details: file79
Bus error

The files are alphanumerically sorted, so this indicates that files
file1->file340 have been synced successfully, and we're in the middle
of syncing file341.  Details have been displayed for files
file99->file79.  I.e., we haven't got close to a conflict where both
the sync thread and the GUI thread access the same item.
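
The ordering in the log (file34 immediately followed by file340, and file8 sitting between file80 and file79) is what plain string comparison produces. A minimal sketch, assuming "alphanumerically sorted" means byte-wise comparison as done by strcmp; the helper names compare_names and sort_names are invented for illustration, and Unison's own ordering code may differ:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise string comparison via strcmp. */
static int compare_names(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Sort an array of file names in place, in ascending order. */
void sort_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}
```

Sorting the names file8, file341, file79, file34, file340, file80, file9, file35 this way gives file34, file340, file341, file35, file79, file8, file80, file9. The sync thread walks this order forwards, while holding the up arrow from the last item walks it backwards, which matches the Details lines in the log.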

This is really annoying me.  As far as I can tell, both threads are
locked everywhere (relevant) that the stateItems are referenced, and
the log seems to agree this is not the problem.  But I can't see any
reason why running unisonRiToDetails on file79 should cause a crash
during synchronization of file341.

Any injection of wisdom would be much appreciated.

Ben


------------
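#!/bin/bash
# Create $1 files named file1..file$1 in the current directory,
# each containing the text given in $2.
LIMIT=$1
for ((a = 1; a <= LIMIT; a++))
do
  # Quote both expansions so spaces or glob characters in $2 are safe.
  echo "$2" > "file$a"
done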



