[Unison-hackers] Deadlock calling Trace.log in Copy.tryCopyMovedFile

Sat Apr 7 16:43:46 EDT 2007

> On Tue, Mar 20, 2007 at 08:58:15AM -0400, Benjamin Pierce wrote:
> [...]
>> There was one very strange one where, under windows only, the RPC
>> mechanism seemed to get confused by nested calls.  See the comment in
>> Copy.tryCopyMovedFile.  This might be an interesting challenge.  :-)
>
> The communication between the client and the server is synchronous when
> either of them runs under Windows.  That is, at any moment in time,
> they are either allowed to read or write to the socket but not both.
> The reason is that the OCaml libraries do not provide any way to
> know under Windows whether writing to or reading from a socket will
> block ("select" does not work under Windows).  This is not that inefficient
> because the client is allowed to send several requests simultaneously.
> It then tells the server that it is allowed to write to the socket.
> Then, the server can send exactly the same number of replies before
> letting the client send some more requests.  (Note that an RPC from
> the server behaves the same way as a reply from the server followed by
> a request from the client, so there is no problem with nested calls.)
> This works fine as long as any request from the client is always
> eventually followed by a corresponding reply from the server.

Very sneaky.
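
To check my reading of the turn-taking scheme, here is a minimal sketch in Python. It is only a model of the rule as described above, not Unison's actual code: the class name, method names, and the "outstanding" counter are all hypothetical. The client may send several requests, then hands the turn to the server, which must send exactly that many replies before the client may write again.

```python
class TurnTakingChannel:
    """Toy model of a half-duplex link: only the side holding the turn may write.

    Hypothetical illustration; none of these names come from Unison.
    """

    def __init__(self):
        self.turn = "client"   # the client starts with permission to write
        self.outstanding = 0   # requests sent but not yet answered
        self.log = []          # record of (sender, message) pairs

    def client_send(self, request):
        if self.turn != "client":
            raise RuntimeError("client may not write: it is the server's turn")
        self.outstanding += 1
        self.log.append(("client", request))

    def client_yield(self):
        # The client tells the server that it is now allowed to write.
        if self.turn != "client":
            raise RuntimeError("client does not hold the turn")
        if self.outstanding == 0:
            raise RuntimeError("no pending request for the server to answer")
        self.turn = "server"

    def server_reply(self, reply):
        if self.turn != "server":
            raise RuntimeError("server may not write: it is the client's turn")
        self.outstanding -= 1
        self.log.append(("server", reply))
        if self.outstanding == 0:
            # Every request has been answered: the turn goes back to the client.
            self.turn = "client"


channel = TurnTakingChannel()
channel.client_send("request 1")
channel.client_send("request 2")
channel.client_yield()
channel.server_reply("reply 1")
channel.server_reply("reply 2")
```

In this model the client gets the turn back only after the last reply, which is why a request that never gets its reply leaves the client unable to write.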

> The problem here is that Trace.log is not threaded and is calling
> Lwt_unix.run in order to call back the client from the server.  But
> the function Lwt_unix.run introduces some spurious synchronizations
> between threads, as function calls must be properly nested.  This can
> deadlock as follows:
> - thread A is blocked inside a call to function Lwt_unix.run waiting
>   for a request from the client;
> - thread B is blocked in a call to Lwt_unix.run waiting for thread A
>   to exit from the function Lwt_unix.run;
> - the client is blocked waiting for thread B to terminate and send a
>   reply, in order to be able to write to the socket and send a reply
>   to thread A.
>
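
The three-way wait in the list above can be written down as a wait-for graph, where a cycle means deadlock. This is a rough sketch in Python, assuming each party waits on exactly one other; the function name and the dictionary are hypothetical and only restate the scenario as quoted.

```python
def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`.

    Returns the cycle as a list (first node repeated at the end) if the
    chain loops back on itself, or None if it ends at a party that is
    not waiting on anyone.
    """
    path = []
    position = {}          # node -> index in path
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None


# Edges restate the scenario from the quoted list (hypothetical labels).
waits_for = {
    "thread A": "client",    # A is inside Lwt_unix.run, waiting on the client
    "thread B": "thread A",  # B waits for A to exit Lwt_unix.run
    "client": "thread B",    # the client needs B's reply before it may write
}

cycle = find_cycle(waits_for, "thread A")
print(" -> ".join(cycle))
```

Removing any one edge breaks the cycle, which matches the observation that calling Trace.log only once would avoid the problem.
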
> I suspect we get a deadlock only because Copy.tryCopyMovedFile is
> making two consecutive calls to Trace.log.  Thus, a workaround would
> be to call Trace.log only once, after the local copy is performed.
> But this seems fragile.  So, I'm tempted to make Trace.log and
> Util.warn fail in a bad way on the server (so that we can easily find
> any call to these functions from the server) and provide some
> alternative threaded versions of these functions, for instance in file
> remote.ml.  What do you think?

I don't understand the details completely, but the overall scheme
sounds fine.  Would it be hard to implement?  Even the workaround
would be an improvement over simply skipping printing those messages
(people seem to like to know that files are being copied locally
instead of being transferred across the network!).

    - Benjamin