[Unison-hackers] Deadlock calling Trace.log in Copy.tryCopyMovedFile

Jerome Vouillon Jerome.Vouillon at pps.jussieu.fr
Mon Apr 2 11:32:10 EDT 2007


On Tue, Mar 20, 2007 at 08:58:15AM -0400, Benjamin Pierce wrote:
[...]
> There was one very strange one where, under windows only, the RPC
> mechanism seemed to get confused by nested calls.  See the comment in
> Copy.tryCopyMovedFile.  This might be an interesting challenge.  :-)

The communication between the client and the server is synchronous when
either of them runs under Windows.  That is, at any moment in time,
they are either allowed to read or write to the socket but not both.
The reason is that the OCaml libraries do not provide any way to
know under Windows whether writing to or reading from a socket will
block ("select" does not work under Windows).  This is not that inefficient
because the client is allowed to send several requests simultaneously.
It then tells the server that it is allowed to write to the socket.
Then, the server can send exactly the same number of replies before
letting the client send some more requests.  (Note that an RPC from
the server behaves the same way as a reply from the server followed by
a request from the client, so there is no problem with nested calls.)
This works fine as long as any request from the client is always
eventually followed by a corresponding reply from the server.
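
To make the turn-taking rule concrete, here is a minimal sketch of it as a
pure state machine.  It is only a model: the type and function names
(turn, client_send, client_yield, server_reply) are made up for this
illustration and are not taken from the Unison sources.

```ocaml
(* Hypothetical model of the turn-taking rule; not Unison's actual code. *)
type turn =
  | Client_writes          (* the client may send requests *)
  | Server_writes of int   (* the server still owes this many replies *)

(* [pending] counts the requests sent since the client last got the turn. *)
type state = { turn : turn; pending : int }

let initial = { turn = Client_writes; pending = 0 }

(* The client may send a request only while it holds the write turn. *)
let client_send st =
  match st.turn with
  | Client_writes -> Ok { st with pending = st.pending + 1 }
  | Server_writes _ -> Error "client must wait for the server's replies"

(* The client hands the socket over: one reply is now owed per request. *)
let client_yield st =
  match st.turn with
  | Client_writes when st.pending > 0 ->
    Ok { turn = Server_writes st.pending; pending = 0 }
  | Client_writes -> Error "no request has been sent"
  | Server_writes _ -> Error "the server already holds the socket"

(* Each reply reduces the debt; the last one gives the turn back. *)
let server_reply st =
  match st.turn with
  | Server_writes n when n > 1 -> Ok { st with turn = Server_writes (n - 1) }
  | Server_writes _ -> Ok { st with turn = Client_writes }
  | Client_writes -> Error "server must wait for the client's requests"
```

In this model, two calls to client_send followed by client_yield leave the
server owing two replies, and any further client_send is rejected until both
replies have been sent.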

The problem here is that Trace.log is not threaded and calls
Lwt_unix.run in order to call back the client from the server.  But
the function Lwt_unix.run introduces some spurious synchronizations
between threads, as function calls must be properly nested.  This can
deadlock as follows:
- thread A is blocked inside a call to function Lwt_unix.run waiting
  for a request from the client;
- thread B is blocked in a call to Lwt_unix.run waiting for thread A
  to exit from the function Lwt_unix.run;
- the client is blocked waiting for thread B to terminate and send a
  reply, in order to be able to write to the socket and send a reply
  to thread A.
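
The three steps above form a cycle of waits, which can be sketched as a small
wait-for graph.  This is an illustrative model only: the party type and the
deadlocked function are assumptions of the sketch and involve no real Lwt
code.

```ocaml
(* Hypothetical wait-for graph for the scenario above; no real Lwt calls. *)
type party = Thread_a | Thread_b | Client

(* Who each party is waiting for, following the three bullet points. *)
let waits_for = function
  | Thread_a -> Some Client    (* needs a message from the client *)
  | Thread_b -> Some Thread_a  (* its run cannot return before A's run does *)
  | Client -> Some Thread_b    (* may not write until B's reply is sent *)

(* Follow the wait-for edges from [start]; coming back to [start] means
   that no party on the path can make progress.  Three steps are enough
   because there are only three parties. *)
let deadlocked graph start =
  let rec follow p steps =
    steps > 0
    && (match graph p with
        | None -> false
        | Some next -> next = start || follow next (steps - 1))
  in
  follow start 3
```

With waits_for as given, deadlocked waits_for Thread_a holds.  Removing any
single edge, for instance by letting thread B wait for nobody, breaks the
cycle in this model.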

I suspect we get a deadlock only because Copy.tryCopyMovedFile is
making two consecutive calls to Trace.log.  Thus, a workaround would
be to call Trace.log only once, after the local copy is performed.
But this seems fragile.  So, I'm tempted to make Trace.log and
Util.warn fail in a bad way on the server (so that we can easily find
any call to these functions from the server) and provide some
alternative threaded versions of these functions, for instance in file
remote.ml.  What do you think?
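
A rough sketch of the shape this could take, using only the standard library.
Everything here is hypothetical: running_as_server, log_blocking and
log_threaded are names invented for the illustration, and the threaded
version is written in plain continuation-passing style rather than with Lwt.

```ocaml
(* Hypothetical sketch; these are not Unison's actual definitions. *)
let running_as_server = ref false

(* Blocking version: refuse to run on the server, so that any stray call
   from server code is noticed immediately. *)
let log_blocking (send : string -> unit) (msg : string) : unit =
  if !running_as_server then
    failwith ("blocking log called on the server: " ^ msg)
  else send msg

(* Threaded version, in continuation-passing style: it never waits itself.
   It hands the message to [send_async] and lets it resume [k] when the
   message has been delivered. *)
let log_threaded
    (send_async : string -> (unit -> 'a) -> 'a)
    (msg : string)
    (k : unit -> 'a) : 'a =
  send_async msg k
```

The point of the split is that server code can only reach the client through
the threaded entry point, while the blocking one fails loudly instead of
silently risking a deadlock.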

-- Jerome

