[TYPES] Types in distributed systems
"Ionuț G. Stan"
ionut.g.stan at gmail.com
Tue Sep 2 07:31:08 EDT 2014
Thank you very much for this short classification. It does, indeed, add
some order to my thoughts. I have added a few comments inline.
On 02/09/14 12:55, Andreas Rossberg wrote:
> In addition to the good answers you already got, let me add a few clarifications perhaps addressing your question.
>
> First, it is worth noting that “distributed” and “open” are quite different properties. A distributed program isn’t necessarily open, and an open program isn’t necessarily distributed. In terms of type system support, they manifest as very different problems. Openness typically necessitates some amount of runtime type checking (or more than that, depending on the possible complexity of the data and the degree of trust), to reconfirm static assumptions at runtime. Type systems for distribution are potentially concerned with notions like locality and mobility.
>
> “Open” does not necessarily imply “upgradable”. A system with the ability to e.g. plug in modules at runtime, or communicate with statically unknown principals, is already open. In these scenarios, it should be sufficient to just perform a one-time check when a module is loaded or a connection is established. After that, it stays fixed. Upgrading adds a dimension of _mutability_ to the mix. It will typically be necessary to re-check each upgrade or reconnect.
Right. Part of the same stream of questions I had initially was where
would the Java Virtual Machine be situated in this picture. A single JVM
process isn't distributed, and yet, one may still encounter similar
situations to those that I imagined in original email, but due to
classpath conflicts this time. I'm pretty confident now that the JVM can
be classified as an open system.
Case in point. On the JVM, one may end up in a situation where two
libraries I'm depending upon, depend in turn on the same third library,
*but* different versions of it. My code will compile just fine, because
transitive dependencies aren't verified statically. At runtime, though,
only one of those two versions of the third library will win and one of
the possible outcomes is the dreaded java.lang.NoSuchMethodException.
Although I cannot articulate it precisely, I feel there's a gap between,
e.g., having a runtime check for out of bounds array indeces, on one
hand, and missing method definitions at runtime on the other hand. As a
library author, if I'm calling a function or method from an external
module and my code compiles then I have a static *guarantee* that that
method will be there at runtime, isn't it?
Am I missing something or is this not a useful difference to make in
practice/theory?
> An even more involved scenario is to allow upgrading with potentially _incompatible_ types. If that is desired, a single check is no longer possible. You need to check every single interaction, at least in those places where the types are incompatible (which you can define as “the provided type is not a subtype of the expected type”). This can be realised by having the language or runtime insert checked (higher-order) type coercions between incompatible interacting program parts, which will reject unexpected data as a runtime error (or take other measures).
>
> The scenario you describe seems to be yet another case. IIUC, you want to upgrade _multiple_ components, in a way that is incompatible with the previous version, but upgrades _all_ involved parts, so that in the end, types are consistent again. Obviously, there is no real problem if the system supports atomic update of multiple components. But if it doesn’t, then this problem may turn out to be a variant of the previous one: the first update is incompatible initially, so a type checking layer must be inserted at the boundaries. However, when the second component is also updated in a consistent manner, the assumptions match again, and the checks can be eliminated. Such a scheme would also work with more than two components.
>
> /Andreas
>
> On Aug 30, 2014, at 22:44 , Ionuț G. Stan <ionut.g.stan at gmail.com> wrote:
>
>> [ The Types Forum, http://lists.seas.upenn.edu/mailman/listinfo/types-list ]
>>
>> Thank you all for the given pointers. Much more than I had initially anticipated. I skimmed over some of the mentioned papers and they all look great. This should keep me busy for a little while.
>>
>>
>> On 30/08/14 14:26, Philip Wadler wrote:
>>> See the Acute system by Sewell and others.
>>>
>>> http://www.cl.cam.ac.uk/~pes20/acute/index.html
>>>
>>> -- P
>>>
>>> . \ Philip Wadler, Professor of Theoretical Computer Science
>>> . /\ School of Informatics, University of Edinburgh
>>> . / \ http://homepages.inf.ed.ac.uk/wadler/
>>>
>>>
>>> On 29 August 2014 03:24, "Ionuț G. Stan" <ionut.g.stan at gmail.com
>>> <mailto:ionut.g.stan at gmail.com>> wrote:
>>>
>>> [ The Types Forum,
>>> http://lists.seas.upenn.edu/__mailman/listinfo/types-list
>>> <http://lists.seas.upenn.edu/mailman/listinfo/types-list> ]
>>>
>>> Hello,
>>>
>>> First, let me acknowledge upfront that I'm a complete dilettante
>>> when it comes to type theory (I'm still working through the first
>>> chapters of TaPL). That means that what I'm about to ask will be
>>> either stupid or extremely inexact. Nevertheless, I'm curious.
>>>
>>> Now, onto my question...
>>>
>>> The TL;DR version is: how does one specify types in a distributed
>>> programming model, like actors? And how much can we trust these types?
>>>
>>> I was debating today, on Twitter [1], how to type the sets of
>>> messages two actors can exchange; a producer (P) and a consumer (C).
>>>
>>> Let's say that the possible set of messages produced by P is M. We
>>> would like, by means of a type system, to know that C can handle: 1)
>>> just messages of type M and 2) all possible messages of type M.
>>>
>>> M could be represented by a sum type. Let's consider this particular
>>> one:
>>>
>>> M1 = A | B | C
>>>
>>> In a closed world this makes complete sense (to me, at least) and
>>> it's easy to verify statically. But in an open world setting, like a
>>> distributed system, where P and C are on different machines that may
>>> be upgraded separately, things look harder to me. You may guarantee
>>> statically that the two properties are met, but at runtime there may
>>> appear race conditions that violate the second property.
>>>
>>> For example, we deploy P1 and C1, both agreeing on M1. Next, we add
>>> a new variant to M1:
>>>
>>> M2 = A | B | C | D
>>>
>>> P1 and C1 are updated accordingly and we get P2 and C2, which we try
>>> to deploy, but a race condition appears and P2 send message D to C1.
>>> Obviously, C1 does not understand it. Even though the type system
>>> told us that C1 can handle all variants of M, it can't actually.
>>>
>>> A similar scenario appears when removing one of the variants of M1.
>>>
>>> Is there any typing approach to this kind of problem?
>>>
>>> It looks to me that types would have to include a version as well
>>> and all runtime communication between different versions should be
>>> prohibited by having versioned communication channels.
>>>
>>> Does anyone have any insights or pointers to articles, papers or
>>> books that discuss this sort of problem?
>>>
>>> Thank you for reading this!
>>>
>>>
>>> [1]: https://twitter.com/shajra/__status/504568858967953408
>>> <https://twitter.com/shajra/status/504568858967953408>
>>>
>>> --
>>> Ionuț G. Stan | http://igstan.ro
>>>
>>>
>>>
>>>
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>
>>
>> --
>> Ionuț G. Stan | http://igstan.ro
>
--
Ionuț G. Stan | http://igstan.ro
More information about the Types-list
mailing list