[TYPES/announce] Paper announcement

Tue Jun 27 02:03:27 EDT 2006

Dear folks,

We are pleased to announce our new paper

   XML Transformation Language Based on Monadic Second Order Logic
   by Kazuhiro Inaba and Haruo Hosoya

available through

   http://arbre.is.s.u-tokyo.ac.jp/~kinaba/MTran/mtran.pdf

and our accompanying web-site providing an on-line demo:

   http://arbre.is.s.u-tokyo.ac.jp/~kinaba/MTran/

Any comments and suggests are always welcome.

Best regards,

   Kazuhiro Inaba
   Haruo Hosoya

------------------------------------------------------------------

Abstract:

Although monadic second-order logic (MSO) has been a foundation of
XML queries, little work has attempted to take MSO formulae
themselves as a programming construct.  Indeed, MSO formulae are
capable of expressing (1) all regular queries, (2) deep matching
without explicit recursion, (3) queries in a ``don't-care
semantics'' for unmentioned nodes and (4) $n$-ary queries for
locating $n$-tuples of nodes.  While previous frameworks for subtree
extraction (path expressions, pattern matches, etc.) each had
some of these properties, none has satisfied all of them.

In this paper, we have designed and implemented a practical
XML transformation language called MTran that fully exploits
the expressiveness of MSO.  MTran is a language based on
``select-and-transform'' templates similar in spirit to XSLT.
However, we specialize our templates for expressing structure-preserving
transformation so as to avoid any recursive
calls to explicitly be written.  Moreover, we allow templates to be
nested so as to make use of an $n$-ary query that depends on the
$n-1$ nodes selected by the preceding templates.

For the implementation of the MTran language, we have developed,
as the core part, an efficient evaluation strategy for
$n$-ary MSO queries.  This consists of (a) an exploitation of the
existing MONA system for the translation from MSO formulae to tree
automata and (b) our novel query evaluation algorithm for tree
automata.  Although a linear time algorithm has been known for
nullary and unary queries, the best known algorithm for $n$-ary
queries takes $O(rt+s)$ time in the worst case where $t$ is the size
of the input document, $s$ is that of the output, and $r$ is the
size of \textit{potential} matches during the query calculation,
which can grow up to $t^{n-1}$ in the worst case.  We have
developed a more efficient $O(t+s)$ algorithm for $n$-ary queries by
using partially lazy evaluation of set operations.